Asymptotic analysis of the generalized traveling salesman problem and its application
(USC Thesis Other)
by Xiangfei Meng

A Dissertation Presented to the FACULTY OF THE GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (Industrial and Systems Engineering)

August 2018

Copyright 2018 Xiangfei Meng

Acknowledgements

Throughout the last four years at the University of Southern California, I have been so fortunate to get help and support from lots of people. Among them all, I would like to give my deepest gratitude to my advisor, Professor John Gunnar Carlsson. He is such an amazing person that you feel you can always learn from him, not only from his astonishing research abilities, but also from his great personality, impressive mentorship, and attitude towards work and life. His passion for research and life has been very inspirational for me. I feel so fortunate to have met Professor Carlsson and to have completed my PhD with him.

I would like to thank my committee members - Professor Sheldon Ross, Professor Maged Dessouky, Professor Suvrajeet Sen, and Professor Ketan Savla. They provided very insightful suggestions during my qualifying exam, which greatly helped this dissertation.

It was my privilege to be in Professor Carlsson’s research group for my PhD life. This group is full of talented and fun people who always gave me useful advice and left me with unforgettable moments throughout the years. I would like to thank all my labmates Mehdi Behroozi, Siyuan Song, Ye Wang, Jiachuan Chen and Bo Jones. I also appreciate the aid and help from other ISEers and all the friends that I met. With them, life has been so meaningful and joyful.

Last but not least, I would like to take this opportunity to thank my parents and all my family, who have been so supportive and encouraging all the way along. It is their unconditional love and care that helped me go through all the ups and downs, for which I am indebted forever.
At the same time, I want to thank my wife, Xuesong Wang, who is always by my side no matter what happens. Meeting her in my second year is the most beautiful thing that has ever happened.

Table of Contents

Acknowledgements
List Of Tables
List Of Figures
Abstract
Chapter 1: Introduction
Chapter 2: Literature Review
  2.1 Research on TSP
  2.2 Research on generalized TSP
  2.3 Research on Traveling purchaser problem
  2.4 Research on last mile delivery
  2.5 Research on trip chaining
Chapter 3: Summary of key facts and findings from related work
Chapter 4: Analysis of the generalized TSP when $n \to \infty$
  4.1 Analysis under uniform distribution
  4.2 Analysis under clustering
  4.3 Another way to tackle clustering
Chapter 5: Analysis of the generalized TSP when $|X_i| \to \infty$
Chapter 6: Traveling Purchaser Problem
Chapter 7: Numerical Experiments on GTSP
  7.1 GTSP on generated data
  7.2 GTSP on real data
Chapter 8: Application on Warehouse Random Stow Strategy
  8.1 Relevant Literature on Warehouse Design
  8.2 Amazon’s Kiva System vs. GTSP Warehouse System
  8.3 Simulation on Kiva Warehouse and GTSP Warehouse
    8.3.1 Experiment Setup
    8.3.2 Numerical Results
Chapter 9: Application on Trip Chaining Models
  9.1 Models
    9.1.1 A simple example: luddites and shut-ins
    9.1.2 Marginal costs
    9.1.3 Multiple delivery services
    9.1.4 A probabilistic model
  9.2 A numerical example
  9.3 Incorporating inbound-and-outbound costs
    9.3.1 Revised simulations
Chapter 10: More generalized setting for GTSP
Chapter 11: Conclusion
Chapter 12: Future Research
Reference List
Appendix A: Proof of Theorem 25

List Of Tables

2.1 Notational conventions
4.1 Valid constants for $a_d$ and $b_d$
8.1 Warehouse Fulfillment Simulation Results Using GTSP and Kiva Systems (50 random orders per scenario)
9.1 Input parameter estimates for our numerical example.
9.2 The number of grocery stores $k$, the populations $N$, and the critical thresholds $p^*$ at which emissions decrease due to adoption of delivery services. The first and second columns are obtained from [1]. Here the cells marked “> 1” indicate that, even at 100% adoption of delivery services, the carbon footprint of the region is still larger than the case where $p = 0$, i.e. no delivery services are used.
9.3 The number of grocery stores $k$, the populations $N$, and the critical thresholds $p^*$ at which emissions decrease as adoption of delivery services increases under the revised model. The first and second columns are obtained from [1].

List Of Figures

1.1 A generalized TSP tour of six sets of points $X_1, \dots, X_6$, each consisting of $k = 4$ points. The optimal tour contains one element from each such set (and is the shortest such tour to do so).
2.1 The above image is reproduced from Figure 1 of [129], which compares the cost of direct trips originating from a central location (at left) with traveling salesman tours that visit multiple locations (right).
4.1 A generalized TSP tour of six sets of points $X_1, \dots, X_6$, each consisting of $k = 3$ points, where $S$ is a “jellybean” type shape that is placed uniformly at random in a “toroidal” fashion in the unit square.
4.2 This is a $(d-1)$-dimensional cube ($d \ge 2$; if $d = 2$, the problem reduces to the one-dimensional case), with side length $r$, centered at $O$, and $r'$ is the half diagonal.
4.3 This figure demonstrates how the zigzagging tour is disturbed in a two-dimensional ($d = 2$) clustered GTSP instance where $n = 2$, $k = 3$. In each cluster, we pick the point that is nearest to the zigzagging tour, and disturb the zigzagging tour to connect that point by two straight lines. In any arbitrary dimension $d$, we perturb the zigzagging tour in the same manner, namely in each cluster, picking the point that is nearest to the zigzagging tour and connecting it to the tour by two straight lines.
4.4 This is a $d$-dimensional unit cell, centered at $O$, with side length $r = 1/m$.
4.5 Figure 4.5a shows a tour of $n = 20$ neighborhoods; the optimal tour intersects each ball and is the shortest such tour to do so. Figure 4.5b shows that one can always augment an optimal tour in such a way as to touch the centers of each ball.
5.1 The above path has a length of approximately 13.8, and therefore is a member of $P$ if we have $\ell$ equal to (say) 15. We will construct its triplet $(x, d, q)$ as follows: obviously, we have $x = (5, 5)$ (the first element of the path), as indicated in 5.1a. Figure 5.1b shows that the $\ell_\infty$ distances between the four points are $d_1 = 4$, $d_2 = 5$, and $d_3 = 2$, respectively, which would then imply that $d_4 = \ell - (d_1 + d_2 + d_3) = 4$. Finally, the construction of $q$ is shown in 5.1c: given a point $x_i$ on the path and a distance $d_i$, there are $8 d_i$ possible places where the consecutive point $x_{i+1}$ could be located. The path shown has $q_1 = 6$, $q_2 = 35$, and $q_3 = 1$.
5.2 The figures above correspond to the case where $n = 4$ and $\ell = 8$. As 5.2a shows, any $n$-tuple $d$ can be uniquely represented by simply placing $n$ points on the number line from 0 to $\ell$, where the first point is placed a distance $d_1$ from the origin and each subsequent point $i$ is placed a distance $d_i$ to the right of its predecessor. Since $d_4$ is defined so that the entries of $d$ sum to $\ell$, the last such point that is placed must be precisely at a distance $\ell$ from the origin. The placement shown corresponds to the case $d = (3, 2, 0, 3)$. In 5.2b, we show the same point placement for the $n$-tuple $d'$ which is obtained by adding 1 to each of the entries of $d$. It is obvious that any such $n$-tuple $d'$ can be uniquely constructed by selecting $n - 1 = 3$ points from the $\ell + n - 1 = 11$ valid locations of points in 5.2b and computing the sequential distances between those points; the diagram corresponds to the case $d' = (4, 3, 1, 4)$.
6.1 An example of our variation of Traveling purchaser problem. There are four sets of points $X_1, \dots, X_4$ with $X_2$ outlined for purpose of clarity. In this case, $k = 4$, $p = 1/2$, thus the optimal tour contains $k \times p = 2$ elements from each set (and is the shortest such tour to do so).
6.2 In the figures above, we demonstrate a traveling purchaser problem instance with $n = 2$, $k = 6$, $p = 1/3$. In this instance, we are looking for a cycle that visits 2 points from each set. To be more specific, in (6.2a), we show the zigzagging path, which traverses the region $R$ horizontally a total of $m = 8$ times. In (6.2b), we show the perturbed path, where in each point set, we pick 2 points that are nearest to the zigzagging path, and perturb the zigzagging path to connect them both inbound and outbound.
7.1 All points are sampled from a uniform distribution in a unit square. There are $n$ sets with $k$ points in each set.
7.2 In a unit square, we uniformly sample $n$ clusters and in each cluster we uniformly sample $k$ points. Here the cluster is a small square with side length 0.005.
7.3 In a unit square, we uniformly sample $n$ clusters and in each cluster we uniformly sample $k$ points. Here the cluster is a small square with side length 0.02.
7.4 In a unit square, we uniformly sample $n$ clusters and in each cluster we uniformly sample $k$ points. Here the cluster is a small square with side length 0.05.
7.5 In a unit square, we uniformly sample $n$ clusters and in each cluster we uniformly sample $k$ points. Here the cluster is a small square with side length 0.2.
7.6 In a unit square, we uniformly sample $n$ clusters and in each cluster we uniformly sample $k$ points. Here the cluster is a small square with side length 0.5.
7.7 Figure 7.7a shows 1,694 random samples in Eureka, CA. Figure 7.7b shows the ratio between the length of non-clustered tour and clustered tour, where the radius of the cluster is 0.7 mile, the number of points in each set or cluster is equal to 2, 3, 4, i.e., $k = 2, 3, 4$, and the number of sets or clusters $n$ is from 4 to 100.
7.8 Figure 7.8a shows 1,846 random samples in Glendale, CA. Figure 7.8b shows the ratio between the length of non-clustered tour and clustered tour, where the radius of the cluster is 1.0 mile, the number of points in each set or cluster is equal to 2, 3, 4, i.e., $k = 2, 3, 4$, and the number of sets or clusters $n$ is from 4 to 100.
7.9 Figure 7.9a shows 3,342 random samples in Modesto, CA. Figure 7.9b shows the ratio between the length of non-clustered tour and clustered tour, where the radius of the cluster is 1.0 mile, the number of points in each set or cluster is equal to 2, 3, 4, i.e., $k = 2, 3, 4$, and the number of sets or clusters $n$ is from 4 to 100.
7.10 Figure 7.10a shows 963 random samples in Palm Desert, CA. Figure 7.10b shows the ratio between the length of non-clustered tour and clustered tour, where the radius of the cluster is 1.0 mile, the number of points in each set or cluster is equal to 2, 3, 4, i.e., $k = 2, 3, 4$, and the number of sets or clusters $n$ is from 4 to 100.
7.11 Figure 7.11a shows 1,883 random samples in Pasadena, CA. Figure 7.11b shows the ratio between the length of non-clustered tour and clustered tour, where the radius of the cluster is 1.0 mile, the number of points in each set or cluster is equal to 2, 3, 4, i.e., $k = 2, 3, 4$, and the number of sets or clusters $n$ is from 4 to 100.
7.12 Figure 7.12a shows 2,849 random samples in Redding, CA. Figure 7.12b shows the ratio between the length of non-clustered tour and clustered tour, where the radius of the cluster is 4.0 mile, the number of points in each set or cluster is equal to 2, 3, 4, i.e., $k = 2, 3, 4$, and the number of sets or clusters $n$ is from 4 to 100.
7.13 Figure 7.13a shows 1,263 random samples in Redwood City, CA. Figure 7.13b shows the ratio between the length of non-clustered tour and clustered tour, where the radius of the cluster is 1.0 mile, the number of points in each set or cluster is equal to 2, 3, 4, i.e., $k = 2, 3, 4$, and the number of sets or clusters $n$ is from 4 to 100.
7.14 Figure 7.14a shows 1,856 random samples in Sunnyvale, CA. Figure 7.14b shows the ratio between the length of non-clustered tour and clustered tour, where the radius of the cluster is 1.0 mile, the number of points in each set or cluster is equal to 2, 3, 4, i.e., $k = 2, 3, 4$, and the number of sets or clusters $n$ is from 4 to 100.
8.1 The picture is taken from Dr. Kay’s personal website http://www4.ncsu.edu/ kay/Warehousing.pdf. A, B and C in the pictures above denote three SKUs, and each picture shows the storage area for SKUs under different storage policies. The dedicated storage policy determines a particular predetermined location for each product to be stored. In randomized storage policy, each SKU can be stored in any (sometimes the closest) available slot. The class-based policy is a combination of dedicated and randomized storage, where each SKU is assigned to one of several different storage classes.
8.2 This is a sketch of the automated warehouse using Kiva, reproduced from [101]. There are three major components in a Kiva warehouse - (1) green squares, which denote vertical shelves containing different SKUs (one single shelf contains several different SKUs), (2) orange squares, which denote Kiva robots, and each Kiva is able to carry an entire shelf and move around, (3) blue squares, which denote stations.
When an order comes, or more specifically, a particular item is requested, a Kiva is assigned to carry a shelf which contains that item to the station, and human workers at the station will manually pick the SKUs needed to fulfill the order, then the shelf will be carried back to the storage area (may not be the original location) by a Kiva (may not be the original Kiva that carried it to the station).
8.3 In a GTSP warehouse, same SKUs can be placed at different places. In the figure above, all red rectangles refer to the same products, and same thing for the red and cyan rectangles. A warehouse runner may collect multiple different SKUs in a single run, so this random strategy could benefit the runner in terms of reducing the overall running distance in the long term.
8.4 In this warehouse that is using GTSP system, we have $n = 10$ different SKUs labeled from 1 to 10 and 8 shelves, and each shelf contains 4 kinds of different SKUs. Let’s say now we need to fulfill an order that contains $\{1, 4, 7, 9, 10\}$: a warehouse picker leaves from the station, visits two shelves, and then comes back to the station to fulfill this order.
8.5 First, let’s look at Figure 8.5a. In this warehouse that is using Kiva system, we have $n = 10$ different SKUs labeled from 1 to 10 and 8 shelves, and each shelf contains 4 kinds of different SKUs. Let’s say now we need to fulfill an order that contains $\{1, 4, 7, 9, 10\}$; then Kiva robots will need to visit some nearest shelves (distance from station to shelves) that contain all of these SKUs, in the fashion of Figure 8.5a. Note that Kiva carries an entire shelf to the station one at a time and then puts it back. However, we can still solve this problem as a GTSP instance by manipulating the distance matrix. For all SKUs in a particular shelf, let them be points with the same coordinates as the shelf itself and the distance between each two points be 0, e.g., for $\{1, 4, 5, 10\}$ in the same shelf, we treat them as 4 points having the same coordinates as the shelf containing them, and the distance between these 4 points is 0. For each pair of points from two different shelves, the distance between them is the distance from the station to one point plus the distance from the station to another point, as indicated in Figure 8.5b. Any distance to or from the station is just Euclidean distance.
9.1 A plot of the total carbon footprint, $f(p)$, for four different cities, for the model described in Section 9.1.2. Here we assume that $n = 6$. The value $f(0)$ simply represents the total emissions when no delivery services are used, and therefore when $f(p)/f(0) < 1$ we find that delivery services result in a net improvement.
10.1 In (10.1a), we show the path $P$, which traverses the region $R$ horizontally a total of $m = 8$ times. In (10.1b), we show the perturbed path $P'$, where we have $n = 3$ point sets $X_i$ consisting of $k = 4$ points each.

Abstract

The traveling salesman problem (TSP) is a fundamental problem in many diverse fields, including transportation, delivery services, circuit board production, and crystallography, among many others. Apart from extensive research on solving particular instances of this problem, there have been substantial efforts on the probabilistic analysis of the TSP, such as the Beardwood-Halton-Hammersley theorem.
In this dissertation, we analyze the asymptotic behavior of a generalized version of the TSP, which we call the Generalized Travelling Salesman Problem (GTSP), in which the goal is to select one point from each of multiple sets of points and find a tour of minimum length through the selected points. Two different limiting cases of the GTSP are examined: one is the case where the number of point sets goes to infinity, and the other is the case where the number of points in each set goes to infinity. In addition, we define a variation of the traveling purchaser problem and perform asymptotic analysis on it as well. Numerical simulations confirm that our analysis of the GTSP is valid when applied to simulations in the Euclidean plane and on real maps. To demonstrate the practical usage of the GTSP, we apply the model to warehouse order fulfillment strategies, in which a warehouse runner picks multiple different items from multiple shelves in a single run, and show that a warehouse using a GTSP strategy can greatly reduce order picking travel distance compared to Amazon’s semi-automatic warehouse using Kiva robots to fulfill orders. Finally, we use the GTSP to quantify the changes in overall carbon footprint efficiency due to delivery services by looking at “household-level” economies of scale in transportation: a person might perform many errands in a day (such as going to the bank, grocery store, and post office), and that person has many choices of locations at which to perform these tasks (e.g., a typical metropolitan region has many banks, grocery stores, and post offices). Thus, the total driving distance (and therefore the overall carbon footprint) that the person traverses is the solution to a generalized travelling salesman problem (GTSP) in which he or she selects both the best locations to visit and the sequence in which to visit them.
Chapter 1: Introduction

The generalized travelling salesman problem (GTSP) is a variation of the traditional traveling salesman problem (TSP) in which one is given a collection of sets of points, $X_1, \dots, X_n$, and one seeks a tour of minimal length that visits one member of each set $X_i$; an example of such a tour is shown in Figure 1.1. The generalized TSP is a reasonably well-known combinatorial problem and has been studied for almost 50 years [18, 73, 92, 121, 116, 130].

The GTSP arises for us as a natural quantity of interest in understanding one of the fundamental concerns in the analysis of logistical systems, namely, the trade-off between localized, independent provision of goods and services versus provision along a centralized infrastructure such as a backbone network. On the one hand, service executed at a local level features the obvious benefits of proximity and specialization, inasmuch as people and communities obtain things from locations that are close to them. Conversely, by aggregating network flows via a backbone network, individuals and communities are able to reap the benefits of economies of scale, economies of agglomeration, and economies of density. One phenomenon in which this trade-off has recently been made manifest is the transition of businesses from traditional brick-and-mortar stores to retail sales facilitated via e-commerce [9, 79, 89], such as grocery delivery services. Particular recent examples include Google Shopping Express, Amazon Prime, Instacart, and Walmart To Go [20, 26, 51, 102], among many others. As discussed in [120] for the case of grocery delivery:

“There’s still a lot of debate over what works and what doesn’t. Is it a good idea to have a warehouse for food storage, or ask the customers to pick up their food? How much should delivery cost? How often and where should online grocery companies deliver? In the past, the costs associated with delivery service have been so big – huge warehouses and refrigerators, gas-guzzling trucks traveling door to door ... that the math has never worked out.”

Figure 1.1: A generalized TSP tour of six sets of points $X_1, \dots, X_6$, each consisting of $k = 4$ points. The optimal tour contains one element from each such set (and is the shortest such tour to do so).

A major complication in problems of this kind is the difficulty of creating a model that is mathematically tractable enough to give useful insights as well as faithful to the original phenomenon being modelled. Our GTSP can be directly applied to model multi-stop trips made by households: on a given day, a person will often visit multiple locations on one outing (such as running errands on the way to or from one’s place of work), and each of these locations will usually have alternatives (e.g. there are usually multiple choices of which grocer or post office to use). Thus, the calculation of the “cost” of a multi-stop trip is more complicated than a mere travelling salesman tour or a sequence of direct trips to and from the various destinations and the household. For example, one might be willing to travel to a bank that is farther away than the nearest available branch if it is located closer to other businesses that one will also visit (say, by virtue of being located in a central business district or shopping center).

The GTSP is also fundamentally important in studying randomized strategies in warehouses, in which one stores a stock keeping unit (SKU) in many available locations (as opposed to designating specific regions of the warehouse for different SKUs). This is because a warehouse picker will often select multiple SKUs at a time, and can benefit if those SKUs are dispersed throughout the warehouse.
Amazon, for example, calls this process random stow and attributes its rapid growth to the efficiency that is realized as a result [29]:

“Random stow: The storage of items in a randomised order at fulfilment centres to maximise the chance of multiple items on the same order being near each other. The fulfilment centre management system knows the location of every item and is able to work out the shortest travel distance to pick the orders.”

Apart from the applications above, the GTSP plays an important role in “last mile” delivery as well. For example, one recently proposed approach for mitigating the inefficiencies in “last mile” delivery has been the use of a socially networked system in which parcel recipients can “opt in” for packages to be delivered at multiple possible locations (as opposed to their doorstep), such as their workplaces [124]. The parcel delivery company then solves a GTSP, with a set $X_i$ for each customer.

This work is primarily concerned with the asymptotic behavior of the generalized travelling salesman problem, and as such our analysis is closely related to such papers as [17, 21, 50, 107, 122]. We perform the analysis in an arbitrary dimension, under the assumption that each set of points is uniformly distributed or clustered in some Jordan measurable shape. In addition, we also perform asymptotic analysis for a variation of the traveling purchaser problem. For practical usage, we provide two applications of the GTSP model. One is to apply the GTSP to the warehouse random stow strategy mentioned above and compare it to an existing random-stow warehouse using the Kiva system, which shows the great potential of the GTSP to bring down operating cost. The other is to use the GTSP to study how much delivery service must be adopted so that the carbon footprint can be sufficiently reduced.
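To make the problem statement from the beginning of this chapter concrete, the following is a minimal brute-force sketch in Python. It is only an illustration of the definition (pick one point per set, then find the shortest closed tour through the picks), not a method used in this work; the names `tour_length` and `gtsp_brute_force` and the toy instance are hypothetical, and the enumeration is exponential, so it is only suitable for very small inputs.

```python
from itertools import permutations, product
from math import dist


def tour_length(points):
    """Length of the closed Euclidean tour visiting `points` in the given order."""
    return sum(dist(points[i], points[(i + 1) % len(points)])
               for i in range(len(points)))


def gtsp_brute_force(sets):
    """Exhaustively solve a tiny GTSP instance.

    `sets` is a list of point sets X_1, ..., X_n; exactly one point is
    chosen from each set. Returns (best_length, best_tour).
    """
    best_length, best_tour = float("inf"), None
    # Enumerate every way of choosing one representative per set ...
    for choice in product(*sets):
        first, rest = choice[0], choice[1:]
        # ... and every cyclic order of the chosen points (first point fixed).
        for order in permutations(rest):
            tour = (first,) + order
            length = tour_length(tour)
            if length < best_length:
                best_length, best_tour = length, tour
    return best_length, best_tour


# Hypothetical toy instance: four sets with two candidate points each.
example_sets = [
    [(0, 0), (5, 5)],
    [(1, 0), (9, 9)],
    [(1, 1), (7, 0)],
    [(0, 1), (8, 8)],
]
length, tour = gtsp_brute_force(example_sets)
print(length, tour)
```

In this toy instance each set contains one corner of the unit square and one far-away point, so the shortest tour simply walks around the square; realistic instances need the exact or heuristic methods reviewed in Chapter 2.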
Chapter 2: Literature Review

As discussed in Chapter 1, the generalized traveling salesman problem is a generalized version of the traditional traveling salesman problem, and this model is fundamentally important in the study of “last mile” delivery and “trip chaining” behavior. We review some of the current research on all of those aspects in this chapter, and we defer the literature review of warehouse systems to Chapter 8, where it is presented together with the warehouse application.

2.1 Research on TSP

The traveling salesman problem (TSP) has drawn much attention from mathematicians and computer scientists specifically because it is so easy to describe and so difficult to solve [55]. The problem can be simply stated as follows: there are $n$ cities on a map, and a salesman wants to start from a home city, visit all the other cities exactly once, and come back to the home city, with the minimum amount of traveling. The research on the TSP has been extensive, in terms of devising efficient and effective algorithms and performing probabilistic analysis.

To find an exact solution for a particular TSP instance, one often formulates an integer program [55] and uses a branch-and-bound algorithm, looking for an upper bound and a lower bound that are tight enough to identify the optimal solution. Any feasible solution that visits all the cities once gives an upper bound, but a good one is always desired. To obtain a lower bound, some relaxation of the integer program is needed, and cuts are added to the constraint set, as in [47, 62, 90, 13]. Apart from exact approaches, there is also a large literature on heuristics for the TSP. These algorithms usually depend on the initial starting solution, but they are often faster than exact approaches and can find high-quality, though possibly sub-optimal, solutions. Some famous ones are listed here.
The heuristic algorithm of Lin and Kernighan [84] appears so far to be the most effective in terms of solution quality, in particular with the variant proposed by Helsgaun [52, 53]. Grefenstette et al. [46] propose genetic algorithms for the TSP. Simulated annealing is introduced by Aarts et al. [3]. For neural network approaches to the TSP, see [100]. Fiechter [37] uses tabu search on the TSP. Nagata [91] proposes a very effective evolutionary algorithm.

Another part of the research on the TSP is probabilistic analysis, or more specifically, asymptotic analysis, which is of interest to us at present. Some of the most famous results are summarized in Chapter 3, and our analysis is largely based on them. The theorem that is most relevant to us is the Beardwood-Halton-Hammersley (BHH) theorem. Many graph structures over Euclidean sample points have been studied in the context of the BHH theorem and its extensions. The BHH theorem states that the law of large numbers (LLN) holds for certain spanning graphs over random samples. Such graph structures include the travelling salesman path (TSP), the minimal spanning tree (MST), and the nearest neighbor graphs (k-NNG) [60]. The theorem is important because it gives an approximation of the length of the TSP tour, so that we can estimate the length of a TSP tour when we do not have enough computing power to actually solve the instance.

One reason that the TSP draws so much attention is that many practical problems can be formulated as a TSP, yet it is hard to solve quickly in terms of time complexity. For example, the TSP can be used to model the production of printed circuit boards having holes (cities) [12, 11], the analysis of the structure of crystals [24], material handling in a warehouse [106], cutting stock problems [41], the clustering of data arrays [81], the sequencing of jobs on a single machine [42], the assignment of routes for planes of a specified fleet [16], genome sequencing [19], and so forth.
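As a rough illustration of how the BHH approximation discussed above can be used in practice, the sketch below compares a simple nearest-neighbor tour on uniformly random points in the unit square with the estimate $\beta \sqrt{nA}$. The constant `beta = 0.7124` is an empirically estimated value commonly quoted for the Euclidean plane and is an assumption here, not a figure taken from this chapter; the function names are hypothetical, and the nearest-neighbor heuristic is only a stand-in for a real solver, so its tours are typically noticeably longer than optimal.

```python
import math
import random


def nearest_neighbor_tour_length(points):
    """Length of the closed tour built by always moving to the closest unvisited point."""
    unvisited = list(points[1:])
    current = points[0]
    total = 0.0
    while unvisited:
        nxt = min(unvisited, key=lambda p: math.dist(current, p))
        total += math.dist(current, nxt)
        unvisited.remove(nxt)
        current = nxt
    # Close the tour by returning to the starting point.
    return total + math.dist(current, points[0])


def bhh_estimate(n, area=1.0, beta=0.7124):
    """BHH-style estimate beta * sqrt(n * area) of the optimal tour length.

    `beta` is an assumed empirical constant for uniform points in the plane.
    """
    return beta * math.sqrt(n * area)


rng = random.Random(0)  # fixed seed so the experiment is repeatable
sample = [(rng.random(), rng.random()) for _ in range(500)]
heuristic_length = nearest_neighbor_tour_length(sample)
print(heuristic_length, bhh_estimate(len(sample)))
```

The heuristic length should land somewhat above the estimate, since the estimate targets the optimal tour for large $n$ while the nearest-neighbor rule is greedy.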
It is also worth mentioning TSPLIB [108], Gerhard Reinelt’s library of hundreds of instances of the traveling salesman problem, with sizes ranging from 17 to 85,900 cities. This library usually serves to test the performance of different algorithms.

2.2 Research on generalized TSP

The generalized TSP is a reasonably well-studied combinatorial problem and has been studied for almost 50 years. It was simultaneously introduced by Srivastava et al (1969) [121], who addressed the symmetric version of the problem and proposed a dynamic programming approach for its solution, and by Henry-Labordere (1969) [7], who addressed the asymmetric case. Unlike the TSP, where each point needs to be visited, the GTSP allows node alternatives. The first application (1969) can be found in [7], for sequencing computer files. At about the same time (1970), Saksena [114] modeled the routing of welfare clients through governmental agencies as a symmetric GTSP. Other relevant applications [92] include warehouse order picking with multiple stock locations, airport selection and routing for courier planes, and certain types of flexible manufacturing scheduling.

The primary focus of study on this problem has been the rapid solution of particular problem instances in a combinatorial setting [18, 73, 92, 116, 130, 93]. We first go over a few exact solution methods. Laporte et al [75] develop an exact algorithm for the GTSP by formulating an integer program and finding the shortest Hamiltonian circuit through some clusters of nodes. Noon and Bean [92] propose a Lagrangian-based approach for the asymmetric GTSP, in which a Lagrangian relaxation is used to compute a lower bound and a heuristic is used to determine an upper bound, for the purpose of removing arcs and nodes that are guaranteed not to be in the optimal solution. Finally, a branch-and-bound procedure is used to obtain a high-quality solution.
Another efficient exact branch-and-bound scheme is proposed by Fischetti et al [40], in which, at each node of the branch-decision tree, a lower bound on the optimal solution value is obtained by solving an LP relaxation of the GTSP. The relaxation is iteratively tightened by adding inequalities that are violated by the current LP optimal solution.

The other method is to transform the GTSP into a known problem. Transformation of a GTSP into a traveling salesman problem (TSP) was first introduced by Lien et al [83], in which the number of nodes of the transformed TSP was very large, more than three times the number of nodes in the associated GTSP. Later, Dimitrijevic and Saric [33] developed another transformation to decrease the size of the corresponding TSP. In their method, the number of nodes of the TSP is twice the number of nodes of the original GTSP. Noon and Bean [93] present a now well-known transformation that, by cleverly rearranging the weight of each edge, turns a GTSP instance into an asymmetric TSP with the same number of vertices as the original problem, so that any algorithm that solves the asymmetric TSP can be employed to solve the GTSP. One implementation is the solver called GLKH [54]. The first phase of the algorithm transforms the GTSP instance into an asymmetric TSP, and then a famous solver called LKH [52] (an effective implementation of the Lin-Kernighan heuristic [84] for solving the traveling salesman problem) is called to solve the asymmetric TSP. In fact, most of the simulation results in this work are obtained with the help of the GLKH solver.

Another generic transformation is the Fischetti-Salazar-Toth (FST) transformation [39], together with some later modifications [18] by Ben-Arieh et al. They come to the conclusion that one of the modifications is better than the original FST transformation in the worst case.
While these kinds of conversions work well in terms of finding high-quality solutions, there is a drawback noted by Gutin and Karapetyan [48]:

“While the known transformations normally allow to produce GTSP optimal solutions from the obtained optimal TSP tours, all known transformations do not preserve suboptimal solutions. Moreover, conversions of near-optimal TSP tours may well result in infeasible GTSP solutions. Thus, the transformation do not allow us to obtain quickly approximate GTSP solutions and there is a necessity for specific GTSP heuristics.”

In other words, the transformations preserve optimal solutions but not sub-optimal ones, so direct methods for the GTSP should be devised. In their paper [48], Gutin and Karapetyan propose a memetic algorithm that combines a traditional genetic algorithm with a powerful local search component. Shi et al propose a novel particle swarm optimization (PSO)-based algorithm for the TSP and GTSP, which is substantiated by a few numerical examples [116]. Tasgetiren et al [125] present a discrete particle swarm optimization that exploits the basic features of its continuous counterpart and is hybridized with a local search, a variable neighborhood descent algorithm, to further improve the solution quality. Yang et al bring another heuristic method, ant colony optimization, to this field with some good numerical results [130]. Snyder et al implement a genetic algorithm encoded using random keys that is able to quickly solve most of the test cases to optimality and large test cases to within 1% of optimality [119]. Renaud et al [109] develop a very sophisticated heuristic which they call GI$^3$ (Generalized Initialization, Insertion and Improvement). GI$^3$ is a generalization of the I$^3$ heuristic [110], proposed by the same group of researchers for the symmetric TSP.

Apart from these heuristics, there are a number of approximation algorithms for the generalized TSP.
Slavik [118] presented a $\frac{3}{2}\rho$-approximation algorithm for the GTSP, where $\rho$ is the number of cities in the largest cluster. Unfortunately, the worst-case bound may be relatively weak, as $\rho$ may be quite large in some situations. Some other approximation algorithms do not solve the general problem, but instead study a special setting where clusters are restricted to lie in grid systems [23, 64]. Bhattacharya et al [23] show that this problem has a $(1.5 + 8\sqrt{2} + \varepsilon)$-approximation algorithm whose complexity bound depends polynomially on $n$ and $k$, where $k$ is the number of clusters and $n$ is the number of vertices. Khachay et al [64] give three schemes that find $(1 + \varepsilon)$-approximate solutions in polynomial time when $n$ and $k$ satisfy some special relations.

Note that, in the literature, our definition (Definition 7) is often called the E-GTSP [40] (exactly one point in each set is visited), where E denotes Equality, while GTSP stands for the tour of minimum length that visits at least one point from each set. However, the GTSP is equivalent to the E-GTSP whenever the triangle inequality holds, which is the case in our work since everything is Euclidean in our setting.

2.3 Research on Traveling purchaser problem

There are many extensions and generalizations of the well-known TSP. Apart from the generalized traveling salesman problem discussed in the last section, another generalization with many applications is the Traveling Purchaser Problem (TPP), first introduced by Ramesh [105] in 1981. In the TSP, one has to find a closed tour of minimal length connecting m given cities, and each city must be visited once. In the TPP, cities represent markets which provide a set of commodities. The selling price of each product depends on the market. The TPP consists of finding a tour through a subset of markets such that each product is purchased and the sum of routing costs and purchase costs is minimized [25].
To state it more formally: we have a set $V = \{1, 2, \dots, m\}$ of $m$ markets, a depot $s \in V$, and a set $I = \{1, 2, \dots, n\}$ of $n$ items. For each pair of markets $i, j \in V$, let $c_{ij}$ denote the cost of traveling from market $i$ to market $j$. For items, if item $k$ is available at market $i$, then $d_{ik}$ denotes the cost of purchasing item $k$ at market $i$; otherwise $d_{ik} = \infty$. The TPP is to find a tour that starts and ends at the depot $s$ while visiting a subset of the $m$ markets to purchase each of the $n$ items, such that the total of travel and purchase costs is minimized. Note that some assumptions are naturally included to ensure the validity of the problem: each item is available in at least one market, the purchaser can purchase multiple items at a market, the purchaser can visit a market without buying anything, and so on.

The TPP is NP-hard, since it contains the TSP as a special case: when each market sells a commodity that is not available in any other market, every market needs to be visited and the problem amounts to a TSP [25]. This problem arises in several industrial contexts, for example the purchasing of parts and raw materials for manufacturing plants [98]. The scheduling of n jobs on an m-state machine can also be modeled as a TPP [95]. Markets, commodities, routing costs and purchase costs then respectively represent machine states, jobs, setup times and processing times.

The first attempt to solve the TPP goes back to Ramesh [105] in 1981, who proposed an algorithm based on a lexicographic search procedure. Most of the literature on the TPP concerns the solution of the problem with heuristic methods. One pioneering work is by Golden et al [43], who developed a construction heuristic based on a savings strategy. Ong [95] modified this heuristic and proposed the Tour Reduction Heuristic, based on deletions of markets from a complete tour.
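As a concrete illustration of the TPP objective defined above (travel costs c_ij plus purchase costs d_ik), the following is a minimal brute-force sketch for tiny instances. It is not any of the cited algorithms. The function name `tpp_brute_force` and the list-of-lists encoding of the costs are assumptions made for this illustration, and the depot is treated as a market where items may be bought, since the depot s belongs to V.

```python
import itertools
import math


def tpp_brute_force(travel, price, depot=0):
    """Exhaustively solve a tiny TPP instance (illustrative sketch only).

    travel[i][j] -- cost c_ij of traveling from market i to market j
    price[i][k]  -- cost d_ik of buying item k at market i (math.inf if unavailable)
    Returns (best_cost, best_tour), where the tour starts and ends at the depot.
    """
    m = len(travel)
    n = len(price[0])
    others = [i for i in range(m) if i != depot]
    best_cost, best_tour = math.inf, None

    # Enumerate every subset of non-depot markets that could be visited.
    for r in range(len(others) + 1):
        for subset in itertools.combinations(others, r):
            visited = (depot,) + subset
            # Each item is bought at the cheapest visited market.
            purchase = sum(min(price[i][k] for i in visited) for k in range(n))
            if math.isinf(purchase):
                continue  # some item cannot be bought on this subset
            # Enumerate every visiting order of the chosen markets.
            for order in itertools.permutations(subset):
                tour = (depot,) + order + (depot,)
                routing = sum(travel[a][b] for a, b in zip(tour, tour[1:]))
                if routing + purchase < best_cost:
                    best_cost, best_tour = routing + purchase, tour
    return best_cost, best_tour


if __name__ == "__main__":
    inf = math.inf
    travel = [[0, 2, 3], [2, 0, 1], [3, 1, 0]]   # hypothetical 3-market instance
    price = [[inf, 4], [5, 1], [1, inf]]          # rows: markets, columns: items
    print(tpp_brute_force(travel, price))
```

The search is exponential in the number of markets, which is consistent with the NP-hardness noted above and is why the literature surveyed here concentrates on heuristics and branch-and-cut.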
Pearn and Chien [98] suggested some improvements on these procedures and introduced another heuristic, the Commodity Adding Heuristic, which assumes that all products are available at all markets. Voß presented metaheuristic approaches based on dynamic tabu search and simulated annealing, with two strategies: the Reverse Elimination Method and the Cancellation Sequence Method. In 2006, Boris and Dominique [25] used ant colony optimization (ACO), which gave new best solutions to some benchmark instances.

Apart from heuristics, there have also been some attempts at exact algorithms. In 1997, Singh and Van Oudheusden [117] developed a branch-and-bound algorithm that can solve a plant location problem. Computational experiments showed that it is able to solve moderate-size problems, up to 25 cities and 100 items, to optimality in reasonable computation time. In 2003, Laporte et al [76] proposed a branch-and-cut algorithm for undirected TPP instances. The problem was formulated as an integer linear program, and several families of valid inequalities were derived to strengthen the linear relaxation. Their computational results showed that the proposed algorithm outperformed all previous approaches and can optimally solve instances containing up to 200 markets.

2.4 Research on last mile delivery

The term “last mile” was originally used in the telecommunications field but has since been applied to supply chain management. Transporting goods via freight rail networks and container ships is often the most efficient and cost-effective manner of shipping [115]. However, when goods arrive at a high-capacity freight station or port, they must then be distributed to their final destinations. This last leg of the supply chain is often less efficient, comprising up to 28% of the total cost to move goods [45]. This has become known as the “last mile problem”, and it is why “last mile” delivery has become more and more important.
Researchers have proposed many ways to improve last-mile delivery schemes. Based upon Punakivi et al’s simulation results [104], home delivery solutions enabling secure unattended reception are operationally the most cost efficient for last mile distribution, able to reduce home delivery costs considerably, by up to 60%. The basic idea is to install a reception box (with a refrigerator) or delivery box in the customer’s garage or at the front door so that grocery deliveries can be made without the customer being present, which reduces the cost of re-delivery attempts; their simulation results also show that it takes on average two years to pay back the operational cost. Lee et al [79] present five e-fulfillment strategies that e-commerce retailers can rely on: logistic postponement, dematerialization, resource exchange, leveraged shipments, and the clicks-and-mortar model.

In the current literature, last mile delivery is often analyzed either by simplifying the problem at hand or by introducing additional assumptions into the problem structure. For example, in the recent paper [129], the authors perform a detailed computational study that estimates the changes in net CO$_2$ emissions that result from introducing grocery delivery services in Seattle, Washington. As suggested in Figure 1 of that paper (which we reproduce here in Figure 2.1), the authors do not consider the possibility of multi-stop trips, and compare the cost of a direct trip between one’s house and back to the marginal cost incurred by adding oneself to a traveling salesman tour that services many households.

Another field in which last-mile delivery is important is humanitarian relief. Here it refers to the delivery of relief supplies from local distribution centers (LDCs) to beneficiaries affected by disasters [14].
Haghani and Oh [49] determine detailed routing and scheduling plans for multiple transportation modes carrying various commodities from multiple supply points in a disaster relief operation. Barbaroso et al [15] focus on helicopter planning during a disaster relief operation. Ozdamar et al [97] address an emergency logistics problem for distributing multiple commodities from a number of supply centers to distribution centers near the affected areas. Angelis et al [31] construct an ILP model for a multidepot, multivehicle routing and scheduling problem for air delivery of emergency supplies for the World Food Programme (WFP) in Angola, based on WFP’s operations in that country in the year 2001. Balcik et al [14] propose a two-phase modeling approach that determines a delivery schedule for each vehicle and makes inventory allocation decisions by considering supply, vehicle capacity, and delivery time restrictions, in order to minimize the sum of transportation costs and penalty costs in the last mile distribution of humanitarian relief.

Figure 2.1: The above image is reproduced from Figure 1 of [129], which compares the cost of direct trips originating from a central location (at left) with traveling salesman tours that visit multiple locations (right).

2.5 Research on trip chaining

Trip chaining can be characterized as a tour (i.e., a sequence of trip segments that begins and ends at one’s residence) that includes multiple out-of-home destinations [103]. Combining multiple destinations into one tour, rather than returning home after each one, reduces the amount of travel required to reach a given set of places. The seminal paper [4] studies this phenomenon from a theoretical perspective by describing a particular utility function at the household level that is justified with empirical travel data. Several studies have noted that households’ non-work travel is composed of relatively complex patterns of interdependent travel activities [5, 94].
A relatively large number (30%) of non-work trip links in one urban area were found to be components of travel tours which included more than one non-home stop [4]. Most of this work uses empirical data to draw its conclusions. For example, in [86], McGuckin et al use data from the 1995 Nationwide Personal Transportation Survey (NPTS) and the 2001 National Household Travel Survey (NHTS) to examine trip-chaining trends in the U.S., and conclude that a robust growth in trip chaining occurred between 1995 and 2001, nearly all in the direction of home-to-work. In [85], the same group of researchers uses the NPTS to study the difference between men’s and women’s daily trips, showing that women continue to make more trips than men to perform household-sustaining activities such as shopping and family errands.

We must also acknowledge that trip chaining has its own downsides, in spite of the fact that it can potentially reduce carbon emissions. A couple of papers [68, 86] point out that trip chaining usually happens when people are commuting to and from work, when traffic is at its worst. Thus, even if trip chaining may reduce overall carbon emissions, it may worsen congestion in rush hour and thereby significantly increase carbon emissions. Consequently, researchers have been devising strategies that can reduce carbon emissions even during peak hours by encouraging public bikes, walking, or transit as substitutes for private vehicles [35]. Despite these limitations, trip chaining provides a practical and less costly way to deal with fast-increasing carbon emissions, especially when policy makers must decide whether or not to invest in a massive public transportation system such as a railway system.
Traditional approaches to travel modeling cannot effectively incorporate the inherent complexity of trip chaining. Consequently, much of the existing literature on trip chaining is highly technical and focuses on how to better model this behavior, and the resulting models are usually not easy to work with. For example, Golob [44] used canonical correlation analysis to identify the variables that best explain trip chaining behavior, namely life cycle, age and income. Kitamura [67] developed a sequential, history-dependent Markovian model with some simplification of the representation of the history of the chain. A model proposed by Lerman [82] combines a disaggregate choice model with a semi-Markov process, and the model is tested in Rochester, New York. Other approaches are activity-based and can better model complex tours.

On the other hand, some researchers simplify the problem to make it easier to handle. As an example, the paper [87] acknowledges the importance of multi-stop trips (which they call “trip chaining”) in calculating carbon footprints, and assumes constant values for trip lengths, such as 12.8 miles for an average shopping trip by car, and a constant number of stops:

“Shopping can be part of a wider combined trip and involve only a minor detour. We assume that where a shopper undertakes trip chaining, the shopping component of the trip makes up a quarter of the overall total mileage.”

A second example can be found in [124], which considers a closely related problem in which a customer receiving a package can specify multiple locations at which the delivery service may drop off the package (e.g. “please drop off my package at my home, my work, my gym, or my friend’s house”).
The analysis therein is based on Monte Carlo simulation and is highly sensitive to problem specifics, and the issue of trip chaining at the household level is addressed by assigning a fixed amount of trip chaining to estimate marginal costs (the authors also cite [22], which makes a similar assumption):

“Generally, social network members will not participate or choose the burden of pickup if they have to go to a pickup point solely for the purpose of making a pickup for another person. Pickup trips for social network actors can be regarded as a chain event and is a determining variable. We assumed a 100% trip chain to additional mileage for pickup in both PLS and SPLS – in other words, the entire detour distance for pickup is attributed to the package. By contrast, previous research has applied a 0% trip chain effect for pickup. [22]”

Notational conventions

The notational conventions for this proposal are summarized in Table 2.1.

Table 2.1: Notational conventions
  $\|\cdot\|$              Euclidean distance in $\mathbb{R}^d$
  $\int_S \cdot\, dx$       Standard Euclidean integral over $S \subset \mathbb{R}^d$
  $\Gamma(\cdot)$          The gamma function
  $\mathrm{Vol}(\cdot)$    Euclidean volume
  $B_d(r)$                 The ball of radius $r$ in dimension $d$, always centered at the origin
  $S_{d-1}(r)$             The surface area of a $(d-1)$-sphere of radius $r$

We will also make use of some standard conventions in asymptotic analysis:

• We say that $f(x) \in O(g(x))$ if there exist a constant $c$ and a value $x_0$ such that $f(x) \le c \cdot g(x)$ for all $x \ge x_0$,
• We say that $f(x) \in \Omega(g(x))$ if there exist a constant $c$ and a value $x_0$ such that $f(x) \ge c \cdot g(x)$ for all $x \ge x_0$, and
• We say that $f(x) \sim g(x)$ if $\lim_{x \to \infty} f(x)/g(x) = 1$.

When necessary, we will clarify this notation in particular cases, because we are interested in limiting behavior that concerns two values, the number of locations $n$ and the number of choices $k$, and such notation is known to introduce complicated ambiguities [32, 57].

Chapter 3 Summary of key facts and findings from related work

The first three lemmas are stated without proof and are standard textbook material.
The rest are related to the famous Beardwood-Halton-Hammersley (BHH) Theorem [17].

Lemma 1. Let $f : \mathbb{R} \to \mathbb{R}$ be a real-valued function and let $B_d(r) \subset \mathbb{R}^d$ be a ball of radius $r$ centered at the origin. We have
$$\int_{B_d(r)} f(\|x\|)\,dx = \int_0^r S_{d-1}(t)\, f(t)\,dt,$$
where $S_{d-1}(t)$ is the surface area of a $(d-1)$-sphere of radius $t$, which is given by
$$S_{d-1}(t) = \frac{2\pi^{d/2}}{\Gamma(d/2)}\, t^{d-1}.$$

Lemma 2. The volume of a $d$-dimensional ball of radius $r$ is $\pi^{d/2} r^d / \Gamma(d/2 + 1)$.

Lemma 3. The gamma function $\Gamma(x)$ satisfies
$$\log\Gamma(x+1) = x\log x - x + \tfrac{1}{2}\log x + \tfrac{1}{2}\log 2 + \tfrac{1}{2}\log\pi + O(1/x)$$
as $x \to \infty$.

The following lemma gives an upper bound for an arbitrary TSP tour.

Lemma 4. For any set of $n_0$ points $x_1, \dots, x_{n_0}$ contained in a square of area $A_0$, the length of the optimal TSP tour through $x_1, \dots, x_{n_0}$, denoted $\mathrm{TSP}(x_1, \dots, x_{n_0})$, satisfies
$$\mathrm{TSP}(x_1, \dots, x_{n_0}) \le \sqrt{2 A_0 n_0} + \tfrac{7}{4}\sqrt{A_0} < \alpha_1 \sqrt{A_0 n_0},$$
where $\alpha_1 = 2.7$.

Proof. See [36, 63].

This result can be stated more strongly in a probabilistic fashion, as in the celebrated Beardwood-Halton-Hammersley (BHH) Theorem:

Lemma 5. Suppose that $\{X_1, X_2, \dots\}$ is a sequence of random points i.i.d. according to the uniform distribution defined on a compact planar region $R$ with area $A$. Then with probability one,
$$\lim_{N \to \infty} \frac{\mathrm{TSP}(X_1, \dots, X_N)}{\sqrt{AN}} = \beta,$$
where $\beta$ is a constant.

Proof. See for example [17, 107, 123].

It is additionally known that $0.6250 \le \beta \le 0.9204$, and conjectured that $\beta \approx 0.7124$; see [10, 38]. Furthermore, there is a more general version of the BHH theorem that deals with arbitrary dimension and non-uniform distributions of points.

Lemma 6 (BHH Theorem). For any dimension $d$, there is a dimension-dependent constant $\beta_d$ such that, for almost any sequence of independent random variables $\{X_i\}$ sampled from an absolutely continuous density $f$ on $\mathbb{R}^d$ with compact support, we have
$$\lim_{n \to \infty} \frac{\mathrm{TSP}(X_1, \dots, X_n)}{n^{(d-1)/d}} = \beta_d \int_{\mathbb{R}^d} f(x)^{(d-1)/d}\,dx$$
with probability one.
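To illustrate how Lemmas 4 and 5 are used in practice, the sketch below evaluates the planar BHH estimate $\beta\sqrt{AN}$ alongside the deterministic bound of Lemma 4. The function names are hypothetical and chosen for this illustration; the constants are the ones quoted above, and the BHH value is only an asymptotic approximation, not a guarantee for any finite instance.

```python
import math

BETA_CONJECTURED = 0.7124                 # conjectured value of beta quoted after Lemma 5
BETA_LOWER, BETA_UPPER = 0.6250, 0.9204   # proven bounds on beta quoted after Lemma 5


def bhh_estimate(n_points, area, beta=BETA_CONJECTURED):
    """Asymptotic tour-length estimate beta * sqrt(A * N) from Lemma 5
    (planar case, points uniform on a region of the given area)."""
    return beta * math.sqrt(area * n_points)


def lemma4_upper_bound(n_points, area):
    """Deterministic bound sqrt(2 * A * n) + (7/4) * sqrt(A) from Lemma 4,
    valid for any n points inside a square of the given area."""
    return math.sqrt(2.0 * area * n_points) + 1.75 * math.sqrt(area)


if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        print(n,
              round(bhh_estimate(n, 1.0, BETA_LOWER), 2),
              round(bhh_estimate(n, 1.0), 2),
              round(bhh_estimate(n, 1.0, BETA_UPPER), 2),
              round(lemma4_upper_bound(n, 1.0), 2))
```

For large $n$ the worst-case bound grows like $\sqrt{2An}$, roughly twice the conjectured random-instance value, which is the sense in which Lemma 5 sharpens Lemma 4.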
Chapter 4 Analysis of the generalized TSP when $n \to \infty$

In this chapter, we first give the formal definition of the GTSP, and then perform a probabilistic analysis of the case where the number of sets goes to infinity.

Definition 7. Given $n$ sets of points $\mathcal{X}_1, \dots, \mathcal{X}_n$ in the plane, the generalized TSP tour $\mathrm{GTSP}(\mathcal{X}_1, \dots, \mathcal{X}_n)$ is the shortest cycle that contains one element from each point set $\mathcal{X}_i$.

See Figure 1.1 on page 2 for an example. Clearly, when each point set is a singleton (i.e. $\mathcal{X}_i = \{x_i\}$ for all $i$), we see that $\mathrm{GTSP}(\mathcal{X}_1, \dots, \mathcal{X}_n) = \mathrm{TSP}(x_1, \dots, x_n)$. We will commit a minor abuse of notation throughout this paper by using the term $\mathrm{GTSP}(\cdot)$ to refer both to the tour itself and to its length.

The GTSP is, of course, a generalization of the TSP, which has been analyzed extensively from a geometric probabilistic perspective [107, 122]. However, many of these results for the TSP cannot be generalized in a straightforward way to the GTSP. To give one example, page 30 of [123] establishes a simple nearest-neighbor argument that explains why $\mathbf{E}\,\mathrm{TSP}(X_1, \dots, X_n) \in \Omega(\sqrt{n})$ for independent, uniformly distributed $X_i$ in the unit square: for any point $X_i$, it can be shown that $\mathbf{E} \min_{j : j \ne i} \|X_i - X_j\| \in \Omega(1/\sqrt{n})$, from which the desired result follows by summing over all $n$ points. This does not carry over to the GTSP because we have $k_1 + k_2 + \cdots + k_n$ points in total and we are summing over only $n$ of them.

It is also worth noting that, unlike for the TSP, there are two kinds of limits of interest to us: the case where the number of point sets gets large, i.e. $n \to \infty$, and the case where the cardinality of these point sets gets large, i.e. the average of the $k_i$’s goes to infinity. The latter does not appear to have much connection to previous work on the TSP.

4.1 Analysis under uniform distribution

The following theorem describes the behavior of the GTSP when we fix $|\mathcal{X}_i| = k$ for all $i$ and we let $n \to \infty$:

Theorem 8.
Let $d, k \ge 2$ be fixed and let $\mathcal{X}_1, \dots, \mathcal{X}_n$ be point sets of cardinality $k$ that are all drawn independently and uniformly at random in the unit cube in $\mathbb{R}^d$. Then $\mathrm{GTSP}(\mathcal{X}_1, \dots, \mathcal{X}_n) \in \Theta\big(\sqrt{d}\,(n^{d-1}/k)^{1/d}\big)$. More precisely, for any $d \ge 2$, there exist constants $a_d$ and $b_d$ such that, for all fixed $k \ge 2$, we have
$$\Pr\left(a_d \le \frac{\mathrm{GTSP}(\mathcal{X}_1, \dots, \mathcal{X}_n)}{\sqrt{d}\,(n^{d-1}/k)^{1/d}} \le b_d\right) \to 1 \tag{4.1}$$
as $n \to \infty$. The following values of $a_d$ and $b_d$ in Table 4.1 satisfy (4.1):

Table 4.1: Valid constants for $a_d$ and $b_d$.
  $d$        $a_d$                           $b_d$
  2          0.3421                          1.3016
  3          0.3035                          0.9266
  4          0.2863                          0.7425
  5          0.2766                          0.6543
  $\ge 6$    $1/\sqrt{2\pi e} > 0.2419$      $\frac{d}{d-1} \cdot 12^{1/(2d)}/\sqrt{6} < 0.6027$

To prove Theorem 8, we first need a few probabilistic lemmas. The first lemma is a fairly routine volume computation.

Lemma 9. Let $\ell > 0$ and let $\mathcal{D} \subset \mathbb{R}^{dn}$ denote the set of all $n$-tuples $(x_1, \dots, x_n)$ of points in $\mathbb{R}^d$ such that
$$\|x_1\| + \sum_{i=2}^{n} \|x_i - x_{i-1}\| \le \ell.$$
The volume of $\mathcal{D}$, $\mathrm{Vol}(\mathcal{D})$, satisfies
$$\mathrm{Vol}(\mathcal{D}) = \left(\frac{2\pi^{d/2}}{\Gamma(d/2)}\right)^{n} \cdot \frac{\Gamma(d)^n}{\Gamma(dn+1)} \cdot \ell^{dn}. \tag{4.2}$$

Proof. This becomes much easier after we apply the (volume-preserving) change of variables
$$u_1 = x_1, \quad u_2 = x_2 - x_1, \quad \dots, \quad u_n = x_n - x_{n-1},$$
so that we are equivalently interested in the volume of the region $\mathcal{D}'$ defined by
$$\mathcal{D}' = \left\{ (u_1, \dots, u_n) \in \mathbb{R}^{dn} : \sum_{i=1}^{n} \|u_i\| \le \ell \right\}.$$
This can be expressed as the integral
$$\int_{B_d(\ell)} \int_{B_d(\ell - \|u_n\|)} \cdots \int_{B_d(\ell - \sum_{i=3}^{n} \|u_i\|)} \int_{B_d(\ell - \sum_{i=2}^{n} \|u_i\|)} 1 \, du_1 \, du_2 \cdots du_{n-1} \, du_n,$$
which we can compute using an inductive argument. We first note that the desired relationship holds for the base case $n = 1$ because
$$\mathrm{Vol}(\mathcal{D}') = \int_{B_d(\ell)} 1 \, du_1 = \mathrm{Vol}(B_d(\ell)) = \frac{\pi^{d/2} \ell^d}{\Gamma(d/2 + 1)}$$
by Lemma 2, and indeed, evaluating (4.2) at $n = 1$ yields
$$\frac{2\pi^{d/2}}{\Gamma(d/2)} \cdot \frac{\Gamma(d)}{\Gamma(d+1)} \cdot \ell^d = \frac{2\pi^{d/2}}{\Gamma(d/2)} \cdot \frac{1/2}{d/2} \cdot \ell^d = \frac{\pi^{d/2} \ell^d}{\Gamma(d/2 + 1)}$$
as desired.
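As a numerical sanity check of the closed form (4.2), separate from the proof itself, the sketch below compares it with a plain Monte Carlo estimate of the volume of the set of $(u_1, \dots, u_n)$ with $\sum_i \|u_i\| \le \ell$. The function names, sample size, and seed are assumptions made for this illustration, and the Monte Carlo value is only approximate.

```python
import math
import random


def lemma9_volume(d, n, ell):
    """Closed form (4.2): (2 pi^(d/2) / Gamma(d/2))^n * Gamma(d)^n / Gamma(dn+1) * ell^(dn).
    Evaluated in log space to avoid overflow for larger d and n."""
    log_vol = (n * (math.log(2.0) + 0.5 * d * math.log(math.pi) - math.lgamma(d / 2.0))
               + n * math.lgamma(d)
               - math.lgamma(d * n + 1)
               + d * n * math.log(ell))
    return math.exp(log_vol)


def monte_carlo_volume(d, n, ell, samples=200_000, seed=0):
    """Estimate the volume of {(u_1..u_n) : sum ||u_i|| <= ell} by sampling
    uniformly from the bounding box [-ell, ell]^(dn)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        total = 0.0
        for _ in range(n):
            u = [rng.uniform(-ell, ell) for _ in range(d)]
            total += math.sqrt(sum(c * c for c in u))
            if total > ell:
                break  # already outside the region
        if total <= ell:
            hits += 1
    box_volume = (2.0 * ell) ** (d * n)
    return box_volume * hits / samples


if __name__ == "__main__":
    exact = lemma9_volume(2, 2, 1.0)        # equals pi^2 / 6 for d = n = 2, ell = 1
    estimate = monte_carlo_volume(2, 2, 1.0)
    print(exact, estimate)
```

For $d = n = 2$ and $\ell = 1$ the formula reduces to $\pi^2/6$, and for $n = 1$ it reduces to the ball volume of Lemma 2, matching the base case above.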
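Our induction hypothesis says that, for any $\ell' > 0$, we have
$$\int_{B_d(\ell')} \int_{B_d(\ell' - \|u_{n-1}\|)} \cdots \int_{B_d(\ell' - \sum_{i=3}^{n-1} \|u_i\|)} \int_{B_d(\ell' - \sum_{i=2}^{n-1} \|u_i\|)} 1 \, du_1 \, du_2 \cdots du_{n-2} \, du_{n-1} = \left(\frac{2\pi^{d/2}}{\Gamma(d/2)}\right)^{n-1} \cdot \frac{\Gamma(d)^{n-1}}{\Gamma(d(n-1)+1)} \cdot (\ell')^{d(n-1)},$$
and by setting $\ell' = \ell - \|u_n\|$ in our original expression, we see that
\begin{align*}
&\int_{B_d(\ell)} \int_{B_d(\ell - \|u_n\|)} \cdots \int_{B_d(\ell - \sum_{i=3}^{n} \|u_i\|)} \int_{B_d(\ell - \sum_{i=2}^{n} \|u_i\|)} 1 \, du_1 \, du_2 \cdots du_{n-1} \, du_n \\
&\quad = \int_{B_d(\ell)} \left( \int_{B_d(\ell - \|u_n\|)} \cdots \int_{B_d(\ell - \sum_{i=3}^{n} \|u_i\|)} \int_{B_d(\ell - \sum_{i=2}^{n} \|u_i\|)} 1 \, du_1 \, du_2 \cdots du_{n-1} \right) du_n \\
&\quad = \int_{B_d(\ell)} \left(\frac{2\pi^{d/2}}{\Gamma(d/2)}\right)^{n-1} \cdot \frac{\Gamma(d)^{n-1}}{\Gamma(d(n-1)+1)} \cdot (\ell - \|u_n\|)^{d(n-1)} \, du_n \\
&\quad = \left(\frac{2\pi^{d/2}}{\Gamma(d/2)}\right)^{n-1} \cdot \frac{\Gamma(d)^{n-1}}{\Gamma(d(n-1)+1)} \int_{B_d(\ell)} (\ell - \|u_n\|)^{d(n-1)} \, du_n \\
&\quad = \left(\frac{2\pi^{d/2}}{\Gamma(d/2)}\right)^{n-1} \cdot \frac{\Gamma(d)^{n-1}}{\Gamma(d(n-1)+1)} \int_0^{\ell} S_{d-1}(t) (\ell - t)^{d(n-1)} \, dt \tag{4.3} \\
&\quad = \left(\frac{2\pi^{d/2}}{\Gamma(d/2)}\right)^{n} \cdot \frac{\Gamma(d)^{n-1}}{\Gamma(d(n-1)+1)} \int_0^{\ell} t^{d-1} (\ell - t)^{d(n-1)} \, dt \\
&\quad = \left(\frac{2\pi^{d/2}}{\Gamma(d/2)}\right)^{n} \cdot \frac{\Gamma(d)^{n-1}}{\Gamma(d(n-1)+1)} \cdot \ell^{dn} \, \frac{\Gamma(d(n-1)+1)\,\Gamma(d)}{\Gamma(dn+1)} \\
&\quad = \left(\frac{2\pi^{d/2}}{\Gamma(d/2)}\right)^{n} \cdot \frac{\Gamma(d)^{n}}{\Gamma(dn+1)} \cdot \ell^{dn}
\end{align*}
as desired, where we have used Lemma 1 in line (4.3).

It is not difficult to derive a result concerning the TSP on a set of points:

Corollary 10. Let $X_0$ be the origin in $\mathbb{R}^d$ and let $X_1, \dots, X_n$ be a collection of independent, uniform samples drawn from the unit cube in $\mathbb{R}^d$. Then
$$\Pr(\mathrm{TSP}(X_0, X_1, \dots, X_n) \le \ell) \le \Gamma(n+1) \cdot \left(\frac{2\pi^{d/2}}{\Gamma(d/2)}\right)^{n} \cdot \frac{\Gamma(d)^n}{\Gamma(dn+1)} \cdot \ell^{dn}.$$

Proof. It is easy to see that
$$\underbrace{\Pr\left( \|X_1\| + \sum_{i=2}^{n} \|X_i - X_{i-1}\| \le \ell \right)}_{(*)} \le \left(\frac{2\pi^{d/2}}{\Gamma(d/2)}\right)^{n} \cdot \frac{\Gamma(d)^n}{\Gamma(dn+1)} \cdot \ell^{dn}; \tag{4.4}$$
this is because we can regard the samples $X_1, \dots, X_n$ as being a single sample drawn uniformly from the unit cube in $\mathbb{R}^{dn}$, so that the probability of interest $(*)$ is simply the probability that this single sample lies in the domain $\mathcal{D}$ described in Lemma 9. This probability is of course equal to $\mathrm{Vol}(\mathcal{D} \cap [0, 1]^{dn}) \le \mathrm{Vol}(\mathcal{D})$, which gives us the desired inequality (4.4). We obtain our corollary by applying the union bound to (4.4) over all $n! = \Gamma(n+1)$ permutations of $X_1, \dots, X_n$.

Now we are ready to prove Theorem 8.

Proof.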
To derive the lower bound, let $E$ be the event that $\mathrm{GTSP}(\mathcal{X}_1, \dots, \mathcal{X}_n) < a_d \sqrt{d}\,(n^{d-1}/k)^{1/d}$. Applying the union bound to Corollary 10 and using the fact that there are $k^n$ different possible ways to select one member from each set $\mathcal{X}_i$, we see that
\begin{align*}
\Pr(E) &\le k^n \cdot \Gamma(n+1) \cdot \left(\frac{2\pi^{d/2}}{\Gamma(d/2)}\right)^{n} \cdot \frac{\Gamma(d)^n}{\Gamma(dn+1)} \cdot \left[ a_d \sqrt{d} \left(\frac{n^{d-1}}{k}\right)^{1/d} \right]^{dn} \\
\implies \log \Pr(E) &\le \left[ \log\Gamma(d) - \log\Gamma(d/2) + d \log a_d + \tfrac{d}{2}\log d + \tfrac{d}{2}\log\pi + \log 2 \right] n \\
&\qquad + \log\Gamma(n+1) - \log\Gamma(dn+1) + (d-1)\, n \log n \\
&= \left[ \log\Gamma(d) - \log\Gamma(d/2) + d \log a_d + \tfrac{d}{2}\log\pi + \log 2 - \tfrac{d}{2}\log d + d - 1 \right] n - \tfrac{1}{2}\log d + O(1/n), \tag{4.5}
\end{align*}
where we have applied Lemma 3. We see that (4.5) $\to -\infty$ if and only if the coefficient of $n$ is negative:
\begin{align*}
0 &> \log\Gamma(d) - \log\Gamma(d/2) + d \log a_d + \tfrac{d}{2}\log\pi + \log 2 - \tfrac{d}{2}\log d + d - 1 \\
&\Updownarrow \\
a_d &< \frac{\sqrt{d}}{2 \left( \pi^{(d-1)/2} \, \Gamma\!\left(\frac{d+1}{2}\right) e^{d-1} \right)^{1/d}}; \tag{4.6}
\end{align*}
it is straightforward to verify that the terms in Table 4.1 satisfy this relationship.

The upper bounding terms $b_d$ are very simple: from each point set $\mathcal{X}_i$, we let the point $X_i = (x_1, \dots, x_d)$ be the member of $\mathcal{X}_i$ whose first entry $x_1$ is the smallest. It is then not hard to see that the points $X_i$ follow the probability distribution $f(x = (x_1, \dots, x_d)) = k(1 - x_1)^{k-1}$. The BHH theorem says that the length of a TSP tour of a collection of points following this distribution must, with probability one, be asymptotically equal to
$$\beta_d \, n^{(d-1)/d} \int_{[0,1]^d} f(x)^{(d-1)/d} \, dx = \beta_d \, n^{(d-1)/d} \cdot \frac{d \, k^{(d-1)/d}}{(d-1)k + 1} < \beta_d \, \frac{d}{d-1} \left(\frac{n^{d-1}}{k}\right)^{1/d}, \tag{4.7}$$
and so the terms for $b_d$ in Table 4.1 are simply obtained by taking the best-known upper bounds of $\beta_d$ and evaluating $\beta_d \sqrt{d}/(d-1)$. We used the bounds of $\beta_d$ from Section 8.5 of [38].

It is not hard to show that the preceding argument applies to any compact region with unit area, as opposed to merely the unit cube.

4.2 Analysis under clustering

In this section we apply Corollary 10 to study the case where samples are not independently drawn; rather, we will assume that each point set $\mathcal{X}_i$ consists of points that are clustered together.
This assumption holds in the vast majority of instances of the GTSP; for example, the benchmark data in [40] is constructed by selecting problems from the TSPLIB library [108] and then grouping point sets together based on proximity. Our model of clustering is as follows: we assume that we are given a fixed (compact) Jordan measurable shape $S$ of arbitrary volume, and that each $\mathcal{X}_i$ is obtained by placing $S$ uniformly at random in the unit cube $[0, 1]^d$ and sampling $k$ points uniformly within $S$. In order to sidestep boundary effects that might occur by having $S$ only partially contained in the cube, we assume that the uniform placement of $S$ is done in a “toroidal” fashion (in which opposing sides of the cube are “glued” together), as suggested in Figure 4.1.

Figure 4.1: A generalized TSP tour of six sets of points $\mathcal{X}_1, \dots, \mathcal{X}_6$, each consisting of $k = 3$ points, where $S$ is a “jellybean” type shape that is placed uniformly at random in a “toroidal” fashion in the unit square.

It turns out that the bounds from Theorem 8 remain valid even in this situation, but with different constants for the upper bounds:

Theorem 11. The bounds from Theorem 8 remain valid, to the same order and in the sense of expectation, when the point sets $\mathcal{X}_i$ are sampled from a uniformly placed shape $S$ as described above, i.e.,
$$a_d \le \frac{\mathbf{E}\,\mathrm{GTSP}(\mathcal{X}_1, \dots, \mathcal{X}_n)}{\sqrt{d}\,(n^{d-1}/k)^{1/d}} \le b'_d \tag{4.8}$$
as $n \to \infty$, where the constants $a_d$ are the same as in Theorem 8 and $b'_d$ takes the values in the following table:

  $d$       2        3        4        5        $\infty$
  $b'_d$    1.4142   1.2684   1.2169   1.1872   1

Proof. We first note that, if one selects points $X_1 \in \mathcal{X}_1, \dots, X_n \in \mathcal{X}_n$ arbitrarily, then the resulting samples $X_1, \dots, X_n$ are still uniformly and independently drawn in the cube, by virtue of the fact that $S$ was placed uniformly at random. Hence, we can again apply the union bound to Corollary 10 exactly as in the proof of Theorem 8, so that the lower bounds $a_d$ from Table 4.1 still apply. Thus, our proof is complete if we can show
Thus, our proof is complete if we can show that the upper bounds $b'_d$ hold as well. Note that in Theorem 8, we derived the upper bounds $b_d$ by selecting the member of each $\mathcal{X}_i$ whose first entry was the smallest. This is not guaranteed to work here, since the members of a cluster are no longer independent uniform samples of the cube. To prove the upper bound in the clustered setting, we must first introduce a probabilistic lemma:

Figure 4.2: A $(d-1)$-dimensional cube ($d\ge 2$; if $d=2$, the problem reduces to the one-dimensional case) with side length $r$, centered at $O$; $r'$ is the half diagonal.

Lemma 12. Consider a $(d-1)$-dimensional cube ($d\ge 2$) with side length $r$, as shown in Figure 4.2 (if $d=2$, the problem is one-dimensional and the result obviously holds). Sample $k$ points $\{x_1,x_2,\dots,x_k\}$ uniformly at random inside the cube, let $d_i$ denote the distance between $x_i$ and the center of the cube, and define $D=\min\{d_1,d_2,\dots,d_k\}$. Then

$$\mathbb{E}D\le\Gamma\!\left(\frac{d}{d-1}\right)\frac{\sqrt{d-1}}{2}\cdot\frac{1}{k^{1/(d-1)}}\cdot r.$$

Proof. Let $p_k(x)$ be the probability that $k$ points sampled in the cube are all at distance at least $x$ from the center. Then we have

$$\mathbb{E}D=\int_0^{r'}p_k(x)\,dx,\qquad\text{where } r'=\frac{\sqrt{d-1}}{2}\,r.$$

We now give an upper bound on $p_k(x)$. When each of the $k$ points is drawn, the probability that it is within distance $x$ of the center is at least

$$\frac{\pi^{(d-1)/2}x^{d-1}}{\Gamma\!\left(\frac{d-1}{2}+1\right)}\cdot\frac{r^{d-1}}{\pi^{(d-1)/2}(r')^{d-1}/\Gamma\!\left(\frac{d-1}{2}+1\right)}\cdot\frac{1}{r^{d-1}}=\frac{x^{d-1}}{(r')^{d-1}}.$$

The left-hand side can be read as follows: the first factor is the volume of a ball of radius $x$ centered at the center of the cube, the second factor is a lower bound on the fraction of that volume that is contained in the cube (the volume of the cube divided by the volume of the ball of radius $r'$ that circumscribes it), and $r^{d-1}$ is the volume of the cube. Thus we have

$$p_k(x)\le\left(1-\frac{x^{d-1}}{(r')^{d-1}}\right)^{k}.$$

It follows that

$$\mathbb{E}D\le\int_0^{r'}\left(1-\frac{x^{d-1}}{(r')^{d-1}}\right)^{k}dx=r'\int_0^1(1-y^{d-1})^k\,dy=\frac{\Gamma\!\left(\frac{d}{d-1}\right)\Gamma(k+1)}{\Gamma\!\left(\frac{d}{d-1}+k\right)}\cdot r'=\frac{\Gamma\!\left(\frac{d}{d-1}\right)\Gamma(k+1)\sqrt{d-1}}{2\,\Gamma\!\left(\frac{d}{d-1}+k\right)}\cdot r.$$

To complete the proof, it suffices to prove that

$$\frac{\Gamma(k+1)}{\Gamma\!\left(\frac{d}{d-1}+k\right)}\le\frac{1}{k^{1/(d-1)}}.$$
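As a numerical sanity check of the inequality just stated (not a substitute for the proof), the following minimal sketch evaluates the ratio $\Gamma(k+1)\,k^{1/(d-1)}/\Gamma(\frac{d}{d-1}+k)$ for a few values of $d$ and $k$. The function names `gamma_ratio` and `g` are illustrative choices and do not come from the dissertation; the log-gamma function is used only to avoid overflow for large $k$.

```python
import math


def gamma_ratio(k: int, d: int) -> float:
    """Gamma(k + 1) / Gamma(d/(d-1) + k), computed through log-gamma."""
    a = d / (d - 1)
    return math.exp(math.lgamma(k + 1) - math.lgamma(a + k))


def g(k: int, d: int) -> float:
    """Ratio whose boundedness by 1 is equivalent to the inequality above."""
    return gamma_ratio(k, d) * k ** (1.0 / (d - 1))


if __name__ == "__main__":
    for d in (2, 3, 5):
        values = [round(g(k, d), 4) for k in (1, 2, 10, 100, 1000)]
        print(f"d = {d}: {values}")
```

For $d=2$ the ratio reduces to $k/(k+1)$, which gives a quick way to confirm that the implementation behaves as expected.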
Define the function

$$g(k):=\frac{\Gamma(k+1)}{\Gamma\!\left(\frac{d}{d-1}+k\right)}\cdot k^{1/(d-1)};$$

it suffices to prove that $g(k)\le 1$ for all $k\in\mathbb{N}^+$. Consider the ratio

$$\frac{g(k+1)}{g(k)}=\frac{\dfrac{\Gamma(k+2)}{\Gamma\!\left(\frac{d}{d-1}+k+1\right)}\cdot(k+1)^{1/(d-1)}}{\dfrac{\Gamma(k+1)}{\Gamma\!\left(\frac{d}{d-1}+k\right)}\cdot k^{1/(d-1)}}=\frac{k+1}{\frac{d}{d-1}+k}\cdot\frac{(k+1)^{1/(d-1)}}{k^{1/(d-1)}}=\frac{(k+1)^{d/(d-1)}}{\left(k+\frac{d}{d-1}\right)k^{1/(d-1)}}.$$

Define $h(k)$ as

$$h(k):=\log\frac{g(k+1)}{g(k)}=\frac{d}{d-1}\log(k+1)-\log\!\left(k+\frac{d}{d-1}\right)-\frac{1}{d-1}\log k.$$

Taking the derivative of $h(k)$ gives

$$h'(k)=\frac{d}{d-1}\cdot\frac{1}{k+1}-\frac{1}{k+\frac{d}{d-1}}-\frac{1}{d-1}\cdot\frac{1}{k}=-\frac{d}{(d-1)\,k\,(k+1)\,(dk+d-k)}<0.$$

We conclude that $h(k)$ is decreasing in $k$, so $g(k+1)/g(k)$ is also decreasing in $k$. Since

$$\lim_{k\to\infty}\frac{g(k+1)}{g(k)}=\lim_{k\to\infty}\frac{(k+1)^{d/(d-1)}}{\left(k+\frac{d}{d-1}\right)k^{1/(d-1)}}=1,$$

we have $g(k+1)/g(k)>1$ for all $k\in\mathbb{N}^+$, which implies that $g(k)$ is increasing in $k$. According to a well-known result [127],

$$\lim_{n\to\infty}\frac{\Gamma(n)\,n^{\alpha}}{\Gamma(n+\alpha)}=1\quad\text{for all }\alpha\in\mathbb{C},$$

so that

$$\lim_{k\to\infty}g(k)=\lim_{k\to\infty}\frac{\Gamma(k+1)}{\Gamma\!\left(\frac{d}{d-1}+k\right)}\cdot k^{1/(d-1)}=1,$$

which, together with the monotonicity of $g$, implies that $g(k)\le 1$ for all $k\in\mathbb{N}^+$, as desired.

Now we are ready to prove the upper bound using the idea in a recently published paper [27] by constructing a zigzagging tour; see Figure 4.3.

Figure 4.3 (panels (a) and (b)): How the zigzagging tour is perturbed in a two-dimensional ($d=2$) clustered GTSP instance where $n=2$, $k=3$. In each cluster, we pick the point that is nearest to the zigzagging tour, and perturb the zigzagging tour to connect that point by two straight lines. In any dimension $d$, we perturb the zigzagging tour in the same manner, namely, in each cluster we pick the point that is nearest to the zigzagging tour and connect it to the tour by two straight lines.

For any fixed shape $S$, because of its Jordan measurability, for any $\varepsilon>0$ we can place an $m\times m$ square integer lattice that is fine enough such that, if we put a shape $S$ in the unit
square, the area of $S$ that is not completely covered by unit cells of the lattice is at most $\varepsilon$, where a unit cell is a square spanned by 4 lattice points in the 2-dimensional case; see Figure 4.4. It is easy to generalize the lattice to an arbitrary $d$-dimensional case, where we will have an $m^d$ lattice, and the same $\varepsilon$-covering still applies because the shape $S$ is Jordan measurable.

For the following proof, we fix the dimension $d$. As shown in Figure 4.3, let $\mathcal{R}$ be the region and let $\mathcal{P}$ be the path that traverses all grid points of the lattice in a zigzagging manner. We also assume that $m$ is an even number; this can always be achieved because any $\varepsilon$-covering can be made finer so that $m$ is even. The path $\mathcal{P}$ starts at the upper leftmost corner of $\mathcal{R}$, moves downwards to visit every single grid point, and goes back to the starting point. The total length of this tour is obviously less than $m+2$.

If $d=3$, we evenly divide the 3-dimensional cube into $m$ layers of 2-dimensional planes, each of which is a copy of the region $\mathcal{R}$. To construct the zigzagging tour, we start at the upper leftmost corner of the top plane, visit each grid point on that plane, and end at the lower leftmost corner; we then move downwards to the second plane, visit every grid point on it in the same way, and move to the next plane. After the tour has visited every grid point in the cube, it goes back to the starting point on the top plane in a straight line. The total length of the tour is at most $m(m+1)+1=m^2+m+1$.

If $d=4$, we cut the 4-dimensional cube into $m$ 3-dimensional cubes; the zigzagging tour visits each grid point in each 3-dimensional cube as described above and moves to the next cube until all grid points are visited. The length of the tour is at most $m^3+m^2+m+1$. By a simple induction, we can construct this zigzagging tour with length at most $m^{d-1}+m^{d-2}+\cdots+m+1$ in the $d$-dimensional case.
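The inductive length bound is a geometric sum. Reading it as $m^{d-1}+m^{d-2}+\cdots+m+1$ (consistent with the $d=3$ and $d=4$ cases above), it equals $(m^d-1)/(m-1)<m^{d-1}\cdot\frac{m}{m-1}$, so it is within a factor $1+\delta$ of $m^{d-1}$ as soon as $m\ge 1+1/\delta$. The sketch below illustrates this; the helper names `zigzag_length_bound` and `smallest_even_m` are hypothetical and introduced only for this example.

```python
import math


def zigzag_length_bound(m: int, d: int) -> int:
    """Geometric sum m^(d-1) + m^(d-2) + ... + m + 1."""
    return sum(m ** j for j in range(d))


def smallest_even_m(delta: float) -> int:
    """Smallest even m >= 1 + 1/delta.

    For such m we have m / (m - 1) <= 1 + delta, hence the geometric sum
    is at most (1 + delta) * m^(d-1) for every dimension d >= 2.
    """
    m = max(2, math.ceil(1 + 1 / delta))
    return m if m % 2 == 0 else m + 1


if __name__ == "__main__":
    for delta in (0.5, 0.1, 0.01):
        m = smallest_even_m(delta)
        ratio = zigzag_length_bound(m, 3) / m ** 2
        print(f"delta = {delta}: m = {m}, bound / m^2 = {ratio:.4f}")
```

The restriction to even $m$ mirrors the assumption made in the construction of the tour; it is not needed for the inequality itself.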
Given $n$ clusters $\mathcal{X}_1,\dots,\mathcal{X}_n$ in the $d$-dimensional cube, we perturb $\mathcal{P}$ to form a new tour $\mathcal{P}'$ that visits one point from each point set $\mathcal{X}_i$ by simply inserting a pair of line segments between $\mathcal{P}$ and the nearest point (measured in each possible direction to $\mathcal{P}$) in each cluster $\mathcal{X}_i$, as shown in Figure 4.3b. Let $X$ be the random variable that denotes the distance between the zigzagging path $\mathcal{P}$ and the nearest point in a cluster. Then

$$\mathbb{E}X\le\mathbb{E}D\cdot(1-\varepsilon)^k+2r'\cdot\left(1-(1-\varepsilon)^k\right)\le\mathbb{E}D+r'\cdot\left(1-(1-\varepsilon)^k\right),$$

where the random variable $D$ is defined in Lemma 12, and $r'=\frac{\sqrt{d-1}}{2}\cdot r$ and $r=\frac{1}{m}$ (which will be used later) are defined in Figure 4.4. According to Lemma 12,

$$\mathbb{E}D\le\Gamma\!\left(\frac{d}{d-1}\right)\frac{\sqrt{d-1}}{2}\cdot\frac{1}{k^{1/(d-1)}}\cdot r=\Gamma\!\left(\frac{d}{d-1}\right)\frac{\sqrt{d-1}}{2}\cdot\frac{1}{k^{1/(d-1)}}\cdot\frac{1}{m}.$$

Note that for any $\delta>0$, we can let $m$ be large enough so that the length of $\mathcal{P}$ satisfies

$$m^{d-1}+m^{d-2}+\cdots+m+1\le(1+\delta)\,m^{d-1}.$$

Figure 4.4: A $d$-dimensional unit cell, centered at $O$, with side length $r=\frac{1}{m}$.

Our objective is therefore to choose a value $m$ in order to minimize the total length of $\mathcal{P}$ plus the additional $n$ deviations, i.e., to minimize

$$(1+\delta)\cdot m^{d-1}+2n\cdot\mathbb{E}D+2n\cdot\left(1-(1-\varepsilon)^k\right)\cdot\frac{\sqrt{d-1}}{2m},$$

or, using the bound on $\mathbb{E}D$ above,

$$(1+\delta)\cdot\left[m^{d-1}+\left(\frac{n\sqrt{d-1}\cdot\Gamma\!\left(\frac{d}{d-1}\right)k^{-1/(d-1)}+n\cdot\left(1-(1-\varepsilon)^k\right)\cdot\sqrt{d-1}}{1+\delta}\right)\cdot\frac{1}{m}\right],$$

where we have an additional multiplier "2" since each deviation consists of an outbound and an inbound trip. As $n$ becomes large, we see that the optimal $m$ satisfies

$$m^*\sim\left(\frac{d-1}{C}\right)^{-1/d},\qquad\text{where } C=\frac{n\sqrt{d-1}\cdot\Gamma\!\left(\frac{d}{d-1}\right)k^{-1/(d-1)}+n\cdot\left(1-(1-\varepsilon)^k\right)\cdot\sqrt{d-1}}{1+\delta}.$$

Plugging the optimal $m^*$ back into the objective function, we obtain a total length that satisfies

$$\mathrm{length}(\mathcal{P}')\sim(1+\delta)\left[\left(\left(\frac{d-1}{C}\right)^{-1/d}\right)^{d-1}+C\left(\frac{d-1}{C}\right)^{1/d}\right].$$

Thus we have

$$\limsup_{n\to\infty}\frac{\mathbb{E}\,\mathrm{GTSP}(\mathcal{X}_1,\dots,\mathcal{X}_n)}{\mathrm{length}(\mathcal{P}')}\le 1,$$

where $\mathrm{length}(\mathcal{P}')$ is defined above.
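The optimization over $m$ can be checked numerically. The sketch below minimizes $m^{d-1}+C/m$ in closed form and evaluates the normalized constant obtained when $\varepsilon,\delta\to 0$, namely $\sqrt{d}\,(d-1)^{-(d-1)/(2d)}\,\Gamma(\frac{d}{d-1})^{(d-1)/d}$; this simplified expression is our own algebraic rearrangement of the optimized length divided by $\sqrt{d}\,(n^{d-1}/k)^{1/d}$, and the function names are hypothetical. Its values can be compared with the table of $b'_d$ in Theorem 11.

```python
import math


def optimal_m(C: float, d: int) -> float:
    """Minimizer of m**(d-1) + C/m over m > 0, i.e. ((d-1)/C)**(-1/d)."""
    return (C / (d - 1)) ** (1.0 / d)


def perturbed_length(m: float, C: float, d: int) -> float:
    """Objective m**(d-1) + C/m (the common factor 1 + delta is omitted)."""
    return m ** (d - 1) + C / m


def b_prime(d: int) -> float:
    """Limiting constant after letting epsilon and delta tend to zero."""
    gam = math.gamma(d / (d - 1))
    return math.sqrt(d) * (d - 1) ** (-(d - 1) / (2 * d)) * gam ** ((d - 1) / d)


if __name__ == "__main__":
    for d in (2, 3, 4, 5):
        print(f"d = {d}: b'_d is approximately {b_prime(d):.4f}")
```

As $d\to\infty$ both $\sqrt{d}\,(d-1)^{-(d-1)/(2d)}$ and the Gamma factor tend to 1, which matches the last column of the table.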
Since $\varepsilon$ and $\delta$ are arbitrary positive numbers, we can let both of them approach 0, i.e., $\varepsilon\to 0^+$ and $\delta\to 0^+$, which gives us

$$\limsup_{n\to\infty}\frac{\mathbb{E}\,\mathrm{GTSP}(\mathcal{X}_1,\dots,\mathcal{X}_n)}{\sqrt{d}\,(n^{d-1}/k)^{1/d}}\le\frac{\left[\left(\dfrac{\sqrt{d-1}}{\Gamma\!\left(\frac{d}{d-1}\right)}\right)^{-1/d}\right]^{d-1}+\sqrt{d-1}\,\Gamma\!\left(\frac{d}{d-1}\right)\left(\dfrac{\sqrt{d-1}}{\Gamma\!\left(\frac{d}{d-1}\right)}\right)^{1/d}}{\sqrt{d}}. \tag{4.9}$$

This proves the upper bounds in Theorem 11.

4.3 Another way to tackle clustering

Another possibility to deal with the phenomenon of clustering is the TSP with neighborhoods [34], which is a special case of the GTSP in which each point set $\mathcal{X}_i$ is a ball $B_i$ of radius $r$ (as opposed to being a finite set of points) that is centered at a point $x_i$, and the goal is to find the shortest tour that touches every ball; see Figure 4.5a for an example. We will derive asymptotic expressions for the TSP with neighborhoods with the help of the lemma below:

Lemma 13. Let $B_1,\dots,B_n$ be a collection of balls of radius $r$, centered at points $x_1,\dots,x_n$, all of which are contained in the unit square. We have

$$\mathrm{TSP}(x_1,\dots,x_n)-2nr\le\mathrm{GTSP}(B_1,\dots,B_n)\le\min\left\{\mathrm{TSP}(x_1,\dots,x_n),\ \frac{1}{2r}+3\right\}.$$

Figure 4.5: Panel (a) shows a tour of $n=20$ neighborhoods; the optimal tour intersects each ball and is the shortest such tour to do so. Panel (b) shows that one can always augment an optimal tour in such a way as to touch the centers of each ball.

Proof. The leftmost inequality holds because one can always make a tour that touches $x_1,\dots,x_n$ by augmenting the tour $\mathrm{GTSP}(B_1,\dots,B_n)$ with $n$ line segments of length at most $r$ (each traversed out and back), as shown in Figure 4.5b. The fact that $\mathrm{GTSP}(B_1,\dots,B_n)\le\mathrm{TSP}(x_1,\dots,x_n)$ is obvious.
The fact that $\mathrm{GTSP}(B_1,\dots,B_n)\le\frac{1}{2r}+3$ is due to essentially the same idea as that expressed in Figure 4.3a; if we construct a tour that traverses the width of $\mathcal{R}$ horizontally a total of $\lceil\frac{1}{2r}\rceil$ times (with the "+3" term added because we also travel one unit down and one unit up as before and because, if $\lceil\frac{1}{2r}\rceil$ is odd, then we must make one additional horizontal traversal), then we must touch each ball at some point.

Using the fact that $\mathrm{TSP}(x_1,\dots,x_n)\sim\beta\sqrt{n}$ for uniformly distributed points $x_i$, we can write an approximation of Lemma 13 for large $n$ as

$$\beta\sqrt{n}-2nr\lesssim\mathrm{GTSP}(B_1,\dots,B_n)\lesssim\min\left\{\beta\sqrt{n},\ \frac{1}{2r}+3\right\},$$

where the notation "$\lesssim$" reflects the lower-order terms that we are dropping by introducing the square root approximation. There are two aspects of the inequalities above that need correction. The first is that we can tighten the left-hand inequality by using the fact that $\beta\sqrt{n'}-2n'r\lesssim\mathrm{GTSP}(B_1,\dots,B_n)$ for all $n'\le n$, since a tour that touches all $n$ balls also touches any $n'$ of them. The lower bound is maximized when $n'=\frac{\beta^2}{16r^2}$, at which the bound evaluates to $\beta\sqrt{n'}-2n'r=\frac{\beta^2}{8r}$. Thus, a tighter lower bound is to use $\beta\sqrt{n}-2nr$ if $n<\frac{\beta^2}{16r^2}$ and to use $\frac{\beta^2}{8r}$ otherwise. The second correction is that we should not require that $r$ be constant, because this would result in both inequalities becoming constant as $n\to\infty$. Thus, we represent $r$ as a sequence indexed by $n$, $\{r_n\}$, with the assumption that $r_n\to 0$ as $n\to\infty$. In summary, our new bounds are

$$\left.\begin{cases}\beta\sqrt{n}-2nr_n & \text{if } n<\dfrac{\beta^2}{16r_n^2}\\[2mm] \dfrac{\beta^2}{8r_n} & \text{otherwise}\end{cases}\right\}\lesssim\mathrm{GTSP}(B_1,\dots,B_n)\lesssim\min\left\{\beta\sqrt{n},\ \frac{1}{2r_n}\right\},$$

and it is straightforward to verify that the left- and right-hand sides of the inequalities above are always within a factor of $4/\beta^2$ of one another.

Chapter 5
Analysis of the generalized TSP when $|\mathcal{X}_i|\to\infty$

This chapter studies the limiting behavior of the GTSP when we assume that the number of sets, $n$, is fixed, and the cardinalities of each set $\mathcal{X}_i$ become large.
In order to describe these $n$ cardinalities in terms of a single parameter, we assume that $|\mathcal{X}_i|=k_i=tq_i$, where $q$ is a vector of probabilities, and we let the single parameter $t$ approach infinity. Before stating the main theorem for this chapter, we first introduce a combinatorial lemma.

Lemma 14. Let $\mathcal{L}\subset\mathbb{Z}^2$ denote an $m\times m$ square integer lattice in the plane, let $n\ge 2$ be an integer, and let $\ell>0$. Let $\mathcal{P}$ denote the set of all paths of the form $\{x_1,\dots,x_n\}$, with $x_i\in\mathcal{L}$ for each $i$, and whose length does not exceed $\ell$. Then

$$|\mathcal{P}|\le m^2\cdot\binom{\ell+n-1}{n-1}\cdot\left(\frac{8\ell}{n-1}\right)^{n-1}.$$

Proof. We thank Douglas Zare. Note that we have allowed "replacement" in the construction of $\mathcal{P}$, i.e. we are also considering paths in which $x_i=x_j$ for some $(i,j)$ pairs. We note that any element $P\in\mathcal{P}$ can be uniquely described by specifying the triplet $(x,d,q)$ defined as follows:

• The point $x\in\mathcal{L}$ is simply the first member of $P$.

• The $n$-tuple $d=\{d_1,\dots,d_n\}$ represents the distance travelled from each point to the next, measured in the $\ell_\infty$ norm. In other words, for a path $\{x_1,\dots,x_n\}$, we have $d_i=\|x_{i+1}-x_i\|_\infty$. Note that $d_n$ is not defined according to this definition; we therefore define $d_n:=\ell-\sum_{i=1}^{n-1}d_i$, so that $\sum_{i=1}^{n}d_i=\ell$ for all valid $d$. Obviously, we have $0\le d_i\le\ell$ for all $i$.

Figure 5.1 (panels (a), (b), (c)): The path shown has a length of approximately 13.8, and therefore is a member of $\mathcal{P}$ if we have $\ell$ equal to (say) 15. We construct its triplet $(x,d,q)$ as follows: obviously, we have $x=(5,5)$ (the first element of the path), as indicated in 5.1a. Figure 5.1b shows that the $\ell_\infty$ distances between the four points are $d_1=4$, $d_2=5$, and $d_3=2$, respectively, which then implies that $d_4=\ell-(d_1+d_2+d_3)=4$. Finally, the construction of $q$ is shown in 5.1c: given a point $x_i$ on the path and a distance $d_i$, there are $8d_i$ possible places where the consecutive point $x_{i+1}$ could be located. The path shown has $q_1=6$, $q_2=35$, and $q_3=1$.
• The $(n-1)$-tuple $q=\{q_1,\dots,q_{n-1}\}$ represents the "angles" between pairs of points. Specifically, given a point $x_i$ and a corresponding distance $d_i$, we see that there are at most $8d_i$ possible places where $x_{i+1}$ could be located (since $x_{i+1}$ must lie on the boundary of a square of side length $2d_i$ centered at $x_i$). The element $q_i$ specifies which of these is the correct location of $x_{i+1}$. An example of this is shown in Figure 5.1.

We will bound $|\mathcal{P}|$ from above by looking at the set of all triplets $(x,d,q)$ such that $x\in\mathcal{L}$, $\sum_{i=1}^{n}d_i=\ell$, and $q_i\le 8d_i$ for all $i$. Assume without loss of generality that $\ell$ is an integer, and let $\mathcal{D}$ denote the set of all permissible $n$-tuples $d$. By construction, of course, $\mathcal{D}$ is simply the set of all integer $n$-tuples $\{d_1,\dots,d_n\}$ such that $d_i\ge 0$ and $\sum_{i=1}^{n}d_i=\ell$. Given any $d\in\mathcal{D}$, let $d'$ denote the $n$-tuple defined by setting $d'_i=d_i+1$ for all $i$. We then see that $\sum_{i=1}^{n}d'_i=\ell+n$ and that $1\le d'_i\le\ell+n$ for all $i$. As Figure 5.2 suggests, each $d'$ corresponds to a selection of $n-1$ elements out of $\ell+n-1$ possibilities. Thus, we see that

$$|\mathcal{D}|\le\binom{\ell+n-1}{n-1}.$$

Figure 5.2 (panels (a), (b)): The figures correspond to the case where $n=4$ and $\ell=8$. As 5.2a shows, any $n$-tuple $d$ can be uniquely represented by simply placing $n$ points on the number line from 0 to $\ell$, where the first point is placed a distance $d_1$ from the origin and each subsequent point $i$ is placed a distance $d_i$ to the right of its predecessor. Since $d_4$ is defined so that the entries of $d$ sum to $\ell$, the last such point that is placed must be precisely at a distance $\ell$ from the origin. The placement shown corresponds to the case $d=(3,2,0,3)$. In 5.2b, we show the same point placement for the $n$-tuple $d'$ which is obtained by adding 1 to each of the entries of $d$.
It is obvious that any such $n$-tuple $d'$ can be uniquely constructed by selecting $n-1=3$ points from the $\ell+n-1=11$ valid locations of points in 5.2b and computing the sequential distances between those points; the diagram corresponds to the case $d'=(4,3,1,4)$.

It is simpler to bound the set $\mathcal{Q}$ of all valid $(n-1)$-tuples $q$. For any fixed $d$, the number of possible choices of $q$ is at most $8^{n-1}\prod_{i=1}^{n-1}d_i$. By the AM-GM inequality, using the fact that $\sum_{i=1}^{n-1}d_i\le\ell$, we see that

$$8^{n-1}\prod_{i=1}^{n-1}d_i\le 8^{n-1}\left(\frac{\sum_{i=1}^{n-1}d_i}{n-1}\right)^{n-1}\le\left(\frac{8\ell}{n-1}\right)^{n-1}$$

and thus

$$|\mathcal{Q}|\le\left(\frac{8\ell}{n-1}\right)^{n-1}.$$

Finally, since there are $m^2$ choices of the initial point $x$ in the triplet $(x,d,q)$, we conclude that $\mathcal{P}$ satisfies

$$|\mathcal{P}|\le m^2\cdot\binom{\ell+n-1}{n-1}\cdot\left(\frac{8\ell}{n-1}\right)^{n-1}$$

as desired.

Theorem 15. Let $\mathcal{X}_1,\dots,\mathcal{X}_n$ denote $n$ sets of points, each having cardinality $k_i>0$, and suppose that all $\sum_{i=1}^{n}k_i$ points are distributed independently and uniformly at random in a region $\mathcal{R}$ having area $A$. Further assume that $k_i=tq_i$ for all $i$ (where $q$ is a probability vector), and let $k_G=\left(\prod_{i=1}^{n}k_i\right)^{1/n}=t\left(\prod_{i=1}^{n}q_i\right)^{1/n}$ be the geometric mean of the $k_i$'s. Then the expected length of a generalized TSP tour of $\mathcal{X}_1,\dots,\mathcal{X}_n$ satisfies

$$\mathbb{E}\,\mathrm{GTSP}(\mathcal{X}_1,\dots,\mathcal{X}_n)\in O\!\left(\sqrt{\frac{An}{k_G}}\right)\quad\text{and}\quad\mathbb{E}\,\mathrm{GTSP}(\mathcal{X}_1,\dots,\mathcal{X}_n)\in\Omega\!\left(\sqrt{\frac{An}{k_G^{\,n/(n-1)}}}\right)$$

as $t\to\infty$ with $n$ and $q$ fixed. In particular, there exist constants $\alpha_1<2.7$ and $\alpha_2>0.0681$ such that, for any $n\ge 2$, there exists a threshold $\bar{k}$ such that

$$\mathbb{E}\,\mathrm{GTSP}(\mathcal{X}_1,\dots,\mathcal{X}_n)\le\alpha_1\sqrt{\frac{An}{k_G}} \tag{5.1}$$

and

$$\mathbb{E}\,\mathrm{GTSP}(\mathcal{X}_1,\dots,\mathcal{X}_n)\ge\alpha_2\sqrt{\frac{An}{k_G^{\,n/(n-1)}}} \tag{5.2}$$

whenever $k_G\ge\bar{k}$. In addition, the upper bound (5.1) can be tightened as follows:

$$\mathbb{E}\,\mathrm{GTSP}(\mathcal{X}_1,\dots,\mathcal{X}_n)\le\alpha_1\sqrt{\frac{An}{k_G^{\,n/(n-1)}}}\cdot\left(n^2\log k_G+\log n\right)^{\frac{1}{2(n-1)}} \tag{5.3}$$

whenever $k_G\ge\bar{k}$.

Proof. As in the proof of Theorem 8, we will assume that the service region $\mathcal{R}$ is the unit square.
The proof of the upper bound (5.1) proceeds as follows: since we are examining the limiting behavior of $\mathbb{E}\,\mathrm{GTSP}(\mathcal{X}_1,\dots,\mathcal{X}_n)$ for large $k_G$, we can divide the region $\mathcal{R}$ into $k_G$ squares $\square_1,\dots,\square_{k_G}$ of area $1/k_G$. By Lemma 4, we see that if one of the $k_G$ squares $\square_i$ happens to contain an element from each of the $n$ point sets $\mathcal{X}_1,\dots,\mathcal{X}_n$, then $\mathrm{GTSP}(\mathcal{X}_1,\dots,\mathcal{X}_n)\le\alpha_1\sqrt{n/k_G}$. If none of the squares have this property, then as a crude upper bound we simply use $\mathrm{GTSP}(\mathcal{X}_1,\dots,\mathcal{X}_n)\le\alpha_1\sqrt{n}$, so that

$$\mathbb{E}\,\mathrm{GTSP}(\mathcal{X}_1,\dots,\mathcal{X}_n)\le p\,\alpha_1\sqrt{n/k_G}+(1-p)\,\alpha_1\sqrt{n},$$

where $p$ is the probability that one of the $k_G$ squares contains an element from each $\mathcal{X}_i$. Thus, our goal is now to show that $p\to 1$ at a sufficiently rapid rate as $k_G\to\infty$.

Our proof now requires a Poissonization argument [123]: for each of the $k_G$ boxes $\square_i$, let $\mathcal{Y}^i_1,\dots,\mathcal{Y}^i_n$ denote $n$ point sets uniformly and independently distributed within $\square_i$, where $|\mathcal{Y}^i_j|$ follows a Poisson distribution with mean $\frac{k_j}{k_G}=\frac{q_j}{\left(\prod_{i=1}^{n}q_i\right)^{1/n}}$. If we define point sets $\mathcal{Y}_1,\dots,\mathcal{Y}_n$ by setting

$$\mathcal{Y}_j=\bigcup_{i=1}^{k_G}\mathcal{Y}^i_j,$$

then it is immediately obvious that $\mathbb{E}(|\mathcal{Y}_j|)=k_j$ for all $j$; it is also easy to verify that the distribution of the point sets $\mathcal{X}_1,\dots,\mathcal{X}_n$ is the same as the distribution of the point sets $\mathcal{Y}_1,\dots,\mathcal{Y}_n$, conditioned on the event that $|\mathcal{Y}_j|=k_j$ for all $j$ (see page 100 of [88], for example). For a particular box $\square_i$, the probability that $\square_i$ contains at least one element from each of the sets $\mathcal{Y}_j$ is

$$\Pr(\square_i\text{ contains at least one from each }\mathcal{Y}_j)=\Pr(|\mathcal{Y}^i_1|\ge 1)\cdots\Pr(|\mathcal{Y}^i_n|\ge 1)=\prod_{j=1}^{n}\left(1-e^{-k_j/k_G}\right)$$

and therefore, writing $E$ for the event that none of the boxes $\square_i$ contains at least one element from each $\mathcal{Y}_j$,

$$\Pr(E)=\left[1-\prod_{j=1}^{n}\left(1-e^{-q_j}\right)\right]^{k_G}.$$

By the law of total probability, we have

$$\Pr(E)=\sum_{k'_1=0}^{\infty}\cdots\sum_{k'_n=0}^{\infty}\Pr\left(E\,\middle|\,|\mathcal{Y}_1|=k'_1\cap\cdots\cap|\mathcal{Y}_n|=k'_n\right)\Pr\left(|\mathcal{Y}_1|=k'_1\cap\cdots\cap|\mathcal{Y}_n|=k'_n\right)$$

and therefore in particular,

$$\Pr(E)\ge\Pr\left(E\,\middle|\,|\mathcal{Y}_1|=k_1\cap\cdots\cap|\mathcal{Y}_n|=k_n\right)\Pr\left(|\mathcal{Y}_1|=k_1\cap\cdots\cap|\mathcal{Y}_n|=k_n\right).$$
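To get a feel for how quickly the Poissonized failure probability decays, the following sketch evaluates the probability that no box contains a point from every set, using the per-box Poisson means $k_j/k_G=q_j/(\prod_i q_i)^{1/n}$ stated above. The function name and the example vector `q` are illustrative assumptions, not taken from the dissertation.

```python
import math


def prob_no_box_hits_all(q, k_G: int) -> float:
    """Probability that none of k_G boxes holds a point from every set.

    Each box independently receives a Poisson(lam_j) number of points of
    type j, where lam_j = q_j / geometric_mean(q).
    """
    n = len(q)
    geo_mean = math.exp(sum(math.log(x) for x in q) / n)
    # Probability that one fixed box contains at least one point of each type.
    per_box = math.prod(1.0 - math.exp(-x / geo_mean) for x in q)
    return (1.0 - per_box) ** k_G


if __name__ == "__main__":
    q = [0.2, 0.3, 0.5]  # hypothetical probability vector
    for k_G in (10, 50, 100, 200):
        print(k_G, prob_no_box_hits_all(q, k_G))
```

Because the per-box success probability does not depend on $k_G$, the failure probability decays geometrically in $k_G$, which is what allows it to absorb the polynomial factors that appear later in the argument.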
We next observe that

$$\Pr\left(|\mathcal{Y}_1|=k_1\cap\cdots\cap|\mathcal{Y}_n|=k_n\right)=\prod_{j=1}^{n}\Pr(|\mathcal{Y}_j|=k_j)=\prod_{j=1}^{n}\frac{k_j^{k_j}}{k_j!}e^{-k_j}>\frac{1}{e^n\sqrt{k_1\cdots k_n}},$$

where the last inequality holds because $k'!<e\sqrt{k'}\,(k'/e)^{k'}$ for all positive integers $k'$ (see Lemma 5.8 of [88]). We therefore conclude that

$$\left[1-\prod_{j=1}^{n}\left(1-e^{-q_j}\right)\right]^{k_G}=\Pr(E)>\underbrace{\Pr\left(E\,\middle|\,|\mathcal{Y}_1|=k_1\cap\cdots\cap|\mathcal{Y}_n|=k_n\right)}_{1-p}\cdot\frac{1}{e^n\sqrt{k_1\cdots k_n}},$$

or in other words,

$$1-p<\left[1-\prod_{j=1}^{n}\left(1-e^{-q_j}\right)\right]^{k_G}e^n\sqrt{k_1\cdots k_n}. \tag{5.4}$$

It then follows that

$$\mathbb{E}\,\mathrm{GTSP}(\mathcal{X}_1,\dots,\mathcal{X}_n)\le p\,\alpha_1\sqrt{\frac{n}{k_G}}+(1-p)\,\alpha_1\sqrt{n}\le\alpha_1\sqrt{\frac{n}{k_G}}+\alpha_1\sqrt{n}\left[1-\prod_{j=1}^{n}\left(1-e^{-q_j}\right)\right]^{k_G}e^n\sqrt{k_1\cdots k_n}.$$

Our proof is therefore complete if we can show that, for any fixed $n$, we have

$$\frac{\alpha_1\sqrt{\frac{n}{k_G}}+\alpha_1\sqrt{n}\left[1-\prod_{j=1}^{n}\left(1-e^{-q_j}\right)\right]^{k_G}e^n\sqrt{k_1\cdots k_n}}{\alpha_1\sqrt{\frac{n}{k_G}}}\to 1$$

or equivalently that

$$\left[1-\prod_{j=1}^{n}\left(1-e^{-q_j}\right)\right]^{k_G}e^n\,k_G^{(n+1)/2}\to 0$$

as $k_G\to\infty$. Taking natural logarithms, this is equivalent to proving that

$$k_G\log\left[1-\prod_{j=1}^{n}\left(1-e^{-q_j}\right)\right]+n+\frac{n+1}{2}\log k_G\to-\infty$$

as $k_G\to\infty$. Since $0<e^{-q_j}<1$ for all $j$, we have $0<1-\prod_{j=1}^{n}(1-e^{-q_j})<1$, which guarantees that the logarithm above is always negative. The term that is linear in $k_G$ therefore dominates the logarithmic term, so the limit above holds, which completes the proof of the upper bound (5.1).

In order to prove the tighter upper bound (5.3), the argument is nearly the same, except that we instead divide the region $\mathcal{R}$ into $b(k_G)$ squares $\square_1,\dots,\square_{b(k_G)}$ of area $1/b(k_G)$, where we set

$$b(k_G)=\frac{k_G^{\,n/(n-1)}}{\left(n^2\log k_G+\log n\right)^{\frac{1}{n-1}}}.$$

In the interest of brevity we will simply write $b$ instead of $b(k_G)$. Applying precisely the same reasoning as before, the counterpart to inequality (5.4) is now

$$p>1-\left[1-\prod_{j=1}^{n}\left(1-e^{-k_j/b}\right)\right]^{b}e^n\sqrt{k_1\cdots k_n},$$

so that

$$\mathbb{E}\,\mathrm{GTSP}(\mathcal{X}_1,\dots,\mathcal{X}_n)\le\alpha_1\sqrt{n/b}+\alpha_1\sqrt{n}\left[1-\prod_{j=1}^{n}\left(1-e^{-k_j/b}\right)\right]^{b}e^n\sqrt{k_1\cdots k_n}.$$

Our proof is therefore complete if we can show that, for any fixed $n$, we have

$$\frac{\alpha_1\sqrt{n/b}+\alpha_1\sqrt{n}\left[1-\prod_{j=1}^{n}\left(1-e^{-k_j/b}\right)\right]^{b}e^n\sqrt{k_1\cdots k_n}}{\alpha_1\sqrt{n/b}}\to 1$$

or equivalently that

$$\left[1-\prod_{j=1}^{n}\left(1-e^{-k_j/b}\right)\right]^{b}e^n\sqrt{k_1\cdots k_n}\,\sqrt{b}\to 0$$

as $k_G\to\infty$.
This is straightforward. Taking logarithms (and dropping the constant $n$), the above statement is equivalent to proving that

$$b\log\left[1-\left(1-e^{-k_1/b}\right)\cdots\left(1-e^{-k_n/b}\right)\right]+\frac{n}{2}\log k_G+\frac{1}{2}\log b\to-\infty,$$

which is accomplished as follows: as $k_G\to\infty$, we have $k_i/b\to 0$; this is true regardless of the limiting behavior of the particular $k_i$. A Taylor series expansion shows that

$$\log\left[1-\left(1-e^{-k_1/b}\right)\cdots\left(1-e^{-k_n/b}\right)\right]\sim-\left(1-e^{-k_1/b}\right)\cdots\left(1-e^{-k_n/b}\right)\sim-\frac{k_1k_2\cdots k_n}{b^n}=-\left(\frac{k_G}{b}\right)^{n}$$

and thus we want to prove that

$$\frac{n}{2}\log(tq)+b\left(-\frac{k_1k_2\cdots k_n}{b^n}\right)+\frac{1}{2}\log b\to-\infty,$$

where, with a slight abuse of notation, $q=(q_1q_2\cdots q_n)^{1/n}$ now denotes the geometric mean of the entries of the probability vector, so that $k_G=tq$. Since $q$ and $n$ are constants, it is equivalent to show

$$\frac{1}{2}\log(t^nb)-\frac{t^nq^n}{b^{n-1}}\to-\infty.$$

Substituting $b=\dfrac{(tq)^{n/(n-1)}}{\left(n^2\log(tq)+\log n\right)^{\frac{1}{n-1}}}$, we see that

$$\frac{1}{2}\log(t^nb)-\frac{t^nq^n}{b^{n-1}}=\left(\frac{n}{2}+\frac{n}{2(n-1)}\right)\log t-\frac{\log\left(n^2\log t+n^2\log q+\log n\right)}{2(n-1)}+\frac{n}{2(n-1)}\log q-n^2\log t-n^2\log q-\log n.$$

Dropping some constant terms, it suffices to show

$$\left(-n^2+\frac{n}{2}+\frac{n}{2(n-1)}\right)\log t-\frac{\log\left(n^2\log t+n^2\log q+\log n\right)}{2(n-1)}\to-\infty$$

as $t\to\infty$. This is clearly true, because the coefficient of $\log t$ is negative for every $n\ge 2$. Thus the upper bound is proved.

To prove the lower bound (5.2), we apply Lemma 14: we let $\mathcal{L}$ denote a lattice within $\mathcal{R}$ of the form

$$\left\{0,\frac{1}{m},\frac{2}{m},\dots,\frac{m-1}{m},1\right\}\times\left\{0,\frac{1}{m},\frac{2}{m},\dots,\frac{m-1}{m},1\right\}$$

and we assume that all of the elements of the sets $\mathcal{X}_i$ lie in elements of $\mathcal{L}$ (recall that we are assuming that $m$ is arbitrarily large). By scaling the lattice of Lemma 14 by a factor of $1/m$ (i.e. our lattice $\mathcal{L}$), we see that the number of paths in $\mathcal{L}$ whose length does not exceed $\ell$ is at most

$$m^2\cdot\binom{m\ell+n-1}{n-1}\cdot\left(\frac{8m\ell}{n-1}\right)^{n-1}\le m^2\cdot\frac{(m\ell+n-1)^{n-1}}{(n-1)!}\cdot\left(\frac{8m\ell}{n-1}\right)^{n-1}$$

for all $\ell>0$. Thus, if $P$ is the path obtained by selecting the first element from each set $\mathcal{X}_i$ and visiting these elements in a sequence chosen uniformly at random, we see that

$$\limsup_{m\to\infty}\Pr(\mathrm{length}(P)\le\ell)\le\limsup_{m\to\infty}\frac{m^2\cdot\frac{(m\ell+n-1)^{n-1}}{(n-1)!}\cdot\left(\frac{8m\ell}{n-1}\right)^{n-1}}{m^{2n}}=\frac{8^{n-1}\ell^{2n-2}}{(n-1)!\,(n-1)^{n-1}}.$$

The number of all possible GTSP tours through $\mathcal{X}_1,\dots,\mathcal{X}_n$ is at most $n!\prod_{i=1}^{n}k_i=n!\cdot k_G^n$, and therefore, it follows from the union bound that

$$\Pr(\mathrm{GTSP}(\mathcal{X}_1,\dots,\mathcal{X}_n)\le\ell)\le n!\cdot k_G^n\cdot\frac{8^{n-1}\ell^{2n-2}}{(n-1)!\,(n-1)^{n-1}}=n\cdot k_G^n\cdot\frac{8^{n-1}\ell^{2n-2}}{(n-1)^{n-1}}.$$

We now set $\ell=c\sqrt{\dfrac{n}{k_G^{\,n/(n-1)}}}$ with $c=\sqrt{6}/24$ to obtain

$$\mathbb{E}\,\mathrm{GTSP}(\mathcal{X}_1,\dots,\mathcal{X}_n)\ge\left[1-\Pr\left(\mathrm{GTSP}(\mathcal{X}_1,\dots,\mathcal{X}_n)\le c\sqrt{\frac{n}{k_G^{\,n/(n-1)}}}\right)\right]\cdot c\sqrt{\frac{n}{k_G^{\,n/(n-1)}}}$$

$$\ge c\left[1-(8c^2)^{n-1}\cdot\frac{n^n}{(n-1)^{n-1}}\right]\sqrt{\frac{n}{k_G^{\,n/(n-1)}}}\ge 0.0681\sqrt{\frac{n}{k_G^{\,n/(n-1)}}}$$

for $n\ge 2$ as desired, which completes the proof.

Remark 16. The problem described in Theorem 8 is closely related to the BHH Theorem (i.e. Lemma 6), which corresponds to the special instance of our problem where $k=1$. For the purpose of studying the household-level economies of scale obtained by multi-stop trips, we assert that Theorem 15 is more relevant. This is because a typical person rarely visits more than, say, 10 destinations in a given day, whereas there are likely to be many more than 10 banks, grocery stores, and so forth in a given metropolitan region.

Remark 17. Our analysis of the GTSP has assumed that all of the demand points are independently and uniformly distributed in the unit square. Of course, there are many reasons why these assumptions might not hold, such as the presence of spatial competition or economies of density. Fortunately, many phenomena of this kind can already be addressed using our existing models by appropriately selecting the cardinalities $|\mathcal{X}_i|$: one example of this is the classical Hotelling model [56], which predicts that competing stores will often locate themselves immediately next to one another. Thus, although there might be a total of $k_i$ stores of type $i$ in $\mathcal{R}$, the total number of distinct locations of these stores would be $k'_i<k_i$, and it would be more realistic to assume that $|\mathcal{X}_i|=k'_i$ instead. Another example is the existence of shopping malls; here, suppose that tasks $1,\dots,i^*$ can be performed at shopping malls (with $i^*\le n$, obviously), and that $\bar{\mathcal{X}}$ denotes the set of shopping malls in $\mathcal{R}$.
One can then compare the cost of a tour that does not use a mall, $\mathrm{GTSP}(\mathcal{X}_1,\dots,\mathcal{X}_n)$, with the cost of a tour that uses a mall, $\mathrm{GTSP}(\bar{\mathcal{X}},\mathcal{X}_{i^*+1},\dots,\mathcal{X}_n)$. Which of the two is shorter depends on $i^*$ and the cardinality $|\bar{\mathcal{X}}|$.

Chapter 6
Traveling Purchaser Problem

In this chapter, we consider a variation of the traditional traveling purchaser problem. In our setting, we still have $n$ sets of points, and each set contains $k$ points. However, instead of visiting one point in each set as in the generalized traveling salesman problem, we are given a fraction $p$ ($0<p<1$), and we would like to visit $k\cdot p$ points in each set (assuming $k\cdot p$ is an integer) with a cycle of minimum length. Again, we state and prove asymptotic bounds for this variation of the traveling purchaser problem. Here is a formal definition of the traveling purchaser problem we are going to study.

Definition 18. Given $n$ sets of points $\mathcal{X}_1,\dots,\mathcal{X}_n$ in the plane, each containing $k$ points, and a constant $p$ between 0 and 1 such that $k\times p$ is an integer, the traveling purchaser problem looks for a shortest cycle $\mathrm{TPP}(\mathcal{X}_1,\dots,\mathcal{X}_n)$ that contains $k\cdot p$ elements from each point set $\mathcal{X}_i$. See Figure 6.1 for an example.

Now we are ready to present the asymptotic theorem on the traveling purchaser problem.

Theorem 19. Let $k\ge 2$ be a fixed integer, let $p$ ($\frac{1}{k}\le p\le\frac{k-1}{k}$) be a fixed real number such that $k\times p$ is an integer, and let $\mathcal{X}_1,\dots,\mathcal{X}_n$ be point sets of cardinality $k$ that are all drawn independently and uniformly at random in a unit square in $\mathbb{R}^2$. Then

$$\mathbb{E}\,\mathrm{TPP}(\mathcal{X}_1,\dots,\mathcal{X}_n)\in\Theta\!\left(p\sqrt{nk}\right)$$

as $n\to\infty$. In particular, there exist constants $\alpha_1>0.1497$ and $\alpha_2<2$ such that for any $k\ge 2$, there exists a threshold $\bar{n}$ such that

$$\mathbb{E}\,\mathrm{TPP}(\mathcal{X}_1,\dots,\mathcal{X}_n)\ge\alpha_1\,p\sqrt{nk}\quad\text{and}\quad\mathbb{E}\,\mathrm{TPP}(\mathcal{X}_1,\dots,\mathcal{X}_n)\le\alpha_2\,p\sqrt{nk}$$

whenever $n\ge\bar{n}$.

Figure 6.1: An example of our variation of the traveling purchaser problem. There are four sets of points $\mathcal{X}_1,\dots,\mathcal{X}_4$, with $\mathcal{X}_2$ outlined for purpose of clarity.
In this case, $k=4$, $p=\frac{1}{2}$, thus the optimal tour contains $k\times p=2$ elements from each set (and is the shortest such tour to do so).

Proof. We first prove the lower bound; the idea is similar to the lower bound proof of Theorem 8. By the nature of the traveling purchaser problem, we now have $n\times p\times k$ points in a single valid tour. Thus, to generalize Corollary 10 to the setting of the traveling purchaser problem, we have the following inequality in two dimensions:

$$\Pr\left(\mathrm{TSP}(X_0,X_1,\dots,X_{npk})\le\ell\right)\le(npk)!\cdot(2\pi)^{npk}\cdot\frac{1}{\Gamma(2npk+1)}\cdot\ell^{2npk}.$$

Let $E$ be the event that $\mathrm{TPP}(\mathcal{X}_1,\dots,\mathcal{X}_n)\le\alpha_1p\sqrt{nk}$. We now apply the union bound to the inequality above with the fact that there are $\binom{k}{pk}^n$ different possible ways to select $pk$ elements from each point set $\mathcal{X}_i$, so we have

$$\Pr(E)\le\binom{k}{pk}^{n}\cdot(npk)!\cdot(2\pi)^{npk}\cdot\frac{1}{\Gamma(2npk+1)}\cdot\left(\alpha_1p\sqrt{nk}\right)^{2npk},$$

which implies

$$\log\Pr(E)\le\left[\log\frac{k!}{(pk)!\,(k-pk)!}+pk\log(2\pi)+2pk\log\alpha_1+2pk\log p+pk\log n+pk\log k\right]n+\log\Gamma(npk+1)-\log\Gamma(2npk+1)$$

$$=\left[\log\frac{k!}{(pk)!\,(k-pk)!}+pk\log(2\pi)+2pk\log\alpha_1+pk\log p-2pk\log 2+pk\right]n-\frac{1}{2}\log 2+O(1/n), \tag{6.1}$$

where we have applied Lemma 3. We see that (6.1) $\to-\infty$ if and only if the coefficient of $n$ is negative:

$$0>\log\frac{k!}{(pk)!\,(k-pk)!}+pk\log(2\pi)+2pk\log\alpha_1+pk\log p-2pk\log 2+pk$$

$$\iff 2\log\alpha_1<\frac{1}{pk}\left[-\log\frac{k!}{(pk)!\,(k-pk)!}-pk\log(2\pi)-pk\log p+2pk\log 2-pk\right]=:f(k,p). \tag{6.2}$$

From (6.2), we can see that it suffices to find a suitable $\alpha_1$ that satisfies

$$2\log\alpha_1<\min f(k,p)\le f(k,p). \tag{6.3}$$

In order to find bounds for $f(k,p)$, we state Stirling's formula here [111]. For any positive integer $n$, we have

$$\sqrt{2\pi}\cdot n^{n+\frac{1}{2}}e^{-n}\le n!\le e\cdot n^{n+\frac{1}{2}}e^{-n}.$$

Thus we have

$$f(k,p)=\frac{1}{pk}\log\frac{(pk)!\,(k-pk)!}{k!}-\log\pi-\log p+\log 2-1$$

$$=\frac{1}{pk}\left[\log(pk)!+\log(k-pk)!-\log k!\right]-\log\pi-\log p+\log 2-1$$

$$\ge\frac{1}{pk}\left[\log\left(\sqrt{2\pi}\,(kp)^{kp+1/2}e^{-kp}\right)+\log\left(\sqrt{2\pi}\,(k-kp)^{k-kp+1/2}e^{-(k-kp)}\right)-\log\left(e\cdot k^{k+1/2}e^{-k}\right)\right]-\log\pi-\log p+\log 2-1$$

$$=\frac{1}{pk}\left[\log(2\pi)+\frac{1}{2}\log(kp)+k(1-p)\log(1-p)+\frac{1}{2}\log(1-p)-1\right]-\log\pi+\log 2-1$$

$$\ge\frac{1}{pk}\left[\frac{1}{2}\log(kp)+k(1-p)\log(1-p)+\frac{1}{2}\log(1-p)-1\right]-\log\pi+\log 2-1$$

$$=\frac{1}{2kp}\log\left(kp(1-p)\right)-\frac{1}{kp}+\frac{1-p}{p}\log(1-p)-\log\pi+\log 2-1. \tag{6.4}$$

Now we verify a few straightforward inequalities, and then come back to (6.4). Since $\frac{1}{k}\le p\le\frac{k-1}{k}$ and $k\ge 2$, we have $1\le kp\le k-1$, so that

$$-\frac{1}{kp}\ge-1 \tag{6.5}$$

and

$$p(1-p)\ge\frac{1}{k}\times\left(1-\frac{1}{k}\right)=\frac{k-1}{k^2}. \tag{6.6}$$

Also, let $h(p):=\frac{1-p}{p}\log(1-p)$ for $p\in(0,1)$. Using the well-known inequality $\log(1-x)\le-x$ for $x<1$, we have

$$h'(p)=-\frac{\log(1-p)}{p^2}-\frac{1}{p}\ge-\frac{-p}{p^2}-\frac{1}{p}=0.$$

Thus $h$ is nondecreasing, and we have

$$h(p)=\frac{1-p}{p}\log(1-p)\ge\lim_{p\to 0^+}\frac{1-p}{p}\log(1-p)=-1. \tag{6.7}$$

Incorporating the results of (6.5), (6.6) and (6.7) into (6.4), we conclude that

$$f(k,p)\ge\frac{1}{2kp}\cdot\log\left(k\cdot\frac{k-1}{k^2}\right)-1-1-\log\pi+\log 2-1=\frac{1}{2kp}\cdot\log\left(\frac{k-1}{k}\right)-1-1-\log\pi+\log 2-1$$

$$\ge\frac{1}{2}\cdot\log\left(\frac{k-1}{k}\right)-1-1-\log\pi+\log 2-1\ge\frac{1}{2}\cdot\log\left(\frac{2-1}{2}\right)-1-1-\log\pi+\log 2-1=-3+\frac{\log 2}{2}-\log\pi=-3.79816.$$

Looking at (6.3), we can now see that it suffices that $\alpha_1$ satisfies

$$2\log\alpha_1<-3.79816\le f(k,p)\quad\text{for all }k\ge 2,\ \frac{1}{k}\le p\le\frac{k-1}{k},$$

thus we can conclude that any $\alpha_1\le 0.1497$ suffices to show that (6.1) goes to $-\infty$ as $n\to\infty$. This concludes the proof of the lower bound.

Now we prove the upper bound. Before we proceed, we state a key fact about order statistics. Suppose we have $n$ random variables $Y_1,Y_2,\dots,Y_n$ that are independently and identically distributed as $U[0,1]$. Let $Y_{(1)},Y_{(2)},\dots,Y_{(n)}$ be the order statistics associated with $Y_1,Y_2,\dots,Y_n$, where $Y_{(1)}\le Y_{(2)}\le\cdots\le Y_{(n)}$; then

$$\mathbb{E}Y_{(k)}=\frac{k}{n+1}\quad\text{for all }1\le k\le n.$$

We can now proceed to the proof of the upper bound, which follows a similar idea to the proof in [27].
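The order-statistics fact $\mathbb{E}Y_{(j)}=j/(n+1)$ for i.i.d. $U[0,1]$ samples is easy to confirm by simulation. The sketch below is a minimal Monte Carlo check; the function names, the number of trials, and the fixed seed are arbitrary choices made for this illustration only.

```python
import random


def order_stat_means_mc(n: int, trials: int = 20000, seed: int = 0):
    """Monte Carlo estimates of E[Y_(1)], ..., E[Y_(n)] for n i.i.d. U[0,1] draws."""
    rng = random.Random(seed)
    sums = [0.0] * n
    for _ in range(trials):
        sample = sorted(rng.random() for _ in range(n))
        for j, y in enumerate(sample):
            sums[j] += y
    return [s / trials for s in sums]


def order_stat_means_exact(n: int):
    """Closed form E[Y_(j)] = j / (n + 1) for j = 1, ..., n."""
    return [j / (n + 1) for j in range(1, n + 1)]


if __name__ == "__main__":
    n = 6
    for est, exact in zip(order_stat_means_mc(n), order_stat_means_exact(n)):
        print(f"estimate {est:.4f}   exact {exact:.4f}")
```

Rescaling the samples to an interval $[0,L]$ simply multiplies each expectation by $L$.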
Let us call the unit square $\mathcal{R}$, the zigzagging path in Figure 6.2a $\mathcal{P}$, and the perturbed zigzagging path in Figure 6.2b $\mathcal{P}'$. Let $m$ be an even integer and consider the path $\mathcal{P}$ obtained by traversing the width of $\mathcal{R}$ horizontally a total of $m$ times, starting at the upper leftmost corner of $\mathcal{R}$ and moving downward by an amount $\frac{1}{m-1}$ after each traversal, as shown in Figure 6.2a; it is obvious that the length of $\mathcal{P}$ is simply $m+2$. Given a collection of point sets $\mathcal{X}_1,\dots,\mathcal{X}_n$ in $\mathcal{R}$, we can perturb $\mathcal{P}$ to form a new path $\mathcal{P}'$ that visits $k\times p$ points from each point set $\mathcal{X}_i$ by simply inserting a pair of vertical line segments between $\mathcal{P}$ and each of the nearest $k\times p$ points (measured only in the vertical direction) in each $\mathcal{X}_i$, as shown in Figure 6.2b. Of course, the vertical distances between $\mathcal{P}$ and the points in a particular point set $\mathcal{X}_i$ are random variables $Y_1,Y_2,\dots,Y_k$ that follow the uniform distribution on $\left[0,\frac{1}{2(m-1)}\right]$. In order to determine the perturbed distance of the nearest $k\times p$ points to the path $\mathcal{P}$, we consider the order statistics $Y_{(1)}\le Y_{(2)}\le\cdots\le Y_{(k)}$ associated with $Y_1,Y_2,\dots,Y_k$. Let $d$ denote the sum of the perturbed vertical distances for an individual point set $\mathcal{X}_i$, so we have

$$\mathbb{E}d=\mathbb{E}Y_{(1)}+\mathbb{E}Y_{(2)}+\cdots+\mathbb{E}Y_{(kp)}=\left(\frac{1}{k+1}+\frac{2}{k+1}+\cdots+\frac{kp}{k+1}\right)\cdot\frac{1}{2(m-1)}\le kp\cdot\frac{kp}{k+1}\cdot\frac{1}{2(m-1)}=\frac{k^2p^2}{2(k+1)(m-1)}.$$

Our objective is therefore to select a value $m$ so as to minimize the total length of $\mathcal{P}$ plus these additional $n$ displacements, i.e. to minimize

$$m+2+2n\cdot\mathbb{E}d\le m+2+2n\cdot\frac{k^2p^2}{2(k+1)(m-1)}\le m+2+n\cdot\frac{kp^2}{m-1},$$

where we have an additional multiplier "2" for $\mathbb{E}d$ since each vertical displacement consists of an outbound and an inbound trip. As $n$ becomes large, we see that the optimal $m$ satisfies

$$m^*\sim\sqrt{nkp^2}=p\sqrt{nk},$$

which results in a total length that satisfies

$$\mathrm{length}(\mathcal{P}')\le 2p\sqrt{nk},$$

which proves that $\mathbb{E}\,\mathrm{TPP}(\mathcal{X}_1,\dots,\mathcal{X}_n)\le\alpha_2\,p\sqrt{nk}$.

Figure 6.2 (panels (a), (b)): A traveling purchaser problem instance with $n=2$, $k=6$, $p=\frac{1}{3}$. In this instance, we are looking for a cycle that visits 2 points from each set. In (a), we show the zigzagging path, which traverses the region $\mathcal{R}$ horizontally a total of $m=8$ times. In (b), we show the perturbed path, where in each point set we pick the 2 points that are nearest to the zigzagging path, and perturb the zigzagging path to connect them with inbound and outbound segments.

Chapter 7
Numerical Experiments on GTSP

In this chapter, we perform some numerical experiments on the GTSP where $k$ is fixed and $n$ grows large, for both the uniform distribution and the clustered case.

7.1 GTSP on generated data

In this part, we artificially generate random points in a unit square, for both the uniform distribution and the clustered case. A square is selected as the clustering shape for simplicity, and we change the side length of the square in different runs to see if it makes a difference. Figure 7.1 refers to the case where every point is sampled from the uniform distribution. Figures 7.2, 7.3, 7.4, 7.5 and 7.6 illustrate the cases where points are sampled from clusters of different sizes.

One observation we can make from the uniform case (Figure 7.1) is that for different $k$, the ratios of GTSP over $\sqrt{n/k}$ are different, but for each $k$ the curve approaches a limit. We did some simulations with large $n$. When $k=2$, $n=5000$, the constant is 0.7997; when $k=3$, $n=2000$, the constant is equal to 0.8648; when $k=4$, $n=2000$, the constant is equal to 0.9248. The results match the trend of the graph in Figure 7.1.

As for the clustered case (Figures 7.2, 7.3, 7.4, 7.5 and 7.6), we can observe that when the cluster is relatively large (Figure 7.6), the graph looks very much like the uniform case (Figure 7.1).
This aligns with our intuition because our sampling strategy is to uniformly sample clusters first, and then uniformly sample points inside each cluster, so as the cluster grows larger, the procedure approaches the case where we sample every point uniformly inside the unit square. When the cluster is really small, e.g., Figure 7.2, we would expect the plots for different k to be similar because the problem degenerates to TSP, at least when n is not too large (we see from Figure 7.6 that the plots overlap for k = 2, k = 3, k = 4 and n ≤ 300). Figures 7.3, 7.4 and 7.5 show how the GTSP tour behaves as the size of the cluster grows.

[Figure 7.1, two panels titled "Uniform Distribution": left, GTSP(X1,...,Xn) versus n for k = 2, 3, 4; right, GTSP/$\sqrt{n/k}$ versus n for k = 2, 3, 4.]
Figure 7.1: All points are sampled from a uniform distribution in a unit square. There are n sets with k points in each set.

7.2 GTSP on real data

In this part, instead of generating random points in a unit square, we use points inside real cities to calculate non-clustered and clustered GTSP tours. The data points are the centers of census tracts in selected cities in the 2010 Census [128]. Distances are great-circle distances on the Earth's surface, computed with the Haversine formula [112] from latitude/longitude points. The clustering shape we choose is a ball. We can see from Figures 7.7, 7.8, 7.9, 7.10, 7.11, 7.12, 7.13 and 7.14 that generalized TSP works well on the real map, in the sense that the ratio of the length of the non-clustered tour to that of the clustered tour is around a constant when the size of the cluster is not too small, which matches our observation in Section 7.1.
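For reference, the Haversine distance mentioned above can be computed as in the following Python sketch. The function name and the Earth radius value (a commonly used mean radius in miles) are assumptions made for illustration; the thesis does not state which radius was used.

```python
import math

# Assumed mean Earth radius in miles (not specified in the text).
EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (latitude, longitude) points in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    a = math.sin(dphi / 2.0) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2.0) ** 2
    # Clamp to 1.0 to guard against rounding slightly above 1 for antipodal points.
    return 2.0 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(a)))
```

A distance matrix for a GTSP solver can then be filled by calling this function on every pair of census tract centers.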
[Figure 7.2, two panels titled "r = 0.005": left, GTSP(X1,...,Xn) versus n for k = 2, 3, 4; right, GTSP/$\sqrt{n/k}$ versus n for k = 2, 3, 4.]
Figure 7.2: In a unit square, we uniformly sample n clusters and in each cluster we uniformly sample k points. Here the cluster is a small square with side length 0.005.

[Figure 7.3, two panels titled "r = 0.02", same axes and curves as Figure 7.2.]
Figure 7.3: In a unit square, we uniformly sample n clusters and in each cluster we uniformly sample k points. Here the cluster is a small square with side length 0.02.

[Figure 7.4, two panels titled "r = 0.05", same axes and curves as Figure 7.2.]
Figure 7.4: In a unit square, we uniformly sample n clusters and in each cluster we uniformly sample k points. Here the cluster is a small square with side length 0.05.

[Figure 7.5, two panels titled "r = 0.2", same axes and curves as Figure 7.2.]
Figure 7.5: In a unit square, we uniformly sample n clusters and in each cluster we uniformly sample k points. Here the cluster is a small square with side length 0.2.

[Figure 7.6, two panels titled "r = 0.5", same axes and curves as Figure 7.2.]
Figure 7.6: In a unit square, we uniformly sample n clusters and in each cluster we uniformly sample k points. Here the cluster is a small square with side length 0.5.
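The sampling procedure described in these captions can be sketched in Python as follows. This is an illustration under stated assumptions: the function names are hypothetical, and the captions do not say how clusters near the boundary are handled, so the sketch assumes that each square cluster is placed uniformly such that it lies entirely inside the unit square.

```python
import math
import random


def sample_clustered_sets(n, k, side, seed=0):
    """Return n point sets of k points each, every set drawn from its own square cluster.

    Assumption: the lower-left corner of each cluster is uniform on
    [0, 1 - side]^2, so that the cluster stays inside the unit square.
    With side = 1.0 this reduces to sampling every point uniformly.
    """
    rng = random.Random(seed)
    point_sets = []
    for _ in range(n):
        x0 = rng.uniform(0.0, 1.0 - side)
        y0 = rng.uniform(0.0, 1.0 - side)
        cluster = [(x0 + rng.uniform(0.0, side), y0 + rng.uniform(0.0, side))
                   for _ in range(k)]
        point_sets.append(cluster)
    return point_sets


def normalized_length(tour_length, n, k):
    """Tour length divided by sqrt(n / k), the quantity plotted in the right-hand panels."""
    return tour_length / math.sqrt(n / k)
```

The tour length itself would come from a GTSP solver applied to the sampled sets; that step is not shown here.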
Figure 7.7: Figure 7.7a shows 1,694 random samples in Eureka, CA. Figure 7.7b shows the ratio between the length of non-clustered tour and clustered tour, where the radius of the cluster is 0.7 mile, the number of points in each set or cluster is equal to 2, 3, 4, i.e., k = 2, 3, 4, and the number of sets or clusters n is from 4 to 100.

Figure 7.8: Figure 7.8a shows 1,846 random samples in Glendale, CA. Figure 7.8b shows the ratio between the length of non-clustered tour and clustered tour, where the radius of the cluster is 1.0 mile, the number of points in each set or cluster is equal to 2, 3, 4, i.e., k = 2, 3, 4, and the number of sets or clusters n is from 4 to 100.

Figure 7.9: Figure 7.9a shows 3,342 random samples in Modesto, CA. Figure 7.9b shows the ratio between the length of non-clustered tour and clustered tour, where the radius of the cluster is 1.0 mile, the number of points in each set or cluster is equal to 2, 3, 4, i.e., k = 2, 3, 4, and the number of sets or clusters n is from 4 to 100.

Figure 7.10: Figure 7.10a shows 963 random samples in Palm Desert, CA. Figure 7.10b shows the ratio between the length of non-clustered tour and clustered tour, where the radius of the cluster is 1.0 mile, the number of points in each set or cluster is equal to 2, 3, 4, i.e., k = 2, 3, 4, and the number of sets or clusters n is from 4 to 100.

Figure 7.11: Figure 7.11a shows 1,883 random samples in Pasadena, CA.
Figure 7.11b shows the ratio between the length of non-clustered tour and clustered tour, where the radius of the cluster is 1.0 mile, the number of points in each set or cluster is equal to 2, 3, 4, i.e., k = 2, 3, 4, and the number of sets or clusters n is from 4 to 100.

Figure 7.12: Figure 7.12a shows 2,849 random samples in Redding, CA. Figure 7.12b shows the ratio between the length of non-clustered tour and clustered tour, where the radius of the cluster is 4.0 miles, the number of points in each set or cluster is equal to 2, 3, 4, i.e., k = 2, 3, 4, and the number of sets or clusters n is from 4 to 100.

Figure 7.13: Figure 7.13a shows 1,263 random samples in Redwood City, CA. Figure 7.13b shows the ratio between the length of non-clustered tour and clustered tour, where the radius of the cluster is 1.0 mile, the number of points in each set or cluster is equal to 2, 3, 4, i.e., k = 2, 3, 4, and the number of sets or clusters n is from 4 to 100.

Figure 7.14: Figure 7.14a shows 1,856 random samples in Sunnyvale, CA. Figure 7.14b shows the ratio between the length of non-clustered tour and clustered tour, where the radius of the cluster is 1.0 mile, the number of points in each set or cluster is equal to 2, 3, 4, i.e., k = 2, 3, 4, and the number of sets or clusters n is from 4 to 100.

Chapter 8
Application on Warehouse Random Stow Strategy

As briefly discussed in Chapter 1, the GTSP model can be used in a warehouse random stow strategy. Essentially, the warehousing process can be divided into different phases, i.e., receiving, storage, order-picking and shipping [96]. Our main focus is on the order-picking phase.
In this chapter, we will carefully study how we can apply the GTSP model to the order-picking strategy; our main point of comparison is Amazon's Kiva system [71]. We will first give a brief literature review of the traditional warehouse storage strategy, then explain Amazon's Kiva system, and finally propose our GTSP warehouse strategy. In the end, we will approximate the performance of the Kiva system and the GTSP system, and compare them with each other according to our numerical simulation.

8.1 Relevant Literature on Warehouse Design

Order picking refers to the retrieval of goods from their storage locations on the basis of customer orders, and it is one of the most important activities in a warehouse. This process is also the most laborious and time-consuming of all warehousing processes [96]. There have been many studies on warehouse location design, as well as on finding optimal routes for pickers. For example, Koster et al. [70] proposed a heuristic algorithm for finding the minimum length of the order-picking tour under three realistic order picking systems: a narrow-aisle high-bay pallet warehouse, picking in a shelf area with decentralized depositing of picked items, and conventional order picking from wide-aisle pallet locations. The newly presented algorithm was compared with the S-shape heuristic that has been widely used in practice, and it turned out that the new heuristic can reduce travel time in all three situations. In another paper, a travel time model with general item location assignment in a rectangular warehouse system was presented by Chew et al. [28]. They applied the model to analyze order batching and storage allocation strategies in an order picking system. Roodbergen and De Koster [113] considered a parallel-aisle warehouse, where order pickers can change aisles at the end of every aisle and also at a cross aisle halfway along the aisles, and devised a dedicated algorithm that can find the shortest path in this type of warehouse.
Their research showed that the average order picking time can be decreased significantly by adding the middle aisles. Jewkes et al. [61] were interested in the concurrent problems of (1) product location, (2) picker home base location, and (3) allocating products to each picker so that the expected order cycle time is minimized. The problem was tackled by dynamic programming. Hsieh and Tsai [58] studied the effects on order picking system performance of factors such as the quantity and layout type of cross aisles in a warehouse system, storage assignment policy, picking route, average picking density inside an aisle, and order combination type. They used eM-plant (a software package) to perform the simulation, and developed a warehouse design database that provides a reference for the industry. Önüt et al. [96] formulated a mathematical model that determines a multi-level warehouse layout. They considered storage policy, picking process, and the optimal number of docks all together, and developed a particle swarm optimization (PSO) algorithm to solve it.

Up until now, all the literature that we covered has focused on how to find optimal order picking routes so that order fulfillment efficiency can be greatly boosted. It is also worth pointing out that the performance of the order picking system is largely dependent on the demand distribution, warehouse layout, storage process, batching method, etc. We focus on the literature on "storage policy" here. Traditionally, there are three types of storage policies: dedicated storage, randomized storage, and class-based storage; see Figure 8.1 for a detailed explanation. There are plenty of articles on these storage policies as well. For example, Lee and Elsayed [80] formulated the warehouse storage capacity problem as a non-linear programming model to minimize the total cost of owned and leased storage space.
Larson et al. [77] presented a three-phase procedure for warehouse layout based on a class-based storage policy: (1) determination of aisle layout and storage zone dimensions, (2) assignment of material to a storage medium, and (3) allocation of floor space. Le-Duc et al. [78] investigated the problem of determining the optimal storage boundaries (zones) of classes in each aisle for manually operated warehouses. To minimize the average traveling distance, they presented a mathematical formulation for the storage zone optimization problem, and devised a heuristic to solve this optimization problem. For the randomized storage policy, there is not much to do on layout optimization, but there have been many papers on travel time models in the setting of randomized storage [59, 99, 66, 126], and this thesis is another of them.

Figure 8.1: The picture is taken from Dr. Kay's personal website http://www4.ncsu.edu/ kay/Warehousing.pdf. A, B and C in the pictures above denote three SKUs, and each picture shows the storage area for SKUs under different storage policies. The dedicated storage policy determines a particular predetermined location for each product to be stored. In the randomized storage policy, each SKU can be stored in any (sometimes the closest) available slot. The class-based policy is a combination of dedicated and randomized storage, where each SKU is assigned to one of several different storage classes.

8.2 Amazon's Kiva System vs. GTSP Warehouse System

Kiva Systems was a Massachusetts-based company that manufactured mobile robotic fulfillment systems; it was acquired by Amazon in 2012 [30] and renamed Amazon Robotics afterwards. In 2014, Amazon started to use the machines made by the company to build an automated warehouse that can cut operating expenses by about 20% [65]. In this chapter, we will use the word "Kiva" to refer to the robot used in Amazon's automated warehouse.
To understand how a Kiva robot does the order picking, see Figure 8.2 for a detailed explanation of how the system works. The advantage of incorporating Kiva to replace some of the human labor is manifest: it can greatly boost the efficiency of order picking (humans do not need to go into the shelf area to manually pick SKUs for each order). In Section 8.3, we will give a numerical analysis to estimate the performance of the Kiva system in terms of the average traveling distance of the robots, and compare it with the GTSP warehouse we will introduce later.

Figure 8.2: This is a sketch of the automated warehouse using Kiva, reproduced from [101]. There are three major components in a Kiva warehouse: (1) green squares, which denote vertical shelves containing different SKUs (one single shelf contains several different SKUs); (2) orange squares, which denote Kiva robots, each of which is able to carry an entire shelf and move around; (3) blue squares, which denote stations. When an order comes, or more specifically, when a particular item is requested, a Kiva is assigned to carry a shelf that contains that item to the station, and human workers at the station manually pick the SKUs needed to fulfill the order; then the shelf is carried back to the storage area (possibly not to its original location) by a Kiva (possibly not the Kiva that carried it to the station).

As we can see, by the nature of the warehouse randomized storage policy, GTSP can be directly applied to solve the routing problem of warehouse pickers. We simplify the Kiva system in Figure 8.2 to Figure 8.3, in which identical SKUs have the same color and are randomly located at multiple places in the warehouse. When an order comes, a warehouse picker (most likely a human; to the best of my knowledge, there is currently no robot that is able to pick multiple SKUs from shelves in a single run) will go into the shelves and find all SKUs needed to fulfill the order.
Again, we will estimate the average traveling distance of the human pickers in Section 8.3.

Figure 8.3: In a GTSP warehouse, the same SKU can be placed at different places. In the figure above, all red rectangles refer to the same product, and the same holds for the rectangles of each other color, such as cyan. A warehouse runner may collect multiple different SKUs in a single run, so this random strategy could benefit the runner by reducing the overall running distance in the long term.

8.3 Simulation on Kiva Warehouse and GTSP Warehouse

8.3.1 Experiment Setup

In this section, our goal is to simulate the order fulfillment process in a Kiva warehouse and a GTSP warehouse, and compare the cost of these two types of warehouses. We will first go over the experiment settings, and then give numerical results under different circumstances. We start with the setup of the GTSP warehouse.

To simulate the GTSP warehouse, we assume there are n different SKUs stored in this warehouse, denoted by {1, 2, ..., n}, and let #Diff SKUs = n for future reference. Also, we assume the warehouse is a unit square that contains a k by k grid representing shelves, and let #Shelves = k × k for future reference. In addition, we have a variable #Diff SKUs/shelf, i.e., the number of different SKUs in each shelf. Specifically, for each shelf, we randomly generate a subset of {1, 2, ..., #Diff SKUs} with fixed cardinality (i.e., #Diff SKUs/shelf) associated with the shelf, which means that this particular shelf can provide these kinds of SKUs. Also, we have one single station that is located somewhere in the unit square, e.g., at the origin, at the center of the square, or at the midpoint of a side.

[Figure 8.4: a station and eight shelves labeled with their SKUs: {1, 2, 3, 4}, {1, 6, 8, 10}, {2, 6, 7, 9}, {3, 4, 6, 8}, {5, 7, 9, 10}, {1, 4, 5, 10}, {4, 5, 7, 9}, {6, 7, 9, 10}.]
Figure 8.4: In this warehouse that is using the GTSP system, we have n = 10 different SKUs labeled from 1 to 10 and 8 shelves, and each shelf contains 4 different SKUs.
Suppose we now need to fulfill an order that contains {1, 4, 7, 9, 10}: a warehouse picker leaves the station, visits two shelves, and then comes back to the station to fulfill this order.

At the fulfillment phase, we randomly generate an order which contains a subset of {1, 2, ..., #Diff SKUs}, and use the GLKH solver [54] to find the shortest cycle that starts and ends at the station to fulfill the order. See Figure 8.4 and Algorithm 1 for detailed steps, and note that we have 4 parameters in our simulation: Station Location, #Diff SKUs, #Shelves, #Diff SKUs/shelf. Some assumptions are made to make this problem tractable: (1) all the shelves are single points in a unit square, i.e., we ignore shapes; (2) each shelf contains an infinite number of each of its SKUs; (3) a warehouse runner has infinite capacity, i.e., in a single run, he can pick an arbitrary number of items; (4) distances are measured by two-dimensional Euclidean distance.

Algorithm 1 GTSP Warehouse Simulation
procedure GTSP-Warehouse-Simulation(Station Location, #Diff SKUs, #Shelves, #Diff SKUs/shelf)
    Generate #Shelves shelves (a k × k grid where k = √#Shelves); for each shelf, randomly generate #Diff SKUs/shelf numbers in the set {1, 2, ..., #Diff SKUs}    ▷ each SKU must appear in at least one shelf
    Let order_count = 0
    while order_count < order_times do
        Randomly generate a subset of {1, 2, ..., #Diff SKUs}, say {i_1, i_2, ..., i_n}    ▷ this is a randomly generated order
        Find all shelves that contain at least one SKU in the set {i_1, i_2, ..., i_n}, and calculate the distance matrix based on 2D Euclidean distance
        Solve the GTSP instance based on the 2D Euclidean distance matrix    ▷ this is a GTSP instance where each SKU appearing in the order forms a set, and the elements of that set are the shelves that contain this particular SKU
        order_count = order_count + 1
    return the average traveling distance

To simulate the Kiva warehouse, we use the same setting as for the GTSP warehouse described above, except that we use Kiva robots to fulfill orders. Apart from the four assumptions above, we make some more assumptions specific to the Kiva system. Compared with a real Kiva system as described in Figure 8.2, we assume that each shelf carried to the work station will be carried back to the same position afterwards; for each round trip of a shelf, we only count the distance of the round trip and disregard the additional trip that the Kiva traveled to get to the shelf. To carry out the simulation for Kiva, we follow Algorithm 2. It is very similar to the GTSP simulation described in Algorithm 1, except that we manipulate the distance matrix of the Kiva system (see Figure 8.5) so that we can take advantage of the GLKH solver.

Algorithm 2 Kiva Warehouse Simulation
procedure Kiva-Warehouse-Simulation(Station Location, #Diff SKUs, #Shelves, #Diff SKUs/shelf)
    Generate #Shelves shelves (a k × k grid where k = √#Shelves); for each shelf, randomly generate #Diff SKUs/shelf numbers in the set {1, 2, ..., #Diff SKUs}    ▷ each SKU must appear in at least one shelf
    Let order_count = 0
    while order_count < order_times do
        Randomly generate a subset of {1, 2, ..., #Diff SKUs}, say {i_1, i_2, ..., i_n}    ▷ this is a randomly generated order
        Find all shelves that contain at least one SKU in the set {i_1, i_2, ..., i_n}
        Transform the problem and calculate the distance matrix in the way indicated in Figure 8.5b
        Solve the GTSP instance based on the distance matrix calculated in the previous step    ▷ this is a GTSP instance where each SKU appearing in the order forms a set, and the elements of that set are the shelves that contain this particular SKU
        order_count = order_count + 1
    return the average traveling distance

[Figure 8.5, panels (a) and (b): the station and the same eight shelves as in Figure 8.4; panel (b) is annotated with "distance = a", "distance = b" and "distance = a + b".]
Figure 8.5: First, let's look at Figure 8.5a. In this warehouse that is using the Kiva system, we have n = 10 different SKUs labeled from 1 to 10 and 8 shelves, and each shelf contains 4 different SKUs. Suppose we now need to fulfill an order that contains {1, 4, 7, 9, 10}; then Kiva robots need to visit some nearby shelves (in terms of distance from the station) that together contain all of these SKUs, in the fashion of Figure 8.5a. Note that a Kiva carries an entire shelf to the station one at a time and then puts it back. However, we can still solve this problem as a GTSP instance by manipulating the distance matrix. For all SKUs in a particular shelf, let them be points with the same coordinates as the shelf itself, with distance 0 between any two of them; e.g., for {1, 4, 5, 10} in the same shelf, we treat them as 4 points having the same coordinates as the shelf containing them, and the distance between any two of these 4 points is 0. For each pair of points from two different shelves, the distance between them is the distance from the station to one point plus the distance from the station to the other point, as indicated in Figure 8.5b. Any distance to or from the station is just the Euclidean distance.

8.3.2 Numerical Results

In this section, we compare the traveling distance of the GTSP warehouse and the Kiva warehouse, according to Algorithms 1 and 2.
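The two cost computations being compared can be sketched in Python on a toy instance. This is an illustration only: it replaces the GLKH solver with a brute-force search (feasible only for a handful of shelves), the function names are hypothetical, and the shelf coordinates are made up, since Figure 8.4 gives only the shelf contents. The Kiva cost uses the transformed distance described in the caption of Figure 8.5: zero within a shelf, station-to-a plus station-to-b between two different shelves, and Euclidean distance to or from the station.

```python
import itertools
import math


def euclid(a, b):
    """Two-dimensional Euclidean distance between points a and b."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def make_kiva_dist(station):
    """Transformed distance of Figure 8.5b (assumes shelves have distinct coordinates)."""
    def dist(a, b):
        if a == b:
            return 0.0
        if a == station or b == station:
            return euclid(a, b)
        # Shelf to shelf: return one shelf via the station, then fetch the next.
        return euclid(station, a) + euclid(station, b)
    return dist


def order_cost(station, shelves, order, dist):
    """Cheapest station-to-station tour whose visited shelves cover the order.

    shelves is a list of (coordinates, set_of_skus). Brute force stands in
    for the GLKH solver; returns math.inf if the order cannot be covered.
    """
    order = set(order)
    if not order:
        return 0.0
    best = math.inf
    # A cheapest cover never needs more shelves than there are SKUs in the order.
    for size in range(1, min(len(order), len(shelves)) + 1):
        for combo in itertools.combinations(range(len(shelves)), size):
            covered = set().union(*(shelves[i][1] for i in combo))
            if not order <= covered:
                continue
            for perm in itertools.permutations(combo):
                stops = [station] + [shelves[i][0] for i in perm] + [station]
                length = sum(dist(p, q) for p, q in zip(stops, stops[1:]))
                best = min(best, length)
    return best


# Shelf contents from Figure 8.4; the coordinates below are hypothetical.
SKUS = [{1, 2, 3, 4}, {1, 6, 8, 10}, {2, 6, 7, 9}, {3, 4, 6, 8},
        {5, 7, 9, 10}, {1, 4, 5, 10}, {4, 5, 7, 9}, {6, 7, 9, 10}]
COORDS = [(x, y) for y in (0.3, 0.7) for x in (0.2, 0.4, 0.6, 0.8)]
SHELVES = list(zip(COORDS, SKUS))

if __name__ == "__main__":
    station = (0.0, 0.0)
    order = {1, 4, 7, 9, 10}
    print("GTSP cost:", order_cost(station, SHELVES, order, euclid))
    print("Kiva cost:", order_cost(station, SHELVES, order, make_kiva_dist(station)))
```

Under the transformed distance, a tour through shelves at distances a and b from the station costs a + (a + b) + b, i.e., the sum of the two round trips, which is why the same routine can price both systems.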
Based on the experiment setup in Section 8.3.1, we vary the parameters Station Location, #Diff SKUs, #Shelves and #Diff SKUs/shelf, and report the numerical results in Table 8.1. Note that for each scenario, we generate 50 orders and average their costs to compute the average cost.

Table 8.1 is divided into three parts by #Shelves (marked by double lines), which we set to 64, 81 and 100. This number decides the fulfillment power of the warehouse in our setting: since we implicitly assumed that the warehouse runner in the GTSP system and the Kiva in the Kiva system have unlimited picking power when orders come in, having more shelves gives the runner or the Kiva more choices for picking up a particular item, which would potentially help decrease the traveling cost. Within each part, we have the same number of shelves, and we experiment with different values of Station Location, #Diff SKUs and #Diff SKUs/shelf.

Here are some observations we can make from Table 8.1.

1. Most obviously and most importantly, the GTSP warehouse can be much more efficient than the Kiva warehouse. In all scenarios, the average cost of GTSP is significantly smaller than that of Kiva. To be more specific, the average cost of Kiva is about two to more than seven times that of GTSP, depending on the warehouse setup. On average over all scenarios, the Kiva system travels about four times as far as the GTSP warehouse. Hence, for the Kiva system to be superior in terms of operating cost, its cost per unit distance has to be about four times cheaper than that of human pickers. If we take potential collision problems into consideration, this number can be even bigger.

2. In general, as we look at the data in the three different parts of the table (i.e., different #Shelves), the efficiency increases as #Shelves becomes large, for both GTSP and Kiva. This aligns with our expectation because, as we mentioned above, #Shelves decides the fulfillment power under our assumptions.
3. As for the three different station locations, we can see that the center of the warehouse (with coordinates (0.5, 0.5)) is the best location. This makes sense because the station itself forms a set consisting of a single point in the GTSP instance when we call the solver, so being at the center of the unit square makes this point "nearer" to the rest of the destinations.

4. For the rest of the warehouse parameters, i.e., #Diff SKUs and #Diff SKUs/shelf, their effects on the warehouse are as one would expect. For both GTSP and Kiva warehouses, a larger number of different SKUs results in a larger average cost because an incoming order can contain a larger number of different SKUs, while a larger #Diff SKUs/shelf helps decrease the average traveling distance because more different SKUs are available at each shelf visited.

We have to admit that the comparison between GTSP and Kiva is a little bit unrealistic because of all the assumptions we made to simplify the problem. However, by the very structure of the GTSP warehouse system, it obviously has its own advantage over the Kiva system because a warehouse picker can collect multiple items in a single run. Currently, for the GTSP warehouse, we need human pickers (such labor can be very cheap in some developing countries) to collect items since there is no reliable robot that can perform this task. In the future, when such a robot becomes available, the GTSP warehouse will be a more promising way to increase order fulfillment efficiency.
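The ratios behind observation 1 can be recomputed directly from Table 8.1. The Python sketch below does this for a handful of rows copied from the table (not the full table); the function names and the unit picker cost are illustrative assumptions.

```python
# A few rows copied from Table 8.1:
# (station, n_skus, n_shelves, skus_per_shelf, gtsp_avg_cost, kiva_avg_cost)
ROWS = [
    ((0.0, 0.0), 100, 64, 5, 3.474, 26.671),
    ((0.5, 0.5), 100, 64, 5, 3.273, 14.471),
    ((0.5, 0.0), 100, 64, 5, 3.427, 20.713),
    ((0.0, 0.0), 60, 100, 10, 1.495, 4.530),
    ((0.5, 0.5), 60, 100, 10, 1.109, 2.162),
    ((0.5, 0.0), 60, 100, 10, 1.116, 3.392),
]


def kiva_to_gtsp_ratio(row):
    """How many times farther Kiva travels than GTSP in this scenario."""
    return row[5] / row[4]


def break_even_unit_cost(row, picker_unit_cost=1.0):
    """Kiva cost per unit distance at which both systems cost the same."""
    return picker_unit_cost / kiva_to_gtsp_ratio(row)


if __name__ == "__main__":
    for row in ROWS:
        print(row[:4], "ratio = %.2f" % kiva_to_gtsp_ratio(row),
              "break-even unit cost = %.3f" % break_even_unit_cost(row))
```

Among these rows, the ratio is largest for the station at the origin with many SKUs and few SKUs per shelf, and smallest for the station at the center with 100 shelves.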
Table 8.1: Warehouse Fulfillment Simulation Results Using GTSP and Kiva Systems (50 random orders per scenario)

Station Location  # Different SKUs  # Shelves  # Different SKUs/shelf  GTSP Avg Cost  Kiva Avg Cost
==================================================================================================
(0, 0)            60                64         5                       2.436          14.188
(0, 0)            60                64         10                      1.697          6.329
(0, 0)            80                64         5                       3.118          19.237
(0, 0)            80                64         10                      2.144          9.082
(0, 0)            100               64         5                       3.474          26.671
(0, 0)            100               64         10                      2.577          12.589
(0.5, 0.5)        60                64         5                       2.069          5.478
(0.5, 0.5)        60                64         10                      1.389          3.281
(0.5, 0.5)        80                64         5                       2.927          9.286
(0.5, 0.5)        80                64         10                      1.896          4.347
(0.5, 0.5)        100               64         5                       3.273          14.471
(0.5, 0.5)        100               64         10                      2.574          7.235
(0.5, 0)          60                64         5                       2.211          9.323
(0.5, 0)          60                64         10                      1.463          4.487
(0.5, 0)          80                64         5                       2.866          13.735
(0.5, 0)          80                64         10                      2.189          7.701
(0.5, 0)          100               64         5                       3.427          20.713
(0.5, 0)          100               64         10                      2.553          10.559
==================================================================================================
(0, 0)            60                81         5                       2.071          11.944
(0, 0)            60                81         10                      1.645          6.341
(0, 0)            80                81         5                       2.786          16.436
(0, 0)            80                81         10                      1.987          8.566
(0, 0)            100               81         5                       3.262          23.892
(0, 0)            100               81         10                      2.474          12.534
(0.5, 0.5)        60                81         5                       1.810          5.269
(0.5, 0.5)        60                81         10                      1.203          2.713
(0.5, 0.5)        80                81         5                       2.332          9.274
(0.5, 0.5)        80                81         10                      1.611          4.057
(0.5, 0.5)        100               81         5                       2.877          10.510
(0.5, 0.5)        100               81         10                      1.967          5.600
(0.5, 0)          60                81         5                       2.101          6.744
(0.5, 0)          60                81         10                      1.411          3.914
(0.5, 0)          80                81         5                       2.641          12.289
(0.5, 0)          80                81         10                      1.832          5.578
(0.5, 0)          100               81         5                       3.088          18.102
(0.5, 0)          100               81         10                      2.115          9.255
==================================================================================================
(0, 0)            60                100        5                       2.082          9.825
(0, 0)            60                100        10                      1.495          4.530
(0, 0)            80                100        5                       2.544          16.570
(0, 0)            80                100        10                      1.963          7.464
(0, 0)            100               100        5                       3.154          23.390
(0, 0)            100               100        10                      2.234          10.586
(0.5, 0.5)        60                100        5                       1.637          5.374
(0.5, 0.5)        60                100        10                      1.109          2.162
(0.5, 0.5)        80                100        5                       2.093          7.362
(0.5, 0.5)        80                100        10                      1.402          3.887
(0.5, 0.5)        100               100        5                       2.876          10.102
(0.5, 0.5)        100               100        10                      2.038          6.004
(0.5, 0)          60                100        5                       2.016          6.636
(0.5, 0)          60                100        10                      1.116          3.392
(0.5, 0)          80                100        5                       2.330          11.21
(0.5, 0)          80                100        10                      1.678          5.513
(0.5, 0)          100               100        5                       2.844          18.359
(0.5, 0)          100               100        10                      2.123          6.920

Chapter 9
Application on Trip Chaining Models

In
this chapter, we will study how the adoption of delivery services can benefit society by reducing the carbon footprint. We will first build a few models of increasing complexity, and then conduct a numerical analysis, followed by a model correction and a revised simulation. GTSP plays an important role in our models because it helps capture household-level trip chaining behavior, which, as far as we know, most previous articles failed to consider.

9.1 Models

This section considers several models of increasing complexity. In each model, we assume that each person has n "tasks" that they must complete each day (such as going to work, the grocer, and so forth), and each of these tasks can be performed at k different locations. Thus, it would be sensible to postulate that the distance travelled by each person in the region would be given by the expression $\mathrm{GTSP}(X_1, \dots, X_n)$, where the point sets $X_j$ each denote the locations at which these tasks can be performed. By the analysis in Remark 16, we would approximate this with the expression $\alpha\sqrt{n/k^{n/(n-1)}}$. However, some potential error may result because this expression does not take into account the additional distance incurred by leaving and returning to each person's house, which we assume is distributed uniformly at random in the region. Ideally, we would like to approximate this with the expression $\mathrm{GTSP}(\{x_i\}, X_1, \dots, X_n)$, where $x_i$ denotes person i's home location, and we will do so in Section 9.3. For now, we simply remark that this expression is somewhat unwieldy because it involves computing a generalized TSP tour of sets of different magnitudes and thus involves a comparison of "inbound-and-outbound" costs of starting and ending a trip as well as the "peddling" cost of moving between destinations on this trip (see [72] for a detailed overview).
Since the purpose of this study is to examine the benefits of multi-stop trips at the household level, we instead opt to approximate each person's distance travelled (from their house to their n destinations) as $\alpha\sqrt{n/k^{n/(n-1)}}$, as justified in Remark 16.

9.1.1 A simple example: luddites and shut-ins

The scenario we describe in this section is too simple to be of practical use, but is helpful as a "minimum working example" that explains what factors affect the carbon footprint in a region most significantly. Suppose that our city is a square region $R$ of area 1 and has a population $N$. Each person has n locations to visit daily (n − 1 errands plus their home) and each errand has k possible locations where that errand can be performed (e.g. there are k grocery stores and k banks in the region). Each of the N people in the region corresponds to a point sampled independently and uniformly at random in $R$, and each person belongs to one of two classes, either "luddites" or "shut-ins", distinguished as follows:

• A luddite performs all of their tasks by themselves and drives to each of the n locations.
• A shut-in shops for everything online and remains stationary while packages are delivered to them.

Let the fraction of shut-ins in the city be p, implying that pN people are to be served by a delivery truck. This truck performs a travelling salesman tour of pN points, whose length is approximately $\beta\sqrt{pN}$ (for large N) by Lemma 5, with $\beta \approx 0.7124$. Therefore, the total carbon footprint due to these shut-ins is $\phi\beta\sqrt{pN}$, where $\phi$ represents the amount of emissions produced per mile driven by a delivery truck. A luddite visits n places each day (their house, plus their n − 1 tasks) and has k choices for each place to visit. From Theorem 15 and Remark 16, as well as the introductory paragraph to this section, we adopt the expression $\alpha\sqrt{n/k^{n/(n-1)}}$ to model the distance traversed by each luddite, where $\alpha = 0.29$ is the proportionality constant of Remark 16.
There are $(1-p)N$ such people, and their total carbon footprint is therefore

$$\psi(1-p)N\alpha\sqrt{n/k^{n/(n-1)}},$$

where $\psi$ represents the amount of emissions produced per mile driven by a passenger car. The total carbon footprint of the region, regarded as a function of $p$, is then given by

$$f(p) := \phi\beta\sqrt{pN} + \psi(1-p)N\alpha\sqrt{n/k^{n/(n-1)}}, \qquad (9.1)$$

which is concave in $p$. Note that when $p = 0$ (i.e. there are no shut-ins and everyone does their own driving), the carbon footprint is $\psi N\alpha\sqrt{n/k^{n/(n-1)}}$. We also note that

$$f(p)\Big|_{p=\frac{\phi^2\beta^2 k^{n/(n-1)}}{\psi^2\alpha^2 nN}} = f(p)\Big|_{p=0} = \psi N\alpha\sqrt{n/k^{n/(n-1)}},$$

which (coupled with the concavity of $f(\cdot)$) tells us that we must have

$$p \ge \frac{\phi^2\beta^2 k^{n/(n-1)}}{\psi^2\alpha^2 nN} =: p_0$$

in order for the carbon footprint to be reduced as a result of using delivery services. It is also worth pointing out that $f(\cdot)$ is maximized when $p = p_0/4$, and attains a maximum value of

$$\left[\phi\beta\sqrt{pN} + \psi(1-p)N\alpha\sqrt{n/k^{n/(n-1)}}\right]_{p=p_0/4} = \frac{\phi^2\beta^2\sqrt{k^{n/(n-1)}} + 4\psi^2\alpha^2 nN/\sqrt{k^{n/(n-1)}}}{4\psi\alpha\sqrt{n}}.$$

9.1.2 Marginal costs

The preceding model describes an extreme case in which each person in $\mathcal{R}$ either makes no use whatsoever of delivery services or uses delivery services exclusively. One middle ground that is also worth studying is the case where people belong to two classes as before, but the two classes differ by only one task:

• A luddite performs all of their tasks by themselves and drives to each of the $n$ locations (this is the same as in the preceding model).
• An early adopter visits $n-1$ locations and uses a delivery service for the remaining task.

A model of this kind is useful when one wants to understand the benefits of implementing a new delivery service for a specific good, such as groceries; one example of this can be found in [129], which discusses the consequences of introducing grocery delivery services in Seattle, Washington.
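The break-even and worst-case adoption levels of the luddite/shut-in model in Section 9.1.1 can be checked numerically. The following is a minimal sketch, not code from the thesis: the function names are hypothetical, the defaults $\alpha = 0.29$ and $\beta = 0.7124$ are the constants quoted in the text, and the values of $N$, $n$, $k$, $\phi$ and $\psi$ in the usage lines are purely illustrative.

```python
import math


def footprint(p, N, n, k, phi, psi, alpha=0.29, beta=0.7124):
    """Total footprint f(p) of equation (9.1) for a shut-in fraction p."""
    K = k ** (n / (n - 1))                      # k^{n/(n-1)}
    truck = phi * beta * math.sqrt(p * N)       # one TSP tour over the pN shut-ins
    cars = psi * (1 - p) * N * alpha * math.sqrt(n / K)  # (1-p)N luddites
    return truck + cars


def break_even(N, n, k, phi, psi, alpha=0.29, beta=0.7124):
    """Threshold p_0 at which f(p_0) = f(0)."""
    K = k ** (n / (n - 1))
    return (phi * beta) ** 2 * K / ((psi * alpha) ** 2 * n * N)


# Illustrative (assumed) inputs, not data from the thesis.
params = dict(N=1_000_000, n=5, k=10, phi=1303.0, psi=350.0)
p0 = break_even(**params)
print("break-even fraction p0:", p0)
print("f(p0) - f(0):", footprint(p0, **params) - footprint(0.0, **params))
print("f(p0/4) / f(0):", footprint(p0 / 4, **params) / footprint(0.0, **params))
```

Because $f$ is concave, evaluating it at $p_0/8$, $p_0/4$ and $p_0/2$ is a quick way to see that the footprint first rises, peaks at $p_0/4$, and only falls below $f(0)$ once $p$ exceeds $p_0$.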
A more nationwide phenomenon would be the recent introduction of “last-mile” services such as Google Shopping Express [26], which offers same-day deliveries facilitated by a specialized fleet of vehicles. If we let $p$ denote the fraction of early adopters in $\mathcal{R}$, we then see that the total carbon footprint in the region is given by

$$\begin{aligned}
f(p) &:= \phi\beta\sqrt{pN} + \underbrace{\psi pN\alpha\sqrt{(n-1)/k^{n/(n-1)}}}_{\text{early adopters}} + \underbrace{\psi(1-p)N\alpha\sqrt{n/k^{n/(n-1)}}}_{\text{luddites}} \\
&= \phi\beta\sqrt{pN} + \frac{\psi N\alpha}{\sqrt{k^{n/(n-1)}}}\left[\sqrt{n} - p\left(\sqrt{n}-\sqrt{n-1}\right)\right] \\
&\approx \phi\beta\sqrt{pN} + \frac{\psi N\alpha}{\sqrt{k^{n/(n-1)}}}\left[\sqrt{n} - \frac{p}{2\sqrt{n}}\right],
\end{aligned}$$

where all terms (except for $p$) are the same as in (9.1), and we have used the series approximation $\sqrt{n}-\sqrt{n-1} = 1/(2\sqrt{n}) + O(n^{-3/2})$ in the last line. In the same manner as in the preceding section, we note that when $p = 0$, the carbon footprint is $\psi N\alpha\sqrt{n/k^{n/(n-1)}}$, and also that

$$f(p)\Big|_{p=\frac{4\phi^2\beta^2 nk^{n/(n-1)}}{\psi^2\alpha^2 N}} = f(p)\Big|_{p=0} = \psi N\alpha\sqrt{n/k^{n/(n-1)}},$$

which tells us that we must have

$$p \ge \frac{4\phi^2\beta^2 nk^{n/(n-1)}}{\psi^2\alpha^2 N}$$

in order for the carbon footprint to be reduced as a result of using delivery services. Note that this threshold is greater than that of the preceding section (which had a threshold of $\frac{\phi^2\beta^2 k^{n/(n-1)}}{\psi^2\alpha^2 nN}$) by a factor of $4n^2$; this is simply a mathematical manifestation of the intuition that larger values of $n$ (i.e. more trip chaining at the household level) lead to significantly greater economies of scale at the household level. This in turn implies that delivery services must be adopted at a larger rate in order for the carbon footprint to decrease.

9.1.3 Multiple delivery services

The model in Section 9.1.1 assumes that a single delivery truck serves all of the $pN$ shut-ins. It is not hard to model the case where there are multiple such services; the main difference is that there is a loss in efficiency because competing delivery services do not consolidate their routes together, thus reducing the benefits of economies of scale.
Suppose that there are $m$ delivery services in the region, and that service $i$ delivers goods to a fraction $\delta_i$ of the shut-ins in the region, visiting $\delta_i pN$ customers in total (this model assumes that each shut-in is uniquely associated with one delivery service). Therefore, applying Lemma 5, we see that the work done by service $i$ can be approximated as $\beta\sqrt{\delta_i pN}$, and therefore the total carbon footprint due to shut-ins is $\phi\beta\sqrt{pN}\sum_{i=1}^m\sqrt{\delta_i}$. Since the fractions $\delta_i \in [0,1]$ sum to 1 and $\sqrt{\delta_i} \ge \delta_i$ on that interval, $\sum_{i=1}^m\sqrt{\delta_i} \ge 1$ always holds, and we see that the carbon footprint within the region will only increase as a result of employing multiple delivery services (provided that these services do not cooperate to share their loads efficiently). The total carbon footprint of the region is then given by

$$f(p) := \phi\beta\sqrt{pN}\sum_{i=1}^m\sqrt{\delta_i} + \psi(1-p)N\alpha\sqrt{n/k^{n/(n-1)}} = \bar{\phi}\beta\sqrt{pN} + \psi(1-p)N\alpha\sqrt{n/k^{n/(n-1)}},$$

where we define $\bar{\phi} := \phi\sum_{i=1}^m\sqrt{\delta_i}$, which reduces to the same problem as (9.1).

9.1.4 A probabilistic model

The model in this section improves on that of Section 9.1.1 by modelling customer behavior in a smoother way than the luddite/shut-in dichotomy. Rather than assigning a set fraction of the population to be shut-ins, we assume that each customer uses a delivery service to do each of their $n$ daily tasks with probability $1-q$. Thus, the number of locations that the person actually visits is a binomial random variable, $X$, with parameters $n$ and $q$, and the expected amount of driving for that person is

$$\mathrm{E}\left(\alpha\sqrt{X/k^{n/(n-1)}}\right) = \frac{\alpha}{\sqrt{k^{n/(n-1)}}}\,\mathrm{E}(\sqrt{X}).$$

If a person chooses to perform a task online, then a delivery truck will visit their house. Note that the only circumstance under which a delivery truck does not visit their house is if that person chooses to complete all $n$ activities by driving to $n$ different locations.
Thus, the probability that a person is visited by a delivery truck is given by $p := 1-q^n$, and we see that the number of houses that the truck visits is a binomial random variable $Y$ with parameters $N$ and $p$, and the expected distance that the delivery truck travels is $\beta\,\mathrm{E}(\sqrt{Y})$. The total carbon footprint in the region is therefore

$$f(p) := \phi\beta\,\mathrm{E}(\sqrt{Y}) + \psi\frac{\alpha N}{\sqrt{k^{n/(n-1)}}}\,\mathrm{E}(\sqrt{X}), \qquad (9.2)$$

where $X \sim B(n,q)$ and $Y \sim B(N,p)$. In order to simplify the above expression, the following lemma is useful:

Lemma 20. Let $X \sim B(n,p)$ be a binomially distributed random variable. Then as $p \to 0$ with $n$ fixed, we have $\mathrm{E}(\sqrt{X}) \sim np$, and as $p \to 1$ with $n$ fixed, we have $\mathrm{E}(\sqrt{X}) \sim \sqrt{np}$.

Proof. If $X \sim B(n,p)$, then by definition we have

$$\begin{aligned}
\mathrm{E}(\sqrt{X}) &= \sum_{i=0}^n\binom{n}{i}p^i(1-p)^{n-i}\sqrt{i} = \sum_{i=1}^n\binom{n}{i}p^i(1-p)^{n-i}\sqrt{i}, \\
\frac{d}{dp}\mathrm{E}(\sqrt{X}) &= \sum_{i=1}^n\binom{n}{i}\left(\frac{i}{p}-\frac{n-i}{1-p}\right)p^i(1-p)^{n-i}\sqrt{i} \\
&= n\left(1-\frac{p(n-1)}{1-p}\right)(1-p)^{n-1} + \sum_{i=2}^n\binom{n}{i}\left(\frac{i}{p}-\frac{n-i}{1-p}\right)p^i(1-p)^{n-i}\sqrt{i}, \\
\frac{d}{dp}\mathrm{E}(\sqrt{X})\Big|_{p=0} &= n,
\end{aligned}$$

since the differential terms for $i \ge 2$ are all equal to 0 at $p = 0$. Thus, the approximation $\mathrm{E}(\sqrt{X}) \approx np$ is nothing more than a first-order approximation evaluated at $p = 0$. To prove that $\mathrm{E}(\sqrt{X}) \sim \sqrt{np}$ as $p \to 1$, we observe that the series expansion for $\sqrt{x}$ about the point $x = 1$ is given by

$$\sqrt{x} = 1 + \frac{x-1}{2} - \frac{(x-1)^2}{8} + O\left((x-1)^3\right),$$

which says that $\mathrm{E}(\sqrt{X}) \approx 1 - \frac{\mathrm{Var}(X)}{8}$ for any random variable such that $\mathrm{E}(X) = 1$, or equivalently (after rescaling $X$ by its mean),

$$\mathrm{E}(\sqrt{X}) \approx \sqrt{\mathrm{E}(X)}\left(1-\frac{\mathrm{Var}(X)}{8\,\mathrm{E}(X)^2}\right) = \sqrt{np}\left(1-\frac{1-p}{8np}\right) \sim \sqrt{np}$$

as $p \to 1$ for any binomial random variable $X$, which completes the proof.

The preceding lemma allows us to analyze the limiting behavior of the total carbon footprint with respect to $p$. For values of $p$ near 0 (which implies that $q$ is close to 1, i.e. very little delivery is used), we can approximate (9.2) by

$$f(p) \approx \phi\beta Np + \psi\alpha N\sqrt{\frac{nq}{k^{n/(n-1)}}} = N\left[\phi\beta p + \psi\alpha\sqrt{\frac{n(1-p)^{1/n}}{k^{n/(n-1)}}}\right].$$

Note that according to this approximation, we have

$$\frac{df}{dp}\Big|_{p=0} \approx N\left[\phi\beta - \frac{\psi\alpha}{2\sqrt{nk^{n/(n-1)}}}\right],$$

which we expect to be positive since $\phi\beta$ and $\psi\alpha$ ought to be approximately the same order of magnitude.
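The two limits in Lemma 20 are easy to check against the exact binomial sum for small $n$. The sketch below is an illustration only (the function name is hypothetical and not from the thesis); it evaluates $\mathrm{E}(\sqrt{X})$ term by term and prints it next to $np$ and $\sqrt{np}$ for a few values of $p$.

```python
import math


def expected_sqrt_binomial(n, p):
    """Exact E[sqrt(X)] for X ~ B(n, p), summed term by term (small n only)."""
    return sum(
        math.comb(n, i) * p**i * (1 - p) ** (n - i) * math.sqrt(i)
        for i in range(1, n + 1)  # the i = 0 term contributes nothing
    )


n = 6
for p in (1e-4, 1e-2, 0.5, 0.99, 0.9999):
    exact = expected_sqrt_binomial(n, p)
    print(f"p={p:<7} exact={exact:.6f}  n*p={n * p:.6f}  sqrt(n*p)={math.sqrt(n * p):.6f}")
```

For $p$ close to 0 the exact value tracks the $np$ column, and for $p$ close to 1 it tracks $\sqrt{np}$; in between neither approximation should be expected to be accurate.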
This tells us that initially, as people in $\mathcal{R}$ begin to make more use of delivery services, the total carbon footprint in the region increases, since using trucks to deliver products to a small number of locations is not efficient. On the contrary, for values of $p$ near 1 (which implies that $q$ is close to 0, i.e. the delivery system is used for almost everything), we can approximate (9.2) by

$$f(p) \approx \phi\beta\sqrt{Np} + \frac{\psi\alpha Nnq}{\sqrt{k^{n/(n-1)}}}.$$

Then according to this approximation, we have

$$\frac{df}{dp} \approx \frac{\phi\beta\sqrt{N}}{2\sqrt{p}} - \frac{\psi\alpha N}{(1-p)^{(n-1)/n}\sqrt{k^{n/(n-1)}}},$$

which goes to $-\infty$ as $p \to 1$. This means that if nearly everybody in $\mathcal{R}$ uses delivery services, the total carbon footprint in the region decreases rapidly, since trucks can serve a large number of locations efficiently.

9.2 A numerical example

In this section, we give a simple example of an instance of the model in Section 9.1.2 using estimates of the relevant input parameters. This model seems to be the most timely, as evidenced by the prevalence of “last-mile” delivery services such as Google Shopping Express, Amazon Prime, Instacart, and Walmart To Go [20, 26, 51, 102]. Table 9.1 shows our estimates for the parameters $\phi$, $\psi$, $\alpha$, and $\beta$, which we do not expect (for the most part) to vary with the region being served.

| Parameter | Estimate | Source |
|---|---|---|
| $\phi$ | 1303 grams CO2 per mile | [69] |
| $\psi$ | 350 grams CO2 per mile | [6] |
| $\alpha$ | 0.29 | Numerical simulations based on [74] |
| $\beta$ | 0.7124 | [10] |

(a) Parameter estimates and their sources.

Table 9.1: Input parameter estimates for our numerical example.

Figure 9.1: A plot of the total carbon footprint, $f(p)$, for four different cities, for the model described in Section 9.1.2. Here we assume that $n = 6$. The value $f(0)$ simply represents the total emissions when no delivery services are used, and therefore when $f(p)/f(0) < 1$ we find that delivery services result in a net improvement.
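The critical threshold of Section 9.1.2, $p^* = 4\phi^2\beta^2 nk^{n/(n-1)}/(\psi^2\alpha^2 N)$, can be evaluated directly from the estimates in Table 9.1. The following is a minimal sketch with a hypothetical function name; the two $(k, N)$ pairs in the usage lines are the Provo-Orem and Los Angeles rows of Table 9.2, and the printed values are not guaranteed to reproduce the rounding used in that table.

```python
def critical_threshold(n, k, N, phi=1303.0, psi=350.0, alpha=0.29, beta=0.7124):
    """p* = 4 phi^2 beta^2 n k^{n/(n-1)} / (psi^2 alpha^2 N) from Section 9.1.2.

    Defaults are the Table 9.1 estimates; a value above 1 means that even
    full adoption does not bring the footprint below f(0).
    """
    K = k ** (n / (n - 1))
    return 4 * (phi * beta) ** 2 * n * K / ((psi * alpha) ** 2 * N)


for name, k, N in [("Provo-Orem, UT", 50, 550845),
                   ("Los Angeles-Long Beach-Anaheim, CA", 3358, 13052921)]:
    p_star = critical_threshold(n=6, k=k, N=N)
    label = "> 1" if p_star > 1 else f"{p_star:.2f}"
    print(f"{name}: p* = {label}")
```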
In order to estimate $k$ for various regions, we used census data obtained from [1] that gives the number of grocery stores in various metropolitan regions; these numbers (as well as $N$, the populations of these regions) are shown in Table 9.2. Figure 9.1 shows plots of the total emissions, $f(p)$ normalized by $f(0)$, for four metropolitan areas. From Table 9.2, we see that the critical thresholds $p^*$ are quite high, and in many instances, the household-level economies of scale are high enough that even a 100% usage of delivery services is less efficient than leaving drivers to their own devices (this corresponds to the entries in the table that are marked “> 1”). The alternative analysis in the next section is somewhat more optimistic.

$p^* = \frac{4\phi^2\beta^2 nk^{n/(n-1)}}{\psi^2\alpha^2 N}$

| Region | $k$ | $N$ | $n=3$ | $n=4$ | $n=5$ | $n=6$ | $n=7$ |
|---|---|---|---|---|---|---|---|
| Los Angeles-Long Beach-Anaheim, CA Metro Area | 3358 | 13052921 | > 1 | > 1 | > 1 | > 1 | > 1 |
| Chicago-Naperville-Elgin, IL-IN-WI Metro Area | 2889 | 9522434 | > 1 | > 1 | > 1 | > 1 | > 1 |
| Indianapolis-Carmel-Anderson, IN Metro Area | 295 | 1928982 | > 1 | > 1 | > 1 | 0.95 | 0.92 |
| Salt Lake City, UT Metro Area | 192 | 1123712 | > 1 | > 1 | > 1 | 0.98 | 0.96 |
| Tulsa, OK Metro Area | 136 | 951880 | > 1 | 0.98 | 0.81 | 0.76 | 0.75 |
| Albuquerque, NM Metro Area | 119 | 901700 | > 1 | 0.86 | 0.72 | 0.68 | 0.68 |
| El Paso, TX Metro Area | 138 | 830735 | > 1 | > 1 | 0.95 | 0.89 | 0.88 |
| McAllen-Edinburg-Mission, TX Metro Area | 132 | 806552 | > 1 | > 1 | 0.92 | 0.87 | 0.86 |
| Little Rock-North Little Rock-Conway, AR Metro Area | 124 | 717666 | > 1 | > 1 | 0.96 | 0.90 | 0.90 |
| Colorado Springs, CO Metro Area | 83 | 668353 | > 1 | 0.72 | 0.62 | 0.60 | 0.60 |
| Boise City, ID Metro Area | 73 | 637896 | 0.98 | 0.64 | 0.55 | 0.54 | 0.54 |
| Provo-Orem, UT Metro Area | 50 | 550845 | 0.64 | 0.44 | 0.40 | 0.39 | 0.40 |
| Killeen-Temple, TX Metro Area | 84 | 420375 | > 1 | > 1 | > 1 | 0.97 | 0.97 |
| Green Bay, WI Metro Area | 43 | 311098 | 0.90 | 0.64 | 0.59 | 0.58 | 0.60 |
| Clarksburg, WV Micro Area | 25 | 94310 | > 1 | > 1 | 0.99 | > 1 | > 1 |
| Elmira, NY Metro Area | 24 | 88911 | > 1 | > 1 | 0.99 | > 1 | > 1 |
| DuBois, PA Micro Area | 22 | 81184 | > 1 | > 1 | 0.98 | > 1 | > 1 |

Table 9.2: The number of grocery stores $k$, the populations $N$, and the critical thresholds $p^*$ at which emissions decrease due to adoption of delivery services. The first and second columns are obtained from [1]. Here the cells marked “> 1” indicate that, even at 100% adoption of delivery services, the carbon footprint of the region is still larger than in the case where $p = 0$, i.e. no delivery services are used.

9.3 Incorporating inbound-and-outbound costs

As described in the beginning of Section 9.1, the models we have described thus far have not paid special attention to the “inbound-and-outbound” costs associated with leaving and returning to one’s home. This section describes a result that is related to Theorems 8 and 15 in which we take a generalized TSP tour of $n$ point sets, $\mathcal{X}_1,\dots,\mathcal{X}_n$, in addition to a fixed point $x_0$ (which represents a person’s home). Note that the limiting behavior for fixed $k$ and $n \to \infty$ is the same as in Theorems 8 and 15 because we are merely inserting one additional point, and therefore it will suffice to consider only the limiting behavior for the case where $n$ is fixed and $k \to \infty$:

Theorem 21. Let $\mathcal{X}_1,\dots,\mathcal{X}_n$ denote $n$ sets of points, each having cardinality $k$, and suppose that all $nk$ points are distributed independently and uniformly at random in a region $\mathcal{R}$ having area $A$. Let $x_0$ be a point in the interior of $\mathcal{R}$. Then the expected length of a generalized TSP tour of $\{x_0\},\mathcal{X}_1,\dots,\mathcal{X}_n$ satisfies

$$\mathrm{E}\,\mathrm{GTSP}(\{x_0\},\mathcal{X}_1,\dots,\mathcal{X}_n) \in O\left(\sqrt{An/k}\cdot\sqrt{\log k}\right)$$

and

$$\mathrm{E}\,\mathrm{GTSP}(\{x_0\},\mathcal{X}_1,\dots,\mathcal{X}_n) \in \Omega\left(\sqrt{An/k}\right)$$

as $k \to \infty$ with $n$ fixed. Specifically, there exists a constant $\alpha_3 > 0.136$ such that the following statements hold:

1. For any $n$, there exists a threshold $k_0$ such that
$$\mathrm{E}\,\mathrm{GTSP}(\{x_0\},\mathcal{X}_1,\dots,\mathcal{X}_n) \le \alpha_1\sqrt{An/k}\cdot\sqrt{\log k}$$
whenever $k \ge k_0$, where $\alpha_1 = 2.7$ is the constant from Lemma 4.

2. For any $n$, there exists a threshold $k_0$ such that
$$\mathrm{E}\,\mathrm{GTSP}(\{x_0\},\mathcal{X}_1,\dots,\mathcal{X}_n) \ge \alpha_3\sqrt{\frac{An}{k}}$$
whenever $k \ge k_0$.

Proof. Assume as in the previous proofs that $\mathcal{R}$ is the unit square.
To prove Claim 1, consider a square $\square_0$ of area $a := \log k/(2k)$ centered at the point $x_0$, and suppose that $k$ is sufficiently large (i.e. that $a$ is sufficiently small) that $\square_0$ is entirely contained within $\mathcal{R}$. Let $p$ denote the probability that $\square_0$ contains at least one element from each point set $\mathcal{X}_i$, in which case we clearly have $\mathrm{GTSP}(\{x_0\},\mathcal{X}_1,\dots,\mathcal{X}_n) \le \alpha_1\sqrt{a(n+1)}$ from Lemma 4. Since $p = (1-(1-a)^k)^n$, we therefore see that as $k \to \infty$, we have

$$\begin{aligned}
\mathrm{E}\,\mathrm{GTSP}(\{x_0\},\mathcal{X}_1,\dots,\mathcal{X}_n) &\le p\,\alpha_1\sqrt{a(n+1)} + (1-p)\,\alpha_1\sqrt{n+1} \\
&= \frac{\alpha_1}{2}\sqrt{\frac{n+1}{k}}\cdot\left(\left[1-k^{-k}\left(k-\tfrac{1}{2}\log k\right)^k\right]^n\left(\sqrt{2\log k}-2\sqrt{k}\right)+2\sqrt{k}\right) \\
&\sim \frac{\alpha_1}{2}\sqrt{\frac{n+1}{k}}\cdot\left[\left(1-n\sqrt{1/k}\right)\left(\sqrt{2\log k}-2\sqrt{k}\right)+2\sqrt{k}\right] \\
&\sim \frac{\sqrt{2}\,\alpha_1}{2}\sqrt{\frac{n+1}{k}}\cdot\sqrt{\log k} \le \alpha_1\sqrt{n/k}\cdot\sqrt{\log k}
\end{aligned}$$

as desired. To prove the second claim, we find it useful to revisit Lemma 14, and we again let $\mathcal{L}$ denote a lattice within $\mathcal{R}$ of the form

$$\left\{0,\tfrac{1}{m},\tfrac{2}{m},\dots,\tfrac{m-1}{m},1\right\}\times\left\{0,\tfrac{1}{m},\tfrac{2}{m},\dots,\tfrac{m-1}{m},1\right\}.$$

We first see that there are at most $k^n$ total distinct subsets of the form $\{x_0,x_1,\dots,x_n\}$, with $x_i \in \mathcal{X}_i$ for each $i \ge 1$. For each of these subsets, there are $n!$ different orderings, and we therefore conclude that there are at most $n!\cdot k^n$ valid paths that originate at $x_0$ and visit one point from each of the subsets $\mathcal{X}_i$. By applying the union bound, we see that for any $\ell$,

$$\begin{aligned}
\Pr(\mathrm{GTSP}(x_0,\mathcal{X}_1,\dots,\mathcal{X}_n) \le \ell) &= \Pr(\text{one of the } n!\cdot k^n \text{ valid paths has length} \le \ell) \\
&\le n!\cdot k^n\,\Pr(\mathrm{length}(P) \le \ell),
\end{aligned}$$

where $P$ is the path obtained by selecting the first element from each set $\mathcal{X}_i$ and visiting these elements in a sequence chosen uniformly at random. By construction, we see that $P$ is simply a path that is sampled uniformly at random from $\mathcal{P}$, the collection of all possible paths originating at $x_0$ that visit $n$ additional points taken from the lattice $\mathcal{L}$. Of course, we can again see that $|\mathcal{P}| = m^{2n}$. By scaling the lattice of Lemma 14 by a factor of $1/m$ in the vertical and horizontal directions, there are at most

$$\binom{m\ell+n}{n}\cdot\left(\frac{8m\ell}{n}\right)^n \le \frac{(m\ell+n)^n}{n!}\cdot\left(\frac{8m\ell}{n}\right)^n$$

paths in $\mathcal{P}$ whose length is at most $\ell$.
Thus,

$$\begin{aligned}
\limsup_{m\to\infty}\Pr(\mathrm{length}(P) \le \ell) &\le \limsup_{m\to\infty}\frac{\frac{(m\ell+n)^n}{n!}\cdot\left(\frac{8m\ell}{n}\right)^n}{m^{2n}} = \frac{\ell^{2n}\,8^n n^{-n}}{n!} \\
\implies \Pr(\mathrm{GTSP}(\{x_0\},\mathcal{X}_1,\dots,\mathcal{X}_n) \le \ell) &\le k^n\ell^{2n}\,8^n n^{-n}
\end{aligned}$$

for all $\ell$. We now set $\ell = c\sqrt{n/k}$, with $c = \sqrt{6}/12$, obtaining

$$\mathrm{E}\,\mathrm{GTSP}(\{x_0\},\mathcal{X}_1,\dots,\mathcal{X}_n) \ge \left(1-k^n\ell^{2n}\,8^n n^{-n}\right)\ell = \frac{1}{12}\cdot\frac{\sqrt{6n}\,(1-3^{-n})}{\sqrt{k}} > 0.136\sqrt{n/k},$$

which completes the proof.

Remark 22. By performing numerical simulations for small $n$ and large $k$ in the same way as in Remark 16, we adopt the approximation $\mathrm{GTSP}(\{x_0\},\mathcal{X}_1,\dots,\mathcal{X}_n) \approx \alpha'\sqrt{n/k}$, where $\alpha' = 0.47$.

9.3.1 Revised simulations

In this section we present a revised numerical simulation that is entirely analogous to that of Section 9.2, only we now incorporate inbound and outbound costs as in Theorem 21 by utilizing Remark 22. Thus, whereas we previously had a critical threshold of $p^* = \frac{4\phi^2\beta^2 nk^{n/(n-1)}}{\psi^2\alpha^2 N}$, we now see by a straightforward analysis that under the revised model, the critical threshold is instead $p^* = \frac{4\phi^2\beta^2 nk}{\psi^2(\alpha')^2 N}$. These thresholds are shown in Table 9.3. They are somewhat more encouraging than those of Section 9.2, although we still see that a significant amount of adoption of delivery services is required. A recent survey [2] of 22,000 homes suggests that approximately 13% of shoppers have purchased groceries online in the last 30 days, which happens to be close to the average of the entries of Table 9.3. This would suggest that, at present, the benefits to carbon footprints due to delivery services are just beginning to be realized, if at all.
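A simulation in the spirit of Remark 22 can be sketched by brute force for very small $n$ and moderate $k$. This is not the code used in the thesis: the function names are hypothetical, the tour is taken to start and end at the home point $x_0$, every selection and ordering is enumerated (so the cost grows like $n!\cdot k^n$), and the home location is drawn uniformly in the unit square, which is an assumption of the sketch.

```python
import itertools
import math
import random


def tour_length(home, stops):
    """Length of the closed tour home -> stops (in order) -> home."""
    pts = [home, *stops, home]
    return sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))


def gtsp_with_home(home, point_sets):
    """Exact GTSP({x0}, X_1, ..., X_n) by enumerating all choices and orders."""
    best = math.inf
    for choice in itertools.product(*point_sets):    # one point per set
        for order in itertools.permutations(choice):  # every visiting order
            best = min(best, tour_length(home, order))
    return best


def estimate_constant(n, k, trials, seed=0):
    """Monte Carlo estimate of E[GTSP] / sqrt(n/k) in the unit square."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        home = (rng.random(), rng.random())
        sets = [[(rng.random(), rng.random()) for _ in range(k)] for _ in range(n)]
        total += gtsp_with_home(home, sets)
    return (total / trials) / math.sqrt(n / k)


print(estimate_constant(n=3, k=12, trials=50))
```

Since the theorem is asymptotic in $k$, estimates at values of $k$ small enough for enumeration are only indicative; larger instances would need a heuristic or an exact GTSP solver rather than this exhaustive search.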
$p^* = \frac{4\phi^2\beta^2 nk}{\psi^2(\alpha')^2 N}$

| Region | $k$ | $N$ | $n=2$ | $n=3$ | $n=4$ | $n=5$ | $n=6$ |
|---|---|---|---|---|---|---|---|
| Los Angeles-Long Beach-Anaheim, CA Metro Area | 3358 | 13052921 | 0.10 | 0.14 | 0.17 | 0.20 | 0.23 |
| Chicago-Naperville-Elgin, IL-IN-WI Metro Area | 2889 | 9522434 | 0.12 | 0.16 | 0.20 | 0.24 | 0.28 |
| Indianapolis-Carmel-Anderson, IN Metro Area | 295 | 1928982 | 0.06 | 0.08 | 0.10 | 0.12 | 0.14 |
| Salt Lake City, UT Metro Area | 192 | 1123712 | 0.07 | 0.09 | 0.11 | 0.14 | 0.16 |
| Tulsa, OK Metro Area | 136 | 951880 | 0.06 | 0.08 | 0.10 | 0.11 | 0.13 |
| Albuquerque, NM Metro Area | 119 | 901700 | 0.06 | 0.07 | 0.09 | 0.11 | 0.12 |
| El Paso, TX Metro Area | 138 | 830735 | 0.07 | 0.09 | 0.11 | 0.13 | 0.15 |
| McAllen-Edinburg-Mission, TX Metro Area | 132 | 806552 | 0.07 | 0.09 | 0.11 | 0.13 | 0.15 |
| Little Rock-North Little Rock-Conway, AR Metro Area | 124 | 717666 | 0.07 | 0.09 | 0.12 | 0.14 | 0.16 |
| Colorado Springs, CO Metro Area | 83 | 668353 | 0.05 | 0.07 | 0.08 | 0.10 | 0.12 |
| Boise City, ID Metro Area | 73 | 637896 | 0.05 | 0.06 | 0.08 | 0.09 | 0.11 |
| Provo-Orem, UT Metro Area | 50 | 550845 | 0.04 | 0.05 | 0.06 | 0.07 | 0.09 |
| Killeen-Temple, TX Metro Area | 84 | 420375 | 0.08 | 0.11 | 0.13 | 0.16 | 0.18 |
| Green Bay, WI Metro Area | 43 | 311098 | 0.06 | 0.08 | 0.09 | 0.11 | 0.13 |
| Clarksburg, WV Micro Area | 25 | 94310 | 0.11 | 0.14 | 0.17 | 0.21 | 0.24 |
| Elmira, NY Metro Area | 24 | 88911 | 0.11 | 0.14 | 0.18 | 0.21 | 0.25 |
| DuBois, PA Micro Area | 22 | 81184 | 0.11 | 0.14 | 0.18 | 0.21 | 0.25 |

Table 9.3: The number of grocery stores $k$, the populations $N$, and the critical thresholds $p^*$ at which emissions decrease as adoption of delivery services increases under the revised model. The first and second columns are obtained from [1].

Chapter 10

More generalized setting for GTSP

In Theorem 8 and Theorem 21, we made the assumption that all the sets $\mathcal{X}_i$ have the same number of points. In this chapter, we explore the case where each set can have a different number of points. We now introduce more general versions of Theorem 8 and Theorem 21.

Theorem 23. Let $\mathcal{X}_1,\dots,\mathcal{X}_n$ denote $n$ sets of points, with $\mathcal{X}_i$ having cardinality $k_i$, and suppose that all $k_1+k_2+\cdots+k_n$ points are distributed independently and uniformly at random in a region $\mathcal{R}$ having area $A$.
The following two statements hold:

1. Suppose the $k_i$ are sampled from a possibly infinite discrete distribution $Z$, where $P(Z = L_i) = p_i > 0$, $L_i \in \mathbb{N}$, $i \ge 1$. If $\mathrm{E}[\log Z] < \infty$ and $8\alpha_1^2\cdot e^{\mathrm{E}[\log Z]} < 1$, then there exists a threshold $\bar{n}$ such that
$$\mathrm{E}\,\mathrm{GTSP}(\mathcal{X}_1,\dots,\mathcal{X}_n) \ge \alpha_1\sqrt{An/k_0}$$
whenever $n \ge \bar{n}$, where $k_0 = \mathrm{HM}\{k_i\}$ is the harmonic mean of the $k_i$.

2. For any $k_1,k_2,\dots,k_n$, there exists a threshold $\bar{n}$ such that
$$\mathrm{E}\,\mathrm{GTSP}(\mathcal{X}_1,\dots,\mathcal{X}_n) \le \alpha_2\sqrt{An/k_0} \qquad (10.1)$$
whenever $n \ge \bar{n}$, where $k_0 = \mathrm{HM}\{k_i\}$ is the harmonic mean of the $k_i$.

Proof. As in the proof of Theorem 8, we will assume that the service region $\mathcal{R}$ is the unit square, and we will then let $\mathcal{L}$ denote a lattice within $\mathcal{R}$ of the form

$$\left\{0,\tfrac{1}{m},\tfrac{2}{m},\dots,\tfrac{m-1}{m},1\right\}\times\left\{0,\tfrac{1}{m},\tfrac{2}{m},\dots,\tfrac{m-1}{m},1\right\}.$$

It is immediately clear that for any path $P := \{x_1,\dots,x_n\} \subset \mathcal{R}$, there exists a path in $\mathcal{L}$, obtained by rounding the terms of $P$ to their nearest neighbor in $\mathcal{L}$, whose length is within $n/m$ of the original path. Thus, it will suffice to prove Claim 1 for the special case where all of the elements of the sets $\mathcal{X}_i$ lie in elements of $\mathcal{L}$, and study the limiting behavior as $m \to \infty$. We first see that there are at most $\prod_{i=1}^n k_i$ total distinct subsets of the form $\{x_1,\dots,x_n\}$, with $x_i \in \mathcal{X}_i$ for each $i$. For each of these subsets, there are $n!$ different orderings, and we therefore conclude that there are at most $n!\prod_{i=1}^n k_i$ valid paths that visit one point from each of the subsets $\mathcal{X}_i$. By applying the union bound, we see that

$$\begin{aligned}
\Pr\left(\mathrm{GTSP}(\mathcal{X}_1,\dots,\mathcal{X}_n) \le \alpha_1\sqrt{n/k_0}\right) &= \Pr\left(\text{one of the } n!\prod_{i=1}^n k_i \text{ valid paths has length} \le \alpha_1\sqrt{n/k_0}\right) \\
&\le n!\prod_{i=1}^n k_i\,\Pr\left(\mathrm{length}(P) \le \alpha_1\sqrt{n/k_0}\right),
\end{aligned}$$

where $P$ is the path obtained by selecting the first element from each set $\mathcal{X}_i$ and visiting these elements in a sequence chosen uniformly at random. By construction, we see that $P$ is simply a path that is sampled uniformly at random from the collection of all possible paths between $n$ points taken from $\mathcal{L}$. Of course, we can see immediately that there are exactly $m^{2n}$ of these.
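By scaling the lattice of Lemma 14 by a factor of $1/m$, we see that there are at most

$$m^2\cdot\binom{m\alpha_1\sqrt{n/k_0}+n-1}{n-1}\cdot\left(\frac{8m\alpha_1\sqrt{n/k_0}}{n-1}\right)^{n-1} \le m^2\cdot\frac{\left(m\alpha_1\sqrt{n/k_0}+n-1\right)^{n-1}}{(n-1)!}\cdot\left(\frac{8m\alpha_1\sqrt{n/k_0}}{n-1}\right)^{n-1}$$

paths of length $\alpha_1\sqrt{n/k_0}$ through our lattice $\mathcal{L}$, and therefore

$$\begin{aligned}
\limsup_{m\to\infty}\Pr\left(\mathrm{length}(P) \le \alpha_1\sqrt{n/k_0}\right) &\le \limsup_{m\to\infty}\frac{m^2\cdot\frac{\left(m\alpha_1\sqrt{n/k_0}+n-1\right)^{n-1}}{(n-1)!}\cdot\left(\frac{8m\alpha_1\sqrt{n/k_0}}{n-1}\right)^{n-1}}{m^{2n}} \\
&= \frac{(8\alpha_1^2)^{n-1}\cdot n^{n-1}}{(n-1)!\cdot(n-1)^{n-1}\cdot k_0^{n-1}} \\
\implies \Pr\left(\mathrm{GTSP}(\mathcal{X}_1,\dots,\mathcal{X}_n) \le \alpha_1\sqrt{n/k_0}\right) &\le n!\cdot\prod_{i=1}^n k_i\cdot\frac{(8\alpha_1^2)^{n-1}\cdot n^{n-1}}{(n-1)!\cdot(n-1)^{n-1}\cdot k_0^{n-1}} \\
&= \frac{(\mathrm{GM}\{k_i\})^n}{(\mathrm{HM}\{k_i\})^{n-1}}\cdot(8\alpha_1^2)^{n-1}\cdot\frac{n^n}{(n-1)^{n-1}},
\end{aligned}$$

where $\mathrm{GM}\{k_i\}$ and $\mathrm{HM}\{k_i\}$ denote the geometric mean and harmonic mean of $\{k_1,k_2,\dots,k_n\}$, respectively. Now we will show that the above quantity approaches 0 as $n \to \infty$. Since all the $k_i$ are sampled from the distribution $Z$, we have

$$\lim_{n\to\infty}\mathrm{GM}\{k_1,\dots,k_n\} = \prod_{j=1}^\infty L_j^{p_j} = e^{\mathrm{E}[\log Z]} =: A, \qquad \lim_{n\to\infty}\mathrm{HM}\{k_1,\dots,k_n\} = \frac{1}{\sum_{j=1}^\infty p_j/L_j} =: B \ge 1.$$

(Within this argument, $A$ denotes this limit rather than the area of $\mathcal{R}$, which has been normalized to 1.) To obtain the desired result, it suffices to show that

$$n\cdot\left[\frac{\mathrm{GM}\{k_1,\dots,k_n\}}{\mathrm{HM}\{k_1,\dots,k_n\}}\cdot 8\alpha_1^2\right]^n \to 0 \quad \text{as } n \to \infty.$$

For any sufficiently small $\varepsilon > 0$, there exists $N > 0$ such that if $n > N$, we have

$$|\mathrm{GM}\{k_1,\dots,k_n\}-A| < \varepsilon, \qquad |\mathrm{HM}\{k_1,\dots,k_n\}-B| < \varepsilon.$$

Thus, when $n > N$,

$$\frac{\mathrm{GM}\{k_1,\dots,k_n\}}{\mathrm{HM}\{k_1,\dots,k_n\}}\cdot 8\alpha_1^2 \le \frac{A+\varepsilon}{B-\varepsilon}\cdot 8\alpha_1^2 \le \frac{A+\varepsilon}{1-\varepsilon}\cdot 8\alpha_1^2.$$

Letting $n$ go to $\infty$, for any $\varepsilon > 0$ we have

$$\lim_{n\to\infty} n\cdot\left[\frac{\mathrm{GM}\{k_1,\dots,k_n\}}{\mathrm{HM}\{k_1,\dots,k_n\}}\cdot 8\alpha_1^2\right]^n \le \lim_{n\to\infty} n\cdot\left[\frac{A+\varepsilon}{1-\varepsilon}\cdot 8\alpha_1^2\right]^n.$$

Letting $\varepsilon \to 0$, we have

$$\lim_{n\to\infty} n\cdot\left[\frac{\mathrm{GM}\{k_1,\dots,k_n\}}{\mathrm{HM}\{k_1,\dots,k_n\}}\cdot 8\alpha_1^2\right]^n \le \lim_{n\to\infty} n\cdot\left[A\cdot 8\alpha_1^2\right]^n = \lim_{n\to\infty} n\cdot\left[e^{\mathrm{E}[\log Z]}\cdot 8\alpha_1^2\right]^n.$$

The above limit is 0 because $8\alpha_1^2\cdot e^{\mathrm{E}[\log Z]} < 1$. Thus,

$$\mathrm{E}\,\mathrm{GTSP}(\mathcal{X}_1,\dots,\mathcal{X}_n) \ge \left(1-\frac{(\mathrm{GM}\{k_i\})^n}{(\mathrm{HM}\{k_i\})^{n-1}}\cdot(8\alpha_1^2)^{n-1}\cdot\frac{n^n}{(n-1)^{n-1}}\right)\alpha_1\sqrt{n/k_0} \sim \alpha_1\sqrt{n/k_0}$$

as $n \to \infty$, which proves Claim 1. To prove Claim 2, assume without loss of generality that $A = 1$, and make the further assumption that $\mathcal{R}$ is the unit square (we will later explain why our proof generalizes to the case of general $\mathcal{R}$).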
The proof of Claim 2 is a fairly straightforward generalization of the result from [36], and a stronger convergence result can be found in [8], which studies a closely related problem that is a lower bound of the GTSP. Let $m$ be an even integer and consider the path $P$ obtained by traversing the width of $\mathcal{R}$ horizontally a total of $m$ times, starting at the upper leftmost corner of $\mathcal{R}$ and moving downward by an amount $1/(m-1)$, as shown in Figure 10.1a; it is obvious that the length of $P$ is simply $m+2$. Given a collection of point sets $\mathcal{X}_1,\dots,\mathcal{X}_n$ in $\mathcal{R}$, we can perturb $P$ to form a new path $P'$ that visits one point from each point set $\mathcal{X}_i$ by simply inserting a pair of vertical line segments between $P$ and the nearest point (measured only in the vertical direction) in each $\mathcal{X}_i$, as shown in Figure 10.1b. Of course, the vertical distance between $P$ and any arbitrary point $x \in \mathcal{X}_i$ simply follows a uniform distribution between 0 and $1/(2(m-1))$, and therefore we see that the vertical distance $d_i$ between $P$ and its nearest neighbor in $\mathcal{X}_i$ (again, in the vertical sense) satisfies

$$\mathrm{E}(d_i) = \frac{1/(2(m-1))}{k_i+1}.$$

Our objective is therefore to select a value $m$ so as to minimize the total length of $P$ plus these additional $n$ displacements, i.e. to minimize

$$m + 2 + 2\sum_{i=1}^n\frac{1/(2(m-1))}{k_i+1},$$

where we have an additional multiplier “2” since each vertical displacement consists of an outbound and an inbound trip. Taking the derivative of the quantity above with respect to $m$ and setting it to 0 gives

$$1 + \frac{-1}{(m-1)^2}\sum_{i=1}^n\frac{1}{k_i+1} = 0.$$

Thus as $n$ becomes large we see that the optimal $m$ satisfies

$$m^* \sim \sqrt{\sum_{i=1}^n\frac{1}{k_i+1}},$$

which results in a total length that satisfies

$$\mathrm{length}(P') \sim 2\sqrt{\sum_{i=1}^n\frac{1}{k_i+1}} < 2\sqrt{\sum_{i=1}^n\frac{1}{k_i}} = 2\sqrt{n}\sqrt{\frac{\sum_{i=1}^n\frac{1}{k_i}}{n}} = 2\sqrt{\frac{n}{k_0}},$$

which proves (10.1).

Remark 24. The essence of Theorem 23 is that it allows the number of points to be different for each set, provided that the number is sampled from a distribution that satisfies certain conditions, which is more realistic.
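The strip construction in the proof of Claim 2 of Theorem 23 can be explored numerically. The sketch below uses hypothetical function names and an arbitrary illustrative list of set sizes; it treats $m$ as a continuous variable (the proof takes $m$ to be an even integer), evaluates the expected length $m + 2 + 2\sum_i \frac{1/(2(m-1))}{k_i+1}$, and compares its leading term with the harmonic-mean bound $2\sqrt{n/k_0}$.

```python
import math


def harmonic_mean(ks):
    """Harmonic mean k_0 = n / sum(1/k_i)."""
    return len(ks) / sum(1 / k for k in ks)


def strip_tour_length(m, ks):
    """Expected length m + 2 + 2 * sum_i [1/(2(m-1))] / (k_i + 1) of the perturbed path."""
    return m + 2 + 2 * sum((1 / (2 * (m - 1))) / (k + 1) for k in ks)


def optimal_strips(ks):
    """Stationary point of strip_tour_length in m: m - 1 = sqrt(sum_i 1/(k_i + 1))."""
    return 1 + math.sqrt(sum(1 / (k + 1) for k in ks))


ks = [3, 5, 8, 13] * 50          # assumed set sizes, n = 200
n = len(ks)
m_star = optimal_strips(ks)
print("optimal m:", m_star)
print("expected length at optimum:", strip_tour_length(m_star, ks))
print("leading term 2*sqrt(sum 1/(k_i+1)):", 2 * math.sqrt(sum(1 / (k + 1) for k in ks)))
print("bound 2*sqrt(n/k_0):", 2 * math.sqrt(n / harmonic_mean(ks)))
```

The exact expected length at the optimum exceeds the leading term by a constant (here 3), which is why the comparison with $2\sqrt{n/k_0}$ in the proof is an asymptotic statement for large $n$.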
We can see that the place of “$k$” in Theorem 8 is taken by the harmonic mean of all the $k_i$. One drawback of this theorem, though, is that the constant in the lower bound depends on the distribution, i.e., if the distribution of the number of points is fixed, then we can find some positive number that justifies the lower bound. The difficulty comes from the harmonic mean: when the numbers are very different from each other, it is impossible to have a good bound for the harmonic mean.

Figure 10.1: In (10.1a), we show the path $P$, which traverses the region $\mathcal{R}$ horizontally a total of $m = 8$ times. In (10.1b), we show the perturbed path $P'$, where we have $n = 3$ point sets $\mathcal{X}_i$ consisting of $k = 4$ points each.

We now introduce another theorem that generalizes Theorem 21 by allowing the number of points in each set to be different.

Theorem 25. Let $\mathcal{X}_1,\dots,\mathcal{X}_n$ denote $n$ sets of points, with $\mathcal{X}_i$ having cardinality $k_i$, and suppose that all $k_1+k_2+\cdots+k_n$ points are distributed independently and uniformly at random in a region $\mathcal{R}$ having area $A$. Let $x_0$ be a point in the interior of $\mathcal{R}$. Then the expected length of a generalized TSP tour of $\{x_0\},\mathcal{X}_1,\dots,\mathcal{X}_n$ satisfies

$$\mathrm{E}\,\mathrm{GTSP}(\{x_0\},\mathcal{X}_1,\dots,\mathcal{X}_n) \in O\left(\sqrt{An/k_0}\cdot\sqrt{\log k_0}\right)$$

and

$$\mathrm{E}\,\mathrm{GTSP}(\{x_0\},\mathcal{X}_1,\dots,\mathcal{X}_n) \in \Omega\left(\sqrt{An/k_0}\right)$$

as $k_0 = \mathrm{GM}\{k_i\} = (k_1k_2\cdots k_n)^{1/n} \to \infty$ with $n$ fixed. Specifically, the following statements hold:

1. Suppose the fixed numbers $p_i$ denote the fractions of the $k_i$, with $\sum_{i=1}^n p_i = 1$, i.e., $(k_1,k_2,\dots,k_n) = k(p_1,p_2,\dots,p_n)$. Let $L = \min\{p_i\}$. Then for any $n$, there exists a threshold $\bar{k}$ such that
$$\mathrm{E}\,\mathrm{GTSP}(\{x_0\},\mathcal{X}_1,\dots,\mathcal{X}_n) \le \alpha_1\sqrt{\frac{An}{Lk}}\cdot\sqrt{\log k}$$
whenever $k \ge \bar{k}$, where $\alpha_1 = 2.7$ is the constant from Lemma 4.

2. For any $n$, there exists a threshold $\bar{k}$ and a constant $\alpha_2$ such that
$$\mathrm{E}\,\mathrm{GTSP}(\{x_0\},\mathcal{X}_1,\dots,\mathcal{X}_n) \ge \alpha_2\sqrt{\frac{An}{k_0}}$$
whenever $k_0 \ge \bar{k}$.

Proof. The proof is very similar to that of Theorem 21; see Appendix A for details.

Remark 26.
This theorem gives the same lower bound as Theorem 21, but the upper bound involves a key parameter of the distribution of the number of points in each set. This is somewhat weak because if we force each set to have the same number of points as in Theorem 21, the upper bound becomes weaker than that of Theorem 21 by a factor of $\sqrt{n}$. On the positive side, however, it at least describes a reasonable upper bound for almost any distribution.

Chapter 11

Conclusion

We have conducted an asymptotic analysis of the generalized TSP tour of $n$ point sets $\mathcal{X}_1,\dots,\mathcal{X}_n$, each with cardinality $k$, in the unit square, for two limiting cases. In the first case, where $n \to \infty$ and $k$ is a fixed number, if all $n \times k$ points are uniformly distributed in the unit square, we prove

$$\Pr\left(a_d \le \frac{\mathrm{GTSP}(\mathcal{X}_1,\dots,\mathcal{X}_n)}{\sqrt{d}\,(n^{d-1}/k)^{1/d}} \le b_d\right) \to 1 \quad \text{as } n \to \infty,$$

for each dimension $d \ge 2$. To tackle the case of clustering, we pick any Jordan measurable shape $S$, uniformly sample $n$ copies of $S$ in the unit square, and then uniformly sample $k$ points inside each copy; we are able to show

$$a_d \le \frac{\mathrm{E}\,\mathrm{GTSP}(\mathcal{X}_1,\dots,\mathcal{X}_n)}{\sqrt{d}\,(n^{d-1}/k)^{1/d}} \le b'_d,$$

where the $a_d$ are the same as in the uniform case. For the second case, where $n$ is fixed and $|\mathcal{X}_i| \to \infty$, we are able to deal with the more general case where each set $\mathcal{X}_i$ contains $k_i$ points, but only in dimension two. The asymptotic behavior is analyzed under the condition that $k_G \to \infty$, where $k_G$ is the geometric mean of the $k_i$’s. However, our upper and lower bounds differ by a term of order $(n^2\log k_G + \log n)^{\frac{1}{2(n-1)}}$, which is fairly small. We conjecture that, in this case, one actually has

$$\mathrm{E}\,\mathrm{GTSP}(\mathcal{X}_1,\dots,\mathcal{X}_n) \in O\left(\sqrt{\frac{n}{k_G^{n/(n-1)}}}\right),$$

although this will clearly require further analysis. Furthermore, we perform an asymptotic analysis on a variation of the traveling purchaser problem (TPP). The main difference between the traveling purchaser problem and the GTSP is that, instead of visiting one point in each set, it is required to visit a fixed fraction $p$ of the points in each set.
We prove that in the 2D plane, the expected length of a TPP tour has the following magnitude:

$$\mathrm{E}\,\mathrm{TPP}(\mathcal{X}_1,\dots,\mathcal{X}_n) \in \Theta(p\sqrt{nk}).$$

In Chapter 7, we conduct numerical experiments on the GTSP, using artificially generated data and real map data. The numerical results align with our theoretical analysis, and from them we can conjecture stronger convergence results. For example, for the GTSP when $n \to \infty$, even though we can prove that the lower and upper bounds have the same order with probability one, we could not prove that the GTSP actually converges with probability one, which would be more desirable because this is the case in the BHH theorem. In addition, we apply the GTSP model to a warehouse system and compare it with the existing Kiva warehouse system. It turns out that the GTSP warehouse system can greatly improve order fulfillment efficiency, with potentially two to seven times less traveling cost (four times on average). For now, a GTSP warehouse will resort to human pickers (such labor can be very cheap in some developing countries) to find the right SKUs, because there is no robot that is able to perform the pick-up task as required by the GTSP warehouse system. In the future, if this kind of robot becomes available and cheap enough, we should take advantage of it as well. The last part of the work is to study household trip chaining behavior. Numerical analyses of the preceding GTSP models, when applied to the problem of estimating the change in carbon footprint that results from using delivery services, suggest that a considerable amount of adoption of delivery services is necessary before one begins to see a decrease in carbon footprint. The reason for this is simply that the economy of scale that is realized by delivery services requires a significant initial level of adoption in order to compete with the “household-level” economy of scale that the customers already possess, given their wide choice of locations to visit and the number of daily locations they visit.
At the very end, in Chapter 10, we prove two GTSP theorems in a more generalized setting, allowing the number of points in each set to be different. The results still have room for improvement, as we remarked after the proofs, but they are a good starting point for analyzing the more complicated cases.

Chapter 12

Future Research

The generalized traveling salesman problem has proven to be a very useful tool in modeling some real-world problems. In this thesis, we have proved a few theoretical results, and also applied the GTSP to two real-world problems - the warehouse random strategy and household-level trip chaining behavior. There are a few directions that we think are worth working on in the future, in both the theoretical and practical aspects. One field we want to work on is the non-uniformity of the GTSP. As we can see from the BHH theorem (Lemma 6), for any absolutely continuous distribution of points, we are able to approximate the length of the tour in the asymptotic sense, which makes BHH a very powerful tool. In our analysis of the GTSP, however, uniformity is assumed. For now, we have a couple of tentative ideas to tackle this non-uniformity. In the non-clustered case, we have in total $n \times k$ points. Instead of assuming they are uniformly distributed, we can assume they follow some distribution $f$. This new problem is complicated, though. Recall that in our upper bound proof for the uniform case, we use the distribution of the points whose first coordinate $x_1$ is the smallest, which is a distribution we can work out. In the non-uniform case, however, we would hope to choose points that are near the probabilistically dense region, and this is not easy to describe mathematically. The other way to work on the non-uniform case is to incorporate a distribution into the clustered case. One possible way is to sample the $n$ “shapes” (clusters) from some distribution $f_1$, and then sample $k$ points inside each cluster from some other distribution $f_2$.
A second direction worth pursuing is to close the gap between the upper and lower bounds in the two limiting cases. For the case where n → ∞, the constants in the lower bounds are quite weak because of our relatively crude analysis, so we hope to find arguments that improve them. Also, our simulation results in Section 7.1 suggest that in the uniform case the limiting constants differ for different values of k, so we hope to make some discoveries about that. As for the case where k → ∞, our efforts will focus on the gap between the orders of the lower and upper bounds.
A third direction is the extension of Chapter 10. The results in that chapter are very desirable, because allowing a different number of points in each set is a much more realistic assumption than requiring the same number of points in each set. If we can obtain a theoretical result that achieves this without imposing strong assumptions on the theorems, our trip chaining model in Chapter 9 can be more realistic and convincing. We use the same methodology as in Theorem 8 and Theorem 21 to prove Theorem 23 and Theorem 25, which may be one of the reasons why the results are weaker. If we can modify the proof or come up with a new methodology, we may be able to prove much stronger results.
Apart from the potential theoretical efforts described above, some numerical efforts are also worthwhile. The first is numerical analysis of the traveling purchaser problem (TPP). As far as I know, there is no existing solver that can directly solve our version of the TPP, which is why we did not put any effort into numerical analysis of it. However, the variation of the TPP that we propose has its own advantages in analyzing some real-world problems, such as the warehouse fulfillment problem; thus, it would be beneficial to devise an efficient algorithm for this particular problem.
Another part that deserves significant effort is the GTSP warehouse system. Our simulation shows a large potential gain compared to the existing Kiva warehouse system. Since our simulation is based on some unrealistic assumptions, the result is informative and promising but may be unconvincing. The next step should be to build better models that are closer to real life. If our simulation results can convince the industry that the GTSP warehouse is the next generation of automated warehouse, we can push the industry to put more effort into building better robots that can perform complicated tasks as human pickers do.
One final comment about future work is to apply the GTSP model to other practical fields. One promising application that we would have liked to pursue is the new “last mile” delivery scheme mentioned in Chapter 1, in which customers can “opt in” to provide multiple package drop-off locations, so that the delivery company has the freedom to choose the most efficient location at which to make the delivery. After customers provide the drop-off locations, the delivery problem naturally becomes a GTSP instance.
Appendix A Proof of Theorem 25
Proof. Assume as in the previous proofs that $\mathcal{R}$ is the unit square. To prove Claim 1, consider a square $\square$ of area $a := \frac{\log k}{2Lk}$ centered at the point $x_0$, and suppose that $k$ is sufficiently large (i.e. that $a$ is sufficiently small) that $\square$ is entirely contained within $\mathcal{R}$. Let $p$ denote the probability that $\square$ contains at least one element from each point set $X_i$, in which case we clearly have $\mathrm{GTSP}(\{x_0\}, X_1, \dots, X_n) \le \alpha_1 \sqrt{a(n+1)}$ from Lemma 4.
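Since
$$p = \left(1-(1-a)^{kp_1}\right)\left(1-(1-a)^{kp_2}\right)\cdots\left(1-(1-a)^{kp_n}\right),$$
we therefore see that as $k \to \infty$, we have
$$\begin{aligned}
\mathbb{E}\,\mathrm{GTSP}(\{x_0\}, X_1, \dots, X_n)
&\le p\,\alpha_1\sqrt{a(n+1)} + (1-p)\,\alpha_1\sqrt{n+1} \\
&= \alpha_1\sqrt{n+1}\cdot\left(p\sqrt{a} + 1 - p\right) \\
&= \alpha_1\sqrt{n+1}\cdot\left[\left(1-(1-a)^{kp_1}\right)\cdots\left(1-(1-a)^{kp_n}\right)\left(\sqrt{\frac{\log k}{2Lk}} - 1\right) + 1\right] \\
&= \frac{\alpha_1\sqrt{n+1}}{\sqrt{2Lk}}\cdot\left[\left(1-(1-a)^{kp_1}\right)\cdots\left(1-(1-a)^{kp_n}\right)\left(\sqrt{\log k} - \sqrt{2Lk}\right) + \sqrt{2Lk}\right] \\
&\sim \frac{\alpha_1\sqrt{n+1}}{\sqrt{2Lk}}\cdot\left[\left(1-k^{-\frac{p_1}{2L}}\right)\cdots\left(1-k^{-\frac{p_n}{2L}}\right)\left(\sqrt{\log k} - \sqrt{2Lk}\right) + \sqrt{2Lk}\right] \\
&= \frac{\alpha_1\sqrt{n+1}}{\sqrt{2Lk}}\cdot\left[\left(\left(1-k^{-\frac{p_1}{2L}}\right)\cdots\left(1-k^{-\frac{p_n}{2L}}\right) - 1\right)\left(\sqrt{\log k} - \sqrt{2Lk}\right) + \sqrt{\log k}\right] \\
&= \frac{\alpha_1\sqrt{n+1}}{\sqrt{2Lk}}\sqrt{\log k} + \frac{\alpha_1\sqrt{n+1}}{\sqrt{2Lk}}\left[\left(\left(1-k^{-\frac{p_1}{2L}}\right)\cdots\left(1-k^{-\frac{p_n}{2L}}\right) - 1\right)\left(\sqrt{\log k} - \sqrt{2Lk}\right)\right].
\end{aligned}$$
To obtain the result, it suffices to show that
$$\frac{\left(\left(1-k^{-\frac{p_1}{2L}}\right)\cdots\left(1-k^{-\frac{p_n}{2L}}\right) - 1\right)\left(\sqrt{\log k} - \sqrt{2Lk}\right)}{\sqrt{\log k}} \to 0$$
as $k \to \infty$. The limit above holds because
$$k^{-\frac{p_i}{2L}} \cdot k^{\frac{1}{2}} = k^{-\frac{p_i}{2L} + \frac{1}{2}} \le k^{-\frac{1}{2} + \frac{1}{2}} = 1.$$
Thus
$$\mathbb{E}\,\mathrm{GTSP}(\{x_0\}, X_1, \dots, X_n) \le \alpha_1\sqrt{\frac{n+1}{2Lk}}\sqrt{\log k} \le \alpha_1\sqrt{\frac{n}{Lk}}\sqrt{\log k}$$
as desired.
To prove the second claim, we find it useful to revisit Lemma 14, and we again let $\mathcal{L}$ denote a lattice within $\mathcal{R}$ of the form
$$\left\{0, \tfrac{1}{m}, \tfrac{2}{m}, \dots, \tfrac{m-1}{m}, 1\right\} \times \left\{0, \tfrac{1}{m}, \tfrac{2}{m}, \dots, \tfrac{m-1}{m}, 1\right\}.$$
We first see that there are at most $k^n$ total distinct subsets of the form $\{x_0, x_1, \dots, x_n\}$, with $x_i \in X_i$ for each $i \ge 1$. For each of these subsets, there are $n!$ different orderings, and we therefore conclude that there are at most $n! \cdot \prod_{i=1}^{n} k_i$ valid paths that originate at $x_0$ and visit one point from each of the subsets $X_i$. By applying the union bound, we see that for any $\ell$,
$$\begin{aligned}
\Pr\left(\mathrm{GTSP}(\{x_0\}, X_1, \dots, X_n) \le \ell\right)
&= \Pr\Big(\text{one of the } n! \cdot \prod_{i=1}^{n} k_i \text{ valid paths has length} \le \ell\Big) \\
&\le n! \cdot \prod_{i=1}^{n} k_i \,\Pr\left(\mathrm{length}(P) \le \ell\right) \\
&= n! \cdot k_0^{n} \,\Pr\left(\mathrm{length}(P) \le \ell\right),
\end{aligned}$$
where $P$ is the path obtained by selecting the first element from each set $X_i$ and visiting these elements in a sequence chosen uniformly at random. By construction, we see that $P$ is simply a path that is sampled uniformly at random from $\mathcal{P}$, the collection of all possible paths originating at $x_0$ that visit $n$ additional points taken from the lattice $\mathcal{L}$. Of course, we can again see that $|\mathcal{P}| = m^{2n}$. By scaling the lattice of Lemma 14 by a factor of $1/m$ in the vertical and horizontal directions, there are at most
$$\binom{m\ell + n}{n} \cdot \left(\frac{8m\ell}{n}\right)^{n} \le \frac{(m\ell + n)^{n}}{n!} \cdot \left(\frac{8m\ell}{n}\right)^{n}$$
paths in $\mathcal{P}$ whose length is at most $\ell$. Thus,
$$\limsup_{m \to \infty} \Pr\left(\mathrm{length}(P) \le \ell\right) \le \limsup_{m \to \infty} \frac{\frac{(m\ell + n)^{n}}{n!} \cdot \left(\frac{8m\ell}{n}\right)^{n}}{m^{2n}} = \frac{\ell^{2n}\, 8^{n}\, n^{-n}}{n!}$$
$$\implies \Pr\left(\mathrm{GTSP}(\{x_0\}, X_1, \dots, X_n) \le \ell\right) \le k_0^{n}\, \ell^{2n}\, 8^{n}\, n^{-n}$$
for all $\ell$. We now set $\ell = c\sqrt{n/k_0}$, with $c = \sqrt{6}/12$, obtaining
$$\mathbb{E}\,\mathrm{GTSP}(\{x_0\}, X_1, \dots, X_n) \ge \left(1 - k_0^{n}\, \ell^{2n}\, 8^{n}\, n^{-n}\right)\ell = \frac{1}{12} \cdot \frac{\sqrt{6n}\,(1 - 3^{-n})}{\sqrt{k_0}} > 0.136\sqrt{n/k_0},$$
which completes the proof.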
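The arithmetic in the last step of the proof can be checked numerically. The sketch below is a sanity check rather than part of the thesis: it evaluates the union bound at the chosen length threshold and confirms that the bound collapses to 3^(-n) regardless of k_0, so that the resulting lower bound exceeds 0.136 * sqrt(n / k_0). The function names are hypothetical.

```python
import math

C = math.sqrt(6) / 12  # the constant c chosen in the proof


def union_bound_probability(n, k0, ell):
    """Upper bound k0^n * ell^(2n) * 8^n * n^(-n) on Pr(GTSP <= ell)."""
    return (k0 * ell ** 2 * 8 / n) ** n


def lower_bound(n, k0):
    """(1 - bound) * ell evaluated at ell = C * sqrt(n / k0)."""
    ell = C * math.sqrt(n / k0)
    return (1 - union_bound_probability(n, k0, ell)) * ell


checked = []
for n in (1, 2, 5, 20):
    for k0 in (1, 10, 1000):
        ell = C * math.sqrt(n / k0)
        # k0 * ell^2 * 8 / n = 8 * 6 / 144 = 1/3, so the bound equals 3^(-n).
        assert math.isclose(union_bound_probability(n, k0, ell), 3.0 ** -n)
        # The worst case is n = 1, where (2/3) * sqrt(6) / 12 is about 0.13608.
        assert lower_bound(n, k0) > 0.136 * math.sqrt(n / k0)
        checked.append((n, k0))
```

The tightest case is n = 1, where the factor 1 - 3^(-n) equals 2/3; for larger n the lower bound only improves, approaching (sqrt(6)/12) * sqrt(n / k_0).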
Abstract
The traveling salesman problem (TSP) is a fundamental problem in many diverse fields, including transportation, delivery services, circuit board production, and crystallography, among many others. Apart from extensive research on solving particular instances of this problem, there have been substantial efforts on the probabilistic analysis of the TSP, such as the Beardwood-Halton-Hammersley theorem. In this dissertation, we analyze the asymptotic behavior of a generalized version of the TSP, which we call the Generalized Traveling Salesman Problem (GTSP), in which the goal is to select one point from each of multiple sets of points and construct a tour of minimum length. Two different limiting cases of GTSP are examined: one in which the number of point sets goes to infinity, and one in which the number of points in each set goes to infinity. In addition, we define a variation of the traveling purchaser problem and perform asymptotic analysis on it as well. Numerical simulations confirm that our analysis of GTSP is valid when applied to simulations in the Euclidean plane and on a real map. To demonstrate the practical usage of GTSP, we apply the model to warehouse order fulfillment strategies, in which a warehouse runner picks multiple different items from multiple shelves in a single run, and show that a warehouse using the GTSP strategy can greatly reduce order picking travel distance compared to Amazon’s semi-automatic warehouse using Kiva robots to fulfill orders. Finally, we use GTSP to quantify the changes in overall carbon footprint efficiency due to delivery services by looking at "household-level" economies of scale in transportation: a person might perform many errands in a day (such as going to the bank, grocery store, and post office), and that person has many choices of locations at which to perform these tasks (e.g., a typical metropolitan region has many banks, grocery stores, and post offices).
Thus, the total driving distance (and therefore the overall carbon footprint) that that person traverses is the solution to a generalized traveling salesman problem (GTSP) in which he or she selects both the best locations to visit and the sequence in which to visit them.
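To make the GTSP definition in the abstract concrete, here is a minimal brute-force sketch for tiny instances. It is not the method used in the dissertation; it simply enumerates one point per set and every visiting order, assuming Euclidean distances and a closed tour from a fixed start. The coordinates and names are invented for illustration.

```python
import itertools
import math


def tour_length(points):
    """Length of the closed tour visiting `points` in the given order."""
    return sum(math.dist(points[i], points[(i + 1) % len(points)])
               for i in range(len(points)))


def gtsp_brute_force(start, point_sets):
    """Exact GTSP by enumeration; only viable for very small instances."""
    best_len, best_tour = math.inf, None
    for choice in itertools.product(*point_sets):      # one point per set
        for order in itertools.permutations(choice):   # visiting sequence
            tour = (start,) + order
            length = tour_length(tour)
            if length < best_len:
                best_len, best_tour = length, tour
    return best_len, best_tour


# Hypothetical errands: two candidate banks, grocery stores, and post offices.
home = (0, 0)
banks = [(1, 0), (5, 5)]
groceries = [(1, 1), (9, 9)]
post_offices = [(0, 1), (7, 0)]
best_len, best_tour = gtsp_brute_force(home, [banks, groceries, post_offices])
```

The enumeration grows as the product of the set sizes times n!, which is why the asymptotic estimates studied in the dissertation are useful when exact solution is out of reach.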
Asset Metadata
Creator
Meng, Xiangfei (author)
Core Title
Asymptotic analysis of the generalized traveling salesman problem and its application
School
Viterbi School of Engineering
Degree
Doctor of Philosophy
Degree Program
Industrial and Systems Engineering
Publication Date
06/29/2018
Defense Date
05/02/2018
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
asymptotic analysis,carbon footprint,GTSP,OAI-PMH Harvest,Transportation,warehouse random strategy
Format
application/pdf (imt)
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Carlsson, John Gunnar (committee chair), Dessouky, Maged (committee member), Ross, Sheldon Mark (committee member), Savla, Ketan (committee member)
Creator Email
mengxf33@gmail.com,xiangfem@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c89-14234
Unique identifier
UC11670348
Identifier
etd-MengXiangf-6362.pdf (filename),usctheses-c89-14234 (legacy record id)
Legacy Identifier
etd-MengXiangf-6362.pdf
Dmrecord
14234
Document Type
Dissertation
Format
application/pdf (imt)
Rights
Meng, Xiangfei
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA