Dynamic Programming-Based Algorithms
and Heuristics for Routing Problems
by
Michael Poremba
A dissertation submitted to the Faculty of the USC Graduate School
in partial fulfillment of the requirements for the degree of
Doctor of Philosophy
(Industrial and Systems Engineering)
at the University of Southern California
December 2017
Committee:
Alejandro Toriello (Chair)
Maged Dessouky (Chair)
John Carlsson
Ketan Savla
Contents

1 Introduction
1.1 Background and Motivation
1.2 Research Contributions
1.3 Structure of Dissertation
2 Literature Review
2.1 Exact Algorithms for Traveling Salesman Problems
2.2 Approximate Dynamic Programming and Approximate Linear Programming
2.3 Heuristics for Traveling Salesman Problems
2.4 Heuristics for Vehicle Routing Problems
2.5 Branch-and-Bound and Graph Search Algorithms
2.6 Research Gap
3 Problem Definitions
3.1 Traveling Salesman Problem
3.2 Time-Dependent TSP
3.3 Vehicle Routing Problem
4 Branch-and-Bound Algorithm and Heuristic
4.1 Branch-and-Bound Framework
4.2 Traversing the Search Tree
4.3 Modification to a Heuristic
4.4 Experimental Results
4.4.1 Experimental Results for the TSP
4.4.2 Experimental Results for the TD-TSP
5 Stochastic TSP Heuristic Policy
5.1 Dynamic TSP with Stochastic Arc Costs
5.2 Price-Directed Heuristic Policy
5.3 Experimental Results
5.3.1 Test Instance Design
5.3.2 Discussion of Results
6 Stochastic VRP Heuristic Policies
6.1 VRP with Stochastic Demand
6.2 Cyclic Heuristic
6.3 Single Resequencing Heuristic
6.4 Experimental Results
6.4.1 Test Instance Design
6.4.2 Discussion of Results
7 Conclusion
7.1 Future Work
Bibliography
Acknowledgement
To Alejandro Toriello, for teaching me so much, for sticking with me through it all,
and for being the best advisor, friend, and mentor I could ever have asked for,
To Maged Dessouky, for starting me down this path and for being the one with sage
advice, who at every critical juncture, nudged me in the right direction,
To Ketan, John, Fernando, and Suvrajeet, my other committee members, for challeng-
ing me, while at the same time being so patient and understanding,
To the other professors and staff in the ISE department who I crossed paths with, for
the many invaluable lessons and the countless hours of advisement and support,
To Christine, Jae, and all the other students in the office, for the help and camaraderie
and for blazing a trail that the rest of us could follow,
To Kiwi, Nate, Moose, Mark, LB, Blayz, Theo, Aly, and the rest of the boys, for being
the brothers I never had but was fortunate enough to find,
To Victor, Bryson, Kahniley, Derrick, Dan, and the many other friends I made at USC,
for keeping me grounded and for helping me figure out the things in life that really matter,
To Ryan, Dana, Matthias, and Caron, for believing in me, for guiding me, and for
being the examples I aspire to every day when I come into work,
To my Zappos family, for embracing me from the start and for reminding me there’s
nothing wrong with having a lot of fun and being a little weird,
To Trisha, for being the encouragement, the motivation, and the inspiration I so des-
perately needed, through all the ups and downs, all the good and the bad,
To Grandma and Curt, my Las Vegas family, for challenging me to step outside my
comfort zone now and then, and for when I fall, being the first ones there to help me up,
To Nana, Uncle Rick, Aunt Terry, Aunt Jeanne, Uncle Dennis, and all my other aunts,
uncles, and cousins, for sending me so much love, while cheering me on from afar,
To Sam and Sydney, the best little sisters in the world, for putting up with a brother
like me, for being there for me always, and for being so smart, so talented, so resilient,
and so amazing that I had to work harder every single day just to keep up,
To Dad, for teaching me what it means to work hard, for pushing me to think bigger,
be tougher, and never settle, and for sacrificing so much just to make sure we always had
the things we needed to be happy and go after our dreams,
To Mom, for being proud of me, so I could be humble, for taking care of me, so I could
stay focused, for teaching me, so I could grow, and for loving me, each and every day, so
I would never feel discouraged or alone,
And to everyone else who has helped me along the way, I thank you. I know without
your love, friendship, and support, I would not be where I am today. This is for you.
1 Introduction
1.1 Background and Motivation
The traveling salesman problem (TSP) is one of the most important combinatorial
optimization problems studied in the operations research community. It is a
fundamental problem with many applications, most notably in the area of routing, but
also in other contexts including scheduling, manufacturing, genetics, and more [10].
While important, the TSP is also known to be NP-hard, and the worst-case running time
of even the best exact TSP algorithm is exponential. The combination of a high degree of
difficulty and a wide range of applicability has made the TSP one of the most
well-studied problems in the combinatorial optimization field.
The conventional interpretation of the TSP is as follows. A salesman must travel to a
given number of cities, or customers, and then return home. There is a distance, or cost,
associated with traveling between any ordered pair of cities. The objective is to identify
a permutation of cities such that the overall cost of the tour is minimized.
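To make this definition concrete, the following sketch enumerates every permutation over a small, made-up cost matrix (the matrix is purely illustrative). Brute force of this kind is only workable for a handful of cities, which is precisely why the algorithms studied in this dissertation are needed.

```python
from itertools import permutations

def brute_force_tsp(cost):
    """Return the minimum tour cost and an optimal tour.

    cost[i][j] is the (possibly asymmetric) cost of traveling
    from city i to city j; city 0 is the salesman's home.
    """
    n = len(cost)
    best_cost, best_tour = float("inf"), None
    # Fix city 0 as the start and try every ordering of the rest.
    for perm in permutations(range(1, n)):
        tour = (0,) + perm
        total = sum(cost[tour[k]][tour[k + 1]] for k in range(n - 1))
        total += cost[tour[-1]][0]  # return home
        if total < best_cost:
            best_cost, best_tour = total, tour
    return best_cost, best_tour

# Small illustrative instance (symmetric, 4 cities).
C = [[0, 1, 4, 3],
     [1, 0, 2, 5],
     [4, 2, 0, 1],
     [3, 5, 1, 0]]
```

With n cities this examines (n-1)! orderings, so even n = 20 is already far out of reach.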
Along with the traditional TSP, a number of other variants of the problem have been
proposed and researched [71]. Some examples of these variants include the
time-dependent traveling salesman problem (TD-TSP), the prize-collecting traveling salesman
problem (PC-TSP), and the precedence-constrained traveling salesman problem (PR-TSP). Each
of these problems maintains the essential TSP structure, while also incorporating
additional constraints and conditions that mimic real-life applications.
The time-dependent traveling salesman problem, for example, mirrors the structure
of the standard TSP but with one key difference. The cost to move from one location to
another is not static over time; rather, it can vary depending on when that action is
taken. Adding this element to the problem makes an NP-hard problem even more
difficult to solve, but in doing so, the problem can now better account for real-life
circumstances. For certain applications, it may be unreasonable to assume that travel
times stay constant regardless of the time of day. Certain routes may take more time to
traverse depending on daily traffic patterns or other time-dependent factors.
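Evaluating even a fixed tour under time-dependent costs requires tracking the clock as the tour is traversed. A minimal sketch, with a made-up `travel_time` function standing in for real traffic data:

```python
def td_tour_cost(tour, travel_time):
    """Total duration of a closed tour when travel times depend on
    the departure time.

    travel_time(i, j, t) gives the time to go from i to j when
    departing i at time t (a hypothetical function the modeler
    supplies, e.g. estimated from traffic data).
    """
    t = 0.0
    for i, j in zip(tour, tour[1:] + tour[:1]):  # close the tour
        t += travel_time(i, j, t)
    return t

# Toy model: every leg takes 1 unit off-peak but 2 units
# during a "rush hour" window [2, 4).
def toy_tt(i, j, t):
    return 2.0 if 2 <= t < 4 else 1.0
```

Note that the same set of legs can now cost different amounts depending on the order in which they are traversed, which is what makes the TD-TSP harder than the static TSP.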
Researchers have taken many different approaches to solving the TSP and its
variants, including dynamic programming (DP), integer programming (IP), and other
methods. In this dissertation, we consider these approaches, and then ultimately, we
present an exact solution method of our own, which combines ideas from both DP and
IP.
As an alternative to solving the TSP to optimality, we can take a heuristic approach.
One aspect that makes a heuristic approach so appealing for the TSP is that, in practice,
it may be more beneficial to find a near-optimal solution in a few seconds than an
optimal solution in a few hours. This is especially true for more complicated TSP-related
problems, where exact algorithms are even more limited. Using heuristics for the TSP is
a well-studied line of research, and in that spirit, we contribute a heuristic adaptation of
our exact solution algorithm in this dissertation.
To make the TSP better suited for real-world applications, we can also introduce a
stochastic element. In doing so, however, we are adding another level of complexity. For
example, consider the dynamic traveling salesman problem with stochastic arc costs (D-TSP).
In this problem, the travel costs are no longer deterministic and known ahead of time.
Instead, we observe the travel costs out of a city only when we are at that city. This could
resemble the case where a salesman checks the traffic report as he is leaving a location to
help him decide where to drive next. The salesman can follow a number of different
policies when making these decisions, some more effective than others. We present one
such heuristic policy, a dynamic approach based on our deterministic TSP heuristic.
The TSP, though fundamental, is just one example of a routing problem solvable
with combinatorial optimization techniques. A problem more challenging, but perhaps
even more useful in practice, is the vehicle routing problem (VRP). In the simplest,
deterministic version of the problem, a single vehicle is stationed at a depot and a
number of customers are distributed in other locations. Each customer has a certain
amount of demand that the vehicle must satisfy by traveling to that customer, and the
vehicle must plan a series of routes to visit all customers and satisfy their demands.
Typically, the vehicle is capacitated in such a way that it cannot fulfill all demand
requirements for all customers in a single TSP tour. Instead, a number of subtours must
be taken, and the ultimate goal is to design these subtours in such a way so that total
travel costs are minimized.
As is the case with the TSP, the VRP is a well-studied problem in the optimization
community, in both its traditional deterministic form and in many of its variations. These
include the multi-vehicle routing problem, where a fleet of vehicles must be routed
rather than just one, and the vehicle routing problem with time windows, where each
customer has a certain block of time during the day in which they must be serviced.
One of the most relevant extensions is the vehicle routing problem with stochastic
demand (S-VRP), in which the demand at each customer is not a known quantity, but
instead, represented by some random variable.
Also like the TSP, the family of vehicle routing problems has a wide range of
practical applications in areas like routing, scheduling, supply chain management, and
more. Unfortunately, the VRP is an even more difficult problem to solve exactly, and
relative to the TSP, solution approaches are limited to instances with fewer customers.
Even heuristic approaches are often restricted in the sizes and types of problems they
can handle in a reasonable amount of time, mostly due to the computational difficulty.
For this reason, a common heuristic approach is to fix an a priori tour for the vehicle to
follow and not allow for any dynamic decision making along the route. Alternative
approaches are willing to endure more burdensome computation times and instead
emphasize dynamic reoptimization.
For all of these routing problems, whether deterministic or stochastic, it is not
enough just to generate solutions at or near optimality. For an approach to have any
utility in practice, it must be able to generate a solution quickly without sacrificing too
much in the way of solution quality. It is for that reason that heuristics are so vital. If a
heuristic can distinguish itself along those two dimensions for the TSP or VRP, the
benefits would extend not only to the field of routing, but to many other areas and
applications as well.
1.2 Research Contributions
To solve these routing problems, we attempt a number of different approaches. For
the TSP, we present a DP-based branch-and-bound algorithm. This algorithm can be
adapted to suit a number of different deterministic TSP problems, including the
traditional TSP and the TD-TSP. For each variant, we present a heuristic based on the
branch-and-bound algorithm. We also modify this branch-and-bound approach into a
price-directed heuristic policy, applicable to stochastic routing problems like the D-TSP.
For the S-VRP, we explore the advantages and disadvantages of a reoptimization
policy versus an a priori one, and ultimately, we design a hybrid technique combining
aspects of both. This single resequencing heuristic uses a fixed tour but allows for a
limited amount of reoptimization to occur, and in this way, we are able to improve on
traditional a priori heuristics while still remaining tractable.
In addition to a discussion of the algorithms and heuristics we have designed and
what makes them novel, we provide details on implementation and results from
computational experiments against benchmark instances. With these experiments, we
are able to show the effectiveness of our heuristics and illustrate how we have been able
to make improvements over existing methodology, in terms of both solution quality and
computational requirements.
To summarize our contributions thus far, we have:
1. Developed new DP-based branch-and-bound algorithms for deterministic
traveling salesman problems.
2. Modified these procedures into heuristics and demonstrated through computational
experiments that they improve upon other approaches.
3. Adapted these ideas to develop a dynamic heuristic policy for the dynamic
traveling salesman problem with stochastic arc costs.
4. Designed a single resequencing heuristic for the vehicle routing problem with
stochastic demand and demonstrated through computational experiments that it
improves upon other approaches.
1.3 Structure of Dissertation
This dissertation is structured in the following way. Chapter 2 contains a
comprehensive literature review of the research relevant to this topic. In Chapter 3, we
formally define the problems being considered, along with all the necessary notation.
Additionally, we formulate relevant problems as a DP and an IP and give related
bounds. Chapter 4 presents a branch-and-bound algorithm to solve traveling salesman
problems, as well as a heuristic adaptation of that algorithm. This chapter also includes
results from our computational experiments with this heuristic. In Chapter 5, we define
a stochastic TSP, and we outline how the heuristic can be modified into a policy to
generate solutions for it. Again, we include experimental results. Chapter 6 shifts focus
to the vehicle routing problem, and we present a single resequencing heuristic for a
stochastic version of it. Computational experiments are again included, this time to
benchmark the heuristic against a commonly used a priori approach and demonstrate
the value of incorporating limited reoptimization. Finally, in Chapter 7, we summarize
our findings and touch on potential future work that could further this line of
research.
2 Literature Review
2.1 Exact Algorithms for Traveling Salesman Problems
The traveling salesman problem has been a well-known combinatorial optimization
problem for the greater part of the past century, and a fundamental problem in the
operations research community, with particular significance in vehicle routing
applications. While in this dissertation we consider the TSP in this context, the problem is
also relevant to a number of other areas, including manufacturing and scheduling [10].
Over the years, many different algorithms for the TSP have been proposed, and we will
discuss several of them here.
In the 1960’s, dynamic programming formulations for the TSP appeared in three
different articles [21, 65, 74]. Though exponential, the worst-case running time for DP is
nevertheless the best of any known exact algorithm for the TSP. The state space grows
exponentially in the number of cities, and this curse of dimensionality makes a
conventional DP approach impractical for the TSP and any problem with similar
structure [10]. However, DP formulations can be useful in certain lower bounding
procedures [33].
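The recursion from these articles, now often called the Bellman-Held-Karp algorithm, can be sketched as a bitmask DP over states (S, j), where S is the set of visited cities and j is the current city; its O(n² 2ⁿ) running time makes the curse of dimensionality concrete.

```python
def held_karp(cost):
    """Exact TSP via the Bellman-Held-Karp recursion.

    State (S, j): minimum cost of a path that starts at city 0,
    visits exactly the cities in S (a bitmask over cities 1..n-1),
    and ends at city j+1.  O(n^2 * 2^n) time and O(n * 2^n)
    memory -- the curse of dimensionality in concrete form.
    """
    n = len(cost)
    full = (1 << (n - 1)) - 1  # all cities except 0 visited
    INF = float("inf")
    dp = [[INF] * (n - 1) for _ in range(full + 1)]
    for j in range(n - 1):
        dp[1 << j][j] = cost[0][j + 1]  # go straight to city j+1
    for S in range(1, full + 1):
        for j in range(n - 1):
            if not (S >> j) & 1 or dp[S][j] == INF:
                continue
            for k in range(n - 1):  # extend the path to city k+1
                if (S >> k) & 1:
                    continue
                T = S | (1 << k)
                c = dp[S][j] + cost[j + 1][k + 1]
                if c < dp[T][k]:
                    dp[T][k] = c
    # Close the tour by returning to city 0.
    return min(dp[full][j] + cost[j + 1][0] for j in range(n - 1))
```

For n around 20 this is already the practical limit of exact DP, which is why the bounding and approximation ideas discussed below matter.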
In the following sections, we will revisit DP. First, we will discuss it in the context of
approximate dynamic programming and approximate linear programming. It will also
appear in our summary of heuristic methods for traveling salesman and vehicle routing
problems.
Researchers have also approached the TSP from an integer programming
perspective. As early as the 1950’s, Dantzig, Fulkerson, and Johnson formulated the TSP
as an IP and presented a cutting plane approach to verify optimality of a solution [37]. In
the early 1970’s, Held and Karp proposed a column-generation and an ascent method to
compute bounds on the symmetric TSP, which they then utilized in a branch-and-bound
procedure to solve the problem optimally [75]. This work was based on the relationship
between the TSP and the minimum spanning tree problem. They would later present an
efficient method to approximate this bound [76].
Though the arc-based IP formulation for the TSP was first proposed in the 1950’s by
Dantzig, Fulkerson, and Johnson [38], Held and Karp worked extensively with the LP
relaxation of this formulation in the early 1970’s. The Held-Karp (H-K) bound takes its
name from them. It is the solution to the LP relaxation of the TSP’s arc-based
formulation [75, 76].
Later in that decade, Grotschel and Padberg investigated a number of inequality
classes for the symmetric TSP and proved several of them to be facet-defining [69, 70].
These inequalities can be utilized in a cutting plane method, similar to the one Dantzig,
Fulkerson, and Johnson proposed nearly two decades earlier [38]. Padberg and Hong
performed a computational study on these types of methods, solving instances of just
over 300 cities [99], and later, Padberg and Rinaldi would implement a branch-and-cut
procedure that handled an instance over 500 cities in size [100]. Currently,
state-of-the-art TSP solvers, like Concorde, can solve instances up to 85,000 cities [10].
In addition to the classical TSP , much work has been done on related problems.
Gutin and Punnen describe many of these TSP variations in their book [71]. We discuss a
few of the most relevant ones here and in later sections.
Picard and Queyranne presented a branch-and-bound algorithm for the
time-dependent traveling salesman problem in the 1970’s [103]. Years later, Lucena
would propose a branch-and-bound algorithm for the simpler time-dependent delivery
man problem [89], while Vander Wiel and Sahinidis offered an exact approach for the
TD-TSP based on Benders decomposition [129]. More recently, a paper by
Abeledo et al. describes a branch-cut-and-price algorithm for the TD-TSP [1]. They were
able to solve instances of up to 107 cities.
Balas first defined the prize-collecting traveling salesman problem in the late
1980’s [13], and he would follow up this work with some polyhedral results years
later [14, 15]. The PC-TSP is one of the most extensively studied variations of the TSP,
and over time it has gone by a number of different names, including the profitable tour
problem, the quota TSP, and the TSP with profits. The orienteering problem is a close
relative of the PC-TSP, in which a path is found rather than a tour. However, it is fairly
straightforward to see how the two problems can be made equivalent [52]. Fischetti and
Toth did some early work studying the linear programming bound to the problem [54],
which would lead into solution procedures in later work.
Branch-and-cut algorithms for the orienteering problem were proposed in the late
1990’s by Fischetti, Gonzalez, and Toth [53] and by Gendreau, Laporte, and Semet [60],
which could solve symmetric instances containing up to 500 cities. These algorithms
both adapted an earlier vertex-insertion/deletion heuristic that had been proposed by
Ramesh and Brown into a branch-and-cut procedure [110].
The precedence-constrained TSP has a wide variety of applications, particularly in
pickup and delivery problems [49]. In the 1990’s, Balas, Fischetti, and Pulleyblank
conducted a study on the PR-TSP polytope, showing two main classes of inequalities to
be facet-defining, under certain assumptions [16]. Others would propose integer
programming-based [11] and dynamic programming-based [94] exact algorithms for the
problem.
Another deterministic variation of the traveling salesman problem is the TSP with
time windows (TW-TSP). In this problem, there are not only travel costs, but travel times
as well. The salesman is required to arrive at each city during a specific time window.
Dumas et al. [48] and Mingozzi et al. [94] initially explored DP solution methods for the
problem, but more recently, Baldacci, in collaboration with a number of different
researchers, has proposed other algorithms. One is a branch-and-cut-and-price
algorithm [17] for the more general pickup and delivery problem, while another is a DP
algorithm for the TW-TSP [18].
2.2 Approximate Dynamic Programming and Approximate Linear Programming
When using DP we typically cannot obtain an exact solution, due to the curse of
dimensionality. Solving the TSP with DP is equivalent to solving a shortest path problem
on a network whose numbers of nodes and arcs are exponential in the number of
cities. The DP recursion comes out of the linear programming (LP) dual of this
formulation. Though the problem is exponential in size, if we can restrict the size of the
dual, we have the opportunity to compute valid dual bounds and to use feasible dual
solutions as approximate costs-to-go to find primal solutions.
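In one standard notation (a sketch only; the dissertation fixes its own symbols in Chapter 3), the cost-to-go recursion behind this shortest path view, over states (S, j) with S the set of visited cities, j the current city, N the full city set, and city 0 the home city, is:

```latex
% Cost-to-go recursion over states (S, j).
y^*(S, j) \;=\; \min_{k \notin S} \bigl\{\, c_{jk} + y^*\bigl(S \cup \{k\},\, k\bigr) \,\bigr\},
\qquad
y^*(N, j) \;=\; c_{j0}.

% LP-dual view: any y satisfying
%   y(S, j) \le c_{jk} + y(S \cup \{k\}, k)   for all states (S, j) and k \notin S,
%   y(N, j) \le c_{j0},
% yields the valid lower bound y(\{0\}, 0) on the optimal tour cost.
```

Restricting y to a compactly parameterized family, as ALP does, is what keeps this dual tractable while preserving a valid bound and usable approximate costs-to-go.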
The previously described method is in fact the approximate linear programming (ALP)
method for DP, which first appeared in a paper by Schweitzer and Seidmann in
1985 [115] and then again in the 1990’s in two papers by Trick and Zin [127, 128].
In the years that followed, many others in the operations research field published
work using ALP. De Farias and van Roy proposed a linear programming approach to DP
for stochastic control problems [40, 41]. Desai, Farias, and Moallemi built upon that work
with a paper on smoothed ALP for stochastic control problems [43]. Throughout the
2000’s, Adelman was also utilizing ALP to develop price-directed approaches for a
number of inventory and routing applications [3, 4]. This also includes some of his work
in collaboration with Klabjan [6, 7].
ALP can be applied in other areas, not necessarily related to inventory or routing.
Adelman has utilized ALP in the area of revenue management [5], as have Farias and
van Roy [51]. Others have used ALP for commodity valuation [97].
More recently, Toriello applied ALP to obtain a nested family of lower bounds to the
TSP [124]; a base member of that bound family is shown to be no worse than the
Held-Karp bound [75, 76].
In later work, Toriello goes on to show that the ALP bound and the Held-Karp
bound are in fact equivalent, and also to propose ALP bounds for a number of other TSP
problems, including the TD-TSP, the PC-TSP, and the PR-TSP [123]. Toriello, Haskell,
and Poremba also apply these same techniques to a stochastic variant of the TSP [125].
Approximate dynamic programming (ADP) is more general than ALP. The idea behind
ADP is to replace DP value functions with a statistical approximation in order to
overcome the curse of dimensionality. We then step forward in time from an initial state,
making decisions using these statistical value function approximations to guide us [105].
In his book, Powell presents ways to model and solve a large number of industrial
problems through the use of ADP [104]. This text builds on his previous joint work with
a number of other researchers using ADP for real-world applications, including resource
allocation [101, 107, 108, 122] and fleet management [64, 120].
2.3 Heuristics for Traveling Salesman Problems
In many real-world applications, we need solutions to the TSP in a relatively short
amount of time. If finding the optimal solution requires too much computation time, we
can instead implement a heuristic algorithm that generates a near-optimal solution
much more quickly. Over the years, a large number of TSP heuristics have been
studied [78, 81]. We present an overview of them now.
Throughout the 1970’s and 1980’s, the TSP heuristic generally believed to be the best
performing was one developed by Lin and Kernighan [87]. In 2000, Helsgaun
implemented the heuristic in such a way that it was able to find optimal and
near-optimal solutions to instances of more than 10,000 cities [77]. The original
Lin-Kernighan heuristic was essentially an extension of the 2-OPT [36] and 3-OPT [86]
heuristics proposed over a decade earlier. The procedure requires what are essentially
tabu lists to be maintained as it iteratively improves solutions. Traditional tabu search
would be proposed over a decade later by Glover [62, 63].
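The 2-OPT move that Lin-Kernighan generalizes is simple to state: remove two edges and reconnect the tour by reversing the segment between them, keeping the change whenever it shortens the tour. A minimal local-search sketch (symmetric costs assumed; not the Lin-Kernighan procedure itself):

```python
def two_opt(tour, cost):
    """Improve a tour by 2-OPT moves until it is locally optimal.

    A 2-OPT move deletes edges (a, b) and (c, d) and reconnects
    the tour with (a, c) and (b, d) by reversing the segment
    between them; for symmetric costs the change in length
    depends only on these four cities.
    """
    tour = list(tour)
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue  # would just reverse the whole tour
                a, b = tour[i], tour[i + 1]
                c, d = tour[j], tour[(j + 1) % n]
                if cost[a][c] + cost[b][d] < cost[a][b] + cost[c][d]:
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                    improved = True
    return tour
```

On a four-city unit square, for instance, this uncrosses the self-intersecting tour 0-1-2-3 in a single move. Lin-Kernighan's contribution was to chain such edge exchanges to variable depth rather than stopping at two or three edges.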
Along with tabu search, the majority of classical metaheuristics have in fact been
applied to the TSP , as Johnson and McGeoch detail in a TSP case study [78]. As Blum
describes it, metaheuristics are not problem specific; rather, they are “high level
strategies for exploring search spaces by using different methods” [29]. Some others that
have been applied to the TSP include simulated annealing [80], variable neighborhood
search [96], ant colony optimization [45], and local search [23].
Other heuristics take better advantage of the TSP’s structure [114]. In 1976,
Christofides introduced a construction heuristic which builds a solution from the
minimum spanning tree and a minimum-length matching of the odd-degree vertices of
that tree [32]. Given the triangle inequality, the heuristic has a worst-case performance
guarantee of 3/2. Also in the 1970’s, Karp described another construction approach, a
patching algorithm which patches together the cycles of an optimum assignment to form
a tour [79].
The insertion heuristic proposed by Gendreau, Hertz, and Laporte in the 1990’s is
yet another construction method [56]. An insertion heuristic builds a tour from scratch,
inserting one city at a time. Gendreau and Laporte, collaborating with various other
authors, have adapted insertion heuristics to other versions of the TSP as well, such as
the TSP with time windows [58] and the pickup/delivery TSP [61]. A number of other
researchers have presented insertion heuristics for the TSP and its variants [110, 112, 113].
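In the same spirit, a generic cheapest-insertion construction (a sketch of the general idea, not the specific method of any paper cited above) grows a tour one city at a time, always choosing the city and position whose insertion increases tour length the least:

```python
def cheapest_insertion(cost, start=0):
    """Build a tour by repeated cheapest insertion.

    At each step, for every unrouted city c and every edge (a, b)
    of the partial tour, the insertion cost is
    cost[a][c] + cost[c][b] - cost[a][b]; the overall cheapest
    (city, position) pair is chosen.
    """
    n = len(cost)
    tour = [start]
    remaining = set(range(n)) - {start}
    while remaining:
        best = None  # (delta, city, position)
        for c in remaining:
            for pos in range(len(tour)):
                a = tour[pos]
                b = tour[(pos + 1) % len(tour)]
                delta = cost[a][c] + cost[c][b] - cost[a][b]
                if best is None or delta < best[0]:
                    best = (delta, c, pos)
        _, c, pos = best
        tour.insert(pos + 1, c)
        remaining.remove(c)
    return tour
```

Construction heuristics like this are often used to seed an improvement method such as 2-OPT, combining a reasonable starting tour with local refinement.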
Malandraki and Dial offered a DP-based heuristic for the TD-TSP in the 1990’s [92],
while other researchers have developed heuristics for the PC-TSP. Dell’Amico, Maffioli,
and Varbrand offer a heuristic based on a Lagrangian approach [42]. Bienstock et al
propose a different technique, an approximation algorithm built upon the Christofides
algorithm [28].
2.4 Heuristics for Vehicle Routing Problems
First introduced in 1959 by Dantzig and Ramser [39], the vehicle routing problem,
like the TSP, is an important problem in combinatorial optimization. While there are
many versions of it, the essential idea is to determine an optimal way to deliver goods to
a number of different customers with individual demands. We must determine a plan to
route a vehicle or a fleet of vehicles to accommodate all customers’ demand at minimum
cost, where some limitation, usually vehicle capacity, prevents a TSP tour from being
feasible. Toth and Vigo describe the problem and its variants in greater detail in their
book [126].
Also like the TSP, there is a wide variety of heuristics that can be applied to the
deterministic problem [34, 82]. Fisher and Jaikumar gave a generalized assignment
heuristic in the 1980’s [55]. Another example is the DP-based heuristic for the
time-dependent VRP presented by Malandraki and Daskin [91]. Metaheuristic
algorithms can also be applied, such as tabu search for the VRP [57] and the VRP with
time windows [35].
In the stochastic demand version of the problem, customer demand is not known
ahead of time, but rather, it is discovered upon the vehicle’s arrival at that customer’s
location. Generally, all that is known in advance of visiting a customer is the probability
distribution of the customer’s demand. This information can be used to inform routing
decisions; though even with it, situations can arise where the demand realization at a
customer exceeds the remaining supply on the vehicle and the vehicle is unable to
satisfy that customer’s full demand. We refer to this as a service or delivery failure. In
the event of a failure, the vehicle must take some recourse action.
Exact approaches to solving the VRP with stochastic demand do exist [31, 83], but
due to the difficulty of solving the S-VRP on medium- to large-sized instances, heuristics
are common and in practice quite often preferred [59, 121]. For that reason, we focus our
attention on these heuristic approaches.
One popular heuristic strategy is to fix a sequence in which to visit the customers
and then follow this sequence with no deviation. Traditionally, in the event of a failure,
the recourse action taken would be to travel directly to the depot, restock the vehicle,
and then return immediately to the customer at which the failure occurred. These types
of heuristics, where the sequence is fixed ahead of time, are commonly referred to as a
priori approaches.
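The classical recourse just described is straightforward to simulate on realized demands: follow the fixed sequence, and on a failure make a round trip to the depot and finish serving that customer. A sketch on hypothetical instance data, with city 0 as the depot and each customer's demand assumed not to exceed vehicle capacity:

```python
def a_priori_cost(seq, demands, capacity, cost, depot=0):
    """Realized cost of following a fixed customer sequence with
    classical recourse: on a failure (demand exceeds the load
    remaining on the vehicle), return to the depot, restock to
    full, and resume at the customer where the failure occurred.
    """
    load = capacity
    total = 0.0
    here = depot
    for c in seq:
        total += cost[here][c]
        here = c
        if demands[c] > load:
            # Failure: deliver what we can, round-trip to restock,
            # then finish serving this customer.
            shortfall = demands[c] - load
            total += cost[c][depot] + cost[depot][c]
            load = capacity - shortfall
        else:
            load -= demands[c]
    total += cost[here][depot]  # return home at the end
    return total
```

Averaging this realized cost over many demand samples (or taking expectations analytically) is how competing a priori tours are typically compared.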
Bertsimas demonstrated the effectiveness of an a priori heuristic for the S-VRP in a
1992 paper [27]. In this paper, he introduces the cyclic heuristic for the vehicle routing
problem with stochastic demand and demonstrates that an a priori approach can, on
average, perform comparably to a more dynamic approach where the sequence is
reoptimized throughout the course of the route. Heuristics which allow resequencing to
occur generally require a great deal more computational effort than an a priori heuristic,
and thus, where the performance is comparable, the a priori heuristic will generally have
the edge. It may be unrealistic or infeasible to have the vehicle wait on additional
computations to occur while it is out visiting customers. An a priori approach, on the
other hand, provides the vehicle with a complete route at the start with no need for
adjustment later on. For another example of an a priori approach, see [66] where
Goodson, Ohlmann, and Thomas build a heuristic which uses cyclic-order
neighborhood search in a simulated annealing framework to obtain high quality and
often optimal static S-VRP solutions.
Bertsimas, Chervi, and Peterson later explore a number of other a priori techniques
for routing problems, including the S-VRP , and again show them to be viable
alternatives to full reoptimization strategies [26]. They build upon the cyclic heuristic,
which uses the solution to the deterministic TSP over the customers in the S-VRP as the
fixed a priori tour. Through use of dynamic programming, they allow the vehicle to
make preemptive resupply decisions along the course of its route. Rather than adhering
to the fixed tour and returning to the depot only in the event of a failure, the vehicle can
now anticipate likely failures and make proactive decisions to travel to the depot in an
attempt to avoid them. As failures can be extremely costly, it can often be beneficial to
take this preventive action. By considering the expected costs and likelihoods of future
failures and balancing them with the expected costs of executing a preemptive resupply,
the optimal restocking strategy can be determined. While the vehicle is still following a
fixed a priori tour, by allowing for these preemptive resupplies to be made, the cyclic
heuristic can also be considered a dynamic restocking approach.
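The restocking decision just described can be written as a backward recursion over the fixed sequence. The following is a minimal illustrative sketch, not the cited authors' implementation: it assumes discrete demand distributions, each demand at most the capacity Q, and the classical recourse of a round trip to the depot upon failure; all function and variable names are ours.

```python
from functools import lru_cache

def expected_cost(seq, dist, pmf, Q):
    """Expected cost of following the fixed a priori sequence seq with the
    optimal preemptive restocking policy (a sketch).

    seq  -- a priori customer sequence, e.g. [1, 2, 3] (0 is the depot)
    dist -- cost matrix, dist[i][j], depot at index 0
    pmf  -- pmf[i] is a dict {demand: probability} for customer i; all
            demands are assumed to be at most Q
    Q    -- vehicle capacity
    """
    n = len(seq)

    @lru_cache(maxsize=None)
    def f(k, q):
        # Expected cost-to-go after serving seq[k] with residual load q.
        here = seq[k]
        if k == n - 1:                       # last customer served: go home
            return dist[here][0]
        nxt = seq[k + 1]
        # Option 1: go straight to the next customer; a failure there forces
        # a round trip to the depot to finish that customer's demand.
        go = dist[here][nxt]
        for d, p in pmf[nxt].items():
            if d <= q:
                go += p * f(k + 1, q - d)
            else:
                go += p * (2 * dist[nxt][0] + f(k + 1, Q - (d - q)))
        # Option 2: preemptive restock at the depot before continuing;
        # no failure is possible afterwards since demands never exceed Q.
        restock = dist[here][0] + dist[0][nxt]
        for d, p in pmf[nxt].items():
            restock += p * f(k + 1, Q - d)
        return min(go, restock)

    first = seq[0]
    cost = dist[0][first]
    for d, p in pmf[first].items():
        cost += p * f(0, Q - d)              # vehicle leaves the depot full
    return cost
```

The recursion compares, at every customer, the expected cost of continuing directly against the expected cost of a preemptive resupply, which is exactly the balance described above.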
The concept of dynamic restocking appeared in literature even before it was
incorporated into the cyclic heuristic for the S-VRP , including papers by Dror, Laporte,
and Trudeau and by Dror, Laporte, and Louveaux [46, 47]. In fact, in these papers the
authors go beyond dynamic restocking and introduce the concept of dynamic
reoptimization, where the ordering of unvisited customers can be optimally resequenced
at any stop on the vehicle’s tour. Yang, Mathur, and Ballou explored the idea of
incorporating the restocking policy into the route design, where the route is crafted with
restocking points in mind so that the expected cost can be minimized even further [130].
In an even earlier paper by Beasley, the concept of incorporating preemptive resupplies
into an a priori tour was explored, albeit for the deterministic version of the
problem [20]. Under this approach, an initial tour is generated and then optimally
partitioned to plan return trips to the depot.
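Beasley's route-first, cluster-second idea can be illustrated with a short dynamic program: given a giant tour, an optimal partition into capacity-feasible depot-to-depot segments is a shortest path over the possible break points. Below is a minimal sketch under assumed inputs; the names are ours, not Beasley's.

```python
def partition_tour(tour, dist, demand, Q):
    """Optimally split a giant tour into capacity-feasible routes (a sketch).

    tour   -- fixed visiting order of the customers, e.g. [3, 1, 2]
    dist   -- cost matrix with the depot at index 0
    demand -- demand[i] for customer i
    Q      -- vehicle capacity
    Returns (minimum total cost, list of routes in visiting order).
    """
    n = len(tour)
    INF = float("inf")
    best = [INF] * (n + 1)       # best[k]: cheapest cost to serve tour[:k]
    best[0] = 0.0
    pred = [0] * (n + 1)
    for i in range(n):                       # segment starts at tour[i]
        load, seg_cost = 0, 0.0
        for j in range(i, n):                # segment ends at tour[j]
            load += demand[tour[j]]
            if load > Q:
                break
            if j == i:
                seg_cost = dist[0][tour[i]]
            else:
                seg_cost += dist[tour[j - 1]][tour[j]]
            total = best[i] + seg_cost + dist[tour[j]][0]
            if total < best[j + 1]:
                best[j + 1] = total
                pred[j + 1] = i
    # Recover the routes by walking the predecessors back from n.
    routes, k = [], n
    while k > 0:
        routes.append(tour[pred[k]:k])
        k = pred[k]
    routes.reverse()
    return best[n], routes
```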
A number of other authors have considered reoptimization procedures for the
S-VRP and made the case for their utility in solving practical routing
applications [106,109]. Secomandi and Margot in [119] propose partial reoptimization, in
which they formulate the problem as a finite-horizon Markov decision process and
compute the optimal policy on a restricted subset of possible states through use of
dynamic programming. Based on the way this restricted subset is selected, different
heuristics can be constructed; they focus on two particular families of heuristics and
show on a number of test instances that their top heuristic generates solutions within
10-13% of the lower bound.
Others have extended reoptimization approaches to variants of the S-VRP problem.
Erera, Morales, and Savelsbergh consider the S-VRP with duration constraints and
explore recourse policies for it, where they can efficiently generate solutions which look
to minimize expected travel time while simultaneously satisfying all duration
constraints [50]. Though it is a modified version of the S-VRP , the concepts of a priori
optimization, dynamic restocking, and reoptimization are vital. Similarly, Lei, Laporte,
and Guo study another variant of the problem, the S-VRP with time windows and
model the problem as a stochastic program with recourse [85]. They then use a large
neighborhood search heuristic to generate solutions.
Looking at the multi-vehicle version of the S-VRP , Ak and Erera present a
paired-vehicle recourse strategy where the vehicle routes are coordinated in pairs to
yield higher quality solutions than previous approaches [8]. In this 2007 paper, their
recourse strategy assigns each vehicle an a priori route to follow and then allows for
route adjustments to be made in the event a vehicle encounters a failure. In certain
instances, they demonstrate cost savings can be achieved by not routing a vehicle back to
the previously visited customer post-failure, but rather, keeping that vehicle at the depot
and having another vehicle pick up the remainder of the truncated route after it finishes
servicing its own customers. Given this scheme, the authors utilize a tabu search
heuristic to generate solutions.
Another common methodology for solving the S-VRP is the rollout heuristic, a
sequential approach to optimization similar to dynamic reoptimization. Given an initial
heuristic policy, we evaluate its performance and, one step at a time, tune it to output
incrementally better solutions. Bertsekas, Tsitsiklis, and Wu in a 1997 paper present the
rollout technique for combinatorial optimization [25], and a few years later, Secomandi
proposed rollout policies for the S-VRP [116, 117]. In [117], the cyclic heuristic of
Bertsimas [27] is used to generate partial routes and improve upon a base sequence in a
rollout fashion. Novoa and Storer later extend this approach by utilizing different base
sequences, a longer look-ahead policy, and incorporating pruning schemes [98].
In a 2013 paper, Goodson, Ohlmann, and Thomas propose another rollout policy
framework applied to the S-VRP with duration limits [67]. They utilize a fixed-route
heuristic and first-improving local search in the rollout framework, while incorporating
the concept of pre- and post-decision states and a dynamic decomposition scheme to
reduce the computational effort required. In a follow-up paper, these same authors
improve upon this heuristic by considering a priori policies which allow for preemptive
resupply and embedding them into a rollout framework [68].
2.5 Branch-and-Bound and Graph Search Algorithms
Branch-and-bound is a widely known search algorithm. Published in the 1960s, a
paper by Lawler and Wood [84] and a paper by Mitten [95] both provide overviews of
the general branch-and-bound algorithm. The main concept behind branch-and-bound
is to explore all possible feasible solutions to a problem by assigning a value to one
variable at a time. We branch every time a decision is made, and we incorporate bounds
to eliminate possible branching options. The tighter the bounds we have, the fewer
options we will need to explore.
In 1970, Benichou et al introduced a concept known as pseudocost branching to solve
mixed integer programming problems [22]. The idea behind this technique is to obtain a
quantitative measure of the importance of the different integer variables based on their
per unit effect on the objective function. By estimating these pseudocosts, we can
determine which variables will be most critical and elect to branch on them early on in
the process. In satisfiability problems, pseudocosts can be used to reflect the amount of
conflict a variable assignment causes, rather than its effect on an objective function [93].
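As a concrete illustration of the bookkeeping involved, the following sketch maintains per-variable pseudocosts for 0-1 branching and scores candidates with the product rule commonly used in MIP solvers; the class and method names are hypothetical, not taken from [22] or [93].

```python
class Pseudocosts:
    """Minimal pseudocost bookkeeping for 0-1 branching (a sketch).

    For each variable we average the per-unit objective degradation
    observed on its down- and up-branches, then score candidates so
    that variables strong in both directions are branched on first.
    """
    def __init__(self):
        self.sums = {}    # var -> [down_sum, down_cnt, up_sum, up_cnt]

    def update(self, var, direction, frac, obj_gain):
        # frac is the fractional part f of the LP value; the per-unit
        # gains are obj_gain / f (down) and obj_gain / (1 - f) (up).
        s = self.sums.setdefault(var, [0.0, 0, 0.0, 0])
        if direction == "down":
            s[0] += obj_gain / frac
            s[1] += 1
        else:
            s[2] += obj_gain / (1.0 - frac)
            s[3] += 1

    def score(self, var, frac, eps=1e-6):
        s = self.sums.get(var, [0.0, 0, 0.0, 0])
        down = s[0] / s[1] if s[1] else 1.0   # unseen vars get a neutral 1.0
        up = s[2] / s[3] if s[3] else 1.0
        # Product rule: reward variables that look strong both ways.
        return max(down * frac, eps) * max(up * (1.0 - frac), eps)
```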
Strong branching is a similar notion, in which we look to utilize other information to
guide our branching decisions. It was first introduced by Applegate et al. in the
1990s [9]; when following this approach, we test out the candidate variables to
branch on before making a decision. This is typically done by solving LP relaxations to
compute bounds. By incorporating rules such as pseudocost or strong branching, the
performance of branch-and-bound algorithms can be vastly improved [88].
To observe the most dramatic improvement, often it is a hybrid method that is most
effective. One such approach is to use strong branching to initialize values for the
pseudocosts [88]. Achterberg, Koch, and Martin present a method known as reliability
branching, which combines elements of both strategies [2]. This approach uses strong
branching to initialize the pseudocosts for the variables, but it maintains a measure of
how “reliable” each variable’s pseudocost is in relation to an input parameter. If a
variable’s pseudocost is deemed unreliable, strong branching is performed to update the
value.
In addition to branch-and-bound, there are a number of other algorithms we can
use to search graphs. The A* graph search algorithm uses best-first search to find the
minimum cost path from an initial node to a terminal node. There may be more than one
potential terminal node, and the algorithm may find a path to any one of them, so long
as the cost is minimized. This procedure was first introduced by Hart, Nilsson, and
Raphael [72, 73] in the late 1960’s, although in fact, it is an extension of Dijkstra’s
algorithm [44]. In turn, the AO* graph search algorithm is an extension of the A*
algorithm.
By utilizing heuristics for the cost functions, the performance of these algorithms
can be greatly increased. The heuristics must be admissible, in that they do not exceed
the minimum cost from the current node to the terminal node. Pearl discusses a number
of heuristic search methods in his book [102]. Mahanti and Bagchi [90] give conditions
for AO* to terminate with admissible solutions, and Bagchi and Mahanti [12] prove that
if using an admissible heuristic function, AO* will yield an optimal solution upon the
algorithm’s completion.
In the past decade, AO* has been applied to stochastic routing problems, including
the stochastic shortest path problem [19]. Recently, Cheong and White presented an
optimal dynamic routing policy for a stochastic version of the TSP , based on the graph
search algorithm, AO* [30]. Cheong and White’s work will be explored in more detail
during the discussion of our branch-and-bound algorithm.
2.6 Research Gap
In this literature review, we have discussed a number of different exact algorithms
for a number of different routing problems, both deterministic and stochastic in nature.
Though we primarily focus on the TSP and its variants, we also consider another
classical routing problem, the VRP . Much work has been done to solve the TSP exactly,
and the research generally follows two paths. One technique is to utilize DP , although
the curse of dimensionality makes this approach largely impractical. IP-based
approaches can bound the TSP , but they can also be adapted into branch-and-cut
procedures to solve the problem to optimality. The difficulty of the problem generally
increases when additional constraints are added, like in the case of the TD-TSP , PC-TSP ,
and PR-TSP; nonetheless, many have researched solution methods to these related
problems.
We have placed special emphasis on ALP , as this method overcomes the curse of
dimensionality by restricting the size of the dual problem. In this way, we can compute
valid dual bounds and approximate costs-to-go, forming the basis of a price-directed
tour generation approach. In our work, we have extended this concept to a
combinatorial branch-and-bound algorithm, inspired by the ALP approach. To our
knowledge, such an approach has not been applied to the TSP before in the way we
have, and in fact, it can be extended not just to the TSP but to any TSP variant that has a
DP formulation, as shown in this dissertation where we apply it to the TD-TSP .
Additionally, we have provided an overview of a variety of heuristic algorithms for
the TSP, the VRP, and related variants. Heuristics for the TSP typically fall into two
categories: construction and improvement. Improvement heuristics take existing
feasible tours and incrementally improve them until a certain stopping criterion is met.
Construction heuristics build solutions from scratch, often by a vertex insertion method.
Another approach altogether would be to use a metaheuristic, such as simulated
annealing or some other general heuristic procedure not specific to the TSP .
We see a great need for TSP heuristics in order to accommodate most real-world
applications. Our exact branch-and-bound algorithm can easily be adapted into a
heuristic approach. In addition to an exact solution methodology to the TSP , we
contribute a heuristic based on the same branch-and-bound approach. Most practical
applications contain an element of uncertainty, making it perhaps even more critical to
develop approaches for stochastic routing problems; thus, we have also modified our
branch-and-bound algorithm into a price-directed heuristic policy for the D-TSP.
Similarly, the VRP is a well-studied but challenging problem with a great number of
practical uses. We have provided an overview of several different solution frameworks
for the stochastic demand version of the problem, including a priori, dynamic
restocking, reoptimization, and rollout heuristics. Oftentimes, these approaches fall on
one of the extremes, being either a completely fixed a priori policy or a completely
dynamic reoptimization routine. In each case, limitations exist. The former can be too
restrictive and the latter runs the risk of being intractable, particularly as the size of the
instance or expected number of necessary return trips to the depot increases beyond a
certain point. To address this, we propose a hybrid approach, which takes an a priori
policy and allows a limited amount of reoptimization to occur. Through a series of
computational experiments, we are able to show this single resequencing heuristic can
improve upon the classic cyclic heuristic and remain computationally feasible where
other heuristics fall short.
Finally, to complete this literature review, we have included a discussion of
branch-and-bound and graph search strategies. A large amount of work has been done
in the area of sophisticated branching strategies for general branch-and-bound
algorithms, and we can incorporate aspects of this research into our procedure. Notions
like pseudocost and strong branching typically apply to IP-based branch-and-bound
algorithms, but we have adapted the idea to our combinatorial approach. Similarly, we
can build on ideas from the graph search literature, including strategies like AO*, and in
this dissertation, we present a search strategy for our branch-and-bound algorithm
well-suited for the routing problems being solved.
In conclusion, after surveying the literature, we believe we have made a
contribution in a number of areas. We have developed a DP-based branch-and-bound
algorithm for the TSP , and we have also developed it into a heuristic. For practical
applications, we believe the heuristic to be the preferred option of the two, and we have
adapted it not only to the deterministic TSP , but also to the TD-TSP . We have also
extended our work to the stochastic setting, building a dynamic price-directed heuristic
policy to solve the D-TSP and a single resequencing heuristic for the S-VRP . In both
cases, we show, through a comprehensive set of computational experiments, how
incorporating a controlled amount of dynamism into a more traditional heuristic
framework can produce quality solutions in a tractable way.
3
Problem Definitions
3.1 Traveling Salesman Problem
The asymmetric TSP is defined over cities in $N \cup 0$, where the cost from city $i$ to city $j$ is $c_{i,j} \in \mathbb{R}$ for $i, j = 0, \ldots, |N|$ with $i \neq j$. The objective of the problem is to find a minimum cost Hamiltonian tour, the minimum cost cycle visiting each city exactly once.
We can model the TSP as a binary integer program [37], in an arc-based formulation shown here:

$\min \;\; \sum_{i \in N \cup 0} \sum_{j \in (N \cup 0) \setminus i} c_{i,j}\, x_{i,j}$  (3.1a)

$\text{s.t.} \;\; \sum_{j \in (N \cup 0) \setminus i} x_{i,j} = 1 \quad \forall\, i \in N \cup 0$  (3.1b)

$\sum_{j \in (N \cup 0) \setminus i} x_{j,i} = 1 \quad \forall\, i \in N \cup 0$  (3.1c)

$\sum_{i \in U} \sum_{j \in (N \cup 0) \setminus U} x_{j,i} \geq 1 \quad \forall\, \emptyset \neq U \subsetneq N \cup 0$  (3.1d)

$x_{i,j} \geq 0,\; x_{i,j} \in \mathbb{Z} \quad \forall\, i \in N \cup 0,\; j \in (N \cup 0) \setminus i$  (3.1e)
A central component of our branch-and-bound algorithm is the ability to efficiently find lower bounds. One well-known bound for the TSP follows from the arc-based formulation (3.1): by relaxing the integrality condition, we can solve the resulting linear program and obtain a lower bound on the optimal value of the original problem. This bound is often referred to as the Held-Karp bound [75, 76].
Notice that in order to solve for the Held-Karp bound in this way, we must be able to efficiently separate over the subtour elimination constraints (3.1d). These constraints ensure that the solution is connected, and there are an exponential number of them. Fortunately, we can implement a min-cut algorithm as a polynomial-time separation routine [75].
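To make the cutting-plane loop concrete, here is a simplified separation step. The exact routine solves min-cut problems on the support graph of the fractional solution; this sketch only detects the easy case where the support graph is disconnected, which already suffices to cut off every integral subtour. All names are ours.

```python
def violated_subtours(n, x, tol=1e-6):
    """Simplified separation for the subtour constraints (3.1d) (a sketch).

    n -- cities are 0..n-1, with 0 the depot; x -- dict {(i, j): value}.
    Returns the vertex sets U of support-graph components that exclude
    the depot; each such U has zero crossing flow and violates (3.1d).
    """
    adj = {i: set() for i in range(n)}
    for (i, j), v in x.items():
        if v > tol:                  # arc is in the support graph
            adj[i].add(j)
            adj[j].add(i)
    seen, components = set(), []
    for s in range(n):
        if s in seen:
            continue
        comp, stack = set(), [s]     # depth-first component sweep
        while stack:
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(adj[u] - comp)
        seen |= comp
        components.append(comp)
    return [c for c in components if 0 not in c]
```

Each returned set yields a cut to add before re-solving the LP relaxation; a full implementation would replace the component sweep with min-cut computations to also catch fractionally violated constraints.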
We now present the dual to the LP relaxation of the arc-based formulation (3.1). It will be useful during the bound inheritance step of our branch-and-bound algorithm. The formulation is as follows:

$\max \;\; \sum_{U \subseteq N} (\lambda_U + \mu_U)$  (3.2a)

$\text{s.t.} \;\; \lambda_\emptyset + \sum_{U \subseteq N \setminus i} \mu_{U \cup i} \leq c_{0,i} \quad \forall\, i \in N$  (3.2b)

$\sum_{U \subseteq N \setminus \{i,j\}} (\lambda_{U \cup i} + \mu_{U \cup j}) \leq c_{i,j} \quad \forall\, i \in N,\; j \in N \setminus i$  (3.2c)

$\sum_{U \subseteq N \setminus i} \lambda_{U \cup i} + \mu_\emptyset \leq c_{i,0} \quad \forall\, i \in N$  (3.2d)

$\lambda_U, \mu_U \geq 0 \quad \forall\, 2 \leq |U| \leq n - 1$  (3.2e)

$\lambda, \mu \in \mathbb{R}$  (3.2f)
Another way to approach the TSP is through use of dynamic programming [21, 65, 74]. In the DP formulation of the TSP, a state is given by two pieces of information: the current city, $i$, and the subset of remaining cities, $U \subseteq N$. We can construct a directed network where the nodes correspond to these states. We can represent all possible states by the set

$S = \{(i, U) : i \in N,\; U \subseteq N \setminus i\} \cup \{(0, N), (0, \emptyset)\}.$

The arcs in the network would correspond to the set of all possible actions,

$A = \{((0, N), (i, N \setminus i)) : i \in N\} \cup \{((i, \emptyset), (0, \emptyset)) : i \in N\} \cup \{((i, U \cup j), (j, U)) : i \in N,\; j \in N \setminus i,\; U \subseteq N \setminus \{i, j\}\}.$

Finally, we can define the arc costs as the cost of the action,

$\mathrm{cost}_{((i, U \cup j), (j, U))} = c_{i,j}$, where $((i, U \cup j), (j, U))$ is the action.

A solution to the TSP is then the shortest path on this network from node $(0, N)$ to node $(0, \emptyset)$. This is equivalent to the following LP:

$\min \;\; \sum_{i \in N} \Bigl( c_{0,i}\, x_{(0,N),(i,N \setminus i)} + \sum_{U \subseteq N \setminus i} \sum_{j \in U} c_{i,j}\, x_{(i,U),(j,U \setminus j)} + c_{i,0}\, x_{(i,\emptyset),(0,\emptyset)} \Bigr)$  (3.3a)

$\text{s.t.} \;\; \sum_{i \in N} x_{(0,N),(i,N \setminus i)} = 1$  (3.3b)

$x_{(0,N),(i,N \setminus i)} - \sum_{j \in N \setminus i} x_{(i,N \setminus i),(j,N \setminus \{i,j\})} = 0 \quad \forall\, i \in N$  (3.3c)

$\sum_{k \in N \setminus \{U \cup i\}} x_{(k,U \cup i),(i,U)} - \sum_{j \in U} x_{(i,U),(j,U \setminus j)} = 0 \quad \forall\, i \in N,\; \emptyset \neq U \subsetneq N \setminus i$  (3.3d)

$\sum_{k \in N \setminus i} x_{(k,\{i\}),(i,\emptyset)} - x_{(i,\emptyset),(0,\emptyset)} = 0 \quad \forall\, i \in N$  (3.3e)

$\sum_{i \in N} x_{(i,\emptyset),(0,\emptyset)} = 1$  (3.3f)

$x_a \geq 0 \quad \forall\, a \in A$  (3.3g)
From shortest path formulation (3.3), we can formulate a dual linear program. By a standard transformation, we omit the variable corresponding to state $(0, \emptyset)$, and we are left with the following dual LP:

$\max \;\; y_{0,N}$  (3.4a)

$\text{s.t.} \;\; y_{0,N} - y_{i,N \setminus i} \leq c_{0,i} \quad \forall\, i \in N$  (3.4b)

$y_{i,U \cup j} - y_{j,U} \leq c_{i,j} \quad \forall\, i \in N,\; j \in N \setminus i,\; U \subseteq N \setminus \{i,j\}$  (3.4c)

$y_{i,\emptyset} \leq c_{i,0} \quad \forall\, i \in N$  (3.4d)

$y_{0,N} \in \mathbb{R};\; y_{i,U} \in \mathbb{R} \quad \forall\, i \in N,\; U \subseteq N \setminus i$  (3.4e)
In the dual of the shortest path formulation, a $y_{i,U}$ variable can also be interpreted as a cost-to-go from state $(i, U)$ to the terminal state, $(0, \emptyset)$. In the optimal solution, $y$ corresponds to the DP backwards recursion, $y_{i,U} = \min_{j \in U} \{c_{i,j} + y_{j,U \setminus j}\}$.
In our branch-and-bound algorithm, we will make use of these costs-to-go to obtain bounds and make decisions as we construct tours. Due to the exponential size of the problem, we cannot realistically expect to solve even the LP dual relaxation exactly. However, by solving a tractable restriction of the LP, we can instead obtain valid dual bounds. These dual solutions will serve as approximate costs-to-go.
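The backwards recursion can be implemented directly with bitmask states; the following sketch makes the curse of dimensionality tangible, since it is exact but requires $O(n^2 2^n)$ work and is only practical for small instances. The names are ours.

```python
from functools import lru_cache

def tsp_dp(dist):
    """Exact DP for the TSP: y(i, U) is the cheapest cost of finishing
    the tour from city i with the set U still to visit, computed via
    y(i, U) = min over j in U of (c[i][j] + y(j, U without j)).

    dist -- (n+1) x (n+1) cost matrix; city 0 is the start/end.
    Returns the optimal tour cost.
    """
    n = len(dist) - 1                      # cities 1..n, depot 0
    full = (1 << n) - 1                    # bitmask encoding {1, ..., n}

    @lru_cache(maxsize=None)
    def y(i, U):
        if U == 0:                         # no cities left: return to depot
            return dist[i][0]
        best = float("inf")
        for j in range(1, n + 1):
            bit = 1 << (j - 1)
            if U & bit:                    # city j still unvisited
                best = min(best, dist[i][j] + y(j, U & ~bit))
        return best

    return y(0, full)
```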
In his paper, Toriello presents a restriction that yields a family of lower bounds, all efficiently solvable [124]. This is achieved by applying approximate linear programming techniques, restricting the dual LP through use of a set of basis vectors. In this dissertation, we implement the base case, in which the following substitution is made:

$y_{i,U} = \pi_{i,\emptyset} + \sum_{k \in U} \pi_{i,k}$  (3.5)
After the substitution, we can formulate the ALP:

$\max \;\; y_{0,N}$

$\text{s.t.} \;\; y_{0,N} - \pi_{i,\emptyset} - \sum_{k \in N \setminus i} \pi_{i,k} \leq c_{0,i} \quad \forall\, i \in N$  (3.6a)

$\pi_{i,\emptyset} - \pi_{j,\emptyset} + \pi_{i,j} + \sum_{k \in U} (\pi_{i,k} - \pi_{j,k}) \leq c_{i,j} \quad \forall\, i \in N,\; j \in N \setminus i,\; U \subseteq N \setminus \{i,j\}$  (3.6b)

$\pi_{i,\emptyset} \leq c_{i,0} \quad \forall\, i \in N$  (3.6c)

$y_{0,N} \in \mathbb{R},\; \pi \in \mathbb{R}$  (3.6d)

Constraint class (3.6b) is exponential, but we can separate over it in polynomial time using a greedy procedure: for every ordered pair $(i, j)$, add $k$ to $U$ if $\pi_{i,k} - \pi_{j,k} > 0$.
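The greedy separation routine can be sketched as follows; the dictionary layout for $\pi$ (cities labeled 1..n, with the empty-set column stored under key 0) is a labeling choice of ours, not part of the formulation.

```python
def separate_alp(pi, c, cities, tol=1e-9):
    """Greedy separation over the exponential constraint class (3.6b).

    For each ordered pair (i, j), the left-hand side is maximized over U
    by adding exactly those k with pi[i][k] - pi[j][k] > 0; if even that
    maximized left-hand side stays below c[i][j], no constraint for the
    pair (i, j) is violated.

    pi     -- dict of dicts; pi[i][0] stands for pi_{i, emptyset}
    c      -- cost matrix; cities -- city labels 1..n (depot excluded)
    Returns the most violated (violation, i, j, U) tuple, or None.
    """
    worst = None
    for i in cities:
        for j in cities:
            if i == j:
                continue
            U = [k for k in cities
                 if k not in (i, j) and pi[i][k] - pi[j][k] > tol]
            lhs = (pi[i][0] - pi[j][0] + pi[i][j]
                   + sum(pi[i][k] - pi[j][k] for k in U))
            viol = lhs - c[i][j]
            if viol > tol and (worst is None or viol > worst[0]):
                worst = (viol, i, j, frozenset(U))
    return worst
```

Each call costs $O(|N|^3)$ work, which is what makes the exponential ALP solvable by a cutting-plane loop.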
This bound (3.6) has been shown to be equivalent to the Held-Karp bound [123]. In
practice, we found it was more effective to compute Held-Karp solutions and then
convert them into feasible ALP solutions. We make use of these ALP solutions in our
branch-and-bound algorithm.
This procedure is not restricted only to the classical TSP . The TSP has many
variations [71], and by making a few modifications, the algorithm can be applied to a
number of them. We provide a few examples in the following sections.
3.2 Time-Dependent TSP
The TSP with time-dependent costs is identical to the traditional TSP, except for the definition of the costs. Given a set of cities, $N \cup 0$, the cost from city $i$ to city $j$ depends additionally on the time period in which the move is made. We call this cost $c^t_{i,j}$, where $t = 0, 1, \ldots, |N|$ is a time index corresponding to the time period of the move. For example, the cost of moving from start city 0 to city $j$ in time period 0 is represented as $c^0_{0,j}$.
The arc-based formulation for the TD-TSP occurs over a time-expanded network [1]. The $x$ variables are now indexed not only by $i$ and $j$, but also by $t$, giving us a cubic number of variables. We show the IP formulation here:

$\min \;\; \sum_{i \in N \cup 0} \sum_{j \in (N \cup 0) \setminus i} \sum_{t \in \{0,\ldots,|N|\}} c^t_{i,j}\, x^t_{i,j}$  (3.7a)

$\text{s.t.} \;\; \sum_{j \in (N \cup 0) \setminus i} (x^t_{j,i} - x^{t+1}_{i,j}) = 0 \quad \forall\, i \in N \cup 0,\; t \in \{0, 1, \ldots, |N|\}$  (3.7b)

$\sum_{t \in \{0,1,\ldots,|N|\}} \sum_{j \in (N \cup 0) \setminus i} x^t_{i,j} = 1 \quad \forall\, i \in N \cup 0$  (3.7c)

$\sum_{t \in \{0,1,\ldots,|N|\}} \sum_{j \in (N \cup 0) \setminus i} x^t_{j,i} = 1 \quad \forall\, i \in N \cup 0$  (3.7d)

$\sum_{t \in \{0,1,\ldots,|N|\}} \sum_{i \in U} \sum_{j \in (N \cup 0) \setminus U} x^t_{j,i} \geq 1 \quad \forall\, \emptyset \neq U \subsetneq N \cup 0$  (3.7e)

$x^t_{i,j} \geq 0,\; x^t_{i,j} \in \mathbb{Z} \quad \forall\, i \in N \cup 0,\; j \in (N \cup 0) \setminus i,\; t \in \{0, 1, \ldots, |N|\}$  (3.7f)
In addition to the time-indexed variables and costs, the other difference between the TD-TSP arc-based formulation (3.7) and the arc-based formulation for the TSP (3.1) is constraint class (3.7b). These are flow balance constraints, necessary to ensure that the net flow into a node $i$ at time $t$ is equal to the net flow out of node $i$ at time $t + 1$. In other words, if we enter city $i$ at time $t$, we must leave city $i$ at time $t + 1$.
Just as we use the Held-Karp bound for the time-independent TSP, we can relax the integrality condition to obtain a similar bound for the TD-TSP. The polynomial-time separation routine is a min-cut problem, as it is for the classical TSP.
We also include the dynamic programming formulation for the TD-TSP, which corresponds to the DP formulation for the TSP (3.4). This LP is equivalent to the dual of the shortest path formulation:

$\max \;\; y_{0,N}$  (3.8a)

$\text{s.t.} \;\; y_{0,N} - y_{i,N \setminus i} \leq c^0_{0,i} \quad \forall\, i \in N$  (3.8b)

$y_{i,U \cup j} - y_{j,U} \leq c^{|N|-|U|-1}_{i,j} \quad \forall\, i \in N,\; j \in N \setminus i,\; U \subseteq N \setminus \{i,j\}$  (3.8c)

$y_{i,\emptyset} \leq c^{|N|}_{i,0} \quad \forall\, i \in N$  (3.8d)

$y_{0,N} \in \mathbb{R};\; y_{i,U} \in \mathbb{R} \quad \forall\, i \in N,\; U \subseteq N \setminus i$  (3.8e)
Toriello [123] demonstrates how to apply ALP techniques to efficiently obtain bounds for this DP formulation (3.8). Below is the ALP formulation:

$\max \;\; y_{0,N}$  (3.9a)

$\text{s.t.} \;\; y_{0,N} - \pi_{i,0} - \sum_{k \in N \setminus i} \pi_{i,k} \leq c^0_{0,i} \quad \forall\, i \in N$  (3.9b)

$\pi_{i,0} - \pi_{j,0} + \pi_{i,j} + \sum_{k \in U} (\pi_{i,k} - \pi_{j,k}) \leq c^{|N|-|U|-1}_{i,j} \quad \forall\, i \in N,\; j \in N \setminus i,\; U \subseteq N \setminus \{i,j\}$  (3.9c)

$\pi_{i,0} \leq c^{|N|}_{i,0} \quad \forall\, i \in N$

$y_{0,N} \in \mathbb{R};\; \pi \in \mathbb{R}$  (3.9d)
Whether or not this bound (3.9) is polynomial-time solvable depends on the cost structure. In our work, we consider the average-cost TSP, where the goal is to minimize the average cost of the tour over time. This would apply in a scenario where we are serving a number of customers and would like to minimize their average waiting time. For this special case, $c^t_{i,j} = (|N| - t)\, c_{i,j}$. The separation problem for this average-cost TSP is polynomial-time solvable, in a way similar to the routine for the time-independent TSP: we choose a $U$ to maximize the sum $\sum_{k \in U} (\pi_{i,k} - \pi_{j,k} - c_{i,j})$, which can be done greedily by adding $k$ to $U$ if $\pi_{i,k} - \pi_{j,k} - c_{i,j} > 0$ [123].
Rather than solve this ALP, however, we have found it less computationally intensive to solve the relaxation of the arc-based formulation (3.7). We utilize this method in our branch-and-bound algorithm implementation.
3.3 Vehicle Routing Problem
We define the VRP [38] over customers in $N$ and depot 0, where the cost to travel from location $i$ to location $j$ is $c_{i,j} \in \mathbb{R}$ for $i, j = 0, \ldots, |N|$ with $i \neq j$, and each customer has an integer-valued demand of $D_i$, $\forall\, i \in N$. A single vehicle with full capacity $Q$ begins at the depot 0 and must travel to all customers in order to satisfy their demand, returning to the depot once all customers have been satisfied. The vehicle can return to the depot between customers as often as needed to replenish its supply up to the maximum capacity of $Q$.
A route or subtour is an ordered subset of customers the vehicle should travel to, beginning and ending at the depot and visiting each customer in the subset only once. A feasible solution to the VRP is a collection of routes which, if followed, will leave all customers with zero remaining demand. In the unsplit delivery version of the problem, each customer may appear in the solution exactly once, meaning its entire demand was satisfied in a single visit. In the split delivery version of the problem, customers may appear in the solution multiple times across multiple routes, meaning their demand was satisfied over multiple visits.
The objective of the VRP is to find a feasible set of routes which satisfies all
customer demand fully and minimizes the total travel cost incurred by the vehicle.
4
Branch-and-Bound Algorithm and
Heuristic
In this section, we outline a branch-and-bound algorithm that finds TSP tours.
Branch-and-bound algorithms have been used extensively in the field of combinatorial
optimization, most often as a technique to solve IP formulations [84, 95]. We take a
different approach, as we use a DP-based branch-and-bound algorithm. We construct
the tour dynamically, beginning at the depot. As the algorithm progresses, we build a
search tree to help us decide which city to visit next, and by maintaining this search tree,
we give ourselves the opportunity to backtrack and explore different options.
4.1 Branch-and-Bound Framework
At any given point as we build a tour, we say we are at city $i \in N \cup 0$ with cities $U \subseteq N \setminus i$ left to visit. Initially, $i = 0$ and $U = N$. In the end, we return to city 0. A node $s_k$ in the search tree contains information including the current city and the remaining cities left to visit. Also associated with each node are the node's predecessor, $\delta^-(s_k)$, the node's successors, $\delta^+(s_k)$, a bound, $Y_{s_k}$, and the dual solution associated with it, $y_{s_k}$.
Throughout the algorithm, we maintain a list of nodes, $L = \{s_1, s_2, \ldots\}$, the minimum cost tour found up until that point, $\hat{T}$, and the cost of that tour, $c(\hat{T})$. We denote the optimal tour by $T^*$. The algorithm terminates when all nodes have been processed. At that point, $\hat{T}$ is known to be an optimal tour, $c(T^*) = c(\hat{T})$.
Processing a node, $s_k$, occurs in two steps. First, we check if the node can be fathomed (removed from $L$ without processing). We accomplish this by comparing the node's bound, $Y_{s_k}$, to the cost of the best tour found so far, $c(\hat{T})$. If $Y_{s_k} \geq c(\hat{T})$, we remove $s_k$ from $L$; no tour going through $s_k$ can have cost less than the cost of $\hat{T}$, and therefore it is unnecessary to further explore this branch of the search tree. Otherwise, we move to the second step of node processing, node splitting.
In the pseudocode for this algorithm, we refer to a number of general functions.
One of these functions is GetTour(), which takes a given node and returns the partial or
completed tour associated with it. Other functions will be addressed later on.
Node splitting is the procedure we follow to generate successor nodes to the current node, $s_k$. Typically in branch-and-bound algorithms, we split a node into only two successors. However, in this algorithm, we split the node into $O(n)$ successors, one for each remaining city left to visit. This corresponds to the decision we have to make when constructing the tour: if we are at city $i$ and have cities $U$ remaining, we must choose a city $j \in U$ to travel to next. Each successor represents a different choice.
In our algorithms, we refer to the node splitting step by the function NodeSplit(). This function takes an input node $s_k$ and outputs a list of the successor nodes of $s_k$. During node splitting, we create new nodes and add them to $L$. To do this, we must also assign a bound to each new node. Consider node $s_k$, where the current city is $i$, the remaining cities are $U$, and the partial tour is $T_{s_k}$, obtainable by tracing back through the node's predecessors. The bound is the sum of two quantities: the cost of the tour up until that node, $c(T_{s_k})$, and the approximate cost-to-go of being at that node and having to complete the tour from there, $y_{i,U}$:

$Y_{s_k} = y_{i,U} + c(T_{s_k})$  (4.1)
One option is to use the dual solution from the node’s predecessor to derive a
bound for the new node. Another option is to compute a new bound, by solving a
relaxation of the Hamiltonian path problem of moving to that node and then completing
the tour from there. This can be equivalently formulated as another TSP .
In order to inherit the bound from the node’s predecessor, we perform a simple
calculation. In his paper, Toriello [124] describes a few different ways to accomplish this.
First, we consider the case where we use the ALP solution to derive a bound and
solution for the successor node:
Example: Bound Inheritance from an ALP Solution

Given a node $s_k$, its bound $y_{q,U}$, and its ALP solution $\pi$.
Let $\hat{s}_k$ be a successor node to $s_k$ in need of a bound $y_{\hat{q},\hat{U}}$ and a solution $\hat{\pi}$.

$\hat{\pi}_{i,j} \leftarrow \pi_{i,j} \quad \forall\, i \in \hat{U},\; j \in \{\hat{U} \setminus i\} \cup \emptyset$
$y_{\hat{q},\hat{U}} \leftarrow \hat{\pi}_{\hat{q},\emptyset} + \sum_{k \in \hat{U}} \hat{\pi}_{\hat{q},k}$

Return $y_{\hat{q},\hat{U}}$ and $\hat{\pi}$ as the bound and solution, respectively, for node $\hat{s}_k$.
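In code, the inheritance step amounts to reading the child's cost-to-go bound off the restricted parent solution and adding the partial tour cost as in (4.1). A minimal sketch with hypothetical names, using the same dictionary layout for $\pi$ as before (the empty-set column stored under key 0):

```python
def inherit_alp_bound(pi, q_hat, U_hat, partial_cost):
    """Bound inheritance for a successor node (a sketch).

    The parent's ALP solution pi remains feasible on the child's smaller
    state space, so the child's cost-to-go bound can be read off directly:
    y[q, U] = pi[q][0] + sum of pi[q][k] over k in U. The node bound of
    (4.1) then adds the cost of the partial tour leading to the node.
    """
    cost_to_go = pi[q_hat][0] + sum(pi[q_hat][k] for k in U_hat)
    return partial_cost + cost_to_go
```

Because this is a dictionary lookup plus a sum, inheriting a bound is far cheaper than re-solving an LP, which is the appeal of the inheritance step.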
Now consider the situation where we must derive a bound and solution from a dual H-K solution (3.2), rather than an ALP solution:

Example: Bound Inheritance from a Dual H-K Solution

Given a node $s_k$, its bound $y_{q,U}$, and its dual H-K solution $(\lambda, \mu)$.
Let $\hat{s}_k$ be a successor node to $s_k$ in need of a bound $y_{\hat{q},\hat{U}}$ and a solution $(\hat{\lambda}, \hat{\mu})$.

$\hat{\lambda}_{i,j} \leftarrow \lambda_{i,j} \quad \forall\, i \in \hat{U},\; j \in \{\hat{U} \setminus i\} \cup \emptyset$
$\hat{\mu}_{i,j} \leftarrow \mu_{i,j} \quad \forall\, i \in N \setminus \hat{U},\; j \in \{N \setminus \{\hat{U} \cup i\}\} \cup \emptyset$
$y_{\hat{q},\hat{U}} \leftarrow \hat{\lambda}_{\hat{q},\emptyset} + \hat{\mu}_{\hat{q},\emptyset} + \sum_{k \in \hat{U}} \hat{\lambda}_{\hat{q},k} + \sum_{k \in N \setminus (\hat{U} \cup i)} \hat{\mu}_{\hat{q},k}$

Return $y_{\hat{q},\hat{U}}$ and $(\hat{\lambda}, \hat{\mu})$ as the bound and solution, respectively, for node $\hat{s}_k$.
One other alternative is to convert the dual H-K solution into an ALP solution and then compute the bound as shown in the first method. Toriello demonstrates how to perform this transformation in his paper [124]. We include the procedure here:

Example: Converting from a Dual H-K Solution to an ALP Solution

Given a dual H-K solution $(\lambda, \mu)$.
Let $\pi$ be the corresponding ALP solution.

$\pi_{i,\emptyset} \leftarrow \sum_{W \subseteq N \setminus i} \lambda_{W \cup i}$
$\pi_{i,j} \leftarrow \sum_{W \subseteq N \setminus \{i,j\}} \dfrac{\lambda_{W \cup j} + \mu_{W \cup j}}{|W| + 1}$

Return $\pi$ as the corresponding ALP solution; the bound for a node can then be computed from $\pi$ as in the first method.
We denote the bound assignment step by the functions InheritBound() and
ComputeBound(). Each function takes a node as input and then outputs the bound for
that node. That bound may either be inherited from the parent node, or it may be
determined by solving an LP .
If, when computing a bound for a new node, the solution to the relaxation is integral, then we have in fact solved the IP as well. We have thus identified an optimal way of completing the tour from that node onward. There is no need to add the node to the search tree. Instead, we can use the integer solution to obtain a completed tour.
Once we have a completed tour, whether from an integer solution as just described or from reaching a node at the bottom of the search tree with no cities left to visit, we compare the current tour, $T_{s_k}$, to $\hat{T}$. If $c(T_{s_k}) < c(\hat{T})$, we replace $\hat{T}$ with $T_{s_k}$. We have identified a solution which improves upon the previous best.
After processing a node, we remove it from $L$ and move to the next node in the list. When no nodes remain in $L$, we will have fully explored the search tree. $\hat{T}$ is in fact an optimal tour, $T^*$.
This general algorithm draws inspiration from a number of classical
branch-and-bound procedures. The bounds associated with each node can be
considered as a form of pseudocosts. Traditionally when implementing a pseudocost
approach, each integer variable has a pseudocost associated with it that is a quantitative
measure of its importance. For example, it may be a measure of its effect on the objective
function [22] or of the number of assignment conflicts it will cause [93]. Our
“pseudocosts” are in fact bounds corresponding to the estimated cost of choosing to visit
a particular city next, given the group of cities remaining on the tour. We use these
bounds to guide our decisions throughout the algorithm, just as pseudocosts do.
Strong branching is another classical branch-and-bound technique, often used in
conjunction with pseudocost branching [2, 88]. In strong branching, before deciding on
which integer variable to branch, we perform a preliminary test, typically by solving an LP
relaxation to obtain a bound [9]. In the algorithm we present, we also solve LPs to
compute bounds. The information obtained from these computations influences which
branches of the tree we explore.
Algorithm 1 A General Branch-and-Bound Algorithm
  T̂ ← ∅; c(T̂) ← ∞
  L ← {s_0}
  Y_{s_0} ← ComputeBound(s_0)
  m ← 0
  while L ≠ ∅ do
    choose s_k ∈ L
    if Y_{s_k} ≥ c(T̂) then
      L ← L \ {s_k}
    else
      l_k ← NodeSplit(s_k)
      for i = 1, ..., |l_k| do
        Z_k ← ComputeBound(l_k[i])  (or InheritBound(l_k[i]))
        if Z_k < c(T̂) then
          s_{m+1} ← l_k[i]
          Y_{s_{m+1}} ← Z_k
          T ← GetTour(s_{m+1})
          if T is not a complete tour then
            L ← {s_{m+1}} ∪ L
            m ← m + 1
          else if T is a complete tour and c(T) < c(T̂) then
            T̂ ← T
          end if
        end if
      end for
      L ← L \ {s_k}
    end if
  end while
  return T̂
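Algorithm 1 can be sketched compactly in Python. To keep the example self-contained and runnable, ComputeBound is replaced here by a simple combinatorial lower bound (partial path cost plus each unvisited city's cheapest feasible outgoing arc), not the LP and ALP bounds discussed in the text:

```python
import math

def branch_and_bound_tsp(c):
    """Sketch of Algorithm 1 on a cost matrix c, with city 0 as the depot.
    A node is a (path, remaining) pair.  The bound below is a stand-in
    for the LP bounds in the text: partial path cost plus, for each city
    that still needs an outgoing arc, its cheapest feasible outgoing arc
    (a valid lower bound on any completion)."""
    n = len(c)

    def bound(path, remaining):
        cost = sum(c[path[t]][path[t + 1]] for t in range(len(path) - 1))
        last = path[-1]
        if not remaining:
            return cost + c[last][0]
        cost += min(c[last][j] for j in remaining)      # arc leaving the current city
        for i in remaining:                             # one arc leaving each remaining city
            cost += min(c[i][j] for j in (remaining - {i}) | {0})
        return cost

    best_tour, best_cost = None, math.inf
    L = [((0,), frozenset(range(1, n)))]                # L <- {s_0}
    while L:                                            # while L is nonempty
        path, remaining = L.pop(0)                      # choose s_k in L
        if bound(path, remaining) >= best_cost:         # fathom s_k
            continue
        for j in remaining:                             # NodeSplit
            child = (path + (j,), remaining - {j})
            z = bound(*child)                           # ComputeBound
            if z >= best_cost:
                continue
            if child[1]:                                # not yet a complete tour
                L.insert(0, child)                      # depth-first insertion
            else:
                tour = child[0] + (0,)
                cost = sum(c[tour[t]][tour[t + 1]] for t in range(len(tour) - 1))
                if cost < best_cost:
                    best_tour, best_cost = tour, cost
    return best_tour, best_cost
```

Since every node is either processed or fathomed against a valid lower bound, the sketch returns an optimal tour, mirroring the argument given above for the general algorithm.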
4.2 Traversing the Search Tree
There are a number of policies we may follow as we traverse the search tree. Two
natural possibilities would be to use either breadth-first search (BFS) or depth-first
search (DFS). With BFS, we have the advantage of exploring more branches of the tree
earlier on in the algorithm. However, by the nature of this search, we will spend a lot of
time processing nodes at the top of the tree before ever reaching the bottom nodes. This
makes fathoming nodes less likely, since we will not be updating T̂ very often.
In contrast, with DFS we will reach the bottom of the tree in no more than n steps,
giving us a completed tour in as few steps as possible. This gives us a chance to update
T̂ more frequently than with BFS, since we are emphasizing depth over breadth in our
search. However, we may explore many unfruitful branches.
Another search strategy would be a greedy, or best-first-search, approach. Once a
node has been processed, choose the node with the minimum bound to process next.
This is precisely the A* graph search algorithm first proposed by Hart, Nilsson, and
Raphael [72, 73]. The benefit of this approach is that we will always move to the most
appealing node in terms of its bound. We will not waste time exploring nodes with poor
bounds. However, we lose the n-step guarantee on the maximum number of
steps before finding a completed tour that DFS afforded us. Therefore, we may not be
updating T̂ and fathoming nodes as frequently as we would like.
To address this concern, we traverse the search tree by utilizing a greedy “plunging”
strategy. That is, we start from an initial node and perform a “plunge” down to the
bottom of the tree, following a greedy DFS policy. Once we reach the bottom of the tree,
we then identify a node, s_k, whose bound Y_{s_k} is minimum. We move to that node,
and from there, we execute another "plunge" down the tree. This technique allows us to
obtain a completed tour at least once every n steps. This assures us the opportunity to update
T̂ regularly, and lower c(T̂) values create the possibility for more nodes to be
fathomed. By following a greedy strategy after each "plunge," we allow our bounds to
inform our decision about which part of the tree to explore next.
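A stripped-down sketch of this plunging traversal follows; the expand function, which returns (bound, child) pairs, stands in for the node-splitting and bounding machinery, and fathoming is omitted for clarity:

```python
def plunge_search(root, expand, is_leaf):
    """Greedy 'plunging' traversal (sketch).  expand(node) returns a
    list of (bound, child) pairs; from the chosen node we follow the
    minimum-bound child down to a leaf (one plunge), keeping the
    siblings open.  Each new plunge restarts from the open node of
    minimum bound."""
    open_nodes = [(0.0, root)]
    leaves = []
    while open_nodes:
        open_nodes.sort(key=lambda bn: bn[0])
        bound, node = open_nodes.pop(0)        # best-first choice of plunge start
        while not is_leaf(node):               # the plunge: greedy depth-first
            children = sorted(expand(node), key=lambda bn: bn[0])
            if not children:
                break
            (bound, node), siblings = children[0], children[1:]
            open_nodes.extend(siblings)        # siblings remain open
        else:
            leaves.append((bound, node))       # reached the bottom: a tour
    return leaves
```

A heap would be the idiomatic container for the open list; the explicit sort above simply keeps the sketch short.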
In their 2012 paper, Cheong and White propose a real-time tour-construction
algorithm for a stochastic version of the TSP , where arc costs are random [30]. They
discuss an AO* algorithm, using an assignment bound as a heuristic function, and while
it is a best-first-search strategy, nodes are weighted based on their depth. Nodes nearer
to the start node are less likely to be chosen than nodes further down the tree. Because of
the stochastic arc costs, another dimension is added to the state space, and the size of the
search tree increases significantly. Nonetheless, Cheong and White’s work with the
stochastic TSP may provide us with some useful insight to our algorithm for the
deterministic problem. For one, their work appears to reaffirm our decision to combine the
benefits of a best-first-search strategy with a DFS approach.
Regardless of the specific way in which we move about the search tree, there may be
times when we encounter a node with a redundant bound. That is, the subproblem we
solve to obtain the bound for this node is identical to a subproblem already solved at a
previously visited node. Two nodes will have an identical bound if the current city i and
the set of remaining cities U ⊆ N are the same. However, the paths leading up to the two
nodes will be distinct permutations of the set of cities N \ (U ∪ {i}).
For this reason, we maintain a list of computed bounds and solutions for each
subproblem we encounter. Before computing a bound for a new node, we check the list
to see if the work has already been done. If we are using inherited bounds, we compare
the inherited bound from the node’s parent to the best inherited bound for that
subproblem that we have found up until that point. We choose to use the tighter bound
and store it in the list.
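This bookkeeping can be sketched with a small cache keyed by (current city, frozenset of remaining cities); the class and method names are illustrative:

```python
class BoundCache:
    """Stores, for each subproblem (current city i, remaining cities U),
    the tightest lower bound seen so far.  Distinct paths that reach the
    same (i, U) pair share a single entry, so the corresponding LP is
    never solved twice."""

    def __init__(self):
        self._best = {}

    def tighten(self, i, U, bound):
        """Record `bound` for subproblem (i, U); return the tighter of
        the stored and candidate values (for lower bounds, larger is
        tighter)."""
        key = (i, frozenset(U))
        stored = self._best.get(key)
        if stored is None or bound > stored:
            self._best[key] = bound
        return self._best[key]

    def known(self, i, U):
        return (i, frozenset(U)) in self._best
```

The frozenset key is what makes two nodes reached via different permutations of the visited cities map to the same entry.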
On the following pages, we present a graphic illustration of the search procedure.
We illustrate how the algorithm progresses on a sample four-city instance by including a
step-by-step depiction of the nodes visited in the search tree during a single plunge.
[Search-tree diagram: from the root node (0, {1,2,3}), the children (1, {2,3}),
(2, {1,3}), and (3, {1,2}) carry the bounds c_{0,1} + y_{1,{2,3}}, c_{0,2} + y_{2,{1,3}},
and c_{0,3} + y_{3,{1,2}}.]
Figure 4.1: Choose next city to visit using approx. costs-to-go.
[Search-tree diagram: having moved to (2, {1,3}), the plunge continues; the children
(1, {3}) and (3, {1}) carry the bounds c_{0,2} + c_{2,1} + y_{1,{3}} and
c_{0,2} + c_{2,3} + y_{3,{1}}.]
Figure 4.2: Continue to "plunge" down the tree. Compute bounds at nodes.
[Search-tree diagram: the plunge reaches the bottom of the tree, yielding the completed
tour cost c_{0,2} + c_{2,1} + c_{1,3} + c_{3,0}.]
Figure 4.3: Once we have a completed tour, we can use it to fathom other nodes.
[Search-tree diagram: the remaining open nodes carry the bounds c_{0,1} + y_{1,{2,3}},
c_{0,2} + c_{2,3} + y_{3,{1}}, and c_{0,3} + y_{3,{1,2}}; the minimum-bound node is
chosen as the next plunge start.]
Figure 4.4: Find a new node to "plunge" from. Check integrality as we go down.
4.3 Modification to a Heuristic
In order to guarantee optimality, we must process or fathom every node in the
search tree through the procedure described above. However, computation times
may be quite long, since the number of nodes is exponential in the size of the problem.
We can significantly reduce the size of the search tree, and the total
computation time, if we can increase the number of nodes that we fathom. One way to
achieve this is by adapting the algorithm into a heuristic by relaxing our fathoming rules.
To accomplish this, we incorporate a modification to the node-processing step.
Specifically, when splitting a node s_k, where the current city is i and the remaining cities
are U, we first assign a temporary bound to each new node using the dual solution
inherited from s_k. We then sort the new nodes in increasing order of these temporary
bounds and begin to compute updated bounds for each node, in this order.
Call Ŷ the computed bound for the first node, and set a threshold parameter ρ ≥ 1.
Say the temporary bound for node s_q is Z̃_{s_q}. If Z̃_{s_q} > ρŶ, we fathom node s_q and do not
add it to the list. We make this comparison for all of the remaining nodes in the sorted
list, and then move to the next node. We update Ŷ so that it is always the minimum
computed bound. At the end, we perform the same comparison a final time using the
nodes' computed bounds.
The idea of this procedure is to prevent us from adding new nodes that do not look
appealing. By setting ρ to its minimum value of 1, we only add a new node when it ties
for the minimum computed bound. As ρ increases toward ∞, we allow more nodes to
remain in the tree.
Algorithm 2 A General Branch-and-Bound Algorithm using a Threshold Heuristic, ρ
  L ← {s_0}
  Y_{s_0} ← ComputeBound(s_0)
  m ← 1
  while L ≠ ∅ do
    choose s_k ∈ L : k is minimum
    Ŷ ← ∞
    if Y_{s_k} ≥ c(T̂) then
      L ← L \ {s_k}
    else
      l_k ← NodeSplit(s_k)
      for i = 1, ..., |l_k| do
        Z̃_k[i] ← InheritBound(l_k[i])
      end for
      sort l_k and Z̃_k in increasing order of the inherited bounds Z̃_k
      for i = 1, ..., |l_k| do
        if Z̃_k[i] ≤ ρŶ then
          Z_k[i] ← ComputeBound(l_k[i])
          if Z_k[i] < Ŷ then
            Ŷ ← Z_k[i]
          end if
        else
          l_k ← l_k \ {l_k[i]}
        end if
      end for
      if Ŷ < c(T̂) then
        for i = 1, ..., |l_k| do
          if Z_k[i] ≤ ρŶ then
            s_{m+1} ← l_k[i]
            Y_{s_{m+1}} ← Z_k[i]
            T ← GetTour(s_{m+1})
            if T is not a complete tour then
              L ← {s_{m+1}} ∪ L
              m ← m + 1
            else if T is a complete tour and c(T) < c(T̂) then
              T̂ ← T
            end if
          end if
        end for
      end if
      L ← L \ {s_k}
    end if
  end while
  return T̂
4.4 Experimental Results
In the following section, we present experimental results for our branch-and-bound
algorithm and heuristic. This includes results for the traditional TSP and the TD-TSP.
Test instances for these experiments are TSP instances from TSPLIB [111], ranging up to
107 cities. Experiments for the TSP were performed using the LP solver CPLEX 9.0 on a
Dell workstation with dual Intel Xeon 3.2 GHz processor and 2 GB RDRAM.
Experiments for the TD-TSP were performed using the Georgia Institute of Technology
ISyE computational Linux cluster using CPLEX 12.
4.4.1 Experimental Results for the TSP
We include four tables of data on our experiments using our branch-and-bound
heuristic on the traditional TSP. The first two tables contain results of experiments using
the heuristic where bounds are computed for every node by solving an LP subproblem,
as described in the previous section. The other two tables reflect experiments where all
bounds are inherited from the initial dual solution. We also vary the heuristic threshold
parameter ρ.
The lower the value of ρ, the more nodes we discard during the algorithm. This
increases the chance that we discard a node through which an optimal tour runs,
decreasing the chance that our algorithm terminates with an optimal tour. In the extreme
case where ρ = ∞, we have the exact algorithm. In the case where ρ = 1.0, we essentially
have a price-directed, single-pass approach. We include results from experiments on the
TSP where ρ = 1.0 and ρ = 1.1.
In these tables, the first few columns contain information about the exact algorithm,
including the optimal value, Opt., the number of nodes processed, Nodes, and the
runtime, Runtime. The remainder of the table contains information about the heuristic,
including the best tour cost found by the heuristic, Heur., the ratio of this value to the
optimum, Heur./Opt., the number of nodes processed, Nodes, and the runtime, Runtime.
We can draw a few conclusions from the results. As expected, there is a
trade-off that occurs when choosing between computing or inheriting bounds. When
inheriting bounds, we generally consider a much larger number of nodes and terminate
in a much shorter amount of time. However, the quality of the solution is typically
worse.
For most instances, when we inherit the bounds, the runtime does not exceed a
second. Conversely, when we compute the bounds, the heuristic can require minutes
or hours to terminate. The other side of this trade-off is that in the former case the output
tour can cost as much as 20-30% more than the optimal tour. When we perform
bound computation, the heuristic always finds a solution within 1% of the optimum.
Finally, varying ρ did not seem to have a dramatic effect on the solution quality,
particularly when bound computation is being done. It does, however, influence the
runtime: the higher value of ρ leads to more nodes being considered, and thus the
runtime generally increases.
Instance  Opt.  Nodes  Runtime (s)
gr17 2085 5 4
gr21 2707 5 15
gr24 1272 12 41
ftv33 1286 2 168
ftv35 1473 66 417
ftv38 1530 88 692
dantzig42 699 73 6515
swiss42 1273 49 7009
ftv44 1613 90 2053
ry48p 14422 73 11378
Figure 4.5: TSP Exact Algorithm
Instance  Opt.  Heur.  Heur./Opt.  Nodes  Runtime (s)
gr17 2085 2088 100.14% 1 1
gr21 2707 2707 100.00% 1 1
gr24 1272 1272 100.00% 2 2
ftv33 1286 1286 100.00% 1 4
ftv35 1473 1473 100.00% 60 86
ftv38 1530 1530 100.00% 144 144
dantzig42 699 700 100.14% 31 683
swiss42 1273 1273 100.00% 53 797
ftv44 1613 1615 100.12% 210 561
ry48p 14422 14507 100.59% 292 3904
Figure 4.6: TSP Heuristic with Computed Bounds (ρ = 1.00)
Instance  Opt.  Heur.  Heur./Opt.  Nodes  Runtime (s)
gr17 2085 2088 100.14% 1 1
gr21 2707 2707 100.00% 1 1
gr24 1272 1272 100.00% 4 18
ftv33 1286 1286 100.00% 1 5
ftv35 1473 1473 100.00% 58 273
ftv38 1530 1530 100.00% 131 582
dantzig42 699 700 100.14% 32 6771
swiss42 1273 1273 100.00% 109 15645
ftv44 1613 1615 100.12% 233 3422
ry48p 14422 14507 100.59% 1093 88983
Figure 4.7: TSP Heuristic with Computed Bounds (ρ = 1.10)
Instance  Opt.  Heur.  Heur./Opt.  Nodes  Runtime (s)
gr17 2085 2187 104.89% 27 < 1
gr21 2707 3098 114.44% 34 < 1
gr24 1272 1553 122.09% 22 < 1
ftv33 1286 1683 130.87% 32 < 1
ftv35 1473 1791 121.59% 34 < 1
ftv38 1530 1778 116.21% 71 < 1
dantzig42 699 954 136.48% 98 < 1
swiss42 1273 1601 125.77% 188 < 1
ftv44 1613 2014 124.86% 89 < 1
ry48p 14422 16757 116.19% 1246 < 1
Figure 4.8: TSP Heuristic with Inherited Bounds (ρ = 1.00)
Instance  Opt.  Heur.  Heur./Opt.  Nodes  Runtime (s)
gr17 2085 2187 104.89% 89 < 1
gr21 2707 3098 114.44% 129 < 1
gr24 1272 1553 122.09% 34 < 1
ftv33 1286 1501 116.72% 782 < 1
ftv35 1473 1756 119.21% 515 < 1
ftv38 1530 1732 113.20% 8361 2
dantzig42 699 838 119.89% 3567 1
swiss42 1273 1400 109.98% 4720 1
ftv44 1613 1980 122.75% 14619 5
ry48p 14422 15787 109.46% 85655 82
Figure 4.9: TSP Heuristic with Inherited Bounds (ρ = 1.10)
4.4.2 Experimental Results for the TD-TSP
We have also performed experiments using our branch-and-bound heuristic on the
TD-TSP. Test instances for these experiments are again TSP instances adapted from
TSPLIB [111]. To conduct a comparison, we used instances tested by Abeledo et al. in
their recent paper [1]. We have categorized these problems into easy and difficult
instances, based on the runtimes they report. We define an easy instance to be one with a
runtime reported by Abeledo et al. of under 6 hours; difficult instances had reported
runtimes of greater than 10 hours. In cases where optimal solutions were not found, best
known bounds are reported.
We solve a version of the TD-TSP known as the average-cost TSP. If we interpret the
costs as times and the cities as customers, the objective of the average-cost TSP is to
minimize the average waiting time over all customers. In other words, we are
minimizing the sum of the costs along the path from the depot city to every other city on
the tour, including the return trip to the depot. Thus, we can write these costs as
c^t_{ij} = (|N| + 1 − t) c_{ij}, where c_{ij} is the fixed cost of moving from i to j and t is the
position of the arc in the tour.
We solve an initial LP relaxation at the root node and then use bound inheritance
throughout the remainder of our search. We also performed experiments where bounds
were recomputed at nodes other than the root, but in these cases, the increase in runtime
was severe. For this reason, we rely purely on bound inheritance.
The bound we compute at the root node is the assignment bound for the TD-TSP
(3.7) without the subtour elimination constraints. This decision was made in the interest
of decreasing the runtime. The assignment bound, though weaker than the Held-Karp
bound, can be solved in significantly less time. In the table below, we show the results of
these assignment bound solves compared to the best known bound for each instance [1].
Tight bounds are listed in bold.
Instance  Best Found  Assignment Bound  % Below Best  Runtime (s)
st70 20557 13805.48 32.84 1202
eil76 17976 15347.13 14.62 1406
rd100 340047 242879.23 28.57 6026
kroB100 986008 614783.20 37.65 4342
lin105 603910 336972.17 44.20 7447
pr107 2026626 1213256.07 40.13 5575
pr76 3455242 2187939.48 36.68 1612
gr96 2097170 1454000.49 30.67 2825
rat99 57986 45982.03 20.70 6170
kroA100 983128 633164.25 35.60 4503
kroC100 961324 598284.10 37.76 3202
kroD100 976965 587324.81 39.88 4781
kroE100 971266 615928.09 36.59 4751
Table 4.1: TD-TSP Assignment Bounds
The remaining tables depict the results of the corresponding tour generation
experiments. Each table contains results across all instances for a certain value ofr
(ranging from 1.00 to 1.01). We show the cost of the best tour found by our heuristic and
related it to the value of the best bound. We have also included the total number of
nodes explored in the search tree and the runtime of the branch-and-bound procedure
(in addition to the time required to compute the bound at the root).
The best tour found for each instance across all experiments is shown in bold (ties
broken by shorter runtime), while experiments that failed to conclude within a
predetermined time limit have been italicized. For easy instances, we set the time limit to
be 6 hours of additional runtime. For difficult instances, we allow 12 hours. In our tables,
easy instances are grouped together at the top, and difficult instances at the bottom.
Instance  Tour Cost  % Over Bound  Total Nodes  Addl. Runtime (s)
st70 23676 15.17 126 0
eil76 19278 7.24 287 0
rd100 394888 16.13 148 0
kroB100 1186690 20.35 161 0
lin105 709260 17.44 183 0
pr107 2154440 6.31 1864 2
pr76 3746270 8.42 149 0
gr96 2397660 14.33 219 0
rat99 60947 5.11 1521 0
kroA100 1122570 14.18 166 0
kroC100 1071590 11.47 99 0
kroD100 1131010 15.77 99 0
kroE100 1151860 18.59 255 0
Table 4.2: TD-TSP Tour Generation (ρ = 1.0000)
Instance  Tour Cost  % Over Bound  Total Nodes  Addl. Runtime (s)
st70 23676 15.17 449 0
eil76 19117 6.35 1568 0
rd100 385216 13.28 7731 4
kroB100 1048810 6.37 1895 0
lin105 695440 15.16 1704 0
pr107 2154440 6.31 9593 2
pr76 3715640 7.54 163 0
gr96 2397660 14.33 18735 4
rat99 60947 5.11 1521 0
kroA100 1122570 14.18 12043 11
kroC100 1034920 7.66 5511 0
kroD100 1131010 15.77 228 0
kroE100 1091560 12.39 33346 10
Table 4.3: TD-TSP Tour Generation (ρ = 1.0025)
Instance  Tour Cost  % Over Bound  Total Nodes  Addl. Runtime (s)
st70 23462 14.13 14916 15
eil76 19117 6.35 12079 0
rd100 394888 16.127 878675 21565
kroB100 1044820 5.96 60266 21
lin105 636408 5.38 719470 16532
pr107 2154440 6.31 139773 715
pr76 3704740 7.22 1148 0
gr96 2397660 14.33 1108943 42781
rat99 60947 5.11 1521 0
kroA100 1057060 7.52 788235 16136
kroC100 1034920 7.66 108354 797
kroD100 1131010 15.77 5615 0
kroE100 1065160 9.67 1305971 36227
Table 4.4: TD-TSP Tour Generation (ρ = 1.0050)
Instance  Tour Cost  % Over Bound  Total Nodes  Addl. Runtime (s)
st70 21986 6.95 268337 2757
eil76 19117 6.35 385362 6164
rd100 394888 16.127 1107635 21547
kroB100 1069850 8.503 895795 20072
lin105 673307 11.49 755739 21260
pr107 2154440 6.31 687396 21554
pr76 3623720 4.88 117671 186
gr96 2397660 14.33 756835 43076
rat99 60947 5.11 1521 0
kroA100 1122570 14.18 1081437 35300
kroC100 1071590 11.470 889537 42821
kroD100 1131010 15.77 1533724 46862
kroE100 1106840 13.96 1543946 43122
Table 4.5: TD-TSP Tour Generation (ρ = 1.0100)
As depicted in Table 4.1, in most cases we can compute assignment bounds for
these instances in between roughly 20 and 90 minutes. The lone exception is lin105,
which took approximately 2 hours to complete. Despite the fact that these bounds are
between 15% and 45% below the best known bound, they are still effective as costs-to-go
during our tour generation procedure. Furthermore, we notice that for the most part, the
better the bound used, the better the solutions found during tour generation. The
remainder of this section summarizes the results of these experiments.
For small values of ρ, the additional runtime required by the tour generation
procedure was inconsequential. However, as ρ was increased, the additional runtime
grew dramatically, forcing timeouts. The reason for this is reflected in the node
count: higher values of ρ led to an explosion in the number of nodes explored, and thus
the algorithm takes much longer to terminate. This validates our reasons for introducing
the ρ threshold parameter. By manipulating the value of ρ, we can limit the amount
of time we spend exploring the search tree overall, and instead focus our exploration on
the most appealing branches.
While we were able to dramatically affect the runtime, we observed no clear trend
in terms of which value of ρ produced the highest-quality solutions. The best ρ to use
seems to be highly instance-dependent. In some cases, we find a good solution
with a low value of ρ, and increasing it only inflates the runtime without leading to
a better solution. This suggests that for some instances, a pure price-directed policy
performs quite well in comparison to an expanded search. In other cases, by increasing
ρ, the algorithm is able to find improved tours in parts of the tree previously unexplored,
and the price-directed policy is improved upon by our heuristic.
Quite often, we are able to find tours within about 5% of the best known bound.
Most notable among these results are the experiments for difficult instances pr76 and
rat99. For pr76, we can find a solution 4.88% above optimal in under 30 minutes total
runtime. For rat99, we find a solution 5.11% above the best known bound in under 2
hours total runtime.
Our computational experiments have shown that our branch-and-bound algorithm
can be an effective tour generation heuristic for the TD-TSP. By varying the value of the ρ
parameter, we were able to find a solution within 4.88% to 7.66% of the best
known bound for 11 of the 13 instances tested. However, the main advantage of this
approach is the improvement in runtime. For the instances we categorized as easy, we
obtained solutions in no longer than 6 hours. For difficult instances, runtimes were at
most 12 hours, although in several cases, we were able to find good solutions in a matter
of minutes, not hours.
One of the greatest strengths of our branch-and-bound TSP heuristic is its
versatility. We have the freedom to use a wide variety of bounds in the framework of
this algorithm, and these bounds can be unique to the structure of the problem. Also, by
constructing tours dynamically, we can account for additional constraints posed by
variants of the TSP , particularly ones that can be formulated as a dynamic program. Our
experiments have demonstrated this approach to be effective as a heuristic algorithm for
the TD-TSP in particular. We can envision adaptations of the heuristic for other traveling
salesman and vehicle routing problems. Future work might include the development
and testing of such heuristic adaptations.
54
5 Stochastic TSP Heuristic Policy
Our methodology can also be used for stochastic routing problems. In related work,
we implemented a branch-and-bound heuristic policy to generate solutions for the
dynamic TSP with stochastic arc costs [125]. In the following section, we will introduce
this problem and then discuss how our methodology can be adapted into a heuristic
policy for this stochastic application.
5.1 Dynamic TSP with Stochastic Arc Costs
We define the dynamic TSP to be identical to the deterministic TSP, except that the
arc costs are random. When constructing a tour dynamically, the arc costs out of a node
are realized upon arrival to that node. The objective is to determine a policy which
minimizes the expected cost of the tour. Given the current city, the remaining cities, and
the realization of outgoing arc costs, the policy will choose which city to visit next.
We represent the outgoing random cost vector at city i as C_i = (C_{i,j} : j ∈ (N ∪ 0) \ i).
While arc costs out of the same city may be correlated, all C_i are pairwise independent
and do not depend on the remaining cities. Let 𝒞_i ⊆ R^n be the support set of C_i, which we
assume is compact.
The dynamic TSP can be formulated as a dynamic program similar to the
deterministic TSP. A modification must be made to the state space, as a state is now
given by three pieces of information. As before, a state indicates the current city, i, and
the subset of remaining cities, U ⊆ N. But now it must also include the realized vector of
outgoing costs. We denote the state space as follows:

S = {(i, U, c_i) : i ∈ N, U ⊆ N \ i, c_i ∈ 𝒞_i} ∪ {(0, N, c_0) : c_0 ∈ 𝒞_0} ∪ {(0, ∅)}.
When we are in state (i, U, c_i), the set of possible actions is the set of
cities we have not yet visited, U. The cost of moving to city j ∈ U is simply c_{i,j}, which
was realized upon arriving at city i.
Call y_{i,U}(c_i) the expected cost-to-go from state (i, U, c_i). This yields the following LP
formulation:
max  E[y_{0,N}(C_0)]                                                        (5.1a)
s.t.  y_{0,N}(c_0) − E[y_{i,N\i}(C_i)] ≤ c_{0,i}    ∀ i ∈ N, c_0 ∈ 𝒞_0      (5.1b)
      y_{i,U∪j}(c_i) − E[y_{j,U}(C_j)] ≤ c_{i,j}    ∀ i ∈ N, j ∈ N, U ⊆ N \ {i, j}, c_i ∈ 𝒞_i    (5.1c)
      y_{i,∅}(c_i) ≤ c_{i,0}    ∀ i ∈ N, c_i ∈ 𝒞_i                          (5.1d)
      y_{0,N}(c_0) ∈ R ∀ c_0 ∈ 𝒞_0;  y_{i,U}(c_i) ∈ R ∀ i ∈ N, U ⊆ N \ i, c_i ∈ 𝒞_i    (5.1e)
We can again use the expected costs-to-go to guide our decisions as we dynamically
generate tours. By solving a restriction of the LP, we can find approximate costs-to-go,
similar to the deterministic case. Toriello, Haskell, and Poremba present an ALP bound
solvable in polynomial time for certain support sets [125]. This bound does not need
complete information on the arc cost distributions; instead, it only requires the expected
arc costs and the support sets 𝒞_i for all i ∈ N ∪ 0. While this technique allows us to efficiently
model and bound the dynamic TSP, in practice, obtaining these bounds can be
computationally challenging. To address this issue, we implemented a price-directed
heuristic policy which approximates the costs-to-go for intermediate states by solving a
subproblem with deterministic costs.
5.2 Price-Directed Heuristic Policy
To develop a heuristic policy for the dynamic TSP , we can adapt the same
methodology we are using for deterministic traveling salesman problems. In this case,
however, instead of using approximate costs-to-go in a branch-and-bound procedure,
we utilize them in a price-directed policy. Rather than construct a large search tree, we
generate a tour step-by-step in a single pass. By simulating over a statistically significant
number of iterations, we can test the performance of the policy.
When we are at a city i with remaining cities U ⊆ N \ i, we also observe costs
c_i ∈ 𝒞_i. We decide what action to take by comparing approximate costs-to-go for the
remaining cities and choosing the minimum; ties are broken arbitrarily. In our policy, we
obtain the approximate costs-to-go by solving a subproblem for each city j ∈ U. These
are the same subproblems that we solve in the deterministic case: for city
j, we solve the LP relaxation of the minimum-cost j-0 Hamiltonian path problem
on the cities U ∪ 0, replacing the random arc costs with their respective expectations.
While this does not yield a lower bound, it does allow us to obtain approximate costs-to-go more
quickly than solving the ALP formulation.
Compared to the AO* exact algorithm for the D-TSP proposed by Cheong and
White [30], this heuristic runs much faster, a crucial benefit. In our price-directed
approach, we construct a tour in a single pass, while the AO* approach builds a tour by
exploring a search tree; in the worst case, every node in this tree must be explored.
Notice that the state space of the nodes in this tree, relative to the tree in our
deterministic branch-and-bound algorithm, has an added dimension: given that we are at
city i with cities U left to visit, there is a node for every possible cost realization of the arc
into city i, contributing greatly to the runtime of the exact approach.
Algorithm 3 Price-Directed Heuristic Policy
  T ← (0)
  U ← N
  for t = 0, ..., |N| − 1 do
    i ← T[t]
    c_i ← RealizeCost(C_i)
    for j ∈ U do
      s_j ← (j, U \ j, C_j)
      Y_{s_j} ← SolveSubproblem(s_j) + c_{i,j}
    end for
    i ← argmin_{j ∈ U} Y_{s_j}
    append i to T
    U ← U \ i
  end for
  append 0 to T
  return T
5.3 Experimental Results
We now present the results of a number of computational experiments done to
evaluate the performance of our heuristic policy for the D-TSP. These results can also be
found in the paper by Toriello, Haskell, and Poremba [125]. We generated test instances
for these experiments using asymmetric TSP instances from TSPLIB [111], using the ftv
instances with no more than 44 cities. All experiments were performed using the LP solver
CPLEX 9.0 on a Dell workstation with dual Intel Xeon 3.2 GHz processor and 2 GB
RDRAM.
5.3.1 Test Instance Design
For each deterministic instance, we created two stochastic instances, each of a
different type. One type has independently distributed arc costs with two possible
realizations, high and low cost. If an arc is at high cost, the deterministic cost is
multiplied by a factor H = 1+a
H
; if at low cost, the deterministic cost is instead
multiplied by a factor L= 1a
L
. When performing an experiment, we must set two
parameters: a
H
and Pr(H). a
H
represents the increment factor for high arc costs, while
Pr(H) is the probability of the high arc cost being observed. It follows that
Pr(L)= 1 Pr(H), where Pr(L) is the probability of the low arc cost occuring.
In order to benchmark our instances, we design the experiment so that the optimal
expected cost of a fixed tour is equal to the optimal cost of the deterministic instance.
Therefore, we choose α_L so as to achieve this:

α_L = α_H Pr(H) / Pr(L)
The second variety of stochastic instance we create is similar in that the arc costs can
be either high or low, and the parameters are all defined in the same fashion. However, the
outgoing arc costs for a city are no longer independent; they are correlated: arcs out of a
city are either all high or all low.
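The two instance types can be sketched as samplers over a deterministic cost matrix. By construction, the expected multiplier H·Pr(H) + L·Pr(L) equals 1, which is what makes the deterministic optimum a valid benchmark; the function and parameter names below are illustrative:

```python
import random

def make_sampler(c, alpha_h, pr_h, correlated, seed=0):
    """Sample outgoing cost vectors for the two stochastic instance
    types.  High arcs are scaled by H = 1 + alpha_h, low arcs by
    L = 1 - alpha_l, with alpha_l = alpha_h * Pr(H) / Pr(L) so that the
    expected multiplier H*Pr(H) + L*Pr(L) is 1."""
    pr_l = 1.0 - pr_h
    alpha_l = alpha_h * pr_h / pr_l
    H, L = 1.0 + alpha_h, 1.0 - alpha_l
    assert abs(H * pr_h + L * pr_l - 1.0) < 1e-12
    rng = random.Random(seed)

    def realize(i):
        if correlated:                      # arcs out of i jointly high or low
            m = H if rng.random() < pr_h else L
            return [m * cij for cij in c[i]]
        return [(H if rng.random() < pr_h else L) * cij for cij in c[i]]

    return realize
```

In the correlated variant a single coin flip scales the whole outgoing row, while in the independent variant each arc is flipped separately.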
5.3.2 Discussion of Results
For each ftv instance, we perform an experiment for every combination of
parameters H ∈ {1.05, 1.10, 1.15, 1.20, 1.25, 1.30} and Pr(H) ∈ {0.5, 0.6, 0.675, 0.75}. We
present a number of results for each combination of parameters.
The Fixed Tour Cost is the optimal expected cost of a fixed tour. By design of our
instances, this value is equivalent to the optimal cost for the deterministic problem. The
Heuristic Policy Cost is the expected cost of the tour found using our price-directed
heuristic policy. We obtained these results by simulating over 50 trials and taking the
average.
We also present a number of bounds. The Optimistic Bound is the cost of the optimal
tour found in the best case scenario, when all arc costs are observed to be low. The ALP
Bound is the bound we compute when solving the ALP formulation to the problem.
Finally, the A Posteriori Bound represents a bound when using an a posteriori policy,
where all arc cost realizations are known at the beginning of the tour and we solve for
the optimal TSP tour using those observed costs. We obtained this bound by simulating
over 50 trials and taking the average.
The first group of tables contains results from our experiments using independently
distributed arc costs. For each entry, we present the value and, in parentheses, the ratio
between it and the a posteriori bound. Notice that the heuristic policy almost always
outperforms the fixed tour cost. The exceptions to this are rare, and all occur for the
ftv44 instance at low settings for H. Additionally, in the places where the fixed tour cost
60
is lower than the heuristic policy cost, the difference is essentially negligible. On the
other hand, for many parameter settings the the heuristic policy cost is considerably less
than the cost of the fixed policy. In a few cases this disparity is nearly a factor of two.
We also note that the heuristic policy costs are decreasing with respect to H, while
the ratio of the cost over the a posteriori bound is generally increasing with respect to both H
and Pr(H). This suggests that as these parameters are increased, either the policy
declines in performance or the bound worsens.
The next set of tables contains results for the correlated cost instances. All of the
same remarks for the previous results still hold, but it is worth noting that the gaps
between the heuristic policy costs and the a posteriori bounds are generally tighter. In
several cases the heuristic policy actually matches the bound. However, in places where
a gap does exist, either the bound or the policy is to blame.
We conjecture that the disparity between the bound and the optimal expected cost is
greater than the gap between the policy and the optimal expected cost. For one, the
heuristic policy appears to do a good job of approximating the future costs of actions.
Additionally, other authors have encountered similar phenomena computationally [7],
so we are more inclined to suspect the bound of weakening than the heuristic policy.
From these experiments, our price-directed heuristic policy demonstrates itself to be
a viable alternative to the fixed tour policy, in many cases outperforming it by a dramatic
margin. This improvement is most significant when the gaps between high and low arc
costs are at their greatest, which is intuitive, given the advantage a dynamic policy has
over a fixed one when volatility within an instance is more extreme.
Instance: ftv33
H 1.05 1.10 1.15 1.2 1.25 1.30
Pr(H)= 0.5
Optimistic Bound 1221.70(95.12%) 1157.40(90.76%) 1093.10(87.28%) 1028.80(84.49%) 964.50(82.02%) 900.20(79.91%)
ALP Bound 1266.19(98.59%) 1222.82(95.89%) 1173.34(93.69%) 1120.05(91.98%) 1063.11(90.41%) 1003.44(89.08%)
A Posteriori Bound 1284.35 1275.20 1252.36 1217.73 1175.90 1126.48
Heuristic Policy Cost 1284.85(100.04%) 1282.17(100.55%) 1274.07(101.73%) 1269.47(104.25%) 1258.63(107.04%) 1239.55(110.04%)
Fixed Tour Cost 1286.00(100.13%) 1286.00(100.85%) 1286.00(102.69%) 1286.00(105.61%) 1286.00(109.36%) 1286.00(114.16%)
Pr(H)= 0.6
Optimistic Bound 1189.55(92.53%) 1093.10(86.21%) 996.65(81.26%) 900.20(77.13%) 803.75(73.74%) 707.30(70.86%)
ALP Bound 1255.72(97.68%) 1193.63(94.14%) 1119.86(91.31%) 1036.81(88.84%) 945.19(86.72%) 847.46(84.90%)
A Posteriori Bound 1285.57 1267.96 1226.48 1167.06 1089.93 998.23
Heuristic Policy Cost 1286.72(100.09%) 1283.71(101.24%) 1276.75(104.10%) 1269.34(108.76%) 1242.72(114.02%) 1213.57(121.57%)
Fixed Tour Cost 1286.00(100.03%) 1286.00(101.42%) 1286.00(104.85%) 1286.00(110.19%) 1286.00(117.99%) 1286.00(128.83%)
Pr(H)= 0.675
Optimistic Bound 1152.45(89.86%) 1018.91(81.33%) 885.36(74.61%) 751.82(69.23%) 618.27(64.89%) 484.72(61.51%)
ALP Bound 1246.28(97.17%) 1160.22(92.61%) 1057.06(89.08%) 928.39(85.49%) 787.13(82.61%) 632.08(80.21%)
A Posteriori Bound 1282.54 1252.83 1186.60 1085.92 952.81 788.04
Heuristic Policy Cost 1284.43(100.15%) 1277.00(101.93%) 1267.91(106.85%) 1239.88(114.18%) 1184.30(124.30%) 1115.72(141.58%)
Fixed Tour Cost 1286.00(100.27%) 1286.00(102.65%) 1286.00(108.38%) 1286.00(118.42%) 1286.00(134.97%) 1286.00(163.19%)
Pr(H)= 0.75
Optimistic Bound 1093.10(85.76%) 900.20(74.16%) 707.30(65.11%) 514.40(57.74%) 321.50(52.48%) 128.60(50.46%)
ALP Bound 1230.60(96.54%) 1107.64(91.25%) 932.65(85.86%) 716.83(80.46%) 462.33(75.47%) 185.35(72.72%)
A Posteriori Bound 1274.67 1213.88 1086.25 890.92 612.62 254.87
Heuristic Policy Cost 1278.12(100.27%) 1265.30(104.24%) 1234.38(113.64%) 1170.34(131.36%) 1042.11(170.11%) 742.13(291.17%)
Fixed Tour Cost 1286.00(100.89%) 1286.00(105.94%) 1286.00(118.39%) 1286.00(144.35%) 1286.00(209.92%) 1286.00(504.56%)
Figure 5.1: D-TSP results for ftv33 instance with independent costs
Instance: ftv35
H 1.05 1.10 1.15 1.2 1.25 1.30
Pr(H)= 0.5
Optimistic Bound 1399.35(95.72%) 1325.70(91.95%) 1252.05(88.74%) 1178.40(86.03%) 1104.75(83.73%) 1031.10(81.76%)
ALP Bound 1425.15(97.48%) 1378.83(95.64%) 1325.46(93.94%) 1263.20(92.22%) 1194.44(90.53%) 1122.58(89.02%)
A Posteriori Bound 1461.94 1441.69 1410.98 1369.72 1319.35 1261.11
Heuristic Policy Cost 1470.69(100.60%) 1463.15(101.49%) 1456.03(103.19%) 1451.35(105.96%) 1427.15(108.17%) 1415.95(112.28%)
Fixed Tour Cost 1473.00(100.76%) 1473.00(102.17%) 1473.00(104.40%) 1473.00(107.54%) 1473.00(111.65%) 1473.00(116.80%)
Pr(H)= 0.6
Optimistic Bound 1362.53(93.44%) 1252.05(87.64%) 1141.58(82.67%) 1031.10(78.72%) 920.63(74.86%) 810.15(73.15%)
ALP Bound 1414.33(96.99%) 1345.68(94.19%) 1262.01(91.39%) 1162.56(88.76%) 1052.29(85.56%) 934.76(84.40%)
A Posteriori Bound 1458.16 1428.61 1380.93 1309.84 1229.83 1107.59
Heuristic Policy Cost 1470.85(100.87%) 1461.08(102.27%) 1449.94(105.00%) 1426.23(108.89%) 1416.45(115.17%) 1351.20(121.99%)
Fixed Tour Cost 1473.00(101.02%) 1473.00(103.11%) 1473.00(106.67%) 1473.00(112.46%) 1473.00(119.77%) 1473.00(132.99%)
Pr(H)= 0.675
Optimistic Bound 1320.03(90.74%) 1167.07(82.66%) 1014.10(76.06%) 861.14(70.95%) 708.17(66.99%) 555.21(63.91%)
ALP Bound 1402.88(96.43%) 1310.57(92.82%) 1187.00(89.03%) 1032.84(85.10%) 864.19(81.74%) 687.51(79.14%)
A Posteriori Bound 1454.78 1411.89 1333.27 1213.72 1057.18 868.76
Heuristic Policy Cost 1468.67(100.95%) 1457.82(103.25%) 1439.96(108.00%) 1406.56(115.89%) 1346.17(127.34%) 1259.09(144.93%)
Fixed Tour Cost 1473.00(101.25%) 1473.00(104.33%) 1473.00(110.48%) 1473.00(121.36%) 1473.00(139.33%) 1473.00(169.55%)
Pr(H)= 0.75
Optimistic Bound 1252.05(86.57%) 1031.10(75.14%) 810.15(66.31%) 589.20(59.43%) 368.25(54.92%) 147.30(52.75%)
ALP Bound 1388.88(96.03%) 1249.72(91.07%) 1036.51(84.83%) 780.51(78.73%) 498.32(74.32%) 201.04(72.00%)
A Posteriori Bound 1446.31 1372.23 1221.81 991.42 670.48 279.22
Heuristic Policy Cost 1465.39(101.32%) 1453.57(105.93%) 1411.82(115.55%) 1315.46(132.68%) 1131.87(168.81%) 827.07(296.21%)
Fixed Tour Cost 1473.00(101.85%) 1473.00(107.34%) 1473.00(120.56%) 1473.00(148.58%) 1473.00(219.69%) 1473.00(527.54%)
Figure 5.2: D-TSP results for ftv35 instance with independent costs
Instance: ftv38
H 1.05 1.10 1.15 1.2 1.25 1.30
Pr(H)= 0.5
Optimistic Bound 1453.50(95.65%) 1377.00(91.94%) 1300.50(88.76%) 1224.00(86.04%) 1147.50(83.77%) 1071.00(81.90%)
ALP Bound 1477.39(97.22%) 1426.16(95.22%) 1363.75(93.08%) 1295.80(91.09%) 1225.92(89.50%) 1152.47(88.13%)
A Posteriori Bound 1519.67 1497.78 1465.15 1422.60 1369.76 1307.63
Heuristic Policy Cost 1526.46(100.45%) 1522.46(101.65%) 1510.92(103.12%) 1494.98(105.09%) 1477.46(107.86%) 1441.38(110.23%)
Fixed Tour Cost 1530.00(100.68%) 1530.00(102.15%) 1530.00(104.43%) 1530.00(107.55%) 1530.00(111.70%) 1530.00(117.01%)
Pr(H)= 0.6
Optimistic Bound 1415.25(93.42%) 1300.50(87.73%) 1185.75(82.92%) 1071.00(79.02%) 956.25(75.90%) 841.50(73.45%)
ALP Bound 1466.04(96.78%) 1389.87(93.76%) 1299.12(90.85%) 1194.60(88.14%) 1080.20(85.74%) 959.50(83.75%)
A Posteriori Bound 1514.894 1482.37 1429.93 1355.38 1259.83 1145.67
Heuristic Policy Cost 1524.69(100.65%) 1524.49(102.84%) 1510.47(105.63%) 1489.46(109.89%) 1432.64(113.72%) 1393.09(121.60%)
Fixed Tour Cost 1530.00(101.00%) 1530.00(103.21%) 1530.00(107.00%) 1530.00(112.88%) 1530.00(121.45%) 1530.00(133.55%)
Pr(H)= 0.675
Optimistic Bound 1371.12(90.65%) 1212.23(82.75%) 1053.35(76.09%) 894.46(71.03%) 735.58(67.06%) 576.69(64.20%)
ALP Bound 1454.12(96.13%) 1350.42(92.18%) 1221.03(88.21%) 1059.72(84.15%) 886.83(80.85%) 705.43(78.53%)
A Posteriori Bound 1512.61 1465.00 1384.27 1259.30 1096.94 898.25
Heuristic Policy Cost 1527.32(100.97%) 1519.09(103.69%) 1493.90(107.92%) 1455.99(115.62%) 1378.30(125.65%) 1265.95(140.93%)
Fixed Tour Cost 1530.00(101.15%) 1530.00(104.44%) 1530.00(110.53%) 1530.00(121.50%) 1530.00(139.48%) 1530.00(170.33%)
Pr(H)= 0.75
Optimistic Bound 1300.50(86.46%) 1071.00(74.77%) 841.50(65.71%) 612.00(59.38%) 382.50(55.32%) 153.00(53.60%)
ALP Bound 1437.29(95.56%) 1286.90(89.84%) 1064.47(83.12%) 800.80(77.69%) 511.68(74.00%) 205.98(72.16%)
A Posteriori Bound 1504.11 1432.42 1280.70 1030.71 691.46 285.44
Heuristic Policy Cost 1526.59(101.49%) 1507.67(105.25%) 1458.24(113.86%) 1354.56(131.42%) 1163.57(168.28%) 797.46(279.38%)
Fixed Tour Cost 1530.00(101.72%) 1530.00(106.81%) 1530.00(119.47%) 1530.00(148.44%) 1530.00(221.27%) 1530.00(536.02%)
Figure 5.3: D-TSP results for ftv38 instance with independent costs
Instance: ftv44
H 1.05 1.10 1.15 1.2 1.25 1.30
Pr(H)= 0.5
Optimistic Bound 1532.35(95.42%) 1451.70(91.39%) 1371.05(87.78%) 1290.40(84.68%) 1209.75(82.06%) 1129.10(79.88%)
ALP Bound 1550.82(96.57%) 1503.80(94.67%) 1442.55(92.35%) 1376.98(90.36%) 1307.02(88.66%) 1230.08(87.02%)
A Posteriori Bound 1605.83 1588.52 1561.97 1523.85 1474.21 1413.48
Heuristic Policy Cost 1634.46(101.78%) 1635.27(102.94%) 1619.47(103.68%) 1609.62(105.63%) 1589.83(107.84%) 1557.18(110.17%)
Fixed Tour Cost 1631.00(101.57%) 1631.00(102.67%) 1631.00(104.42%) 1631.00(107.03%) 1631.00(110.64%) 1631.00(115.39%)
Pr(H)= 0.6
Optimistic Bound 1492.03(93.13%) 1371.05(87.13%) 1250.08(81.97%) 1129.10(77.47%) 1008.13(73.94%) 887.15(71.14%)
ALP Bound 1542.13(96.26%) 1471.02(93.49%) 1380.95(90.55%) 1278.63(87.73%) 1159.59(85.05%) 1032.47(82.80%)
A Posteriori Bound 1602.06 1573.50 1525.05 1457.41 1363.35 1246.97
Heuristic Policy Cost 1635.20(102.07%) 1625.96(103.33%) 1606.05(105.31%) 1570.25(107.74%) 1531.64(112.34%) 1485.02(119.09%)
Fixed Tour Cost 1631.00(101.81%) 1631.00(103.65%) 1631.00(106.95%) 1631.00(111.91%) 1631.00(119.63%) 1631.00(130.80%)
Pr(H)= 0.675
Optimistic Bound 1445.50(90.45%) 1277.99(82.05%) 1110.49(75.07%) 942.98(69.46%) 775.48(65.27%) 607.98(62.28%)
ALP Bound 1533.22(95.94%) 1434.97(92.13%) 1307.76(88.40%) 1145.77(84.40%) 961.63(80.94%) 764.14(78.27%)
A Posteriori Bound 1598.05 1557.49 1479.33 1357.50 1188.04 976.25
Heuristic Policy Cost 1636.57(102.41%) 1624.08(104.28%) 1596.03(107.89%) 1554.67(114.52%) 1485.94(125.07%) 1371.46(140.48%)
Fixed Tour Cost 1631.00(102.06%) 1631.00(104.72%) 1631.00(110.25%) 1631.00(120.15%) 1631.00(137.28%) 1631.00(167.07%)
Pr(H)= 0.75
Optimistic Bound 1371.05(86.10%) 1129.10(74.17%) 887.15(64.73%) 645.20(57.96%) 403.25(53.67%) 161.30(51.86%)
ALP Bound 1517.40(95.29%) 1375.90(90.38%) 1152.58(84.10%) 870.12(78.17%) 554.52(73.80%) 223.20(71.76%)
A Posteriori Bound 1592.44 1522.33 1370.45 1113.16 751.39 311.01
Heuristic Policy Cost 1637.68(102.84%) 1614.17(106.03%) 1565.60(114.24%) 1447.39(130.03%) 1281.98(170.62%) 923.55(296.95%)
Fixed Tour Cost 1631.00(102.42%) 1631.00(107.14%) 1631.00(119.01%) 1631.00(146.52%) 1631.00(217.07%) 1631.00(524.41%)
Figure 5.4: D-TSP results for ftv44 instance with independent costs
Instance: ftv33
H 1.05 1.10 1.15 1.2 1.25 1.30
Pr(H)= 0.5
Optimistic Bound 1221.70(95.51%) 1157.40(90.97%) 1093.10(86.39%) 1028.80(81.75%) 964.50(77.09%) 900.20(72.49%)
ALP Bound 1279.12(100.00%) 1259.98(99.04%) 1233.54(97.48%) 1203.24(95.62%) 1171.33(93.63%) 1138.21(91.66%)
A Posteriori Bound 1279.12 1272.25 1265.37 1258.42 1251.09 1241.80
Heuristic Policy Cost 1279.12(100.00%) 1272.25(100.00%) 1265.37(100.00%) 1258.50(100.01%) 1251.62(100.04%) 1244.74(100.24%)
Fixed Tour Cost 1286.00(100.54%) 1286.00(101.08%) 1286.00(101.63%) 1286.00(102.19%) 1286.00(102.79%) 1286.00(103.56%)
Pr(H)= 0.6
Optimistic Bound 1189.55(93.14%) 1093.10(86.19%) 996.65(80.72%) 900.20(72.06%) 803.75(65.09%) 707.30(58.14%)
ALP Bound 1274.37(99.78%) 1243.42(98.04%) 1203.07(97.44%) 1158.01(92.70%) 1108.08(89.74%) 1053.48(86.59%)
A Posteriori Bound 1277.13 1268.26 1234.7375 1249.22 1234.74 1216.56
Heuristic Policy Cost 1277.13(100.00%) 1268.26(100.00%) 1259.39(102.00%) 1250.52(100.10%) 1241.65(100.56%) 1234.66(101.49%)
Fixed Tour Cost 1286.00(100.69%) 1286.00(101.40%) 1286.00(104.15%) 1286.00(102.94%) 1286.00(104.15%) 1286.00(105.71%)
Pr(H)= 0.675
Optimistic Bound 1152.45(90.27%) 1018.91(80.39%) 885.36(70.39%) 751.82(60.60%) 618.27(50.79%) 484.72(40.83%)
ALP Bound 1267.82(99.30%) 1224.09(96.58%) 1169.13(92.95%) 1102.11(88.84%) 1019.07(83.71%) 907.50(76.44%)
A Posteriori Bound 1276.74 1267.48 1257.74 1240.59 1217.41 1187.19
Heuristic Policy Cost 1276.74(100.00%) 1267.48(100.00%) 1258.22(100.04%) 1248.96(100.67%) 1241.08(101.94%) 1226.79(103.34%)
Fixed Tour Cost 1286.00(100.73%) 1286.00(101.46%) 1286.00(102.25%) 1286.00(103.66%) 1286.00(105.63%) 1286.00(108.32%)
Pr(H)= 0.75
Optimistic Bound 1093.10(85.73%) 900.20(71.21%) 707.30(56.78%) 514.40(42.35%) 321.50(27.48%) 128.60(11.54%)
ALP Bound 1258.09(98.67%) 1195.09(94.54%) 1105.75(88.76%) 968.11(79.70%) 772.77(66.05%) 489.57(43.94%)
A Posteriori Bound 1275.05 1264.09 1245.78 1214.68 1170.04 1114.28
Heuristic Policy Cost 1275.05(100.00%) 1264.10(100.00%) 1253.80(100.64%) 1237.74(101.90%) 1224.35(104.64%) 1207.28(108.35%)
Fixed Tour Cost 1286.00(100.86%) 1286.00(101.73%) 1286.00(103.23%) 1286.00(105.87%) 1286.00(109.91%) 1286.00(115.41%)
Figure 5.5: D-TSP results for ftv33 instance with correlated costs
Instance: ftv35
H 1.05 1.10 1.15 1.2 1.25 1.30
Pr(H)= 0.5
Optimistic Bound 1399.35(95.34%) 1325.70(90.72%) 1252.05(86.10%) 1178.40(81.48%) 1104.75(75.82%) 1031.10(72.26%)
ALP Bound 1441.72(98.22%) 1419.30(97.13%) 1394.35(95.89%) 1366.42(94.48%) 1333.07(91.49%) 1294.93(90.74%)
A Posteriori Bound 1467.82 1461.29 1454.13 1446.23 1457.06 1427.00
Heuristic Policy Cost 1473.80(100.41%) 1466.46(100.35%) 1461.63(100.52%) 1456.80(100.73%) 1472.86(101.08%) 1447.23(101.42%)
Fixed Tour Cost 1473.00(100.35%) 1473.00(100.80%) 1473.00(101.30%) 1473.00(101.85%) 1473.00(101.09%) 1473.00(103.22%)
Pr(H)= 0.6
Optimistic Bound 1362.53(92.94%) 1252.05(85.92%) 1141.58(78.92%) 1031.10(71.90%) 920.63(64.84%) 810.15(57.70%)
ALP Bound 1435.05(97.89%) 1403.20(96.29%) 1364.31(94.32%) 1315.44(91.73%) 1253.42(88.28%) 1182.27(84.21%)
A Posteriori Bound 1465.99 1457.25 1446.48 1434.09 1419.76 1404.02
Heuristic Policy Cost 1470.45(100.30%) 1464.46(100.50%) 1458.48(100.83%) 1452.49(101.28%) 1446.50(101.88%) 1441.36(102.66%)
Fixed Tour Cost 1473.00(100.48%) 1473.00(101.08%) 1473.00(101.83%) 1473.00(102.71%) 1473.00(103.75%) 1473.00(104.91%)
Pr(H)= 0.675
Optimistic Bound 1320.03(90.17%) 1167.07(80.35%) 1014.10(70.54%) 861.14(60.67%) 708.17(50.65%) 555.21(40.44%)
ALP Bound 1428.30(97.56%) 1384.12(95.30%) 1325.64(92.21%) 1241.04(87.44%) 1137.29(81.33%) 1009.28(73.51%)
A Posteriori Bound 1463.98 1452.41 1437.63 1419.36 1398.31 1372.90
Heuristic Policy Cost 1469.01(100.34%) 1460.94(100.59%) 1452.87(101.06%) 1444.80(101.79%) 1436.73(102.75%) 1427.80(104.00%)
Fixed Tour Cost 1473.00(100.62%) 1473.00(101.42%) 1473.00(102.46%) 1473.00(103.78%) 1473.00(105.34%) 1473.00(107.29%)
Pr(H)= 0.75
Optimistic Bound 1252.05(85.78%) 1031.10(71.52%) 810.15(57.15%) 589.20(42.45%) 368.25(27.28%) 147.30(11.32%)
ALP Bound 1418.38(97.18%) 1352.28(93.80%) 1240.80(87.53%) 1080.29(77.83%) 859.99(63.70%) 539.08(41.44%)
A Posteriori Bound 1459.55 1441.67 1417.64 1388.10 1350.13 1300.72
Heuristic Policy Cost 1465.81(100.43%) 1454.30(100.88%) 1442.79(101.77%) 1431.28(103.11%) 1417.01(104.95%) 1391.31(106.96%)
Fixed Tour Cost 1473.00(100.92%) 1473.00(102.17%) 1473.00(103.91%) 1473.00(106.12%) 1473.00(109.10%) 1473.00(113.24%)
Figure 5.6: D-TSP results for ftv35 instance with correlated costs
Instance: ftv38
H 1.05 1.10 1.15 1.2 1.25 1.30
Pr(H)= 0.5
Optimistic Bound 1453.50(95.34%) 1377.00(90.70%) 1300.50(86.06%) 1224.00(81.42%) 1147.50(76.78%) 1071.00(72.16%)
ALP Bound 1499.04(98.32%) 1473.81(97.08%) 1445.99(95.69%) 1415.49(94.16%) 1379.98(92.34%) 1340.45(90.31%)
A Posteriori Bound 1524.61 1518.17 1511.08 1503.35 1494.52 1484.21
Heuristic Policy Cost 1526.60(100.13%) 1521.20(100.20%) 1515.81(100.31%) 1509.53(100.41%) 1503.71(100.62%) 1497.89(100.92%)
Fixed Tour Cost 1530.00(100.35%) 1530.00(100.78%) 1530.00(101.25%) 1530.00(101.77%) 1530.00(102.37%) 1530.00(103.09%)
Pr(H)= 0.6
Optimistic Bound 1415.25(92.88%) 1300.50(85.78%) 1185.75(78.68%) 1071.00(71.58%) 956.25(64.47%) 841.50(57.31%)
ALP Bound 1491.37(97.88%) 1455.65(96.02%) 1413.24(93.77%) 1361.93(91.02%) 1300.43(87.67%) 1227.81(83.62%)
A Posteriori Bound 1523.70 1516.04 1507.12 1496.32 1483.26 1468.32
Heuristic Policy Cost 1525.70(100.13%) 1519.40(100.22%) 1513.10(100.40%) 1508.64(100.82%) 1502.40(101.29%) 1496.16(101.90%)
Fixed Tour Cost 1530.00(100.41%) 1530.00(100.92%) 1530.00(101.52%) 1530.00(102.25%) 1530.00(103.15%) 1530.00(104.20%)
Pr(H)= 0.675
Optimistic Bound 1371.12(90.12%) 1212.23(80.23%) 1053.35(70.33%) 894.46(60.37%) 735.58(50.33%) 576.69(40.17%)
ALP Bound 1483.36(97.50%) 1434.95(94.97%) 1372.87(91.66%) 1289.13(87.01%) 1183.59(80.98%) 1050.56(73.17%)
A Posteriori Bound 1521.36 1510.96 1497.80 1481.52 1461.53 1435.70
Heuristic Policy Cost 1524.18(100.19%) 1516.37(100.36%) 1508.55(100.72%) 1501.92(101.38%) 1493.90(102.22%) 1486.50(103.54%)
Fixed Tour Cost 1530.00(100.57%) 1530.00(101.26%) 1530.00(102.15%) 1530.00(103.27%) 1530.00(104.69%) 1530.00(106.57%)
Pr(H)= 0.75
Optimistic Bound 1300.50(85.65%) 1071.00(71.24%) 841.50(56.73%) 612.00(42.05%) 382.50(27.00%) 153.00(11.20%)
ALP Bound 1472.09(96.95%) 1402.12(93.27%) 1290.13(86.98%) 1124.61(77.26%) 890.58(62.86%) 554.86(40.63%)
A Posteriori Bound 1518.36 1503.33 1483.24 1455.54 1416.67 1365.78
Heuristic Policy Cost 1521.70(100.22%) 1511.41(100.54%) 1501.11(101.20%) 1492.64(102.55%) 1480.63(104.52%) 1456.11(106.61%)
Fixed Tour Cost 1530.00(100.77%) 1530.00(101.77%) 1530.00(103.15%) 1530.00(105.12%) 1530.00(108.00%) 1530.00(112.02%)
Figure 5.7: D-TSP results for ftv38 instance with correlated costs
6 Stochastic VRP Heuristic Policies
Thus far, we have discussed a heuristic policy for one stochastic application, the
dynamic TSP with stochastic arc costs. However, we can also expand this idea to other
stochastic problems. In this section, we discuss one such example, the vehicle routing
problem with stochastic demand, and explore some heuristics for it.
6.1 VRP with Stochastic Demand
The S-VRP is a variation of the classical VRP, an extensively studied problem in the
operations research field. In the deterministic version of the problem, a single vehicle
must travel to a number of different locations and satisfy the demand at each. The
vehicle has a limited capacity, meaning it will need to return to the depot periodically to
refill its load. As in the TSP, the objective is to minimize the total travel cost by
designing an optimal tour [39].
In the stochastic version of the problem, the demand is not known at the start of the
operating horizon. We know probability distributions for the demand, but not
realizations. One common approach is to fix a tour at the start and follow it regardless of
69
the actual demand realization, resupplying at the depot when necessary. This is the a
priori approach [26, 27, 66].
Commonly, when following an a priori sequence, a policy is established that
requires the vehicle to return directly to the depot when its load reaches zero and the
customer it was servicing has remaining demand. We refer to this event as a delivery
failure. Upon arriving at the depot, the vehicle is restocked up to capacity and may
resume its route.
still has unsatisfied demand at this point, it is typical to require that the vehicle return
directly to that customer to complete fulfilling its demand. The vehicle will then move
on to the next customer in the tour.
An alternative to the a priori approach is one that allows the vehicle to make
dynamic decisions on if and when to take preemptive trips to the depot to resupply. We
refer to this as a dynamic restocking policy [26, 130]. The benefit here is that we can
incorporate new demand information in real time as it becomes available and allow the
vehicle to make adjustments to its path whenever it appears beneficial. As before, the
vehicle moves from location to location, returning to the depot when a failure occurs.
Also as before, if the customer where the failure occurred still has remaining demand,
the vehicle will travel back to it before moving on to its next client.
However, we now have the option of sending the vehicle to the depot early to
potentially avoid failures before they happen. In other words, rather than proceed to a
customer where a costly failure is likely, one that could add significantly to the cost of
the tour, we can preemptively travel to the depot, resupply the vehicle, and then
continue on without risk of that failure. The question of when to execute a preemptive
resupply can be formulated and answered as a dynamic program.
If we remove the requirement of fixing an a priori tour, we can incorporate even
more flexibility into the vehicle’s route and potentially see even more cost reductions.
We would not only allow the vehicle to dynamically decide when to restock at the depot,
but also when to change the order in which to visit the remaining customers. This is
referred to as a reoptimization policy [46, 47, 106, 109, 119] and can also be formulated as
a dynamic program.
In this section, we explore the potential benefits a restricted reoptimization policy
can provide over a pure a priori or dynamic restocking approach. In particular, we
discuss a single resequencing heuristic that improves upon a commonly used fixed-tour
heuristic, which we refer to as the cyclic heuristic, by allowing for limited reoptimization
to occur. We relax the requirement that the a priori sequence be followed throughout the
entirety of the route and instead allow for a single tour resequence to occur. With this
relaxation in place, we observe improvements to the solutions generated by the
traditional cyclic heuristic. At the same time, by allowing only a single resequence to
occur, our heuristic is computationally feasible for instances where a full dynamic
reoptimization policy is not.
To support this, we also present results for a series of computational experiments.
The improvement is most dramatic when the fill rate increases beyond the range of fill
rates traditionally tested in the existing S-VRP literature and where more complex
methods are no longer computationally tractable. Fill rate is a measure of how many
return trips to the depot the vehicle will require, on average, in order to serve all
customers on the tour. Higher fill rates correspond to a greater number of expected
subtours. Even at these high fill rates, the single resequencing heuristic remains tractable
and improves upon the more static heuristics.
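Fill rate is not given a formula in this section; one common proxy, used here as an assumption, is total expected demand divided by vehicle capacity:

```python
def fill_rate(expected_demands, capacity):
    """Total expected demand divided by vehicle capacity."""
    return sum(expected_demands) / capacity

# A fill rate of 2.0 suggests roughly two full vehicle loads, and hence
# at least one mid-tour return to the depot, are needed on average.
print(fill_rate([3, 4, 2, 5, 6], 10))   # 2.0
```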
In Section 6.2, we discuss the cyclic heuristic, a common solution approach to the
S-VRP problem that we use both as a benchmark and as a foundation upon which to
build. In Section 6.3, we present the Single Resequencing Heuristic, a limited
reoptimization approach which improves upon the cyclic heuristic. Finally, in Section
6.4, we discuss a set of computational experiments for the Single Resequencing Heuristic
and present the results of those experiments.
6.2 Cyclic Heuristic
Let N be the set of locations to visit, and Q be the capacity of the vehicle. For
simplicity, assume all travel costs to be symmetric, \(c_{i,j} = c_{j,i}\). At any point on its route,
the state of the vehicle can be defined as \((i, U, q)\), where the vehicle is at location i (after
having just serviced the customer at i), \(U \subseteq N\) is the set of locations yet to be visited, and q is
the amount of product remaining on the vehicle. Let \(D_j\) be a random variable
representing the demand at city j.

Denote the cost-to-go of being at customer i with q product left and the locations in
U left to visit as \(y_{i,U}(q)\). The optimal cost-to-go, assuming full reoptimization, can then
be written as:
\[
\begin{aligned}
y_{i,U}(q) = \min \Biggl\{ \;
& \min_{j \in U} \Bigl[ c_{i,j} + \Pr(D_j > q)(c_{j,0} + c_{0,j}) + \Pr(D_j \le q)\, \mathrm{E}\bigl[y_{j,U \setminus j}(q - D_j) \mid D_j \le q\bigr] \\
& \qquad\qquad + \Pr(D_j > q)\, \mathrm{E}\bigl[y_{j,U \setminus j}(Q + q - D_j) \mid D_j > q\bigr] \Bigr], && (6.1\text{a}) \\
& \min_{j \in U} \Bigl[ c_{i,0} + c_{0,j} + \mathrm{E}\bigl[y_{j,U \setminus j}(Q - D_j)\bigr] \Bigr] \Biggr\} && (6.1\text{b})
\end{aligned}
\]
The intuition behind the cost-to-go is the following. At the given state, the vehicle
has two decisions to make. First, it must decide which customer to travel to next.
Second, but equally important, the vehicle must decide whether or not to resupply
before moving to that next customer.
Equation (6.1a) corresponds to the decision of going directly to the next
location. It is the cost of traveling to that location plus the expected cost of completing
the tour from there. This also includes the costs associated with the probability that
the vehicle will have to restock after arriving at the location due to a service failure.

Conversely, (6.1b) represents the decision of going to the depot directly from
location i and then traveling to j. It contains not only the travel costs, but also the expected
cost of being at location j with \(Q - D_j\) supply remaining and completing the tour
from there.
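A minimal sketch of recursion (6.1) for a toy instance with discrete integer demands follows. The data are illustrative, and the base case (return directly to the depot once U is empty) is an assumption not stated explicitly in the recursion:

```python
from functools import lru_cache

# Toy data (illustrative, not from the dissertation's experiments):
c = [           # symmetric travel costs; index 0 is the depot
    [0, 4, 5, 6],
    [4, 0, 3, 5],
    [5, 3, 0, 4],
    [6, 5, 4, 0],
]
Q = 5           # vehicle capacity
dist = {        # discrete demand distribution of each customer
    1: {1: 0.5, 3: 0.5},
    2: {2: 1.0},
    3: {1: 0.5, 4: 0.5},
}

@lru_cache(maxsize=None)
def y(i, U, q):
    """Optimal expected cost-to-go from state (i, U, q), recursion (6.1)."""
    if not U:                       # assumed base case: return to the depot
        return c[i][0]
    best = float("inf")
    for j in U:
        rest = U - {j}
        # (6.1a): travel directly to j; a failure (D_j > q) forces a
        # depot round trip before finishing service at j
        direct = c[i][j]
        for d, p in dist[j].items():
            if d <= q:
                direct += p * y(j, rest, q - d)
            else:
                direct += p * (c[j][0] + c[0][j] + y(j, rest, Q + q - d))
        # (6.1b): preemptively restock at the depot before visiting j
        preempt = c[i][0] + c[0][j] + sum(
            p * y(j, rest, Q - d) for d, p in dist[j].items())
        best = min(best, direct, preempt)
    return best

print(y(0, frozenset({1, 2, 3}), Q))
```

Even for this four-node instance, the memoized state space is the set of (location, subset, load) triples, which is what grows exponentially with |N|.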
If we approach this dynamic programming recursion with a methodology similar
to the one we have followed for other routing problems, we can apply the solution in a
price-directed heuristic policy. By constructing routes step by step, we can start at the
depot and make decisions about where to travel next based on the approximate
costs-to-go that we compute.
While this can be an effective way to generate solutions to the S-VRP, one of the
difficulties with this approach is the tremendous size of the state space of
the DP recursion. So while allowing for full reoptimization can yield high-quality
solutions to the S-VRP, for instances of a meaningful size it is quite often
impractical, if not impossible, to perform the necessary computation. The exponential
state space makes the DP recursion intractable.
However, by limiting the number of states, we can construct a similar but more
easily solved formulation. One way to do this is to impose a fixed sequence to which the
vehicle must adhere while traversing its route, thus restricting the potential actions it can
take out of each location and limiting the state space. Rather than allow the vehicle to
choose any of the remaining customers to travel to next, we fix the customer sequence
ahead of time and do not allow any deviation from the route. By doing so, we no longer
have a reoptimization policy, but rather, an a priori one.
Though the order of customers is fixed, we can still allow dynamic choices of
whether to visit the next customer on the fixed route directly or to preemptively return
to the depot to restock before then moving to that next customer. This constitutes a
dynamic restocking policy.
Denote the cost-to-go of being at customer i with q product left as \(y_i(q)\). Assume that,
according to the fixed tour, customers are sequenced \(1, 2, \ldots, N\). The optimal cost-to-go
can be written as:
\[
\begin{aligned}
y_i(q) = \min \Biggl\{ \;
& c_{i,i+1} + \Pr(D_{i+1} > q)(c_{i+1,0} + c_{0,i+1}) + \Pr(D_{i+1} \le q)\, \mathrm{E}\bigl[y_{i+1}(q - D_{i+1}) \mid D_{i+1} \le q\bigr] \\
& \qquad + \Pr(D_{i+1} > q)\, \mathrm{E}\bigl[y_{i+1}(Q + q - D_{i+1}) \mid D_{i+1} > q\bigr], && (6.2\text{a}) \\
& c_{i,0} + c_{0,i+1} + \mathrm{E}\bigl[y_{i+1}(Q - D_{i+1})\bigr] \Biggr\} && (6.2\text{b})
\end{aligned}
\]
By solving this recursion through use of dynamic programming, we can determine
the optimal times to preemptively return to the depot, given that we service the
customers in that fixed sequence.
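Because the sequence is fixed, the state collapses to a position on the route and the remaining load, so the recursion is easily solved. A self-contained sketch of (6.2) on illustrative toy data:

```python
from functools import lru_cache

# Toy data (illustrative): a fixed route 1 -> 2 -> 3 from depot 0.
c = [[0, 4, 5, 6], [4, 0, 3, 5], [5, 3, 0, 4], [6, 5, 4, 0]]
Q = 5
dist = {1: {1: 0.5, 3: 0.5}, 2: {2: 1.0}, 3: {1: 0.5, 4: 0.5}}
route = [1, 2, 3]   # the a priori sequence

@lru_cache(maxsize=None)
def y(k, q):
    """Expected cost-to-go with route[k:] unserved and load q, recursion (6.2)."""
    if k == len(route):                 # all customers served: head home
        return c[route[-1]][0]
    i = route[k - 1] if k > 0 else 0    # current location (0 = depot start)
    j = route[k]                        # next customer on the fixed route
    # (6.2a): travel directly to j, restocking only after a failure
    direct = c[i][j]
    for d, p in dist[j].items():
        if d <= q:
            direct += p * y(k + 1, q - d)
        else:
            direct += p * (c[j][0] + c[0][j] + y(k + 1, Q + q - d))
    # (6.2b): preemptive restock at the depot before visiting j
    preempt = c[i][0] + c[0][j] + sum(
        p * y(k + 1, Q - d) for d, p in dist[j].items())
    return min(direct, preempt)

print(y(0, Q))
```

The number of states here is only (route length) × (load levels), which is why the cyclic heuristic remains tractable when the full-reoptimization recursion is not.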
Moving forward, we will refer to any approach using the above framework to
generate solutions to the S-VRP as a cyclic heuristic, similar to the approach in [26]. This
is the foundation upon which we will build a limited reoptimization heuristic and the
benchmark we use to measure its effectiveness.
When using a cyclic heuristic, the question of how to best choose the a priori
sequence is a critical one. Given a fixed sequence, the dynamic restocking component of
the algorithm will ensure we are minimizing cost through preemptive resupplies, but
the choice of the fixed tour is equally essential to the heuristic’s success.
One possibility for the fixed a priori sequence is the optimal TSP tour, as seen in the
cyclic heuristic of Bertsimas [26]. By building a TSP using the same customers and
locations as the VRP, we can obtain the optimal TSP solution and use it as the
route for the vehicle. This serves as the fixed a priori sequence we input to the cyclic
heuristic.
From a computational perspective, this policy offers some significant advantages.
Executing the strategy requires only a single TSP solve and a single DP solve using
that TSP solution. In comparison, fully dynamic reoptimization approaches are
significantly more cumbersome. Because these computations would be done in real-time
as the vehicle is traversing its route, in practice it could be even more restrictive if those
computations cannot be executed in a timely fashion and the vehicle is delayed. On the
other hand, the cyclic approach outlined above allows for all computations to be run in
advance of the vehicle leaving the depot, while still allowing for the vehicle to take
dynamic action based on those pre-computed costs-to-go. These computations could be
performed the night before or the morning of, ensuring the vehicle can execute its route
throughout the day with no risk of delay. Ultimately, given these many benefits, the
dynamic restocking cyclic heuristic using an a priori TSP tour has been shown to be an
effective solution procedure for the S-VRP.
6.3 Single Resequencing Heuristic
Regardless of the a priori sequence being used, whether it is a TSP solution or some
other ordering of customers, we suggest that it is possible to make improvements to the
cyclic heuristic by incorporating dynamic reoptimization. Rather than adhere to the
fixed tour and only dynamically optimizing the preemptive resupplies, in this section
we lay out an approach which, in a limited way, allows the remaining sequence of
customers to visit to be dynamically reoptimized. We can demonstrate, through a series
of computational experiments, that this approach can significantly reduce the expected
cost while remaining computationally tractable.
Traditionally, by following a reoptimization approach as opposed to an a priori one,
we are gaining solution quality at the expense of computation time, and if designed
naively, the problem can quickly become intractable. The question becomes how to
strike the right balance between the two. To that end, we propose a single resequencing
heuristic which accomplishes both objectives. It meaningfully improves upon the
traditional cyclic heuristic, while remaining computationally feasible. In essence, it is a
middle ground between the a priori and full reoptimization extremes.
Given any cyclic heuristic, where we serve customers according to an a priori fixed
sequence and utilize a DP to make preemptive decisions on when to resupply, we
propose one key modification. Previously, the vehicle has two actions available to it at
any point in the tour: travel to the next customer on the tour or travel to the depot and
then to the next customer on the tour. We now allow a third action: travel to the depot,
reoptimize the sequence of remaining customers, and then travel to the first customer in
that new sequence. What is important to note is that in an effort to keep this approach
tractable, we allow this resequencing to occur at most one time throughout the entirety of
the tour. This allows the DP to be solvable and the computation times to be manageable.
The action of reoptimizing the sequence of remaining customers involves
abandoning the current tour that was fixed a priori and switching to a new one. This
action is only taken when it appears to be more cost effective, given the current state the
vehicle is in. Essentially, we consider the set of remaining customers and the current
position of the vehicle as a S-VRP subproblem and generate a new a priori tour for it in
the same fashion we generated one for the original S-VRP. Though fixed tours for a
cyclic heuristic can be generated in any number of ways, as we have discussed
in previous sections, one of the most common and effective is to use the TSP
solution. Thus, we use this as our benchmark heuristic. Given that, to reoptimize the
remaining sequence once a subset of customers have already been served, we solve a
TSP for the remaining unserved customers. If the new sequence differs from the one we
were already on, we have identified an opportunity to reoptimize our route.
The second component to this is the decision whether or not to commit to the
resequenced route. Put differently, simply because we have the opportunity to change
the remaining sequence of customers does not necessarily mean it will lead to a lower
cost solution in the end. It may actually lead to a worse solution. Alternatively, we may
want to save our resequence for later on in the tour, where the savings could be more
significant. Thus, we incorporate the resequencing decision into the structure of our DP
and weigh it as an option every step of the way, in the same fashion in which we make
the choice of whether or not to preemptively resupply at the depot. At any given state,
we use the costs-to-go of this DP recursion to compare each of the three options (go to
next city, resupply at the depot, or return to depot and resequence) and choose
whichever has the minimum expected cost.
Given the initial a priori tour, we can compute every possible resequence ahead of
time, by solving for the subproblem we would encounter at every step of the tour. This
allows us to solve a single DP recursion up front, rather than having to simulate and
compute in real time as the vehicle is on its route. If we are using the deterministic TSP to generate tours, this would amount to solving an initial TSP with the full N customers, a TSP of N − 1 customers, a TSP of N − 2 customers, and so on, for a total of N TSP solves.
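This up-front precomputation can be sketched as follows. The brute-force TSP solve below is an illustrative stand-in, practical only for roughly ten customers; the dissertation's experiments use a dedicated solver, and the function names are ours:

```python
import itertools

def solve_tsp(cost, customers):
    """Brute-force TSP over a small customer set: depot 0 -> customers -> depot 0.
    Returns the minimum-cost visiting order; illustrative only (O(n!) time)."""
    best_order, best_cost = None, float("inf")
    for perm in itertools.permutations(customers):
        total = cost[0][perm[0]] + cost[perm[-1]][0]
        total += sum(cost[perm[k]][perm[k + 1]] for k in range(len(perm) - 1))
        if total < best_cost:
            best_order, best_cost = list(perm), total
    return best_order

def precompute_suffix_tours(cost, n):
    """Tour R = r is the TSP on customers r+1, ..., n; R = 0 is the full a priori tour."""
    return {r: solve_tsp(cost, range(r + 1, n + 1)) for r in range(n)}
```

The returned dictionary gives the DP constant-time access to every resequenced tour it might switch to, so no TSP is solved while the vehicle is en route.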
Let R index the TSP tours solved up front. R= 0 corresponds to the a priori TSP
solution used by the traditional cyclic heuristic; this is the initial route the vehicle
follows for the single resequencing heuristic. R= r corresponds to the TSP solution
obtained when solving the subproblem when N − r customers remain on the route. In
other words, R= 1 signifies the resequenced tour we obtain when solving the TSP on
N − 1 customers, excluding just the first customer visited on the R= 0 tour.
Further, let j[i, R] be a function that outputs the next customer to visit, given the last
customer whose demand was completely satisfied is i and the vehicle is actively
following the tour corresponding to R. When the vehicle finishes service at the final
location on the tour, j[i, R]= 0 for all R. In other words, once all demand has been
satisfied, we return to the depot.
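One way to realize the j[i, R] lookup, assuming each precomputed tour is stored as a list of customer indices in visiting order (the function and variable names here are ours, not the dissertation's):

```python
def next_customer(i, tour):
    """j[i, R]: next customer to visit, given the last fully served customer i and
    the active tour (a list of customer indices in visiting order).
    Returns 0 (the depot) once the tour is exhausted."""
    if i not in tour:
        # Last served customer does not appear on the resequenced tour:
        # the next move is to the head of that tour.
        return tour[0] if tour else 0
    pos = tour.index(i)
    return tour[pos + 1] if pos + 1 < len(tour) else 0
```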
Denote the cost-to-go of being at customer i with q product left and the vehicle on tour R, with customer j[i, R] to visit next, as y_i(q, R). The optimal cost-to-go can be written as:

y_i(q, 0) = min {
    c_{i, j[i,0]} + Pr(D_{j[i,0]} > q) (c_{j[i,0], 0} + c_{0, j[i,0]})
        + Pr(D_{j[i,0]} ≤ q) E[ y_{j[i,0]}(q − D_{j[i,0]}, 0) | D_{j[i,0]} ≤ q ]
        + Pr(D_{j[i,0]} > q) E[ y_{j[i,0]}(Q + q − D_{j[i,0]}, 0) | D_{j[i,0]} > q ],    (6.3a)
    c_{i,0} + c_{0, j[i,0]} + E[ y_{j[i,0]}(Q − D_{j[i,0]}, 0) ],    (6.3b)
    c_{i,0} + c_{0, j[i,i]} + E[ y_{j[i,i]}(Q − D_{j[i,i]}, i) ]    (6.3c)
}

y_i(q, R) = min {
    c_{i, j[i,R]} + Pr(D_{j[i,R]} > q) (c_{j[i,R], 0} + c_{0, j[i,R]})
        + Pr(D_{j[i,R]} ≤ q) E[ y_{j[i,R]}(q − D_{j[i,R]}, R) | D_{j[i,R]} ≤ q ]
        + Pr(D_{j[i,R]} > q) E[ y_{j[i,R]}(Q + q − D_{j[i,R]}, R) | D_{j[i,R]} > q ],    (6.3d)
    c_{i,0} + c_{0, j[i,R]} + E[ y_{j[i,R]}(Q − D_{j[i,R]}, R) ]    (6.3e)
}
Breaking down the different pieces of the cost-to-go calculation, at the given state,
the vehicle has a set of actions it can take. These actions depend on whether or not it has
already utilized its single resequence. If the vehicle has not resequenced and is currently
following the initial a priori tour (R= 0), there are three actions available. The vehicle
can continue on the existing tour and move to the next city, returning to the depot and
then back to that city in the event of a failure (6.3a). The vehicle can preemptively return
to the depot before moving on to the next city on the existing tour in an attempt to avoid
a costly future failure (6.3b). Or as a final option, the vehicle can choose to preemptively
return to the depot and utilize its resequence, moving from the depot not to the next city
on the original tour, but to the first city on the resequenced tour, and continuing on that
path from there (6.3c).
If the vehicle has already used its resequence and is traveling along the tour
corresponding to R, it has just two actions available to it. It can move to the next city on
the tour (6.3d), or it can move to the depot for a preemptive resupply before returning to
the current tour (6.3e). Note that because the vehicle is already on a resequenced tour
and the policy does not allow for multiple resequences, the third option is no longer
available. By solving this DP recursion, we can compute the optimal cost-to-go for any
given state the vehicle might encounter and direct its actions accordingly.
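As a concrete illustration, recursion (6.3) can be sketched as a memoized DP on a tiny hypothetical instance. Everything below — the line-shaped instance, the two-point demands, and the identity stand-in for the precomputed TSP tours — is an assumption for illustration, not the dissertation's implementation:

```python
from functools import lru_cache

# Hypothetical 4-customer instance on a line; location 0 is the depot.
pos = [0.0, 1.0, 2.0, 3.0, 4.0]
def c(a, b):                      # symmetric travel cost
    return abs(pos[a] - pos[b])

N, Q = 4, 5                       # customers 1..N, vehicle capacity Q
demand = {i: (1, 3) for i in range(1, N + 1)}   # two-point demand, each w.p. 1/2

# tours[r]: a priori tour on customers r+1..N (identity order stands in for TSP solves)
tours = {r: list(range(r + 1, N + 1)) for r in range(N)}

def nxt(i, tour):
    """j[i, R]: customer visited after i on the tour; 0 (the depot) once done."""
    if i not in tour:
        return tour[0] if tour else 0
    k = tour.index(i)
    return tour[k + 1] if k + 1 < len(tour) else 0

@lru_cache(maxsize=None)
def y(i, q, r, used):
    """Cost-to-go of recursion (6.3): at i with q units left, following tour r;
    `used` records whether the single resequence has been spent."""
    j = nxt(i, tours[r])
    if j == 0:
        return c(i, 0)            # all demand satisfied: return to the depot
    # (6.3a)/(6.3d): continue to j; a failure forces a depot round trip to finish j
    cont = c(i, j) + sum(
        0.5 * (y(j, q - d, r, used) if d <= q
               else c(j, 0) + c(0, j) + y(j, Q + q - d, r, used))
        for d in demand[j])
    # (6.3b)/(6.3e): preemptive restock at the depot, then on to j with a full load
    restock = c(i, 0) + c(0, j) + sum(0.5 * y(j, Q - d, r, used) for d in demand[j])
    options = [cont, restock]
    if not used:
        # (6.3c): restock and switch, once, to the tour on the remaining customers
        jr = nxt(i, tours[i])
        options.append(c(i, 0) + c(0, jr)
                       + sum(0.5 * y(jr, Q - d, i, True) for d in demand[jr]))
    return min(options)

expected_cost = y(0, Q, 0, False)   # expected policy cost from the depot, fully loaded
```

With identity stand-in tours the resequence option never changes the visiting order, so here it only mirrors the restock action; plugging genuinely re-solved suffix tours into `tours` is where the heuristic's improvement comes from.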
While the above approach allows additional flexibility when compared to a fixed
tour cyclic heuristic, it is still fairly static. Being able to swap out the fixed tour with a
reoptimized one, even if only done once, can be shown to improve upon the cyclic
heuristic, and by incorporating it into the DP recursion, we can actually guarantee that it
will never perform worse. This is because the vehicle can (and when optimal, will)
choose to not resequence at all, equivalent to the traditional cyclic approach. However,
in the remainder of this section, we discuss how by relaxing one additional constraint of
the traditional cyclic policy, we can achieve an even more substantial improvement.
A common condition to build into any S-VRP heuristic is the requirement that upon
failure at a customer, the vehicle must resupply at the depot and then return directly to
the customer where the failure occurred. Though a requirement like this might seem
unintuitive or inconvenient in practice, without it the DP recursion becomes more
challenging to compute. Oftentimes, the justification for making this assumption is that
when we have a delivery failure at a customer, we have very likely disappointed them.
By promising to return directly to them after the resupply is complete, the customer
understands we are taking immediate action to complete their service. More important
than anything, however, is that this ensures multiple failures cannot occur at a single
customer.
In that spirit, we propose an approach that guarantees at most one failure occurs at
a single customer, but forgoes the immediate return post-failure requirement. By doing
so, we can make additional improvements to our single resequencing heuristic without
sacrificing the computational feasibility of the DP recursion. In the end, we have a
customer-focused approach that produces better quality solutions and can more
realistically be implemented operationally.
Given the policy outlined previously in this section, we now provide the vehicle
additional opportunities to take advantage of its one allotted tour resequence.
Previously, reoptimization could occur only as a preemptive action. When the vehicle
had successfully served its most recent customer, rather than always moving directly to its
next customer, it could elect to return to the depot to resupply and resequence. Upon a
failure, however, the vehicle would be forced to the depot and back to that last customer
without getting the chance to explore a resequence. Now, we extend the resequence
option to any return visit to the depot, whether it was preemptive or the result of a
failure.
Consider the case where a failure has occurred and the vehicle has returned to the
depot. It is important to note that the subset of remaining customers at this point
contains a single customer that has been visited once before and thus, has known
demand. This is the customer where the failure occurred. As we alluded to before, by
executing a resequence here, we may now be following a tour that does not take us
directly back to that customer, but rather, includes that customer later on in the tour. The
customer could potentially experience a second failure later on in the tour, if the vehicle
opts to visit it with insufficient supply to meet the remaining demand. However, in such
a case, the demand is known in advance of that decision, and thus, we also know a
failure is guaranteed to occur. As a result, the heuristic would not route the vehicle
directly to that city; rather, it would execute a preemptive resupply, a cheaper option
than going through a guaranteed delivery failure.
Though not a full reoptimization heuristic, where the sequence of customers to visit
can change at any point, this hybrid method combines the strengths of both techniques,
and in fact, is shown in our computational experiments to outperform the traditional
cyclic heuristic in many cases. The cost reductions we observe through this new
approach are likely the direct result of two factors: additional reoptimization
opportunities from allowing resequences to occur either preemptively or post-failure
and increased flexibility from relaxing the requirement that we do not postpone
completing service when a failure occurs.
As before, we can generate solutions according to this approach by solving a DP
recursion. However, the expected cost of a failure is no longer just the cost of returning
to the depot, the cost of going back to the failed city, and the expected cost of moving on
from there. It is now the minimum of that sum and the combined expected cost of
returning to the depot to execute a resequence and of finishing the remaining tour from
that state. If it appears preferable to save our resequence for later and carry on with the
initial route, we choose that option. If utilizing the resequence at that specific point in the tour appears to offer the greater expected saving, we opt for that instead.
Again, let R correspond to the route the vehicle is currently following. R= 0
corresponds to the a priori TSP solution, and R= r corresponds to the TSP solution to the subproblem when N − r customers remain on the route. D is a vector of random variables or constants corresponding to the remaining demand of the customers. If customer i has not been visited yet, D_i is the random variable representing its unknown demand. If customer i has been visited already, D_i is replaced by a constant representing the known remaining demand at that city.
Let j[i, R] be a function which outputs the next customer to visit given the last
customer visited and left with zero remaining demand is i and the vehicle is following
tour R. If i > R, j[i, R] is the next customer sequentially after i on tour R. If i ≤ R, the last
customer whose demand was fully satisfied does not exist on the resequenced tour R. In
this situation, the next move is to the first customer on the R sequence.
Let D̃[i, w] be a function that outputs a demand vector identical to D, except with the random variable D_i replaced by a constant value w. In this case, w would equal the remaining demand at customer i, and this function allows us to update the demand vector accordingly as we serve customers and satisfy all or some of their demand.
We now denote the cost-to-go of being at location i with q product left, the vehicle actively following tour R, D corresponding to the remaining demand, and customer j[i, R] to visit next as y_i(q, R, D). Note that when R = 0 and we are in a state where a resequence has not yet been used, D is equivalent to the original vector of random variables and thus can be omitted. The optimal cost-to-go can be written as:

y_i(q, 0) = min {
    c_{i, j[i,0]} + Pr(D_{j[i,0]} ≤ q) E[ y_{j[i,0]}(q − D_{j[i,0]}, 0) | D_{j[i,0]} ≤ q ]
        + Pr(D_{j[i,0]} > q) ( c_{j[i,0], 0} + E[ min{ c_{0, j[i,0]} + y_{j[i,0]}(Q + q − D_{j[i,0]}, 0),
              c_{0, j[i,i−1]} + y_{j[i,i−1]}(Q − D_{j[i,i−1]}, i − 1, D̃[j[i,0], D_{j[i,0]} − q]) } | D_{j[i,0]} > q ] ),    (6.4a)
    c_{i,0} + c_{0, j[i,0]} + E[ y_{j[i,0]}(Q − D_{j[i,0]}, 0) ],    (6.4b)
    c_{i,0} + c_{0, j[i,i]} + E[ y_{j[i,i]}(Q − D_{j[i,i]}, i, D) ]    (6.4c)
}

y_i(q, R, D) = min {
    c_{i, j[i,R]} + Pr(D_{j[i,R]} > q) (c_{j[i,R], 0} + c_{0, j[i,R]})
        + Pr(D_{j[i,R]} ≤ q) E[ y_{j[i,R]}(q − D_{j[i,R]}, R, D) | D_{j[i,R]} ≤ q ]
        + Pr(D_{j[i,R]} > q) E[ y_{j[i,R]}(Q + q − D_{j[i,R]}, R, D) | D_{j[i,R]} > q ],    (6.4d)
    c_{i,0} + c_{0, j[i,R]} + E[ y_{j[i,R]}(Q − D_{j[i,R]}, R, D) ]    (6.4e)
}
The above costs-to-go mirror the previously described recursion, but now allow resequence actions to be considered after failures in addition to preemptively while still at a customer location. To accommodate this, states where a resequence has been used now include D in the state definition, explicitly tracking where failures have occurred and what demand remains.
Looking at the set of states where we are still on the original tour (R= 0), (6.4a)
corresponds to the action of moving to the next customer on the current tour. If no
failure occurs, the vehicle is simply at that next customer with all demand satisfied and
remaining supply available. If a failure does occur, two options are considered and the
minimum cost one chosen. The first option is to return to the depot and then back to the
same customer, the action assumed in the previous recursion. The second option is to
return to the depot and reoptimize, changing the sequence and moving on to the first
customer in the new subtour.
Identical to the previous recursion, (6.4b) and (6.4c) represent the preemptive
resupply and the preemptive resupply plus resequence actions, respectively.
Considering the states where a resequence has been utilized (R> 0), (6.4d) is the action
of traveling to the next customer on the tour, including the cost of a potential failure,
while (6.4e) is the action of executing a preemptive resupply.
In the version of the single resequencing heuristic that allows for resequences to
occur both preemptively and after failures, it should be noted that the DP recursion can
be quite cumbersome to solve for moderate to large size instances. While we were able
to implement the code and test it for a small number of customers, as we scaled up, the
computation times grew to the point where the approach, as is, became impractical to
use. The reason is that whereas the expected cost of a failure can be computed as a single expression in the version where resequences occur only preemptively (not after failures), for the more comprehensive heuristic that expected cost is an entire DP recursion of its own. And since we
must compute these expected costs for every state in the main DP where a failure can
occur, the number of subproblems we must solve becomes unrealistic in practice.
Alternatively, we can execute a near equivalent approach through simulation.
Rather than solve DP recursions for every possible failure that could occur, we can build
a solution one step at a time. We solve the initial DP recursion assuming at most one
preemptive resequence can occur (outlined in Recursion 6.3) and begin traversing the
tour according to the solution generated. As each customer is visited, we observe the
actual demand and use the appropriate costs-to-go to direct subsequent actions. In the
event a failure does occur, we solve an additional DP recursion to compute the expected
cost of executing a resequence at that point and compare that expected cost to the
expected cost of continuing with the existing tour. We choose whichever option has
lower cost, and as long as we have not used our single resequence, continue simulating
and solving additional DP recursions upon failures.
In Algorithm 4, included below, we outline the simulation approach to the single resequencing heuristic in more detail. Assume customers 1, . . . , N are ordered according to the optimal TSP tour. Let the function SolveTSP() take input i and output the optimal TSP tour T_i on customers i, . . . , N, which according to our previous notation would correspond to R = i. Let SolveDP() take the current location, tour, and remaining demands and determine the minimum cost action to take, given that we can continue along the T_0 tour or, if we have not already executed a resequence, immediately execute one and switch over to the T_R tour corresponding to the customers with remaining demand. SolveDP() outputs the next customer to visit, s, and a decision on whether or not to first visit the depot, t. If t = 0, we travel first to the depot. If t = 1, we move directly to customer s. Once all customers have been served, we output Z*, the total cost of all actions taken.
Through this simulation approach, we can demonstrate the added value of this
increased flexibility over the more static cyclic heuristic. Though our heuristic is not a
true reoptimization technique, it deploys reoptimization in a limited, manageable way.
As a result, we have an approach that improves upon the cyclic heuristic without
sacrificing computation speed. And though the simulation methodology requires
computations to be performed dynamically as the vehicle is on its route, run times were
trivial for all the instances we tested, meaning we would never anticipate any
computation-related delays.
Algorithm 4 Simulation Approach to the Single Resequencing Heuristic
given customers 1, . . . , N ordered by T_0, the optimal TSP tour, with demand vector D.
Z* = 0
i = 0
R = 0
q = Q
for r = 1, . . . , N do
    T_r = SolveTSP(r)
end for
for k = 0, . . . , N do
    if k = 0 then
        Move to s = j[i, R] and observe D_s.
        D = D̃[s, 0]
        q = q − D_s
        Z* += c_{i,s}
        i = s
    else if k < N then
        (s, t) = SolveDP(i, R, D)
        if t = 1 then
            Move to s directly and observe D_s.
            if D_s ≤ q then
                D = D̃[s, 0]
                q = q − D_s
            else if D_s > q then
                D = D̃[s, D_s − q]
                q = 0
            end if
            Z* += c_{i,s}
            i = s
        else if t = 0 then
            Move to 0 and then s, update R if the resequence is used, and observe D_s.
            D = D̃[s, 0]
            q = Q − D_s
            Z* += c_{i,0} + c_{0,s}
            i = s
        end if
    else if k = N then
        Move to s = 0.
        Z* += c_{i,s}
        i = 0
    end if
end for
return Z*
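The accounting inside the simulation loop can be sketched in Python with a simple load-threshold rule standing in for the SolveDP() call and the resequence omitted; the instance encoding (a cost callable and a two-point demand dictionary) is our assumption, not the dissertation's implementation:

```python
import random

def simulate_tour(tour, cost, Q, demand, trials=1000, seed=0):
    """Monte Carlo estimate of a fixed-tour restocking policy's expected cost.
    Stand-in decision rule: restock preemptively only when the load on hand
    cannot cover even the next customer's lowest possible demand."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        q, loc, run = Q, 0, 0.0
        for s in tour:
            lo, hi = demand[s]
            if q < lo:                       # threshold rule in place of SolveDP()
                run += cost(loc, 0) + cost(0, s)
                q = Q
            else:
                run += cost(loc, s)
            d = rng.choice((lo, hi))         # observe the realized demand
            if d <= q:
                q -= d
            else:                            # failure: depot and back to finish s
                run += cost(s, 0) + cost(0, s)
                q = Q - (d - q)
            loc = s
        run += cost(loc, 0)                  # return home after the last customer
        total += run
    return total / trials
```

Replacing the threshold rule with cost-to-go comparisons, and re-solving a DP whenever a failure creates a resequence opportunity, yields the full simulation of Algorithm 4.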
6.4 Experimental Results
In this section, we evaluate our stochastic VRP heuristic policies through a set of computational experiments. Test instances were generated in a manner common in the S-VRP literature, including the papers by Secomandi [117, 118] and Bertazzi and Secomandi [24]. We include results tables comparing our single resequencing heuristic to a few relevant benchmarks, along with runtimes.
Experiments were performed using the SYMPHONY 5.6 VRP solver on a Dell
workstation with an Intel Core 2.30 GHz processor and 6.00 GB RAM.
6.4.1 Test Instance Design
A single experiment is defined by a number of different parameters, including the number of customers N, vehicle capacity Q, percent plus/minus range of demand at each customer p, and expected fill rate f. More specifically, N represents the number of customers that the vehicle must service, Q represents the capacity the vehicle can carry, and p is the percentage above or below the expected demand at a customer, E[D_i], that the demand is allowed to vary. Maximum demand at customer i, D_i^U, is equal to the expected demand plus that percentage, D_i^U = E[D_i] + p E[D_i] (rounded to the nearest integer). Likewise, the minimum demand at customer i, D_i^L, is the expected demand minus that percentage, D_i^L = E[D_i] − p E[D_i] (rounded to the nearest integer).

Demand is distributed such that it falls exclusively at either the maximum or minimum value, each with equal probability. Given E[D_i], D_i^U, and D_i^L, this distribution gives equal weight to the two extremes and as a result possesses the highest possible variability.
Finally, f is the ratio of total expected demand across all customers, ∑_{i=1}^{N} E[D_i], to vehicle capacity Q:

f = ∑_{i=1}^{N} E[D_i] / Q
Given settings for those four experimental parameters, N, Q, p, and f, we are able to
construct unique instances to test. Higher values of N lead to more difficult instances, as
an increased number of cities increases the complexity of the problem exponentially.
Higher values of Q make the dynamic programming problem more difficult to solve, as
additional states are created for every integer value from 0 to Q. The value of p
correlates to the level of uncertainty in the problem; higher values mean customer
demand is more variable, while lower values mean demand is more predictable. Lastly,
fill rate, f, represents the expected number of routes the vehicle will have to complete in order to satisfy all demand of all customers. If f = 1, we can expect to solve the problem using a single tour, equivalent to the TSP. If f = N, we can expect to solve the problem
using a direct delivery approach, returning to the depot immediately after visiting each
and every customer. The problem becomes more interesting when the fill rate falls
somewhere in between those extremes and a number of multi-customer routes are
required to solve the problem efficiently.
For each instance, we assign values to customer locations, customer expected
demands, and customer demand distributions and keep these values constant across all
samples within that instance. For each experiment, we considered 10 unique instances
and 100 realized demand samples of each instance.
When defining an instance, customer locations are randomly generated and
uniformly distributed across a 100 x 100 unit square grid with the depot always located
in the bottom-left corner of the grid, at coordinates (0, 0). By this construction, distances between cities are Euclidean, and thus all costs are symmetric: c_{i,j} = c_{j,i} for all i, j ∈ N ∪ {0}.
Customer expected demands, E[D_i], are also generated randomly, uniformly distributed across an interval such that the expected fill rate, f, is maintained given Q. In our experiments, demand at each particular customer is equally likely to be one of two values above or below E[D_i] in a range defined by parameter p. The realized demand at customer i, D_i, is either D_i^U or D_i^L with equal probability. Also, we require D_i^U ≤ Q for all i ∈ N.
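The instance design above can be sketched as follows. The interval used to draw the relative expected demands is our assumption (the dissertation does not specify it), and rounding plus the D_i^U ≤ Q cap mean the realized fill rate only approximates f:

```python
import math
import random

def make_instance(N, Q, p, f, seed=0):
    """Sketch of the test-instance design: customers uniform on a 100 x 100 grid,
    depot at (0, 0), two-point demand at (1 +/- p) * E[D_i] (each w.p. 1/2),
    with expected demands scaled so that sum(E[D_i]) / Q matches the fill rate f."""
    rng = random.Random(seed)
    points = [(0.0, 0.0)] + [(rng.uniform(0, 100), rng.uniform(0, 100))
                             for _ in range(N)]
    raw = [rng.uniform(0.5, 1.5) for _ in range(N)]   # assumed relative demands
    scale = f * Q / sum(raw)
    exp_d = [r * scale for r in raw]                  # now sum(exp_d) == f * Q
    demand = {}
    for i, ed in enumerate(exp_d, start=1):
        lo = max(0, round(ed - p * ed))               # D_i^L, rounded
        hi = min(Q, round(ed + p * ed))               # D_i^U, capped at Q
        demand[i] = (lo, hi)
    def cost(a, b):                                   # Euclidean, symmetric
        (xa, ya), (xb, yb) = points[a], points[b]
        return math.hypot(xa - xb, ya - yb)
    return points, demand, cost
```

Keeping the generator seeded lets the same 10 instances be reused across all heuristics, with only the 100 demand realizations varying per sample.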
We tested instances of size N = 35 up to N = 85. For instances of 65 customers or fewer, we used Q = 10; instances above 65 customers use Q = 5. Low, medium, and high values of p were tested, with p set to 0.1, 0.2, and 0.3, respectively. Fill rate, f, ranges between high and low values as well, listed for each instance in the results tables below.
6.4.2 Discussion of Results
We executed computational experiments to test the performance of a new heuristic
for the S-VRP, the single resequencing heuristic, which we will refer to simply as RSQ.
More specifically, RSQ refers to the full single resequencing heuristic outlined in the
previous section where resequences can be either preemptive or post-failure. Note that
RSQ solutions are generated through simulation. In addition to running experiments on
this heuristic, we concurrently tested a few other policies.
These other policies serve as benchmarks and bounds and lend some perspective on
our heuristics’ performance. Two extreme bounds are the TSP and direct delivery
solutions, TSP and DD, respectively. The former refers to the deterministic TSP solution
corresponding to the VRP instance, while the latter accounts for the situation where we
visit each customer directly. In other words, the solution comprises N routes, each with a single customer on it. TSP gives a sense of what we could do in the best possible
case, where no return trips to the depot are required. On the other hand, DD illustrates
the opposite scenario, where we take a return trip to the depot after every customer
visited.
Another benchmark we provide is the sample mean of a number of solutions to the deterministic unsplit VRP with known demand, VRP*. This serves as something close to a lower bound on the performance of our heuristic, as it solves the stochastic VRP as if we knew the exact demand of all customers ahead of time. However, it is important to note that it is not a true lower bound on performance, since it imposes the requirement that deliveries cannot be split (each customer may be visited once and only once). Our heuristics can potentially gain an advantage by relaxing this requirement and allowing multiple trips to each customer. Even though it is not a true lower bound for the S-VRP, we believe it still serves as a useful benchmark, just as others have in the past [119]. Also worth noting, the runtime for VRP* can be quite unwieldy, as we are required to solve a new VRP for every realized sample. Thus, we present this benchmark only for smaller instances and restrict the amount of time spent solving each deterministic VRP. In places where the time limit is reached, we instead report the best lower bound.
The final, and most important, comparison heuristic we implemented was a
TSP-based policy similar to the cyclic heuristic of Bertsimas [27]. As described in the
previous section on TSP-based policies, we fix an a priori tour based on the solution to
the S-VRP’s corresponding TSP . We then allow for restocking trips by solving the DP
recursion, using the same procedure we utilize for RSQ. We believe this heuristic, CYC,
to be representative of a number of approaches proposed in the existing VRP literature,
and as such, we regard it as our primary benchmark.
Due to the symmetry of the optimal TSP solution, we have the option of considering
both the forward and the backward tour as our fixed a priori sequence for either RSQ or
CYC. This would require us to solve a DP recursion on each direction and choose
whichever one turns out to have the lower expected cost. The same can be done when
multiple optimal TSP tours exist. We can consider each optimal tour in the DP and then
choose the minimum cost option. However, in an effort to reduce the required computation time without compromising our ability to compare the two approaches, we consider only one optimal tour, in only one direction.
N = 35
f p RSQ CYC Gap
6 0.1 1527.3 1550.5 1.5 %
12 0.1 2836.3 2993.7 5.3 %
12 0.2 2845.5 3135.3 9.2 %
12 0.3 2864.5 2941.2 2.6 %
18 0.1 3615.5 3801.2 4.9 %
18 0.2 3719.4 3957.8 6.0 %
18 0.3 3749.8 3885.7 3.5 %
24 0.1 4925.6 5139.4 4.2 %
24 0.2 5200.9 5269.8 1.3 %
TSP = 508.9 DD = 5269.8
Figure 6.1: Average RSQ results for N = 35 instances
N = 45
f p RSQ CYC Gap
8 0.1 1890.0 1918.2 1.5 %
16 0.1 3503.2 3723.6 5.9 %
16 0.2 3476.5 3968.4 12.4 %
16 0.3 3602.4 3707.3 2.8 %
24 0.1 4209.2 4564.9 7.8 %
24 0.2 4524.8 4938.9 8.4 %
24 0.3 4697.6 4884.8 3.8 %
32 0.1 6271.7 6500.2 3.5 %
32 0.2 6563.3 6696.0 2.0 %
TSP = 557.1 DD = 6696.0
Figure 6.2: Average RSQ results for N = 45 instances
N = 55
f p RSQ CYC Gap
10 0.1 2263.6 2294.9 1.4 %
20 0.1 4215.9 4572.7 7.8 %
20 0.2 4233.4 4813.0 12.0 %
20 0.3 4223.6 4492.6 6.0 %
30 0.1 6236.2 6849.1 8.9 %
30 0.2 6486.1 7214.3 10.1 %
30 0.3 7019.6 7591.7 7.5 %
40 0.1 7315.6 7841.3 6.7 %
40 0.2 7699.4 8152.2 5.6 %
TSP = 608.9 DD = 8152.6
Figure 6.3: Average RSQ results for N = 55 instances
N = 65
f p RSQ CYC Gap
12 0.1 2711.6 2753.5 1.5 %
24 0.1 5055.2 5443.2 7.1 %
24 0.2 5204.6 5866.9 11.3 %
24 0.3 5249.0 5468.0 4.0 %
36 0.1 7701.8 8419.2 8.5 %
36 0.2 7829.0 8757.5 10.6 %
36 0.3 8412.5 9388.6 10.4 %
48 0.1 9808.7 9972.4 1.6 %
48 0.2 8782.8 9972.4 11.9 %
TSP = 672.5 DD = 9972.6
Figure 6.4: Average RSQ results for N = 65 instances
N = 75
f p RSQ CYC Gap
28 0.1 5201.0 6461.2 19.5 %
28 0.2 5211.1 6461.2 19.3 %
42 0.1 8786.1 9586.2 8.3 %
42 0.2 8760.2 9587.4 8.6 %
56 0.1 10846.1 11306.2 4.1 %
56 0.2 10716.2 11306.2 5.2 %
TSP = 701.7 DD = 11306.4
Figure 6.5: Average RSQ results for N = 75 instances
N = 85
f p RSQ CYC Gap
32 0.1 5741.1 7178.1 20.0 %
32 0.2 5742.5 7178.5 20.0 %
48 0.1 9108.2 10500.4 13.3 %
48 0.2 9087.3 10498.8 13.4 %
64 0.1 11889.1 12600.7 5.6 %
64 0.2 11865.1 12600.7 5.8 %
TSP = 733.0 DD = 12601.0
Figure 6.6: Average RSQ results for N = 85 instances
As is visible in the tables above, the RSQ heuristic reports lower cost solutions than
CYC in all instances. This alone is not remarkable, since the heuristic is in fact
guaranteed to do no worse than the CYC benchmark. Though the vehicle is empowered
to utilize limited route reoptimization to its advantage, it always has the option of
adhering to the a priori route when it is optimal to do so, mirroring the traditional TSP-based
cyclic approach.
What is remarkable, however, is that for certain parameter settings, RSQ is making
a significant improvement on CYC. Reviewing the above tables, in the experiments
performed, RSQ consistently finds solutions where the gap over CYC falls between 4%
and 13%. For certain experiments on the N = 85 instances, that improvement gap ranges
as high as 20%.
In general, this improvement tends to increase as the range parameter, p, increases,
though for some parameter settings on some instances, that improvement continues
only up to a certain point. The margin of improvement appears to also be influenced by
the fill rate, f . For medium-sized instances, the margin is widest for the middle-tier
values of f we report results on. As the size of the instance grows, a noticeable
improvement is still made for these f values, but the peak improvement is now seen for
the lower-tier fill rates. At higher-tier fill rates, regardless of instance size or the range
parameter, the improvement margin tends to be only a few percentage points. At very
low fill rates, the improvement we see is similarly very low. This is expected though, as
at low fill rates the problem resembles a TSP, leaving little opportunity for improvement,
and at high fill rates, many customers will be visited alone or in pairs, providing less to
gain from the sequence.
Digging a bit deeper into these trends, we first consider the correlation between the
performance of RSQ versus CYC and the range parameter, p. Higher values of p
correspond to higher gaps between the high/low values of possible demand at each
customer. In the extreme case, p= 0 would represent deterministic demand. As p
becomes increasingly positive, we introduce more variability into the problem; the
difference between high demand at a customer and low demand at a customer becomes
more substantial and thus, more impactful to the outcome of the route. For low p values,
demand is more predictable and we observe less of a benefit from allowing any
reoptimization to occur. Failures are also more predictable, and we have fewer
opportunities to take advantage of the post-failure resequence option which makes RSQ
novel. However, the opposite holds as p becomes larger, and quite often, though not
always, we see the gap between RSQ and CYC widen.
For a few sets of experiments at certain combinations of parameters, the maximum
improvement is found not at the highest setting of p, but rather, at the medium setting of
p. As the low p setting is closest to the deterministic version of the problem, it is fairly clear why the reoptimization approach built on the traditional cyclic heuristic would offer little incremental value there. It is less clear why, in some cases, that gain plateaus and then drops off as p increases past a certain point.
We suggest that it could be related to the way in which our instances are designed to
allow p to change. As p grows, the range of demand at each individual customer
widens. The gap between the high demand realization and low demand realization
becomes larger. However, to maintain the instance's feasibility and match the desired fill rate as p grows, we must constrict the range in which the expected demand at every customer may fall. So while the variability at a single customer grows with p, the
variability of expected demand across all customers falls. In the extreme case, when all
customers have an identical expected demand, the order in which you visit them can be
optimized by following the TSP tour, and the added value of resequencing is lessened.
As a result, we observe a “sweet spot” at the medium or high setting of p, where the RSQ heuristic makes its most dramatic improvements over the CYC heuristic.
Fill rate corresponds directly to the expected number of subtours required to satisfy
all demand for all customers, and as such, it has a tangible effect on the effectiveness of
our heuristics. As we mentioned previously, depending on the size of the instance, we see
the best improvements over CYC when f is set to low or medium values, typically
between 0.2N and 0.4N. The higher the fill rate, the lower the average number of
customers per subtour tends to be. This also translates to more expected return trips to
the depot, since average demand is higher and we cannot serve as many customers
consecutively without a resupply. As the fill rate increases beyond 0.5N, on average we
expect fewer than two customers per route. In the extreme case where f = N, we would
have a direct delivery (DD) approach.
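As a rough illustration of this relationship, the Monte Carlo sketch below counts customers per trip under a restocking policy; the uniform demand distribution, the normalized vehicle capacity, and the function name are assumptions made for illustration, not the demand model used in our experiments.

```python
import random

def avg_customers_per_subtour(n, f, trials=2000, seed=0):
    """Rough illustration: draw each customer's demand around f/n capacity
    units (vehicle capacity normalized to 1) and count trips when the
    vehicle delivers what it can, restocks at the depot on a failure, and
    returns to finish the customer.  A higher fill rate f means more
    restocking trips, hence fewer customers served per subtour on average."""
    rng = random.Random(seed)
    mean = f / n  # expected demand per customer, in capacity units
    total_trips = 0
    for _ in range(trials):
        load, trips = 1.0, 1
        for _ in range(n):
            d = rng.uniform(0.5 * mean, 1.5 * mean)
            while d > load:      # failure: deliver the remaining load, restock
                d -= load
                load = 1.0
                trips += 1
            load -= d
        total_trips += trips
    return n * trials / total_trips
```

With N = 40 this reproduces the regimes discussed above: f near 0.5N yields roughly two customers per subtour, while f = N approaches direct delivery with about one customer per trip.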
For these high f settings, we see less value in performing resequences, likely
because the order in which you visit customers is simply less important. Again, in the
extreme direct delivery case, the order makes no difference at all: after every customer,
you return to the depot, and it is thus equivalent to visit them in any sequence. On the
other hand, for low and medium fill rates, the sequence is incredibly important, a claim
corroborated by the improvement that allowing just a single resequence can deliver over
our benchmark a priori approach.
Traditionally, experiments in the S-VRP literature limit fill rates to fairly low values.
It is quite common to see results reported for a comparable number of customers where
the fill rate varies between 1 and 2, or at most ranges up to 4. We performed similar
experiments on instances with comparably low fill rates but observed no substantial
improvement over the cyclic heuristic. We report a few examples of experiments with
low fill rates in the tables above, in particular for the N = 45 and N = 55 instances,
where the improvement was barely over a percentage point. In the extreme case where
f = 1, we can expect to fulfill all demand with a single TSP tour. With no return trips to
the depot expected, the cost of the tour is influenced exclusively by the distances between
cities and how you sequence them. In fact, solving the deterministic TSP is optimal
when no return trips to the depot are necessary. Similarly, for very low fill rates, the
quality of the solution is affected more by the choice of the a priori tour and less by any
actions taken to avoid failures or react to new demand information as it becomes
available. As these are the primary benefits provided by the RSQ heuristic, it stands to
reason that RSQ fares better as f increases beyond 1.
For context, we also include a set of tables comparing the performance of RSQ and
CYC to an additional benchmark, VRP*, which we defined earlier. Again, this acts
similarly to a lower bound, though by definition it is not a true lower bound. In practice,
because of the immense computation time required to solve a deterministic VRP
for each individual sample, we enforced a cap on the amount of time spent on each
solve. In the vast majority of cases, this time limit was reached, meaning a lower bound
on the solution to the deterministic problem is used instead, making VRP* an even more
aggressive estimate and a looser bound. Again, because of these computational
limitations, we report results only for instances of small or medium size. In each case,
the optimality gap percentage between the heuristic approach and the VRP* “bound”
consistently shrinks by a few points when using the RSQ approach versus the CYC
approach.
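The bookkeeping behind this benchmark can be sketched as follows; `solve_vrp` is a hypothetical solver interface (returning an incumbent value, a lower bound, and an optimality flag), not an actual API from our implementation.

```python
def vrp_star_estimate(demand_samples, solve_vrp, time_limit=60.0):
    """Sketch of the VRP* benchmark: solve a deterministic VRP for each
    demand sample under a time cap.  When the cap is hit, substitute the
    solver's lower bound, which makes the averaged benchmark an even more
    aggressive estimate (and a looser bound) than true per-sample optima."""
    values = []
    for sample in demand_samples:
        incumbent, lower_bound, proved_optimal = solve_vrp(sample, time_limit)
        values.append(incumbent if proved_optimal else lower_bound)
    return sum(values) / len(values)
```

Any branch-and-bound or branch-and-cut solver that exposes its best incumbent and best bound at termination fits this pattern.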
N = 35
f p VRP* RSQ Gap CYC Gap
6 0.1 1362.0 1527.3 10.8 % 1550.5 12.2 %
12 0.1 2269.7 2836.3 20.0 % 2993.7 24.2 %
12 0.2 2290.0 2845.5 19.5 % 3135.3 27.0 %
12 0.3 2305.3 2864.5 19.5 % 2941.2 21.6 %
18 0.1 2843.2 3615.5 21.4 % 3801.2 25.2 %
18 0.2 2781.2 3719.4 25.2 % 3957.8 29.7 %
18 0.3 2733.4 3749.8 27.1 % 3885.7 29.7 %
24 0.1 4587.2 4925.6 6.9 % 5139.4 10.8 %
24 0.2 4218.0 5200.9 18.9 % 5269.8 20.0 %
Figure 6.7: Average RSQ results vs VRP* for N = 35 instances
N = 45
f p VRP* RSQ Gap CYC Gap
8 0.1 1631.7 1890.0 13.7 % 1918.2 14.9 %
16 0.1 2658.3 3503.2 24.1 % 3723.6 28.6 %
16 0.2 2703.8 3476.5 22.2 % 3968.4 31.9 %
16 0.3 2707.0 3602.4 24.9 % 3707.3 27.0 %
24 0.1 3189.1 4209.2 24.2 % 4564.9 30.1 %
24 0.2 3241.8 4524.8 28.4 % 4938.9 34.4 %
24 0.3 3237.0 4697.6 31.1 % 4884.8 33.7 %
32 0.1 5682.9 6271.7 9.4 % 6500.2 12.6 %
32 0.2 5218.6 6563.4 20.5 % 6696.0 22.0 %
Figure 6.8: Average RSQ results vs VRP* for N = 45 instances
Regarding run times, it is worthwhile to note that all experiments reported in
this section could be completed in a matter of seconds or, at most, minutes. Exact run
times are reported in the set of tables below. The bulk of the computational time is spent
computing the TSP solutions upfront. These TSP solutions provide the alternate route
we can take when considering potential resequencing actions. Any additional time is
spent solving the DP recursion.
In our experiments, we were able to successfully solve instances of up to 85
customers. This is in line with comparable approaches in the literature. The limiting
factor was not the time required to solve the TSP problems, but the way in which the DP
recursion scaled. However, based on the range of instances we report results for here, we
find RSQ to make meaningful improvements over CYC, particularly as p and f fall in a
certain, previously discussed range.
N = 35
f p RSQ CYC
6 0.1 3.1 0.2
12 0.1 3.3 0.1
12 0.2 2.8 0.2
12 0.3 3.0 0.1
18 0.1 3.0 0.1
18 0.2 3.1 0.3
18 0.3 3.0 0.1
24 0.1 2.9 0.1
24 0.2 3.0 0.1
Figure 6.9: Average RSQ runtimes (in seconds) for N = 35 instances
N = 45
f p RSQ CYC
8 0.1 4.1 0.3
16 0.1 3.8 0.1
16 0.2 4.1 0.1
16 0.3 4.1 0.2
24 0.1 4.1 0.1
24 0.2 4.2 0.1
24 0.3 4.0 0.1
32 0.1 3.9 0.2
32 0.2 3.8 0.1
Figure 6.10: Average RSQ runtimes (in seconds) for N = 45 instances
N = 55
f p RSQ CYC
10 0.1 7.9 0.3
20 0.1 7.8 0.4
20 0.2 8.1 0.4
20 0.3 7.4 0.2
30 0.1 7.7 0.6
30 0.2 7.6 0.4
30 0.3 7.7 0.4
40 0.1 7.2 0.5
40 0.2 7.8 0.5
Figure 6.11: Average RSQ runtimes (in seconds) for N = 55 instances
N = 65
f p RSQ CYC
12 0.1 8.5 0.3
24 0.1 7.9 0.4
24 0.2 8.2 0.5
24 0.3 7.8 0.2
36 0.1 7.8 0.3
36 0.2 9.0 0.2
36 0.3 8.5 0.3
48 0.1 8.6 0.3
48 0.2 8.8 0.5
Figure 6.12: Average RSQ runtimes (in seconds) for N = 65 instances
N = 75
f p RSQ CYC
28 0.1 8.8 0.3
28 0.2 8.8 0.6
42 0.1 8.9 0.4
42 0.2 8.7 0.3
56 0.1 9.1 0.1
56 0.2 8.7 0.4
Figure 6.13: Average RSQ runtimes (in seconds) for N = 75 instances
N = 85
f p RSQ CYC
32 0.1 21.9 2.1
32 0.2 21.9 2.0
48 0.1 22.5 2.5
48 0.2 22.4 1.9
64 0.1 22.1 2.0
64 0.2 22.8 2.0
Figure 6.14: Average RSQ runtimes (in seconds) for N = 85 instances
7 Conclusion
Throughout the course of this dissertation, we have addressed a variety of different
routing problems and developed novel heuristics for each. While our approaches are
versatile in the variations of problems they can be adapted to handle, they have also
been shown, through a series of comprehensive computational experiments, to be
effective. The heuristics proposed in this dissertation match or improve upon
state-of-the-art approaches in terms of solution quality or required runtime, and
sometimes they prove to be competitive across both of these dimensions.
For the deterministic traveling salesman problem, we present a branch-and-bound
algorithm that dynamically constructs, checks, and discards tours as it searches for the
optimal solution. From this branch-and-bound algorithm, we develop two heuristics
and demonstrate their effectiveness in generating solutions to the TSP. The first computes
updated bounds as the search tree is traversed; this approach finds optimal or
near-optimal solutions for moderate-sized instances. The second heuristic sacrifices
some solution quality but gains a tremendous reduction in required runtime.
We continue this line of research and extend it to an important TSP variant, the
TD-TSP. We perform a set of experiments on a specific version of the TD-TSP, the
average-cost TSP, and dynamically generate solutions on instances of up to 107 cities. By
varying certain parameters in the heuristic, we can again trade off solution quality and
required runtime. In the end, we are able to come within a few percentage points of the
best known bound for the majority of these instances, while at the same time making
significant reductions in runtime.
This dynamic heuristic for the TSP and TD-TSP inspires another heuristic for a
stochastic version of the TSP, more specifically, the dynamic TSP with stochastic arc
costs. We build a price-directed heuristic policy for the D-TSP, and again, we include
results of a set of computational experiments comparing the heuristic to relevant
D-TSP bounds and benchmarks. We demonstrate that, in general, the dynamic approach
improves upon a fixed-tour approach and shrinks the gap to the best known bound.
Finally, we expand our research beyond the TSP and its variants and into another
crucial routing problem in the field of combinatorial optimization, the vehicle routing
problem. Focusing on the VRP with stochastic demands, we take a well-known a priori
approach, the cyclic heuristic, and illustrate how, by allowing some limited
reoptimization to occur in an intelligent way, we can improve solution quality
without a meaningful increase in runtime and without the heuristic becoming
intractable, particularly as the fill rate grows.
The common thread between all of these approaches is the impact dynamic decision
making can have when solving routing problems. A priori policies have been shown
time and again to be effective, practical heuristics, especially in places where fully
dynamic heuristics are intractable. But in our work, we show that by incorporating
dynamic decision making in a controlled way, we can generate quality solutions quickly
and for a number of different types of routing problems.
7.1 Future Work
Future work on these topics could progress in a number of directions. To conclude
this dissertation, we will briefly discuss a few of them. As we alluded to in earlier
chapters, the dynamic programming based branch-and-bound heuristic we
implemented for the TSP and the TD-TSP in Chapter 4, could also be adapted to solve
other TSP variants. More specifically, a modified version of the approach could in theory
be built for any TSP variant that can be formulated as a DP . The challenge, as it was in
the work we have presented here, is mitigating the computational complexity in clever,
sometimes novel, ways, without sacrificing too much in the way of solution quality.
In much the same way, the price-directed heuristic policy we proposed in Chapter 5
could be modified to solve additional TSP variants as well. A logical next step might be
to mirror our progression on the deterministic TSP and address the TD-TSP with
stochastic arc costs. As we explored in our work with the VRP with stochastic demand,
dynamic approaches offer noticeable advantages over static ones when stochasticity is
introduced. Being able to adjust course as new information is received throughout the
course of a route can be tremendously useful, so long as the additional computational
expense is carefully managed.
We illustrated this point in Chapter 6 by adapting the TSP-based cyclic heuristic into
a limited reoptimization approach, the single resequencing heuristic. By allowing the
vehicle just a single reoptimization opportunity, we observed decreased costs for high
fill rate instances without the procedure becoming intractable. We believe similar
improvements could be made over other a priori approaches, for example, a VRP-based
cyclic heuristic that uses deterministic VRP solutions in place of TSP solutions.
Alternatively, a subsequent line of research could be to explore additional hybrid
heuristics between a priori and full reoptimization techniques. The single resequencing
heuristic could be developed into a multiple resequencing heuristic, where we now
allow a limited number of resequences to occur as opposed to just one, so long as we
remain tractable. We could also allow resequences to occur after a successful delivery,
without requiring the vehicle to return to the depot first. Rather, the vehicle could
execute a resequence, using the solution to a Hamiltonian path problem, and evaluate
the option of moving to that new customer directly from its current location.
In general, based on the work presented in this dissertation and the contributions made
by our research, we see great opportunity in using dynamic optimization to solve
stochastic routing problems, particularly when employed in an intelligent, controlled
way. Hybrid approaches like the single resequencing heuristic offer an effective balance
of performance and practicality, ideal for the many challenges presented by real-world
applications.
Bibliography
[1] H. Abeledo, R. Fukasawa, A. Pessoa, and E. Uchoa. The time dependent traveling
salesman problem: polyhedra and algorithm. Mathematical Programming Computa-
tion, pages 1–29, 2013.
[2] T. Achterberg, T. Koch, and A. Martin. Branching rules revisited. Operations Research
Letters, 33(1):42–54, 2005.
[3] D. Adelman. Price-directed replenishment of subsets: Methodology and its ap-
plication to inventory routing. Manufacturing and Service Operations Management,
5:348–371, 2003.
[4] D. Adelman. A price-directed approach to stochastic inventory/routing. Operations
Research, 52:499–514, 2004.
[5] D. Adelman. Dynamic Bid Prices in Revenue Management. Operations Research,
55:647–661, 2007.
[6] D. Adelman and D. Klabjan. Duality and existence of optimal policies in general-
ized joint replenishment. Mathematics of Operations Research, 30:28–50, 2005.
[7] D. Adelman and D. Klabjan. Computing near-optimal policies in generalized joint
replenishment. INFORMS Journal on Computing, 24(1):148–164, 2012.
[8] A. Ak and A. L. Erera. A paired-vehicle recourse strategy for the vehicle-routing
problem with stochastic demands. Transportation Science, 41(2):222–237, 2007.
[9] D. Applegate, R. Bixby, V . Chv´ atal, and W. Cook. Finding cuts in the tsp (a prelimi-
nary report). Center for Discrete Mathematics & Theoretical Computer Science, 1995.
[10] D. L. Applegate, R. E. Bixby, V . Chv´ atal, and W. J. Cook. The Traveling Salesman
Problem: A Computational Study. Princeton University Press, Princeton, New Jersey,
2006.
[11] N. Ascheuer, M. J ¨ unger, and G. Reinelt. A Branch & Cut Algorithm for the Asym-
metric Traveling Salesman Problem with Precedence Constraints. Computational
Optimization and Applications, 17:61–84, 2000.
107
[12] A. Bagchi and A. Mahanti. Three approaches to heuristic search in networks. Journal
of the ACM (JACM), 32(1):1–27, 1985.
[13] E. Balas. The Prize Collecting Traveling Salesman Problem. Networks, 19:621–636,
1989.
[14] E. Balas. The Prize Collecting Traveling Salesman Problem: II. Polyhedral Results.
Networks, 25:199–216, 1995.
[15] E. Balas. The Prize Collecting Traveling Salesman Problem and Its Applications. In
Gutin and Punnen [71], pages 663–696.
[16] E. Balas, M. Fischetti, and W. R. Pulleyblank. The Precedence-Constrained Asym-
metric Traveling Salesman Polytope. Mathematical Programming, 68:241–265, 1995.
[17] R. Baldacci, E. Bartolini, and A. Mingozzi. An exact algorithm for the pickup and
delivery problem with time windows. Operations research, 59(2):414–426, 2011.
[18] R. Baldacci, A. Mingozzi, and R. Roberti. New State-Space Relaxations for Solving
the Traveling Salesman Problem with Time Windows. INFORMS Journal on Com-
puting, 24:356–371, 2012.
[19] J. L. Bander and C. C. White. A heuristic search approach for a nonstation-
ary stochastic shortest path problem with terminal cost. Transportation Science,
36(2):218–230, 2002.
[20] John E Beasley. Route firstcluster second methods for vehicle routing. Omega,
11(4):403–408, 1983.
[21] R. Bellman. Dynamic Programming Treatment of the Travelling Salesman Problem.
Journal of the Association for Computing Machinery, 9:61–63, 1962.
[22] M. Benichou, J. M. Gauthier, P . Girodet, G. Hentges, G. Ribiere, and O. Vincent.
Experiments in mixed-integer linear programming. Mathematical Programming,
1(1):76–94, 1971.
[23] J. J. Bentley. Fast algorithms for geometric traveling salesman problems. ORSA
Journal on computing, 4(4):387–411, 1992.
[24] L. Bertazzi and N. Secomandi. Improved rollout search for the vehicle routing prob-
lem with stochastic demands. Available at SSRN, 2016.
[25] D. P . Bertsekas, J. N. Tsitsiklis, and C. Wu. Rollout algorithms for combinatorial
optimization. Journal of Heuristics, 3(3):245–262, 1997.
[26] D. Bertsimas, P . Chervi, and M. Peterson. Computational approaches to stochastic
vehicle routing problems. Transportation science, 29(4):342–352, 1995.
108
[27] D. J. Bertsimas. A Vehicle Routing Problem with Stochastic Demand. Operations
Research, 40:574–585, 1992.
[28] D. Bienstock, M. X. Goemans, D. Simchi-Levi, and D. Williamson. A note on the
prize collecting traveling salesman problem. Mathematical Programming, 59:413–420,
1993.
[29] C. Blum and A. Roli. Metaheuristics in combinatorial optimization: Overview and
conceptual comparison. ACM Computing Surveys (CSUR), 35(3):268–308, 2003.
[30] T. Cheong and C. C. White. Dynamic Traveling Salesman Problem: Value of Real-
Time Traffic Information. IEEE Transactions on Intelligent Transportation Systems,
13(2):619–630, 2012.
[31] C. H. Christiansen and J. Lysgaard. A branch-and-price algorithm for the capaci-
tated vehicle routing problem with stochastic demands. Operations Research Letters,
35(6):773–781, 2007.
[32] N. Christofides. Worst-Case Analysis of a New Heuristic for the Travelling Sales-
man Problem. Technical Report ADA025602, Graduate School of Industrial Admin-
istration, Carnegie Mellon University, 1976.
[33] N. Christofides, A. Mingozzi, and P . Toth. State space relaxation procedures for the
computation of bounds to routing problems. Networks, 11:145–164, 1981.
[34] J.-F. Cordeau, M. Gendreau, G. Laporte, J.-Y. Potvin, and F. Semet. A guide to ve-
hicle routing heuristics. Journal of the Operational Research society, pages 512–522,
2002.
[35] J.-F. Cordeau, G. Laporte, and A. Mercier. A unified tabu search heuristic for vehi-
cle routing problems with time windows. Journal of the Operational research society,
pages 928–936, 2001.
[36] G. A. Croes. A method for solving traveling-salesman problems. Operations Re-
search, 6(6):791–812, 1958.
[37] G. B. Dantzig, D. R. Fulkerson, and S. M. Johnson. Solution of a large scale traveling-
salesman problem. Operations Research, 2:393–410, 1954.
[38] G. B. Dantzig, D. R. Fulkerson, and S. M. Johnson. On a Linear Programming,
Combinatorial Approach to the Traveling Salesman Problem. Operations Research,
7:58–66, 1959.
[39] G. B. Dantzig and J. H. Ramser. The truck dispatching problem. Management science,
6(1):80–91, 1959.
109
[40] D. P . de Farias and B. van Roy. The linear programming approach to approximate
dynamic programming. Operations Research, 51:850–865, 2003.
[41] D. P . de Farias and B. van Roy. On constraint sampling in the linear program-
ming approach to approximate dynamic programming. Mathematics of Operations
Research, 29:462–478, 2004.
[42] M. Dell’Amico, F. Maffioli, and A. Sciomachen. A lagrangian heuristic for the prize
collectingtravelling salesman problem. Annals of Operations Research, 81:289–306,
1998.
[43] V . Desai, V . F. Farias, and C. C. Moallemi. A smoothed approximate linear program.
Advances in Neural Information Processing Systems, pages 459–467, 2009.
[44] E. W. Dijkstra. A note on two problems in connexion with graphs. Numerische
mathematik, 1(1):269–271, 1959.
[45] M. Dorigo and L. M. Gambardella. Ant colony system: A cooperative learning
approach to the traveling salesman problem. Evolutionary Computation, IEEE Trans-
actions on, 1(1):53–66, 1997.
[46] M. Dror, G. Laporte, and F. V . Louveaux. Vehicle routing with stochastic demands
and restricted failures. Mathematical Methods of Operations Research, 37(3):273–283,
1993.
[47] M. Dror, G. Laporte, and P . Trudeau. Vehicle routing with stochastic demands:
Properties and solution frameworks. Transportation science, 23(3):166–176, 1989.
[48] Y. Dumas, J. Desrosiers, E. Gelinas, and M. Solomon. An Optimal Algorithm for the
Traveling Salesman Problem with Time Windows. Operations Research, 43:367–371,
1995.
[49] I. Dumitrescu, S. Ropke, J.-F. Cordeau, and G. Laporte. The traveling salesman
problem with pickup and delivery: polyhedral results and a branch-and-cut algo-
rithm. Mathematical Programming, 121:269–305, 2010.
[50] A. L. Erera, J. C. Morales, and M. Savelsbergh. The vehicle routing problem with
stochastic demand and duration constraints. Transportation Science, 44(4):474–492,
2010.
[51] V . F. Farias and B. van Roy. An Approximate Dynamic Programming Approach to
Network Revenue Management. Preprint available online athttp://web.mit.edu/
~
vivekf/www/mypapers2.html, 2007.
[52] D. Feillet, P . Dejax, and M. Gendreau. Traveling salesman problems with profits.
Transportation science, 39(2):188–205, 2005.
110
[53] M. Fischetti, J. J. S. Gonzalez, and P . Toth. Solving the orienteering problem through
branch-and-cut. INFORMS Journal on Computing, 10(2):133–148, 1998.
[54] M. Fischetti and P . Toth. An additive approach for the optimal solution of the prize-
collecting travelling salesman problem. Vehicle Routing, Methods and Studies, 16,
1988.
[55] M. L. Fisher and R. Jaikumar. A generalized assignment heuristic for vehicle rout-
ing. Networks, 11(2):109–124, 1981.
[56] M. Gendreau, A. Hertz, and G. Laporte. New insertion and postoptimization pro-
cedures for the traveling salesman problem. Operations Research, 40(6):1086–1094,
1992.
[57] M. Gendreau, A. Hertz, and G. Laporte. A tabu search heuristic for the vehicle
routing problem. Management science, 40(10):1276–1290, 1994.
[58] M. Gendreau, A. Hertz, G. Laporte, and M. Stan. A generalized insertion heuris-
tic for the traveling salesman problem with time windows. Operations Research,
46(3):330–335, 1998.
[59] M. Gendreau, G. Laporte, and R. S´ eguin. Stochastic vehicle routing. European Jour-
nal of Operational Research, 88(1):3–12, 1996.
[60] M. Gendreau, G. Laporte, and F. Semet. A branch-and-cut algorithm for the undi-
rected selective traveling salesman problem. Networks, 32(4):263–273, 1998.
[61] M. Gendreau, G. Laporte, and D. Vigo. Heuristics for the traveling salesman prob-
lem with pickup and delivery. Computers & Operations Research, 26(7):699–714, 1999.
[62] F Glover. Tabu search - part I. ORSA Journal on computing, 1(3):190–206, 1989.
[63] F. Glover. Tabu search - part II. ORSA Journal on computing, 2(1):4–32, 1990.
[64] G. A. Godfrey and W. B Powell. An adaptive dynamic programming algorithm
for dynamic fleet management. I. Single period travel times. Transportation Science,
36(1):21–39, 2002.
[65] R. H. Gonzales. Solution to the traveling salesman problem by dynamic program-
ming on the hypercube. Technical Report 18, Operations Research Center, Mas-
sachusetts Institute of Technology, 1962.
[66] J. C. Goodson, J. W. Ohlmann, and B. W. Thomas. Cyclic-order neighborhoods with
application to the vehicle routing problem with stochastic demand. European Journal
of Operational Research, 217(2):312–323, 2012.
111
[67] J. C. Goodson, J. W. Ohlmann, and B. W. Thomas. Rollout policies for dynamic
solutions to the multivehicle routing problem with stochastic demand and duration
limits. Operations Research, 61(1):138–154, 2013.
[68] J. C. Goodson, B. W. Thomas, and J. W. Ohlmann. Restocking-based rollout poli-
cies for the vehicle routing problem with stochastic demand and duration limits.
Transportation Science, 2015.
[69] M. Gr¨ otschel and M. Padberg. On the symmetric travelling salesman problem i:
Inequalities. Mathematical Programming, 16(1):265–280, 1979.
[70] M. Gr¨ otschel and M. Padberg. On the symmetric travelling salesman problem ii:
Lifting theorems and facets. Mathematical Programming, 16(1):281–302, 1979.
[71] G. Gutin and A.P . Punnen, editors. The Traveling Salesman Problem and Its Variations.
Kluwer Academic Publishers, Dordrecht, The Netherlands, 2002.
[72] P . E. Hart, N. J. Nilsson, and B. Raphael. A formal basis for the heuristic determina-
tion of minimum cost paths. Systems Science and Cybernetics, IEEE Transactions on,
4(2):100–107, 1968.
[73] P . E. Hart, N. J. Nilsson, and B. Raphael. Correction to a formal basis for the heuristic
determination of minimum cost paths. ACM SIGART Bulletin, (37):28–29, 1972.
[74] M. Held and R. M. Karp. A dynamic programming approach to sequencing prob-
lems. Journal of the Society for Industrial & Applied Mathematics, 10(1):196–210, 1962.
[75] M. Held and R. M. Karp. The traveling-salesman problem and minimum spanning
trees. Operations Research, 18:1138–1162, 1970.
[76] M. Held and R. M. Karp. The traveling-salesman problem and minimum spanning
trees: Part II. Mathematical Programming, 1:6–25, 1971.
[77] K. Helsgaun. An effective implementation of the lin–kernighan traveling salesman
heuristic. European Journal of Operational Research, 126(1):106–130, 2000.
[78] D. S. Johnson and L. A. McGeoch. The traveling salesman problem: A case study
in local optimization. Local search in combinatorial optimization, pages 215–310, 1997.
[79] R. M. Karp. A patching algorithm for the nonsymmetric traveling-salesman prob-
lem. SIAM Journal on Computing, 8(4):561–573, 1979.
[80] S. Kirkpatrick, D. Gelatt Jr., and M. P . Vecchi. Optimization by simulated annealing.
Science, 220(4598):671–680, 1983.
[81] G. Laporte. The traveling salesman problem: An overview of exact and approxi-
mate algorithms. European Journal of Operational Research, 59(2):231–247, 1992.
112
[82] G. Laporte, M. Gendreau, J.-Y. Potvin, and F. Semet. Classical and modern heuristics
for the vehicle routing problem. International transactions in operational research, 7(4-
5):285–300, 2000.
[83] G. Laporte, F. V . Louveaux, and L. Van Hamme. An integer l-shaped algorithm
for the capacitated vehicle routing problem with stochastic demands. Operations
Research, 50(3):415–423, 2002.
[84] E. L. Lawler and D. E. Wood. Branch-and-bound methods: A survey. Operations
research, 14(4):699–719, 1966.
[85] H. Lei, G. Laporte, and B. Guo. The capacitated vehicle routing problem
with stochastic demands and time windows. Computers & Operations Research,
38(12):1775–1783, 2011.
[86] S. Lin. Computer solutions of the traveling-salesman problem. BSTJ, 44:2245–2269,
1965.
[87] S. Lin and B. W. Kernighan. An effective heuristic algorithm for the traveling-
salesman problem. Operations research, 21(2):498–516, 1973.
[88] J. T. Linderoth and M. W. P . Savelsbergh. A computational study of search strategies
for mixed integer programming. INFORMS Journal on Computing, 11(2):173–187,
1999.
[89] A. Lucena. Time-dependent traveling salesman problem–the deliveryman case.
Networks, 20(6):753–763, 1990.
[90] A. Mahanti and A. Bagchi. And/or graph heuristic search methods. Journal of the
ACM (JACM), 32(1):28–51, 1985.
[91] C. Malandraki and M. S. Daskin. Time dependent vehicle routing problems: For-
mulations, properties and heuristic algorithms. Transportation science, 26(3):185–200,
1992.
[92] C. Malandraki and R. B. Dial. A restricted dynamic programming heuristic algo-
rithm for the time dependent traveling salesman problem. European Journal of Op-
erational Research, 90(1):45 – 55, 1996.
[93] J. P . Marques-Silva and K. A. Sakallah. GRASP: A search algorithm for propositional
satisfiability. IEEE Transactions on Computers, 48(5):506–521, 1999.
[94] A. Mingozzi, L. Bianco, and S. Ricciardelli. Dynamic programming strategies for
the traveling salesman problem with time window and precedence constraints. Op-
erations Research, 45:365–377, 1997.
113
[95] L. G. Mitten. Branch-and-bound methods: General formulation and properties.
Operations Research, 18(1):24–34, 1970.
[96] N. Mladenovi´ c and P . Hansen. Variable neighborhood search. Computers & Opera-
tions Research, 24(11):1097–1100, 1997.
[97] S. Nadarajah, F. Margot, and N. Secomandi. Approximate Dynamic Programs for
Natural Gas Storage Valuation Based on Approximate Linear Programming Relax-
ations. Technical Report 2011-E5, Tepper School of Business, Carnegie Mellon Uni-
versity, 2011.
[98] C. Novoa and R. Storer. An approximate dynamic programming approach for the
vehicle routing problem with stochastic demands. European Journal of Operational
Research, 196(2):509–515, 2009.
[99] M. Padberg and S. Hong. On the symmetric travelling salesman problem: a computa-
tional study. Springer, 1980.
[100] M. Padberg and G. Rinaldi. Optimization of a 532-city symmetric traveling sales-
man problem by branch and cut. Operations Research Letters, 6(1):1–7, 1987.
[101] K. P . Papadaki and W. B. Powell. An adaptive, dynamic programming algorithm for
stochastic resource allocation problems I: Single period travel times. Transportation
Science, 2002.
[102] J. Pearl. Heuristics: intelligent search strategies for computer problem solving. Addison-
Wesley Pub. Co., Inc., Reading, MA, 1984.
[103] J. Picard and M. Queyranne. The time-dependent traveling salesman problem and
its application to the tardiness problem in one-machine scheduling. Operations Re-
search, 26(1):86–110, 1978.
[104] W. B. Powell. Approximate Dynamic Programming: Solving the curses of dimensionality,
volume 703. Wiley-Interscience, 2007.
[105] W. B. Powell. What you should know about approximate dynamic programming.
Naval Research Logistics (NRL), 56(3):239–249, 2009.
[106] W. B. Powell, P . Jaillet, and A. Odoni. Stochastic and dynamic networks and routing.
Handbooks in operations research and management science, 8:141–295, 1995.
[107] W. B. Powell, J. A. Shapiro, and H. P . Sim˜ ao. An adaptive dynamic programming
algorithm for the heterogeneous resource allocation problem. Transportation Science,
36(2):231–249, 2002.
[108] W. B. Powell and B. Van Roy. Approximate dynamic programming for high dimen-
sional resource allocation problems. Handbook of learning and approximate dynamic
programming, pages 261–280, 2004.
114
[109] H. N. Psaraftis. Dynamic vehicle routing: Status and prospects. annals of Operations
Research, 61(1):143–164, 1995.
[110] R. Ramesh and K. M. Brown. An efficient four-phase heuristic for the generalized
orienteering problem. Computers & Operations Research, 18(2):151–165, 1991.
[111] G. Reinelt. TSPLIB - A Traveling Salesman Problem Library. ORSA Journal on Com-
puting, 3:376–384, 1991. Updated archive available online at http://comopt.ifi.
uni-heidelberg.de/software/TSPLIB95/.
[112] J. Renaud, F. F. Boctor, and G. Laporte. A fast composite heuristic for the symmetric
traveling salesman problem. INFORMS Journal on Computing, 8(2):134–143, 1996.
[113] J. Renaud, F. F. Boctor, and J. Ouenniche. A heuristic for the pickup and delivery
traveling salesman problem. Computers & Operations Research, 27(9):905–916, 2000.
[114] D. J. Rosenkrantz, R. E. Stearns, and P . M. Lewis II. An analysis of several heuristics
for the traveling salesman problem. SIAM journal on computing, 6(3):563–581, 1977.
[115] P . J. Schweitzer and A. Seidmann. Generalized polynomial approximations in
markovian decision processes,. Journal of Mathematical Analysis and Applications,
110:568–582, 1985.
[116] N. Secomandi. Comparing neuro-dynamic programming algorithms for the ve-
hicle routing problem with stochastic demands. Computers & Operations Research,
27(11):1201–1225, 2000.
[117] N. Secomandi. A rollout policy for the vehicle routing problem with stochastic
demands. Operations Research, 49(5):796–802, 2001.
[118] N. Secomandi. Analysis of a rollout approach to sequencing problems with stochas-
tic routing applications. Journal of Heuristics, 9(4):321–352, 2003.
[119] N. Secomandi and F. Margot. Reoptimization approaches for the vehicle-routing
problem with stochastic demands. Operations research, 57(1):214–230, 2009.
[120] H. P. Simão, J. Day, A. P. George, T. Gifford, J. Nienow, and W. B. Powell. An
approximate dynamic programming algorithm for large-scale fleet management:
A case application. Transportation Science, 43(2):178–197, 2009.
[121] W. R. Stewart Jr and B. L. Golden. Stochastic vehicle routing: A comprehensive
approach. European Journal of Operational Research, 14(4):371–385, 1983.
[122] H. Topaloglu and W. B. Powell. Dynamic-programming approximations for
stochastic time-staged integer multicommodity-flow problems. INFORMS Journal
on Computing, 18(1):31–42, 2006.
[123] A. Toriello. Equivalence of an approximate linear programming bound with the
Held-Karp bound for the traveling salesman problem, 2013.
[124] A. Toriello. Optimal toll design: a lower bound framework for the asymmetric
traveling salesman problem. Mathematical Programming, 144(1-2):247–264, 2014.
[125] A. Toriello, W. B. Haskell, and M. Poremba. A dynamic traveling salesman problem
with stochastic arc costs. Operations Research, 62(5):1107–1125, 2014.
[126] P. Toth and D. Vigo. The vehicle routing problem, volume 9. Society for Industrial and
Applied Mathematics, 2002.
[127] M. A. Trick and S. E. Zin. A Linear Programming Approach to Solving
Stochastic Dynamic Programs. Unpublished manuscript available online at
http://mat.gsia.cmu.edu/trick/, 1993.
[128] M. A. Trick and S. E. Zin. Spline Approximations to Value Functions: A Linear
Programming Approach. Macroeconomic Dynamics, 1:255–277, 1997.
[129] R. J. Vander Wiel and N. V. Sahinidis. An exact solution approach for the time-
dependent traveling-salesman problem. Naval Research Logistics (NRL), 43(6):797–
820, 1996.
[130] W. Yang, K. Mathur, and R. H. Ballou. Stochastic vehicle routing problem with
restocking. Transportation Science, 34(1):99–112, 2000.
Abstract
In this dissertation, we present a number of computationally effective algorithms and heuristics for routing problems, including the traveling salesman problem, the time-dependent traveling salesman problem, and the vehicle routing problem. Both deterministic and stochastic cases are considered. For the traveling salesman problem and the time-dependent traveling salesman problem, a new dynamic programming-based branch-and-bound heuristic is presented and shown, through a set of computational experiments, to improve upon other approaches. This methodology is then adapted into a dynamic heuristic policy for the dynamic traveling salesman problem with stochastic arc costs. Finally, a limited reoptimization heuristic for the vehicle routing problem with stochastic demand is presented; computational experiments demonstrate an improvement over existing a priori techniques for certain instance types.