VARIANTS OF STOCHASTIC KNAPSACK PROBLEMS

by

Kai Chen

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY (INDUSTRIAL AND SYSTEMS ENGINEERING)

December 2013

Copyright 2013 Kai Chen

To my parents

Acknowledgments

My sincere gratitude goes to Prof. Sheldon Ross, my academic advisor, for his support, encouragement, and contributions toward this achievement. No words could exaggerate the help I received from Prof. Sheldon Ross back in the days when I was getting nowhere in my research and turned to him for guidance. I also appreciate the help and inspiration from my committee members: Prof. Suvrajeet Sen, Prof. Ramandeep Randhawa, Prof. Julia L. Higle, Prof. Qiang Huang, and Prof. Aiichiro Nakano. My warmest regards to the friends I met in Southern California: my roommates, ISE/USC students, and many others. I would like to give my special thanks and best wishes to Sumeng Yu; thank you for so many things. Finally, my deepest appreciation goes to my parents for their unwavering love and incredible respect throughout this long journey.

Kai Chen
Los Angeles, California
Oct 2013

Contents

Dedication
Acknowledgments
Abstract
Chapter 1  Introduction
Chapter 2  Static Stochastic Knapsack Problems
  2.1  the Static BKP model
    2.1.1  Problem Definition
    2.1.2  Solution Structures
  2.2  the SKP with simple recourse and penalty model
    2.2.1  Problem Definition with Preliminaries
    2.2.2  Unimodality and Monotonicity
  2.3  With exponentially distributed capacity for the two models
    2.3.1  the Static BKP model
    2.3.2  the SKP with simple recourse and penalty model
  2.4  Search scheme for the optimal solution
  2.5  Numerical Examples
    2.5.1  Example for the static BKP model
    2.5.2  Example for the SKP with simple recourse and penalty model with c > 0 and exponential items' weights
Chapter 3  An Adaptive Broken Knapsack Problem
  3.1  Problem Setting
  3.2  Optimal Policy for n = 2
    3.2.1  Preliminary Analysis
    3.2.2  Policy Statement and Optimality Proof for n = 2
  3.3  Two heuristic policies for general n
    3.3.1  Generalized Policy n
    3.3.2  A Second Heuristic Policy
    3.3.3  A Numerical Example
Chapter 4  The Adaptive Stochastic Knapsack Problem with Exponential Capacity
  4.1  Problem Setting
  4.2  Three Different Special Cases
    4.2.1  Model 1: Item's weight and reward are independent, and the reward is exponentially distributed
    4.2.2  Model 2: Item's weight is exponentially distributed and its reward is proportional to the weight
    4.2.3  Model 3: Item's reward is deterministic and its weight is random
  4.3  An Example of n = 3 for Model 1 and Model 2
Chapter 5  The Markovian Stochastic Knapsack Problem with Exponential Capacity
  5.1  Problem Definition
  5.2  When c = 0
  5.3  When c > 0
  5.4  A Numerical Example
Chapter 6  Conclusions
Bibliography
List of Appendices
  Appendix A  Appendix For Chapter 2
  Appendix B  Appendix For Chapter 3

List of Figures

3.1  Assuming only type 1 items are available, at state (r, v) where v < v_1(r), we keep inserting items until the state enters the stopping domain determined by the curve v_1(·). We denote by (R_11(r, v), V_1(r, v)) the intersection point between v_1(·) and the line passing through (r, v) with slope −v_1.
3.2  Assuming only two types of items are available, where v_1 < v_2 and w_1 < w_2, at state (r, v) there are 5 different cases regarding the relative position of the state point (r, v) w.r.t. the critical curves v_1(·) and v_2(·).
B.1  Given V_1(r, v) ≥ E_2[V_1(r − T, v + v_2 T)], Case 1 considers v ≥ v_1(r), where from equations (3.5) we know V_1(r − h, v + v_1 h) = v + v_1 h for all 0 ≤ h ≤ r; Case 2 considers v < v_1(r), which means the state (r, v) is outside the stopping zone determined by the boundary v_1(·).
B.2  Given V_1(r, v) ≤ E_2[V_1(r − T, v + v_2 T)], Case 1 considers V_1(r, v) = v, i.e., v ≥ v_1(r); Case 2 considers V_1(r, v) > v, i.e., v < v_1(r).

List of Tables

2.1  n = 3, with parameters v_1 = 0.305025, σ_1 = 0.313816, v_2 = 0.334888, σ_2 = 0.466047, v_3 = 0.68152, σ_3 = 0.562881.
2.2  n = 3, with parameters c = 5.0, d = 20.0, v_1 = 2.0, w_1 = 0.32, v_2 = 3.0, w_2 = 0.40, v_3 = 4.0, w_3 = 0.52.
3.1  A numerical example of different policies on the discretized adaptive BKP model.
4.1  Parameters in a numerical example for Model 1 and Model 2.

Abstract

This thesis studies variants of stochastic knapsack problems (SKP). In these models, there is a knapsack whose capacity is either a known constant or a random variable with a given distribution, e.g., an exponential distribution; there exist n types of items, each type with an infinite supply of items; and each item is assumed to have a stochastic weight and a stochastic (or constant) reward depending on its type. We consider both static and adaptive models of SKP: a static model requires the decision maker to determine the subset of items to be put into the knapsack at the beginning of the problem, while an adaptive model allows us to select a type of item and put one item into the knapsack at each stage, deciding the move at the next stage based on feedback from the system. We also consider an on-line SKP where items arrive sequentially to the system, with successive item types constituting a Markov chain. The objective in each of these variants of SKP is to find the respective optimal policy, which yields the maximal expected total return.

We first study two models of static SKP: the static broken knapsack problem (BKP) and the SKP with simple recourse and penalty cost.
In the BKP, we lose everything once the knapsack is broken, whereas in the SKP with simple recourse and penalty the returns from the items put in the knapsack are retained but a penalty cost is incurred if the knapsack is broken. For both models, the unimodality of the expected return functions and the monotonicity of the marginal optimal decision functions are established under certain assumptions on the system parameters, for both the constant and the exponentially distributed knapsack capacity cases. An efficient search algorithm for the optimal decision, based on these two properties, is developed.

We also study an adaptive BKP model with constant knapsack capacity where each item has an exponentially distributed weight and a reward proportional to its weight with a constant factor, the exponential rate and the constant factor depending on the item's type. We state and prove an optimal policy when there are only two types of items. For the general model, heuristic policies are presented and discussed. When the knapsack has exponential capacity, we discuss another SKP model which assumes only the joint distribution of an item's weight and reward for each type. We explore the solution structure of this general model and show that under three special circumstances on those joint distributions we can find the optimal policies. The last model is the Markovian SKP with exponential capacity, where items arrive to the system and their types follow a given Markov chain. In this model, we have to decide whether to accept or reject an incoming item knowing the item's type; a fixed cost has to be paid in order to observe the next item. We are forced out of the system once the knapsack is broken or an item is rejected, and we can choose to stop at any time. We present the optimal policy for this model.

Chapter 1

Introduction

The knapsack problem is a classic and widely studied problem in combinatorial optimization arising from resource allocation (see Kellerer et al. [KPP04]). In its original form there are multiple items whose weights and rewards are known in advance, and we want to choose a subset of these items to maximize the sum of the rewards of the selected items subject to the knapsack capacity constraint on the items' total weight. Knapsack problems naturally arise in resource allocation and budget planning problems where one wants to extract the maximum return from candidate projects while keeping the resource/budget balanced. In the optimization formulation of a knapsack problem, each item is given a decision variable, either 1 or 0, representing whether the item is selected. The knapsack problem is NP-hard even in its deterministic version, i.e., when all items' weights and rewards are known. For deterministic knapsack problems, both the dynamic programming method (see Toth [Tot80]) and the branch and bound algorithm (see Kolesar [Kol67]) provide efficient ways to search for the optimal solution in pseudo-polynomial computation time.

However, in most real-life circumstances, the deterministic assumption on items' weights and rewards is often violated, because these values are sometimes revealed only after a decision has been executed. For instance, when we decide to enroll in a project, its exact budget requirement is typically unknown before it is completed. The reward from the project may also vary because of uncertain factors during the execution process.
Therefore, to accommodate these practical situations, stochastic knapsack problems (SKP) are introduced for problems where one or both of the items' weights and rewards are randomly distributed.

There are generally two groups of SKP discussed in the literature: SKP with all items available at the beginning, and SKP with items arriving to the system one by one. In the former case, where all items are already present, we can either put the candidate items in the knapsack all at once (the static SKP), or we can take advantage of adaptivity by putting in one item at each stage with updated system state information (the adaptive SKP). In the latter case, whenever an item arrives, we decide whether to accept or reject it, and we have no control over the type of incoming items, which follow an inherent process. For both types of SKP models, Schilling [Sch94] gives results on the asymptotic optimal values.

For the static SKP, some papers study problems where the objective is to maximize the probability of achieving a pre-specified total return target while the capacity constraint is strictly satisfied (the target criterion). They assume stochastic rewards and deterministic weights for each item. Steinberg and Parks [SP79] and Sniedovich [Sni80] considered problems where all weights are integral numbers. A preference ordering was proposed on the reward distributions, which was then applied within a dynamic programming (DP) scheme to facilitate the search for optimal solutions. Henig [Hen90] discussed the percentile criterion (which maximizes a percentile value) in addition to the target criterion. The author mainly showed that by combining DP with a search procedure, the problem can be solved relatively fast when rewards are normally distributed. Morton and Wood [MW98] compared a simulation-based DP and an integer programming method in the case of normally distributed rewards and deterministic weights. They also provided a simulation-based procedure to approximate the optimal solution for more general reward distributions.

Another common objective in the static SKP is to maximize the total rewards while keeping the overflow probability under a certain level (the chance constraint model). Kosuch and Lisser [KL10], [KL11] discussed the chance constraint model as well as the simple recourse model and proposed stochastic methods that provide upper and lower bounds to solve the problems using branch and bound techniques, similar to what Cohn and Barnhart did in [CM98]. Concerning approximation methods for the chance constraint model, Kleinberg et al. [KRT97] proved that there exist polynomial-time approximation schemes whose return is at least as good as the optimal solution when the overflow constraint (the probability of overflow) is relaxed by a small fraction. Goel and Indyk [GI99] presented such approximation schemes when the items' weights follow Poisson, exponential, or Bernoulli distributions respectively.

In the static SKP with simple recourse, instead of constraining the overflow probability, a penalty cost proportional to the overflow amount is assumed. We study in this thesis the static SKP with simple recourse and penalty model, which generalizes the original model by adding a fixed cost whenever an overflow occurs. In this model, which assumes that each type has an infinite supply of items, we show that the unimodality property of the expected return function holds when (a) the overflow fixed cost is 0, or (b) all weight distributions are exponential. We develop an efficient search
We develop an efficient search algorithm to locate sub-optimal solutions by applying the unimodality property, as well as the monotonicity property of the marginal optimal decision function (whose proof is always similar as that for the unimodality property in our models). The other static SKP model studied here is the static broken knapsack problem (BKP) model. In this model, we assume the overflow leads to the loss of all existing values in the knapsack (the broken assumption). We prove the unimodality property holds for the static BKP model when all weight distributions satisfy the DRH condition, i.e., having decreasing 3 reversed hazard rate. General results on the two static models are given as well when the knapsack has an exponential distributed capacity. We assume the broken assumption in all the other SKP models studied in this thesis: the adaptive BKP model, the adaptive SKP with exponential capacity and the markovian SKP with exponential capacity. When all items are available in the beginning, another type of SKP is the adaptive SKP: select an item and put in the knapsack; the item’s weight and reward is revealed to the decision maker; stop or select another item based on the updated system state. Derman et al.[DLR78] consider a renewal decision problem which is indeed an adaptive SKP with exponentially distributed items’ weights. They determine the optimal policy when assuming different supply amounts for each type. Dean et al.[DGV04] study the benefit of adaptivity comparing to the static SKP with random weights, where they assume the final overflowing item contributes no value. They bound the adaptivity gap, which measures the ratio of the optimal adaptive policy value to the optimal static policy value, to a factor of four; and they also devise a polynomial-time adaptive policy that approximates the optimal policy with a factor of 3 +" for any positive ". Ilhan et al. [TID11] study the adaptive SKP with target achievement objective where items’ rewards are random and their weights are deterministic. They formulate the adaptive SKP as a DP problem for discrete random rewards. For more general reward distributions like normal distributions, they present a heuristic which mixes adaptive and static policies, and they show the heuristic leads to near-optimal results when all reward distributions are assumed to be normal. We study two adaptive SKP models: the adaptive BKP where the weight of an item is exponentially distributed and its reward is proportional to the weight with a given factor (the exponential rate and the constant factor are both depending on the item’s type); the adaptive SKP with exponentially distributed knapsack capacity where we 4 only assume joint distributions on items’ weights and rewards. For the first adaptive model, we present an optimal policy when n = 2, where n is the number of types of items; we also prove a heuristic policy which yields expected returns very close to the optimal. For the second model, we take account of the special problem structure from the assumption that the knapsack capacity is exponentially distributed. Three specific circumstances on the joint distributions of items’ weight and reward are discussed with respective optimal policies presented. Indeed, with the memoryless property of exponential distributions, the adaptive SKP with an exponential knapsack capacity can be interpreted as the burglar problem. In the burglar problem, the burglar accumulates wealth through a series of burglaries. 
On each round, he either chooses a target from a fixed set of target types, or he retires with all the wealth earned so far. For every target type, the burglar knows the probability of being caught and the reward distribution from burgling the target. He loses everything and goes to jail if he is caught in a burglary. The objective is to find a policy which maximizes the expected total rewards. The burglar problem can be applied in the area of healthcare. Imagine a patient receiving a treatment. On each round of the treatment, the doctor in charge has to choose one from a number of various types of modalities for the patient. The added improvement from a modality is assumed to be randomly distributed depending on its type. The use of each modality has a certain probability of bringing side effects that disallow further treatments, thus negating all previous work on the patient. The doctor always has the option to terminate the treatment at any time to preserve the progress achieved to that point. The burglar problem is discussed in Kadane [Kad71]. That paper assumes that there exists a finite number of candidate targets and that the reward from each successful burglary is known in advance. The author discusses two objectives: maximizing the probability of meeting a pre-specified goal, and maximizing the expected total fortune. In Ross [Ros83], the burglar problem with only one type of target is mentioned as an example of optimal stopping problems where the one-stage look-ahead rule is the optimal policy. Other applications include Ross et al. [RTW09] in a clinical setting and Woo [Woo09] with a counter-terrorism backdrop.

The second group of SKP models assumes that candidate items come to the system sequentially. In Ross and Tsang [RT89], different types of items have different arrival rates. The weight of an incoming item is deterministic and its reward is randomly distributed depending on the item's type. The authors show that, under a wide range of parameters with only two types of items, the optimal control is always of the threshold structure. Lin et al. [LLY08] consider an application in the field of revenue management where customer orders, e.g., flight ticket reservations, arrive regularly with a demand quantity and a unit bid price. The objective is to maximize the expected total revenue given a deadline T. They show the asymptotic optimality of the switch-over policy, which accepts the highest price first and then starts to accept lower prices as the deadline approaches. A dynamic pricing policy based on the same logic is developed. Van Slyke and Young [VSY00] and Lu et al. [LCJ99] discuss other applications of finite-horizon stochastic knapsack models in revenue management and project selection problems. They assume that items of different types arrive according to Poisson processes and that, for each item, both the weight and the reward are deterministic. They discuss the problem structures using DP and prove that, with a fixed time constraint, there exist acceptance intervals on the total-existing-rewards axis for each type. Kleywegt and Papastavrou [KP98], [KP01] discuss the dynamic and stochastic knapsack problem (DSKP). In their model, an item's weight and reward are unknown but revealed upon its arrival. Whenever an item becomes available, i.e., it arrives to the system, we can either accept or reject it. If we reject the item, a penalty cost is incurred. There is a salvage value for any remaining capacity at the deadline.
They explore the structures 6 of the optimal policy in this model and show the optimal policy consists of a threshold acceptance rule and an optimal stopping rule. The model with sequentially-arriving items discussed in this thesis is the markovian SKP with an exponentially distributed capacity. We assume in this model that items of different types arrive to the system one by one where the types constitute a given Markov process. A fixed cost has to be paid on each stage in order to observe the incoming item. It is also assumed that we have to leave the system once we reject an item or at anytime we choose to stop. We prove that given the type information of the available item, the optimal policy proceeds if and only if the current existing total rewards fall in a continuous interval depending on the item’s type. We give an algorithm to calculate those intervals for all types which characterizes the optimal policy. In the following, Chapter 2 discuss two static SKP models; Chapter 3 discuss the adaptive BKP model; Chapter 4 discuss the adaptive SKP with exponential capacity; Chapter 5 discuss the markovian SKP with exponential capacity. 7 Chapter 2 Static Stochastic Knapsack Problems We consider two models of static stochastic knapsack problems (SKP): the static broken knapsack problem (BKP) and the SKP with simple recourse and penalty cost problem. For both models, the knapsack has a given constant capacity and there are n types of items where each type has an infinite supply of items. A typei item has a deterministic valuev i and a random weight with the given distributionF i , and one has to determine at the beginning of the problems the quantities of items on each type to put in the knapsack. The static BKP assumes that if the knapsack is broken, all the existing rewards are wiped out. For the SKP with simple recourse and penalty, if the capacity constraint is violated, the existing rewards are retained but we have to pay a fixed penaltyc plus a variant cost proportional to the overcapacity amount with a given factor d. In both models, the objective is to maximize the expected total return. We will discuss in this chapter for the two models the unimodality property of the expected return function and the monontonicity property of the marginal optimal decision functions. We present a heuristic search algorithm that unitizes the two properties and returns a decision vector close to the optimal solution. We also study the case when the knapsack capacity is exponentially distributed for both models. In Section 2.1, we define the static BKP and prove that the two properties, unimodal- ity and montonocity, hold under the assumption of DRH (decreasing reversed hazard rate) on items’ weight distributions. In Section 2.2, the SKP with simple recourse and penalty is defined. We show that the two properties hold either ifc = 0 or if all weight distributions are exponential. In Section 2.3, we consider for both models the case when 8 the knapsack capacity is exponentially distributed: in the first model, we show that the two properties hold for general distributions on items’ weights; in the second model, we give a sufficient condition and similar results as in the constant capacity model. In Section 2.4, we present a heuristic search algorithm based on the two properties. We show the performance of our algorithm in numerical examples by comparing it with exhaustive searches in Section 2.5. We put some proofs in Appendix A. 
2.1 the Static BKP model

2.1.1 Problem Definition

Consider a knapsack with a deterministic capacity w. There are n types of items and each type has an infinite supply of items available. For a type-i item, 1 ≤ i ≤ n, its weight is a non-negative random variable with a known distribution function F_i; its value is v_i, a positive deterministic number given in advance. We want to determine at the beginning of the problem the quantity of each type of item to be put in the knapsack. The total weight of the items in the knapsack is revealed only after our decision has been executed. We assume that the weights of the items put in the knapsack are independent. If the knapsack is not broken after we execute our decision, we take all the values in the knapsack; otherwise, we get nothing. The aim is to maximize the expected total return.

In this chapter, for the static BKP model, we make the following assumption: all F_i, 1 ≤ i ≤ n, have decreasing reversed hazard rate (DRH), i.e., if F_i is a continuous (discrete) distribution, then f_i(x)/F_i(x) (resp. f_i(k)/F_i(k)) is non-increasing in x for x ≥ 0 (in k for k ∈ ℕ), where f_i(x) (f_i(k)) is the probability density (mass) function associated with F_i.

Remark 1. If a distribution function is log-concave, then it is obviously DRH. As shown in An [An97] and Bagnoli and Bergstrom [BB05], examples of continuous log-concave distributions include the normal, the exponential, the uniform over a convex domain, the gamma with shape parameter ≥ 1, etc.; examples of discrete log-concave distributions include the Bernoulli, binomial, Poisson, geometric, etc.

Mathematical Notations

(k_1, k_2, …, k_n): a decision vector which puts k_i type-i items in the knapsack, for each i = 1, …, n.

R(k_1, …, k_n) = (Σ_{i=1}^n k_i v_i) · P(Σ_{i=1}^n Σ_{h=1}^{k_i} W_ih ≤ w): the expected return from choosing the decision (k_1, k_2, …, k_n), where W_ih ∼ F_i.

h(k_1, …, k_{j−1}, ·, k_{j+1}, …, k_n) = argmax_{k ∈ ℕ} R(k_1, …, k_{j−1}, k, k_{j+1}, …, k_n): the marginal optimal decision function, which decides the quantity of type-j items to be put in the knapsack in order to maximize the expected return given fixed quantities for all other types. (The function returns the smallest maximizer if multiple integers maximize the expected return function.)

2.1.2 Solution Structures

We first want to exclude the types that will never be used in an optimal decision. A natural criterion is to filter out any type which has a smaller reward and at the same time a stochastically greater weight compared to another type. The following proposition shows the validity of this criterion.

Proposition 1. For two types i and j, if v_i ≤ v_j and the type-i weight is stochastically greater than the type-j weight (F_i ≥_st F_j), then type i is dominated by type j, i.e., replacing a type-i item by a type-j item has no negative effect on the expected return.

Proof. The replacement of a type-i item by a type-j item increases the items' total value and at the same time brings down the probability of breaking the knapsack, which implies the dominance of type j over type i in the static BKP model.

The above proposition says that an optimal decision never puts in any dominated type. In the following discussions, we assume that none of the n types falls into the dominated category.

To start exploring solution structures for the static BKP, we need the following observation regarding the assumption that all items' weights are DRH.

Lemma 1. Suppose {X_i} is a sequence of independent non-negative DRH random variables, and let S_m = Σ_{i=1}^m X_i for all m ≥ 1. Then, for any constant c > 0, the conditional random variable [S_m | S_m ≤ c] is stochastically increasing in m.
Proof. Follows directly from Theorem 1.C.12 (p. 47) and equation 1.B.43 (p. 37) in Shaked and Shanthikumar [SS06].

Theorem 1 (Unimodality of the expected return function). Assume all F_i, 1 ≤ i ≤ n, are DRH. For any j ∈ [1, n] and fixed k_i ∈ ℕ for all i ≠ j, the function R(k_1, …, k_{j−1}, k, k_{j+1}, …, k_n) is unimodal in k: first increasing and then decreasing in k.

Proof. We have to prove that if

R(k_1, …, k_{j−1}, k, k_{j+1}, …, k_n) ≥ R(k_1, …, k_{j−1}, k + 1, k_{j+1}, …, k_n),

then

R(k_1, …, k_{j−1}, k + 1, k_{j+1}, …, k_n) ≥ R(k_1, …, k_{j−1}, k + 2, k_{j+1}, …, k_n).

Define

S(k) = Σ_{h=1}^k W_jh + Σ_{i≠j} Σ_{h=1}^{k_i} W_ih,  v(k) = k v_j + Σ_{i≠j} k_i v_i,

where W_ih ∼ F_i for all i ∈ [1, n], and all these random variables (r.v.s) are independent. Here S(k) and v(k) are the items' total weight and total value, respectively, under the decision (k_1, …, k, …, k_n). Now,

R(k_1, …, k, …, k_n) ≥ R(k_1, …, k + 1, …, k_n)
⇔ v(k) P(S(k) ≤ w) ≥ v(k + 1) P(S(k + 1) ≤ w)
⇔ v(k)/v(k + 1) ≥ P(S(k + 1) ≤ w | S(k) ≤ w).

Hence if R(k_1, …, k, …, k_n) ≥ R(k_1, …, k + 1, …, k_n), then

v(k + 1)/v(k + 2) ≥ v(k)/v(k + 1) ≥ P(S(k + 1) ≤ w | S(k) ≤ w) ≥ P(S(k + 2) ≤ w | S(k + 1) ≤ w),

where the first inequality follows because v(k)/v(k + 1) increases in k and the final inequality follows from Lemma 1.

Corollary 1. For any j ∈ [1, n] and fixed k_i for all i ≠ j, there exists k* < ∞ such that

R(k_1, …, k_{j−1}, k*, k_{j+1}, …, k_n) ≥ R(k_1, …, k_{j−1}, k* + 1, k_{j+1}, …, k_n).

Proof. From Theorem 1, it suffices to show

R(k_1, …, k_{j−1}, 1, k_{j+1}, …, k_n) ≥ lim_{m→+∞} R(k_1, …, k_{j−1}, m, k_{j+1}, …, k_n).

Because the left-hand side is non-negative, the above inequality can be proved by showing

lim_{m→+∞} R(k_1, …, k_{j−1}, m, k_{j+1}, …, k_n) = 0;  (2.1)

the proof of this equality is in Appendix A.

Remark 2. The unimodality property does not hold for the static BKP model under general weight distributions. For instance, suppose n = 1, v_1 = 1, w = 3, and W_1, the weight of a type-1 item, is such that

W_1 = 1 with probability √17/6,  W_1 = 3 with probability 1 − √17/6.

Then

R(1) = 1,  R(2) = 17/18,  R(3) = (17/18) · (√17/4).

Because R(1) > R(2) < R(3), the unimodality property is violated.

Proposition 2 (Monotonicity of the marginal optimal decision function). Given a partial decision vector r = (k_1, …, k_{n−1}), the marginal optimal decision function for type-n items is defined as

h(r) = argmax_{k ∈ ℕ} R(r, k), where R(r, k) = R(k_1, …, k_{n−1}, k).

Then h(r) is non-increasing in r.

Proof. Let I_i be the unit vector whose i-th element equals 1 while the others equal 0, for i = 1, …, n − 1. By Theorem 1, it suffices to show

R(r, m) ≥ R(r, m + 1) ⇒ R(r + I_i, m) ≥ R(r + I_i, m + 1).

The proof of this implication is similar to the proof of Theorem 1; it follows immediately from the fact that

[S(r + I_i, m) | S(r + I_i, m) ≤ w] ≥_st [S(r, m) | S(r, m) ≤ w],

where S(r + I_i, m) and S(r, m) are the total weights under the decisions (r + I_i, m) and (r, m) respectively.

Note that since no specific assumption on the order of the n types is imposed in Proposition 2, the above monotonicity property holds for the marginal optimal decision function of any type. An immediate consequence of Proposition 2 bounds the search space for the optimal solutions.

Corollary 2 (Bounded search space). Let (k*_1, …, k*_n) be the optimal solution for the static BKP with n types available, and let k_i^∞ be the optimal solution if only type-i items are available. Then

k*_i ≤ k_i^∞ for all i = 1, …, n.

Proof. From Proposition 2.

With Corollary 2, we know there are at most ∏_{i=1}^n (k_i^∞ + 1) decision vectors to check for the optimal solution.
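For illustration, R(·) can be evaluated by straightforward Monte Carlo sampling, as is done later in Sections 2.4 and 2.5. The following C++ sketch is illustrative rather than the thesis code: the exponential weight family, the function name, the fixed seed, and the instance in main are all assumptions made for the example (any DRH weight family would serve equally well).

// Minimal Monte Carlo sketch (illustrative, not the thesis code) estimating
// the static BKP return R(k) = (sum_i k_i v_i) * P(total weight <= w).
// Exponential weights are an assumed example of a DRH family.
#include <cstdio>
#include <random>
#include <vector>

double estimateR(const std::vector<int>& k, const std::vector<double>& v,
                 const std::vector<double>& meanW, double w,
                 int samples = 100000) {
    std::mt19937 gen(12345);                       // fixed seed for repeatability
    double totalValue = 0.0;
    for (std::size_t i = 0; i < k.size(); ++i) totalValue += k[i] * v[i];
    int notBroken = 0;
    for (int s = 0; s < samples; ++s) {
        double totalWeight = 0.0;
        for (std::size_t i = 0; i < k.size(); ++i) {
            std::exponential_distribution<double> W(1.0 / meanW[i]);
            for (int h = 0; h < k[i]; ++h) totalWeight += W(gen);
        }
        if (totalWeight <= w) ++notBroken;         // knapsack survives
    }
    return totalValue * static_cast<double>(notBroken) / samples;
}

int main() {
    // Hypothetical instance: two types, capacity w = 5.
    std::printf("R = %f\n", estimateR({3, 2}, {1.0, 2.0}, {0.5, 1.0}, 5.0));
    return 0;
}

Together with Corollary 2, a brute-force search would evaluate such an estimator on each of the at most ∏_{i=1}^n (k_i^∞ + 1) candidate vectors; the scheme of Section 2.4 avoids most of these evaluations.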
Indeed, combining Theorem 1 with Proposition 2, a great portion of all decision vectors can be skipped in the search process to locate the optimal one. We describe a search scheme in Section 2.4.

2.2 the SKP with simple recourse and penalty model

2.2.1 Problem Definition with Preliminaries

The model has the same setting as the static BKP model. However, after we decide the quantity for each type and put the items in the knapsack, instead of losing everything if the knapsack is broken, this model assumes that any overflow incurs a recourse cost proportional to the amount of overcapacity with a constant factor d, as well as a fixed penalty c, c ≥ 0. Given a decision vector k = (k_1, …, k_n), the expected return function for this model is

R(k) = R(k_1, …, k_n) = Σ_{i=1}^n k_i v_i − d E[(W_total − w)^+] − c P(W_total > w),  (2.2)

where W_total = Σ_{i=1}^n Σ_{h=1}^{k_i} W_ih and W_ih ∼ F_i.

Remark 3. When c = 0, the above is the SKP with simple recourse model. Problems of SKP with simple recourse under different settings are discussed in Cohn and Barnhart [CM98] and Kosuch and Lisser [KL10].

Proposition 1 remains true for this model, i.e., a type is dominated if an item of that type has a smaller reward but a stochastically greater weight. We also have the following immediate observation.

Proposition 3. Let w_i be the mean weight of a type-i item. If d < max_i v_i/w_i, then sup_k R(k) = ∞.

Proof. Suppose v_1/w_1 = max_i v_i/w_i > d. Then

R(n, 0, …, 0) ≥ n v_1 − d E[W_total] − c = n(v_1 − d w_1) − c.

Hence lim_{n→∞} R(n, 0, …, 0) = ∞.

From here on we assume that d > max_i v_i/w_i.

Now, for a fixed j, we shall assume in the following that the decision vector is (k_1, …, k_{j−1}, k, k_{j+1}, …, k_n), where the values k_i, i ≠ j, are fixed. Define

S(k) := Σ_{h=1}^k W_jh + Σ_{i=1, i≠j}^n Σ_{h=1}^{k_i} W_ih,  v(k) := k v_j + Σ_{i=1, i≠j}^n k_i v_i,  (2.3)

where W_ih ∼ F_i for all i, h. S(k) and v(k) are the total weight and the total reward in the knapsack, respectively, when k type-j items are put in addition to the items of the remaining types. We want to check whether R(k_1, …, k, …, k_n) is unimodal in k. Using the notation I_A for the indicator variable of the event A, note that

R(k_1, …, k, …, k_n) ≥ R(k_1, …, k + 1, …, k_n)
⇔ E[v(k) − d (S(k) − w)^+ − c I_{S(k)>w}] ≥ E[v(k + 1) − d (S(k + 1) − w)^+ − c I_{S(k+1)>w}]
⇔ d E[(S(k + 1) − w)^+ − (S(k) − w)^+] + c E[I_{S(k+1)>w} − I_{S(k)>w}] ≥ v_j.  (2.4)

The following is the equivalent of Corollary 1 for the current model; it implies the existence of the marginal optimal decision given fixed quantities of all types except the type under consideration.

Corollary 3. In the SKP with simple recourse and penalty model, for any j ∈ [1, n] and fixed k_i for all i ≠ j, there exists k* < ∞ such that

R(k_1, …, k_{j−1}, k*, k_{j+1}, …, k_n) ≥ R(k_1, …, k_{j−1}, k* + 1, k_{j+1}, …, k_n).

Proof. Because d > max_{i∈[1,n]} v_i/w_i, it follows that

lim_n R(k_1, …, k_{j−1}, n, k_{j+1}, …, k_n) = −∞.

2.2.2 Unimodality and Monotonicity

Theorem 2. If c = 0, the expected return function of the SKP with simple recourse and penalty has the unimodality property.

Proof. Given c = 0, from the equivalence relation (2.4) we only have to show

d E[(S(k+1) − w)^+ − (S(k) − w)^+] ≥ v_j ⇒ d E[(S(k+2) − w)^+ − (S(k+1) − w)^+] ≥ v_j.

Let W_1 ∼ F_j and W_2 ∼ F_j be r.v.s such that W_1, W_2, S(k) are independent. Since the function g(x) = x^+ is convex and W_1, W_2 are non-negative by our assumption on F_j, we have

g(S(k) + W_2 + W_1 − w) − g(S(k) + W_2 − w) ≥ g(S(k) + W_1 − w) − g(S(k) − w).  (2.5)

Hence

d E[(S(k + 2) − w)^+ − (S(k + 1) − w)^+]
= d E[(S(k) + W_1 + W_2 − w)^+ − (S(k) + W_2 − w)^+]
≥ d E[(S(k) + W_1 − w)^+ − (S(k) − w)^+]
= d E[(S(k + 1) − w)^+ − (S(k) − w)^+],

which concludes the proof.
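For concreteness, the objective (2.2) analyzed above corresponds to the following one-sample payoff; averaging it over simulated realizations of W_total (as in the sampling sketch after Corollary 2) yields a Monte Carlo estimate of R(k). The function below is an illustrative sketch, not the thesis implementation, and its name is invented for the example.

// One-sample payoff of the simple-recourse-with-penalty model (2.2):
// rewards are retained, but an overflow costs d per unit of excess weight
// plus a fixed penalty c. Averaging over sampled totalWeight estimates R(k).
double recoursePayoff(double totalValue, double totalWeight, double w,
                      double d, double c) {
    double overflow = totalWeight - w;
    if (overflow <= 0.0) return totalValue;   // capacity respected
    return totalValue - d * overflow - c;     // proportional cost plus penalty
}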
Proposition 4. When c = 0, the marginal optimal decision function has the monotonicity property.

Proof. The proof is similar to that of Theorem 2.

The following example shows that the unimodality property need not hold when c > 0.

Counterexample. Assume n = 1, v_1 = 1, w = 199.9, and that W_0, the weight of an item, is such that

W_0 = 1 with probability 1 − 10^{−4},  W_0 = 100 with probability 10^{−4}.

With d = 1 and c = 200, we have

R(100) ≈ 100,  R(101) ≈ 98,  R(199) ≈ 193,

which contradicts unimodality.

We now prove that the unimodality property holds for positive c if every F_i is exponential with mean w_i, i = 1, …, n.

Theorem 3. If all F_i, i = 1, …, n, are exponential, then the expected return function of the SKP with simple recourse and penalty model has the unimodality property.

Proof. First we define a discrete random variable N as follows:

N = 0 if S(0) > w;  N = k if S(k − 1) ≤ w < S(k), for all k ≥ 1,  (2.6)

where S(k) is defined in equation (2.3). Let f_N(k) = P(N = k), k ≥ 0. Note that

P(S(k) > w) = Σ_{h=0}^k f_N(h),  and  Σ_{h=0}^∞ f_N(h) = 1.

Given that all items' weights are exponentially distributed, from the equivalence relation (2.4) we have

R(k_1, …, k, …, k_n) ≥ R(k_1, …, k + 1, …, k_n)
⇔ d E[(S(k + 1) − w)^+ − (S(k) − w)^+] + c E[I_{S(k+1)>w} − I_{S(k)>w}] ≥ v_j,

where, by the memoryless property,

E[(S(k + 1) − w)^+ − (S(k) − w)^+] = w_j (P(S(k) > w) + P(S(k) ≤ w, S(k + 1) > w)),
E[I_{S(k+1)>w} − I_{S(k)>w}] = P(S(k) ≤ w, S(k + 1) > w).

Therefore, the preceding inequality is equivalent to

d w_j P(S(k) > w) + (d w_j + c) P(S(k) ≤ w, S(k + 1) > w) ≥ v_j
⇔ d w_j Σ_{h=0}^{k+1} f_N(h) + c f_N(k + 1) ≥ v_j.

To prove the theorem, we only need to show

d w_j Σ_{h=0}^{k+1} f_N(h) + c f_N(k + 1) ≥ v_j ⇒ d w_j Σ_{h=0}^{k+2} f_N(h) + c f_N(k + 2) ≥ v_j.  (2.7)

Define

k_u := min{k ≥ 0 : d w_j Σ_{h=0}^{k+1} f_N(h) − v_j ≥ 0},
k_0 := min{k ≥ 0 : d w_j Σ_{h=0}^{k+1} f_N(h) + c f_N(k + 1) − v_j ≥ 0}.

Because Σ_{h=0}^∞ f_N(h) = 1, d w_j > v_j, and c > 0, k_u must exist and k_0 ≤ k_u. We want to show that the claim (2.7) holds for all k ≥ k_0. When k ≥ k_u − 1, the claim (2.7) follows from the definition of k_u. If k_u = 0 or k_0 = k_u, the claim (2.7) obviously holds. It remains to check the claim (2.7) for k ∈ [k_0, k_u − 1), given k_u > 0 and k_0 < k_u.

For any k ∈ [k_0, k_u − 1), given k_u > 0 and k_0 < k_u, because

d w_j Σ_{h=0}^{k_u} f_N(h) < v_j ≤ d w_j Σ_{h=0}^{k_u+1} f_N(h),

we have

d w_j Σ_{h=0}^{k+1} f_N(h) + c f_N(k + 1) ≥ v_j
⇔ c ≥ (v_j − d w_j Σ_{h=0}^{k+1} f_N(h)) / f_N(k + 1)
⇒ c > (d w_j Σ_{h=0}^{k_u} f_N(h) − d w_j Σ_{h=0}^{k+1} f_N(h)) / f_N(k + 1);

on the other hand,

c ≥ (d w_j Σ_{h=0}^{k_u+1} f_N(h) − d w_j Σ_{h=0}^{k+2} f_N(h)) / f_N(k + 2)
⇒ c ≥ (v_j − d w_j Σ_{h=0}^{k+2} f_N(h)) / f_N(k + 2)
⇔ d w_j Σ_{h=0}^{k+2} f_N(h) + c f_N(k + 2) ≥ v_j.

Therefore, to prove the claim (2.7) for k ∈ [k_0, k_u − 1), we only need to show

(d w_j Σ_{h=0}^{k_u} f_N(h) − d w_j Σ_{h=0}^{k+1} f_N(h)) / f_N(k + 1) ≥ (d w_j Σ_{h=0}^{k_u+1} f_N(h) − d w_j Σ_{h=0}^{k+2} f_N(h)) / f_N(k + 2),

which is equivalent to showing

f_N(k + 2) / f_N(k + 1) ≥ (Σ_{h=k+3}^{k_u+1} f_N(h)) / (Σ_{h=k+2}^{k_u} f_N(h)),

and this inequality follows if we can show

f_N(k + 2) / f_N(k + 1) ≥ f_N(k + 3) / f_N(k + 2) for all k ≥ 0.  (2.8)

Inequality (2.8) says precisely that the discrete random variable [N | N > 0] is log-concave, which is shown in Lemma 2 below.

Lemma 2. The discrete random variable [N | N > 0], where N is defined in (2.6), is log-concave.

Proof. See the proof in Appendix A.

Proposition 5. When c > 0 and the items' weights are exponentially distributed, the marginal optimal decision function has the monotonicity property.

Proof. The proof is similar to that of Theorem 3.

2.3 With exponentially distributed capacity for the two models

In the preceding discussions, we assumed that the knapsack capacity is a known constant.
In this section, we show that the same results hold for the two static models even when the knapsack has exponential capacity. Since the proof of the monotonicity of the marginal optimal decision functions applies the same logic as the proof of the unimodality property, in the following we only give the arguments for the unimodality proofs.

2.3.1 the Static BKP model

Assuming an exponential capacity W that is independent of the items' weights, the expected return function for the static BKP model is

R(k_1, …, k_n) = (Σ_{i=1}^n k_i v_i) P(Σ_{i=1}^n Σ_{h=1}^{k_i} W_ih ≤ W) = (Σ_{i=1}^n k_i v_i) ∏_{i=1}^n ∏_{h=1}^{k_i} P(W_ih ≤ W),

where W_ih ∼ F_i (the product form follows from the memoryless property of W).

A result similar to Lemma 1 holds under the assumption of exponential capacity, with no need to assume DRH (decreasing reversed hazard rate) on the items' weights.

Lemma 3. Suppose {X_i} is a sequence of independent non-negative random variables, and let S_m = Σ_{i=1}^m X_i for all m ≥ 1. Then the conditional random variable [S_m | S_m ≤ W] is stochastically increasing in m, where W is an exponential random variable independent of X_1, X_2, ….

Proof. We want to prove

[S_m | S_m ≤ W] ≤_st [S_{m+1} | S_{m+1} ≤ W],

which is equivalent to

P(S_m ≤ s | S_m ≤ W) ≥ P(S_{m+1} ≤ s | S_{m+1} ≤ W) for all s ≥ 0
⇔ P(S_m ≤ s, S_m ≤ W) / P(S_m ≤ W) ≥ P(S_{m+1} ≤ s, S_{m+1} ≤ W) / P(S_{m+1} ≤ W)
⇔ P(S_{m+1} ≤ W) / P(S_m ≤ W) ≥ P(S_{m+1} ≤ s, S_{m+1} ≤ W) / P(S_m ≤ s, S_m ≤ W).

Let h(W) = min{s, W} and X = S_{m+1} − S_m. Since {S_{m+1} ≤ W} ⊆ {S_m ≤ W}, the last inequality can be written as

P(S_{m+1} ≤ W | S_m ≤ W) ≥ P(S_{m+1} ≤ h(W) | S_m ≤ h(W))
⇔ P(W ≥ X) ≥ P(X ≤ h(W) − S_m | S_m ≤ h(W)),

where the left-hand side uses the memoryless property of W. Let W_h =_st [h(W) − S_m | h(W) ≥ S_m], where we write V =_st U if the random variables V and U have the same distribution. By the preceding equivalent inequality, it suffices to show

W ≥_st W_h,

which follows from

W_h =_st [min{W − S_m, s − S_m} | W ≥ S_m, s ≥ S_m] ≤_st [W − S_m | W ≥ S_m, s ≥ S_m] =_st W.

With the above lemma, as in Section 2.1.2, the following theorem is immediate.

Theorem 4. In the static BKP model with exponential capacity, for any F_i, 1 ≤ i ≤ n, both the unimodality of the expected return function and the monotonicity of the marginal optimal decision function hold.

2.3.2 the SKP with simple recourse and penalty model

When the knapsack capacity W is an exponential random variable independent of the items' weights, the expected return function for the SKP with simple recourse and penalty model is

R(k_1, …, k_n) = Σ_{i=1}^n k_i v_i − d E[(W_total − W)^+] − c P(W_total > W),  (2.9)

where W_total = Σ_{i=1}^n Σ_{h=1}^{k_i} W_ih and W_ih ∼ F_i. We still have to assume d > max_i v_i/w_i to guarantee the existence of an optimal solution.

For the general model, with no constraint on c and no specific assumption on the F_i, there exists a sufficient condition under which the unimodality property holds.

Proposition 6. Let X_i ∼ F_i, 1 ≤ i ≤ n, where w_i = E[X_i], and let the capacity W be an exponential r.v. independent of all the F_i. A sufficient condition for unimodality is

c ≤ min_{1≤i≤n} (d w_i − d E[(X_i − W)^+]) / P(X_i > W).  (2.10)

Proof.
Using the same notation as in equation (2.4), for any j ∈ {1, …, n}, given

d E[(S(k + 1) − W)^+ − (S(k) − W)^+] + c E[I_{S(k+1)>W} − I_{S(k)>W}] ≥ v_j,  (2.11)

to prove unimodality we have to show

d E[(S(k + 2) − W)^+ − (S(k + 1) − W)^+] + c E[I_{S(k+2)>W} − I_{S(k+1)>W}] ≥ v_j.  (2.12)

Let X_j ∼ F_j with w_j = E[X_j]. Then, conditioning on whether the knapsack is already broken and using the memoryless property of W,

d E[(S(k + 1) − W)^+ − (S(k) − W)^+] + c E[I_{S(k+1)>W} − I_{S(k)>W}]
= d w_j P(S(k) ≥ W) + d E[(S(k + 1) − W)^+ | S(k) < W] P(S(k) < W) + c P(S(k + 1) > W | S(k) < W) P(S(k) < W)
= d w_j P(S(k) ≥ W) + (d E[(X_j − W)^+] + c P(X_j > W)) P(S(k) < W).

Similarly, the left-hand side of inequality (2.12) equals

d w_j P(S(k + 1) ≥ W) + (d E[(X_j − W)^+] + c P(X_j > W)) P(S(k + 1) < W).

Now, with condition (2.10),

d w_j P(S(k + 1) ≥ W) + (d E[(X_j − W)^+] + c P(X_j > W)) P(S(k + 1) < W)
− d w_j P(S(k) ≥ W) − (d E[(X_j − W)^+] + c P(X_j > W)) P(S(k) < W)
= (d w_j − d E[(X_j − W)^+] − c P(X_j > W)) P(S(k) < W ≤ S(k + 1))
≥ 0,

so (2.11) implies (2.12).

Remark 4. When c = 0, condition (2.10) in Proposition 6 always holds, because the minimum on the right-hand side of (2.10) is non-negative (E[(X_i − W)^+] ≤ E[X_i] = w_i). When c > 0 but the items have exponentially distributed weights, we can prove unimodality in the same way as in Theorem 3: redefining N in (2.6) with W replacing w, the new r.v. N still satisfies Lemma 2; the proof of this requires the observation that the new Y in the proof of Lemma 2 in Appendix A, where Y = W − S(0), still has a log-concave pdf when W is an exponential r.v.

Theorem 5. In the SKP with simple recourse and penalty model with exponential capacity, when c = 0 or when all types have exponential weights, both the unimodality and the monotonicity properties hold.

2.4 Search scheme for the optimal solution

Assuming the unimodality property of the expected return function and the monotonicity property of the marginal optimal decision functions, we present a search algorithm to locate the optimal solution for the two static SKP models.

Given the weight distributions F_i, i = 1, …, n, and all other parameters, the value of the expected return function R(·) at a decision vector is difficult to compute even if F_i is a common distribution such as the normal or the exponential. However, we can always use simulation procedures to approximate R(·) for each input vector. To facilitate the simulation, a large sample set of random variables from the specified distribution is generated for each type at the beginning. Then, whenever random variables of a certain distribution are needed in the simulation process, they can be randomly drawn from those sets. If F_i has a well-defined inverse function F_i^{−1}, the set of type-i item weights can be generated through stratified simulation, i.e., choose a large enough integer N and take the set {F_i^{−1}((k − 1/2)/N) : k = 1, 2, …, N}. If all F_i, i ∈ {1, …, n}, are exponential or normal, we only have to generate one set of standard random variables instead of n sets, one per type; whenever an instance is needed, we randomly select one from the set and transform the standard r.v. into the desired one. In the following, given a decision vector, the expected return function R(·) denotes this simulation-generated value.

According to Corollary 2, we first have to find all k_i^∞, i ∈ [1, n], to bound the search space. Assuming only type-i items are available, k_i^∞ is determined as follows: compute R(2^l) for l = 0, 1, 2, … sequentially until the first l, denoted l_u, where R(2^{l_u − 1}) > R(2^{l_u}). By the unimodality property, 2^{l_u} must be an upper bound on k_i^∞. A binary search over [0, 2^{l_u}] is then used to pinpoint the optimal solution k_i^∞.
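A minimal sketch of this doubling-plus-bisection step is shown below. It assumes an evaluator R (e.g., the simulation-based estimator sketched earlier) and exact unimodality; with a noisy simulated R, comparisons near the peak can misstep, so in practice one would average several estimates per point. The function name and structure are illustrative assumptions, not the thesis code.

// Sketch: locate the single-type optimum k_i^inf of a unimodal sequence
// R(0), R(1), ... by exponential doubling followed by bisection.
#include <functional>

int singleTypeOptimum(const std::function<double(int)>& R) {
    // Doubling phase: find the first power of two where the return stops
    // improving, so by unimodality the peak lies in [0, hi].
    int hi = 1;
    while (R(2 * hi) > R(hi)) hi *= 2;
    hi *= 2;
    // Bisection phase: on a unimodal sequence, comparing adjacent points
    // tells on which side of the peak we are.
    int lo = 0;
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (R(mid + 1) > R(mid)) lo = mid + 1;  // still on the increasing side
        else hi = mid;                          // at or past the peak
    }
    return lo;  // the smallest maximizer
}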
A heuristic search algorithm, similar in spirit to simulated annealing, is presented here for locating optimal solutions. In this heuristic, we start with n decision vectors (0, …, k_i^∞, …, 0), i ∈ {1, …, n}. On the first round, for each starting vector, we find the best marginal decision for the first element while keeping the remaining n − 1 elements fixed. We update each vector by replacing its first element with the calculated best marginal decision. On the next round, we find the best marginal decision for the second element of each updated vector while keeping the other elements fixed, update the vectors, and move to the next round, so that on the r-th round we consider, for each vector, the element in position ((r − 1) mod n) + 1. The updating process stops when no improvement has been made for n consecutive rounds. The decision vector with the highest expected return among the n final vectors is returned by the program.

Search Algorithm
  Compute k_i^∞ for each i ∈ [1, n]. Let S be an empty set.
  For i from 1 to n:
    Set d_i = (0, …, k_i^∞, …, 0). Set c_i = 0 and pos = 0.
    While c_i ≤ n:
      Update pos = pos (mod n).
      Keep all elements of the decision vector d_i fixed except the one in position pos + 1; apply the unimodality property in a bisection search to find the locally optimal decision for type pos + 1 in the range [0, k_{pos+1}^∞].
      If the newly found quantity for type pos + 1 differs from the old one in d_i, set c_i = 0; otherwise set c_i = c_i + 1. Update pos = pos + 1.
    End While
    Add (d_i, R(d_i)) to the set S.
  End For
  In the set S, find the pair (d, R(d)) with the largest R(d); output d.

The decision vector returned by the above search algorithm is not guaranteed to be optimal; however, as the following numerical examples show, it is very close to the optimum.

2.5 Numerical Examples

In this part, we give numerical examples to illustrate the implementation of the search algorithm for the two models with constant capacity. The programs are written in C++, and the value of the expected return function at each decision vector is approximated by Monte Carlo simulation.

2.5.1 Example for the static BKP model

We assume the items' weights are absolute values of normal r.v.s (note that these r.v.s are DRH). The parameters of the distributions are randomly generated, i.e., for each type i ∈ [1, 3] we set v_i = u_1 and F_i ∼ |N(0, u_2^2)|, where u_1, u_2 are independent uniform r.v.s on (0, 1).

capacity (w) | RT (ALG) | RT (ES)  | ER (ALG) | ER (ES)
1            | 0.013508 | 0.016692 | 0.858716 | 0.868598
5            | 0.05416  | 1.08355  | 4.95937† | 4.94351
20           | 0.244282 | 236.822  | 23.0249  | 23.0105

Table 2.1: n = 3, with parameters v_1 = 0.305025, σ_1 = 0.313816, v_2 = 0.334888, σ_2 = 0.466047, v_3 = 0.68152, σ_3 = 0.562881.

† The ER from our algorithm is higher than the optimal one due to sampling errors in the simulation processes.

In Table 2.1, for n = 3, the running time (RT) of our algorithm (ALG) becomes an ever smaller percentage of the RT of the exhaustive search (ES) as the problem size grows with increasing knapsack capacity. The computational efficiency achieved by the algorithm for n = 3 is evident compared to ES. As we can see in this example, with capacity 20 our algorithm runs nearly a thousand times faster than the purely exhaustive search without compromising the result. All the expected returns (ER) from our algorithm under the different problem sizes in this example are close to the optimal values computed by ES.

2.5.2 Example for the SKP with simple recourse and penalty model with c > 0 and exponential items' weights

We consider an example with n = 3 where, for a type-i item, i ∈ [1, 3], the weight is exponentially distributed with mean w_i.
In this example, as shown in Table 2.2, we set d = 20, where d > max_i {v_i/w_i}. When the total capacity equals 15, the exhaustive search (ES) runs for more than 15 minutes, compared to less than 4 seconds for our heuristic algorithm. The performance and efficiency of the presented algorithm, which exploits the unimodality property for the second model, are illustrated in Table 2.2.

capacity (w) | RT (ALG) | RT (ES)  | ER (ALG) | ER (ES)
5            | 0.405688 | 5.35432  | 25.3228  | 25.3396
10           | 0.936152 | 124.067  | 58.5908  | 59.033
15           | 2.03097  | >> 10^3  | 92.3793  | NA

Table 2.2: n = 3, with parameters c = 5.0, d = 20.0, v_1 = 2.0, w_1 = 0.32, v_2 = 3.0, w_2 = 0.40, v_3 = 4.0, w_3 = 0.52.

Chapter 3

An Adaptive Broken Knapsack Problem

We consider a stochastic knapsack problem where an overflow triggers the loss of all items in the knapsack. There are n types of items available and each type has an infinite supply. An item has an exponentially distributed random weight, and its reward is proportional to the weight with a given factor. At each stage we have to decide whether to stop, or to continue by putting an item of a selected type in the knapsack. An item's weight is learned after it is placed in the knapsack. The objective is to maximize the expected total reward.

In Section 3.1, we define and formulate the adaptive BKP model in the dynamic programming framework. The preliminary notation is given in this part before we proceed to explore the characteristics of the problem structure. We also show a type preference order and derive the optimal stopping rule. In Section 3.2, we discuss the case n = 2 and propose and prove an optimal policy. This optimal policy is then summarized in an easy-to-implement action selection strategy for n = 2. In Section 3.3, we generalize the optimal policy for n = 2 to a heuristic policy for any n using the same logic. We evaluate this generalized policy and analyze its limitations. A second heuristic policy for general n is then given and tested in a numerical example.

In this chapter, we define the indicator function I_A = 1 if the event A occurs, and I_A = 0 otherwise. We put the proofs which involve mainly algebraic manipulations in Appendix B.

3.1 Problem Setting

Consider a knapsack with a deterministic capacity w. There are n different types of items available to be put in the knapsack, and each type has an infinite supply. A type-i item, 1 ≤ i ≤ n, has value v_i W_i, where v_i is a deterministic positive constant and W_i is the item's weight, with W_i ∼ Exp(w_i), w_i = E[W_i]. It is assumed that an item's weight is independent of the weights of the other items, both within the same type and across types. At each stage, we can either choose to stop and leave the system with all the existing rewards in the knapsack, or choose an item of any type to put in the knapsack. An item's weight is revealed immediately after it is put in. If the knapsack is broken, i.e., the total weight of the items in the knapsack exceeds the capacity, we are forced out of the system with no return at all. Otherwise, we move to the next stage. The objective of the adaptive BKP is to find a policy that achieves the maximal expected return.

Dynamic Programming Framework

This adaptive BKP can be formulated in a dynamic programming framework. Let (r, v) be the state variable of the model, where r is the remaining capacity and v is the total value of the items in the knapsack.
Let V(r, v) be the optimal expected value function at state (r, v).

Optimality equations for the adaptive BKP, with λ_i = 1/w_i for all i ∈ [1, n]:

V(r, v) = max{ v, max_{i=1,…,n} ∫_0^r λ_i e^{−λ_i t} V(r − t, v + v_i t) dt },
V(r, v) = 0 if r < 0.  (3.1)

Given n types of different items, we first want to discard those types which will never be used by an optimal policy.

Proposition 7. If v_i < v_j and w_i > w_j, then a type-i item should never be used.

Proof. The idea of the proof comes from Smith [SMI78]. We construct a composite component consisting of N type-j items, where N is a geometrically distributed random variable with parameter w_j/w_i. It is easy to see that this composite component has a weight that is exponentially distributed with mean w_j/(w_j/w_i) = w_i. Since v_i < v_j, it is always better to replace a type-i item with this composite component: the composite component has a higher value per unit weight than the type-i item, while its weight has exactly the same distribution as that of a type-i item.

With Proposition 7, from now on we assume without loss of generality that the item types are ordered so that v_1 < v_2 < ⋯ < v_n and w_1 < w_2 < ⋯ < w_n.

A natural question for this dynamic programming model is to determine the optimal stopping time. In analogy with the one-stage look-ahead rule for optimal stopping problems, the candidate stopping rule for the adaptive BKP is: if it is better to stop now than to insert one more item of any type and then stop, then stop. We call this stopping rule the one-stage look-ahead rule for the adaptive BKP. According to this rule, we should stop at state (r, v) if

v ≥ ∫_0^r λ_i e^{−λ_i t} (v + v_i t) dt for all 1 ≤ i ≤ n.  (3.2)

It is easy to check that if (r, v) satisfies (3.2), then (r′, v′) satisfies this condition for any r′, v′ with r′ < r and v′ > v.

Lemma 4. It is optimal to stop in state (r, v) if and only if (r, v) satisfies condition (3.2).

Proof. Our proof is similar to the one in Ferguson [Fer] (Theorem 2 of Chapter 5). First define V_m(r, v) as the optimal value function at state (r, v) for the m-stage version of the problem. We only have to prove the following two parts:

1) The stopping rule specified by (3.2) is optimal for the m-stage problem.

2) At any state (r, v), V_m(r, v) → V(r, v) as m → ∞.

Observing that the stopping domain is closed under the one-stage look-ahead rule for any finite-stage problem, part 1) is easily proved by induction. For part 2), denote by f the optimal policy for our infinite-stage problem, and let f_m be the truncation of policy f applied to the m-stage problem. Let Ṽ_{f_m}(r, v) be the total value obtained by applying policy f_m to the m-stage problem starting from state (r, v), and let Ṽ_f(r, v) be the total value for the infinite-stage problem. Since

Ṽ_{f_m}(r, v) → Ṽ_f(r, v) a.s. as m → +∞,

and |Ṽ_{f_m}(r, v)| < v + max_i {v_i} · r, the dominated convergence theorem gives

lim_{m→∞} E[Ṽ_{f_m}(r, v)] = E[Ṽ_f(r, v)].

Therefore,

V(r, v) = E[Ṽ_f(r, v)] = lim_{m→∞} E[Ṽ_{f_m}(r, v)] ≤ lim_{m→∞} V_m(r, v).

Combining the above inequality with the fact that V(r, v) ≥ V_m(r, v) for all m, we have

V(r, v) = lim_{m→∞} V_m(r, v).

Lemma 4 implies the optimal policy when there is only one type of item available.
Assuming only type-i items are available, with parameters v_i and w_i, condition (3.2) implies that when the state point (r, v) is on the optimal stopping boundary we must have

v = ∫_0^r λ_i e^{−λ_i t} (v + v_i t) dt.

Solving this equation for v, we obtain the optimal stopping boundary when only type-i items are available:

v_i(r) = (v_i / λ_i)(e^{λ_i r} − 1 − λ_i r).  (3.3)

We call v_i(·) defined in (3.3) the critical curve of type i, for i ∈ [1, n]. We show in the following proposition that when the state point (r, v) is above the critical curve of type i, the optimal policy never chooses a type-i item at (r, v).

Proposition 8. At state (r, v), if v > v_i(r), where v_i(·) is the critical curve of type i, then it is never optimal to select a type-i item at state (r, v).

Proof. Assume there are two identical knapsacks, both currently at state (r, v). Let π be any policy which selects a type-i item at state (r, v). We apply policy π in the first knapsack starting from state (r, v). After policy π puts a type-i item in the first knapsack, if this move breaks the first knapsack, the second knapsack stops at state (r, v); otherwise, skipping this first move, we replicate in the second knapsack the remaining moves of policy π in the first knapsack, i.e., except for the first move, whatever policy π puts into the first knapsack, we put exactly the same item into the second knapsack. The corresponding items in the two knapsacks have the same weights and values by coupling their random sources. Assume the weight of the type-i item that policy π puts in the first knapsack at state (r, v) is W_i, where W_i ∼ Exp(w_i); and assume the total weight and value of the remaining items that policy π puts into the first knapsack after the first move (if it does not break the first knapsack) are W_P and V_P respectively, where W_P and V_P are dependent on each other and W_P is also dependent on W_i. Denote by e_i the event {W_i ≤ r} and by e_P the event {W_i + W_P ≤ r}. For the first knapsack, the expected return from applying policy π is

E[(v + v_i W_i + V_P) I_{e_P}].

For the second knapsack, the expected return from applying the policy constructed above is at least

E[v (1 − I_{e_i})] + E[(v + V_P) I_{e_P}].

We also have

E[v (1 − I_{e_i})] + E[(v + V_P) I_{e_P}] − E[(v + v_i W_i + V_P) I_{e_P}]
= E[v (1 − I_{e_i})] − E[v_i W_i I_{e_P}]
≥ E[v (1 − I_{e_i})] − E[v_i W_i I_{e_i}]
> 0,

where the first inequality follows from I_{e_P} ≤ I_{e_i} a.s., and the second from the assumption that v > v_i(r). Therefore the policy applied in the second knapsack, which does not put the type-i item in at state (r, v), is always better than the original policy. This concludes the proof.

3.2 Optimal Policy for n = 2

In this section, a thorough analysis is given for n = 2.

3.2.1 Preliminary Analysis

We first want to find the optimal value function when n = 1. Assume there is only one type of item available, with parameters v_1 and w_1 (= 1/λ_1).

Figure 3.1: Assuming only type 1 items are available, at state (r, v) where v < v_1(r), we keep inserting items until the state enters the stopping domain determined by the curve v_1(·). We denote by (R_11(r, v), V_1(r, v)) the intersection point between v_1(·) and the line passing through (r, v) with slope −v_1.

When only one type of item is available, V(r, v) = v if v ≥ v_1(r). If v < v_1(r), the optimal policy is to put in items until the state enters the stopping domain.
In the latter case (see Figure 3.1), after an item is inserted at state $(r,v)$, because the value per unit weight of a type 1 item is $v_1$, the next state must lie on the line of slope $-v_1$ passing through the original state point $(r,v)$. For any state $(r,v)$ outside the stopping domain, denote the intersection point between the curve $\bar v_1(\cdot)$ and the line of slope $-v_1$ through $(r,v)$ by $(R_{11}(r,v),\, \bar v_1(R_{11}(r,v)))$; solving for it, we obtain

$$R_{11}(r,v) = \frac{1}{\lambda_1} \ln\!\Big(\frac{\lambda_1 v}{v_1} + \lambda_1 r + 1\Big). \tag{3.4}$$

Starting from any state $(r,v)$ with $v < \bar v_1(r)$, focus on the last item the optimal policy puts in the knapsack before the state enters the stopping domain specified by $\bar v_1(\cdot)$ or the knapsack breaks. We can think of the weight of this item as flowing in continuously until the item is completely in the knapsack. Since this is the last item before the state enters the stopping domain, as its weight flows in there must exist an intermediate state which hits the stopping boundary at $(R_{11}(r,v),\, \bar v_1(R_{11}(r,v)))$. By the memoryless property of the exponential distribution, the remaining weight of the item is exponentially distributed with mean $w_1$. Therefore, putting in this last item is equivalent to putting in a fresh item at state $(R_{11}(r,v),\, \bar v_1(R_{11}(r,v)))$. On the stopping boundary $\bar v_1(\cdot)$, one is indifferent between stopping and putting in one more item and then stopping. So $\bar v_1(R_{11}(r,v))$ is the expected total return of the optimal policy at state $(r,v)$ when $v < \bar v_1(r)$. Let $V_1(r,v)$ denote the optimal value function at any state $(r,v)$ when only type 1 items are available. From the above discussion,

$$V_1(r,v) = \begin{cases} 0, & \text{if } r < 0, \\ v, & \text{if } v \ge \bar v_1(r), \\ \bar v_1(R_{11}(r,v)), & \text{otherwise.} \end{cases} \tag{3.5}$$

When only type 1 items are available, we call the optimal policy implied by the optimal stopping rule Policy One; its value function at state $(r,v)$ is $V_1(r,v)$.

3.2.2 Policy Statement and Optimality Proof for $n = 2$

In the following we assume there are two types of items available ($n = 2$): type 1 with parameters $v_1$ and $w_1$, and type 2 with parameters $v_2$ and $w_2$. By Proposition 7, to avoid triviality we always assume $v_1 < v_2$ and $w_1 < w_2$.

Notation (for $i,j \in \{1,2\}$). When $v < \bar v_j(r)$, $R_{ij}(r,v)$ is defined as follows: $(R_{ij}(r,v),\, \bar v_j(R_{ij}(r,v)))$ is the intersection point between the curve $\bar v_j(\cdot)$ and the line of slope $-v_i$ through $(r,v)$; i.e., if we continually put in type $i$ items, the system state crosses the critical curve of type $j$ at the point $(R_{ij}(r,v),\, \bar v_j(R_{ij}(r,v)))$. For any function $g(\cdot)$, $E_i[g(T)]$ denotes the expectation of $g(T)$ where $T \sim \mathrm{Exp}(w_i)$.

We define Policy Two for $n = 2$ as follows:

Step 1: at the current state $(r,v)$, calculate $V_1(r,v)$ and $E_2[V_1(r-T,\, v+v_2 T)]$;

Step 2: if $V_1(r,v) \ge E_2[V_1(r-T,\, v+v_2 T)]$, follow Policy One at state $(r,v)$. If Policy One did not call for stopping, update the state and go to Step 1;

Step 3: if $V_1(r,v) < E_2[V_1(r-T,\, v+v_2 T)]$, put in an item of type 2, update the state, and go back to Step 1.

In the above policy statement, $V_1(r,v)$, which has the expression in (3.5), is the expected total return from applying Policy One at all states starting from $(r,v)$ until stopping; and

$$E_2[V_1(r-T,\, v+v_2 T)] = \int_0^r \lambda_2 e^{-\lambda_2 t}\, V_1(r-t,\, v+v_2 t)\, dt$$

is the expected return from putting in a type 2 item at state $(r,v)$ and then applying Policy One at all following states until stopping. Both decision quantities are easy to compute numerically, as the sketch below shows.
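The following C++ sketch evaluates both quantities in Step 1 of Policy Two: $V_1$ via the closed forms (3.3)-(3.5), and $E_2[V_1]$ by simple quadrature. The parameters in main() are hypothetical and only illustrate the comparison; this is not the thesis code.

#include <cmath>
#include <cstdio>

// Critical curve (3.3), intersection point (3.4), and value function (3.5).
double vbar(double r, double vi, double li) {
    return vi / li * (std::exp(li * r) - 1.0 - li * r);
}
double R11(double r, double v, double v1, double l1) {
    return std::log(l1 * v / v1 + l1 * r + 1.0) / l1;
}
double V1(double r, double v, double v1, double l1) {
    if (r < 0) return 0.0;
    if (v >= vbar(r, v1, l1)) return v;
    return vbar(R11(r, v, v1, l1), v1, l1);
}
// E_2[ V1(r - T, v + v2 T) ] for T ~ Exp(rate l2), by the trapezoid rule;
// weights t > r break the knapsack and contribute 0.
double insertType2ThenPolicyOne(double r, double v,
                                double v1, double l1, double v2, double l2) {
    const int n = 20000;
    double h = r / n, sum = 0.0;
    for (int k = 0; k <= n; ++k) {
        double t = k * h;
        double f = l2 * std::exp(-l2 * t) * V1(r - t, v + v2 * t, v1, l1);
        sum += (k == 0 || k == n) ? 0.5 * f : f;
    }
    return sum * h;
}
int main() {
    double v1 = 1.0, l1 = 1.0, v2 = 2.0, l2 = 0.5;   // hypothetical: v1<v2, w1<w2
    double r = 3.0, v = 0.5;
    double stay = V1(r, v, v1, l1);
    double put2 = insertType2ThenPolicyOne(r, v, v1, l1, v2, l2);
    std::printf("V1 = %.4f, E2[V1] = %.4f -> %s\n", stay, put2,
                put2 > stay ? "insert a type 2 item" : "follow Policy One");
}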
We first show that if Policy Two follows Policy One at any state, it continues to do so afterwards until it stops.

Lemma 5 If $V_1(r,v) \ge E_2[V_1(r-T,\, v+v_2 T)]$, then

$$V_1(r-h,\, v+v_1 h) \ \ge\ E_2[V_1(r-h-T,\, v+v_1 h+v_2 T)], \qquad \forall\, h \le r.$$

Proof. See the proof in the Appendix.

When Policy Two follows Policy One at state $(r,v)$, if we instead put in a type 2 item at $(r,v)$, Policy Two still follows Policy One at the updated state.

Lemma 6 If $V_1(r,v) \ge E_2[V_1(r-T,\, v+v_2 T)]$, then

$$V_1(r-h,\, v+v_2 h) \ \ge\ E_2[V_1(r-h-T,\, v+v_2 h+v_2 T)], \qquad \forall\, h \le r.$$

Proof. See the proof in the Appendix.

With Lemma 5 and Lemma 6, we can prove that when Policy Two follows Policy One at the current state $(r,v)$, its value function satisfies the optimality equations in (3.1).

Corollary 4 At state $(r,v)$, if $V_1(r,v) \ge E_2[V_1(r-T,\, v+v_2 T)]$, then $V_2(r,v)$, the value function of Policy Two at $(r,v)$, satisfies

$$V_2(r,v) = \max\Big\{\, v,\ \max_{i=1,2} \int_0^r \lambda_i e^{-\lambda_i t}\, V_2(r-t,\, v+v_i t)\, dt \,\Big\};$$

i.e., the value function at state $(r,v)$ satisfies the optimality equations.

Proof. See the proof in the Appendix.

In the case when Policy Two selects a type 2 item at state $(r,v)$, we show that the value function $V_2(r,v)$ of Policy Two at state $(r,v)$ still satisfies the optimality equations.

Lemma 7 At a state $(r,v)$ where $V_1(r,v) < E_2[V_1(r-T,\, v+v_2 T)]$, we still have

$$V_2(r,v) = \max\Big\{\, v,\ \max_{i=1,2} \int_0^r \lambda_i e^{-\lambda_i t}\, V_2(r-t,\, v+v_i t)\, dt \,\Big\}.$$

Proof. See the proof in the Appendix.

Corollary 4 and Lemma 7 together yield the optimal policy theorem.

Theorem 6 (Optimal Policy Theorem for $n = 2$) Policy Two is the optimal policy for the adaptive BKP when only two types of items are available, with $v_1 < v_2$ and $w_1 < w_2$.

Optimal Action Theorem for $n = 2$

Policy Two is the optimal policy with two available types where $v_1 < v_2$ and $w_1 < w_2$. It turns out that, to apply Policy Two, it is not necessary to calculate $V_1(r,v)$ and $E_2[V_1(r-T,\, v+v_2 T)]$ at every state $(r,v)$. We show in Theorem 7 below, which we also call the Optimal Action Theorem for $n = 2$, that the optimal decision of Policy Two is usually easy to find without complex calculations. To prepare the proof of Theorem 7, we need the following proposition, which reveals the relation between the critical curve $\bar v_1(\cdot)$ of type 1 and the critical curve $\bar v_2(\cdot)$ of type 2.

Proposition 9 Given $v_1 < v_2$, $\lambda_1 > \lambda_2$, and the critical curves $\bar v_i(\cdot)$, $i = 1,2$, we must have either

$$\bar v_1(r) > \bar v_2(r), \qquad \forall\, r > 0,$$

or there exists $r_0 > 0$ such that

$$\bar v_1(r) \begin{cases} < \bar v_2(r), & \text{when } 0 < r < r_0, \\ = \bar v_2(r), & \text{when } r = r_0, \\ > \bar v_2(r), & \text{when } r > r_0. \end{cases}$$

Proof. See the proof in the Appendix.

Proposition 9 says that the critical curve $\bar v_1(\cdot)$ either lies above the curve $\bar v_2(\cdot)$ everywhere, or there is an initial section of the horizontal axis on which $\bar v_1(\cdot)$ lies below $\bar v_2(\cdot)$, but eventually $\bar v_1(\cdot)$ lies above $\bar v_2(\cdot)$. This observation is used in the proof of the following optimal action theorem.

Theorem 7 (Optimal Action Theorem for $n = 2$) Assume only two types of items are available, with $v_1 < v_2$ and $w_1 < w_2$. The optimal action at state $(r,v)$ implied by the optimal policy, Policy Two, is given as follows (see Figure 3.2):

Case 1 $\big(v \ge \max\{\bar v_1(r), \bar v_2(r)\}\big)$: stop;

Case 2 $\big(\bar v_2(r) \le v \le \bar v_1(r)\big)$: choose a type 1 item;

Case 3 $\big(\bar v_1(r) \le v \le \bar v_2(r)\big)$: choose a type 2 item;
Case 4 $\big(v \le \min\{\bar v_1(r), \bar v_2(r)\}$ and $R_{22}(r,v) \le R_{21}(r,v)\big)$: choose a type 2 item;

Case 5 $\big(v \le \min\{\bar v_1(r), \bar v_2(r)\}$ and $R_{22}(r,v) > R_{21}(r,v)\big)$: there exists $r^* > R_{22}(r,v)$ such that if $r \ge r^*$, choose a type 1 item; otherwise, choose a type 2 item.

[Figure 3.2: Assuming only two types of items are available, with $v_1 < v_2$ and $w_1 < w_2$, there are 5 different cases at a state $(r,v)$ regarding the relative position of the state point $(r,v)$ with respect to the critical curves $\bar v_1(\cdot)$ and $\bar v_2(\cdot)$.]

Proof.

Case 1 $\big(v \ge \max\{\bar v_1(r), \bar v_2(r)\}\big)$: it is optimal to stop, by the one-stage look-ahead rule.

Case 2 $\big(\bar v_2(r) \le v \le \bar v_1(r)\big)$: it is optimal to put in a type 1 item, by Proposition 8 and the one-stage look-ahead rule.

Case 3 $\big(\bar v_1(r) \le v \le \bar v_2(r)\big)$: it is optimal to put in a type 2 item, by Proposition 8 and the one-stage look-ahead rule.

Case 4 $\big(v \le \min\{\bar v_1(r), \bar v_2(r)\}$ and $R_{22}(r,v) \le R_{21}(r,v)\big)$: Suppose at state $(r,v)$ that

$$V_1(r,v) \ \ge\ E_2[V_1(r-T,\, v+v_2 T)];$$

then by Lemma 6, at each state $(r-h,\, v+v_2 h)$ with $h \in [0,r]$, it is optimal either to put in a type 1 item or to stop. Given $v \le \min\{\bar v_1(r), \bar v_2(r)\}$ and $R_{22}(r,v) \le R_{21}(r,v)$, Proposition 9 implies

$$\bar v_1(r') < \bar v_2(r'), \qquad \forall\, r' \in (0,\, R_{21}(r,v)),$$

which in turn implies that for all $h \in (r - R_{21}(r,v),\, r - R_{22}(r,v))$ we have $\bar v_1(r-h) < \bar v_2(r-h)$, and the state $(r-h,\, v+v_2 h)$ satisfies Case 3. Therefore it is optimal to choose a type 2 item at $(r-h,\, v+v_2 h)$ for $h \in (r-R_{21}(r,v),\, r-R_{22}(r,v))$, contradicting the supposition. Hence under the given conditions we must have

$$V_1(r,v) < E_2[V_1(r-T,\, v+v_2 T)],$$

and the optimal action at $(r,v)$ is to put in a type 2 item.

Case 5 $\big(v \le \min\{\bar v_1(r), \bar v_2(r)\}$ and $R_{22}(r,v) > R_{21}(r,v)\big)$: under these conditions, Proposition 9 implies that $\bar v_1(\cdot)$ lies above $\bar v_2(\cdot)$ on $[R_{21}(r,v), \infty)$. Therefore, when $r - R_{22}(r,v) \le h \le r - R_{21}(r,v)$, the state $(r-h,\, v+v_2 h)$ must satisfy Case 2, where it is optimal to choose a type 1 item. As shown in the proof of Lemma 6 in the appendix, there must exist $h_0 > 0$ such that, writing

$$r' = R_{22}(r,v) + h_0, \qquad v' = \bar v_2(R_{22}(r,v)) - h_0 v_2,$$

the optimal policy is indifferent between putting in a type 1 item and a type 2 item at state $(r',v')$, and we must have: at $(r,v)$, if $r \ge r'$ the optimal action is to choose a type 1 item; otherwise, choose a type 2 item.

3.3 Two heuristic policies for general $n$

From the previous discussion, one option for the general model is to generalize the idea behind Policy Two into a policy for any $n$. We present this generalized policy in this section and discuss its limitation in implementation. We also give a second heuristic policy, and a numerical example partially evaluates the performance of these heuristic policies.

3.3.1 Generalized Policy n

In the following, Policy n is a generalized version of Policy Two for any $n$. It is assumed that there are $n$ different types of items with

$$v_1 < v_2 < \cdots < v_n \quad \text{and} \quad w_1 < w_2 < \cdots < w_n.$$

As the starting point, Policy 1 is defined as Policy One. In the following policy statement, for any $k < n$, $V_k(r,v)$ is the value function of Policy k. Given Policy k, Policy k+1 is defined as follows:

Step 1: at the current state $(r,v)$, compare $V_k(r,v)$ with $E_{k+1}[V_k(r-T,\, v+v_{k+1} T)]$;

Step 2: if $V_k(r,v) \ge E_{k+1}[V_k(r-T,\, v+v_{k+1} T)]$, use Policy k at $(r,v)$ until stopping; otherwise, insert one item of type $k+1$, update the state, and go back to Step 1.

Remark 5 The above definition is not a direct generalization of the idea behind Policy Two, because we cannot prove the analog of Lemma 5 for general $n$. However, this version is much simpler to implement than the generalized policy in its original form, which performs the comparison on every stage.
We show in the following discussion that even this simpler version of the generalized policy requires an unacceptable amount of computation for large $n$. From the policy definition, Policy Two is an instance of this policy for $n = 2$. We proved the optimality of Policy Two by showing that its value function $V_2(r,v)$ always satisfies the optimality equations; in that proof we worked with $V_1(r,v)$, which has the explicit analytical form (3.5), to reveal the characteristics of $V_2(r,v)$. However, when $n \ge 3$ there is no explicit analytical expression for $V_{n-1}(r,v)$ from which we could replicate the old proof to establish the optimality of $V_n(r,v)$. Therefore, on the one hand, we cannot use the same technique to check whether Policy n is optimal for general $n$; on the other hand, if we want to implement this policy for $n \ge 3$, the value function involved has no analytical form and must be simulated at each decision point. This becomes a severe bottleneck as the problem size scales up: the running time of an implementation of Policy n rises exponentially as $n$ increases.

3.3.2 A Second Heuristic Policy

We introduce a second heuristic policy based on two statements: (a) at state $(r,v)$, only the types whose critical curves lie above the state point should be considered; we call these the feasible types for state $(r,v)$; (b) if multiple feasible candidate types remain after applying (a), it is better to select the one with the higher value per unit weight. Statement (a) has already been proved in Proposition 8. Statement (b) is not true in general, since explicit counterexamples can be found in the Optimal Action Theorem for $n = 2$. However, when the state $(r,v)$ is not close to the critical curves of the feasible types, choosing the feasible type with the highest value per unit weight should be a good choice. The logic behind it is the intuition that, capacity permitting, we would rather put in higher-value items. We show in a numerical example that this heuristic policy has the potential to perform well for our problem.

It is still assumed that there are $n$ types of items with $v_1 < v_2 < \cdots < v_n$ and $w_1 < w_2 < \cdots < w_n$. The heuristic policy, developed from statements (a) and (b), selects at any non-stopping state $(r,v)$ the type with the highest value per unit weight among all feasible candidate types determined by statement (a). The policy is implemented as follows (a simulation sketch in C++ follows the pseudocode):

At the current state $(r,v)$, define the feasible type set $S = \{ i \in [1,n] : v \le \bar v_i(r) \}$.
While $S \ne \emptyset$:
    Let $h = \max_{k \in S} k$, and put in an item of type $h$.
    Update the state from $(r,v)$ to $(r',v')$. If the knapsack is broken ($r' < 0$), stop the program.
    Update $S = \{ i \in [1,n] : v' \le \bar v_i(r') \}$.
End while.
Stop the program.
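The loop above is straightforward to simulate. The following C++ sketch runs the heuristic on the continuous model with exponential weights; the three item types and the starting capacity are hypothetical, chosen only to demonstrate the loop, and this is not the thesis program.

#include <cmath>
#include <random>
#include <vector>
#include <cstdio>

// Monte Carlo runs of the second heuristic on the continuous adaptive BKP.
// Types are ordered so that val[0] < val[1] < ..., hence the feasible type
// with the largest index has the highest value per unit weight.
int main() {
    std::vector<double> val    = {1.0, 2.0, 3.0};     // v_i (hypothetical)
    std::vector<double> lambda = {1.0, 0.5, 0.25};    // 1 / w_i (hypothetical)
    std::mt19937 gen(42);
    auto vbar = [&](int i, double r) {                // critical curve (3.3)
        return val[i] / lambda[i] * (std::exp(lambda[i] * r) - 1.0 - lambda[i] * r);
    };
    const int runs = 100000;
    double total = 0.0;
    for (int run = 0; run < runs; ++run) {
        double r = 5.0, v = 0.0;                      // initial state (r, 0)
        for (;;) {
            int h = -1;                               // feasible type with largest v_i
            for (int i = (int)val.size() - 1; i >= 0; --i)
                if (v <= vbar(i, r)) { h = i; break; }
            if (h < 0) { total += v; break; }         // S empty: retire with value v
            std::exponential_distribution<double> w(lambda[h]);
            double W = w(gen);
            if (W > r) break;                         // knapsack broken: earn 0
            r -= W; v += val[h] * W;
        }
    }
    std::printf("estimated expected return: %.3f\n", total / runs);
}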
3.3.3 A Numerical Example

How can we evaluate the heuristic policy when we do not know the optimal policy for the adaptive BKP with $n \ge 3$? Our work-around is to consider a discretized version of the adaptive BKP model. By discretization, we assume items' weights follow discrete geometric distributions instead of continuous exponential distributions, with all other problem settings unchanged. Dynamic programming then allows us to find the optimal value function at any state of the discretized model. Our implicit assumption here is: if a policy works well for the discretized model, it should also do well for the original model.

We give a numerical example on the discretized model for $n = 3$. In this example, we have three types of items available, with

$$v_1 = 2,\ p_1 = 0.8; \qquad v_2 = 3,\ p_2 = 0.6; \qquad v_3 = 4,\ p_3 = 0.4,$$

where $v_i$ is the value per unit weight for type $i$, $p_i$ is the parameter of the geometric distribution of the weight of type $i$ items, and $1/p_i$ is the mean weight. Note that the items' preference order is still respected, since

$$v_1 < v_2 < v_3 \quad \text{and} \quad \frac{1}{p_1} < \frac{1}{p_2} < \frac{1}{p_3}.$$

For each state $(N,0)$, where $N$ is a positive integer representing the knapsack capacity, we compute the expected return at this state under three different policies: the optimal policy implied by dynamic programming, Policy n for $n = 3$, and the heuristic policy. The computer program is written in C++. The expected returns from Policy n and from the heuristic policy are generated by simulation. The detailed results of this numerical example are shown in Table 3.1 below.

From this example, we see that the second heuristic policy performs well compared to the optimal one in this discretized model. This gives us confidence that the heuristic works for the original adaptive BKP model as well. Another observation from this example is that Policy n, for general $n$, is a suboptimal policy for the original model, although it is optimal when $n = 2$.

N     DP      Policy n   second heuristic policy
20    65.98   63.09      65.51
40    143     138.4      141.5
60    221.1   216.9      216.2
80    299.5   292.7      297.6
100   378.6   368.9      375.6
120   457.8   451.7      457.9
140   537.2   529.4      530.3
160   616.7   607.1      618.3
180   696.2   684.7      694.2
200   775.7   762.4      768.7

Table 3.1: A numerical example of different policies on the discretized adaptive BKP model. (The Policy n and heuristic columns are simulation estimates, which is why a few entries slightly exceed the DP column.)
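A sketch of the kind of recursion behind the DP column is given below. It assumes, as one natural convention, that an item whose weight exactly equals the remaining capacity does not break the knapsack; this is a sketch of the discretized model's DP, not the author's original program.

#include <cstdio>
#include <unordered_map>
#include <vector>
#include <algorithm>

// DP for the discretized adaptive BKP: weights are geometric(p_i) on {1,2,...}
// and a type-i item adds val[i] per unit of weight. State (r, v): remaining
// capacity r, accumulated value v; breaking wipes out v.
static const std::vector<int>    val = {2, 3, 4};
static const std::vector<double> p   = {0.8, 0.6, 0.4};
static std::unordered_map<long long, double> memo;

double V(int r, int v) {
    if (r <= 0) return v;                      // no room: any further insert breaks
    long long key = (long long)r * 100000 + v;
    auto it = memo.find(key);
    if (it != memo.end()) return it->second;
    double best = v;                           // option: stop with current value
    for (std::size_t i = 0; i < val.size(); ++i) {
        double sum = 0.0, prob = p[i];         // P(W = k) = p (1-p)^{k-1}
        for (int k = 1; k <= r; ++k) {
            sum += prob * V(r - k, v + val[i] * k);
            prob *= (1.0 - p[i]);
        }                                      // weights k > r break: contribute 0
        best = std::max(best, sum);
    }
    return memo[key] = best;
}
int main() {
    for (int N = 20; N <= 60; N += 20)
        std::printf("V(%d, 0) = %.2f\n", N, V(N, 0));
}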
Chapter 4
The Adaptive Stochastic Knapsack Problem with Exponential Capacity

We consider an adaptive stochastic knapsack problem (SKP) where the knapsack capacity is exponentially distributed. There are $n$ types of items available, and each type has an infinite supply. An item's weight and reward are random variables with a given joint distribution determined by the item's type. On each stage, we can choose to stop and leave the system with all rewards attained so far; or we can select an item of any type and put it in the knapsack. The item's reward, and the information on whether the knapsack is broken, are revealed immediately after the item is put in. We move to the next stage if the knapsack is not broken. Otherwise, when the knapsack is broken, all the rewards in the knapsack are wiped out and we must leave the system with no return. The objective is to find a policy that maximizes the expected total reward.

We give the problem setting in Section 4.1 and formulate the problem in a dynamic programming framework. Some characteristic results concerning the optimal stopping time and the items' preference order are derived following the problem definition. In Section 4.2 we consider three special cases of the joint distribution of an item's weight and reward. The first model, in Section 4.2.1, assumes that for any item the weight and the reward are independent of each other and the reward is exponentially distributed. The second model, in Section 4.2.2, assumes an item's reward is proportional to its weight with a constant factor determined by its type, and the weight is exponentially distributed. The third model, in Section 4.2.3, assumes that any item has a deterministic reward and a random weight. For the first two models we explore the structure of the optimal value function and present an algorithm to find the optimal policy. We give a numerical example with $n = 3$ in Section 4.3 to demonstrate the implementation of the algorithm for the optimal policy. For the third model, we show that it reduces to the static SKP with exponential capacity discussed in Chapter 2.

4.1 Problem Setting

We have a knapsack with an exponentially distributed capacity $W$. There are $n$ types of items and each type has an infinite supply. For a type $i$ item, the weight $W_i$ and the reward $R^o_i$ are random with a given joint distribution function $F_i(w,r)$. On each stage, we can either leave the system with the rewards already earned, or select an item of one of the $n$ types and put it in the knapsack. If we choose the latter, the item's reward and the information on whether overflow occurs are learned immediately after we put the item in. If the knapsack is broken, we get nothing and are forced out of the system; otherwise, we move to the next stage. It is assumed that the weights and rewards of different items are independent of each other, and that the knapsack capacity is independent of all other random variables. The objective is to find an item-selection and optimal stopping strategy that maximizes the expected total reward.

Remark 6 When we put a type $i$ item in the knapsack, say with weight $W_i$ and reward $R^o_i$, then, due to the memoryless property of the exponential distribution of the knapsack capacity $W$, the probability of a successful insert is

$$q_i = P(W_i \le W).$$

The reward increment $R_i$, conditional on successfully putting in the item, is defined as

$$R_i = [\,R^o_i \mid W_i \le W\,],$$

and we denote the mean reward increment from a type $i$ item by

$$r_i = E[R_i].$$

As we can see, $q_i$, $R_i$, and $r_i$ are completely determined by the distribution of the capacity $W$ and the joint distribution function $F_i(w,r)$. The adaptive SKP with exponential capacity can be interpreted as the burglar problem, where $q_i$ is the success probability of burgling a type $i$ target and $R_i$ is the random reward from the burglary.

Dynamic Programming Framework

Let $r$, $r \ge 0$, be the state of the system, where $r$ is the total reward earned so far. Let $V(r)$ be the optimal value function of the problem at state $r$.

Optimality Equation

$$V(r) = \max\Big\{\, r,\ \max_{1\le i\le n} q_i\, E[V(r + R_i)] \,\Big\}, \qquad r \ge 0. \tag{4.1}$$

In the following, we show some properties of the optimal value function $V(r)$.

Proposition 10 $V(r)$ is an increasing function of $r$, $r \ge 0$.

Proof. We use coupling of random sources. Given $r' \ge r \ge 0$, assume we have two knapsacks, the first at state $r$ and the second at state $r'$. Whenever the optimal policy puts an item with random reward $R$ in the first knapsack, we put an item of the same type in the second knapsack, whose reward $R'$ is generated from the same random source as $R$, i.e., $R' = R$. We also couple the event of breaking the second knapsack with the event of breaking the first knapsack under the optimal policy, i.e., on each stage, either both survive or both break. We stop in the second knapsack whenever the optimal policy stops in the first one. As described, in any realization of the experiment the reward increment in the second knapsack is exactly the same as in the first. Because $r' \ge r$, the expected return from our policy starting at state $r'$ is at least $V(r)$.

To show the next observation on $V(r)$, define $V_m(r)$ as the optimal value function at state $r$ of the $m$-stage problem.
For any $r \ge 0$, we have

$$V_m(r) = \begin{cases} \max\Big\{\, r,\ \max_{1\le i\le n} q_i\, E[V_{m-1}(r+R_i)] \,\Big\}, & m \ge 1, \\ r, & m = 0. \end{cases} \tag{4.2}$$

Proposition 11 $V(r) = \lim_{m\to\infty} V_m(r)$, for all $r \ge 0$.

Proof. Let $\pi$ be the optimal policy for the original problem, and set $\bar q = \max_{i\in[1,n]} q_i$ and $\bar r = \max_{i\in[1,n]} r_i$. Then

$$V(r) = E_\pi[\text{return during the first } m \text{ stages}] + E_\pi[\text{additional returns}] \ \le\ V_m(r) + \bar q^{\,m} \sum_{i=1}^{\infty} i\,\bar r\, \bar q^{\,i} = V_m(r) + \frac{\bar q^{\,m+1}\, \bar r}{(1-\bar q)^2}.$$

Therefore, for all $r$,

$$|V(r) - V_m(r)| \ \le\ \frac{\bar q^{\,m+1}\, \bar r}{(1-\bar q)^2} \ \to\ 0 \quad \text{as } m \to \infty.$$

With the above two propositions, we are ready to present the following lemma.

Lemma 8 $V(r)$ is an increasing, continuous, convex function of $r$, $r \ge 0$.

Proof. From equations (4.2), by mathematical induction and the fact that the maximum of convex functions is convex, $V_m(r)$ is a continuous and convex function of $r$, $r \ge 0$, for all $m \ge 1$. The lemma follows by combining this result with Propositions 10 and 11.

Characteristic Results

We show that the one-stage look-ahead rule is the optimal stopping rule for the adaptive SKP with exponential capacity. We also give an items' preference order showing under which conditions some types are dominated by others, and thus never used by the optimal policy.

Lemma 9 The optimal policy stops in state $r$ if and only if

$$r \ \ge\ q_i\, E[r + R_i], \qquad \forall\, 1 \le i \le n.$$

Proof. The proof is similar to that of Theorem 2 of Chapter 5 in Ferguson [Fer]. We only have to show that the above is the optimal stopping rule for any finite-stage problem, and that $V_m(r) \to V(r)$ as $m$ goes to infinity. The second part has already been shown in Proposition 11; the first part follows by mathematical induction.

Remark 7 Because $E[R_i] = r_i$, the optimal stopping condition is equivalent to

$$r \ \ge\ \frac{q_i\, r_i}{1-q_i}, \qquad \forall\, 1 \le i \le n.$$

The following items' preference order shows that, given two types of items, the type with both a higher success probability $q_i$ and a stochastically greater reward increment $R_i$ dominates the other type.

Lemma 10 If the reward increment $R_i$ of a type $i$ item is stochastically greater than $R_j$ of a type $j$ item, and $q_i \ge q_j$, then type $j$ items should never be used.

Proof. Assume $R_j$ and $R_i$ have distribution functions $F_j$ and $F_i$ respectively. Let $\pi$ be any policy applied to the first knapsack; we develop a policy $\pi'$ on the second knapsack. Whenever $\pi$ puts a type $j$ item in the first knapsack, $\pi'$ puts a type $i$ item in the second knapsack by the following procedure: generate two independent standard uniform random variables $u_1, u_2 \in (0,1)$. The first knapsack is broken if $u_1 > q_j$, and the second knapsack is broken if $u_1 > q_i$. If the first knapsack is not broken, set the inserted item's reward to $R_j = F_j^{-1}(u_2)$; if the second knapsack is not broken, set the inserted item's reward to $R_i = F_i^{-1}(u_2)$. By assumption,

$$q_j \le q_i \quad \text{and} \quad F_j^{-1}(u_2) \le F_i^{-1}(u_2),$$

which implies that if the second knapsack is broken then the first one must be broken as well, and that if both survive, the reward increment in the second knapsack is at least that in the first. Policy $\pi'$ stops on the second knapsack if the first knapsack is broken. All other moves of $\pi$ on the first knapsack are copied exactly by $\pi'$ on the second by coupling random sources, i.e., either both knapsacks survive or both break, and the reward increments are always the same. By the structure of $\pi'$, in every experiment $\pi'$ on the second knapsack returns a result at least as good as that of $\pi$ on the first knapsack.
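The quantities $q_i$ and $r_i$ of Remark 6, and the stopping bound of Remark 7, are straightforward to estimate by simulation when the joint law of weight and reward is only available as a sampler. The joint law below (weight exponential, reward equal to the weight plus independent exponential noise, capacity exponential with mean 10) is hypothetical; only the estimators reflect the text.

#include <random>
#include <cstdio>

// Monte Carlo estimates of q_i, r_i, and the critical point q_i r_i / (1 - q_i)
// for one item type under an assumed joint law. Sketch only.
int main() {
    std::mt19937 gen(7);
    std::exponential_distribution<double> cap(1.0 / 10.0);  // capacity W, mean 10
    std::exponential_distribution<double> wgt(1.0 / 2.0);   // item weight, mean 2
    std::exponential_distribution<double> noise(1.0);       // reward noise, mean 1
    const int n = 1000000;
    long long fits = 0; double rewardOnFit = 0.0;
    for (int k = 0; k < n; ++k) {
        double W = cap(gen), Wi = wgt(gen), Ri = Wi + noise(gen);
        if (Wi <= W) { ++fits; rewardOnFit += Ri; }
    }
    double q = (double)fits / n;          // q_i = P(W_i <= W)
    double r = rewardOnFit / fits;        // r_i = E[R_i^o | W_i <= W]
    std::printf("q = %.4f  r = %.4f  stop once total reward >= %.4f\n",
                q, r, q * r / (1.0 - q));
}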
Let’s define the critical point for each type, which is the stopping boundary if only that type of items are available: b i = q i r i 1q i ;8i2 [1;n]: (4.3) In the next lemma, we show a type should never be selected if the current total rewards are greater than the critical point of that type. Lemma 11 It’s never optimal to choose a typei item at staterb i . Proof. We’ll show that whenr b i , for any policy which puts in a typei item at state r, there always exists a strictly better policy which never selects typei items at stater. Assuming we have two knapsacks both at state 0, for any policy, say policy, which is applied to the first knapsack, we’ll develop policy 0 applied to the second knapsack as follows. Policy 0 exactly copies moves of policy by coupling random sources until the first time policy puts in a typei item at a state, sayr , which is greater than We assume the optimal policy chooses to stop when it’s indifferent between stop and insert an item. 58 b i . At state r , policy 0 skips the corresponding move on the second knapsack. We denote the reward increment from the typei item inserted to the first knapsack by policy at stater asR i and the event of successfully putting in the item ase R i . If the first knapsack is not broken, i.e., when e R i is true, policy 0 continues to take the coupled moves on the second knapsack which exactly copy what policy does at stater +R i and all states afterwards. If the first knapsack is broken after putting in the typei item at stater , policy 0 stops on the second knapsack. Let’s denote total rewards of items inserted after stater +R i by policy on the first knapsack asZ and denote the event of successfully retaining this amount of rewards ase Z . Now conditional onr , we have the expected total rewards in the first knapsack as: E[I e R i I e Z (r +R i +Z)]; the expected rewards in the second knapsack by policy is: E[I e R i I e Z (r +Z) + (1I e R i )r ]: 59 Hence, givenp i = 1q i , we have: E[I e R i I e Z (r +Z) + (1I e R i )r ]E[I e R i I e Z (r +R i +Z)] =E[(1I e R i )r ]E[I e R i I e Z R i ] =p i (r q i E[I e Z R i ] p i ) p i (r q i E[R i ] p i ) =p i (r q i r i p i ) =p i (r b i ) 0: From the above discussion, we’ve seen that it’s never better to select a type i item at stater =r , wherer b i . From here on, without loss of generality, we always assume: there are n types of items available with no dominated types as indicated in Lemma 12, and then types are ordered such that: b 1 <b 2 <<b n : Remark 8 An immediate result from Lemma 11 and Lemma 9 is: it is optimal to select a typen item at stater whenr2 [b n1 ;b n ). 4.2 Three Different Special Cases In this section, we consider three different special models on the adaptive SKP with exponential capacity. For the first two models, we give an algorithm to find the opti- mal policy along with a numerical example to demonstrate the implementation of this 60 algorithm in Section 4.3. We show that the third model can be reduced to a static model discussed in Chapter 2. 4.2.1 Model 1: Item’s weight and reward are independent, and the reward is exponentially distributed. This model assumes: for a typei item, the weightW i has distribution functionF i W (w), the rewardR o i is exponentially distributed, andW i andR o i are independent. In this model, from Remark 6, q i =P (W i W ) =E[F i W (W )]; and R i = [R o i jW i W ] =R o i ; r i =E[R i ] =E[R o i ]: For this model, we have a stronger items’ preference order than the one in Lemma 10. 
Lemma 12 For type $i$ and type $j$ items, if $q_i r_i \ge q_j r_j$ and $q_i \ge q_j$, then type $j$ items should never be used.

Proof. Assume there are two knapsacks, both at state $r = 0$. For any policy $\pi$ applied to the first knapsack, we develop a policy $\pi'$ applied to the second knapsack. For all rounds except the first time $\pi$ puts a type $j$ item in the first knapsack, $\pi'$ copies every move of $\pi$ exactly by coupling random sources, i.e., on each round they put exactly the same type of item with the same reward, and they share the same success or failure events in the two knapsacks respectively. The first time $\pi$ puts a type $j$ item in the first knapsack, say with reward $R_j$, policy $\pi'$ puts a type $i$ item in the second knapsack with the coupled reward $\frac{r_i}{r_j} R_j$, where $\frac{r_i}{r_j} R_j$ is exponential with mean $r_i$ because $R_j$ is exponential with mean $r_j$. Condition on the event that the first knapsack is not broken before $\pi$ puts in the first type $j$ item, and suppose the total rewards in both knapsacks up to that point are both $R_{before}$. Condition also on the event that the surviving knapsacks are not broken between that insertion and stopping, and that the total reward increment afterwards is $R_{after}$. Then the conditional expected total reward from $\pi$ in the first knapsack is $q_j (R_j + R_{before} + R_{after})$, while that from $\pi'$ in the second knapsack is $q_i (\frac{r_i}{r_j} R_j + R_{before} + R_{after})$. By the assumptions $q_i r_i \ge q_j r_j$ and $q_i \ge q_j$,

$$q_i\Big(\frac{r_i}{r_j} R_j + R_{before} + R_{after}\Big) \ \ge\ q_j\big(R_j + R_{before} + R_{after}\big),$$

so policy $\pi'$ is always at least as good as policy $\pi$. Therefore type $j$ should never be used by the optimal policy.

Structure of the Optimal Value Function

For convenience of description, define, for all $i \in [1,n]$,

$$\mu_i = \frac{1}{r_i}, \qquad h_i = \mu_i (1 - q_i).$$

Theorem 8 $\ln V(r)$ is a piecewise linear convex function of $r$, having at most $n$ pieces for $r \in [0, b_n]$.

Proof. Because Lemma 9 says it is optimal to stop if $r \ge b_n$, where $b_n = \max_i b_i$, we have

$$V(r) = r, \qquad \forall\, r \ge b_n. \tag{4.4}$$

Suppose at a state $r \in [0, b_n]$ the optimal action is to select an item of type $i$. Then

$$V(r) = q_i\, E[V(r + R_i)] = q_i \int_0^\infty \mu_i e^{-\mu_i t}\, V(r+t)\, dt.$$

By the continuity of $V(\cdot)$ from Lemma 8, if the optimal policy selects type $i$ throughout $(r, r+\epsilon)$ for some small positive $\epsilon$, then

$$\frac{dV(r)}{dr} = \frac{d}{dr}\Big( q_i \int_0^\infty \mu_i e^{-\mu_i t} V(r+t)\, dt \Big) = \frac{d}{dr}\Big( q_i \int_r^\infty \mu_i e^{-\mu_i (y-r)} V(y)\, dy \Big) = q_i \int_r^\infty \mu_i^2\, e^{-\mu_i(y-r)} V(y)\, dy - q_i \mu_i V(r) = \mu_i V(r) - q_i \mu_i V(r) = h_i V(r),$$

which implies

$$\frac{d \ln V(r)}{dr} = h_i. \tag{4.5}$$

From equation (4.5), as the state variable decreases from $r$ to a type change point, only those types, say type $j$, whose $h_j$ is smaller than $h_i$ become feasible at the type change point for the optimal policy. Remark 8 tells us that it is optimal to select a type $n$ item at states $r \in [b_{n-1}, b_n)$. Therefore, for the optimal policy there exist at most $n$ points in $[0, b_n]$ where a change of the optimal type may take place. When $r$ moves between two adjacent type change points, $\frac{d\ln V(r)}{dr}$ must be constant, because the optimal policy selects the same type of item on the whole interval. Therefore $\ln V(r)$ is linear on the interval between any two adjacent type change points.

Remark 9 If $r$ is a type change point of the optimal policy, say for small $\epsilon > 0$ the optimal policy selects type $i$ on $(r-\epsilon, r)$ and type $j$ on $(r, r+\epsilon)$, then by the continuity of the optimal value function, the optimal policy must be indifferent between selecting type $i$ and type $j$ at state $r$.
By the proof of Theorem 8, the optimal policy is characterized by the set of type change points, and it always selects the same type between any pair of adjacent type change points. As the state variable $r$ decreases from $b_n + \epsilon$ to 0, the first type change point is $b_n$: the optimal policy stops at $r \in (b_n, b_n+\epsilon)$ and selects a type $n$ item at $r \in (b_n-\epsilon, b_n)$ for sufficiently small positive $\epsilon$. We know from Theorem 8 that there are at most $n$ type change points in $[0, b_n]$. The proof of Theorem 8 and the observation in Remark 9 provide the method for finding all these type change points of the optimal policy.

Search Scheme for the Optimal Policy

We present a scheme to locate the change points as $r$ decreases from one change point to the next. As mentioned, $b_n$ is the first type change point. Suppose we have already located $k$ type change points of the optimal policy,

$$c_k < c_{k-1} < c_{k-2} < \cdots < c_1 = b_n,$$

with no other type change point in $[c_k, c_1]$, and we want to locate the next type change point $c_{k+1}$. It is also given that the optimal policy selects a type $t_l$ item when $r \in (c_{l+1}, c_l)$, for all $l \in [1,k]$; e.g., $t_1 = n$.

Because

$$V(c_1) = V(b_n) = b_n,$$

equation (4.5) implies that if $c_{k+1}$ is the next type change point, then for $r \in [c_{m+1}, c_m)$, for any $m \in [1,k]$,

$$\ln V(r) = \ln V(c_1) + \sum_{l=1}^{m-1} (c_{l+1} - c_l)\, h_{t_l} + (r - c_m)\, h_{t_m}. \tag{4.6}$$

To find $c_{k+1}$: since the optimal policy selects type $t_k$ on $(c_{k+1}, c_k)$, by the proof of Theorem 8 only types $i$ with $h_i < h_{t_k}$ need to be considered as feasible candidate types that the optimal policy may select as $r$ decreases across $c_{k+1}$. By Remark 9, the optimal policy must be indifferent between selecting type $i$ and type $t_k$ at state $c_{k+1}$, which implies

$$q_i \int_0^\infty \mu_i e^{-\mu_i t}\, V(c_{k+1} + t)\, dt = V(c_{k+1}), \tag{4.7}$$

where $V(c_{k+1})$ has the expression from (4.6), and $V(r)$ for $r \in [c_{k+1}, \infty)$ is specified by equations (4.4) and (4.6). For each candidate type $i$, equation (4.7) can be solved for the unknown $c_{k+1}$ (as we will see in the example, equation (4.7) is always solvable); say the solution is $c^i_{k+1}$. If $0 < c^i_{k+1} < c_k$, we record type $i$ together with its $c^i_{k+1}$; otherwise this type is skipped. After checking all candidate types, if no type is recorded, we set $c_{k+1} = 0$ and all the type change points of the optimal policy have been located. Otherwise we select the type, say type $j$, whose $c^j_{k+1}$ is the largest among all recorded types, and set

$$c_{k+1} = c^j_{k+1}, \qquad t_{k+1} = j.$$

We then move to the next round with $k+1$ type change points, $c_{k+1} < c_k < \cdots < c_1$. The search for the type change points ends when no more can be found.

Remark 10 The above search scheme follows exactly the structure of the optimal policy uncovered in the proof of Theorem 8. Therefore, the type change points found by the scheme fully represent the optimal policy.

Remark 11 If only one type of item is available, then $b_1$ is the only type change point, and from equation (4.6),

$$V(r) = \begin{cases} b_1\, e^{(r-b_1) h_1}, & \text{if } r < b_1, \\ r, & \text{if } r \ge b_1. \end{cases}$$
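Once the change points and the types used between them are known, evaluating $V(r)$ from (4.6) is a short computation; with a single change point it reduces to Remark 11. The change points and slopes in the sketch below are hypothetical placeholders, not the output of the search scheme.

#include <cmath>
#include <cstdio>
#include <vector>

// Evaluate V(r) from the piecewise log-linear representation (4.6).
// c = {c_1, ..., c_K} with c_1 = b_n and c_K = 0; h[l] is the slope of
// ln V on the interval (c[l+1], c[l]). Sketch only.
double V(double r, const std::vector<double>& c, const std::vector<double>& h) {
    if (r >= c[0]) return r;                 // stopping region (4.4)
    double lnV = std::log(c[0]);             // ln V(c_1) = ln b_n
    for (std::size_t l = 0; l + 1 < c.size(); ++l) {
        if (r >= c[l + 1])                   // r lies in [c_{l+1}, c_l)
            return std::exp(lnV + (r - c[l]) * h[l]);
        lnV += (c[l + 1] - c[l]) * h[l];     // walk the boundary value down to c_{l+1}
    }
    return std::exp(lnV);                    // r = 0 with c_K = 0
}
int main() {
    std::vector<double> c = {9.0, 5.0, 0.0};   // hypothetical change points
    std::vector<double> h = {0.08, 0.06};      // hypothetical slopes h_{t_l}
    for (double r : {0.0, 3.0, 5.0, 7.0, 9.0, 11.0})
        std::printf("V(%.1f) = %.4f\n", r, V(r, c, h));
}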
4.2.2 Model 2: Item's weight is exponentially distributed and its reward is proportional to the weight.

This model assumes that for a type $i$ item, the weight $W_i$ is exponentially distributed, say $W_i \sim \mathrm{Exp}(w_i)$ with mean weight $w_i$, and that its reward $R^o_i$ equals $v_i W_i$, where the deterministic value $v_i$ is the unit reward per weight for type $i$. For this model, letting $w = E[W]$, we have

$$q_i = P(W_i \le W) = \frac{1/w_i}{1/w + 1/w_i}, \tag{4.8}$$

and

$$R_i = [\,R^o_i \mid W_i \le W\,] = [\,v_i W_i \mid W_i \le W\,].$$

Since $W_i$ and $W$ are independent exponential random variables,

$$[\,W_i \mid W_i \le W\,] \ \overset{d}{=}\ \min\{W_i, W\},$$

where $\overset{d}{=}$ means the random variables on the two sides have the same distribution, and $\min\{W_i, W\}$ is an exponential random variable with mean $\frac{1}{1/w + 1/w_i}$. Therefore $R_i$ is an exponential random variable with mean

$$r_i = \frac{v_i}{1/w + 1/w_i}. \tag{4.9}$$

The items' preference order of Lemma 12 does not hold for this model, because the proof of Lemma 12 requires independence between the event of breaking the knapsack and the reward increment of the inserted item, which fails for Model 2. However, Model 2 has exactly the same structure of the optimal policy as Model 1.

Theorem 9 The optimal value function in Model 2 has the same structure described in Theorem 8.

Proof. In the proof of Theorem 8 for Model 1, besides the assumption of independence between different items' weights and rewards, which holds for both models, the only assumption used is that $R_i$ is exponentially distributed. This assumption also holds for Model 2, by the discussion above. Therefore, after changing the parameters $q_i$ and $r_i$ of type $i$ according to equations (4.8) and (4.9), the rest of the proof goes through verbatim.

The theorem says that Model 1 and Model 2 share the same problem structure, the only difference being the parameter values. Therefore Model 2's optimal policy has the same structure, and can be determined by the search scheme presented for the optimal policy of Model 1.

4.2.3 Model 3: Item's reward is deterministic and its weight is random.

This model assumes that for a type $i$ item, the reward $R^o_i$ is deterministic, i.e., $R^o_i \equiv r^o_i$, and the weight $W_i$ is a positive random variable with distribution function $F_i$, on which no specific assumption is made. This is exactly the static BKP with exponential capacity model discussed in Section 2.3.1 of Chapter 2. Therefore we can apply, in this case, the unimodality property of the expected total return function and the monotonicity property of the marginal optimal decision functions. The heuristic algorithm for the suboptimal solution of Model 3 is given in Section 2.4 of Chapter 2.

4.3 An Example of $n = 3$ for Model 1 and Model 2

From our discussion of Model 1 and Model 2, the optimal policies of both models can be determined by the search scheme presented above. In this part, we give an example with $n = 3$ showing the implementation of the search scheme to find all type change points of the optimal policy. The parameters of the problem are given in the table below (with $p_i = 1 - q_i$):

i   r_i   q_i   b_i = q_i r_i / p_i   mu_i = 1/r_i   h_i = mu_i (1 - q_i)
1   8     0.5   8                     0.125          1/16
2   6     0.6   9                     1/6            1/15
3   4     0.7   9 1/3                 0.25           0.075

Table 4.1: Parameters in a numerical example for Model 1 and Model 2.

The above parameters satisfy the items' preference order indicated in Lemma 12, and we have

$$b_1 < b_2 < b_3, \qquad h_1 < h_2 < h_3.$$

Let us apply the search scheme to find all the type change points in $[0, b_3]$ for the optimal policy. We start with the first type change point $c_1 = b_3$, with type $t_1 = 3$ for $r \in [c_2, c_1]$, where $c_2$ is the next type change point to be located. Since the optimal action is to select $t_1 = 3$ at states $r \in [c_2, c_1]$, from equations (4.4) and (4.6) we have

$$V(r) = \begin{cases} b_3\, e^{(r-b_3) h_3}, & \text{if } r \in [c_2, c_1], \\ r, & \text{if } r \ge c_1. \end{cases} \tag{4.10}$$

The candidate types at $c_2$ are type 1 and type 2, because $h_1 < h_3$ and $h_2 < h_3$.
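Before carrying out the algebra, note that the indifference equation (4.7) under (4.10) can also be solved by a one-dimensional bisection. The C++ sketch below does this for both candidate types with the Table 4.1 parameters; it should reproduce the two roots derived next (about 4.31 and 7.74). This is a sketch, not the author's program, and the bracketing signs hold for these parameters.

#include <cmath>
#include <cstdio>

// g(c) > 0 means "insert candidate type i once, then follow V" beats V(c),
// where V(y) = b e^{(y-b)h} on [c, b] and V(y) = y beyond b (b = b_3, h = h_3).
double g(double c, double qi, double mui, double ri, double b, double h) {
    double Vc    = b * std::exp((c - b) * h);
    double part1 = qi * mui * Vc
                 * (1.0 - std::exp(-(mui - h) * (b - c))) / (mui - h);
    double part2 = qi * std::exp(-mui * (b - c)) * (b + ri);
    return part1 + part2 - Vc;
}
int main() {
    double b = 28.0 / 3.0, h = 0.075;                       // b_3 and h_3
    double q[2] = {0.5, 0.6}, mu[2] = {0.125, 1.0 / 6.0}, r[2] = {8.0, 6.0};
    for (int i = 0; i < 2; ++i) {
        double lo = 0.0, hi = b - 1e-9;                     // g(lo) > 0 > g(hi) here
        for (int it = 0; it < 100; ++it) {
            double mid = 0.5 * (lo + hi);
            (g(mid, q[i], mu[i], r[i], b, h) > 0.0 ? lo : hi) = mid;
        }
        std::printf("candidate type %d: c = %.2f\n", i + 1, lo);
    }   // the larger root becomes c_2, with t_2 the corresponding type
}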
Now we solve equation (4.7) for $c_2$, for $i = 1, 2$ respectively. Applying (4.10) to (4.7):

$$q_i \int_0^\infty \mu_i e^{-\mu_i t}\, V(c_2+t)\, dt = V(c_2)$$
$$\Rightarrow\quad q_i \int_0^{b_3-c_2} \mu_i e^{-\mu_i t}\, b_3\, e^{(c_2+t-b_3)h_3}\, dt + q_i\, e^{-\mu_i (b_3-c_2)} (b_3 + r_i) = b_3\, e^{(c_2-b_3)h_3}$$
$$\Rightarrow\quad q_i \mu_i b_3\, e^{(c_2-b_3)h_3} \int_0^{b_3-c_2} e^{-(\mu_i - h_3) t}\, dt + q_i\, e^{-\mu_i (b_3-c_2)} (b_3 + r_i) = b_3\, e^{(c_2-b_3)h_3}$$
$$\Rightarrow\quad q_i \mu_i b_3\, e^{(c_2-b_3)h_3}\, \frac{1 - e^{-(\mu_i - h_3)(b_3-c_2)}}{\mu_i - h_3} + q_i\, e^{-\mu_i (b_3-c_2)} (b_3 + r_i) = b_3\, e^{(c_2-b_3)h_3}.$$

Solving this equation for $c_2$ with the parameters $q_i, \mu_i, r_i$ of $i = 1, 2$ respectively gives

$$c^1_2 = 4.31, \qquad c^2_2 = 7.74.$$

Because $c^2_2 > c^1_2$, according to the scheme we set $c_2 = c^2_2 = 7.74$ and $t_2 = 2$. We have now found two type change points, with $c_2 = 7.74 < c_1 = b_3$. Since $h_1 < h_2$, there is still one candidate type, type 1, for $c_3$. We first update the optimal value function to incorporate the case $r \in [c_3, c_2)$:

$$V(r) = \begin{cases} b_3\, e^{(c_2-b_3)h_3 + (r-c_2)h_2}, & \text{if } r \in [c_3, c_2), \\ b_3\, e^{(r-b_3)h_3}, & \text{if } r \in [c_2, c_1], \\ r, & \text{if } r \ge c_1. \end{cases} \tag{4.11}$$

We then apply this expression to equation (4.7) in order to solve for $c_3$:

$$q_1 \int_0^\infty \mu_1 e^{-\mu_1 t}\, V(c_3+t)\, dt = V(c_3)$$
$$\Rightarrow\quad q_1 \int_0^{c_2-c_3} \mu_1 e^{-\mu_1 t}\, b_3\, e^{(c_2-b_3)h_3 + (c_3+t-c_2)h_2}\, dt + q_1 \int_{c_2-c_3}^{b_3-c_3} \mu_1 e^{-\mu_1 t}\, b_3\, e^{(c_3+t-b_3)h_3}\, dt + q_1\, e^{-\mu_1 (b_3-c_3)} (b_3 + r_1) = b_3\, e^{(c_2-b_3)h_3 + (c_3-c_2)h_2}$$
$$\Rightarrow\quad q_1 \mu_1 b_3\, e^{(c_2-b_3)h_3 + (c_3-c_2)h_2}\, \frac{1 - e^{-(\mu_1-h_2)(c_2-c_3)}}{\mu_1 - h_2} + q_1 \mu_1 b_3\, e^{(c_3-b_3)h_3}\, \frac{e^{-(\mu_1-h_3)(c_2-c_3)} - e^{-(\mu_1-h_3)(b_3-c_3)}}{\mu_1 - h_3} + q_1\, e^{-\mu_1(b_3-c_3)} (b_3 + r_1) = b_3\, e^{(c_2-b_3)h_3 + (c_3-c_2)h_2},$$

which has the solution

$$c^1_3 = -0.90.$$

Since $c^1_3 < 0$, type 1 is not recorded, which implies $c_3 = 0$. Altogether, we have found all the type change points of the optimal policy for this example:

$$c_3 = 0, \quad c_2 = 7.74, \quad c_1 = 9\tfrac{1}{3}, \qquad \text{with } t_2 = 2,\ t_1 = 3.$$

Therefore, the optimal policy in this example is: select a type 2 item at states $r \in [0, 7.74]$; select a type 3 item at states $r \in [7.74,\, 9\tfrac{1}{3}]$; stop at states $r > 9\tfrac{1}{3}$.

Chapter 5
The Markovian Stochastic Knapsack Problem with Exponential Capacity

We consider a stochastic knapsack problem where the knapsack has an exponentially distributed capacity. There are $n$ types of items. Items arrive sequentially, with successive item types constituting a Markov chain. Each item type has its own joint distribution of weight and reward. Upon arrival, an item's type becomes known, and the item can then be either accepted or rejected. If the item is rejected, the problem ends with a return equal to the sum of the rewards currently in the knapsack. If the item is accepted and put in the knapsack, we learn its reward and whether the knapsack is broken. If the knapsack is broken, the problem ends with 0 return; if it is not broken, we can choose to stop and leave the system with all existing rewards in the knapsack, or we can pay a fixed cost to move to the next stage. The objective is to maximize the expected total return.

We define the Markovian SKP with exponential capacity in Section 5.1. In Section 5.2, we show the results for the case when there is no cost to observe the next incoming item; we discuss the structure of this simplified model and present a search algorithm to locate the optimal policy. In Section 5.3, parallel results are given for the more general model in which the fixed cost is strictly positive. We show that the same logic of the simplified model still applies to the general model, and we therefore obtain a similar search algorithm for that case.
We also give a numerical example in Section 5.4 to show the implementation of the presented search algorithm.

5.1 Problem Definition

Consider a knapsack with an exponentially distributed capacity $W$. There exist $n$ different types of items, which arrive sequentially to the system with successive types forming a Markov chain with transition probability matrix $P = [p_{ij}]$. For a type $i$ item, the weight $W_i$ and the reward $R^o_i$ are random with a given joint distribution function $F_i(w,r)$. Given an item's type, its weight and reward are independent of all else. Upon an item's arrival, we are informed of its type and must then decide whether to retire by rejecting the item, or to continue by accepting it. If we choose to retire, we leave the system with all existing rewards in the knapsack. If we accept the item, we put it in the knapsack while learning its reward and whether the knapsack is broken. If the knapsack is broken, we are forced out of the system with no reward at all; otherwise, we may choose to retire with the updated total reward, or pay a fixed cost $c$ ($c \ge 0$) and continue by observing the type of the next incoming item. The objective is to find a policy that maximizes the expected total return.

Due to the memoryless property of the exponential distribution, when a type $i$ item is put in an unbroken knapsack, the probability that it will not break the knapsack is

$$q_i = P(W_i \le W),$$

where $(W_i, R^o_i)$ and $W$ are independent, $(W_i, R^o_i)$ having the distribution of the weight and reward of a type $i$ item, and $W$ the distribution of the capacity of the knapsack. In the following, we let $R_i$ denote a random variable whose distribution is the conditional distribution of $R^o_i$ given that $W_i \le W$; that is, $R_i$ is the reward increment conditional on successfully putting in a type $i$ item:

$$R_i = [\,R^o_i \mid W_i \le W\,].$$

Also, we let $r_i = E[R_i]$.

5.2 When $c = 0$

After putting in the current item, if there is no cost to observe the type of the next item, it is always at least as good to move to the next stage as to stop immediately. Therefore, there is only one decision point on each stage: after we learn the type of the arriving item, we decide whether to stop or to put it in the knapsack. The state of the system on each stage is denoted $(i,r)$, where $i$ is the type of the item currently in the system and $r$ is the total existing reward in the knapsack. Let $V(i,r)$ be the optimal value function at state $(i,r)$.

Optimality Equation

$$V(i,r) = \max\Big\{\, r,\ q_i\, E\Big[\sum_{j=1}^n p_{ij}\, V(j,\, r+R_i)\Big] \Big\}. \tag{5.1}$$

Define

$$b_i = \frac{q_i r_i}{1-q_i}, \quad \forall\, i \in \{1,\dots,n\}, \qquad b_{\max} = \max_{1\le i\le n} b_i.$$

Also, let $V_m(i,r)$ denote the optimal value function of the $m$-stage problem at state $(i,r)$. Then

$$V_m(i,r) = \begin{cases} \max\Big\{\, r,\ q_i\, E\big[\sum_{j=1}^n p_{ij}\, V_{m-1}(j,\, r+R_i)\big] \Big\}, & m \ge 1, \\ r, & m = 0. \end{cases}$$

Proposition 12 For the $m$-stage problem with $m \ge 0$, it is optimal to stop at state $(i,r)$ for any $i \in \{1,\dots,n\}$ when $r \ge b_{\max}$.

Proof. We have to show that when $r \ge b_{\max}$, $V_m(i,r) = r$ for all $i = 1,\dots,n$ and any $m \ge 0$. We prove this by mathematical induction. When $m = 0$, the claim is obviously true.
Suppose the claim is true for $m = k$, $k \ge 0$. Then, given $r \ge b_{\max}$, for any $i \in [1,n]$,

$$V_{k+1}(i,r) = \max\Big\{\, r,\ q_i\, E\Big[\sum_{j=1}^n p_{ij}\, V_k(j,\, r+R_i)\Big] \Big\} = \max\Big\{\, r,\ q_i\, E\Big[\sum_{j=1}^n p_{ij}\, (r+R_i)\Big] \Big\} = \max\{\, r,\ q_i (r + r_i) \,\} = r,$$

where the last equality follows from

$$r \ \ge\ \frac{q_i r_i}{1 - q_i} \ \Rightarrow\ r \ \ge\ q_i (r + r_i).$$

We now show that the optimal value function of the $m$-stage problem converges to that of the infinite-stage problem as $m$ goes to infinity.

Proposition 13 $V(i,r) = \lim_{m\to\infty} V_m(i,r)$.

Proof. Let $\pi$ be the optimal policy for the infinite-stage problem, and set $q_0 = \max_{j=1,\dots,n} q_j$ and $r_0 = \max_{j=1,\dots,n} r_j$. Then

$$V(i,r) = E_\pi[\text{return during the first } m \text{ stages}] + E_\pi[\text{additional return}] \ \le\ V_m(i,r) + q_0^m \sum_{k=0}^{\infty} r_0\, q_0^k = V_m(i,r) + \frac{q_0^m\, r_0}{1-q_0},$$

which implies $|V(i,r) - V_m(i,r)| \le \frac{q_0^m r_0}{1-q_0} \to 0$ as $m \to \infty$.

Proposition 14 $V(i,r)$ is continuous and strictly increasing in $r$.

Proof. Fixing $i$, the continuity claim holds by Proposition 13, because $V_m(i,r)$ is continuous in $r$ for every $m \ge 0$, which is easily proved by mathematical induction. We use coupling to prove the monotonicity part. Given $r' > r$, assume we have two identical knapsacks, the first at state $(i,r)$ and the second at state $(i,r')$. Assume the optimal policy is applied to the first knapsack; we copy every move of the optimal policy exactly and apply it to the second knapsack. Since both knapsacks currently face a type $i$ item, we couple the random sources of the state transitions so that the two knapsacks face the same type of item on each stage; when the optimal policy puts in an item, we put in the same type of item with the same reward by coupling the random sources of the two items' rewards; the events of breaking the knapsack are coupled as well, so that either both knapsacks survive or both fail; and we stop in the second knapsack whenever the optimal policy stops in the first one. Under this policy, because $r' > r$, in every experiment the final total reward in the second knapsack is greater than that in the first.

Theorem 10 There exists an $n$-dimensional non-negative vector $(r^*_1, \dots, r^*_n)$ such that, for any $i \in \{1,\dots,n\}$, at state $(i,r)$ it is optimal to continue if and only if $r < r^*_i$.

Proof. For each $i \in \{1,\dots,n\}$, define

$$r^*_i = \inf\{\, r : V(i,r) = r \,\}.$$

The existence of $r^*_i$ follows from the fact, implied by Propositions 12 and 13, that

$$V(i,r) = r, \qquad \forall\, r \ge b_{\max}. \tag{5.2}$$

It remains to prove that if it is optimal to stop at state $(i,r)$, then it is optimal to stop at state $(i,r')$ for any $r' \ge r$. We first establish the observation that

$$V(j, r_x) - V(j, r_y) \ \le\ r_x - r_y, \qquad \text{for any } r_x \ge r_y,\ \forall\, j \in [1,n]. \tag{5.3}$$

This inequality is proved by coupling, as in the proof of Proposition 14: the optimal policy is applied to the first knapsack, with initial state $(j, r_x)$; we copy every move of the optimal policy by coupling and apply it to the second knapsack, with initial state $(j, r_y)$. It is immediate that in each experiment the total reward from the first knapsack is no greater than that from the second knapsack plus $(r_x - r_y)$, which proves (5.3).

Given $r' \ge r$, if it is optimal to stop at state $(i,r)$, i.e.,

$$r \ \ge\ q_i\, E\Big[\sum_{j=1}^n p_{ij}\, V(j,\, r+R_i)\Big],$$

then

$$q_i\, E\Big[\sum_{j=1}^n p_{ij}\, V(j,\, r'+R_i)\Big] \ \le\ q_i\, E\Big[\sum_{j=1}^n p_{ij}\, \big( V(j,\, r+R_i) + (r'-r) \big)\Big] = q_i\, E\Big[\sum_{j=1}^n p_{ij}\, V(j,\, r+R_i)\Big] + q_i (r'-r) \ \le\ r + (r'-r) = r',$$

which implies $V(i,r') = r'$.
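Given any candidate threshold vector, the value of the corresponding threshold policy is easy to estimate by simulation. The C++ sketch below does this for the parameters of the Section 5.4 example, plugging in the thresholds that the search algorithm produces there; it is an evaluation sketch, not part of the algorithm itself.

#include <random>
#include <cstdio>

// Monte Carlo value of a threshold policy (Theorem 10) for the Markovian SKP
// with c = 0: at state (i, r) we continue iff r < thresh[i].
int main() {
    const int n = 3;
    double rmean[n]  = {3, 4, 6},  q[n] = {0.4, 0.5, 0.6};
    double P[n][n]   = {{0.5, 0.25, 0.25}, {0.25, 0.5, 0.25}, {0.25, 0.25, 0.5}};
    double thresh[n] = {2.34, 4.21, 9.0};   // thresholds found in Section 5.4
    std::mt19937 gen(1);
    std::uniform_real_distribution<double> u(0.0, 1.0);
    const int runs = 2000000;
    double total = 0.0;
    for (int k = 0; k < runs; ++k) {
        int i = 0;                           // start facing a type 1 item, r = 0
        double r = 0.0;
        for (;;) {
            if (r >= thresh[i]) { total += r; break; }  // retire with r
            if (u(gen) > q[i]) break;                    // knapsack broken: earn 0
            std::exponential_distribution<double> rew(1.0 / rmean[i]);
            r += rew(gen);                               // reward increment R_i
            double x = u(gen), cum = 0.0;                // next type ~ row i of P
            for (int j = 0; j < n; ++j) { cum += P[i][j]; if (x <= cum) { i = j; break; } }
        }
    }
    std::printf("estimated value of the threshold policy from (1, 0): %.3f\n",
                total / runs);
}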
We call $(r^*_1, \dots, r^*_n)$ in Theorem 10 the optimal vector, because it completely characterizes the optimal policy. (We write $r^*_i$ for the threshold of type $i$ to distinguish it from the mean reward increment $r_i$.) We have the following two lemmas on the optimal vector.

Lemma 13 For all $i = 1,\dots,n$: $b_i \le r^*_i \le b_{\max}$.

Proof. At state $(i,r)$, $b_i$ is the critical stopping boundary of the one-stage problem. Therefore, if $r < b_i$, it is always better to continue one more stage and then stop, rather than stop right away, which implies $r^*_i \ge b_i$. Equation (5.2) in the proof of Theorem 10 shows $r^*_i \le b_{\max}$.

Remark 12 An immediate consequence of the above lemma is

$$r^*_j = b_j \quad \text{if } b_j = b_{\max}.$$

We can also see that at each critical state $(i, r^*_i)$, the optimal policy is indifferent between continuing and stopping.

Lemma 14

$$r^*_i = q_i\, E\Big[\sum_{j=1}^n p_{ij}\, V(j,\, r^*_i + R_i)\Big].$$

Proof. Since $V(i,r)$ is continuous in $r$ by Proposition 14, given $r^*_i > 0$,

$$r^*_i = \lim_{r \to (r^*_i)^+} V(i,r) = \lim_{r \to (r^*_i)^-} V(i,r) = q_i\, E\Big[\sum_{j=1}^n p_{ij}\, V(j,\, r^*_i + R_i)\Big].$$

Lemma 15 At state $(i,r)$, if $r > r^*_i$, it is strictly better to stop than to continue.

Proof.

$$q_i\, E\Big[\sum_{j=1}^n p_{ij}\, V(j,\, r+R_i)\Big] \ \le\ q_i\, E\Big[\sum_{j=1}^n p_{ij}\, V(j,\, r^*_i + R_i) + r - r^*_i\Big] = q_i\, E\Big[\sum_{j=1}^n p_{ij}\, V(j,\, r^*_i+R_i)\Big] + q_i (r - r^*_i) = r^*_i + q_i (r - r^*_i) \ <\ r,$$

where the first inequality follows from (5.3).

We want to develop a search algorithm to locate the optimal vector, in which the indifference property described in Lemma 14 is applied. We show in the following that the optimal vector can be identified among all decision vectors that satisfy the indifference property with respect to their own value functions. Order the elements of the optimal vector $(r^*_1, \dots, r^*_n)$ so that

$$r^*_{o(1)} < r^*_{o(2)} < \cdots < r^*_{o(n)}.$$

For an $n$-dimensional non-negative vector $(\hat r_1, \dots, \hat r_n)$, let $\hat V(i,r)$ be the value function of the policy characterized by this vector, i.e., at state $(i,r)$ the corresponding policy stops if $r \ge \hat r_i$ and continues if $r < \hat r_i$. Assume

$$\hat r_i = q_i\, E\Big[\sum_{j=1}^n p_{ij}\, \hat V(j,\, \hat r_i + R_i)\Big], \qquad \forall\, i \in \{1,\dots,n\},$$

and that the $n$ elements are ordered as

$$\hat r_{\hat o(1)} < \hat r_{\hat o(2)} < \cdots < \hat r_{\hat o(n)}.$$

The following proposition uncovers the relationship between the two vectors.

Proposition 15 Given a fixed number $k \in \{1,\dots,n-1\}$, if $\hat o(i) = o(i)$ and $\hat r_{\hat o(i)} = r^*_{o(i)}$ for all $i$, $k+1 \le i \le n$, then $\hat r_{\hat o(k)} \le r^*_{o(k)}$.

Proof. If $\hat r_{\hat o(k)} > r^*_{o(k)}$, then from the assumption that $\hat o(i) = o(i)$ and $\hat r_{\hat o(i)} = r^*_{o(i)}$ for all $i = k+1,\dots,n$, we must have

$$\hat V(j,r) = V(j,r), \qquad \forall\, j \in [1,n],\ \forall\, r \ge \hat r_{\hat o(k)},$$

from which

$$\hat r_{\hat o(k)} = q_{\hat o(k)}\, E\Big[\sum_{j=1}^n p_{\hat o(k) j}\, \hat V(j,\, \hat r_{\hat o(k)} + R_{\hat o(k)})\Big] = q_{\hat o(k)}\, E\Big[\sum_{j=1}^n p_{\hat o(k) j}\, V(j,\, \hat r_{\hat o(k)} + R_{\hat o(k)})\Big] \ <\ \hat r_{\hat o(k)},$$

where the last inequality follows from Lemma 15, a contradiction.

Search Algorithm for the Optimal Vector

From Remark 12, we know $o(n) = \arg\max_{j=1,\dots,n} b_j$ and $r^*_{o(n)} = b_{o(n)}$. Now we want to find $o(n-1)$ and $r^*_{o(n-1)}$, using the observation that for all $r \ge r^*_{o(n-1)}$,

$$V(i,r) = \begin{cases} q_i \sum_{j=1}^n p_{ij}\, E[V(j,\, r+R_i)], & \text{if } i = o(n), \\ r, & \text{otherwise.} \end{cases} \tag{5.4}$$

From Lemma 14,

$$r^*_{o(n-1)} = q_{o(n-1)}\, E\Big[\sum_{j=1}^n p_{o(n-1)\,j}\, V(j,\, r^*_{o(n-1)} + R_{o(n-1)})\Big]. \tag{5.5}$$

For every $i \ne o(n)$, suppose $o(n-1) = i$; we use the expression for $V(i,r)$ in (5.4) and plug it into equation (5.5). We can then solve for $r^*_{o(n-1)}$, denoting the solution for each $i$ by $r_{i|o(n-1)}$. Proposition 15 implies

$$r^*_{o(n-1)} = \max_{i \ne o(n)} r_{i|o(n-1)}, \qquad o(n-1) = \arg\max_{i \ne o(n)} r_{i|o(n-1)}.$$

Having found $o(n)$, $o(n-1)$, and their corresponding critical points, we extend the expression for $V(i,r)$ for each $i$ to incorporate the case $r^*_{o(n-2)} \le r \le r^*_{o(n-1)}$.
We again solve for the critical points of all $i \ne o(n), o(n-1)$ in the indifference equations, and find the pair $(o(n-2),\, r^*_{o(n-2)})$ according to Proposition 15. We proceed iteratively until all elements of the optimal vector are located. The algorithm is summarized as follows:

Search Algorithm:
Initialization
    Set $o(n) = \arg\max_{j\in[1,n]} b_j$ and $r^*_{o(n)} = b_{o(n)}$.
    Set $V(i,r) = r$, $\forall i \in [1,n]$, $\forall r \ge 0$.
    Let $S = \{ i : i \in [1,n],\, i \ne o(n) \}$.
    Let $k = n-1$.
Loop
    While $k \ne 0$:
        For all $h \in [k+1, n]$, update all the functions $V(o(h), r)$ simultaneously so that they satisfy
            $V(o(h), r) = q_{o(h)} \sum_{j=1}^n p_{o(h)\,j}\, E[V(j,\, r+R_{o(h)})]$, $\forall r \le r^*_{o(k+1)}$;
            $V(o(h), r)$ is unchanged for $r \ge r^*_{o(k+1)}$.
        For every $i$ in $S$, with the updated value functions, solve for $r_{i|o(k)} < r^*_{o(k+1)}$ from
            $r_{i|o(k)} = q_i \sum_{j=1}^n p_{ij}\, E[V(j,\, r_{i|o(k)} + R_i)]$.
        Set $o(k) = \arg\max_{i\in S} r_{i|o(k)}$ and $r^*_{o(k)} = r_{o(k)|o(k)}$.
        Delete $o(k)$ from the set $S$.
        Set $k = k-1$.
    End While.

The search algorithm outputs the optimal vector. The key step of each iteration is the update of the value functions; we show in the numerical example of Section 5.4 that each update requires solving a group of ODEs.

5.3 When $c > 0$

When $c > 0$, it is not always worthwhile to pay the cost to observe the type of the next item. We therefore have two decision points on each stage: we must decide whether to reject or accept the arriving item after observing its type; and, if it is accepted and its acceptance does not break the knapsack, we must decide, after observing its reward, whether to retire or to pay $c$ to observe the type of the next item. We say the state of the system is $(i,r,0)$ if an arriving item is of type $i$ and $r$ is the total reward currently in the knapsack, and the state is $(i,r,1)$ if we have just successfully put in a type $i$ item and the updated total reward is $r$. Let $V(i,r,I)$ be the optimal value function at state $(i,r,I)$, where $I \in \{0,1\}$ indicates which of the two decision points of the stage we are at.

Optimality Equations: $\forall i \in \{1,\dots,n\}$, $\forall r \ge 0$,

$$V(i,r,0) = \max\{\, r,\ q_i\, E[V(i,\, r+R_i,\, 1)] \,\},$$
$$V(i,r,1) = \max\Big\{\, r,\ -c + \sum_{j=1}^n p_{ij}\, V(j,r,0) \,\Big\}. \tag{5.6}$$

Denote by $V_m(i,r,I)$ the optimal value function of the $m$-stage problem at state $(i,r,I)$. For $m \ge 1$,

$$V_m(i,r,0) = \max\{\, r,\ q_i\, E[V_m(i,\, r+R_i,\, 1)] \,\},$$
$$V_m(i,r,1) = \begin{cases} \max\big\{\, r,\ -c + \sum_{j=1}^n p_{ij}\, V_{m-1}(j,r,0) \,\big\}, & m \ge 2, \\ r, & m = 1. \end{cases}$$

As in the case $c = 0$, we have the following parallel results for $c > 0$.

Proposition 16 For the $m$-stage problem with $m \ge 1$, it is optimal to stop at state $(i,r,I)$ when $r \ge b_{\max}$.

Proposition 17 $V(i,r,I) = \lim_{m\to\infty} V_m(i,r,I)$.

Proposition 18 $V(i,r,I)$ is continuous and strictly increasing in $r$.

We also have the counterpart of Theorem 10 for $c > 0$.

Theorem 11 For each $i$, there exist non-negative $r^0_i$ and $r^1_i$ such that at state $(i,r,I)$ it is optimal to stop if and only if $r \ge r^I_i$.

The set $\{ r^0_i, r^1_i : i \in \{1,\dots,n\} \}$ fully characterizes the optimal policy of the problem. As in Lemma 13, we have:

Lemma 16 $b_i \le r^0_i \le b_{\max}$ and $0 \le r^1_i \le b_{\max}$, for all $i \in \{1,\dots,n\}$.

Remark 13 An immediate consequence of the above lemma is

$$r^0_j = b_j \quad \text{if } b_j = b_{\max}.$$

As in Lemma 14, we can show the indifference property at the states $(i, r^I_i, I)$ for $I = 0, 1$ respectively.
Lemma 17 The optimal policy is indifferent between continuing and stopping at state $(i, r^0_i, 0)$, i.e.,

$$r^0_i = q_i\, E[V(i,\, r^0_i + R_i,\, 1)]. \tag{5.7}$$

Lemma 18 Either $r^1_i = 0$, or the optimal policy is indifferent between continuing and stopping at state $(i, r^1_i, 1)$, i.e.,

$$r^1_i = -c + \sum_{j=1}^n p_{ij}\, V(j,\, r^1_i,\, 0). \tag{5.8}$$

Lemma 19 If $r > r^0_i$, then

$$r > q_i\, E[V(i,\, r+R_i,\, 1)];$$

if $r > r^1_i$, then

$$r > -c + \sum_{j=1}^n p_{ij}\, V(j,r,0).$$

Remark 14 Since $V(j, b_{\max}, 0) = b_{\max}$ for all $j$, equation (5.8) cannot hold at $r^1_i = b_{\max}$, which implies

$$r^1_i < b_{\max}, \qquad \forall\, i \in \{1,\dots,n\}.$$

Before presenting the search algorithm for all $r^0_i, r^1_i$, as we did for the case $c = 0$, we show how to identify the characteristic set $\{ r^0_i, r^1_i : i = 1,\dots,n \}$ among all sets satisfying the indifference property. Order the $2n$ elements of the characteristic set so that

$$r^{I(1)}_{o(1)} \le r^{I(2)}_{o(2)} \le \cdots \le r^{I(2n-1)}_{o(2n-1)} \le r^{I(2n)}_{o(2n)},$$

where $I(k) \in \{0,1\}$ and $o(k) \in \{1,\dots,n\}$, for $k = 1,\dots,2n$. For another set, say $\{ \hat r^0_i, \hat r^1_i : i = 1,\dots,n \}$, where $\hat r^0_i \in [b_i, b_{\max}]$ and $\hat r^1_i \in [0, b_{\max})$, let $\hat V(i,r,I)$ be the value function of the policy characterized by this set, i.e., at state $(i,r,I)$ the corresponding policy stops if and only if $r \ge \hat r^I_i$. Assume that for each $i$, $\hat r^0_i$ and $\hat r^1_i$ satisfy their respective versions of Lemma 17 and Lemma 18, with the value function $V(j,r,I)$ replaced by $\hat V(j,r,I)$ in equations (5.7) and (5.8). Order all the elements of this new set so that

$$\hat r^{I'(1)}_{o'(1)} \le \hat r^{I'(2)}_{o'(2)} \le \cdots \le \hat r^{I'(2n-1)}_{o'(2n-1)} \le \hat r^{I'(2n)}_{o'(2n)},$$

where $I'(k) \in \{0,1\}$ and $o'(k) \in \{1,\dots,n\}$, for $k = 1,\dots,2n$.

Proposition 19 Given $k \in [1, 2n)$, if

$$o'(j) = o(j), \quad I'(j) = I(j), \quad \hat r^{I'(j)}_{o'(j)} = r^{I(j)}_{o(j)}, \qquad \forall\, k+1 \le j \le 2n, \tag{5.9}$$

then $\hat r^{I'(k)}_{o'(k)} \le r^{I(k)}_{o(k)}$.

The search algorithm for the characteristic set $\{ r^0_i, r^1_i : i = 1,\dots,n \}$ is given as follows.

Search Algorithm:
Initialization
    Set $o(2n) = \arg\max_{j\in[1,n]} b_j$, $I(2n) = 0$, and $r^{I(2n)}_{o(2n)} = b_{o(2n)}$.
    Set $V(i,r,I) = r$, $\forall i \in \{1,\dots,n\}$, $r \ge 0$, $I \in \{0,1\}$.
    Let $S = \{ (r^0_{o(2n)}, o(2n), I(2n)) \}$ and $S_t = \{ (o(2n), I(2n)) \}$.
    Let $k = 2n-1$.
Loop
    While $k \ne 0$:
        For all $(i,I) \in S_t$, update all the functions $V(i,r,I)$ simultaneously so that they satisfy
            $V(i,r,0) = \max\{ r,\ q_i\, E[V(i,\, r+R_i,\, 1)] \}$, $\forall r \le r^{I(k+1)}_{o(k+1)}$;
            $V(i,r,1) = \max\{ r,\ -c + \sum_{j=1}^n p_{ij}\, V(j,r,0) \}$, $\forall r \le r^{I(k+1)}_{o(k+1)}$;
            $V(i,r,I)$ is unchanged for $r \ge r^{I(k+1)}_{o(k+1)}$.
        For every $(i,I) \notin S_t$, with the updated value functions, solve for $r^I_{i|o(k)} \le r^{I(k+1)}_{o(k+1)}$ from
            $r^0_{i|o(k)} = q_i\, E[V(i,\, r^0_{i|o(k)} + R_i,\, 1)]$, if $I = 0$;
            $r^1_{i|o(k)} = -c + \sum_{j=1}^n p_{ij}\, V(j,\, r^1_{i|o(k)},\, 0)$, if $I = 1$.
            If there is no solution for $r^1_{i|o(k)}$, set $r^1_{i|o(k)} = 0$.
        Set $(o(k), I(k)) = \arg\max_{(i,I)\notin S_t} r^I_{i|o(k)}$ and $r^{I(k)}_{o(k)} = r^{I(k)}_{o(k)|o(k)}$.
        Add $(o(k), I(k))$ to $S_t$ and add $(r^{I(k)}_{o(k)}, o(k), I(k))$ to $S$.
        Set $k = k-1$.
    End While.

The output of the above algorithm is $S = \{ (r^{I(k)}_{o(k)}, o(k), I(k)) : 1 \le k \le 2n \}$, from which we can determine the optimal policy by Theorem 11.

5.4 A Numerical Example

We give a numerical example with $n = 3$ and $c = 0$ to show the implementation of the presented search algorithm. In this example, we assume $R_i$, $i = 1,\dots,n$, are exponential random variables with means $r_i$. The parameters of this example are given here:

i   r_i   q_i   mu_i = 1/r_i   b_i = q_i r_i / (1 - q_i)
1   3     0.4   1/3            2
2   4     0.5   1/4            4
3   6     0.6   1/6            9

p_ij   1      2      3
1      0.5    0.25   0.25
2      0.25   0.5    0.25
3      0.25   0.25   0.5

Now let us implement the search algorithm for $c = 0$ to determine the optimal policy for this example.
We use Mathematica for solving the ODE group and for root finding in the numerical calculations below.

Initialization
Set o(3) = 3 and r_3 = 9. Set V(i, r) = r for i = 1, 2, 3 and all r ≥ 0. Let S = {1, 2}. Let k = 2.

When k = 2
We only have to update V(3, r) so that
  V(3, r) = q_3 Σ_{j=1}^{3} p_{3j} E[V(j, r + R_3)],  for all r ≤ r_3.
Therefore, for 0 ≤ r ≤ r_3, since V(1, ·) and V(2, ·) are still the identity,
  V(3, r) = q_3 ( p_31 E[V(1, r + R_3)] + p_32 E[V(2, r + R_3)] + p_33 E[V(3, r + R_3)] )
          = q_3 p_31 (r + r̄_3) + q_3 p_32 (r + r̄_3) + q_3 p_33 ( ∫_0^{r_3 − r} λ_3 e^{−λ_3 t} V(3, r + t) dt + e^{−λ_3 (r_3 − r)} (r_3 + r̄_3) ).
Plugging all the parameter values into the above equation, we have
  V(3, r) = 1.8 + 0.3 r + 0.3 ( ∫_0^{9−r} (1/6) e^{−t/6} V(3, r + t) dt + 15 e^{−(9−r)/6} ),  for all r ∈ [0, 9].
Taking the derivative of V(3, r) with respect to r on both sides, for r ∈ [0, 9] we get
  V_r(3, r) = 0.3 + 0.3 ( ∫_0^{9−r} (1/6) e^{−t/6} V_r(3, r + t) dt + e^{−(9−r)/6} )
            = 0.3 + 0.3 ( [ (1/6) e^{−t/6} V(3, r + t) ]_{t=0}^{9−r} + (1/6) ∫_0^{9−r} (1/6) e^{−t/6} V(3, r + t) dt + e^{−(9−r)/6} )
            = 0.3 + 0.3 ( (15/6) e^{−(9−r)/6} − (1/6) V(3, r) + (1/6) ∫_0^{9−r} (1/6) e^{−t/6} V(3, r + t) dt ).
We cancel the term ∫_0^{9−r} (1/6) e^{−t/6} V(3, r + t) dt by combining the previous two equations, which leads to
  V_r(3, r) = (7/60) V(3, r) − (1/20) r,  for all r ∈ [0, 9].
We solve this ordinary differential equation with the boundary condition V(3, 9) = 9 and get
  V(3, r) = e^{(7/60) r} ( 4.19 − ∫_0^{r} e^{−(7/60) t} (t/20) dt ),  for all r ∈ [0, 9].
We have thus updated the function V(3, r) as follows:
  V(3, r) = e^{(7/60) r} ( 4.19 − ∫_0^{r} e^{−(7/60) t} (t/20) dt )  for r ∈ [0, 9];
  V(3, r) = r  for r ≥ 9.
Now we solve the following two equations for r_{i|3} < r_3 respectively, where i ∈ S:
  r_{1|3} = q_1 Σ_{j=1}^{3} p_{1j} E[V(j, r_{1|3} + R_1)],
  r_{2|3} = q_2 Σ_{j=1}^{3} p_{2j} E[V(j, r_{2|3} + R_2)].
Plugging the latest V(j, ·) into these equations, we get r_{1|3} = 2.30 and r_{2|3} = 4.21. Since r_{2|3} > r_{1|3}, we set o(2) = 2 and r_2 = 4.21. Update S = {1}, k = 1.

When k = 1
We update the functions V(2, r) and V(3, r) simultaneously so that they satisfy
  V(2, r) = q_2 Σ_{j=1}^{3} p_{2j} E[V(j, r + R_2)]  for r ≤ r_2;  V(2, r) = r  for r ≥ r_2;
  V(3, r) = q_3 Σ_{j=1}^{3} p_{3j} E[V(j, r + R_3)]  for r ≤ r_2;
  V(3, r) = e^{(7/60) r} ( 4.19 − ∫_0^{r} e^{−(7/60) t} (t/20) dt )  for r ∈ [r_2, r_3];  V(3, r) = r  for r ≥ r_3.
For r ∈ [0, r_2], we expand the equations for V(2, r) and V(3, r) and take derivatives with respect to r on both sides. Cancelling the integral terms as in the previous round gives the ODE group
  V_r(2, r) = (3/16) V(2, r) − (1/32) V(3, r) − (1/32) r,  r ∈ [0, 4.21],  with V(2, 4.21) = 4.21;
  V_r(3, r) = −(0.15/6) V(2, r) + (0.7/6) V(3, r) − (0.15/6) r,  r ∈ [0, 4.21],  with V(3, 4.21) = 6.318.
Solving this ODE group, we get
  V(2, r) = 1.545 + 0.21 r + 0.712 e^{0.107 r} + 0.29 e^{0.197 r},  r ∈ [0, 4.21];
  V(3, r) = 2.553 + 0.2593 r + 1.835 e^{0.107 r} − 0.0899 e^{0.197 r},  r ∈ [0, 4.21].
With the updated functions V(2, r) and V(3, r), since S = {1}, we solve for r_1 ∈ [0, 4.21] in the equation
  r_1 = q_1 Σ_{j=1}^{3} p_{1j} E[V(j, r_1 + R_1)],
from which we get r_1 = 2.34. Now we set k = 0 and stop the loop.

Output
  r_1 = 2.34,  r_2 = 4.21,  r_3 = 9.
The optimal policy for this example: at state (i, r), continue if r ∈ [0, r_i); stop otherwise.

Chapter 6
Conclusions

In this thesis we study several variants of stochastic knapsack problems. In these models it is assumed that there are n types of items available, each type has an infinite supply, and an item's type determines the joint distribution of its weight and reward.
Except for the static SKP with simple recourse and penalty model, we always assume that once the knapsack is broken, all existing rewards are wiped out and we leave the system with no return. This assumption fits applications in which the realization of an item's reward depends on the successful realization of every other item's reward.

We first study two static SKP models: the static BKP and the static SKP with simple recourse and penalty. We show that under certain distributions of the items' weights, two useful properties hold for both models with constant capacity: the unimodality of the expected return function and the monotonicity of the marginal optimal decision functions. When the knapsack capacity is exponentially distributed, the two properties hold for general weight distributions in the static BKP model, and we give a sufficient condition under which they hold for the second model. We apply the two properties to develop an efficient search algorithm that locates the optimal decisions for the two models, i.e., the optimal quantity of each type of item to put in the knapsack. Future directions for the static models include the chance-constrained model, in which the objective is to maximize the expected return subject to a given bound on the probability of breaking the knapsack.

We then study two adaptive SKP models. The first is the adaptive BKP model, in which the knapsack has a constant capacity and an item's reward is proportional to its exponentially distributed random weight; the second is the adaptive SKP with exponential capacity, in which we assume only general joint distributions of each type's weight and reward. For the first model, we derive the optimal policy when n = 2 and give an efficient heuristic policy for general n. Future research could consider the assumption that an item's reward is independent of its weight; the problem structure is totally different under that assumption, and the policy that is optimal for n = 2 in the original model is no longer optimal. For the second model, we discuss three different assumptions on the joint distributions, and we prove that under these specific conditions we can find schemes which lead to optimal policies, or at least suboptimal ones. Future work on this model may consider other common joint distributions, such as the normal distribution.

We also study a model in which items arrive at the system one by one: the Markovian SKP with exponential capacity. In this model we must pay a fixed cost c to observe the type of the next arriving item. We give the basic idea for developing the search algorithm for the optimal solution in the case c = 0, and then generalize the idea to present the counterpart algorithm for c > 0. In this model, once we reject an item we must leave the system. Future work may consider the assumption that a rejection incurs only a fixed cost; this is a more realistic assumption and is worth further investigation.

Bibliography

[An97] Mark An. Log-concave probability distributions: Theory and statistical testing. Duke University Department of Economics Working Paper (95-03), 1997.

[BB05] Mark Bagnoli and Ted Bergstrom. Log-concave probability and its applications. Economic Theory, 26:445–469, 2005.

[CM98] Amy Mainville Cohn and Cynthia Barnhart.
The stochastic knapsack problem with random weights: A heuristic approach to robust transportation planning. In Proceedings of the Triennial Symposium on Transportation Analysis, 1998.

[DGV04] B. C. Dean, M. X. Goemans, and J. Vondrák. Approximating the stochastic knapsack problem: The benefit of adaptivity. In Proceedings of the 45th Annual IEEE Symposium on Foundations of Computer Science, pages 208–217. IEEE, 2004.

[DLR78] C. Derman, G. J. Lieberman, and S. M. Ross. A renewal decision problem. Management Science, 24(5):554–561, 1978.

[Fer] Thomas S. Ferguson. Optimal Stopping and Applications.

[GI99] A. Goel and P. Indyk. Stochastic load balancing and related problems. In Proceedings of the 40th Annual Symposium on Foundations of Computer Science, pages 579–586. IEEE, 1999.

[Hen90] M. I. Henig. Risk criteria in a stochastic knapsack problem. Operations Research, 38(5):820–825, 1990.

[Kad71] Joseph B. Kadane. How to Burgle if You Must: A Decision Problem. Defense Technical Information Center, 1971.

[Kar68] Samuel Karlin. Total Positivity, volume 1. Stanford University Press, 1968.

[KL10] Stefanie Kosuch and Abdel Lisser. Upper bounds for the 0-1 stochastic knapsack problem and a branch-and-bound algorithm. Annals of Operations Research, 176:77–93, 2010.

[KL11] S. Kosuch and A. Lisser. On two-stage stochastic knapsack problems. Discrete Applied Mathematics, 159(16):1827–1841, 2011.

[Kol67] P. J. Kolesar. A branch and bound algorithm for the knapsack problem. Management Science, 13(9):723–735, 1967.

[KP98] Anton J. Kleywegt and Jason D. Papastavrou. The dynamic and stochastic knapsack problem. Operations Research, 46:17–35, 1998.

[KP01] Anton J. Kleywegt and Jason D. Papastavrou. The dynamic and stochastic knapsack problem with random sized items. Operations Research, 49(1):26–41, 2001.

[KPP04] H. Kellerer, U. Pferschy, and D. Pisinger. Knapsack Problems. Springer, Berlin, Heidelberg, 2004.

[KRT97] J. Kleinberg, Y. Rabani, and É. Tardos. Allocating bandwidth for bursty connections. In Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing, pages 664–673. ACM, 1997.

[LCJ99] L. L. Lu, S. Y. Chiu, and L. A. Cox Jr. Optimal project selection: Stochastic knapsack with finite time horizon. The Journal of the Operational Research Society, 50(6):645–650, 1999.

[LLY08] G. Y. Lin, Y. Lu, and D. D. Yao. The stochastic knapsack revisited: Switch-over policies and dynamic pricing. Operations Research, 56(4):945–957, 2008.

[MW98] D. P. Morton and R. K. Wood. On a stochastic knapsack problem and generalizations. In Advances in Computational and Stochastic Optimization, Logic Programming, and Heuristic Search: Interfaces in Computer Science and Operations Research, pages 149–168, 1998.

[NS05] Asok K. Nanda and Debasis Sengupta. Discrete life distributions with decreasing reversed hazard. Sankhyā: The Indian Journal of Statistics, pages 106–125, 2005.

[Pre73] András Prékopa. On logarithmic concave measures and functions. Acta Scientiarum Mathematicarum (Szeged), 34:335–343, 1973.

[Ros83] Sheldon M. Ross. Introduction to Stochastic Dynamic Programming. Academic Press, Orlando, FL, 1983.

[RT89] K. W. Ross and D. H. K. Tsang. The stochastic knapsack problem. IEEE Transactions on Communications, 37(7):740–747, 1989.

[RTW09] Sheldon M. Ross, Henk Tijms, and Shinyi Wu. A model for locking in gains with an application to clinical trials.
Probability in the Engineering and Informational Sciences, 23(4):637, 2009.

[Sch94] K. Schilling. Random knapsacks with many constraints. Discrete Applied Mathematics, 48(2):163–174, 1994.

[Smi78] D. R. Smith. Note on 'A renewal decision problem'. Management Science, 24(5), 1978.

[Sni80] M. Sniedovich. Preference order stochastic knapsack problems: Methodological issues. Journal of the Operational Research Society, pages 1025–1032, 1980.

[SP79] E. Steinberg and M. S. Parks. A preference order dynamic program for a knapsack problem with stochastic rewards. Journal of the Operational Research Society, pages 141–147, 1979.

[SS06] Moshe Shaked and J. George Shanthikumar. Stochastic Orders. Springer, 2006.

[TID11] T. Ilhan, S. M. R. Iravani, and M. S. Daskin. The adaptive knapsack problem with stochastic rewards. Operations Research, 59(1):242–248, 2011.

[Tot80] P. Toth. Dynamic programming algorithms for the zero-one knapsack problem. Computing, 25(1):29–45, 1980.

[VSY00] R. Van Slyke and Y. Young. Finite horizon stochastic knapsacks with applications to yield management. Operations Research, 48(1):155–172, 2000.

[Woo09] Gordon Woo. Intelligence constraints on terrorist network plots. In Nasrullah Memon, Jonathan David Farley, David L. Hicks, and Torben Rosenorn, editors, Mathematical Methods in Counterterrorism, pages 205–214. Springer Vienna, 2009.

Appendix for Chapter 2

Proof of inequality (2.1) in Corollary 1

Denote
  S = Σ_{i≠j} Σ_{h=1}^{k_i} W_{ih},  v = Σ_{i≠j} k_i v_i,
where W_{ih} ∼ F_i for all i ∈ [1, n]; that is, S and v are the total weight and the total value of all types other than type j. We also assume E[S] = μ and Var(S) = σ². Because
  R(k_1, …, m, …, k_n) = (v + m v_j) P( S + Σ_{h=1}^{m} W_{jh} ≤ w ),
applying the central limit theorem gives
  lim_{m→+∞} R(k_1, …, m, …, k_n) = lim_{m→+∞} (v + m v_j) Φ( (w − μ − m μ_j) / √(σ² + m σ_j²) ),
where μ_j and σ_j² denote the mean and variance of a single type-j weight, and Φ(·) is the cdf of the standard normal distribution N(0, 1). Since the argument of Φ behaves like −θ√m for the constant θ = μ_j/σ_j > 0,
  lim_{m→+∞} R(k_1, …, m, …, k_n) = lim_{m→+∞} (v + m v_j) Φ(−θ√m) = lim_{m→+∞} m v_j Φ(−θ√m).
We have the following upper bound for the term Φ(−θ√m):
  Φ(−θ√m) = 1 − Φ(θ√m) = ∫_{θ√m}^{∞} (1/√(2π)) e^{−x²/2} dx ≤ ∫_{θ√m}^{∞} (1/√(2π)) (x/(θ√m)) e^{−x²/2} dx = e^{−θ²m/2} / (θ√(2πm)).
Therefore,
  lim_{m→+∞} m v_j Φ(−θ√m) ≤ lim_{m→+∞} m v_j e^{−θ²m/2} / (θ√(2πm)) = 0. ∎

Proof of Lemma 2

Denote Y = w − S(0), and let {W_h}_{h≥1} be a sequence of i.i.d. exponentially distributed random variables with mean w̄_j. We can rewrite the definition of the random variable N as follows:
  N = 0  if Y < 0;
  N = k  if Σ_{h=1}^{k−1} W_h ≤ Y and Σ_{h=1}^{k} W_h > Y,  for k ≥ 1.
Denote N̂ = [N | N > 0], let f(y) be the probability density function (pdf) of Y, and let f̂(y) be the pdf of [Y | Y ≥ 0]. If S(0) ≡ 0, then N̂ is a (unit-shifted) Poisson random variable, which implies the log-concavity of N̂. Otherwise, S(0) is a sum of independent exponentially distributed random variables, each of whose pdfs is log-concave; since the convolution of log-concave distributions preserves log-concavity (see Theorem 7 in Prékopa [Pre73]), f(y) must be log-concave on the domain (−∞, w]. Therefore f̂(y) is log-concave on [0, w]. The log-concavity of N̂ then follows from Lemma 20 below.

We now prove Lemma 20. The idea of the proof is from Nanda and Sengupta [NS05], who proved that the number of events arriving according to a Poisson process during a random time period has a discrete DRH distribution if the length of the period has a continuous DRH distribution. In Lemma 20 we prove the log-concave analogue: the distribution of the number of events is discrete log-concave if the distribution of the time length is continuous log-concave.
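Before the formal statement and proof, here is a small numerical illustration (not part of the argument): take T uniform on [0, w], whose density is log-concave, mix a rate-λ Poisson over T, and check the defining log-concavity inequality p(k)² ≥ p(k−1)p(k+1) directly. The rate λ and horizon w below are arbitrary choices.

    import numpy as np
    from scipy.integrate import quad
    from scipy.stats import poisson

    lam, w = 2.0, 3.0
    f = lambda x: 1.0 / w                 # pdf of T ~ Uniform(0, w), log-concave

    def p(k):
        # p(k) = P(N(T) = k) = int_0^w e^{-lam x} (lam x)^k / k! f(x) dx
        return quad(lambda x: poisson.pmf(k, lam * x) * f(x), 0, w)[0]

    pk = np.array([p(k) for k in range(13)])
    print(np.all(pk[1:-1] ** 2 >= pk[:-2] * pk[2:]))   # True: the mixed pmf is log-concave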
Lemma 20 If {N(t), t ≥ 0} is a Poisson process with rate λ that is independent of T, a positive continuous log-concave random variable, then the discrete random variable N(T) is log-concave.

Proof. Let p(k), k ∈ ℕ, be the pmf of N(T). To prove that N(T) is log-concave, it is equivalent to show that p(k+1)/p(k) is non-increasing in k for k ≥ 0. Assume the continuous random variable T has pdf f(t), and denote
  w = sup{ t ≥ 0 : f(t) > 0 }.
Define the following functional for k ≥ 0:
  Ψ(λ, g, k) = ∫_0^{w} e^{−λx} (λx)^k / k! · g(x) dx,
where g is any function that makes the above integral well-defined. It is easy to verify that
  p(k) = Ψ(λ, f, k),  for all k ≥ 0.
Let f′(t) be the first derivative of f(t) (it is required here that the pdf of T is differentiable on its domain). Then for k ≥ 0, integration by parts gives
  Ψ(λ, f′, k) = C(k) + λ Ψ(λ, f, k) − λ Ψ(λ, f, k−1),
where C(k) = e^{−λw} (λw)^k / k! · f(w) − f(0) I_{k=0}, and Ψ(λ, f, −1) = 0. Hence,
  p(k+1)/p(k) is non-increasing in k for k ≥ 0
  ⇔ Ψ(λ, f, k+1)/Ψ(λ, f, k) is non-increasing in k for k ≥ 0
  ⇔ [Ψ(λ, f, k) − Ψ(λ, f, k−1)] / Ψ(λ, f, k) is non-increasing in k for k ≥ 1
  ⇔ [Ψ(λ, f′, k) − C(k)] / Ψ(λ, f, k) is non-increasing in k for k ≥ 1
  ⇐ Ψ(λ, f′, k)/Ψ(λ, f, k) is non-increasing in k for k ≥ 1, and C(k)/Ψ(λ, f, k) is non-decreasing in k for k ≥ 1.
So we only have to prove the following two claims:
  (A1) Ψ(λ, f′, k)/Ψ(λ, f, k) is non-increasing in k for k ≥ 1;
  (A2) C(k)/Ψ(λ, f, k) is non-decreasing in k for k ≥ 1.
We first prove claim (A2). For k ≥ 1,
  C(k)/Ψ(λ, f, k) is non-decreasing in k
  ⇔ e^{−λw} (λw)^k / k! · f(w) / ∫_0^{w} e^{−λx} (λx)^k / k! f(x) dx is non-decreasing in k
  ⇔ ∫_0^{w} e^{−λx} (x/w)^k f(x) dx is non-increasing in k
  ⇐ (x/w)^k is non-increasing in k for all x ∈ [0, w].
To prove claim (A1), we use the log-concavity of f on [0, w]. Because f(t) is log-concave on [0, w], f′(t)/f(t) is monotone decreasing (see Remark 1 in Bagnoli and Bergstrom [BB05]). Therefore, for any θ > 0, f′(t) − θ f(t) changes sign at most once, from positive to negative, as t goes from 0 to w. Denote
  K(x, k) = e^{−λx} (λx)^k / k!;
then K(x, k) is TP₂ (totally positive of order 2) on [0, w] × ℕ. From Chapter 5 of Karlin [Kar68], the variation diminishing property of K(x, k) implies that Ψ(λ, f′, k) − θ Ψ(λ, f, k) changes sign at most once, from positive to negative, as k goes from 0 to ∞, which implies that Ψ(λ, f′, k)/Ψ(λ, f, k) is monotone decreasing. This proves claim (A1). ∎

Appendix for Chapter 3

Proof of Lemma 5

[Figure B.1: two panels in the (r, v) plane showing the boundaries v_1(r) and v_2(r) and the state (r, v). Given V_1(r, v) ≥ E_2[V_1(r − T, v + v_2 T)], Case 1 considers v ≥ v_1(r), where by equations (3.5) we know V_1(r − h, v + v_1 h) = v + v_1 h for all 0 ≤ h ≤ r; Case 2 considers v < v_1(r), which means the state (r, v) is outside the stopping zone determined by the boundary v_1(·).]
104 Case 2.1: 0hrR 11 (r;v) The first term ofH(h) is constant onh because V 1 (rh;v +v 1 h) =v 1 (R 11 (r;v));8h2 [0;rR 11 (r;v)]: Let’s expand the 2nd term ofH(h) and find its partial derivative w.r.th. We try to prove the derivative of the expectation term is negative in order to showH(h) is increasing in h when 0hrR 11 (r;v). Denoter h =R 21 (rh;v +v 1 h), we have: E 2 [V 1 (rhT;v +v 1 h +v 2 T )] = Z rh 0 2 e 2 t V 1 (rht;v +v 1 h +v 2 t)dt = Z rhr h 0 2 e 2 t V 1 (rht;v +v 1 h +v 2 t)dt + Z rh rhr h 2 e 2 t (v +v 1 h +v 2 t)dt; 105 dE 2 [V 1 (rhT;v +v 1 h +v 2 T )] dh = Z rhr h 0 2 e 2 t V 1 (rht;v +v 1 h +v 2 t) h dt + (rhr h ) h 2 e 2 t V 1 (rht;v +v 1 h +v 2 t)j t=rhr h + Z rh rhr h 2 e 2 t (v +v 1 h +v 2 t) h dt + (rh) h 2 e 2 t (v +v 1 h +v 2 t)j t=rh (rhr h ) h 2 e 2 t (v +v 1 h +v 2 t)j t=rhr h = Z rhr h 0 2 e 2 t 0dt + (1 (r h ) h ) 2 e 2 (rhr h ) (v +v 1 h + (rhr h )v 2 ) + Z rh rhr h 2 e 2 t v 1 dt 2 e 2 (rh) (v +v 1 h + (rh)v 2 ) (1 (r h ) h ) 2 e 2 (rhr h ) (v +v 1 h + (rhr h )v 2 ) = Z rh rhr h 2 e 2 t v 1 dt 2 e 2 (rh) (v +v 1 h + (rh)v 2 ) =v 1 e 2 (rhr h ) v 1 e 2 (rh) 2 e 2 (rh) (v +v 1 h + (rh)v 2 ) =e 2 (rh) v 1 e 2 r h v 1 2 (v +v 1 h + (rh)v 2 ) : Becauser h =R 21 (rh;v +v 1 h), from the definition ofR 21 (;), v 1 (r h ) =v +v 1 h + (rhr h )v 2 : On the other hand, v 1 (r h )= v 1 1 (e 1 r h 1 1 r h ): Therefore we have: 106 v +v 1 h +v 2 (rh) = v 1 1 (e 1 r h 1 1 r h ) +v 2 r h : (a:1) In order to prove the partial derivative is negative, we only have to show: v 1 e 2 r h v 1 2 (v +v 1 h + (rh)v 2 )< 0: By equation (a:1), we have: v 1 e 2 r h v 1 2 (v +v 1 h + (rh)v 2 ) =v 1 e 2 r h v 1 2 ( v 1 1 (e 1 r h 1 1 r h ) +r h v 2 ) =v 1 1 X k=0 ( 2 r h ) k k! v 1 2 v 1 1 1 X k=2 ( 1 r h ) k k! 2 v 2 r h = 1 X k=2 v 1 2 (r h ) k ( 2 ) k1 ( 1 ) k1 k! +v 1 +v 1 2 r h v 1 v 2 2 r h < 0; where the last inequality holds becausev 1 <v 2 ; and 1 > 2 . Case 2.2:rR 11 (r;v)hr H(h) =V 1 (rh;v +v 1 h)E 2 [V 1 (rhT;v +v 1 h +v 2 T )] = (v +v 1 h)E 2 [(v +v 1 h +v 2 T )I Trh ] = (v +v 1 h) Z rh 0 2 e 2 t (v +v 1 h +v 2 t)dt H 0 (h) =v 1 Z rh 0 2 e 2 t (v +v 1 h +v 2 t) h dt (rh) h 2 e 2 t (v +v 1 h +v 2 t)j t=rh =v 1 v 1 (1e 2 (rh) ) + 2 e 2 t (v +v 1 h +v 2 t)j t=rh =v 1 e 2 (rh) + 2 e 2 t (v +v 1 h +v 2 t)j t=rh 0: 107 Combining Case 2.1 and Case 2.2, we’ve shownH(h) is always increasing inh when h2 [0;r] under Case 2. Proof of Lemma 6 Case 1:vv 1 (r) Similar to the proof of Case 1 in Lemma 5, this part of Lemma 6 can immediately be proved because the state (r;v) is already in stopping domain, i.e., (r;v) is above both v 1 () andv 2 (). Case 2:v<v 1 (r) Case 2.1:h<rR 21 (r;v) Given V 1 (r;v)E 2 [V 1 (rT;v +v 2 T )]; we have: V 1 (r;v)e 2 h E 2 [V 1 (rT;v+v 2 T )jTh]+(1e 2 h )E 2 [V 1 (rT;v+v 2 T )jT <h]: Because of the memoryless property of exponentially distributed r.v., E 2 [V 1 (rT;v +v 2 T )jTh] =E 2 [V 1 (rhT;v +v 2 h +v 2 T )]: 108 When 0th<rR 21 (r;v), V 1 (rt;v +v 2 t)V 1 (r;v); which implies: E 2 [V 1 (rT;v +v 2 T )jT <h]V 1 (r;v): Therefore, V 1 (r;v)e 2 h E 2 [V 1 (rhT;v +v 2 h +v 2 T )] + (1e 2 h )V 1 (r;v); i.e., V 1 (r;v)E 2 [V 1 (rhT;v +v 2 h +v 2 T )]: Because V 1 (rh;v +v 2 h)V 1 (r;v);8h<rR 21 (r;v); we’ve shown: V 1 (rh;v +v 2 h)E 2 [V 1 (rhT;v +v 2 h +v 2 T )];8h<rR 21 (r;v): Case 2.2:rR 21 (r;v)hr As in the proof of Lemma 2, let’s define: H(h):=V 1 (rh;v +v 2 h)E 2 [V 1 (rhT;v +v 2 h +v 2 T )]: 109 It’s easy to seeH(h) is a continuous function ofh,8 hr. 
When r − R_21(r, v) ≤ h ≤ r, we have
  H(h) = V_1(r − h, v + v_2 h) − E_2[V_1(r − h − T, v + v_2 h + v_2 T)]
       = (v + v_2 h) − ∫_0^{r−h} λ_2 e^{−λ_2 t} (v + v_2 h + v_2 t) dt.
Differentiation yields
  H′(h) = v_2 − ∫_0^{r−h} λ_2 e^{−λ_2 t} v_2 dt + λ_2 e^{−λ_2 t} (v + v_2 h + v_2 t) |_{t = r−h}
        = v_2 − v_2 (1 − e^{−λ_2 (r−h)}) + λ_2 e^{−λ_2 (r−h)} (v + v_2 r)
        = v_2 e^{−λ_2 (r−h)} + λ_2 e^{−λ_2 (r−h)} (v + v_2 r) > 0.
From Case 2.1,
  H(h) |_{h = r − R_21(r, v)} ≥ 0.
Therefore
  H(h) ≥ 0,  for all r − R_21(r, v) ≤ h ≤ r,
which concludes the proof. ∎

Proof of Corollary 4

Given V_1(r, v) ≥ E_2[V_1(r − T, v + v_2 T)], Lemma 5 and Lemma 6 give, for i = 1, 2,
  V_1(r − h, v + v_i h) ≥ E_2[V_1(r − h − T, v + v_i h + v_2 T)],  for all h ∈ [0, r],
which implies
  V_2(r − h, v + v_i h) = V_1(r − h, v + v_i h),  for all h ∈ [0, r].
Therefore
  LHS = V_2(r, v) = V_1(r, v),
and
  RHS = max{ v, max_{i=1,2} ∫_0^{r} λ_i e^{−λ_i t} V_2(r − t, v + v_i t) dt }
      = max{ v, max_{i=1,2} ∫_0^{r} λ_i e^{−λ_i t} V_1(r − t, v + v_i t) dt }
      = max{ v, max_{i=1,2} E_i[V_1(r − T, v + v_i T)] }.
Because
  V_1(r, v) = max{ v, E_1[V_1(r − T, v + v_1 T)] }
and
  V_1(r, v) ≥ E_2[V_1(r − T, v + v_2 T)],
we have
  RHS = max{ v, max_{i=1,2} E_i[V_1(r − T, v + v_i T)] } = V_1(r, v) = LHS,
which concludes the proof. ∎

Proof of Lemma 7

[Figure B.2: two panels in the (r, v) plane showing the boundaries v_1(r) and v_2(r), the state (r, v), and the quantities R_12(r, v) and R_22(r, v) (Case 1), and R_11(r, v), R_21(r, v), and (r′, v′) (Case 2). Given V_1(r, v) ≥ E_2[V_1(r − T, v + v_2 T)], Case 1 considers V_1(r, v) = v, i.e., v ≥ v_1(r); Case 2 considers V_1(r, v) > v, i.e., v < v_1(r).]

Given
  V_1(r, v) ≥ E_2[V_1(r − T, v + v_2 T)],   (b.1)
we want to prove
  V_2(r, v) = max{ v, max_{i=1,2} E_i[V_2(r − T, v + v_i T)] }.   (b.2)
We proceed by considering two cases, according to whether the current state (r, v) is above the curve v_1(·) or not (see Figure B.2).

Case 1: V_1(r, v) = v
We have v ≥ v_1(r) and
  V_1(r − t, v + v_i t) = v + v_i t,  for all t ∈ [0, r], i = 1, 2.
From this observation, Policy Two for the system starting at state (r, v) is equivalent to the one-stage look-ahead stopping rule as if only type-2 items were available. From Proposition 8, when v ≥ v_1(r) it is never optimal to put in type-1 items at state (r, v) or at any subsequent state. Therefore Policy Two is the optimal policy in this case, which implies equation (b.2).

Case 2: V_1(r, v) > v
We first make two definitions.
Good Condition: if state (r, v) satisfies V_1(r, v) ≥ E_2[V_1(r − T, v + v_2 T)], we say that (r, v) meets the Good Condition.
Critical Condition: if state (r, v) satisfies V_1(r, v) = E_2[V_1(r − T, v + v_2 T)], we say that (r, v) meets the Critical Condition and call (r, v) a Critical Point.
On the line {(r − t, v + v_2 t) : 0 ≤ t ≤ r}, denote by (r′, v′) the rightmost point that satisfies the Good Condition. It is easy to see that such a point must exist, given inequality (b.1). According to Lemma 6, the state (r′ − h, v′ + v_2 h) satisfies the Good Condition for all 0 ≤ h ≤ r′; and because (r′, v′) is the rightmost such point on the line {(r − t, v + v_2 t) : 0 ≤ t ≤ r}, no state (r′ + h, v′ − v_2 h) with h > 0 can satisfy the Good Condition. The continuity of all the functions involved then implies that (r′, v′) is a Critical Point. To complete the proof, we consider two subcases.

Subcase 2.1: r′ ≥ R_21(r, v)
Starting from state (r, v) (note r′ < r), Policy Two keeps inserting items of type 2 until the remaining capacity is less than r′. By the memoryless property of exponential random variables,
  E_2[V_2(r − T, v + v_2 T)] = E_2[V_2(r′ − T, v′ + v_2 T)] = E_2[V_1(r′ − T, v′ + v_2 T)] = V_1(r′, v′) = v_1(R_11(r′, v′)),
where R_11(r′, v′) is well defined since r′ ≥ R_21(r, v).
Because the point (R_11(r′, v′), v_1(R_11(r′, v′))) must satisfy the Good Condition by Lemma 5, it must lie on the left side of v_2(·), i.e.,
  v_2(R_11(r′, v′)) ≤ v_1(R_11(r′, v′)).
We first want to show that for all points (r_h, v_h) on the line segment [(R_11(r, v), v_1(R_11(r, v))), (r, v)] (for any two points (r_b, v_b) and (r_e, v_e), we denote by [(r_b, v_b), (r_e, v_e)] the line segment connecting them), we always have V_2(r_h, v_h) ≤ v_1(R_11(r′, v′)).

Because all points on [(R_11(r′, v′), v_1(R_11(r′, v′))), (r′, v′)] satisfy the Good Condition, for every line of slope −v_2 that intersects this segment, the Critical Point on that line must lie below [(R_11(r′, v′), v_1(R_11(r′, v′))), (r′, v′)]. For any point (r − h, v + v_1 h) with 0 ≤ h ≤ r′ − R_11(r′, v′), either it satisfies the Good Condition, or there exists a point on the line {(r − h − t, v + v_1 h + v_2 t) : 0 ≤ t ≤ r − h} that lies below [(R_11(r′, v′), v_1(R_11(r′, v′))), (r′, v′)] and satisfies the Critical Condition. In both cases,
  V_2(r − h, v + v_1 h) ≤ v_1(R_11(r′, v′)),  for all 0 ≤ h ≤ r′ − R_11(r′, v′).

Given a point (r_h, v_h) on [(R_11(r, v), v_1(R_11(r, v))), (r − (r′ − R_11(r′, v′)), v + (r′ − R_11(r′, v′)) v_1)], if (r_h, v_h) satisfies the Good Condition, then
  V_2(r_h, v_h) = V_1(r_h, v_h) = v_1(R_11(r, v)) ≤ v_1(R_11(r′, v′)).
If (r_h, v_h) does not satisfy the Good Condition, denote by (r′_h, v′_h) the Critical Point on the line {(r_h − t, v_h + v_2 t) : 0 ≤ t ≤ r_h}. Then (r′_h, v′_h) is either on the right side of v_1(·) or on the left side of v_1(·). In the former case,
  V_2(r′_h, v′_h) = v_1(R_11(r′_h, v′_h)) ≤ v_1(R_11(r′, v′));
in the latter case, because (r′_h, v′_h) must lie on v_2(·),
  V_2(r′_h, v′_h) = v′_h = v_2(r′_h) ≤ v_2(R_11(r′, v′)) ≤ v_1(R_11(r′, v′)).
Therefore, if (r_h, v_h) does not satisfy the Good Condition,
  V_2(r_h, v_h) = V_2(r′_h, v′_h) ≤ v_1(R_11(r′, v′)).
Thus for any (r_h, v_h) on [(R_11(r, v), v_1(R_11(r, v))), (r − (r′ − R_11(r′, v′)), v + (r′ − R_11(r′, v′)) v_1)],
  V_2(r_h, v_h) ≤ v_1(R_11(r′, v′)).

In the discussion so far, we have proved that for all points (r_h, v_h) on the line segment [(R_11(r, v), v_1(R_11(r, v))), (r, v)],
  V_2(r_h, v_h) ≤ v_1(R_11(r′, v′)).
We now use this result to prove the rest of the lemma. If (r, v) is above the curve v_2(·), the proof is trivial; so in what follows we assume that (r, v) is below v_2(·), so that R_12(r, v) is well defined.
If R_12(r, v) < R_11(r, v), denote by (r″, v″) the Critical Point on the line {(r − t, v + v_1 t) : 0 ≤ t ≤ r}; then we must have r″ = R_12(r, v).
Because
  r″ < R_11(r, v) < R_11(r′, v′),
by the monotonicity of v_2(·),
  V_2(r″, v″) = V_1(r″, v″) = v″ = v_2(r″) < v_2(R_11(r′, v′)) ≤ v_1(R_11(r′, v′)).
For all points (r_h, v_h) on the segment [(r″, v″), (R_11(r, v), v_1(R_11(r, v)))], we have
  V_2(r_h, v_h) = v_2(R_22(r_h, v_h)) < v_2(R_11(r′, v′)) ≤ v_1(R_11(r′, v′)).
Therefore
  E_1[V_2(r − T, v + v_1 T)]
  = P_1(T > r − r″) E_1[V_2(r − T, v + v_1 T) | T > r − r″] + P_1(T ≤ r − r″) E_1[V_2(r − T, v + v_1 T) | T ≤ r − r″]
  ≤ P_1(T > r − r″) E_1[(v″ + v_1 T) I_{T ≤ r″}] + P_1(T ≤ r − r″) v_1(R_11(r′, v′))
  ≤ P_1(T > r − r″) v″ + P_1(T ≤ r − r″) v_1(R_11(r′, v′))
  ≤ v_1(R_11(r′, v′)).
If R_12(r, v) ≥ R_11(r, v), set (r″, v″) = (R_11(r, v), v_1(R_11(r, v))); using exactly the same procedure as above, we can see that
  E_1[V_2(r − T, v + v_1 T)] ≤ v_1(R_11(r′, v′)).
So we have proved that under Subcase 2.1,
  E_1[V_2(r − T, v + v_1 T)] ≤ v_1(R_11(r′, v′)).
Combining this result with
  E_2[V_2(r − T, v + v_2 T)] = v_1(R_11(r′, v′))  and  E_2[V_2(r − T, v + v_2 T)] > v,
we have
  V_2(r, v) = E_2[V_2(r − T, v + v_2 T)] = max{ v, max_{i=1,2} E_i[V_2(r − T, v + v_i T)] }.

Subcase 2.2: r′ < R_21(r, v)
It is easy to see that (r′, v′) is above the curve v_1(·). Because (r′, v′) is a Critical Point, it must lie on the curve v_2(·), and we have
  V_2(r, v) = E_2[V_2(r − T, v + v_2 T)] = v′.
Let (r″, v″) be the Critical Point on the line {(r − t, v + v_1 t) : 0 ≤ t ≤ r}. If r″ < R_11(r, v), then we must have r″ < r′, and
  V_2(r″, v″) = v″ = v_2(r″) < v_2(r′) = v′.
If r″ ≥ R_11(r, v), then
  V_2(r″, v″) = v_1(R_11(r, v)) ≤ v_1(R_21(r, v)) < v′.
We can use arguments similar to those of Subcase 2.1 to show that for all points (r_h, v_h) on the segment [(r″, v″), (r, v)],
  V_2(r_h, v_h) ≤ v′,
which implies
  E_1[V_2(r − T, v + v_1 T)] ≤ v′ = E_2[V_2(r − T, v + v_2 T)] = V_2(r, v).
Therefore, under Subcase 2.2,
  V_2(r, v) = max{ v, max_{i=1,2} E_i[V_2(r − T, v + v_i T)] }.
This completes the proof of Lemma 7. ∎

Proof of Proposition 9

Define
  f(r) = v_1(r) − v_2(r);
then
  f′(r) = v_1 (e^{λ_1 r} − 1) − v_2 (e^{λ_2 r} − 1),
and
  f″(r) = v_1 λ_1 e^{λ_1 r} − v_2 λ_2 e^{λ_2 r}.
We also have
  f(0) = f′(0) = 0,  f″(0) = v_1 λ_1 − v_2 λ_2.

First assume v_1 λ_1 ≥ v_2 λ_2. Because λ_1 > λ_2, for all r > 0,
  f″(r) = v_1 λ_1 e^{λ_1 r} − v_2 λ_2 e^{λ_2 r} ≥ v_2 λ_2 (e^{λ_1 r} − e^{λ_2 r}) > 0
  ⟹ f′(r) > f′(0) = 0
  ⟹ f(r) > f(0) = 0,
which proves the first case in the proposition.

Now assume v_1 λ_1 < v_2 λ_2. We have
  f″(0) = v_1 λ_1 − v_2 λ_2 < 0,
and given λ_1 > λ_2 we also know f″(+∞) > 0. The continuity of f″(r) implies that there must exist points r > 0 such that f″(r) = 0. Define r_2 = inf{ r : f″(r) ≥ 0 }. For all ε > 0, we have
  f″(r_2 + ε) = v_1 λ_1 e^{λ_1 r_2} e^{λ_1 ε} − v_2 λ_2 e^{λ_2 r_2} e^{λ_2 ε} > e^{λ_2 ε} f″(r_2) = 0.
Therefore
  f″(r) < 0 for 0 ≤ r < r_2,  and  f″(r) > 0 for r > r_2.
Because f′(0) = 0 and f′(+∞) > 0, the above result implies that there exists r_1 > r_2 such that
  f′(r) < 0 for 0 < r < r_1,  and  f′(r) > 0 for r > r_1.
With the facts that f(0) = 0 and f(+∞) > 0, we know from the above observation that there exists r_0 > r_1 such that
  f(r) < 0 for 0 < r < r_0,  and  f(r) > 0 for r > r_0.
This proves the second case in the proposition. ∎
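As a numerical illustration of the second case (not part of the proof), the following sketch uses the boundary form v_i(r) = (v_i/λ_i)(e^{λ_i r} − 1 − λ_i r) that appears in the proof of Lemma 5, with hypothetical parameters satisfying v_1 < v_2, λ_1 > λ_2, and v_1 λ_1 < v_2 λ_2, and locates the sign-change points r_2 < r_1 < r_0 by root finding.

    import numpy as np
    from scipy.optimize import brentq

    # Hypothetical parameters with v1 < v2, lam1 > lam2 and v1*lam1 < v2*lam2,
    # so the second case of Proposition 9 applies.
    v1, v2 = 1.0, 2.0
    lam1, lam2 = 1.0, 0.6

    # f = v_1(.) - v_2(.) for the curves v_i(r) = (v_i/lam_i)(e^{lam_i r} - 1 - lam_i r),
    # together with its first two derivatives as computed in the proof above.
    f  = lambda r: (v1/lam1)*(np.exp(lam1*r) - 1 - lam1*r) - (v2/lam2)*(np.exp(lam2*r) - 1 - lam2*r)
    f1 = lambda r: v1*(np.exp(lam1*r) - 1) - v2*(np.exp(lam2*r) - 1)
    f2 = lambda r: v1*lam1*np.exp(lam1*r) - v2*lam2*np.exp(lam2*r)

    r2 = brentq(f2, 0.1, 5.0)   # sign change of f''
    r1 = brentq(f1, 0.5, 5.0)   # sign change of f', away from the root at 0
    r0 = brentq(f,  0.5, 5.0)   # sign change of f,  away from the root at 0
    print(r2, r1, r0)           # r2 < r1 < r0, here approximately 0.46 < 0.83 < 1.17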
Abstract
We consider variants of stochastic knapsack problems under different problem settings, on-line versus off-line, and under different assumptions on the objective functions. We derive optimal policies for some variants and give heuristics for others.