Elements of Dynamic Programming: Theory and Application

Mengxun Yan

Submitted in partial fulfillment of the requirements for the degree of Master of Science in Applied Mathematics in the University of Southern California.
Degree conferral date: August 8, 2017
July 6, 2017

Abstract

Dynamic programming has become a common method in practice for solving optimization problems in which decisions are made in stages. It is now widely used in computer science, economics, management, and many other fields. One strength of dynamic programming is that it can handle problems involving randomness. This thesis starts with the basic theory, the principle of dynamic programming, and gives a detailed approach to solving multi-stage decision problems. The deterministic case is treated first, then the stochastic case. After that, the method is applied to best selling time and inventory control problems. In these two applications, we not only give the DP equations for programming but also find a general solution under some reasonable assumptions.

Contents

1 Introduction
2 Dynamic Programming Theorem
  2.1 Basic Model
  2.2 Principle of Optimality
  2.3 Example: Shortest path problem
  2.4 Example: Binomial Option Pricing Model
  2.5 Compare deterministic and stochastic
3 Best Selling Decision
  3.1 Problem Statement
  3.2 A simple deterministic example
  3.3 Standard Form
  3.4 Optimal Policy for i.i.d. offers
4 Inventory Control
  4.1 General Problem
  4.2 A Simple Example
  4.3 Continuous unbounded state space with no fixed cost
    4.3.1 Assumptions and Objective
    4.3.2 DP Algorithm
    4.3.3 Analysis of the result of the DP solution
  4.4 Continuous unbounded state space with nonzero fixed cost
    4.4.1 DP algorithm
    4.4.2 K-convexity and its properties
    4.4.3 K-convexity of J(x)
    4.4.4 K-convexity and continuity of G_k and J_k
A MATLAB code for the deterministic example in asset selling
B MATLAB code for the stochastic example in asset selling
C MATLAB code for the simple inventory control example
Bibliography

Chapter 1
Introduction

Dynamic programming is an optimization method concerned with problems formulated as a sequence of decisions. The term was introduced by Richard Bellman in the 1950s, and the method is now used in various areas of engineering, economics, finance, management, etc. Like other optimization methods, it deals with problems whose objective is to minimize a certain cost (or maximize a certain reward) subject to some constraints. One of the most important features of this method is that the problem can be broken down into multiple stages, and a different decision can be made at each stage. When making a decision at a certain stage, we need to consider not only the present cost but also the cost incurred in the future if we make that decision now.
The cost at one stage under a given decision is defined as the sum of the current cost and the future cost, so we need to strike a balance between these two values (Bertsekas, 2005).

Dynamic programming is a good approach for solving the deterministic case, where the state and the decision uniquely determine the outcome. It is even more popular for solving multi-stage problems involving randomness. The outcome of a decision is then uncertain because of some disturbance; however, the system evolves in a manner that allows the next state to be anticipated to some extent (e.g., its expected value is available). In this case, we need to balance the current cost against the expected future cost.

Multi-stage decision problems can be categorized in three ways:
- deterministic vs. stochastic problems;
- finite vs. infinite state space;
- finite vs. infinite time horizon.

This thesis focuses only on finite-time-horizon problems. Chapter 2 gives the general dynamic programming algorithm along with two simple examples, one deterministic and one stochastic; it is the basic part and helps build an intuitive feeling for dynamic programming. Then, in Chapters 3 and 4, two applications, best selling time and inventory control, are analyzed in detail. In each application the discrete-state-space problem is analyzed first, followed by the continuous-state case. We not only show how dynamic programming works in these fields but also try to find a general solution under some assumptions.

Chapter 2
Dynamic Programming Theorem

2.1 Basic Model

Let us start with the basic model with finite state space and finite time horizon. A stochastic basic model is given below with a disturbance w; it also covers the deterministic case, simply by taking w = 0. The basic model is built from two parts: (1) an underlying discrete-time dynamic system and (2) a cost function that is additive over time.

(1) The underlying discrete-time dynamic system describes the evolution of the state and is represented by a series of functions f_k:

    x_{k+1} = f_k(x_k, u_k, w_k),   k = 0, 1, ..., N-1,

where k is the time index, x_k is the state of the system at time k, u_k is the decision at time k, w_k is a random disturbance at time k, which influences the incurred cost and the state at the next stage, and N is the total number of stages.

(2) The cost function. The cost incurred at each stage is a function of the current state x_k, the chosen decision u_k, and the random parameter w_k. Denote the cost at stage k by g_k(x_k, u_k, w_k), and the cost at stage N by g_N(x_N) (at time N no decision is needed and no randomness is involved). The total cost of the whole model is therefore

    g_N(x_N) + sum_{k=0}^{N-1} g_k(x_k, u_k, w_k),

where g_N(x_N) is called the terminal cost.

In the deterministic case, the objective is to find a policy for choosing the u_k that minimizes the total cost. In the stochastic case, we cannot know the exact total cost before reaching the terminal stage, so we instead minimize the expectation of the total cost,

    E{ g_N(x_N) + sum_{k=0}^{N-1} g_k(x_k, u_k, w_k) }.

Let mu_k be a decision function that maps the state x_k into the control u_k = mu_k(x_k).
A policy is a sequence of such decision functions, pi = {mu_0, mu_1, ..., mu_{N-1}}; denote the class of such policies by Pi. Thus, the expected total cost under policy pi, given the initial state x_0, is

    J_pi(x_0) = E{ g_N(x_N) + sum_{k=0}^{N-1} g_k(x_k, mu_k(x_k), w_k) }.

Our objective is to find an optimal policy pi* for each x_0 such that

    J_{pi*}(x_0) = min_{pi in Pi} J_pi(x_0).

2.2 Principle of Optimality

The principle of optimality is the foundation of dynamic programming, as it breaks a multi-stage optimization problem down into many subproblems that are easy to deal with. The idea is very simple: an optimal policy has the property that, from any stage onward, whatever the previous states and decisions were, the remaining decisions must constitute an optimal policy with regard to the state resulting from the previous decisions (Bellman, 1954). In other words, if a policy is optimal from stage 0 to N, then its restriction to stages i through N must also be optimal for that part.

Why is this true? If the remaining decisions were not optimal, we could do better by using the optimal policy for the subproblem; but then the original policy would not be optimal for the whole horizon, which is a contradiction.

Formally, following Bertsekas:

Principle of Optimality. Let pi* = {mu*_0, mu*_1, ..., mu*_{N-1}} be an optimal policy for the basic problem, and assume that when using pi*, a given state x_i occurs at time i with positive probability. Consider the subproblem whereby we are at x_i at time i and wish to minimize the "cost-to-go" from time i to time N,

    E{ g_N(x_N) + sum_{k=i}^{N-1} g_k(x_k, mu_k(x_k), w_k) }.

Then the truncated policy {mu*_i, mu*_{i+1}, ..., mu*_{N-1}} is optimal for this subproblem.

This principle suggests solving a multi-stage problem backwards: first construct the optimal decision for the last stage, then for the last two stages, and, continuing in this manner, finally obtain the optimal policy for the initial stage. The next two sections show in detail how this works by applying it to two simple examples, one deterministic and the other stochastic.
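To make the backward procedure concrete, the following is a minimal MATLAB sketch of the finite-horizon DP recursion for a problem with a small discrete state space and discrete controls. The stage cost g, terminal cost gN, transition function f, and the sizes N, nStates, nControls are hypothetical placeholders, not part of the thesis; the point is only the order of the loops (stages backwards, then states, then candidate controls).

% Generic backward DP recursion (illustrative sketch; g, gN, f, N, nStates and
% nControls are hypothetical placeholders).
N = 5; nStates = 4; nControls = 3;
g  = @(k, x, u) (x - u)^2 + u;                   % placeholder stage cost
gN = @(x) 0;                                     % placeholder terminal cost
f  = @(k, x, u) min(max(x - u, 1), nStates);     % placeholder (deterministic) transition

J  = zeros(N+1, nStates);
mu = zeros(N,   nStates);
for x = 1:nStates
    J(N+1, x) = gN(x);                           % terminal cost
end
for k = N:-1:1                                   % stages, backwards
    for x = 1:nStates
        best = inf;
        for u = 1:nControls                      % try every admissible control
            cand = g(k, x, u) + J(k+1, f(k, x, u));
            if cand < best
                best = cand; mu(k, x) = u;
            end
        end
        J(k, x) = best;                          % cost-to-go from state x at stage k
    end
end

In the stochastic case the term J(k+1, f(k, x, u)) would be replaced by an expectation of J(k+1, .) over the disturbance w_k, exactly as in the examples that follow.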
2.3 Example: Shortest path problem

The shortest path problem is a classic optimization problem discussed in dynamic programming. A detailed analysis of this problem is very useful because many more sophisticated problems can be converted into shortest path problems. Here we consider only the simplest deterministic form with finite states and finite stages; the problem is discussed in greater generality in a later chapter.

Consider the question: what is the best way to travel from Los Angeles to New York ("best" meaning lowest cost)? There are many choices: Los Angeles - Phoenix - Atlanta - New York, Los Angeles - Phoenix - Kansas - New York, Los Angeles - Salt Lake City - Chicago - New York, etc. We abstract this into a 4-stage decision problem.

[Figure: the route network, one node per (stage, state), with the travel cost written on each arrow.]

Here the time horizon is T = {0, 1, 2, 3, 4}. The coordinates of each node denote (stage, state); for example, (3,1) means the 1st state in stage 3. For simplicity it is assumed that each route is unidirectional; the following analysis still works when routes are bidirectional. The number on each arrow denotes the cost between the two cities.

Notation:
- d(x, y, p, q) is the cost between node (x, y) and node (p, q);
- V(x, y, u) is the total cost incurred from position (x, y) onward if decision u is taken there;
- J(x, y) is the smallest V(x, y, u) over all available decisions u.

By the principle of optimality, the optimal policy for times 0 to 4 must contain the optimal policy from time 3 to time 4, so we can start by finding the optimal policy for the last time period. Clearly, for each state at time 3 there is only one available decision, so

    J(3,1) = 9,   J(3,2) = 10,   J(3,3) = 7.

Then move one step back to stage 2. At node (2,1), to find the optimal decision we compare the costs of the different decisions:
- u = 1: the next node is (3,1), and V(2,1,1) = d(2,1,3,1) + J(3,1) = 6 + 9 = 15;
- u = 2: the next node is (3,2), and V(2,1,2) = d(2,1,3,2) + J(3,2) = 7 + 10 = 17;
- u = 3: the next node is (3,3), and V(2,1,3) = d(2,1,3,3) + J(3,3) = 9 + 7 = 16.
Therefore we should choose the first route (mu(2,1) = 1), which gives the smallest cost J(2,1) = 15. Similarly, J(2,2) = 15 with mu(2,2) = 3, and J(2,3) = 14 with mu(2,3) = 1.

One step back to stage 1. At node (1,1),

    V(1,1,1) = d(1,1,2,1) + J(2,1) = 12 + 15 = 27,
    V(1,1,2) = d(1,1,2,2) + J(2,2) = 9 + 15 = 24,
    V(1,1,3) = d(1,1,2,3) + J(2,3) = 6 + 14 = 20,

so J(1,1) = V(1,1,3) = 20 and mu(1,1) = 3.

Continuing in this manner, we obtain J(x, y) for every node (x, y), summarized in the figure: the first red number at each node is J(x, y) and the second is mu(x, y). The minimum cost for the whole problem is 25, and the optimal policy is to visit the states (2, 2, 3, 1) in order.
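The backward pass of this example can be written compactly in MATLAB. Only some of the arc costs are stated in the text (for example d(2,1,3,k) = 6, 7, 9 and the last-stage costs 9, 10, 7); the entries marked 99 in the cost matrices below are hypothetical placeholders, so the sketch reproduces the quoted values J(3,.), J(2,1), J(1,1) and the total cost 25, but not every other node.

% Backward DP for the 4-stage shortest path example (sketch; entries marked 99
% were not given in the transcript and are hypothetical placeholders).
C{1} = [5 8];                       % stage 0 -> stage 1 (two intermediate states)
C{2} = [12 9 6; 99 99 99];          % stage 1 -> stage 2 (row = state at stage 1)
C{3} = [6 7 9; 99 99 8; 5 99 99];   % stage 2 -> stage 3
C{4} = [9; 10; 7];                  % stage 3 -> stage 4 (single destination)

J{5} = 0;                           % terminal node: New York
for k = 4:-1:1                      % stages, backwards
    for i = 1:size(C{k}, 1)
        [J{k}(i), mu{k}(i)] = min(C{k}(i, :) + J{k+1});   % principle of optimality
    end
end
fprintf('minimum total cost from Los Angeles: %g\n', J{1}(1));   % prints 25 here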
2.4 Example: Binomial Option Pricing Model

We now turn to a problem involving randomness: pricing an American option. The randomness lies in the uncertainty of the underlying stock price. This is a classic problem in finance, and the foundation for pricing more complicated financial derivatives.

Problem statement. An American put option allows the holder to sell a stock at the strike price K = 110 on or before a predetermined expiration date. This means that at each time the holder must decide whether to exercise the option (u_1) or not (u_2). Denote the stock price at time k by S_k; the payoff from exercising at time k is max(K - S_k, 0). At each step, the underlying stock price S can move to only two values, 1.2 S and 0.8 S, with probabilities 0.6 and 0.4 respectively. We treat the discrete-time case where t can take values only in {0, 1, 2}, and we want the optimal policy at t = 0, where S_0 = 100. No discount factor is considered here.

First we can draw a tree representing how the stock price changes; the numbers in the nodes are the stock prices and the numbers on the arrows are the probabilities of the corresponding moves.

Solution by dynamic programming. The problem can be seen as a dynamic programming problem. Let J_k(s) denote the maximum profit obtainable from the American option at stage k in state s. Then

    J_k(S_k) = max( max(K - S_k, 0), E{J_{k+1}} ).

[t = 2]
    J_2(144) = max(110 - 144, 0) = 0,
    J_2(96)  = max(110 - 96, 0)  = 14,
    J_2(64)  = max(110 - 64, 0)  = 46.

[t = 1]  J_1(s) = max( max(K - s, 0), E{J_2} ). The only difference here is that we must compute the expectation of the future profit, because it is not certain.

For s = 120: E{J_2} = 0.6 J_2(144) + 0.4 J_2(96) = 0 + 0.4 * 14 = 5.6, so
    J_1(120) = max( max(110 - 120, 0), 5.6 ) = 5.6,   mu_1(120) = u_2.

For s = 80: E{J_2} = 0.6 J_2(96) + 0.4 J_2(64) = 0.6 * 14 + 0.4 * 46 = 8.4 + 18.4 = 26.8, so
    J_1(80) = max( max(110 - 80, 0), 26.8 ) = max(30, 26.8) = 30,   mu_1(80) = u_1 (exercise).

[t = 0]  E{J_1} = 0.6 * 5.6 + 0.4 * 30 = 3.36 + 12 = 15.36, so
    J_0(100) = max( max(110 - 100, 0), 15.36 ) = 15.36,   mu_0(100) = u_2.

Therefore, we obtain the optimal policy for each state together with the expected profit. This method is often used to price the American option, simply by setting the price equal to the expected profit. The method also works when the underlying stock can take more than two values at each step, as long as the distribution is known.
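A short MATLAB sketch of this backward induction is given below. The strike, the up/down factors, the probabilities, and the horizon are those of the example above; the variable names and the tree-building with unique() are ours.

% Binomial American put by backward induction (sketch of the example above).
K = 110; S0 = 100; up = 1.2; dn = 0.8; p = 0.6; T = 2;

S{1} = S0;
for k = 1:T                                       % build the recombining price tree
    S{k+1} = unique([S{k}*up, S{k}*dn]);
end
J{T+1} = max(K - S{T+1}, 0);                      % exercise value at expiration
for k = T:-1:1                                    % backward induction
    for i = 1:numel(S{k})
        iu = find(abs(S{k+1} - S{k}(i)*up) < 1e-9);   % index of the "up" child
        id = find(abs(S{k+1} - S{k}(i)*dn) < 1e-9);   % index of the "down" child
        cont = p*J{k+1}(iu) + (1-p)*J{k+1}(id);       % expected future profit
        J{k}(i) = max(max(K - S{k}(i), 0), cont);     % exercise now vs. wait
    end
end
J{1}                                              % value at t = 0 (15.36 for this example)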
2.5 Compare deterministic and stochastic

Open-loop and closed-loop. In an open-loop control system, all decisions are made at the beginning (i.e., at time 0). A closed-loop control system, also known as a feedback control system, uses the information generated up to the moment a decision is needed: at time k, to find an optimal decision mu_k(x_k), the system can use all the information available up to time k, which includes {x_0, x_1, ..., x_k} and {u_0, u_1, ..., u_{k-1}}.

Compare the two previous examples. After solving the deterministic shortest path problem, we know for certain that we should take the route (2, 2, 3, 1); in this case both open-loop and closed-loop control can be used and they give exactly the same answer. For the stochastic option problem, however, we do not know in advance which state we will be in; the decision depends on the state x_k, so only closed-loop control can give the optimal choice.

Forward and backward DP algorithm. By the principle of optimality, we run dynamic programming backwards. However, for the shortest path problem, the shortest path from Los Angeles to New York is the same as that from New York to Los Angeles, so in this deterministic case it is also possible to run dynamic programming forwards, starting from J~(1,1) = 5 and J~(1,2) = 8. Although the forward method works well in this example, it cannot be applied to the stochastic option pricing example: we have no idea what the value J~_0(100) would be, so we cannot start the forward DP algorithm. This is because of the disturbances involved in the future.

Chapter 3
Best Selling Decision

3.1 Problem Statement

A person has an asset to sell. The time line runs from 0 to N. He receives offers w_0, w_1, ..., w_{N-1} at times 0, 1, ..., N-1 respectively (the offers are independent), and he must make a decision u_i at time i whether to accept the offer w_i or not (i = 0, 1, ..., N-1). Once he accepts an offer w_i, he can invest the money w_i at interest rate r > 0. If he rejects the offer, he must wait until the next period to look at offer w_{i+1}. The last offer must be accepted if all previous offers have been rejected. The objective is to maximize the revenue at time N.

3.2 A simple deterministic example

Let us start with a simple example: T = 4 and there are 4 offers, w_0 = 4, w_1 = 6, w_2 = 4, w_3 = 6. Assume r = 0.1.

Common sense. If he accepts w_0 = 4, he gets 4 * 1.1^3 = 5.3240 at time 4. If he accepts w_1 = 6, he gets 6 * 1.1^2 = 7.2600 at time 4. If he accepts w_2 = 4, he gets 4 * 1.1 = 4.4000 at time 4. If he accepts w_3 = 6, he gets 6 at time 4. Observing this, one would clearly accept w_1 = 6 at time 2 to obtain 7.26 at time 4. In preparation for the stochastic case, however, let us think about this problem backwards and use the principle of optimality.

Backward thinking. At time 3 you have two pieces of information: whether you still hold the asset, and the offer w_2 = 4. If you still hold the asset, you compare the revenue from accepting w_2 with the maximum (expected) revenue you can get if you wait; here you compare 4.4000 (accept) with 6 (wait), so you do not accept w_2.

One step backwards, at time 2, if you have not yet sold the asset, you compare the revenue from accepting w_1 with the maximum (expected) revenue from waiting, i.e., 7.2600 (accept) versus 6 (wait), so you accept the offer w_1. Notice that the maximum (expected) revenue from waiting is exactly the answer obtained in the previous step (at time 3).

At time 1, if you have not sold the asset, you compare 5.3240 (accept w_0) with 7.2600 (wait), so you do not accept the offer w_0.

To summarize, one would accept the offer w_1 at time 2, which gives the same result as common sense. The process is similar to the binomial option model discussed in Chapter 2.
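The backward comparison for this small example fits in a few lines of MATLAB; a fuller script for the same example is listed in Appendix A. The variable names here are ours.

% Backward comparison for the deterministic 4-offer example (sketch;
% Appendix A contains a fuller version of the same computation).
w = [4 6 4 6]; r = 0.1; N = 4;
best = w(N);                        % the last offer, accepted at time N
for k = N-1:-1:1                    % w(k) is the offer decided on at time k
    accept = w(k) * (1+r)^(N-k);    % accept at time k and invest until time N
    best   = max(accept, best);     % accept now, or keep the best later value
end
best                                % maximum revenue at time N (7.26 here)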
3.3 Standard Form

Let us state the best selling problem in the framework of dynamic programming.

Control space: {u_1, u_2}, where u_1 means 'sell' and u_2 means 'not sell'. The sell decision u_k is made at time k on the basis of x_k, so it is the decision whether to accept w_{k-1} or not.

State space: {x_k} with x_k in R union {T}, where T is an absorbing 'sold' state, and

    x_{k+1} = T     if u_k = u_1 or x_k = T,
    x_{k+1} = w_k   otherwise,

with x_0 = 0 and k = 0, 1, ..., N-1.

Reward function:
- If he accepts w_0 (= x_1) at time 1, he gets w_0 (1+r)^{N-1} = x_1 (1+r)^{N-1} at time N, and x_2 = x_3 = ... = x_N = T.
- If he accepts w_1 (= x_2), he gets w_1 (1+r)^{N-2} = x_2 (1+r)^{N-2} at time N, and x_3 = x_4 = ... = x_N = T.
  ...
- If he accepts w_{N-2} (= x_{N-1}), he gets w_{N-2} (1+r) = x_{N-1} (1+r) at time N, and x_N = T.
- If he accepts w_{N-1} (= x_N) at time N, he gets w_{N-1} = x_N at time N, and x_k != T for all k.

The reward function is

    E{ g_N(x_N) + sum_{k=0}^{N-1} g_k(x_k, u_k, w_k) },

where

    g_N(x_N) = x_N   if x_N != T,   and 0 otherwise,

and

    g_k(x_k, u_k, w_k) = (1+r)^{N-k} x_k   if x_k != T and u_k = u_1,   and 0 otherwise.

Optimal policy by the DP algorithm. We can now write the DP algorithm:

    J_N(x_N) = x_N   if x_N != T,   and 0 if x_N = T.

At time N-1, if x_{N-1} != T, we compare the revenue from selling today (at price x_{N-1}) with the expected future revenue E{J_N(w_{N-1})} and choose the larger one: when (1+r) x_{N-1} > E{J_N(w_{N-1})} we sell (u_{N-1} = u_1), and when (1+r) x_{N-1} < E{J_N(w_{N-1})} we do not sell (u_{N-1} = u_2). If x_{N-1} = T there is no further revenue and no decision to make, because the asset has already been sold. Thus

    J_{N-1}(x_{N-1}) = max[ (1+r) x_{N-1}, E{J_N(w_{N-1})} ]   if x_{N-1} != T,   and 0 if x_{N-1} = T,

and

    mu_{N-1}(x_{N-1}) = u_1   if x_{N-1} != T and x_{N-1} > E{J_N(w_{N-1})} / (1+r),
                        u_2   if x_{N-1} != T and x_{N-1} < E{J_N(w_{N-1})} / (1+r).

Continuing backwards, we obtain the maximized expected revenue and the corresponding optimal policy for each k:

    J_k(x_k) = max[ (1+r)^{N-k} x_k, E{J_{k+1}(w_k)} ]   if x_k != T,   and 0 if x_k = T,

    mu_k(x_k) = u_1   if x_k != T and x_k > alpha_k,
                u_2   if x_k != T and x_k < alpha_k,

where the threshold is

    alpha_k = E{J_{k+1}(w_k)} / (1+r)^{N-k}.

We can represent the whole process in a graph similar to the shortest path problem.

[Figure: the graph for the simple example; the red number under each node is the corresponding maximum expected revenue J, and the blue arrows show the best 'route', namely to sell at time 2.]
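Since the thresholds alpha_k completely describe the optimal policy, they are easy to compute numerically. The following minimal MATLAB sketch does so for the i.i.d. offer distribution used in the next section (values 6, 8, 9 with probabilities 0.3, 0.6, 0.1, with r = 0.1 and N = 4); the variable names are ours.

% Thresholds alpha_k for i.i.d. offers (sketch; distribution and r taken from
% the example of Section 3.4).
w = [6 8 9]; p = [0.3 0.6 0.1]; r = 0.1; N = 4;

Jnext = w;                                 % J_N(w) = w for w ~= T
for k = N-1:-1:1                           % k = N-1, ..., 1
    EJ       = p * Jnext';                 % E{ J_{k+1}(w_k) }
    alpha(k) = EJ / (1+r)^(N-k);           % sell at time k iff x_k > alpha_k
    Jnext    = max((1+r)^(N-k) * w, EJ);   % J_k(w) for each possible offer value
end
alpha

For these numbers the thresholds come out nonincreasing in k, which is exactly the "decreasing bound" property proved at the end of this chapter.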
3.4 Optimal Policy for i.i.d. offers

Example. In this example the time horizon is T = 4 and the offers w_k are i.i.d.; each offer can take the values 6, 8, 9 with probabilities 0.3, 0.6, 0.1 respectively. Using the DP algorithm, we can get the optimal decision for each state together with the expected revenue.

[Table: the outcome of the MATLAB program of Appendix B, giving J and the optimal decision for each state and time.]

Decreasing bound. Under the assumption that all offers w_k are i.i.d. (all distributed as w), we can derive good properties of the optimal policy. The most important one is that the thresholds decrease. We follow the proof in Bertsekas, 2005. To simplify notation, for x_k != T let

    V_k(x_k) = J_k(x_k) / (1+r)^{N-k}
             = x_k                                        if k = N,
             = max[ x_k, (1+r)^{-1} E{V_{k+1}(w)} ]        if k = 0, 1, ..., N-1.

Then we can rewrite alpha_k as

    alpha_k = E{V_{k+1}(w)} / (1+r),

so showing that alpha_k is decreasing in k is the same as showing that V_k(w) is decreasing in k. We know

    V_N(x) = x   and   V_{N-1}(x) = max[ x, (1+r)^{-1} E{V_N(w)} ],

so clearly V_{N-1}(x) >= V_N(x). Next compare

    V_{N-1}(x) = max[ x, (1+r)^{-1} E{V_N(w)} ]   and   V_{N-2}(x) = max[ x, (1+r)^{-1} E{V_{N-1}(w)} ].

Using the fact that V_{N-1}(x) >= V_N(x), we get V_{N-2}(x) >= V_{N-1}(x). Continuing in the same manner,

    V_k(x) >= V_{k+1}(x)   for all x >= 0 and all k,

and therefore alpha_k >= alpha_{k+1}.

Chapter 4
Inventory Control

In this chapter we apply dynamic programming to inventory control problems. Section 4.1 gives the general problem discussed here as well as some assumptions. Section 4.2 is a simple example with finite state space. Sections 4.3 and 4.4 analyze the continuous case by solving the dynamic programming equations, which leads to a good policy under some assumptions.

4.1 General Problem

Suppose you are the manager of a company and you must decide how many units of an item to order at the beginning of each time period. The market demands are random variables with known distributions, and you are supposed to minimize the expected total cost in the face of these uncertainties. Let

    x_k = stock available at the beginning of the k-th period,
    u_k = stock ordered (and immediately delivered) at the beginning of the k-th period,
    w_k = demand during the k-th period, with given probability distribution.

Assumptions:
(1) w_0, w_1, ..., w_{N-1} are independent random variables.
(2) Excess demand is backlogged and filled as soon as additional inventory becomes available, i.e.,

    x_{k+1} = x_k + u_k - w_k.

(3) The total cost is

    R(x_N) + sum_{k=0}^{N-1} ( r(x_k) + c u_k ),

where R(x_N) is the terminal cost at the end of the N periods, r(x_k) is the penalty associated with the stock x_k (holding cost for excess inventory or shortage cost for unfilled demand), c is the cost per unit, and c u_k is the purchasing cost in period k.
(4) u_k >= 0.

Objective: find a proper way to choose the u_k so as to minimize the expected cost

    E{ R(x_N) + sum_{k=0}^{N-1} ( r(x_k) + c u_k ) }.
4.2 A Simple Example

We start with a simple example. Set the upper bound of the stock to 2 units and the lower bound to 0. The demand w_k can take only the values 0, 1, 2, with probabilities 0.3, 0.6, 0.1 respectively, and we consider only nonnegative integer x_k and u_k (i.e., 0, 1, 2); in this small bounded example the next stock level is truncated to lie in {0, 1, 2}. The planning horizon is N = 3.

Values assigned: the terminal cost R(x_N) is assumed to be 0; the cost per unit ordered is c = 2; the cost per unit of excess inventory is h = 1; the cost per unit of unfilled demand is p = 3, so that

    r(x_k) = 3 E{max(0, w_k - x_k - u_k)} + E{max(0, x_k + u_k - w_k)}.

The initial stock is x_0 = 0. Hence

    J_k(x_k) = min_{u_k} E{ 2 u_k + 3 max(0, w_k - x_k - u_k) + max(0, x_k + u_k - w_k) + J_{k+1}(x_k + u_k - w_k) },

and our goal is to find the optimal policy u_k(x_k) attaining J_k(x_k) for each k.

By the DP algorithm, start from time 3:

    J_3(x_3) = 0.

Time 2 (calculate J_2(0), J_2(1), J_2(2)):

x_2 = 0:

    J_2(0) = min_{u} E{ 2u + 3 max(0, w - u) + max(0, u - w) }.

Let V_2(0) = E{ 2u + 3 max(0, w - u) + max(0, u - w) } and compute it for u_2(0) = 0, 1, 2:
    u_2(0) = 0:  V_2(0) = 2.4;
    u_2(0) = 1:  V_2(0) = 2.6;
    u_2(0) = 2:  V_2(0) = 5.2.
Comparing the three values, we choose u_2(0) = 0, so J_2(0) = 2.4.

x_2 = 1:

    J_2(1) = min_{u} E{ 2u + 3 max(0, w - 1 - u) + max(0, 1 + u - w) }.

Compute V_2(1) for u_2(1) = 0, 1 (since x_k + u_k = x_{k+1} <= 2):
    u_2(1) = 0:  V_2(1) = 0.6;
    u_2(1) = 1:  V_2(1) = 3.2.
Comparing the two values, we choose u_2(1) = 0, so J_2(1) = 0.6.

Similarly, for x_2 = 2 the only choice is u_2(2) = 0, and J_2(2) = 1.2.

Time 1 (calculate J_1(0), J_1(1), J_1(2)):

x_1 = 0:

    J_1(0) = min_{u} E{ 2u + 3 max(0, w - u) + max(0, u - w) + J_2(u - w) }.

Let V_1(0) denote the expression inside the minimum and compute it for u_1(0) = 0, 1, 2:
    u_1(0) = 0:  V_1(0) = 4.8;
    u_1(0) = 1:  V_1(0) = 4.46;
    u_1(0) = 2:  V_1(0) = 6.16.
Comparing the three values, we choose u_1(0) = 1, so J_1(0) = 4.46.

x_1 = 1:

    J_1(1) = min_{u} E{ 2u + 3 max(0, w - 1 - u) + max(0, 1 + u - w) + J_2(1 + u - w) },

minimized over u_1 = 0 or 1, which gives u_1(1) = 0 and J_1(1) = 2.46.

For x_1 = 2 the only choice is u_1(2) = 0, and J_1(2) = 2.16.

In the same way we can get J_0(0), J_0(1), J_0(2) and the corresponding u_0(0), u_0(1), u_0(2) for time 0.

[Table: the results from the MATLAB program of Appendix C.]
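As a quick check, the three time-2 candidate costs for x_2 = 0 can be reproduced with a few lines of MATLAB (the full program is in Appendix C; the variable names here are ours).

% Check of the time-2 values for x_2 = 0 (sketch; Appendix C has the full program).
p = [0.3 0.6 0.1]; w = 0:2;          % demand distribution
x = 0;
for u = 0:2-x
    V = p * (2*u + 3*max(0, w - x - u) + max(0, x + u - w))';   % expected one-stage cost
    fprintf('u = %d:  V_2(0) = %.1f\n', u, V);                  % prints 2.4, 2.6, 5.2
end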
4.3 Continuous unbounded state space with no fixed cost

Now we study the more general case in which x and u are continuous and unbounded, with

    x_{k+1} = x_k + u_k - w_k.

4.3.1 Assumptions and Objective

For a clean result, three assumptions are made:
- The demands w_k are bounded and independent across time.
- The expected cost for each stage is c u_k + p E{max(0, w_k - x_k - u_k)} + h E{max(0, x_k + u_k - w_k)}. (Economic interpretation: c is the cost per unit of stock ordered, and note that there is no fixed cost here; h is the penalty per unit of redundant inventory (storage cost); for unmet demand we lose some profit because we do not have enough inventory, so p denotes the loss per unit of unfilled demand. We require c > 0, h >= 0, and p > c. Why p > c? Because if p <= c, it would never be optimal to buy new stock in the last period, and possibly in earlier periods (Bertsekas, 2005).)
- The final cost is 0.

Objective: minimize the total expected cost

    E{ sum_{k=0}^{N-1} ( c u_k + p max(0, w_k - x_k - u_k) + h max(0, x_k + u_k - w_k) ) }.

4.3.2 DP Algorithm

Applying the DP algorithm,

    J_N(x_N) = 0,
    J_k(x_k) = min_{u_k >= 0} [ c u_k + p E{max(0, w_k - x_k - u_k)} + h E{max(0, x_k + u_k - w_k)} + E{J_{k+1}(x_k + u_k - w_k)} ].

Solving backwards, we obtain the best decision u_k(x_k) and the minimum expected cost J_k(x_k) for each state x_k at time k. Moreover, we want to carry the analysis further and obtain a general form of the solution. The outline of the next two sections follows Bertsekas, 2005.

4.3.3 Analysis of the result of the DP solution

To simplify notation, assume that all demands w_k are independent and identically distributed (all distributed as w); the following analysis remains true even without this assumption. Denote

    r(x) = p max(0, -x) + h max(0, x),
    H(y) = E{r(y - w)} = p E{max(0, w - y)} + h E{max(0, y - w)}.

Then

    J_k(x_k) = min_{u_k >= 0} [ c u_k + H(x_k + u_k) + E{J_{k+1}(x_k + u_k - w)} ].

Let y_k = x_k + u_k; then y_k >= x_k since u_k >= 0, and

    J_k(x_k) = min_{y_k >= x_k} [ c (y_k - x_k) + H(y_k) + E{J_{k+1}(y_k - w)} ]
             = min_{y_k >= x_k} [ c y_k + H(y_k) + E{J_{k+1}(y_k - w)} ] - c x_k.

Letting G_k(y) = c y + H(y) + E{J_{k+1}(y - w)}, we have

    J_k(x_k) = min_{y_k >= x_k} G_k(y_k) - c x_k.

We obtain the optimal policy in five steps.

Step 1: the function H is convex. Proof: max(0, x) and max(0, -x) are both convex functions; for example, at midpoints,

    max(0, (x_1 + x_2)/2) <= ( max(0, x_1) + max(0, x_2) ) / 2,
    max(0, -(x_1 + x_2)/2) <= ( max(0, -x_1) + max(0, -x_2) ) / 2.

Since p and h are both nonnegative, r(x) = p max(0, -x) + h max(0, x) is convex as a nonnegative combination of convex functions, and taking an expectation preserves convexity, so H is convex.

Step 2: the minimum of G_{N-1}(y). J_N = 0 is convex, and G_{N-1}(y) = c y + H(y) + E{J_N(y - w)} = c y + H(y) is convex because both c y and H(y) are convex. We now show that it tends to infinity as |y| -> infinity, so that it has an unconstrained minimum. Observe that H'(y) -> -p as y -> -infinity and H'(y) -> h as y -> +infinity. Then G'_{N-1}(y) = c + H'(y) -> c - p < 0 (by the assumption p > c) as y -> -infinity, and G'_{N-1}(y) -> c + h > 0 as y -> +infinity. Thus G_{N-1}(y) has an unconstrained minimum point, denoted S_{N-1} = argmin_{y in R} G_{N-1}(y). Under the constraint y_{N-1} >= x_{N-1}, the constrained minimizer is

    argmin_{y >= x_{N-1}} G_{N-1}(y) = S_{N-1}    if x_{N-1} < S_{N-1},
                                     = x_{N-1}    if x_{N-1} >= S_{N-1},

which is easily seen from the graph of a convex function.

Step 3: optimal policy at time N-1. Substituting back through the system equation, u_k = y_k - x_k, we obtain

    mu_{N-1}(x_{N-1}) = S_{N-1} - x_{N-1}   if x_{N-1} < S_{N-1},
                      = 0                    if x_{N-1} >= S_{N-1}.

Step 4: convexity of J_{N-1}.

    J_{N-1}(x_{N-1}) = min_{y >= x_{N-1}} [ c y + H(y) ] - c x_{N-1}
                     = c (S_{N-1} - x_{N-1}) + H(S_{N-1})   if x_{N-1} < S_{N-1},
                     = H(x_{N-1})                            if x_{N-1} >= S_{N-1}.

Here c S_{N-1} + H(S_{N-1}) is a constant, so the whole function can be viewed as a line with negative slope joined to a convex function. Since -c x_{N-1} -> infinity as x_{N-1} -> -infinity and H(x_{N-1}) -> infinity as x_{N-1} -> +infinity, we have J_{N-1}(x_{N-1}) -> infinity as |x_{N-1}| -> infinity. Clearly J_{N-1} is also a convex function.

Step 5: optimal policy for all k, by induction. Suppose J_{k+1} is convex, J_{k+1}(y) -> infinity as |y| -> infinity, and G_k(y) -> infinity as |y| -> infinity. Then, by the same argument as in the previous steps,

    J_k(x_k) = c S_k + H(S_k) - c x_k + E{J_{k+1}(S_k - w_k)}   if x_k < S_k,
             = H(x_k) + E{J_{k+1}(x_k - w_k)}                    if x_k >= S_k,

where S_k is an unconstrained minimizer of G_k. It is then easy to see that J_k is convex, J_k(y) -> infinity as |y| -> infinity, and G_{k-1}(y) -> infinity as |y| -> infinity. Thus, by induction,

    mu_k(x_k) = y_k - x_k = S_k - x_k   if x_k < S_k,
                          = 0            if x_k >= S_k.
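The base-stock level of the last stage is easy to find numerically, since it only requires minimizing G_{N-1}(y) = c y + H(y). The sketch below does this by grid search; c, p, h and the demand distribution are borrowed from the simple example of Section 4.2, and the grid is ours.

% Numerical illustration of S_{N-1} = argmin G_{N-1}(y) with G_{N-1}(y) = c*y + H(y)
% (sketch; parameters borrowed from Section 4.2, grid chosen by us).
c = 2; p = 3; h = 1;
wvals = [0 1 2]; pw = [0.3 0.6 0.1];          % demand distribution
H = @(y) pw * ( p*max(0, wvals - y) + h*max(0, y - wvals) )';
ygrid = -2:0.01:4;
G = arrayfun(@(y) c*y + H(y), ygrid);
[Gmin, imin] = min(G);
S = ygrid(imin);                               % order up to S whenever x < S
fprintf('S_{N-1} approx %.2f,  G_{N-1}(S) approx %.2f\n', S, Gmin);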
4.4 Continuous unbounded state space with nonzero fixed cost

In this section we analyze a problem similar to that of the last section; the only difference is that a fixed cost K > 0 is incurred for every positive order. This will lead us to an optimal policy called the (s, S) policy:

    mu_k(x_k) = S_k - x_k   if x_k < s_k,
              = 0            if x_k >= s_k.

4.4.1 DP algorithm

The DP algorithm now becomes

    J_N(x_N) = 0,
    J_k(x_k) = min{ H(x_k) + E{J_{k+1}(x_k - w_k)},
                    min_{u_k > 0} [ K + c u_k + H(x_k + u_k) + E{J_{k+1}(x_k + u_k - w_k)} ] },

where the first term corresponds to u_k = 0 and the second to u_k > 0. As before, assume that all the w_k follow the same distribution and define

    H(y) = E{r(y - w)} = p E{max(0, w - y)} + h E{max(0, y - w)},
    G_k(y) = c y + H(y) + E{J_{k+1}(y - w)},
    y_k = x_k + u_k.

Then

    J_k(x_k) = min{ G_k(x_k) - c x_k, min_{u_k > 0} [ K + G_k(x_k + u_k) - c x_k ] }
             = min{ G_k(x_k), min_{y_k > x_k} [ K + G_k(y_k) ] } - c x_k.

We start the dynamic programming by analyzing

    J_{N-1}(x_{N-1}) = min{ G_{N-1}(x_{N-1}), min_{y_{N-1} > x_{N-1}} [ K + G_{N-1}(y_{N-1}) ] } - c x_{N-1}.

G_{N-1}(y) = c y + H(y) is a convex function, being the sum of two convex functions. Denote S_{N-1} = argmin_{y in R} G_{N-1}(y), and let s_{N-1} be the smallest value of y for which G_{N-1}(y) = K + G_{N-1}(S_{N-1}). Looking at the three possible cases in the graph [figure: the three relative positions of x_{N-1} with respect to s_{N-1} and S_{N-1}]:

In cases (1) and (2), where x_{N-1} >= s_{N-1}, the minimum of J_{N-1} is attained at y_{N-1} = x_{N-1}, so the optimizing policy is u_{N-1} = y_{N-1} - x_{N-1} = 0. In case (3), where x_{N-1} < s_{N-1}, the minimum of J_{N-1} is attained at y_{N-1} = S_{N-1}, so the optimizing policy is u_{N-1} = y_{N-1} - x_{N-1} = S_{N-1} - x_{N-1}. To sum up,

    mu_{N-1}(x_{N-1}) = S_{N-1} - x_{N-1}   if x_{N-1} < s_{N-1},
                      = 0                    if x_{N-1} >= s_{N-1}.

If the G_k were convex for every k, we could continue this process and easily obtain the same kind of policy: denoting S_k = argmin_{y in R} G_k(y) and letting s_k be the smallest value of y for which G_k(y) = K + G_k(S_k), we would get

    mu_k(x_k) = S_k - x_k   if x_k < s_k,
              = 0            if x_k >= s_k,

which is the (s, S) policy. In most cases, however, G_k need not be convex, so we cannot proceed as in the zero-fixed-cost case. According to Scarf, G_k does have a property called K-convexity, and this property lets us obtain the (s, S) policy even though G_k is not convex for every k.

4.4.2 K-convexity and its properties

Definition. Let K >= 0, and let g be a real-valued differentiable function. Then g is called K-convex if

    K + g(z + y) >= g(y) + z g'(y)   for all z >= 0 and all y.

If differentiability is not assumed, the appropriate definition of K-convexity is

    K + g(z + y) >= g(y) + z ( g(y) - g(y - b) ) / b   for all z >= 0, b > 0, and all y.

The first definition implies the second, and the second implies the first when g is differentiable.

Some simple properties of K-convexity are useful for our problem:
(1) 0-convexity is equivalent to ordinary convexity; a real-valued convex function is K-convex for every K >= 0.
(2) If g(y) is K-convex, then g(y + h) is K-convex for every h.
(3) If f is K-convex and g is M-convex, then a f + b g is (aK + bM)-convex for all a > 0 and b > 0.
(4) If g(y) is K-convex, w is a random variable, and E{g(y - w)} < infinity, then E{g(y - w)} is also K-convex.

We now deduce properties of a continuous K-convex function g with g(y) -> infinity as |y| -> infinity. Obviously, such a g has a global minimum point, denoted S = argmin_{y in R} g(y), so that

(5) g(S) <= g(y) for all scalars y.

Let s denote the smallest value of z such that z <= S and g(S) + K = g(z), so that g(S) + K = g(s). By the definition of K-convexity, for every y < s,

    K + g(S) >= g(s) + (S - s)/(s - y) * ( g(s) - g(y) ).

Plugging in g(S) + K = g(s),

    g(s) >= g(s) + (S - s)/(s - y) * ( g(s) - g(y) ).

Since S - s > 0 and s - y > 0, we get g(s) - g(y) <= 0 for all y < s. Moreover, equality cannot hold, since y < s and s is defined to be the smallest value of z with g(S) + K = g(z). So we conclude that

(6) g(s) < g(y) for all y < s.

(7) For y_1 < y_2 < s, by the definition of K-convexity,

    K + g(S) >= g(y_2) + (S - y_2)/(y_2 - y_1) * ( g(y_2) - g(y_1) ),

and since y_2 < s, property (6) gives g(y_2) > g(s) = g(S) + K. Combining the two inequalities,

    0 > (S - y_2)/(y_2 - y_1) * ( g(y_2) - g(y_1) ),

and since (S - y_2)/(y_2 - y_1) > 0, we get g(y_1) > g(y_2), so g is decreasing on (-infinity, s).

(8) Claim: g(y) <= g(z) + K for all y, z with s <= y <= z.

Proof. We consider the possible positions of y.
(i) If y = s, then g(s) = g(S) + K <= g(z) + K, since g(S) <= g(z) by the definition of S.
(ii) If s < y < S, then by K-convexity (taking the slope between s and y and extrapolating to S),

    g(s) = K + g(S) >= g(y) + (S - y)/(y - s) * ( g(y) - g(s) ),

i.e.,

    ( 1 + (S - y)/(y - s) ) g(s) >= ( 1 + (S - y)/(y - s) ) g(y),

so g(s) >= g(y). Also, because S is the minimum of g, g(z) + K >= g(S) + K = g(s). Therefore g(z) + K >= g(y).
(iii) If y = S, then g(S) <= g(z) <= g(z) + K.
(iv) If S < y <= z, then

    K + g(z) >= g(y) + (z - y)/(y - S) * ( g(y) - g(S) ),

and since g(y) - g(S) >= 0, z - y >= 0 and y - S > 0, the last term is nonnegative, so K + g(z) >= g(y).

This proves the claim.

Combining these properties, we conclude that the minimum of min{ g(x), min_{y > x} [K + g(y)] } is attained at y = S if x < s and at y = x if x >= s, and the corresponding minimum of

    J(x) = min{ g(x), min_{y > x} [ K + g(y) ] } - c x

is

    J(x) = K + g(S) - c x   if x < s,
         = g(x) - c x        if x >= s.

Moreover, if the G_k(x_k) are K-convex for each k, and G_k(y) -> infinity as |y| -> infinity for all k, then the optimal policy for each stage is the (s, S) policy,

    mu_k(x_k) = S_k - x_k   if x_k < s_k,
              = 0            if x_k >= s_k.

4.4.3 K-convexity of J(x)

It remains to show that G_k is K-convex for each k and that G_k(y) -> infinity as |y| -> infinity, in order to obtain the (s, S) policy. The most important step is to show that if a function g is K-convex, then the function J defined above is also K-convex; we can then use this property to deduce, in turn, the K-convexity of J_{N-1}, G_{N-2}, J_{N-2}, G_{N-3}, and so on.

Claim: if g(y) is a K-convex function (continuous, with g(y) -> infinity as |y| -> infinity), then

    J(x) = min{ g(x), min_{y > x} [ K + g(y) ] } - c x

is also K-convex. Recall that

    J(x) = K + g(S) - c x   if x < s,
         = g(x) - c x        if x >= s.

To prove K-convexity, we want to show that

    K + J(y + z) >= J(y) + z ( J(y) - J(y - b) ) / b   for all z >= 0 and b > 0.

We distinguish four cases.

Case (1): y >= s and y - b >= s. For these values of z, b, and y, the function J equals g(x) - c x, which is the sum of a K-convex function and a convex (linear) function, so it is K-convex by property (3).

Case (2): y >= s but y - b < s (i.e., y - s < b). Two sub-cases are considered.

(i) g(y) >= g(s). By the definition of K-convexity,

    K + g(y + z) >= g(y) + z ( g(y) - g(s) ) / (y - s) >= g(y) + z ( g(y) - g(s) ) / b,

where the second inequality holds because 0 <= y - s < b and g(y) - g(s) >= 0. What we want to show is K + J(y + z) >= J(y) + z ( J(y) - J(y - b) ) / b. Between y - b and s the function J equals K + g(S) - c x = g(s) - c x, and between s and y + z it equals g(x) - c x, so the desired inequality can be rewritten as

    K + g(y + z) - c (y + z) >= g(y) - c y + z ( g(y) - c y - g(s) + c (y - b) ) / b,

i.e. (the terms involving c cancel),

    K + g(y + z) >= g(y) + z ( g(y) - g(s) ) / b,

which was proved above.

(ii) g(y) < g(s). Since S minimizes g, we have g(y + z) >= g(S), so

    K + g(y + z) >= K + g(S) = g(s)   (definition of s)
                 > g(y)               (assumption)
                 >= g(y) + z ( g(y) - g(s) ) / b,

where the last step holds because the added term is negative. This yields the same inequality as in sub-case (i), which leads to the K-convexity of J.

Case (3): y <= y + z <= s. Here y - b < s as well, so the function J = K + g(S) - c x is linear on the relevant region and hence K-convex.

Case (4): y < s < y + z. Since y - b < y < s, we have J(y) = g(s) - c y and J(y - b) = g(s) - c (y - b), so the required inequality can be rewritten as

    K + g(y + z) - c (y + z) >= g(s) - c y + z ( g(s) - c y - g(s) + c (y - b) ) / b,

i.e.,

    K + g(y + z) >= g(s),

which holds by the definition of s, since g(s) = K + g(S) <= K + g(y + z).

Therefore, the K-convexity of J is proved.

4.4.4 K-convexity and continuity of G_k and J_k

Using the facts that G_{N-1} is convex (thus K-convex), continuous, and G_{N-1}(y) -> infinity as |y| -> infinity, we obtain the K-convexity of J_{N-1}; in addition, J_{N-1} can be seen to be continuous. Then, by properties (4) and (3), G_{N-2}(y) = c y + H(y) + E{J_{N-1}(y - w)} is also K-convex; since w_{N-2} is bounded, G_{N-2} is continuous and G_{N-2}(y) -> infinity as |y| -> infinity. By the analysis carried out for J_{N-1}, we then obtain the K-convexity of J_{N-2}. Repeating this procedure, we obtain K-convexity and continuity of the functions G_k and J_k for all k, together with G_k(y) -> infinity as |y| -> infinity for all k. The optimal policy is then the (s, S) policy, by the argument above.
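For the last stage the (s, S) pair can also be found numerically, since G_{N-1}(y) = c y + H(y) is known in closed form once the demand distribution is given. The sketch below assumes a fixed cost K = 1 and reuses the illustrative c, p, h and demand distribution from the sketch after Section 4.3; it computes S as the grid minimizer of G_{N-1} and s as the smallest grid point at which G_{N-1} does not exceed K + G_{N-1}(S).

% Computing the (s, S) pair for the last stage (sketch; K, c, p, h and the
% demand distribution are illustrative values, not taken from the thesis).
K = 1; c = 2; p = 3; h = 1;
wvals = [0 1 2]; pw = [0.3 0.6 0.1];
H = @(y) pw * ( p*max(0, wvals - y) + h*max(0, y - wvals) )';
ygrid = -3:0.01:4;
G = arrayfun(@(y) c*y + H(y), ygrid);
[GS, iS] = min(G);
S = ygrid(iS);
s = ygrid(find(G <= K + GS, 1, 'first'));      % smallest y with G(y) <= K + G(S)
fprintf('s approx %.2f,  S approx %.2f\n', s, S);
% Policy at stage N-1: order up to S when the stock x is below s, otherwise order nothing.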
Appendix A
MATLAB Code for the deterministic example in asset selling

w = [0, 4, 6, 4, 6];            % w(k) = w_{k-1}
T = 4;                          % time horizon
J(T+1, 2) = w(T+1);             % set terminal revenue
for i = 1:T+1
    J(i, 1) = 0;                % the 'sold' state T has revenue J = 0
end
for k = T:-1:1
    J(k, 2) = max(w(k)*1.1^(5-k), J(k+1, 2));   % backward DP
end
J
J2 = J(:, 2)';
maxValPos = find(J2 == max(J2));                % indices attaining the maximum J
Sellingtime = max(maxValPos) - 1

Appendix B
MATLAB Code for the stochastic example in asset selling

clear
w = [6, 8, 9];                  % possible values of w
p = [0.3, 0.6, 0.1];            % probability distribution of w
T = 4;                          % time horizon
r = 0.1;
EJ = w*p';
for j = 2:length(w)+1
    J(T+1, j) = w(j-1);         % set terminal revenue
end
for i = 1:T+1
    J(i, 1) = 0;                % the 'sold' state T has revenue J = 0
end
for k = T:-1:1
    for x = 2:length(w)+1
        V(x-1) = J(k+1, x);
    end
    for x = 2:length(w)+1
        J(k, x) = max(w(x-1)*(1+r)^(5-k), V*p');   % backward DP
        if J(k, x) == w(x-1)*(1+r)^(5-k)           % find the optimal policy
            u(k, x) = 1;
        else
            u(k, x) = 2;
        end
    end
end
J
u

Appendix C
MATLAB code for the simple inventory control example

clear
N = 2;                          % maximum possible demand
p = [0.3 0.6 0.1];              % distribution of demand
for i = 1:N+1
    J(4, i) = 0;                % final cost equals 0
end
for k = 3:-1:1
    for i = 1:N+1
        for u = 1:N+2-i
            for j = 1:N+1
                % holding cost + shortage cost + cost-to-go (stock index clamped at 1)
                A(j) = max(0, i-1+u-1-(j-1)) + 3*max(0, j-1-(i-1)-(u-1)) ...
                       + J(k+1, max(1, i-1+u-1-(j-1)+1));
            end
            C(u) = 2*(u-1) + p*A';      % expected cost of ordering u-1 units
        end
        [minval, index] = min(C);       % minimum cost and the corresponding order
        J(k, i) = minval;
        mu(k, i) = index - 1;
        clear C
    end
end
J
mu

Bibliography

[1] Bellman, R. (1954). The theory of dynamic programming. The RAND Corporation.
[2] Bertsekas, D. P. (2005). Dynamic programming and optimal control. Athena Scientific.
[3] Bertsekas, D. P. (1987). Dynamic programming. Prentice-Hall.
[4] Inventory Analysis and Management. Multi-Period Stochastic Models: Optimality of (s,S) Policy for K-Convex Objective Functions. Retrieved from http://faculty.ndhu.edu.tw/~ywan/courses/Inventory/notes/L8 multiple periods stochastic models s S.pdf
[5] Lecture 25: Dynamic Programming: Matlab Code. Retrieved from http://sail.usc.edu/~lgoldste/Ling285/Slides/Lect25 handout.pdf