BAYESIAN OPTIMAL STOPPING PROBLEMS WITH PARTIAL INFORMATION

by

Yen-Ming Lee

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY
(INDUSTRIAL AND SYSTEMS ENGINEERING)

December 2010

Copyright 2010 Yen-Ming Lee

Acknowledgments

I would like to express my deepest appreciation to my advisor and dissertation chair, Dr. Sheldon Ross, for his excellent guidance and for his patience in helping me develop an understanding of the subject. I am also grateful for all the support and advice I have received from Dr. Maged Dessouky and Dr. James Moore during my doctoral study. I would also like to thank the Daniel J. Epstein Department of Industrial and Systems Engineering and the Center for Risk and Economic Analysis of Terrorism Events (CREATE), which provided me financial support through teaching and research assistantships.

I am grateful to my good friends in Los Angeles and in Taiwan. They helped and accompanied me, emotionally and physically, in both good times and bad. In addition, I offer my regards and blessings to all of those who supported me during the completion of my dissertation.

My family has been a great support to me throughout these years. I would like to dedicate this dissertation to the memory of my beloved father, Hua-Long Lee, who has always been a role model to me with his ambitious and persistent personality. I also want to thank my mother, Hui-Jen Liu, for her enduring love, and Yen-Hsiu Lee and Yen-Hsien Lee for being caring sisters. Lastly, special thanks go to Levent Kocaga, who has encouraged and supported me in every respect of my life.

Table of Contents

Acknowledgments
List of Tables
Abstract
Chapter 1: Introduction
  1.1 Asset-Selling Problem
  1.2 Burglar Problem
  1.3 Organization of the Study
Chapter 2: Literature Review
Chapter 3: Asset-Selling Problem without Recall
  3.1 Classical Selling Problem
  3.2 Bayesian Selling Problem without Recall - Infinite Stage
    3.2.1 Structure of Optimal Policy
    3.2.2 Monotonicity of the Value Function
    3.2.3 Heuristic Methods
    3.2.4 Upper Bound of Optimal Expected Return
    3.2.5 Numerical Study
  3.3 Bayesian Selling Problem without Recall - Finite Stage
    3.3.1 Heuristic Methods
    3.3.2 Numerical Results
  3.4 Bayesian Selling Problem without Recall - Generalized n-Distribution Problem
  3.5 eBay Problem
    3.5.1 Solution by Linear Programming
    3.5.2 Numerical Results
Chapter 4: Asset-Selling Problem with Recall
  4.1 Classical Problem
  4.2 Bayesian Selling Problem with Recall
    4.2.1 Structure of Optimal Policy
    4.2.2 Monotonicity of the Value Function
    4.2.3 Heuristic Methods
    4.2.4 Upper Bound of Optimal Expected Return
    4.2.5 Numerical Results
Chapter 5: Burglar Problem
  5.1 Classical Burglar Problem
    5.1.1 Unimodality
    5.1.2 Simulation and Computational Approaches
  5.2 Bayesian Burglar Problem
    5.2.1 Monotonicity of the Value Function
    5.2.2 Structure of Optimal Policy
    5.2.3 Dynamic Heuristic Policies
    5.2.4 Static Heuristic Policies
    5.2.5 Upper Bound of Optimal Expected Return
    5.2.6 Numerical Study
Chapter 6: Conclusions
Bibliography
Appendix A: Simulation Algorithms
  A.1 Asset-Selling Problem without Recall
  A.2 Asset-Selling Problem with Recall
  A.3 Burglar Problem
Appendix B: Computation for Exponential Distribution
  B.1 Asset-Selling Problem without Recall
  B.2 Asset-Selling Problem with Recall
  B.3 Burglar Problem

List of Tables

3.1 Parameter settings for the infinite-stage Bayesian selling problem without recall
3.2 Results of the infinite-stage Bayesian selling problem without recall: expected returns of the heuristics
3.3 Results of the infinite-stage Bayesian selling problem without recall: expected returns of some reference policies
3.4 Results of the infinite-stage Bayesian selling problem without recall: upper bounds of the optimal expected return
3.5 Parameter settings for the finite-stage Bayesian selling problem without recall
3.6 Results of the finite-stage Bayesian selling problem without recall: expected returns of the heuristics and the approximated optimal return
3.7 Parameter settings for the eBay problem
3.8 Results of the eBay problem and UB_2 in the infinite-stage Bayesian selling problem
4.1 Results of the infinite-stage Bayesian selling problem with recall: expected returns of the heuristics and upper bound of the optimal expected return
5.1 Parameter settings for the Bayesian burglar problem - exponential distribution
5.2 Results of the Bayesian burglar problem - exponential distribution: expected returns of the heuristics, expected returns of some reference policies, and upper bounds of the optimal expected return. λ_1 = 0.05, λ_2 = 0.2, and p_0 = 0.5
5.3 Results of the Bayesian burglar problem - exponential distribution: expected returns of the heuristics, expected returns of some reference policies, and upper bounds of the optimal expected return. λ_1 = 0.05, λ_2 = 0.1, and p_0 = 0.5
5.4 Results of the Bayesian burglar problem - exponential distribution: expected returns of the heuristics, expected returns of some reference policies, and upper bounds of the optimal expected return. λ_1 = 0.05, λ_2 = 0.05, and p_0 = 0.5

Abstract

This dissertation focuses on an application of stochastic dynamic programming called the optimal stopping problem. The decision maker has to choose a time to take a given action, based on sequentially observed random variables, in order to maximize an expected payoff. Most previous research on optimal stopping assumes that the distributions of the random variables are completely known, or partially known with unknown parameters. Throughout the dissertation, we address these problems under the uncertainty assumption that the random variables come from one of two possible distributions, with given initial probabilities as to which is the true distribution. The probabilities are then updated in a Bayesian manner as successive random variables are observed.

We first consider a problem of optimal stopping called the asset-selling problem. A sequence of offers comes in from a fixed but unknown distribution. The decision maker has to decide whether to take an offer or to continue and observe the next offer. There is a fixed cost for each offer observed. We consider both the case where recalling a past offer is allowed and the case where it is not. Under our uncertainty assumption, the optimal solutions cannot be attained numerically. For each case, we present a dynamic programming model and some structural results on the optimal policy. We propose some heuristic policies and upper bounds of the optimal expected return. Using simulation, the performance of the heuristic methods is evaluated by comparison with the upper bounds.

We study a variant of the asset-selling problem in which the decision is now to set a reserve price before offers come in. The seller accepts an offer if and only if it is greater than the reserve price. The problem can be viewed as an eBay-like online auction problem if we consider the maximal bidding price of each auction as an offer. We develop a dynamic programming model and show that the optimal solution can be obtained by linear programming.

We then extend the work to another typical optimal stopping problem, the burglar problem. The problem considers a burglar who plans a series of burglaries. He/she may accumulate his/her earnings as long as he/she is not caught. If he/she is caught during a burglary, he/she loses everything and goes to jail. He/she wants to retire with a maximum expected fortune before being caught. We again address the problem with our uncertainty assumption on the distribution of the returns for each burglary: there are two possible combinations of the probability of a successful burglary and the loot distribution. We present the dynamic programming model and some structural results on the optimal policy. Some efficient heuristic policies are evaluated by comparison with upper bounds of the optimal expected return.
Chapter 1
Introduction

In a world full of possibilities and opportunities, selecting the best time to stop with what you have in hand is a practical but crucial act. For example, a gambler needs to decide when to quit playing with his/her current fortune. A software company needs to decide when to finish the testing phase and launch a new product. An unemployed worker needs to decide when to stop searching for a job and take the best offer he/she has received [18]. The general concept behind these problems is called optimal stopping theory.

The theory of optimal stopping has a long history and has been studied extensively in the fields of applied probability, statistics, decision theory, mathematical finance, and economics. The topic is rooted in the theory of sequential statistical analysis developed by Wald [33] and has spread far and wide ever since. Dynamic programming is an important method for multistage decision processes and is especially useful for stochastic decisions. Optimal stopping problems are often solved using stochastic dynamic programming.

The basic framework of the optimal stopping problem in this dissertation is as follows. The decision maker takes a given action based on sequentially observed random variables. When he/she chooses to stop, a reward is received, and the value of the reward depends on the observed random variables. There may be a cost for observing each random variable, or future rewards may be discounted. The objective is to find a stopping rule which maximizes the expected return or minimizes the expected cost.

In this dissertation, we introduce some classical problems of optimal stopping and consider a new direction of uncertainty assumptions. Most problems on optimal stopping in the literature assume that the distributions of the random variables are completely known or partially known with unknown parameters. In the problems considered here, we address them with the uncertainty assumption that the random variables come from one of several possible distributions, especially one of two possible distributions, with given initial probabilities as to which is the true distribution. The probabilities are then updated in a Bayesian manner as the successive random variables are observed. All problems are modeled with stochastic dynamic programming, and the objective is to maximize the expected return.

This dissertation focuses on two applications of optimal stopping: the asset-selling problem and the burglar problem. We aim to answer the following questions for each problem: How should the problem be modeled in a dynamic programming form? Is the optimal policy obtainable? What is the structure of the optimal policy? Under what conditions is the value function monotone? What heuristic policies can be used to approximate the optimal policy? What are the rationales for the heuristic policies? How can the performance of the heuristics be evaluated? How good are the heuristic policies?

1.1 Asset-Selling Problem

We study a particular application of optimal stopping known as the asset-selling problem or house-selling problem. The sequence of random variables represents offers for buying an asset we own. Once an offer is made, we have to decide whether to sell the asset (and stop) or to continue and observe the next offer. Assume that the offers are independent and all have the same distribution. We need to pay a fixed-amount observation cost each time we want to observe an offer, e.g., a fee for classified ads. The expected return then is the accepted offer minus the total observation cost.
Assume that oers and the observation cost are nonnegative values. The classical asset-selling problem has the assumption that the oers are from a known distribution. However, we may not always have full information on the oer distribution. Most of the time, we only have some information on the true distribution at the beginning and the information is Bayesian updated after observing an oer. We call such a problem Bayesian selling problem. There are several variations of the Bayesian selling problem. The problem can be innite stage or nite stage. The decision can be made before or after an oer comes in. Moreover, the problem diers whether recalling past oers is allowed or not. We rst present an innite-stage Bayesian selling problem without recall, i.e., an oer rejected now cannot be accepted later on. Assume that oers are from one of two possible distributions. Under a pessimistic point of view, we may guess that the distribution is f 1 and under an optimistic point of view, we think the distribution is f 2 . We initially assume that f 1 is the true distribution with probability p and f 2 is the true distribution with probability 1p. The probability of whetherf 1 is the true distribution is Bayesian updated as we observe incoming oers. With the uncertainty 4 of oers, we are interested in nding a stopping rule to maximize the expected return. We use dynamic programming technique to formulate the problem and present some structural results of the optimal policy. We also propose some heuristic policies to approximate the optimal policy. To test the performance of our heuristic policies, we show how to obtain upper bounds of the optimal expected return and use the bounds as benchmark values to compare with the simulation results of our heuristic policies. The results show that our heuristic methods can provide good solutions. We then present a nite-stage version of the Bayesian selling problem without recall, in which the oers are from one of two possible distributions. If we limit our problem into a nite-stage and nite-state space problem, we can solve the optimality equation recursively and obtain the optimal value numerically. Thus, the optimal value can be used to compare with the simulation results of our heuristic methods to see how good the heuristics are. We are also interested in an extension to a generalized Bayesian selling problem without recall, in which oers are from one of n possible distributions. We verify the analytical results from the two-distribution problem and see whether they are still valid for the generalized n-distribution problem. A variation of the Bayesian selling problem without recall is an application of eBay-like reserve price auctions. We call such a problem eBay problem. Consider the maximal bidding price of each auction (minus eBay's Final Value Fees for the seller if the item sells) as an oer. That is, no matter how many bidders bid on this auction, we only consider the price from the highest bidder. By denition from eBay, a Reserve Price Auction is a type of online auction selling format where a seller sets the minimum price they are willing to accept for the item. If the reserve price is not met at the end of the auction, the seller can change the reserve price and start another auction for the item until it has been sold. The cost for observing an 5 oer can be viewed as the listing fee (eBay's Insertion Fees). 
We assume that the maximal bidding price of each auction is from one of the two possible probability density functions, f 1 and f 2 , given initial probabilities. We want to nd the optimal reserve price before starting an auction (before oers come in) so as to maximize the expected return. For the eBay problem, we transform a dynamic programming model into a corresponding linear programming model and use linear programming approach to solve the optimal value (Ross [24]). We also consider a Bayesian selling problem where the decision maker can recall and accept past oers after observing a subsequent one. Assume that the problem is under innite horizon and oers are from one of two possible distributions. When the oer distribution is completely known, with or without recall does not change the optimal policy. However, the problem is more dicult under our uncertainty assumption of oer distribution when recall is allowed. Hence, similar to the work in the Bayesian selling problem without recall, we present the dynamic programming model, the structural results of the optimal policy, some ecient heuristic policies, and upper bounds of the optimal policy for the Bayesian selling problem with recall. Numerical results obtained by simulation are also presented. 1.2 Burglar Problem We study another application of optimal stopping, called burglar problem. The prob- lem considers a burglar who plans a series of burglaries. He/she may accumulate the earnings as long as he/she is not caught and he/she may retire at any time. If he/she is caught during a burglary, he/she loses everything and goes to jail (problem ends). He/she wants to retire with a maximum expected fortune before he/she is caught. In this optimal stopping problem, the sequence of random variables represents returns 6 for each burglary and the action is to retire with the current accumulated earnings or to conduct another burglary. Assume that the loots (of each burglary) are in- dependent and all have the same distribution. Also, the loots are nonnegative and independent of the event that he/she is caught. We rst present the classical burglar problem. It has the assumptions that the loot of a burglary is from a known distribution and the probability of a successful burglary is given. We address this problem through dynamic programming and present some simulation and computational approaches to estimate the value function. The optimal policy is a threshold policy, where the burglar retires if and only if the accumulated loot is larger than the optimal threshold value. However, we may not always have full information on the distribution and the success rate. Most of the time, we only have some information at the beginning and the information is Bayesian updated after each burglary. We introduce such a problem called Bayesian burglar problem. More precisely, we have the following assumptions: under an optimistic point of view, the burglar may guess that the probability of a successful burglary is q 1 and the density of his/her loot is f 1 ; under a pessimistic point of view, he/she thinks that the probability of successful is q 2 and the density of his/her loot is f 2 . Assume that the initial probability of case (q 1 ;f 1 ) is p and the initial probability of case (q 2 ;f 2 ) is 1p. The probabilities are Bayesian updated as the burglar attempts burglaries. We use the dynamic programming technique to formulate the problem. 
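As a concrete illustration of the updating just described, the following short Python sketch shows one natural form of the posterior update after a burglary succeeds with observed loot y. The sketch is ours, not code from the dissertation: the function name, the exponential loot densities, all numerical values, and the assumption that the update conditions on both the success event and the observed loot are illustrative only.

from math import exp

def update_burglar_belief(p, y, q1, f1, q2, f2):
    # p: prior probability that case (q1, f1) is the true one
    # y: observed loot from a burglary that succeeded
    w1 = p * q1 * f1(y)          # likelihood of "success with loot y" under case 1
    w2 = (1.0 - p) * q2 * f2(y)  # likelihood under case 2
    return w1 / (w1 + w2)

# Illustrative exponential loot densities and success probabilities (values are ours):
f1 = lambda y: 0.05 * exp(-0.05 * y)
f2 = lambda y: 0.20 * exp(-0.20 * y)
p_next = update_burglar_belief(p=0.5, y=12.0, q1=0.9, f1=f1, q2=0.7, f2=f2)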
We also present some structural analysis of the optimal policy and prove the monotonicity condition of the value function. To approximate the optimal solution, we propose some dynamic heuristic policies and some static heuristic methods. We obtain upper bounds of the optimal return and the performances of the heuristics are evaluated 7 by comparing to the upper bounds. We use simulation to conduct the numerical experiments and the results are shown and analyzed. 1.3 Organization of the Study This dissertation is organized as follows. Chapter 2 reviews the relevant literature on the asset-selling problem and the burglar problem. In Chapter 3, we present some asset-selling problems when recalling past oers is not allowed: the classical asset selling problem, the innite-stage Bayesian selling problem, the nite-stage Bayesian selling problem, and the eBay problem. Chapter 4 focuses on the asset-selling problem with recall. In Chapter 5, we present the burglar problems as previously described, including the classical and Bayesian problems. Conclusions and future research di- rections are given in Chapter 6. 8 Chapter 2 Literature Review There is a signicant amount of research on optimal stopping. The problem originated in the theory of sequential statistical analysis by Wald [33], in which the number of observations is not determined in advance. Snell [32] formulated a generalized optimal stopping problem for discrete stochastic processes. Other fundamental results for optimal stopping theory can be found by Dynkin [11], Siegmund [31], and Shiryaev [30]. This type of problem can be applied in the eld of statistics, economics, and nance, etc. Chow, Robbins, and Siegmund [6] and Ferguson [14] gave comprehensive studies of the subject and general reviews of the relative problems and solutions. In the asset-selling problems, a sequence of random variables representing incom- ing oers is observed and the objective is to choose a stopping policy to maximize the expected return, while there is a xed cost for each observation. The problems were rst considered when the distribution of oers is completely known. See MacQueen and Miller [20] and Chow and Robbins [4] [5]. The optimal policy in this case is to stop if and only if the oer is greater than a reservation price (i.e., a threshold policy), regardless of with recall or without recall. The burglar problem was rst introduced by Haggstrom [16], in which the mean reward of burglaries. Note that the optimal decision only depends on the mean. The problem can also be viewed as a variation of stopping a discounted sum (See Dubins and Teicher [10]). The optimal policies 9 of both asset-selling and burglar problems can be obtained by the simple one-stage look-ahead rule. (See [6], [24], and [14] for one-step look-ahead policies). Those conventional problems assumed that the distribution of oers is completely known. There is also a lot of literature on the optimal stopping problem with unknown or partially known distribution. Sakaguchi [28] rst introduced the asset-selling prob- lems, with and without recall, in which the distribution involves unknown parameters. He assumed that the observations are normally distributed with an unknown mean but xed variance, where is assumed to have a normal prior distribution. The prior distributions is updated in a Bayesian manner as successive oers are observed. He used dynamic programming techniques and determine the structures of the optimal polices. 
DeGroot [7] then studied the problem treated in Sakaguchi [28] and gave explicit solutions for both with and without recall cases. Derman, Lieberman, and Ross [9] consider a problem in which there aren workers of known values, p 1 ... p n , and n jobs arrive in sequential order. Assume that X 1 ;:::;X n are independent and identically distributed random variables and associ- ated with the j th job is the random variable X j , such that the return for assigning worker i to job j is the product p i X j . The problem is to assign workers to jobs to maximize the expectation of the sum of the returns. The workers can be viewed as the assets to sell and the jobs can be viewed as successive oers. If p 1 = ... = p n1 = 0 and p n = 1, the problem reduces to the nite stage version of the classical asset selling problem without recall allowed. Moreover, if there are multiple p 0 s = 1, it can be viewed as there are multiple items to sell. It is shown in [9] that the optimal policy does not depend on the p 0 s. Albright [1] extends this type of asset selling problem with the assumption similar to DeGroot [7] that there are one or more unknown parameters in the distribution from which the X 0 s are sampled. He shows 10 that the main result in [9] goes through without any distribution assumptions. He gives the form for the optimal policy of nite stage problem without recall when the underlying family of distributions is normal, uniform, or gamma, and the prior is the natural conjugate family in each case. Roseneld, Shapiro, and Butler [23] extend DeGroot's [8] and Albright's [1] work on the strategies of dynamic programming models with which oers are sampling from specic families of distributions with one or more unknown parameters. They focus on the forms of the optimal policy in both problems of sampling with and without recall. In the recall-allowed case, they investigate the conditions that the one-step look-ahead policies are optimal. For the no-recall case, they derive the general conditions for the reservation price property to hold. The reservation price property states that if a certain price is accepted, then any higher price would also have been accepted at that point in time. For the related work on the reservation price property, see Roseneld and Shapiro [22] and Seierstad [29]. Martinsek [21] studied the problem with recall when the oer distribution to be exponentially distributed with unknown mean. He proposed some stopping rules designed to approximate the optimal policy as an exact solution unavailable. Other optimal stopping problems with similar unknown-parameter assumptions have been studied extensively in Degroot's [8] book. He gave more detail about the problems and showed the existence of optimal stopping rules. Rothschild [27] studied an application of economics for searching the lowest price of an item and showed that the optimal policy is also characterized by a reservation-price property. Asset-selling problem is often discussed closely with other optimal stopping prob- lems. For example, the secretary problem (see Ferguson [13]) (sometimes described as the marriage problem) is a nite-stage stopping problem that selects the best of 11 the applicants (spouse) based on relative ranks. Some online auction problems can be viewed as variants of the secretary problem. These problems were mostly stud- ied from the perspectives of computer science. 
Although the problem settings and assumptions are dierent from the work in this dissertation, it is of interest to bring up this emerging research direction since we present the asset-selling problem as a framework for online auctions. See relative works of Kleinberg [17] and Babaio et al. [2]. On the other hand, Gallien [15] proposed a dynamic programming problem connecting the asset-selling problem and online auction applications. He treated the problem for both time sensitivity of participants and probabilistic features of dynamic bidder arrivals. To the best of our knowledge, the particular uncertainty assumption in this disser- tation has only been considered with very limited amount of literature. Eshragh and Modarres [12] introduced a new approach in probability distribution tting, called Decision on Belief. They formulated a special case of nite-stage optimal stopping problem by dynamic programming with the underlying uncertainty assumption sim- ilar to ours, i.e., the distribution of the random variables is from a set of potential distributions. Moreover, in our problem, initial probabilities information of the distri- butions are given and Bayesian updated upon receive incoming observations. Eshragh and Modarres treated the problem with the same assumptions and named the prob- abilities as belief. A predetermined value called least acceptable belief is used to eliminate candidates among pairwise comparisons. The objective is to determine the optimal value of the least acceptable belief to maximize the probability of correct selection. The problem is solved by nonlinear programming. 12 Chapter 3 Asset-Selling Problem without recall In this chapter we consider some asset-selling models in which no past oers can be recalled. We rst present the classical asset-selling problem and its solution. We also present the Bayesian selling problems with specied uncertainty assumptions under both innite and nite horizons. We then expand the analytical results to a more generalized problem. A variation of the Bayesian selling problem called eBay problem is introduced in the last section. 3.1 Classical Selling Problem We wish to sell an asset and oers come in daily. Each oer is independent of others and all have the same distribution. In a classical problem of selling an asset, the distribution of oers is completely known. Once the oer is made, we have to decide to sell the asset or continue to observe the next oer. We pay a x-amount nonnegative observation costC every time we receive an oer. Eventually, we need to sell the asset, i.e., take an oer and stop. Assume you may not recall past oers, i.e., the accepted oer is the nal oer before you stop. Thus, the return is the accepted oer minus the total observation cost incurred. The objective of this problem is to maximize the expected return. Assume random variable X represents oers and has distribution functionF and densityf. Dene the statex to be the current oer. Let V (x) be the maximal expected return from now on given that the current state is x. 13 The optimality equation is V (x) =C + max x; Z V (y)f(y)dy Let = R V (y)f(y)dy, which is also the expected optimal return at the beginning of the problem. Thus, when at statex, ifx<, it is optimal to rejectx and continue; ifx, it is optimal to take the oerx and stop. The optimal policy is called policy such that we accept an oer if and only if it is greater than or equal to the threshold value . 
To obtain the threshold value , we rearrange terms = Z V (y)f(y)dy =C + Z 0 f(y)dy + Z 1 yf(y)dy ) C + Z 1 f(y)dy = Z 1 yf(y)dy ) C = Z 1 (y)f(y)dy Thus, C =E (X) + (3.1) Since the right hand side of (3.1) is continuous in and decreasing from +1 to 0, there is a unique solution for for any C > 0. Thus, by solving equation (3.1), we can obtain the threshold value and hence the optimal policy. Now consider a nite stage version of the problem. Again, you may stop at any time and take the current oer or pay cost C to observe the next oer. However, assume now the asset must be sold by timeN, i.e., you are only allowed to observe at mostN oers. When there is no more oer left, you need to receive the current oer and sell the asset, no matter what the oer is. Let V n (x) be the maximal expected 14 return when the current oer is x and there are n remaining oers. The optimality equations are V 0 (x) = C +x V n (x) = C + max n x; Z V n1 (y)f(y)dy o ; for all x and n = 1; 2;:::;N 1 Let (n) = R V n1 (y)f(y)dy for n = 1; 2;:::;N 1. We can obtain those (n) values by the method of backward induction. Thus, the optimal policy has the fol- lowing form: At stage n, we accept an oer if and only if it is greater than or equal to the threshold value (n). 3.2 Bayesian Selling Problem without Recall - In- nite Stage Now suppose that the distribution of oers is not known in advance. Rather, we only know that there are two possible distributions, f 1 and f 2 , with given initial probabilities as to which one is the true distribution. Let p denote the probability of the density function f 1 and it is Bayesian updated as we observe incoming oers. Assume that we pay cost C to observe an oer and no past oers can be recalled. The objective is to maximize the expected return. LetV (x;p) be the maximum expected return given thatx is the current oer and p is the updated posterior probability of f 1 after receiving x. Let f p be a mixture of distribution functionsf 1 andf 2 , wheref p =pf 1 +(1p)f 2 . Letg(y;p) be the updated posterior probability of f 1 given thaty is the coming oer andp is the probability of 15 f 1 before receiving oer y. Thus, g(y;p) = pf 1 (y) pf 1 (y) + (1p)f 2 (y) = pf 1 (y) f p (y) The optimality equation is V (x;p) =C + maxfx; Z V (y;g(y;p))f p (y) dyg; for all x and p (3.2) The second term in the maximum function is a function of p and is independent to the value x. Let H(p) be the expected optimal return given that p is the current probability of f 1 and an oer is to be observed. That is, H(p) = Z V (y;g(y;p))f p (y) dy (3.3) We then rewrite the optimality equation (3.2) V (x;p) =C + maxfx;H(p)g; for all x and p (3.4) Thus, at state (x;p), the optimal policy is to accept x if and only ifxH(p). Next, we analyze the optimal policy and present some structural results. 3.2.1 Structure of Optimal Policy From equation (3.4), we know that we need to calculateH(p) to determine the optimal policy. Since H(p) involves the integral of the value function V , which is a function of a two-dimensional continuous-state space with high computational complexity, we are unable to solve H(p) numerically. As a result, we propose some heuristics for 16 the problem. Before presenting the heuristics, we will show some properties of the optimal policy. For a classical problem, by Section 3.1, we know that there exists an optimal policy when the oers are from a known distribution and the threshold value solves equation (3.1). Now consider our problem with the uncertainty assumption. 
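The two computational ingredients used repeatedly below, namely the threshold of a single known offer distribution obtained from equation (3.1) and the Bayesian update g(y, p), are simple to compute. The following Python sketch is ours, not the dissertation's code; the function names and numerical values are ours, and exponential offer densities are assumed purely for illustration.

from math import exp
from scipy.optimize import brentq  # any one-dimensional root finder will do

def classical_threshold(C, m):
    # Solve equation (3.1), C = E[(X - beta)^+], for offers X ~ exponential with mean m.
    # In the exponential case E[(X - b)^+] = m * exp(-b / m), so the root is m * log(m / C);
    # we solve numerically to mirror what would be done for a general density (assumes C < m).
    return brentq(lambda b: m * exp(-b / m) - C, 0.0, 100.0 * m)

def g(y, p, f1, f2):
    # Posterior probability of f1 after observing offer y, given prior probability p of f1.
    num = p * f1(y)
    return num / (num + (1.0 - p) * f2(y))

# Example: thresholds and one posterior update for two illustrative exponential densities.
f1 = lambda y: 0.05 * exp(-0.05 * y)   # mean 20
f2 = lambda y: 0.20 * exp(-0.20 * y)   # mean 5
beta_1 = classical_threshold(C=1.0, m=20.0)   # about 20 * log(20)
beta_2 = classical_threshold(C=1.0, m=5.0)
p_next = g(y=15.0, p=0.5, f1=f1, f2=f2)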
Let 1 be the threshold value of the problem with oers having distributionf 1 and 2 be the threshold value of the problem with oers having distributionf 2 . Throughout this section, assumef 1 is the good distribution andf 2 is the bad one. Here we dene the good distribution is the one that has larger threshold value, i.e., 1 2 . The following propositions and corollary show the structure of the optimal policy. Proposition 1 When in state (x;p), it is optimal to acceptx ifxp 1 + (1p) 2 . Proof. To prove this, we rst show that H(p) pH(1) + (1p)H(0): Suppose p is the posterior probability of f 1 . Then the maximum expected return when we are about to receive an oer is H(p). On the other hand, consider another situation where we are to be told which one is the real distribution. The maximum expected return when we are about to receive an oer can be obtained by conditioning on the real distribution, so it is pH(0) + (1p)H(1). The expected return when we are to be given additional information must be at least as large as which this information is not given. Thus, we have H(p)pH(1) + (1p)H(0) Whenp = 1, H(1) is the threshold value of the problem with oers having distribu- tion f 1 , i.e., 1 . Similarly, H(0) = 2 . So, 17 H(p)p 1 + (1p) 2 Thus, we should accept an oer that is greater than or equal to p 1 + (1p) 2 : Now we know that H(p)p 1 + (1p) 2 . We also want to nd a tighter lower bound for H(p). Consider a policy y that accepts the rst oer greater than y. Let L y (p) be the expected return from this policy when the probability of f 1 is p. With F i = 1 F i being the distribution function corresponding to the density f i , we have L y (p) =p E f 1 [XjX >y] C F 1 (y) + (1p) E f 2 [XjX >y] C F 2 (y) (3.5) Let L(p) be the maximal value among all L y (p), i.e., L(p) = max y L y (p). Proposition 2 When in state (x;p), it is optimal to reject x if x<L(p). Proof. SinceH(p) is the expected optimal return given that the probability of f 1 is p, we can see that H(p)L y (p); for all y which proves the result. It is intuitive to think that it is also optimal to reject an oer less than 2 . Thus, we are interested in the relationship between 2 and L(p). Before we can show the relationship, we need a couple of lemmas. The hazard rate function of a nonnegative continuous random variable X with distribution function F and density f is dened as X (t) =f(t)= F (t) where F (t) = 1F (t). We say X is hazard rate order larger than Y , written X hr Y , if X (t) Y (t), for all t. It is well-known that X hr Y implies that X st Y where the 18 latter means that P (X > x) P (Y > x) for all t. Indeed, X hr Y implies that [XjX >t] st [YjY >t] for all t [26]. LetR f (y) be the expected return from the policy that accepts the rst oer greater than y when f is the density function of oers. Lemma 1 Let X have probability density function f and let Y have probability den- sity function g. If X hr Y then R f (t)R g (t) for all t. Proof. By denition, R f (t) = E f [XjX > t] C P (X >t) and R g (t) = E g [YjY > t] C P (Y >t) . The result follows becauseX hr Y implies thatP (X >t)P (Y >t) and [XjX >t] st [YjY >t] implies that E[XjX >t]E[YjY >t]. Lemma 2 If < 1 , then R f 1 (): Proof. Let 1 (t) be the hazard rate function corresponding to the density f 1 . We will construct a random variableY with hazard rate function Y (t) satisfying Y (t) 1 (t) and which is such that E[(Y) + ] =C. To begin, note that 1F 1 (t) =e R t 0 1 (s)ds giving f 1 (t) = 1 (t)e R t 0 1 (s)ds Let X be the random variable that has hazard rate function 1 (t). 
When 1, X hr X 1 . Let f X be the density function of X and we have f X (t) = 1 (t)e R t 0 1 (s)ds 19 Now, for < 1 , consider E[(X ) + ] = Z 1 0 (t) + 1 (t)e R t 0 1 (s)ds dt If = 1, E[(X ) + ] = E[(X 1 ) + ] > E[(X 1 1 ) + ] = C. Also, as !1, E[(X ) + ]! 0. We can also see that E[(X ) + ] is continuous in . Thus, there exists a value > 1 that satises E[(X ) + ] = C. Let Y be the random variable X : Now, use that R f 1 () R f () = where the inequality follows from Lemma 1 and the equality becauseE[(X ) + ] = C implies that is the optimal threshold value when the oers have densityf , which implies the equality, and proves the lemma. Proposition 3 For any p, 2 L(p)H(p)p 1 + (1p) 2 1 . Proof. By Proposition 1, Proposition 2, and 1 2 , if follows L(p)H(p)p 1 + (1p) 2 1 Thus, we only need to show 2 L(p). Recall that L y (p) is the expected return from policy that accepts the rst oer that is y when probability of f 1 is p, and L(p) = max y L y (p): Thus, 20 L(p) L 2 (p) = pR f 1 ( 2 ) + (1p)R f 2 ( 2 ) = pR f 1 ( 2 ) + (1p) 2 p 2 + (1p) 2 = 2 where the nal inequality used Lemma 2. Corollary 1 It is optimal to reject an oer that is less than 2 . Proof. It follows directly from Proposition 3. 3.2.2 Monotonicity of the Value Function In this subsection we will provide and prove a condition for V (x;p) to be monotone in p. First, the following proposition and lemma hold for any distribution. Proposition 4 V (x;p) is an increasing function in x. Proof. This follows from V (x;p) =C + maxfx;H(p)g. Lemma 3 g(y;p) is an increasing function in p. Proof. g(y;p) = pf 1 (y) pf 1 (y) + (1p)f 2 (y) = 1 1 + (1p)f 2 (y) pf 1 (y) (3.6) We can easily see that g(y;p) is an increasing function in p. 21 There are some monotone properties that we would like to show when the two distributions have likelihood ratio order relationship. By denition, X 1 is likelihood ratio order larger than X 2 , written X 1 lr X 2 , if f 1 (x) f 2 (x) " x. To show the value function is monotone in p under a certain condition, we show the following lemmas rst. Now we use the notation X g to mean that X has a probability density function g. Let X 1 f 1 and X 2 f 2 . Lemma 4 If X 1 lr X 2 , then g(y;p) is an increasing function in y. Proof. By denition, f 2 (y) f 1 (y) is decreasing in y. The argument follows directly from equation (3.6). Let Y 1 f p 1 and Y 2 f p 2 , where p 1 p 2 . Lemma 5 If X 1 lr X 2 , then Y 1 st Y 2 . Proof. Since Y 1 f p 1 and Y 2 f p 2 , Y 1 and Y 2 can be viewed as Y 1 = 8 > < > : X 1 with probability p 1 X 2 with probability 1p 1 Y 2 = 8 > < > : X 1 with probability p 2 X 2 with probability 1p 2 22 Hence for all a, P (Y 1 >a)P (Y 2 >a) = [p 1 P (X 1 >a) + (1p 1 )P (X 2 >a)] [p 2 P (X 1 >a) + (1p 2 )P (X 2 >a)] = (p 1 p 2 )P (X 1 >a) + (p 2 p 1 )P (X 2 >a) = (p 1 p 2 )[P (X 1 >a)P (X 2 >a)] 0 The last inequality holds since X 1 lr X 2 implies X 1 st X 2 and p 1 p 2 . Proposition 5 If X 1 lr X 2 , then V (x;p) is an increasing function in p. Proof. We will prove it by induction. Consider the nite stage problem rst. Let V n (x;p) be the maximum expected return when the current state is (x;p) and there are n oers to go. The optimality equations for the nite stage problem are V n (x;p) = C + max n x; Z V n1 (y;g(y;p))f p (y) dy o ; for all x and p V 0 (x;p) = C +x SinceV 0 (x;p) is increasing inp, assumeV n1 (x;p) is an increasing function inp. Now we want to show that V n (x;p) is also increasing in p. 
For p 1 p 2 , V n (x;p 1 ) = C + max n x; Z V n1 (y;g(y;p 1 ))f p 1 (y) dy o = C + max n x;E [V n1 (Y 1 ;g(Y 1 ;p 1 ))] o C + max n x;E [V n1 (Y 1 ;g(Y 1 ;p 2 ))] o 23 where Y 1 f p 1 and the inequality holds by Lemma 3 and the induction hypothesis. Now, let Z(y) =V n1 (y;g(y;p 2 )) By Proposition 4, V (x;p) is increasing in x. It can also be applied to the nite stage problem, i.e., V n (x;p) is increasing in x when at stage n. In addition, by Lemma 4 and the induction hypothesis, Z(y) is increasing in y. We have V n (x;p 1 ) C + maxfx;E [Z(Y 1 )]g C + maxfx;E [Z(Y 2 )]g = V n (x;p 2 ) The second inequality holds by Lemma 5. Thus, V n (x;p) is an increasing function in p. When n!1, V n (x;p)!V (x;p). Hence V (x;p) is also an increasing function in p. Note that we are unsure if the assumption of a likelihood ratio order in Proposition 5 can be weakened to either assuming a usual stochastic order or a hazard rate order. 3.2.3 Heuristic Methods In this subsection, we propose some heuristic methods to approximate the optimal policy. The expected returns of the heuristic methods will be obtained by simulation. Assume that we can analytically compute the threshold values 1 and 2 . We will describe the detail simulation processes later in Section 3.2.5. See Appendix A for the algorithms of all the heuristic policies. 24 Policy mix It is intuitive to think that the mixture of the two threshold values, 1 and 2 , might be a good threshold value for our problem. Let policy mix be the policy that, when in state (x;p), we accept the current oer x if and only ifxp 1 + (1p) 2 . Since L(p) p 1 + (1p) 2 for all p (see Proposition 3), the policy mix implies that we will accept an oer greater than or equal to p 1 + (1p) 2 and reject an oer less than L(p). Thus, the policy mix satises the optimal criteria in Propositions 1 and 2. We propose the policy mix as our rst heuristic solution. In other words, we use p 1 + (1p) 2 to approximate the optimal threshold value H(p). Policy L Let policy L be the policy that, when in state (x;p), accepts the current oer x if and only if x L(p). Similar to the idea of the rst heuristic with the results from Proposition 3, the policy mix will accept an oer greater than or equal to p 1 + (1p) 2 and reject an oer less than L(p). Thus, it satises the optimal criteria in Propositions 1 and 2. We propose the policy L as our second heuristic solution, i.e., using L(p) to approximate H(p). Note that the computation of L(p) could be complicated. Policy midpoint From Propositions 1 and 2, we know that when in state (x;p), it is optimal to accept x if x p 1 + (1p) 2 and reject x if x < L(p). Thus, we only need to decide whether to take the oer x if L(p) x < p 1 + (1p) 2 . Now we simply consider the midpoint ofL(p) andp 1 + (1p) 2 as the threshold value. Let policy midpoint be the policy that, when in state (x;p), we accept the current oer x if and only if x p 1 +(1p) 2 +L(p) 2 . 25 Policy Improvement on mix We use the policy improvement technique on the policy mix to derive the fourth heuristic method. The idea of the policy improvement is that, for any state and given a current policy, if using a new policy for one stage and then switching back to the current policy performs at least as good as using the current policy throughout, we would use the new policy. Here the current policy means the policy mix . From Propositions 1 and 2, we know that when in state (x;p), it is optimal to accept x if x p 1 + (1 p) 2 and reject x if x < L(p). The policy mix satises these optimal conditions. 
Thus, we want to further improve the policy mix when L(p)x<p accept="" method="" for="" value=""> < > : x ; if xH(p) H(p) ; if x<H(p) In the rst heuristic method, we accept an oer greater than p 1 + (1p) 2 and reject it otherwise, which can be thought as usingp 1 +(1p) 2 as an approximation of H(p). Now rewriting equation (3.2) and applying this approximation into the optimality equation, we have V (x;p) = C + maxfx;E[V (Y;g(Y;p))]g; Yf p C + maxfx;E[V (Y;g(Y;p))]g where V (x;p) =C + 8 > < > : x ; if xp 1 + (1p) 2 p 1 + (1p) 2 ; if x</p><p method="" accept="" value="" for="" max="" list="" summary=""> 200) = (1 1 19 ) 201 2 10 5 . To obtain the optimal value and simplify the calculation, we compute an approximated optimal value by cutting o the possible oers to 200. For all the oers greater than 201, we consider them all as 201 and thus give more weights to the oer value 201. We also consider a nite-state space of p and p = 0.001, 0.002,..., 0.999, 1. The optimality 45 equation becomes V n (x;p) C + max n x; 200 X y=0 V n1 (y;g(y;p))f p (y) + V n1 (201;g(201;p)) [p (1 200 X y=0 f 1 (y)) + (1p)(1 200 X y=0 f 2 (y))] o ; for all x and p ;n = 2;:::;N The optimal expected return of 10 oers is P 1 y=0 V 9 (y;g(y;p))f p (y) and we also cut o the possible oers to 200 in the same way. Thus, 1 X y=0 V 9 (y;g(y;p))f p (y) 200 X y=0 V 9 (y;g(y;p))f p (y) +V 9 (201;g(201;p)) " p (1 200 X y=0 f 1 (y)) + (1p)(1 200 X y=0 f 2 (y)) # Since the variances of those distributions are large (range from around 20 to 380), Table 3.5: Parameter settings for the nite-stage Bayesian selling problem without recall Case No. Parameters 1 2 3 4 5 6 7 8 1 0.1 0.1 0.1 0.1 0.1 0.1 0.05 0.05 2 0.12 0.12 0.15 0.15 0.2 0.2 0.1 0.1 C 1 0.5 1 0.5 1 0.5 1 0.5 Initial p 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 1 9 9 9 9 9 9 19 19 2 7.333 7.333 5.667 5.667 4 4 9 9 46 by using simulation we might get some large oers that make the expected return larger. Thus, we use variance reduction technique to obtain better simulation result (Ross [25]). Instead of run 10 5 replications for each distribution, now for each case we rst simulate 1000 pilot runs for both f 1 and f 2 and then obtain the standard deviation of the expected return, say 1 and 2 . Run 2 10 5 1 1 + 2 replications for f 1 and 2 10 5 2 1 + 2 replications for f 2 . In Table 3.6, we show the expected returns of the policy mix threshold , called R mix threshold , the expected returns of the policy mix f , called R mix f , and the ap- proximated optimal values, called R , for each case. Similar to the results in the innite-stage problem, the expected returns of the policy mix threshold are still larger than those of the policy mix f as we expected. We can also see that with policy mix threshold , the results are very close to the approximated optimal value. Table 3.6: Results of the nite-stage Bayesian selling problem without recall: expected returns of the heuristics and the approximated optimal return Case No. R mix threshold R mix f R 1 15.847 15.838 15.874 2 18.415 18.375 18.422 3 13.370 13.350 13.374 4 15.906 15.892 15.907 5 10.570 10.509 10.620 6 13.180 13.124 13.193 7 28.199 28.104 30.448 8 31.138 31.057 33.197 47 3.4 Bayesian Selling Problem without Recall - Gen- eralized n-Distribution Problem So far our assumption is based on the oers from one of the two possible distribu- tions. We are also interested in the extension to a generalized problem that there are n possible distributions, f 1 , f 2 , ..., f n . 
In this section, we examine the results obtained from the two-distribution problem and see whether they are still valid for the generalized problem. Assume that there is a given initial probability vector p =fp 1 ;p 2 ;:::;p n g, where p i is the probability of the distribution f i being the true one. The probability vector p is also Bayesian updated as we observe incoming oers. We dene f p be a mixture of distribution functions f 1 ;:::;f n , where f p = X i p i f i . Let g(y;p) be the updated posterior probability vector given that y is the coming oer and p is the probability vector before receiving oer y. We have g(y;p) = g 1 (y;p);g 2 (y;p);:::;g n (y;p) = ( p 1 f 1 (y) f p (y) ; p 2 f 2 (y) f p (y) ;:::; p n f n (y) f p (y) ) The optimality equation is V (x;p) =C + maxfx; Z V y;g(y;p) f p (y) dyg; for all x and p (3.7) Let H(p) = R V y;g(y;p) f p (y) dy. We rst examine the structure of the optimal policy. Let i be the threshold value of the problem with oers having distributionf i . Let L y (p) be the expected return from the policy that accepts the rst oer greater than y when the probability vector is p and let L(p) = max y L y (p). The following 48 propositions are analogues of Propositions 1 to 3 in Section 3.2.1. The proofs are omitted here since they follow the same logic in the two-distribution problem. Proposition 6 When in state (x;p), it is optimal to accept x if x n X i=1 p i i . Proposition 7 When in state (x;p), it is optimal to reject x if x<L(p). Proposition 8 For any p , min i i L(p)H(p) n X i=1 p i i max i i . Now we want to check the monotonicity of the value function as in Section 3.2.2. Assume x and y are both vectors with n elements. The vector x is said to majorize the vector y if n X i=1 x i = n X i=1 y i and k X i=1 x i k X i=1 y i , for k = 1;:::;n 1. We use the notation xy to show that x majorizes y. Assume p and q are both probability vectors of the generalized n-distribution problem and pq. The following proposition is an analogue of Proposition 4. Proposition 9 V (x;p) is an increasing function in x. We cannot prove that an analogue of Lemma 3 is valid, nor can we provide coun- terexamples due to the complexity of the probability vectors. Thus, we are not sure whetherg(y;p)g(y;q) for anypq. As a result, an analogue of Proposition 5 can- not be proved as well. We leave this part as a subject for future research. However, we can still show that an analogue of Lemma 4 holds. Let X i f i , for i = 1;:::;n. Lemma 6 If X 1 lr X 2 lr ::: lr X n and y 1 y 2 , g(y 1 ;p)g(y 2 ;p) for any p. 49 Proof. It is obvious that n X i=1 g i (y 1 ;p) = n X i=1 g i (y 2 ;p) = 1. We want to show that k X i=1 g i (y 1 ;p) k X i=1 g i (y 2 ;p), for k = 1;:::;n 1. First, when k = 1, g 1 (y 1 ;p) = p 1 f 1 (y 1 ) f p (y 1 ) = 1 1 + p 2 p 1 f 2 (y 1 ) f 1 (y 1 ) + p 3 p 1 f 3 (y 1 ) f 1 (y 1 ) +::: + pn p 1 fn(y 1 ) f 1 (y 1 ) 1 1 + p 2 p 1 f 2 (y 2 ) f 1 (y 2 ) + p 3 p 1 f 3 (y 2 ) f 1 (y 2 ) +::: + pn p 1 fn(y 2 ) f 1 (y 2 ) = g 1 (y 2 ;p) The inequality holds because f 2 (x) f 1 (x) ; f 3 (x) f 1 (x) ;:::; and fn(x) f 1 (x) are all decreasing functions in x. We know that f 1 (x) f j (x) and f 2 (x) f j (x) increase inx, forj = 3;:::;n. Thus, p 1 f 1 (x) p j f j (x) and p 2 f 2 (x) p j f j (x) also increase inx. That is, p j f j (x) p 1 f 1 (x)+p 2 f 2 (x) , the reciprocal the sum of p 1 f 1 (x) p j f j (x) and p 2 f 2 (x) p j f j (x) , decrease in x. 
Hence, when k = 2, g 1 (y 1 ;p) +g 2 (y 1 ;p) = p 1 f 1 (y 1 ) +p 2 f 2 (y 1 ) f p (y 1 ) = 1 1 + P n j=3 p j f j (y 1 ) p 1 f 1 (y 1 )+p 2 f 2 (y 1 ) 1 1 + P n j=3 p j f j (y 2 ) p 1 f 1 (y 2 )+p 2 f 2 (y 2 ) = g 1 (y 2 ;p) +g 2 (y 2 ;p) By using the same logic fork = 3;:::;n1, we can see that k X i=1 g i (y 1 ;p) k X i=1 g i (y 2 ;p). Thus, g(y 1 ;p)g(y 2 ;p). 50 3.5 eBay Problem In Section 3.2, we study the problem where we have to decide whether to stop or to continue after receiving an oer. In this section, we consider the problem that the decision maker now has to set up a value before oers come in. For any oer larger than the pre-dened value, we stop and accept the oer. Otherwise, we reject the oer, pay cost C, and continue observing the next oer. Again we assume recall is not allowed. We call this an eBay problem because the pre-dened value can be viewed as the reserve price in the eBay bidding mechanism. Consider the maximal bidding price of each auction as the oer. No matter how many buyers bid on this auction, we only take the price from the high bidder as an oer. The seller sets up a reserve price for the item. At the end of an auction, if the maximal bidding price is smaller than the reserve price, the seller refuses to sell the item and starts a new auction for it. The costC for observing oers can be viewed as the listing fee and we assume there is no extra charge for the seller if the item sells or the auction ends without a winning bid. Assume oers are from one of the two possible probability density functions, f 1 and f 2 . F 1 and F 2 are the cumulative distribution functions. Dene the problem to be in state1 when an oer is accepted and in state p when we are about to receive an oer, where p is the probability of f 1 and 0 p 1. Let f p = pf 1 + (1p)f 2 andF p =pF 1 + (1p)F 2 . LetV (z) denote the maximum expected additional return when in state z. The optimality equation is 51 V (1) = 0 V (p) = C + max x n Z x 0 V pf 1 (y) f p (y) f p (y) dy + Z 1 x yf p (y) dy o ; 0p 1 where x is the reserve price. When p is the posterior probability of f 1 , then the maximum expected return is V (p). Consider another situation where we are to be told which one is the real distribution. The maximum expected return can be obtained by conditioning on the real distribution, so it ispV (1)+(1p)V (0). The expected return when we are to be given additional information must be at least as large as which this information is not given. Thus,V (p) is never better thanpV (1)+(1p)V (0). LetV = maxfV (1);V (0)g. We can see V (p)pV (1) + (1p)V (0)V , i.e., V (p) is bounded by V . 3.5.1 Solution by Linear Programming In this subsection, we use linear programming approach to solve the dynamic pro- gramming model (see Ross [24]). Recall that, in the problem of Section 3.2, 1 and 2 are the threshold values of the problems with oers having distributions f 1 and f 2 , respectively. Let maxf 1 ; 2 g = . Lemma 7 At any state p, the optimal reserve price is less than or equal to . Proof. Assume the optimal reserve price at state p, called x (p), is greater than . For an oerx such thatxx (p), it is optimal to reject it. Thus, the maximum expected return after rejecting x (=C +V (p)) must be at least as large as the return from accepting the oer x (=C +x), i.e., V (p)x. In the problem of Section 3.2, the expected optimal return given that p is the current probability of f 1 is H(p). On the other hand, in the eBay problem, the 52 expected optimal return when in state p isV (p). 
Since in the innite-stage Bayesian selling problem we choose actions (accept or reject an oer) after receiving oers while in the eBay problem we choose actions (assign reserve price) before receiving oers, the former case must be at least as good as the latter one, so H(p) V (p). By Proposition 3, we have p 1 + (1p) 2 H(p), which implies H(p). Thus, H(p)V (p)x. A contradiction to our above assumption, so x (p). Let X i be the state of the process at time i and policy be the optimal policy. Let P (X n 6=1) be the probability of X n 6=1 under policy . Lemma 8 P (X n 6=1)! 0 as n!1. Proof. First, we want to nd an upper bound of the expected accepted oer. Let x (p) be the optimal reserve price when in state p. By Lemma 7, x (p) . Let R be the expected accepted oer given that the accepted oer is greater than , i.e., R = E[YjY ], where random variable Y represents oers. We can also see that E[YjY y] is an increasing function in y. Thus, we have E [accepted oer] = E[YjYx (p)]R, for all p. We then consider the expected return under the optimal policy when in state p to be V (p) = E [return] = E [returnjX n =1]P (X n =1) +E [returnjX n 6=1]P (X n 6=1) RP (X n =1) + (RnC)P (X n 6=1) RnCP (X n 6=1) 53 IfP (X n 6=1)9 0,V (p)!1 asn!1. SinceV (p) is a bounded function, we have P (X n 6=1)! 0. Proposition 10 If u is a bounded function satisfying u(1) = 0 and u(p)C + max x n Z x 0 u pf 1 (y) f p (y) f p (y) dy + Z 1 x yf p (y) dy o ; 0p 1 then u(p)V (p) for all p. Proof. Let s(x;p) =C + Z 1 x yf p (y) dy and rewrite the condition as u(p) max x n Z x 0 u pf 1 (y) f p (y) f p (y) dy +s(x;p) o (3.8) Consider the preceding problem with the exception that we are given the option of stopping at any time. If we stop when in state p, then we earn a terminal reward u(p). Here s(x;p) can be viewed as the expected reward when in state p and action x is chosen. Now, equation (3.8) states that stopping immediately is better than doing anything else for one stage and then stopping. Repeating this argument shows that stopping immediately is better than doing anything else for n stages and then stopping. Thus, for the optimal policy , u(p)E [n-stage returnjX 0 =p ] +E [u(X n )jX 0 =p ] where X i is the state of the process at time i. Now consider E [u(X n )jX 0 = p ] by conditioning on the value of X n , we have 54 E [u(X n )jX 0 =p ] = E [u(X n )jX 0 =p and X n =1]P (X n =1jX 0 =p ) + E [u(X n )jX 0 =p and X n 6=1]P (X n 6=1jX 0 =p ) = E [u(X n )jX 0 =p and X n 6=1]P (X n 6=1jX 0 =p ) The last equation holds because u(1) = 0. Since u is a bounded function and by Lemma 8 that P (X n 6=1)! 0, we can see that E [u(X n )jX 0 = p ]! 0. Thus, with n!1, we obtain u(p)V (p) =V (p) By Proposition 10, we can know that the optimal value functionV (p) is the small- est bounded function that satises the inequality. Hence,V (p) will be the solution of the optimization problem min Z 1 0 u(p)dp s.t. u(p)C + Z x 0 u pf 1 (y) f p (y) f p (y) dy + Z 1 x yf p (y) dy ; for all x and 0p 1 The preceding problem has continuous state space p between [0; 1]. In order to solve the linear programming problem numerically, we will rst discretize the state space. We divide the region [0; 1] evenly intor segments and discretize the state space 55 p into 1 r ; 2 r ;:::; r1 r ; 1 . If the probability of f 1 is equal to q and (j1) r < q j r , we set the state p = j r . The objective function then can be approximated by min X p u(p) ; p2 1 r ; 2 r ;:::; r 1 r ; 1 To approximate the continuous oers, we also discretize the possible oers (sample space). 
Given a small number d, we divide [0, ∞) into {0, d, 2d, 3d, ...}. If an offer equals x and id < x ≤ (i+1)d, we set the offer to (i + 1/2)d. If the offer X has continuous distribution F_1 and density f_1, let X' be the random variable with mass function f'_1 given by f'_1((i + 1/2)d) = P{X' = (i + 1/2)d} = P{id < X ≤ (i+1)d} = F_1((i+1)d) − F_1(id). The mass function f'_2 is derived from F_2 and f_2 in the same manner. Let f'_p = p f'_1 + (1−p) f'_2. The approximated linear programming problem can then be written as

minimize  Σ_p u(p)
subject to  u(p) ≥ −C + Σ_{y ≤ x} u( p f'_1(y) / f'_p(y) ) f'_p(y) + Σ_{y > x} y f'_p(y),
            for p ∈ {1/r, 2/r, ..., (r−1)/r, 1} and for all x = (i + 1/2)d, i = 0, 1, 2, ...

3.5.2 Numerical Results

In our numerical experiments we discretize the state space p into 100 states. When we calculate p f_1(y) / f_p(y), we round it to 2 decimal places; for example, a posterior probability of 0.233 is set to 0.23. We solve the problem for 8 different cases, with all offers coming from Geometric distributions, where f_1 has parameter α_1 and f_2 has parameter α_2. See Table 3.7 for the settings. The mean values of f_1 and f_2, denoted by μ_1 and μ_2, are also shown in the table for reference. Note that we use the same mean settings for the 8 cases as in the infinite-stage Bayesian selling problem. Since the Geometric distribution is the discrete analog of the Exponential distribution, we may compare the results of the two problems.

Table 3.7: Parameter settings for the eBay problem

  Case No.    1      2      3      4      5      6      7      8
  α_1         0.1    0.1    0.1    0.1    0.1    0.1    0.05   0.05
  α_2         0.12   0.12   0.15   0.15   0.2    0.2    0.1    0.1
  C           1      0.5    1      0.5    1      0.5    1      0.5
  Initial p   0.5    0.5    0.5    0.5    0.5    0.5    0.5    0.5
  μ_1         9      9      9      9      9      9      19     19
  μ_2         7.333  7.333  5.667  5.667  4      4      9      9

Since the largest-mean distribution among our settings is a Geometric distribution with mean 19, there is only a small chance of an offer greater than 200 (e.g., for cases 7 and 8, P(X > 200) = (1 − 1/19)^201 ≈ 2 × 10^−5). Thus we assume the offers come from truncated Geometric distributions with sample space {0, 1, ..., 200}; i.e., any offer greater than 200 is set to 200. The possible reserve prices also range from 0 to 200. The following is the model we use to obtain the optimal expected return:

minimize  Σ_p u(p)
subject to  u(p) ≥ −C + Σ_{y=0}^{x} u( p f_1(y) / f_p(y) ) f_p(y) + Σ_{y=x+1}^{200} y f_p(y),
            for p = 0.01, 0.02, ..., 1 and x = 0, 1, ..., 200.

Solving the preceding LP gives the optimal expected returns when the initial p is 0.5, i.e., u(0.5); they are shown in Table 3.8. We also show UB_2, the tighter upper bound of the optimal expected return, from the Bayesian selling problem. Comparing the two results shows how much we lose if the decision has to be made before offers come in (the eBay problem) rather than after offers come in (the Bayesian selling problem). As can be seen from the table, the two results are very close: the average percentage difference over the 8 cases is 6.18%. Since the optimal expected return in the Bayesian selling problem is smaller than its upper bound, the expected returns of the two models are even closer. Thus we can conclude that assigning the reserve price before offers come in is an efficient method for the decision maker.
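To make the discretized LP concrete, the following is a rough sketch, not the code used for the dissertation's experiments, of how it can be handed to an off-the-shelf LP solver. It uses the parameters of case 5 in Table 3.7 (α_1 = 0.1, α_2 = 0.2, C = 1), but the coarser posterior grid (r = 50) and the earlier truncation of the offers at 120 are assumptions made only to keep the sketch small, so the printed value will only be in the neighborhood of the corresponding entry in Table 3.8.

import numpy as np
from scipy.optimize import linprog

# Discretized reserve-price LP:
#   minimize  sum_p u(p)
#   subject to u(p) >= -C + sum_{y<=x} u(g(y,p)) f'_p(y) + sum_{y>x} y f'_p(y)  for all p, x.

C, a1, a2 = 1.0, 0.10, 0.20
ymax = 120
ys = np.arange(0, ymax + 1)
f1 = a1 * (1 - a1) ** ys; f1[-1] += (1 - a1) ** (ymax + 1)   # truncated geometric pmfs,
f2 = a2 * (1 - a2) ** ys; f2[-1] += (1 - a2) ** (ymax + 1)   # tail mass lumped into ymax

r = 50
ps = np.arange(1, r + 1) / r                                  # posterior grid 1/r, 2/r, ..., 1
snap = lambda q: int(np.clip(np.ceil(q * r), 1, r)) - 1       # grid index of a posterior value q

rows, rhs = [], []
for i, p in enumerate(ps):
    fp = p * f1 + (1 - p) * f2
    post = [snap(p * f1[y] / fp[y]) for y in ys]              # posterior state after seeing offer y
    for x in ys:                                              # candidate reserve price x
        row = np.zeros(r)
        row[i] -= 1.0                                         # -u(p)
        for y in ys[: x + 1]:
            row[post[y]] += fp[y]                             # + f'_p(y) u(g(y,p)) for rejected y <= x
        rows.append(row)
        rhs.append(C - float((ys[x + 1:] * fp[x + 1:]).sum()))
res = linprog(c=np.ones(r), A_ub=np.array(rows), b_ub=np.array(rhs),
              bounds=[(None, None)] * r, method="highs")
print("approximate optimal expected return u(0.5):", res.x[snap(0.5)])

Because Proposition 10 characterizes V as the smallest bounded function satisfying the constraints, minimizing the sum of the u(p) recovers an approximation of the optimal value function, and the value at the initial posterior can be read off directly.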
Table 3.8: Results of the eBay problem andUB 2 in the innite-stage Bayesian selling problem Case u(0:5) (eBay Problem) UB 2 (Bayesian Selling Problem) 1 17.688 17.817 2 23.310 23.872 3 14.253 14.434 4 19.019 20.243 5 10.892 11.762 6 15.246 17.251 7 32.663 35.865 8 41.823 47.035 58 Chapter 4 Asset-Selling Problem with Recall In this chapter we consider the asset-selling problems where the decision maker can recall and accept past oers after observing a subsequent one. We brie y describe the classical problem and then present the Bayesian selling problem. 4.1 Classical Problem Assume the maximum oer so far received is x m and it is optimal to continue. After observing a subsequent oer x, if the maximum oer so far received is still x m , i.e., current oerxx m , it is still optimal to continue. Thus, you will never recall a past oer. As a result, the optimal policy of the problem with recall is the same as the problem without recall. 4.2 Bayesian Selling Problem with Recall In this section, we study a problem similar to the one in Section 3.2 except that we are allowed to recall and accept a past oer after observing a subsequent one. The uncertainty assumptions are the same. The distribution of oers is xed from one of the two possible distributionsf 1 andf 2 . The probabilities as to which is the true one are given and Bayesian updated as we observe incoming oers. Letx be the maximum oer so far received. LetV (x;p) be the maximum expected return given that x is the maximum oer so far received and p is the updated prob- 59 ability of f 1 after receiving the most recent oer. Let f p (y) = pf 1 (y) + (1p)f 2 (y) and g(y;p) = pf 1 (y) f p (y) : The optimality equation is V (x;p) =C + max n x; Z V (maxfx;yg;g(y;p))f p (y)dy o , for all x and p (4.1) Let 1 and 2 be the optimal threshold values when oers are from known dis- tributions f 1 and f 2 , respectively. When we are about to observe the rst oer, if the distribution were known, the optimal policy for the problem with recall is the same as that for the problem without recall. That is, 1 and 2 here are the same as those in the problem without recall andC =E f 1 [(X 1 ) + ] =E f 2 [(X 2 ) + ]. How- ever, under the uncertainty assumptions, the problem diers when recall is allowed. Throughout this section, we assume f 1 is the good distribution, i.e., 1 2 . Let H(x;p) be the expected optimal return, at (x;p), if we do not stop. H(x;p) = Z V (maxfx;yg;g(y;p))f p (y)dy (4.2) Then we can rewrite equation (4.1), V (x;p) =C + maxfx;H(x;p)g (4.3) At state (x;p), the optimal policy is to accept x if and only if xH(x;p). 4.2.1 Structure of Optimal Policy The following propositions and corollary show the structure of the optimal policy. Proposition 11 It is optimal to accept an oer that is greater than or equal to 1 . 60 Proof. If we are to be told which distribution is the true one, the maximum expected return is 1 . When we are not to be told which distribution is the true one, the best we can do must be no larger than that when we are to be told, i.e., no larger than 1 . Thus, for any oer greater than or equal to 1 , we should accept it. Let 1sla (x;p) be the expected return, after paying cost C to observe an oer and at state (x;p), if we choose to go exactly one more stage and then stop. That is, 1sla (x;p) =C+E fp [maxfx;Xg]. The policy that stops if and only ifx 1sla (x;p) is called the one-stage look-ahead policy. Proposition 12 When in state (x;p), it is optimal to reject x if x< 1sla (x;p). Proof. 
When in state (x, p) and x < β_{1sla}(x, p), accepting x is no better than rejecting x and then accepting at the next stage; thus, for the optimal result, we would rather reject x. In other words, it is optimal to reject an offer whenever the one-stage look-ahead policy tells us to reject it.

The following is a corollary of Proposition 12.

Corollary 2. It is optimal to reject an offer that is less than β_2.

Proof. If an offer x < β_2, then E_{f_1}[(X − x)^+] > C and E_{f_2}[(X − x)^+] > C. Thus, for any p, E_{f_p}[(X − x)^+] > C. Rearranging terms gives x < −C + E_{f_p}[max{x, X}] = β_{1sla}(x, p). By Proposition 12, we should reject x.

Let V_{f_1}(x) = V(x, 1) and V_{f_2}(x) = V(x, 0). Assume the random variable X_1 has probability density function f_1 and X_2 has probability density function f_2.

Proposition 13. When in state (x, p) with β_2 ≤ x < β_1, it is optimal to accept x if x ≥ p β_1 + (1−p)( −C + E[max{x, X_2}] ).

Proof. Consider the case in which you will be told which is the true distribution after you choose to continue. Let φ(x, p) be the maximum expected return, at state (x, p), if you are told the true distribution after choosing to observe the next offer. At state (x, p), the best we can do in this case is −C + max{x, φ(x, p)}, where

φ(x, p) = p E[V_{f_1}(max{x, X_1})] + (1−p) E[V_{f_2}(max{x, X_2})].

The expected return when this additional information is to be given after one more period must be at least as large as when it is not, so φ(x, p) ≥ H(x, p). Thus, if x ≥ φ(x, p), then x ≥ H(x, p) and we should accept x. If the true distribution is known to be f_1 and x < β_1, we should reject x and accept the first offer greater than or equal to β_1, so the optimal expected return is β_1, i.e., E[V_{f_1}(max{x, X_1})] = β_1. If the true distribution is known to be f_2, it is optimal to accept any offer greater than or equal to β_2; since x ≥ β_2, we have E[V_{f_2}(max{x, X_2})] = −C + E[max{x, X_2}]. Hence, when in state (x, p) with β_2 ≤ x < β_1,

φ(x, p) = p β_1 + (1−p)( −C + E[max{x, X_2}] ),

and it is optimal to accept x if x ≥ p β_1 + (1−p)( −C + E[max{x, X_2}] ).

4.2.2 Monotonicity of the Value Function

In this subsection we show the monotonicity of the value function. First we prove the intuitive result that the larger the maximum offer received so far, the larger the maximum expected return.

Proposition 14. V(x, p) is an increasing function of x.

Proof. We prove it by induction, considering the finite-stage problem first. Let V_n(x, p) be the maximum expected return when the current state is (x, p) and there are n offers to go. The optimality equations for the finite-stage problem can be written as

V_n(x, p) = −C + max{ x, ∫ V_{n−1}( max{x, y}, g(y, p) ) f_p(y) dy },  for all x, p,   (4.4)
V_0(x, p) = −C + x.   (4.5)

When n = 0, V_0(x, p) is increasing in x. Assuming V_{n−1}(x, p) is increasing in x, equation (4.4) shows that V_n(x, p) is also increasing in x. Letting n → ∞, V_n(x, p) → V(x, p), so V(x, p) is an increasing function of x.

If the distributions satisfy a likelihood ratio order, the value function is also monotone in p.

Proposition 15. If X_1 ≥_lr X_2, then V(x, p) is an increasing function of p.

The proof is omitted here since it follows the same logic as the proof of Proposition 5. As in the problem without recall, we are unsure whether the likelihood ratio order assumption in Proposition 15 can be weakened to a usual stochastic order or a hazard rate order.
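Propositions 14 and 15 can also be checked numerically. The following is a minimal sketch, not code from the dissertation, that runs the finite-stage recursion (4.4)-(4.5) on a coarse grid with exponential offers; the cost C, the exponential rates, and the grid sizes are illustrative assumptions. It reports how far the computed value function is from being monotone in x and in p; any violations should be on the order of the discretization error.

import numpy as np

C = 1.0
lam1, lam2 = 1.0 / 20, 1.0 / 10          # f1 has the larger mean, so X1 >=_lr X2
xs = np.arange(0.0, 151.0, 2.0)          # grid for the best offer received so far
ps = np.linspace(0.0, 1.0, 21)           # grid for the posterior probability of f1
ys = np.arange(1.0, 151.0, 2.0)          # offer cells [y-1, y+1), truncated at 150

w1 = np.exp(-lam1 * (ys - 1)) - np.exp(-lam1 * (ys + 1))   # cell masses under f1
w2 = np.exp(-lam2 * (ys - 1)) - np.exp(-lam2 * (ys + 1))   # cell masses under f2
w1, w2 = w1 / w1.sum(), w2 / w2.sum()                      # renormalize the truncation

# index of max(x, y) on the x-grid, precomputed once
ix = np.clip(np.rint(np.maximum.outer(xs, ys) / 2.0).astype(int), 0, len(xs) - 1)

V = -C + np.tile(xs[:, None], (1, len(ps)))                # V_0(x, p) = -C + x
for _ in range(60):                                        # enough stages to stabilize
    Vnew = np.empty_like(V)
    for j, p in enumerate(ps):
        wp = p * w1 + (1 - p) * w2                         # mixture f_p on the offer grid
        g = p * w1 / np.maximum(wp, 1e-300)                # posterior g(y, p) after offer y
        gj = np.clip(np.rint(g * (len(ps) - 1)).astype(int), 0, len(ps) - 1)
        cont = (V[ix, gj[None, :]] * wp).sum(axis=1)       # E_{f_p}[ V(max(x,Y), g(Y,p)) ]
        Vnew[:, j] = -C + np.maximum(xs, cont)
    V = Vnew

print("largest monotonicity violation in x:", float(np.maximum(0.0, -np.diff(V, axis=0)).max()))
print("largest monotonicity violation in p:", float(np.maximum(0.0, -np.diff(V, axis=1)).max()))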
4.2.3 Heuristic Methods In this subsection, we propose three heuristic methods to approximate the optimal policy. The expected returns of the heuristic methods will be obtained by simulation. One-Stage Look-ahead Policy From Proposition 12, we know that we can use the one-stage look-ahead policy to determine when to reject. We propose the one-stage look-ahead policy, called policy 1sla , as the rst heuristic solution. 63 Policy The second heuristic policy follows the results directly from Proposition 11, 13, and Corollary 2. Assume we can analytically compute E[maxfx;X 2 g]. Let policy be the second heuristic policy that, when in state (x;p), accepts x ifx 1 , rejects x if x< 2 , and otherwise acceptsx if and only ifxp 1 +(1p)(C +E[maxfx;X 2 g]). Modied Two-Stage Look-ahead Policy If the one-stage look-ahead policy tells us to accept an oer, for the best result, we do not know whether we should accept or not. We propose a heuristic policy called the modied two-stage look-ahead policy to help us further determine when we should accept. We use policy 2sla to represent it. First, if the one-stage look-ahead policy tells us to reject an oer, we reject it. If the one-stage look-ahead policy tells us to accept, we then consider a two-stage nite problem where there are only two oers to go. If the expected return of the two-stage nite problem is no larger than the current oer, we accept the oer. Let V 1 (x;p) be the maximum expected return at state (x;p) when there is only one oer to go and V 2 (x;p) be that when there are two oers to go. The optimality equations of the two-stage dynamic programming problem are V 1 (x;p) = C + max n x;C +E fp [maxfx;Xg] o V 2 (x;p) = C + max n x;E fp [V 1 (maxfx;Xg;g(X;p))] o Assume we can analytically compute V 1 (x;p). We use simulation to estimate E fp [V 1 (maxfx;Xg;g(X;p))]. At state (x;p), policy 2sla stops if and only if x is greater than or equal to 1sla (x;p) and the estimated E fp [V 1 (maxfx;Xg;g(X;p))]. 64 4.2.4 Upper Bound of Optimal Expected Return We use the similar techniques of UB 2 in Section 3.2.4 to obtain an upper bound of the optimal expected return. Let policy be the optimal policy of the problem with recall. We adopt the same notations of R, E [R], E [Rjf 1 ], and E [Rjf 2 ] in Section 3.2.4. Now we are going to obtain upper bounds of E [Rjf 1 ] and E [Rjf 2 ] and the upper bounds are obtained by acting as if we know the true distribution but with some constraints on the policy. Simulations are conducted to obtain the upper bounds. Upper Bound of E [Rjf 1 ] From Proposition 13, we know that when in state (x;p) and 2 x < 1 , policy tells us to accept x if x p 1 + (1p)(C +E[maxfx;X 2 g]). Also, if the true distribution is known to be f 1 , it is optimal to use policy 1 . Using policy can be viewed as using policy 1 , which accepts the rst oer greater than 1 , with a constraint on accepting an oer that is at least p 1 + (1p)(C +E[maxfx;X 2 g]). (We have shown in Section 3.2.4 that with the recall property, we would never accept an oer smaller thanp 1 + (1p)(C +E[maxfx;X 2 g]) even under the constraint.) Hence, when in state (x;p), given thatf 1 is the true distribution, policy results in a better expected return than the policy . Thus, the expected return of the policy given f 1 is an upper bound of E [Rjf 1 ]. Upper Bound of E [Rjf 2 ] From the proof of Corollary 2, we know that when in state (x;p) and x < 2 , x is also less than 1sla (x;p), which implies 2 1sla (x;p). 
It is obvious that β_{1sla}(x, p) is increasing in x, so when x ≥ β_2 we also have β_2 ≤ β_{1sla}(x, p). That is, for any (x, p), β_2 ≤ β_{1sla}(x, p). When the true distribution is known to be f_2, it is optimal to accept x whenever x ≥ β_{1sla}(x, p), since such an x is also greater than β_2. Also, we know that, when in state (x, p), policy π* tells us to reject x if x < β_{1sla}(x, p). Hence, given that f_2 is the true distribution, accepting offer x if and only if x ≥ β_{1sla}(x, p) results in a better expected return than policy π*. In other words, the expected return of the policy π_{1sla} given f_2 is an upper bound of E_{π*}[R | f_2].

4.2.5 Numerical Results

We use the same simulation procedures as in the problem without recall. All offers are from Exponential distributions, and we run the 8 different cases with the parameters shown in Table 3.1. Each simulation runs 10^5 replications for a given distribution. In the heuristic policy π_{2sla} we also use simulation to estimate the value of E_{f_p}[V_1(max{x, X}, g(X, p))]: at each state we sample 1000 offers Y_1, Y_2, ..., Y_1000 from the distribution f_p and compute the estimate Σ_{i=1}^{1000} V_1(max{x, Y_i}, g(Y_i, p)) / 1000.

Table 4.1: Results of the infinite-stage Bayesian selling problem with recall: expected returns of the heuristics and upper bound of the optimal expected return

  Case No.   R_{1sla}   R_{2sla}   R_π      UB
  1          17.858     17.854     17.364   18.052
  2          22.812     22.814     21.938   23.113
  3          14.707     14.695     12.703   15.379
  4          19.152     19.160     16.556   19.894
  5          11.981     11.988      9.482   12.869
  6          16.361     16.395     14.377   17.055
  7          36.160     36.226     31.580   37.668
  8          46.300     46.373     43.364   47.315

Table 4.1 shows the simulation results of the three heuristics and the upper bounds: the expected return of the one-stage look-ahead policy, R_{1sla}; the expected return of the modified two-stage look-ahead policy, R_{2sla}; the expected return of policy π, R_π; and the upper bound of the optimal expected return, UB. As can be seen from the table, policy π does not appear to be a good heuristic policy, while the expected returns of the policies π_{1sla} and π_{2sla} are very close to the upper bounds. Policy π_{2sla} requires an additional simulation to estimate E_{f_p}[V_1(max{x, X}, g(X, p))] at each state. Thus, although policy π_{1sla} does not always have the best outcome, it appears to be the more efficient heuristic policy.

Chapter 5
Burglar Problem

5.1 Classical Burglar Problem

In the classical burglar problem, a burglar accumulates earnings from successful burglaries. If a burglary is unsuccessful, the burglar loses everything and the problem ends. The burglar is allowed to retire at any time and keep all the fortune obtained up to that time. Assume the probability of a successful burglary is q and the loot of a burglary has density f with finite mean μ. The objective is to retire with a maximum expected fortune.

Define the state x to be the total earnings the burglar has accumulated so far. Let V(x) be the maximum expected return given that x is the current accumulated loot. The optimality equation is

V(x) = max{ x, q ∫ V(x + y) f(y) dy },  for all x.   (5.1)

We can easily see that V(x) is an increasing function of x. Let B be the set of states for which stopping is at least as good as continuing for exactly one more period. We have

B = { x : x ≥ q ∫ (x + y) f(y) dy } = { x : x ≥ qx + qμ } = { x : x ≥ qμ / (1−q) }.

The one-stage look-ahead policy is to stop when in B. Let the state be ∞ if the burglar is caught, with V(∞) = 0. Since the state can only increase, it follows that B is closed, and thus the one-stage look-ahead policy is optimal [24].
That is, the burglar should retire if and only if the accumulated loot is at least qμ/(1−q). Let β = qμ/(1−q).

5.1.1 Unimodality

Let R(y) be the expected return of the policy, called policy π_y, under which the burglar retires if and only if the accumulated loot is greater than or equal to y. We want to prove that R(y) is a unimodal function.

Proposition 16. R(y) is a unimodal function: R(y) is increasing in y for y ≤ β (5.2) and decreasing in y for y ≥ β (5.3).

Proof. We prove (5.2) first. Assume h_2 ≤ h_1 ≤ β; we want to show that R(h_2) ≤ R(h_1). If the accumulated loot is less than h_2 or at least h_1, the expected returns are the same under the policies π_{h_1} and π_{h_2}; thus we only need to compare the expected returns when the accumulated loot lies in [h_2, h_1). Assume the current accumulated loot is v, where h_2 ≤ v < h_1. Under policy π_{h_2}, the return is v. Now consider the expected return under policy π_{h_1}. Let Z_n be the accumulated loot after the burglar attempts n more burglaries, with Z_0 = v. Let N be the stopping time representing the number of additional burglaries under policy π_{h_1}, where N = min{i : Z_i ≥ h_1}. Let Z̄_n = Z_{min(N,n)}. We show that Z̄_n is a submartingale. Since 0 ≤ Z̄_i < Z_N for i < N, and 0 ≤ Z̄_j = Z_N for j ≥ N,

E[|Z̄_n|] = E[Z̄_n] ≤ E[Z_N] ≤ q(h_1 + μ) < ∞   (because Z_{N−1} < h_1).

Also, for n + 1 > N, E[Z̄_{n+1} | Z̄_1, ..., Z̄_n] = Z_N = Z̄_n, and for n + 1 ≤ N, E[Z̄_{n+1} | Z̄_1, ..., Z̄_n] = q(Z̄_n + μ) ≥ Z̄_n; the inequality holds because Z̄_n < h_1 ≤ β = qμ/(1−q). Thus E[Z̄_{n+1} | Z̄_1, ..., Z̄_n] ≥ Z̄_n, so by the definition of a submartingale [26], Z̄_n is a submartingale. When the current accumulated loot is v, the return under policy π_{h_1} is Z_N, and since Z̄_n is a submartingale, E[Z_N] − v = E[ Σ_{i=1}^N (Z̄_i − Z̄_{i−1}) ] ≥ 0. So using policy π_{h_1} is at least as good as using policy π_{h_2}, i.e., R(h_2) ≤ R(h_1), which proves (5.2).

To prove (5.3), assume β ≤ h_3 ≤ h_4; we want to show that R(h_4) ≤ R(h_3). Again we only need to compare the expected returns when the accumulated loot lies in [h_3, h_4). Assume the current accumulated loot is w, where h_3 ≤ w < h_4. Under policy π_{h_3}, the return is w. Since w ≥ β, the optimal policy tells us that retiring at w yields at least as large an expected return as continuing, so the expected return under policy π_{h_4} is no larger than w. Hence R(h_4) ≤ R(h_3), and (5.3) holds.

5.1.2 Simulation and Computational Approaches

In this subsection we present simulation methods to estimate the value function and show how to conduct the simulation efficiently with variance reduction techniques. We also show how to compute the value function analytically when f is an exponential distribution. Throughout this section, the notation X ∼ g means that X has probability density function g.

For the classical burglar problem we cannot solve for the value function analytically when f is an arbitrary distribution, so we use simulation to obtain an estimate of V. We know that the optimal policy is to retire if and only if the current earnings are at least β. When in state x, the most straightforward way to estimate the value function V(x) by simulation is as follows (a code sketch of this estimator appears below):

1. If x ≥ β, return x and stop.
2. Generate a random number U ∼ U(0, 1). If U > q, return 0 and stop.
3. If U ≤ q, generate a random variable X ∼ f, set x = x + X, and go to Step 1.
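A direct sketch of this three-step estimator (not the dissertation's code; the success probability and the exponential loot rate below are illustrative assumptions):

import random

q, rate = 0.8, 0.05                       # success probability; loot ~ Exp(rate), mean 20
beta = q * (1.0 / rate) / (1 - q)         # retirement threshold beta = q*mu/(1-q)

def naive_return(x):
    """One simulated return of the threshold-beta policy starting from accumulated loot x."""
    while x < beta:                       # Step 1: retire once the loot reaches beta
        if random.random() > q:           # Step 2: caught, lose everything
            return 0.0
        x += random.expovariate(rate)     # Step 3: successful burglary, add the loot
    return x

n = 10**5
print("naive estimate of V(0):", sum(naive_return(0.0) for _ in range(n)) / n)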
Variance Reduction by Conditioning

We now provide a more efficient simulation algorithm to estimate the value function V(x). Instead of determining whether each burglary attempt is successful at each state, we first find the additional number of consecutively successful burglaries needed, when at state x, for the burglar to retire under the optimal policy, and then estimate the expected return by conditioning. This yields an improved estimator with smaller variance.

Let R* be the return under the optimal policy when in state x, i.e., E[R*] = V(x). Let X_i be the loot of the i-th successful burglary, where X_i ∼ f, i ≥ 1. Let M denote the additional number of consecutively successful burglaries needed, when in state x, for the burglar to retire under the optimal policy, i.e., M = min{n : X_1 + ... + X_n ≥ β − x}. The return under the optimal policy, when in state x, given M, X_1, X_2, ..., can be written as

R* | M, X_1, X_2, ... = Σ_{i=1}^M X_i + x  with probability q^M,  and 0 otherwise.

Hence

E[R* | M, X_1, X_2, ...] = q^M ( Σ_{i=1}^M X_i + x ).

Because q^M(Σ_{i=1}^M X_i + x) is the conditional expected return under the optimal policy given M, X_1, X_2, ..., it has the same mean as R* but a smaller variance (see [25]); hence it is a better estimator of V(x). When in state x, the simulation algorithm for the conditional expectation estimator is:

1. Generate X_1, X_2, ..., X_M from distribution f, where M = min{n : X_1 + ... + X_n ≥ β − x}.
2. Return q^M ( Σ_{i=1}^M X_i + x ).

The Use of Antithetic Variables

We now examine whether the estimator of the previous subsection can be improved further, using the antithetic variable approach. Suppose that any random variable X with probability density function f and cumulative distribution function F can be generated by the inverse transform method, i.e., X = F^{−1}(U), where U ∼ U(0, 1). Note that q^M(Σ_{i=1}^M X_i + x) is a function of the random variables X_1, X_2, ... (M is determined by the X_i's). If we initially use M random numbers U_i, i = 1, ..., M, to generate q^M(Σ_{i=1}^M X_i + x) by setting X_i = F^{−1}(U_i), then the second simulation run is done in the same fashion but using the random numbers 1 − U_i, i = 1, ..., M. If fewer than M random numbers are needed in the second run, we discard the unused ones; if more than M are needed, we use new random numbers (independent of U_i, i = 1, ..., M). To see whether the antithetic variables yield a smaller variance, note that if one of the X_i's is made smaller, M becomes larger or stays the same, so q^M is nonincreasing. Even though the change in Σ_{i=1}^M X_i + x could go in either direction, that change is small compared with the change in q^M. Hence, generally speaking, q^M(Σ_{i=1}^M X_i + x) is a decreasing function of the random numbers, and as a result the antithetic variable approach leads to a variance reduction in most cases (see [25]). (Both the conditional-expectation estimator and its antithetic pairing are sketched in code below.)
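Under the same assumed parameters as in the previous sketch, the conditional-expectation estimator and its antithetic pairing can be written as follows (again only a sketch, not the dissertation's code); the inverse transform X = −ln(U)/λ is used so that the two runs share the random numbers U_i and 1 − U_i as described above.

import math, random

q, rate = 0.8, 0.05
beta = q * (1.0 / rate) / (1 - q)

def cond_estimate(us, x=0.0):
    """One conditional-expectation observation, reusing the supplied U(0,1) stream first."""
    total, m, i = 0.0, 0, 0
    while x + total < beta:                            # M = first n with X_1+...+X_n >= beta - x
        u = us[i] if i < len(us) else random.random()  # fresh numbers once the stream runs out
        u = min(max(u, 1e-12), 1.0 - 1e-12)            # guard the logarithm
        total += -math.log(u) / rate                   # inverse-transform Exp(rate) loot
        m, i = m + 1, i + 1
    return q ** m * (x + total)

n_pairs, acc = 50_000, 0.0
for _ in range(n_pairs):
    us = [random.random() for _ in range(64)]          # shared stream U_1, ..., U_64
    acc += 0.5 * (cond_estimate(us) + cond_estimate([1.0 - u for u in us]))
print("conditioned + antithetic estimate of V(0):", acc / n_pairs)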
Value Function for the Exponential Distribution

The preceding simulation algorithms apply when f is an arbitrary distribution. If f is an exponential density, we can compute the value function without simulation. Assume f is an exponential density with rate λ. We first show that M − 1 has a Poisson distribution. Let {M(t), t ≥ 0} be a counting process and regard X_i as the time between the (i−1)-th and the i-th event of this process. Since the X_i's are independent and identically distributed with density f, {M(t), t ≥ 0} is a Poisson process. M(t) can be written as

M(t) = max{ n : Σ_{i=1}^n X_i ≤ t } = min{ n : Σ_{i=1}^n X_i > t } − 1.

We can see that M(β − x) = min{n : X_1 + ... + X_n > β − x} − 1 = M − 1. (Since all the random variables are continuous, we ignore the event X_1 + ... + X_n = β − x.) Thus M − 1 has a Poisson distribution with rate λ(β − x); that is, M − 1 ∼ Poisson(λ(β − x)) and

P(M = j) = e^{−λ(β−x)} (λ(β−x))^{j−1} / (j−1)!.

When in state x, let I = 1 if the next M burglaries are all successful and I = 0 otherwise. By the memoryless property of the exponential distribution, the return is I(β + W), where W ∼ Exp(λ) is independent of I. When x < β,

V(x) = E[I(β + W)]
     = (β + 1/λ) E[I]
     = (β + 1/λ) E[ E[I | M] ]
     = (β + 1/λ) E[q^M]
     = (β + 1/λ) Σ_{j=1}^∞ q^j e^{−λ(β−x)} (λ(β−x))^{j−1} / (j−1)!
     = q (β + 1/λ) e^{−λ(β−x)} Σ_{j=1}^∞ (qλ(β−x))^{j−1} / (j−1)!
     = q (β + 1/λ) e^{−λ(β−x) + qλ(β−x)}.

Thus we obtain the value function

V(x) = q (β + 1/λ) e^{−λ(β−x) + qλ(β−x)}  if x < β,  and  V(x) = x  if x ≥ β.   (5.4)

5.2 Bayesian Burglar Problem

Assume that the probability of a successful burglary on each attempt and the distribution of the loot from each successful burglary are fixed but unknown. Under an optimistic point of view, the burglar guesses that the probability of a successful burglary is q_1 and that the loot density is f_1 with finite mean μ_1; under a pessimistic point of view, the probability of success is q_2 and the loot density is f_2 with finite mean μ_2. The initial probabilities as to which is the true case are given: we initially assume that the probability of the case (q_1, f_1) is p, and it is updated in a Bayesian manner as the burglar attempts burglaries.

Let x be the total earnings the burglar has accumulated so far. Let V(x, p) be the maximum expected return given that x is the current accumulated loot and p is the current probability of the case (q_1, f_1). Let g(y, p) be the updated posterior probability of the case (q_1, f_1) given that y is the most recent successful loot amount and p is the probability of the case (q_1, f_1) before earning loot y. Thus

g(y, p) = p q_1 f_1(y) / [ p q_1 f_1(y) + (1−p) q_2 f_2(y) ].

The optimality equation is

V(x, p) = max{ x, ∫ V(x + y, g(y, p)) [p q_1 f_1(y) + (1−p) q_2 f_2(y)] dy },  for all x and p.   (5.5)

Let H(x, p) be the expected optimal return, at (x, p), if the burglar does not retire:

H(x, p) = ∫ V(x + y, g(y, p)) [p q_1 f_1(y) + (1−p) q_2 f_2(y)] dy.

Then we can rewrite (5.5) as V(x, p) = max{ x, H(x, p) }.

5.2.1 Monotonicity of the Value Function

In this subsection we provide and prove a condition for V(x, p) to be monotone in p. First, the following proposition and lemma hold in general.

Proposition 17. V(x, p) is an increasing function of x.

Proof. We prove it by induction, considering the finite-stage problem first. Let V_n(x, p) be the maximum expected return when the current state is (x, p) and there are at most n burglaries to go (stage n). The optimality equations for the finite-stage problem are

V_n(x, p) = max{ x, ∫ V_{n−1}(x + y, g(y, p)) [p q_1 f_1(y) + (1−p) q_2 f_2(y)] dy },   (5.6)
V_0(x, p) = x.   (5.7)

When n = 0, V_0(x, p) is increasing in x. Assuming V_{n−1}(x, p) is increasing in x, equation (5.6) shows that V_n(x, p) is also increasing in x. Letting n → ∞, V_n(x, p) → V(x, p), so V(x, p) is an increasing function of x.

Lemma 9. g(y, p) is an increasing function of p.

Proof. Rearranging g(y, p),

g(y, p) = p q_1 f_1(y) / [ p q_1 f_1(y) + (1−p) q_2 f_2(y) ]
        = 1 / [ 1 + ((1−p) q_2 / (p q_1)) · f_2(y) / f_1(y) ],   (5.8)

from which it is easily seen that g(y, p) is increasing in p.

Next, we show that the value function is monotone in p under some conditions.
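Before doing so, here is a quick numerical sanity check of Lemma 9, and of the companion fact (Lemma 10 below) that g(y, p) is also increasing in y when X_1 ≥_lr X_2, under assumed exponential loot densities with f_1 the larger-mean one; it is only an illustration, not part of the proofs.

import numpy as np

q1, q2 = 0.8, 0.4
lam1, lam2 = 0.05, 0.1                     # f1 has mean 20, f2 has mean 10, so X1 >=_lr X2

def g(y, p):
    a = p * q1 * lam1 * np.exp(-lam1 * y)          # p q1 f1(y)
    b = (1 - p) * q2 * lam2 * np.exp(-lam2 * y)    # (1-p) q2 f2(y)
    return a / (a + b)

ys = np.linspace(0.1, 100.0, 400)
ps = np.linspace(0.01, 0.99, 99)
G = g(ys[None, :], ps[:, None])            # G[i, j] = g(ys[j], ps[i])
print("g increasing in p:", bool(np.all(np.diff(G, axis=0) > 0)))
print("g increasing in y:", bool(np.all(np.diff(G, axis=1) > 0)))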
It is intuitive to think that if in some sense, the case (q 1 ;f 1 ) is better than the case (q 2 ;f 2 ), then the larger the probabilityp, the larger the expected return. We start with the following lemmas. Let X 1 f 1 and X 2 f 2 . 77 Lemma 10 If X 1 lr X 2 , then g(y;p) is an increasing function in y. Proof. By denition, f 2 (y) f 1 (y) is decreasing in y. The argument follows directly from equation (5.8). For given p 1 and p 2 , let Y 1 and Y 2 be the random variables dened as follows: Y 1 = 8 > > > > < > > > > : X 1 with probability p 1 q 1 X 2 with probability (1p 1 )q 2 0 with probability 1p 1 q 1 (1p 1 )q 2 (5.9) Y 2 = 8 > > > > < > > > > : X 1 with probability p 2 q 1 X 2 with probability (1p 2 )q 2 0 with probability 1p 2 q 1 (1p 2 )q 2 (5.10) Lemma 11 If X 1 lr X 2 , q 1 q 2 , and p 1 p 2 , then Y 1 st Y 2 . Proof. Assume X 1 lr X 2 , q 1 q 2 , and p 1 p 2 . Note that X 1 and X 2 are nonnegative random variables. Hence for all a 0, P (Y 1 >a)P (Y 2 >a) = [p 1 q 1 P (X 1 >a) + (1p 1 )q 2 P (X 2 >a)] [p 2 q 1 P (X 1 >a) + (1p 2 )q 2 P (X 2 >a)] = (p 1 q 1 p 2 q 1 )P (X 1 >a) (p 1 q 2 p 2 q 2 )P (X 2 >a) (p 1 q 1 p 2 q 1 )P (X 1 >a) (p 1 q 2 p 2 q 2 )P (X 1 >a) = (p 1 q 1 p 2 q 1 p 1 q 2 +p 2 q 2 )P (X 1 >a) = (p 1 p 2 )(q 1 q 2 )P (X 1 >a) 0 78 The rst inequality holds because X 1 lr X 2 implies X 1 st X 2 , i.e., P (X 1 > a) P (X 2 >a). Proposition 18 If X 1 lr X 2 and q 1 q 2 , then V (x;p) is an increasing function in p. Proof. Consider the nite stage problem and letV n (x;p) be the maximum expected return when the current state is (x;p) and there are at most n burglaries to go. See equations (5.6) and (5.7) for the optimality equations. Since V 0 (x;p) is increasing in p, assume V n1 (x;p) is an increasing function in p. Now we want to show that V n (x;p) is also increasing in p. For a xed x, we dene V n1 (w;p) = 8 > < > : V n1 (w;p) ; if w6=x 0 ; if w =x Assume p 1 p 2 , thus V n (x;p 1 ) = max n x; Z V n1 (x +y;g(y;p 1 )) [p 1 q 1 f 1 (y) + (1p 1 )q 2 f 2 (y)]dy o = max n x; Z V n1 (x +y;g(y;p 1 )) [p 1 q 1 f 1 (y) + (1p 1 )q 2 f 2 (y)]dy o = max n x;E V n1 (x +Y 1 ;g(Y 1 ;p 1 )) o max n x;E V n1 (x +Y 1 ;g(Y 1 ;p 2 )) o where Y 1 is a random variable dened as equation (5.9) and the inequality holds by Lemma 9 and the induction hypothesis. Now, let Z(x;y) =V n1 (x +y;g(y;p 2 )) 79 By Proposition 17, V (x;p) is increasing in x. It can also be applied to the nite stage problem, i.e., V n (x;p) is increasing in x when at stage n. In addition, by Lemma 10 and the induction hypothesis, Z(x;y) is increasing in y. We have V n (x;p 1 ) maxfx;E [Z(x;Y 1 )]g maxfx;E [Z(x;Y 2 )]g = V n (x;p 2 ) where Y 2 is a random variable dened as equation (5.10). The second inequality holds by Lemma 11. Thus, V n (x;p) is an increasing function in p. When n!1, V n (x;p)!V (x;p). Hence V (x;p) is also an increasing function in p. Note that we are unsure if the assumption of a likelihood ratio order in Proposition 18 can be weakened to either assuming a usual stochastic order or a hazard rate order. 5.2.2 Structure of Optimal Policy In this section, we rst show that, for a given p, if it is optimal to retire when the accumulated loot is x, then it is also optimal to retire whenever the accumulated loot is greater than x. We then show some analytical results of the optimal policy structure. We start with the following lemma. Lemma 12 For xed p, V (x;p)x is nonincreasing in x. Proof. 
Consider the nite stage problem and letV n (x;p) be the maximum expected return when the current state is (x;p) and there are at most n burglaries to go. See equations (5.6) and (5.7) for the optimality equations. It is obvious that V 0 (x;p)x is nonincreasing in x, so assume the same for V n1 (x;p)x. Now, 80 V n (x;p)x = max n 0; Z V n1 (x +y;g(y;p)) [pq 1 f 1 (y) + (1p)q 2 f 2 (y)]dyx o = max n 0; Z [V n1 (x +y;g(y;p)) (x +y)] [pq 1 f 1 (y) + (1p)q 2 f 2 (y)]dy x + Z (x +y) [pq 1 f 1 (y) + (1p)q 2 f 2 (y)]dy o = max n 0; Z [V n1 (x +y;g(y;p)) (x +y)] [pq 1 f 1 (y) + (1p)q 2 f 2 (y)]dy +pq 1 1 + (1p)q 2 2 (1pq 1 (1p)q 2 )x o and the result follows from the induction hypothesis. When n!1, V n (x;p)! V (x;p). Hence for xed p, V (x;p)x is nonincreasing in x. Proposition 19 For xed p, there is a value x(p) such that it is optimal to retire if the current accumulated loot is x and xx(p). Proof. For xedp, assume it is optimal to retire when the accumulated loot isx, i.e., V (x;p) =x. It follows from Lemma 12 that for any x 0 x, we have V (x 0 ;p)x 0 V (x;p)x = 0. Thus, V (x 0 ;p) = x 0 , that is, it is also optimal to retire at x 0 . Let x(p) denotes the smallest accumulated loot, when the current probability of the case (q 1 ;f 1 ) is p, at which it is optimal to retire. That is, x(p) = minfx :V (x;p)x = 0g 81 We want to nd a condition for x(p) to be monotone in p. It is intuitive to think that if the case (q 1 ;f 1 ) is better than the case (q 2 ;f 2 ), then the larger the probability p, the larger the threshold value x(p). Proposition 20 If X 1 lr X 2 and q 1 q 2 , then x(p) is increasing in p. Proof. Assume X 1 lr X 2 and q 1 q 2 . From Proposition 18, we know that V (x;p) is increasing in p. Assume p 1 p 2 and we want to show that x(p 1 ) x(p 2 ). If x(p 1 )<x(p 2 ), we have 0 = V (x(p 1 );p 1 )x(p 1 ) by denition of x(p) V (x(p 1 );p 2 )x(p 1 ) by assumption and Proposition 18 V (x(p 2 );p 2 )x(p 2 ) by Lemma 12 = 0 by denition of x(p) It follows V (x(p 1 );p 2 )x(p 1 ) = 0. Since x(p 1 ) < x(p 2 ), x(p 2 ) is not the smallest value of x satisfying V (x;p 2 )x = 0. Contradiction. Hence x(p 1 )x(p 2 ). From the preceding proof and Proposition 18, we can also see that a sucient condition for x(p) to be increasing in p is V (x;p) to be increasing in p. Corollary 3 If V (x;p) is increasing in p, then x(p) is increasing in p. Now we are going to show the structure of the optimal policy. Let 1 = q 1 (1q 1 ) 1 and 2 = q 2 (1q 2 ) 2 . If (q 1 ;f 1 ) is the real case of the probability of a successful burglary and the loot distribution, we know that it is optimal to retire at which the accumulated loot is at least 1 . If (q 2 ;f 2 ) is the real case, it is optimal to retire at which the accumulated loot is at least 2 . Assume 1 2 . 82 Proposition 21 It is optimal to retire if the accumulated loot is greater than or equal to 1 . Proof. If the burglar is to be told which case is the true one and his/her accumulated loot is at least 1 , no matter which case it is, it is optimal for him to retire. If he/she is not to be told which case is the real one, the expected fortune must be no larger than that when he/she is to be told. Thus, if his/her accumulated loot is greater than or equal to 1 , he/she should retire. Consider the case that he/she chooses to go exactly one more stage (one more burglary) and then stop (retire). At state (x;p), the expected return is R (x + y) [pq 1 f 1 (y) + (1p)q 2 f 2 (y)]dy =pq 1 x + (1p)q 2 x +pq 1 1 + (1p)q 2 2 . 
The policy under which the burglar retires if and only if x ≥ p q_1 x + (1−p) q_2 x + p q_1 μ_1 + (1−p) q_2 μ_2 is called the one-stage look-ahead policy; equivalently, it retires if and only if

x ≥ [ p q_1 μ_1 + (1−p) q_2 μ_2 ] / [ 1 − p q_1 − (1−p) q_2 ].

Let

β_{1sla}(p) = [ p q_1 μ_1 + (1−p) q_2 μ_2 ] / [ 1 − p q_1 − (1−p) q_2 ].   (5.11)

The following proposition shows the relationship between the optimal policy and the one-stage look-ahead policy.

Proposition 22. When in state (x, p), the burglar should continue if x < β_{1sla}(p).

Proof. When in state (x, p) with x < β_{1sla}(p), retiring now with total earnings x is no better than attempting one more burglary and retiring at the next stage. Thus, for the optimal result, the burglar would continue.

Proposition 23. For any p, β_2 ≤ β_{1sla}(p) ≤ β_1.

Proof. If the accumulated loot x satisfies x < β_2, then x < β_1 as well; equivalently, x < q_1(x + μ_1) and x < q_2(x + μ_2). Hence x < p q_1(x + μ_1) + (1−p) q_2(x + μ_2) for any p, which rearranges to x < β_{1sla}(p). Since this holds for every x < β_2, we have β_2 ≤ β_{1sla}(p). Similarly, if x ≥ β_1, then x ≥ β_2 as well, so x ≥ q_1(x + μ_1) and x ≥ q_2(x + μ_2); hence x ≥ p q_1(x + μ_1) + (1−p) q_2(x + μ_2), i.e., x ≥ β_{1sla}(p), which implies β_{1sla}(p) ≤ β_1.

Corollary 4. It is optimal to continue if the accumulated loot is less than β_2.

Proof. By Propositions 22 and 23, the burglar should continue.

Corollary 5. If β_1 = β_2, it is optimal to retire if and only if the accumulated loot is greater than or equal to β_1.

Proof. It follows directly from Proposition 21 and Corollary 4.

Remark 1. If β_1 = β_2, then β_{1sla}(p) = β_1 = β_2 for all p.

5.2.3 Dynamic Heuristic Policies

In this subsection we propose two dynamic heuristic methods to approximate the optimal policy. The expected returns of the heuristic methods are obtained by simulation; the detailed simulation procedures are described in Section 5.2.6, and algorithms for each policy can be found in Appendix A.

One-Stage Look-ahead Policy

From Proposition 22, the burglar should continue whenever the one-stage look-ahead policy asks him to continue. We propose the one-stage look-ahead policy, called policy π_{1sla}, as our first heuristic solution: the burglar retires if and only if x ≥ β_{1sla}(p). From Proposition 23, the one-stage look-ahead policy also satisfies the optimality condition of Proposition 21.

Policy π_mix

We know that β_1 and β_2 are the optimal thresholds when the probability of a successful burglary and the loot distribution are (q_1, f_1) and (q_2, f_2), respectively. It is intuitive to expect that a mixture of the two threshold values might be a good threshold for our problem. If the probability of the case (q_1, f_1) is p, let

β_p = p β_1 + (1−p) β_2 = p q_1 μ_1 / (1−q_1) + (1−p) q_2 μ_2 / (1−q_2).

From Propositions 21 and 22 we know that, when in state (x, p), it is optimal to continue if x < β_{1sla}(p) and to retire if x ≥ β_1. Thus our second heuristic policy, called policy π_mix, is: when in state (x, p), the burglar continues if x < β_{1sla}(p), retires if x ≥ β_1, and otherwise retires if and only if x ≥ β_p.

Remark 2. If β_1 = β_2, then β_{1sla}(p) = β_p = β_1 = β_2 for all p; that is, the one-stage look-ahead policy and policy π_mix coincide.

5.2.4 Static Heuristic Policies

In the previous subsection we presented dynamic heuristic policies that take the updated probabilities into account. In this subsection we present some static heuristic policies for which the decision is made using only the initial probabilities.

Best Constant Threshold Value

We first present a static heuristic policy that calls for retiring when the accumulated loot is greater than or equal to a constant value y, where y depends on the initial probabilities. We aim to pick y to maximize the expected return, and the expected return is obtained by conditioning on which case is the true one, using the initial probabilities.
Consider a policy y that the burglar retires when the accumulated loot is greater than y. Let W y (p) be the expected return from this policy when the probability of case (q 1 ;f 1 ) is p. Let R 1 (y) be the expected return from the policy y if the case is known to be (q 1 ;f 1 ) and R 2 (y) be that if the case is known to be (q 2 ;f 2 ). Thus, W y (p) =pR 1 (y) + (1p)R 2 (y) Let W (p) be the maximal value among all W y (p), i.e., W (p) = max y W y (p). If the initial probability of case (q 1 ;f 1 ) isp 0 , lety be the value thatW (p 0 ) =W y (p 0 ). We use policy y as one of our static heuristic policies. Best Number of Burglary Attempts Let us consider a policy that the burglar will conduct n burglaries and then retire. Let the expected return of this policy be R n (p) when the probability of case (q 1 ;f 1 ) is p. We have R n (p) =pq n 1 nE[X 1 ] + (1p)q n 2 nE[X 2 ]; where X 1 f 1 and X 2 f 2 . Let R (p) be the maximal value among all R n (p). Let n be the optimal number of burglary attempts when the initial probability of case (q 1 ;f 1 ) is p 0 , i.e., R (p 0 ) = R n (p 0 ). The second static heuristic policy is that the burglar retires after the n th burglaries. Note that n does not depend on the form of the distributions f 1 and f 2 , but only on the mean values of f 1 and f 2 . When n becomes large enough,R n (p) decreases inn. Thus, we can just enumeraten and nd the optimal n . 86 5.2.5 Upper Bound of Optimal Expected Return To estimate how good the performances of our heuristic policies are, we need to compare the expected returns of our heuristic policies with some benchmark value. We consider an upper bound of the optimal expected return as a benchmark. In this section, we present approaches to obtain some upper bounds of the optimal expected return. The most intuitive upper bound of the optimal expected returns can be obtained by acting as if we know the true case at the beginning and then use the optimal policy of the known one-distribution problem. Let V (x; 1) = V f 1 (x) and V (x; 0) = V f 2 (x). When we are to be told which one is the true distribution at the beginning, if the initial probability of case (q 1 ;f 1 ) isp 0 , the best we can do isp 0 V f 1 (0) + (1p 0 )V f 2 (0). Letp 0 V f 1 (0)+(1p 0 )V f 2 (0) =V 0 (p 0 ). Thus,V 0 (p 0 ) is an upper bound of the optimal expected return and it can be computed with the methods in Section 5.1.2. However, acting as if we know the true case gives us too much information. We want to nd a tighter upper bound than V 0 (p). From Propositions 21 and 22, when at state (x;p), we know that it is optimal for the burglar to continue if x< 1sla (p) and to retire if x 1 . If we use the optimal policy when x < 1sla (p) (continue) and x 1 (retire) and then use some value greater than V (x;p) as return when 1sla (p) x < 1 , we can obtain an upper bound of the optimal expected return. Now we explain how to obtain the value. If 1sla (p) x < 1 , consider a condition that the burglar is to be told which one is the true case now (instead of knowing the true case at the beginning). The expected return with this condition must be at least as large as when this distribution information is not given, i.e., at leastV (x;p). Sincex 1sla (p), thenx 2 . Thus, 87 the best we can do with the additional information ispV f 1 (x)+(1p)x. Therefore, to obtain the upper bound of the optimal expected return, we use a policy called policy UB : we rst use the one-stage look-ahead policy as long as it tells the burglar to continue. 
Then when the rst time the one-stage look-ahead policy tells the burglar to retire, if at state (x;p), we set the terminal reward aspV f 1 (x) + (1p)x and stop. (Note that if x 1 , pV f 1 (x) + (1p)x = x, which also means to retire. Thus, we do not need to check whether x 1 ). V f 1 (x) can be obtained with the methods in Section 5.1.2 by simulation or computation. We use simulation to obtain the expected return of policy UB as an upper bound of the optimal expected return. See Appendix A for the algorithm. Next, we provide another approach to obtain an upper bound of the optimal expected return. Instead of nding the upper bound directly, we obtain it by con- ditioning on which one is the true distribution. Let policy be the optimal policy of the burglar problem. Let R be the net return and E [R] be the expected return under the optimal policy. DeneE [Rj(q 1 ;f 1 )] andE [Rj(q 2 ;f 2 )] to be the expected returns under policy given that the true case is (q 1 ;f 1 ) and (q 2 ;f 2 ), respectively. That is, E [R] =pE [Rj(q 1 ;f 1 )] + (1p)E [Rj(q 2 ;f 2 )]; where p is the initial probability of the case (q 1 ;f 1 ). We rst obtain the bounds of E [Rj(q 1 ;f 1 )] and E [Rj(q 2 ;f 2 )] separately and then compute an upper bound of E [R] by conditioning. In this case, simulations are also conducted to obtain the upper bounds. We now explain the approach to obtain the bounds. We start with the upper bound of E [Rj(q 2 ;f 2 )]. The upper bound is obtained by acting as if we know the true distribution but with some constraints on the policy. 88 From Proposition 23, we know that 2 1sla (p). Thus, when the true case is known to be (q 2 ;f 2 ), it is optimal to retire atx ifx 1sla (p) (sincex is also 2 ). Also, we know that, when in state (x;p), policy tells us to continue if x< 1sla (p). Hence, given that (q 2 ;f 2 ) is the true case, retiring at x if and only if x 1sla (p) results in a better expected return than the policy . In other words, E 1sla [Rj(q 2 ;f 2 )], the expected return of the policy 1sla given that (q 2 ;f 2 ) is the true case, is an upper bound of E [Rj(q 2 ;f 2 )]. Now we are going to nd the upper bound of E [Rj(q 1 ;f 1 )]. Assume random variables X 1 and X 2 have the probability density functions f 1 and f 2 , respectively. We consider a condition that the burglar will be told which is the true case if he/she chooses to continue. Let (x;p) be the maximum expected return, at state (x;p), if the burglar continues and will then be told which is the true case. Thus, at state (x;p), the best he/she can do in this condition is maxfx;(x;p)g, where (x;p) =pq 1 E[V f 1 (x +X 1 )] + (1p)q 2 E[V f 2 (x +X 2 )] (5.12) The expected return when he/she is to be given additional information after one more period must be at least as large as it would be when this information is not to be given, so(x;p)H(x;p). Since it is optimal to retire ifxH(x;p), policy also tells the burglar to retire if x(x;p). Let policy be the policy that, when in state (x;p), the burglar continues if x < 1sla (p), retires if x 1 , otherwise he/she retires if and only if x (x;p). Dene E [Rj(q 1 ;f 1 )] to be the expected returns under policy given that the true case is (q 1 ;f 1 ). Now we want to check whether E [Rj(q 1 ;f 1 )] is an upper bound of 89 E [Rj(q 1 ;f 1 )]. If x 1 and given that (q 1 ;f 1 ) is the true case, when in state (x;p), since both policies and tell you to retire, the returns are the same. If the current accumulated loot is 1 , it is optimal to retire and the best the burglar can earn is 1 . 
Thus, ( 1 ;p) 1 . We can also easily see that (x;p) is increasing inx. So ifx< 1 , then(x;p) is no more than 1 , i.e.,H(x;p)(x;p) 1 . From Proposition 16, we know that if the distribution is known with the optimal threshold value , for y 2 y 1 , the expected return using the policy with threshold value y 1 is larger than that with y 2 . However, here (x;p) and H(x;p) are not constant values but rather functions of x and p. We cannot be sure that given that (q 1 ;f 1 ) is the true case and x < 1 , policy results in a better expected return than the policy . Thus, E [Rj(q 1 ;f 1 )] might not be an upper bound of E [Rj(q 1 ;f 1 )] and we will leave this as a conjecture for now. Conjecture 1 E [Rj(q 1 ;f 1 )]E [Rj(q 1 ;f 1 )]. Later in the numerical study section, we use E [Rj(q 1 ;f 1 )] as a conjectured upper bound ofE [Rj(q 1 ;f 1 )], along withE 1sla [Rj(q 2 ;f 2 )] to compute a conjectured upper bound of E [R]. We call this conjectured upper bound UB Conj . 5.2.6 Numerical Study In this section, we present our numerical study. We rst introduce the methodology and the parameter settings used in the numerical experiments. We then present the results and a detailed analysis of the results. Experiments Let R be the return. To estimate the expected return with a policy , called E [R], we use simulation with stratied sampling [25]. Let random variable X be the loot 90 of a burglary andp be the original probability of the case (q 1 ;f 1 ). Note thatE [R] = pE [Rj(q 1 ;f 1 )] + (1p)E [Rj(q 2 ;f 2 )]. Assume we want to generate n replications of R under a certain policy, we will do np replications on the simulations conditional on X f 1 and n(1p) replications on the simulations conditional on X f 2 . If we let R 1 be the average of thenp observed values ofR generated conditional on the case (q 1 ;f 1 ) and R 2 be the average of the n(1p) observed values of R generated conditional on the case (q 2 ;f 2 ), then the stratied sampling estimator of E [R] is p R 1 + (1p) R 2 . We use simulation with the stratied sampling technique to obtain the expected returns of the heuristic methods from dierent scenarios. To show the performance of our heuristic methods, we compare the results of the heuristics with corresponding upper bounds of the optimal expected returns. Some of the upper bounds are also obtained by simulation. To compare the dierences between the heuristic policies and the upper bounds, we use common random numbers when conducting the simulations. That is, we use the same set of random numbers for the simulations of dierent policies. For each policy, we run n = 2 10 5 replications to estimate the expected return. See Appendix A for the simulation algorithms of the heuristics and upper bounds. In our numerical experiments, we rst assume that f 1 andf 2 are exponential dis- tributions, wheref 1 has mean 1= 1 andf 2 has mean 1= 2 . We experiment on several scenarios with various parameter sets. See Table 5.1 for the parameter settings. We use (q 1 ; 1 ) and (q 2 ; 2 ) to represent the two possible combinations of the probability of a successful burglary and the density of each loot. For (q 1 ; 1 ), we test three dier- ent sets: (0.2, 0.05), (0.5, 0.05), and (0.8, 0.05). In each scenario, we x (q 1 ; 1 ) as one of the three sets and change (q 2 ; 2 ) from q 2 = 0:1; 0:2;:::; 0:9 and 2 = 0:2; 0:1; 0:05. 91 Thus, f 1 has mean 20 and f 2 has mean 5, 10, or 20. The threshold values 1 and 2 are shown in the table for reference. Note that 1 is not necessary greater than or equal to 2 here. 
We adapt the analytical results in Section 5.2.2 and conduct the experiments accordingly. Let p 0 be the initial probability of case (q 1 ;f 1 ), and we use p 0 = 0:5 for each scenario. Thus, for each scenario, we run np 0 = 10 5 replications on the simulation whenXf 1 andn(1p 0 ) = 10 5 replications on the simulation when Xf 2 . Table 5.1: Parameter settings for the Bayesian burglar problem - Exponential distri- bution q 1 1 1 q 2 2 2 q 2 2 2 q 2 2 2 0.2 0.05 5 0.1 0.2 0.56 0.1 0.1 1.11 0.1 0.05 2.22 0.5 0.05 20 0.2 0.2 1.25 0.2 0.1 2.5 0.2 0.05 5 0.8 0.05 80 0.3 0.2 2.14 0.3 0.1 4.29 0.3 0.05 8.57 0.4 0.2 3.33 0.4 0.1 6.67 0.4 0.05 13.33 0.5 0.2 5 0.5 0.1 10 0.5 0.05 20 0.6 0.2 7.5 0.6 0.1 15 0.6 0.05 30 0.7 0.2 11.67 0.7 0.1 23.33 0.7 0.05 46.67 0.8 0.2 20 0.8 0.1 40 0.8 0.05 80 0.9 0.2 45 0.9 0.1 90 0.9 0.05 180 Tables 5.2, 5.3, and 5.4 show the results for the scenarios of [ 1 = 0:05, 2 = 0:2], [ 1 = 0:05, 2 = 0:1], and [ 1 = 0:05, 2 = 0:05], respectively, with dierent q 1 and q 2 settings. We introduce the notation used in the tables as follows: 92 R 1sla : the expected return of the one-stage look-ahead policy R mix : the expected return of policy mix W (p 0 ) : the maximal expected return of all constant threshold policies with the initial probability p 0 W p 0 (p 0 ) : the expected return of the policy that retires when the accumulated loot is greater than p 0 1 + (1p 0 ) 2 with the initial probability p 0 R (p 0 ) : the maximal expected return among all policies that retire after a xed number of burglaries with the initial probability p 0 UB : the expected return of policy UB UB Conj : the conjectured upper bound obtained by acting as if we know the true distribution with some constraints on the optimal policy V 0 (p 0 ) : the value of p 0 V f 1 (0) + (1p 0 )V f 2 (0) The expected returns of the dynamic heuristic policies are obtained by simulation. As can be seen from the tables, the dierences between the values ofR 1sla andR mix are insignicant. It seems that the one-stage look-ahead policy performs slightly better than the policy mix in most scenarios. Also, the algorithm for the one-stage look-ahead policy has less complexity. Thus, we would prefer to use the one-stage look-ahead heuristic policy. Next, we compare our dynamic heuristic policies with the static heuristic policies. The results of the static heuristic policies stated in Section 5.2.4 are shown in the tables. The rst reference shown in the tables isW (p 0 ), which is the expected return of the best threshold (y ) policy. We also show the values of W p 0 (p 0 ) as reference, 93 Table 5.2: Results of the Bayesian burglar problem - Exponential Distribution: ex- pected returns of the heuristics, expected returns of some reference policies, and upper bounds of the optimal expected return. 
1 = 0:05, 2 = 0:2, and p 0 = 0:5 q 1 q 2 R 1sla R mix W (p 0 ) W p 0 (p 0 ) R (p 0 ) UB UB Conj V 0 (p 0 ) 0.2 0.1 2.268 2.268 2.275 2.274 2.250 2.281 2.268 2.298 0.2 0.2 2.541 2.541 2.539 2.534 2.500 2.555 2.542 2.559 0.2 0.3 2.831 2.831 2.829 2.823 2.750 2.839 2.831 2.841 0.2 0.4 3.165 3.165 3.160 3.158 3.000 3.168 3.165 3.164 0.2 0.5 3.590 3.590 3.563 3.563 3.250 3.590 3.590 3.563 0.2 0.6 4.119 4.119 4.097 4.091 3.500 4.121 4.119 4.105 0.2 0.7 4.895 4.895 4.893 4.861 3.750 4.908 4.898 4.944 0.2 0.8 6.388 6.366 6.217 6.217 4.160 6.447 6.408 6.540 0.2 0.9 10.763 10.734 10.318 9.844 8.717 11.023 10.910 11.195 0.5 0.1 6.145 6.141 6.103 5.974 5.250 6.149 6.148 6.317 0.5 0.2 6.208 6.191 6.196 6.156 5.500 6.260 6.251 6.577 0.5 0.3 6.514 6.500 6.402 6.401 5.750 6.641 6.615 6.859 0.5 0.4 6.774 6.772 6.745 6.736 6.000 6.908 6.859 7.183 0.5 0.5 7.249 7.249 7.221 7.198 6.250 7.362 7.292 7.582 0.5 0.6 7.927 7.927 7.880 7.855 6.800 8.003 7.943 8.123 0.5 0.7 8.933 8.933 8.862 8.850 7.450 8.961 8.944 8.962 0.5 0.8 10.570 10.570 10.559 10.559 8.200 10.570 10.570 10.559 0.5 0.9 15.021 14.998 14.673 14.634 9.218 15.115 15.068 15.213 0.8 0.1 17.986 17.986 17.973 16.119 16.385 17.986 17.986 18.225 0.8 0.2 17.984 17.984 17.973 16.161 16.400 17.984 17.984 18.485 0.8 0.3 18.101 18.101 17.973 16.222 16.465 18.102 18.101 18.767 0.8 0.4 18.051 18.031 17.974 16.324 16.640 18.098 18.086 19.090 0.8 0.5 18.188 18.147 17.980 16.514 17.009 18.390 18.374 19.489 0.8 0.6 18.782 18.696 18.018 16.906 17.680 19.104 19.090 20.031 0.8 0.7 19.808 19.795 18.272 17.789 18.785 20.172 20.164 20.870 0.8 0.8 21.821 21.821 19.972 19.960 20.480 22.040 22.015 22.466 0.8 0.9 26.994 26.994 26.373 26.366 23.765 27.035 27.031 27.121 94 Table 5.3: Results of the Bayesian burglar problem - Exponential Distribution: ex- pected returns of the heuristics, expected returns of some reference policies, and upper bounds of the optimal expected return. 
1 = 0:05, 2 = 0:1, and p 0 = 0:5 q 1 q 2 R 1sla R mix W (p) W p 0 (p 0 ) R (p 0 ) UB UB Conj V 0 (p 0 ) 0.2 0.1 2.557 2.556 2.536 2.536 2.500 2.563 2.557 2.550 0.2 0.2 3.068 3.068 3.063 3.063 3.000 3.072 3.067 3.070 0.2 0.3 3.643 3.643 3.634 3.634 3.500 3.644 3.643 3.634 0.2 0.4 4.289 4.288 4.278 4.277 4.000 4.290 4.289 4.281 0.2 0.5 5.060 5.060 5.052 5.044 4.500 5.067 5.060 5.079 0.2 0.6 6.082 6.086 6.033 6.033 5.000 6.108 6.086 6.163 0.2 0.7 7.664 7.640 7.564 7.468 5.700 7.741 7.687 7.840 0.2 0.8 10.465 10.395 10.306 10.017 8.256 10.679 10.567 11.033 0.2 0.9 19.049 19.109 18.623 17.101 17.434 19.739 19.672 20.342 0.5 0.1 6.385 6.388 6.341 6.265 5.500 6.419 6.402 6.568 0.5 0.2 6.801 6.782 6.774 6.761 6.000 6.896 6.832 7.089 0.5 0.3 7.330 7.324 7.351 7.351 6.500 7.455 7.370 7.653 0.5 0.4 8.085 8.080 8.070 8.068 7.000 8.196 8.112 8.300 0.5 0.5 8.980 8.980 8.969 8.966 7.500 9.044 8.991 9.098 0.5 0.6 10.188 10.188 10.151 10.150 8.600 10.203 10.190 10.181 0.5 0.7 11.859 11.863 11.847 11.846 9.900 11.863 11.859 11.859 0.5 0.8 14.875 14.855 14.700 14.686 11.430 14.993 14.918 15.052 0.5 0.9 22.888 22.678 21.805 21.617 17.610 23.672 23.541 24.361 0.8 0.1 18.015 18.012 17.977 16.212 16.386 18.015 18.012 18.476 0.8 0.2 18.019 18.014 17.988 16.408 16.416 18.020 18.019 18.997 0.8 0.3 18.077 18.066 18.026 16.718 16.546 18.081 18.076 19.561 0.8 0.4 18.124 18.141 18.139 17.217 16.896 18.145 18.169 20.208 0.8 0.5 18.608 18.523 18.484 18.028 17.634 18.833 18.833 21.006 0.8 0.6 19.762 19.646 19.474 19.371 18.976 20.362 20.373 22.089 0.8 0.7 22.052 22.013 21.684 21.681 21.186 22.678 22.689 23.767 0.8 0.8 26.306 26.306 25.998 25.995 24.576 26.597 26.606 26.960 0.8 0.9 36.227 36.210 36.223 36.223 31.672 36.239 36.219 36.269 95 Table 5.4: Results of the Bayesian burglar problem - Exponential Distribution: ex- pected returns of the heuristics, expected returns of some reference policies, and upper bounds of the optimal expected return. 
1 = 0:05, 2 = 0:05, and p 0 = 0:5 q 1 q 2 R 1sla R mix W (p 0 ) W p 0 (p 0 ) R (p 0 ) UB UB Conj V 0 (p 0 ) 0.2 0.1 3.080 3.081 3.047 3.047 3.000 3.082 3.080 3.052 0.2 0.2 4.108 4.108 4.094 4.094 4.000 4.108 4.108 4.094 0.2 0.3 5.221 5.222 5.211 5.210 5.000 5.226 5.222 5.222 0.2 0.4 6.503 6.506 6.456 6.452 6.000 6.524 6.505 6.516 0.2 0.5 7.969 7.949 7.935 7.916 7.000 8.024 7.976 8.112 0.2 0.6 9.877 9.849 9.858 9.790 8.000 9.990 9.903 10.279 0.2 0.7 12.893 12.833 12.750 12.519 10.600 13.098 12.988 13.634 0.2 0.8 18.467 18.417 18.429 17.486 16.448 18.798 18.834 20.020 0.2 0.9 36.407 36.552 36.606 32.157 34.868 36.809 36.842 38.638 0.5 0.1 6.896 6.893 6.909 6.835 6.000 6.928 6.905 7.071 0.5 0.2 7.947 7.938 7.935 7.916 7.000 8.001 7.958 8.112 0.5 0.3 9.106 9.108 9.120 9.116 8.000 9.154 9.116 9.240 0.5 0.4 10.480 10.484 10.491 10.491 9.000 10.499 10.485 10.534 0.5 0.5 12.172 12.172 12.131 12.131 10.000 12.172 12.172 12.131 0.5 0.6 14.206 14.209 14.210 14.210 12.200 14.246 14.209 14.297 0.5 0.7 17.054 17.053 17.120 17.117 14.800 17.272 17.168 17.652 0.5 0.8 22.132 21.920 22.050 21.997 19.110 22.798 22.810 24.038 0.5 0.9 37.458 37.118 37.231 35.215 35.044 38.741 38.767 42.657 0.8 0.1 18.238 18.187 18.117 16.685 16.388 18.347 18.312 18.979 0.8 0.2 18.609 18.529 18.429 17.486 16.448 18.938 18.949 20.020 0.8 0.3 19.319 19.147 19.060 18.560 16.708 19.842 19.810 21.148 0.8 0.4 20.365 20.110 20.208 20.010 17.408 20.999 20.971 22.442 0.8 0.5 22.129 21.873 22.050 21.997 19.110 22.793 22.818 24.038 0.8 0.6 24.850 24.776 24.806 24.798 21.840 25.397 25.421 26.205 0.8 0.7 28.882 28.846 28.974 28.974 25.988 29.145 29.063 29.560 0.8 0.8 36.098 36.098 35.946 35.946 32.768 36.098 36.098 35.946 0.8 0.9 51.612 51.363 51.594 51.590 48.161 52.827 52.879 54.564 96 where p 0 = p 0 1 + (1p 0 ) 2 . (See Appendix B for the calculation of W y (p) and W (p) with Exponential distribution). It represents the expected return of the static policy that using p 0 as threshold value. Since W (p 0 ) uses the best threshold value, W p 0 (p 0 ) W (p 0 ). As can be seen from the tables, the larger the gap between 1 and 2 , the larger the dierence between W p 0 (p 0 ) and W (p 0 ). The expected return of retiring after the best number of burglary attempts, R (p 0 ), is also shown in the tables. The weakness of these static policies is that we do not use Bayesian updating, i.e., the expected returns do not depends on the posterior probability p. We believe that the dynamic policies are better than the static ones. Our numerical results show thatR 1sla andR mix are larger thanR (p 0 ) for all scenarios. Also, in most scenarios, R 1sla and R mix are larger than W (p 0 ). Even if for those few scenarios that W (p 0 ) are larger, the dierences are very small (less than 1% percentage dierence). Thus, generally speaking, the dynamic heuristic policies are no worse than the static ones. (However, we cannot conclude from our numerical results that the dynamic policies are better than the static ones). We may not always have smaller expected returns from the non-Bayesian updating policies, especially for the extreme cases that one of the q 1 or q 2 is small or large. When the two possible combinations of the probability of a successful burglary and the density of each loot are very dierent from each other, e.g., largeq 1 with a large-meanf 1 versus smallq 2 with a small-meanf 2 , our dynamic heuristics have signicant better performance. We also present the upper bounds of the optimal expected returns in Table 5.2, 5.3, and 5.4. 
We also present the upper bounds of the optimal expected returns in Tables 5.2, 5.3, and 5.4. The first upper bound shown in the tables, called UB, is obtained by returning $pV_{f_1}(x) + (1-p)x$ if the accumulated loot $x$ is larger than $\nu_{1sla}(p)$ when at state $(x,p)$. We use simulation to obtain UB (see Appendix A for the algorithm). Next to UB is a conjectured upper bound of the optimal expected return obtained by conditioning on which is the true distribution, called UB_Conj. The values of UB_Conj are also obtained by simulation. We use the expected return given $(q_1,f_1)$ of a reference policy as a conjectured upper bound of the optimal expected return given $(q_1,f_1)$, and the expected return of the one-stage look-ahead policy given $(q_2,f_2)$ as an upper bound of the optimal expected return given $(q_2,f_2)$. For the simulation of the upper bound given $(q_1,f_1)$, we need to know the value of $\nu(x,p)$ when at state $(x,p)$; see Appendix B for the $\nu(x,p)$ function of the problem with exponential distributions. From our numerical results, UB_Conj is smaller than UB for most scenarios. We believe that UB_Conj provides a tighter upper bound than UB.

We also show the values of $V_0(p_0)$ as references, where $V_0(p_0) = p_0 V_{f_1}(0) + (1-p_0)V_{f_2}(0)$, i.e., the best we can do if we are told the true distribution at the beginning. These are the larger upper bounds described in Section 5.2.5 and can be obtained with the methods described in Section 5.1.2. It is intuitive that $V_0(p_0)$ is larger than our UB and UB_Conj because $V_0(p_0)$ is based on more information. For some scenarios we obtain a smaller $V_0(p_0)$ than UB and UB_Conj; we believe the discrepancy is due to simulation error. The differences are insignificant (the percentage differences are less than 0.1%), so they are negligible.

As can be seen from the tables, our heuristics are very close to the upper bounds of the optimal expected returns (the percentage difference in the expected return is generally less than 1%). In other words, they are even closer to the optimal expected returns, and thus we can conclude that our heuristics perform very well.

Summary

We summarize the numerical study as follows:
- We prefer the one-stage look-ahead policy to the policy $\pi_{mix}$.
- The larger the gap between $\lambda_1$ and $\lambda_2$, the larger the difference between $W_{\bar\nu_{p_0}}(p_0)$ and $W^*(p_0)$.
- The dynamic heuristic policies are generally no worse than the static ones.
- We believe that $V_0(p_0) \ge \mathrm{UB} \ge \mathrm{UB_{Conj}}$.

Chapter 6
Conclusions

In this dissertation, we present some optimal stopping problems, in which the decision maker has to choose a time to stop based on sequentially observed random variables. We address the problems under the uncertainty assumption that the random variables come from one of several possible distributions, with given initial probabilities as to which is the true distribution. The probabilities are then updated in a Bayesian manner as the successive random variables are observed.

We first introduce the Bayesian selling problem, an asset-selling problem in which the offers come from one of two possible probability density functions. We have presented two models for the case in which no recall of past offers is allowed. In the first model, after receiving an offer, we have to decide whether to accept the offer or to reject it. We show the structure of the optimal policy and prove the monotonicity of the value function when the two distributions are likelihood-ratio ordered. We then propose some heuristic policies to approximate the optimal expected return. Some upper bounds of the optimal expected return are obtained to test the performance of the heuristic policies. Simulations are conducted to obtain the expected returns of some heuristics and the upper bounds.
The numerical study shows that using a mixture of the threshold values at state $p$, namely $p\nu_1 + (1-p)\nu_2$, is a good and efficient heuristic threshold. The upper bound obtained by acting as if we knew the true distribution, subject to some constraints on the optimal policy, gives us the tightest bound on the optimal expected return. Comparison with this upper bound shows that the expected returns of the heuristics are very close to the optimal expected return. We also test and verify several observations from the numerical study: a heuristic policy performs better if it satisfies the optimality conditions; applying the policy improvement technique gives better results than not applying it; and the dynamic heuristic policies are generally better than the static ones.

We also present a finite-stage version of the Bayesian selling problem, in which we can compute the optimal expected return numerically. We use the optimal value to evaluate the performance of the heuristic policies, and the results show that our heuristic policy performs well. We extend the problem to a generalized Bayesian selling problem, in which there are n possible distributions. We verify the analytical results from the two-distribution problem and show that the structure of the optimal policy remains valid for the generalized n-distribution problem.

In the second model of the Bayesian selling problem without recall, we present an application to eBay-like auctions, in which the reserve price is decided before offers come in. The optimal expected return is obtained by linear programming and compared with the upper bound of the optimal expected return from the first model. The numerical study shows that assigning the reserve price before offers come in is an efficient method.

We then study the Bayesian selling problem with recall allowed. The characteristics of the optimal policy and the value function are studied. We present some heuristic policies and obtain an upper bound of the optimal expected return to evaluate the heuristics. The numerical study shows that the one-stage look-ahead policy is a good solution.

We study another optimal stopping problem called the burglar problem, in which a burglar who plans a series of burglaries has to decide when to retire so as to maximize the expected accumulated loot before being caught. We address the problem under the uncertainty assumption that we do not know the probability of a successful burglary and the loot distribution, but we have the information that they come from one of two given combinations with initial probabilities. We call such a problem the Bayesian burglar problem. We prove structural results for the optimal policy and propose some dynamic and static heuristic policies. We show how to obtain an upper bound of the optimal expected return and evaluate the performance of the heuristics. The numerical study suggests that the one-stage look-ahead policy is a good and efficient policy and that the dynamic heuristic policies are generally no worse than the static ones.

This dissertation can be extended in several future research directions. So far we have dealt only with the asset-selling problem in which there is a single item to sell. We may also be interested in the multiple-selling problem, for example with k items to sell. Various problems with multiple items to sell have already appeared in the literature. Bruss and Ferguson [3] consider the problem in which the offers are k-dimensional random vectors having a known distribution.
Lippman, Ross, and Seshadri [19] consider the problem in which, once an offer is made, it is either rejected or marked for acceptance, and the items are sold at a price equal to the minimum of the k marked offers. We want to extend our model to a problem with multiple assets to sell. In this case, the problem under offer-distribution uncertainty is no longer to choose the best offer but the best k offers.

Another possible direction for future research is to consider problems with different objectives. For example, we are also interested in applying our uncertainty assumption to another asset-selling problem: consider a finite problem in which there are at most N offers and N is known. Instead of maximizing the expected return, we are interested in maximizing the probability of selecting the best offer. (This can be viewed as a variant of the classical secretary problem, in which the objective is to maximize the probability of selecting the best applicant.) Assume the offer distribution is fixed but unknown, coming from one of several possible distributions.

In the eBay problem, we apply the ideas of the asset-selling problem to an online bidding mechanism. We solve the problem mathematically and find the optimal reserve price. We are also interested in validating the model against a real-life eBay bidding system. In the eBay mechanism, the cost of observing the next offer is no longer a fixed amount but depends on what the reserve price is. We may need to adjust our model according to this feature.

Bibliography

[1] S. C. Albright, A Bayesian approach to a generalized house selling problem, Management Science 24 (1977), no. 4.
[2] M. Babaioff, N. Immorlica, D. Kempe, and R. Kleinberg, Online auctions and generalized secretary problems, ACM SIGecom Exchanges 7 (2008), no. 2.
[3] F. T. Bruss and T. S. Ferguson, Multiple buying or selling with vector offers, Journal of Applied Probability 34 (1997), no. 4, 959-973.
[4] Y. S. Chow and H. Robbins, A martingale system theorem and applications, Fourth Berkeley Symposium on Mathematical Statistics and Probability 1 (1960), 93-104.
[5] Y. S. Chow and H. Robbins, On optimal stopping rules, Z. Wahrscheinlichkeitstheorie verw. Gebiete 2 (1963), 33-49.
[6] Y. S. Chow, H. Robbins, and D. Siegmund, Great expectations: The theory of optimal stopping, Houghton Mifflin, 1971.
[7] M. H. DeGroot, Some problems of optimal stopping, Journal of the Royal Statistical Society 30 (1968), no. 1, 108-122.
[8] M. H. DeGroot, Optimal statistical decisions, McGraw-Hill, 1970.
[9] C. Derman, G. J. Lieberman, and S. M. Ross, A sequential stochastic assignment problem, Management Science 18 (1972), no. 7.
[10] L. E. Dubins and H. Teicher, Optimal stopping when the future is discounted, The Annals of Mathematical Statistics 38 (1967), no. 2, 601-605.
[11] E. B. Dynkin, The optimum choice of the instant for stopping a Markov process, Soviet Mathematics Doklady 4 (1963), no. 3, 627-629.
[12] A. Eshragh and M. Modarres, A new approach to distribution fitting: Decision on beliefs, Journal of Industrial and Systems Engineering 3 (2009), no. 1, 56-71.
[13] T. S. Ferguson, Who solved the secretary problem?, Statistical Science 4 (1989), no. 3, 282-289.
[14] T. S. Ferguson, Optimal stopping and applications, electronic text, http://www.math.ucla.edu/~tom/stopping/contents.html, 2000.
[15] J. Gallien, Dynamic mechanism design for online commerce, Operations Research 54 (2006), no. 2, 291-310.
[16] G. W. Haggstrom, Optimal sequential procedures when more than one stop is required, The Annals of Mathematical Statistics 38 (1967), no. 6, 1618-1626.
[17] R. Kleinberg, A multiple-choice secretary algorithm with applications to online auctions, Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms (2005), 630-631.
[18] S. A. Lippman and J. J. McCall, The economics of job search: A survey, Economic Inquiry 14 (1976), no. 2, 155-189.
[19] S. A. Lippman, S. M. Ross, and S. Seshadri, A weakest link marked stopping problem, Journal of Applied Probability 44 (2007), no. 4, 843-851.
[20] J. MacQueen and R. G. Miller, Optimal persistence policies, Operations Research 8 (1960), no. 3, 362-380.
[21] A. T. Martinsek, Approximations to optimal stopping rules for exponential random variables, The Annals of Probability 12 (1984), no. 3, 876-881.
[22] D. B. Rosenfield and R. D. Shapiro, Optimal adaptive price search, Journal of Economic Theory 25 (1981), no. 1, 1-20.
[23] D. B. Rosenfield, R. D. Shapiro, and D. A. Butler, Optimal strategies for selling an asset, Management Science 29 (1983), no. 9, 1051-1061.
[24] S. M. Ross, Introduction to stochastic dynamic programming, Academic Press, 1983.
[25] S. M. Ross, Simulation, Academic Press, 2006.
[26] S. M. Ross and E. A. Pekoz, A second course in probability, Pekozbooks, 2007.
[27] M. Rothschild, Searching for the lowest price when the distribution of prices is unknown, The Journal of Political Economy 82 (1974), no. 4.
[28] M. Sakaguchi, Dynamic programming of some sequential sampling design, Journal of Mathematical Analysis and Applications 2 (1961), 446-466.
[29] A. Seierstad, Reservation prices in optimal stopping, Operations Research 40 (1992), no. 2.
[30] A. N. Shiryaev, Optimal stopping rules, Springer, New York, 1978.
[31] D. O. Siegmund, Some problems in the theory of optimal stopping rules, The Annals of Mathematical Statistics 38 (1967), no. 6, 1627-1640.
[32] J. L. Snell, Applications of martingale system theorems, Transactions of the American Mathematical Society 73 (1952), no. 2, 293-312.
[33] A. Wald, Sequential analysis, John Wiley & Sons, New York, 1947.

Appendix A
Simulation Algorithms

In this appendix, we present the simulation algorithms for the heuristic policies and the upper bounds of the optimal expected returns. Assume $\nu_1 \ge \nu_2$.

A.1 Asset-Selling Problem without Recall

Policy $\pi_{mix}$
1. Set $R = 0$.
2. Generate a random variable $X \sim f$, where $f$ is the true distribution, and set $R = R - C$.
3. Update the probability of $f_1$ from $p$ to the posterior probability $p f_1(X)/f_p(X)$.
4. If $X \ge p\nu_1 + (1-p)\nu_2$, return $R + X$ and stop. If $X < p\nu_1 + (1-p)\nu_2$, go to Step 2.

Policy $\pi'_{mix}$
1. Set $R = 0$.
2. Generate a random variable $Y \sim f$, where $f$ is the true distribution, and set $R = R - C$.
3. Update the probability of $f_1$ from $p$ to the posterior probability $p f_1(Y)/f_p(Y)$.
4. Let $X \sim f_p$. If $E_{f_p}[(X - Y)^+] \le C$, return $R + Y$ and stop. If $E_{f_p}[(X - Y)^+] > C$, go to Step 2.

Policy Obtained via Approximation of the Optimal Value
1. Set $R = 0$.
2. Generate a random variable $X \sim f$, where $f$ is the true distribution, and set $R = R - C$.
3. Update the probability of $f_1$ from $p$ to the posterior probability $p f_1(X)/f_p(X)$.
4. If $X \ge E[V(Y, p f_1(Y)/f_p(Y))]$, return $R + X$ and stop. If $X < E[V(Y, p f_1(Y)/f_p(Y))]$, go to Step 2.

In Step 4, we need the value of $E[V(Y, g(Y,p))]$, which we estimate by simulation. For state $p$, we sample 1000 offers from the distribution $f_p$, called $Y_1, Y_2, \ldots, Y_{1000}$. The estimated value of $E[V(Y, g(Y,p))]$ is obtained by computing $\frac{1}{1000}\sum_{i=1}^{1000} V(Y_i, g(Y_i,p))$. Here is the algorithm for $E[V(Y, g(Y,p))]$ for each $p$:
1. Generate 1000 random numbers $U_1, U_2, \ldots, U_{1000} \sim U(0,1)$.
2. If $U_i \le p$, generate a random variable $Y_i \sim f_1$; otherwise generate $Y_i \sim f_2$.
3. Compute $p_i = p f_1(Y_i)/f_p(Y_i)$.
4. Compute $\bar\nu_{p_i} = p_i \nu_1 + (1-p_i)\nu_2$.
5. If $Y_i \ge \bar\nu_{p_i}$, let $V_i = Y_i$; otherwise let $V_i = \bar\nu_{p_i}$.
6. Return $-C + \frac{1}{1000}\sum_{i=1}^{1000} V_i$.

UB 1 - Policy $\pi_{UB}$
1. Set $R = 0$.
2. Generate a random variable $X \sim f$, where $f$ is the true distribution, and set $R = R - C$.
3. Update the probability of $f_1$ from $p$ to the posterior probability $p f_1(X)/f_p(X)$.
4. If $X \ge p\nu_1 + (1-p)\nu_2$, return $R + X$ and stop.
5. If $X < L(p)$, go to Step 2.
6. Return $R + p\nu_1 + (1-p)X$ and stop.

UB 2 - Upper Bound of $E[R \mid f_1]$
1. Set $R = 0$ and $X_m = 0$.
2. Generate a random variable $X \sim f$, where $f$ is the true distribution, and set $R = R - C$.
3. Update the probability of $f_1$ from $p$ to the posterior probability $p f_1(X)/f_p(X)$.
4. Set $X_m = \max\{X, X_m\}$.
5. If $X \ge p\nu_1 + (1-p)\nu_2$, return $R + X_m$ and stop. If $X < p\nu_1 + (1-p)\nu_2$, go to Step 2.

Burglar Problem

Policy $\pi_{1sla}$
1. Set $R = 0$.
2. Generate a random number $U \sim U(0,1)$. Assume $q$ is the true probability of a successful burglary. If $U > q$, return 0 and stop.
3. If $U \le q$, generate a random variable $X \sim f$, where $f$ is the true distribution. Let $R = R + X$ and update the probability of $(q_1, f_1)$ from $p$ to $\frac{pq_1 f_1(X)}{pq_1 f_1(X) + (1-p)q_2 f_2(X)}$.
4. If $R \ge \nu_{1sla}(p)$, return $R$ and stop. If $R < \nu_{1sla}(p)$, go to Step 2.

Policy $\pi_{mix}$
1. Set $R = 0$.
2. Generate a random number $U \sim U(0,1)$. Assume $q$ is the true probability of a successful burglary. If $U > q$, return 0 and stop.
3. If $U \le q$, generate a random variable $X \sim f$, where $f$ is the true distribution. Let $R = R + X$ and update the probability of $(q_1, f_1)$ from $p$ to $\frac{pq_1 f_1(X)}{pq_1 f_1(X) + (1-p)q_2 f_2(X)}$.
4. If $R \ge \nu_1$, return $R$ and stop.
5. If $R < \nu_{1sla}(p)$, go to Step 2.
6. If $R \ge \bar\nu_p$, where $\bar\nu_p = p\nu_1 + (1-p)\nu_2$, return $R$ and stop. If $R < \bar\nu_p$, go to Step 2.

UB - Policy $\pi_{UB}$
1. Set $R = 0$.
2. Generate a random number $U \sim U(0,1)$. Assume $q$ is the true probability of a successful burglary. If $U > q$, return 0 and stop.
3. If $U \le q$, generate a random variable $X \sim f$, where $f$ is the true distribution. Let $R = R + X$ and update the probability of $(q_1, f_1)$ from $p$ to $\frac{pq_1 f_1(X)}{pq_1 f_1(X) + (1-p)q_2 f_2(X)}$.
4. If $R \ge \nu_{1sla}(p)$, return $pV_{f_1}(R) + (1-p)R$ and stop. If $R < \nu_{1sla}(p)$, go to Step 2.

UB_Conj - Upper Bound of $E[R \mid (q_1, f_1)]$
1. Set $R = 0$.
2. Generate a random number $U \sim U(0,1)$. If $U > q_1$, return 0 and stop.
3. If $U \le q_1$, generate a random variable $X \sim f_1$. Let $R = R + X$ and update the probability of $(q_1, f_1)$ from $p$ to $\frac{pq_1 f_1(X)}{pq_1 f_1(X) + (1-p)q_2 f_2(X)}$.
4. If $R \ge \nu_1$, return $R$ and stop. If $R < \nu_{1sla}(p)$, go to Step 2.
5. If $R \ge \nu(x,p)$, return $R$ and stop. If $R < \nu(x,p)$, go to Step 2.

Recall that $\nu(x,p) = pq_1 E[V_{f_1}(x + X_1)] + (1-p)q_2 E[V_{f_2}(x + X_2)]$. If $f_1$ and $f_2$ are exponential densities, see Appendix B for the calculation of the $\nu(x,p)$ function. If $f_1$ and $f_2$ are arbitrary densities, we use the simulation approaches described in Section 5.1.2 to estimate the values of $E[V_{f_1}(x + X_1)]$ and $E[V_{f_2}(x + X_2)]$. For each distribution, we run 100 replications.

UB_Conj - Upper Bound of $E[R \mid (q_2, f_2)]$
1. Set $R = 0$.
2. Generate a random number $U \sim U(0,1)$. If $U > q_2$, return 0 and stop.
3. If $U \le q_2$, generate a random variable $X \sim f_2$. Let $R = R + X$ and update the probability of $(q_1, f_1)$ from $p$ to $\frac{pq_1 f_1(X)}{pq_1 f_1(X) + (1-p)q_2 f_2(X)}$.
4. If $R \ge \nu_{1sla}(p)$, return $R$ and stop. If $R < \nu_{1sla}(p)$, go to Step 2.
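To illustrate how these burglar-problem algorithms look in code, here is a minimal Python sketch of a Monte Carlo estimator, assuming exponential loot densities $f_i = \mathrm{Exp}(\lambda_i)$. The retirement rule is passed in as a threshold function of the posterior $p$; the plain mixture threshold $\bar\nu_p = p\nu_1 + (1-p)\nu_2$ with $\nu_i = q_i/(\lambda_i(1-q_i))$ is used only as a stand-in example, and all function names, parameter values, and the replication count are illustrative choices, not taken from the dissertation.

```python
import math
import random

def exp_pdf(x, lam):
    return lam * math.exp(-lam * x)

def simulate_return(threshold, q1, q2, lam1, lam2, p0, true_case):
    """One replication of the Bayesian burglar simulation.
    threshold: function of the posterior p giving the retirement level.
    true_case: 1 or 2, the combination (q_i, f_i) actually generating the data."""
    q, lam = (q1, lam1) if true_case == 1 else (q2, lam2)
    R, p = 0.0, p0
    while True:
        if random.random() > q:          # burglary fails: caught, lose everything
            return 0.0
        X = random.expovariate(lam)      # successful burglary with loot X
        R += X
        # Bayesian update of P{(q1, f1) is the true combination}
        num = p * q1 * exp_pdf(X, lam1)
        p = num / (num + (1.0 - p) * q2 * exp_pdf(X, lam2))
        if R >= threshold(p):            # retire with the accumulated loot
            return R

def mixture_threshold(q1, q2, lam1, lam2):
    nu1 = q1 / (lam1 * (1.0 - q1))
    nu2 = q2 / (lam2 * (1.0 - q2))
    return lambda p: p * nu1 + (1.0 - p) * nu2

if __name__ == "__main__":
    q1, q2, lam1, lam2, p0 = 0.8, 0.5, 0.05, 0.1, 0.5
    thr = mixture_threshold(q1, q2, lam1, lam2)
    n = 100000
    total = 0.0
    for _ in range(n):
        case = 1 if random.random() <= p0 else 2
        total += simulate_return(thr, q1, q2, lam1, lam2, p0, case)
    print(f"estimated expected return under the mixture threshold: {total / n:.3f}")
```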
Appendix B
Computation for the Exponential Distribution

B.1 Asset-Selling Problem without Recall

$L_y(p)$ and $L(p)$ Functions

From Equation (3.5),
$$
\begin{aligned}
L_y(p) &= p\left(E_{f_1}[X \mid X > y] - \frac{C}{\bar F_1(y)}\right) + (1-p)\left(E_{f_2}[X \mid X > y] - \frac{C}{\bar F_2(y)}\right)\\
&= pE_{f_1}[X \mid X > y] + (1-p)E_{f_2}[X \mid X > y] - C\left(\frac{p}{\bar F_1(y)} + \frac{1-p}{\bar F_2(y)}\right)\\
&= p\,\frac{\int_y^\infty x\lambda_1 e^{-\lambda_1 x}\,dx}{e^{-\lambda_1 y}} + (1-p)\,\frac{\int_y^\infty x\lambda_2 e^{-\lambda_2 x}\,dx}{e^{-\lambda_2 y}} - Cpe^{\lambda_1 y} - C(1-p)e^{\lambda_2 y}\\
&= p\left(y + \frac{1}{\lambda_1}\right) + (1-p)\left(y + \frac{1}{\lambda_2}\right) - Cpe^{\lambda_1 y} - C(1-p)e^{\lambda_2 y}\\
&= y + \frac{p}{\lambda_1} + \frac{1-p}{\lambda_2} - Cpe^{\lambda_1 y} - C(1-p)e^{\lambda_2 y}.
\end{aligned}
$$
We first check whether $L_y(p)$ is a concave function of $y$. We have
$$
\frac{\partial L_y(p)}{\partial y} = 1 - \lambda_1 Cpe^{\lambda_1 y} - \lambda_2 C(1-p)e^{\lambda_2 y},
\qquad
\frac{\partial^2 L_y(p)}{\partial y^2} = -\lambda_1^2 Cpe^{\lambda_1 y} - \lambda_2^2 C(1-p)e^{\lambda_2 y} \le 0.
$$
By the second-order condition for concavity, $L_y(p)$ is concave in $y$. For $L(p)$, since $L(p) = \max_y L_y(p)$, we take the derivative of $L_y(p)$ with respect to $y$ and find the value of $y$, called $y^*$, for which the derivative is 0. Let $L(p) = L_{y^*}(p)$.

B.2 Asset-Selling Problem with Recall

$E[\max\{x, X\}]$ Function

For the one-stage look-ahead and the modified two-stage look-ahead policies, we need the value of $E_{f_p}[\max\{X, x\}]$, where $(x,p)$ is the current state and $X \sim f_p$. We also need the value of $E[\max\{x, X_2\}]$ for one of the policies and for the upper bound of $E[R \mid f_1]$, where $x$ is the current state and $X_2 \sim f_2$. Thus, we show the computation of $E[\max\{x, X\}]$, where $X \sim \mathrm{Exp}(\lambda)$:
$$
E[\max\{x, X\}] = \int_0^\infty \max\{x, y\}\lambda e^{-\lambda y}\,dy
= \int_0^x x\lambda e^{-\lambda y}\,dy + \int_x^\infty y\lambda e^{-\lambda y}\,dy
= x\left(1 - e^{-\lambda x}\right) + \left(x + \frac{1}{\lambda}\right)e^{-\lambda x}
= x + \frac{e^{-\lambda x}}{\lambda}.
$$

B.3 Burglar Problem

$\nu(x,p)$ Function for the Burglar Problem

First, we use Equation (5.4) to calculate $E[V_{f_1}(x + X_1)]$ and $E[V_{f_2}(x + X_2)]$:
$$
E[V_{f_1}(x + X_1)] = \int_0^\infty V_{f_1}(x+y)f_1(y)\,dy
= \int_0^{\nu_1 - x} V_{f_1}(x+y)\lambda_1 e^{-\lambda_1 y}\,dy + \int_{\nu_1 - x}^\infty V_{f_1}(x+y)\lambda_1 e^{-\lambda_1 y}\,dy. \qquad \text{(B.1)}
$$
The first term in (B.1) is
$$
\begin{aligned}
\int_0^{\nu_1 - x} V_{f_1}(x+y)\lambda_1 e^{-\lambda_1 y}\,dy
&= \int_0^{\nu_1 - x} q_1\left(\nu_1 + \frac{1}{\lambda_1}\right)e^{-\lambda_1(\nu_1 - x - y) + q_1\lambda_1(\nu_1 - x - y)}\,\lambda_1 e^{-\lambda_1 y}\,dy\\
&= q_1\lambda_1\left(\nu_1 + \frac{1}{\lambda_1}\right)e^{-\lambda_1(\nu_1 - x) + q_1\lambda_1(\nu_1 - x)} \int_0^{\nu_1 - x} e^{-q_1\lambda_1 y}\,dy\\
&= \left(\nu_1 + \frac{1}{\lambda_1}\right)e^{-\lambda_1(\nu_1 - x) + q_1\lambda_1(\nu_1 - x)} - \left(\nu_1 + \frac{1}{\lambda_1}\right)e^{-\lambda_1(\nu_1 - x)}.
\end{aligned}
$$
The second term in (B.1) is
$$
\int_{\nu_1 - x}^\infty V_{f_1}(x+y)\lambda_1 e^{-\lambda_1 y}\,dy
= \int_{\nu_1 - x}^\infty (x+y)\lambda_1 e^{-\lambda_1 y}\,dy
= x e^{-\lambda_1(\nu_1 - x)} + \left(\nu_1 - x + \frac{1}{\lambda_1}\right)e^{-\lambda_1(\nu_1 - x)}
= \left(\nu_1 + \frac{1}{\lambda_1}\right)e^{-\lambda_1(\nu_1 - x)}.
$$
Thus,
$$
E[V_{f_1}(x + X_1)] = \left(\nu_1 + \frac{1}{\lambda_1}\right)e^{-\lambda_1(\nu_1 - x) + q_1\lambda_1(\nu_1 - x)} = \left(\nu_1 + \frac{1}{\lambda_1}\right)e^{-\lambda_1(1 - q_1)(\nu_1 - x)}.
$$
Similarly, $E[V_{f_2}(x + X_2)] = \left(\nu_2 + \frac{1}{\lambda_2}\right)e^{-\lambda_2(1 - q_2)(\nu_2 - x)}$. Thus, when in state $(x,p)$,
$$
\nu(x,p) = pq_1\left(\nu_1 + \frac{1}{\lambda_1}\right)e^{-\lambda_1(1 - q_1)(\nu_1 - x)} + (1-p)q_2\left(\nu_2 + \frac{1}{\lambda_2}\right)e^{-\lambda_2(1 - q_2)(\nu_2 - x)}.
$$
We use the $\nu(x,p)$ function for the simulation of the upper bound given $(q_1, f_1)$.
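For quick reference, here is a minimal Python sketch of the closed-form $\nu(x,p)$ just derived, assuming the single-distribution thresholds $\nu_i = q_i/(\lambda_i(1-q_i))$ (which is consistent with the form of $V_{f_i}$ used above, since $V_{f_i}(\nu_i) = \nu_i$); the function name and the example call are illustrative.

```python
import math

def nu(x, p, q1, q2, lam1, lam2):
    """Closed-form nu(x, p) for exponential loot densities, valid for
    x <= nu_i, the case treated in the derivation above."""
    nu1 = q1 / (lam1 * (1.0 - q1))
    nu2 = q2 / (lam2 * (1.0 - q2))
    term1 = p * q1 * (nu1 + 1.0 / lam1) * math.exp(-lam1 * (1.0 - q1) * (nu1 - x))
    term2 = (1.0 - p) * q2 * (nu2 + 1.0 / lam2) * math.exp(-lam2 * (1.0 - q2) * (nu2 - x))
    return term1 + term2

# Example: nu(x, p) at zero accumulated loot and prior p = 0.5.
print(nu(0.0, 0.5, q1=0.8, q2=0.5, lam1=0.05, lam2=0.1))
```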
$W_y(p)$ and $W^*(p)$ Functions for the Burglar Problem

Let $R_1(y)$ be the expected return from the threshold-$y$ policy if the case is known to be $(q_1, f_1)$, and let $R_2(y)$ be the corresponding return if the case is known to be $(q_2, f_2)$. Thus,
$$
W_y(p) = pR_1(y) + (1-p)R_2(y).
$$
For $R_1(y)$ and $R_2(y)$, we can use an analogue of the value function in Equation (5.4), with the value $y$ substituted for the threshold value $\nu$:
$$
\begin{aligned}
R_1(y) &= q_1\int_y^\infty x f_1(x)\,dx + q_1\int_0^y q_1\left(y + \frac{1}{\lambda_1}\right)e^{-\lambda_1(y-x) + q_1\lambda_1(y-x)} f_1(x)\,dx\\
&= q_1\int_y^\infty x\lambda_1 e^{-\lambda_1 x}\,dx + q_1^2\lambda_1\left(y + \frac{1}{\lambda_1}\right)e^{-\lambda_1(1-q_1)y}\int_0^y e^{-q_1\lambda_1 x}\,dx\\
&= q_1\left(y + \frac{1}{\lambda_1}\right)e^{-\lambda_1 y} + q_1\left(y + \frac{1}{\lambda_1}\right)\left(e^{-\lambda_1(1-q_1)y} - e^{-\lambda_1 y}\right)\\
&= q_1\left(y + \frac{1}{\lambda_1}\right)e^{-\lambda_1(1-q_1)y}.
\end{aligned}
$$
Similarly, $R_2(y) = q_2\left(y + \frac{1}{\lambda_2}\right)e^{-\lambda_2(1-q_2)y}$. Thus,
$$
W_y(p) = pq_1\left(y + \frac{1}{\lambda_1}\right)e^{-\lambda_1(1-q_1)y} + (1-p)q_2\left(y + \frac{1}{\lambda_2}\right)e^{-\lambda_2(1-q_2)y}.
$$
For the function $W^*(p)$, we take the derivative of $W_y(p)$ with respect to $y$ and find the value of $y$, called $y^*$, for which the derivative is 0; let $W^*(p) = W_{y^*}(p)$. In the numerical results section, we use $W_{p_0\nu_1 + (1-p_0)\nu_2}(p_0)$ and $W^*(p_0)$ as references.
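As a sanity check on the closed form above, the following illustrative Python sketch estimates $R_1(y)$ by direct simulation of the threshold-$y$ retirement policy under $(q_1, f_1)$ and compares it with $q_1(y + 1/\lambda_1)e^{-\lambda_1(1-q_1)y}$; the parameter values and function names are arbitrary choices.

```python
import math
import random

def r1_closed_form(y, q1, lam1):
    # R_1(y) = q_1 (y + 1/lam_1) exp(-lam_1 (1 - q_1) y), as derived above.
    return q1 * (y + 1.0 / lam1) * math.exp(-lam1 * (1.0 - q1) * y)

def r1_simulated(y, q1, lam1, n=200000):
    """Monte Carlo estimate: keep burgling until caught (probability 1 - q1
    on each attempt) or until the accumulated loot reaches the threshold y."""
    total = 0.0
    for _ in range(n):
        loot = 0.0
        while True:
            if random.random() > q1:      # caught: lose everything
                break
            loot += random.expovariate(lam1)
            if loot >= y:                 # retire with the accumulated loot
                total += loot
                break
    return total / n

if __name__ == "__main__":
    y, q1, lam1 = 8.0, 0.8, 0.05
    print(f"closed form: {r1_closed_form(y, q1, lam1):.3f}")
    print(f"simulation : {r1_simulated(y, q1, lam1):.3f}")
```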
Abstract
This dissertation focuses on an application of stochastic dynamic programming called the optimal stopping problem. The decision maker has to choose a time to take a given action based on sequentially observed random variables in order to maximize an expected payoff. All previous research on optimal stopping assumes that the distributions of random variables are completely known or partially known with unknown parameters. Throughout the dissertation, we address the problems with the uncertainty assumption that the random variables are from one of two possible distributions with given initial probabilities as to which is the true distribution. The probabilities are then updated in a Bayesian manner as the successive random variables are observed.
Asset Metadata
Creator
Lee, Yen-Ming (author)
Core Title
Bayesian optimal stopping problems with partial information
School
Viterbi School of Engineering
Degree
Doctor of Philosophy
Degree Program
Industrial and Systems Engineering
Publication Date
11/22/2010
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
Bayesian updating, dynamic programming, OAI-PMH Harvest, optimal stopping
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Ross, Sheldon M. (committee chair), Dessouky, Maged M. (committee member), Marino, Anthony M. (committee member), Moore, James Elliott, II (committee member)
Creator Email
leemiho@gmail.com,yenmingl@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-m3542
Unique identifier
UC1229320
Identifier
etd-Lee-4214 (filename),usctheses-m40 (legacy collection record id),usctheses-c127-411188 (legacy record id),usctheses-m3542 (legacy record id)
Legacy Identifier
etd-Lee-4214.pdf
Dmrecord
411188
Document Type
Dissertation
Rights
Lee, Yen-Ming
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Repository Name
Libraries, University of Southern California
Repository Location
Los Angeles, California
Repository Email
cisadmin@lib.usc.edu