USC Digital Library · University of Southern California Dissertations and Theses

Modeling customer choice in assortment and transportation applications (USC Thesis)
MODELING CUSTOMER CHOICE IN ASSORTMENT AND TRANSPORTATION APPLICATIONS

by

Guang Li

A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(BUSINESS ADMINISTRATION)

August 2016

Copyright 2016 Guang Li

To my loving late father, dear mother, and steadfast husband, for being the core of my strength throughout the quest of attaining my Ph.D. degree.

Acknowledgments

Words cannot express how grateful I am to my advisor, Professor Paat Rusmevichientong, for his unwavering support and motivation throughout my Ph.D. studies. This dissertation could not have been accomplished without his invaluable guidance and advice. I could not have had a better advisor and mentor for the past five years.

I would like to express my sincere gratitude to the rest of my dissertation committee: Professors Sha Yang and Leon Zhu, who are also my co-authors, for their comments, encouragement, and collaboration. I especially want to thank Professor Huseyin Topaloglu for the inspirational collaboration and learning opportunities in writing Chapter Two of this dissertation.

I would like to thank the collegial faculty members and friendly staff at the Marshall School of Business, who were always available when I needed help in research or life. Special thanks to my Ph.D. friends for their sympathetic ear, altruistic help, and the stimulating discussions we shared about our research and lives. To my beloved family, thank you for supporting, encouraging, and believing in me in my pursuits throughout my life.

Contents

1 A Greedy Algorithm for the Two-Level Nested Logit Model
  1.1 Introduction
  1.2 Model
  1.3 Characterization of the Optimal Assortments
  1.4 A Greedy Algorithm
  1.5 Sensitivity Analysis
  1.6 Conclusion
2 The d-Level Nested Logit Model: Assortment and Price Optimization Problems
  2.1 Introduction
  2.2 Literature Review
  2.3 Assortment Optimization Problem
  2.4 Properties of An Optimal Assortment
  2.5 An Algorithm for Assortment Optimization
  2.6 Numerical Illustrations for Assortment Optimization
  2.7 Practical Performance of Assortment Optimization Algorithm
  2.8 Extension to Multi-Period Capacity Allocation
  2.9 Price Optimization Problem
    2.9.1 Characterization of a Stationary Point
    2.9.2 A Pricing Algorithm without Gradients
    2.9.3 Extension to Arbitrary Cost Functions
  2.10 Numerical Results for Price Optimization
  2.11 The Choice Model and the Assortment Optimization Problem
  2.12 Conclusion
3 A Dual-Account Bus Card Problem
  3.1 Introduction
  3.2 Literature Review
  3.3 The Model and Optimal Policy for Fully Strategic Customers
    3.3.1 A Finite-Horizon Model
  3.4 Parameter Estimation
    3.4.1 Data
    3.4.2 Estimation of Daily Demand
    3.4.3 Estimation of Refill Data
    3.4.4 Grid Search for Parameters
  3.5 Counterfactual Analysis
  3.6 Conclusion
References
A Technical Appendix to Chapter 2
  A.1 Omitted Results in Section 2.4
  A.2 Omitted Results in Section 2.5
  A.3 Omitted Results in Section 2.8
  A.4 Omitted Results in Section 2.9
    A.4.1 Proof of Lemma 2.9.1
    A.4.2 Proof of Theorem 2.9.3
    A.4.3 Proof of Theorem 2.9.4
  A.5 Additional Numerical Results for Price Optimization
  A.6 Proof of Theorem 2.11.1
  A.7 Parameter Estimation
    A.7.1 Properties of the Log-Likelihood Function
    A.7.2 Demonstration of Improvement in Prediction Accuracy in a Single Data Set
    A.7.3 Numerical Experiments for an Ensemble of Data Sets
    A.7.4 Proof of Theorem A.7.1
B Technical Appendix to Chapter 3

List of Tables

2.1 Computation of $A_{14}$
2.2 Computation of $A_{15}$
2.3 Computation of $A_{\mathrm{root}}$
2.4 Running times for AA, FE and LP for the test problems with two and three levels
2.5 Performance comparison between PUPD and GA
3.1 List of parameters
A.1 Performance comparison between PUPD and GA with informed initial prices
A.2 Performance comparison between PUPD and OC
A.3 Parameters of the mixture of logits model
A.4 Comparison of the total errors for one, two and three-level nested logit models

List of Figures

2.1 An illustration of the model formulation
2.2 The lines $\{L_{k,S_k}(\cdot) : S_k \in \mathcal{A}_k\}$ and the points $\{I_k^q : q = 0,\dots,Q_k\}$ for some collection $\mathcal{A}_k = \{S_k^1, S_k^2, S_k^3, S_k^4\}$
2.3 A three-level problem instance and its parameters
3.1 Illustration of the myopic optimal ordering policy with zero fixed costs when $h_d/h_r > \bar{c}_d/\bar{c}_r$
3.2 Simulated holding costs when fixed costs are zero
3.3 Simulated holding costs and joint fixed cost
3.4 Optimal fares and revenue improvement with respect to fare elasticity of ridership
3.5 Percentage reduction in fare revenue with respect to reduction in fixed cost
A.1 Confusion matrices for the best fitted one, two and three-level nested logit models

Abstract

While companies around the world strive to meet the demand of their customers, the more ambitious ones seek to shape consumer demand. To help companies make better operational and marketing decisions, we need to understand how and why customers make their purchase decisions. However, it is difficult to know the actual formulae that customers use when making their choices due to the complex nature of human behavior and unobservable externalities. My research interests lie in modeling customer choice behaviors, as well as in seeking solutions to high-impact operations management problems, such as assortment planning and price optimization, using both analytical and data-driven approaches. In this dissertation, I examine three facets relating to my research interests, focusing on assortment optimization algorithms, price optimization problems, and a dual-account bus card system.

The first chapter, “A Greedy Algorithm for the Two-Level Nested Logit Model,” is based on joint work with Professor Paat Rusmevichientong. We consider the assortment optimization problem under the classical two-level nested logit model, where the goal is to find the revenue-maximizing assortment of products to be offered.
We establish a necessary and sufficient condition for the optimal assortment and use this optimality condition to develop a simple greedy algorithm that iteratively removes at most one product from each nest, until the optimality condition is satisfied. Our algorithm exploits the “lumpy” structure of the optimal solution, where in each nest, a certain set of “consecutive” products will always appear together in the optimal assortment. The algorithm is simple, intuitive, and extremely fast. For a problem with $m$ nests, each nest having $n$ products, the running time is $O(nm \log m)$. This is the fastest known running time for this problem.

The second chapter, “The d-Level Nested Logit Model: Assortment and Price Optimization Problems,” is based on joint work with Professors Paat Rusmevichientong and Huseyin Topaloglu. We consider assortment and price optimization problems under the d-level nested logit model. In the assortment optimization problem, the goal is to find the revenue-maximizing assortment of products to offer, when the prices of the products are fixed. Using a novel formulation of the d-level nested logit model as a tree of depth d, we provide an efficient algorithm to find the optimal assortment. For a d-level nested logit model with $n$ products, the algorithm runs in $O(dn \log n)$ time. In the price optimization problem, the goal is to find the revenue-maximizing prices for the products, when the assortment of offered products is fixed. Although the expected revenue is not concave in the product prices, we develop an iterative algorithm that generates a sequence of prices converging to a stationary point. Numerical experiments show that our method converges faster than gradient-based methods by many orders of magnitude. In addition to providing solutions for the assortment and price optimization problems, we give support for the d-level nested logit model by demonstrating that it is consistent with the random utility maximization principle and equivalent to the elimination-by-aspects model.

The third chapter, “A Dual-Account Bus Card Problem,” is based on joint work with Professors Paat Rusmevichientong and Sha Yang. We study the optimal recharging policy of strategic passengers in a dual-account bus card system. The first account offers a heavy discount on the bus fare, but its remaining balance expires at the end of each month; the second account offers less of a discount, but the remaining balance never expires. Using a finite-horizon dynamic programming formulation, we analyze the structural properties of the expected minimum cost function. Based on real passenger spending data, we estimate the cost parameters that customers associate with both accounts. Moreover, we perform counterfactual analysis to suggest the optimal fares that should be set by the bus company and investigate managerial implications of the cost parameters.

Chapter 1

A Greedy Algorithm for the Two-Level Nested Logit Model

1.1 Introduction

In an assortment optimization problem, a firm wishes to offer the most profitable set of products to satisfy customer demand. As shown in the excellent survey of Kök et al. (2009), this problem has many important applications in retail and revenue management. Our paper focuses on assortment optimization in revenue management, where, unlike in the retail setting, the focus is primarily on product selection, without any inventory replenishment. The seminal work of Talluri and van Ryzin (2004) introduced the first assortment optimization problem under the multinomial logit (MNL) model of McFadden (1974).
Although the MNL model admits a tractable solution, it suffers from the independence of irrelevant alternatives (IIA) property, which may lead to a paradox when alternatives are closely related and result in biased estimates (Train 2003). The deficiency of the MNL model has led many researchers to consider assortment optimization under more complex choice models; see, for example, Gaur and Honhon (2006), Honhon et al. (2010), and Farias et al. (2011). The nested logit model extends the MNL model by grouping similar alternatives into a nest and allowing differential substitution patterns within and across nests, thereby partially relaxing the IIA restriction (McFadden 1978). Under the nested logit model, a consumer first chooses a nest of products and then chooses a product from the chosen nest. The existing literature on assortment optimization under the nested logit model makes use of a linear programming framework; see, for example, Davis et al. (2014), Rusmevichientong et al. (2009), and Gallego and Topaloglu (2014).

In this paper, we present an alternative solution to assortment optimization under the two-level nested logit model. We establish a necessary and sufficient condition for an optimal assortment (Theorem 1.3.3), which is, to our knowledge, the first such optimality condition for this problem. In addition, we reveal a “lumpy” structure of the optimal assortment: surprisingly, within each nest, a certain set of “consecutive” products always appears together in the optimal solution. Moreover, as shown in Theorem 1.3.4, by looking at a certain index of each product, we can determine in advance which products will appear together! Exploiting the optimality condition and the “lumpy” structure, our greedy algorithm iteratively removes (at most) one product from each nest and terminates with an optimal assortment in $O(nm \log m)$ time, for a problem with $m$ nests, each nest having $n$ products. This is the fastest known running time for this problem.
Moreover, our assortment optimization algorithm can be used as a subroutine for the network revenue management problem under the nested logit model. A revenue model over a single-leg flight was first considered by Talluri and van Ryzin (2004). For extensions of this work to multiple resources, see, for example, Gallego et al. (2004), Liu and van Ryzin (2008), and Kunnumkal (2014). The main approach in these papers is to formulate a variety of deterministic linear programming approximations and solve the linear programs using column generation. The column generation subproblem in this setting is precisely the assortment optimization problem considered in this paper, when customers choose according to a two-level nested logit model.

We perform sensitivity analysis to study the effect of reducing product revenues on the optimal assortment. We show that the entire profile of the optimal revenue and assortment when every product revenue is reduced by $\delta$ can be computed simultaneously for all $\delta \in \mathbb{R}_+$ in a single run of the greedy algorithm in $O(nm \log m)$ time. Moreover, we show that the optimal solution to the multi-period capacity allocation problem with $T$ total allocation periods and total capacity $C$ can be computed in $O(TC + nm \log m)$ operations, by applying the greedy algorithm just once.

1.2 Model

We have $m$ nests indexed by $i \in \{1,2,\dots,m\}$, and each nest $i$ has $n$ products indexed by $j \in \{1,2,\dots,n\}$. All of the analysis easily extends to the case in which some nests have fewer than $n$ products. A product $j$ in nest $i$ is denoted by $ji$, and in settings in which the nest index is clear from context, we refer to it simply as product $j$. We denote the no-purchase option by product 0, which is the only product in nest 0. Each product $ji$ has a revenue $r_{ji} \ge 0$, with $r_0 = 0$. Without loss of generality, we assume that the products in each nest are sorted in descending order of revenues; that is, for each nest $i \in \{1,2,\dots,m\}$, $r_{1i} \ge r_{2i} \ge \cdots \ge r_{ni} \ge 0$.
We assume that the customer is a utility-maximizing agent, and the utility that she assigns to each product is given by
$$U_0 = \mu_0 + \epsilon_0 \quad\text{and}\quad U_{ji} = \mu_{ji} + \xi_i + \epsilon_{ji}, \qquad i = 1,\dots,m,\ j = 1,\dots,n,$$
where $\mu_{ji}$ and $\mu_0$ are the deterministic components and $\{\epsilon_0, \xi_i, \epsilon_{ji} : i = 1,\dots,m,\ j = 1,\dots,n\}$ are independent random errors. The random variable $\epsilon_0$ has a Gumbel distribution with a location parameter of zero and a scaling parameter of one, which we write as $\epsilon_0 \sim \mathrm{Gumbel}(0,1)$. For each nest $i$, $\epsilon_{ji} \sim \mathrm{Gumbel}(0, 1/\gamma_i)$ for all $j$, where $\gamma_i \in (0,1]$ is a nest-specific parameter. Finally, for each $i$, $\xi_i$ is independent of $\{\epsilon_{ji} : j = 1,\dots,n\}$, and it is assumed to have a distribution such that $\xi_i + \epsilon_{ji} \sim \mathrm{Gumbel}(0,1)$ for all $j$; see Cardell (1997a) for the existence of such a distribution.

Let $S = (S_1,\dots,S_m)$ denote an assortment of products, where $S_i$ is a subset of the $n$ products $\{1i, 2i, \dots, ni\}$ in nest $i$, for all $i \in \{1,2,\dots,m\}$. As the nest index $i$ is explicit, we simply write $S_i \subseteq \{1,2,\dots,n\}$. For each product $ji$, let $v_{ji} = e^{\mu_{ji}/\gamma_i}$ denote its preference weight, and let $v_0 = e^{\mu_0}$. As shown in Ben-Akiva and Lerman (1985), the probability $Q_{ji}(S)$ that a customer chooses a product $j \in S_i$ under assortment $S$ is given by
$$Q_{ji}(S) = Q_{j|i}(S_i)\, Q_i(S), \quad\text{where}\quad Q_{j|i}(S_i) = \frac{v_{ji}}{\sum_{\ell \in S_i} v_{\ell i}} \quad\text{and}\quad Q_i(S) = \frac{\left(\sum_{\ell \in S_i} v_{\ell i}\right)^{\gamma_i}}{v_0 + \sum_{k=1}^m \left(\sum_{\ell \in S_k} v_{\ell k}\right)^{\gamma_k}}.$$
Here, $Q_i(S)$ denotes the probability that a customer chooses a product in nest $i$, and $Q_{j|i}(S_i)$ denotes the conditional probability of the customer's selecting product $ji$, given that nest $i$ is chosen. Thus, the total expected revenue $Rev(S)$ for any assortment $S = (S_1,\dots,S_m)$ is given by
$$Rev(S) = \sum_{i=1}^m \sum_{j \in S_i} r_{ji}\, Q_{ji}(S) = \sum_{i=1}^m Q_i(S) \sum_{j \in S_i} r_{ji}\, Q_{j|i}(S_i) = \sum_{i=1}^m Q_i(S)\, Rev_i(S_i),$$
where $Rev_i(S_i) = \sum_{j \in S_i} r_{ji}\, Q_{j|i}(S_i) = \frac{\sum_{\ell \in S_i} r_{\ell i} v_{\ell i}}{\sum_{\ell \in S_i} v_{\ell i}}$ denotes the expected revenue from $S_i$ in nest $i$, and we set it to be zero if $S_i$ is empty.
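The choice probabilities and expected revenue above translate directly into code. The following is a minimal sketch (function and variable names are ours, with 0-based product indices), not an implementation from the dissertation:

```python
def expected_revenue(r, v, gamma, v0, S):
    """Rev(S) under the two-level nested logit model.

    r[i][j], v[i][j] -- revenue and preference weight of product j in nest i
    gamma[i]         -- dissimilarity parameter of nest i, in (0, 1]
    v0               -- preference weight of the no-purchase option
    S[i]             -- indices (0-based) of the products offered in nest i
    """
    W = [sum(v[i][j] for j in Si) for i, Si in enumerate(S)]       # V(S_i)
    denom = v0 + sum(W[i] ** gamma[i] for i in range(len(S)) if S[i])
    total = 0.0
    for i, Si in enumerate(S):
        if not Si:
            continue  # an empty nest contributes no revenue
        Qi = W[i] ** gamma[i] / denom                              # P(nest i chosen)
        rev_i = sum(r[i][j] * v[i][j] for j in Si) / W[i]          # Rev_i(S_i)
        total += Qi * rev_i
    return total
```

As a sanity check, with a single nest and $\gamma_i = 1$ the model collapses to the MNL: a lone product with $r = 10$, $v = 1$, and $v_0 = 1$ is chosen with probability $1/2$, giving an expected revenue of 5.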
The assortment optimization problem is thus given by
$$Z^* = \max_{S = (S_1,\dots,S_m)} Rev(S),$$
and we denote an optimal assortment by $S^* = (S_1^*,\dots,S_m^*)$ and let $Z^* = Rev(S^*)$ denote the optimal revenue. If there are ties, we can choose the optimal assortment according to any predetermined tie-breaking rule.¹

¹ For example, we can choose $S^*$ to be the minimal optimal assortment; that is, there is no other optimal assortment $B = (B_1,\dots,B_m) \ne S^*$ such that $B_i \subseteq S_i^*$ for all $i$. All of our analysis applies to other tie-breaking rules.

1.3 Characterization of the Optimal Assortments

Let $\mathcal{N}^+ = \{\{1\}, \{1,2\}, \dots, \{1,2,\dots,n\}\}$ denote the collection of revenue-ordered subsets. As shown in the following lemma, at the optimal solution, any nonempty nest is revenue-ordered. The lemma is a restatement of Theorem 4 in Davis et al. (2014), and we refer the reader to that paper for the details of the proof.

Lemma 1.3.1 (Optimal Nest Structure). For $i = 1,\dots,m$, if $S_i^* \ne \varnothing$, then $S_i^* \in \mathcal{N}^+$.

Although Lemma 1.3.1 shows that $S_i^* \in \{\varnothing\} \cup \mathcal{N}^+$ for all $i$, exhaustive search is not feasible because there are $(n+1)^m$ possible configurations of revenue-ordered subsets among $m$ nests. To develop an efficient algorithm, we will establish a necessary and sufficient condition for an optimal assortment. For any integers $k_1$ and $k_2$, let
$$[k_1,k_2] = \begin{cases} \{k_1, k_1+1, \dots, k_2\} & \text{if } k_1 \le k_2, \\ \varnothing & \text{otherwise.} \end{cases}$$
Also, for each nest $i$ and $1 \le k_1 \le k_2 \le n$, let $G_i(k_1,k_2)$ be defined by
$$G_i(k_1,k_2) = Rev_i([1,k_2]) - \frac{Rev_i([1,k_2]) - Rev_i([k_1,k_2])}{f_i\!\left(1 - \sum_{\ell \in [k_1,k_2]} v_{\ell i} \big/ \sum_{\ell \in [1,k_2]} v_{\ell i}\right)},$$
where $f_i : [0,1] \to [0,1]$ is a continuous, strictly increasing function defined by
$$f_i(x) = \begin{cases} \left(x^{1-\gamma_i} - x\right)/(1-x) & \text{if } x \in [0,1), \\ \gamma_i & \text{if } x = 1. \end{cases}$$
Moreover, we define $0/0 = 0$; thus $G_i(1,k_2) = Rev_i([1,k_2])$ for all $k_2$.

Interpretation of $G_i(\cdot,\cdot)$. Suppose $S_i = [1,k_2]$, and consider $k_1 \in \{1,\dots,k_2\}$.
The function $G_i(k_1,k_2)$ can be used to evaluate the net effect on the total revenue of removing $[k_1,k_2]$ from nest $i$, without any computation on the resulting assortment. This intuition is confirmed in the following lemma, which establishes a precise condition for when we can use $G_i(\cdot,\cdot)$ to determine whether a subset can be removed from $S_i$ to improve the total revenue.

Lemma 1.3.2 (When Is Removing Subsets Beneficial?). For any assortment $S = (S_1,\dots,S_m)$, a collection of subsets $\{A_i \subseteq S_i : i = 1,2,\dots,m\}$ can be removed from $S$ to achieve a greater revenue if and only if
$$\frac{\sum_{i=1}^m \left[ V(S_i)^{\gamma_i}\, Rev_i(S_i) - V(S_i \setminus A_i)^{\gamma_i}\, Rev_i(S_i \setminus A_i) \right]}{\sum_{i=1}^m \left[ V(S_i)^{\gamma_i} - V(S_i \setminus A_i)^{\gamma_i} \right]} < Rev(S),$$
where for any set $X$, $V(X)$ denotes the sum of the preference weights of the products in $X$. Moreover, if $S_i = [1,p_i]$ with $p_i \ge 1$, $A_i = [k_i,p_i]$ with $k_i \le p_i$, and $A_\ell = \varnothing$ for all $\ell \ne i$, then removing $A_i$ from $S_i$ leads to a greater revenue if and only if $G_i(k_i,p_i) < Rev(S)$; removing $A_i$ from $S_i$ leads to the same revenue if and only if $G_i(k_i,p_i) = Rev(S)$.

Proof. For $i = 1,\dots,m$, let $\hat S_i = S_i \setminus A_i$. Assume that $A_i \ne \varnothing$ for some $i$; otherwise, the result is trivially true. As $v_{ji} > 0$ for all $ji$, $\sum_{i=1}^m V(\hat S_i) < \sum_{i=1}^m V(S_i)$. By definition, $Rev(\hat S) = \frac{\sum_{i=1}^m V(\hat S_i)^{\gamma_i}\, Rev_i(\hat S_i)}{v_0 + \sum_{i=1}^m V(\hat S_i)^{\gamma_i}}$. Then,
$$Rev(S) = \frac{\sum_{i=1}^m V(S_i)^{\gamma_i}\, Rev_i(S_i)}{v_0 + \sum_{i=1}^m V(S_i)^{\gamma_i}} = \frac{v_0 + \sum_{i=1}^m V(\hat S_i)^{\gamma_i}}{v_0 + \sum_{i=1}^m V(S_i)^{\gamma_i}}\, Rev(\hat S) \;+\; \frac{\sum_{i=1}^m \left[ V(S_i)^{\gamma_i} - V(\hat S_i)^{\gamma_i} \right]}{v_0 + \sum_{i=1}^m V(S_i)^{\gamma_i}} \cdot \frac{\sum_{i=1}^m \left[ V(S_i)^{\gamma_i}\, Rev_i(S_i) - V(\hat S_i)^{\gamma_i}\, Rev_i(\hat S_i) \right]}{\sum_{i=1}^m \left[ V(S_i)^{\gamma_i} - V(\hat S_i)^{\gamma_i} \right]}.$$
Thus, $Rev(S)$ is a convex combination of $Rev(\hat S)$ and $\frac{\sum_{i=1}^m [V(S_i)^{\gamma_i} Rev_i(S_i) - V(\hat S_i)^{\gamma_i} Rev_i(\hat S_i)]}{\sum_{i=1}^m [V(S_i)^{\gamma_i} - V(\hat S_i)^{\gamma_i}]}$, which gives the desired result.

Let $S_i = [1,p_i]$, $A_i = [k_i,p_i] \ne \varnothing$, and $A_\ell = \varnothing$ for all $\ell \ne i$. As
$$f_i\!\left(1 - \frac{V(A_i)}{V(S_i)}\right) = \frac{V(S_i)^{\gamma_i} - V(\hat S_i)^{\gamma_i}}{V(\hat S_i)^{\gamma_i - 1}\left[ V(S_i) - V(\hat S_i) \right]} \quad\text{and}\quad Rev_i(\hat S_i) = \frac{V(S_i)}{V(\hat S_i)}\, Rev_i(S_i) + \left(1 - \frac{V(S_i)}{V(\hat S_i)}\right) Rev_i(A_i),$$
it is easy to verify that $V(S_i)^{\gamma_i}\, Rev_i(S_i) - V(\hat S_i)^{\gamma_i}\, Rev_i(\hat S_i) = \left[ V(S_i)^{\gamma_i} - V(\hat S_i)^{\gamma_i} \right] G_i(k_i,p_i)$. Therefore, $\frac{V(S_i)^{\gamma_i} Rev_i(S_i) - V(\hat S_i)^{\gamma_i} Rev_i(\hat S_i)}{V(S_i)^{\gamma_i} - V(\hat S_i)^{\gamma_i}} \le Rev(S)$ if and only if $G_i(k_i,p_i) \le Rev(S)$. The inequality is strict if and only if $G_i(k_i,p_i) < Rev(S)$. □

Lemma 1.3.2 is consistent with our interpretation of $G_i(\cdot,\cdot)$: for $p_i \ge 1$, if $S_i = [1,p_i]$ and $G_i(k_i,p_i) < Rev(S)$, then removing $[k_i,p_i]$ from $S_i$ is beneficial. If $S_i^* = [1,p_i]$ in an optimal assortment $S^*$, then $G_i(k_i,p_i) \ge Rev(S^*)$ should hold for all $k_i \le p_i$. This intuition is confirmed by the main result of this section, which is stated in the following theorem. The theorem establishes a necessary and sufficient condition for an optimal assortment, and it forms the basis for our greedy algorithm in Section 1.4.

Theorem 1.3.3 (Optimality Condition). Consider any assortment $S = ([1,p_1],\dots,[1,p_m])$ such that $S_i^* \subseteq [1,p_i]$ for all $i$. Then $S$ is optimal if and only if for every nest $i$ such that $p_i \ge 1$,
$$\min_{j = 1,\dots,p_i} G_i(j,p_i) \ge Rev(S).$$

Proof. Suppose that $S = (S_1,\dots,S_m)$ is an optimal solution, where $S_i = [1,p_i]$ for all $i$. If $p_i \ge 1$, then $S_i \ne \varnothing$. As $S$ is optimal, removing a subset $[k_i,p_i]$ from $S_i$ cannot improve the revenue. Thus, it follows from Lemma 1.3.2 that $G_i(k_i,p_i) \ge Rev(S)$ for all $k_i \in \{1,2,\dots,p_i\}$, which is the desired result.

To establish sufficiency, consider any assortment $S = (S_1,\dots,S_m)$ such that $S_i^* \subseteq S_i = [1,p_i]$ for all $i$, and $S$ satisfies the condition of the theorem. We will show that $Rev(S) = Z^*$. For each $i$, let $E_i = S_i \setminus S_i^* = [k_i,p_i]$ for some index $k_i$. By our hypothesis, for every nest $i$ such that $E_i \ne \varnothing$, $G_i(k_i,p_i) \ge Rev(S)$; thus removing $E_i$ from $S_i$ does not increase the revenue.
It then follows from Lemma 1.3.2 that
$$Rev(S) \le \frac{V(S_i)^{\gamma_i}\, Rev_i(S_i) - V(S_i \setminus E_i)^{\gamma_i}\, Rev_i(S_i \setminus E_i)}{V(S_i)^{\gamma_i} - V(S_i \setminus E_i)^{\gamma_i}},$$
where for any set $X$, $V(X)$ is the sum of the preference weights of the products in $X$. Thus,
$$Rev(S) \le \min_{i : E_i \ne \varnothing} \frac{V(S_i)^{\gamma_i}\, Rev_i(S_i) - V(S_i \setminus E_i)^{\gamma_i}\, Rev_i(S_i \setminus E_i)}{V(S_i)^{\gamma_i} - V(S_i \setminus E_i)^{\gamma_i}} \le \frac{\sum_{i : E_i \ne \varnothing} \left[ V(S_i)^{\gamma_i}\, Rev_i(S_i) - V(S_i \setminus E_i)^{\gamma_i}\, Rev_i(S_i \setminus E_i) \right]}{\sum_{i : E_i \ne \varnothing} \left[ V(S_i)^{\gamma_i} - V(S_i \setminus E_i)^{\gamma_i} \right]},$$
where the second inequality follows because for any $x \in \mathbb{R}^k$ and $y \in \mathbb{R}^k_+$, $\frac{\sum_{i=1}^k x_i}{\sum_{i=1}^k y_i} \ge \min_{i=1,\dots,k} \frac{x_i}{y_i}$, and for all $i$, $V(S_i)^{\gamma_i} - V(S_i \setminus E_i)^{\gamma_i} \ge 0$. Note that if $E_i = \varnothing$, then $S_i = S_i^*$. It then follows from Lemma 1.3.2 that $Rev(S) \ge Rev(S_1 \setminus E_1, S_2 \setminus E_2, \dots, S_m \setminus E_m) = Rev(S_1^*,\dots,S_m^*) = Z^*$, which is the desired result. □

Surprisingly, as shown in the following theorem, the optimal assortment in each nest is “lumpy,” with certain consecutive products always appearing together. Moreover, for each nest $i$, by looking at the index $G_i(j,j)$ of each product $j$, we can determine in advance which products will appear together!

Theorem 1.3.4 (Lumpiness of the Optimal Assortments). For every nest $i$, if there exist products $j$ and $k$ such that $j < k$ and $G_i(j,j) < G_i(j+1,j+1) < \cdots < G_i(k,k)$, then either $\{j, j+1, \dots, k\} \subseteq S_i^*$ or $\{j, j+1, \dots, k\} \cap S_i^* = \varnothing$.

Proof. It suffices to prove the result when $k = j+1$; that is, we need to show that if $G_i(j,j) < G_i(j+1,j+1)$, then either $\{j, j+1\} \subseteq S_i^*$ or $\{j, j+1\} \cap S_i^* = \varnothing$. There are two cases to consider.

Case 1: $j+1 \in S_i^*$. In this case, $\{j, j+1\} \subseteq S_i^*$ because $S_i^* \in \mathcal{N}^+$ by Lemma 1.3.1.

Case 2: $j+1 \notin S_i^*$. Suppose that $j \in S_i^*$. Then $S_i^* = [1,j]$. Let $\hat S = (S_1^*, \dots, [1,j+1], \dots, S_m^*)$. By definition, $Rev(\hat S) \le Rev(S^*)$, and as $S^*$ is obtained from $\hat S$ by removing product $j+1$ from nest $i$, it follows from Lemma 1.3.2 that $G_i(j+1,j+1) \le Rev(\hat S)$. So, we have that $G_i(j,j) < G_i(j+1,j+1) \le Rev(\hat S) \le Rev(S^*)$, which implies that $Rev(S_1^*, \dots, S_i^* \setminus \{j\}, \dots, S_m^*) > Rev(S^*)$. This contradicts the optimality of $S^*$.
Thus, $j \notin S_i^*$. □

Note that the “lumpiness” property is nest-specific and is independent of $v_0$. Set $\gamma_1 = 0.6$ and $\gamma_2 = 0.5$. Let $v_{11} = 2$, $v_{21} = 7$, $v_{31} = 6$, and $v_{12} = 1$. Let $r_{11} = 17$, $r_{21} = 6$, $r_{31} = 5$, and $r_{12} = 17$. Then $G_1(1,1) = 17$, $G_1(2,2) = 2.6$, and $G_1(3,3) = 3.2$. By Theorem 1.3.4, products 21 and 31 in nest 1 satisfy the “lumpiness” condition, indicating either the presence or the absence of both products in the optimal assortment. It is easy to verify that for $0 < v_0 < 12.55$, the optimal assortment consists only of product 11 from nest 1 and product 12 from nest 2 (so both products 21 and 31 are absent), whereas for $v_0 \ge 12.55$, all four products belong to the optimal assortment.

We can view the occurrence of “lumpiness” as a consequence of the property of $G_i(\cdot,\cdot)$ and the revenue-ordered property of the optimal assortment. Suppose $S_i^* \subseteq S_i$, and $G_i(j,j) < G_i(j+1,j+1)$ for products $j$ and $j+1$ in $S_i$. If $G_i(j+1,j+1) < Rev(S)$, then it is beneficial to remove both products $j$ and $j+1$ from nest $i$. Otherwise, the revenue-ordered property implies that if $j+1 \in S_i^*$, then $j \in S_i^*$. Thus, products $j$ and $j+1$ always appear together in the optimal solution; that is, they can be merged into a single product when computing the optimal assortment, as illustrated in the next section.

1.4 A Greedy Algorithm

Our proposed GREEDY ALGORITHM generates a sequence of assortments $\{S^t : t = 0, 1, \dots\}$, terminating with an assortment that satisfies the optimality condition in Theorem 1.3.3. In Stage 1 of the algorithm, we exploit the lumpy structure of the optimal assortments shown in Theorem 1.3.4 by combining products that will always appear together into a single group. In Stage 2, we make use of the optimality condition and iteratively remove at most one product from each nest until an optimal assortment is obtained. A formal description of the algorithm is as follows.

Stage 1 (Lumping): Compute the index $G_i(j,j)$ for every product $ji$.
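Computing this index takes only a few lines. The following sketch uses our own notation (0-based arrays, 1-based range arguments) and takes $f_i(x) = (x^{1-\gamma_i} - x)/(1-x)$ with $f_i(1) = \gamma_i$, the form consistent with the worked example above:

```python
def f(x, gamma):
    """The function f_i on [0, 1]; f_i(1) is defined by continuity."""
    return gamma if x == 1.0 else (x ** (1.0 - gamma) - x) / (1.0 - x)

def G_index(r, v, gamma, k1, k2):
    """G_i(k1, k2) for one nest, with 1-based 1 <= k1 <= k2 and the convention 0/0 = 0.

    r, v -- revenues (in descending order) and preference weights of the nest's products.
    """
    def rev(lo, hi):  # expected in-nest revenue Rev_i([lo, hi])
        return (sum(r[j] * v[j] for j in range(lo - 1, hi)) /
                sum(v[j] for j in range(lo - 1, hi)))
    top = rev(1, k2) - rev(k1, k2)
    bot = f(1.0 - sum(v[k1 - 1:k2]) / sum(v[:k2]), gamma)
    # bot == 0 only when k1 == 1, where top == 0 as well: G_i(1, k2) = Rev_i([1, k2])
    return rev(1, k2) if bot == 0.0 else rev(1, k2) - top / bot
```

On the example above ($\gamma_1 = 0.6$, $v = (2, 7, 6)$, $r = (17, 6, 5)$), this reproduces $G_1(1,1) = 17$, $G_1(2,2) \approx 2.6$, and $G_1(3,3) \approx 3.2$.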
For $i = 1,\dots,m$, if $G_i(j,j) < G_i(j+1,j+1)$, then it follows from Theorem 1.3.4 that either $\{j, j+1\} \subseteq S_i^*$ or $\{j, j+1\} \cap S_i^* = \varnothing$. Thus, we can combine the two products, replacing products $j$ and $j+1$ with a single “new” product with revenue $Rev_i([j, j+1])$ and preference weight $v_{ji} + v_{j+1,i}$. Assign the new product the index $j$, and calculate the new $G_i(j,j)$. Repeat this process until we obtain a list of indices $G_i(j,j)$ that is non-increasing in $j$. Without loss of generality, assume that at the end of Stage 1, each nest $i$ has $n$ products such that $G_i(1,1) \ge G_i(2,2) \ge \cdots \ge G_i(n,n)$.

Stage 2 (Removal): Let $S^1 = ([1,n],\dots,[1,n])$. For iteration $t \ge 1$, given $S^t = ([1,J_1^t], [1,J_2^t], \dots, [1,J_m^t])$, let
$$\mathrm{Min}^t = \min_{i : J_i^t \ge 1} G_i(J_i^t, J_i^t) \quad\text{and}\quad \mathrm{Index}^t = \arg\min_{i : J_i^t \ge 1} G_i(J_i^t, J_i^t).$$
If $\mathrm{Min}^t \ge Rev(S^t)$, the algorithm terminates and outputs $S^t$. Otherwise, if $\mathrm{Min}^t < Rev(S^t)$, then the algorithm generates a new assortment $S^{t+1} = (S_1^{t+1},\dots,S_m^{t+1})$ as follows:
$$S_i^{t+1} = \begin{cases} S_i^t & \text{if } i \ne \mathrm{Index}^t, \\ S_i^t \setminus \{J_i^t\} = [1, J_i^t - 1] & \text{if } i = \mathrm{Index}^t. \end{cases}$$
Thus, in each iteration, we remove $J_i^t$ from $S_i^t$ if $G_i(J_i^t, J_i^t) = \mathrm{Min}^t$ and the product $J_i^t$ violates the optimality condition given in Theorem 1.3.3; that is, $G_i(J_i^t, J_i^t) < Rev(S^t)$. The following lemma shows that $S^t$ always contains the optimal assortment.

Lemma 1.4.1 (Containment). For all $t$, $S^* \subseteq S^t$.

Proof. We will prove the result by induction. It is true for $t = 1$ by our construction. Suppose that the lemma is true for some $t \ge 1$; that is, $S^* \subseteq S^t$. We will show that $S_i^* \subseteq S_i^{t+1}$ for all $i$. Consider an arbitrary nest $i$. There are two cases to consider: $S_i^* = S_i^t$ and $S_i^* \subsetneq S_i^t$.

Case 1: Suppose that $S_i^* = S_i^t = [1, J_i^t]$. If $J_i^t = 0$, then $i \ne \mathrm{Index}^t$ and $S_i^{t+1} = S_i^t$ by our construction. So, suppose that $J_i^t \ge 1$. As $Rev(S^*) \ge Rev(S_1^*, \dots, S_i^* \setminus \{J_i^t\}, \dots, S_m^*)$, we have $G_i(J_i^t, J_i^t) \ge Rev(S^*) \ge Rev(S^t)$, which follows from Lemma 1.3.2 and the optimality of $S^*$.
As $G_i(J_i^t, J_i^t) \ge Rev(S^t)$, $S_i^{t+1} = S_i^t$, which is the desired result.

Case 2: Suppose that $S_i^* \subsetneq S_i^t = [1, J_i^t]$. Then $J_i^t \notin S_i^*$ and $J_i^t \ge 1$. As $S_i^* \in \{\varnothing\} \cup \mathcal{N}^+$, $S_i^* \subseteq [1, J_i^t - 1]$. By our construction, we will either remove $J_i^t$ from $S_i^t$ or do nothing, and in both cases, we have $S_i^* \subseteq S_i^{t+1}$. □

The main result of this section is stated in the following theorem.

Theorem 1.4.2 (Correctness). The GREEDY ALGORITHM terminates with an optimal assortment in $O(nm \log m)$ time.

Proof. We will first establish the running time of the algorithm. Note that $Rev_i([1,j])$ is a convex combination of $Rev_i([1,j-1])$ and $r_{ji}$, and by definition,
$$G_i(j,j) = Rev_i([1,j]) - \frac{Rev_i([1,j]) - r_{ji}}{f_i\!\left(1 - v_{ji} \big/ \sum_{\ell=1}^j v_{\ell i}\right)}.$$
For simplicity, assume that we can compute the function $f_i(x)$ for each $x$ and the index $G_i(j,j)$ for every product $ji$ in $O(1)$ time. For each nest $i$, we will show that the “lumping” process takes $O(n)$ operations. The lumping process requires two types of operations: (1) comparison between two indices, and (2) merging of two products. Whenever we merge two products to create a new one, the total number of products is reduced by one. As we start with $n$ products, the number of mergings is at most $n-1$. We perform two types of comparisons. Starting from product 1, if the ordering of indices is correct up to product $j$, we compare $G_i(j,j)$ forward with $G_i(j+1,j+1)$. If no violation occurs, then $G_i(j,j) \ge G_i(j+1,j+1)$, and we proceed to product $j+1$. If a violation occurs, then we must merge products $j$ and $j+1$ to create a new product $j$. To ensure the correct ordering of indices, we need to compare the new index $G_i(j,j)$ backward with $G_i(j-1,j-1)$. Given $n$ products, there are $O(n)$ forward comparisons. As a backward comparison occurs only after a merging, and there are $O(n)$ mergings, the number of backward comparisons is also $O(n)$. Thus, the total number of operations is $O(n)$. With $m$ nests, the total running time for Stage 1 is $O(nm)$.
In Stage 2, the algorithm continues to run as long as a product is removed from some nest. Given $nm$ products, the algorithm terminates in $O(nm)$ iterations. We will show that each iteration takes $O(\log m)$ time. At the beginning of Stage 2, we create a self-balancing binary search tree (SB-BST) with $m$ nodes, where for $i = 1, 2, \ldots, m$, node $i$ in the tree corresponds to the index $G_i(n,n)$. This takes $O(m \log m)$ operations; see Chapter 6 in Knuth (1998) for more details. We use the SB-BST as our data structure because such a tree always maintains a height of $O(\log m)$, allowing for efficient search operations. In each iteration $t \geq 1$, we perform the following three operations:

SEARCH: The algorithm searches for the nest with the minimum index $G_i(J^t_i, J^t_i)$ in $O(\log m)$ time under the SB-BST.

DELETE: If the minimum index $G_i(J^t_i, J^t_i)$ is greater than or equal to the revenue of the current assortment, the algorithm terminates. Otherwise, the algorithm removes product $J^t_i$ from $S^t_i$ in nest $i$ and also removes node $i$ from the tree. The removal takes $O(\log m)$ operations because the tree may need to rebalance.

INSERT: If product $J^t_i$ in nest $i$ is removed and $J^t_i > 1$, the algorithm inserts a new node with the corresponding index $G_i(J^t_i - 1, J^t_i - 1)$ into the tree. Again, inserting a new node into the SB-BST takes $O(\log m)$ operations.

Thus, each iteration takes $O(\log m)$ time. As we have $O(nm)$ iterations, the total running time is $O(nm \log m)$.

Let $S = ([1, p_1], \ldots, [1, p_m])$ denote the assortment at the termination of the algorithm. For every nest $i$ such that $p_i \geq 1$, by our construction, $G_i(p_i, p_i) \geq \mathrm{Rev}(S)$. To complete the proof, we will show that $S$ satisfies the optimality condition in Theorem 1.3.3; that is, $\min_{j=1,\ldots,p_i} G_i(j, p_i) \geq \mathrm{Rev}(S)$ for each $i$ such that $p_i \geq 1$. Suppose on the contrary that $\min_{j=1,\ldots,p_i} G_i(j, p_i) < \mathrm{Rev}(S)$.
Let $k_i \in \{1, 2, \ldots, p_i - 1\}$ denote the largest index at which the optimality condition is violated; that is, $G_i(k_i, p_i) < \mathrm{Rev}(S) \leq G_i(k_i + 1, p_i)$. Then it follows from Lemma 1.3.2 that
$$\mathrm{Rev}\big(S_1, \ldots, S_i \setminus [k_i, p_i], \ldots, S_m\big) > \mathrm{Rev}(S) \geq \mathrm{Rev}\big(S_1, \ldots, S_i \setminus [k_i + 1, p_i], \ldots, S_m\big).$$
As $S_i \setminus [k_i, p_i] = [1, k_i - 1]$ and $S_i \setminus [k_i + 1, p_i] = [1, k_i]$, by Lemma 1.3.2, $G_i(k_i, k_i) < \mathrm{Rev}\big(S_1, \ldots, S_i \setminus [k_i + 1, p_i], \ldots, S_m\big) \leq \mathrm{Rev}(S)$. As we lump the products in Stage 1 and $k_i < p_i$, we have $G_i(p_i, p_i) \leq G_i(k_i, k_i) < \mathrm{Rev}(S)$, which contradicts our hypothesis on $p_i$. Therefore, $\min_{j=1,\ldots,p_i} G_i(j, p_i) \geq \mathrm{Rev}(S)$. $\square$

1.5 Sensitivity Analysis

In this section, we investigate how the optimal assortment changes with product revenues. Our goal is to tractably compute the optimal assortment that maximizes the total revenue when the revenue of every product is reduced by $\delta$, for all $\delta \in \mathbb{R}_+$ simultaneously. That is, we want to solve the following optimization problem:
$$Z(\delta) = \max_{S=(S_1,\ldots,S_m)} \sum_{i=1}^{m} \sum_{j \in S_i} Q_{ji}(S)\,(r_{ji} - \delta) = \max_{S=(S_1,\ldots,S_m)} \mathrm{Rev}(S) - \delta\,\big[1 - Q_0(S)\big],$$
for all $\delta \in \mathbb{R}_+$, where $Q_0(S)$ is the probability that a customer chooses the no-purchase option. Before we present the main result of this section, we first describe the structural properties of $Z(\delta)$ in the following lemma. The result is standard because $\{Z(\delta) : \delta \in \mathbb{R}_+\}$ is the maximum of linear functions, each of which is decreasing in $\delta$, and we omit the details of the proof.

Lemma 1.5.1. The function $\delta \mapsto Z(\delta)$ is convex, decreasing, and piecewise linear on $\mathbb{R}_+$.

Let $X(\delta)$ denote the optimal assortment associated with the optimization problem for $Z(\delta)$. Throughout this section, we assume that $X(\delta)$ corresponds to the optimal assortment obtained by running the GREEDY ALGORITHM with reduced revenue $r_{ji} - \delta$ for every product $ji$. Clearly, a naive application of the GREEDY ALGORITHM to compute $Z(\delta)$ for every single $\delta \in \mathbb{R}_+$ is infeasible.
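Since $X(\delta)$ is produced by the same removal loop run on shifted revenues, it is worth making that loop concrete. The following sketch of Stage 2 uses illustrative names throughout; a binary heap stands in for the SB-BST from the proof of Theorem 1.4.2, which suffices here because the loop only ever queries the minimum index. The callbacks `index` and `rev` are placeholders for $G_i(\cdot,\cdot)$ and $\mathrm{Rev}(\cdot)$.

```python
import heapq

def removal_stage(J, index, rev):
    """Stage 2 (removal) sketch; all names are illustrative.

    J[i]:        number of (lumped) products kept in nest i, so the
                 current assortment is S = ([1, J[0]], ..., [1, J[m-1]]),
                 with indices non-increasing within each nest (Stage 1).
    index(i, j): the index G_i(j, j).
    rev(J):      the expected revenue Rev(S) of the current assortment.

    A binary heap replaces the SB-BST: only the minimum index is ever
    queried, and each iteration does O(log m) heap work.
    """
    heap = [(index(i, J[i]), i) for i in range(len(J)) if J[i] >= 1]
    heapq.heapify(heap)
    while heap:
        g, i = heap[0]
        if g >= rev(J):          # optimality condition of Theorem 1.3.3 holds
            break
        heapq.heappop(heap)
        J[i] -= 1                # remove product J_i from nest i
        if J[i] >= 1:            # expose the next index in this nest
            heapq.heappush(heap, (index(i, J[i]), i))
    return J
```

Running the same loop with each index shifted to `index(i, j) - delta` and the $\delta$-reduced revenue function yields $X(\delta)$ for a fixed $\delta$.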
We will make use of the structural properties of $Z(\delta)$ to develop an efficient algorithm that computes the entire profile of $Z(\delta)$ and $X(\delta)$, simultaneously for all $\delta \in \mathbb{R}_+$, with a running time of $O(nm \log m)$. The main result of this section is stated in the following proposition.

Proposition 1.5.2 (Computing $Z(\delta)$). The function $\delta \mapsto Z(\delta)$ has at most $nm$ breakpoints on $\mathbb{R}_+$. Moreover, the entire profile of $Z(\delta)$, $X(\delta)$, and the breakpoints can be computed simultaneously for all $\delta \in \mathbb{R}_+$ in a single run of the GREEDY ALGORITHM with a running time of $O(nm \log m)$.

Before we prove Proposition 1.5.2, it is easy to verify that if we reduce the revenue of every product by $\delta$, the index of each product $ji$ is given by $G_i(j,j) - \delta$, which is simply the difference between the original index $G_i(j,j)$ defined in Section 1.3 and $\delta$. Moreover, both the sequence of lumping operations and the ordering of the product indices after lumping remain the same when we apply the GREEDY ALGORITHM with reduced revenue $r_{ji} - \delta$ for every product $ji$. The following lemma shows the relationship between $X(\delta)$ and $S^*$, where $S^*$ is the optimal assortment associated with the assortment optimization problem for $Z^*$ as defined in Section 1.1.

Lemma 1.5.3 (Optimal Assortment for $Z(\delta)$). For all $\delta \geq 0$, $X(\delta) \subseteq X(0) = S^*$.

Proof. The optimization problem for $Z(0)$ is identical to the one for $Z^*$; therefore, $X(0) = S^*$. Suppose the GREEDY ALGORITHM that computes $S^*$ terminates in $T$ iterations. For each $t = 1, \ldots, T$, let $S^t = \big([1, J^t_1], [1, J^t_2], \ldots, [1, J^t_m]\big)$ denote the assortment in iteration $t$. Then, the optimality condition implies that $\min_{i : J^t_i \geq 1} G_i(J^t_i, J^t_i) < \mathrm{Rev}(S^t)$ for $t < T$. Now compare this algorithm with the GREEDY ALGORITHM that computes $X(\delta)$ with reduced revenue $r_{ji} - \delta$ for every product $ji$. As the ordering of the product indices is the same for all $\delta \geq 0$, the sequence of products removed remains the same before the termination of either algorithm.
Moreover, in each iteration $t < T$,
$$\min_{i \,:\, J^t_i \geq 1} \big( G_i(J^t_i, J^t_i) - \delta \big) \;<\; \mathrm{Rev}(S^t) - \delta \;\leq\; \mathrm{Rev}(S^t) - \delta\,\big[1 - Q_0(S^t)\big],$$
implying that the GREEDY ALGORITHM that computes $X(\delta)$ terminates in iteration $T$ or later. Thus, $X(\delta) \subseteq S^*$. $\square$

We denote the breakpoints of the function $\delta \mapsto Z(\delta)$ by $\{\delta^t : t = 0, 1, \ldots\}$, with $0 = \delta^0 \leq \delta^1 \leq \cdots$, and denote the optimal assortment for $\delta \in [\delta^t, \delta^{t+1}]$ by $X^t = X(\delta)$. Recall that the optimal assortment $X^t$ for $\delta \in [\delta^t, \delta^{t+1}]$ corresponds to the optimal assortment obtained by running the GREEDY ALGORITHM with reduced revenue $r_{ji} - \delta$ for every product $ji$. Thus, every product in $X^t$ has undergone the lumping process, and the index $G_i(j,j)$ in every nest $i$ is non-increasing in $j$. Exploiting this fact, the next lemma shows that given $X^{t-1}$, we can compute $X^t$ simply by removing a product with the lowest index.

Lemma 1.5.4 (Computing an Optimal Assortment). Suppose $X^{t-1}$ is optimal for $\delta \in [\delta^{t-1}, \delta^t]$. Then an optimal assortment $X^t$ for $\delta \in [\delta^t, \delta^{t+1}]$ can be obtained by removing a product with the minimum index from $X^{t-1}$.

Proof. As $\delta^t$ is a breakpoint, $X^t \neq X^{t-1}$, and thus, by Lemma 1.5.3, $X^t \subsetneq X^{t-1}$. Suppose the last product $k$ in nest $\ell$ has the minimum index value in $X^{t-1}$. By continuity, $Z(\delta^t) = \mathrm{Rev}(X^{t-1}) - [1 - Q_0(X^{t-1})]\,\delta^t = G_\ell(k,k) - \delta^t$, where the second equality follows from the optimality condition. As $G_\ell(k,k) - \delta^t$ is equal to $Z(\delta^t)$, which is the total revenue of $X^{t-1}$ with reduced revenue $r_{ji} - \delta^t$ for every product $ji$, it follows from Lemma 1.3.2 that removing product $k\ell$ leaves the total revenue unchanged. That is,
$$Z(\delta^t) = \mathrm{Rev}\big(X^{t-1} \setminus \{k\ell\}\big) - \big[1 - Q_0\big(X^{t-1} \setminus \{k\ell\}\big)\big]\,\delta^t.$$
Thus, $X^{t-1} \setminus \{k\ell\}$ is also optimal at $\delta^t$. Letting $X^t = X^{t-1} \setminus \{k\ell\}$ gives the desired result. $\square$

Lemma 1.5.4 implies that starting with the optimal assortment $X^0 = S^*$, we can compute $X^t$ for $t = 1, 2, \ldots$ sequentially without any knowledge of the breakpoints $\delta^t$. The following lemma shows that given two consecutive optimal assortments $X^{t-1}$ and $X^t$, we can compute the breakpoint $\delta^t$.

Lemma 1.5.5 (Computing Breakpoints).
Suppose $X^{t-1}$ is an optimal assortment for $\delta \in [\delta^{t-1}, \delta^t]$, and $X^t$ is an optimal assortment for $\delta \in [\delta^t, \delta^{t+1}]$. Then,
$$\delta^t = \frac{\mathrm{Rev}(X^{t-1}) - \mathrm{Rev}(X^t)}{Q_0(X^t) - Q_0(X^{t-1})}.$$

Proof. By continuity, we have $Z(\delta^t) = \mathrm{Rev}(X^{t-1}) - [1 - Q_0(X^{t-1})]\,\delta^t = \mathrm{Rev}(X^t) - [1 - Q_0(X^t)]\,\delta^t$, as both $X^{t-1}$ and $X^t$ give the same revenue $Z(\delta^t)$ at the breakpoint $\delta^t$. Solving for $\delta^t$ gives the desired result. $\square$

Here is the proof of Proposition 1.5.2.

Proof. Starting with the full assortment, we first apply the GREEDY ALGORITHM to compute $X^0 = X(0) = S^*$. Given $X^0$, we obtain $X^t$ for each $t \in \{1, 2, \ldots\}$ by applying Lemma 1.5.4. Then, given $X^{t-1}$ and $X^t$, we apply Lemma 1.5.5 to calculate the breakpoint $\delta^t$. Finally, given $X^{t-1}$ for $\delta \in [\delta^{t-1}, \delta^t]$, the optimal revenue is computed as $Z(\delta) = \mathrm{Rev}(X^{t-1}) - \delta\,[1 - Q_0(X^{t-1})]$. There are at most $nm$ products in $X^0$. As we remove a single product with the minimum index in each iteration, there are at most $nm$ removals, corresponding to at most $nm$ breakpoints of the function $\delta \mapsto Z(\delta)$. The running time of the algorithm is $O(nm \log m)$ by Theorem 1.4.2. $\square$

Application to the Multi-Period Capacity Allocation Problem

We apply our GREEDY ALGORITHM to solve the multi-period capacity allocation problem described in Talluri and van Ryzin (2004) under the two-level nested logit model. We denote the total capacity of seats on a flight leg by $C$ and the total number of allocation periods by $T$. There are $m$ groups of fare classes, with each group containing $n$ fare classes. Each fare class $j$ in group $i$ has revenue $r_{ji}$, where $j = 1, 2, \ldots, n$ and $i = 1, 2, \ldots, m$. Let $S = (S_1, S_2, \ldots, S_m)$ denote the assortment of fare classes. In each period, a single customer chooses a fare class according to a two-level nested logit model. Let $J_t(x)$ denote the maximum expected revenue when we have $x$ seats and $t$ periods left. Then, $J_t(x)$ satisfies the following dynamic programming equation:
$$J_t(x) = \max_{S=(S_1,\ldots,S_m)} \sum_{i=1}^{m} \sum_{j \in S_i} Q_{ji}(S)\,\big(r_{ji} - \Delta J_{t-1}(x)\big) + J_{t-1}(x),$$
where $\Delta J_{t-1}(x) = J_{t-1}(x) - J_{t-1}(x-1)$ and $J_0 \equiv 0$.
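A sketch of this dynamic program follows, treating $Z(\cdot)$ as a given oracle for the inner maximization; the boundary convention $J_t(0) = 0$ is an assumption made here (no revenue can be collected without seats), and the function names are illustrative.

```python
def capacity_allocation(T, C, Z):
    """Tabulate J_t(x) for t = 0..T and x = 0..C, where Z(delta) is an
    oracle for max_S sum_ij Q_ji(S) (r_ji - delta).  With the profile
    of Z precomputed as in Proposition 1.5.2, each call is an O(1)
    lookup and the table takes O(TC) time."""
    J = [[0.0] * (C + 1) for _ in range(T + 1)]  # J[0][x] = 0 for all x
    for t in range(1, T + 1):
        J[t][0] = 0.0  # no seats left: no revenue (assumed convention)
        for x in range(1, C + 1):
            delta = J[t - 1][x] - J[t - 1][x - 1]  # opportunity cost of a seat
            J[t][x] = Z(delta) + J[t - 1][x]
    return J
```

For instance, with the toy oracle `Z = lambda d: max(0.0, 1.0 - d)` (a decreasing piecewise-linear stand-in), two periods and two seats give $J_2(2) = 2$.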
It is a standard result that $\Delta J_t(x) \geq 0$ for all $x$ and $t$. The first term on the right-hand side is then exactly the optimization problem for $Z(\delta)$ with $\delta = \Delta J_{t-1}(x)$.

The traditional approach to the multi-period capacity allocation problem is dynamic programming, in which $J_t(x)$ is computed for every single $x$ and $t$, requiring a total running time of $O(TCnm \log m)$ if we apply the GREEDY ALGORITHM in each subproblem. As shown in the following proposition, we can do much better.

Proposition 1.5.6 (Solving Multi-Period Capacity Allocation). Under the two-level nested logit model, the multi-period capacity allocation problem with $T$ allocation periods and total capacity $C$ can be solved using a single run of the GREEDY ALGORITHM, with a total running time of $O(TC + nm \log m)$.

Proof. Computing $\{X(\delta) : \delta \in \mathbb{R}_+\}$ and the breakpoints $\{\delta^t : t = 1, 2, \ldots\}$ takes $O(nm \log m)$ time by Proposition 1.5.2. We can then build a hash table that stores the intervals between breakpoints and the corresponding optimal assortments in $O(nm)$ time, so that for every $\delta \in \mathbb{R}_+$, we can look up $X(\delta)$ in $O(1)$ time. As $x$ can take $C + 1$ values, for each $t$, the value function $J_t(\cdot)$ can be computed in $O(C)$ operations, simply by looking up the stored values in the hash table. There are $T$ such value functions; thus it takes $O(TC)$ time to determine $J_t(\cdot)$ for all $t \in \{1, \ldots, T\}$. Therefore, the total running time is bounded by $O(TC + nm \log m)$. $\square$

1.6 Conclusion

We focus on computing the revenue-maximizing assortment under the classical two-level nested logit model. We formulate the optimality condition and characterize the occurrence of "lumpiness" in the optimal assortment. Exploiting the nest-specific nature of the lumpy structure and the optimality condition, we develop a simple and extremely fast greedy algorithm.
As a direction for future research, it would be interesting to explore whether the optimality condition can be established for more general choice models, or whether a similar lumpy structure exists for other choice models.

Chapter 2

The d-Level Nested Logit Model: Assortment and Price Optimization Problems

2.1 Introduction

In their seminal work, Talluri and van Ryzin (2004) demonstrated the importance of incorporating customer choice behavior when modeling demand in operations management problems. They observed that customers make a choice among the available products after comparing them in terms of price, quality, and possibly other features. This choice process creates interactions among the demand for different products, and it is important to capture these interactions. One of the models most commonly used to capture the choice process of customers among different products is the multinomial logit model, pioneered by Luce (1959) and McFadden (1974). In this paper, we consider the $d$-level nested logit model, allowing for an arbitrary number of levels $d$. In this model, each product is described by a list of $d$ features. As an example, if the products are flight itineraries, then such features might include the departure time, fare class, airline, and the number of connections. We describe the products using a tree with $d$ levels, where each level in the tree corresponds to a particular feature. When choosing a product, a customer first chooses the desired value of the first feature, say the departure time, which gives rise to a subset of compatible flight itineraries. Then, she chooses the value of the second feature, which further narrows down the set of compatible flight itineraries. Each subsequent feature selection reduces the compatible set of flight itineraries further, until we are left with a single flight itinerary after $d$ features have been selected.
In this way, each product corresponds to a leaf node in the tree, whose unique path of length $d$ from the root node describes the list of $d$ features of the product.

We consider both assortment optimization and pricing problems under the $d$-level nested logit model. In the assortment optimization problem, the prices of the products are fixed. The goal is to find the revenue-maximizing assortment of products to offer. In the pricing problem, the assortment of offered products is fixed, but the utility that a customer derives from a product depends on the price of the product. The goal is to find the revenue-maximizing set of prices for the products. We make contributions in three areas: the formulation of the $d$-level nested logit model, the solution of the assortment optimization problem, and the solution of the pricing problem. We describe our contributions in each of these areas below.

We provide a novel formulation of the $d$-level nested logit model using a tree of depth $d$. Building on this formulation, we can describe the product selection probabilities through a succinct recursion over the nodes of the tree. Using this succinct description of the product selection probabilities, we formulate our assortment optimization and pricing problems. To our knowledge, this is the first paper to present a description of the $d$-level nested logit model that works for an arbitrary tree structure and an arbitrary number of levels.

We provide a complete solution to the assortment optimization problem. We give a recursive characterization of the optimal assortment, which, in turn, leads to an efficient algorithm for computing the optimal assortment. For a $d$-level nested logit model with $n$ products, the running time of the algorithm is $O(dn \log n)$. To our knowledge, this is the first solution to the assortment optimization problem under the $d$-level nested logit model.
For the pricing problem, it is known that the expected revenue is not concave in the prices of the products, even when customers choose according to the two-level nested logit model. Thus, we cannot hope to find the optimal prices in general. Instead, we focus on finding stationary points, corresponding to prices at which the gradient of the expected revenue function is zero. Using our recursive formulation of the $d$-level nested logit model, we give a simple expression for the gradient of the expected revenue. Although our gradient expression allows us to compute a stationary point using a standard gradient ascent method, we find gradient ascent to be relatively slow, and it requires a careful selection of step size. We develop a new iterative algorithm that generates a sequence of prices converging to a stationary point of the expected revenue function. Our algorithm differs from gradient ascent in that the prices it generates do not follow the direction of the gradient. Furthermore, our algorithm completely avoids the problem of choosing a step size. Numerical experiments show that our algorithm converges to a stationary point of the expected revenue function much faster than gradient ascent.

2.2 Literature Review

We focus our literature review on papers that use variants of the multinomial and nested logit models in assortment and price optimization problems. We refer the reader to Kök et al. (2009) and Farias et al. (2013) for assortment optimization and pricing work under other choice models. The multinomial logit model dates back to the work of Luce (1959), and it is known to be consistent with random utility maximization. However, this model suffers from the independence of irrelevant alternatives property, which implies that if a new product is added to an assortment, then the demand for each existing product decreases by the same percentage.
This property should not hold when different products cannibalize each other to different extents, and the multinomial logit model can lead to biased estimates of the selection probabilities in such settings; see Train (2003). The nested logit model avoids the independence of irrelevant alternatives property while remaining compatible with random utility maximization; see Börsch-Supan (1990). Talluri and van Ryzin (2004) and Gallego et al. (2004) consider assortment optimization problems under the multinomial logit model. They show that the optimal assortment is revenue-ordered, in the sense that it includes a certain number of products with the largest revenues. Liu and van Ryzin (2008) provide an alternative proof of the same result. Gallego et al. (2011) consider generalizations of the multinomial logit model in which the attractiveness of the no-purchase option increases when more restricted assortments are offered to customers. They show that the assortment problem under this choice model remains tractable, and they generalize to the network revenue management setting, in which customers arriving into the system observe the assortment of available fare classes and make a choice among them. Bront et al. (2009) and Rusmevichientong et al. (2014) consider assortment optimization problems under the mixture-of-logits model. They show that the problem is NP-hard, propose heuristic solution methods, and investigate integer programming formulations. The papers by van Ryzin and Mahajan (1999) and Li (2007) consider models in which a firm needs to make joint assortment optimization and stocking-quantity decisions. There is a certain revenue and a procurement cost associated with each product. Once the firm chooses the products to offer and their corresponding stocking quantities, a random number of customers arrive into the system, and each customer chooses among the offered products according to the multinomial logit model.
The goal is to choose the assortment of offered products and the stocking quantities to maximize the expected profit. Li (2007) assumes that the demand for each product is proportional to a random store traffic volume, whereas van Ryzin and Mahajan (1999) model the demand for different products with random variables whose coefficients of variation decrease with demand volume, capturing economies of scale. The paper by van Ryzin and Mahajan (1999) assumes that products have the same profit margin and shows that the optimal assortment includes a certain number of products with the largest attractiveness parameters in the choice model. If there are $p$ products, then their result reduces the number of assortments to consider to $p$. For a nested logit model with $m$ nests and $p$ products in each nest, even if their ordering result holds in each nest, the number of assortments to consider is $p^m$. Assortment optimization under the two-level nested logit model has been considered only recently. Kök and Xu (2010) study joint assortment optimization and pricing problems under the two-level nested logit model, where both the assortment of offered products and their prices are decision variables. They work with two nest structures. In the first nest structure, customers first choose a brand out of two brands, and then a product type within the selected brand. In the second nest structure, customers first choose a product category, and then a brand for the selected product category out of two brands. The authors show that the optimal assortment of product types within each brand is popularity-ordered, in the sense that it includes a certain number of product types with the largest mean utilities. Thus, if there are $p$ product types within each brand, then there are $p$ possible assortments of product types to consider in a particular brand. Since there are two brands, the optimal assortment to offer is one of $p^2$ assortments.
The optimal assortment can be found by checking the performance of all $p^2$ assortments. The problem becomes intractable when there is a large number of brands: if there are $b$ brands, then the number of possible assortments to consider is $p^b$, which quickly gets large with $b$. We note that if we apply our assortment optimization algorithm in Section 2.5 to the two-level nested logit model with $b$ brands and $p$ product types within each brand, then the optimal assortment can be computed in $O(2pb \log(pb))$ operations, because the total number of products is $pb$. Davis et al. (2014) show how to compute the optimal assortment under the two-level nested logit model with an arbitrary number of nests. Assuming that there are $m$ nests and $p$ products within each nest (for a total of $mp$ products), the authors show that there are only $p$ possible assortments of products to consider in each nest. Furthermore, they formulate a linear program to find the best of these assortments to choose from each nest. In this way, their approach avoids checking the performance of all $p^m$ possible values for the optimal assortment. Gallego and Topaloglu (2014) extend this work to accommodate a variety of constraints on the offered assortment. Li and Rusmevichientong (2014) establish structural properties of the optimal assortment and use these properties to develop a greedy algorithm for computing it. All work so far focuses on the two-level nested logit model. To our knowledge, there is no assortment optimization work under the general $d$-level nested logit model. The linear program proposed by Davis et al. (2014) and the greedy algorithm developed by Li and Rusmevichientong (2014) do not generalize to the $d$-level nested logit model. We note that, to our knowledge, assortment optimization problems under other variants of logit models, such as the mixture of logits, the cross-nested logit model (Vovsha 1997), and the paired combinatorial and generalized nested logit models (Wen and Koppelman 2001), are not tractable.
In this paper, we show that the assortment optimization problem under the nested logit model is tractable, regardless of the number of levels and the structure of the tree. For pricing problems under the multinomial logit model, Hanson and Martin (1996) point out that the expected revenue is not necessarily concave in prices. Song and Xue (2007) and Dong et al. (2009) make progress by noting that the expected revenue is concave in market shares. Chen and Hausman (2000) study the structural properties of the expected revenue function for a joint product selection and pricing problem. There is some recent work on price optimization under the two-level nested logit model. Li and Huh (2011) consider the case in which the products in the same nest share the same price sensitivity parameter, and show that the expected revenue function is concave in market shares. Gallego and Wang (2013) generalize this model to allow for arbitrary price sensitivity parameters, but the expected revenue function is no longer concave in market shares. As with assortment optimization, all of the earlier work on pricing under the nested logit model focuses on two levels. Subsequent to our work, Li and Huh (2013) consider pricing problems under the $d$-level nested logit model. They give a characterization of the optimal prices, but do not provide an algorithm to compute them and do not address the assortment problem.

The rest of the paper is organized as follows. After the literature review in Section 2.2, we describe the $d$-level nested logit model in Section 2.3 and formulate the assortment optimization problem. In Section 2.4, we give a characterization of the optimal assortment. This characterization is translated into an algorithm for computing an optimal assortment in Section 2.5. In Section 2.6, we test the numerical performance of our assortment optimization algorithm.
In Section 2.9, we formulate our pricing problem and give an algorithm to find a stationary point of the expected revenue function. In Section 2.10, we present numerical experiments that test the performance of our pricing algorithm. In Section 2.11, we provide practical motivation for the $d$-level nested logit model by showing that it is compatible with the random utility maximization principle and equivalent to the elimination-by-aspects model. We also describe practical situations in which the assortment and price optimization problems studied in this paper become useful. We conclude in Section 2.12.

2.3 Assortment Optimization Problem

We have $n$ products indexed by $\{1, 2, \ldots, n\}$, and the no-purchase option is denoted by product 0. The taxonomy of the products is described by a $d$-level tree, denoted by $\mathbb{T} = (V, E)$ with vertices $V$ and edges $E$. The tree has $n$ leaf nodes at depth $d$, corresponding to the $n$ products in $\{1, 2, \ldots, n\}$. Throughout the paper, we use the terms depth and level interchangeably. A sample tree with $d = 3$ and $n = 9$ is given in Figure 2.1. The tree describes the process by which a customer chooses a product. Starting at the root node, denoted by $\mathrm{root}$, the edges emanating from $\mathrm{root}$ represent the first criterion, or feature, that the customer uses in choosing her product. Each node in the first level corresponds to the subset of products that fits a particular value of the first criterion. Product 0, corresponding to the no-purchase option, is labeled as node 0; it is located in level one and directly connected to $\mathrm{root}$, with no children. The edges in the second level represent the second criterion that a customer uses to narrow down her choices, and each level-two node represents the subset of products compatible with the particular values of the first two criteria. The same interpretation applies to the other levels. We use $\mathrm{Children}(j)$ to denote the set of child nodes of node $j$ in the tree, and $\mathrm{Parent}(j)$ to denote the parent node of node $j$.
Associated with each node $j$ is a set of products $N_j \subseteq \{0, 1, \ldots, n\}$, where $N_j$ is the set of products, or leaf nodes, included in the subtree rooted at node $j$. In particular, if $j$ is a leaf node, then $N_j = \{j\}$, a singleton consisting of the product itself. On the other hand, if $j$ is a non-leaf node, then $N_j = \bigcup_{k \in \mathrm{Children}(j)} N_k$, the disjoint union of the sets of products at the child nodes of $j$. Note that if node $j$ is in level $h$, then $N_j$ corresponds to the subset of products that fits the specific values of the first $h$ criteria, as specified by the path from $\mathrm{root}$ to node $j$. We refer to a subset of products $S \subseteq \{1, 2, \ldots, n\}$ as an assortment. Each assortment $S$ defines a collection of subsets $(S_j : j \in V)$ at the nodes of the tree with $S_j \subseteq N_j$, where if $j$ is a leaf node other than node 0, then
$$S_j = \begin{cases} \{j\} & \text{if } j \in S, \\ \emptyset & \text{if } j \notin S. \end{cases}$$
For node 0, we set $S_0 = \{0\}$ by convention. If $j$ is a non-leaf node, then $S_j = \bigcup_{k \in \mathrm{Children}(j)} S_k$. Thus, $S_j$ corresponds to the products in $S$ that are included in the subtree rooted at node $j$; that is, $S_j = S \cap N_j$. Oftentimes, we refer to an assortment $S$ by its collection of subsets $(S_j : j \in V)$ and write $S = (S_j : j \in V)$. To illustrate the notation so far, we consider the tree shown in Figure 2.1.

[Figure 2.1: An illustration of the model formulation.]

Including the no-purchase option, there are 10 products in this tree, given by $\{0, 1, \ldots, 9\}$. The nodes of the tree are indexed by $\{0, 1, \ldots, 15\}$. We have, for example, $\mathrm{Children}(10) = \{1, 2, 3\}$, $\mathrm{Parent}(3) = 10$, $N_2 = \{2\}$, $N_{10} = \{1, 2, 3\}$, and $N_{14} = \{1, 2, 3, 4, 5\}$. For the assortment $S = \{1, 3, 4, 6\}$, we have $S_1 = \{1\}$, $S_2 = \emptyset$, $S_{10} = \{1, 3\}$, $S_{11} = \{4\}$, $S_{13} = \emptyset$, $S_{14} = \{1, 3, 4\}$, and $S_{\mathrm{root}} = \{0, 1, 3, 4, 6\}$. Associated with each leaf node $j$, we have the attractiveness parameter $v_j$, capturing the attractiveness of the product corresponding to this leaf node.
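The recursion $S_j = \bigcup_{k \in \mathrm{Children}(j)} S_k$ can be checked mechanically on this example. The child lists below are one reading of Figure 2.1 consistent with the values just listed; in particular, how products 6 through 9 split between nodes 12 and 13 is an assumption chosen to match $S_{13} = \emptyset$, and everything here is illustrative.

```python
# One consistent reading of the tree in Figure 2.1 (the split of
# products 6-9 between nodes 12 and 13 is an assumption).
CHILDREN = {
    "root": [0, 14, 15],
    14: [10, 11], 15: [12, 13],
    10: [1, 2, 3], 11: [4, 5], 12: [6, 7], 13: [8, 9],
}

def subsets(S, node="root"):
    """Return {j: S_j} for the subtree rooted at node, where S_j is the
    intersection of S with N_j, as in Section 2.3."""
    if node not in CHILDREN:                    # leaf node
        if node == 0:                           # no-purchase option
            return {0: {0}}
        return {node: {node} if node in S else set()}
    out, here = {}, set()
    for k in CHILDREN[node]:
        child = subsets(S, k)
        out.update(child)
        here |= child[k]                        # S_j = union of child S_k
    out[node] = here
    return out

Sj = subsets({1, 3, 4, 6})
```

For the assortment $S = \{1, 3, 4, 6\}$, this reproduces the values in the text, for example `Sj[10] == {1, 3}` and `Sj["root"] == {0, 1, 3, 4, 6}`.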
Given an assortment $S = (S_j : j \in V)$, a customer associates a preference weight $V_j(S_j)$ with each node $j \in V$, which is a function of the offered assortment $S = (S_j : j \in V)$ and the attractiveness parameters $(v_1, \ldots, v_n)$ of the products. To make her choice, a customer starts from $\mathrm{root}$ and walks over the nodes of the tree in a probabilistic fashion until she reaches a leaf node. In particular, if the customer is at a non-leaf node $j$, then she moves to node $k \in \mathrm{Children}(j)$ with probability $V_k(S_k) \big/ \sum_{\ell \in \mathrm{Children}(j)} V_\ell(S_\ell)$. Thus, the customer is more likely to visit nodes with higher preference weights. Once the customer reaches a leaf node, if this leaf node corresponds to a product, she purchases that product. If the leaf node corresponds to the no-purchase option, she leaves without making a purchase. As a function of the offered assortment $S = (S_j : j \in V)$ and the attractiveness parameters $(v_1, \ldots, v_n)$, the preference weight $V_j(S_j)$ of each node $j$ in the tree is computed as follows. If $j$ is a leaf node, then $S_j$ is either $\{j\}$ or $\emptyset$; in this case, the preference weight of leaf node $j$ is defined as $V_j(\{j\}) = v_j$ and $V_j(\emptyset) = 0$. We assume that the no-purchase option, node 0, has a preference weight $v_0$, so that $V_0(S_0) = v_0$. For each non-leaf node $j$, the preference weight is computed as
$$V_j(S_j) = \Bigg( \sum_{k \in \mathrm{Children}(j)} V_k(S_k) \Bigg)^{\theta_j}, \qquad (2.1)$$
where $\theta_j \in (0, 1]$ is a parameter of the nested logit model associated with node $j$. As we discuss in Section 2.11, the $d$-level nested logit model can be derived by appealing to the random utility maximization principle. Roughly speaking, $1 - \theta_j$ is a measure of the correlation between the utilities of the products that are descendants of node $j$. If $\theta_j$ is closer to zero, then the utilities of the products that are descendants of node $j$ are more positively correlated.
If $\theta_j = 1$, then node $j$ can be omitted from the tree by connecting each child of node $j$ directly to the parent of node $j$. If $\theta_j = 1$ for all $j \in V$, then each leaf node can be connected directly to $\mathrm{root}$ and we recover the multinomial logit model, corresponding to the case in which the utilities of all products are independent; see McFadden (1978) and Koppelman and Sethi (2000). Since $V_j(S_j)$ determines the probability that a customer reaches node $j$ in her choice process, and the customer starts at $\mathrm{root}$ without ever returning to it, $V_{\mathrm{root}}(S_{\mathrm{root}})$ plays no role in the choice process. Therefore, without loss of generality, we set $\theta_{\mathrm{root}} = 0$ so that we always have $V_{\mathrm{root}}(S_{\mathrm{root}}) = 1$.

The discussion above describes the choice process under the $d$-level nested logit model. Next, we proceed to formulating our assortment optimization problem. We use $r_j$ to denote the revenue associated with product $j$. Given that we offer an assortment $S = (S_j : j \in V)$, we use $R_j(S_j)$ to denote the expected revenue obtained from a customer who is at node $j$ in the tree during the course of her choice process. If the customer is at leaf node $j$ and the product corresponding to this leaf node is offered, then the customer purchases the product and a revenue of $r_j$ is obtained. If the customer is at leaf node $j$ and the product corresponding to this leaf node is not offered, then no revenue is obtained. Therefore, if $j$ is a leaf node, we capture the expected revenue from a customer at this node by defining $R_j(\{j\}) = r_j$ and $R_j(\emptyset) = 0$. If a customer is at node 0, corresponding to the no-purchase option, then no revenue is obtained from the customer; thus, we immediately have $R_0(S_0) = 0$. On the other hand, as mentioned above, if the customer is at a non-leaf node $j$, then she moves to node $k \in \mathrm{Children}(j)$ with probability $V_k(S_k) \big/ \sum_{\ell \in \mathrm{Children}(j)} V_\ell(S_\ell)$.
In this case, we can recursively write the expected revenue from a customer at a non-leaf node $j$ as

$$R_j(S_j) = \sum_{k \in \mathrm{Children}(j)} \frac{V_k(S_k)}{\sum_{\ell \in \mathrm{Children}(j)} V_\ell(S_\ell)} \, R_k(S_k) = \frac{\sum_{k \in \mathrm{Children}(j)} V_k(S_k) \, R_k(S_k)}{\sum_{k \in \mathrm{Children}(j)} V_k(S_k)}. \qquad (2.2)$$

Since each customer starts her choice process from $\mathsf{root}$, if we offer the assortment $S = (S_j : j \in V)$, then the expected revenue obtained from a customer is $R_{\mathsf{root}}(S_{\mathsf{root}})$. We want to find an assortment that maximizes this expected revenue, yielding the assortment problem

$$Z^* = \max_{S \subseteq \{1,\dots,n\}} R_{\mathsf{root}}(S_{\mathsf{root}}). \qquad (2.3)$$

Throughout, we use $S^* = (S_j^* : j \in V)$ to denote an optimal solution to the assortment problem above. The objective function of this assortment problem is defined recursively and involves nonlinearities, but it turns out that we can solve the problem in a tractable manner.

2.4 Properties of an Optimal Assortment

In this section, we give a characterization of the optimal assortment, which we use in the next section to develop an algorithm for computing it. To give our characterization, we let $S^* = (S_j^* : j \in V)$ denote an optimal assortment and define the scalars $(t_j : j \in V)$ for each node in the tree as

$$t_j = \max\big\{ t_{\mathrm{Parent}(j)}, \; \gamma_j \, t_{\mathrm{Parent}(j)} + (1 - \gamma_j) \, R_j(S_j^*) \big\},$$

with the convention that $t_{\mathrm{Parent}(\mathsf{root})} = 0$. Since $\gamma_{\mathsf{root}} = 0$, we have $t_{\mathsf{root}} = R_{\mathsf{root}}(S_{\mathsf{root}}^*) = Z^*$. If the optimal assortment is known, then we can compute $(V_j(S_j^*) : j \in V)$ by starting from the leaf nodes, enumerating the nodes of the tree in a breadth-first manner and using (2.1). Once we have $(V_j(S_j^*) : j \in V)$, we can compute $(R_j(S_j^*) : j \in V)$ in a similar fashion, this time using (2.2). Finally, once we have $(R_j(S_j^*) : j \in V)$, we can compute $(t_j : j \in V)$ starting from $\mathsf{root}$ and enumerating the nodes of the tree in a breadth-first manner.
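The bottom-up evaluation of (2.1) and (2.2) for a fixed assortment can be done in a single pass. A minimal sketch (tree, numbers, and names are hypothetical) carries the pair $(V_j, V_j R_j)$ up the tree, since the ratio in (2.2) is easiest to maintain through the products $V_k R_k$:

```python
# Hypothetical instance: one nest containing products 1 and 2, plus the no-purchase node 0.
children = {'root': [0, 'nest'], 'nest': [1, 2]}
v = {0: 1.0, 1: 2.0, 2: 3.0}
r = {0: 0.0, 1: 10.0, 2: 6.0}
gamma = {'root': 0.0, 'nest': 0.5}

def v_and_vr(j, offered):
    """Return (V_j, V_j * R_j) under equations (2.1)-(2.2); node 0 is always reachable."""
    if j not in children:                          # leaf node
        if j != 0 and j not in offered:
            return 0.0, 0.0
        return v[j], v[j] * r[j]
    pairs = [v_and_vr(k, offered) for k in children[j]]
    sum_v = sum(V for V, _ in pairs)
    sum_vr = sum(VR for _, VR in pairs)
    if sum_v == 0.0:
        return 0.0, 0.0
    V_j = sum_v ** gamma[j]
    return V_j, V_j * (sum_vr / sum_v)             # V_j * R_j, with R_j from (2.2)

V_root, VR_root = v_and_vr('root', offered={1, 2})
print(VR_root / V_root)                            # expected revenue R_root
```

Since $\gamma_{\mathsf{root}} = 0$, the returned $V_{\mathsf{root}}$ is 1 and the ratio is exactly $R_{\mathsf{root}}$ in (2.3).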
To characterize the optimal assortment, for each node $j$ we consider the optimization problem

$$\max_{S_j \subseteq N_j} V_j(S_j) \, \big( R_j(S_j) - t_{\mathrm{Parent}(j)} \big). \qquad (2.4)$$

Problem (2.4) only considers the products in $N_j$, which are the products included in the subtree rooted at node $j$; we therefore refer to problem (2.4) as the local problem at node $j$. As mentioned in Section 2.3, we have $V_{\mathsf{root}}(S_{\mathsf{root}}) = 1$. Since $t_{\mathrm{Parent}(\mathsf{root})} = 0$, comparing problems (2.3) and (2.4) shows that the local problem at $\mathsf{root}$ is identical to the assortment optimization problem in (2.3). Consequently, an optimal solution to problem (2.3) also solves the local problem at $\mathsf{root}$. In the following proposition, we generalize this observation by showing that if $S^* = (S_j^* : j \in V)$ is an optimal solution to the assortment optimization problem in (2.3), then for any node $j$, $S_j^*$ solves the local problem at node $j$.

Proposition 2.4.1 (Recovering Optimal Assortment). Let $S^* = (S_j^* : j \in V)$ be an optimal solution to problem (2.3). Then, for all $j \in V$, $S_j^*$ is an optimal solution to the local problem at node $j$.

The proof of this proposition is in Appendix A.1. We interpret $t_{\mathrm{Parent}(j)}$ as the minimum expected revenue that a customer at node $j$ must generate to make it worthwhile to offer any of the products included in the subtree rooted at node $j$: if an expected revenue of $t_{\mathrm{Parent}(j)}$ or more cannot be obtained from a customer at node $j$, then we are better off not offering any of these products. To see this interpretation, note that if $S^* = (S_j^* : j \in V)$ is an optimal assortment for problem (2.3), then by Proposition 2.4.1, $S_j^*$ solves the local problem at node $j$. Also, the optimal objective value of the local problem at node $j$ has to be non-negative, because the empty set trivially yields an objective value of zero in problem (2.4). Thus, if $S_j^* \neq \varnothing$,
so that it is worthwhile to offer some of the products included in the subtree rooted at node $j$, then we must have $R_j(S_j^*) \geq t_{\mathrm{Parent}(j)}$. Otherwise, $S_j^*$ would yield a negative objective value for the local problem at node $j$, contradicting the fact that $S_j^*$ solves that problem.

If $j$ is a leaf node, then the local problem at node $j$ given in (2.4) can be solved efficiently because $N_j = \{j\}$, so that $S_j$ is simply either $\{j\}$ or $\varnothing$. The crucial question is whether we can use a bottom-up approach and construct a solution to the local problem at any node $j$ from the solutions to the local problems at its child nodes. The following theorem answers this question affirmatively, showing that we can synthesize the solution of the local problem at node $j$ from the solutions of the local problems at each of its children.

Theorem 2.4.2 (Synthesis from Child Nodes). Consider an arbitrary non-leaf node $j \in V$. If, for all $k \in \mathrm{Children}(j)$, $\widehat{S}_k$ is an optimal solution to the local problem at node $k$, then $\bigcup_{k \in \mathrm{Children}(j)} \widehat{S}_k$ is an optimal solution to the local problem at node $j$.

Thus, if we solve the local problems at all nodes $k \in \mathrm{Children}(j)$ and take the union of the optimal solutions, we obtain an optimal solution to the local problem at node $j$. We give the proof of Theorem 2.4.2 in Appendix A.1. In Section 2.5, we use this theorem to develop an algorithm for computing the optimal assortment. The next corollary shows that the optimal assortment can be obtained by comparing the revenues of the products with the scalars $(t_j : j \in V)$.

Corollary 2.4.3 (Revenue Comparison). Let $\widehat{S} = \{ \ell \in \{1, 2, \dots, n\} : r_\ell \geq t_{\mathrm{Parent}(\ell)} \}$. Then, $\widehat{S}$ is an optimal solution to problem (2.3).

Proof. Let $\widehat{S}_j = \widehat{S} \cap N_j = \{ \ell \in N_j : r_\ell \geq t_{\mathrm{Parent}(\ell)} \}$, in which case $\widehat{S}_{\mathsf{root}} = \widehat{S}$. By induction on the depth of node $j$, we show that $\widehat{S}_j$ is an optimal solution to the local problem at node $j$. Since the local problem at $\mathsf{root}$ is identical to problem (2.3), the desired result follows.
If $j$ is a leaf node, then $N_j = \{j\}$, $R_j(\{j\}) = r_j$, $V_j(\{j\}) = v_j$, and $R_j(\varnothing) = V_j(\varnothing) = 0$. So,

$$\arg\max_{S_j \subseteq N_j} V_j(S_j)\,\big( R_j(S_j) - t_{\mathrm{Parent}(j)} \big) = \arg\max\big\{ v_j\,( r_j - t_{\mathrm{Parent}(j)} ), \; 0 \big\} = \begin{cases} \{j\} & \text{if } r_j \geq t_{\mathrm{Parent}(j)} \\ \varnothing & \text{if } r_j < t_{\mathrm{Parent}(j)} \end{cases} = \widehat{S}_j,$$

establishing the result for a leaf node $j$. Now, suppose the result is true for all nodes at depth $h$ and consider a node $j$ at depth $h-1$. By the induction hypothesis, for all $k \in \mathrm{Children}(j)$, $\widehat{S}_k$ is an optimal solution to the local problem at node $k$. So, by Theorem 2.4.2, $\bigcup_{k \in \mathrm{Children}(j)} \widehat{S}_k$ is an optimal solution to the local problem at node $j$. To complete the proof, note that

$$\bigcup_{k \in \mathrm{Children}(j)} \widehat{S}_k = \bigcup_{k \in \mathrm{Children}(j)} \{ \ell \in N_k : r_\ell \geq t_{\mathrm{Parent}(\ell)} \} = \{ \ell \in N_j : r_\ell \geq t_{\mathrm{Parent}(\ell)} \} = \widehat{S}_j,$$

where the second equality uses the fact that $N_j = \bigcup_{k \in \mathrm{Children}(j)} N_k$. $\square$

By the above result, if product $\ell$ is in the optimal assortment, then any higher-revenue product sharing the same parent with product $\ell$ is also in the optimal assortment. As shown below, for a tree with one or two levels, this result recovers the optimality of revenue-ordered assortments established in the existing literature for the multinomial and two-level nested logit models.

Example 2.4.4 (Multinomial Logit). The multinomial logit model with $n$ products, indexed by $\{1, 2, \dots, n\}$, corresponds to a one-level nested logit model with $n+1$ leaf nodes connected to $\mathsf{root}$. The first $n$ of these nodes correspond to individual products and the last one corresponds to the no-purchase option. Since there is only one level in the tree, each leaf node has $\mathsf{root}$ as its parent. So, it follows from Corollary 2.4.3 that an optimal assortment is given by

$$S^* = \{ \ell \in \{1, 2, \dots, n\} : r_\ell \geq t_{\mathrm{Parent}(\ell)} \} = \{ \ell \in \{1, 2, \dots, n\} : r_\ell \geq Z^* \},$$

where the second equality follows because $t_{\mathrm{Parent}(\ell)} = t_{\mathsf{root}} = Z^*$. The last expression shows that if a particular product is in the optimal assortment, then any product with a higher revenue is in the optimal assortment as well.

Example 2.4.5 (Two-Level Nested Logit).
In the two-level nested logit model, the set of $n$ products $\{1, 2, \dots, n\} = \bigcup_{i=1}^m N_i$ is a disjoint union of $m$ sets; the products in $N_i$ are said to be in nest $i$. In this case, the associated tree has $m+1$ first-level nodes. The last of these first-level nodes corresponds to the no-purchase option; each of the other $m$ first-level nodes $i$ has $|N_i|$ children. Using Corollary 2.4.3, we have

$$S^* = \{ \ell \in \{1,\dots,n\} : r_\ell \geq t_{\mathrm{Parent}(\ell)} \} = \bigcup_{i=1}^m \{ \ell \in N_i : r_\ell \geq t_{\mathrm{Parent}(\ell)} \} = \bigcup_{i=1}^m \{ \ell \in N_i : r_\ell \geq t_i \},$$

where the last equality uses the fact that the node corresponding to each product $\ell$ in nest $i$ has the first-level node $i$ as its parent. The last expression shows that if a product in nest $i$ is included in the optimal assortment, then any product in this nest with a higher revenue is included in the optimal assortment as well. This characterization of the optimal assortment for the two-level nested logit model was proved in Davis et al. (2014).

2.5 An Algorithm for Assortment Optimization

In this section, we use the characterization of an optimal assortment given in Section 2.4 to develop an algorithm for solving the assortment optimization problem in (2.3). For any set $X$, let $2^X$ denote the collection of all subsets of $X$. To find an optimal assortment, for each node $j$ we construct a collection of subsets $\mathcal{A}_j \subseteq 2^{N_j}$ such that $\mathcal{A}_j$ includes an optimal solution to the local problem at node $j$. As pointed out in Section 2.4, the local problem at $\mathsf{root}$ is identical to the assortment optimization problem in (2.3). Thus, the collection $\mathcal{A}_{\mathsf{root}}$ includes an optimal solution to problem (2.3), and if $\mathcal{A}_{\mathsf{root}}$ is reasonably small, we can check the expected revenue of each subset in $\mathcal{A}_{\mathsf{root}}$ to find the optimal assortment. To construct the collection $\mathcal{A}_j$ for each node $j$, we start from the leaf nodes and move up the tree, going over the nodes in a breadth-first manner.
If $\ell$ is a leaf node, then $N_\ell = \{\ell\}$, so that the optimal solution to the local problem at node $\ell$ is either $\{\ell\}$ or $\varnothing$. Thus, if we let $\mathcal{A}_\ell = \{\{\ell\}, \varnothing\}$ for each leaf node $\ell$, then the collection $\mathcal{A}_\ell$ includes an optimal solution to the local problem at node $\ell$. Now, consider a non-leaf node $j$ and assume that for each node $k \in \mathrm{Children}(j)$, we already have a collection $\mathcal{A}_k$ that includes an optimal solution to the local problem at node $k$. To construct a collection $\mathcal{A}_j$ that includes an optimal solution to the local problem at node $j$, for each $k \in \mathrm{Children}(j)$ and $u \in \mathbb{R}$, we let $\widehat{S}_k(u)$ be an optimal solution to the problem

$$\max_{S_k \in \mathcal{A}_k} V_k(S_k) \, \big( R_k(S_k) - u \big). \qquad (2.5)$$

Note that problem (2.5) considers only the subsets in the collection $\mathcal{A}_k$. We claim that the collection $\big\{ \bigcup_{k \in \mathrm{Children}(j)} \widehat{S}_k(u) : u \in \mathbb{R} \big\}$ includes an optimal solution to the local problem at node $j$.

Claim 1: Let $\mathcal{A}_j = \big\{ \bigcup_{k \in \mathrm{Children}(j)} \widehat{S}_k(u) : u \in \mathbb{R} \big\}$. Then, the collection of subsets $\mathcal{A}_j$ includes an optimal solution to the local problem at node $j$ given in problem (2.4).

The objective function of problem (2.5) with $u = t_{\mathrm{Parent}(k)}$ is identical to the objective function of the local problem at node $k$. Furthermore, by our hypothesis, $\mathcal{A}_k$ includes an optimal solution to the local problem at node $k$. Thus, if we solve problem (2.5) with $u = t_{\mathrm{Parent}(k)}$, the optimal solution $\widehat{S}_k(t_{\mathrm{Parent}(k)})$ is an optimal solution to the local problem at node $k$. In this case, Theorem 2.4.2 implies that the assortment $\bigcup_{k \in \mathrm{Children}(j)} \widehat{S}_k(t_{\mathrm{Parent}(k)})$ is an optimal solution to the local problem at node $j$. Since $\bigcup_{k \in \mathrm{Children}(j)} \widehat{S}_k(t_{\mathrm{Parent}(k)}) \in \mathcal{A}_j$, the collection $\mathcal{A}_j$ includes an optimal solution to the local problem at node $j$, establishing the claim.

Next, we show that all of the subsets in the collection $\mathcal{A}_j = \big\{ \bigcup_{k \in \mathrm{Children}(j)} \widehat{S}_k(u) : u \in \mathbb{R} \big\}$ can be constructed in a tractable fashion.

Claim 2: We have $|\mathcal{A}_j| \leq \sum_{k \in \mathrm{Children}(j)} |\mathcal{A}_k|$, and all of the subsets in the collection $\mathcal{A}_j$ can be computed in $O\big( \sum_{k \in \mathrm{Children}(j)} |\mathcal{A}_k| \log |\mathcal{A}_k| \big)$ operations.
[Figure 2.2: The lines $\{L_{k,S_k}(\cdot) : S_k \in \mathcal{A}_k\}$ and the points $\{I_k^q : q = 0,\dots,Q_k\}$ for some collection $\mathcal{A}_k = \{S_k^1, S_k^2, S_k^3, S_k^4\}$.]

For each $S_k \in \mathcal{A}_k$, let $L_{k,S_k}(u) = V_k(S_k) \, (R_k(S_k) - u)$, in which case problem (2.5) can be written as $\max_{S_k \in \mathcal{A}_k} L_{k,S_k}(u)$. Since $L_{k,S_k}(u)$ is a linear function of $u$, with slope $-V_k(S_k)$ and $y$-intercept $V_k(S_k) \, R_k(S_k)$, problem (2.5) reduces to finding the highest line at the point $u$ among the lines $\{L_{k,S_k}(\cdot) : S_k \in \mathcal{A}_k\}$. By checking the pairwise intersection points of these lines, we can find points $-\infty = I_k^0 \leq I_k^1 \leq \dots \leq I_k^{Q_k - 1} \leq I_k^{Q_k} = \infty$, with $Q_k = |\mathcal{A}_k|$, such that the highest of the lines $\{L_{k,S_k}(\cdot) : S_k \in \mathcal{A}_k\}$ does not change as long as $u$ takes values in one of the intervals $\{[I_k^{q-1}, I_k^q] : q = 1,\dots,Q_k\}$. Figure 2.2 shows an example of the lines $\{L_{k,S_k}(\cdot) : S_k \in \mathcal{A}_k\}$ for some collection $\mathcal{A}_k = \{S_k^1, S_k^2, S_k^3, S_k^4\}$. The solid lines in this figure correspond to the lines in $\{L_{k,S_k}(\cdot) : S_k \in \mathcal{A}_k\}$ and the circles correspond to the points in $\{I_k^q : q = 0,\dots,Q_k\}$, with $Q_k = 4$, $I_k^0 = -\infty$ and $I_k^4 = \infty$. Thus, the discussion in this paragraph shows that we can construct $Q_k$ intervals $\{[I_k^{q-1}, I_k^q] : q = 1,\dots,Q_k\}$ that partition the real line and are such that the optimal solution to problem (2.5) does not change when $u$ takes values in one of these intervals. It is a standard result in the analysis of algorithms that we can compute the intersection points $\{I_k^q : q = 0,\dots,Q_k\}$ using $O(Q_k \log Q_k) = O(|\mathcal{A}_k| \log |\mathcal{A}_k|)$ operations; see Kleinberg and Tardos (2005).

Following the approach described in the above paragraph, we can construct the collection of points $\{I_k^q : q = 0,\dots,Q_k\}$ for all $k \in \mathrm{Children}(j)$. Putting these collections together, we obtain the collection of points $\bigcup_{k \in \mathrm{Children}(j)} \{I_k^q : q = 0,\dots,Q_k\}$. Note that $I_k^0 = -\infty$ and $I_k^{Q_k} = \infty$ for all $k \in \mathrm{Children}(j)$.
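The per-child computation just described is an upper-envelope calculation over lines. The sketch below finds, for a few query points $u$, which line $L_{k,S_k}(u) = V_k(S_k)(R_k(S_k) - u)$ is highest; the slopes and intercepts are taken from the node-10 collection in Table 2.1 of Section 2.6, and the breakpoint routine is a simple quadratic-time illustration rather than the $O(Q_k \log Q_k)$ method cited in the text.

```python
# Lines L(u) = intercept + slope * u; one line per candidate assortment.
# (slope, intercept) = (-V, V*R) for the node-10 candidates {1,2,3}, {1,2}, {1}, empty.
lines = [(-15.46, 125.48), (-7.72, 89.35), (-3.63, 53.73), (0.0, 0.0)]

def highest_line(u):
    """Index of the line attaining the maximum at point u (the upper envelope)."""
    return max(range(len(lines)), key=lambda i: lines[i][1] + lines[i][0] * u)

def breakpoints():
    """Candidate u-values where the envelope can switch lines (pairwise check)."""
    pts = set()
    for i, (a1, b1) in enumerate(lines):
        for a2, b2 in lines[i + 1:]:
            if a1 != a2:
                pts.add((b2 - b1) / (a1 - a2))
    return sorted(pts)

switches = [p for p in breakpoints()
            if highest_line(p - 1e-6) != highest_line(p + 1e-6)]
print([round(p, 2) for p in switches])   # -> [4.67, 8.71, 14.8]
```

Up to rounding of the table entries, these are the interior breakpoints $\{4.67, 8.72, 14.80\}$ reported for node 10 in Table 2.1.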
Ordering the points in this collection in increasing order and removing duplicates at $-\infty$ and $\infty$, we obtain the collection of points $\{I_j^q : q = 0,\dots,Q_j\}$, with $-\infty = I_j^0 \leq I_j^1 \leq \dots \leq I_j^{Q_j - 1} \leq I_j^{Q_j} = \infty$. Note that $Q_j \leq 1 + \sum_{k \in \mathrm{Children}(j)} (Q_k - 1) \leq \sum_{k \in \mathrm{Children}(j)} Q_k$. The points $\{I_j^q : q = 0,\dots,Q_j\}$ form the basis for constructing the collection of subsets $\mathcal{A}_j$.

We make a crucial observation. Consider an arbitrary interval $[I_j^{q-1}, I_j^q]$, with $1 \leq q \leq Q_j$. By our construction of the points $\{I_j^q : q = 0,\dots,Q_j\}$ in the above paragraph, for all $k \in \mathrm{Children}(j)$ there exists some index $1 \leq \sigma_k \leq Q_k$ such that $[I_j^{q-1}, I_j^q] \subseteq [I_k^{\sigma_k - 1}, I_k^{\sigma_k}]$. On the other hand, recall that the optimal solution to problem (2.5) does not change as $u$ takes values in one of the intervals $\{[I_k^{q-1}, I_k^q] : q = 1,\dots,Q_k\}$. Therefore, since $[I_j^{q-1}, I_j^q] \subseteq [I_k^{\sigma_k - 1}, I_k^{\sigma_k}]$, the optimal solution to problem (2.5) does not change when $u$ takes values in the interval $[I_j^{q-1}, I_j^q]$. In other words, the collection of subsets $\{\widehat{S}_k(u) : u \in [I_j^{q-1}, I_j^q]\}$ can be represented by just a single subset, which implies that the collection of subsets $\{\bigcup_{k \in \mathrm{Children}(j)} \widehat{S}_k(u) : u \in [I_j^{q-1}, I_j^q]\}$ can also be represented by a single subset. Let $\widehat{S}_j^q$ denote this single subset. In this case, we obtain

$$\mathcal{A}_j = \Big\{ \bigcup_{k \in \mathrm{Children}(j)} \widehat{S}_k(u) : u \in \mathbb{R} \Big\} = \bigcup_{q=1}^{Q_j} \Big\{ \bigcup_{k \in \mathrm{Children}(j)} \widehat{S}_k(u) : u \in [I_j^{q-1}, I_j^q] \Big\} = \big\{ \widehat{S}_j^q : q = 1, 2, \dots, Q_j \big\}. \qquad (2.6)$$

Thus, the number of subsets in the collection $\mathcal{A}_j$ satisfies $|\mathcal{A}_j| = Q_j \leq \sum_{k \in \mathrm{Children}(j)} Q_k = \sum_{k \in \mathrm{Children}(j)} |\mathcal{A}_k|$. Finally, the number of operations required to compute the collection $\mathcal{A}_j$ equals the running time for computing the intersection points $\{I_k^q : q = 0,\dots,Q_k\}$ for each $k \in \mathrm{Children}(j)$, giving a total running time of $O\big( \sum_{k \in \mathrm{Children}(j)} |\mathcal{A}_k| \log |\mathcal{A}_k| \big)$. The following lemma summarizes the key findings from the above discussion.
Lemma 2.5.1. Consider an arbitrary non-leaf node $j \in V$. For each $k \in \mathrm{Children}(j)$, suppose there exists a collection of subsets $\mathcal{A}_k \subseteq 2^{N_k}$ such that $\mathcal{A}_k$ contains an optimal solution to the local problem at node $k$. Then, we can construct a collection of subsets $\mathcal{A}_j \subseteq 2^{N_j}$ such that $\mathcal{A}_j$ includes an optimal solution to the local problem at node $j$ with $|\mathcal{A}_j| \leq \sum_{k \in \mathrm{Children}(j)} |\mathcal{A}_k|$. Moreover, this construction requires $O\big( \sum_{k \in \mathrm{Children}(j)} |\mathcal{A}_k| \log |\mathcal{A}_k| \big)$ operations.

To obtain an optimal assortment, we iteratively apply Lemma 2.5.1 starting at the leaf nodes. For each leaf node $\ell$, letting $\mathcal{A}_\ell = \{\{\ell\}, \varnothing\}$ suffices. Then, using the above construction and going over the nodes in a breadth-first manner, we can construct the collection $\mathcal{A}_j$ for every node $j$ in the tree. To find an optimal assortment, we check the expected revenue of each subset in the collection $\mathcal{A}_{\mathsf{root}}$ and choose the subset that provides the largest expected revenue. The following theorem shows that all of this work can be done in $O(dn \log n)$ operations.

[Figure 2.3: A three-level problem instance and its parameters. In the tree, $\mathsf{root}$ has children $\{0, 14, 15\}$; node 14 has children $\{10, 11\}$, node 15 has children $\{12, 13\}$; and the leaf nodes are grouped as $\mathrm{Children}(10) = \{1,2,3\}$, $\mathrm{Children}(11) = \{4,5\}$, $\mathrm{Children}(12) = \{6,7\}$, $\mathrm{Children}(13) = \{8,9\}$. The parameters are:

  j        0     1     2     3     4     5     6     7     8     9  |   10     11     12     13     14     15   root
  v_j     17     4     5    10     6     9     4     7    10    13  |
  r_j      0  14.8     9     5  13.4   7.5    15     8    18     6  |
  gamma_j                                                           | 0.93   0.87   0.89   0.67   0.75   0.90      0
]

Theorem 2.5.2. For each node $j \in V$, we can construct a collection $\mathcal{A}_j \subseteq 2^{N_j}$ such that $|\mathcal{A}_j| \leq 2n$ and $\mathcal{A}_j$ includes an optimal solution to the local problem at node $j$. Moreover, constructing $\mathcal{A}_j$ for all $j \in V$ requires $O(dn \log n)$ operations.

As a function of $\{|\mathcal{A}_k| : k \in \mathrm{Children}(j)\}$, Lemma 2.5.1 bounds the number of subsets in the collection $\mathcal{A}_j$ and gives the number of operations required to construct it. We can show Theorem 2.5.2 by applying these relationships iteratively; the details are in Appendix A.2.
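Putting Lemma 2.5.1 to work, the entire bottom-up construction fits in a short script. The sketch below is our own simplified rendering (a quadratic-time breakpoint scan rather than the $O(dn \log n)$ bookkeeping of Theorem 2.5.2) on a made-up two-level instance, and it cross-checks the answer against full enumeration.

```python
from itertools import combinations

# Hypothetical two-level instance (names and numbers ours, not from the text).
children = {'root': [0, 'A', 'B'], 'A': [1, 2, 3], 'B': [4, 5]}
v = {0: 2.0, 1: 1.0, 2: 2.0, 3: 3.0, 4: 1.5, 5: 2.5}
r = {0: 0.0, 1: 9.0, 2: 6.0, 3: 4.0, 4: 8.0, 5: 5.0}
gamma = {'root': 0.0, 'A': 0.6, 'B': 0.8}

def build(j):
    """Collection of candidate (assortment, V_j, V_j*R_j) triples for node j."""
    if j == 0:
        return [(frozenset(), v[0], 0.0)]                  # no-purchase leaf, always present
    if j not in children:                                  # product leaf: offer it or not
        return [(frozenset({j}), v[j], v[j] * r[j]), (frozenset(), 0.0, 0.0)]
    kid_cols = [build(k) for k in children[j]]
    # Candidate u values: pairwise intersections of the lines VR - V*u within each child.
    cuts = sorted({(vr1 - vr2) / (V1 - V2)
                   for col in kid_cols for (_, V1, vr1) in col for (_, V2, vr2) in col
                   if V1 != V2})
    us = [cuts[0] - 1.0] + [(a + b) / 2 for a, b in zip(cuts, cuts[1:])] + [cuts[-1] + 1.0]
    out = {}
    for u in us:
        picks = [max(col, key=lambda c: c[2] - c[1] * u) for col in kid_cols]
        s = frozenset().union(*(p[0] for p in picks))
        sum_v = sum(p[1] for p in picks)
        sum_vr = sum(p[2] for p in picks)
        V = sum_v ** gamma[j] if sum_v > 0 else 0.0
        out[s] = (s, V, V * (sum_vr / sum_v) if sum_v > 0 else 0.0)
    return list(out.values())

best = max(build('root'), key=lambda c: c[2] / c[1])       # maximize R_root = VR / V

def revenue(S):
    """Direct evaluation of R_root via (2.1)-(2.2), for the brute-force check."""
    def rec(j):
        if j == 0:
            return v[0], 0.0
        if j not in children:
            return (v[j], v[j] * r[j]) if j in S else (0.0, 0.0)
        ps = [rec(k) for k in children[j]]
        sv, svr = sum(p[0] for p in ps), sum(p[1] for p in ps)
        if sv == 0.0:
            return 0.0, 0.0
        V = sv ** gamma[j]
        return V, V * (svr / sv)
    V, VR = rec('root')
    return VR / V

brute = max(revenue(set(c)) for m in range(6) for c in combinations(range(1, 6), m))
assert abs(best[2] / best[1] - brute) < 1e-9
print(sorted(best[0]), round(best[2] / best[1], 4))
```

The final assertion is exactly Claim 1 in action: the candidate collection produced at the root contains a globally optimal assortment.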
2.6 Numerical Illustrations for Assortment Optimization

In this section, we demonstrate our assortment optimization algorithm on a small problem instance and report its practical performance on large problem instances. For demonstration purposes, we consider a three-level nested logit model with nine products, indexed by $1, 2, \dots, 9$, along with a no-purchase option. We index the nodes of the tree by $\{0, 1, \dots, 15\} \cup \{\mathsf{root}\}$, where the nodes $\{0, 1, \dots, 9\}$ are leaf nodes and the rest are non-leaf nodes. The top portion of Figure 2.3 shows the organization of the tree, whereas the bottom portion shows the model parameters: the parameters $r_j$ and $v_j$ are given for each leaf node $j$, and the parameter $\gamma_j$ is given for each non-leaf node $j$.

  S_10                      {1,2,3}       {1,2}          {1}            (empty)
  V_10(S_10)                 15.46         7.72           3.63          0
  V_10(S_10) R_10(S_10)     125.48        89.35          53.73          0
  Optimal in interval       [-inf, 4.67]  [4.67, 8.72]   [8.72, 14.80]  [14.80, inf]

  S_11                      {4,5}         {4}            (empty)
  V_11(S_11)                 10.55         4.75           0
  V_11(S_11) R_11(S_11)     104.01        63.69           0
  Optimal in interval       [-inf, 6.96]  [6.96, 13.40]  [13.40, inf]

  u                         [-inf, 4.67]  [4.67, 6.96]  [6.96, 8.72]  [8.72, 13.40]  [13.40, 14.80]  [14.80, inf]
  S_10(u) ∪ S_11(u)         {1,2,3,4,5}   {1,2,4,5}     {1,2,4}       {1,4}          {1}             (empty)

Table 2.1: Computation of $\mathcal{A}_{14}$.

For this problem instance, if we let $\mathcal{A}_{10} = \{\{1,2,3\}, \{1,2\}, \{1\}, \varnothing\}$, one can verify that this collection includes an optimal solution to the local problem at node 10. Similarly, if we let $\mathcal{A}_{11} = \{\{4,5\}, \{4\}, \varnothing\}$, this collection includes an optimal solution to the local problem at node 11. Using the discussion in Section 2.5, we construct a collection $\mathcal{A}_{14}$ that includes an optimal solution to the local problem at node 14. To that end, for each $S_{10} \in \mathcal{A}_{10}$, we consider the line $L_{10, S_{10}}(u) = V_{10}(S_{10}) \, (R_{10}(S_{10}) - u)$, with slope $-V_{10}(S_{10})$ and $y$-intercept $V_{10}(S_{10}) \, R_{10}(S_{10})$.
The second and third rows in the top portion of Table 2.1 list the slopes and intercepts of the lines corresponding to the four assortments in the collection $\mathcal{A}_{10}$. Finding the pairwise intersection points of these four lines, we determine that if $u \in [-\infty, 4.67]$, then the highest of these lines is the one corresponding to the subset $\{1,2,3\}$; similarly, if $u \in [4.67, 8.72]$, then the highest line is the one corresponding to the subset $\{1,2\}$. In other words, following the notation in Section 2.5, if $u \in [-\infty, 4.67]$, then the optimal solution to problem (2.5) for $k = 10$ is $\widehat{S}_{10}(u) = \{1,2,3\}$, and if $u \in [4.67, 8.72]$, then $\widehat{S}_{10}(u) = \{1,2\}$. The last row in the top portion of Table 2.1 lists the intervals such that if $u$ belongs to one of them, then $\widehat{S}_{10}(u)$ takes the value shown in the first row. Thus, the points $\{I_{10}^q : q = 0, 1, \dots, 4\}$ are given by $\{-\infty, 4.67, 8.72, 14.80, \infty\}$. The middle portion of Table 2.1 focuses on the lines corresponding to the assortments in the collection $\mathcal{A}_{11}$; its format is the same as that of the top portion. Thus, the points $\{I_{11}^q : q = 0, 1, \dots, 3\}$ are given by $\{-\infty, 6.96, 13.40, \infty\}$. Taking the union of these two sets of points, we obtain $\{-\infty, 4.67, 6.96, 8.72, 13.40, 14.80, \infty\}$. In this case, if $u \in [-\infty, 4.67]$, the optimal solution to problem (2.5) for $k = 10$ is $\widehat{S}_{10}(u) = \{1,2,3\}$ and the optimal solution for $k = 11$ is $\widehat{S}_{11}(u) = \{4,5\}$; thus, if $u \in [-\infty, 4.67]$, then $\bigcup_{k \in \mathrm{Children}(14)} \widehat{S}_k(u) = \{1,2,3,4,5\}$. Similarly, if $u \in [4.67, 6.96]$, the optimal solution for $k = 10$ is $\widehat{S}_{10}(u) = \{1,2\}$ and the optimal solution for $k = 11$ is $\widehat{S}_{11}(u) = \{4,5\}$, which implies that if $u \in [4.67, 6.96]$, then $\bigcup_{k \in \mathrm{Children}(14)} \widehat{S}_k(u) = \{1,2,4,5\}$. Following this line of reasoning, the bottom portion of Table 2.1 lists $\bigcup_{k \in \mathrm{Children}(14)} \widehat{S}_k(u)$ for every value of $u \in \mathbb{R}$. Noting (2.6), we have $\mathcal{A}_{14} = \big\{ \bigcup_{k \in \mathrm{Children}(14)} \widehat{S}_k(u) : u \in \mathbb{R} \big\} = \{\{1,2,3,4,5\}, \{1,2,4,5\}, \{1,2,4\}, \{1,4\}, \{1\}, \varnothing\}$.

  S_12                      {6,7}         {6}            (empty)
  V_12(S_12)                  8.45          3.43          0
  V_12(S_12) R_12(S_12)      89.11         51.51          0
  Optimal in interval       [-inf, 7.50]  [7.50, 15.00]  [15.00, inf]

  S_13                      {8,9}         {8}            (empty)
  V_13(S_13)                  8.17          4.68          0
  V_13(S_13) R_13(S_13)      91.67         84.19          0
  Optimal in interval       [-inf, 2.14]  [2.14, 18.00]  [18.00, inf]

  u                         [-inf, 2.14]  [2.14, 7.50]  [7.50, 15.00]  [15.00, 18.00]  [18.00, inf]
  S_12(u) ∪ S_13(u)         {6,7,8,9}     {6,7,8}       {6,8}          {8}             (empty)

Table 2.2: Computation of $\mathcal{A}_{15}$.

On the other hand, if we let $\mathcal{A}_{12} = \{\{6,7\}, \{6\}, \varnothing\}$, one can verify that this collection includes an optimal solution to the local problem at node 12. Similarly, if we let $\mathcal{A}_{13} = \{\{8,9\}, \{8\}, \varnothing\}$, this collection includes an optimal solution to the local problem at node 13. We use an argument similar to the one above to construct a collection $\mathcal{A}_{15}$ from the collections $\mathcal{A}_{12}$ and $\mathcal{A}_{13}$, such that $\mathcal{A}_{15}$ includes an optimal solution to the local problem at node 15. Table 2.2 gives the details of our computations; its format is identical to that of Table 2.1. The last row of the top portion shows that when $u$ takes values in the intervals $[-\infty, 7.50]$, $[7.50, 15.00]$ and $[15.00, \infty]$, the optimal solution to problem (2.5) for $k = 12$ is, respectively, $\widehat{S}_{12}(u) = \{6,7\}$, $\widehat{S}_{12}(u) = \{6\}$ and $\widehat{S}_{12}(u) = \varnothing$. The last row of the middle portion shows that when $u$ takes values in the intervals $[-\infty, 2.14]$, $[2.14, 18.00]$ and $[18.00, \infty]$, the optimal solution to problem (2.5) for $k = 13$ is, respectively, $\widehat{S}_{13}(u) = \{8,9\}$, $\widehat{S}_{13}(u) = \{8\}$ and $\widehat{S}_{13}(u) = \varnothing$. Collecting these results together, the last row of the bottom portion of Table 2.2, under the heading $\widehat{S}_{12}(u) \cup \widehat{S}_{13}(u)$, lists $\bigcup_{k \in \mathrm{Children}(15)} \widehat{S}_k(u)$ for every value of $u \in \mathbb{R}$.
Thus, we have $\mathcal{A}_{15} = \{\{6,7,8,9\}, \{6,7,8\}, \{6,8\}, \{8\}, \varnothing\}$. We now have collections $\mathcal{A}_{14}$ and $\mathcal{A}_{15}$ that include optimal solutions to the local problems at nodes 14 and 15, respectively. Finally, we use these collections to construct a collection $\mathcal{A}_{\mathsf{root}}$ that includes an optimal solution to the local problem at $\mathsf{root}$. Our computations are given in Table 2.3 and are similar to those in Tables 2.1 and 2.2.

  S_14                      {1,2,3,4,5}   {1,2,4,5}     {1,2,4}       {1,4}          {1}             (empty)
  V_14(S_14)                 11.52          8.84          6.64          4.93           2.63           0
  V_14(S_14) R_14(S_14)     101.62         93.53         81.44         69.01          38.92           0
  Highest in interval       [-inf, 3.02]  [3.02, 5.50]  [5.50, 7.27]  [7.27, 13.10]  [13.10, 14.80]  [14.80, inf]

  S_15                      {6,7,8,9}     {6,7,8}       {6,8}          {8}            (empty)
  V_15(S_15)                 12.55         10.15          6.58          4.01           0
  V_15(S_15) R_15(S_15)     136.48        133.96        110.08         72.16           0
  Highest in interval       [-inf, 1.05]  [1.05, 6.69]  [6.69, 14.75]  [14.75, 18.00] [18.00, inf]

  u                         [-inf, 1.05]        [1.05, 3.02]       [3.02, 5.50]     [5.50, 6.69]   [6.69, 7.27]
  S_14(u) ∪ S_15(u)         {1,2,3,4,5,6,7,8,9} {1,2,3,4,5,6,7,8}  {1,2,4,5,6,7,8}  {1,2,4,6,7,8}  {1,2,4,6,8}
  R_root(S_14(u) ∪ S_15(u))  5.80               6.09               6.32             6.38           6.34

  u                         [7.27, 13.10]  [13.10, 14.75]  [14.75, 14.80]  [14.80, 18.00]  [18.00, inf]
  S_14(u) ∪ S_15(u)         {1,4,6,8}      {1,6,8}         {1,8}           {8}             (empty)
  R_root(S_14(u) ∪ S_15(u))  6.28          5.68            4.70            3.43            0.00

Table 2.3: Computation of $\mathcal{A}_{\mathsf{root}}$.

For example, the last row of the top portion shows that when $u$ takes values in the intervals $[-\infty, 3.02]$ and $[3.02, 5.50]$, the optimal solution to problem (2.5) for $k = 14$ is, respectively, $\widehat{S}_{14}(u) = \{1,2,3,4,5\}$ and $\widehat{S}_{14}(u) = \{1,2,4,5\}$. The last row of the middle portion shows that when $u$ takes values in the intervals $[-\infty, 1.05]$ and $[1.05, 6.69]$, the optimal solution to problem (2.5) for $k = 15$ is, respectively, $\widehat{S}_{15}(u) = \{6,7,8,9\}$ and $\widehat{S}_{15}(u) = \{6,7,8\}$.
Collecting these results together, the second row of the bottom portion of Table 2.3, under the heading $\widehat{S}_{14}(u) \cup \widehat{S}_{15}(u)$, lists $\bigcup_{k \in \mathrm{Children}(\mathsf{root})} \widehat{S}_k(u)$ for every value of $u \in \mathbb{R}$. These subsets form the collection $\mathcal{A}_{\mathsf{root}}$, which includes an optimal solution to the local problem at $\mathsf{root}$. Since the local problem at $\mathsf{root}$ is equivalent to our assortment optimization problem, the collection $\mathcal{A}_{\mathsf{root}}$ includes an optimal assortment. The last row of the bottom portion of Table 2.3, under the heading $R_{\mathsf{root}}(\widehat{S}_{14}(u) \cup \widehat{S}_{15}(u))$, lists the expected revenue from each assortment in $\mathcal{A}_{\mathsf{root}}$. Therefore, the optimal assortment is $\{1, 2, 4, 6, 7, 8\}$, with an expected revenue of 6.38.

2.7 Practical Performance of Assortment Optimization Algorithm

We test the practical performance of our assortment optimization algorithm on test problems with two and three levels in the tree. To our knowledge, our algorithm is the only efficient approach for solving the assortment problem when there are three levels in the tree. For the test problems with three levels, we compare our assortment optimization algorithm with a full enumeration scheme that checks the expected revenue of every possible assortment. For the test problems with two levels, Davis et al. (2014) show that the optimal assortment can be obtained by solving a linear program; for these problems, we compare our assortment optimization algorithm with both the full enumeration scheme and the linear programming approach of Davis et al. (2014). We refer to our assortment optimization algorithm, the full enumeration scheme, and the linear programming approach as AA, FE, and LP, respectively. Our numerical experiments indicate that our approach provides the optimal assortment within about one to two seconds for these test problems.
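With nine products there are only $2^9 = 512$ assortments, so the optimum above can be confirmed by brute force using the recursions (2.1) and (2.2). The sketch below hard-codes the Figure 2.3 parameters; the dictionary layout and function names are ours, and the tree structure is the one depicted in the figure.

```python
from itertools import combinations

# Parameters of the three-level instance in Figure 2.3: root -> {0, 14, 15},
# 14 -> {10, 11}, 15 -> {12, 13}, with leaves 1-3, 4-5, 6-7, 8-9 under
# nodes 10, 11, 12, 13, respectively.
children = {'root': [0, 14, 15], 14: [10, 11], 15: [12, 13],
            10: [1, 2, 3], 11: [4, 5], 12: [6, 7], 13: [8, 9]}
v = {0: 17, 1: 4, 2: 5, 3: 10, 4: 6, 5: 9, 6: 4, 7: 7, 8: 10, 9: 13}
r = {0: 0, 1: 14.8, 2: 9, 3: 5, 4: 13.4, 5: 7.5, 6: 15, 7: 8, 8: 18, 9: 6}
gamma = {10: 0.93, 11: 0.87, 12: 0.89, 13: 0.67, 14: 0.75, 15: 0.9, 'root': 0}

def v_vr(j, S):
    """Return (V_j, V_j * R_j) per (2.1)-(2.2); the no-purchase leaf 0 is always present."""
    if j not in children:
        offered = (j == 0) or (j in S)
        return (v[j], v[j] * r[j]) if offered else (0.0, 0.0)
    pairs = [v_vr(k, S) for k in children[j]]
    sum_v, sum_vr = sum(p[0] for p in pairs), sum(p[1] for p in pairs)
    if sum_v == 0.0:
        return 0.0, 0.0
    V = sum_v ** gamma[j]
    return V, V * (sum_vr / sum_v)

def expected_revenue(S):
    V, VR = v_vr('root', S)
    return VR / V

best = max((set(c) for m in range(10) for c in combinations(range(1, 10), m)),
           key=expected_revenue)
print(sorted(best), round(expected_revenue(best), 2))
```

On this instance the enumeration reproduces the optimal assortment $\{1, 2, 4, 6, 7, 8\}$ and the expected revenue 6.38 reported above.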
For the test problems with three levels, we use $m_h$ to denote the number of children of each node at depth $h$ and vary $(m_0, m_1, m_2)$ over $\{2,4,8\} \times \{2,4,8\} \times \{2,4,8\}$. Therefore, the smallest problem instances have $2^3 = 8$ products, whereas the largest ones have $8^3 = 512$ products. We refer to each combination of $(m_0, m_1, m_2)$ as a problem class. In each problem class, we randomly generate 200 problem instances. To generate each problem instance, we sample the parameters $r_j$ and $v_j$ for each leaf node from the uniform distribution over $[0, 5]$ and the parameter $\gamma_j$ for each non-leaf node from the uniform distribution over $[0, 1]$. We use the same approach to generate the test problems with two levels, but vary $(m_0, m_1)$ over $\{2,4,8\} \times \{2,4,8\}$.

Table 2.4 gives our results. In the left portion of the table, the first three columns show the parameters $(m_0, m_1, m_2)$ for each problem class with three levels. The next two columns show the running times for AA and FE, averaged over the 200 problem instances in each problem class. When the number of products exceeds 16, FE is not able to check the expected revenues of all $2^n$ assortments in a reasonable amount of time. For such problem instances, since the number of operations required to compute the expected revenue of each assortment is the same, we check the expected revenues of 1,000 randomly sampled assortments and linearly extrapolate the running time for checking all $2^n$ assortments. The format of the right portion of Table 2.4 is similar to that of the left portion, but the right portion focuses on the test problems with two levels and gives the average running times for AA, FE, and LP. Considering the test problems with three levels in the left portion of Table 2.4, our results indicate that the running time for AA can be substantially faster than the running time for FE.
For the problem class $(2, 2, 2)$, which involves the smallest number of products, the average running times for AA and FE are comparable. For all other problem classes, the average running time for AA is smaller than the average running time for FE by orders of magnitude. The running time of AA is reasonable: even for the largest problem instances, with 512 products, the average running time for AA is slightly above one second. Over all of our test problems, the number of assortments in the collection $\mathcal{A}_{\mathsf{root}}$ constructed by AA is roughly equal to the number of products. Considering the test problems with two levels in the right portion of Table 2.4, our results indicate that the running time for LP is generally faster than the running time for AA.

  Three levels:                                 Two levels:
  (m0,m1,m2)  Avg. s, AA  Avg. s, FE            (m0,m1)  Avg. s, AA  Avg. s, FE  Avg. s, LP
  (2,2,2)     1.64e-02    9.50e-03              (2,2)    8.41e-03    3.77e-04    2.94e-03
  (2,2,4)     2.29e-02    2.49e+00              (2,4)    1.11e-02    5.10e-03    3.00e-03
  (2,2,8)     3.17e-02    1.57e+05              (2,8)    1.14e-02    1.02e+00    3.03e-03
  (2,4,2)     3.05e-02    2.47e+00              (4,2)    1.32e-02    5.10e-03    3.03e-03
  (2,4,4)     5.23e-02    1.84e+05              (4,4)    1.98e-02    1.15e+00    3.17e-03
  (2,4,8)     7.35e-02    7.07e+14              (4,8)    2.77e-02    7.59e+04    3.25e-03
  (2,8,2)     7.94e-02    1.68e+05              (8,2)    3.63e-02    1.19e+00    3.26e-03
  (2,8,4)     1.30e-01    7.24e+14              (8,4)    5.44e-02    7.36e+04    3.18e-03
  (2,8,8)     2.21e-01    1.40e+34              (8,8)    8.16e-02    3.16e+14    3.42e-03
  (4,2,2)     2.96e-02    2.50e+00
  (4,2,4)     4.17e-02    1.65e+05
  (4,2,8)     6.75e-02    7.13e+14
  (4,4,2)     5.89e-02    1.67e+05
  (4,4,4)     9.68e-02    7.33e+14
  (4,4,8)     1.60e-01    1.39e+34
  (4,8,2)     1.62e-01    7.64e+14
  (4,8,4)     2.70e-01    1.47e+34
  (4,8,8)     4.72e-01    5.36e+72
  (8,2,2)     6.83e-02    1.71e+05
  (8,2,4)     1.05e-01    7.41e+14
  (8,2,8)     1.71e-01    1.40e+34
  (8,4,2)     1.41e-01    7.84e+14
  (8,4,4)     2.38e-01    1.55e+34
  (8,4,8)     3.94e-01    5.37e+72
  (8,8,2)     3.86e-01    1.63e+34
  (8,8,4)     6.32e-01    5.91e+72
  (8,8,8)     1.10e+00    7.76e+149

Table 2.4: Running times for AA, FE and LP for the test problems with two and three levels.
For a two-level tree with $m_0$ and $m_1$ child nodes in the two levels of the tree, there are $m_0 m_1$ products, and the linear program proposed by Davis et al. (2014) has $O(m_0)$ decision variables and $O(m_0 m_1)$ constraints. The number of operations required to solve a general linear program of this size is $O(m_0^4 m_1)$. In contrast, Theorem 2.5.2 shows that AA requires $O(m_0 m_1 \log(m_0 m_1))$ operations. So, the computational complexity of AA is better than that of solving the linear program used by LP, but many constraints of this linear program have only two non-zero entries, and it appears that linear programming software can effectively exploit this sparse structure to obtain quite satisfactory running times for LP. For these problem instances, we observe that LP is also faster than the greedy algorithm in Li and Rusmevichientong (2014). Thus, our results indicate that LP is a strong approach when the tree has two levels, but we emphasize that LP cannot deal with trees with more than two levels. For small test problems with four or eight products, the running times for FE can be competitive with those for AA, but the running times for FE quickly deteriorate as the number of products increases.

2.8 Extension to Multi-Period Capacity Allocation

In this section, we discuss the extension of our results to the multi-period capacity allocation model of Talluri and van Ryzin (2004), one of the pioneering works demonstrating the importance of incorporating choice behavior into operational decisions. Let us briefly review the setup for this problem. We have an initial capacity of $C$ seats on a flight leg that must be allocated over $T$ periods. There are $n$ products (fare classes) that can be offered to customers, indexed by $\{1,\dots,n\}$. If we sell one ticket for fare class $\ell$, then we generate a revenue of $r_\ell$.
In each period, based on the remaining capacity, we must find an assortment of fare classes to offer to an arriving customer, who chooses a fare class from the assortment according to the $d$-level nested logit model. The goal is to determine the revenue-maximizing policy for offering assortments of fare classes over the selling horizon. For simplicity, we assume that there is exactly one customer arriving in each period, but all of our results extend to the case in which there is a positive probability that no customer shows up in a period. We show that, under the $d$-level nested logit model, a nested control policy is optimal. In other words, we show that as we have less remaining capacity in a period, we offer a smaller assortment of fare classes. An important managerial implication of this result is that the optimal control policy can be implemented by using protection levels, which are the standard tool in traditional revenue management systems. That is, for each fare class $\ell$, there exists a protection level $c_{\ell t}$ such that if the remaining capacity in period $t$ is less than $c_{\ell t}$, then it is optimal not to make fare class $\ell$ available in this period. Due to the optimality of a protection level policy, the $d$-level nested logit model can be easily integrated with existing revenue management controls.

To establish the optimality of a nested control policy, for each $x \in \{0, 1, \ldots, C\}$, let $J_t(x)$ denote the maximum expected revenue when we have $x$ units of capacity and $t$ periods remaining in the selling horizon. Under the $d$-level nested logit model, if we offer the assortment $S$ of fare classes, then the probability that a customer chooses fare class $\ell$ is $\pi_\ell(S)$ as given in (2.8). In this case, $J_t(\cdot)$ satisfies the dynamic programming equation

$$ J_t(x) \;=\; \max_{S \subseteq \{1,\ldots,n\}} \; \sum_{\ell \in S} \pi_\ell(S)\,\bigl(r_\ell + J_{t-1}(x-1)\bigr) \;+\; \pi_0(S)\,J_{t-1}(x) \;=\; \max_{S \subseteq \{1,\ldots,n\}} \Bigl\{ \sum_{\ell \in S} \pi_\ell(S)\,\bigl(r_\ell - \Delta J_{t-1}(x)\bigr) \Bigr\} \;+\; J_{t-1}(x), $$

where $\Delta J_{t-1}(x) = J_{t-1}(x) - J_{t-1}(x-1)$ denotes the marginal value of capacity.
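To make the recursion concrete, the following sketch solves the dynamic program by brute-force enumeration of assortments on a tiny hypothetical instance. For brevity it uses a plain multinomial logit choice model (the one-level special case of the $d$-level nested logit model) with illustrative weights and revenues; the recursion itself mirrors the equation above.

```python
from itertools import combinations

# Hypothetical small instance: 3 fare classes under an MNL choice model
# (a stand-in for the d-level nested logit model; all numbers illustrative).
r = [8.0, 5.0, 3.0]        # revenues r_l
v = [1.0, 1.5, 2.0]        # preference weights of the fare classes
v0 = 1.0                   # preference weight of the no-purchase option
n, C, T = 3, 4, 5          # fare classes, initial capacity, periods

def choice_prob(l, S):
    """Probability that a customer offered assortment S buys fare class l."""
    return v[l] / (v0 + sum(v[k] for k in S))

def all_assortments():
    for size in range(n + 1):
        for S in combinations(range(n), size):
            yield S

# J[t][x]: maximum expected revenue with x seats and t periods to go.
# Boundary conditions: J[0][x] = 0 and J[t][0] = 0.
J = [[0.0] * (C + 1) for _ in range(T + 1)]
offer = {}                 # optimal assortment S_t(x)
for t in range(1, T + 1):
    for x in range(1, C + 1):
        delta = J[t - 1][x] - J[t - 1][x - 1]   # marginal value of capacity
        best_val, best_S = J[t - 1][x], ()
        for S in all_assortments():
            # sum_l pi_l(S) (r_l - Delta J_{t-1}(x)) + J_{t-1}(x)
            val = sum(choice_prob(l, S) * (r[l] - delta) for l in S) + J[t - 1][x]
            if val > best_val + 1e-12:
                best_val, best_S = val, S
        J[t][x], offer[t, x] = best_val, best_S
```

The protection levels then fall out directly: $c_{\ell t}$ is the smallest $x$ with $\ell \in S_t(x)$, read off the `offer` dictionary.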
The boundary conditions of the dynamic program are $J_0(x) = 0$ for all $x$ and $J_t(0) = 0$ for all $t$. Let $S_t(x)$ denote the optimal solution to the last maximization problem above, corresponding to the optimal assortment to offer when we have $x$ units of remaining inventory and $t$ periods to go. The main result of this section is stated in the following theorem, whose proof is presented in Appendix A.3.

Theorem 2.8.1 (Nested Policy). For the multi-period capacity allocation problem, there exists an optimal policy such that for each $t$, $S_t(x)$ is non-decreasing in $x$, satisfying $S_t(x-1) \subseteq S_t(x)$. Moreover, for each $x$, $S_t(x)$ is non-increasing in $t$, satisfying $S_t(x) \subseteq S_{t-1}(x)$.

The first part of the above theorem shows that we offer a smaller assortment when we have less capacity in a period. In this case, letting $c_{\ell t}$ be the smallest $x$ such that $\ell \in S_t(x)$, if the remaining capacity in period $t$ is less than $c_{\ell t}$, then it is optimal not to offer fare class $\ell$ in this period. Therefore, we can implement the resulting optimal policy by using protection level policies. On the other hand, the second part of the above theorem shows that we offer a larger assortment when we have fewer periods remaining in the selling horizon.

2.9 Price Optimization Problem

In this section, we study the problem of finding revenue-maximizing prices, assuming that the assortment of products is fixed at $\{1, 2, \ldots, n\}$. Our formulation of the price optimization problem is similar to that of the assortment optimization problem in Section 2.3. The taxonomy of the products is described by a $d$-level tree, denoted by $\mathcal{T} = (V, E)$, which has $n$ leaf nodes at depth $d$, corresponding to the $n$ products in $\{1, 2, \ldots, n\}$. The no-purchase option is labeled as node 0, which is directly connected to the root. Associated with each node $j$, we have a set of products $N_j \subseteq \{0, 1, \ldots, n\}$, which corresponds to the set of products included in the subtree that is rooted at node $j$.
If $j$ is a leaf node, then $N_j = \{j\}$ is a singleton, consisting of the product itself. On the other hand, if $j$ is a non-leaf node, then $N_j = \bigcup_{k \in \mathrm{Children}(j)} N_k$ is the disjoint union of the sets of products at the child nodes of $j$. The decision variable is the price vector $p = (p_1, \ldots, p_n) \in \mathbb{R}^n_+$, where $p_\ell$ denotes the price of product $\ell$. By convention, the price of the no-purchase option is fixed at $p_0 = 0$. For each subset $X \subseteq \{0, 1, \ldots, n\}$, $p_X = (p_\ell : \ell \in X)$ denotes the vector of prices of the products in $X$. The choice process of the customer is similar to the one in Section 2.3. Given the price vector $p \in \mathbb{R}^n_+$, a customer associates the preference weight $V_j(p_{N_j})$ with each node $j \in V$. The preference weight of each node is computed as follows. If $j$ is a leaf node, then we have $V_j(p_{N_j}) = e^{\alpha_j - \beta_j p_j}$, where $\alpha_j$ and $\beta_j > 0$ are parameters of the $d$-level nested logit model associated with product $j$. Noting that the price of the no-purchase option is fixed at zero, the preference weight of the no-purchase option is $V_0(p_{N_0}) = e^{\alpha_0}$. If, on the other hand, $j$ is a non-leaf node, then we have

$$ V_j(p_{N_j}) \;=\; \Bigl( \sum_{k \in \mathrm{Children}(j)} V_k(p_{N_k}) \Bigr)^{\eta_j}, $$

where $\eta_j$ is the parameter of the $d$-level nested logit model associated with the non-leaf node $j$. During the course of her choice process, if the customer is at node $j$, then she follows node $k \in \mathrm{Children}(j)$ with probability $V_k(p_{N_k}) / \sum_{\ell \in \mathrm{Children}(j)} V_\ell(p_{N_\ell})$. We use $R_j(p_{N_j})$ to denote the expected revenue obtained from a customer at node $j$. If $j$ is a leaf node, then $R_j(p_{N_j}) = p_j$. If, however, $j$ is a non-leaf node, then we have

$$ R_j(p_{N_j}) \;=\; \frac{\sum_{k \in \mathrm{Children}(j)} V_k(p_{N_k})\, R_k(p_{N_k})}{\sum_{k \in \mathrm{Children}(j)} V_k(p_{N_k})}. $$

To facilitate our exposition, when it is clear from context, we will simply write $V_j(p)$ and $R_j(p)$ to denote $V_j(p_{N_j})$ and $R_j(p_{N_j})$, respectively. The expected revenue associated with the price vector $p$ corresponds to the expected revenue at the root, $R_{root}(p)$.
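The recursive definitions of $V_j$ and $R_j$ translate directly into code. The sketch below evaluates them bottom-up on a hypothetical two-level tree; the dictionaries, parameter values, and the name `eta` for the nest parameters are illustrative choices, not fixed by the text.

```python
import math

# Hypothetical two-level tree: the root's children are the no-purchase
# node 0 and two nests; each nest holds two products (leaf nodes 1-4).
children = {"root": [0, "nest1", "nest2"], "nest1": [1, 2], "nest2": [3, 4]}
alpha = {0: 0.0, 1: 1.5, 2: 2.0, 3: 1.0, 4: 2.5}   # leaf parameters alpha_j
beta  = {0: 1.0, 1: 2.0, 2: 2.5, 3: 2.2, 4: 3.0}   # leaf parameters beta_j > 0
eta   = {"nest1": 0.7, "nest2": 0.5}                # nest parameters eta_j

def preference_weight(j, p):
    """V_j(p): exp(alpha_j - beta_j p_j) at a leaf (p_0 = 0 for no-purchase),
    and (sum of child weights)^eta_j at a non-leaf node."""
    if j not in children:
        return math.exp(alpha[j] - beta[j] * p.get(j, 0.0))
    return sum(preference_weight(k, p) for k in children[j]) ** eta.get(j, 1.0)

def expected_revenue(j, p):
    """R_j(p): the price itself at a leaf, and the preference-weighted
    average of the child revenues at a non-leaf node."""
    if j not in children:
        return p.get(j, 0.0)
    total = sum(preference_weight(k, p) for k in children[j])
    return sum(preference_weight(k, p) * expected_revenue(k, p)
               for k in children[j]) / total
```

In a real implementation one would memoize the weights rather than recompute them; the recursion is kept plain here to mirror the definitions.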
Thus, we are interested in the price optimization problem given by

$$ \max_{p \in \mathbb{R}^n_+} \; R_{root}(p). $$

For the two-level nested logit model, Gallego and Wang (2013) show that the expected revenue function $R_{root}(\cdot)$ is not concave and can have multiple local maxima. Thus, we cannot hope to find the global maximum in general. Instead, we will focus on finding a stationary point of the expected revenue function $R_{root}(\cdot)$; that is, the prices at which the gradient of the expected revenue function is zero. We begin by giving a succinct characterization of a stationary point. Next, we develop an iterative algorithm that generates a sequence of prices converging to a stationary point of the expected revenue function. Our algorithm is different from the traditional gradient ascent method because it moves in a direction that is different from the gradient. Furthermore, our algorithm avoids the problem of choosing a step size. As shown in our numerical experiments in Section 2.10, our algorithm converges to a stationary point much faster than gradient-based methods.

2.9.1 Characterization of a Stationary Point

For each $p \in \mathbb{R}^n_+$, let a collection of scalars $(u_j(p) : j \in V)$ be defined as

$$ u_j(p) \;=\; \eta_j\, u_{\mathrm{Parent}(j)}(p) + (1 - \eta_j)\, R_j(p), $$

where we set $u_{root}(p) = R_{root}(p)$. Starting at the root node, we can compute the scalars $(u_j(p) : j \in V)$ by traversing the tree in a breadth-first manner. The following lemma shows how to compute the gradient of the expected revenue function using the scalars $(u_j(p) : j \in V)$. In this lemma and throughout the rest of the paper, we use $\pi_\ell(p)$ to denote the probability that a customer chooses product $\ell$ under the price vector $p$. As mentioned in Section 2.9, if the customer is at node $j$, then she follows node $k \in \mathrm{Children}(j)$ with probability $V_k(p) / \sum_{\ell \in \mathrm{Children}(j)} V_\ell(p)$. Thus, $\pi_\ell(p)$ is given by

$$ \pi_\ell(p) \;=\; \prod_{h=1}^{d} \frac{V_{\mathrm{An}(\ell,h)}(p)}{\sum_{k \in \mathrm{Sibling}(\mathrm{An}(\ell,h))} V_k(p)}, $$

where $\mathrm{An}(\ell,h)$ denotes the ancestor of the leaf node $\ell$ in level $h$, with $\mathrm{An}(\ell,d) = \ell$ and $\mathrm{An}(\ell,0) = root$.
Furthermore, for each node $j$ other than the root, $\mathrm{Sibling}(j) = \mathrm{Children}(\mathrm{Parent}(j))$ is the set of nodes that are siblings of $j$; that is, the set of nodes with the same parent as node $j$, and this set includes node $j$ itself.

Lemma 2.9.1 (Gradient of Expected Revenue). For each $p \in \mathbb{R}^n_+$ and for every leaf node $\ell$,

$$ \frac{\partial R_{root}}{\partial p_\ell}(p) \;=\; -\pi_\ell(p)\, \beta_\ell \Bigl( p_\ell - \frac{1}{\beta_\ell} - u_{\mathrm{Parent}(\ell)}(p) \Bigr), $$

where $\pi_\ell(p)$ is the probability that a customer chooses product $\ell$ under the price vector $p$.

We give the proof of this lemma in Appendix A.4. By using the above lemma, we can efficiently obtain the gradient of the expected revenue at any price vector $p \in \mathbb{R}^n_+$. In particular, we can compute the preference weight $V_j(p)$ and the expected revenue $R_j(p)$ for all $j \in V$ by starting with the leaf nodes and moving up the tree in a breadth-first manner. Once we compute the preference weights, it is straightforward to compute $\pi_\ell(p)$ for each leaf node $\ell$. Similarly, by using $(R_j(p) : j \in V)$, we can compute $(u_j(p) : j \in V)$ by starting from the root node and moving down the tree in a breadth-first manner. Once we know $\pi_\ell(p)$ and $u_{\mathrm{Parent}(\ell)}(p)$ for each leaf node $\ell$, the above lemma gives a closed-form expression for the gradient.

Lemma 2.9.1 allows us to apply the traditional gradient ascent method to compute a stationary point of the expected revenue function; see Sun and Yuan (2006). However, in our numerical experiments in Section 2.10, we observe that the traditional gradient ascent method can be relatively slow in practice and requires a careful selection of the step size. In the rest of this section, we develop a new algorithm for computing a stationary point of the expected revenue function. This algorithm is different from the gradient ascent method and completely avoids the problem of step size selection. When developing our algorithm, we make use of a succinct characterization of the stationary points of the expected revenue function, given by the following corollary.
The proof of this corollary follows immediately from the above lemma and the fact that the probability of choosing each product is always positive.

Corollary 2.9.2 (A Necessary and Sufficient Condition for a Stationary Point). For each $p \in \mathbb{R}^n_+$, $p$ is a stationary point of $R_{root}(\cdot)$ if and only if

$$ p_\ell \;=\; \frac{1}{\beta_\ell} + u_{\mathrm{Parent}(\ell)}(p) \quad \text{for all } \ell. $$

2.9.2 A Pricing Algorithm without Gradients

By Corollary 2.9.2, if we can find a price vector $p \in \mathbb{R}^n_+$ such that $p_\ell = \frac{1}{\beta_\ell} + u_{\mathrm{Parent}(\ell)}(p)$ for all $\ell$, then $p$ corresponds to a stationary point of the expected revenue function. This observation yields the following iterative idea for finding a stationary point of the expected revenue function. We start with a price vector $p^0 \in \mathbb{R}^n_+$. In iteration $s$, we compute $(u_j(p^s) : j \in V)$. Using this collection of scalars, we compute the price vector $p^{s+1}$ at the next iteration as $p^{s+1}_\ell = \frac{1}{\beta_\ell} + u_{\mathrm{Parent}(\ell)}(p^s)$ for each leaf node $\ell$. It turns out that a modification of this idea generates a sequence of price vectors $\{p^s \in \mathbb{R}^n_+ : s = 0, 1, 2, \ldots\}$ that converges to a stationary point of the expected revenue function, and it forms the basis of our pricing algorithm. We give our pricing algorithm below, which we call the Push-Up-then-Push-Down (PUPD) algorithm.

Push-Up-then-Push-Down (PUPD):

Step 0. Set the iteration counter $s = 0$ and $p^0 = 0$.

Step 1. Compute the scalars $(t^s_j : j \in V)$ recursively over the tree as follows. For the root, we have $t^s_{root} = R_{root}(p^s)$. Using $R^s_j = R_j(p^s)$ to denote the expected revenue at node $j$ under the price vector $p^s$, for the other nodes, we have

$$ t^s_j \;=\; \max \Bigl\{ t^s_{\mathrm{Parent}(j)},\; \eta_j\, t^s_{\mathrm{Parent}(j)} + (1 - \eta_j)\, R^s_j \Bigr\}. $$

Step 2. The price vector $p^{s+1}$ at the next iteration is given by $p^{s+1}_\ell = \frac{1}{\beta_\ell} + t^s_{\mathrm{Parent}(\ell)}$ for each leaf node $\ell$. Increase $s$ by one and go back to Step 1.

There are several stopping criteria to consider for the PUPD algorithm. For example, we can terminate the algorithm if $\|p^{s+1} - p^s\|_2$ is less than some pre-specified tolerance.
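The two passes of PUPD can be sketched on a small hypothetical instance as follows; the tree, parameter values, and the symbol names (`eta` for the nest parameters, `alpha` and `beta` for the leaf parameters) are illustrative choices, not fixed by the text.

```python
import math

# Hypothetical two-level instance: root -> {no-purchase 0, two nests},
# each nest -> two products (all numbers illustrative).
children = {"root": [0, "nest1", "nest2"], "nest1": [1, 2], "nest2": [3, 4]}
parent = {k: j for j, kids in children.items() for k in kids}
alpha = {0: 0.0, 1: 1.5, 2: 2.0, 3: 1.0, 4: 2.5}
beta  = {0: 1.0, 1: 2.0, 2: 2.5, 3: 2.2, 4: 3.0}
eta   = {"nest1": 0.7, "nest2": 0.5}
leaves = [1, 2, 3, 4]

def push_up(p):
    """Leaves-to-root pass: V_j(p) and R_j(p) for every node."""
    V, R = {}, {}
    def visit(j):
        if j not in children:                       # leaf (products and node 0)
            V[j] = math.exp(alpha[j] - beta[j] * p.get(j, 0.0))
            R[j] = p.get(j, 0.0)
        else:
            for k in children[j]:
                visit(k)
            tot = sum(V[k] for k in children[j])
            V[j] = tot ** eta.get(j, 1.0)
            R[j] = sum(V[k] * R[k] for k in children[j]) / tot
    visit("root")
    return V, R

def push_down(R):
    """Root-to-leaves pass: the scalars t_j of Step 1."""
    t = {"root": R["root"]}
    for j in ["nest1", "nest2"]:                    # non-leaf, non-root nodes
        t[j] = max(t[parent[j]], eta[j] * t[parent[j]] + (1 - eta[j]) * R[j])
    return t

p = {l: 0.0 for l in leaves}                        # Step 0: p^0 = 0
converged = False
for _ in range(10000):
    V, R = push_up(p)
    t = push_down(R)
    p_next = {l: 1.0 / beta[l] + t[parent[l]] for l in leaves}   # Step 2
    if max(abs(p_next[l] - p[l]) for l in leaves) < 1e-10:
        converged = True
        p = p_next
        break
    p = p_next
```

At the limit, the prices satisfy the fixed-point condition of Corollary 2.9.2, so checking $p_\ell = 1/\beta_\ell + u_{\mathrm{Parent}(\ell)}(p)$ at the final iterate is a convenient correctness test.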
In our numerical experiments in the next section that compare the PUPD algorithm with the gradient-based algorithm, we use a common criterion and stop the algorithms when the norm of the gradient of the expected revenue function, $\|\nabla R_{root}(p^s)\|_2$, is less than some pre-specified tolerance.

Our algorithm is called Push-Up-then-Push-Down because given the price vector $p^s$, we compute $R_{root}(p^s)$ by "pushing up," starting at the leaf nodes and computing the expected revenue at each node in the tree in a breadth-first manner, until we reach the root. Once we obtain $(R_j(p^s) : j \in V)$, we compute $(t^s_j : j \in V)$ by "pushing down," starting from the root and computing these scalars at each node in the tree in a breadth-first manner. Also, we note that our PUPD algorithm is different from the gradient ascent method because in each iteration $s$, the gradient of the expected revenue function at $p^s$ is given by

$$ \frac{\partial R_{root}}{\partial p_\ell}(p) \Big|_{p = p^s} \;=\; -\pi_\ell(p^s)\, \beta_\ell \Bigl( p^s_\ell - \frac{1}{\beta_\ell} - u_{\mathrm{Parent}(\ell)}(p^s) \Bigr), $$

whereas the change in the price vector in iteration $s$ under our PUPD algorithm is given by $p^{s+1}_\ell - p^s_\ell = -\bigl( p^s_\ell - \frac{1}{\beta_\ell} - t^s_{\mathrm{Parent}(\ell)} \bigr)$. Thus, the direction of change $p^{s+1} - p^s$ under our PUPD algorithm is not necessarily parallel to the gradient of the expected revenue function at the price vector $p^s$. The following theorem gives our main result for the PUPD algorithm: it shows that the sequence of price vectors generated by our PUPD algorithm converges to a stationary point of the expected revenue function. The proof of this theorem is given in Appendix A.4.

Theorem 2.9.3 (Convergence to a Stationary Point). The sequence of prices under the PUPD algorithm converges to a stationary point of the expected revenue function $R_{root}(\cdot)$.

2.9.3 Extension to Arbitrary Cost Functions

We can extend Lemma 2.9.1 to the case in which there is a cost associated with offering a product, and this cost is a function of the market share of the product.
In particular, we consider the problem

$$ \max_{p} \; \Pi(p) \;=\; R_{root}(p) - \sum_{\ell=1}^{n} C_\ell\bigl(\pi_\ell(p)\bigr), \qquad (2.7) $$

where $C_\ell(\cdot)$ denotes the cost function associated with product $\ell$ and $\pi_\ell(p)$ denotes the probability that a customer chooses product $\ell$ under the price vector $p$. The above model allows each product $\ell$ to involve a different cost $C_\ell(\cdot)$. The main result of this section provides an expression for the gradient of the profit function $\Pi(\cdot)$. Before we proceed to the statement of the theorem, we introduce some notation. For any nodes $i$ and $j$, let $\rho^i_j(p)$ denote the probability that we can get from node $i$ to node $j$; that is,

$$ \rho^i_j(p) \;=\; \begin{cases} \displaystyle \prod_{h=1}^{m} \frac{V_{k_h}}{\sum_{u \in \mathrm{Sibling}(k_h)} V_u} & \text{if there is a (unique) path } (i = k_0, k_1, \ldots, k_m = j) \text{ from } i \text{ to } j, \\ 0 & \text{otherwise.} \end{cases} $$

Note that $\rho^{root}_\ell(p)$ is exactly the same as $\pi_\ell(p)$ defined just before Lemma 2.9.1, and thus, we can write $\rho^{root}_\ell(p)$ simply as $\pi_\ell(p)$, dropping the superscript $root$ for convenience. For all $p \in \mathbb{R}^n_+$ and for all leaf nodes $i$ and $\ell$, let $v_{i,\ell}(p)$ be defined by

$$ v_{i,\ell}(p) \;=\; \begin{cases} \displaystyle \sum_{s=0}^{\bar{s}_{i,\ell}} \Bigl( \prod_{h=s+1}^{d} \eta_{\mathrm{An}(\ell,h)} \Bigr)\, \rho^{root}_{\mathrm{An}(\ell,s)}(p)\, \bigl(1 - \eta_{\mathrm{An}(\ell,s)}\bigr) & \text{if } i \neq \ell, \\ \displaystyle \rho^{root}_{\ell}(p) + \sum_{s=0}^{d-1} \Bigl( \prod_{h=s+1}^{d} \eta_{\mathrm{An}(\ell,h)} \Bigr)\, \rho^{root}_{\mathrm{An}(\ell,s)}(p)\, \bigl(1 - \eta_{\mathrm{An}(\ell,s)}\bigr) & \text{if } i = \ell, \end{cases} $$

where $\bar{s}_{i,\ell} = \max\{h : \mathrm{An}(i,h) = \mathrm{An}(\ell,h)\}$. Note that $\bar{s}_{i,\ell}$ is well-defined with $0 \le \bar{s}_{i,\ell} \le d$ because the root at level 0 is a common ancestor of all nodes. Also, $\bar{s}_{i,\ell} = d$ if and only if $i = \ell$. Here is the main result of this section, whose proof is given in Appendix A.4.

Theorem 2.9.4 (Gradient for Arbitrary Cost Functions). For each $p \in \mathbb{R}^n_+$ and for each leaf node $\ell$,

$$ \frac{\partial \Pi}{\partial p_\ell}(p) \;=\; -\rho^{root}_\ell(p)\, \beta_\ell \Bigl( p_\ell - \frac{1}{\beta_\ell} - u_{\mathrm{Parent}(\ell)}(p) + \sum_{i=1}^{n} C'_i\bigl(\rho^{root}_i(p)\bigr)\, \rho^{root}_i(p)\, v_{i,\ell}(p) \Bigr). $$

Theorem 2.9.4 holds for any $d$-level nested logit model with an arbitrary value of $d$. Since this theorem gives a tractable formula for the gradient of the expected profit with respect to the prices, we can use this formula to compute a stationary point of the profit function $\Pi(\cdot)$ through gradient-based methods. To our knowledge, this is the first such result.
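Such closed-form gradients are naturally checked against numerical differentiation. The sketch below evaluates the profit in (2.7) on a hypothetical two-level instance with quadratic costs $C_\ell(q) = c\,q^2$ (an illustrative cost choice) and differentiates it by central differences; with $c = 0$ the profit reduces to $R_{root}$, so the finite differences can be compared directly against the closed-form revenue gradient of Lemma 2.9.1.

```python
import math

# Hypothetical two-level instance: two nests of two products each, plus a
# no-purchase weight W0 attached directly to the root (numbers illustrative).
alpha = {1: 1.5, 2: 2.0, 3: 1.0, 4: 2.5}
beta  = {1: 2.0, 2: 2.5, 3: 2.2, 4: 3.0}
eta   = {"n1": 0.7, "n2": 0.5}
nests = {"n1": [1, 2], "n2": [3, 4]}
W0 = 1.0                                     # no-purchase weight e^{alpha_0} = 1

def stats(p):
    """Return choice probabilities pi_l(p), nest revenues R_m, and R_root(p)."""
    v = {l: math.exp(alpha[l] - beta[l] * p[l]) for l in alpha}
    T = {m: sum(v[l] for l in ls) for m, ls in nests.items()}
    W = {m: T[m] ** eta[m] for m in nests}
    G = W0 + sum(W.values())
    Rm = {m: sum(v[l] * p[l] for l in ls) / T[m] for m, ls in nests.items()}
    Rroot = sum(W[m] * Rm[m] for m in nests) / G
    pi = {l: (W[m] / G) * (v[l] / T[m]) for m, ls in nests.items() for l in ls}
    return pi, Rm, Rroot

def profit(p, c):
    """Pi(p) = R_root(p) - sum_l c * pi_l(p)^2 (quadratic costs, illustrative)."""
    pi, _, Rroot = stats(p)
    return Rroot - sum(c * pi[l] ** 2 for l in pi)

def fd_profit_grad(p, c, l, h=1e-6):
    """Central-difference derivative of the profit with respect to p_l."""
    hi, lo = dict(p), dict(p)
    hi[l] += h
    lo[l] -= h
    return (profit(hi, c) - profit(lo, c)) / (2 * h)
```

This numerical baseline is what one would compare an implementation of the closed-form expression in Theorem 2.9.4 against.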
We observe that a stationary point $p$ of the expected profit function $\Pi(\cdot)$ corresponds to a value of $p$ that ensures that the expression in the brackets in Theorem 2.9.4 is equal to zero. Thus, one can construct an analogue of the PUPD algorithm for the general expected profit function $\Pi(\cdot)$ as follows. Using $p^s$ to denote the price vector at iteration $s$, we can compute $u_{\mathrm{Parent}(\ell)}(p^s)$, $\rho^{root}_i(p^s)$ and $v_{i,\ell}(p^s)$ at this price vector. In this case, the price vector $p^{s+1}$ at the next iteration is given by

$$ p^{s+1}_\ell \;=\; \frac{1}{\beta_\ell} + u_{\mathrm{Parent}(\ell)}(p^s) - \sum_{i=1}^{n} C'_i\bigl(\rho^{root}_i(p^s)\bigr)\, \rho^{root}_i(p^s)\, v_{i,\ell}(p^s). $$

Unfortunately, the dynamics of the price vector at successive iterations in this version of the PUPD algorithm are substantially more complicated than the dynamics of the PUPD algorithm given in Section 2.9.2. Thus, finding an analogue of the PUPD algorithm for the general expected profit function $\Pi(\cdot)$ is the subject of future research, but we note that the gradient expression in Theorem 2.9.4 does allow us to compute a stationary point of the expected profit function through gradient-based methods.

2.10 Numerical Results for Price Optimization

In this section, we give numerical experiments to test the performance of our PUPD algorithm. In particular, we compare our PUPD algorithm with the traditional gradient ascent method.

Numerical Setup: We use test problems with three levels in the tree. Letting $m_h$ be the number of children of each node at depth $h$ in the tree, we have a total of $m_0 m_1 m_2$ products. We vary $(m_0, m_1, m_2) \in \{2, 4, 6\} \times \{2, 4, 6\} \times \{2, 4, 6\}$. We refer to each $(m_0, m_1, m_2)$ combination as a different problem class. We randomly generate 200 individual problem instances in each problem class. To generate each problem instance, we set $\alpha_0 = 0$ so that the preference weight of the no-purchase option is $e^{\alpha_0} = 1$. For each non-leaf node $j$, we sample the parameter $\eta_j$ from the uniform distribution over $[0, 1]$.
The parameters $\alpha_j$ and $\beta_j$ for each leaf node $j$ are sampled from the uniform distributions over $[1, 3]$ and $[2, 3]$, respectively. Throughout this section, we refer to our PUPD algorithm as PUPD and the gradient ascent method as GA. We implement PUPD precisely as described in Section 2.9.2. In our implementation of GA, we generate a sequence of price vectors $\{p^s \in \mathbb{R}^n_+ : s = 0, 1, 2, \ldots\}$, where $p^{s+1} = p^s + t^s \nabla R_{root}(p^s)$, $\nabla R_{root}(p^s)$ is the gradient of the expected revenue function evaluated at $p^s$, and $t^s$ is a step size parameter. To choose the step size parameter $t^s$ in iteration $s$, we find a local solution to the problem $\max_{t \ge 0} R_{root}(p^s + t\, \nabla R_{root}(p^s))$ by using golden section search; see Sun and Yuan (2006). We use the same stopping criterion for PUPD and GA, and we stop these algorithms when the norm of the gradient satisfies $\|\nabla R_{root}(p^s)\|_2 \le 10^{-6}$.

Numerical Results: In Table 2.5, we compare the running times for PUPD and GA. The first three columns in Table 2.5 show the parameters of each problem class by giving $(m_0, m_1, m_2)$. The fourth column shows the average number of iterations for PUPD to reach the stopping criterion, where the average is taken over all 200 problem instances in a problem class. Similarly, the fifth column shows the average number of iterations for GA to reach the stopping criterion. The next three columns give various statistics regarding the ratio between the numbers of iterations for GA and PUPD to reach the stopping criterion. To be specific, letting $\mathrm{Itn}^{GA}(k)$ and $\mathrm{Itn}^{PUPD}(k)$ respectively be the numbers of iterations for GA and PUPD to stop for problem instance $k$ in a certain problem class, the sixth, seventh and eighth columns respectively give the average, minimum and maximum of the figures $\{\mathrm{Itn}^{GA}(k) / \mathrm{Itn}^{PUPD}(k) : k = 1, \ldots, 200\}$. Finally, the last three columns give various statistics regarding the ratio between the running times of GA and PUPD.
In particular, letting $\mathrm{Time}^{GA}(k)$ and $\mathrm{Time}^{PUPD}(k)$ respectively be the running times for GA and PUPD to reach the stopping criterion for problem instance $k$ in a certain problem class, the ninth, tenth and eleventh columns in the table respectively give the average, minimum and maximum of the figures $\{\mathrm{Time}^{GA}(k) / \mathrm{Time}^{PUPD}(k) : k = 1, \ldots, 200\}$.

The results in Table 2.5 indicate that PUPD takes drastically fewer iterations to reach the stopping criterion when compared with GA. The ratio between the number of iterations for GA and PUPD ranges between 2 and 514. This ratio never falls below one, showing that PUPD stops in fewer iterations than GA in all problem instances. We observe that as the problem size measured by the number of products $m_0 m_1 m_2$ increases, the number of iterations for GA to reach the stopping criterion tends to increase, whereas the number of iterations for PUPD remains stable and even shows a slightly decreasing trend. These observations suggest the superiority of PUPD especially in larger problem instances. The last three columns in Table 2.5 show that, in addition to the number of iterations, the running times for PUPD are also substantially smaller than those of GA. GA takes at least twice as long as PUPD over all problem instances, and there are problem instances in which the running time for GA is 466 times larger than that of PUPD. The ratios of the running times never fall below one in any problem instance, indicating that PUPD always reaches the stopping criterion faster than GA.

This drastic difference between the performance of PUPD and GA can, at least partially, be explained by the following intuitive argument. Noting Lemma 2.9.1, under GA, if the current price vector is $p^s$, then the price of product $\ell$ changes in the direction $-\pi_\ell(p^s)\, \beta_\ell \bigl( p^s_\ell - \frac{1}{\beta_\ell} - u_{\mathrm{Parent}(\ell)}(p^s) \bigr)$. If the price of product $\ell$ in the price vector $p^s$ is large, then the choice probability $\pi_\ell(p^s)$ is expected to be small, which implies that the change in the price of product $\ell$ under GA is expected to be relatively small as well. However, if the price of a product is large, then its price may have to change significantly before the change makes a noticeable impact on the choice probability of this product! In contrast, under PUPD, the choice probability $\pi_\ell(p^s)$ does not come into play directly when computing the change in the prices, but only indirectly through the scalars $(t^s_j : j \in V)$.

                                     Ratio Between No. Itns.   Ratio Between Run. Times
  m0  m1  m2    PUPD        GA        Avg.    Min     Max        Avg.    Min     Max
   2   2   2     124     2,019         12      2      131         10      2      108
   2   2   4     112     3,393         25      5      184         21      4      188
   2   2   6      93     4,201         34      8      158         29      6      141
   2   4   2     122     5,194         36      4      230         31      3      196
   2   4   4     104     5,390         47     11      172         40      9      146
   2   4   6      95     7,117         69     16      251         60     13      275
   2   6   2     124     6,520         47      6      168         41      5      139
   2   6   4     113     8,401         72     15      235         64     13      252
   2   6   6     101    10,390        102     24      294         89     19      299
   4   2   2     161     6,470         35      4      205         31      3      204
   4   2   4     128     6,848         50     11      179         43      9      159
   4   2   6     114     9,136         77     19      236         67     16      225
   4   4   2     135    10,045         72     11      220         63      9      242
   4   4   4     101    12,147        118     25      270        106     22      289
   4   4   6      92    12,799        139     36      305        120     30      281
   4   6   2     122    13,204        110     14      303         98     12      335
   4   6   4      98    15,851        166     57      371        145     52      325
   4   6   6      85    17,913        217     73      396        191     62      422
   6   2   2     158     8,564         54      6      191         47      6      215
   6   2   4     125    10,441         83     16      212         73     13      181
   6   2   6     104    10,857        104     27      257         90     24      247
   6   4   2     125    14,912        120     25      280        106     22      309
   6   4   4      98    15,932        165     48      327        145     42      343
   6   4   6      86    18,087        217    101      419        194     81      446
   6   6   2     109    17,197        164     47      326        144     42      311
   6   6   4      87    19,896        239    102      398        213     84      417
   6   6   6      75    21,531        299    116      514        266    104      466

(The PUPD and GA columns give the average numbers of iterations.)

Table 2.5: Performance comparison between PUPD and GA.
We note that PUPD and GA can converge to different stationary points of the expected revenue function, but over all of our problem instances, the expected revenues at the stationary points obtained by PUPD and GA differ by at most 0.02%. We also tested the robustness of the PUPD algorithm by using different initial prices. The results are similar to those in Table 2.5; for details, the reader is referred to Appendix A.5. Finally, we applied the PUPD algorithm to problem instances with two levels in the tree. When there are two levels in the tree, Gallego and Wang (2013) provide a set of implicit equations that must be satisfied by the optimal price vector. We use golden section search to numerically find a solution to this set of equations. Similar to the stopping criterion for PUPD, we stop the golden section search when we reach a price vector $p^s$ that satisfies $\|\nabla R_{root}(p^s)\|_2 \le 10^{-6}$. Our numerical results demonstrated that PUPD can run substantially faster than golden section search for problem instances with larger numbers of products. Furthermore, over all of our problem instances, the expected revenues at the stationary points obtained by PUPD and by golden section search are essentially equal. For the details, the reader is referred to Appendix A.5.

2.11 The Choice Model and the Assortment Optimization Problem

In this section, we begin by providing practical motivation for the use of the $d$-level nested logit model. First, we show that this model is compatible with the random utility maximization (RUM) principle. Second, we discuss that this model is equivalent to the elimination by aspects model. Third, we provide practical situations in which the $d$-level nested logit model with three or more levels is used to capture the customer choice process. Finally, we demonstrate that the predictive accuracy of the $d$-level nested logit model increases as the number of levels in the tree increases. This discussion provides practical motivation for the $d$-level nested logit model.
Following this discussion, we explain how the assortment and price optimization problems considered in this paper become useful when solving large-scale revenue management problems.

Random Utility Maximization: An attractive framework for describing the choice process of customers is to appeal to the RUM principle, where a customer associates random utilities with the products and the no-purchase option. The customer then chooses the option that provides the largest utility. As discussed in Section 2.3, if the customer is at a non-leaf node $j$, then she chooses node $k \in \mathrm{Children}(j)$ with probability $V_k(S_k) / \sum_{\ell \in \mathrm{Children}(j)} V_\ell(S_\ell)$. For the customer to choose product $\ell$, she needs to follow the path from the root to the leaf node $\ell$. Thus, given an assortment $S = (S_j : j \in V)$, the probability that a customer chooses a product $\ell \in S$ is

$$ \pi_\ell(S) \;=\; \prod_{h=1}^{d} \frac{V_{\mathrm{An}(\ell,h)}\bigl(S_{\mathrm{An}(\ell,h)}\bigr)}{\sum_{k \in \mathrm{Sibling}(\mathrm{An}(\ell,h))} V_k(S_k)}. \qquad (2.8) $$

The following theorem shows that the choice probability above can be obtained by appealing to the RUM principle. The proof is given in Appendix A.6.

Theorem 2.11.1 (Consistency with Utility Maximization). The choice probability under the $d$-level nested logit model given in (2.8) is consistent with the RUM principle.

Börsch-Supan (1990) shows the compatibility of the two-level nested logit model with the RUM principle; see, also, McFadden (1974). The above theorem extends the result to the $d$-level setting. McFadden (1978) and Cardell (1997b) discuss the consistency with the RUM principle for a general $d$-level setting, but they do not provide a formal proof.

Elimination by Aspects Model: Theorem 2.11.1 provides a behavioral justification for the $d$-level nested logit model by relating it to the RUM principle. Another approach to providing justification for the $d$-level nested logit model is to relate it to the elimination by aspects model proposed by Tversky (1972a,b) and Tversky and Sattath (1979).
This model is widely accepted in the psychology and marketing literature for providing a reasonable heuristic for decision making. Roughly speaking, the elimination by aspects model works similarly to a screening rule. The customer chooses an attribute and eliminates the alternatives that do not meet the requirements based on this attribute. She continues in this manner until she is left with a single alternative. There is a large body of literature that explores the equivalence between the $d$-level nested logit and the elimination by aspects model. Manski and McFadden (1981) point out the equivalence between the two models in somewhat restricted settings. Tversky and Sattath (1979), Vovsha (1997) and Train (2003) also comment on the equivalence between the $d$-level nested logit and elimination by aspects models. Batley and Daly (2006) provide a formal proof that the $d$-level nested logit model can be derived by appealing to the elimination by aspects model.

Applications of Nested Logit Models with Three or More Levels: Psychologists and marketing researchers have conducted many experiments to validate the elimination by aspects model, thus providing justification for the $d$-level nested logit model in the process. Studies show that consumers indeed use multiple attributes to sequentially eliminate products from their considerations. For example, Andrews and Manrai (1998) consider scanner panel data for fabric softeners and find that customers screen fabric softeners based on four attributes: brand name, size, product form and formulation. For advanced photo systems, Gilbride and Allenby (2004) show that customers consider over 10 attributes to determine the camera system that they will purchase. These attributes include body style, mid-roll change, operation feedback, zoom lens, view finder and price.
Pihlens (2008) models the choice among different pasta sauce options by using 16 attributes, which include container size and type, sauce base, main ingredients and price. Waddell (1993) uses the three-level nested logit model to capture residential location choice. The three levels in the tree correspond to workplace, housing tenure, and residence choice. Coldren and Koppelman (2005) find that using three levels in the nested logit model provides an improvement over two levels when modeling demand for flight itineraries. Cervero and Duncan (2008) model rail travel adoption for daily commutes by using the nested logit model with three levels. Carson et al. (2009) use the four-level nested logit model to represent fishing demand in Alaska. Gramlich (2009) models the choice among different vehicle options using trees with four levels, which correspond to product features such as vehicle segment, size, comfort level, country of origin and specific vehicle model.

Parameter Estimation and Predictive Accuracy as the Number of Levels $d$ Increases: In Appendix A.7, we show that the log-likelihood function under the $d$-level nested logit model exhibits desirable concavity properties, enabling us to develop an effective estimation method. In Appendix A.7, we exploit these concavity properties to present numerical experiments in which we fit the $d$-level nested logit model to customer choice data with a progressively increasing number of levels $d$ in the tree. Our experiments demonstrate that the predictive accuracy improves as the number of levels $d$ increases. Compared with the multinomial logit model ($d = 1$), two-level models ($d = 2$) reduce the misclassification errors by about 9%, while with three-level models ($d = 3$), the misclassification errors are reduced by 15%.
Assortment and Price Optimization Problems: The assortment and price optimization problems considered in this paper are useful when maximizing the immediate expected revenue from a customer, but these problems also find applications when solving large-scale revenue management problems. Consider the case in which an airline operates $k$ flight legs, indexed by $1, \ldots, k$. We let $c_i$ be the capacity available on flight leg $i$. The airline offers $n$ itineraries to its customers, where an itinerary is a combination of connecting flight legs and an associated price. We index the itineraries by $1, \ldots, n$. We use $r_\ell$ to denote the revenue associated with itinerary $\ell$. We let $a_{i\ell} = 1$ if itinerary $\ell$ uses flight leg $i$, and $a_{i\ell} = 0$ otherwise. The problem takes place over a finite selling horizon of $T$ periods. Time period $T$ is the beginning of the selling horizon and the flight legs depart at time period 0. For simplicity, we assume that there is one customer arrival at each time period. The goal of the airline is to find a policy for deciding which set of itineraries to make available to its customers to maximize the total expected revenue over the selling horizon. We let $x_i$ be the remaining capacity on flight leg $i$ at the beginning of a time period and use $x = (x_1, \ldots, x_k)$ as the state variable. The feasible set of itineraries that can be offered at a time period is given by

$$ F(x) \;=\; \bigl\{ S \subseteq \{1, \ldots, n\} \;:\; a_{i\ell}\, \mathbf{1}(\ell \in S) \le x_i \;\; \forall\, i = 1, \ldots, k,\; \ell = 1, \ldots, n \bigr\}, $$

where $\mathbf{1}(\cdot)$ is the indicator function. The definition of $F(x)$ captures the fact that if we offer a set that includes itinerary $\ell$ and itinerary $\ell$ uses flight leg $i$, then there has to be capacity available on flight leg $i$.
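The feasibility condition simply says that an itinerary may be offered only if every leg it uses still has a seat; a direct check, with a hypothetical two-leg, three-itinerary incidence matrix, can be sketched as:

```python
# Hypothetical incidence matrix: a[i][l] = 1 if itinerary l uses flight leg i.
# Itinerary 2 is the connecting itinerary using both legs.
a = [[1, 0, 1],
     [0, 1, 1]]

def offerable(x):
    """Itineraries that can belong to some S in F(x): those whose legs all
    have remaining capacity (a_il * 1(l in S) <= x_i for every leg i)."""
    k, n = len(a), len(a[0])
    return {l for l in range(n)
            if all(x[i] >= 1 for i in range(k) if a[i][l] == 1)}
```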
Thus, we can find the optimal policy by solving the dynamic program
\[
\begin{aligned}
J_t(x) &= \max_{S \in F(x)} \; \sum_{\ell \in S} P_\ell(S)\,\Big[ r_\ell + J_{t-1}\big(x - \textstyle\sum_{i=1}^{k} a_{i\ell}\, e_i\big) \Big] + P_0(S)\, J_{t-1}(x) \\
&= \max_{S \in F(x)} \Big\{ \sum_{\ell \in S} P_\ell(S)\,\Big[ r_\ell + J_{t-1}\big(x - \textstyle\sum_{i=1}^{k} a_{i\ell}\, e_i\big) - J_{t-1}(x) \Big] \Big\} + J_{t-1}(x), \qquad (2.9)
\end{aligned}
\]
where $e_i$ is a unit vector in $\mathbb{R}^k$ with a one in the element corresponding to flight leg $i$, $P_0(S)$ is the no-purchase probability, and the choice probability $P_\ell(S)$ is driven by the $d$-level nested logit model. For practical airline networks with a large number of flight legs, the dynamic program above involves a high-dimensional state variable, so it is difficult to compute the value functions exactly. However, there exist a variety of methods to build approximations to the value functions; see Liu and van Ryzin (2008), Zhang and Adelman (2009). In this case, if we have an approximation $\tilde J_t(\cdot)$ to the value function $J_t(\cdot)$, then we can replace $J_{t-1}(\cdot)$ in the second maximization problem above with $\tilde J_{t-1}(\cdot)$ and solve this problem to find a set of itineraries to offer. To solve this problem, we observe that we can drop all itineraries for which we do not have enough capacity. Once we drop these itineraries, this problem becomes an assortment optimization problem in which the revenue of product $\ell$ is given by $r_\ell + \tilde J_{t-1}(x - \sum_{i=1}^{k} a_{i\ell}\, e_i) - \tilde J_{t-1}(x)$ and we maximize the expected revenue from a customer. Thus, if we are given approximations to the value functions, then the problem of finding a set of itineraries to offer precisely corresponds to the assortment problem considered in this paper. Another use of our assortment optimization problems occurs when we build a linear programming approximation to the revenue management problem described above. To formulate this linear program, we use the decision variable $h(S)$ to denote the number of time periods during which we offer the subset $S$ of itineraries. In this case, we can use the linear program
\[
\max \; \sum_{S \subseteq \{1,\ldots,n\}} \sum_{\ell=1}^{n} r_\ell\, P_\ell(S)\, h(S) \qquad (2.10)
\]
subject to
\[
\begin{aligned}
& \sum_{S \subseteq \{1,\ldots,n\}} \sum_{\ell=1}^{n} a_{i\ell}\, P_\ell(S)\, h(S) \le c_i \qquad \forall\, i = 1,\ldots,k \\
& \sum_{S \subseteq \{1,\ldots,n\}} h(S) = T \\
& h(S) \ge 0 \qquad \forall\, S \subseteq \{1,\ldots,n\},
\end{aligned}
\]
to approximate the optimal expected revenue over the selling horizon; see Gallego et al. (2004), Liu and van Ryzin (2008). Noting that $\sum_{\ell=1}^{n} r_\ell\, P_\ell(S)$ is the expected revenue at a time period during which we offer the subset $S$ of itineraries, the objective function in the problem above computes the total expected revenue over the selling horizon. The first set of constraints ensures that the total expected capacity consumption on each flight leg does not exceed the capacity available on the flight leg, whereas the second constraint ensures that we offer a subset of itineraries at each time period. If there are $n$ itineraries, then the number of decision variables in the linear program above is $2^n$. Thus, it is customary to solve the linear program by using column generation. Using $\{\mu_i : i = 1,\ldots,k\}$ and $\sigma$ to denote the dual variables associated with the two sets of constraints above, the reduced cost of the decision variable $h(S)$ is given by $\sum_{\ell=1}^{n} r_\ell\, P_\ell(S) - \sum_{i=1}^{k} \sum_{\ell=1}^{n} a_{i\ell}\, P_\ell(S)\, \mu_i - \sigma$. Thus, when generating columns, to find the decision variable with the largest reduced cost, we can solve the problem $\max_{S \subseteq \{1,\ldots,n\}} \sum_{\ell=1}^{n} r_\ell\, P_\ell(S) - \sum_{i=1}^{k} \sum_{\ell=1}^{n} a_{i\ell}\, P_\ell(S)\, \mu_i = \max_{S \subseteq \{1,\ldots,n\}} \sum_{\ell=1}^{n} P_\ell(S)\, \big(r_\ell - \sum_{i=1}^{k} a_{i\ell}\, \mu_i\big)$. We observe that the last problem has precisely the same form as the assortment optimization problem considered in this paper, where the revenue of product $\ell$ is $r_\ell - \sum_{i=1}^{k} a_{i\ell}\, \mu_i$ and we find an assortment of products to maximize the expected revenue from a customer. Thus, the assortment optimization problem studied in this paper becomes useful when generating columns for linear programming approximations of large-scale revenue management problems. In the dynamic program in (2.9) and the linear program in (2.10), the prices of the itineraries are fixed and the airline decides which set of itineraries to make available.
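The column-generation pricing step above can be sketched in code: given dual values, compute the adjusted revenues $r_\ell - \sum_i a_{i\ell}\mu_i$ and search for the revenue-maximizing assortment. For readability, this sketch uses the multinomial logit model (the $d = 1$ special case of the nested logit model) and brute-force enumeration over subsets, which is viable only for small $n$; the paper's algorithm handles general $d$-level trees efficiently. All numbers, names, and weights below are illustrative assumptions, not data from the dissertation.

```python
from itertools import combinations

def mnl_probs(S, v):
    """Choice probabilities under the multinomial logit model (d = 1):
    P_l(S) = v_l / (1 + sum_{m in S} v_m), with weight 1 for the no-purchase option."""
    denom = 1.0 + sum(v[l] for l in S)
    return {l: v[l] / denom for l in S}

def pricing_problem(r, v, a, mu):
    """Column-generation pricing step: maximize sum_l P_l(S) * (r_l - sum_i a_il * mu_i)
    over all assortments S by brute force (exponential in n; illustrative only)."""
    n = len(r)
    adj = [r[l] - sum(a[i][l] * mu[i] for i in range(len(mu))) for l in range(n)]
    best_S, best_val = (), 0.0  # offering nothing yields zero reduced revenue
    for size in range(1, n + 1):
        for S in combinations(range(n), size):
            probs = mnl_probs(S, v)
            val = sum(probs[l] * adj[l] for l in S)
            if val > best_val:
                best_S, best_val = S, val
    return best_S, best_val

# Toy instance: 3 itineraries, 2 flight legs (hypothetical numbers).
r = [100.0, 80.0, 60.0]      # itinerary revenues
v = [1.0, 1.5, 2.0]          # MNL preference weights
a = [[1, 1, 0], [0, 1, 1]]   # a[i][l] = 1 if itinerary l uses leg i
mu = [10.0, 5.0]             # dual values on the capacity constraints
S, val = pricing_problem(r, v, a, mu)
print(S, round(val, 2))
```

Offering a set with positive adjusted revenue enters the column with the largest reduced cost into the restricted master problem (after subtracting the constant dual $\sigma$, which does not affect the maximizer).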
These models have their analogues in which the set of offered itineraries is fixed and the prices of the itineraries are decision variables; see Gallego and van Ryzin (1997). The pricing problem studied in this paper becomes useful when dealing with large-scale revenue management problems in which the airline offers a fixed set of itineraries but dynamically adjusts their prices. 2.12 Conclusion We considered assortment and price optimization problems under the $d$-level nested logit model. We showed how to obtain an optimal assortment, and we developed an algorithm to obtain a set of prices corresponding to a stationary point of the expected revenue function. We investigated several motivations for the $d$-level nested logit model and for our assortment and pricing problems. For future research, one can consider extending the PUPD algorithm for joint assortment and price optimization, establishing connections between our assortment optimization algorithm and the linear programming formulation for the two-level model proposed by Davis et al. (2014), and developing assortment and pricing algorithms for the case where the nest dissimilarity parameter exceeds one for some nodes $j \in V$. Chapter 3 A Dual-Account Bus Card Problem 3.1 Introduction In the past few decades, public transportation has played a central role in addressing national priorities such as economic growth, energy efficiency, traffic congestion, and environmental sustainability. In the United States, public transportation is a $57 billion industry that employs nearly 400,000 people. In addition, every $1 million invested in the industry creates 36 jobs and $4 million in increased business sales. In 2011, it is estimated that the use of public transportation reduced annual fuel consumption by 4.2 billion gallons of gasoline and reduced traffic congestion costs by $16.8 billion in the United States (Dickens et al. 2012). Many studies (Bailey et al.
2008, APTA 2009) have shown that public transportation provides cost-effective approaches to reducing greenhouse gas emissions. According to the American Public Transportation Association (APTA), public transit reduces carbon emissions by 37 million metric tons every year. The major revenue sources for public transit operators are passenger fares and financial assistance from federal, state, and local governments. To policy makers, it is more important for public transportation to create greater social, economic, and environmental benefits than to generate greater fare revenue. To promote passenger ridership, fares are usually set far below market rates. As a result, very few public transit companies have a farebox recovery ratio (collected fares divided by total operating cost) of more than 100%. According to Dickens et al. (2012), public transportation in the United States incurred $56 billion in operating and investment costs but collected only $12.6 billion in passenger fares in 2010. The deficits in this industry are usually absorbed by the taxpayers rather than the passengers (Ubbels and Nijkamp 2008). The demand for public transportation depends heavily on daily commuter activity, which is influenced by many factors, including fare, fuel price, quality of service, income, car ownership, consumer preferences, and transport policies (Paulley et al. 2006). Since the macroeconomic environment is largely exogenous, public transit companies can influence ridership only by means such as adjusting fares and improving service levels. Fares are the most intensively studied factor, since they are easy to quantify and most frequently adjusted. For example, Pham and Linsalata (1991) estimate the average bus fare elasticity to be $-0.36$ in large cities. In this paper, we analyze passenger behavior in a given fare system and suggest the optimal fares that can be set by the bus company.
Before the adoption of electronic ticketing systems, fixed-price, time-limited transit passes were created to benefit transit agencies and frequent riders, as both parties could gain efficiency by reducing the number of cash transactions. The price of a transit pass often depends both on the region of service and on the term of validity. Transit passes allow passengers either to take unlimited trips or to take a fixed number of pre-purchased rides at a discounted price within a fixed period of time, typically one day, seven days, or a month. At the end of each period, any remaining balance associated with the pass expires, and passengers who need to use the transit pass have to purchase it again at ticket machines or offices at a transit station. Today, RFID-enabled contactless smart cards are gaining popularity among public transportation operators around the world because they greatly reduce fare collection costs and improve transit efficiency (Iseki et al. 2008). Typically, a smart card can hold any combination of time-limited transit passes and cash value. Sometimes, the latter can also be used for many other purposes, such as paying for rides in other public transit services and for grocery shopping. Many major cities in China have adopted the limited-rides transit pass. For example, the Beijing employee monthly pass, at a price of $7, allows a maximum of 140 rides each month. The historical reason for the limited number of rides was to avoid abuse of the system, that is, the use of a single pass by multiple passengers, while still satisfying the demand of the majority of frequent passengers. However, the perceived cost or utility of using the monthly pass is different for a passenger who takes two rides a day than for someone who takes four rides a day.
Moreover, passengers consider any remaining balance at the end of the month as wastage and a form of unfair “mandatory spending.” To improve passenger satisfaction and to encourage more people to choose public transportation over private cars, many cities have started to replace the monthly transit pass system with a discounted pay-per-ride system. For example, the city of Chengdu implemented a discounted pay-per-ride monthly prepaid system in 2007. Using a transit smart card, a passenger could either pay 90% of the nominal fare for each ride from a regular account or pay 50% of the nominal fare per ride from the monthly “discount” account. A passenger could refill both accounts at designated locations. At the end of each month, any remaining balance in the monthly discount account is reset to zero. It is estimated that the monthly residual value in the monthly accounts exceeds 0.6 million dollars. Public transit agencies and policy makers need to carefully assess the benefits of dual-account bus cards. On the one hand, the introduction of the discount account may reduce the revenue from the regular account and cash. On the other hand, the discount account can also attract passengers who would otherwise not ride the bus. Therefore, it is crucial to examine the impact of dual-account bus cards on total revenue and passenger ridership. In this paper, we study the optimal recharging policy of strategic passengers, who are forward-looking but uncertain about their future travel demand. Generally, a company sets the fares for the regular and monthly discount accounts. In response to the bus company's policy, strategic passengers select the size and timing of their refills based on their travel demand. The passengers incur a fixed cost each time they refill either account and holding costs for the remaining balances in the accounts.
The fixed cost can be seen as a function of the effort and time a passenger spends to perform the replenishment, while the holding cost can be seen as an opportunity cost or interest rate. The passengers aim to minimize the total cost incurred. We formulate the passenger's problem as a finite-horizon dynamic program. We derive structural properties of the value function and analyze the optimal refill policy for both the discount and regular accounts. We also develop the optimal solution structures for a single-period myopic policy. Based on real passenger spending data, we estimate the fixed costs and holding costs incurred by the passengers. Moreover, we perform counterfactual analyses to suggest the best cost parameters that the bus company should adopt to extract the most revenue from passengers. 3.2 Literature Review Our research relates to the following streams of literature. First, our problem is relevant to inventory models with lost sales. The single-item, discrete-time inventory model was first formulated by Karlin and Scarf (1958) and then studied by Morse (1959). Zipkin (2008) performs a structural analysis of the standard lost-sales inventory problem and some of its variations. Li and Yu (2012) show the quasiconcavity of the maximal profit function in some lost-sales inventory models with fixed costs under a certain set of conditions. Vander Eecken (1968) formulates the finite-horizon, discrete-time, batch-ordering problem with deterministic demand and lost sales as a dynamic programming model similar to the Wagner–Whitin model and shows a set of conditions under which the “planning horizon theorem” applies. Since then, much research effort has been spent on extending Vander Eecken's work to the case in which backorders are allowed; see, for example, Elmaghraby and Bawle (1972), Bitran and Matsuo (1986), Li et al. (2004), Lagodimos et al. (2012). Our research differs from the above-mentioned literature as we focus on a dual-account system.
Both accounts face lost sales; that is, when an account has insufficient balance, the use of the account is forbidden. Thus, the demand for that account is “lost,” and alternative payment options have to be adopted. Second, our problem is also related to multi-product extensions of the inventory problem. Johnson (1967) focuses on the $n$-product problem with a joint setup cost and shows that the optimal policy has a $(\sigma, S)$ form, where $\sigma \subseteq \mathbb{R}^n$ and $S \in \mathbb{R}^n$. The optimal policy states that one should order up to the level $S$ if the initial inventory level $x \in \sigma$ and $x \le S$, and not order anything if $x \notin \sigma$. Kalin (1967) calls the optimal policy a $(\sigma, S(\cdot))$ policy, showing that when $x \in \sigma$ and $x \le S$, there exists $S(x) \ge x$ such that it is optimal to order $S(x) - x$. Liu and Esogbue (2012) extend the multi-product model with a joint setup cost to a finite-horizon case. By generalizing the concept of $K$-convexity to $\mathbb{R}^n$, they prove the optimality of an $(s_t, S_t)$ type of policy, under the condition that the initial inventory level $x_0 \le S_0$ and that the threshold $S_t$ increases over periods $t$. The joint setup cost setting is relevant to our problem because it is usually the case that both accounts in the bus card can be refilled at the same designated location. We find that our optimal refill policy has the form of $(s_t, S_t)$ in the absence of a joint setup cost; when there is a joint setup cost, the optimal refill policy follows the form of $(\sigma_t, S_t)$. Furthermore, the dual-account bus card system under consideration is related to the problem of a periodic-review inventory system with two supply modes: a regular mode and an emergency mode with a shorter lead time; see, for example, Chiang and Gutierrez (1996), Tagaras and Vlachos (2001). In this field of research, the common assumptions are that the decision epochs are infinite, the regular mode of supply has a longer lead time than the emergency mode, and the demand is backordered.
In our paper, we use a finite-horizon model and assume the lead time to be negligible for both accounts. In addition, we consider lost sales rather than backorders. Moreover, our research falls into the regime of the operations management and marketing interface. There is vast literature in marketing that studies consumer purchase quantity. These studies typically model consumer decisions on how many units to purchase of a particular brand chosen by consumers, based on consumer scanner panel data; see, for example, Chiang (1991), Allenby et al. (2004). We adopt the utility maximization approach to model passenger purchase quantity, in the sense that the actual cost is the passenger's negative utility. Our research differs from the existing literature in several important ways. First, while the cost in the current literature is entirely determined by the price of the item, we include three additional cost elements: holding costs, fixed costs, and penalty costs. In addition, we use a finite-horizon model in which the passengers can make refilling decisions in each period. Last but not least, refilling the bus card accounts requires passengers to pay in advance for future consumption. Thus, our paper is also related to the literature on advance selling, whose benefits have been documented in the literature. In particular, Xie and Shugan (2001) show that advance selling can generate greater profit by enabling more buyers to purchase. Wang et al. (2015) consider the dynamic pricing problem of a monopoly selling both individual units and passes that contain a fixed number of credits in the presence of strategic customers. Our paper differs from these studies, as the fares are exogenously set (i.e., they cannot be changed frequently) and there is no limit on the number of credits (or rides) in each purchase. In addition, the remaining account balance may face an expiration date.
We focus on analyzing the impact of passenger behavior on the bus company. The rest of the paper proceeds as follows. In Section 3.3, we describe our model, formulate the dynamic program, and offer insights on the optimal refill policy. In Section 3.4, we introduce the data and perform model estimation. We discuss managerial implications from counterfactual simulations in Section 3.5. We conclude the paper in Section 3.6. The details of the proofs in this chapter are presented in Appendix B. 3.3 The Model and Optimal Policy for Fully Strategic Customers 3.3.1 A Finite-Horizon Model We describe the dual-account bus card problem as a finite-horizon inventory model with two supply sources. In this model, we assume that all cost parameters are exogenous to each customer. Let $t = 1,\ldots,T$ denote the decision epochs. At the beginning of each period, the remaining balances (in terms of the number of rides) in the discount and regular accounts are denoted by $x_d \ge 0$ and $x_r \ge 0$, respectively. A consumer must decide whether to refill the discount account, the regular account, or both accounts at the same time. Let $y_d \ge 0$ and $y_r \ge 0$ denote the after-refill balances (in terms of the number of rides) in the discount and regular accounts, respectively.¹ The purchasing costs, or bus fares, for the discount and regular accounts are denoted by $c_d > 0$ and $c_r > 0$, respectively. We model the total setup cost as a joint replenishment cost $K\, I_{\{y_d > x_d \text{ or } y_r > x_r\}}$, where $I_{\{\cdot\}}$ is the indicator function. We assume that a refill takes effect instantaneously. Therefore, in each period, the total purchasing (refilling) cost incurred can be represented by
\[
c_d\,(y_d - x_d) + c_r\,(y_r - x_r) + K\, I_{\{y_d > x_d \text{ or } y_r > x_r\}} .
\]
Let $D_t$, $t = 1,\ldots,T$, denote the demand incurred in period $t$. Throughout this section, we do not impose any structural restrictions on $D_t$, other than the fact that it is non-negative.
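The per-period refill cost above (the variable recharge cost plus the joint setup charge whenever either account is topped up) can be sketched directly; the parameter values in the usage example are hypothetical, not from the data.

```python
def refill_cost(x_d, x_r, y_d, y_r, c_d, c_r, K):
    """Per-period purchasing cost: c_d*(y_d - x_d) + c_r*(y_r - x_r), plus the
    joint setup cost K if either account is refilled (y_d > x_d or y_r > x_r)."""
    assert y_d >= x_d and y_r >= x_r, "after-refill balances cannot shrink"
    setup = K if (y_d > x_d or y_r > x_r) else 0.0
    return c_d * (y_d - x_d) + c_r * (y_r - x_r) + setup

# Refill the discount account from 2 to 10 rides, leave the regular account alone.
print(refill_cost(2, 5, 10, 5, c_d=0.5, c_r=0.9, K=0.3))  # → 4.3
```

Note that $K$ is charged once per period regardless of whether one account or both are refilled, which is what couples the two refill decisions.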
In each period $t$, the after-refill balance in the discount account, $y_d$, is used to satisfy the demand first. If $D_t > y_d$, then the discount account is used up, leaving unmet demand $D_t - y_d$. The balance in the regular account, $y_r$, is then used to satisfy as much of the remaining demand as possible. Any excess demand that cannot be satisfied by the regular account, that is, $(D_t - y_d - y_r)^+$, is considered lost, and a per-unit penalty cost $b$ is incurred. In particular, $b$ can include the nominal bus fare and the inconvenience cost of using cash, and it makes practical sense to assume that $c_d < c_r < b$. At the end of each period, a per-unit holding cost $h_d$ is incurred for any remaining balance in the discount account, and a per-unit holding cost $h_r$ is incurred for any remaining balance in the regular account. Therefore, in each period $t$, the total holding and penalty costs incurred are
\[
b\,(D_t - y_d - y_r)^+ + h_d\,(y_d - D_t)^+ + h_r\,\big(y_r - (D_t - y_d)^+\big)^+ .
\]
At the end of the entire planning horizon, that is, at $t = T + 1$, any remaining balance in the discount account is considered a sunk cost and is irrecoverable, while the remaining balance in the regular account can be carried forward to the next planning horizon or refunded in full. Throughout this section, we assume that $b > c_r > c_d \ge h_d \ge h_r$. Note that $b > c_r > c_d$ resembles reality, as passengers incur a greater cost when using the regular account or cash instead of the discount account. Moreover, the holding costs $h_d$ and $h_r$ cannot be too high; otherwise, passengers would never use the bus card. Finally, we assume that $h_d \ge h_r$ because the balances in the regular account can be used for many other purposes and function as a debit card.

¹In reality, $x_d$, $x_r$, $y_d$ and $y_r$ should take non-negative integer values. Relaxing this constraint allows us to derive richer structural results for the value function.
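The consumption sequence just described (discount account first, then the regular account, with unmet demand paid in cash at penalty $b$) and the end-of-period holding costs can be sketched as follows; the demand value and cost parameters in the example are made up.

```python
def one_period_cost(y_d, y_r, demand, b, h_d, h_r):
    """Holding and penalty cost for one period, given after-refill balances
    (y_d, y_r): demand draws down the discount account first, then the regular
    account; leftover demand is lost at penalty b per ride, and leftover
    balances incur holding costs h_d and h_r."""
    used_d = min(demand, y_d)            # rides paid from the discount account
    used_r = min(demand - used_d, y_r)   # rides paid from the regular account
    lost = demand - used_d - used_r      # rides paid in cash: (D - y_d - y_r)^+
    rem_d, rem_r = y_d - used_d, y_r - used_r
    return b * lost + h_d * rem_d + h_r * rem_r, (rem_d, rem_r)

cost, (rem_d, rem_r) = one_period_cost(y_d=2, y_r=3, demand=7, b=1.0, h_d=0.05, h_r=0.02)
print(cost, rem_d, rem_r)  # two rides lost to cash, no balance left in either account
```

The returned pair `(rem_d, rem_r)` is exactly the next period's starting state $((y_d - D_t)^+,\ (y_r - (D_t - y_d)^+)^+)$ used in the dynamic program below.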
Parameter   Range           Interpretation
$c_d$       $[0, \infty)$   Per-unit bus fare for the discount account
$c_r$       $[0, \infty)$   Per-unit bus fare for the regular account
$b$         $[0, \infty)$   Per-unit penalty cost (cost of using cash)
$h_d$       $[0, \infty)$   Per-unit holding cost for the discount account
$h_r$       $[0, \infty)$   Per-unit holding cost for the regular account
$K$         $[0, \infty)$   Joint setup cost
$\gamma$    $[0, 1]$        Discount factor

Table 3.1: List of parameters

Let $V_t(x_d, x_r)$ denote the total expected cost incurred throughout the remaining $T - t + 1$ periods, given the initial balances $(x_d, x_r)$ in both accounts. Then, the bus card problem can be formulated as the following dynamic program:
\[
\begin{aligned}
V_t(x_d, x_r) = \min_{y_d \ge x_d,\, y_r \ge x_r} \Big\{ & K\, I_{\{y_d > x_d \text{ or } y_r > x_r\}} + c_d\,(y_d - x_d) + c_r\,(y_r - x_r) \\
& + \mathbb{E}\Big[ h_d\,(y_d - D_t)^+ + h_r\,\big(y_r - (D_t - y_d)^+\big)^+ + b\,(D_t - y_d - y_r)^+ \\
& \qquad + \gamma\, V_{t+1}\big((y_d - D_t)^+,\ (y_r - (D_t - y_d)^+)^+\big) \Big] \Big\}, \qquad (3.1)
\end{aligned}
\]
with the boundary condition $V_{T+1}(x_d, x_r) = -c_r\, x_r$, representing the refundable regular account balance at the end of the planning horizon. The parameter $\gamma$ denotes the discount factor that measures a subjective reduction in the expected future cost. Since the planning horizon is usually short (i.e., a month), we use $\gamma = 1$ from this point onwards.² In fact, as shown in the next lemma, this treatment allows us to transform the value function into a simpler form.

Lemma 3.3.1 (Model Equivalency). Given $\gamma = 1$, the following dynamic program is equivalent to (3.1):
\[
\begin{aligned}
J_t(x_d, x_r) = \min_{y_d \ge x_d,\, y_r \ge x_r} \Big\{ & K\, I_{\{y_d > x_d \text{ or } y_r > x_r\}} + \mathbb{E}\Big[ c_d \min(D_t, y_d) + c_r \min\big((D_t - y_d)^+, y_r\big) \\
& + h_d\,(y_d - D_t)^+ + h_r\,\big(y_r - (D_t - y_d)^+\big)^+ + b\,(D_t - y_d - y_r)^+ \\
& + J_{t+1}\big((y_d - D_t)^+,\ (y_r - (D_t - y_d)^+)^+\big) \Big] \Big\}, \qquad (3.2)
\end{aligned}
\]
with the boundary condition $J_{T+1}(x_d, x_r) = c_d\, x_d$.

²Similar results can be achieved for $\gamma \in (0, 1)$.

The difference between (3.1) and (3.2) is the following: in (3.1), $c_d\,(y_d - x_d) + c_r\,(y_r - x_r)$ is the (variable) refilling cost incurred in that period.
However, in (3.2), $c_d \min(D_t, y_d) + c_r \min\big((D_t - y_d)^+, y_r\big)$ accounts for the actual fare paid to the bus company in that period. Formulation (3.2) is simpler than (3.1) in the sense that the initial balances $(x_d, x_r)$ do not appear directly in any of the terms of the value function $J_t(x_d, x_r)$, except in the fixed cost term $K\, I_{\{y_d > x_d \text{ or } y_r > x_r\}}$. The above transformation allows us to simplify the analysis of the value function substantially. The following lemma gives the expression and structural properties of the expected one-period cost incurred in period $t$.

Lemma 3.3.2 (Expected One-Period Cost). For each period $t = 1,\ldots,T$, given the after-refill balances $(y_d, y_r)$, the expected one-period cost incurred in period $t$ is
\[
g_t(y_d, y_r) = \mathbb{E}\Big[ c_d\, D_t + (c_r - c_d - h_r)\,(D_t - y_d)^+ + (b - c_r + h_r)\,(D_t - y_d - y_r)^+ + h_d\,(y_d - D_t)^+ \Big] + h_r\, y_r .
\]
Moreover, $g_t(y_d, y_r)$ is jointly convex in $y_d \ge 0$ and $y_r \ge 0$ for all $t = 1,\ldots,T$.

Given the initial balances $(x_d, x_r)$ in period $t$, let $G_t(x_d, x_r)$ denote the expected cost incurred over the remaining $T - t + 1$ periods when we do not refill either account in period $t$ and act optimally in the remaining periods. Then, for $x_d, x_r \ge 0$, we have
\[
G_t(x_d, x_r) = g_t(x_d, x_r) + \mathbb{E}\big[ J_{t+1}\big((x_d - D_t)^+,\ (x_r - (D_t - x_d)^+)^+\big) \big]
\]
and
\[
J_t(x_d, x_r) = \min_{y_d \ge x_d,\, y_r \ge x_r} \Big\{ G_t(y_d, y_r) + K\, I_{\{y_d > x_d \text{ or } y_r > x_r\}} \Big\},
\]
where $J_t(\cdot,\cdot)$ is the value function for period $t$, and $J_{T+1}(x, y) = c_d\, x$. The existence of the fixed cost $K$ clouds the structural properties of the value function. Therefore, in the following sections, we begin our analysis by exploring some special cases.

Optimal Policy with Zero Fixed Cost

For the rest of this section, we assume that $K = 0$. This assumption is reasonable when refilling account balances is convenient. For example, suppose that the account balances can be refilled using mobile applications or through internet banking services.
Then it becomes virtually effortless to top up the bus card, and the fixed cost can be significantly reduced. The following theorem shows the structural properties of $G_t$ and the value function $J_t$.

Theorem 3.3.3 (Structural Properties). Let $K = 0$. For any $t = 1,\ldots,T$, $x \ge 0$ and $y \ge 0$:

(i) (Joint Convexity). $G_t(x, y)$ is jointly convex in $(x, y)$; $J_t(x, y)$ is jointly convex and non-decreasing in $(x, y)$.

(ii) (Subdifferentials). For $t = 1,\ldots,T$, the subdifferentials of $G_t(x, y)$ with respect to $y$ satisfy
\[
\frac{\partial G_t(x, y)}{\partial y} \;\le\; \frac{\partial G_t(x, y)}{\partial x} + c_r - c_d
\]
for any $(x, y)$ that is not a break point.

Therefore, in the absence of fixed costs, the value function is jointly convex in the discount and regular account balances for any general demand distribution. Theorem 3.3.3 clearly suggests that the optimal policy has a base-stock form. Let $(x_d^t, x_r^t)$ denote the initial account balances and $(y_d^t, y_r^t)$ the optimal after-refill balances in period $t$. The next corollary states the main result of this section.

Corollary 3.3.4 (Partial Optimal Policy under Zero Fixed Costs). When there are no fixed costs, define the thresholds $(s_d^t, s_r^t) = \arg\min_{x, y} J_t(x, y)$. Let $\Omega^t = \{x_d^t \le s_d^t\} \cap \{x_r^t \le s_r^t\}$ and $\bar\Omega^t = \{x > s_d^t\} \cap \{y > s_r^t\}$. For each period $t = 1,\ldots,T$, the optimal refill policy has a base-stock $(s_d^t, s_r^t)$ form, i.e.,
\[
(y_d^t, y_r^t) =
\begin{cases}
(s_d^t, s_r^t) & \text{if } (x_d^t, x_r^t) \in \Omega^t \\
(x_d^t, x_r^t) & \text{if } (x_d^t, x_r^t) \in \bar\Omega^t .
\end{cases}
\]

Note that the above policy does not specify the optimal action when $(x_d^t, x_r^t) \notin \Omega^t \cup \bar\Omega^t$. Therefore, the policy in Corollary 3.3.4 is only optimal when the initial balances satisfy $(x_d^t, x_r^t) \in \Omega^t$ for all $t = 1,\ldots,T$. In other words, we need $(x_d^1, x_r^1) \in \Omega^1$ and $\Omega^1 \subseteq \Omega^2 \subseteq \cdots \subseteq \Omega^T$. Note that the latter requires that $s_d^1 \le s_d^2 \le \cdots \le s_d^T$ and $s_r^1 \le s_r^2 \le \cdots \le s_r^T$. These conditions are inconvenient because they cannot be verified a priori.
Moreover, they may not hold under the finite-horizon model because, intuitively, we would expect a rational customer to refill to a lesser balance toward the end of the horizon. The optimal policy is rather complicated for forward-looking customers. To generate some insights into this policy, we can look at the optimal policy for myopic passengers, who only optimize the per-period expected cost. The optimal policy is stated in the next theorem.

Theorem 3.3.5 (Myopic Policy). Let $F_t$ be the inverse cumulative distribution function of $D_t$, a continuous random variable. Define the thresholds $\bar s^t = F_t\big(\tfrac{b - c_r}{b - c_r + h_r}\big)$, $\bar s_d^t = F_t\big(\tfrac{c_r - c_d}{c_r - c_d - h_r + h_d}\big)$, and $\bar s_r^t = \bar s^t - \bar s_d^t$.

(i) If $\frac{h_d}{h_r} > \frac{b - c_d}{b - c_r}$, let $\Omega_1^t = \{x_d^t \le \bar s_d^t \,\cap\, x_r^t \le \bar s_r^t\}$, $\Omega_2^t = \{x_d^t \le \bar s_d^t \,\cap\, x_r^t > \bar s_r^t \,\cap\, \frac{\partial g_t(x, y)}{\partial x}\big|_{x = x_d^t,\, y = x_r^t} \le 0\}$, $\Omega_3^t = \{x_d^t > \bar s_d^t \,\cap\, x_d^t + x_r^t \le \bar s^t\}$, and $\Omega_4^t = (\Omega_1^t \cup \Omega_2^t \cup \Omega_3^t)^c$. Then the following policy is optimal myopically:
\[
(y_d^t, y_r^t) =
\begin{cases}
(\bar s_d^t, \bar s_r^t) & \text{if } (x_d^t, x_r^t) \in \Omega_1^t \\
(\bar y_d^t, x_r^t) & \text{if } (x_d^t, x_r^t) \in \Omega_2^t \\
(x_d^t, \bar s^t - x_d^t) & \text{if } (x_d^t, x_r^t) \in \Omega_3^t \\
(x_d^t, x_r^t) & \text{if } (x_d^t, x_r^t) \in \Omega_4^t ,
\end{cases}
\]
where $\frac{\partial g_t(x, y)}{\partial x}\big|_{x = \bar y_d^t,\, y = x_r^t} = 0$.

(ii) If $\frac{h_d}{h_r} \le \frac{b - c_d}{b - c_r}$, let $\Omega_1^t = \{x_d^t \le \bar s_d^t \,\cap\, \frac{\partial g_t(x, y)}{\partial x}\big|_{x = x_d^t,\, y = x_r^t} \le 0\}$ and $\Omega_2^t = (\Omega_1^t)^c$. The following policy is optimal myopically:
\[
(y_d^t, y_r^t) =
\begin{cases}
(\bar y_d^t, x_r^t) & \text{if } (x_d^t, x_r^t) \in \Omega_1^t \\
(x_d^t, x_r^t) & \text{if } (x_d^t, x_r^t) \in \Omega_2^t ,
\end{cases}
\]
where $\frac{\partial g_t(x, y)}{\partial x}\big|_{x = \bar y_d^t,\, y = x_r^t} = 0$.

Theorem 3.3.5 has the following implications: if $D_t$, $t = 1,\ldots,T$, follow an independent and identical distribution, then we have $\bar s^1 = \cdots = \bar s^T$, $\bar s_d^1 = \cdots = \bar s_d^T$ and $\bar s_r^1 = \cdots = \bar s_r^T$. Thus, $\Omega_j^1 = \cdots = \Omega_j^T$ for $j = 1,\ldots,4$ in case (i) and for $j = 1, 2$ in case (ii). Then, if the initial balances in period 1 satisfy $(x_d^1, x_r^1) \in \Omega_1^1$, the optimal refill policy has a simple base-stock form.
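The myopic thresholds are newsvendor-style fractiles of the demand distribution. As an illustration (not from the dissertation), with exponentially distributed demand the inverse CDF is available in closed form, $F^{-1}(q) = -\ln(1-q)/\lambda$, so the thresholds can be computed directly. The parameter values below are hypothetical and are chosen so that $h_d/h_r > (b - c_d)/(b - c_r)$, i.e., case (i) of the theorem.

```python
import math

def exp_inv_cdf(q, lam):
    """Inverse CDF of an Exponential(lam) demand distribution."""
    return -math.log(1.0 - q) / lam

def myopic_thresholds(c_d, c_r, b, h_d, h_r, lam):
    """Fractile thresholds in the spirit of Theorem 3.3.5 (illustrative sketch):
    s_bar   = F^{-1}((b - c_r) / (b - c_r + h_r))            total order-up-to level
    s_bar_d = F^{-1}((c_r - c_d) / (c_r - c_d - h_r + h_d))  discount-account level
    s_bar_r = s_bar - s_bar_d                                 regular-account level"""
    s = exp_inv_cdf((b - c_r) / (b - c_r + h_r), lam)
    s_d = exp_inv_cdf((c_r - c_d) / (c_r - c_d - h_r + h_d), lam)
    return s, s_d, s - s_d

s, s_d, s_r = myopic_thresholds(c_d=0.5, c_r=0.9, b=1.0, h_d=0.2, h_r=0.02, lam=0.5)
print(round(s, 3), round(s_d, 3), round(s_r, 3))
```

As the penalty $b$ grows relative to the holding cost $h_r$, the fractile $(b - c_r)/(b - c_r + h_r)$ approaches one and the total order-up-to level increases, matching the usual newsvendor intuition.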
Figure 3.1: Illustration of the myopic optimal ordering policy with zero fixed costs when $\frac{h_d}{h_r} > \frac{b - c_d}{b - c_r}$.

Optimal Policy with a Joint Fixed Cost

Now suppose that $K > 0$; that is, there is a non-zero joint setup cost. This setting is relevant to the situation in which both accounts can be refilled at the same locations, such as bus stations. For an $n$-product model with a fixed joint setup cost, Liu and Esogbue (2012) generalize the concept of $K$-convexity to $\mathbb{R}^n$. The next theorem states the optimal policy under the condition that the order-up-to levels are increasing over time. Our theorem can be proved using an analysis similar to that of Liu and Esogbue (2012), so we omit the details and refer readers to Gallego and Sethi (2005) and Liu and Esogbue (2012) for more information.

Theorem 3.3.6. For each period $t = 1,\ldots,T$, let $(S_d^t, S_r^t)$ be the global minimum of $G_t(x, y)$. Define a critical surface $\Gamma^t = \{x \le S_d^t,\ y \le S_r^t \mid G_t(x, y) = K + G_t(S_d^t, S_r^t)\}$. Furthermore, define the following two disjoint sets: $\Sigma^t = \{x \le S_d^t,\ y \le S_r^t \mid \exists\, (a, b) \in \Gamma^t \text{ s.t. } x \le a \le S_d^t,\ y \le b \le S_r^t\}$ and $\bar\Sigma^t = \{x \le S_d^t,\ y \le S_r^t \mid (x, y) \notin \Sigma^t\}$. If $x_d^t \le S_d^t$ and $x_r^t \le S_r^t$ for $t = 1,\ldots,T$, then

(1)
\[
J_t(x_d^t, x_r^t) =
\begin{cases}
K + G_t(S_d^t, S_r^t) & \text{if } (x_d^t, x_r^t) \in \Sigma^t \\
G_t(x_d^t, x_r^t) & \text{if } (x_d^t, x_r^t) \in \bar\Sigma^t ,
\end{cases}
\]
and $J_t(x, y)$ is continuous and $K$-convex on $\{x \le S_d^t,\ y \le S_r^t\}$.

(2) The optimal ordering policy in period $t$ is
\[
(y_d^t, y_r^t) =
\begin{cases}
(S_d^t, S_r^t) & \text{if } (x_d^t, x_r^t) \in \Sigma^t \\
(x_d^t, x_r^t) & \text{if } (x_d^t, x_r^t) \in \bar\Sigma^t .
\end{cases}
\]

It is worth mentioning that $\bar\Sigma^t$ is not the complement of $\Sigma^t$. Therefore, the optimal policy stated in Theorem 3.3.6 is only complete if the order-up-to levels $(S_d^t, S_r^t)$ are non-decreasing over $t$. However, this condition cannot be verified a priori, and we can show that the order-up-to levels do not increase in our numerical experiments.
Intuitively, it is optimal to refill the discount account up to a lesser amount toward the end of the horizon because the remaining balance in the discount account expires at the end of the planning period. Numerically, in each period $t$, we do observe the existence of $\Sigma^t$ (refill up to the global minimum) and $\bar\Sigma^t$ (do not refill). Outside these two regions, the optimal refill-up-to levels depend on the initial balances $(x_d^t, x_r^t)$ and have to be computed from the $J_t$ function.

3.4 Parameter Estimation

3.4.1 Data

We have the following data available over a 12-month period (3/1/2010 to 2/28/2011) for 500 commuters: daily demand data (number of uses) for the discount and regular accounts, remaining account balances for the discount account, and refill data (refill amount and date) for the discount account. The remaining balance in the regular account can only be observed when that account is used. Moreover, the amounts and dates of refills of the regular account are not readily available, so this information has to be inferred from the data. We explain the estimation method in detail in Section 3.4.3.

3.4.2 Estimation of Daily Demand

In our data, the scenario in which both the discount and regular accounts have zero balances rarely occurs. Therefore, for passengers who always carry some balance in the regular account, there will be no penalty cost incurred in actual bus usage. This observation makes the estimation of demand an easy task. We assume that the daily demand $D_j$ for each customer $j$ is an independent and identically distributed random variable that follows a discrete distribution $F_j(\cdot)$. Let $T_i$ be the total number of days in month $i$, and $\hat d_{i,t}$ the observed demand on the $t$-th day of month $i$, $i = 1,\ldots,M$, where $M$ is the total number of months. Then, we can estimate the probability that the demand for bus usage of consumer $j$ is $d$ on any given day as
\[
P(D_j = d) = \frac{\sum_{i=1}^{M} \sum_{t=1}^{T_i} I_{\{\hat d_{i,t} = d\}}}{\sum_{i=1}^{M} T_i},
\]
where $I_{\{\cdot\}}$ is the indicator function.
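The empirical estimator above is just the relative frequency of each demand level across all observed days. A minimal sketch, using made-up usage counts for one passenger:

```python
from collections import Counter

def estimate_demand_pmf(daily_counts):
    """Empirical pmf P(D_j = d) = (# days with demand d) / (total # days),
    pooling all months for one passenger, as in Section 3.4.2."""
    total = len(daily_counts)
    freq = Counter(daily_counts)
    return {d: freq[d] / total for d in sorted(freq)}

# Hypothetical 10 observed days of bus usage for one passenger.
observed = [2, 2, 0, 2, 4, 2, 0, 2, 2, 4]
pmf = estimate_demand_pmf(observed)
print(pmf)  # → {0: 0.2, 2: 0.6, 4: 0.2}
```

By construction the estimated probabilities sum to one over the observed support, which is all the dynamic program needs as a demand input.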
Let $\bar{d}_j$ be the maximum daily demand observed for passenger $j$. Then, for each consumer $j$, we can create a discrete probability distribution function such that

$$\sum_{d=0}^{\bar{d}_j} P(D_j = d) = 1.$$

3.4.3 Estimation of Refill Data

There are many uncertainties in estimating the refill data for the regular account. First of all, we can only observe the remaining balance in the regular account when the passenger uses that account to ride a bus. Moreover, a passenger often uses the regular account to pay for other purchases, so the refilled amount may not be entirely used to pay for bus usage. If the remaining balance is higher at the next time of usage, we can infer that the customer has refilled the regular account, but we do not know when this happened or exactly how much he or she recharged. Last but not least, we cannot observe a refill at all when a passenger recharged the regular account and then made other purchases that brought the balance back down.

To deal with such uncertainties, we make the following assumptions. (1) If an observable recharge occurs, we assume that it happens at the latest possible date. For example, suppose a consumer used the regular account on March 1st, with a remaining balance of $3, and the next usage of the regular account is on March 15th, with a beginning balance of $53. We then assume that the passenger recharged the regular account on March 15th. (2) We assume that a passenger uses the regular account for bus fares only. Thus, we need to subtract any amount used for other purposes from the recharged amount. Using the previous example, suppose that the next recharge of the regular account took place just after March 28th with a remaining balance of $5, but during this period, we observe that the passenger had only used $15 in the regular account to pay for bus fares. This means that the passenger had used $30 for other purposes.
We then assume that on March 15th, the passenger recharged $20 in the regular account (instead of $50). (3) We do not count unobservable recharges, since those balances in the regular account had been used for purchases other than riding buses.

3.4.4 Grid Search for Parameters

We assume the discounted bus fare to be $c_d = \$0.5$, the regular fare $c_r = \$0.9$, and the cash fare (penalty cost) $b = \$1$. We perform the following tests to estimate the fixed and holding costs: (1) $K = 0$, $h_r \ge 0$, and $h_d \ge 0$; (2) $K \ge 0$, $h_r \ge 0$, and $h_d \ge 0$. We use a grid size of $0.001$ and search over $[0, 5]$ for both $h_r$ and $h_d$, and over $[0, 20]$ for the fixed cost. For each set of parameters $\theta = (K, h_r, h_d)$ and day $t$ in each month $i$, we generate a table that shows the optimal refill actions given the initial balances $x_r$ and $x_d$ in both accounts. Let $\pi_i^j(\theta) = (\pi_{i,1}^j(\theta), \ldots, \pi_{i,T_i}^j(\theta))$ denote the optimal policies in month $i$ for passenger $j$.

We split the data into training data (nine consecutive months) and validation data (the other three months) and perform parameter estimation on the training data. We use the following measurements to train our models. For simplicity, we drop the superscript $j$ that represents each individual passenger. For each passenger, we want to search for the set of parameters $\theta = (K, h_r, h_d)$ that minimizes the aggregate difference between the actual spending and the simulated total spending based on the optimal policy $\pi(\theta)$. Let $\hat{z}_{i,t}^d$ and $\hat{z}_{i,t}^r$ be the observed refill amounts in the discount and regular accounts, respectively, on day $t$ of month $i$, and let $\hat{z}_{i,t}^b$ denote the estimated penalty cost incurred on day $t$ of month $i$. Given $\theta = (K, h_r, h_d)$, let $z_{i,t}^d(\pi_i(\theta))$ and $z_{i,t}^r(\pi_i(\theta))$ be the optimal refill amounts in the discount and regular accounts, respectively, on day $t$ of month $i$ under the optimal policy $\pi_i(\theta)$. Moreover, let $z_{i,t}^b(\pi_i(\theta))$ denote the penalty cost incurred on day $t$ of month $i$ under the optimal policy $\pi_i(\theta)$.
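The exhaustive search over the parameter grid can be sketched as follows. This is a schematic illustration only: the function names are hypothetical, and the placeholder `spending_gap` callback stands in for the full dynamic-programming simulation of each passenger's spending under the optimal policy described in the text.

```python
import itertools

def grid_search(spending_gap, K_grid, hr_grid, hd_grid):
    """Pick theta = (K, h_r, h_d) minimizing the aggregate gap between
    actual spending and spending simulated under the optimal policy
    pi(theta). `spending_gap(theta)` is a placeholder for the full
    policy simulation described in the text."""
    best_theta, best_gap = None, float("inf")
    for theta in itertools.product(K_grid, hr_grid, hd_grid):
        gap = spending_gap(theta)
        if gap < best_gap:
            best_theta, best_gap = theta, gap
    return best_theta

# Toy objective with a unique minimizer at K = 1, h_r = 0.1, h_d = 0.2.
toy = lambda t: abs(t[0] - 1) + abs(t[1] - 0.1) + abs(t[2] - 0.2)
theta_star = grid_search(toy, [0, 1, 2], [0.0, 0.1], [0.1, 0.2])
# theta_star == (1, 0.1, 0.2)
```

In the actual estimation, the grids would be the fine grids described above ($0.001$ spacing over $[0,5]$ and $[0,20]$), and the gap would be evaluated separately for each passenger.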
Note that the penalty cost is incurred only when the actual demand on that day exceeds the sum of the after-refill balances in the discount and regular accounts. Then, our problem is to find

$$\theta^* = \arg\min_{\theta = (K, h_r, h_d)} \sum_{i=1}^{M} \sum_{t=1}^{T_i} \left[ \left( \hat{z}_{i,t}^d + \hat{z}_{i,t}^r + \hat{z}_{i,t}^b \right) - \left( z_{i,t}^d(\pi_i(\theta)) + z_{i,t}^r(\pi_i(\theta)) + z_{i,t}^b(\pi_i(\theta), D_{i,t}) \right) \right].$$

Figure 3.2 shows the distribution of estimated holding costs in both accounts when fixed costs are assumed to be zero. We observe that $h_r$, the estimated holding cost for the regular account, is always less than $0.16/day, whereas $h_d$, the estimated holding cost for the discount account, can be much higher, up to $0.60/day. The average holding costs are $0.12/day for the discount account and $0.07/day for the regular account. The smaller $h_r$ can be explained by the fact that the regular account can be used for multiple purposes and serves as a natural “debit card.” As a result, passengers usually stock up the regular account to ensure that their demand can be met by the regular account balance, so that they do not have to pay the higher cash fare. The higher $h_d$ reflects passengers’ aversion to losing the remaining balance in the discount account toward the end of each month. Note that the estimated holding cost for the discount account can sometimes be even higher than the $0.50 discount fare. This is partially due to the assumption that the fixed costs are zero: the optimal policy is then to refill the bus card as frequently as possible by small amounts to avoid incurring the holding costs.

Figure 3.2: Simulated holding costs when fixed costs are zero.

Figure 3.3 shows the distribution of estimated holding costs in both accounts when there is a joint setup cost. Compared to the first model (without fixed cost), the estimated holding costs are significantly lower and much more reasonable from a practical point of view.
The average holding cost for the regular account is about $0.03/day, and the average holding cost for the discount account is $0.07/day. The estimated joint setup cost can be significant, up to $12.8. In such instances, the optimal action is either to recharge the bus card accounts by a large amount at the beginning of the month (if holding costs are small) or to avoid recharging altogether by paying cash, especially toward the end of the month.

Figure 3.3: Simulated holding costs and joint fixed cost.

To validate the fit of our models, we carry out a four-fold cross-validation on the data. We measure the out-of-sample prediction error using the standard mean absolute percentage error (MAPE) metric, which measures the average relative error in predicting the total costs. We compute the percentage prediction error as

$$100\% \times \frac{\left| \text{Total Estimated Spending} - \text{Total Actual Spending} \right|}{\text{Total Actual Spending}}.$$

The model with a joint setup cost performs better than the model with no fixed cost, especially for more frequent travelers. In fact, on average, compared to the model with no setup cost, the model with a joint setup cost reduces the prediction error by 76%.

3.5 Counterfactual Analysis

In this section, we carry out three counterfactual experiments to demonstrate the managerial implications of this research. In Section 3.4, we showed that the refill model with a joint setup cost is superior to the one without a fixed cost; therefore, we use the estimated parameters of the model with a non-negative joint setup cost throughout this section.

In the first two sets of experiments, we estimate the optimal fares that should be set by the bus company, using our finite-horizon model with a joint setup cost. In Experiment 1, we assume that the $0.9 regular account fare and the $1 cash fare are unchanged, and estimate the optimal discount account fare.
In Experiment 2, we assume that the $0.5 discount account fare and the $1 cash fare are unchanged, and estimate the optimal fare for the regular account. To compute the optimal fare, we need to factor in the changes in ridership induced by varying bus fares. Typically, transit price elasticities fall between -0.6 and -0.1 (Litman 2004). In each experiment, we compute the ridership generated by the new fare using the arc elasticity of ridership, given by

$$E_d = \frac{Q_1 - Q_2}{Q_1 + Q_2} \cdot \frac{P_1 + P_2}{P_1 - P_2},$$

where $Q_1$ is the original ridership at the original bus fare $P_1$ of the tested account, and $Q_2$ is the new ridership generated by the updated bus fare $P_2$. Arc elasticity is appropriate here because the price change can be relatively large in our experiments. We then estimate the total revenue generated by the 500 passengers for each new bus fare $P_2$. We compute the optimal fare of the tested account and the potential improvement in fare revenue for each value of fare elasticity.

We present the results in Figure 3.4. The horizontal axis denotes the fare elasticity of transit ridership. The left vertical axis represents the account fare, and the right vertical axis represents the percentage improvement in revenue. Our results show that the current discount account fare set by the bus company is likely to be underpriced. The optimal fare for the discount account increases as demand becomes more inelastic. Moreover, the potential improvement in revenue can be significant. For example, if the fare elasticity of ridership for the discount account is -0.2, the optimal discount fare should be set to $0.85, which results in about a 28% improvement in fare revenue for the bus company. Note that as the fare elasticity approaches 0, the optimal fare for the discount account becomes $0.9, which is exactly the same as the regular account fare. In this scenario, most passengers would use only the regular account because its balance never expires.
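Solving the arc-elasticity identity above for the new ridership gives $Q_2 = Q_1(1 - r)/(1 + r)$ with $r = E_d (P_1 - P_2)/(P_1 + P_2)$. A minimal sketch of the resulting ridership and revenue computation (the function names are illustrative, not from the dissertation):

```python
def new_ridership(Q1, P1, P2, elasticity):
    """Ridership implied by the arc elasticity of demand.

    Solves E_d = ((Q1 - Q2)/(Q1 + Q2)) * ((P1 + P2)/(P1 - P2)) for Q2.
    """
    r = elasticity * (P1 - P2) / (P1 + P2)
    return Q1 * (1 - r) / (1 + r)

def fare_revenue(Q1, P1, P2, elasticity):
    """Revenue at the new fare P2, accounting for the ridership change."""
    return P2 * new_ridership(Q1, P1, P2, elasticity)

# With elasticity -0.2, raising the discount fare from $0.50 toward $0.85
# loses some riders but can still increase revenue (cf. Figure 3.4).
Q2 = new_ridership(1000, 0.50, 0.85, -0.2)
```

A sanity check of the sign convention: a negative elasticity with a fare increase ($P_2 > P_1$) makes $r > 0$, so $Q_2 < Q_1$, as expected.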
Thus, when demand is relatively inelastic, the extra revenue generated by the increased discount fare outweighs the revenue loss due to reduced ridership. In this case, the bus company could raise its fare revenue by increasing the discount account fare or even by completely removing the discount account feature of the bus card. Note that abandoning the discount account feature would bring additional benefits to the bus company, including reduced manpower and infrastructure costs that could instead be used to fulfill the refilling demand of passengers.

On the other hand, a reduced regular account fare has a similar effect to an increased discount account fare. As shown in Figure 3.4(b), the current regular account fare is likely to be overpriced. This is an intuitive result because when the discount account fare is increased (or when the regular account fare is decreased), passengers are more willing to use the regular account rather than the discount account. When demand is inelastic, our results suggest that the optimal fare for the regular account should be $0.80. It is worth mentioning that one of the major goals of public transit agencies is to promote ridership rather than to generate greater fare revenue. Therefore, compared to an increased discount fare, a reduced regular account fare should be the preferred option because it would generate extra ridership while still maintaining or even improving fare revenue.

Figure 3.4: Optimal fares and revenue improvement with respect to fare elasticity of ridership.

Our results in Section 3.4 suggest that passengers may have different perceived holding and setup costs under the same bus fare system. Therefore, the public transit agency should take these costs into consideration when it makes policy changes. Intuitively, reducing the inconvenience cost would improve customer satisfaction and increase the potential demand for public transportation.
Therefore, in our third experiment, we investigate the effect of reducing the fixed costs incurred by passengers when refilling their bus cards. In reality, the inconvenience cost can be reduced by setting up more recharge vending machines or by enabling online or even automatic refill options.

Figure 3.5: Percentage reduction in fare revenue with respect to reduction in fixed cost.

In this analysis, we keep the estimated holding costs for each of the 500 passengers and vary the individual fixed cost by a percentage. Using the updated fixed cost, we then compute the optimal refill behavior for each of the 500 passengers and the aggregate fare revenue. We show the percentage change in fare revenue with respect to the change in the fixed cost in Figure 3.5. Note that when the fixed cost is completely removed (corresponding to a 100% reduction in fixed cost), the aggregate fare revenue would be reduced by about 16%. This result is intuitive: when fixed costs are reduced, passengers can refill their bus cards as often as they want and completely avoid using cash (the penalty cost) for the bus fare, thereby reducing the total spending of each passenger. Therefore, unless the elimination of the fixed cost would generate an extra 16% in ridership and government assistance, doing so would hurt the fare revenue of the bus company.

3.6 Conclusion

The introduction of smart cards with discounted bus fares in public transit systems has offered great convenience and affordability to bus riders. In this paper, we develop a finite-horizon framework to offer a cost-minimizing solution for passengers who use dual-account bus cards. We provide structural results on the optimal recharging policy of the passengers, estimate the perceived cost parameters associated with each account and the passengers' recharging decisions, and offer important managerial contributions.
Our counterfactual simulations show that the current bus fares should be adjusted to bring greater ridership and fare revenue to the bus company.

References

Akaike, H. (1974), ‘A new look at the statistical model identification’, IEEE Transactions on Automatic Control 19(6), 716–723.

Allenby, G., Shively, T. S., Yang, S. and Garratt, M. J. (2004), ‘A choice model for packaged goods: dealing with discrete quantities and quantity discounts’, Marketing Science 23(1), 95–108.

Andrews, R. L. and Manrai, A. K. (1998), ‘Feature-based elimination: Model and empirical comparison’, European Journal of Operational Research 111(2), 248–267.

APTA (2009), Recommended Practice for Quantifying Greenhouse Gas Emissions from Transit, APTA CC-RP-001-09.

Bailey, L., Mokhtarian, P. L. and Little, A. (2008), The broader connection between public transportation, energy conservation and greenhouse gas reduction, Fairfax, VA: ICF International.

Batley, R. and Daly, A. (2006), ‘On the equivalence between elimination-by-aspects and generalised extreme value models of choice behaviour’, Journal of Mathematical Psychology 50(5), 456–467.

Ben-Akiva, M. and Lerman, S. (1985), Discrete Choice Analysis: Theory and Application to Travel Demand, MIT Press, Cambridge, MA.

Bitran, G. R. and Matsuo, H. (1986), ‘Approximation formulations for the single-product capacitated lot size problem’, Operations Research 34(1), 63–74.

Börsch-Supan, A. (1990), ‘On the compatibility of nested logit models with utility maximization’, Journal of Econometrics 43(3), 373–388.

Boyd, S. and Vandenberghe, L. (2004), Convex Optimization, Cambridge University Press, Cambridge, UK.

Bront, J. M., Méndez-Díaz, I. and Vulcano, G. (2009), ‘A column generation algorithm for choice-based network revenue management’, Operations Research 57(3), 769–784.

Cardell, N. S.
(1997a), ‘Variance components structures for the extreme-value and logistic distributions with application to models of heterogeneity’, Econometric Theory 13(2), 185–213.

Cardell, N. S. (1997b), ‘Variance components structures for the extreme-value and logistic distributions with applications to models of heterogeneity’, Econometric Theory 13(2), 185–213.

Carson, R. T., Hanemann, W. M. and Wegge, T. C. (2009), ‘A nested logit model of recreational fishing demand in Alaska’, Marine Resource Economics 24(2), 101–129.

Cervero, R. and Duncan, M. (2008), Residential self selection and rail commuting: A nested logit analysis, Technical report, University of California, Berkeley.

Chen, K. D. and Hausman, W. H. (2000), ‘Technical note: Mathematical properties of the optimal product line selection problem using choice-based conjoint analysis’, Management Science 46(2), 327–332.

Chiang, C. and Gutierrez, G. J. (1996), ‘A periodic review inventory system with two supply modes’, European Journal of Operational Research 94, 527–547.

Chiang, J. (1991), ‘A simultaneous approach to the whether, what and how much to buy questions’, Marketing Science 41(5), 297–315.

Coldren, G. M. and Koppelman, F. S. (2005), ‘Modeling the competition among air-travel itinerary shares: GEV model development’, Transportation Research Part A 39(4), 345–365.

Davis, J. M., Gallego, G. and Topaloglu, H. (2014), ‘Assortment optimization under variants of the nested logit model’, Operations Research 62(2), 250–273.

Dickens, M., Neff, J. and Grisby, D. (2012), APTA 2012 public transportation fact book, American Public Transportation Association.

Dong, L., Kouvelis, P. and Tian, Z. (2009), ‘Dynamic pricing and inventory control of substitute products’, Manufacturing & Service Operations Management 11(2), 317–339.

Eecken, J. V. (1968), Investigation of a class of constrained inventory problems, PhD thesis, Yale University.

Elmaghraby, S. and Bawle, V.
(1972), ‘Optimization of batch ordering under deterministic varying demand’, Management Science 18(9), 508–517.

Farias, V. F., Jagabathula, S. and Shah, D. (2011), Assortment optimization under general choice. Working paper, MIT, Cambridge, MA.

Farias, V. F., Jagabathula, S. and Shah, D. (2013), ‘A non-parametric approach to modeling choice with limited data’, Management Science 59(2), 305–322.

Gallego, G., Iyengar, G., Phillips, R. and Dubey, A. (2004), Managing flexible products on a network. Working paper, Columbia University.

Gallego, G., Ratliff, R. and Shebalov, S. (2011), A general attraction model and an efficient formulation for the network revenue management problem, Technical report, Columbia University, New York, NY.

Gallego, G. and Sethi, S. (2005), ‘$K$-convexity in $\mathbb{R}^n$’, Journal of Optimization Theory and Applications 127(1), 71–88.

Gallego, G. and Topaloglu, H. (2014), ‘Constrained assortment optimization for the nested logit model’, Management Science 60(10).

Gallego, G. and van Ryzin, G. (1997), ‘A multiproduct dynamic pricing problem and its applications to network yield management’, Operations Research 45(1), 24–41.

Gallego, G. and Wang, R. (2013), ‘Multi-product price optimization and competition under the nested logit model with product-differentiated price sensitivities’, Operations Research 62(2), 450–461.

Gaur, V. and Honhon, D. (2006), ‘Assortment planning and inventory decisions under a locational choice model’, Management Science 52(10), 1528–1543.

Gilbride, T. J. and Allenby, G. M. (2004), ‘A choice model with conjunctive, disjunctive, and compensatory screening rules’, Marketing Science 23(3), 391–406.

Gramlich, J. P. (2009), Gas prices and fuel efficiency in the US automobile industry: Policy implications of endogenous product choice, Ph.D. dissertation, Georgetown University.

Grant, M. and Boyd, S. (2008), Graph implementations for nonsmooth convex programs, in V. Blondel, S. Boyd and H.
Kimura, eds, ‘Recent Advances in Learning and Control’, Lecture Notes in Control and Information Sciences, Springer-Verlag Limited, pp. 95–110. http://stanford.edu/~boyd/graph_dcp.html.

Grant, M. and Boyd, S. (2013), ‘CVX: Matlab software for disciplined convex programming, version 2.0 beta’, http://cvxr.com/cvx.

Hanson, W. and Martin, K. (1996), ‘Optimizing multinomial logit profit functions’, Management Science 42(7), 992–1003.

Honhon, D., Gaur, V. and Seshadri, S. (2010), ‘Assortment planning and inventory decisions under stockout-based substitution’, Operations Research 58(5), 1364–1379.

Iseki, H., Demisch, A., Taylor, B. D. and Yoh, A. C. (2008), Evaluating the costs and benefits of transit smart cards, Technical report, California PATH Program, Institute of Transportation Studies, University of California, Berkeley, CA.

Johnson, E. L. (1967), ‘Optimality and computation of $(\sigma, S)$ policies in the multi-item infinite horizon inventory problem’, Management Science 13(7), 475–491.

Kalin, D. (1967), ‘On the optimality of $(\sigma, S)$ policies’, Mathematics of Operations Research 5(2), 293–307.

Karlin, S. and Scarf, H. (1958), ‘Inventory models of the Arrow–Harris–Marschak type with time lag’, Studies in the Mathematical Theory of Inventory and Production, pp. 155–178.

Kleinberg, J. and Tardos, E. (2005), Algorithm Design, Addison Wesley, New York.

Knuth, D. E. (1998), The Art of Computer Programming, Vol. 3, second edn, Addison-Wesley Professional.

Kök, A. G., Fisher, M. L. and Vaidyanathan, R. (2009), Assortment planning: Review of literature and industry practice, in ‘Retail Supply Chain Management’, Springer US, pp. 99–153.

Kök, A. G. and Xu, Y. (2010), ‘Optimal and competitive assortments with endogenous pricing under hierarchical choice models’, Management Science 57(9), 1546–1563.

Koppelman, F. S. and Sethi, V. (2000), Closed-form discrete-choice models, in D. A. Hensher and K. J. Button, eds, ‘Handbook of Transport Modeling’.

Kunnumkal, S.
(2014), ‘Randomization approaches for network revenue management with customer choice behavior’, Production and Operations Management 23(9).

Lagodimos, A. G., Christou, I. T. and Skouri, K. (2012), ‘Optimal (r, nQ, T) batch-ordering policy under stationary demand’, International Journal of Systems Science 43(9), 1774–1784.

Li, C., Hsu, V. N. and Xiao, W. (2004), ‘Dynamic lot sizing with batch ordering and truckload discounts’, Operations Research 52(4), 639–654.

Li, G. and Rusmevichientong, P. (2014), ‘A greedy algorithm for the two-level nested logit model’, Operations Research Letters 42(5), 319–324.

Li, H. and Huh, T. (2013), Pricing under the nested attraction model with a multi-stage choice structure. INFORMS Annual Meeting, Minneapolis, MN.

Li, H. and Huh, W. T. (2011), ‘Pricing multiple products with the multinomial logit and nested logit models: Concavity and implications’, Manufacturing & Service Operations Management 13(4), 549–563.

Li, Q. and Yu, P. (2012), ‘Technical note: On the quasiconcavity of lost-sales inventory models with fixed costs’, Operations Research 60(2), 286–291.

Li, Z. (2007), ‘A single-period assortment optimization model’, Production and Operations Management 16(2), 369–380.

Litman, T. (2004), ‘Transit price elasticities and cross-elasticities’, Journal of Public Transportation 7(2), 37–58.

Liu, B. and Esogbue, A. O. (2012), Decision Criteria and Optimal Inventory Processes, Vol. 20, Springer Science & Business Media.

Liu, Q. and van Ryzin, G. (2008), ‘On the choice-based linear programming model for network revenue management’, Manufacturing & Service Operations Management 10(2), 288–310.

Luce, R. D. (1959), Individual Choice Behavior: A Theoretical Analysis, Wiley, New York, NY.

Manski, C. F. and McFadden, D. L. (1981), Structural Analysis of Discrete Data and Econometric Applications, The MIT Press.

McFadden, D. (1974), Conditional logit analysis of qualitative choice behavior, in P.
Zarembka, ed., ‘Frontiers in Econometrics’, Academic Press, pp. 105–142.

McFadden, D. (1978), ‘Modeling the choice of residential location’, Transportation Research Record (672), 72–77.

Morse, P. M. (1959), ‘Solutions of a class of discrete-time inventory problems’, Operations Research 7(1), 67–78.

Paulley, N., Balcombe, R., Mackett, R., Titheridge, H., Preston, J., Wardman, M., Shires, J. and White, P. (2006), ‘The demand for public transport: The effects of fares, quality of service, income and car ownership’, Transport Policy 13(4), 295–306.

Pham, L. and Linsalata, J. (1991), Effects of fare changes on bus ridership, American Public Transportation Association.

Pihlens, D. A. (2008), The Multi-Attribute Elimination-by-Aspects Model, Ph.D. dissertation, University of Technology Sydney.

Rusmevichientong, P., Shen, Z.-J. M. and Shmoys, D. B. (2009), ‘A PTAS for capacitated sum-of-ratios optimization’, Operations Research Letters 37(4), 230–238.

Rusmevichientong, P., Shmoys, D. B., Tong, C. and Topaloglu, H. (2014), ‘Assortment optimization under the multinomial logit model with random choice parameters’, Production and Operations Management 23(11), 2023–2039.

Song, J.-S. and Xue, Z. (2007), Demand management and inventory control for substitutable products. Working paper, Duke University, Durham, NC.

Sun, W. and Yuan, Y. (2006), Optimization Theory and Methods: Nonlinear Programming, Vol. 1, Springer.

Tagaras, G. and Vlachos, D. (2001), ‘A periodic review inventory system with emergency replenishments’, Management Science 47(3), 527–547.

Talluri, K. and van Ryzin, G. J. (2004), ‘Revenue management under a general discrete choice model of consumer behavior’, Management Science 50(1), 15–33.

Train, K. (2003), Discrete Choice Methods with Simulation, Cambridge University Press, New York, NY.

Tversky, A. (1972a), ‘Choice by elimination’, Journal of Mathematical Psychology 9(4), 341–367.

Tversky, A.
(1972b), ‘Elimination by aspects: A theory of choice’, Psychological Review 79(4), 281–299.

Tversky, A. and Sattath, S. (1979), ‘Preference trees’, Psychological Review 86(6), 542–573.

Ubbels, B. and Nijkamp, P. (2008), ‘Unconventional funding of urban public transport’, Transportation Research Part D 7(5), 317–329.

van Ryzin, G. and Mahajan, S. (1999), ‘On the relationship between inventory costs and variety benefits in retail assortments’, Management Science 45(11), 1496–1509.

Vovsha, P. (1997), ‘Application of cross-nested logit model to mode choice in Tel Aviv, Israel metropolitan area’, Transportation Research Record 1607, 6–15.

Waddell, P. (1993), ‘Exogenous workplace choice in residential location models: Is the assumption valid?’, Geographical Analysis 25(1), 65–82.

Wang, J., Levin, Y. and Nediak, M. (2015), Selling passes to strategic customers. Working paper, Queen’s University, Kingston, ON, Canada.

Wen, C.-H. and Koppelman, F. S. (2001), ‘The generalized nested logit model’, Transportation Research Part B 35, 627–641.

Xie, J. and Shugan, S. M. (2001), ‘Electronic tickets, smart cards, and online prepayments: When and how to advance sell’, Marketing Science 20(3), 219–243.

Zhang, D. and Adelman, D. (2009), ‘An approximate dynamic programming approach to network revenue management with customer choice’, Transportation Science 43(3), 381–394.

Zipkin, P. (2008), ‘On the structure of lost-sales inventory models’, Operations Research 56(4), 937–944.

Appendix A

Technical Appendix to Chapter 2

A.1 Omitted Results in Section 2.4

The proofs of Proposition 2.4.1 and Theorem 2.4.2 use the next lemma, which shows the relationship between the local problem at node $j$ given in (2.4) and the same problem at its children nodes.

Lemma A.1.1. Consider an arbitrary non-leaf node $j$.
Assume that for every $k \in \text{Children}(j)$, there exists a set $\hat{S}_k \subseteq N_k$ such that $V_k(\hat{S}_k)\big(R_k(\hat{S}_k) - t_{\text{Parent}(k)}\big) \ge 0$ and

$$V_k(\hat{S}_k)\big(R_k(\hat{S}_k) - t_{\text{Parent}(k)}\big) \;\ge\; V_k(S_k)\big(R_k(S_k) - t_{\text{Parent}(k)}\big). \tag{A.1}$$

Then, letting $\hat{S}_j = \bigcup_{k \in \text{Children}(j)} \hat{S}_k$, we have

$$V_j(\hat{S}_j)\big(R_j(\hat{S}_j) - t_{\text{Parent}(j)}\big) \;\ge\; V_j(S_j)\big(R_j(S_j) - t_{\text{Parent}(j)}\big). \tag{A.2}$$

Moreover, if the inequality in (A.1) is strict for some $k \in \text{Children}(j)$, so is the inequality in (A.2).

Proof. There are two cases to consider: $R_j(S_j) \le t_{\text{Parent}(j)}$ and $R_j(S_j) > t_{\text{Parent}(j)}$. First, consider the case $R_j(S_j) \le t_{\text{Parent}(j)}$. Then, it follows from the definition of $t_j$ that $t_j = t_{\text{Parent}(j)}$. By our hypothesis, $V_k(\hat{S}_k)\big(R_k(\hat{S}_k) - t_j\big) \ge 0$ for all $k \in \text{Children}(j)$, which implies that

$$V_j(\hat{S}_j)^{1/\gamma_j}\big(R_j(\hat{S}_j) - t_j\big) \;=\; \sum_{k \in \text{Children}(j)} V_k(\hat{S}_k)\big(R_k(\hat{S}_k) - t_j\big) \;\ge\; 0,$$

where the equality follows by noting that

$$V_j(\hat{S}_j)^{1/\gamma_j} = \sum_{k \in \text{Children}(j)} V_k(\hat{S}_k) \qquad \text{and} \qquad R_j(\hat{S}_j) = \frac{\sum_{k \in \text{Children}(j)} V_k(\hat{S}_k)\, R_k(\hat{S}_k)}{\sum_{k \in \text{Children}(j)} V_k(\hat{S}_k)}.$$

Thus, $V_j(\hat{S}_j)\big(R_j(\hat{S}_j) - t_j\big) \ge 0 \ge V_j(S_j)\big(R_j(S_j) - t_{\text{Parent}(j)}\big)$, completing the first case.

Now, assume that $R_j(S_j) > t_{\text{Parent}(j)}$. Since $V_k(\hat{S}_k)\big(R_k(\hat{S}_k) - t_j\big) \ge V_k(S_k)\big(R_k(S_k) - t_j\big)$ for all $k \in \text{Children}(j)$, we obtain

$$V_j(\hat{S}_j)^{1/\gamma_j}\big(R_j(\hat{S}_j) - t_j\big) = \sum_{k \in \text{Children}(j)} V_k(\hat{S}_k)\big(R_k(\hat{S}_k) - t_j\big) \ge \sum_{k \in \text{Children}(j)} V_k(S_k)\big(R_k(S_k) - t_j\big) = V_j(S_j)^{1/\gamma_j}\big(R_j(S_j) - t_j\big),$$

where the two equalities above follow from an argument similar to the one we used in the first case. The above expression implies that

$$R_j(\hat{S}_j) - t_{\text{Parent}(j)} \;\ge\; \left[\frac{V_j(S_j)}{V_j(\hat{S}_j)}\right]^{1/\gamma_j}\big(R_j(S_j) - t_j\big) + t_j - t_{\text{Parent}(j)} \;=\; \gamma_j \left[\frac{V_j(S_j)}{V_j(\hat{S}_j)}\right]^{1/\gamma_j}\big(R_j(S_j) - t_{\text{Parent}(j)}\big) + (1 - \gamma_j)\big(R_j(S_j) - t_{\text{Parent}(j)}\big) \;=\; \left(\gamma_j \left[\frac{V_j(S_j)}{V_j(\hat{S}_j)}\right]^{1/\gamma_j} + 1 - \gamma_j\right)\big(R_j(S_j) - t_{\text{Parent}(j)}\big),$$

where the first equality follows from the fact that $t_j = \gamma_j\, t_{\text{Parent}(j)} + (1 - \gamma_j)\, R_j(S_j)$ because $R_j(S_j) > t_{\text{Parent}(j)}$.
The function $x \mapsto x^{1/\gamma_j}$ is convex, and its derivative at $x = 1$ is $1/\gamma_j$; thus, for all $x \in \mathbb{R}_+$, we have $x^{1/\gamma_j} \ge 1 + \frac{1}{\gamma_j}(x - 1)$. Writing this inequality as $\gamma_j\, x^{1/\gamma_j} + 1 - \gamma_j \ge x$, and using it with $x = V_j(S_j)/V_j(\hat{S}_j)$ in the above expression, we obtain

$$R_j(\hat{S}_j) - t_{\text{Parent}(j)} \;\ge\; \frac{V_j(S_j)}{V_j(\hat{S}_j)}\big(R_j(S_j) - t_{\text{Parent}(j)}\big),$$

which yields $V_j(\hat{S}_j)\big(R_j(\hat{S}_j) - t_{\text{Parent}(j)}\big) \ge V_j(S_j)\big(R_j(S_j) - t_{\text{Parent}(j)}\big)$ and completes the second case. Finally, going through the same reasoning in the proof, it is easy to verify that if the inequality in (A.1) is strict for some $k$, then the inequality in (A.2) is also strict. □

Here is the proof of Proposition 2.4.1.

Proof of Proposition 2.4.1: We prove the result by induction on the depth of node $j$. The result is trivially true at the root because the local problem at the root is identical to the assortment problem in (2.3). Now, assume that the result is true for all nodes at depth $h$. We want to show that the result also holds for all nodes at depth $h + 1$. Suppose, on the contrary, that there exists a node $i$ at depth $h + 1$ such that
Letting b S j = S k2 Children(j) b S k , it then follows from Lemma A.1.1 that V j ( b S j ) R j ( b S j )t Parent(j) > V j (S j ) R j (S j )t Parent(j) : Since nodej is at depthh, by the inductive hypothesis,S j solves the local problem at nodej, yielding V j (S j )(R j (S j )t Parent(j) ) = max S j N j V j (S j ) R j (S j )t Parent(j) : The above equality contradicts the last inequality! Therefore, the result must also hold for all nodes at depth h + 1, as desired. 2 2 Below is the proof of Theorem 2.4.2. 81 Proof. Proof of Theorem 2.4.2: Because of the hypothesis of the theorem, since b S k solves the local problem at nodek, we must haveV k ( b S k ) R k ( b S k )t Parent(k) 0 because the empty set gives an objective value of zero for the local problem at nodek and b S k is an optimal solution to the local problem at nodek. Also, by definition of b S k , for everyk2 Children(j), V k ( b S k ) R k ( b S k )t Parent(k) V k (S k ) R k (S k )t Parent(k) : Let b S j = S k2 Children(j) b S k . It then follows from Lemma A.1.1 that V j ( b S j ) R j ( b S j )t Parent(j) V j (S j ) R j (S j )t Parent(j) = max S j N j V j (S j ) R j (S j )t Parent(j) ; where the last equality follows from Proposition 2.4.1. This completes the proof. 2 2 A.2 Omitted Results in Section 2.5 Below, we give the proof of Theorem 2.5.2. Proof. Proof of Theorem 2.5.2: We will prove by induction that for eachh = 0; 1; 2;:::;d, X j2V : depth(j)=h jA j j 2n; where depth(j) denotes the depth of node j. The result is true for h = d because for each leaf node j, A j =ffjg;?g. Suppose the result is true forh + 1. Consider nodes at levelh. It follows from Lemma 2.5.1 that X j2V : depth(j)=h jA j j X j2V : depth(j)=h X k2 Children(j) jA k j = X j2V : depth(j)=h+1 jA j j 2n; 82 where the last inequality follows from the inductive hypothesis. 
Therefore, the total running time of the algorithm is
\[
\sum_{h=0}^{d} \sum_{j \in V :\, depth(j) = h} |A_j| \log(|A_j|)
\;\leq\; \sum_{h=0}^{d} \Bigg( \sum_{j \in V :\, depth(j) = h} |A_j| \Bigg) \log\Bigg( \sum_{j \in V :\, depth(j) = h} |A_j| \Bigg)
\;\leq\; \sum_{h=0}^{d} 2n \log(2n) \;=\; 2(d+1)\, n \log(2n) \;=\; O(dn \log n),
\]
which completes the proof. □

A.3 Omitted Results in Section 2.8

Here we present the proof of Theorem 2.8.1, for which we define some additional notation. For each $\delta \geq 0$ and each assortment $S$, let $f_\delta(S)$ denote the expected revenue under assortment $S = (S_j : j \in V)$ when the revenue of every product is increased by $\delta$; that is,
\[
f_\delta(S) \;=\; \sum_{\ell \in S} (r_\ell + \delta)\, \pi_\ell(S) \;=\; \sum_{\ell \in S} r_\ell\, \pi_\ell(S) + \delta \sum_{\ell \in S} \pi_\ell(S) \;=\; f_0(S) + \delta \sum_{\ell \in S} \pi_\ell(S),
\]
and let $S^\delta$ denote an assortment that maximizes $f_\delta(\cdot)$; that is, $S^\delta = \arg\max_{S \subseteq \{1,\ldots,n\}} f_\delta(S)$. The proof of Theorem 2.8.1 uses lemmas that establish properties of the collection $A_j$ in the assortment optimization algorithm in Section 2.5. Throughout this section, we recall that $\hat{S}_k(u)$ is an optimal solution to problem (2.5) and the collection of subsets $A_j$ is constructed as in (2.6). In the next lemma, we show that the subsets in the collection $A_j$ can be ordered by inclusion, with each subset (except the largest) contained in the next one. This lemma becomes useful in the proof of Lemma A.3.3.

Lemma A.3.1. For any node $j$, consider the collection $A_j = \{\hat{S}_j^q : q = 1, \ldots, Q_j\}$ constructed as in (2.6). The subsets in this collection can be ordered such that $\hat{S}_j^1 \subseteq \hat{S}_j^2 \subseteq \cdots \subseteq \hat{S}_j^{Q_j}$.

Proof. We show the result by induction on the depth of node $j$. For a leaf node $j$, we have $A_j = \{\varnothing, \{j\}\}$. Since $\varnothing \subseteq \{j\}$, the result holds for leaf nodes. Assume that the result holds for all nodes at depth $h+1$ and consider a node $j$ at depth $h$. By the inductive hypothesis, for all $k \in Children(j)$, the subsets in $A_k$ can be ordered such that $\hat{S}_k^1 \subseteq \hat{S}_k^2 \subseteq \cdots \subseteq \hat{S}_k^{Q_k}$. For all $k \in Children(j)$, we claim that the optimal solution to problem (2.5) becomes a larger subset as $u$ gets smaller.
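Before turning to the formal proof of this claim, the claimed monotonicity is easy to check by brute force in the simplest special case: a single nest with dissimilarity parameter equal to one, where the objective reduces to $V(S)\,(R(S) - u) = \sum_{\ell \in S} v_\ell\,(r_\ell - u)$. The sketch below uses illustrative weights and revenues (not taken from the dissertation's test problems):

```python
from itertools import combinations

# Hypothetical single-nest instance with dissimilarity parameter 1, so that
# V(S) * (R(S) - u) = sum over l in S of v_l * (r_l - u).
v = [1.0, 2.0, 0.5, 1.5]   # preference weights (illustrative)
r = [10.0, 8.0, 6.0, 4.0]  # product revenues (illustrative)
N = range(len(v))

def best_subset(u):
    """Brute-force maximizer of sum_{l in S} v_l * (r_l - u); the empty set scores 0."""
    best, best_val = frozenset(), 0.0
    for size in range(1, len(v) + 1):
        for S in combinations(N, size):
            val = sum(v[l] * (r[l] - u) for l in S)
            if val > best_val:
                best, best_val = frozenset(S), val
    return best

# As u decreases, the maximizer grows: each subset contains the previous one.
subsets = [best_subset(u) for u in (9.0, 7.0, 5.0, 3.0)]
for smaller, larger in zip(subsets, subsets[1:]):
    assert smaller <= larger   # nested by inclusion
print([sorted(S) for S in subsets])  # [[0], [0, 1], [0, 1, 2], [0, 1, 2, 3]]
```

In this special case the maximizer is simply $\{\ell : r_\ell > u\}$, which is transparently nested in $u$; the claim in the text extends this behavior to deeper trees.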
Claim 1: Letting $\hat{S}_k(u)$ be an optimal solution to problem (2.5), if $u > u'$, then $\hat{S}_k(u) \subseteq \hat{S}_k(u')$.

By the definition of $\hat{S}_k(u)$, we have $V_k(\hat{S}_k(u))\big( R_k(\hat{S}_k(u)) - u \big) \geq V_k(\hat{S}_k(u'))\big( R_k(\hat{S}_k(u')) - u \big)$ and $V_k(\hat{S}_k(u'))\big( R_k(\hat{S}_k(u')) - u' \big) \geq V_k(\hat{S}_k(u))\big( R_k(\hat{S}_k(u)) - u' \big)$. Adding these two inequalities yields $V_k(\hat{S}_k(u'))\,(u - u') \geq V_k(\hat{S}_k(u))\,(u - u')$. Using $u > u'$, we get $V_k(\hat{S}_k(u')) \geq V_k(\hat{S}_k(u))$. Noting problem (2.5), $\hat{S}_k(u)$ and $\hat{S}_k(u')$ are both in $A_k$ by definition, and since the subsets in the collection $A_k$ satisfy $\hat{S}_k^1 \subseteq \hat{S}_k^2 \subseteq \cdots \subseteq \hat{S}_k^{Q_k}$ by the inductive hypothesis, we must have either $\hat{S}_k(u) \subseteq \hat{S}_k(u')$ or $\hat{S}_k(u) \supseteq \hat{S}_k(u')$. Since $V_k(\hat{S}_k(u')) \geq V_k(\hat{S}_k(u))$, the preference weight under the subset $\hat{S}_k(u')$ is larger, so $\hat{S}_k(u')$ must be the larger of the two subsets; that is, $\hat{S}_k(u) \subseteq \hat{S}_k(u')$, establishing the claim.

By the claim above, $\hat{S}_k(u) \subseteq \hat{S}_k(u')$ for all $k \in Children(j)$, so that $\bigcup_{k \in Children(j)} \hat{S}_k(u) \subseteq \bigcup_{k \in Children(j)} \hat{S}_k(u')$. By (2.6), $A_j$ is constructed as $A_j = \big\{ \bigcup_{k \in Children(j)} \hat{S}_k(u) : u \in \mathbb{R} \big\}$, so the subsets in the collection $A_j$ can be ordered such that one subset includes the other, as desired. □

By Claim 1 in Section 2.5, the collection $A_j$ includes an optimal solution to the local problem at node $j$. In the next lemma, we show that the collection $A_j$ still includes an optimal solution to the local problem at node $j$ even after increasing the revenues associated with the products by $\delta$.

Lemma A.3.2. For any node $j$, consider the collection $A_j$ constructed as in (2.6). This collection includes an optimal solution to the local problem at node $j$ even when we increase the revenues associated with the products by any $\delta \geq 0$.

Proof. The optimal solution to the local problem at a leaf node $j$ is always either $\{j\}$ or $\varnothing$. Thus, $A_j = \{\{j\}, \varnothing\}$ always includes an optimal solution to the local problem at a leaf node $j$. Consider a non-leaf node $j$. For each $k \in Children(j)$, we use $R_k^\delta(S_k)$ to denote the expected revenue obtained from a customer at node $k$, given that we offer the assortment $S = (S_\ell : \ell \in V)$ and we increase the revenues of the products by $\delta$. Since the expected revenue at each node is a convex combination of the product revenues, it follows that $R_k^\delta(S_k) = R_k(S_k) + \delta$. In this case, when we increase the revenues of the products by $\delta$, the local problem at node $k$ is given by
\[
\max_{S_k \subseteq N_k} V_k(S_k)\big( R_k^\delta(S_k) - t_{Parent(k)}^\delta \big) \;=\; \max_{S_k \subseteq N_k} V_k(S_k)\big( R_k(S_k) - ( t_{Parent(k)}^\delta - \delta ) \big).
\]
In the expression above, letting $S^\delta = (S_\ell^\delta : \ell \in V)$ be an optimal solution to problem (2.3) when we increase the revenues of the products by $\delta$, the scalars $(t_j^\delta : j \in V)$ are given by
\[
t_j^\delta \;=\; \max\big\{ t_{Parent(j)}^\delta,\; \gamma_j\, t_{Parent(j)}^\delta + (1 - \gamma_j)\, R_j^\delta(S_j^\delta) \big\},
\]
with the convention that $t_{Parent(root)}^\delta = 0$. Comparing the problem above with (2.5) and noting that we use $\hat{S}_k(u)$ to denote an optimal solution to problem (2.5), it follows that $\hat{S}_k( t_{Parent(k)}^\delta - \delta )$ is an optimal solution to the local problem at node $k$ when we increase the revenues of the products by $\delta$. In this case, by Theorem 2.4.2, $\bigcup_{k \in Children(j)} \hat{S}_k( t_{Parent(k)}^\delta - \delta )$ is an optimal solution to the local problem at node $j$ when we increase the revenues of the products by $\delta$. By (2.6), $A_j$ is constructed as $A_j = \big\{ \bigcup_{k \in Children(j)} \hat{S}_k(u) : u \in \mathbb{R} \big\}$, so $\bigcup_{k \in Children(j)} \hat{S}_k( t_{Parent(k)}^\delta - \delta ) \in A_j$. Thus, it follows that the collection $A_j$ includes an optimal solution to the local problem at node $j$ even when we increase the revenues of the products by any $\delta \geq 0$. □

The next lemma shows that when we increase the revenues of the products by a positive constant, the optimal assortment becomes larger.

Lemma A.3.3 (Larger Revenues Lead to Larger Assortments). For all $\delta \geq 0$, $S^0 \subseteq S^\delta$.

Proof. By Lemma A.3.2, the collection $A_{root}$ includes an optimal solution to the local problem at $root$ when we increase the revenues of the products by any amount $\delta \geq 0$.
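(As an aside before continuing the proof: the monotonicity asserted by Lemma A.3.3 can be sanity-checked by brute force in the special case of a one-level tree, i.e., the plain multinomial logit model, where $f_\delta(S) = \sum_{\ell \in S} (r_\ell + \delta)\, v_\ell / ( v_0 + \sum_{k \in S} v_k )$. The instance below uses illustrative numbers only, with $v_0$ the no-purchase weight.)

```python
from itertools import combinations

# Illustrative plain-MNL instance (one-level tree); v0 is the no-purchase weight.
v0 = 1.0
v = [1.0, 0.8, 1.2, 0.6]
r = [10.0, 7.0, 5.0, 3.0]
N = range(len(v))

def f_delta(S, delta):
    """Expected revenue when every product revenue is shifted up by delta."""
    return sum((r[l] + delta) * v[l] for l in S) / (v0 + sum(v[l] for l in S))

def optimal_assortment(delta):
    best, best_val = frozenset(), 0.0
    for size in range(1, len(v) + 1):
        for S in combinations(N, size):
            val = f_delta(S, delta)
            if val > best_val:
                best, best_val = frozenset(S), val
    return best

# Larger revenues lead to larger assortments: S^0 is contained in every S^delta.
S0 = optimal_assortment(0.0)
for delta in (1.0, 3.0, 6.0, 10.0):
    assert S0 <= optimal_assortment(delta)
```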
Furthermore, the local problem at $root$ is equivalent to the assortment problem that we want to solve. Thus, noting that $S^0$ is the optimal assortment and $S^\delta$ is the optimal assortment when we increase the revenues of the products by $\delta$, both $S^0$ and $S^\delta$ are in $A_{root}$. On the other hand, by Lemma A.3.1, if $S^0$ and $S^\delta$ are in $A_{root}$, then we must have $S^0 \subseteq S^\delta$ or $S^\delta \subseteq S^0$. Given an assortment $S = (S_j : j \in V)$, we use $\pi_\ell(S)$ to denote the probability that a customer purchases product $\ell$ under the $d$-level nested logit model. By the definitions of $S^0$ and $S^\delta$, we have $\sum_{\ell \in S^0} \pi_\ell(S^0)\, r_\ell \geq \sum_{\ell \in S^\delta} \pi_\ell(S^\delta)\, r_\ell$ and $\sum_{\ell \in S^\delta} \pi_\ell(S^\delta)\,(r_\ell + \delta) \geq \sum_{\ell \in S^0} \pi_\ell(S^0)\,(r_\ell + \delta)$. Adding these two inequalities yields $\sum_{\ell \in S^\delta} \pi_\ell(S^\delta) \geq \sum_{\ell \in S^0} \pi_\ell(S^0)$, indicating that when we offer the assortment $S^\delta$, the probability that a customer purchases a product within this assortment is larger when compared with the case in which we offer the assortment $S^0$. It is straightforward to check that under the $d$-level nested logit model, if we offer a larger assortment, then the probability that a customer makes a purchase within this assortment gets larger. At the beginning of the proof, we showed that either $S^0 \subseteq S^\delta$ or $S^\delta \subseteq S^0$. Thus, noting that the probability that a customer makes a purchase is larger when we offer the assortment $S^\delta$, it must be the case that $S^0 \subseteq S^\delta$. □

Proof of Theorem 2.8.1

Proof. Lemmas 4 and 5 in Talluri and van Ryzin (2004) show that the value functions are concave in the remaining capacity and that the first differences of the value functions decrease as we approach the end of the selling horizon; that is, writing $\Delta J_t(x) = J_t(x) - J_t(x-1)$ for the first difference, we have $\Delta J_t(x) \leq \Delta J_t(x-1)$ and $\Delta J_t(x) \geq \Delta J_{t-1}(x)$ for $t = 1, 2, \ldots, T$ and $x = 1, 2, \ldots, C$. To show that $S_t(x-1) \subseteq S_t(x)$, note that by definition,
\[
S_t(x) \;=\; \arg\max_{S \subseteq \{1,\ldots,n\}} \sum_{\ell \in S} \pi_\ell(S)\, \big( r_\ell - \Delta J_{t-1}(x-1) + \Delta J_{t-1}(x-1) - \Delta J_{t-1}(x) \big),
\]
\[
S_t(x-1) \;=\; \arg\max_{S \subseteq \{1,\ldots,n\}} \sum_{\ell \in S} \pi_\ell(S)\, \big( r_\ell - \Delta J_{t-1}(x-1) \big).
\]
Let $\delta = \Delta J_{t-1}(x-1) - \Delta J_{t-1}(x)$. Note that $\delta \geq 0$ since $\Delta J_t(x) \leq \Delta J_t(x-1)$. Thus, applying Lemma A.3.3 with $S^0 = S_t(x-1)$ and $S^\delta = S_t(x)$ yields $S_t(x-1) \subseteq S_t(x)$, as desired. To show that $S_t(x) \subseteq S_{t-1}(x)$, note that by definition,
\[
S_t(x) \;=\; \arg\max_{S \subseteq \{1,\ldots,n\}} \sum_{\ell \in S} \pi_\ell(S)\, \big( r_\ell - \Delta J_{t-1}(x) \big),
\]
\[
S_{t-1}(x) \;=\; \arg\max_{S \subseteq \{1,\ldots,n\}} \sum_{\ell \in S} \pi_\ell(S)\, \big( r_\ell - \Delta J_{t-1}(x) + \Delta J_{t-1}(x) - \Delta J_{t-2}(x) \big).
\]
Let $\delta = \Delta J_{t-1}(x) - \Delta J_{t-2}(x)$, which is non-negative since $\Delta J_{t-1}(x) \geq \Delta J_{t-2}(x)$. Applying Lemma A.3.3 with $S^0 = S_t(x)$ and $S^\delta = S_{t-1}(x)$ yields $S_t(x) \subseteq S_{t-1}(x)$, as desired. □

A.4 Omitted Results in Section 2.9

A.4.1 Proof of Lemma 2.9.1

Proof. For each node $j \in V$, let $F_j(\cdot,\cdot) : \mathbb{R}_+^n \times \mathbb{R}_+^n \to \mathbb{R}$ be defined by: for each $p \in \mathbb{R}_+^n$ and $q \in \mathbb{R}_+^n$,
\[
F_j(p, q) \;=\; V_j(p)\, \big( R_j(p) - u_{Parent(j)}(q) \big),
\]
with the convention that $u_{Parent(root)}(\cdot) = 0$. Since $\gamma_{root} = 0$, we have $V_{root}(p) = 1$ by the definition of $V_j(p)$, in which case $F_{root}(p,q) = R_{root}(p)$ for all $p, q \in \mathbb{R}_+^n$. At any price vector $\hat{p}$, we will now show that for every non-leaf node $j$ and every product $\ell$ that is a descendant of $j$, we have
\[
\frac{\partial F_j}{\partial p_\ell}(p,q) \bigg|_{(p,q) = (\hat{p},\hat{p})} \;=\; \frac{V_j(\hat{p})}{\sum_{k \in Children(j)} V_k(\hat{p})}\; \frac{\partial F_i}{\partial p_\ell}(p,q) \bigg|_{(p,q) = (\hat{p},\hat{p})},
\]
where $i$ is the child of $j$ that is an ancestor of $\ell$. To see this, note that if we let $G_k(p) = V_k(p)\, R_k(p)$, then we have
\[
F_j(p,q) \;=\; \Bigg( \sum_{k \in Children(j)} V_k(p) \Bigg)^{\gamma_j} \Bigg( \frac{\sum_{k \in Children(j)} G_k(p)}{\sum_{k \in Children(j)} V_k(p)} - u_{Parent(j)}(q) \Bigg).
\]
By definition, among $(V_k(p) : k \in Children(j))$, only $V_i(p)$ depends on $p_\ell$. Similarly, among $(R_k(p) : k \in Children(j))$, only $R_i(p)$ depends on $p_\ell$. Thus, differentiating the expression above with respect to $p_\ell$ (all sums below run over $k \in Children(j)$), it follows that
\[
\frac{\partial F_j}{\partial p_\ell}(p,q)
= \gamma_j \Bigg( \sum_{k} V_k(p) \Bigg)^{\gamma_j - 1} \frac{\partial V_i}{\partial p_\ell}(p) \Bigg( \frac{\sum_{k} G_k(p)}{\sum_{k} V_k(p)} - u_{Parent(j)}(q) \Bigg)
+ \Bigg( \sum_{k} V_k(p) \Bigg)^{\gamma_j} \Bigg[ \frac{\frac{\partial G_i}{\partial p_\ell}(p) \sum_{k} V_k(p) - \frac{\partial V_i}{\partial p_\ell}(p) \sum_{k} G_k(p)}{\big( \sum_{k} V_k(p) \big)^2} \Bigg]
\]
\[
= \Bigg( \sum_{k} V_k(p) \Bigg)^{\gamma_j - 1} \frac{\partial V_i}{\partial p_\ell}(p)\, \big( \gamma_j R_j(p) - \gamma_j u_{Parent(j)}(q) \big)
+ \Bigg( \sum_{k} V_k(p) \Bigg)^{\gamma_j - 1} \Bigg[ \frac{\partial G_i}{\partial p_\ell}(p) - \frac{\partial V_i}{\partial p_\ell}(p)\, R_j(p) \Bigg],
\]
where the second equality uses the fact that $R_j(p) = \sum_{k} G_k(p) / \sum_{k} V_k(p)$. Using this identity once more, we can continue the chain of equalities above as
\[
\frac{\partial F_j}{\partial p_\ell}(p,q)
= \Bigg( \sum_{k} V_k(p) \Bigg)^{\gamma_j - 1} \Bigg[ \frac{\partial G_i}{\partial p_\ell}(p) - \frac{\partial V_i}{\partial p_\ell}(p)\, \big( \gamma_j\, u_{Parent(j)}(q) + (1 - \gamma_j)\, R_j(p) \big) \Bigg].
\]
Therefore, noting that $\gamma_j\, u_{Parent(j)}(\hat{p}) + (1 - \gamma_j)\, R_j(\hat{p}) = u_j(\hat{p})$ by definition, the expression above yields
\[
\frac{\partial F_j}{\partial p_\ell}(p,q) \bigg|_{(p,q) = (\hat{p},\hat{p})}
= \Bigg( \sum_{k} V_k(\hat{p}) \Bigg)^{\gamma_j - 1} \Bigg\{ \frac{\partial G_i}{\partial p_\ell}(p) \bigg|_{p = \hat{p}} - u_j(\hat{p})\, \frac{\partial V_i}{\partial p_\ell}(p) \bigg|_{p = \hat{p}} \Bigg\}.
\]
By definition, we have $F_i(p,q) = G_i(p) - u_j(q)\, V_i(p)$, which implies that
\[
\frac{\partial F_i}{\partial p_\ell}(p,q) = \frac{\partial G_i}{\partial p_\ell}(p) - u_j(q)\, \frac{\partial V_i}{\partial p_\ell}(p),
\]
and thus,
\[
\frac{\partial F_j}{\partial p_\ell}(p,q) \bigg|_{(p,q) = (\hat{p},\hat{p})}
= \Bigg( \sum_{k} V_k(\hat{p}) \Bigg)^{\gamma_j - 1} \frac{\partial F_i}{\partial p_\ell}(p,q) \bigg|_{(p,q) = (\hat{p},\hat{p})}
= \frac{V_j(\hat{p})}{\sum_{k \in Children(j)} V_k(\hat{p})}\, \frac{\partial F_i}{\partial p_\ell}(p,q) \bigg|_{(p,q) = (\hat{p},\hat{p})},
\]
which is the desired result. To complete the proof, we recursively apply the above result starting at the root node. Recalling that $F_{root}(p,q) = R_{root}(p)$ for all $p, q \in \mathbb{R}_+^n$, we obtain
\[
\frac{\partial R_{root}}{\partial p_\ell}(p) \bigg|_{p = \hat{p}}
= \frac{\partial F_{root}}{\partial p_\ell}(p,q) \bigg|_{(p,q) = (\hat{p},\hat{p})}
= \frac{V_{root}(\hat{p})}{\sum_{k \in Children(root)} V_k(\hat{p})}\, \frac{\partial F_{An(\ell,1)}}{\partial p_\ell}(p,q) \bigg|_{(p,q) = (\hat{p},\hat{p})}
= \frac{1}{\sum_{k \in Children(root)} V_k(\hat{p})}\, \frac{\partial F_{An(\ell,1)}}{\partial p_\ell}(p,q) \bigg|_{(p,q) = (\hat{p},\hat{p})},
\]
where the second equality follows from $\gamma_{root} = 0$, so that $V_{root}(\hat{p}) = 1$.
We continue the chain of equalities above by replacing the last partial derivative with its equivalent expression to get
\[
\frac{\partial R_{root}}{\partial p_\ell}(p) \bigg|_{p = \hat{p}}
= \frac{1}{\sum_{k \in Children(root)} V_k(\hat{p})} \cdot \frac{V_{An(\ell,1)}(\hat{p})}{\sum_{k \in Children(An(\ell,1))} V_k(\hat{p})}\, \frac{\partial F_{An(\ell,2)}}{\partial p_\ell} \bigg|_{(\hat{p},\hat{p})}
= \frac{V_{An(\ell,1)}(\hat{p})}{\sum_{k \in Sibling(An(\ell,1))} V_k(\hat{p})} \cdot \frac{1}{\sum_{k \in Sibling(An(\ell,2))} V_k(\hat{p})}\, \frac{\partial F_{An(\ell,2)}}{\partial p_\ell} \bigg|_{(\hat{p},\hat{p})}.
\]
Iteratively applying the same argument, we have that
\[
\frac{\partial R_{root}}{\partial p_\ell}(p) \bigg|_{p = \hat{p}}
= \Bigg( \prod_{h=1}^{d-1} \frac{V_{An(\ell,h)}(\hat{p})}{\sum_{k \in Sibling(An(\ell,h))} V_k(\hat{p})} \Bigg) \frac{1}{\sum_{k \in Sibling(\ell)} V_k(\hat{p})}\, \frac{\partial F_\ell}{\partial p_\ell} \bigg|_{(\hat{p},\hat{p})}.
\]
Finally, note that $F_\ell(p,q) = V_\ell(p_\ell)\,\big( p_\ell - u_{Parent(\ell)}(q) \big) = e^{\alpha_\ell - \beta_\ell p_\ell}\,\big( p_\ell - u_{Parent(\ell)}(q) \big)$, which yields
\[
\frac{\partial F_\ell}{\partial p_\ell}(p,q)
= -\beta_\ell\, e^{\alpha_\ell - \beta_\ell p_\ell}\,\big( p_\ell - u_{Parent(\ell)}(q) \big) + e^{\alpha_\ell - \beta_\ell p_\ell}
= -\beta_\ell\, V_\ell(p_\ell)\, \Big( p_\ell - \frac{1}{\beta_\ell} - u_{Parent(\ell)}(q) \Big).
\]
Thus,
\[
\frac{\partial R_{root}}{\partial p_\ell}(p) \bigg|_{p = \hat{p}}
= -\beta_\ell \Bigg( \prod_{h=1}^{d} \frac{V_{An(\ell,h)}(\hat{p})}{\sum_{k \in Sibling(An(\ell,h))} V_k(\hat{p})} \Bigg) \Big( \hat{p}_\ell - \frac{1}{\beta_\ell} - u_{Parent(\ell)}(\hat{p}) \Big),
\]
and the desired result follows by noting that the product on the right side above gives $\pi_\ell(\hat{p})$. □

A.4.2 Proof of Theorem 2.9.3

In the rest of this section, we give a proof of Theorem 2.9.3 through a series of lemmas. The following lemma gives an upper bound on $t_j^s$ for each non-leaf node $j$ in each iteration $s$ of the PUPD algorithm.

Lemma A.4.1. Let $\bar{V} = V \setminus \{1, 2, \ldots, n, root\}$ denote the set of non-leaf nodes, after excluding $root$, and let $\gamma_{\min} = \min_{j \in \bar{V}} \gamma_j$. Then, for any non-leaf node $j$ in level $h$ and for all $s$,
\[
t_j^s \;\leq\; t_{root}^s + \big( 1 - (\gamma_{\min})^h \big)\, Q^s,
\]
where $Q^s = \max\big\{ t_{root}^s,\; \max_{j \in \bar{V}} t_j^s,\; \max_{j \in \bar{V}} R_j^s \big\}$.

Proof. The result is trivially true for $root$. If $j$ is a non-leaf node in level 1, then by definition,
\[
t_j^s = \max\big\{ t_{root}^s,\; \gamma_j t_{root}^s + (1-\gamma_j) R_j^s \big\} = \gamma_j t_{root}^s + (1-\gamma_j) \max\big\{ t_{root}^s, R_j^s \big\} \;\leq\; t_{root}^s + (1 - \gamma_{\min})\, Q^s,
\]
where the last inequality follows from the definitions of $\gamma_{\min}$ and $Q^s$. Suppose the result is true for all non-leaf nodes in level $h-1$.
For any non-leaf node $j$ in level $h$,
\[
t_j^s = \max\big\{ t_{Parent(j)}^s,\; \gamma_j t_{Parent(j)}^s + (1-\gamma_j) R_j^s \big\}
= \gamma_j t_{Parent(j)}^s + (1-\gamma_j) \max\big\{ t_{Parent(j)}^s, R_j^s \big\}
\;\leq\; \gamma_j \big( t_{root}^s + ( 1 - (\gamma_{\min})^{h-1} )\, Q^s \big) + (1-\gamma_j)\, Q^s
= \gamma_j t_{root}^s + \big( 1 - \gamma_j (\gamma_{\min})^{h-1} \big)\, Q^s
\;\leq\; t_{root}^s + \big( 1 - (\gamma_{\min})^h \big)\, Q^s,
\]
where the first inequality follows from the inductive hypothesis and the definition of $Q^s$. This completes the proof. □

The following lemma establishes a relationship between the expected revenues in consecutive iterations of the PUPD algorithm. Its proof follows from an argument similar to the one in the proof of Lemma A.1.1.

Lemma A.4.2. Assume that the price vectors $\{p^s : s = 0, 1, 2, \ldots\}$ are generated by the PUPD algorithm. For each non-leaf node $j$, we have
\[
R_j^{s+1} \;\geq\; \frac{1}{\gamma_j} \Bigg( \gamma_j + \frac{V_j^s}{V_j^{s+1}} - 1 \Bigg) \big( R_j^s - t_j^s \big)^+ + t_j^s,
\]
where $V_j^s = V_j(p^s)$ and $V_j^{s+1} = V_j(p^{s+1})$.

Proof. We prove the result by induction on the depth of node $j$, starting with nodes at depth $d-1$. If node $j$ is at depth $d-1$, then it is the parent of leaf nodes. Note that the inequality in the lemma is trivially true if $R_j^s \leq t_j^s$. In particular, by our construction of the PUPD algorithm, for each $\ell \in Children(j)$, $p_\ell^{s+1} \geq t_{Parent(\ell)}^s = t_j^s$, in which case, since $R_j^{s+1}$ is a convex combination of $(p_\ell^{s+1} : \ell \in N_j)$, we have $R_j^{s+1} \geq t_j^s$ and the lemma follows. So, assume that $R_j^s > t_j^s$. By the definition of the PUPD algorithm, for each $\ell \in Children(j)$, $p_\ell^{s+1} = \frac{1}{\beta_\ell} + t_j^s = \arg\max_{x \geq 0} V_\ell(x)\,( x - t_j^s )$, so for each $\ell \in Children(j)$,
\[
V_\ell^{s+1} \big( R_\ell^{s+1} - t_j^s \big) \;\geq\; V_\ell^s \big( R_\ell^s - t_j^s \big).
\]
Adding this inequality over all $\ell \in Children(j)$, we get
\[
\Bigg( \sum_{\ell \in Children(j)} V_\ell^{s+1} \Bigg) \big( R_j^{s+1} - t_j^s \big)
= \sum_{\ell \in Children(j)} V_\ell^{s+1} \big( R_\ell^{s+1} - t_j^s \big)
\;\geq\; \sum_{\ell \in Children(j)} V_\ell^s \big( R_\ell^s - t_j^s \big)
= \Bigg( \sum_{\ell \in Children(j)} V_\ell^s \Bigg) \big( R_j^s - t_j^s \big),
\]
where the two equalities above use the facts that $R_j^{s+1} = \sum_{k \in Children(j)} V_k^{s+1} R_k^{s+1} \big/ \sum_{k \in Children(j)} V_k^{s+1}$ and $R_j^s = \sum_{k \in Children(j)} V_k^s R_k^s \big/ \sum_{k \in Children(j)} V_k^s$.
Noting that $V_j^{s+1} = \big( \sum_{k \in Children(j)} V_k^{s+1} \big)^{\gamma_j}$ and $V_j^s = \big( \sum_{k \in Children(j)} V_k^s \big)^{\gamma_j}$, we write the above inequality as
\[
\big( V_j^{s+1} \big)^{1/\gamma_j} \big( R_j^{s+1} - t_j^s \big) \;\geq\; \big( V_j^s \big)^{1/\gamma_j} \big( R_j^s - t_j^s \big) = \big( V_j^s \big)^{1/\gamma_j} \big( R_j^s - t_j^s \big)^+,
\]
which implies that
\[
R_j^{s+1} \;\geq\; \Bigg( \frac{V_j^s}{V_j^{s+1}} \Bigg)^{1/\gamma_j} \big( R_j^s - t_j^s \big)^+ + t_j^s \;\geq\; \frac{1}{\gamma_j} \Bigg( \gamma_j + \frac{V_j^s}{V_j^{s+1}} - 1 \Bigg) \big( R_j^s - t_j^s \big)^+ + t_j^s,
\]
where the last inequality follows from the fact that the function $q \mapsto q^{1/\gamma_j}$ is convex with derivative $1/\gamma_j$ at $q = 1$, and thus $q^{1/\gamma_j} \geq 1 + \frac{1}{\gamma_j}(q-1) = \frac{1}{\gamma_j}(\gamma_j + q - 1)$ for all $q \in \mathbb{R}_+$. This completes the base case.

Now, suppose the result holds for every node at depth $h+1$ and consider a node $j$ at depth $h$. We want to show that
\[
R_j^{s+1} \;\geq\; \frac{1}{\gamma_j} \Bigg( \gamma_j + \frac{V_j^s}{V_j^{s+1}} - 1 \Bigg) \big( R_j^s - t_j^s \big)^+ + t_j^s.
\]
The inequality is trivially true if $R_j^s \leq t_j^s$. In particular, due to the way we compute the scalars $(t_j^s : j \in V)$ in the PUPD algorithm, for each node $\ell$ that is a descendant of node $j$, we have $t_{Parent(\ell)}^s \geq t_j^s$. Thus, for each leaf node (or product) $\ell$ that is a descendant of node $j$, our computation of the prices in the PUPD algorithm yields $p_\ell^{s+1} \geq t_{Parent(\ell)}^s \geq t_j^s$. Since $R_j^{s+1}$ is a convex combination of the prices $p_\ell^{s+1}$ at the leaf nodes $\ell \in N_j$ that are descendants of node $j$, we get $R_j^{s+1} \geq \min_{\ell \in N_j} p_\ell^{s+1} \geq t_j^s$ and the inequality above immediately follows. So, in the rest of the proof, we assume that $R_j^s > t_j^s$. We now claim that for every node $k \in Children(j)$,
\[
V_k^{s+1} \big( R_k^{s+1} - t_j^s \big) \;\geq\; V_k^s \big( R_k^s - t_j^s \big).
\]
The claim is trivially true if $R_k^s \leq t_j^s$, because by our construction of the PUPD algorithm we have $R_k^{s+1} \geq t_k^s \geq t_{Parent(k)}^s = t_j^s$, which can be shown by using the same argument that we just used above. So, consider the case in which $R_k^s > t_j^s$. By the definition of $t_k^s$, we have
\[
t_k^s = \max\big\{ t_j^s,\; \gamma_k t_j^s + (1-\gamma_k) R_k^s \big\} = \gamma_k t_j^s + (1-\gamma_k) R_k^s,
\]
which implies that $R_k^s - t_k^s = \gamma_k \big( R_k^s - t_j^s \big)$ and $t_k^s - t_j^s = (1-\gamma_k)\big( R_k^s - t_j^s \big)$. Since node $k \in Children(j)$ is at depth $h+1$, it follows from the inductive hypothesis that
\[
R_k^{s+1} - t_j^s \;\geq\; \frac{1}{\gamma_k} \Bigg( \gamma_k + \frac{V_k^s}{V_k^{s+1}} - 1 \Bigg) \big( R_k^s - t_k^s \big)^+ + t_k^s - t_j^s
= \Bigg( \gamma_k + \frac{V_k^s}{V_k^{s+1}} - 1 \Bigg) \big( R_k^s - t_j^s \big) + (1-\gamma_k)\big( R_k^s - t_j^s \big)
= \frac{V_k^s}{V_k^{s+1}} \big( R_k^s - t_j^s \big),
\]
which establishes the claim. To finish the proof, since the claim we established holds for every $k \in Children(j)$, adding this inequality over all $k \in Children(j)$, we have that
\[
\Bigg( \sum_{k \in Children(j)} V_k^{s+1} \Bigg) \big( R_j^{s+1} - t_j^s \big)
= \sum_{k \in Children(j)} V_k^{s+1} \big( R_k^{s+1} - t_j^s \big)
\;\geq\; \sum_{k \in Children(j)} V_k^s \big( R_k^s - t_j^s \big)
= \Bigg( \sum_{k \in Children(j)} V_k^s \Bigg) \big( R_j^s - t_j^s \big),
\]
or equivalently,
\[
\big( V_j^{s+1} \big)^{1/\gamma_j} \big( R_j^{s+1} - t_j^s \big) \;\geq\; \big( V_j^s \big)^{1/\gamma_j} \big( R_j^s - t_j^s \big) = \big( V_j^s \big)^{1/\gamma_j} \big( R_j^s - t_j^s \big)^+.
\]
Therefore,
\[
R_j^{s+1} \;\geq\; \Bigg( \frac{V_j^s}{V_j^{s+1}} \Bigg)^{1/\gamma_j} \big( R_j^s - t_j^s \big)^+ + t_j^s \;\geq\; \frac{1}{\gamma_j} \Bigg( \gamma_j + \frac{V_j^s}{V_j^{s+1}} - 1 \Bigg) \big( R_j^s - t_j^s \big)^+ + t_j^s,
\]
where the final inequality follows because the function $q \mapsto q^{1/\gamma_j}$ is convex with derivative $1/\gamma_j$ at $q = 1$, and thus $q^{1/\gamma_j} \geq 1 + \frac{1}{\gamma_j}(q-1) = \frac{1}{\gamma_j}(\gamma_j + q - 1)$ for all $q \in \mathbb{R}_+$. This completes the induction and the result follows. □

The following lemma builds on the previous one to show the monotonicity of the expected revenues in consecutive iterations of the PUPD algorithm.

Lemma A.4.3. Assume that the price vectors $\{p^s : s = 0, 1, 2, \ldots\}$ are generated by the PUPD algorithm. If $p_\ell^{s+1} \geq p_\ell^s$ for every leaf node $\ell$, then for each non-leaf node $j$, $R_j^{s+1} \geq R_j^s$.

Proof. It follows from Lemma A.4.2 that
\[
R_j^{s+1} - R_j^s \;\geq\; \frac{1}{\gamma_j} \Bigg( \gamma_j + \frac{V_j^s}{V_j^{s+1}} - 1 \Bigg) \big( R_j^s - t_j^s \big)^+ - \big( R_j^s - t_j^s \big). \tag{A.3}
\]
There are two cases to consider: $R_j^s \leq t_j^s$ and $R_j^s > t_j^s$. When $R_j^s \leq t_j^s$, it follows immediately from the above inequality that $R_j^{s+1} \geq R_j^s$. So, consider the case where $R_j^s > t_j^s$. In this case, it follows from the definition of $t_j^s$ in the PUPD algorithm that $R_j^s > t_j^s \geq t_{Parent(j)}^s$, which implies that
\[
t_j^s = \max\big\{ t_{Parent(j)}^s,\; \gamma_j t_{Parent(j)}^s + (1-\gamma_j) R_j^s \big\} = \gamma_j t_{Parent(j)}^s + (1-\gamma_j) R_j^s,
\]
and thus $R_j^s - t_j^s = \gamma_j \big( R_j^s - t_{Parent(j)}^s \big)$. Therefore, it follows from (A.3) that
\[
R_j^{s+1} - R_j^s \;\geq\; \Bigg( \gamma_j + \frac{V_j^s}{V_j^{s+1}} - 1 \Bigg) \big( R_j^s - t_{Parent(j)}^s \big) - \gamma_j \big( R_j^s - t_{Parent(j)}^s \big)
= \Bigg( \frac{V_j^s}{V_j^{s+1}} - 1 \Bigg) \big( R_j^s - t_{Parent(j)}^s \big).
\]
By our hypothesis, we have $p_\ell^{s+1} \geq p_\ell^s$ for every product $\ell$. As the prices of the products increase, the corresponding preference weights decrease. Thus, we have $V_j^s \geq V_j^{s+1}$, so that $\frac{V_j^s}{V_j^{s+1}} - 1 \geq 0$. Since $R_j^s > t_{Parent(j)}^s$, the chain of inequalities above yields $R_j^{s+1} \geq R_j^s$. □

Before we turn to the proof of Theorem 2.9.3, we give the following corollary to the last lemma, showing that the scalars $(t_j^s : j \in V)$ are monotonically increasing in consecutive iterations.

Corollary A.4.4. Assume that the price vectors $\{p^s : s = 0, 1, 2, \ldots\}$ are generated by the PUPD algorithm. If $t_j^s \geq t_j^{s-1}$ for every non-leaf node $j$, then $t_j^{s+1} \geq t_j^s$ for every non-leaf node $j$.

Proof. By definition, we have $p_\ell^s = \frac{1}{\beta_\ell} + t_{Parent(\ell)}^{s-1}$ and $p_\ell^{s+1} = \frac{1}{\beta_\ell} + t_{Parent(\ell)}^s$ for every leaf node $\ell$. Thus, our hypothesis implies that $p_\ell^{s+1} \geq p_\ell^s$ for every leaf node $\ell$. Then, it follows from Lemma A.4.3 that $R_j^{s+1} \geq R_j^s$ for every non-leaf node $j$. So, $t_{root}^{s+1} = R_{root}^{s+1} \geq R_{root}^s = t_{root}^s$. In this case, for every node $j \in Children(root)$, we get
\[
t_j^{s+1} = \max\big\{ t_{root}^{s+1},\; \gamma_j t_{root}^{s+1} + (1-\gamma_j) R_j^{s+1} \big\} \;\geq\; \max\big\{ t_{root}^s,\; \gamma_j t_{root}^s + (1-\gamma_j) R_j^s \big\} = t_j^s,
\]
establishing the desired monotonicity result for the first-level nodes. Having established monotonicity for the first-level nodes, and using exactly the same argument as above, we can show that $t_j^{s+1} \geq t_j^s$ for all nodes $j$ in level two. By induction, the result then holds for all non-leaf nodes. □

The next lemma shows that the revenues at each node generated under the PUPD algorithm are always bounded.

Lemma A.4.5. Let $\bar{V} = V \setminus \{1, 2, \ldots, n, root\}$ denote the set of non-leaf nodes, after excluding $root$, and let $Q^s = \max\big\{ t_{root}^s,\; \max_{j \in \bar{V}} t_j^s,\; \max_{j \in \bar{V}} R_j^s \big\}$. Then, the sequence $\{Q^s : s = 0, 1, 2, \ldots\}$ is bounded.

Proof. Since $p^0 = 0$, we have $R_j(p^0) = t_j^0 = 0$ for all $j$. Thus, $t_j^1 \geq t_j^0$ for every non-leaf node $j$.
By Corollary A.4.4, the scalars $(t_j^s : j \in V)$ are monotonically increasing in consecutive iterations of the PUPD algorithm. In particular, $\{t_{root}^s : s = 0, 1, \ldots\}$ is an increasing sequence. Since
\[
t_{root}^s \;=\; R_{root}(p^s) \;=\; \frac{\sum_{j \in Children(root)} R_j^s V_j^s}{v_0 + \sum_{j \in Children(root)} V_j^s},
\]
the sequence $\{t_{root}^s : s = 0, 1, \ldots\}$ is also bounded, because the no-purchase option $0$ prevents the revenue from becoming infinite. The boundedness implies that $\lim_{s \to \infty} t_{root}^s = \bar{t}_{root}$ for some $\bar{t}_{root} \in \mathbb{R}_+$. Let $\beta_{\min} = \min_{\ell = 1,\ldots,n} \beta_\ell$, $\gamma_{\min} = \min_{j \in \bar{V}} \gamma_j$, $A = \frac{1}{\beta_{\min}} + \bar{t}_{root}$, and $\rho = 1 - (\gamma_{\min})^{d-1}$. Note that $A$ is finite because $\beta_{\min} > 0$, and $0 \leq \rho < 1$ because $0 < \gamma_{\min} \leq 1$.

Claim: For all $s$, $Q^{s+1} \leq A + \rho\, Q^s$.

To establish the claim, first consider any non-leaf node $j$ in level $d-1$, so that $j$ is a parent of some leaf nodes. Since $R_j^{s+1}$ is a convex combination of the prices at the children nodes of $j$ in iteration $s+1$,
\[
R_j^{s+1} \;\leq\; \max_{\ell \in Children(j)} p_\ell^{s+1}
\;=\; \max_{\ell \in Children(j)} \frac{1}{\beta_\ell} + t_j^s
\;\leq\; \frac{1}{\beta_{\min}} + t_j^s
\;\leq\; \frac{1}{\beta_{\min}} + t_{root}^s + \big( 1 - (\gamma_{\min})^{d-1} \big) Q^s
\;\leq\; A + \rho\, Q^s,
\]
where the equality follows from the definition of the PUPD algorithm, the next-to-last inequality follows from Lemma A.4.1, and the final inequality follows from the fact that $t_{root}^s \leq \bar{t}_{root}$ for all $s$. Since the above upper bound holds for every node in level $d-1$, and the revenue at each node is a convex combination of the revenues at its children nodes, it follows that $R_j^{s+1} \leq A + \rho\, Q^s$ for all non-leaf nodes, including $root$. We will now show that $t_j^{s+1} \leq A + \rho\, Q^s$ for every non-leaf node $j$. This is true at $root$ because $t_{root}^{s+1} = R_{root}^{s+1}$. Consider any node $j$ in level 1. Then, by definition,
\[
t_j^{s+1} = \max\big\{ t_{root}^{s+1},\; \gamma_j t_{root}^{s+1} + (1-\gamma_j) R_j^{s+1} \big\} \;\leq\; A + \rho\, Q^s,
\]
where the inequality follows from the fact that both $t_{root}^{s+1}$ and $R_j^{s+1}$ are bounded above by $A + \rho\, Q^s$. Suppose the result is true for all non-leaf nodes in level $h-1$.
For any non-leaf node $j$ in level $h$,
\[
t_j^{s+1} = \max\big\{ t_{Parent(j)}^{s+1},\; \gamma_j t_{Parent(j)}^{s+1} + (1-\gamma_j) R_j^{s+1} \big\} \;\leq\; A + \rho\, Q^s,
\]
where the inequality follows from the inductive hypothesis and the fact that $R_j^{s+1} \leq A + \rho\, Q^s$ for all non-leaf nodes $j$. This completes the induction. Since $R_j^{s+1}$ and $t_j^{s+1}$ are bounded above by $A + \rho\, Q^s$ for all non-leaf nodes $j$, it follows that $Q^{s+1} \leq A + \rho\, Q^s$, establishing the claim. Repeated application of the claim implies that for all $s$,
\[
Q^s \;\leq\; A \big( 1 + \rho + \rho^2 + \cdots + \rho^{s-1} \big) + \rho^s\, Q^0 \;\leq\; \frac{A}{1 - \rho},
\]
where the last inequality follows because $Q^0 = 0$, since $p^0 = 0$. This is the desired result. □

Proof of Theorem 2.9.3

Proof. Since $p^0 = 0$, we have $R_j(p^0) = t_j^0 = 0$ for all $j$. Thus, $t_j^1 \geq t_j^0$ for every non-leaf node $j$. It then follows from Corollary A.4.4 that the scalars $(t_j^s : j \in V)$ are monotonically increasing in consecutive iterations of the PUPD algorithm. By Lemma A.4.5, the scalars $(t_j^s : j \in V)$ are also bounded. Thus, the sequence $\{(t_j^s : j \in V) : s = 0, 1, 2, \ldots\}$ must converge. Therefore, for every non-leaf node $j$, $\lim_{s \to \infty} t_j^s = \bar{t}_j$ for some $\bar{t}_j$. Since the prices in the PUPD algorithm are computed as $p_\ell^s = \frac{1}{\beta_\ell} + t_{Parent(\ell)}^{s-1}$, the prices must also converge. It remains to show that the limit point of the prices is a stationary point of the expected revenue function $R_{root}(\cdot)$. The way we compute the scalars $(t_j^s : j \in V)$ in the PUPD algorithm implies that $t_j^s \geq t_{Parent(j)}^s$, so that $t_j^s \geq t_{root}^s$ for every non-leaf node $j$ and in every iteration $s$. Thus, the prices in the PUPD algorithm satisfy $p_\ell^s = \frac{1}{\beta_\ell} + t_{Parent(\ell)}^{s-1} \geq \frac{1}{\beta_{\max}} + t_{root}^{s-1}$ for each leaf node $\ell$, where $\beta_{\max} = \max_{\ell = 1,\ldots,n} \beta_\ell$. Since the expected revenue at each node in $V \setminus \{root, 0\}$ can be written as a convex combination of the prices, we get $R_j^s \geq \frac{1}{\beta_{\max}} + t_{root}^{s-1}$ for every $j \in V \setminus \{root, 0\}$. By definition, for each $j \in Children(root)$,
\[
t_j^s - t_{root}^s \;=\; \max\big\{ 0,\; (1-\gamma_j)\,\big( R_j^s - t_{root}^s \big) \big\} \;\geq\; (1-\gamma_j)\, \Big( \frac{1}{\beta_{\max}} + t_{root}^{s-1} - t_{root}^s \Big)^+,
\]
where the last inequality follows from the fact that $R_j^s \geq \frac{1}{\beta_{\max}} + t_{root}^{s-1}$.
Since $t_{root}^s$ is convergent, $\lim_{s \to \infty} \big( t_{root}^{s-1} - t_{root}^s \big) = 0$. Therefore, there exists $T \geq 0$ such that for all $s \geq T$, $t_{root}^s - t_{root}^{s-1} \leq \frac{1}{2\beta_{\max}}$, which implies that $t_j^s - t_{root}^s \geq \frac{1-\gamma_j}{2\beta_{\max}} > 0$. Since $j \in Children(root)$ is arbitrary, this means that for all $s \geq T$ and for every child $j$ of $root$, $t_j^s > t_{root}^s$, and thus,
\[
t_j^s = \max\big\{ t_{root}^s,\; \gamma_j t_{root}^s + (1-\gamma_j) R_j^s \big\} = \gamma_j t_{root}^s + (1-\gamma_j) R_j^s.
\]
Similarly, consider an arbitrary node $j$ in the second level. Then,
\[
t_j^s - t_{Parent(j)}^s \;=\; \max\big\{ 0,\; (1-\gamma_j)\,\big( R_j^s - t_{Parent(j)}^s \big) \big\} \;\geq\; (1-\gamma_j)\, \Big( \frac{1}{\beta_{\max}} + t_{Parent(j)}^{s-1} - t_{Parent(j)}^s \Big)^+,
\]
where the inequality follows, once again, from the fact that $R_j^s \geq \frac{1}{\beta_{\max}} + t_{Parent(j)}^{s-1}$. Since $t_{Parent(j)}^s$ is convergent, $\lim_{s \to \infty} \big( t_{Parent(j)}^{s-1} - t_{Parent(j)}^s \big) = 0$. Then, using exactly the same argument as in the previous paragraph, there exists $T' \geq 0$ such that for all $s \geq T'$ and for all nodes $j$ in the second level, $t_j^s > t_{Parent(j)}^s$, and thus,
\[
t_j^s = \gamma_j t_{Parent(j)}^s + (1-\gamma_j) R_j^s.
\]
By iteratively applying exactly the same argument, we conclude that there exists $\bar{T} \geq 0$ such that for all $s \geq \bar{T}$ and for every non-leaf node $j$,
\[
t_j^s = \gamma_j t_{Parent(j)}^s + (1-\gamma_j) R_j^s.
\]
However, noting the definition of $u_j(p)$ at the beginning of Section 2.9.1, this means that for all $s \geq \bar{T}$ and for every non-leaf node $j$, $t_j^s = u_j(p^s)$. Therefore, for all $s \geq \bar{T}$, the sequence of prices at the leaf nodes is defined by: for $\ell = 1, \ldots, n$,
\[
p_\ell^{s+1} = \frac{1}{\beta_\ell} + u_{Parent(\ell)}(p^s).
\]
So, since the prices converge, the limit point $\bar{p} = (\bar{p}_1, \ldots, \bar{p}_n)$ satisfies the following system of equations: for $\ell = 1, \ldots, n$,
\[
\bar{p}_\ell = \frac{1}{\beta_\ell} + u_{Parent(\ell)}(\bar{p}),
\]
and it follows from Corollary 2.9.2 that $\bar{p}$ is a stationary point of the expected revenue function. □

A.4.3 Proof of Theorem 2.9.4

Previously, we have shown that for each leaf node $\ell$,
\[
\frac{\partial R_{root}}{\partial p_\ell}(p) = -\beta_\ell\, \pi_\ell^{root}(p)\, \Big( p_\ell - \frac{1}{\beta_\ell} - u_{Parent(\ell)}(p) \Big).
\]
Thus, it suffices to derive the partial derivative of the cost function. The following lemma gives the partial derivative $\frac{\partial \pi_i^{root}(p)}{\partial p_\ell}$.
Lemma A.4.6 (Partial Derivative of Probabilities). For each product $i$ and each leaf node $\ell$,
\[
\frac{\partial \pi_i^{root}(p)}{\partial p_\ell} \;=\; \pi_i^{root}(p) \Bigg\{ \sum_{s=1}^{\bar{s}_{i,\ell}} \frac{\partial V_{An(\ell,s)}(p)}{\partial p_\ell} \Bigg( \frac{1}{V_{An(\ell,s)}(p)} - \frac{1}{\sum_{k \in Sibling(An(\ell,s))} V_k(p)} \Bigg) - \frac{\partial V_{An(\ell,\bar{s}_{i,\ell}+1)}(p) / \partial p_\ell}{\sum_{k \in Sibling(An(\ell,\bar{s}_{i,\ell}+1))} V_k(p)} \Bigg\},
\]
where $\bar{s}_{i,\ell} = \max\{ h : An(i,h) = An(\ell,h) \}$. Also, we define $V_{An(\ell,d+1)}(\cdot) \equiv 0$.

Proof. Fix $\ell$ and consider an arbitrary $i$. By definition,
\[
\frac{\partial \pi_i^{root}(p)}{\partial p_\ell}
= \sum_{s=1}^{d} \Bigg( \prod_{h : 1 \leq h \leq d,\, h \neq s} \frac{V_{An(i,h)}(p)}{\sum_{k \in Sibling(An(i,h))} V_k(p)} \Bigg) \frac{\partial}{\partial p_\ell} \Bigg( \frac{V_{An(i,s)}(p)}{\sum_{k \in Sibling(An(i,s))} V_k(p)} \Bigg)
\]
\[
= \sum_{s=1}^{d} \Bigg\{ \Bigg( \prod_{h : 1 \leq h \leq d,\, h \neq s} \frac{V_{An(i,h)}(p)}{\sum_{k \in Sibling(An(i,h))} V_k(p)} \Bigg) \frac{\frac{\partial V_{An(i,s)}(p)}{\partial p_\ell} \sum_{k \in Sibling(An(i,s))} V_k(p) - V_{An(i,s)}(p) \sum_{k \in Sibling(An(i,s))} \frac{\partial V_k(p)}{\partial p_\ell}}{\big( \sum_{k \in Sibling(An(i,s))} V_k(p) \big)^2} \Bigg\}.
\]
To compute the expression on the right-hand side, there are three cases to consider.

Case 1: $An(\ell,s) = An(i,s)$. In this case,
\[
\frac{\frac{\partial V_{An(i,s)}(p)}{\partial p_\ell} \sum_{k \in Sibling(An(i,s))} V_k(p) - V_{An(i,s)}(p) \sum_{k \in Sibling(An(i,s))} \frac{\partial V_k(p)}{\partial p_\ell}}{\big( \sum_{k \in Sibling(An(i,s))} V_k(p) \big)^2}
= \frac{\frac{\partial V_{An(\ell,s)}(p)}{\partial p_\ell}}{\sum_{k \in Sibling(An(i,s))} V_k(p)} \Bigg( 1 - \frac{V_{An(i,s)}(p)}{\sum_{k \in Sibling(An(i,s))} V_k(p)} \Bigg)
= \frac{\frac{\partial V_{An(\ell,s)}(p)}{\partial p_\ell}}{\sum_{k \in Sibling(An(i,s))} V_k(p)} - \frac{\frac{\partial V_{An(\ell,s)}(p)}{\partial p_\ell}}{\sum_{k \in Sibling(An(i,s))} V_k(p)} \cdot \frac{V_{An(i,s)}(p)}{\sum_{k \in Sibling(An(i,s))} V_k(p)},
\]
where the first equality uses the fact that the only term in $\sum_{k \in Sibling(An(i,s))} \frac{\partial V_k(p)}{\partial p_\ell}$ that depends on $p_\ell$ is the one with $k = An(i,s) = An(\ell,s)$.

Case 2: $An(\ell,s) \neq An(i,s)$, but $An(\ell,s) \in Sibling(An(i,s))$. In this case, $\frac{\partial V_{An(i,s)}(p)}{\partial p_\ell} = 0$, so
\[
\frac{\frac{\partial V_{An(i,s)}(p)}{\partial p_\ell} \sum_{k \in Sibling(An(i,s))} V_k(p) - V_{An(i,s)}(p) \sum_{k \in Sibling(An(i,s))} \frac{\partial V_k(p)}{\partial p_\ell}}{\big( \sum_{k \in Sibling(An(i,s))} V_k(p) \big)^2}
= \frac{- V_{An(i,s)}(p)\, \frac{\partial V_{An(\ell,s)}(p)}{\partial p_\ell}}{\big( \sum_{k \in Sibling(An(i,s))} V_k(p) \big)^2}
= - \frac{\frac{\partial V_{An(\ell,s)}(p)}{\partial p_\ell}}{\sum_{k \in Sibling(An(i,s))} V_k(p)} \cdot \frac{V_{An(i,s)}(p)}{\sum_{k \in Sibling(An(i,s))} V_k(p)}.
\]

Case 3: $An(\ell,s) \neq An(i,s)$ and $An(\ell,s) \notin Sibling(An(i,s))$. In this case,
\[
\frac{\frac{\partial V_{An(i,s)}(p)}{\partial p_\ell} \sum_{k \in Sibling(An(i,s))} V_k(p) - V_{An(i,s)}(p) \sum_{k \in Sibling(An(i,s))} \frac{\partial V_k(p)}{\partial p_\ell}}{\big( \sum_{k \in Sibling(An(i,s))} V_k(p) \big)^2} = 0.
\]
Let $\bar{s}_{i,\ell} = \max\{ h : An(i,h) = An(\ell,h) \}$.
Note that $\bar{s}_{i,\ell}$ is always well-defined and $0 \leq \bar{s}_{i,\ell} \leq d$ because $root$ is a common ancestor of both $i$ and $\ell$. Note that if $h \leq \bar{s}_{i,\ell}$, then $An(i,h) = An(\ell,h)$. Also, we have $An(\ell, \bar{s}_{i,\ell}+1) \in Sibling(An(i, \bar{s}_{i,\ell}+1))$, while for all $h > \bar{s}_{i,\ell}+1$, $An(\ell,h) \notin Sibling(An(i,h))$. Applying the three cases above term by term, splitting the two pieces produced by Case 1, and recognizing $\pi_i^{root}(p) = \prod_{h=1}^{d} V_{An(i,h)}(p) \big/ \sum_{k \in Sibling(An(i,h))} V_k(p)$ in each resulting term, this implies that
\[
\frac{\partial \pi_i^{root}(p)}{\partial p_\ell}
= \sum_{s=1}^{\bar{s}_{i,\ell}} \frac{\frac{\partial V_{An(\ell,s)}(p)}{\partial p_\ell}}{V_{An(\ell,s)}(p)}\, \pi_i^{root}(p)
- \sum_{s=1}^{\bar{s}_{i,\ell}} \frac{\frac{\partial V_{An(\ell,s)}(p)}{\partial p_\ell}}{\sum_{k \in Sibling(An(i,s))} V_k(p)}\, \pi_i^{root}(p)
- \frac{\frac{\partial V_{An(\ell,\bar{s}_{i,\ell}+1)}(p)}{\partial p_\ell}}{\sum_{k \in Sibling(An(i,\bar{s}_{i,\ell}+1))} V_k(p)}\, \pi_i^{root}(p)
\]
\[
= \pi_i^{root}(p) \Bigg\{ \sum_{s=1}^{\bar{s}_{i,\ell}} \frac{\partial V_{An(\ell,s)}(p)}{\partial p_\ell} \Bigg( \frac{1}{V_{An(\ell,s)}(p)} - \frac{1}{\sum_{k \in Sibling(An(\ell,s))} V_k(p)} \Bigg) - \frac{\partial V_{An(\ell,\bar{s}_{i,\ell}+1)}(p) / \partial p_\ell}{\sum_{k \in Sibling(An(\ell,\bar{s}_{i,\ell}+1))} V_k(p)} \Bigg\}. \qquad \Box
\]

The following lemma gives an explicit formula for the partial derivative $\frac{\partial V_{An(\ell,s)}(p)}{\partial p_\ell}$.

Lemma A.4.7 (Derivative of Preference Weight). For any $1 \leq s \leq d$ and product $\ell$ (using the convention that $\gamma_{An(\ell,d)} = \gamma_\ell = 1$ for leaf nodes, and writing $\pi_\ell^{An(\ell,s)}(p) = \prod_{h=s+1}^{d} V_{An(\ell,h)}(p) \big/ \sum_{k \in Sibling(An(\ell,h))} V_k(p)$ for the probability that a customer at node $An(\ell,s)$ chooses product $\ell$),
\[
\frac{\partial V_{An(\ell,s)}}{\partial p_\ell}
= -\beta_\ell V_\ell \prod_{h=s}^{d-1} \Big( \gamma_{An(\ell,h)}\, V_{An(\ell,h)}^{\,1 - 1/\gamma_{An(\ell,h)}} \Big)
= -\beta_\ell V_\ell \prod_{h=s}^{d-1} \Bigg( \gamma_{An(\ell,h)}\, \frac{V_{An(\ell,h)}}{\sum_{k \in Children(An(\ell,h))} V_k} \Bigg)
\]
\[
= -\beta_\ell \Bigg\{ \prod_{h=s+1}^{d} \Bigg( \gamma_{An(\ell,h)}\, \frac{V_{An(\ell,h)}}{\sum_{k \in Sibling(An(\ell,h))} V_k} \Bigg) \Bigg\}\, \gamma_{An(\ell,s)}\, V_{An(\ell,s)}
= -\beta_\ell \Bigg\{ \prod_{h=s+1}^{d} \gamma_{An(\ell,h)} \Bigg\}\, \pi_\ell^{An(\ell,s)}(p)\, \gamma_{An(\ell,s)}\, V_{An(\ell,s)}
= -\beta_\ell \Bigg\{ \prod_{h=s}^{d} \gamma_{An(\ell,h)} \Bigg\}\, \pi_\ell^{An(\ell,s)}(p)\, V_{An(\ell,s)}.
\]

Proof.
We prove the above result by induction. When $s = d$,
\[
\frac{\partial V_{An(\ell,d)}}{\partial p_\ell} = \frac{\partial V_\ell}{\partial p_\ell} = \frac{\partial \exp(\alpha_\ell - \beta_\ell p_\ell)}{\partial p_\ell} = -\beta_\ell V_\ell,
\]
which agrees with the right side because the product $\prod_{h=d}^{d-1} \gamma_{An(\ell,h)}\, V_{An(\ell,h)}^{\,1 - 1/\gamma_{An(\ell,h)}}$ is empty and equals 1. Thus, the result is true in the base case. Now suppose the result is true for $s = k+1$; that is,
\[
\frac{\partial V_{An(\ell,k+1)}}{\partial p_\ell} = -\beta_\ell V_\ell \prod_{h=k+1}^{d-1} \Big( \gamma_{An(\ell,h)}\, V_{An(\ell,h)}^{\,1 - 1/\gamma_{An(\ell,h)}} \Big).
\]
Consider the case $s = k$. By definition, $V_{An(\ell,k)} = \big( \sum_{r \in Children(An(\ell,k))} V_r \big)^{\gamma_{An(\ell,k)}}$. Thus,
\[
\frac{\partial V_{An(\ell,k)}}{\partial p_\ell}
= \gamma_{An(\ell,k)} \Bigg( \sum_{r \in Children(An(\ell,k))} V_r \Bigg)^{\gamma_{An(\ell,k)} - 1} \Bigg( \sum_{r \in Children(An(\ell,k))} \frac{\partial V_r}{\partial p_\ell} \Bigg)
= \gamma_{An(\ell,k)} \Bigg( \sum_{r \in Children(An(\ell,k))} V_r \Bigg)^{\gamma_{An(\ell,k)} - 1} \frac{\partial V_{An(\ell,k+1)}}{\partial p_\ell}
\]
\[
= \gamma_{An(\ell,k)}\, V_{An(\ell,k)}^{\,1 - 1/\gamma_{An(\ell,k)}} \Bigg( -\beta_\ell V_\ell \prod_{h=k+1}^{d-1} \gamma_{An(\ell,h)}\, V_{An(\ell,h)}^{\,1 - 1/\gamma_{An(\ell,h)}} \Bigg)
= -\beta_\ell V_\ell \prod_{h=k}^{d-1} \Big( \gamma_{An(\ell,h)}\, V_{An(\ell,h)}^{\,1 - 1/\gamma_{An(\ell,h)}} \Big),
\]
where the second equality follows because $\frac{\partial V_r}{\partial p_\ell} = 0$ if $r$ is not an ancestor of leaf node $\ell$, and the third equality follows from $V_{An(\ell,k)} = \big( \sum_{r \in Children(An(\ell,k))} V_r \big)^{\gamma_{An(\ell,k)}}$ together with the inductive hypothesis. Since the result holds for $s = k$ whenever it holds for $s = k+1$, it follows by induction that the result holds for all $s$. □

Proof of Theorem 2.9.4:

Proof. The above lemma implies the following expressions for the partial derivative.

Case 1: $i \neq \ell$. In this case, we have $\bar{s}_{i,\ell} \leq d-1$, and
\[
\frac{\partial \pi_i^{root}(p)}{\partial p_\ell}
= \beta_\ell\, \pi_i^{root}(p) \Bigg\{ \Bigg( \prod_{h=\bar{s}_{i,\ell}+1}^{d} \gamma_{An(\ell,h)} \Bigg) \pi_\ell^{An(\ell,\bar{s}_{i,\ell})}(p)
- \sum_{s=1}^{\bar{s}_{i,\ell}} \Bigg( \prod_{h=s}^{d} \gamma_{An(\ell,h)} \Bigg) \Big( \pi_\ell^{An(\ell,s)}(p) - \pi_\ell^{An(\ell,s-1)}(p) \Big) \Bigg\}
\]
\[
= \beta_\ell\, \pi_i^{root}(p) \Bigg\{ \Bigg( \prod_{h=\bar{s}_{i,\ell}+1}^{d} \gamma_{An(\ell,h)} \Bigg) \pi_\ell^{An(\ell,\bar{s}_{i,\ell})}(p)
+ \sum_{s=1}^{\bar{s}_{i,\ell}} \Bigg( \prod_{h=s}^{d} \gamma_{An(\ell,h)} \Bigg) \Big( \pi_\ell^{An(\ell,s-1)}(p) - \pi_\ell^{An(\ell,s)}(p) \Big) \Bigg\}
\]
\[
= \beta_\ell\, \pi_i^{root}(p) \Bigg\{ \sum_{s=1}^{\bar{s}_{i,\ell}} \pi_\ell^{An(\ell,s)}(p) \Bigg( \prod_{h=s+1}^{d} \gamma_{An(\ell,h)} \Bigg) \big( 1 - \gamma_{An(\ell,s)} \big) + \Bigg( \prod_{h=1}^{d} \gamma_{An(\ell,h)} \Bigg) \pi_\ell^{root}(p) \Bigg\}
= \beta_\ell\, \pi_i^{root}(p) \sum_{s=0}^{\bar{s}_{i,\ell}} \pi_\ell^{An(\ell,s)}(p) \Bigg( \prod_{h=s+1}^{d} \gamma_{An(\ell,h)} \Bigg) \big( 1 - \gamma_{An(\ell,s)} \big)
\]
\[
= \beta_\ell\, \pi_\ell^{root}(p)\, \pi_i^{root}(p) \sum_{s=0}^{\bar{s}_{i,\ell}} \frac{\prod_{h=s+1}^{d} \gamma_{An(\ell,h)}}{\pi_{An(\ell,s)}^{root}(p)} \big( 1 - \gamma_{An(\ell,s)} \big)
= \beta_\ell\, \pi_\ell^{root}(p)\, \pi_i^{root}(p)\, v_{i,\ell}(p).
\]

Case 2: $i = \ell$.
In this case, we have that $\bar{s}_{i,\ell} = d$. Proceeding as in Case 1, but keeping the additional term that arises because the numerator $V_{\mathrm{An}(\ell,d)} = V_\ell$ now also depends on $p_\ell$, substituting the expression from Lemma A.4.7 and rearranging, we again arrive at
\[ \frac{\partial \pi_i^{\mathrm{root}}(p)}{\partial p_\ell} = \beta_\ell \, \pi_\ell^{\mathrm{root}}(p) \, \pi_i^{\mathrm{root}}(p) \, v_{i,\ell}(p), \]
with $v_{i,\ell}(p)$ as defined in the statement of Theorem 2.9.4. So, we have that
\[
\frac{\partial \Pi(p)}{\partial p_\ell}
= -\beta_\ell \, \pi_\ell^{\mathrm{root}}(p) \left( p_\ell - \frac{1}{\beta_\ell} - u_{\mathrm{Parent}(\ell)}(p) \right) - \sum_{i=1}^{n} C_i'(\pi_i^{\mathrm{root}}(p)) \frac{\partial \pi_i^{\mathrm{root}}(p)}{\partial p_\ell}
= -\beta_\ell \, \pi_\ell^{\mathrm{root}}(p) \left( p_\ell - \frac{1}{\beta_\ell} - u_{\mathrm{Parent}(\ell)}(p) + \sum_{i=1}^{n} C_i'(\pi_i^{\mathrm{root}}(p)) \, \pi_i^{\mathrm{root}}(p) \, v_{i,\ell}(p) \right),
\]
which is the desired result. $\Box$

A.5 Additional Numerical Results for Price Optimization

In this section, we provide additional numerical results for price optimization problems. In our first set of results, we focus on the test problems with three levels in the tree and test the performance of PUPD and GA with different initial starting points. Following these numerical results, we focus on test problems with two levels in the tree. When there are two levels in the tree, Gallego and Wang (2013) give a set of conditions that need to be satisfied by the optimal price vector. Therefore, for a tree with two levels, we can numerically look for a price vector that satisfies the optimality conditions in Gallego and Wang (2013).

Problem Instances with Three Levels in the Tree: Our results in Section 2.10 in the paper show that PUPD performs significantly better than GA, but gradient ascent methods can provide more satisfactory performance when the initial vector of prices is close to a stationary point of the expected revenue function. To check the performance of GA with such informed initial prices, for each problem instance, we compute a stationary point of the expected revenue function by using GA.
Then, we perturb these prices by 2.5% to obtain prices that are close to a stationary point of the expected revenue function, and we use these prices as a vector of informed initial prices. In Table A.1, we show the performance of GA and PUPD when we start these algorithms from the informed initial prices. The format of this table is identical to that of Table 2.5. The results in Table A.1 indicate that PUPD continues to perform significantly better than GA, both in terms of the number of iterations and the running time to reach the stopping criterion. The ratios between the iterations and the running times never fall below one, indicating that PUPD provides improvements over GA for all of the problem instances.

We note that other alternative implementations of the gradient ascent method are not likely to be more effective than PUPD. In particular, noting Lemma 2.9.1, the computational work per iteration for PUPD is no larger than the computational work required to compute the gradient of the expected revenue function at a particular price vector. Thus, given that a gradient ascent method needs to compute the gradient of the expected revenue function in each iteration, PUPD and GA have comparable computational work per iteration even if we could compute the optimal step size for GA instantaneously. Since PUPD reaches the stopping criterion in substantially fewer iterations than GA, it is likely to maintain its performance advantage over GA for any alternative implementation of the gradient ascent method.

                 Avg. No. Itns.      Ratio Between          Ratio Between
  Prb. Class                           No. Itns.             Run. Times
 m0  m1  m2     PUPD       GA      Avg.   Min   Max      Avg.   Min   Max
  2   2   2       83    1,797        15     2   195        12     2   153
  2   2   4       72    3,059        33     5   265        28     3   295
  2   2   6       57    3,882        50     9   285        41     6   317
  2   4   2       82    4,702        52     4   346        40     4   274
  2   4   4       66    4,628        63    11   265        53     9   234
  2   4   6       58    5,878        93    19   383        78    13   423
  2   6   2       83    5,445        60     5   248        49     4   257
  2   6   4       72    6,925        95    18   361        80    12   414
  2   6   6       62    8,429       138    29   517       112    25   383
  4   2   2      107    5,722        46     3   302        39     3   267
  4   2   4       81    6,126        69    11   276        59     8   269
  4   2   6       70    8,132       110    19   362        93    17   281
  4   4   2       89    8,580        94    12   334        81    11   300
  4   4   4       67    9,855       145    36   362       120    22   298
  4   4   6       55   10,407       189    41   393       145    26   427
  4   6   2       80   10,878       141    18   407       120    15   345
  4   6   4       61   12,940       222    68   490       192    47   543
  4   6   6       51   14,542       298    90   607       255    82   642
  6   2   2      106    7,651        72     7   320        61     5   220
  6   2   4       79    9,371       118    16   515        99    12   364
  6   2   6       64    9,314       146    27   493       120    21   364
  6   4   2       82   12,590       156    24   345       135    19   389
  6   4   4       61   13,263       226    58   477       190    49   492
  6   4   6       51   14,717       298   128   699       262   106   634
  6   6   2       70   14,098       212    72   459       186    59   444
  6   6   4       53   16,136       321   126   709       278    98   628
  6   6   6       45   17,124       411   148   762       362   113   837

Table A.1: Performance comparison between PUPD and GA with informed initial prices.

Problem Instances with Two Levels in the Tree: Our experimental setup for the test problems with two levels is similar to the one for the test problems with three levels. Using $m_h$ to denote the number of children of each node at depth $h$, we vary $(m_0, m_1) \in \{2, 4, 8\} \times \{2, 4, 8\}$, yielding nine problem classes. In each problem class, we randomly generate 200 individual problem instances by using the same sampling strategy that we use for the test problems with three levels. For the two-level nested logit model, Gallego and Wang (2013) give a set of conditions that need to be satisfied by the optimal price vector. These conditions come in the form of an implicit set of equations. We use golden section search to numerically find a solution to this set of conditions. We use OC to refer to the solution strategy that finds a solution to the optimality conditions of Gallego and Wang (2013). We stop the golden section search for OC when we reach a price vector $p^s$ at which the gradient of the expected revenue function satisfies $\|\nabla R^{\mathrm{root}}(p^s)\|_2 \le 10^{-6}$. We use the same stopping criterion for PUPD. We summarize our results in Table A.2.
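To make the OC solution strategy concrete, the following is a minimal sketch of golden-section search, the one-dimensional routine mentioned above. The objective here (a single-product logit expected revenue) and all parameter values are hypothetical stand-ins for the implicit optimality-condition equations of Gallego and Wang (2013), which are not reproduced in this appendix.

```python
import math

def golden_section_max(f, lo, hi, tol=1e-8):
    # Golden-section search for the maximizer of a unimodal function on [lo, hi]:
    # keep two interior points in the golden ratio and discard the worse end.
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) > f(d):
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:
            a, c = c, d
            d = a + inv_phi * (b - a)
    return 0.5 * (a + b)

# A single-product logit expected revenue R(p) = p e^(1-p) / (1 + e^(1-p)),
# which is unimodal in p, serves as a stand-in objective; at the returned
# point the first-order optimality condition holds approximately.
rev = lambda p: p * math.exp(1.0 - p) / (1.0 + math.exp(1.0 - p))
p_star = golden_section_max(rev, 0.0, 10.0)
```

The interval shrinks by a constant factor per iteration, so reaching a tolerance of $10^{-6}$ or tighter, as in the stopping criterion above, takes only a few dozen function evaluations.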
The first two columns in Table A.2 show the parameters $(m_0, m_1)$ of each problem class. The third, fourth and fifth columns give various statistics regarding the ratio between the running times of OC and PUPD. In particular, these three columns give the average, minimum and maximum of the ratios of the running times of OC and PUPD, where the average, minimum and maximum are computed over the 200 problem instances in a particular problem class.

  Prb. Class    Ratio Between Run. Times
  m0   m1        Avg.    Min     Max
   2    2         9.6    0.8    42.8
   2    4         9.2    1.0    55
   2    8        11.4    1.2    72.8
   4    2        10.7    1.2    49.6
   4    4        14.1    1.1    65.3
   4    8        19.7    2.5    93.5
   8    2        15.1    2.9    68.6
   8    4        23.7    4.2   196.1
   8    8        34.8    5.1   163.5

Table A.2: Performance comparison between PUPD and OC.

The results in Table A.2 indicate that the average running times for PUPD are substantially smaller than those of OC. There are problem instances where the ratio of the running times is below one, indicating that OC runs faster than PUPD for these problem instances, but such problem instances correspond to cases where the number of products is small. In larger problem instances, PUPD can be more than 160 times faster than OC. Although we do not report this statistic in Table A.2, over the total of 1,800 problem instances we work with, the running time of PUPD is faster than that of OC in more than 99.4% of them. The ratios of the running times show an increasing trend as the problem size, measured by the number of products $m_0 m_1$, increases, which indicates that the running time advantage of PUPD becomes more pronounced as the number of products increases. In principle, PUPD and OC can converge to different stationary points of the expected revenue function, and this expectation was confirmed in our numerical experiments. Nevertheless, over all of our problem instances, the expected revenues at the stationary points obtained by PUPD and OC are essentially equal.
Overall, our experiments demonstrate that PUPD performs remarkably well when compared to both GA and OC.

A.6 Proof of Theorem 2.11.1

It suffices to consider the case where we offer the full assortment of products $S = \{1, 2, \ldots, n\}$. Without loss of generality, we can ignore the no-purchase option and treat one of the products as the no-purchase option. For each non-leaf node $j$, define the scalar $\gamma_j$ by $\gamma_j = \gamma_{\mathrm{Parent}(j)} \, \eta_j$ with the convention that $\gamma_{\mathrm{root}} = 1$, and set $\gamma_j = \gamma_{\mathrm{Parent}(j)}$ for each leaf node $j$. Let the function $G(\cdot) : \mathbb{R}^n_+ \to \mathbb{R}_+$ be defined as follows: for each $y = (y_1, y_2, \ldots, y_n) \in \mathbb{R}^n_+$,
\[ G(y) = \sum_{j \in \mathrm{Children}(\mathrm{root})} Y_j(y), \]
where for each non-leaf node $j$, $Y_j(y) = \big( \sum_{k \in \mathrm{Children}(j)} Y_k(y)^{1/\gamma_j} \big)^{\gamma_j}$, and if $j$ is a leaf node, then $Y_j(y) = y_j$. Note that for each non-leaf node $j$, $Y_j(y)$ is a function of $y_\ell$ only for the leaf nodes $\ell$ that are descendants of $j$. We also have that $G(y) = Y_{\mathrm{root}}(y)$ because $\gamma_{\mathrm{root}} = 1$. We will show that $G(\cdot)$ satisfies the following properties:

(a) $G(y) \ge 0$ for all $y \in \mathbb{R}^n_+$.
(b) $G(\cdot)$ is homogeneous of degree one; that is, $G(\lambda y) = \lambda G(y)$ for all $\lambda \ge 0$.
(c) The function $G(\cdot)$ is positive whenever its arguments are positive.
(d) For each set of distinct indices $\{i_1, \ldots, i_k\} \subseteq \{1, 2, \ldots, n\}$, the cross-partial derivative $\frac{\partial^k G}{\partial y_{i_1} \partial y_{i_2} \cdots \partial y_{i_k}}(y)$ is non-negative if $k$ is odd, and non-positive if $k$ is even.

Then, it follows from Theorem 1 in McFadden (1978) that, for each product $\ell$, if the selection probability of this product is given by
\[ \frac{y_\ell}{G(y)} \frac{\partial G}{\partial y_\ell}(y) \Big|_{y = (e^{\mu_1}, \ldots, e^{\mu_n})}, \]
then we obtain a choice model that is consistent with utility maximization, where the random utility of each product $\ell$ is given by $\mathrm{Utility}_\ell = \mu_\ell + \varepsilon_\ell$, and $(\varepsilon_1, \ldots, \varepsilon_n)$ has a multivariate extreme value distribution.
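The recursive aggregator $G$ just defined is easy to evaluate and probe numerically. The sketch below uses an ad hoc nested-tuple tree encoding (an assumption of this illustration, not part of the original text) and checks properties (a) and (b) on a small two-nest tree with hypothetical $\gamma$ values.

```python
def Y(node, y):
    # node is ('leaf', index) or ('nest', gamma, children); computes Y_j(y).
    if node[0] == 'leaf':
        return y[node[1]]
    _, gamma, children = node
    return sum(Y(c, y) ** (1.0 / gamma) for c in children) ** gamma

def G(root_children, y):
    # G(y) = sum over the children of the root of Y_j(y), using gamma_root = 1.
    return sum(Y(c, y) for c in root_children)

# Two nests over four leaves with hypothetical gamma parameters in (0, 1].
tree = [('nest', 0.5, [('leaf', 0), ('leaf', 1)]),
        ('nest', 0.8, [('leaf', 2), ('leaf', 3)])]
y = [1.0, 2.0, 0.5, 1.5]

# Property (a): non-negativity; property (b): homogeneity of degree one.
t = 3.7
lhs = G(tree, [t * v for v in y])
rhs = t * G(tree, y)
print(G(tree, y) >= 0.0, abs(lhs - rhs) < 1e-9)
```

Homogeneity holds exactly here because scaling every leaf by $t$ scales each nest value $Y_j$ by $t$ as well, which is the recursive argument behind property (b).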
To complete the proof, in the rest of this section, we will show that the selection probability of product $\ell$ is indeed equal to the above expression for appropriately chosen $(\mu_1, \ldots, \mu_n)$; in particular, we will show that
\[ \prod_{h=1}^{d} \frac{V_{\mathrm{An}(\ell,h)}(S_{\mathrm{An}(\ell,h)})}{\sum_{k \in \mathrm{Sibling}(\mathrm{An}(\ell,h))} V_k(S_k)} = \frac{y_\ell}{G(y)} \frac{\partial G}{\partial y_\ell}(y) \Big|_{y = (v_1^{\gamma_1}, \ldots, v_n^{\gamma_n})}. \]
Recall that $\gamma_j = \gamma_{\mathrm{Parent}(j)} \, \eta_j$ for each non-leaf node $j$, with the convention that $\gamma_{\mathrm{root}} = 1$, and $\gamma_j = \gamma_{\mathrm{Parent}(j)}$ for each leaf node $j$. We need two lemmas before completing the proof of Theorem 2.11.1. The following lemma gives an expression for the derivative of $Y_j(y)$ with respect to $y_\ell$.

Lemma A.6.1. Assume that $j$ is a non-leaf node and $\ell$ is a leaf node that is a descendant of node $j$. Then, we have
\[ \frac{\partial Y_j}{\partial y_\ell}(y) = Y_j(y)^{(\gamma_j - 1)/\gamma_j} \left[ \prod_{v \in \mathrm{path}(j,\ell)} Y_v(y)^{(\eta_v - 1)/\gamma_v} \right] y_\ell^{(1-\gamma_\ell)/\gamma_\ell}, \]
where $\mathrm{path}(j,\ell)$ is the set of nodes on the path from node $j$ to node $\ell$, excluding the end nodes $j$ and $\ell$.

Proof.
In this case, recallingY j (y) = P k2 Children(j) Y k (y) 1= j j , we get @Y j @y ` (y) = j 0 @ X k2 Children(j) Y k (y) 1= j 1 A j 1 1 j Y j h+1 (y) (1 j )= j @Y j h+1 @y ` (y) = Y j (y) ( j 1)= j Y j h+1 (y) (1 j )= j Y j h+1 (y) ( j h+1 1)= j h+1 2 4 Y v2path(j h+1 ;`) Y v (y) (v1)=v 3 5 y (1 ` )= ` ` = Y j (y) ( j 1)= j Y j h+1 (y) ( j h+1 1)= j h+1 2 4 Y v2path(j h+1 ;`) Y v (y) (v1)=v 3 5 y (1 ` )= ` ` = Y j (y) ( j 1)= j 2 4 Y v2path(j;`) Y v (y) (v1)=v 3 5 y (1 ` )= ` ` ; where the second equality is by the induction assumption and the third equality follows by noting that 1 j j + j h+1 1 j h+1 = 1 j h+1 j h+1 j 1 = 1 j h+1 j h+1 j h 1 = 1 j h+1 j h+1 1 . 2 2 The following lemma shows a relationship between the preference weights (V j (N j ) :j2V) and (Y j (y) :j2V), when (Y j (y) :j2V) are evaluated aty = (v 1 1 ;:::;v n n ). Lemma A.6.2. Letting ^ y = (v 1 1 ;:::;v n n ), for each non-leaf nodej, we have V j (N j ) =Y j (^ y) j = j : 107 Proof. We show the result by using induction on the depth of nodej. If nodej is at depthd 1, then we get V j (N j ) = 0 @ X k2 Children(j) v k 1 A j = 0 @ X k2 Children(j) v k = j k 1 A j = 0 @ X k2 Children(j) ^ y 1= j k 1 A j =Y j (^ y) j = j ; where the second equality follows from the fact that k = Parent(k) = j whenk is a leaf node, and the last equality follows from the definition ofY j (^ y). Suppose the result holds for nodes at depthh+1 and consider nodej at depthh. We have V j (N j ) = 0 @ X k2 Children(j) V k (N k ) 1 A j = 0 @ X k2 Children(j) Y k (^ y) k = k 1 A j = 0 @ X k2 Children(j) Y k (^ y) 1= j 1 A j =Y j (^ y) j = j ; where the second equality is by the induction assumption and the third equality follows from the fact that k = k = Parent(k) = k = j . 2 2 Finally, here is the rest of the proof of Theorem 2.11.1. Proof. Proof of Theorem 2.11.1: Properties (a), (b) and (c) in the proof of Theorem follow immediately. We will first focus on property (d). 
Let $\bar{\mathcal{V}} = \mathcal{V} \setminus \{1, 2, \ldots, n, \mathrm{root}\}$ denote the set of non-leaf nodes in the tree, after excluding the root node. For part (d), it suffices to show that for $k = 1, \ldots, n$,
\[ \frac{\partial^k Y_{\mathrm{root}}}{\partial y_k \partial y_{k-1} \cdots \partial y_2 \partial y_1}(y) = \left[ \prod_{j=1}^{k} y_j^{(1-\gamma_j)/\gamma_j} \right] \left[ \sum_{s \in I_k} c_s(k) \prod_{v \in \bar{\mathcal{V}}} Y_v(y)^{d_s(k,v)} \right], \]
where $I_k$ is some index set and the exponents satisfy $d_s(k,v) \le 0$ for all $s \in I_k$ and $v \in \bar{\mathcal{V}}$. Most importantly, the coefficients $c_s(k)$ have the following property: if $k$ is odd, then $c_s(k) \ge 0$ for all $s \in I_k$, and if $k$ is even, then $c_s(k) \le 0$ for all $s \in I_k$. Since $Y_v(y) \ge 0$ for all $y$, the properties of the coefficients give the desired result.

We will prove the above equality by induction on $k$. Consider the base case of $k = 1$. Since $\gamma_{\mathrm{root}} = 1$, it follows from Lemma A.6.1 that
\[ \frac{\partial Y_{\mathrm{root}}}{\partial y_1}(y) = \left[ \prod_{v \in \mathrm{path}(\mathrm{root},1)} Y_v(y)^{(\eta_v - 1)/\gamma_v} \right] y_1^{(1-\gamma_1)/\gamma_1} = y_1^{(1-\gamma_1)/\gamma_1} \left[ \prod_{v \in \bar{\mathcal{V}}} Y_v(y)^{d_1(1,v)} \right], \]
where we let
\[ d_1(1,v) = \begin{cases} (\eta_v - 1)/\gamma_v & \text{if } v \in \mathrm{path}(\mathrm{root}, 1) \\ 0 & \text{otherwise.} \end{cases} \]
Note that the exponent $d_1(1,v)$ is always non-positive because $\eta_v \le 1$. Therefore, the expression for $\partial Y_{\mathrm{root}} / \partial y_1(y)$ has the desired form with $I_1 = \{1\}$ and $c_1(1) = 1$. This establishes the base case. Suppose the claim is true for some $k$.
It follows from the inductive hypothesis and the product rule of differentiation that
\[
\frac{\partial^{k+1} Y_{\mathrm{root}}}{\partial y_{k+1} \partial y_k \cdots \partial y_1}(y)
= \left[ \prod_{j=1}^{k} y_j^{(1-\gamma_j)/\gamma_j} \right] \left[ \sum_{s \in I_k} c_s(k) \frac{\partial}{\partial y_{k+1}} \prod_{v \in \bar{\mathcal{V}}} Y_v(y)^{d_s(k,v)} \right]
= \left[ \prod_{j=1}^{k} y_j^{(1-\gamma_j)/\gamma_j} \right] \left[ \sum_{s \in I_k} c_s(k) \sum_{u \in \bar{\mathcal{V}}} \Big( \prod_{v \in \bar{\mathcal{V}} : v \neq u} Y_v(y)^{d_s(k,v)} \Big) \, d_s(k,u) \, Y_u(y)^{d_s(k,u) - 1} \frac{\partial Y_u}{\partial y_{k+1}}(y) \right]
= \left[ \prod_{j=1}^{k+1} y_j^{(1-\gamma_j)/\gamma_j} \right] \left[ \sum_{(s,u) \in I_k \times \bar{\mathcal{V}}} c_s(k) \, d_s(k,u) \Big( \prod_{v \in \bar{\mathcal{V}} : v \neq u} Y_v(y)^{d_s(k,v)} \Big) \Big( \prod_{w \in \mathrm{path}(u, k+1)} Y_w(y)^{(\eta_w - 1)/\gamma_w} \Big) Y_u(y)^{d_s(k,u) - 1 + (\gamma_u - 1)/\gamma_u} \right],
\]
where the last equality follows by replacing $\partial Y_u(y) / \partial y_{k+1}$ with its equivalent form given by Lemma A.6.1. The last summation above is indexed by $I_k \times \bar{\mathcal{V}}$, and we let $I_{k+1} = I_k \times \bar{\mathcal{V}}$. Focusing on each term $(s,u) \in I_k \times \bar{\mathcal{V}}$ in this summation individually and rearranging the terms, we have
\[
\Big( \prod_{v \in \bar{\mathcal{V}} : v \neq u} Y_v(y)^{d_s(k,v)} \Big) \Big( \prod_{w \in \mathrm{path}(u, k+1)} Y_w(y)^{(\eta_w - 1)/\gamma_w} \Big) Y_u(y)^{d_s(k,u) - 1 + (\gamma_u - 1)/\gamma_u} = \prod_{v \in \bar{\mathcal{V}}} Y_v(y)^{d_{(s,u)}(k+1,v)},
\]
where we let
\[ d_{(s,u)}(k+1,v) = \begin{cases} d_s(k,v) & \text{if } v \neq u \text{ and } v \notin \mathrm{path}(u, k+1) \\ d_s(k,v) + (\eta_v - 1)/\gamma_v & \text{if } v \neq u \text{ and } v \in \mathrm{path}(u, k+1) \\ d_s(k,v) - 1 + (\gamma_v - 1)/\gamma_v & \text{if } v = u. \end{cases} \]
Therefore, if we define $c_{(s,u)}(k+1)$ as $c_{(s,u)}(k+1) = c_s(k) \, d_s(k,u)$, then we have that
\[ \frac{\partial^{k+1} Y_{\mathrm{root}}}{\partial y_{k+1} \partial y_k \cdots \partial y_1}(y) = \left[ \prod_{j=1}^{k+1} y_j^{(1-\gamma_j)/\gamma_j} \right] \left[ \sum_{(s,u) \in I_{k+1}} c_{(s,u)}(k+1) \prod_{v \in \bar{\mathcal{V}}} Y_v(y)^{d_{(s,u)}(k+1,v)} \right], \]
which has exactly the desired form. If $k+1$ is odd, then $k$ is even, and thus $c_s(k) \, d_s(k,u) \ge 0$ because $c_s(k) \le 0$ and $d_s(k,u) \le 0$ by the inductive hypothesis.
Similarly, if $k+1$ is even, then $k$ is odd, and thus $c_s(k) \, d_s(k,u) \le 0$ because $c_s(k) \ge 0$ and $d_s(k,u) \le 0$ by the inductive hypothesis. Thus, the coefficients have the correct sign. Moreover, by the definition of $d_{(s,u)}(k+1,v)$, all of the exponents are non-positive because $\eta_v \le 1$ for all $v \in \mathcal{V}$ and $d_s(k,v) \le 0$ for all $v \in \bar{\mathcal{V}}$ by the inductive hypothesis. This completes the induction, and property (d) follows.

To complete the proof of Theorem 2.11.1, we show that the product selection probabilities have the desired form. Let $\hat{y} = (v_1^{\gamma_1}, \ldots, v_n^{\gamma_n})$. We have $G(y) = \big( \sum_{j \in \mathrm{Children}(\mathrm{root})} Y_j(y)^{1/\gamma_{\mathrm{root}}} \big)^{\gamma_{\mathrm{root}}} = Y_{\mathrm{root}}(y)$ because $\gamma_{\mathrm{root}} = 1$. Thus, we can use Lemma A.6.1 to compute the derivative of $G(y)$ with respect to $y_\ell$. Evaluating this derivative at $y = \hat{y}$, we get
\[
\frac{\partial G}{\partial y_\ell}(y) \Big|_{y = \hat{y}} = \frac{\partial Y_{\mathrm{root}}}{\partial y_\ell}(y) \Big|_{y = \hat{y}}
= \left[ \prod_{j \in \mathrm{path}(\mathrm{root},\ell)} Y_j(\hat{y})^{(\eta_j - 1)/\gamma_j} \right] \hat{y}_\ell^{(1-\gamma_\ell)/\gamma_\ell}
= \left[ \prod_{j \in \mathrm{path}(\mathrm{root},\ell)} V_j(N_j)^{(\eta_j - 1)/\eta_j} \right] V_\ell(N_\ell)^{1-\gamma_\ell}
= \left[ \prod_{j \in \mathrm{path}(\mathrm{root},\ell)} \frac{V_j(N_j)}{\sum_{k \in \mathrm{Children}(j)} V_k(N_k)} \right] V_\ell(N_\ell)^{1-\gamma_\ell},
\]
where the third equality follows from Lemma A.6.2 and the fact that $V_\ell(N_\ell) = v_\ell = \hat{y}_\ell^{1/\gamma_\ell}$ for the leaf node $\ell$, and the last equality uses the fact that $V_j(N_j) = \big( \sum_{k \in \mathrm{Children}(j)} V_k(N_k) \big)^{\eta_j}$. Observe that, by Lemma A.6.2, we have
\[ G(\hat{y}) = \sum_{k \in \mathrm{Children}(\mathrm{root})} Y_k(\hat{y}) = \sum_{k \in \mathrm{Children}(\mathrm{root})} V_k(N_k)^{\gamma_k / \eta_k} = \sum_{k \in \mathrm{Children}(\mathrm{root})} V_k(N_k), \]
where we use the fact that $\gamma_k = \gamma_{\mathrm{Parent}(k)} \, \eta_k = \gamma_{\mathrm{root}} \, \eta_k = \eta_k$ for all $k \in \mathrm{Children}(\mathrm{root})$.
Thus, using the above equality, we have that
\[
\frac{\hat{y}_\ell}{G(\hat{y})} \frac{\partial G}{\partial y_\ell}(y) \Big|_{y = \hat{y}}
= \frac{v_\ell^{\gamma_\ell}}{\sum_{k \in \mathrm{Children}(\mathrm{root})} V_k(N_k)} \left[ \prod_{h=1}^{d-1} \frac{V_{\mathrm{An}(\ell,h)}(N_{\mathrm{An}(\ell,h)})}{\sum_{k \in \mathrm{Children}(\mathrm{An}(\ell,h))} V_k(N_k)} \right] V_\ell(N_\ell)^{1-\gamma_\ell}
= \frac{\prod_{h=1}^{d} V_{\mathrm{An}(\ell,h)}(N_{\mathrm{An}(\ell,h)})}{\prod_{h=0}^{d-1} \sum_{k \in \mathrm{Children}(\mathrm{An}(\ell,h))} V_k(N_k)}
= \frac{\prod_{h=1}^{d} V_{\mathrm{An}(\ell,h)}(N_{\mathrm{An}(\ell,h)})}{\prod_{h=0}^{d-1} \sum_{k \in \mathrm{Sibling}(\mathrm{An}(\ell,h+1))} V_k(N_k)}
= \prod_{h=1}^{d} \frac{V_{\mathrm{An}(\ell,h)}(N_{\mathrm{An}(\ell,h)})}{\sum_{k \in \mathrm{Sibling}(\mathrm{An}(\ell,h))} V_k(N_k)},
\]
where the first equality uses the fact that the nodes in $\mathrm{path}(\mathrm{root},\ell)$ are precisely the ancestors of node $\ell$ at depths $1, 2, \ldots, d-1$, the second equality follows from the fact that $v_\ell^{\gamma_\ell} \, V_\ell(N_\ell)^{1-\gamma_\ell} = v_\ell = V_\ell(N_\ell)$ and node $\ell$ is its own ancestor at depth $d$, and the third equality follows by noting that the children of a node are the same as the siblings of any one of its children. $\Box$

A.7 Parameter Estimation

The flexibility of the $d$-level nested logit model can be enhanced by increasing the number of levels in the tree. In this section, our goal is to demonstrate that the parameters of the $d$-level nested logit model can be estimated relatively easily, and that using multiple levels in the tree can help better predict customer choice behavior. The rest of this section is organized as follows. We describe the setup for the parameter estimation problem and present our main theoretical result in Section A.7.1. In Section A.7.2, we give a numerical example to demonstrate that it is possible to significantly improve the prediction accuracy of the $d$-level nested logit model by increasing the number of levels in the tree. In Section A.7.3, we describe the setup of our numerical experiments and summarize the practical performance of $d$-level nested logit models on an ensemble of data sets. Finally, the proof of the main theorem is given in Section A.7.4.

A.7.1 Properties of the Log-Likelihood Function

To estimate the parameters of the $d$-level nested logit model, note that this model has a parameter $\eta_j$ for each non-leaf node $j$ and a parameter $v_\ell$ for each leaf node $\ell$.
We capture the parameters of the non-leaf nodes at level $h$ as $\eta_h = (\eta_j : \mathrm{depth}(j) = h)$, where $\mathrm{depth}(j)$ is the depth of node $j$. Therefore, $(\eta_1, \ldots, \eta_{d-1})$ represents the parameters of the non-leaf nodes. To capture the parameters of the leaf nodes, we assume that $v_\ell = e^{\mu_\ell}$ for some $\mu_\ell \in \mathbb{R}$, in which case we represent the parameters of the leaf nodes as $\mu = (\mu_\ell : \mathrm{depth}(\ell) = d)$. Thus, the parameters of the $d$-level nested logit model are given by $(\eta_1, \ldots, \eta_{d-1}, \mu)$, and we write the choice probability in (2.8) as $\pi_\ell(S \mid \eta_1, \ldots, \eta_{d-1}, \mu)$ to explicitly show its dependence on the parameters. Our goal is to use data to estimate the parameters $(\eta_1, \ldots, \eta_{d-1}, \mu)$.

Let us describe the setup for the parameter estimation problem. In the data, we have the offered assortments and the choices of $T$ customers. For each customer $t = 1, \ldots, T$, we use $S_t$ to denote the assortment offered to this customer and $c_t$ to denote the option chosen by this customer. Assuming that the choices of the customers are independent of each other and that the customers make their choices according to the $d$-level nested logit model with parameters $(\eta_1, \ldots, \eta_{d-1}, \mu)$, the log-likelihood function is given by
\[ \mathcal{L}(\eta_1, \ldots, \eta_{d-1}, \mu) = \sum_{t=1}^{T} \log \pi_{c_t}(S_t \mid \eta_1, \ldots, \eta_{d-1}, \mu). \tag{A.4} \]
The above log-likelihood function is not jointly concave in $(\eta_1, \ldots, \eta_{d-1}, \mu)$, but the following theorem shows that it is concave in each one of the $d$ components $\eta_1, \ldots, \eta_{d-1}$, and $\mu$. The proof of the theorem is given in Section A.7.4.

Theorem A.7.1. The log-likelihood function in (A.4) satisfies the following properties:
(a) The function $\mathcal{L}(\eta_1, \ldots, \eta_{d-1}, \mu)$ is concave in $\mu$.
(b) For $h = 1, 2, \ldots, d-1$, the function $\mathcal{L}(\eta_1, \ldots, \eta_{h-1}, \eta_h, \eta_{h+1}, \ldots, \eta_{d-1}, \mu)$ is concave in $\eta_h$.

Thus, a natural approach for maximizing the log-likelihood function is coordinate ascent, where we fix all but one of the $d$ components $\eta_1, \ldots, \eta_{d-1}, \mu$, and maximize the log-likelihood function over one component at a time.
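As a minimal sketch of the objective (A.4) and of one coordinate-ascent pass, consider the two-level special case below. All names, parameter values, and the crude grid search are illustrative stand-ins for the exact one-dimensional concave maximizations used in the text; the only property relied on is that each grid contains the current value, so a pass can never decrease the log-likelihood.

```python
import math, random

def choice_probs(nests, eta, mu, offered):
    # Two-level nested logit: nests is a list of lists of product indices,
    # eta[j] in (0, 1] is the parameter of nest j, and mu[l] is the leaf
    # parameter of product l (preference weight v_l = exp(mu[l])).
    W = [sum(math.exp(mu[l]) for l in nest if l in offered) for nest in nests]
    V = [w ** eta[j] if w > 0 else 0.0 for j, w in enumerate(W)]
    total = sum(V)
    probs = {}
    for j, nest in enumerate(nests):
        for l in nest:
            if l in offered and W[j] > 0:
                probs[l] = (V[j] / total) * (math.exp(mu[l]) / W[j])
    return probs

def log_lik(nests, eta, mu, data):
    # Log-likelihood (A.4): data is a list of (offered assortment, chosen product).
    return sum(math.log(choice_probs(nests, eta, mu, S)[c]) for S, c in data)

def ascent_pass(nests, eta, mu, data):
    # One coordinate-ascent pass: maximize over each component in turn by a
    # crude grid search; each grid contains the current value, so the
    # log-likelihood never decreases.
    for j in range(len(eta)):
        grid = [0.1 * g for g in range(1, 11)]  # contains the start value 1.0
        eta[j] = max(grid, key=lambda e: log_lik(nests, eta[:j] + [e] + eta[j + 1:], mu, data))
    for l in range(len(mu)):
        grid = [mu[l] + 0.2 * (g - 5) for g in range(11)]  # contains mu[l]
        mu[l] = max(grid, key=lambda m: log_lik(nests, eta, mu[:l] + [m] + mu[l + 1:], data))
    return eta, mu

# Simulated data from known parameters (all values hypothetical).
random.seed(0)
nests = [[0, 1], [2, 3]]
true_eta, true_mu = [0.5, 0.9], [0.0, 0.4, -0.2, 0.3]
assort = frozenset([0, 1, 2, 3])
p = choice_probs(nests, true_eta, true_mu, assort)
items, weights = zip(*sorted(p.items()))
data = [(assort, random.choices(items, weights)[0]) for _ in range(500)]

eta, mu = [1.0, 1.0], [0.0] * 4
before = log_lik(nests, eta, mu, data)
eta, mu = ascent_pass(nests, eta, mu, data)
after = log_lik(nests, eta, mu, data)
print(after >= before)
```

In practice, each one-dimensional maximization is a concave problem by Theorem A.7.1 and can be solved exactly, for example with a convex solver, rather than by grid search.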
We can sequentially maximize over one component at a time and cycle over the components multiple times until no further improvement is possible. For the two-level nested logit model, Train (2003) suggests estimating the parameters of the leaf nodes first, and then the parameters of the non-leaf nodes. For trees with more levels, he suggests starting from the leaf nodes and moving up the tree to sequentially estimate the parameters at each level. His approach makes a single pass over the tree and exploits the fact that, for a customer at a particular node, the conditional choice probabilities are similar to the choice probabilities under the multinomial logit model. In contrast, we maximize the log-likelihood function over one component at a time and cycle over the components multiple times; thus, we make multiple passes over the tree. To our knowledge, the concavity of the log-likelihood function of the $d$-level nested logit model in the parameters at each level does not appear in the existing literature, and Theorem A.7.1 closes this gap. Furthermore, this result shows that maximizing the log-likelihood function over only the parameters at one level is a tractable problem.

A.7.2 Demonstration of Improvement in Prediction Accuracy in a Single Data Set

In this section, we provide a numerical example to demonstrate that, by increasing the number of levels in the tree, it is possible to significantly improve the prediction accuracy of the $d$-level nested logit model.

In our example, we consider $n = 8$ products and a mixture of logits model with $K = 20$ distinct customer segments. There are two sets of parameters associated with the mixture of logits model: the proportions of the segments $\lambda = (\lambda_1, \ldots, \lambda_K)$ with $\sum_{k=1}^{K} \lambda_k = 1$, and the preference weights $v^k = (v^k_1, \ldots, v^k_n) \in \mathbb{R}^n_+$ that customers in segment $k$ associate with the different products, for $k = 1, \ldots, K$. The parameters of the mixture of logits model are presented in Table A.3. We consider a collection of 36 unique offered assortments, each of which consists of 6 or 7 products. We generate a training data set that consists of the offered assortments and the choices of 10,000 customers,
We consider a collection of 36 unique offered assortments, each of which consists of 6 or 7 products. We generate a training data set that consists of assortments offered and the choices of 10,000 customers, 113 k k Preference Weight of Each Product v k 1 v k 2 v k 3 v k 4 v k 5 v k 6 v k 7 v k 8 1 1.30E-1 1.00E0 5.44E-1 7.81E-1 2.43E-1 7.67E-1 1.51E-1 1.80E-1 4.23E0 2 1.64E-7 6.64E-3 2.41E0 1.63E-1 6.48E0 4.21E0 1.51E0 3.05E0 9.20E0 3 1.44E-2 1.49E0 3.46E-1 3.54E-1 2.52E-1 1.10E-1 1.15E0 8.29E-2 7.29E0 4 1.59E-5 6.78E-1 1.42E0 4.50E0 2.00E-1 7.42E0 1.06E0 3.87E-1 3.70E0 5 1.18E-4 1.71E0 1.57E0 1.72E0 6.33E-1 4.60E0 1.62E0 9.11E-2 2.44E-1 6 9.20E-3 2.54E-1 9.83E-2 2.01E0 3.05E0 2.39E0 8.38E-1 8.53E-2 4.87E-1 7 1.09E-6 1.68E0 1.18E0 6.70E0 7.91E0 1.19E0 4.59E-1 1.64E-1 8.47E0 8 1.01E-5 1.46E0 7.45E-1 1.13E0 4.82E-1 1.59E-2 6.62E0 2.36E0 7.03E0 9 3.33E-6 6.41E-1 1.46E0 7.74E-1 5.47E0 1.97E0 1.29E0 2.43E-1 4.54E-1 10 1.82E-15 1.43E0 7.15E0 1.72E-1 1.59E-2 7.14E-1 5.00E0 4.52E0 8.94E0 11 1.46E-3 4.18E-1 8.45E-1 1.19E0 4.49E0 1.67E0 3.47E0 1.99E-1 4.72E0 12 3.07E-1 6.34E0 4.90E-1 2.61E-1 1.56E0 3.36E-1 1.90E-1 7.61E-2 4.57E-1 13 2.22E-10 7.35E0 1.31E0 5.84E0 3.50E0 4.85E0 7.63E0 2.83E-1 3.05E-2 14 3.90E-1 3.73E0 2.64E0 1.31E0 3.20E0 2.20E-1 2.19E0 1.81E-1 2.07E0 15 4.74E-13 9.52E-1 5.41E0 1.27E0 1.85E0 6.47E0 4.97E-1 6.20E-1 4.51E-1 16 5.60E-7 9.94E-1 1.12E0 9.09E-1 3.88E0 3.35E-1 4.79E0 3.91E0 7.82E0 17 1.01E-1 8.07E0 5.43E0 2.72E0 7.77E-2 1.36E-2 3.15E0 9.42E0 4.35E-1 18 4.81E-2 1.01E0 1.70E0 2.21E0 9.22E-1 1.22E0 6.31E0 3.60E-1 6.01E0 19 5.51E-11 1.54E0 9.15E-1 2.24E-1 1.62E0 6.81E-1 6.60E0 4.99E0 9.13E-2 20 4.16E-8 8.30E0 2.44E0 2.13E0 9.68E-1 6.68E0 1.41E0 2.95E-1 7.08E-1 Table A.3: Parameters of the mixture of logits model. assuming that the choices of the customers are governed by the mixture of logits model. We estimate the parameters for thed-level nested logit models and select the best fittedd-level models ford = 1; 2; 3. 
Under the mixture of logits model, given an offered assortment $S$, a customer chooses product $\ell \in S$ with probability $\pi_\ell(S \mid \mathrm{Real}) = \sum_{k=1}^{K} \lambda_k \, v^k_\ell \big/ \sum_{j \in S} v^k_j$. For every offered assortment $S$, we denote the top-selling product under the mixture of logits model by
\[ \mathrm{TopSell}(S \mid \mathrm{Real}) = \arg\max_{\ell \in S} \pi_\ell(S \mid \mathrm{Real}). \]
Moreover, for $d = 1, 2, 3$, we use $\pi^d_\ell(S \mid \mathrm{Est})$ to denote the probability that a customer chooses product $\ell \in S$ under the best fitted $d$-level nested logit model. We define the predicted top-selling product under the $d$-level nested logit model as
\[ \mathrm{TopSell}_d(S \mid \mathrm{Est}) = \arg\max_{\ell \in S} \pi^d_\ell(S \mid \mathrm{Est}). \]
We are interested in comparing the predictive accuracy of the one, two and three-level nested logit models. In particular, we want to evaluate the ability of the one, two and three-level nested logit models to extrapolate and predict the top-selling products against the ground truth (the mixture of logits model) over any arbitrary offered assortment. Note that we choose to compare our fitted nested logit models directly against the ground truth instead of using validation and testing data sets, because in our generated training data the ground truth, i.e., the parameters of the underlying mixture of logits model, is already available.

We compare the predictive accuracy of the fitted models as follows. Note that the full assortment of 8 products gives rise to $2^8 - 1 = 255$ non-empty unique subsets of products. For each subset of products $S$, we calculate $\mathrm{TopSell}(S \mid \mathrm{Real})$, the top-selling product under the mixture of logits model, and $\mathrm{TopSell}_d(S \mid \mathrm{Est})$, the predicted top-selling product under each of the fitted $d$-level nested logit models, for $d = 1, 2, 3$. Since there are 255 unique subsets of products, we have 255 choices of top-selling products for each of the mixture of logits model and the fitted one, two and three-level nested logit models. The results for each $d$-level nested logit model can be summarized in the form of an $n \times n$ confusion matrix.
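The ground-truth quantities above, the mixture-of-logits choice probabilities and the top-selling product of an assortment, can be sketched as follows. The two-segment parameters are hypothetical and serve only to exercise the functions.

```python
def mixture_choice_probs(lam, v, S):
    # lam[k]: segment proportions (summing to one); v[k][j]: the preference
    # weight segment k assigns to product j; S: offered assortment.
    S = list(S)
    probs = {j: 0.0 for j in S}
    for lam_k, v_k in zip(lam, v):
        denom = sum(v_k[j] for j in S)
        for j in S:
            probs[j] += lam_k * v_k[j] / denom
    return probs

def top_sell(lam, v, S):
    # TopSell(S): the product with the largest choice probability in S.
    probs = mixture_choice_probs(lam, v, S)
    return max(probs, key=probs.get)

# Hypothetical two-segment example with three products.
lam = [0.7, 0.3]
v = [[1.0, 2.0, 0.5], [4.0, 0.1, 0.2]]
print(top_sell(lam, v, [0, 1, 2]))  # prints 0
```

Note that the probabilities over any offered assortment sum to one, since each segment's logit probabilities do and the segment proportions sum to one.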
The confusion matrices for the one, two and three-level nested logit models are shown in Figure A.1. For $d = 1, 2, 3$, in the confusion matrix for the $d$-level nested logit model, the rows correspond to the actual top-selling products under the mixture of logits model, and the columns correspond to the predicted top-selling products under the fitted $d$-level nested logit model. For $i = 1, \ldots, 8$ and $j = 1, \ldots, 8$, we denote each entry of the confusion matrix by
\[ \mathrm{ConfMatr}^d_{i,j} = \sum_{\emptyset \neq S \subseteq \{1,\ldots,8\}} \mathbb{1}\{\mathrm{TopSell}(S \mid \mathrm{Real}) = i, \ \mathrm{TopSell}_d(S \mid \mathrm{Est}) = j\}, \]
where $\mathbb{1}\{\cdot\}$ is the indicator function. Then, $\mathrm{ConfMatr}^d_{i,j}$ corresponds to the number of times that the $d$-level nested logit model predicts the actual top-selling product $i$ as $j$. Thus, the diagonal elements of the confusion matrix show the number of correct predictions for each top-selling product. Any non-zero off-diagonal element $\mathrm{ConfMatr}^d_{i,j}$, $i \neq j$, represents the number of times that the $d$-level nested logit model falsely predicts the actual top-selling product $i$ as $j$. If all the off-diagonal entries in the confusion matrix are zero, then the $d$-level nested logit model predicts with 100% accuracy.

We summarize the confusion matrix for each $d$-level nested logit model as follows. The second-to-last column in each table reports the total number of predictions for each top-selling product under the mixture of logits model. The last column reports the total number of false predictions under the fitted $d$-level nested logit model for each top-selling product. Finally, the last row reports the total number of predictions for each top-selling product under the fitted $d$-level nested logit model.

Confusion Matrix for d = 1
 Actual     Predicted Top-Selling Product          No. of Pred. under   No. of
 Top-Sell.    1    2    3    4    5    6    7    8 Mix. of Logits       False Pred.
    1       128    0    0    0    0    0    0    0       128                 0
    2         0   16    0    0    0    0    0    1        17                 1
    3         0    0    4    0    0    1    0    0         5                 1
    4         0    0    0   32    0    0    0   32        64                32
    5         0    0    0    0    1    0    1    0         2                 1
    6         0    0    0    0    0    7    0    0         7                 0
    7         0    0    0    0    0    0    1    0         1                 0
    8         0    0    0    0    0    0    0   31        31                 0
 No. of     128   16    4   32    1    8    2   64       255                35
 Pred. under Est. Model

Confusion Matrix for d = 2
 Actual     Predicted Top-Selling Product          No. of Pred. under   No. of
 Top-Sell.    1    2    3    4    5    6    7    8 Mix. of Logits       False Pred.
    1       128    0    0    0    0    0    0    0       128                 0
    2         0   16    0    0    0    0    0    1        17                 1
    3         0    0    4    0    0    1    0    0         5                 1
    4         0    0    0   49    0    0    0   15        64                15
    5         0    0    0    0    2    0    0    0         2                 0
    6         0    0    0    0    0    7    0    0         7                 0
    7         0    0    0    0    0    0    1    0         1                 0
    8         0    0    0    0    0    0    0   31        31                 0
 No. of     128   16    4   49    2    8    1   47       255                17
 Pred. under Est. Model

Confusion Matrix for d = 3
 Actual     Predicted Top-Selling Product          No. of Pred. under   No. of
 Top-Sell.    1    2    3    4    5    6    7    8 Mix. of Logits       False Pred.
    1       128    0    0    0    0    0    0    0       128                 0
    2         0   16    0    0    0    0    0    1        17                 1
    3         0    0    4    0    0    1    0    0         5                 1
    4         0    0    0   64    0    0    0    0        64                 0
    5         0    0    0    0    2    0    0    0         2                 0
    6         0    0    0    0    0    7    0    0         7                 0
    7         0    0    0    0    0    0    1    0         1                 0
    8         0    0    0    0    0    0    0   31        31                 0
 No. of     128   16    4   64    2    8    1   32       255                 2
 Pred. under Est. Model

Figure A.1: Confusion matrices for the best fitted one, two and three-level nested logit models.

The results in Figure A.1 show that the one-level nested logit model predicts correctly 32 out of 64 times, i.e., with 50% ($= 100\% \times 32/64$) accuracy, when the actual top-selling product is product 4. Similarly, the one-level nested logit model predicts with 50% ($= 100\% \times 1/2$) accuracy when the actual top-selling product is product 5. The two-level nested logit model performs much better than the one-level model, improving the predictive accuracy to 77% ($= 100\% \times 49/64$) and 100% ($= 100\% \times 2/2$) when the actual top-selling products are 4 and 5, respectively. The three-level nested logit model further improves the predictive accuracy for top-selling product 4 to 100%. Overall, the one-level nested logit model falsely predicts 35 out of 255 times; the
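The tallies in the confusion matrices above can be reproduced mechanically by enumerating all non-empty subsets. The sketch below uses simple score-ranking callables as hypothetical stand-ins for the true and fitted choice models, over four products instead of eight to keep the counts small.

```python
from itertools import combinations

def confusion_matrix(top_true, top_pred, n):
    # M[i][j] counts the non-empty subsets S of {0, ..., n-1} whose actual
    # top seller is i while the fitted model predicts j.
    M = [[0] * n for _ in range(n)]
    for r in range(1, n + 1):
        for S in combinations(range(n), r):
            M[top_true(S)][top_pred(S)] += 1
    return M

# Stand-ins for the true and fitted models: any callables mapping an
# assortment to a top seller work; here both rank by fixed scores
# (hypothetical), disagreeing only on products 1 versus 2.
true_scores = [3.0, 1.0, 2.0, 0.5]
pred_scores = [3.0, 2.0, 1.0, 0.5]
top_true = lambda S: max(S, key=lambda j: true_scores[j])
top_pred = lambda S: max(S, key=lambda j: pred_scores[j])

M = confusion_matrix(top_true, top_pred, 4)
total = sum(map(sum, M))
false_preds = total - sum(M[i][i] for i in range(4))
print(total, false_preds)  # prints 15 2
```

With four products there are $2^4 - 1 = 15$ non-empty subsets, mirroring the 255 subsets counted in Figure A.1 for eight products; the off-diagonal sum plays the role of the "No. of False Pred." column.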
Overall, the one-level nested logit model falsely predicts 35 out of 255 times; the 116 two-level nested logit model falsely predicts only 17 times; the three-level nested logit model significantly reduces the false predictions to just 2 times. A.7.3 Numerical Experiments for an Ensemble of Data Sets In the this section, we provide a numerical study to demonstrate that using larger number of levels in the tree can help better predict the customer choice behavior. We generate data consisting of the assortment offers and choices of multiple customers, under the assumption that the choices of the customers are governed by a mixture of logits model. Using the generated data, we fit the parameters of one, two and three-level nested logit models and compare the performance of the fitted models in predicting the customer choice behavior. Numerical Setup: Throughout our numerical study, the number of products is n = 8. Without loss of generality, we can treat one of the products as the no-purchase option. We generate 50 data sets as the training data sets in our numerical study. To generate each data set, we use a mixture of logits model withK = 20 customer segments. As explained before, the parameters of the mixture of logits model are the proportions of each segment = ( 1 ;:::; K )2 [0; 1] K with P K k=1 k = 1, and for each segment k = 1;:::;K, the preference weightsv k = (v k 1 ;:::;v k n )2R n + that customers in segmentk associate with the different products. Under the mixture of logits model, if we offer the assortment S, then a customer chooses product`2S with probability ` (SjReal) = P K k=1 k v k ` P j2S v k j . For each data set, we randomly choose the parameters of the mixture of logits model as follows. We generate = ( 1 ;:::; K ) from aK-dimensional Dirichlet distribution with parameters (;:::;), so that the expectation of k is equal to 1 K for all k = 1;:::;K. 
The coefficient of variation of $\alpha_k$ is equal to $\sqrt{(K-1)/(\gamma K + 1)}$, and we calibrate the parameter $\gamma$ so that the coefficient of variation is equal to 2.5. To introduce heterogeneity among different customer segments, we use the following strategy to generate the preference weights $(v^1,\ldots,v^K)$. For each product $\ell = 1,\ldots,n$, we sample $\beta_\ell$ from the uniform distribution over $[0,1]$. If $\beta_\ell$ is close to one, then product $\ell$ is a specialty product and customers in different segments associate significantly different preference weights with this product. On the other hand, if $\beta_\ell$ is close to zero, then product $\ell$ is a staple product and customers in different segments evaluate product $\ell$ similarly, associating similar preference weights with it. To generate preference weights with such characteristics, for each product $\ell$ and segment $k$, we sample $\theta^k_\ell$ from the uniform distribution over $[0,10]$ and let $v^k_\ell = (1-\beta_\ell)\,\theta^k_\ell$ with probability 1/2 and $v^k_\ell = (1+\beta_\ell)\,\theta^k_\ell$ with probability 1/2. In this case, if $\beta_\ell$ is close to zero, then $v^k_\ell$ takes a value close to $\theta^k_\ell$. If, however, $\beta_\ell$ is close to one, then $v^k_\ell$ takes either a value close to zero or a value close to $2\theta^k_\ell$. Therefore, the parameter $\beta_\ell$ indeed captures how much the preference weight of product $\ell$ differs among the different customer segments. Once we choose the parameters of the mixture of logits model, we generate the choices of 10,000 customers, assuming that the choices of the customers are governed by this choice model. In particular, for each one of the 10,000 customers, we randomly sample an assortment of products offered to this customer. We consider 36 unique offered assortments, each of which consists of either 6 or 7 of the 8 products. Given that the customer is offered a particular assortment, we sample the choice of the customer within this assortment by using the choice probabilities $\{\phi_\ell(S \mid \mathrm{Real}) : \ell = 1,\ldots,n,\ S \subseteq \{1,\ldots,n\}\}$.
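The data-generation recipe above can be sketched in a few lines. This is a minimal illustration, not the study's actual code: it assumes the coefficient-of-variation formula $\sqrt{(K-1)/(\gamma K + 1)}$ reconstructed above, and draws the Dirichlet vector by normalizing independent Gamma draws (a standard construction):

```python
import random

random.seed(0)
n, K, target_cv = 8, 20, 2.5

# Invert CV(alpha_k) = sqrt((K - 1) / (g*K + 1)) to calibrate the Dirichlet
# parameter g so that the coefficient of variation equals 2.5.
g = ((K - 1) / target_cv**2 - 1) / K

# Draw segment proportions alpha from Dirichlet(g, ..., g) via normalized Gammas.
draws = [random.gammavariate(g, 1.0) for _ in range(K)]
alpha = [d / sum(draws) for d in draws]

# Specialty/staple parameter per product, then segment preference weights.
beta = [random.uniform(0, 1) for _ in range(n)]
v = [[random.uniform(0, 10) * (1 + beta[l] if random.random() < 0.5 else 1 - beta[l])
      for l in range(n)] for _ in range(K)]

def choice_probs(S, alpha, v):
    """Mixture-of-logits choice probabilities over an assortment S."""
    p = {l: 0.0 for l in S}
    for a_k, v_k in zip(alpha, v):
        denom = sum(v_k[j] for j in S)
        for l in S:
            p[l] += a_k * v_k[l] / denom
    return p

S = [0, 1, 2, 4, 5, 7]            # one assortment with 6 of the 8 products
p = choice_probs(S, alpha, v)
assert abs(sum(p.values()) - 1.0) < 1e-9   # probabilities sum to one
```

A customer's choice can then be sampled from `p` (e.g., with `random.choices`), which is how the 10,000 synthetic observations per data set would be produced.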
The assortments offered to the 10,000 customers and the products chosen by these customers out of the assortments form one data set.

Parameter Estimation and Model Selection: For each data set, we estimate the parameters of the one, two and three-level nested logit models by minimizing the Akaike Information Criterion (AIC) value. Introduced by Akaike (1974), the AIC value is defined as
$$
\mathrm{AIC}_d = 2\, n^d_p - 2\, \mathcal{L}^d,
$$
where $n^d_p$ denotes the number of parameters in the model and $\mathcal{L}^d$ denotes the value of the log-likelihood function in (A.4). The model with the minimum AIC value minimizes the expected Kullback–Leibler distance between the model and the underlying mixture of logits model. By our construction, as elaborated in the following two paragraphs, in each data set $n^d_p$ is a constant for fixed $d$. Therefore, minimizing the AIC value is equivalent to maximizing the log-likelihood function among models with the same number of levels. To maximize the log-likelihood function, we sequentially maximize over the parameters at each level of the tree and stop when no further improvement in the value of the log-likelihood is possible. We use the convex optimization software CVX to solve the optimization problems; see Grant and Boyd (2008, 2013). Note that there can be many different tree structures to consider for the two and three-level nested logit models, corresponding to different organizations of the products in the tree. We use the following simple heuristic to obtain a good tree structure and report the results for the best tree structure that we find. We only consider tree structures in which the non-leaf nodes in the same level have an equal number of children. In particular, for the two-level nested logit model, we only consider tree structures in which the root node has two children and each node in level one has four children. When we consider such tree structures, there are $\binom{8}{4}/2 = 35$ distinct ways of placing the products in the tree.
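The tree-structure counts used here and in the next paragraph follow from elementary combinatorics; a quick check, assuming the unordered-split counting described in the text:

```python
from math import comb

# Two-level trees: split 8 products into two unordered groups of 4.
two_level = comb(8, 4) // 2
# Three-level trees: additionally split each group of 4 into two unordered pairs.
three_level = (comb(8, 4) // 2) * (comb(4, 2) // 2) ** 2
# Three-level trees compatible with a fixed best two-level split.
compatible = (comb(4, 2) // 2) ** 2

print(two_level, three_level, compatible)  # 35 315 9
```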
We estimate the parameters of the two-level nested logit model for each one of these 35 tree structures and choose the model that yields the smallest AIC value as the best two-level nested logit model. For the three-level nested logit model, we only consider tree structures in which each non-leaf node has two children. When we consider such tree structures, there are $\frac{\binom{8}{4}}{2}\big(\frac{\binom{4}{2}}{2}\big)^2 = 315$ distinct ways of placing the products in the tree. Since the number of possible three-level tree structures is large, we use a simple heuristic to focus on a small number of them. In particular, using the approach in the previous paragraph, we find the best two-level tree structure. Once we have the best two-level tree structure, we only consider the three-level tree structures that are compatible with it. In other words, let $a$ and $b$ be the two nodes in the first level of the best two-level tree structure, and let $N_a$ and $N_b$ denote the sets of products included in the subtrees rooted at nodes $a$ and $b$. In the three-level tree structures we consider, the sets of products included in the subtrees rooted at the two nodes in the first level of the tree are identical to $N_a$ and $N_b$. Considering only such three-level tree structures reduces the number of possibilities to $\big(\frac{\binom{4}{2}}{2}\big)^2 = 9$. Once we restrict our attention to the 9 possible three-level tree structures, similar to the approach in the previous paragraph, we estimate the parameters of the three-level nested logit model for each one of these 9 tree structures and choose the model with the smallest AIC value as the best three-level nested logit model. In each data set, we also compare the AIC values of the one, two and three-level nested logit models. For the one, two and three-level models we consider, $n^1_p = 8$, $n^2_p = 10$ and $n^3_p = 14$.
We find that, in general, a nested logit model with a larger number of levels has a lower AIC value than a nested logit model with a smaller number of levels. This means that the issue of overfitting is under control if we use AIC to estimate the parameters.

Summary of Results: For each data set, we create three confusion matrices for the chosen one, two and three-level nested logit models, similar to those in Figure A.1. We use the following metric to summarize the confusion matrices for all 50 data sets. For each data set $k = 1,\ldots,50$, we define the number of falsely predicted top-selling products by the $d$-level nested logit model as
$$
\mathrm{TotalErr}_d(k) = \sum_{i\neq j} \mathrm{ConfMatr}^d_{i,j}(k).
$$

Comparison of Total Errors                                                           Avg.   S.E.
$100\,\big(\mathrm{TotalErr}_1(k)-\mathrm{TotalErr}_2(k)\big)/\mathrm{TotalErr}_1(k)$   8.7    4.0
$100\,\big(\mathrm{TotalErr}_1(k)-\mathrm{TotalErr}_3(k)\big)/\mathrm{TotalErr}_1(k)$   14.9   4.3

Table A.4: Comparison of the total errors for one, two and three-level nested logit models.

We use $\mathrm{TotalErr}_d(k)$ as our prediction accuracy metric for the $d$-level nested logit model in data set $k$. Thus, the smaller the value of $\mathrm{TotalErr}_d(k)$, the better the predictive accuracy of the $d$-level nested logit model. If $\mathrm{TotalErr}_d(k) = 0$, then the $d$-level nested logit model predicts the top-selling products perfectly, with 100% accuracy. We are interested in the percentage reduction in the number of falsely predicted products as we increase the number of levels in the tree. We report our findings in Table A.4. The first row of the table focuses on the percentage deviation between the prediction accuracy metrics for the one-level and two-level nested logit models. In this row, the second column shows the percentage deviation between the prediction accuracy metrics for the one-level and two-level nested logit models, averaged over all 50 data sets.
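The metric $\mathrm{TotalErr}_d(k)$ is simply the off-diagonal sum of a confusion matrix; a minimal sketch with a toy 3×3 matrix (the Figure A.1 matrices give 35, 17 and 2 under the same computation):

```python
def total_err(conf):
    """TotalErr_d(k): sum of the off-diagonal entries of a confusion matrix."""
    m = len(conf)
    return sum(conf[i][j] for i in range(m) for j in range(m) if i != j)

# Toy confusion matrix with 2 + 1 = 3 falsely predicted top sellers.
toy = [[10, 2, 0],
       [0, 5, 1],
       [0, 0, 7]]
print(total_err(toy))  # 3
```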
In other words, using $\mathrm{TotalErr}_d(k)$ to denote the prediction accuracy metric of the fitted $d$-level nested logit model for problem instance $k$, the second column gives the average of the figures $\{100\,(\mathrm{TotalErr}_1(k)-\mathrm{TotalErr}_2(k))/\mathrm{TotalErr}_1(k) : k = 1,\ldots,50\}$. The third column in the first row gives the standard error of the percentage deviation between the prediction accuracy metrics for the one-level and two-level nested logit models over all 50 data sets. Thus, the entries in the first row indicate by how much we can reduce the number of falsely predicted top-selling products of the one-level nested logit model by using a two-level one. The format of the second row is similar to that of the first row, but the second row focuses on the percentage deviation between the prediction accuracy metrics for the one-level and three-level nested logit models. The results in Table A.4 indicate that, on average, having a greater number of levels in the tree can significantly improve the prediction accuracy relative to the one-level nested logit model. Note that in none of the 50 data sets does the one-level nested logit model predict with 100% accuracy. On average, the two-level nested logit model reduces the number of falsely predicted products of the one-level model by 8.7%. By using a three-level nested logit model, we can improve the predictive accuracy even more significantly: on average, the number of falsely predicted products is reduced by 14.9%. We note that there are data sets in which the number of falsely predicted products of the lower-level nested logit model is smaller than that of the higher-level nested logit model.
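The averages and standard errors reported in Table A.4 follow from a routine computation over the 50 per-data-set percentage reductions. The sketch below uses hypothetical TotalErr pairs, not the study's actual values:

```python
from math import sqrt

def mean_and_se(xs):
    """Sample mean and standard error of the mean."""
    m = sum(xs) / len(xs)
    var = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, sqrt(var / len(xs))

# Hypothetical (TotalErr_1, TotalErr_2) pairs for four data sets.
pairs = [(35, 17), (20, 18), (10, 12), (25, 20)]
reductions = [100 * (e1 - e2) / e1 for e1, e2 in pairs]
avg, se = mean_and_se(reductions)
print(round(avg, 1))  # 15.4
```

Note that a negative reduction, as in the third pair, corresponds to a data set where the lower-level model happens to predict better, matching the observation in the text.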
Due to finite-sample estimation errors, and because we estimate only a limited number of two-level and three-level models, it is not possible to guarantee that a higher-level nested logit model always improves the prediction accuracy of a lower-level model. Our numerical study indicates, however, that using a three-level nested logit model provides improvements in a great majority of the cases, and the improvements can be significant.

A.7.4 Proof of Theorem A.7.1

In this section, we present the proof of Theorem A.7.1.

Proof. Using (2.8) and denoting $V_j(S^t)$ by $V^t_j$ for simplicity, we write the log-likelihood function as
$$
\mathcal{L}(\gamma_1,\ldots,\gamma_{d-1},\mu)
= \sum_{t=1}^T \log \prod_{h=1}^{d} \frac{V^t_{\mathrm{An}(c^t,h)}}{\sum_{j\in\mathrm{Sibling}(\mathrm{An}(c^t,h))} V^t_j}
= \sum_{t=1}^T \log\Bigg( e^{\mu_{c^t}} \prod_{h=2}^{d} \Bigg(\frac{1}{\sum_{j\in\mathrm{Sibling}(\mathrm{An}(c^t,h))} V^t_j}\Bigg)^{1-\gamma_{\mathrm{An}(c^t,h-1)}} \cdot \frac{1}{\sum_{j\in\mathrm{Sibling}(\mathrm{An}(c^t,1))} V^t_j}\Bigg)
$$
$$
= \sum_{t=1}^T \Bigg[ \mu_{c^t} - \sum_{h=2}^{d} \big(1-\gamma_{\mathrm{An}(c^t,h-1)}\big) \log\Bigg(\sum_{j\in\mathrm{Children}(\mathrm{An}(c^t,h-1))} V^t_j\Bigg) - \log\Bigg(\sum_{j\in\mathrm{Children}(\mathrm{root})} V^t_j\Bigg)\Bigg]
$$
$$
= \sum_{t=1}^T \Bigg[ \mu_{c^t} - \sum_{h=2}^{d} \big(1-\gamma_{\mathrm{An}(c^t,h-1)}\big) \log \big(V^t_{\mathrm{An}(c^t,h-1)}\big)^{1/\gamma_{\mathrm{An}(c^t,h-1)}} - \log\Bigg(\sum_{j\in\mathrm{Children}(\mathrm{root})} V^t_j\Bigg)\Bigg],
$$
where, letting Leaf and NonLeaf respectively denote the leaf and non-leaf nodes in the tree, we use the fact that
$$
V^t_j = \begin{cases} 0 & \text{if } j\in\mathrm{Leaf} \text{ and } j\notin S^t, \\ e^{\mu_j} & \text{if } j\in\mathrm{Leaf} \text{ and } j\in S^t, \\ \big(\sum_{i\in\mathrm{Children}(j)} V^t_i\big)^{\gamma_j} & \text{if } j\in\mathrm{NonLeaf}. \end{cases}
$$
In the expression above, the preference weight $V^t_j$ of each node $j$ depends on the parameters $(\gamma_1,\ldots,\gamma_{d-1},\mu)$, but we suppress this dependence for clarity. For each customer $t$ and non-leaf node $j$, let $y^t_j$ be defined recursively by
$$
y^t_j = \begin{cases} \log\big(\sum_{\ell\in\mathrm{Children}(j)\cap S^t} e^{\mu_\ell}\big) & \text{if } \mathrm{depth}(j) = d-1, \\ \log\big(\sum_{i\in\mathrm{Children}(j)} e^{\gamma_i y^t_i}\big) & \text{if } \mathrm{depth}(j) < d-1, \end{cases}
$$
where we set $\log 0 = -\infty$. Note that if none of the descendants of node $j$ at the leaves belong to $S^t$, then $y^t_j = -\infty$ by our construction.

Claim 1: For each customer $t$ and non-leaf node $j$, $V^t_j = e^{\gamma_j y^t_j}$.

We prove the claim by induction on the depth of node $j$.
If $\mathrm{depth}(j) = d-1$, then $V^t_j = \big(\sum_{\ell\in\mathrm{Children}(j)\cap S^t} e^{\mu_\ell}\big)^{\gamma_j} = e^{\gamma_j y^t_j}$, where the last equality follows from the definition of $y^t_j$. Suppose the result is true for all nodes at depth $h+1$, and consider an arbitrary node $j$ at depth $h < d-1$. By the induction hypothesis, $V^t_j = \big(\sum_{\ell\in\mathrm{Children}(j)} V^t_\ell\big)^{\gamma_j} = \big(\sum_{\ell\in\mathrm{Children}(j)} e^{\gamma_\ell y^t_\ell}\big)^{\gamma_j} = e^{\gamma_j y^t_j}$, where the last equality follows from the definition of $y^t_j$. This establishes the claim.

Using the claim above, and adopting the conventions $\mathrm{An}(c^t,0) = \mathrm{root}$ and $\gamma_{\mathrm{root}} = 0$, the log-likelihood function is given by
$$
\mathcal{L}(\gamma_1,\ldots,\gamma_{d-1},\mu)
= \sum_{t=1}^T \Bigg[\mu_{c^t} - \sum_{h=2}^{d} \big(1-\gamma_{\mathrm{An}(c^t,h-1)}\big)\, y^t_{\mathrm{An}(c^t,h-1)} - \log\Bigg(\sum_{j\in\mathrm{Children}(\mathrm{root})} e^{\gamma_j y^t_j}\Bigg)\Bigg]
= \sum_{t=1}^T \Bigg[\mu_{c^t} - \sum_{h=1}^{d-1} \big(1-\gamma_{\mathrm{An}(c^t,h)}\big)\, y^t_{\mathrm{An}(c^t,h)} - y^t_{\mathrm{root}}\Bigg]
$$
$$
= \sum_{t=1}^T \Bigg[\mu_{c^t} - \sum_{h=0}^{d-1} \big(1-\gamma_{\mathrm{An}(c^t,h)}\big)\, y^t_{\mathrm{An}(c^t,h)}\Bigg]
= \sum_{\ell=1}^n M_\ell\, \mu_\ell - \sum_{h=0}^{d-1} \sum_{j\,:\,\mathrm{depth}(j)=h} (1-\gamma_j) \sum_{t\,:\,S^t\cap\mathrm{Descendant}(j)\neq\varnothing} y^t_j, \qquad (A.5)
$$
where $M_\ell$ is the total sales of product $\ell$ and $\mathrm{Descendant}(j)$ denotes all nodes that are descendants of $j$, excluding $j$ itself. Note that $y^t_j$ is a function of $\gamma_1,\ldots,\gamma_{d-1},\mu$, so we now write $y^t_j(\gamma_1,\ldots,\gamma_{d-1},\mu)$ to emphasize this dependence. Since $\gamma_j\in(0,1]$ for all $j\in\mathrm{NonLeaf}$, and noting the expression above, the following claim establishes the first part of the theorem.

Claim 2: For fixed $\gamma_1,\ldots,\gamma_{d-1}$, for all $t$ and $j$, $y^t_j(\gamma_1,\ldots,\gamma_{d-1},\mu)$ is convex in $\mu$.

We prove the claim by induction on the depth of node $j$. The result is true when $\mathrm{depth}(j) = d-1$ by the convexity of the log-sum-exp function; see Boyd and Vandenberghe (2004). Suppose the result is true for all nodes of depth $h+1$, and consider a node $j$ at level $h$. It is a well-known result that if the functions $g_i(\cdot)$ are convex, then $\log\big(\sum_i e^{g_i(\cdot)}\big)$ is also convex. Since $y^t_j = \log\big(\sum_{i\in\mathrm{Children}(j)} e^{\gamma_i y^t_i}\big)$, we have that $y^t_j(\gamma_1,\ldots,\gamma_{d-1},\mu)$ is also convex in $\mu$, completing the induction. This establishes the claim.

The following claim is useful for showing the second part of the theorem.

Claim 3: For all $t$, $h$ and $j$, $y^t_j(\gamma_1,\ldots,\gamma_{h-1},\gamma_h,\gamma_{h+1},\ldots,\gamma_{d-1},\mu)$ is convex in $\gamma_h$.
Fix an arbitrary customer $t$ and $h\in\{1,\ldots,d-1\}$. We show by induction on the node depth that $y^t_j$ is convex in $\gamma_h$ for all nodes $j$. If $\mathrm{Descendant}(j)\cap S^t = \varnothing$, then $y^t_j$ is constant at $-\infty$, which is convex. So, it suffices to consider only the nodes that are ancestors of products in $S^t$. First, consider nodes $j$ in levels $d-1, d-2, \ldots, h$. By definition, $y^t_j$ depends only on $(\gamma_i : i\in\mathrm{Descendant}(j))$. In this case, the result automatically holds because $y^t_j$ is independent of $\gamma_h$. Now, consider a node $j$ such that $\mathrm{depth}(j) = h-1$. By definition, it holds that $y^t_j = \log\big(\sum_{k\in\mathrm{Children}(j)} e^{\gamma_k y^t_k}\big)$. For each $k\in\mathrm{Children}(j)$, $\mathrm{depth}(k) = h$ and $y^t_k$ is independent of $\gamma_h$, which implies that $y^t_j$ depends on $\gamma_h$ through $(\gamma_\ell : \ell\in\mathrm{Children}(j))$ via the log-sum-exp function. Therefore, $y^t_j$ is convex in $\gamma_h$. Next, consider a node $j$ at level $h-2$. Since $y^t_j = \log\big(\sum_{k\in\mathrm{Children}(j)} e^{\gamma_k y^t_k}\big)$ and $\mathrm{depth}(k) = h-1$ for all $k\in\mathrm{Children}(j)$, $y^t_j$ depends on $\gamma_h$ only through $y^t_k$, which we have just shown to be convex in $\gamma_h$. Therefore, $y^t_j$ is also convex in $\gamma_h$. Exactly the same argument applies to all nodes in levels $h-3, h-4, \ldots, 1$, and this establishes the desired claim.

We now use Claim 3 to establish the second part of the theorem. Note that $\gamma_h$ appears in the log-likelihood function in (A.5) through the two terms
$$
-\sum_{j\,:\,\mathrm{depth}(j)=h} (1-\gamma_j)\Bigg[\sum_{t\,:\,S^t\cap\mathrm{Descendant}(j)\neq\varnothing} y^t_j\Bigg]
\quad\text{and}\quad
-\sum_{g=1}^{h-1}\;\sum_{j\,:\,\mathrm{depth}(j)=g} (1-\gamma_j)\Bigg[\sum_{t\,:\,S^t\cap\mathrm{Descendant}(j)\neq\varnothing} y^t_j\Bigg].
$$
The first term is an affine function of $\gamma_h$ because, for nodes $j$ in level $h$, $\sum_{t\,:\,S^t\cap\mathrm{Descendant}(j)\neq\varnothing} y^t_j$ is independent of $\gamma_h$, so the first term is concave in $\gamma_h$. In the second term, $\gamma_h$ appears only through $y^t_j$, which we know from Claim 3 to be convex in $\gamma_h$; since $1-\gamma_j \geq 0$, the second term is concave in $\gamma_h$. Therefore, the log-likelihood function is concave in $\gamma_h$, as desired. $\Box$

It is not difficult to generate counterexamples showing that the log-likelihood function is not jointly concave in $(\gamma_1,\ldots,\gamma_{d-1},\mu)$.
Note that, fixing $\mu$, the log-likelihood function in (A.5) contains a product of $(1-\gamma_j)$ and $y^t_j$, which depends on $(\gamma_\ell : \ell\in\mathrm{Descendant}(j))$. The product of two variables is not a jointly concave or convex function of the two variables. Thus, it is not difficult to see that the log-likelihood function is not jointly concave in $(\gamma_1,\ldots,\gamma_{d-1},\mu)$.

Appendix B: Technical Appendix to Chapter 3

Proof of Lemma 3.3.1: It suffices to show that the total costs incurred under formulations (1) and (2) are the same. First note that the total fixed costs, the total expected holding costs and the total penalty costs throughout the entire planning horizon are the same in both formulations. Consider the total expected purchasing cost of rides in the discount account throughout the entire planning horizon in the $V_t$ formulation. Given the initial balance $x^t_d$ and after-refill balance $y^t_d$ in the discount account for $t = 1,\ldots,T$,
$$
\mathbb{E}\Bigg[\sum_{t=1}^T c_d \big(y^t_d - x^t_d\big)\Bigg]
= c_d\, \mathbb{E}\Big[\big(y^1_d - 0\big) + \big(y^2_d - (y^1_d - D_1)^+\big) + \big(y^3_d - (y^2_d - D_2)^+\big) + \cdots + \big(y^T_d - (y^{T-1}_d - D_{T-1})^+\big)\Big]
$$
$$
= c_d\, \mathbb{E}\Big[\big(y^1_d - (y^1_d - D_1)^+\big) + \big(y^2_d - (y^2_d - D_2)^+\big) + \cdots + \big(y^{T-1}_d - (y^{T-1}_d - D_{T-1})^+\big) + y^T_d\Big]
$$
$$
= c_d\, \mathbb{E}\Bigg[\sum_{t=1}^T \big(y^t_d - (y^t_d - D_t)^+\big) + (y^T_d - D_T)^+\Bigg]
= \sum_{t=1}^T \mathbb{E}\big[c_d \min(y^t_d, D_t)\big] + \mathbb{E}\big[c_d (y^T_d - D_T)^+\big].
$$
This is exactly the total expected cost of consumed rides in the discount account in the $J_t$ formulation. Using the same approach, consider the total expected purchasing cost of rides in the regular account in the $V_t$ formulation.
Given the initial balance $x^t_r$ and after-refill balance $y^t_r$ in the regular account for $t = 1,\ldots,T$,
$$
\mathbb{E}\Bigg[\sum_{t=1}^T c_r \big(y^t_r - x^t_r\big) - c_r x^{T+1}_r\Bigg]
= c_r\, \mathbb{E}\Big[\big(y^1_r - 0\big) + \big(y^2_r - (y^1_r - (D_1 - y^1_d)^+)^+\big) + \cdots + \big(y^T_r - (y^{T-1}_r - (D_{T-1} - y^{T-1}_d)^+)^+\big) - x^{T+1}_r\Big]
$$
$$
= c_r\, \mathbb{E}\Bigg[\sum_{t=1}^T \big(y^t_r - (y^t_r - (D_t - y^t_d)^+)^+\big)\Bigg]
= \sum_{t=1}^T \mathbb{E}\big[c_r \min\big(y^t_r,\ (D_t - y^t_d)^+\big)\big],
$$
where the second equality uses $x^{T+1}_r = \big(y^T_r - (D_T - y^T_d)^+\big)^+$. This is exactly the total expected cost of consumed rides in the regular account in the $J_t$ formulation. $\Box$

Proof of Lemma 3.3.2: Note that $\min(a,b) = a - (a-b)^+$. Setting $K = 0$ and using (2),
$$
g_t(y_d, y_r) = \mathbb{E}\Big[ c_d \min(D_t, y_d) + c_r \min\big((D_t - y_d)^+, y_r\big) + h_d (y_d - D_t)^+ + h_r \big(y_r - (D_t - y_d)^+\big)^+ + b\, (D_t - y_d - y_r)^+ \Big]
$$
$$
= \mathbb{E}\Big[ c_d \big(D_t - (D_t - y_d)^+\big) + c_r \big((D_t - y_d)^+ - ((D_t - y_d)^+ - y_r)^+\big) + h_d (y_d - D_t)^+ + h_r \big(y_r - (D_t - y_d)^+\big)^+ + b\, (D_t - y_d - y_r)^+ \Big]
$$
$$
= \mathbb{E}\Big[ c_d D_t + (c_r - c_d - h_r)(D_t - y_d)^+ + (b - c_r + h_r)(D_t - y_d - y_r)^+ + h_d (y_d - D_t)^+ + h_r y_r \Big],
$$
where the last equality follows from $\big(y_r - (D_t - y_d)^+\big)^+ = y_r - \min\big(y_r, (D_t - y_d)^+\big)$ and $\big((D_t - y_d)^+ - y_r\big)^+ = (D_t - y_d - y_r)^+$. The joint convexity of $g_t(\cdot)$ follows from the fact that every additive term in the expression is jointly convex in $(y_d, y_r)$, given that $b \geq c_r \geq c_d$ and $h_d \geq h_r$. $\Box$

Before we prove Theorem 3.3.3, we need the following lemma.

Lemma B.0.2. If $f(x,y)$ is jointly convex in $(x,y)$, where $x\in C_x$, $y\in C_y$, and $C_x$ and $C_y$ are compact sets, then $g(x,y) = \min_{x'\geq x,\, y'\geq y} f(x',y')$ is non-decreasing and jointly convex in $(x,y)$.

Proof. It is intuitive that $g(x,y)$ is non-decreasing: when $x$ or $y$ increases, the feasible region of the minimization shrinks, so the minimum value must increase or stay the same.
For any $(x_1, y_1)$ and $(x_2, y_2)$, let
$$
(\bar{x}_1, \bar{y}_1) = \arg\min_{x\geq x_1,\, y\geq y_1} f(x,y)
\quad\text{and}\quad
(\bar{x}_2, \bar{y}_2) = \arg\min_{x\geq x_2,\, y\geq y_2} f(x,y).
$$
Since $f(x,y)$ is jointly convex in $(x,y)$, we have, for all $x_1, x_2 \in C_x$, $y_1, y_2 \in C_y$ and $\lambda\in[0,1]$,
$$
f\big(\lambda \bar{x}_1 + (1-\lambda)\bar{x}_2,\ \lambda \bar{y}_1 + (1-\lambda)\bar{y}_2\big) \leq \lambda f(\bar{x}_1, \bar{y}_1) + (1-\lambda) f(\bar{x}_2, \bar{y}_2) = \lambda\, g(x_1, y_1) + (1-\lambda)\, g(x_2, y_2).
$$
Moreover, note that
$$
g\big(\lambda x_1 + (1-\lambda)x_2,\ \lambda y_1 + (1-\lambda)y_2\big) \leq g\big(\lambda \bar{x}_1 + (1-\lambda)\bar{x}_2,\ \lambda \bar{y}_1 + (1-\lambda)\bar{y}_2\big) \leq f\big(\lambda \bar{x}_1 + (1-\lambda)\bar{x}_2,\ \lambda \bar{y}_1 + (1-\lambda)\bar{y}_2\big),
$$
where the first inequality follows from the fact that $g(x,y)$ is non-decreasing in $x$ and $y$ and $\bar{x}_i \geq x_i$, $\bar{y}_i \geq y_i$ for $i = 1,2$. Combining the above inequalities, we have
$$
g\big(\lambda x_1 + (1-\lambda)x_2,\ \lambda y_1 + (1-\lambda)y_2\big) \leq \lambda\, g(x_1, y_1) + (1-\lambda)\, g(x_2, y_2),
$$
which is the desired result. $\Box$

Proof of Theorem 3.3.3: We prove the theorem by backward induction, showing that (i) and (ii) hold for each $t = 1, 2, \ldots, T$. For $t = T$, by definition, $J_{T+1}\big((x - D_T)^+,\ (y - (D_T - x)^+)^+\big) = \mathbb{E}\big[c_d (x - D_T)^+\big]$. Thus,
$$
G_T(x,y) = g_T(x,y) + \mathbb{E}\big[c_d (x - D_T)^+\big].
$$
By Lemma 3.3.2, $g_T(x,y)$ is jointly convex in $(x,y)$. Clearly $\mathbb{E}[c_d (x - D_T)^+]$ is jointly convex in $(x,y)$, so $G_T(x,y)$ is jointly convex in $(x,y)$. Then, the joint convexity and non-decreasing property of $J_T(x,y)$ in $(x,y)$ follow directly from Lemma B.0.2. Note that $G_T(x,y)$ can be written as
$$
G_T(x,y) = \mathbb{E}\Big[ c_d D_T + h_r y + (c_r - c_d - h_r)(D_T - x)^+ + (b - c_r + h_r)(D_T - x - y)^+ + (c_d + h_d)(x - D_T)^+ \Big].
$$
Thus, it is easy to show that the subdifferentials of $G_T(x,y)$ at any non-break point $(x,y)$ are
$$
\frac{\partial G_T(x,y)}{\partial x} = -(c_r - c_d - h_r)\,\mathbb{P}(D_T > x) - (b - c_r + h_r)\,\mathbb{P}(D_T > x + y) + (c_d + h_d)\,\mathbb{P}(D_T \leq x)
$$
Consider period k 1. We first show that G k1 (x;y) and J k1 (x;y) are jointly convex in (x;y). Let D k1 =D for any fixedD 0, consider the following 4 cases for any2 [0; 1] andy 1 ;y 2 0. Case 1: x 1 ;x 2 D. Then,x 1 + (1)x 2 D. Then, G k1 x 1 + (1)x 2 ;y 1 + (1)y 2 jD k1 =D = g k1 x 1 + (1)x 2 ;y 1 + (1)y 2 jD k1 =D +J k (x 1 + (1)x 2 D) + ; (y 1 + (1)y 2 (Dx 1 (1)x 2 ) + ) + = g k1 x 1 + (1)x 2 ;y 1 + (1)y 2 jD k1 =D +J k 0; ((x 1 +y 1 D) + (1)(x 2 +y 2 D)) + g k1 x 1 + (1)x 2 ;y 1 + (1)y 2 jD k1 =D +J k 0;(x 1 +y 1 D) + + (1)(x 2 +y 2 D) + h g k1 x 1 ;y 1 jD k1 =D +J k 0; (x 1 +y 1 D) + i + (1) h g k1 x 2 ;y 2 jD k1 =D +J k 0; (x 2 +y 2 D) + i = h g k1 x 1 ;y 1 jD k1 =D +J k (x 1 D) + ; (y 1 (Dx 1 ) + ) + i + (1) h g k1 x 2 ;y 2 jD k1 =D +J k (x 2 D) + ; (y 2 (Dx 2 ) + ) + i = G k1 x 1 ;y 1 jD k1 =D + (1)G k1 x 2 ;y 2 jD k1 =D); 128 where the first inequality follows from the non-decreasing property ofJ k (x;y) in (x;y) and the second inequality follows from the joint-convexity ofJ k (x;y) in (x;y). Case 2: x 1 D;x 2 >D andx 1 + (1)x 2 D. It is easy to verify that g k1 x 1 + (1)x 2 ;y 1 + (1)y 2 jD k1 =D g k1 x 1 ;y 1 jD k1 =D (1)g k1 x 2 ;y 2 jD k1 =D = (c r c d h r )(Dx 1 (1)x 2 ) + (bc r +h r )(D(x 1 +y 1 ) (1)(x 2 +y 2 )) + (c r c d h r )(Dx 1 ) (1)h d (x 2 D)(bc r +h r )(Dx 1 y 1 ) + ) (1)(c r c d +h d h r )(x 2 D) + (bc r +h r )((Dx 1 y 1 ) + + (1)(Dx 2 y 2 ) + ) (bc r +h r )(Dx 1 y 1 ) + ) = (1)(c r c d +h d h r )(x 2 D): Note that becauseJ k (x;y) is jointly convex and non-decreasing in (x;y) J k ((x 1 + (1)x 2 D) + ; (y 1 + (1)y 2 (Dx 1 (1)x 2 ) + ) + ) = J k (0; ((x 1 +y 1 D) + (1)(x 2 +y 2 D)) + ) J k (0;(x 1 +y 1 D) + + (1)(x 2 +y 2 D)) J k (0; (x 1 +y 1 D) + ) + (1)J k (0;x 2 +y 2 D): LetJ k (x 2 D;y 2 ) = G k (x ;y ), where by definition ofJ k (;), x x 2 D andy y 2 . 
Thus,
$$
J_k\big(0,\ x_2 + y_2 - D\big) - J_k\big(x_2 - D,\ y_2\big)
\leq G_k\big(x^* - (x_2 - D),\ y^* + (x_2 - D)\big) - G_k(x^*, y^*)
$$
$$
= \Big[G_k\big(x^* - (x_2 - D),\ y^* + (x_2 - D)\big) - G_k\big(x^* - (x_2 - D),\ y^*\big)\Big] - \Big[G_k(x^*, y^*) - G_k\big(x^* - (x_2 - D),\ y^*\big)\Big]
\leq (x_2 - D)(c_r - c_d),
$$
where the first inequality follows from $x_2 - D > 0$ and the definition of $J_k(\cdot,\cdot)$, and the second inequality follows from $\frac{\partial G_k(x,y)}{\partial y} - \frac{\partial G_k(x,y)}{\partial x} \leq c_r - c_d$. Then,
$$
G_{k-1}\big(\lambda x_1 + (1-\lambda)x_2,\ \lambda y_1 + (1-\lambda)y_2 \mid D_{k-1} = D\big)
= g_{k-1}\big(\lambda x_1 + (1-\lambda)x_2,\ \lambda y_1 + (1-\lambda)y_2 \mid D_{k-1} = D\big) + J_k\Big(\big(\lambda x_1 + (1-\lambda)x_2 - D\big)^+,\ \big(\lambda y_1 + (1-\lambda)y_2 - (D - \lambda x_1 - (1-\lambda)x_2)^+\big)^+\Big)
$$
$$
\leq \lambda\, g_{k-1}(x_1, y_1 \mid D_{k-1} = D) + (1-\lambda)\, g_{k-1}(x_2, y_2 \mid D_{k-1} = D) - (1-\lambda)(c_r - c_d + h_d - h_r)(x_2 - D) + \lambda\, J_k\big(0,\ (x_1 + y_1 - D)^+\big) + (1-\lambda)\, J_k\big(x_2 - D,\ y_2\big) + (1-\lambda)(x_2 - D)(c_r - c_d)
$$
$$
\leq \lambda\Big[g_{k-1}(x_1, y_1 \mid D_{k-1} = D) + J_k\big((x_1 - D)^+,\ (x_1 + y_1 - D)^+\big)\Big] + (1-\lambda)\Big[g_{k-1}(x_2, y_2 \mid D_{k-1} = D) + J_k\big(x_2 - D,\ y_2\big)\Big]
= \lambda\, G_{k-1}(x_1, y_1 \mid D_{k-1} = D) + (1-\lambda)\, G_{k-1}(x_2, y_2 \mid D_{k-1} = D),
$$
where the last inequality follows from $h_d \geq h_r$ and $x_2 > D$.

Case 3: $x_1 \leq D$, $x_2 > D$ and $\lambda x_1 + (1-\lambda)x_2 > D$. It is easy to verify that
$$
g_{k-1}\big(\lambda x_1 + (1-\lambda)x_2,\ \lambda y_1 + (1-\lambda)y_2 \mid D_{k-1} = D\big) - \lambda\, g_{k-1}(x_1, y_1 \mid D_{k-1} = D) - (1-\lambda)\, g_{k-1}(x_2, y_2 \mid D_{k-1} = D)
$$
$$
= h_d\big(\lambda x_1 + (1-\lambda)x_2 - D\big) + (b - c_r + h_r)\big(D - \lambda(x_1 + y_1) - (1-\lambda)(x_2 + y_2)\big)^+ - \lambda(c_r - c_d - h_r)(D - x_1) - (1-\lambda)h_d(x_2 - D) - \lambda(b - c_r + h_r)(D - x_1 - y_1)^+
$$
$$
\leq -\lambda(c_r - c_d + h_d - h_r)(D - x_1) + (b - c_r + h_r)\Big[\lambda(D - x_1 - y_1)^+ + (1-\lambda)(D - x_2 - y_2)^+ - \lambda(D - x_1 - y_1)^+\Big]
\leq -\lambda(c_r - c_d + h_d - h_r)(D - x_1).
$$
Note that
$$
J_k\Big(\big(\lambda x_1 + (1-\lambda)x_2 - D\big)^+,\ \big(\lambda y_1 + (1-\lambda)y_2 - (D - \lambda x_1 - (1-\lambda)x_2)^+\big)^+\Big)
= J_k\big(\lambda x_1 + (1-\lambda)x_2 - D,\ \lambda y_1 + (1-\lambda)y_2\big)
$$
$$
= J_k\Big(\lambda\cdot 0 + (1-\lambda)\Big(x_2 - D - \tfrac{\lambda}{1-\lambda}(D - x_1)\Big),\ \lambda(x_1 + y_1 - D)^+ + (1-\lambda)\Big(y_2 + \tfrac{\lambda}{1-\lambda}\min(D - x_1,\ y_1)\Big)\Big)
$$
$$
\leq \lambda\, J_k\big(0,\ (x_1 + y_1 - D)^+\big) + (1-\lambda)\, J_k\Big(x_2 - D - \tfrac{\lambda}{1-\lambda}(D - x_1),\ y_2 + \tfrac{\lambda}{1-\lambda}\min(D - x_1,\ y_1)\Big),
$$
where the inequality follows from the joint convexity of $J_k(\cdot,\cdot)$. Let $J_k(x_2 - D,\ y_2) = G_k(x^*, y^*)$, where, by the definition of $J_k(\cdot,\cdot)$, $x^* \geq x_2 - D$ and $y^* \geq y_2$.
Thus,
$$
J_k\Big(x_2 - D - \tfrac{\lambda}{1-\lambda}(D - x_1),\ y_2 + \tfrac{\lambda}{1-\lambda}\min(D - x_1,\ y_1)\Big) - J_k\big(x_2 - D,\ y_2\big)
\leq J_k\Big(x_2 - D - \tfrac{\lambda}{1-\lambda}(D - x_1),\ y_2 + \tfrac{\lambda}{1-\lambda}(D - x_1)\Big) - J_k\big(x_2 - D,\ y_2\big)
$$
$$
\leq G_k\Big(x^* - \tfrac{\lambda}{1-\lambda}(D - x_1),\ y^* + \tfrac{\lambda}{1-\lambda}(D - x_1)\Big) - G_k(x^*, y^*)
$$
$$
= \Big[G_k\Big(x^* - \tfrac{\lambda}{1-\lambda}(D - x_1),\ y^* + \tfrac{\lambda}{1-\lambda}(D - x_1)\Big) - G_k\Big(x^* - \tfrac{\lambda}{1-\lambda}(D - x_1),\ y^*\Big)\Big] - \Big[G_k(x^*, y^*) - G_k\Big(x^* - \tfrac{\lambda}{1-\lambda}(D - x_1),\ y^*\Big)\Big]
\leq \tfrac{\lambda}{1-\lambda}(D - x_1)(c_r - c_d),
$$
where the first inequality follows from the non-decreasing property and the definition of $J_k(\cdot,\cdot)$, and the second inequality follows from $\frac{\partial G_k(x,y)}{\partial y} - \frac{\partial G_k(x,y)}{\partial x} \leq c_r - c_d$. Then,
$$
G_{k-1}\big(\lambda x_1 + (1-\lambda)x_2,\ \lambda y_1 + (1-\lambda)y_2 \mid D_{k-1} = D\big)
= g_{k-1}\big(\lambda x_1 + (1-\lambda)x_2,\ \lambda y_1 + (1-\lambda)y_2 \mid D_{k-1} = D\big) + J_k\big(\lambda x_1 + (1-\lambda)x_2 - D,\ \lambda y_1 + (1-\lambda)y_2\big)
$$
$$
\leq \lambda\, g_{k-1}(x_1, y_1 \mid D_{k-1} = D) + (1-\lambda)\, g_{k-1}(x_2, y_2 \mid D_{k-1} = D) - \lambda(c_r - c_d + h_d - h_r)(D - x_1) + \lambda\, J_k\big(0,\ (x_1 + y_1 - D)^+\big) + (1-\lambda)\, J_k\big(x_2 - D,\ y_2\big) + \lambda(D - x_1)(c_r - c_d)
$$
$$
\leq \lambda\Big[g_{k-1}(x_1, y_1 \mid D_{k-1} = D) + J_k\big((x_1 - D)^+,\ (y_1 - (D - x_1)^+)^+\big)\Big] + (1-\lambda)\Big[g_{k-1}(x_2, y_2 \mid D_{k-1} = D) + J_k\big(x_2 - D,\ y_2\big)\Big]
= \lambda\, G_{k-1}(x_1, y_1 \mid D_{k-1} = D) + (1-\lambda)\, G_{k-1}(x_2, y_2 \mid D_{k-1} = D),
$$
where the last inequality follows from $h_d \geq h_r$ and $x_1 \leq D$.

Case 4: $x_1, x_2 > D$. Then $\lambda x_1 + (1-\lambda)x_2 > D$, and we have
$$
G_{k-1}\big(\lambda x_1 + (1-\lambda)x_2,\ \lambda y_1 + (1-\lambda)y_2 \mid D_{k-1} = D\big)
= g_{k-1}\big(\lambda x_1 + (1-\lambda)x_2,\ \lambda y_1 + (1-\lambda)y_2 \mid D_{k-1} = D\big) + J_k\big(\lambda x_1 + (1-\lambda)x_2 - D,\ \lambda y_1 + (1-\lambda)y_2\big)
$$
$$
\leq \lambda\Big[g_{k-1}(x_1, y_1 \mid D_{k-1} = D) + J_k\big(x_1 - D,\ y_1\big)\Big] + (1-\lambda)\Big[g_{k-1}(x_2, y_2 \mid D_{k-1} = D) + J_k\big(x_2 - D,\ y_2\big)\Big]
= \lambda\, G_{k-1}(x_1, y_1 \mid D_{k-1} = D) + (1-\lambda)\, G_{k-1}(x_2, y_2 \mid D_{k-1} = D),
$$
where the inequality follows from the joint convexity of $g_{k-1}(\cdot,\cdot \mid D_{k-1} = D)$ and $J_k(x,y)$ in $(x,y)$.

Since expectation preserves convexity, $G_{k-1}(x,y)$ and $J_{k-1}(x,y)$ are jointly convex in $(x,y)$, and $J_{k-1}(x,y)$ is non-decreasing in $(x,y)$ by Lemma B.0.2. Now consider the subdifferentials of $G_{k-1}(x,y)$ with respect to $y$. First, consider the one-period cost $g_{k-1}(x, y \mid D_{k-1} = D)$ for any fixed $D \geq 0$. It is easy to verify from the expression of $g_{k-1}(\cdot,\cdot)$ that
$$
\frac{\partial g_{k-1}(x,y \mid D_{k-1} = D)}{\partial y} - \frac{\partial g_{k-1}(x,y \mid D_{k-1} = D)}{\partial x}
= \begin{cases} c_r - c_d & \text{if } x \leq D, \\ h_r - h_d & \text{if } x \geq D. \end{cases}
$$
Let $(x^*, y^*)$ be the global minimizer of $G_k(\cdot,\cdot)$.
If $x < D$, then, by the definition of $J_k(\cdot,\cdot)$,
$$
\frac{\partial J_k\big((x-D)^+,\ (y-(D-x)^+)^+\big)}{\partial x} = \frac{\partial J_k\big(0,\ (x+y-D)^+\big)}{\partial x}
= \begin{cases} 0 & \text{if } x \leq D \text{ and } x+y-D \leq y^*, \\ \frac{\partial J_k(0,\ x+y-D)}{\partial y} & \text{if } x \leq D \text{ and } x+y-D > y^*, \end{cases}
$$
where the last equality follows from $J_k\big(0,\ (x+y-D)^+\big) = G_k(x^*, y^*)$ for $x \leq D$ and $x+y-D \leq y^*$. Thus, $\frac{\partial J_k(0,\ x+y-D)}{\partial y} - \frac{\partial J_k((x-D)^+,\ (y-(D-x)^+)^+)}{\partial x} = 0$ if $x \leq D$. Then,
$$
\frac{\partial G_{k-1}(x,y \mid D_{k-1} = D)}{\partial y} - \frac{\partial G_{k-1}(x,y \mid D_{k-1} = D)}{\partial x}
= \frac{\partial g_{k-1}(x,y \mid D_{k-1} = D)}{\partial y} - \frac{\partial g_{k-1}(x,y \mid D_{k-1} = D)}{\partial x} + \frac{\partial J_k\big((x-D)^+,\ (y-(D-x)^+)^+\big)}{\partial y} - \frac{\partial J_k\big((x-D)^+,\ (y-(D-x)^+)^+\big)}{\partial x}
\leq c_r - c_d,
$$
where the inequality follows from the fact that, for $x > D$,
$$
\frac{\partial J_k\big((x-D)^+,\ (y-(D-x)^+)^+\big)}{\partial y} - \frac{\partial J_k\big((x-D)^+,\ (y-(D-x)^+)^+\big)}{\partial x}
\leq \max_{x', y'}\Bigg\{\frac{\partial G_k(x', y')}{\partial y} - \frac{\partial G_k(x', y')}{\partial x}\Bigg\} \leq c_r - c_d
$$
and $\frac{\partial g_{k-1}(x,y \mid D_{k-1} = D)}{\partial y} - \frac{\partial g_{k-1}(x,y \mid D_{k-1} = D)}{\partial x} < 0$ in this case. This completes the proof for $t = k-1$. $\Box$

Proof of Theorem 3.3.5: It is easy to verify that the subdifferentials of $g_t(x,y)$ at any non-negative $(x,y)$ are
$$
\frac{\partial g_t(x,y)}{\partial x} = (c_r - c_d - h_r + h_d)\,\mathbb{P}(D_t \leq x) + (b - c_r + h_r)\,\mathbb{P}(D_t \leq x+y) + (c_d - b)
$$
and
$$
\frac{\partial g_t(x,y)}{\partial y} = (b - c_r + h_r)\,\mathbb{P}(D_t \leq x+y) + (c_r - b).
$$
Then the unconstrained minimizers $(x^*, y^*)$ satisfy
$$
x^* = \begin{cases} 0 & \text{if } \mathbb{P}(D_t \leq y^*) > \frac{b - c_d}{b - c_r + h_r}, \\ \hat{x} & \text{if } \mathbb{P}(D_t \leq y^*) \leq \frac{b - c_d}{b - c_r + h_r}, \end{cases}
$$
where $\hat{x}$ solves $(c_r - c_d - h_r + h_d)\,\mathbb{P}(D_t \leq \hat{x}) + (b - c_r + h_r)\,\mathbb{P}(D_t \leq \hat{x} + y^*) = b - c_d$, and
$$
y^* = \begin{cases} 0 & \text{if } \mathbb{P}(D_t \leq x^*) > \frac{b - c_r}{b - c_r + h_r}, \\ F^{-1}\big(\frac{b - c_r}{b - c_r + h_r}\big) - x^* & \text{if } \mathbb{P}(D_t \leq x^*) \leq \frac{b - c_r}{b - c_r + h_r}. \end{cases}
$$
Consider the following Lagrangian function:
$$
L(y_d, y_r, \lambda_1, \lambda_2) = g_t(y_d, y_r) - \lambda_1 (y_d - x^t_d) - \lambda_2 (y_r - x^t_r).
$$
For $t = 1,\ldots,N-1$, the optimal solution $(y^*_d, y^*_r, \lambda_1, \lambda_2)$ must satisfy
$$
(c_r - h_r - c_d + h_d)\,\mathbb{P}(D \leq y_d) + (b - c_r + h_r)\,\mathbb{P}(D \leq y_d + y_r) + c_d - b - \lambda_1 = 0 \qquad\qquad (\mathrm{B.1})
$$
$$
(b - c_r + h_r)\,\mathbb{P}(D \leq y_d + y_r) + c_r - b - \lambda_2 = 0 \qquad\qquad (\mathrm{B.2})
$$
$$
\lambda_1 (y_d - x^t_d) = 0 \qquad\qquad (\mathrm{B.3})
$$
$$
\lambda_2 (y_r - x^t_r) = 0 \qquad\qquad (\mathrm{B.4})
$$
$$
\lambda_1, \lambda_2 \geq 0 \qquad\qquad (\mathrm{B.5})
$$
Following the definitions of $\bar{s}^t$, $\bar{s}^t_d$ and $\bar{s}^t_r$, the optimal solution must fall into one of the following four cases:

Case 1: $\lambda_1 = \lambda_2 = 0$. Then it must be true that $y_d = \bar{s}^t_d$, $y_r = \bar{s}^t_r$ and $y_d + y_r = \bar{s}^t$.

Case 2: $\lambda_1 > 0$, $\lambda_2 = 0$.
In this case, we have $y_d = x^t_d > \bar{s}^t_d$, $y_r < \bar{s}^t_r$ and $y_d + y_r = x^t_d + y_r = \bar{s}^t$.

Case 3: $\lambda_1 = 0$, $\lambda_2 > 0$. In this case, we have $y_d < \bar{s}^t_d$, $y_r = x^t_r > \bar{s}^t_r$ and $y_d + y_r = y_d + x^t_r > \bar{s}^t$.

Case 4: $\lambda_1 > 0$, $\lambda_2 > 0$. In this case, we have $y_d = x^t_d$, $y_r = x^t_r$ and $y_d + y_r = x^t_d + x^t_r > \bar{s}^t$.

If $\frac{h_d}{h_r} > \frac{b - c_d}{b - c_r}$, then
$$
F(\bar{s}^t) - F(\bar{s}^t_d) = \frac{b - c_r}{b - c_r + h_r} - \frac{c_r - c_d}{c_r - h_r - c_d + h_d}
= \frac{(b - c_r)(h_d - h_r) - h_r (c_r - c_d)}{(b - c_r + h_r)(c_r - h_r - c_d + h_d)}
= \frac{h_d (b - c_r) - h_r (b - c_d)}{(b - c_r + h_r)(c_r - h_r - c_d + h_d)} > 0.
$$
Thus, the threshold for the combined balance is greater than the threshold for the discount account. The optimal policy can be checked easily using the four cases stated earlier: (a) when $(x^t_d, x^t_r) \in \Omega^t_1$, only Case 1 holds; (b) when $(x^t_d, x^t_r) \in \Omega^t_2$, only Case 3 holds; (c) when $(x^t_d, x^t_r) \in \Omega^t_3$, only Case 2 holds; (d) when $(x^t_d, x^t_r) \in \Omega^t_4$, only Case 4 holds.

On the other hand, if $\frac{h_d}{h_r} \leq \frac{b - c_d}{b - c_r}$, then $F_t(\bar{s}^t) - F_t(\bar{s}^t_d) < 0$. Thus, $\bar{s}^t < \bar{s}^t_d$, implying that Case 1 and Case 2 in the Lagrangian analysis can no longer hold, and we only need to consider Case 3 and Case 4. The optimal policy can be checked easily: when $(x^t_d, x^t_r) \in \Omega^t_1$, only Case 3 holds; when $(x^t_d, x^t_r) \in \Omega^t_2$, only Case 4 holds. This completes the proof. $\Box$
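The two thresholds compared in the proof are critical fractiles of the demand distribution, so they can be computed with a one-line inverse-CDF evaluation. The sketch below reads the fractiles off the threshold comparison above; the cost parameters and the exponential demand distribution are illustrative assumptions, not values from the dissertation:

```python
from math import log

# Cost parameters (assumed for illustration): b >= c_r >= c_d and h_d >= h_r.
cd, cr, hd, hr, b = 1.0, 2.0, 0.5, 0.3, 5.0

# Critical fractiles from the threshold comparison in the proof.
q_combined = (b - cr) / (b - cr + hr)          # F(s_bar), combined balance
q_discount = (cr - cd) / (cr - hr - cd + hd)   # F(s_bar_d), discount account

# Exponential demand with rate lam: F^{-1}(q) = -ln(1 - q) / lam.
lam = 0.1
s_bar = -log(1 - q_combined) / lam
s_bar_d = -log(1 - q_discount) / lam

# With hd/hr > (b - cd)/(b - cr), F(s_bar) - F(s_bar_d) > 0, so the
# combined-balance threshold exceeds the discount-account threshold.
assert hd / hr > (b - cd) / (b - cr)
assert s_bar > s_bar_d
```

Any other demand distribution works the same way, since only the quantile function changes; the ordering of the two thresholds depends solely on the sign condition on $h_d/h_r$ established in the proof.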
Abstract
While companies around the world strive to meet the demand of their customers, the more ambitious ones seek to shape consumer demand. To help companies make better operational and marketing decisions, we need to understand how and why customers make their purchase decisions. However, it is difficult to know the actual formulae that customers use when making their choices due to the complex nature of human behavior and unobservable externalities. My research interests lie in modeling customer choice behaviors, as well as in seeking solutions to high-impact operations management problems, such as assortment planning and price optimization, using both analytical and data-driven approaches. In this dissertation, I seek to examine three facets relating to my research interests, focusing on assortment optimization algorithms, price optimization problems, and a dual-account bus card system. ❧ The first chapter, “A Greedy Algorithm for the Two-Level Nested Logit Model,” is based on a joint work with Professor Paat Rusmevichientong. We consider the assortment optimization problem under the classical two-level nested logit model, where the goal is to find the revenue-maximizing assortment of products to be offered. We establish a necessary and sufficient condition for the optimal assortment and use this optimality condition to develop a simple greedy algorithm that iteratively removes at most one product from each nest, until the optimality condition is satisfied. Our algorithm exploits the “lumpy” structure of the optimal solution, where in each nest, a certain set of “consecutive” products will always appear together in the optimal assortment. The algorithm is simple, intuitive, and extremely fast. For a problem with m nests, with each nest having n products, the running time is O(n m log m). This is the fastest known running time for this problem. 
❧ The second chapter, “The d-Level Nested Logit Model: Assortment and Price Optimization Problems,” is based on a joint work with Professors Paat Rusmevichientong and Huseyin Topaloglu. We consider assortment and price optimization problems under the d-level nested logit model. In the assortment optimization problem, the goal is to find the revenue-maximizing assortment of products to offer, when the prices of the products are fixed. Using a novel formulation of the d-level nested logit model as a tree of depth d, we provide an efficient algorithm to find the optimal assortment. For a d-level nested logit model with n products, the algorithm runs in O(d n log n) time. In the price optimization problem, the goal is to find the revenue-maximizing prices for the products, when the assortment of offered products is fixed. Although the expected revenue is not concave in the product prices, we develop an iterative algorithm that generates a sequence of prices converging to a stationary point. Numerical experiments show that our method converges faster than gradient-based methods by many orders of magnitude. In addition to providing solutions for the assortment and price optimization problems, we give support for the d-level nested logit model by demonstrating that it is consistent with the random utility maximization principle and equivalent to the elimination by aspects model. ❧ The third chapter, “A Dual-Account Bus Card Problem,” is based on a joint work with Professors Paat Rusmevichientong and Sha Yang. We study the optimal recharging policy of strategic passengers in a dual-account bus card system. The first account offers a heavy discount on the bus fare, but its remaining balance expires at the end of each month
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
- Essays on revenue management with choice modeling
- Efficient policies and mechanisms for online platforms
- Statistical learning in High Dimensions: Interpretability, inference and applications
- Real-time controls in revenue management and service operations
- Essays on service systems
- Essays on information design for online retailers and social networks
- Multi-armed bandit problems with learned rewards
- Essays on consumer product evaluation and online shopping intermediaries
- Inverse modeling and uncertainty quantification of nonlinear flow in porous media models
- Scalable exact inference in probabilistic graphical models on multi-core platforms
- Essays on understanding consumer contribution behaviors in the context of crowdfunding
- Scheduling and resource allocation with incomplete information in wireless networks
- Modeling and simulation of multicomponent mass transfer in tight dual-porosity systems (unconventional)
- Performance prediction, state estimation and production optimization of a landfill
- Efficient inverse analysis with dynamic and stochastic reductions for large-scale models of multi-component systems
- Modeling and simulation of complex recovery processes
- Reinforcement learning in hybrid electric vehicles (HEVs) / electric vehicles (EVs)
- Detecting joint interactions between sets of variables in the context of studies with a dichotomous phenotype, with applications to asthma susceptibility involving epigenetics and epistasis
Asset Metadata
Creator: Li, Guang (author)
Core Title: Modeling customer choice in assortment and transportation applications
School: Marshall School of Business
Degree: Doctor of Philosophy
Degree Program: Business Administration
Publication Date: 01/14/2018
Defense Date: 05/09/2016
Publisher: University of Southern California (original); University of Southern California. Libraries (digital)
Tag: assortment optimization, assortment planning, customer choice model, dual-account, greedy algorithm, multi-level nested logit model, OAI-PMH Harvest, optimality condition, price optimization, Public transportation, revenue management, smart card
Format: application/pdf (imt)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Rusmevichientong, Paat (committee chair); Yang, Sha (committee member); Zhu, Leon (committee member)
Creator Email: guang.li.2016@marshall.usc.edu, liguangjiajia@hotmail.com
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c40-268675
Unique identifier: UC11281279
Identifier: etd-LiGuang-4544.pdf (filename); usctheses-c40-268675 (legacy record id)
Legacy Identifier: etd-LiGuang-4544.pdf
Dmrecord: 268675
Document Type: Dissertation
Rights: Li, Guang
Type: texts
Source: University of Southern California (contributing entity); University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA