Computational Validation of Stochastic Programming Models and Applications by Jiajun Xu A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL UNIVERSITY OF SOUTHERN CALIFORNIA In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (ELECTRICAL ENGINEERING) August 2022 Copyright 2022 Jiajun Xu I dedicate this thesis to my family and friends, for their inspiration and support. ii Acknowledgements I would like to express my sincere gratitude to many people. Without the help, support, and inspiration from others, this dissertation would not exist. First and foremost, my deepest gratitude goes to my advisor, Dr. Suvrajeet Sen. He has played an instrumental role in my life at USC. His suggestion and advice guide me in not only my studies but also in career development and daily life. Five years ago, when I was at the end of my first-year Ph.D. study, I was confused about the research direction and anxious about my future career. He provided me with a fascinating area, operation research, which is aligned with my research interest. His suggestions on course selection and paper reading help me survive and thrive in this area. My gratitude also goes to Dr. Rahul Jain. His intellectual insights and care for students have inspired me a lot. He saw my potential, introduced me to my advisor, and my journey in opera- tion research started. His fantastic probability and stochastic systems courses help me build solid foundations for my research. I would like to thank Dr. Carl Kesselman and Dr. Petros Ioannou for serving on my dissertation committee and their insight, comments, and feedback. I would like to thank the members of USC3DLAB for their inspiration and collaboration. I acknowledge Shelly Lewis and Diane Demetras for their excellent administrative work and the partial financial support of the Office of Naval Research and National Science Foundation. I would like to thank all of my friends and my family. They are always my backbone. Finally, I would like to thank all the people I met. Everything that happens in life is meaningful. I believe in interdependent co-arising. iii Table of Contents Dedication ii Acknowledgements iii List of Tables vii List of Figures viii Abstract x Chapter 1: Introduction to Stochastic Programming 1 1.1 Significance of the Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Contributions of the Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.2.1 Practical Stochastic Linear Programming Application . . . . . . . . . . . . 5 1.2.2 Stochastic Combinatorial Optimization . . . . . . . . . . . . . . . . . . . 6 1.2.3 Multi-stage Stochastic Linear Programming . . . . . . . . . . . . . . . . . 7 1.2.4 Cyber-Infrastructure for Computational SP . . . . . . . . . . . . . . . . . 8 1.3 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 1.3.1 Sample Average Approximation . . . . . . . . . . . . . . . . . . . . . . . 9 1.3.2 Statistical Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 1.3.3 Summary of SAA Procedure . . . . . . . . . . . . . . . . . . . . . . . . . 16 1.3.4 Stochastic Decomposition and Related . . . . . . . . . . . . . . . . . . . . 17 1.4 Organization of the Dissertation . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 Chapter 2: A Contemporary Stochastic Linear Programming Application: Resource Allocation during the COVID-19 Pandemic 20 2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . 20 2.2 Prediction and Allocation under Uncertainty . . . . . . . . . . . . . . . . . . . . . 21 2.3 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 2.3.1 Resource Allocation with Deterministic Model . . . . . . . . . . . . . . . 27 2.3.2 Resource Allocation with Stochastic Model . . . . . . . . . . . . . . . . . 31 2.4 Computations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 2.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 Chapter 3: Two-stage Stochastic Combinatorial Optimization: Ensemble Models for Variance Reduction 41 iv 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 3.2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 3.2.1 Bagging Solution with Kernel . . . . . . . . . . . . . . . . . . . . . . . . 45 3.2.2 The Scope of This Study . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 3.3 Variance Reduced Ensemble Decisions: Connection with Machine Learning . . . . 47 3.3.1 Performance of the Bagging Solution . . . . . . . . . . . . . . . . . . . . 48 3.3.2 Performance of the Compromise Decision . . . . . . . . . . . . . . . . . . 50 3.3.3 Stopping Rules for SCO . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 3.4 Efficient Budget Allocation: Connection with Simulation Optimization . . . . . . . 57 3.5 Very Large-Scale Applications: Stochastic Facility Location . . . . . . . . . . . . 62 3.5.1 The Mathematical Structure . . . . . . . . . . . . . . . . . . . . . . . . . 63 3.6 Computations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 3.6.1 Problem Generation and Model Characteristics . . . . . . . . . . . . . . . 66 3.6.2 Instance Virtualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 3.6.3 Computational Results with the Bagging and Compromise Decisions . . . . 70 3.6.4 Results with Optimal Computational Budget Allocation (OCBA) . . . . . . 72 3.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74 Chapter 4: Multi-stage Stochastic Linear Programming: Compromise Policy for Bias and Variance Reductions 76 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 4.2 Formulation of MSLP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78 4.3 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82 4.3.1 SDDP Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82 4.3.1.1 Calculation of Bounds in SDDP . . . . . . . . . . . . . . . . . . 84 4.3.2 The Hydro-Thermal Scheduling Problem Formulation . . . . . . . . . . . 85 4.4 Compromise Policy: An Ensemble Method . . . . . . . . . . . . . . . . . . . . . 88 4.4.1 Multi-replication Approach for Compromise Function and Compromise Policy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 4.4.2 An Example: Hydro-Thermal Scheduling . . . . . . . . . . . . . . . . . . 90 4.5 Variance and Bias Reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92 4.5.1 Variance Reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93 4.5.2 Bias Reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98 4.6 A Meta-Algorithm for MSLP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100 4.6.1 Online Dual Dynamic Programming . . . . . . . . . . . . . . . . . . . . 
. 101 4.6.2 Parallel Scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104 4.7 Further Computational Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105 4.7.1 Results for Meta + ODDP . . . . . . . . . . . . . . . . . . . . . . . . . . 106 4.7.2 Comparison among Algorithms . . . . . . . . . . . . . . . . . . . . . . . 109 4.8 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113 Chapter 5: A Cyber-Infrastructure for Computational Stochastic Programming 116 5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116 5.2 System Design and Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . 120 5.3 Computations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122 v 5.3.1 Reproducibility of Computational Comparisons of Algorithms: Multi- Dimensional Newsvendor . . . . . . . . . . . . . . . . . . . . . . . . . . . 124 5.3.2 Reusability of a Research Example: LEO-Wyndor . . . . . . . . . . . . . . 126 5.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127 Chapter 6: Conclusions and Future Work 128 6.1 Summary of the Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128 6.2 Future Research Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129 6.2.1 Multi-stage Dynamic System for Deep Learning . . . . . . . . . . . . . . . 130 6.2.2 Machine Learning Model for the Policy in MSLP . . . . . . . . . . . . . . 131 References 134 vi List of Tables 2.1 This table shows the total amount of national unmet demand for different models in the period from Mar. 25 to Jun. 02, and from Nov. 19 to Dec. 16, 2020. Each column represents a different percentage of total ventilators available for COVID-19 patients. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 2.2 This table shows the total unmet demand, in the period from Mar. 25 to Jun. 02, and from Nov. 19 to Dec. 16, 2020, for all models under different policy parameters. P covid is set as 0.6. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 2.3 This table shows the total amount of unmet demand for different models in different weeks of 2020. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 3.1 Characteristics of Test Problems with District demand nodes and ZIP demand nodes. 68 3.2 Variance Reduction with the Bagging and Compromise Decisions: Computational results with demand node for each District, corresponding to 15 demand nodes. . . 71 3.3 Variance Reduction with Bagging and Compromise Decisions: Computational results with demand node for each ZIP code, corresponding to 118 demand nodes. 71 3.4 The computational time (secs.) for solving the examples with District and ZIP demand nodes, corresponding to the results in Table. 3.2 and Table. 3.3, respectively. 72 3.5 OCBA technique and ordinary SAA method: Computational results with ZIP demand, corresponding to 118 demand nodes. . . . . . . . . . . . . . . . . . . . . 73 4.1 Comparison among different algorithms: the upper/lower bounds and optimality gap (with 95% confidence intervals). . . . . . . . . . . . . . . . . . . . . . . . . . 112 5.1 SA vs. RobustSA vs. SD on the MDNV problem. . . . . . . . . . . . . . . . . . 126 vii List of Figures 1.1 The connections between stochastic programming and other related areas. . . . . . 5 1.2 The summary of the contributions. . . . . . . . . . . . . . . . . . . . . . . . . . . 
9 2.1 The demand for ventilators in the U.S. Data source: IHME group [34] released on Dec. 17, 2020. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 2.2 The amount of unmet demand in different weeks based on the decisions from all models. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 2.3 The supply and demand for the ventilators in New York and Kansas in different time. These figures show the results of the experiment with P covid = 60%;r = 0:5;q = 0:2. The states are connected with adjacent states. . . . . . . . . . . . . . 39 3.1 (Snapshot of the animation on Youtube.) An illustration of the solution for the Weighted Centroid model with 15 potential facilities and five open facilities. The red dots represent the selected facilities from the optimal solution, while the green dots are those potential facilities that are not selected. The orange dots are the locations for open facilities. The grey circles are the demand node, where their radii are proportional to the quantities of the demands—the locations and radii of the demand nodes change based on the data in different scenarios. . . . . . . . . . 69 4.1 In-Sample Upper Bound CIs and Lower Bounds . . . . . . . . . . . . . . . . . . . 91 4.2 Out-of-Sample Gap v.s. In-Sample Gap . . . . . . . . . . . . . . . . . . . . . . . 91 4.3 Diagram of the meta-algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102 4.4 Optimality Gap v.s. Outer Iteration; Out-of-Sample Upper Bound and Lower Bound v.s. Outer Iteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106 4.5 Out-of-Samples upper bound and lower bound of the individual policies and the compromise policy in different outer iteration . . . . . . . . . . . . . . . . . . . . 107 4.6 Cumulative distribution function of individual policy evaluation in different outer iteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108 4.7 Comparison among different algorithms: the upper/lower bounds and optimality gap (with 95% confidence intervals). . . . . . . . . . . . . . . . . . . . . . . . . . 115 viii 5.1 CORE: computational operations research exchange platform design. . . . . . . . 122 5.2 Homepage of the cORe platform. . . . . . . . . . . . . . . . . . . . . . . . . . . . 123 5.3 Workflow of LEO-Wyndor in cORe Platform. . . . . . . . . . . . . . . . . . . . . 127 ix Abstract My thesis focuses on computational validation of stochastic programming models and applications, which includes topics in two-stage stochastic linear programming (2-SLP), two-stage stochastic combinatorial optimization (SCO), and multi-stage stochastic linear programming (MSLP). I touch upon compelling applications in the healthcare area for these fundamental building blocks, study- ing statistical issues for variance reduction in SCO and designing algorithms for distributed compu- tations in MSLP. Specifically, the computational experiments focus on network-related problems, such as resource allocation, facility location problems, and power systems. This thesis begins with an overview of stochastic programming (SP), followed by a contem- porary application in resource allocation. Our models and the associated computational results illustrate the power of 2-SLP models for reducing resource shortages nationwide (US) using data for ventilator demand/availability. 
I demonstrate both their computational practicality and im- proved decision-making via vastly reduced shortages during the peak of the COVID-19 pandemic. From 2-SLP, I look at SCO, where the challenges include non-convexity (due to discrete decision variables), as well as variability (due to sampling methods). Our practices intend to overcome the combinatorial explosion in the size of realistic instances. Among these approaches to reducing variance, I introduce new concepts, such as kernel methods from statistical learning, compromise decisions from stochastic linear programming, and optimal computing budget allocation (OCBA) from simulation optimization into large-scale SCO models. These methods expand the computa- tional horizons of SCO and improve the solution quality of SCO significantly. A stopping rule is proposed based on bagging and compromise decisions, which provides a probability guarantee of the quality of the final decision. Following the above advances, we move to the next class of x challenging SP models, namely, MSLP, where we study a meta-algorithm. The meta-algorithm can incorporate most value function-based algorithms, such as SDDP and SDLP. A compromise policy is proposed, which exploits distributed computing. I show that the compromise policy can reduce variance and bias. I propose a sequential sampling algorithm called the online dual dynamic pro- gramming (ODDP) algorithm. This approach introduces multiple replications for MSLP models to improve the accuracy of the estimated optimum values and decisions. I propose an optimality test using an out-of-sample optimality gap used in the ODDP algorithm. Finally, as part of a team, I have developed a cyber-infrastructure called computational Operation Research exchange (cORe), which promotes reproducibility in computational experiments for stochastic programming. xi Chapter 1 Introduction to Stochastic Programming 1.1 Significance of the Research Stochastic Programming (SP) [1], also known as optimization under uncertainty, deals with a class of optimization problems where the data is uncertain, but the distribution is known or can be es- timated. It has been successfully applied in many areas, including finance [2], server location on networks [3], transportation [4], and supply chain [5], among others. Moreover, it is closely related to many classes of conceptual models, including but not limited to approximate dynamic programming, data science, optimal control, simulation optimization, and others. A detailed re- view aligning the concepts from theory and practice can be found in [6]. Moreover, a universal modeling framework is proposed to encompass all these conceptual models under the umbrella of stochastic programming [7]. Because of such a broad footprint, its impact transcends SP itself and can be extended to neural networks. Thus research on various facets of SP will not only lead to successful applications in various domains but also shed light on issues that are of interest in other related communities. A general SP formulation is represented by the following mathematical model: min f(x) :=E P [F(x; ˜ w)] s.t. x2 X (1.1) 1 where the random vector ˜ w is defined on a probability space(W;F;P), and the expectation opera- tor is taken with respect to the random variable ˜ w. 
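To make the expectation in Eq. (1.1) concrete, the following minimal sketch estimates f(x) = E[F(x, w̃)] by Monte Carlo sampling for a few candidate decisions. The cost function, the exponential demand distribution, and all parameter values are purely illustrative and are not part of the models studied in this dissertation.

```python
import numpy as np

rng = np.random.default_rng(42)

def F(x, w, c=1.0, p=2.0):
    """One realization of the random cost: first-stage cost plus a simple recourse (shortage) cost."""
    return c * x + p * np.maximum(w - x, 0.0)

def estimate_f(x, n_samples):
    """Monte Carlo estimate of f(x) = E[F(x, w~)], with w~ ~ Exponential(mean 40) (illustrative)."""
    w = rng.exponential(scale=40.0, size=n_samples)
    vals = F(x, w)
    return vals.mean(), vals.std(ddof=1) / np.sqrt(n_samples)

for x in (20.0, 40.0, 60.0):
    mean, stderr = estimate_f(x, n_samples=100_000)
    print(f"x = {x:5.1f}   f(x) ~= {mean:7.2f}   (+/- {1.96 * stderr:.2f})")
```

The sampling error reported alongside each estimate shrinks at the usual 1/sqrt(N) rate, which is the basic trade-off underlying all sampling-based SP methods discussed below.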
The objective function F(x; ˜ w)= g(x)+h(x; ˜ w) can be divided into two parts: the first part g(x) only depends on the first-stage decisions x, while the second part h(x; ˜ w) depends on x and the realization of the uncertainty ˜ w, which is also called ‘cost-to-go’ function. The decision x is restricted to the feasible set X. Notice that function h can be the value function of another optimization problem. With a realization w, the subproblem can be expressed as: h(x;w)= min g(y;x;w) s.t. y2 Y(x;w) (1.2) where the function h is realized only after the decision x is made. The feasible set Y(x;w) depends on the decision x and the random variable realizationw. A special case in the SP is the two-stage stochastic linear programming (2-SLP) model [8], in which the feasible set X and Y are composed of only linear constraints and we will only have two stage decisions x and y. The main computational challenges of an SP model can be attributed to the fact that the space over which the uncertainty is to be accommodated is too large. Specifically, it is computationally infeasible to calculate the expectation in Eq. (1.1), which requires us to solve all the subproblems of h(x;w). Suppose the realizations of the random variables are limited. In that case, we can formulate an SP problem as a deterministic problem by enumerating all the possible scenarios, which is known as the deterministic equivalent formulation (DEF). Such a DEF can be solved by using deterministic methods when the number of possible scenarios is manageable. However, in most realistic instances, the sample space of random vectors W can be enormous! In such cases, the DEF results in a very high-dimensional linear programming problem or mixed-integer programming (MIP) problem, making them computationally impractical. Sampling-based methods are proposed to approximate the expectation in SP problems. The core idea is to use limited samples to approximate the true expectation problem. By qualifying the solution we obtained from the deterministic sampling problem, we can get some probabilistic guarantee for the true problem. Moreover, as the sample size increases, the sampling problem can 2 be proved to converge uniformly to the expected problem. Although for the multi-stage problems, the sample size increases exponentially as the number of stages increases. One useful extension of stochastic programming (SP) is the combination with mixed-integer programming. Both SP and MIP have gained much more attention for their ability to handle im- portant and challenging applications over the years. However, they are known for their notorious computational difficulty, although they have developed solid mathematical theories and advanced algorithms. It is no surprise that combining these two classes of models, specifically stochastic combinatorial optimization (SCO), leads to an even more demanding class of models. Neverthe- less, the discreteness of decision variables and the ability to capture uncertainty make the SCO model address a much broader class of applications. It has been successfully applied to a vari- ety of areas ranging from electricity power portfolio optimization [9] and electric grid operations [10] to facility location [11] and supply chain design [12]. Due to such vast applicability, there is a growing demand for methods that can not only accelerate the optimization process but also provide reliable decisions. 
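As an illustration of the deterministic equivalent formulation introduced above, the sketch below builds and solves the DEF of a toy two-stage recourse problem with three demand scenarios, using scipy's LP solver. The data (demands, probabilities, order cost, shortage penalty) are hypothetical and chosen only for exposition.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical data: 3 equally likely demand scenarios, unit order cost 1, shortage penalty 2.
demands = np.array([30.0, 50.0, 80.0])
probs = np.full(3, 1.0 / 3.0)
order_cost, shortage_penalty = 1.0, 2.0

# Variable vector of the DEF: [x, u_1, u_2, u_3]  (x = first-stage order, u_s = scenario shortage)
c = np.concatenate([[order_cost], shortage_penalty * probs])
# Shortage definition per scenario:  x + u_s >= d_s   <=>   -x - u_s <= -d_s
A_ub = np.zeros((3, 4))
A_ub[:, 0] = -1.0
A_ub[np.arange(3), 1 + np.arange(3)] = -1.0
b_ub = -demands

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 4, method="highs")
print("order quantity:", res.x[0], "   expected cost:", round(res.fun, 2))
```

With only three scenarios the DEF is trivial; the point of the chapters that follow is precisely that realistic scenario sets make such All-in-One formulations impractically large, which motivates sampling and decomposition.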
In general, SCO problems are much more computationally challenging, even when state-of-the- art computational resources are available. In the SCO model, parts or all of the data are unknown when decisions are made, where some presumed distribution can approximate the uncertainty. Romeijnders et al. [13] show that two-stage stochastic linear programming with either discrete or continuous recourse is #P-hard, where they are in the class P #P = #P [14]. MIP introduces discrete and non-convex proprieties into the model, which makes two-stage SCO a class of NP #P problem. A generalized model in SP is the multi-stage stochastic programming (MSP) model, which is a natural model to find sequences of optimal decisions under evolving and uncertain conditions. The decisions are optimized in discrete time, where the decision made in one period has to consider what will happen in the future and will affect the future decisions. This model has been success- fully applied to a wide variety of real-world applications, ranging from financial planning [15] to production-inventory planning [16] to electric power system [17, 18]. In addition, the sequential 3 decision-making property (under uncertainty) of the MSP is close related to dynamic program- ming, stochastic optimal control [19], and approximate dynamic programming (ADP) [20]. As a result, the methodology explored in the MSP field can also be extended to other related areas. In the MSP setting, the problem can be defined in a stagewise manner, where the function g(y;x;w) can be defined recursively. Notice that while the setting is reminiscent of dynamic programming (DP), the specifics of computational methodology are very different and often draw upon methods of constrained deterministic optimization. The research in this thesis covers the overlap with domains such as simulation optimization, mixed-integer programming, dynamic programming, and online optimization. I touch upon all the SP areas mentioned above in the research, including two-stage stochastic linear programming, stochastic combinatorial optimization, and multi-stage stochastic linear programming. I demon- strate the power of SP modeling by tackling the practical resource allocation problems. Moreover, I improve the quality of the final solutions in SCO and reduce their variance by introducing the concepts from machine learning and simulation optimization. The multi-stage stochastic linear programming part of my research not only provides new insight for MSP but also sheds light on other areas, such as adaptive dynamic programming and optimal control. The connections be- tween stochastic programming and the other regions are summarized in Figure 1.1. Finally, the cyber-infrastructure cORe platform enhances the reproducibility of all the computational research in SP. 1.2 Contributions of the Research In this dissertation, I first provide an overview of trending and challenging topics in SP. It includes advanced theory to solve SP problems computationally and successful applications in various ar- eas, including finance and transportation. I also connect SP with related areas, such as optimal 4 Figure 1.1: The connections between stochastic programming and other related areas. control, Markov decision processes, reinforcement learning, simulation optimization, robust opti- mization, and approximate dynamic programming. The contributions of this research in different areas are summarized as follows. 
1.2.1 Practical Stochastic Linear Programming Application • To demonstrate the advantage of SP and the significance of the related topics in the proposal, I begin with a practical application. In order to find the optimal healthcare resource allocation for COVID-19, I developed three optimization models, including one deterministic model and two stochastic programming models (simple and general recourse). These models apply 5 a rolling horizon scheme to weekly decision-making. They integrate both the predictive and the perspective models to find the optimal choices. • The extensive computational results show that the stochastic programming models, which consider the uncertainty of the prediction data, are significantly better than the deterministic ones. The simple recourse model decouples the decisions for each state and thus requires less computational time. On the other hand, the general recourse model considers a looking- ahead period, further reducing unmet demand, thereby saving more patients’ lives. 1.2.2 Stochastic Combinatorial Optimization • I introduce the kernel method, which is used to find the final decision, into the SCO problem. I apply a similarity measure with the kernel function and find the final decision based on the centroid. The kernelized variance is calculated to measure the dispersal of the solutions. This method is basically a generalized method of bagging, which is a popular technique to enhance algorithms’ performance using variance reduction. • I extend the concept of compromise decision into the SCO problems, where I formulate a mixed integer programming (MIP) problem with the average function approximation. The objective value of this MIP problem has lower variance and thus provides a better approx- imation for the “true” problem. Solving this problem obtains a tighter lower bound for the optimal objective value and a more accurate optimality gap. • The above two methods can be categorized as aggregation in the solution space and aggre- gation in the function space. They can be applied beyond 2-SCO. Any algorithm for SP with function approximations can use these two approaches. In addition, I propose a stopping criterion based on these two methods for the optimization of the SMIP problems. • I develop an optimal computational budget allocation algorithm for SCO problems, origi- nally in simulation optimization. This algorithm can find a lower variance estimate of the 6 upper bound for the optimal objective value with a given computational budget. Moreover, a lower bound for the probability of correct selection is available. • I conduct comprehensive computational experiments on the stochastic facility location prob- lems with the above three approaches. They demonstrate a considerable advantage over the baseline method. 1.2.3 Multi-stage Stochastic Linear Programming • I propose a meta-algorithm for MSLP, which can incorporate most value function-based algorithms in MSLP, such as SDDP and SDLP. The algorithm consists of distributed opti- mization and aggregated validation. By exploiting the distributed/parallel computing, my algorithm can build a compromise policy with better performance. • I show the compromise policy’s variance reduction and bias reduction properties. I prove that the compromise policy eliminates the negative bias of the lower bound estimate compared with the commonly used lower bound. • I solve a hydro-thermal problem with the SDDP algorithm and show that the one replica- tion algorithm in MSLP may perform badly. 
Although the in-sample optimality gap in one replication is small enough, the out-of-sample optimality gap can be very large. • I propose a stopping rule to check the optimality of the policy. The stopping rule is based on the out-of-sample upper bound estimate and the lower bound estimate. Based on this stopping rule, I provide a sequential sampling SDDP algorithm, in which we first choose a sample size and solve the SAA problems. If the out-of-sample optimality gap is less than the tolerance, then we enlarge the sample set and solve a larger SAA problem. • I propose the online dual dynamics programming (ODDP), which can observe the sample on-the-fly. The ODDP algorithm only requires an oracle to generate the random samples instead of the true distribution of the random variables. In addition, the ODDP algorithm 7 is suitable for problems with an enfolding environment. I integrate the meta-algorithm with ODDP. The new algorithm is able to pause for optimality check and resume to update the function approximation. The efficiency and effectiveness of the algorithm are illustrated with extensive computations of the hydro valley problem. 1.2.4 Cyber-Infrastructure for Computational SP • We develop a new cyber-infrastructure that facilitates computational operations research ex- changes, especially for computational stochastic programming. The development of this robust cyber-infrastructure promotes a greater exchange of data, models, software, and ex- periments, which enhances reproducibility in SP. It provides a pedagogical tool for training students, and promoting reusability by sharing data, code, methods, and evaluations across the SP community. • This framework, which lets future researchers and students reproduce my work, has been used in courses like ISE 533. We have already confirmed that groups of students can use cORe for projects ranging from restaurant choice, meal planning, and many other applica- tions that combine data science and optimization. To summarize, the contributions of this thesis are that I provide a unified procedure to reduce the bias and variance in solving the SP problems and a computational method to validate the models. I provide not only the theoretical guarantee but also the practical applications with better performance. The unified procedure can cover the problem from two-stage to multi-stage, from continuous to discrete case. And we build a cyber infrastructure to facilitate the reproducibility of all my research. The relationship among each pillars is summarized in Fig. 1.2. 8 Figure 1.2: The summary of the contributions. 1.3 Background 1.3.1 Sample Average Approximation The SAA method has long been applied to solve Stochastic Programming (SP) [21] problems, and specifically, SCO problems [22]. The basic idea is to turn the stochastic problem into a determin- istic version problem by sampling a “large enough” set of samples. This Monte Carlo simulation- based approach replaces the expected value function with an approximate average function through random sampling. Then it optimizes the sample average function. In SAA, each Monte Carlo approximation is called replication. In one replication, suppose the random vector ˜ w has N real- izations, denoted as w 1 ;:::;w N , which are independently and identically distributed (i.i.d). With 9 these N samples, we can approximate the original SP problem in Eq. 
(1.1) with the sample average approximation problems, which is expressed as: min x2X f f N (x) := c > x+ 1 N N å i=1 h x;w i g (1.3) Here, the objective value f N (x) is an unbiased estimate of the “true” objective value function f(x), that is E[ f N (x)]= f(x), where f is the objective function in Eq. (1.1). The advantage of this method is that it converts a stochastic problem to a deterministic one. Thus, the deterministic solvers can be exploited, which significantly increases the number of applicable algorithms for SP. When we consider an SP problem, if we can enumerate all the possible scenarios, the problem can be written as a deterministic equivalent formulation (DEF). When the number of possible scenarios is manageable by deterministic solvers (e.g., All-in-One optimization model or using Benders’ decomposition), they can be solved with deterministic methods. For scenarios w i 2 W;i= 1;2;:::jWj, we denote p(w i ) as the probability of occurrence and y(w i ) as the second-stage decisions variables. In cases where one samples outcomes from the probability space, one can use the idea of “empirical risk minimization” and in that case, p(w i )= 1=N when the sample size is N. In the SP literature, this approximation is known as Sample Average Approximation (SAA). This deterministic linear programming or mixed-integer programming formulation may be in very high dimension, which, in two-stage problems, has n 1 +jWj n 2 decision variables in total, where n 1 ;n 2 are the dimensions of the first stage decision variables and second stage decision variables, respectively. A large number of scenarios makes the DEF impractical to solve and computationally infeasible. For fixed first-stage variables x, the problem can be decomposed into jWj independent subproblems with only n 2 decision variables. In addition, instead of considering all the possible realizations, we may sample the random variables from W and approximate the expectation with its empirical estimate. Sampling-based algorithms try to find a balance between optimality and the computational burden. In order to balance the trade-offs in a manner that reduces the computational burden, many sampling-based approaches have been proposed to tackle these SP problems. 10 When we apply the sampling-based approaches to solve the SP problems, a fundamental prop- erty is the theoretical guarantee of the optimal solution. Since the sampling process involves ran- dom variables, a probabilistic guarantee should be provided to bound the solution quality. In this subsection, we will discuss the most popular sampling-based method, Sample Average Approxi- mation (SAA), for the general SP problems, where we will discuss the criteria to select the sample size. In addition, we will discuss the proprieties of SAA with multiple replications, where we provide insight into balancing the sample size with the number of replications. Finally, we will ex- plore the pros and cons of the post-processing methods for the current SAA approach, which often involves the confidence interval of the objective estimate. The current post-processing methods are usually time-consuming, which inspires us to explore faster methods to find the final decision. 1.3.2 Statistical Inference With the approximation problem in Eq. (1.3), we are able to solve it with deterministic algorithms. To examine the quality of the solutions, we first define the optimal solution set and optimal value for the actual (“true”) problem in Eq. 
(1.1) as X and v := min x2X f(x), while the optimal solution set and the corresponding value for the SAA problem in Eq. (1.3) are define as ˆ X N and ˆ v N := min x2X f N (x). We assume that there is only one optimal solution in X . When using the SAA method to solve an SP problem, two kinds of errors are introduced. One is the sampling error due to replacing the expectation with a finite number of samples. Under certain assumptions (see [22]), the sampling error can be shown to converge to zero uniformly. Note that the sampling error also gives rise to an error in optimization as well. Due to these two errors, the solutions we get by solving the sample average problem might not be optimal. In order to characterize the above errors, we first define the set ofe-optimum solutions as follows. Definition 1.1. (e-optimal) Fore 0, if ¯ x2 X and f( ¯ x) v +e, then ¯ x is ane-optimal solution of problem in Eq. (1.1). The set of alle-optimal solutions of the ‘true’ problem is denoted as X e . Correspondingly, the set of alle-optimal solutions of problem in Eq. (1.3) is denoted as ˆ X e N . 11 By the statement “the sampling error will converge to zero”, we mean that the difference be- tween the objective values and solutions in the true problem and in the sample average problem converge to zero with probability one, which can be proved by applying the strong law of large numbers. Lemma 1.1. [PROPOSITION 2.1 in [22]] As N!¥, (i) ˆ v N ! v w.p.1, and (ii)8e 0 the event ˆ X e N X e happens w.p.1. Following [22], we further assume that there exists a mapping u(x) that satisfies f(u(x)) f(x)e;8x2 XnX e . To find ane-optimal solution with probability at least 1a, where 0 x+E[h(x; ˜ w)]j Ax b;x2XR n 1 B p 1 g; (3.1) where for any outcome w of the random variable ˜ w, h(x;w) is a function mapping the decision variable (vector) x andw to a real valued function defined by: h(x;w) := minfd(w) > yj W(w)y h(w) T(w)x; y2Y R n 2 Z p 2 g; (3.2) where the setY imposes ranges and discrete constraints on the decision variables y. For a given pair x and w, we also define the feasible set of the second stage variables Y(x;w)=fyjW(w)y= h(w) T(w)x;y2Yg. We assume that the objective function is real-valued (as opposed to “ex- tended” real-valued) onX , which is also known as the relatively complete recourse assumption, and implies that h(x;) has a finite value (almost surely) for all x2X . For a model with fixed- and-complete recourse, the matrix W is assumed to be deterministic; and for any x2 X andw2W, the feasible set Y(x;w) is not empty and the function value h(x;w) is finite. We assume the model has fixed-and-complete recourse which means that the matrix W is deterministic and for any x2 X andw2W, the feasible set Y(x;w) is non-empty and the value function h(x;w) is finite. SCO problems have long been considered as a class of well-defined problem in mathemat- ics and yet computational challenging problems. Notice that, for fixed first-stage variables, the problem in Eq. (3.1) can be decomposed intojWj independent subproblems, each of which, as shown in Eq. (3.2), has only n 2 + p 2 decision variables. Instead of considering all the possible 43 realizations, we may sample the random variables fromW, and approximate the expectation with its empirical estimate, which is also known as the Sample Average Approximation (SAA) [21]. By solving the corresponding SAA problem, we obtain a solution that is near-optimal to the orig- inal expectation problem with some probabilistic guarantee. 
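The decomposition just described can be sketched as follows: for a fixed binary first-stage decision, each sampled scenario defines an independent second-stage LP, and the SAA objective is the first-stage cost plus the average recourse cost over the sample. All data below (capacities, costs, and the lognormal demand distribution) are hypothetical and serve only to illustrate the mechanics.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)

def second_stage(x, demand, cap=(6.0, 8.0), serve_cost=(1.0, 1.5), penalty=10.0):
    """h(x, omega): cheapest recourse for one demand realization, given which sites x are open."""
    c = [serve_cost[0], serve_cost[1], penalty]           # y1, y2, unmet demand
    A_ub = [[-1.0, -1.0, -1.0]]                           # y1 + y2 + unmet >= demand
    b_ub = [-demand]
    bounds = [(0, cap[0] * x[0]), (0, cap[1] * x[1]), (0, None)]
    return linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs").fun

def saa_objective(x, n_samples, open_cost=(3.0, 4.0)):
    """f_N(x) = c'x + (1/N) sum_i h(x, omega_i), with illustrative lognormal demand."""
    demands = rng.lognormal(mean=2.0, sigma=0.4, size=n_samples)
    return np.dot(open_cost, x) + np.mean([second_stage(x, d) for d in demands])

# Enumerate the (tiny) binary feasible set and pick the SAA minimizer.
candidates = [(0, 0), (0, 1), (1, 0), (1, 1)]
values = {x: round(saa_objective(x, n_samples=200), 2) for x in candidates}
print(min(values, key=values.get), values)
```

In a realistic SCO instance the first-stage set cannot be enumerated and the per-scenario subproblems may themselves be MIPs, but the structure of the SAA estimate is exactly the one shown here.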
In addition, as the sample size in the SAA problem increases, the function approximation converges uniformly to the “true” objec- tive function. However, it is important to emphasize that if one maintains the SAA problem as an All-in-One (deterministic equivalent) instance, then the resulting MIP problem may stretch the computational capabilities of a deterministic MIP solver, even though the sample size in SAA may be finite. Thus SAA, by itself, may not be able to provide a workable framework for SCO with a large number of discrete second stage variables. The one case where there is reasonable hope of addressing very large scale SCO problems is one where all integer variables are in the first-stage, and the second-stage only contains continuous variables. For such instances, one can employ Ben- ders’ decomposition for the large SAA instance, with integer variables appearing only in the first stage. In this case, decomposition of the large All-in-One formulation can be solved by using a decomposition method, and an SAA instance can be solved via Benders or similar decomposition schemes. Since the process to formulate the SAA problem involves random sampling, one introduces variance in the approximation. In an SP problem, the estimate of the value associated with random variables may have significant variances, limiting the precision of the results. Variance reduction is a common technique to enhance the accuracy of the estimate. By reducing the variance, we can improve the solution quality of the sample-based algorithms significantly. Many famous variance reduction techniques have been exploited extensively [56, 57]. By reducing the variability of the estimates, the objective function approximations are more accurate, which provides better deci- sions. In general, the multiple replication version of the SAA method is applied to enhance the quality of the solutions. As shown in [31], we can reduce the variance of the Stochastic Linear Programming (SLP) problem by finding the compromise decision. In this section, we will first discuss the SAA method and the multiple replication version. Next, we discuss the optimality gap 44 of the multi-replication SAA method, which includes the upper and lower bounds estimate. The multiple replications of solving inspires us on the ensemble meta-algorithms for the SCO problem. 3.2.1 Bagging Solution with Kernel Suppose we solve multiple replications of the SAA problem, the M realizations of the sample set areL 1 N ;:::;L M N . The final solution in the m-th replication is denoted as ˆ x m , which represents ˆ x(L m N ) in the interest of brevity. The set containing all the optimal decisions in replications is de- fined as ˆ X =f ˆ x 1 ;:::; ˆ x M g. When the decision variables are continuous, and the set X is convex, the final bagging solution is the expectation of the solutions with respect to the random variable ˜ L N , which is estimated by the mean of the potential solutions from all the replications ¯ x=å M m=1 ˆ x m =M. The mean solution is the centroid of the set ˆ X. Since the bagging solution is the expectation of the potential solutions with respect to the dataset random variable, it has lower variance compared with all other potential solutions. Note that all solutions in the set ˆ X are near-optimal with respect to their datasetL N . The bagging technique has been extended to the unconstrained discrete optimization by introducing the kernel method [54]. 
Their underlying methodology is the starting point of our research which not only provides a theoretical guarantee for the final solution, but also leads to a stopping rule based on our analysis. Instead of using the mean solution, which may not be feasible in discrete optimization, one chooses that solution in the set ˆ X which is closest to the mean solution with respect to a given distance measure. Let k(;) : X X!R be a similarity function or a kernel function. The corresponding feature mapping j : X!H from the solution space to the Hilbert space satisfies: k(x;x 0 )= j(x);j x 0 ; 45 whereh;i is the inner product. The mean solution of ˆ X in the Hilbert space can be defined as ¯ j=å M m=1 j( ˆ x m )=M. Thus, we can calculate the distance between the mean solution and any other solution x as: kj(x) ¯ jk 2 =hj(x);j(x)i 2 M M å i=1 j(x);j ˆ x i + 1 M 2 M å i=1 M å j=1 j ˆ x i ;j ˆ x j For all the potential solutions in the set ˆ X, we can first calculate the Gram matrix K2R MM , where the (i; j) element in the matrix is defined as K i j := k( ˆ x i ; ˆ x j )= j ˆ x i ;j ˆ x j . Then, for ˆ x m 2 ˆ X, we have kj( ˆ x m ) ¯ jk 2 = K mm 2 M M å i=1 K im + 1 M 2 M å i=1 M å j=1 K i j : (3.3) Note that the above computations do not require the feature mappingj, which would be very com- plicated in general. Instead, if we have access to the entries of the Gram matrix K, the calculation of the batching solution can be easily obtained by x b 2 arg min ˆ x m 2 ˆ X kj( ˆ x m ) ¯ jk 2 ; (3.4) which is obtained by comparing the first two terms in Eq. (3.3). We refer to this solution as the bagging solution. Notice that, since all solutions in ˆ X are sub-optimal, the bagging solution retains the inherited feasibility and optimality properties. Unlike [54] which applies a kernel in a elementwise calculation, we choose the kernel functions that remain the feasibility of the decision. Moreover, we enhance the performance of the bagging solution through the combination with the compromise decision. 3.2.2 The Scope of This Study We are interested in decisions that are very likely to be optimal, and the variance of the estimated objective is acceptably small. We exploit statistical methods to reduce the variance and accelerate 46 algorithms for use with parallel computing, where a multiple replication version of the SAA ap- proach [22] is applied. Each replication will be solved independently and potentially in parallel. The decisions and function approximations in all the replications are then combined in such a way as to identify one implementable, and potentially “near-optimal” decision. We explore method- ologies from machine learning in Section 3.3 and simulation optimization in Section 3.4 to design new procedures for lower variance decision and objective estimates in SCO problems. Our methods can be regarded as different types of meta-algorithms for stochastic combina- torial optimization. They do not depend on any specific algorithm we choose for solving each replication and, thus, can be integrated with most commonly used algorithms for two-stage SCO problems. If second-stage variables are continuous, we can apply the Benders decomposition al- gorithm [58] to solve the problems. Moreover, if the variables in both stages are mixed-binary, we can exploit the function approximations produced by the D2-BAC algorithm [59]. In problems with binary first-stage variables and general integer second-stage variables, we can approximate the value function by adding parametric Gomory cuts iteratively [60]. 
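Returning to the kernel-based selection rule of Eqs. (3.3)–(3.4), a minimal sketch of the computation is given below. The replication solutions and both kernel choices are illustrative placeholders; with the indicator kernel, the rule reduces to majority voting, as discussed in Section 3.3.1.

```python
import numpy as np

def bagging_solution(solutions, kernel):
    """Pick the replication solution closest to the kernel-space mean (Eq. (3.4)).

    solutions: list of M first-stage decisions (e.g., binary vectors), one per replication.
    kernel:    function k(x, x') returning a scalar similarity.
    """
    X = np.asarray(solutions, dtype=float)
    M = len(X)
    # Gram matrix K_ij = k(x_i, x_j)
    K = np.array([[kernel(X[i], X[j]) for j in range(M)] for i in range(M)])
    # ||phi(x_m) - phi_bar||^2 = K_mm - (2/M) sum_i K_im + (1/M^2) sum_ij K_ij   (Eq. (3.3))
    dists = np.diag(K) - 2.0 * K.mean(axis=1) + K.sum() / M ** 2
    return int(np.argmin(dists)), dists

# Hypothetical replication outputs for a problem with 6 binary first-stage variables.
sols = [[1, 0, 1, 0, 0, 1],
        [1, 0, 1, 0, 0, 1],
        [1, 1, 0, 0, 0, 1],
        [1, 0, 1, 0, 1, 0],
        [1, 0, 1, 0, 0, 1]]

rbf = lambda x, y, gamma=0.5: np.exp(-gamma * np.sum((x - y) ** 2))
indicator = lambda x, y: float(np.array_equal(x, y))   # reduces to majority voting

for name, k in [("rbf kernel", rbf), ("indicator kernel", indicator)]:
    idx, _ = bagging_solution(sols, k)
    print(name, "->", sols[idx])
```

Because the selected decision is always one of the replication solutions, it inherits integer feasibility, which is exactly the property that the plain mean solution lacks.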
More applicable algorithms for different assumptions of SCO models can be found in [61, 62]. Since these algorithms all ap- proximate the value function with convex piecewise linear functions, the master problem enjoys similar properties and, thus, can be used within the proposed meta-algorithms. 3.3 Variance Reduced Ensemble Decisions: Connection with Machine Learning In general, variance reduction can be achieved via multiple replications in the case of sampling- based SP models. Unlike replications for simulation-based studies, however, the use of replications in stochastic programming has its pros and cons. The positives for multiple replications arise due to the reduced variance in the objective estimate, while the negative aspects result from the fact that, for small sample sizes, each replication tends to produce a decision which fits very well to the sampled data set but could be lacking in generalizability [23]. Because it is difficult to predict an 47 appropriate sample size a priori (i.e., prior to solving the problem) one often adopts a trial-and-error approach in which experiments are conducted with alternative sample sizes. Current multi-replication methods for SP may require significant computational resources, and, moreover, some may only work for smooth problems [63]. The optimization literature (including SP) has paid little attention to variance of decisions in the case of combinatorial decision models. Since we wish to tackle stochastic combinatorial optimization models (i.e., the SFLP model and its extensions) which have binary decision variables in the first stage, approaches which require convexity of the decision space are not applicable. We observe that multiple replications in an SAA-type algorithm are equivalent to bootstrapping the samples from the random vector space — with each replication of the algorithm providing a near-optimal solution as well as an approxima- tion for the sampled objective function. We combine the bagging method and compromise problem to recommend the final solution through the set of near-optimal solutions and value function approximations. The generalized bag- ging method exploits the aggregation of the near-optimal solutions, with a similarity measure using kernel methods [64]. We provide a theoretical guarantee for the bagging solution. As for the com- promise problem, we take advantage of the collection of value function approximations from all the replications. We formulate an MIP problem through aggregation in the function approxima- tion space. By solving this aggregated MIP problem, a promising solution, called the compromise decision, can be identified. Furthermore, this function aggregation problem provides us a tighter lower bound estimate for the optimal objective value and, hence, a tighter optimality gap. Finally, we propose a stopping rule with these two concepts, which provides a probabilistic guarantee for early termination of the algorithm. 3.3.1 Performance of the Bagging Solution SCO problems are usually complicated, especially those with a large number of first-stage decision variables. When we directly apply bagging to SCO problems, the expectation of the potential solutions, estimated by the mean of the potential solutions, might not be integer feasible for the 48 original problems. Moreover, when we solve an SCO problem with the SAA approach, the number of samples to approximate the objective function is critical for the quality of the solutions. 
As the sample size increases, the computational requirement increases exponentially. Thus, it is natural to extend bagging to solve an SCO problem, which increases the solution-stability. To reduce the variance of the objective estimate of a SCO problem, we can choose the multi- replication SAA method, where the bagging solution with the kernel method is applied to identify the final decision. If we choose the kernel function as an indicator function, that is, k(x;x 0 )=Ifx= x 0 g, whereIfg is one if the condition is satisfied (and zero otherwise), then our method reduces to the majority voting algorithm. In this case, we choose the solution which appears most often in ˆ X as the final decision. In general, an appropriate kernel function should be selected depending on the structure of the solution space. Theoretically, we have the following theorem to provide the optimality guarantee for this bagging solution. Theorem 3.1. Suppose we solve a SCO problem with the SAA method, where we run M replications and the sample size for each replication is N. Next, suppose we choose the indicator function as the kernel function. Define k as the number of trials for which the bagging solution is the optimal. In each replication, we are given the probability a2 (0;1), and tolerance d = ˆ e=2, where ˆ e = 2 3s 2 max N log jXj a 1=2 . Then, we have i) Pfx b 2 X e 0 g 1a k , wheree 0 ˆ e; ii) Pfx b 2 X e 1 g 1a, wheree 1 ˆ e( 1 2 p k + 1 2 ). Proof. i) We define I as the index set, where the final solution in the i2 I replication is the same as the bagging solution. Thus, the cardinality of I is k. According to the algorithm, the replications are executed independently of each other, and can be viewed as a Bernoulli process. Hence for each replication, we have Pfx b 2 X e g 1a, which is Pfx b = 2 X e g v c where f(x c ) is the true function value at the compromise decision, which is an upper bound of the optimal value, and v c is a lower bound value. Let ˆ m denote the estimate of the function value at the compromise decision, which is used as a test statistics for f(x c ). With N 0 i.i.d. out-of-samples, we calculate: ˆ m = 1 N 0 N 0 å i=1 F(x c ;w i ); ˆ s = v u u t 1 N 0 (N 0 1) N 0 å i=1 (F(x c ;w i ) ˆ m) 2 : (3.8) Due to central limit theorem, ˆ m follows a normal distribution with estimated standard deviation ˆ s. This hypothesis test is able to bound the Type I error, as well as the Type II error [65]. Theorem 3.4. Suppose we obtain x c ;v c from the compromise problem, and calculate ˆ m; ˆ s with Eq. (3.8). Defined := ˆ mv c , and findq= 1F(d= ˆ s), whereF is the cumulative distribution function of the standard normal distribution. Given a constanth in each optimality test of Algorithm 2, we have Probability of premature termination= Pr(retain x c j f(x c )=v c > 1+h)<g: (3.9) whereg = 1F( hv c ˆ s n 1q ), andn 1q =F 1 (1q). Proof. This proof follows the similar procedures in [65]. The lower bound v c is a known value and the upper bound f(x c ) has a test statistics ˆ m, which follows a normal distribution due to the Central Limit Theorem. The mean of N estimates of f(x c ) has the standard deviations, which is estimated by ˆ s. ˆ m and ˆ s are calculated by Eq. (3.8). The power function of our test is: b( f(x c ))= P f(x c ) ( ˆ m> v c + a)= P f(x c ) ( ˆ m f(x c ) s > v c + a f(x c ) s )= 1F( v c + a f(x c ) s ) 55 where P f(x c ) represents the probability of an event for a given value with f(x c ), and F() is the cumulative distribution function of standard normal distribution. 
In addition, we haveq =b(v c ) 1F(a=s). Thus, for the fixed significance levelq, we have a=sn 1q , wheren 1q =F 1 (1 q). We replace s with its sampling estimate, then we have b( f(x c ))= 1F( v c +a f(x c ) ˆ s ). When f(x c )=v c > 1+h, the null hypothesis H 0 is false. In this case, b( f(x c )) equals the probability of correct rejection. Thus, the Type II error, i.e., the probability of premature termination is given by: 1b( f(x c ))=F( v c + a f(x c ) ˆ s )<F( hv c ˆ s +n 1q )= 1F( hv c ˆ s n 1q )=g: This proves our statement. Algorithm 2 Multi-rep SAA approach using compromise decision with kernel initialization: number of replications M, optimization sample size N, evaluation sample size N 0 ,h;q 0 ;g 0 , . 1: for m= 1;:::;M (in parallel) do 2: Optimize an SAA problem with N samples. Construct an approximation ˆ f N (x;L m N ). 3: Obtain ˆ x m = argmin ˆ f N (x;L m N ) and ˆ v m N = min ˆ f N (x;L m N ). 4: end for 5: Calculate x b with Eq. (3.4). Calculate x c and v c with Eq. (3.7). 6: if x b == x c then 7: Evaluate f(x c ) with N 0 samples. Obtain ˆ m and ˆ s with Eq. (3.8). 8: Calculated = ˆ m v c . Findq = 1F(d= ˆ s),g = 1F( hv c ˆ s n 1q ). 9: ifqq 0 , andgg 0 then stop. 10: end if 11: end if 12: N 2N. Repeat from line 1. output: x c ;v c ; ˆ m; ˆ s;g. With Theorem 3.4, we obtain an upper bound for the Type II error, which is the probability of premature termination. When the algorithm stops, the probability of incorrectly identifying optimality is no greater thang in the case that f(x c )=v c > 1+h. Note that if we run the optimality test and continue the algorithm, the Type I error has an upper bound q. The smaller the q is, the stronger the evidence is that x c is not optimal. 56 Compared with standard post-processing, after computing all the replications separately, we generate a tremendous number of scenarios to evaluate every solution’s performance from the replication. With the bagging and compromise solutions, we obtain high quality with less compu- tational effort. This will save a considerable amount of computational time in post-processing. For instance, in the inexact-SAA algorithm [26], the samples for evaluation N 0 are much larger than the sample for optimization N, that is N 0 N. Furthermore, these N 0 are used for all potential solutions. The total budget to evaluate the performance isO(M N 0 ), which is extremely large. With the aggregation techniques, these computational budgets are reduced. 3.4 Efficient Budget Allocation: Connection with Simulation Optimization In this section we propose a post-processing procedure to accommodate situations with a limited computational budget. In these situations, we should allocate the budget so that, with high prob- ability, we can identify the optimal solution out of the set of potential solutions. Intuitively, we should spend less computational time on the less promising solutions while allocating more to those that are more likely to be optimal. Overall, a more significant portion of the computing esti- mates the objective function value at the critical solutions. In this way, the variance of objective at the critical solutions, used as the upper bound estimate, is reduced, and the accuracy of the optimal solution is increased. In this section, we exploit the popular optimal computing budget allocation (OCBA) technique [66] and apply it to SCO to identify the final decision. 
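As a preview of the allocation rule derived below (Eq. (3.12)), the following sketch computes an OCBA-style split of a fixed evaluation budget across candidate solutions, assuming approximately normally distributed performance estimates. The candidate means and standard deviations are hypothetical, and ties with the incumbent best are not handled.

```python
import numpy as np

def ocba_allocation(means, stds, total_budget):
    """Asymptotic OCBA split of `total_budget` evaluations across candidate solutions.

    means, stds: per-candidate sample mean and sample standard deviation of the objective.
    Returns an integer allocation summing to (approximately) total_budget.
    """
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)
    s = int(np.argmin(means))                      # index of the current best candidate
    idx = [i for i in range(len(means)) if i != s]

    # Non-best candidates: N_i proportional to (S_i / (f_i - f_s))^2 (pairwise ratio rule).
    ratios = np.array([(stds[i] / (means[i] - means[s])) ** 2 for i in idx])
    # Best candidate: N_s = S_s * sqrt(sum_{i != s} N_i^2 / S_i^2).
    n_s = stds[s] * np.sqrt(np.sum(ratios ** 2 / stds[idx] ** 2))

    raw = np.empty(len(means))
    raw[idx] = ratios
    raw[s] = n_s
    return np.floor(total_budget * raw / raw.sum()).astype(int)

means = [10.2, 10.9, 12.5, 11.4]   # hypothetical sample means of f at four candidate solutions
stds  = [1.0, 1.2, 0.8, 1.5]       # hypothetical sample standard deviations
print(ocba_allocation(means, stds, total_budget=1000))
```

Candidates whose means are far above the incumbent best receive few additional samples, while the incumbent and its close competitors absorb most of the budget, which is what drives the fast convergence of the probability of correct selection.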
The budget allocation problem for the solutions selection procedure can be formulated as an optimization problem: given a fixed amount of computing budget, the probability of selecting the best solution from the potential solution set is maximized. This problem can be categorized as an Ordinal Optimization problem [67], where a best ‘design’ is obtained through ordinal comparison. In this section, we first state the budget allocation problem in the SCO context and formulate it as 57 an optimization problem. Furthermore, we exploit the popular Optimal Computing Budget Allo- cation (OCBA) technique [55, 66] to find an allocation method and apply it to the post-processing procedure of SCO to find the optimal solution from the potential solutions set. The allocation method is proved to be optimal for the optimization problem. This allocation method will se- lect the solution that is guaranteed to be optimal with a certain probability. The variance of the corresponding optimal value is less than that with the equal allocation method. Suppose we solve a two-stage SCO problem with M replications, where the sample set for replication m isL m N , m= 1;2;:::;M. These replications are solved independently, and the cor- responding optimal solutions are x(L 1 N );:::;x(L M N ). Suppose that there are a total of l distinct solutions, denoted as ˇ X = ˇ x 1 ;:::; ˇ x l . Then, these l solutions are candidates for the final decision of the SCO problem. The number of times that these distinct solutions appear during the experi- ment are denoted q 1 ;:::;q l . We have q 1 ++q r = M. The average lower bound estimate at these distinct solutions are ˇ v 1 ;:::; ˇ v l . Similar to the SCO problem in Eq. (3.1), the solution-selection problem can be defined as min x2 ˇ X f f(x) := E[F(x; ˜ w)]g: Since the feasible set ˇ X is limited to a car- dinality of`, we can enumerate these solutions. The best one can be determined by comparing the objective value at all solutions in ˇ X, where these objective values are estimated using the SAA method. Let the estimate be defined as f i := 1 N 0 i å N 0 i j=1 F ˇ x i ;w i j where i= 1;:::;l. N 0 i represents the number of samples for the i th solution, and w i j represents the j th samples for i th solution. We denote the sample variance of F ˇ x i ; ˜ w as S 2 i , and the variance of f i can be expressed as S 2 i =N 0 i . As the sample size N 0 i increases, f i becomes a more accurate estimator of the real objective value. We apply Central Limit Theorem (CLT) for this estimator, then we can show that the corresponding confidence interval decreases at the rate of 1= p N 0 . Thus, with an equal allocation method, we need a large number of samples for all candidate solutions to identify the best solution with confidence, which is too time-consuming. However, with ordinal comparison, the probability that the selected solution is the best one can converge exponentially fast [68]. Since the solution selection procedure aims to identify the most promising solution, there is no need to find an accurate evaluation estimate 58 for all solutions. We will apply the ordinal comparison framework for this procedure and exploit the exponential converge rate. We first assume that the computational times to evaluate different candidate solutions with one sample are approximately the same. Thus, we can use the number of samples to represent the computational budget. The performance F(x; ˜ w) is assumed to follow a normal distribution. 
Thus, with sample size $N'_i$, we have the objective function estimator $\tilde{f}_i \sim N\!\left(\bar{f}_i,\, S_i^2/N'_i\right)$, a normal distribution with mean $\bar{f}_i$ and variance $S_i^2/N'_i$. We will use $\tilde{f}_i$ to estimate the value of the objective function. We also denote by $\check{x}_s$ the selected solution and by $\check{x}^\star$ the optimal solution, with objective value $f^\star$. Following [69], we formulate the budget allocation as an optimization problem, where the decision variables are the numbers of samples used to evaluate the solutions, $N'_1,\dots,N'_l$. The sum of these samples, i.e., the total computing budget, is given: $N'_1 + \dots + N'_l = T$. We want to maximize the probability $P\{\check{x}_s = \check{x}^\star\}$ that the selected solution is the best solution, also referred to as the probability of correct selection. The event $\{\check{x}_s = \check{x}^\star\}$ is equivalent to the event that the function estimator at solution $\check{x}_s$ is less than or equal to the function estimator at every other solution, i.e., the intersection of the events $\{\tilde{f}_s \leq \tilde{f}_i\}$, $i \in \{1,\dots,l\}$, $i \neq s$. By applying the Bonferroni inequality, we can find a lower bound on this probability:
\[
P\{\check{x}_s = \check{x}^\star\} = P\Big\{\bigcap_{i=1, i\neq s}^{l} \tilde{f}_s \leq \tilde{f}_i\Big\} \geq 1 - \sum_{i=1, i\neq s}^{l} P\{\tilde{f}_s > \tilde{f}_i\} \tag{3.10}
\]
where the right-hand side is called the approximate probability of correct selection (APCS) [69]. Compared with the original probability, this lower bound $1 - \sum_{i\neq s} P\{\tilde{f}_s > \tilde{f}_i\}$ decouples into separate events, each of which can be estimated easily without Monte Carlo simulation. Thus, for the problem of budget allocation for solution selection, we maximize this lower bound instead of the original probability. We further assume the decision variables $N'_i$ are continuous, which gives a continuous relaxation of the original integer programming problem. Since the value of $N'_i$ is usually very large, we can round the continuous values to determine the numbers of samples with negligible error. The problem can be formulated as
\[
\min_{N'_1,\dots,N'_l} \Big\{ \sum_{i=1, i\neq s}^{l} P\{\tilde{f}_s > \tilde{f}_i\} \;\Big|\; \sum_{i=1}^{l} N'_i = T,\; N'_i \geq 0,\; i=1,\dots,l \Big\}. \tag{3.11}
\]
Suppose we are given a total number of samples $T$ and a set of distinct potential solutions $\check{X} = \{\check{x}_1,\dots,\check{x}_l\}$. The decision variables $N'_i$ are the numbers of samples allocated to solutions $\check{x}_i$, $i = 1,2,\dots,l$. We use $s$ as the index of the solution that is thought to be optimal based on the mean of the performance measure, i.e., $\bar{f}_s = \min_{i\in\{1,\dots,l\}} \bar{f}_i$. We also assume $N'_s \geq N'_i$ for $i = 1,2,\dots,l$, $i \neq s$. According to [55], as $T \to \infty$, problem (3.11) is asymptotically optimized if
\[
\frac{N'_i}{N'_j} = \left(\frac{S_i}{S_j} \cdot \frac{\bar{f}_j - \bar{f}_s}{\bar{f}_i - \bar{f}_s}\right)^2, \quad i,j \in \{1,2,\dots,l\},\; i \neq j \neq s; \qquad
N'_s = S_s \sqrt{\sum_{i=1, i\neq s}^{l} \frac{(N'_i)^2}{S_i^2}}. \tag{3.12}
\]
We propose a sequential algorithm for the post-processing procedure of the SAA approach based on this asymptotic result for computing budget allocation. In the interest of brevity, we drop the superscript prime. The decision variables at iteration $t$ are denoted $N_1^t,\dots,N_l^t$. In the algorithm, three parameters are predefined: the computational budget for each iteration ($\Delta$), the tolerance ($\delta$), and the minimum sample size ($N_{\min}$).
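As an illustration of the allocation rule in Eq. (3.12), the following sketch computes a budget split for a given total $T$ from the current means and sample standard deviations. It assumes the means are distinct, uses NumPy, and the function and variable names are ours rather than part of any OCBA library; it is a didactic sketch, not the implementation used in our experiments.

import numpy as np

def ocba_allocation(f_bar, S, T, s=None):
    """Sketch of the asymptotic allocation rule in Eq. (3.12).

    f_bar : estimated objective values of the l candidate solutions
    S     : sample standard deviations of F(x_i, w)
    T     : total evaluation budget (number of samples)
    s     : index of the currently best solution (smallest mean); inferred if None
    """
    f_bar, S = np.asarray(f_bar, float), np.asarray(S, float)
    l = len(f_bar)
    if s is None:
        s = int(np.argmin(f_bar))
    # Relative weights: N_i proportional to (S_i / (f_i - f_s))^2 for i != s
    w = np.zeros(l)
    for i in range(l):
        if i != s:
            w[i] = (S[i] / (f_bar[i] - f_bar[s])) ** 2
    # N_s = S_s * sqrt( sum_{i != s} N_i^2 / S_i^2 )
    w[s] = S[s] * np.sqrt(sum((w[i] / S[i]) ** 2 for i in range(l) if i != s))
    # Scale so that the allocation sums to the budget T (rounding is ignored here)
    return T * w / w.sum()

# Hypothetical example with three candidate solutions
print(ocba_allocation(f_bar=[2612.5, 2616.0, 2630.1], S=[6.1, 6.8, 3.6], T=500))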
This sequential algorithm for the budget allocation of the solution evaluation can be summarized as follows:

1. $t \leftarrow 0$. Sample a total of $\Delta$ scenarios for the objective function evaluation, with budget allocation $N_1^t, N_2^t, \dots, N_l^t$ following
\[
\frac{N_i}{N_j} = \frac{q_j}{q_i}\left(\frac{\check{v}_j - \check{v}_s}{\check{v}_i - \check{v}_s}\right)^2, \quad i,j \in \{1,2,\dots,l\},\; i \neq j \neq s; \qquad
N_s = \frac{1}{\sqrt{q_s}}\sqrt{\sum_{i=1, i\neq s}^{l} q_i N_i^2}.
\]
Update the mean $\bar{f}_i$ and estimate the variance $S_i^2$ of the objective function, $i = 1,\dots,l$.

2. If $S_s \leq \delta$ and $N_s > N_{\min}$, estimate the probability in Eq. (3.11) and stop.

3. Increase the number of samples for evaluation by $\Delta$ and compute the new budget allocation $N_1^{t+1}, N_2^{t+1}, \dots, N_l^{t+1}$ following Eq. (3.12).

4. Draw the additional $\max\{0,\, N_i^{t+1} - N_i^t\}$ samples for solution $\check{x}_i$, $i = 1,\dots,l$. Recalculate the mean and variance of the objective function value. $t \leftarrow t+1$. Go to Step 2.

This algorithm chooses the optimal solution from the $M$ candidates with a limited computing budget and provides a lower bound on the probability that the selected solution is the optimal one. It stops when the variance of the estimated objective value at the selected solution is small enough.

Theorem 3.5. Consider an SCO problem solved with $M$ replications, where the number of samples in each SAA problem is $N$. Suppose that, in each replication, the optimal solution lies in the $\epsilon$-optimality set $X_\epsilon$ with probability at least $1-\alpha$. After optimizing the $M$ replications, the OCBA technique is applied to select the final solution. Suppose the distinct optimal solutions are $\check{x}_1, \check{x}_2, \dots, \check{x}_l$, the final selected solution is $\check{x}_s$, and the optimal solution among these $l$ solutions is $\check{x}^\star$. Then we have the following probabilistic guarantee that the selected solution is in $X_\epsilon$:
\[
P\{\check{x}_s \in X_\epsilon\} \geq \Big(1 - \sum_{i=1, i\neq s}^{l} P\{\tilde{f}_s > \tilde{f}_i\}\Big)(1 - \alpha^M).
\]
Proof. Based on [69], the probability that the selected solution equals the best candidate satisfies
\[
P\{\check{x}_s = \check{x}^\star\} = P\Big\{\bigcap_{i=1, i\neq s}^{l} \tilde{f}_s \leq \tilde{f}_i\Big\} \geq 1 - \sum_{i=1, i\neq s}^{l} P\{\tilde{f}_s > \tilde{f}_i\}.
\]
In addition, the probability that $\check{x}^\star$ is within the optimality set $X_\epsilon$ (i.e., that at least one of the replication solutions lies in $X_\epsilon$) can be estimated by a Bernoulli process, and is at least $1-\alpha^M$. Combining the two events gives the lower bound on the probability that the selected solution is within the set:
\[
P\{\check{x}_s \in X_\epsilon\} \geq \Big(1 - \sum_{i=1, i\neq s}^{l} P\{\tilde{f}_s > \tilde{f}_i\}\Big)(1 - \alpha^M).
\]
This proves our statement.

With the calculation above, we obtain a probabilistic guarantee for the optimality of the selected solution, where the value $\alpha$ can be estimated with Eq. (1.4). The probability that the selected solution is in the optimal set $X_\epsilon$ can be used as a stopping criterion for the algorithm. If the probability is larger than the threshold, we may stop the algorithm. However, if the probability is lower than the threshold, we may need to increase the sample size $N$, or the number of replications $M$, to find a solution with a better theoretical bound.

3.5 Very Large-Scale Applications: Stochastic Facility Location

Among the first applications of the facility location problem (FLP) is a paper by [70]. The problem is regarded as a two-stage Integer Programming (IP) problem: the facilities are first selected from the available positions, and the selected facilities are then allocated to serve all customers. The objective is to minimize the cost of building the facilities plus the cost of serving customers' demands from the established facilities. The FLP model has several practical applications in various areas, from logistics and warehouse location to storage system layout and telecommunication network design.
In the FLP, the cost coefficients in the second stage, i.e., the service costs to satisfy the customer demand from each facility, are known in advance. However, in most real-world problems, such as daily customer demand for package delivery or telecommunication requests, the cost coefficients are stochastic since the quantity of the demand and the location of the request are usually unknown beforehand. Moreover, the decision to select potential facilities needs to be completed before the allocations, and decisions on facility location are difficult to reverse, considering the fixed costs associated with opening a facility. Thus, the practical problem, also known as the Stochastic Facility Location Problem (SFLP) [71, 72], is to establish the facilities such that the establishment costs plus the expectation of the assignment cost are minimized.

Nowadays, with the rapid expansion of the Internet of Things (IoT) and the decreasing price of data storage, tremendous amounts of data are available. While it is beneficial for decision-makers to exploit all available data, the notoriously high dimension of the problem presents an obstacle. For many real-world problems, such as the distribution center selection problem, the cost of delivering the packages needs to be considered, and the number of packages in one city can easily exceed ten thousand per day. Since stochastic processes govern these inputs to the simulator, and there are spatio-temporal variabilities associated with such processes, the overall variability of these forecasts can be significant. Thus, despite the data-rich environment, these simulators introduce substantial variability and call for new variance reduction tools for SAA-based methods for the SFLP.

3.5.1 The Mathematical Structure

The Stochastic Facility Location Problem (SFLP) can be formulated as a two-stage SCO problem. The facility locations are selected in the first stage, before the realization of demands is known. The facilities are assigned to meet demands once the uncertainty, such as the quantities and locations of these demands, has been resolved. The decision variables in both stages are binary. The first-stage decision variables represent whether the potential facilities are selected or not, while the second-stage variables assign facilities according to customer demands. The objective is to minimize the cost of building facilities plus the expected cost of serving all potential demands. We first define the following sets:

P : potential facilities
O : open facilities
F = P ∪ O : total facilities
D : demands

The "potential facilities" set P contains all the predetermined locations that are available for establishing new facilities. The set O contains all the sites of "open facilities". The decision variables and parameters are defined as follows:

$x_i$, $\forall i \in P$ : binary variable, whether to open a facility at the $i$-th position
$y_{ij}$, $\forall i \in F, \forall j \in D$ : binary variable, whether to serve demand $j$ with facility $i$
$c_i$ : the cost to establish facility $i$
$k_i$ : the capacity of facility $i$
$l, u$ : the minimum/maximum number of facilities to build
$a_{ij}(\omega)$ : the cost to serve demand $j$ with facility $i$ in scenario $\omega$
$d_{ij}(\omega)$ : the distance to serve demand $j$ with facility $i$ in scenario $\omega$
$q_j(\omega)$ : the quantity of demand $j$ in scenario $\omega$

The uncertainty of this problem may originate from several places, including the location, the quantity, and the cost of serving the demands from facilities.
In our model, we assume that the cost to serve a demand from a facility equals the distance between them multiplied by the quantity of the demand, i.e., $a_{ij}(\omega) = d_{ij}(\omega)\, q_j(\omega)$. With these variables and parameters, the problem can be formulated as the following two-stage SCO:
\[
\begin{aligned}
\min\; & \sum_{i\in P} c_i x_i + E[h(x,\tilde{\omega})] \\
\text{s.t. } & l \leq \sum_{i\in P} x_i \leq u \\
& x_i \in \{0,1\}, \quad \forall i \in P
\end{aligned} \tag{3.13}
\]
where
\[
\begin{aligned}
h(x,\omega) = \min\; & \sum_{j\in D}\sum_{i\in F} q_j(\omega)\, d_{ij}(\omega)\, y_{ij} \\
\text{s.t. } & \sum_{i\in F} y_{ij} = 1, \quad \forall j \in D \\
& \sum_{j\in D} y_{ij} \leq k_i, \quad \forall i \in F \\
& y_{ij} \leq x_i, \quad \forall i \in P, \forall j \in D \\
& y_{ij} \leq 1, \quad \forall i \in O, \forall j \in D \\
& y_{ij} \in \{0,1\}, \quad \forall i \in F, \forall j \in D
\end{aligned} \tag{3.14}
\]
Equations (3.13) and (3.14) provide the formulation of the capacitated SFLP, where the number of destinations of each facility $i \in F$ is at most its capacity $k_i$, which is assumed to be an integer. We further assume that this problem is always feasible, which requires the total capacity of all the facilities to be at least the total amount of demand, i.e., $\sum_{i\in F} k_i \geq |D|$, where $|D|$ is the cardinality of the set $D$.

For the SFLP model, when the values of the first-stage decision variables are given, the second stage can be regarded as a network flow problem. In the second stage, the decision variables $y_{ij}$ are binary: $y_{ij}=1$ means that facility $i$ is assigned to demand $j$, while $0$ represents the opposite. Since the second-stage problem is an assignment problem, the linear programming (LP) relaxation has the same optimal value as the original integer programming problem. We can therefore drop the binary constraints on the second-stage variables and solve the second-stage subproblem as an LP (with bounds). As a result, we treat this problem as one with binary first-stage variables and continuous second-stage variables, where the second-stage variables are constrained to lie between 0 and 1, that is, $0 \leq y_{ij} \leq 1$, $\forall i \in F, \forall j \in D$.

The randomness of this problem originates from the assignment cost from facility $i \in F$ to demand $j \in D$. In different scenarios, the cost of the assignment between facility $i$ and demand $j$ differs. In our model, this assignment cost is determined by the distance between facility $i$ and demand $j$, as well as the quantity associated with demand $j$. Since the locations of facilities are predetermined, the distance between a facility and a demand is determined by the location of the customer demand. Formulations for the SFLP are often based on one of two alternative distance metrics: a) Euclidean distances and b) Manhattan distances. The former is often used in telecommunications networks, whereas the latter is more common for logistics networks.
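Although our experiments use a C implementation with CPLEX and Benders decomposition, the structure of the second-stage LP relaxation in Eq. (3.14) can be illustrated with a small, self-contained sketch. The function below, whose name and interface are ours, evaluates $h(x,\omega)$ for a fixed first-stage decision and one scenario using SciPy's linear programming routine; it assumes the retained facilities have enough capacity so that the subproblem is feasible. It is a didactic sketch, not the solver used for the reported results.

import numpy as np
from scipy.optimize import linprog

def second_stage_cost(x, k, d, q):
    """Sketch of the second-stage LP relaxation of Eq. (3.14) for one scenario.

    x : 0/1 vector over all facilities F (entries for open facilities set to 1)
    k : capacities k_i over F
    d : distance matrix d_ij(w), shape (|F|, |D|)
    q : demand quantities q_j(w), length |D|
    """
    d = np.asarray(d, float)
    q = np.asarray(q, float)
    nF, nD = d.shape
    c = (d * q[None, :]).reshape(-1)          # cost a_ij(w) = d_ij(w) * q_j(w)

    # Each demand j is served by exactly one facility:  sum_i y_ij = 1
    A_eq = np.zeros((nD, nF * nD))
    for j in range(nD):
        A_eq[j, j::nD] = 1.0
    b_eq = np.ones(nD)

    # Capacity of each facility:  sum_j y_ij <= k_i
    A_ub = np.zeros((nF, nF * nD))
    for i in range(nF):
        A_ub[i, i * nD:(i + 1) * nD] = 1.0
    b_ub = np.asarray(k, float)

    # 0 <= y_ij <= x_i  (closed facilities cannot serve any demand)
    bounds = [(0.0, float(x[i])) for i in range(nF) for _ in range(nD)]

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.fun

# Hypothetical instance: 4 facilities (one of them closed), 6 demands
rng = np.random.default_rng(1)
d = rng.uniform(1.0, 10.0, size=(4, 6))       # distances for one scenario
q = rng.integers(1, 5, size=6).astype(float)  # demand quantities
print(second_stage_cost(x=[1, 1, 0, 1], k=[3, 3, 3, 3], d=d, q=q))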
3.6 Computations

In order to study the computational effectiveness of our methods, we focus on very large-scale stochastic facility location problems in which the first-stage problems have binary variables, whereas the second-stage problems are solved using network flow methods. More generally, the methods of this paper are applicable to structures in which the second-stage problem may include binary variables. For such problems, an extended version of Benders decomposition [61] is applicable, and therefore our methods are applicable as well. All algorithms were programmed in the C language, and all LPs and MIPs were solved using the CPLEX solver. All computational experiments were conducted on a MacBook Pro with the following specifications: Intel Core i7 processor @2.7 GHz and 16 GB memory @2133 MHz. Finally, all data and code are available on the cORe platform.

3.6.1 Problem Generation and Model Characteristics

We first describe the datasets we use and how we build models for the SFLP. We use a dataset that contains all the addresses in the city of Los Angeles. We also use the Brazilian E-Commerce Public Dataset by Olist, which has daily sales numbers for one year. In addition, we use the Zillow Home Value Index (ZHVI), similar to median sale prices, for different ZIP code areas in Los Angeles. In our SFLP model, the potential locations of facilities are predetermined, and the cost to build each is related to its location. The locations of facilities, which we pick at random from addresses in Los Angeles, are assumed to be given. The cost to build each facility is based on the ZHVI of its ZIP code area. The scenario data are generated for three types (Type G, S, and W) based on the specific manner in which the costs are generated. These types have the same mathematical formulation as in Eq. (3.13)-(3.14), but the cost coefficients in the second stage are different; they can be applied to different real-world situations. These coefficients can be calculated using the demand quantity and the distance between demand nodes and facility nodes for each model. Further, since the facility nodes' locations are predetermined, we only need to generate the location and quantity data for all demand nodes in the different scenarios.

For the geometric centroid model (Type G), the locations of the demand nodes are fixed and can be calculated using the average latitude and longitude of the addresses in one group. The quantity is assumed to follow the empirical distribution in the Olist dataset. The sampled centroid model's (Type S) demand node locations are determined by sampling a number of addresses in one group and calculating the sampled average longitude and latitude; the quantity of each demand node also follows the empirical distribution in the Olist dataset. For the weighted centroid model (Type W), we sample a number of addresses in one group, and each address is associated with a demand quantity. The quantity of the demand node is the total quantity of the addresses in the group, and its location is the weighted average longitude and latitude of the addresses in that group, where the weights are the demand quantities. These three models provide estimates of the actual assignment costs. They should be applied under different conditions, or in combination, to make decisions. The first model (geometric centroid) can be applied to point-to-point transshipment problems: for instance, in a hierarchical logistics network, cargo is often transported from a distribution center (parent node) to a fulfillment center (child node), where the locations of the child facilities remain fixed while the cargo quantities vary. The sampled centroid model can be used in cases where only regional sales data are available, or where the product for each address is the same, for example, newspaper delivery or telecommunication. Finally, the weighted centroid model should be applied to cases where each destination may have a different demand for package delivery, such as military deployment.
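For concreteness, the three demand-node constructions can be sketched as follows. The data and function names are hypothetical; in the actual experiments, quantities are drawn from the Olist empirical distribution rather than the placeholder distributions used here.

import numpy as np

rng = np.random.default_rng(0)

def geometric_centroid(addresses):
    """Type G: fixed demand-node location, the plain average of the addresses in a group."""
    return addresses.mean(axis=0)

def sampled_centroid(addresses, n, rng=rng):
    """Type S: location is the average of n addresses sampled from the group (one scenario)."""
    idx = rng.choice(len(addresses), size=n, replace=False)
    return addresses[idx].mean(axis=0)

def weighted_centroid(addresses, quantities):
    """Type W: location is the demand-weighted average of the sampled addresses;
    the node's quantity is the total demand of the group."""
    w = np.asarray(quantities, float)
    return (addresses * w[:, None]).sum(axis=0) / w.sum(), w.sum()

# Hypothetical (latitude, longitude) pairs for one district or ZIP group
addresses = rng.uniform(low=[33.7, -118.7], high=[34.3, -118.1], size=(100, 2))
print(geometric_centroid(addresses))
print(sampled_centroid(addresses, n=20))
print(weighted_centroid(addresses[:20], quantities=rng.integers(1, 10, size=20)))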
In our computations, we formulate two separate classes of models: in one class, each council district has a demand node associated with it; in the other, each ZIP code area is associated with a demand node. For the former, there are 15 different districts (15 demand nodes); for the latter, there are 118 ZIP code areas (118 demand nodes). Each class of model has three types of data: Type G, S, and W. In each instance, we generate data for 100 different scenarios per demand node, and the realizations at different demand nodes are assumed to be independent. Thus, if we consider the ZIP demand nodes, the total number of scenarios is $100^{118}$, which is impossible to enumerate and solve exactly. Furthermore, in a case with 80 facilities, including potential and open facilities, the total number of random variables is greater than ten thousand. Solving problems with such a large number of random variables is therefore a computational challenge for current methods. Based on the computational results reported below, we show how our methods provide much greater support for such models than is available today.

The characteristics of the test problems are summarized in Table 3.1. For example, the problem P30_O10 represents the instance with 30 potential and 10 open facilities. We create demand nodes based on their district or ZIP code. For all the examples, the lower bound on the number of built facilities is 1, while the upper bound is twice the number of open facilities. For each instance, we run 30 replications and solve each until optimality can be statistically deduced for the sampled instance.

Table 3.1: Characteristics of test problems with District demand nodes and ZIP demand nodes.

Model             Problem    # of variables (1st-/2nd-stage)   # of random elements
District Demand   P30_O10    30/600                            600
                  P45_O15    45/900                            900
                  P60_O20    60/1200                           1200
ZIP Demand        P30_O10    30/4720                           4720
                  P45_O15    45/7080                           7080
                  P60_O20    60/9440                           9440

3.6.2 Instance Visualization

Fig. 3.1 illustrates the optimal solution of the P15_O5 problem in the weighted centroid model, where the red dots represent the facilities selected in the optimal solution, the green dots are the potential facilities that are not selected, and the orange dots are the locations of open facilities. The grey circles are the demand nodes, whose radii are proportional to the quantities of the demands; the locations and radii of the demand nodes change with the data in different scenarios. To give the reader a more dynamic representation, an animation is provided on YouTube at https://youtu.be/QWnhWHklO6U.

Figure 3.1: (Snapshot of the animation on YouTube.) An illustration of the solution for the weighted centroid model with 15 potential facilities and 5 open facilities.

3.6.3 Computational Results with the Bagging and Compromise Decisions

We conduct the computational experiments with the variance reduction techniques by applying the stopping rule of Section 3.3.3, where the Benders decomposition algorithm is applied to solve each replication. The quantity $\eta$ is set to 0.005, and $\gamma$ is the probability of early termination as defined in Eq. (3.9). We first consider the demands with respect to each council district, where we have 15 demand nodes in each scenario. We conduct experiments for all three types (i.e., G, S, W).
Let $I_c$ be an index set, where $I_c = \{m \mid x_c = \arg\min_{x\in X} \hat{f}_N(x;\Lambda_N^m),\; m = 1,2,\dots,M\}$, and suppose its cardinality is $k$. We define
\[
\bar{v}_k := \frac{1}{k}\sum_{m\in I_c} \hat{f}_N(x_c;\Lambda_N^m), \tag{3.15}
\]
which is the average of the objective values from the replications whose final solution coincides with the compromise decision. Recall that, in many cutting-plane based algorithms, such as Benders decomposition and stochastic decomposition, when the algorithm stops, the function approximation value at $x_c$ is evaluated with the sample set $\Lambda_N^m$. Thus, $\bar{v}_k$ is an unbiased estimate of $f(x_c)$. Moreover, since $\bar{v}_k$ is based on the samples from $k$ replications, its variance (inversely proportional to the sample size) is less than that of the objective estimate from one replication. We calculate the standard deviation of $\bar{v}_k$, labeled 'Std[$\bar{v}_k$]'. Suppose the sample sizes of all replications are the same; the reduction in standard deviation ('Reduced Std'), compared with the objective estimate from a single replication, can then be estimated by $1 - 1/\sqrt{k}$. Note that the reduction in the standard deviation is significant, ranging from approximately 75.75% to 81.43%.

We also calculate lower and upper bounds for the optimal objective value $v^\star$. The lower bound can be obtained by solving the compromise problem in Eq. (3.5); according to Proposition 3.1, the optimal value of the compromise problem, $v_c$, is a valid lower bound, and its variance is reduced as shown in Eq. (3.6). We also calculate the confidence interval for this lower bound, labeled 'Agg. LB 95% CI'. The 95% confidence interval for the upper bound can be obtained by further sampling at the compromise decision. The gap between the lower and upper bounds certifies closeness to optimality. Meanwhile, note that $\gamma$ is close to zero for all problems, which guarantees, with high probability, the solution quality and stability of our algorithm. The detailed results can be found in Table 3.2.

Table 3.2: Variance reduction with the bagging and compromise decisions: computational results with a demand node for each District, corresponding to 15 demand nodes.

Type     Problem   v̄_k        Std[v̄_k]   Reduced Std   Agg. LB 95% CI           95% CI UB                γ
Type G   P30_O10   2612.904   8.398      76.43%        [2603.614, 2627.132]     [2603.682, 2615.012]     1.93e-11
         P45_O15   2021.292   5.108      81.43%        [2009.920, 2029.963]     [2014.132, 2023.315]     6.79e-07
         P60_O20   1743.945   3.312      77.64%        [1740.343, 1751.727]     [1737.885, 1745.177]     5.62e-13
Type S   P30_O10   2612.459   6.082      75.75%        [2592.896, 2611.421]     [2593.834, 2604.780]     6.72e-09
         P45_O15   2016.882   6.801      80.75%        [2005.895, 2030.237]     [2014.024, 2023.206]     2.32e-05
         P60_O20   1735.755   3.615      81.10%        [1727.712, 1741.472]     [1731.186, 1738.444]     1.47e-06
Type W   P30_O10   2595.960   1.300      80.39%        [2593.268, 2597.990]     [2595.163, 2598.017]     0
         P45_O15   2023.033   1.022      81.10%        [2020.939, 2024.719]     [2020.934, 2023.490]     0
         P60_O20   1949.132   1.043      80.00%        [1947.494, 1951.288]     [1948.642, 1951.386]     0
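The aggregated quantities reported in the tables can be reproduced along the following lines. This is a sketch, assuming the per-replication objective estimates at $x_c$ are available; the standard deviation of $\bar{v}_k$ is computed here in one natural way (the sample standard deviation of the $k$ estimates divided by $\sqrt{k}$), and the 'Reduced Std' factor is the $1 - 1/\sqrt{k}$ estimate described above.

import numpy as np

def aggregated_estimate(v_hat, in_compromise):
    """Sketch of Eq. (3.15): average the objective estimates over the replications
    whose individual solution coincides with the compromise decision.

    v_hat         : per-replication estimates f_hat_N(x_c; sample set m), m = 1,...,M
    in_compromise : boolean mask for the index set I_c
    """
    v = np.asarray(v_hat, float)[np.asarray(in_compromise, bool)]
    k = len(v)
    v_bar = v.mean()
    std_v_bar = v.std(ddof=1) / np.sqrt(k)   # standard deviation of the averaged estimate
    reduced_std = 1.0 - 1.0 / np.sqrt(k)     # reduction relative to a single replication
    return v_bar, std_v_bar, reduced_std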
In order to show the scalability of our algorithmic process, we test larger problems with 118 demand nodes, called ZIP demand nodes. The detailed results are summarized in Table 3.3; the parameters of the algorithm are the same as those in Table 3.2. Due to the large number of demand nodes, the number of random elements and decision variables is enormous, and the problems are impossible to solve exactly. As shown in the tables, the variance of the estimated value at the compromise decision is significantly reduced by our method, and the estimated objective values reported in these two tables are always very close to the lower and upper bounds. Similar to the results for the district problems, the values of $\gamma$ are close to zero as well. Clearly, one can be very confident in the quality of these decisions. The reader may observe that, since the LB and UB estimates in Tables 3.2 and 3.3 are the result of independent samples, there are some instances in which the lower bound of the LB confidence interval is higher than the lower bound of the UB confidence interval. Such negligible inexactness is to be expected when calculating sample-based bounds, especially in cases where a solution is very close to being optimal.

Table 3.3: Variance reduction with the bagging and compromise decisions: computational results with a demand node for each ZIP code, corresponding to 118 demand nodes.

Type     Problem   v̄_k        Std[v̄_k]   Reduced Std   Agg. LB 95% CI           95% CI UB                γ
Type G   P30_O10   3044.264   2.810      80.75%        [3039.026, 3049.590]     [3040.174, 3049.752]     1.59e-14
         P45_O15   2773.942   3.260      78.18%        [2768.673, 2778.726]     [2769.912, 2778.726]     2.03e-13
         P60_O20   2617.940   2.479      76.43%        [2614.523, 2621.620]     [2616.929, 2624.861]     1.27e-12
Type S   P30_O10   3053.927   3.429      80.75%        [3046.108, 3059.514]     [3046.062, 3056.142]     2.05e-11
         P45_O15   2783.701   4.287      73.27%        [2775.521, 2787.501]     [2786.342, 2795.845]     3.42e-02
         P60_O20   2624.868   2.539      74.18%        [2624.726, 2631.597]     [2622.932, 2631.031]     3.49e-03
Type W   P30_O10   3219.671   0.540      76.43%        [3218.843, 3220.621]     [3218.583, 3219.841]     0
         P45_O15   2779.293   0.465      77.64%        [2778.341, 2779.972]     [2778.210, 2779.428]     0
         P60_O20   2627.271   0.558      75.75%        [2627.459, 2629.552]     [2627.230, 2628.365]     0

The computational times for solving the examples with the District demand nodes and the ZIP demand nodes, corresponding to the results in Table 3.2 and Table 3.3, respectively, are summarized in Table 3.4. The 'Opt. Time' is the time to solve the 30 replications sequentially, where each one is solved with the Benders decomposition algorithm. The 'Agg. Time' is the time for the aggregation calculation, which includes the time to find the bagging and compromise solutions and to determine whether they are equal. Based on these results, we conclude that the approaches presented in this paper provide reasonable solutions within reasonable computational times, even on instances that are too large for exact deterministic optimization.

Table 3.4: Computational time (secs.) for solving the examples with District and ZIP demand nodes, corresponding to the results in Table 3.2 and Table 3.3, respectively.

                              District Model                    ZIP Model
Type     Problem   Opt. Time (s)   Agg. Time (s)      Opt. Time (s)   Agg. Time (s)
Type G   P30_O10   7.587           0.015              58.554          0.087
         P45_O15   10.187          0.079              117.342         0.111
         P60_O20   5.314           0.022              157.823         0.177
Type S   P30_O10   9.724           0.018              51.067          0.075
         P45_O15   8.527           0.016              165.478         0.305
         P60_O20   4.574           0.022              365.753         0.356
Type W   P30_O10   5.714           0.017              193.391         0.480
         P45_O15   16.892          0.021              178.064         0.345
         P60_O20   19.002          0.039              164.824         0.246

3.6.4 Results with Optimal Computing Budget Allocation (OCBA)

In this subsection we report computational results obtained by using Benders decomposition together with the OCBA technique to solve the SFLP instances.
We use the same methodology for creating instances (using model types G, S, and W). We focus on problems with ZIP demand because they contain more random variables and have higher variances, which makes them better suited to the OCBA method. For the OCBA technique, we set $\Delta$ to 500. In our experiments, the best solution is evaluated with an error term of less than 0.005. We choose multiple small sample sets to formulate SAA problems and obtain multiple candidate solutions; we then apply the OCBA method to determine the final decision. The sample size is set to $N = 30$ for each replication.

The OCBA technique allocates the samples based on Eq. (3.12), while the ordinary SAA method treats all distinct candidate solutions evenly and evaluates each of them until the error term is less than the tolerance. We count the number of samples used by the OCBA method for evaluation, labeled 'OCBA samples'. The reduced computational resource (labeled 'Reduced Resource') is the number of evaluation samples used by the ordinary SAA method minus the number used by OCBA, divided by the number used by the ordinary SAA method; this value indicates the improvement obtained by applying the OCBA technique instead of the ordinary SAA method. In our experiments, the reduction reaches up to 77.52%. The lower bound confidence interval is calculated based on the lower bound estimates from the replications, and the upper bound confidence interval is based on the evaluation from the OCBA algorithm. The APCS, which is a lower bound on the probability of correct selection, is also calculated for each example. These results are summarized in Table 3.5.

Table 3.5: OCBA technique versus the ordinary SAA method: computational results with ZIP demand, corresponding to 118 demand nodes.

Type     Problem   OCBA samples   Reduced Resource   LB 95% CI                UB 95% CI                APCS
Type G   P30_O10   2154           46.15%             [3038.789, 3049.396]     [3040.202, 3053.036]     0.75
         P45_O15   2749           54.18%             [2768.321, 2778.512]     [2769.139, 2780.784]     0.40
         P60_O20   3147           77.52%             [2611.596, 2622.895]     [2607.864, 2618.649]     0.75
Type S   P30_O10   2171           45.73%             [3046.048, 3059.443]     [3040.545, 3053.383]     0.99
         P45_O15   3647           54.41%             [2774.401, 2786.544]     [2778.837, 2790.776]     0.35
         P60_O20   2180           63.67%             [2621.517, 2633.134]     [2622.126, 2632.904]     0.80
Type W   P30_O10   2108           64.87%             [3218.684, 3220.464]     [3218.203, 3219.953]     0.46
         P45_O15   2143           64.28%             [2778.260, 2779.878]     [2777.118, 2778.811]     0.97
         P60_O20   2183           72.71%             [2627.229, 2629.195]     [2626.912, 2628.444]     0.88

Since each candidate solution is evaluated with independent Monte Carlo sampling, its evaluation is approximated by a normal distribution. Thus, when the means are significantly different and the variances are low, the APCS is close to 1. When the APCS is low, we can either allocate more samples to evaluate the solutions or optimize the SAA problems with larger sample sets, which provides a higher APCS value. Since the APCS is a lower bound on the probability of correct selection, it can be conservative. In the computations of Table 3.5, the closeness of the UB and LB confidence intervals confirms the near-optimality of our final results. By checking the optimality gap and the APCS jointly, we recognize that the low values of the APCS are indeed too conservative, and there is nothing to gain by running the OCBA algorithm further.
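Under the normal approximation used above, each term $P\{\tilde{f}_s > \tilde{f}_i\}$ in the APCS bound of Eq. (3.10) has a closed form, so the APCS values reported in Table 3.5 can be computed as in the following sketch. The function name and the numbers in the example are ours, for illustration only; SciPy and NumPy are assumed to be available.

import numpy as np
from scipy.stats import norm

def apcs(f_bar, S, N, s=None):
    """Sketch of the APCS lower bound in Eq. (3.10) under the normal approximation.

    f_bar : estimated objective value of each candidate solution
    S     : sample standard deviations of F(x_i, w)
    N     : number of evaluation samples allocated to each candidate
    s     : index of the selected (smallest-mean) solution; inferred if None
    """
    f_bar, S, N = (np.asarray(a, float) for a in (f_bar, S, N))
    if s is None:
        s = int(np.argmin(f_bar))
    total = 0.0
    for i in range(len(f_bar)):
        if i == s:
            continue
        # P{ f_tilde_s > f_tilde_i } for independent normal estimators
        sd = np.sqrt(S[s] ** 2 / N[s] + S[i] ** 2 / N[i])
        total += norm.cdf((f_bar[s] - f_bar[i]) / sd)
    return 1.0 - total

# Hypothetical example with three candidates
print(apcs(f_bar=[2612.5, 2616.0, 2630.1], S=[6.1, 6.8, 3.6], N=[900, 700, 500]))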
3.7 Conclusions

This paper is devoted to very large-scale SCO applications, leading to models with an astronomically large number of scenarios. We proposed several techniques for variance reduction and computational budget allocation for SCO problems. First, we generalized the concept of compromise decisions from SLP to SCO and derived a stopping rule based on the optimality gap of the combinatorial optimization problem; in addition, an upper bound on the probability of early termination is calculated to provide an estimate of the reliability of the solution. Second, we applied the OCBA technique to find the final decision in cases with a given computing budget for evaluation, where we save a significant amount of computing time in evaluating the solutions and acquire a lower-variance estimate of the optimal value. Detailed experiments were conducted on several large SFLP instances that arise in modern logistics applications, where demands over networks can exhibit significant dimensionality and high variability. Nevertheless, we are able to provide decisions with significant reliability for very large-scale SFLP instances. Our computational results demonstrate the advantages of the variance reduction tools, with a focus on the convex case. The methods discussed in this paper can be extended to the non-convex and discontinuous case as well; the corresponding computations will be included in future work. In summary, this paper provides a bridge between stochastic discrete/network flow optimization and other areas such as machine learning, continuous stochastic programming, and simulation optimization.

Chapter 4
Multi-stage Stochastic Linear Programming: Compromise Policy for Bias and Variance Reductions

4.1 Introduction

Multistage stochastic programming (MSP) is a natural model for finding sequences of optimal decisions under evolving and uncertain conditions. The decisions are optimized in discrete time, where the decision made in one period should account for uncertainty regarding the future and for how current decisions affect the future. This model has been successfully applied in a wide variety of real-world applications, ranging from financial planning [15] to production-inventory planning [16] to electric power systems [17, 18] and others. In addition, the sequential decision-making framework (under uncertainty) of MSP is closely related to dynamic programming, stochastic optimal control [19], and approximate dynamic programming (ADP) [20]. As a result, our contributions in MSP can also be extended to other related areas.

One particular class of MSP problems is the multistage stochastic linear programming (MSLP) problem, where the objective function, constraints, and system dynamics are linear. MSLP problems are notoriously difficult to solve computationally since they suffer from the well-known "curse of dimensionality". Some of these difficulties result from the fact that the state and decision spaces are continuous and typically involve multi-dimensional uncertainty (i.e., vector-valued stochastic processes). Because of this multi-dimensionality, the space over which the stochastic process evolves grows exponentially in the number of stages (the time horizon). In general, the MSLP problem is known to be PSPACE-hard [73, 74], and it requires computational effort that is exponential in the number of stages to obtain accurate approximations with high probability [75]. These characteristics make these problems challenging to solve computationally. This chapter proposes a meta-algorithm to solve multistage stochastic linear programming problems.
We build an ensemble model that can accommodate the most widely used value function-based algorithms, such as stochastic dual dynamic programming (SDDP) [76] and stochastic dynamic linear programming (SDLP) [77]. Our meta-algorithm provides a unified framework for solving MSLP problems and provides reliable estimates of optimality through in-sample and out-of-sample stopping rules. In the process of doing so, we construct a new policy with reduced variance and bias, referred to as the "compromise policy". The additional computational effort required to obtain a compromise policy is relatively minimal due to the method's ability to exploit parallel computing. Using a standard hydro-thermal scheduling problem from the literature and a well-tested open-source implementation of SDDP [78], we illustrate that, in comparison with the approach of this chapter, the standard (single-replication) version of SDDP produces relatively unreliable computational results because it lacks the variance reduction features we introduce. Furthermore, we propose an online dual dynamic programming (ODDP) sequential sampling algorithm that only requires an oracle to generate random samples instead of the exact distribution of the random variables. We integrate the meta-algorithm with ODDP, where we are able to pause the algorithm for optimality tests and resume it if necessary for further refinement of the policy. Finally, we compare the performance of the single-replication SDDP algorithm, our meta-algorithm with SDDP, and our meta-algorithm with ODDP. Through extensive computations, using data with relatively long planning horizons, we demonstrate the effectiveness of our approach.

This chapter is structured as follows. The second section provides the formulation of multistage stochastic linear programming (MSLP), which includes the mathematical setting and the model formulation. Section 3 contains the background, which includes the SDDP algorithm and the hydro-thermal scheduling problem. Section 4 formulates the "ensemble model" of MSLP, which produces a policy that we refer to as the "compromise policy". Section 5 presents the bias and variance reduction properties of the compromise policy. Following this section, we present the details of our meta-algorithm, which is composed of distributed optimization together with aggregated validation steps for bias and variance reduction, and we propose the integration of our meta-algorithm with SDDP and ODDP as well as the parallel scheme. Section 7 presents further computational results of our meta-algorithm. Finally, we present the conclusions of our study in the last section.

4.2 Formulation of MSLP

This chapter considers a risk-neutral multistage stochastic linear programming (MSLP) problem with continuous decision variables. This problem can be regarded as a multistage dynamic process under uncertainty, where sequential decisions are made at discrete times. We denote by $\mathcal{T} = \{0,\dots,T\}$ the set containing all decision-making times, where $T$ is finite and fixed once and for all. Therefore, our focus is a finite-horizon decision model with $T+1$ stages. The uncertainty in the system is characterized as an exogenous process $\{\tilde{\omega}_t\}_{t=0}^{T}$, defined on the filtered probability space $(\Omega, \mathcal{F}, P)$. Here $\omega_t$ denotes one observation of the random variable $\tilde{\omega}_t$, and the random space has the product structure $\Omega = \Omega_0 \times \Omega_1 \times \dots \times \Omega_T$, where $\omega_t \in \Omega_t$. The $\sigma$-algebras $\mathcal{F}_t \subseteq \mathcal{F}$ contain the scenario data up to time $t$; thus, we have $\mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \dots \subseteq \mathcal{F}_T$.
For any time stage $t$, we denote by $t+$ and $t-$ the succeeding and preceding time periods, respectively. Furthermore, we use the subscript $[t]$ to represent the history of the stochastic process $\{\omega_t\}_{t=0}^{T}$ up to and including the current stage, i.e., $\omega_{[t]} = (\omega_0,\dots,\omega_t)$, and the subscript $(t)$ to denote the process from $t$ to the end of the horizon, i.e., $\omega_{(t)} = (\omega_t,\dots,\omega_T)$. Notice that at any time $t$, we can only observe the exogenous state before $t$, i.e., $\omega_{[t-1]}$. We use $\langle \cdot,\cdot \rangle$ to represent the inner product of vectors, i.e., $\langle u,v \rangle = u^\top v$; if the first argument is a matrix, it denotes the product of the transposed matrix and a vector, i.e., $\langle M,v \rangle = M^\top v$. $\llbracket i,j \rrbracket$ denotes the set of integers between $i$ and $j$, both included.

In many MSLP formulations [76, 79, 80], there are only control variables, and the focus is on finding optimal decisions. In addition to the control variables $u_t$, we also introduce state variables $x_t$, which carry the historical information. This formulation is more general and applicable to a larger variety of situations, especially those involving dynamic systems, and it also builds a bridge between MSLP and reinforcement learning (RL)/stochastic optimal control. As for the modeling choice in MSLP, there are two types of information structures: decision-hazard and hazard-decision. In the decision-hazard structure, the agent chooses a control $u_t$ before observing a realization of the random data $\omega_t \in \Omega_t$, according to a decision rule $\mu_t(x_t)$. In contrast, under a hazard-decision scheme, the control $u_t$ is selected after observing $\omega_t$, according to a decision rule $\mu_t(x_t,\omega_t)$. In fact, these two modeling choices are interchangeable [81]; for example, a one-stage decision-hazard problem can be decomposed into a deterministic problem followed by a hazard-decision problem. Our method can be applied to both information structures. In the interest of brevity, we present the problem with the decision-hazard scheme in the remainder of this chapter.

The state variables $x_t$ are defined for every $t \in \mathcal{T}$, where the initial state $x_0$ is given. With a given state $x_t$, a control variable $u_t$, and exogenous information $\omega_t$, we obtain the next state according to the system dynamics:
\[
x_{t+} = \mathcal{D}_t(x_t,u_t,\omega_t) = a_t + A_t x_t + B_t u_t \tag{4.1}
\]
where $\mathcal{D}_t$ is the transition function, and $(a_t, A_t, B_t)$ depend on the exogenous state $\omega_t$. As the random data $\omega_{(0)}$ are observed gradually over time, the states become available and our decisions should adapt to this process. The whole process has the form:
\[
\text{given state } (x_0) \to \text{decision } (u_0) \to \text{observation } (\omega_0) \to \text{state } (x_1) \to \dots \to \text{state } (x_T) \to \text{decision } (u_T) \to \text{observation } (\omega_T) \to \text{state } (x_{T+1}) \tag{4.2}
\]
The state $x_t$ depends on the random information up to time $t-1$, but not on future observations; this is the requirement of the nonanticipativity property. Given the past observations and decisions, we can calculate the state value $x_t$ explicitly by iteratively applying the system dynamics. Since the state sequence $\{x_t\}_{t=0}^{T}$ and the decision sequence $\{u_t\}_{t=0}^{T}$ depend on $\omega_{[t-1]}$, these two sequences can be viewed as stochastic processes as well. Now consider the following problem:
\[
\begin{aligned}
\langle c_0,x_0 \rangle + \min\; & \langle d_0,u_0 \rangle + E_{\tilde{\omega}_{(0)}}\Big[ \langle c_1,x_1 \rangle + \langle d_1,u_1 \rangle + E_{\tilde{\omega}_{(1)}}\big[ \cdots + E_{\tilde{\omega}_{T-1}}[\langle c_T,x_T \rangle + \langle d_T,u_T \rangle] \big] \Big] \\
\text{s.t. } & u_t \in \mathcal{U}_t(x_t) := \{u_t \mid D_t u_t \geq b_t - C_t x_t,\; u_t \geq 0\} \quad \forall t \in \mathcal{T} \\
& x_{t+} = \mathcal{D}_t(x_t,u_t,\omega_t) = a_t + A_t x_t + B_t u_t \quad \forall t \in \mathcal{T}
\end{aligned} \tag{4.3}
\]
where we have applied the assumption that the expected terminal cost for the $(T+1)$-st stage is 0.
The expectation is taken in a nested form over the stochastic process $\tilde{\omega}_{(t)}$ from stage $t$ to the end of the horizon. The feasible set of the decision variables, $\mathcal{U}_t(x_t)$, is a closed convex set that depends on the state $x_t$, where $(b_t, C_t)$ and the recourse matrix $D_t$ are fixed. Notice that if $(b_t, C_t)$ contain randomness, we can incorporate the random parts into the system dynamics in Eq. (4.1) and define an intermediate state that contains all the random information. Based on the system dynamics, the state variable $x_t$ can be explicitly expressed in terms of the initial state $x_0$, all past decisions $u_{[t-1]}$, and the exogenous information $\omega_{[t-1]}$. Our problem can also be defined recursively. For any $t \in \mathcal{T}$, we define
\[
\begin{aligned}
h_t(x_t) = \langle c_t,x_t \rangle + \min\; & \langle d_t,u_t \rangle + E[h_{t+}(\tilde{x}_{t+})] \\
\text{s.t. } & u_t \in \mathcal{U}_t(x_t) := \{u_t \mid D_t u_t \geq b_t - C_t x_t,\; u_t \geq 0\}
\end{aligned} \tag{4.4}
\]
where $\tilde{x}_{t+} = \mathcal{D}_t(x_t,u_t,\tilde{\omega}_t)$. The function $h_t(\cdot)$ is called the post-decision value function, or simply the value function. The expectation is taken with respect to the random variable $\tilde{\omega}_t$ conditional on the given $x_t$; since we assume stagewise independence, this conditional expectation is equivalent to the (unconditional) expectation with respect to $\tilde{\omega}_t$. Note that, when we take the expectation with respect to the next stage's random variable, the function $E[h_{t+}(\tilde{x}_{t+})]$ is a function of $x_t$ and $u_t$. The objective function in the optimization problem is referred to as the pre-decision value function:
\[
f_t(x_t,u_t) := \langle c_t,x_t \rangle + \langle d_t,u_t \rangle + E[h_{t+}(\tilde{x}_{t+})] \tag{4.5}
\]
We have the relationship that the post-decision value function equals the minimum of the pre-decision value function:
\[
h_t(x_t) = \min_{u_t \in \mathcal{U}_t(x_t)} f_t(x_t,u_t) \tag{4.6}
\]
In most MSLP problems, the size of the random space can be enormous or infinite, in which case it is computationally intractable to enumerate all the samples. One can choose a small number of samples for each $\omega_t$ and create a so-called sample average approximation (SAA) problem [75] by replacing the expectation with the average over the selected samples. Thus, the corresponding sampled value function (or SAA function) is expressed as:
\[
\check{f}_t(x_t,u_t) = \langle c_t,x_t \rangle + \langle d_t,u_t \rangle + \sum_{\omega_t \in \hat{\Omega}_t} p(\omega_t)\, h_{t+}(x_{t+}) \tag{4.7}
\]
where the summation is over the sampled scenarios at stage $t$, denoted $\hat{\Omega}_t$, and $p(\omega_t)$ is the empirical probability that scenario $\omega_t$ occurs. In an SAA problem with $N_t$ Monte Carlo samples, the true random space is replaced with a simulated sample set $\{\omega_t^1, \omega_t^2, \dots, \omega_t^{N_t}\}$, where each sample vector $\omega_t^n$ has the empirical probability $p(\omega_t^n) = 1/N_t$, $\forall n \in \llbracket 1,N_t \rrbracket$. We refer to one SAA problem as one replication. For consistency, we first summarize the assumptions that will be used in this chapter:

• (A1) The feasible set of root-stage decisions $\mathcal{U}_0$ is compact.
• (A2) The recourse is complete at all non-root stages, that is, the feasible set $\mathcal{U}_t(x_t)$ is non-empty for every state $x_t$, for all $t \in \mathcal{T}\setminus\{0\}$.
• (A3) The recourse is fixed, i.e., the matrices $D_t$ are deterministic, and the matrices $D_t$ have full row rank.
• (A4) Zero is a valid lower bound for all post-decision value functions.
• (A5) The exogenous information is stagewise independent and its support is finite.
• (A6) The expected stagewise cost for the $(T+1)$-st stage is 0.

These assumptions are reasonably general in SP problems [1, 77, 80]. The first two assumptions (A1-A2) imply that the objective value is less than $+\infty$. (A2) ensures that the feasible set is always non-empty, and thus no feasibility cut will be added to the problem.
(A3) implies that the recourse matrix $D_t$ is fixed. Assumption (A4) is valid in many engineering problems, where the objective represents a cost, which is generally non-negative. Notice that if, in some situation, the lower bound of the value function is not 0, we can lift the objective function by adding a constant; this does not change the optimal decision, and we can recover the true optimal objective value by removing the constant after the optimization. With (A4), the optimal value is not negatively infinite, and thus the problem is dual feasible. The finite support assumption (A5) ensures that the set $\Omega_t$ is finite, and the stagewise independence assumption enables cut sharing in the algorithm. Finally, since we focus on finite-horizon problems, (A6) is required.

4.3 Background

4.3.1 SDDP Algorithm

In order to highlight our contributions, it is best to summarize the nature of the SDDP algorithm [76]. One iteration of the SDDP algorithm consists of a forward pass and a backward pass, summarized in Algorithm 3. The forward pass simulates the decision path along a selected sample path, while the backward pass updates the cost function approximation. The approximation approaches the SAA function as the number of iterations increases. This kind of forward-backward pass algorithm [77] is commonly used to tackle MSLP problems, and this cutting-plane method is able to accommodate the non-smoothness and stochasticity of the models.

Algorithm 3 Stochastic Dual Dynamic Programming Algorithm (one iteration)
Initialization: iteration number $k$, fixed sample set $\hat{\Omega}$, initial state $x_0$, sets of cuts $\mathcal{J}_t$.
Forward pass: simulate the state/decision path along a sample path:
1: Set $x_0^k = x_0$.
2: Generate a sample path $\omega_{[T]}^k$ from $\hat{\Omega}$.
3: for $t = 0,\dots,T$ do
4:   Solve the optimization problem $u_t^k = \arg\min_{u_t} \hat{f}_t^{k-1}(x_t^k, u_t)$ to obtain the decision.
5:   Find the next state based on the system dynamics: $x_{t+}^k = \mathcal{D}_t(x_t^k, u_t^k, \omega_t^k)$.
6: end for
Backward pass: update the pre-decision value function approximation:
7: for $t = T-1,\dots,0$ do
8:   for $\omega_t \in \hat{\Omega}_t$ do
9:     Find the state $x_{t+} = \mathcal{D}_t(x_t^k, u_t^k, \omega_t)$.
10:    Solve the subproblem with $x_{t+}$ as input: $\min_{u_{t+}} \hat{f}_{t+}^k(x_{t+}, u_{t+})$, and obtain the optimal dual solution $\pi_{t+}$.
11:    Compute the lower-bounding affine function $\ell_{t+}(x_{t+}) := \alpha_{t+} + \langle \beta_{t+}, x_{t+} \rangle$, where $\alpha_{t+} = \langle b_{t+}, \pi_{t+} \rangle$ and $\beta_{t+} = c_{t+} - \langle C_{t+}, \pi_{t+} \rangle$.
12:    Update the set of cuts: $\mathcal{J}_{t+} \leftarrow \mathcal{J}_{t+} \cup \{(\alpha_{t+}, \beta_{t+})\}$.
13:  end for
14:  Obtain the updated stage cost-to-go value function approximation $\hat{h}_{t+}^k(x_{t+}) = \max_{(\alpha_{t+},\beta_{t+}) \in \mathcal{J}_{t+}} \{\alpha_{t+} + \langle \beta_{t+}, x_{t+} \rangle\}$, which yields
\[
\hat{f}_t^k(x_t,u_t) = \langle c_t,x_t \rangle + \langle d_t,u_t \rangle + \sum_{\omega_t \in \hat{\Omega}_t} p(\omega_t)\, \hat{h}_{t+}^k(x_{t+}) \tag{4.8}
\]
15: end for

We describe one iteration of the optimization step in detail as follows. The forward pass starts from the root node and proceeds to the end of the horizon. The initial state $x_0$ is given at the root node. Then, we iterate over all time stages to identify the state and decision paths, $\{x_t\}_{t=0}^{T}$ and $\{u_t\}_{t=0}^{T}$, for the given sample path $\omega_{[T]}$. At each stage, we optimize the current function approximation to obtain the decision (Line 4 in Alg. 3) and apply the system dynamics to obtain the state (Line 5). The backward pass starts from the last stage and proceeds to the first, iteratively updating the pre-decision value function approximation. At each stage, we loop through all realizations of the random variable to generate new cuts (Lines 8-13 in Alg. 3). We collect all these new cuts and construct a better piecewise affine approximation of the pre-decision value function (Line 14 in Alg. 3).
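The cut generation in Lines 10-12 of Algorithm 3 amounts to a small linear-algebra computation once the dual solution of the stage subproblem is available. The following sketch, with names of our choosing, shows that computation and the piecewise-linear cost-to-go approximation of Line 14; it is an illustration of the formulas above, not a full SDDP implementation.

import numpy as np

def benders_cut(pi, b, C, c):
    """Sketch of the cut in Lines 10-12 of Algorithm 3.

    Given a dual solution pi of the stage-(t+) subproblem with right-hand side
    b - C x, the lower-bounding affine function is
        l(x) = alpha + <beta, x>,  with  alpha = <b, pi>  and  beta = c - C^T pi.
    """
    alpha = float(np.dot(b, pi))
    beta = np.asarray(c, float) - np.asarray(C, float).T @ np.asarray(pi, float)
    return alpha, beta

def cost_to_go(x, cuts):
    """Piecewise-linear outer approximation of Line 14: max over cuts of alpha + <beta, x>."""
    return max(alpha + float(np.dot(beta, x)) for alpha, beta in cuts)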
We collect all these new cuts and construct a better piecewise affine function for the pre-decision value function (Line 14 in Alg. 3). 4.3.1.1 Calculation of Bounds in SDDP As mentioned earlier, the MSLP problems are usually complicated and challenging. A standard procedure to solve an MSLP problem is first to choose a sample set ˆ W, and construct an SAA problem. Next, we apply an algorithm (e.g., SDDP) to solve the SAA problem. We keep solving the problem until a stopping rule is met. This procedure is also referred to as the SAA method (or single replication SAA method) for MSLP. The stopping rule often involves the calculation of the upper bound (UB) and lower bound (LB) estimates of the optimal objective value. Since the SDDP algorithm constructs an outer approximation of the SAA function, we can obtain a lower bound estimate with: ˆ v 0 = min u 0 2U 0 ˆ f 0 (x 0 ;u 0 ): This value is a valid LB for the SAA problem, which is one point estimate of the LB for the ‘true’ problem. In [82], the authors provide an approach with theoretical justification to obtain an LB with a confidence interval. They choose the minimum of the in-sample LB and in-sample UB as their LB, whose uncertainty is estimated by the uncertainty level of the in-sample UB. The SDDP algorithm does not provide an upper bound directly. However, the SDDP algorithm constructs an admissible policy by optimizing the algorithmic approximation function. We can obtain an unbiased estimate for the upper bound value by simulating it with Monte Carlo sampling with this admissible policy. We apply this policy to a sampled scenario from the first stage to the end of the horizon, which will provide one estimate of the objective. This estimate is the upper 84 Algorithm 4 Upper Bound Calculation Initialization: Fixed sample set ˆ W, initial state x 0 , and sample size N 0 . Forward pass: Simulate the state/decision paths along N 0 sample paths. 1: for n= 1;:::;N 0 do 2: Set x n 0 = x 0 . 3: Generate a sample pathw n [T] from ˆ W. 4: for t= 0;:::;T do 5: Solve the optimization problem u n t = argmin u t ˆ f t (x n t ;u t ) to obtain the decision. 6: Find the next state based on system dynamics x n t+ =D t (x n t ;u n t ;w n t ). 7: end for 8: Q n = T å t=0 [hc t ;x n t i+hd t ;u n t i] 9: end for Output: The upper bound sample mean: ¯ Q= 1 N 0 N 0 å n=1 Q n ; and the standard deviation of ¯ Q: 1 N 0 s N 0 å n=1 (Q n ¯ Q) 2 N 0 1 . bound of the optimal objective since it applies an implementable policy. Multiple scenarios are usually sampled to estimate the upper bound value accurately. The detailed calculation for the upper bound estimate is shown in Algorithm 4. Since the sample set is the same as the set to obtain the policy, this is also referred to as the in-sample upper bound. In Algorithm 4, if we replace the in-sample set ˆ W with the ‘true’ dataset, the output is the out-of-sample upper bound or an upper bound estimate for the ‘true’ problem. The commonly used stopping rule checks whether the lower and in-sample upper bounds are close enough. If their difference is less than the predefined tolerance, the algorithm terminates and outputs the policy. However, their difference only indicates the optimality of policy for the SAA problem, not the ‘true’ problem. In the following subsection, we will show that this in-sample stopping rule may not be reliable. 4.3.2 The Hydro-Thermal Scheduling Problem Formulation The hydro-thermal scheduling problem we consider is a variety of the one in the sddp.jl tutorial. 
4.3.2 The Hydro-Thermal Scheduling Problem Formulation

The hydro-thermal scheduling problem we consider is a variant of the one in the sddp.jl tutorial. The goal is to operate one thermal generator and $N$ hydro-dams in a valley chain over $\tau$ stages, in the face of rainfall uncertainty. The objective is to minimize the generation cost (including the costs of hydro and thermal generation) while the power demand is satisfied. The problem has a hazard-decision scheme, where, at each stage, the decision is made after observing a realization of the random data. At stage $t = 1,\dots,\tau$, we have $N$ state variables:

• $r_t^i$ : the volume in reservoir (hydro generator) $i = 1,\dots,N$, where the initial states $r_1^i$, $\forall i = 1,\dots,N$, are given.

The decision variables are:

• $g_t$ : the power generated by the thermal generator;
• $o_t^i$ : the water from reservoir $i = 1,\dots,N$ used for power generation;
• $s_t^i$ : the water spilling out of reservoir $i = 1,\dots,N$;
• $l_t^i$ : the water flowing into reservoir $i = 1,\dots,N$.

All the state/decision variables have the unit MWh. In general, water is measured in m$^3$; we measure it in the energy-equivalent MWh unit for simplicity. In practice, one can build a function that converts water in m$^3$ to its value in MWh. The system dynamics of our problem are as follows:
\[
\begin{aligned}
r_{t+}^1 &= r_t^1 + l_t^1 - o_t^1 - s_t^1 \\
r_{t+}^i &= r_t^i + l_t^i - o_t^i - s_t^i + o_t^{i-1} + s_t^{i-1}, \quad \forall i = 2,\dots,N
\end{aligned} \tag{4.9}
\]
The remaining data are:

• $\omega_t$ : rainfall, the stagewise independent random data;
• $c_t^0$ : cost per unit of power generated by the thermal generator;
• $c_t^i$ : cost per unit of power generated by reservoir $i = 1,\dots,N$;
• $a^i$ : the coefficient for reservoir $i = 1,\dots,N$ that transforms the rainfall $\omega_t$ into the inflow $l_t^i$;
• $d_t$ : the amount of demand;
• $b^i$ : the capacity of reservoir $i = 1,\dots,N$.

We have the following constraints:
\[
\sum_{i=1}^{N} o_t^i + g_t = d_t, \qquad l_t^i = a^i \omega_t, \quad \forall i = 1,\dots,N \tag{4.10}
\]
where the first states that the electricity demand is met, and the second is the relation between the rainfall and the inflow for each hydro generator. In addition, all the state and decision variables are non-negative, and the volume in each reservoir is at most its capacity. The stagewise cost is the cost of power generation, which is the sum of the thermal generator cost and the hydro generator costs:
\[
c_t^0 g_t + \sum_{i=1}^{N} c_t^i o_t^i \tag{4.11}
\]
To summarize, the value function at stage $t = 1,\dots,\tau$, with state $r_t$ and random data $\omega_t$, can be obtained by solving the following problem:
\[
\begin{aligned}
h_t(r_t,\omega_t) = \min_{g_t, o_t, s_t, l_t, r_{t+}} \;& c_t^0 g_t + \sum_{i=1}^{N} c_t^i o_t^i + E[h_{t+}(r_{t+}, \tilde{\omega}_{t+})] \\
\text{s.t. } & r_{t+}^1 = r_t^1 + l_t^1 - o_t^1 - s_t^1 \\
& r_{t+}^i = r_t^i + l_t^i - o_t^i - s_t^i + o_t^{i-1} + s_t^{i-1}, \quad \forall i = 2,\dots,N \\
& \sum_{i=1}^{N} o_t^i + g_t = d_t \\
& l_t^i = a^i \omega_t, \quad \forall i = 1,\dots,N \\
& g_t, s_t^i, o_t^i, l_t^i \geq 0, \quad \forall i = 1,\dots,N \\
& 0 \leq r_{t+}^i \leq b^i, \quad \forall i = 1,\dots,N
\end{aligned} \tag{4.12}
\]
In Section 4.4.2, the example considers a $\tau = 4$ stage problem with only $N = 1$ hydro generator. In Section 4.7, we have $N = 4$ hydro generators: Section 4.7.1 considers a $\tau = 10$ stage problem, while Section 4.7.2 considers problems with $\tau = 24, 48, 72, 96, 120$.
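To make the stage problem (4.12) concrete, the following sketch solves a myopic (single-stage) version of it with SciPy's LP routine, i.e., with the cost-to-go term $E[h_{t+}]$ omitted; in SDDP that term would be replaced by the current cut approximation. The data in the example are hypothetical, and the variable ordering and function name are our own choices.

import numpy as np
from scipy.optimize import linprog

def stage_dispatch(r, omega, a, demand, c0, c, cap):
    """Myopic single-stage version of problem (4.12) for N reservoirs in a valley chain.
    Variable order: [g, o_1..o_N, s_1..s_N, r'_1..r'_N], with inflows l_i = a_i * omega
    substituted directly into the water balance.
    """
    N = len(r)
    n = 1 + 3 * N
    g, o, s, rn = 0, lambda i: 1 + i, lambda i: 1 + N + i, lambda i: 1 + 2 * N + i

    cost = np.zeros(n)
    cost[g] = c0
    for i in range(N):
        cost[o(i)] = c[i]

    A_eq = np.zeros((1 + N, n))
    b_eq = np.zeros(1 + N)
    # Demand balance: g + sum_i o_i = d
    A_eq[0, g] = 1.0
    for i in range(N):
        A_eq[0, o(i)] = 1.0
    b_eq[0] = demand
    # Water balance (4.9): r'_i + o_i + s_i - o_{i-1} - s_{i-1} = r_i + a_i * omega
    for i in range(N):
        A_eq[1 + i, [o(i), s(i), rn(i)]] = 1.0
        if i > 0:
            A_eq[1 + i, [o(i - 1), s(i - 1)]] = -1.0
        b_eq[1 + i] = r[i] + a[i] * omega

    bounds = [(0, None)] * (1 + 2 * N) + [(0, cap[i]) for i in range(N)]
    res = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.fun, res.x

# Hypothetical data for N = 2 reservoirs
print(stage_dispatch(r=[50.0, 60.0], omega=20.0, a=[1.0, 0.8], demand=150.0,
                     c0=100.0, c=[5.0, 5.0], cap=[200.0, 200.0]))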
4.4 Compromise Policy: An Ensemble Method

In most MSLP problems, the number of random variables is enormous, and the size of the random space increases exponentially with the number of stages. As a result, it is intractable to optimize the 'true' value function and find the optimal policy. Moreover, in many real-world problems, we usually do not know the true distribution of all the random variables. Thus, the SAA method is used to approximate the expectation with an empirical estimate (the SAA function), which converges uniformly to the 'true' function [1]. Note that the computational demands also increase significantly with increasing sample sizes [75]. We will use distributed computation to overcome this increase in computational complexity, and this strategy will result in a multi-stage compromise policy. We first set up the concepts for solving MSLP problems, then construct the compromise policy and illustrate its effectiveness with an example. For this example, we solve the problem with multiple replications, where each replication is solved by the SDDP algorithm. The details of the SDDP algorithm and the calculation of bounds are summarized in Section 4.3.1.

4.4.1 Multi-replication Approach for the Compromise Function and Compromise Policy

Suppose we solve an MSLP problem with $M$ replications in parallel, where, in each replication, we generate a sample set independently and formulate the corresponding SAA problem. We apply an SDDP-type algorithm to each replication, which generates an approximation of the SAA function. The algorithm iteratively improves the approximation and terminates according to a predefined stopping criterion. We denote the final approximation function in the $m$-th replication by $\hat{f}_t^m(x_t,u_t)$. We define the average pre-decision value function across the replications as
\[
\hat{F}_t(x_t,u_t) := \frac{1}{M}\sum_{m=1}^{M} \hat{f}_t^m(x_t,u_t), \quad t \in \mathcal{T}\setminus\{T\}, \tag{4.13}
\]
which is referred to as the compromise function. The compromise policy is then defined as
\[
\hat{\mu}_t^c(x_t) := \arg\min_{u_t \in \mathcal{U}_t(x_t)} \hat{F}_t(x_t,u_t), \quad t \in \mathcal{T}\setminus\{T\}. \tag{4.14}
\]
This policy is defined for every non-terminal stage. At the terminal stage $T$, the problem is a linear programming (LP) problem, and we can apply an LP solver directly to find the decision $u_T$ for a given state $x_T$.
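A minimal sketch of the ensemble step in Eq. (4.13): if each replication's cost-to-go approximation is stored as a list of cuts $(\alpha,\beta)$, the compromise cost-to-go value at a state is simply the average of the replications' piecewise-linear maxima. In the full compromise function, each replication also retains its own sampled scenario set; only the averaging of the approximations is illustrated here, and the cut data in the example are hypothetical.

import numpy as np

def compromise_cost_to_go(x, cut_sets):
    """Average of the M replications' piecewise-linear cost-to-go approximations, where
    each replication m stores cuts (alpha, beta) with h_hat^m(x) = max { alpha + <beta, x> }."""
    M = len(cut_sets)
    return sum(max(alpha + float(np.dot(beta, x)) for alpha, beta in cuts)
               for cuts in cut_sets) / M

# Hypothetical one-dimensional state with two replications
cuts_rep1 = [(0.0, np.array([-1.0])), (5.0, np.array([-2.0])), (12.0, np.array([-3.0]))]
cuts_rep2 = [(1.0, np.array([-1.5])), (6.0, np.array([-2.5]))]
print(compromise_cost_to_go(np.array([3.0]), [cuts_rep1, cuts_rep2]))

Minimizing the compromise function over $u_t$ then amounts to solving a single stage LP whose epigraph variable is bounded below by the averaged collection of cuts, which is why the extra cost of the compromise policy is small once the replications are available.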
Thus, the compromise function ˆ F t (x t ;u t ) converges uniformly to ˇ F t (x t ;u t ). Suppose the average SAA function ˇ F t (x t ;u t ) consider M replications, where each replication considers N samples. Then, ˇ F t (x t ;u t ) is equivalent to an SAA function with M N samples. As shown in [1], under some regularity conditions, we have that ˇ F t (x t ;u t ) converges pointwise to the ‘true’ function f t (x t ;u t ) w.p.1 as M!¥ or N!¥. Under the assumptions (A1)-(A6), the ‘true’ function is finite valued and continuous. Together with the uniform convergence of the compromise function [80], we have that the compromise function ˆ F t (x t ;u t ) also converges uniformly to the ‘true’ function. 4.4.2 An Example: Hydro-Thermal Scheduling Let us now consider a four-stage hydro-thermal scheduling problem, which is an extension of the hydro-thermal problem in the sddp.jl tutorial [78]. Our objective is to operate one hydro generator and one thermal generator in order to meet the demand in the face of inflow uncertainty while the cost is minimized. The detailed formulation is provided in Section 4.3.2. This is a relatively simple MSLP problem, which has one state variable (volume of the hydro generator), three control vari- ables (thermal generation, hydro generation, and hydro spill), and one random variable (inflow) at each stage. We consider the case that the random variable has low variance. We choose 30 dif- ferent sample sets for the random variables independently and construct 30 SAA problems, where each one considers 10 8 possible scenario paths. We solve the problem with the SDDP algorithm for each replication, which terminates when the lower bound estimate is stable. Specifically, the 90 algorithm stops when the change of the lower bound estimate is less than 10 4 for 20 consecutive iterations. Fig. 4.1 shows the lower bound estimate and the 95% percent confidence interval of the in- sample upper bound. The upper bound is based on 2,000 in-sample scenario paths (N 0 = 2000), and calculated with Alg. 4 in Section 4.3.1. The first 30 policies are the final policies from the 30 replications, called individual policies. As we can see, the lower bound and the upper bound estimates in all replications are very close. However, the estimated optimal objectives in replications are very different from each other. In other words, individual policies’ estimated objectives have significant variance. If we only solve one replication, the estimated objective can be any of them. Thus, this estimate is unreliable, and the output policy might not work well for the ‘true’ problem. The last policy (i.e., the 31-st policy) is the compromise policy constructed based on Eq. (4.14), where the compromise function is an average of the algorithmic approximation function in the 30 replications. We calculate the upper bound and the compromise lower bound with variance (details will be discussed in the next section). As we can see in Fig. 4.1, its lower and upper bounds are very close. Figure 4.1: In-Sample Upper Bound CIs and Lower Bounds 1 6 11 16 21 26 Compromise P olicy P olicies 8 10 12 14 16 Estimated Objective LB In-Sample UB CI In-Sample Compromise UB CI Compromise LB CI Figure 4.2: Out-of-Sample Gap v.s. 
In-Sample Gap 0 5 10 15 20 25 Out-of -Sample Gap (%) 0 1 2 3 4 5 6 7 8 In-Sample Gap (%) Individual P olicies Compromise P olicy To evaluate the performance of these policies in the ‘true’ problem, we need to choose out-of- sample scenarios paths and calculate the confidence interval of the out-of-sample upper bounds. We calculate the out-of-sample upper bounds with 20,000 scenario paths based on Alg. 4. Fig. 4.2 91 compares the out-of-sample optimality gap and in-sample optimality gap of all the policies, where the optimality gap is calculated by optimality gap= jupper bound estimate - lower bound estimatej upper bound estimate : (4.16) As shown, although the in-sample optimality gaps in all replications are low, in some replica- tions, the out-of-sample gaps can be greater than 25%! The performances of the individual policies on the ‘true’ problem vary a lot. For one individual policy, the in-sample gap is less than 4%. How- ever, the out-of-sample gap is around 27%. Thus, the in-sample estimated objective is unreliable, and the output policy might not be optimal. Note that in this example, we already consider 10 8 possible scenarios, and the variance of the random variable is low. In real-world problems, the performance of one replication SDDP algorithm is even worse. We construct the compromise policy to reduce the variance and find a stable policy. In Fig. 4.2, we can see that the performance of the compromise policy is much more stable than the individual policies. The compromise policy has a lower variance estimate. Based on the compromise policy, we can also calculate a confidence interval of the lower bound estimate. We will show that this compromise lower bound is better than the mean of the lower bounds from replications. Combining the upper and lower bound CI, we can estimate the optimality gap of the compromise policy. 4.5 Variance and Bias Reduction In this section, we show that the compromise policy not only reduces the variance of the function approximation but also reduces the bias, which provides a tight lower bound for the optimal value. In general, the performance of the compromise policy is better than individual policies. To con- struct the compromise policy, we run each replication in parallel and entirely independently, and thus, the extra computation time is negligible. As a result, the compromise policy is a free lunch to identify a better decision rule. 92 4.5.1 Variance Reduction Suppose we solve an MSLP problem with M replications, where each replication has N samples in each stage. We first analyze the following case: in each replication, the algorithm (e.g., SDDP) solves the problem till optimal, where the algorithmic approximation function equals the SAA function, that is ˆ f t (x t ;u t ) ˇ f t (x t ;u t ) for all(x t ;u t ). We further define v 0 := min u 0 2U 0 f(x 0 ;u 0 ) as the optimal value of the true problem; ˆ v c 0 as the compromise lower bound, ˆ v c 0 := min u 0 2U 0 ˆ F 0 (x 0 ;u 0 ); (4.17) and ˆ v m 0 as the optimal value of the m-th replication, ˆ v m 0 := min u 0 2U 0 ˆ f m 0 (x 0 ;u 0 ); 8m2 J1;MK: (4.18) Theorem 4.1. Suppose Assumption (A1-A6) hold, and we solve an MSLP problem with M replica- tions, where each replication has N samples and is solved with the SDDP algorithm. Suppose the M individual policies from SDDP are the optimal policy for their corresponding SAA problems. i) At a given point (x t ;u t ), the variance of ˜ f t (x t ;u t ) is denoted by s 2 x t ;u t . 
Then, we have j ˆ F t (x t ;u t ) f t (x t ;u t )j is bounded by O p (s x t ;u t = p M) ii) For any stage t with state x t , we denote ˆ v m t ; ˆ v c t ;v t as the objective value from the individual policy, compromise policy and ‘true’ optimal policy, respectively. We denote ˆ s 2 t as the variance ofj ˆ v m t v t j. Then, we have that the compromise policy has reduced variance compared with the individual policies. Specifically,j ˆ v c t v t j is bounded by O p ( ˆ s t = p M). Proof. i) First note that due to the finite support assumption (A5), the dual vectors of the last stage form a finite set. Hence, the “true” value function for replication m (denoted f m T (x T ;u T )) is a piecewise linear convex function with finitely many pieces. Since the probability distribution in any replication is fixed, the approximation of the SAA functionf ˇ f m T (x T ;u T )g M m=1 must have finitely many pieces, and is therefore, piecewise affine. As a result, the average of finitely many 93 value functions, i.e., the average SAA function ˇ F T (x T ;u T ), is also a function with finitely many pieces. For any fixed (arbitrary) point(x T ;u T ), we have that ˇ F T (x T ;u T )= 1 M M å m=1 ˇ f m T (x T ;u T ): Because ˇ f m T (x T ;u T );8m2J1;MK are different realizations of ˜ f T (x T ;u T ), the variance of ˇ f m T (x T ;u T ) is approximately s 2 x T ;u T , where s 2 x T ;u T is the variance of ˜ f T (x T ;u T ). By applying the central limit theorem (CLT), we have, M 1=2 ( ˇ F T (x T ;u T ) f T (x T ;u T )) d !N (0;s 2 x T ;u T ); where the notation d ! means convergence in distribution. Since for m= 1;:::;M, the expecta- tion of ˇ f m T (x T ;u T ) equals f T (x T ;u T ), the expectation of ˇ F T (x T ;u T ) is also f T (x T ;u T ). We have thatj ˇ F T (x T ;u T ) f T (x T ;u T )j is bounded by O p (s x T ;u T = p M). Since ˆ F T (x T ;u T ) = ˇ F T (x T ;u T ), j ˆ F T (x T ;u T ) f T (x T ;u T )j is bounded by O p (s x T ;u T = p M). The above is the ground step of the backward induction, which uses only the finite dimensional CLT. Having established the property for the last stage (stage T ), we now move to stage T 1 which uses a value function with finitely many pieces. At stage T 1, because of the finite support assumption (A5) and the known probability distributions, the ‘true’ function and the SAA functionsf ˇ f m T1 (x T1 ;u T1 )g M m=1 have finitely many pieces. Hence, the LP approximation of the value functions used in stage T 1 results in a finite dimensional LP, implying that there are only finitely many dual extreme points in stage T 1. By applying the CLT, we have that, M 1=2 ( ˇ F T1 (x T1 ;u T1 ) f T1 (x T1 ;u T1 )) d !N (0;s 2 x T1 ;u T1 ): Because of the equality of ˇ F T1 (x T1 ;u T1 ) and ˆ F T1 (x T1 ;u T1 ), we have thatj ˆ F T1 (x T1 ;u T1 ) f T1 (x T1 ;u T1 )j is bounded by O p (s x T1 ;u T1 = p M). Proceeding in this manner, we conclude 94 that the value functions in all stages are piecewise linear functions. Moreover, in all stages, we apply the CLT to assert that,j ˆ F t (x t ;u t ) f T (x t ;u t )j is bounded by O p (s x t ;u t = p M). ii) Next, we show that the compromise policy has reduced variance. Recall that we formulate the compromise policy at the end of running SDDP. We have already shown that the compro- mise function at any particular point has reduced variance compared with the variance associated with individual value functions of any replication. 
That is,j ˆ F t (x t ;u t ) f T (x t ;u t )j is bounded by O p (s x t ;u t = p M), whilej ˆ f m t (x t ;u t ) f T (x t ;u t )j is bounded by O p (s x t ;u t ). At any stage, the compro- mise policy is averaged over finitely many replications. Sincef ˆ f m t (x t ;u t )g M m=1 consist of finitely many pieces, their average function ˆ F t (x t ;u t ) also consists of finitely many pieces. Moreover, with finite support assumption (A5), the number of paths going forward are finite. In the fol- lowing, we will prove the variance reduction of the compromise policy from the root node to the terminal node. At the root node, the initial stage x 0 is given. S 0 denotes the set of optimal decision. Y(u 0 ) is defined as a random variable following a normal distribution with mean 0, and variance s 2 x 0 ;u 0 , written Y(u 0 )N (0;s 2 x 0 ;u 0 ). With the assumptions (A1) and (A2), the objective value is finite valued. Together with assumption (A4), the expected value function has a finite value at any point. Since the value functions at all stages are piecewise linear functions, the variance is also finite. By applying the Theorem 5.7 in [1], we have ˆ v c 0 = min u 0 2S 0 ˆ F 0 (x 0 ;u 0 )+ O p ((N M) 1=2 ); M 1=2 ( ˆ v c 0 v 0 ) d ! min u 0 2S 0 Y(u 0 ): (4.19) If, moreover,S 0 =f ˆ u 0 g is a singleton, then M 1=2 ( ˆ v c 0 v 0 ) d !N (0;s 2 x 0 ; ˆ u 0 ): (4.20) 95 Similarly, applying the same result (Theorem 5.7 in [1]) to the individual policies, we have ˆ v m 0 = min u 0 2S 0 ˆ f m 0 (x 0 ;u 0 )+ O p (N 1=2 );8m2 J1;MK; ˆ v m 0 v 0 d ! min u 0 2S 0 Y(u 0 ): (4.21) WhenS 0 =f ˆ u 0 g is a singleton, ( ˆ v m 0 v 0 ) d !N (0;s 2 x 0 ; ˆ u 0 );8m2 J1;MK: (4.22) Thus, when we denote ˆ s 2 0 as the variance ofj ˆ v m 0 v 0 j, we have thatj ˆ v c 0 v 0 j is bounded by O p ( ˆ s 0 = p M). As a result, we have that the variance of the compromise policy at the root node, defined as ˆ m c 0 (x 0 ) := argmin u 0 2U 0 ˆ F 0 (x 0 ;u 0 )= argmin u 0 2U 0 M å m=1 ˆ f m 0 (x 0 ;u 0 ); is 1=M of the variance of the individual policy, defined as ˆ m m 0 (x 0 ) := argmin u 0 2U 0 ˆ f m 0 (x 0 ;u 0 ): We implement the compromise policy ˆ m c 0 (x 0 ), and apply the system dynamicsD t (x t ;u t ;w t ) to achieve the state x 1 at the second stage. Due to the finite support assumption and the fact that the value function has finite many pieces, we have only finite number of possible value for x 1 . Similarly, we apply the Theorem 5.7 in [1], which provides the same variance reduction property of the optimal values at the second stage. Moving forward recursively, we conclude that the variance of the compromise policy is 1=M of the variance of the individual policies. When we denote ˆ s 2 t as the variance ofj ˆ v m t v t j, we have thatj ˆ v c t v t j is bounded by O p ( ˆ s t = p M). In the M replications, the SAA function valuef ˇ f m t (x t ;u t )g M m=1 are different realizations of the random function ˜ f t (x t ;u t ). The realizations depend on their sample setf ˆ W m g M m=1 , where each 96 sample set contains N i.i.d. samples. At a fixed point(x t ;u t ), ˜ f t (x t ;u t ) follows a normal distribu- tionN (m x t ;u t ;s 2 x t ;u t ), where the mean value m x t ;u t equals the true function value f t (x t ;u t ) and the value of s 2 x t ;u t depends on the problem structure and the sample size. Note that the average SAA function ˇ F t (x t ;u t ) (defined in Eq. (4.15)) is an average of M realizations of ˜ f t (x t ;u t ), which is equivalent to an SAA function with MN samples. 
Therefore,j ˇ F t (x t ;u t ) f t (x t ;u t )j is bounded by O p (s x t ;u t = p M), while the individual function approximations is bounded byj ˇ f m t (x t ;u t ) f t (x t ;u t )j O p (s x t ;u t );m2 J1;MK. The compromise policy is obtained by optimizing the compromise function in Eq. (4.13). Since each ˆ f m t (x t ;u t ) is approximately equal to ˇ f m t (x t ;u t ), ˆ F t (x t ;u t ) is approximately equal to the average SAA function ˇ F t (x t ;u t ). Due to the finite support assumption (A5) and the known proba- bility distributions,f ˇ f m t (x t ;u t )g M m=1 and ˇ F t (x t ;u t ) have only finitely many pieces. By applying the functional CLT, we can show that the compromise policy has reduced variance. The algorithmic approximation function is generally less than or equal to the SAA function. In practice, these two functions may not be equal at all points. However, they are close in the neighborhood of the optimal solutions. In the SDDP algorithm, as the iteration number increases, the algorithmic approximation function keeps approaching the SAA function. As a result, the al- gorithm identifies the decision near the optimal one, further improving the approximation function near the optimal solution. Hence, ˆ F t (x t ;u t ) enjoys the same variance reduction property of ˇ F t (x t ;u t ) near the optimal solutions. In [83], the authors show the variance reduction property of the compromise decision in two- stage stochastic linear programming, which only requires the equality of the algorithmic approx- imation function and SAA function at their own optimal solution locally. Consider a given state ˆ x t . Let ˆ U =f ˆ u 1 t ;:::; ˆ u M t g denote the set of the optimal solutions in all the replications, where ˆ u m t = argmin u t ˆ f m t ( ˆ x t ;u t ). Define ¯ u t = 1 M å M m=1 ˆ u m t , and ˆ u c t = argmin u t ˆ F t ( ˆ x t ;u t ). The authors in [83] prove that when ˆ u c t and ¯ u t are equal, the estimate ˆ F t (x t ; ˆ u c t ) has reduced variance. Here, we extend this concept and show the variance reduction of ˆ F t (x t ; ˆ u c t ) if ˆ u c t is in a neighborhood of ¯ u t . 97 Theorem 4.2. Suppose Assumption (A1-A6) hold. Consider a given state ˆ x t . Suppose we solve the problem with M replications, where each replication has N samples and is optimized by SDDP . For m2J1;MK, suppose that ˆ f m t ( ˆ x t ; ˆ u m t ) is close to the true function such thatj f t ( ˆ x t ; ˆ u m t ) ˆ f m t ( ˆ x t ; ˆ u m t )j is no greater than O p (N 1=2 ). Then, there exists a neighbor of ¯ u t , denoted as NE( ¯ u t ). If ˆ u c t 2 NE( ¯ u t ), j f t ( ˆ x t ; ˆ u c t ) ˆ F t ( ˆ x t ; ˆ u c t )j O p ((N M) 1=2 ): Proof. Based on the assumptions, f t ( ˆ x t ;) is a convex Lipschitz continuous function. There exists a neighbor of ¯ u t that satisfies if u t 2 NE( ¯ u t ), f t ( ˆ x t ;u t ) 1 M M å m=1 f t ( ˆ x t ; ˆ u m t ): If ˆ u c t 2 NE( ¯ u t ), f t ( ˆ x t ; ˆ u c t ) 1 M M å m=1 f t ( ˆ x t ; ˆ u m t ) = 1 M M å m=1 ˆ f m t ( ˆ x t ; ˆ u m t )+ 1 M M å m=1 f t ( ˆ x t ; ˆ u m t ) ˆ f m t ( ˆ x t ; ˆ u m t ) 1 M M å m=1 ˆ f m t ( ˆ x t ; ˆ u c t )+ 1 M M å m=1 f t ( ˆ x t ; ˆ u m t ) ˆ f m t ( ˆ x t ; ˆ u m t ) = ˆ F t ( ˆ x t ; ˆ u c t )+ 1 M M å m=1 f t ( ˆ x t ; ˆ u m t ) ˆ f m t ( ˆ x t ; ˆ u m t ) (4.23) Sincej f t ( ˆ x t ; ˆ u m t ) ˆ f m t ( ˆ x t ; ˆ u m t )j is bounded by O p (N 1=2 ). We have 1 M M å m=1 j f t ( ˆ x t ; ˆ u m t ) ˆ f m t ( ˆ x t ; ˆ u m t )j is bounded by O p ((N M) 1=2 ). 
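The square-root-of-M rate in Theorems 4.1 and 4.2 is easy to check numerically. The following toy experiment (the values of mu, N, and M are illustrative only, not the hydro-thermal instance) estimates a fixed expected value with M independent SAA replications of N samples each, and compares the spread of an individual replication's estimate with the spread of the averaged estimate, which plays the role of the compromise value.

    using Statistics, Random

    Random.seed!(1)
    N, M, trials = 100, 30, 2_000
    mu = 1.0                                   # 'true' value being estimated

    individual_errs = Float64[]
    compromise_errs = Float64[]
    for _ in 1:trials
        reps = [mean(mu .+ randn(N)) for _ in 1:M]    # M independent SAA estimates
        push!(individual_errs, abs(reps[1] - mu))     # error of a single replication
        push!(compromise_errs, abs(mean(reps) - mu))  # error of the averaged estimate
    end
    println(mean(individual_errs) / mean(compromise_errs))   # close to √M ≈ 5.5

The printed ratio of average absolute errors is close to the square root of M, consistent with the O_p(σ/√M) bounds derived above.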
4.5.2 Bias Reduction Next, we show that the compromise policy has reduced bias as well. Many algorithms in MSLP construct an algorithmic approximation function for the corresponding SAA function. These algo- rithmic approximation functions are lower bounds of their related SAA functions. Moreover, the average of the optimal values of these SAA functions itself is negatively biased by the true optimal value [84, 85]. In Theorem 4.3, we show that the compromise lower bound reduces this negative bias. 98 Theorem 4.3. Suppose Assumption (A1-A6) hold. The initial state x 0 is given. ˆ v c 0 is defined in Eq. (4.17); ¯ v := 1 M å M m=1 ˆ v m 0 . Then, ˆ v c 0 and ¯ v are valid lower bound estimates for the optimal value of the true problem. And we have ¯ v ˆ v c 0 : Proof. Due to linear programming duality, we have ˆ f m 0 (x 0 ;u 0 ) ˇ f m t (x 0 ;u 0 ) for m2 J1;MK. Then, ˆ v m 0 = min u 0 ˆ f m 0 (x 0 ;u 0 ) min u 0 ˇ f m 0 (x 0 ;u 0 ): ˇ f m 0 (x 0 ;u 0 ) is one realization of the random function ˜ f 0 (x 0 ;u 0 ). Moreover, E[min u 0 ˜ f 0 (x 0 ;u 0 )] min u 0 E[ ˜ f 0 (x 0 ;u 0 )]= v 0 . ˆ v m 0 is a valid lower bound estimate of v 0 . Thus, ¯ v is a lower bound es- timate of the ‘true’ optimal value. Since ˆ F m 0 (x 0 ;u 0 ) ˇ F m 0 (x 0 ;u 0 ), we have ˆ v c 0 = min u 0 ˆ F m 0 (x 0 ;u 0 ) min u 0 ˇ F m 0 (x 0 ;u 0 ) v 0 , and thus, ˆ v c 0 is a valid lower bound estimate. In addition, ˆ v c 0 = min u 0 ˆ F 0 (x 0 ;u 0 )= min u 0 1 M M å m=1 ˆ f m 0 (x 0 ;u 0 ) 1 M M å m=1 min u 0 ˆ f m 0 (x 0 ;u 0 )= ¯ v: We provide some insights for Theorem 4.3. ¯ v is a lower bound estimate of the ‘true’ optimal value for two reasons: 1) the algorithm constructs a lower bound approximation of the SAA func- tion, i.e., ˆ f m 0 (x 0 ;u 0 ) ˇ f m 0 (x 0 ;u 0 ) for m2 J1;MK; 2) the mean of the optimal values of multiple SAA functions is negatively biased for the ‘true’ optimal value, i.e.,E[min u 0 ˜ f 0 (x 0 ;u 0 )] v 0 . In contrast, ˆ v c 0 is a valid lower bound estimate only because of the first reason. Our compromise pol- icy eliminates this negative bias, since we have min u 0 ˇ F 0 (x 0 ;u 0 ) min u 0 E[ ˜ f 0 (x 0 ;u 0 )]= v 0 . As a result, the compromise policy provides a tighter lower bound estimate. 99 For the given state x 0 , we have the decision ˆ u c 0 = ˆ m c 0 (x 0 ). At (x 0 ; ˆ u c 0 ), the function value ˆ F 0 (x 0 ; ˆ u c 0 ) is an average of M estimates. These estimates varies because of their corresponding sample sets, which provides an estimated variance for the compromise lower bound: ( ˆ s c ) 2 := 1 M ( ˆ f m 0 (x 0 ; ˆ u c 0 ) ˆ v c 0 ) 2 M 1 (4.24) This variance is similar to the parametric variance in the Markov decision process [86]. 4.6 A Meta-Algorithm for MSLP Algorithm 5 A Meta-Algorithm for MSLP Initialization: True datasetW, replication number M, validation sample size N 0 . Distributed Optimization: 1: for m= 1;:::;M replication (in parallel) do 2: Formulate the SAA problem with a sampled dataset ˆ W m . 3: while In-sample stopping rule is not satisfied do 4: Iteration count k k+ 1 5: Forward pass: Find the state/decision path along a sample path: (x m;k t ;u m;k t );t2 TnfTg. 6: Backward pass: Update value function approximation: ˆ f m;k t (x t ;u t );t2TnfTg. 7: end while 8: end for Aggregated Validation: 9: Formulate the average pre-decision value function ˆ F t (x t ;u t ) defined in Eq. (4.13). Construct the compromise policy ˆ m c t (x t ) with Eq. (4.14). 10: Calculate the confidence interval of the compromise lower bound according to Eq. (4.17) and Eq. 
(4.24). 11: Simulate the compromise policy with N 0 paths from the true datasetW. Calculate the out-of- sample upper bound value with Alg. 4. 12: if Out-of-sample stopping rule is not satisfied then 13: Choose larger sample setsf ˆ W m g M m=1 or enlarge the number of replication M. Go to Line 1. 14: else 15: Stop. Output the compromise policy ˆ m c t (x t ) for t2TnfTg, and the upper and lower bound confidence intervals. 16: end if 100 Combined with the compromise policy, we propose a meta-algorithm for the MSLP problems. Our meta-algorithm takes advantage of distributed/parallel computing, providing a more accurate function approximation within limited computational time. It can incorporate most of the value function-based algorithms for MSLP, such as the SDDP and SDLP algorithms. By formulating the compromise policy, the meta-algorithm reduces the variance and bias of the function approxima- tion and thus significantly improves the quality of our policies. Our meta-algorithm consists of two parts: distributed optimization and aggregated validation. We provide a diagram in Fig. 4.3 to illustrate the workflow. In the distributed optimization pro- cedure, we exploit distributed computing, where we construct multiple function approximations in parallel. Within each replication, we check the in-sample stopping rule, which tests whether each algorithmic approximation functionf ˆ f m (;)g M m=1 converges to the corresponding SAA function f ˇ f m (;)g M m=1 . In the aggregated validation procedure, we formulate a compromise policy that con- siders the function approximations in all replications. We check the optimality of the compromise policy by computing the out-of-sample upper bound and lower bound, which is referred to as the out-of-sample stopping rule. This step tests whether ˆ F(;) provides an optimal policy of the true problem. Our meta-algorithm sequentially enlarges the sample size or the number of replications to construct a more accurate function approximation, reducing the optimality gap to zero. The details are summarized in Alg. 5, where the forward/backward pass can be those in any value function-based algorithms, such as SDDP. In our meta-algorithm, we first choose a sample size and solve multiple SAA problems in parallel. We then formulate the compromise policy and check its optimality. If the out-of-sample optimality gap is less than the tolerance, we stop; otherwise, we choose larger sample sets and solve them again. 4.6.1 Online Dual Dynamic Programming In our meta-algorithm with SDDP, we need first to choose a sample set and solve the fixed SAA problem in each replication. The selection of the sample size is a balance between efficiency and accuracy. If the sample size is large, the SAA problem is more accurate while the computational 101 Figure 4.3: Diagram of the meta-algorithm SAA Function Algorithmic Approximation Function Compromise Function Processor 1 Processor 2 Processor M Processor m Distributed Optimization SDDP-type algo SDDP-type algo SDDP-type algo A veraging Output Compromise LB Evaluate Compromise Policy for UB 'T rue' Function Aggregated Validation Sampling cost is high. If the optimality gap is larger than the tolerance during the out-of-sample test, the SAA problems are not accurate enough. We have to choose larger sample sets and solve the new SAA problem again from scratch. For SDDP, it appears that the selection of sample size is a matter which depends on prior experience with an application. 
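Returning to the aggregated-validation step (Lines 9-12 of Alg. 5), the bound and gap arithmetic can be sketched as follows. The function below assumes it is given the M replication values f̂_m(x_0, û_0^c) evaluated at the compromise decision and the N' simulated out-of-sample policy costs from Alg. 4; the normalization of the gap's half-width by the upper bound mirrors Eq. (4.16) and is only an approximation.

    using Statistics

    # Inputs (assumed given):
    #   f_at_uc   : the M values f̂_m(x_0, û_0ᶜ), one per replication
    #   sim_costs : N′ simulated total costs of the compromise policy (Alg. 4)
    function aggregated_validation(f_at_uc::Vector{Float64}, sim_costs::Vector{Float64})
        M, Nprime = length(f_at_uc), length(sim_costs)
        lb      = mean(f_at_uc)                              # v̂₀ᶜ (Eq. 4.17)
        var_lb  = sum((f_at_uc .- lb) .^ 2) / (M * (M - 1))  # (σ̂ᶜ)² (Eq. 4.24)
        ub      = mean(sim_costs)
        var_ub  = var(sim_costs) / Nprime                    # variance of the UB mean
        gap     = abs(ub - lb) / ub                          # Eq. (4.16)
        halfwid = 1.96 * sqrt(var_lb + var_ub) / ub          # approx. 95% half-width
        return (lb = lb, ub = ub, gap = gap, gap_halfwidth = halfwid)
    end

The out-of-sample stopping rule of Alg. 5 can then be phrased as gap minus gap_halfwidth being nonpositive, that is, zero lies inside the 95% confidence interval of the optimality gap.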
In order to automatically find the sample size and avoid starting over, we propose a sequen- tial sampling algorithm for MSLP, referred to as the online dual dynamic programming (ODDP) algorithm. Our algorithm does not require an offline sample set before the optimization step. In- stead, we only need an oracle to generate the new samples. In other words, we collect the samples on the fly. This algorithm is the most suitable for the online or streaming-data setting. Unlike the SDLP algorithm, which solves quadratic programming problems in the forward pass, ODDP solves only linear problems. In addition, our ODDP algorithm can deal with the random vari- ables with evolving distributions. In the ODDP algorithm, the SAA function iteratively converges to the ‘true’ function, while the algorithmic approximation function iteratively converges to the SAA function. We exploit distributed computing and run each replication in parallel. A compro- mise policy is formulated for a more stable performance. We only need the out-of-sample stopping rule, which checks the optimality gap between the out-of-sample upper bound and the compromise lower bound. In ODDP, we pause the algorithm for every L iterations to check the out-of-sample 102 Algorithm 6 Online Dual Dynamic Programming Algorithm (One iteration) Initialization: Count k. Sample Oracle. Initial state x 0 . Set of cuts:J t . Forward pass: Simulate the the state/decision path along a sample path: 1: Set x k 0 = x 0 . 2: Generate a sample pathw k [T] from the oracle. 3: Update the observed sample set ˆ W t ˆ W t S fw k t g;8t2T , and empirical probability p(w t ) k1 k p(w t )+ 1 k I(w t =w k t );8w t 2 ˆ W t ;8t2T . 4: for t= 0;:::;T do 5: Solve the optimization problem u k t = argmin u t ˆ f k1 t (x k t ;u t ) to obtain the decision. 6: Find the next state based on system dynamics x k t+ =D t x k t ;u k t ;w k t . 7: end for Backward pass: Update pre-decision value function approximation: 8: for t= T 1;:::;0 do 9: forw t 2 ˆ W t do 10: Find the state x t+ =D t x k t ;u k t ;w t 11: Solve the subproblem with x t+ as input: min u t+ ˆ f k t+ (x t+ ;u t+ ); and obtain optimal dual solutionp t+ . 12: Compute lower bounding affine function l t+ (x t+ ) := a t+ +hb t+ ;x t+ i, where a t+ = hb t+ ;p t+ i;b t+ = c t+ hC t+ ;p t+ i. 13: Update the set of cuts as:J t+ J t+ S f(a t+ ;b t+ ;k)g 14: end for 15: Obtain the updated stage cost-to-go value function approximation using ˆ h k t+ (x t+ )= max (a t+ ;b t+ ; j)2J t+ f j k (a t+ +hb t+ ;x t+ i)g to obtain ˆ f k t (x t ;u t )=hc t ;x t i+hd t ;u t i+ å w t 2 ˆ W t p(w t ) ˆ h k t+ (x t+ ) (4.25) 16: end for stopping rule. If it is not satisfied, we resume the algorithm and generate more samples to up- date the function approximation instead of solving a new SAA problem from scratch. The details are shown in Alg. 6, where we highlight the difference with the SDDP algorithm. The extensive computational results show the effectiveness and efficiency of our algorithm. Because of the stagewise independent and finite support assumption (A5), we will ultimately observe all possible samples at any stage. In Line 3 of Alg. 6, the set ˆ W t will be fixed since we 103 observes all possible realizations. As a result, the loop over the observed sample set (Line 9-14) will be finite. If the random variables follow a static distribution, p(w t ) uniformly converges to the true distribution. Since more samples are observed, the previously generated cuts decrease, as shown in Line 15 of Alg. 6. 
This decrease of the cuts keeps them a valid outer approximation of the corresponding SAA function, following the same pattern as in stochastic decomposition [83]. Using an analysis that follows the arguments supporting [77], we can show that the algorithmic approximation function in ODDP converges to the actual function asymptotically.

4.6.2 Parallel Scheme

By exploiting distributed/parallel computing, we are able to accelerate the optimization process of our algorithm. In an SDDP-type algorithm, the following steps can be computed in parallel:
1. Multiprocessor parallelization: each processor handles an independent replication. The whole optimization process, from scenario data generation to value function updates, is conducted independently across replications. As a result, the function approximation and policy for each replication can be obtained in parallel.
2. Scenario path parallelization: in the forward pass, we can run multiple scenario path simulations in parallel, which provides multiple state paths and decision paths.
3. Cutting plane parallelization: during the backward pass, we compute a new cutting plane for each sample with respect to the new state and decision. The cutting plane for each sample can be found in parallel. A disadvantage of this parallelization is that we lose the warm starts when solving linear programs with the dual simplex algorithm.
Our computation focuses only on the first parallelization method: multiprocessor parallelization. Since each processor applies the optimization algorithm independently end-to-end, there is no communication cost among processors. When each processor pauses, it takes barely any time to construct the compromise policy by averaging the algorithmic function approximations in all replications. Our meta-algorithm can be extended to incorporate the other parallelizations, further reducing the computational time.

Modern programming languages have made parallel computing easy to implement. Take Julia, one of the more popular languages in the Operations Research community, as an example. To implement multiprocessor parallelization, we only need to modify a few lines of code. For multi-threaded computing, we simply add the macro Threads.@threads before the loop over replications:

    Threads.@threads for m = 1:M
        solve_replication(m)   # user-defined routine that solves the m-th replication
    end

For distributed computing, the Distributed.jl package can be easily applied. These minor changes significantly reduce the computational time.

The computational experiments were carried out on a workstation with two Intel(R) Xeon(R) Platinum 8270 CPU @2.70GHz processors and 64.0 GB RAM. Each processor has 26 cores and 52 logical processors. The computations exploit the state-of-the-art implementation of the SDDP solver, the sddp.jl package [78]. We use Julia 1.7.1 with Gurobi 9.5 as the LP solver. In Julia, we set the maximum number of threads to 104.

4.7 Further Computational Results

This section conducts further experiments with our meta-algorithm and compares it with the one-replication SDDP algorithm. We consider the hydro-valley thermal scheduling problem with four state variables in each stage. The objective is to operate one thermal generator and four hydro-dams in a valley chain over $\tau$ stages, considering rainfall uncertainty. The aim is to minimize the generation cost (including hydro generation and thermal generation) while the power demand is met. The details of the formulation are in Section 4.3.2.
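For concreteness, the valley-chain instance just described can be written with the sddp.jl package [78] roughly as follows. This is a minimal sketch under assumed data: the capacities, costs, demand, initial volumes, and rainfall support below are placeholders rather than the values used in our experiments, and any LP solver can be substituted for Gurobi. The calls follow the package's documented interface for state variables, stagewise-independent noise, training, and simulation, and should be checked against the sddp.jl documentation.

    using SDDP, Gurobi, Statistics

    N, T = 4, 10                          # reservoirs and stages (Section 4.7.1)
    b  = fill(200.0, N)                   # capacities bⁱ (placeholder values)
    a  = fill(1.0, N)                     # rainfall-to-inflow coefficients aⁱ
    c0 = 50.0                             # thermal cost c⁰ₜ
    c  = fill(5.0, N)                     # hydro costs cⁱₜ
    d  = 150.0                            # demand dₜ
    Ω  = [0.0, 50.0, 100.0]               # rainfall support (stagewise independent)

    model = SDDP.LinearPolicyGraph(
        stages = T, sense = :Min, lower_bound = 0.0,
        optimizer = Gurobi.Optimizer,
    ) do sp, t
        # States: reservoir volumes rⁱₜ with given initial levels.
        @variable(sp, 0 <= r[i = 1:N] <= b[i], SDDP.State, initial_value = 100.0)
        # Controls: thermal gₜ, outflow oⁱₜ, spill sⁱₜ, inflow lⁱₜ.
        @variable(sp, g >= 0)
        @variable(sp, o[1:N] >= 0)
        @variable(sp, s[1:N] >= 0)
        @variable(sp, l[1:N])
        # System dynamics (Eq. 4.9): upstream releases feed the next reservoir.
        @constraint(sp, r[1].out == r[1].in + l[1] - o[1] - s[1])
        @constraint(sp, [i = 2:N],
            r[i].out == r[i].in + l[i] - o[i] - s[i] + o[i-1] + s[i-1])
        # Demand balance (Eq. 4.10).
        @constraint(sp, sum(o) + g == d)
        # Rainfall realization fixes the inflows lⁱₜ = aⁱ wₜ.
        SDDP.parameterize(sp, Ω) do w
            for i in 1:N
                JuMP.fix(l[i], a[i] * w)
            end
        end
        # Stagewise cost (Eq. 4.11).
        @stageobjective(sp, c0 * g + sum(c[i] * o[i] for i in 1:N))
    end

    SDDP.train(model, iteration_limit = 100)      # one replication
    sims  = SDDP.simulate(model, 2_000)           # out-of-sample scenario paths
    costs = [sum(st[:stage_objective] for st in path) for path in sims]
    println((mean(costs), 1.96 * std(costs) / sqrt(length(costs))))

The last two lines correspond to the out-of-sample upper bound estimate and its 95% half-width used throughout the experiments below (Alg. 4).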
105 4.7.1 Results forMeta+ODDP This subsection focuses on the performance of the meta-algorithm with ODDP. We show that our meta-algorithm with ODDP iteratively increases the lower bound estimate and decreases the upper bound estimate. As a result, it provides a tighter optimality gap as the number of iterations increases. The sequential sampling property enables the algorithm to pause and resume, where we pause the algorithm for an optimality test and resume it (as necessary) for continuously updating function approximations. The algorithm automatically finds the number of samples required based on the specified tolerance. In addition, we demonstrate the power of the compromise policy by noting that the compromise policy outperforms any individual policy in all iterations. Consider a ten-stage hydro-valley thermal scheduling problem with four continuous state vari- ables and 13 continuous decision variables in each stage. We solve this problem with ODDP under the meta-algorithm framework, setting L= 100. We run M= 30 replications in parallel. Every L inner iterations, we pause the algorithm and check the out-of-sample stopping rule. If the stopping rule is not satisfied, we resume the algorithm, run L more inner iterations, and continue. The pro- cess to run L inner iteration and check the out-of-sample stopping rule for the compromise policy composes one outer iteration. Figure 4.4: Optimality Gap v.s. Outer Iteration; Out-of-Sample Upper Bound and Lower Bound v.s. Outer Iteration 1 2 3 4 5 6 7 Outer Iteration 0.2 0.4 0.6 0.8 1.0 1.2 1.4 Out-of -Sample Optimality Gap (%) Gap 1 2 3 4 5 6 7 Outer Iteration 46100 46200 46300 46400 46500 46600 46700 46800 Estimated Objective Out-of -Sample Upper Bound Compromise Lower Bound Fig. 4.4 shows that the upper and lower bound estimates get close as the iteration number in- creases, where their optimality gap reduces to zero asymptotically. The error bars of the optimality 106 gap and upper/lower bounds represent their 95% confidence intervals. In the right figure, the red line is the out-of-sample upper bound estimate based on 20,000 scenario paths. We calculate the mean and variance based on Alg. 4. The blue line is the compromise lower bound, whose value and variance are calculated according to Eq. (4.17) and Eq. (4.24), respectively. In the left figure, the optimality gap is calculated based on Eq. (4.16). Since the upper and lower bound estimates are normal distribution and independent, the variance of the optimality gap is the sum of the upper and lower bounds’ variance. As the outer iteration increases, the compromise function approaches the ‘true’ value function, which enhances the performance of the compromise policy. As a result, the upper bound decreases, and the lower bound increases. As shown, the optimality gap decreases quickly and is close to zero. When the algorithm stops, the optimality gap is less than 0.3%. 
Figure 4.5: Out-of-Samples upper bound and lower bound of the individual policies and the com- promise policy in different outer iteration policies 46000 47000 48000 49000 50000 51000 Estimated Objective Outer Iteration No.1 policies Estimated Objective Outer Iteration No.3 LB Out-of -Sample Individual UB CIs Out-of -Sample Compromise UB CIs Compromise LB CIs 1 6 11 16 21 26 Compromise P olicy policies 46000 47000 48000 49000 50000 51000 Estimated Objective Outer Iteration No.5 1 6 11 16 21 26 Compromise P olicy policies Estimated Objective Outer Iteration No.7 LB Out-of -Sample Individual UB CIs Out-of -Sample Compromise UB CIs Compromise LB CIs Fig. 4.5 compares the upper and lower bounds of all the individual policies, as well as the compromise policy. It shows the performance in the 1/3/5/7 outer iterations. As we can see, the 107 optimality gap of each individual policy decreases as the algorithm runs, while the compromise policy always performs well. As the iteration number increases, more samples are observed in all replications. As a result, the upper bound estimates keep decreasing while the lower bound estimates keep increasing. This is illustrated by the descent of the red lines and the rising of the black squares in Fig. 4.5. This figure also shows the stability of the compromise policy. If we run only one replication, the output policy can be any of the first 30 individual policies, whose optimality gap can be enormous. In contrast, the compromise policy always has an insignificant optimality gap. Figure 4.6: Cumulative distribution function of individual policy evaluation in different outer iter- ation Estimated Objective 0.0 0.2 0.4 0.6 0.8 1.0 Cumulative Frequency Outer Iteration No. 1 Estimated Objective 0.0 0.2 0.4 0.6 0.8 1.0 Cumulative Frequency Outer Iteration No. 3 CDF Mean Compromise Policy Obj. 47000 48000 49000 50000 51000 Estimated Objective 0.0 0.2 0.4 0.6 0.8 1.0 Cumulative Frequency Outer Iteration No. 5 47000 48000 49000 50000 51000 Estimated Objective 0.0 0.2 0.4 0.6 0.8 1.0 Cumulative Frequency Outer Iteration No. 7 CDF Mean Compromise Policy Obj. We also compare all the policies concerning their performance for the ‘true’ problem. We find that the compromise policy outperforms all the individual policies in every outer iteration. This property demonstrates the consistency of the compromise policy’s advantage. In Fig. 4.6, 108 we show the estimated objective of all the policies based on the simulation of 20,000 out-of- sample Monte Carlo scenario paths in different iterations. The blue line chart with stars illustrates the cumulative frequency of individual policies’ performance. The blue vertical line is the mean value corresponding to the objective values of individual policies, which can be thought of as the expected policy performance if we only run one replication. The red vertical line is the objective value of the compromise policy. As we can see, the expected individual policy performance is improved as more samples are observed in the ODDP algorithm. Meanwhile, the compromise policy always has the lowest estimated objective compared with all individual policies. In general, the compromise policy outperforms the individual policies, which illustrates the bias reduction property of the compromise policy. 4.7.2 Comparison among Algorithms With more computations, we compare the performance of the SDDP algorithm, our meta-algorithm with SDDP (labeled as meta+SDDP), and our meta-algorithm with ODDP (labeled as meta+ODDP). 
We consider five hydro-valley thermal scheduling problems whose total stages (t) are 24, 48, 72, 96, and 120, respectively. They all consider four hydro generators and one thermal generator. In the SDDP algorithm, we choose ten samples in each stage. Thus, the total number of possi- ble scenarios is 10 t . We calculate the in-sample optimality gap according to Eq. (4.16), with the lower bound estimate and the in-sample upper bound confidence intervals (based on 1,000 scenario paths). The algorithm stops when zero is within 95% confidence intervals (CIs) of the in-sample optimality gap. This stopping rule is commonly used in the SDDP-type algorithms [78, 87]. For the meta+SDDP algorithm, we run M= 30 replications in parallel, where each replication has 10 samples per stage. The in-sample stopping rule is the same as the SDDP algorithm. We formulate the compromise policy based on the function approximations in all replications. The out-of-sample optimality gap is calculated with the compromise lower bound CIs and the out- of-sample upper bound CIs (based on 2,000 out-of-sample scenario paths). The out-of-sample stopping rule is satisfied if zero is within 95% CIs of the out-of-sample optimality gap or the time 109 of distributed optimization is greater than 1,500 seconds. If the out-of-sample stopping rule is not satisfied, we add ten more samples per stage in each replication and solve the new SAA problems. For the meta+ODDP algorithm, we compute M= 30 replications in parallel. Every L= 500 inner iterations, we pause the optimization procedure, formulate the compromise policy and check the out-of-sample optimality test, which is the same one as in meta+SDDP. If the optimality test is not satisfied, we resume the distributed optimization and refine the function approximations. For all the algorithms, we summarize the upper bound estimate (based on 2,000 out-of-sample scenario paths), lower bound estimate, and out-of-sample optimality gap with 95% CIs in Table 4.1, which are visualized in Fig. 4.7. The distributed optimization time (or training time) in sec- onds is also included. As we can see, the performance of the one-replication SDDP algorithm is unreliable. Although 0 is within 95% of its in-sample optimality gap, its out-of-sample optimality gap is significantly greater than 0. In the problem witht = 96 stages, its out-of-sample optimality gap is greater than 40%! Thus, we should implement a multiple-replication algorithm to obtain a better policy for the ‘true’ problem. As shown, in all instances, the meta+SDDP algorithm has the best performance regarding the upper bound estimate, as well as the optimality gap. This demonstrates the advantage of the compromise policy, which not only provides a tighter lower bound estimate (with confidence intervals) but also constructs a better policy. Our meta-algorithm considerably improves the performance of SDDP, while the extra computational effort is not sig- nificant. The training time for meta+SDDP approximately equals the longest training time out of all the replications. As for the meta+ODDP algorithm, the optimality gaps in all instances are less than 5%. The sequential sampling property makes the meta+SDDP algorithm suitable for online settings and evolving environments. Notice that when we try to get an accurate evaluation for the policy with Algorithm 4, the time for evaluation is significant. For instance, for the problem with t = 24, the evaluation time for the policy from meta+SDDP with N 0 = 2;000 scenario paths is 96.5 seconds. 
With our meta-algorithm, we only need to evaluate one policy: the compromise policy. However, in the commonly used multi-replication SAA method [1], when researchers solve M replications, they have to evaluate all the individual policies from the replications and then choose the best one as the final policy. As a result, their evaluation time can be M times as much as that in our method. Moreover, we have already shown that the compromise policy performs better than the best individual policy.

Table 4.1: Comparison among different algorithms: the upper/lower bounds and optimality gap (with 95% confidence intervals).

Algorithm  | Property  | τ = 24             | τ = 48              | τ = 72              | τ = 96               | τ = 120
SDDP       | UB Q̄      | 16954.46 (±220.29) | 38429.00 (±805.07)  | 50599.06 (±589.62)  | 107103.85 (±3114.61) | 90657.98 (±1734.68)
           | LB v̂₀     | 15961.66           | 31393.26            | 47175.33            | 62234.40             | 81681.85
           | Gap (%)   | 5.86 (±1.30)       | 18.31 (±2.09)       | 6.77 (±1.17)        | 41.89 (±2.91)        | 9.90 (±1.91)
           | time (s)  | 33                 | 20.1                | 39.1                | 30.4                 | 99.5
meta+SDDP  | UB Q̄      | 15965.32 (±7.19)   | 31971.32 (±25.43)   | 48010.64 (±12.90)   | 64039.92 (±15.18)    | 80114.97 (±17.38)
           | LB v̂₀ᶜ    | 15911.25 (±86.95)  | 31864.24 (±204.38)  | 47964.18 (±274.61)  | 64042.43 (±282.84)   | 79926.52 (±521.84)
           | Gap (%)   | 0.34 (±0.55)       | 0.33 (±0.64)        | 0.10 (±0.57)        | 0.00 (±0.44)         | 0.24 (±0.65)
           | time (s)  | 41.4               | 35.8                | 52.9                | 99                   | 230
meta+ODDP  | UB Q̄      | 15949.12 (±24.45)  | 32147.86 (±135.12)  | 48257.73 (±169.21)  | 64514.94 (±384.31)   | 80927.77 (±493.61)
           | LB v̂₀ᶜ    | 15741.94 (±11.57)  | 31367.00 (±32.14)   | 46843.21 (±68.22)   | 62237.11 (±71.97)    | 77555.76 (±114.79)
           | Gap (%)   | 1.30 (±0.17)       | 2.43 (±0.43)        | 2.93 (±0.38)        | 3.53 (±0.61)         | 4.17 (±0.63)
           | time (s)  | 1632               | 3028                | 4528                | 7219                 | 8855

Notice that, in our hydro-valley thermal scheduling problem, the meta+ODDP algorithm is much more computationally intensive than the others. It stops because the training time exceeds the limit in all instances. Since meta+ODDP considers more samples per stage, the computational time is much longer. At the 500th inner iteration, each replication may apply up to 500 samples to construct one cut during the backward pass. On the other hand, the meta+SDDP algorithm exploits ten samples to generate one cut. In hindsight, since the optimality gap in meta+SDDP is small enough, we may conclude that ten samples per stage with 30 replications are sufficiently accurate to capture the random structure of this problem. However, for examples with a more complicated stochastic process, such as the SSN problems in [83], thousands of samples per stage are needed to obtain an accurate approximation. In such cases, a larger sample set in meta+SDDP is beneficial.

4.8 Conclusions

We propose an ensemble model for MSLP problems, the compromise policy, which is an (almost) free lunch for building a more stable policy. The compromise policy exploits distributed/parallel computing and builds a more accurate function approximation with little extra computational time. We show that the compromise policy not only reduces the variance of the function approximation but also reduces its bias. It provides a tighter lower bound estimate (with a confidence interval) for the optimal objective value. The compromise policy can further incorporate variance-reduced sampling strategies, such as Latin Hypercube Sampling (LHS) and Randomized Quasi-Monte Carlo (RQMC), to generate the scenario tree and reduce the variance of the function approximation even further. Building on the compromise policy, we provide a general meta-algorithm based on in-sample and out-of-sample stopping rules for MSLP problems.
Our meta-algorithm can incorporate SDDP-type algorithms, improving their performance and stability. We also introduce an online dual dynamic programming algorithm that iteratively improves the algorithmic approximation and the SAA function with sequential sampling. The ODDP algorithm automatically decides the sample size of the SAA function and is capable of pausing/resuming during the optimality test. With extensive computations, we demonstrate the advantage of our algorithms, even for instances with over 100 time periods.

Figure 4.7: Comparison among different algorithms: the upper/lower bounds and optimality gap (with 95% confidence intervals). (One panel per horizon τ = 24, 48, 72, 96, 120; each panel reports the optimality gap and the upper/lower bound estimates for SDDP, meta+SDDP, and meta+ODDP.)

Chapter 5
A Cyber-Infrastructure for Computational Stochastic Programming

5.1 Introduction

This chapter introduces a new Cyber-Infrastructure (CI) for Operations Research (OR) researchers, especially those working with Stochastic Programming. The goals of this project are:
• Promote reproducibility of computational results, and enhance validation and comparison of OR models and algorithms.
• Provide a pedagogical tool for training students.
• Promote reusability by sharing data, code, methods, and evaluations across the OR community.
A common thread that binds these goals together is that they are enabled only if all of the data associated with an OR solution (code, input data, intermediate products) are discoverable and broadly accessible in a form that can be understood and reused, with enough descriptive information to understand what the data represent and how the Analytics process can be reproduced. Here the data include the mathematical model (which, in the case of optimization, is provided via special-purpose codes such as AMPL, ARENA, etc.), the parameters specified for instantiating mathematical objects, and the workflow associated with the process. The outputs associated with such an experiment lead to insights, and ultimately decisions. We refer to the entire workflow as the Analytics process; it usually involves multiple paradigms, including data science, decision science, simulation, visualization, and so on. cORe is intended to provide a platform for sharing OR solutions as specified above.
It has become broadly accepted in the scientific data sharing community that reproducibility can be enhanced by ensuring certain principles. These are the so-called FAIR principles [88]: the data must be Findable, Accessible, Interoperable, and Reusable.
Specifically in the context of OR solutions, we require that the “input” is findable when it is identified by a unique identifier and characterized by rich metadata that describe the details of the data and Analytics workflow; acces- sible via standard protocol with access control and its metadata accessible even when the data is not; interoperable by using standardized terms to describe it; and reusable by providing accurate and relevant attributes. The cORe CI is a first step in this direction. The widespread availability of data and software have now made it possible to build a CI which will allow the OR community to share OR solutions, and to enable the field to become more engaged with the scientific approach of experimentation and validation. In addition to research ad- vances, such a CI is likely to support new applications, especially those which face a similar terrain of data and decisions. For example, many cities have begun to implement advanced OR tools for bike sharing, transportation planning, crime abatement etc. These applications may be viewed un- der the broad umbrella of Analytics for Smart Local Communities (ASLC). OR support of ASLC calls for a multi-faceted infrastructure which promotes an exchange of ideas through modeling and algorithmic services which integrate data resources, with appropriate models and algorithms. For example, notions such as modeling demand-response in power grids, traffic-responsive auto- mobile routing, safety-conscious bicycle routing, and many others require an integrated system of data management, predictive and prescriptive analytics, as well as validation and visualization tools. The cORe cyber-infrastructure is designed to support such end-to-end ASLC. From a so- cietal viewpoint, communities have a lot to gain by sharing their Analytics resources, especially since they often face similar circumstances, and often have access to similar data. 117 To be sure, there have been several previous forays into facilitating computational OR. Cur- rent software efforts within the OR community include COIN-OR (COmputational INfrastructure for Operations Research) for developing mathematical optimization software [89], MIPLIB-2010 (http://miplib.zib.de), a library of deterministic test-instances for mixed-integer programming,SIPLIB [90] and more recently, a simulation-optimization portal known as http://simopt.org [91, 92]. These resources are important, but they do not serve the role of a CI. To the best of our knowledge, the most widely used CI within the OR community has been the NEOS project, which was launched almost twenty five years ago (as a joint effort between Argonne National Labs, and Northwestern University). That project, which continues now with support from several universities (e.g., Wis- consin, ASU, and others), has been the mainstay of CI for optimization. While the services NEOS provides continues to be valuable for the optimization community, it is time to create a new CI which will incorporate many lessons of the past two decades. The high-level goals mentioned at the outset included reproducibility and reusability. In ad- dition, we impose the following requirements which are specific to the set of services we wish to provide. Unlike previous efforts, the focus on ASLC transcends any one genre of modeling tools, algorithms and software. Since an Analytics project typically requires coalescing data and deci- sion sciences, the resulting workflow typically involves statistical analysis, as well as optimization algorithms. 
With these requirements in mind, we have designed the cORe platform to achieve the following functionality: • Support complex multi-model learning, optimization, simulation, and other tools which have become routine in Analytics projects. Such workflow involves heterogeneous data, models, algorithms and OR resources. To be effective, cORe must serve as a repository for OR resources. • Enable rapid implementation so that the work required for computational results can be re- duced by leveraging previously reported experiments. The cORe repository of computational experiments will create an atmosphere in which alternative experimental setups will not be too onerous. 118 • Expedite the development of prototypes, each with its particular choice of steps in the work- flow. Since an Analytics project may involve multiple prototypes, the overall task may be quite daunting. However, as users begin to share tools, ASLC projects will become less daunting. We refer to this approach to Analytics as a “crowd-sourced” CI solution, which is an extension of the spirit of the package “R” for statistics. • By creating a search-able ecosystem of OR resources, we hope to reduce the time/effort required to carry out computational experiments. This ecosystem will also support citations via digital object identifiers (doi). This approach allows users to be very specific about the instance(s) being used in their test and moreover, it avoids proliferation of instances associated with the same name. For instance, the SSN (Sonet Switched Network) instance in the stochastic programming (SP) literature has at least two versions with the same name; one of them hasO(10 70 ) scenarios, whereas a sampled version with 5000 scenarios is also named SSN. While the solution of the second may be a good approximation of the first, the two instances are different, and theirdoi should be different. We view any Analytics application as a project network, also known as Critical Path Method (CPM) [93]. Such networks are said to be activity-on-arc networks where each step of the (Analyt- ics) application represents a computational service provided by a specific software. In most cases, each step/activity will represent the execution of one type of software (e.g., say a regression using “R”). Because each step requires input data, these must be specified as well. Finally as in CPM, each task can only start after all predecessor tasks have been completed. Unlike CPM however, we will not require activity durations in the current network representation, although in the future this aspect may be included when we support time-sensitive applications. The class of applica- tions supported by the conceptual design of cORe is one that can be represented by a directed acyclic graph. This structure is relatively easily implemented as a sequence of steps in which all precedence must be obeyed. Continuing the analogy with CPM, the project input description will require three specific de- tails for any particular step (activity): Name of the Specific Step (representing the Activity), Input 119 File Name(s)/Location(s) (representing the predecessor step), and the Output Name(s)/Location(s) (representing the successor step). 
Before proceeding to the details of our platform design, we should also clarify a few additional aspirational features: a) allow users to annotate and specify workflow (or pipelines); b) allow users to upload papers associated with experiments, so long as no copyrights are in place; c) provide the capability to search the system for similar efforts which might have been carried out in the past. Thus a list of keywords are associated with each contribution. 5.2 System Design and Related Work The CI for cORe will leverage a system known as DERIV A (Discovery Environment for Relational Information and Versioned Assets) [94] which is currently used to support both small and large scientific special interest groups in the Biosciences community. The DERIV A system is a model- driven web-based interface for data discovery. Because of its roots in the experimental sciences, DERIV A is designed to support collaboration through the full life-cycle of scientific data, which is intended to streamline the storage and management of digital assets such as pictures, including experimental data from instruments (e.g., microscopes, sequencers and flow cytometers). Because of the its focus on mathematical modeling, and algorithm development, the workflow for OR solutions may seem to be significantly different from the needs of life sciences research. However, the steps followed during a Biosciences experiment may involve a variety of instruments, all of which must be available for anyone else to replicate those experiments. In the case of OR, each step in the experiment requires its own “instruments” which happen to be software/algorithms which operate on certain operating systems. So long as the workflow of the OR experiment speci- fies these “instruments” in a specific manner, others conducting the same experiment are responsi- ble for assembling the entire protocol for the OR experiment as well. In this sense, the experimental process is similar. 120 The difference between the needs of the OR community, and those of the Biosciences commu- nity is that experiments of the latter produce data which are platform-independent. Nevertheless, the protocols they follow must be specific, and anyone trying to replicate those experiments must have access to specific instruments specified by the protocol. Similarly, OR researchers will be able to download the open-source code and data, and reproduce the experiment provided they have all applicable licences installed on their platform. In order to see how the two sets of steps require the same kind of support from DERIV A, we present a somewhat realistic example of running, the Electricity Dispatch Model for the state of Illinois [95]. This experiment requires multiple steps in the workflow to represent a simulation model of wind energy production, as well as a sequential implementation of the electricity dis- patch in ten minute intervals. Thus the simulated dispatch reflects plan for the next six ten-minute intervals to estimate the simulated hourly cost. Once the model, and algorithm have been devel- oped, one begins the integration process of introducing these procedures into the larger experiment which needs to be instantiated with data for the experiment. 
This is accomplished through the services of DERIVA, which include: a) a loosely coupled web services architecture with well-defined public interfaces for every component; b) use of Entity Relationship (ER) models that leverage standardized vocabularies, with adaptive components which can automatically respond to evolving ER models; c) model-driven user interfaces to enable navigation and discovery as the data model evolves; and d) data-oriented protocols in which distributed components coordinate complex activities via data state changes. DERIVA uses the ER data model to catalog and organize assets, which are digital objects (i.e., files). Assets are characterized by contextualizing metadata, which places an asset into the model by relating it to a specific entity. Additional descriptive metadata are used to describe further attributes and relationships between assets.

The components of the cORe ecosystem are shown in Figure 5.1. The cORe platform includes the acquisition, management, analysis, and sharing of data, models, and decisions, and it provides the following capabilities.

Figure 5.1: cORe: computational operations research exchange platform design.

• Characterization and acquisition of datasets, including input data for statistical learning and analysis, outputs from learning/optimization models, and results from validation analysis.

• Organization of data-driven decisions. The cORe system provides users with interactive ways of connecting and importing data-analysis packages/functions (such as R packages), optimization solvers, and validation/visualization tools.

• Storage of data assets in distributed, cloud-based data storage systems.

• Sharing and exchange of model and data collections. The platform involves management of the IP associated with data assets, which is intended to protect proprietary data and software.

5.3 Computations

One of the major conveniences today is the widespread availability of local data, ranging from weather and traffic to crime, hospitals, health care, and more. Many of these data resources are available freely on the web, while others need some level of authentication, and still others call for subscription to a service.
In the cORe platform, each experiment is classified by its scientific thrust (see Figure 5.2): Data Science, Decision Science, and Data-2-Decision. This classification corresponds to the class of activities undertaken in an experiment. In the first category, the experiment focuses on the data science aspect; in the second, the focus is on decisions/optimization; and the third refers to the fusion of data and decision sciences.

Figure 5.2: Homepage of the cORe platform.

The cORe repository classifies datasets into three categories by level of realism: pedagogical, advanced, and challenge. As the names indicate, the first category is used for educational purposes, and the other two reflect the degree of realism associated with the experiment, with advanced instances being less demanding than challenge instances. To illustrate “cORe at work,” we return to the three main goals summarized at the outset, namely, Reproducibility, Pedagogy, and Reusability.

• Reproducibility of computational results, and comparisons among algorithms: we provide an example from http://simopt.org and compare results from three alternative stochastic optimization algorithms for a multi-dimensional newsvendor problem.

• Pedagogy. Many universities, including USC, offer a course which integrates data and decision sciences (or prescriptive and predictive Analytics). cORe supports this mission by providing a repository of examples. The one included here is a simple combination of data science (matrix completion) and decision science (mixed-integer programming) for meal planning, referred to as the MnM problem.

• Reusability: As mentioned earlier, publications often require that editors be able to reuse OR solutions reported in a paper. We present such an example, which allows other users to run the same workflow as described in a paper (and provided via cORe). This example, known as LEO-Wyndor, is a more advanced combination of data science (linear regression) and decision science (stochastic linear programming), together with data science for model validation.

5.3.1 Reproducibility of Computational Comparisons of Algorithms: Multi-Dimensional Newsvendor

In this section we borrow a simulation optimization instance from http://simopt.org, the Multi-Dimensional Newsvendor (MDNV) Problem [96]. This instance appeared in [97], although [98] discussed a parametric version earlier. In this instance, a firm uses $q$ different resources to manufacture $p$ products. The resource vector $x \in \mathbb{R}^q_+$ needs to be determined before the demand vector $\omega \in \mathbb{R}^p_+$ is observed. After the demand is realized, a production vector $y \in \mathbb{R}^p_+$ is selected to maximize the profit of the firm. For a given resource $i$ and product $j$, let $c_i$ be the unit investment cost for resource $i$, $v_j$ be the unit profit margin for product $j$, and $a_{ij}$ be the amount of resource $i$ required to produce one unit of product $j$. While the original instance was stated as a maximization problem, we state it below as a minimization problem:

$$\min \; f(x) = c^\top x + \mathbb{E}[g(x, \omega)] \quad \text{s.t.} \quad 0 \le x \le u \tag{5.1}$$

where

$$g(x, \omega) = \min \; -v^\top y \quad \text{s.t.} \quad A y \le x \ \text{(capacity constraints)}, \quad 0 \le y \le \omega \ \text{(demand constraints)}. \tag{5.2}$$

The demand vector $\omega$ has $p$ i.i.d. components (assumed to be non-negative), each having a Burr Type XII distribution with cumulative distribution function

$$F(w_j) = 1 - \left(1 + w_j^{a}\right)^{-b}, \quad j \in \{1, \dots, p\},$$

with parameters $a = 2$ and $b = 20$. By applying duality theory to equation (5.2), we have $g(x, \omega) \ge g(x^k, \omega) + \lambda(x^k, \omega)^\top (x - x^k)$, where $x^k$ is the $k$-th iterate. Therefore, for a given vector $\omega$, the dual vector $\lambda(\cdot, \omega)$ is a subgradient of $g(\cdot, \omega)$. Several stochastic subgradient-based algorithms have been proposed for this instance by [97]. In the cORe platform, we compare the performance of Stochastic Approximation (SA), Robust Stochastic Approximation (RobustSA) [99], and Stochastic Decomposition (SD) [100]. The data for equations (5.1)-(5.2) are as follows:

$$u = 5, \quad c^\top = (5 \;\; 5), \quad v^\top = (9 \;\; 8 \;\; 7), \quad A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 2 \end{pmatrix}.$$

All algorithms were coded in the C language, and the experiments were conducted on a MacBook Pro with a 2.7 GHz Intel Core i7 processor. We run 10 replications each for SA and RobustSA, with initial point $(0.1, 0.1)$. For SA, a diminishing step size $a/k$ is used, where $a = 0.1$ and $k$ is the iteration number. The algorithm stops either at the maximum number of iterations (10,000) or when the following condition is satisfied:

$$a_k \left| \frac{1}{N_2} \sum_{i=1}^{N_2} \lambda(x^k, \omega^i) \right| < 0.0001,$$

where $N_2 = 10$. For RobustSA, the settings are as follows: the total number of iterations is 10,000, $N_2 = 10$, and the step size is a constant equal to 0.0001. For the SD algorithm, we run 10 replications, and at the end of those runs SD produces a compromise decision based on all the replications [100]. Thus, there is only one solution and confidence interval for SD. The time reported for SD reflects all 10 replications as well as the time to obtain the compromise decision.
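To make the experimental setup above more tangible, here is a minimal Python sketch of one SA iteration for this instance: it samples Burr XII demands by inversion, solves the second-stage LP (5.2), and uses the capacity-constraint duals as the stochastic subgradient. The original experiments were coded in C; this sketch also assumes that SciPy's HiGHS-based `linprog` exposes constraint marginals via `res.ineqlin`, which depends on the library version.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)

# Instance data from (5.1)-(5.2)
c = np.array([5.0, 5.0])            # resource investment costs
v = np.array([9.0, 8.0, 7.0])       # product profit margins
A = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 2.0]])     # resource usage per unit of product
u, a_burr, b_burr = 5.0, 2.0, 20.0

def sample_demand(n):
    """Burr XII demands by inverse-CDF sampling: F(w) = 1 - (1 + w^a)^(-b)."""
    U = rng.random((n, 3))
    return ((1.0 - U) ** (-1.0 / b_burr) - 1.0) ** (1.0 / a_burr)

def second_stage(x, w):
    """Solve (5.2): min -v^T y s.t. Ay <= x, 0 <= y <= w; return value and a subgradient in x."""
    res = linprog(-v, A_ub=A, b_ub=x, bounds=list(zip(np.zeros(3), w)), method="highs")
    # Marginals of the capacity rows Ay <= x give a subgradient of g(., w) at x
    # (assumes res.ineqlin.marginals is available from the HiGHS interface).
    return res.fun, np.asarray(res.ineqlin.marginals)

def sa_step(x, k, a=0.1, n2=10):
    """One stochastic-approximation step with diminishing step size a/k."""
    subgrads = np.array([second_stage(x, w)[1] for w in sample_demand(n2)])
    g_hat = c + subgrads.mean(axis=0)            # sampled subgradient of f at x
    return np.clip(x - (a / k) * g_hat, 0.0, u)  # project back onto 0 <= x <= u

x = np.array([0.1, 0.1])
for k in range(1, 201):
    x = sa_step(x, k)
print("approximate first-stage decision:", x)
```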
The results are reported in Table 5.1. In the table, Rep is the replication number, Sol. is the first-stage solution, CI is the 95% confidence interval for the objective function value, and Time is the CPU time in seconds.

Table 5.1: SA vs. RobustSA vs. SD on the MDNV problem.

        SA                                              RobustSA
Rep     Sol.               CI                   Time    Sol.               CI                   Time
1       [0.1735, 0.2293]   [-0.8479, -0.8395]    8.83   [0.1746, 0.2270]   [-0.8479, -0.8395]   19.25
2       [0.1717, 0.2286]   [-0.8481, -0.8396]    8.87   [0.1735, 0.2290]   [-0.8479, -0.8394]   19.17
3       [0.1742, 0.2293]   [-0.8479, -0.8395]    8.41   [0.1727, 0.2273]   [-0.8480, -0.8396]   19.34
4       [0.1733, 0.2289]   [-0.8480, -0.8395]    7.43   [0.1734, 0.2296]   [-0.8479, -0.8394]   19.29
5       [0.1723, 0.2276]   [-0.8481, -0.8396]    9.38   [0.1714, 0.2290]   [-0.8482, -0.8397]   18.99
6       [0.1730, 0.2283]   [-0.8479, -0.8395]    8.64   [0.1730, 0.2289]   [-0.8480, -0.8395]   19.37
7       [0.1725, 0.2284]   [-0.8481, -0.8396]    6.48   [0.1715, 0.2285]   [-0.8482, -0.8398]   20.19
8       [0.1742, 0.2284]   [-0.8479, -0.8394]    8.77   [0.1752, 0.2304]   [-0.8475, -0.8391]   19.31
9       [0.1726, 0.2290]   [-0.8479, -0.8394]    8.36   [0.1705, 0.2275]   [-0.8481, -0.8396]   20.04
10      [0.1730, 0.2302]   [-0.8479, -0.8394]    7.99   [0.1734, 0.2283]   [-0.8480, -0.8395]   19.64

SD (10 Reps, single compromise decision): Sol. [0.1698, 0.2279], CI [-0.8479, -0.8394], Time 12.86

5.3.2 Reusability of a Research Example: LEO-Wyndor

Finally, we present an example in which new research, known as Learning Enabled Optimization (LEO), is introduced via a pedagogical problem known as Wyndor, borrowed from [101]. Our example, the LEO-Wyndor problem [102], extends the original Wyndor model to one in which the production plan is made while bearing in mind that the allocation of advertising effort (time slots on TV or radio) affects sales of the different types of products. The original Wyndor model used product demand as input, and the production plan was intended to maximize profit under production constraints, under the assumption that the company could sell any quantity of products produced; in other words, production capacities were the main constraints. For this example, a statistical model, a Multivariate Linear Regression (MLR) model, is used to predict future sales (W) given the advertising slots (Z) on TV and radio. In this example, the advertising decisions constitute a bet placed via the first-stage (advertising) decisions x, and the second-stage decisions are the production planning choices, given firm orders (sales).

The whole process for this model includes three phases: statistical learning, stochastic programming, and validation, which are listed on the LEO-Wyndor instance page of the cORe platform. The workflow for this instance is shown in Figure 5.3. Multiple models have been implemented for comparison, including deterministic forecasts (DF), normally distributed uncorrelated/correlated (NDU/NDC) demands, and empirical additive errors (EAE) models. A specific numerical instance of this model can be found on the cORe platform (https://core.isrd.isi.edu).

Figure 5.3: Workflow of LEO-Wyndor in the cORe platform.

5.4 Conclusions

This chapter has presented a new cyber-infrastructure for the OR community. It is intended to help researchers share OR solutions, including data, models, codes, and experiments, which can then be reproduced without as much programming as is necessary today.
With the cORe platform, we can significantly enhance the reproducibility of research in computational stochastic programming.

Chapter 6
Conclusions and Future Work

6.1 Summary of the Research

This thesis focuses on the theory and applications of computational stochastic programming. On the theory side, I propose several variance reduction methods for stochastic combinatorial optimization, together with a stopping rule that provides an upper bound on the probability of early termination. In addition, I investigate a multi-replication algorithm for multi-stage stochastic linear programming and improve the performance and stability of the resulting policy; a meta-algorithm with a variance-reduced policy is proposed. On the application side, I build stochastic models for resource allocation problems during the COVID-19 pandemic, whose performance is much better than that of the deterministic models. I apply the variance reduction methods to logistics and power system problems, where they show a significant advantage in computational time and solution stability. Moreover, we build a cyber-infrastructure to enhance the reproducibility of computational stochastic programming research; this platform accelerates the exchange of research in the operations research domain.

Practical Stochastic Linear Programming Applications: I build one deterministic model and two stochastic models (simple recourse and general recourse models) to formulate the problem of health-care resource allocation across the country. Extensive computational experiments demonstrate the advantage of the stochastic linear programming models. Under specific settings, the general recourse model's decision can reduce the unmet demand for ventilators by up to 95.3% compared with the "no-coordination" model.

Stochastic Combinatorial Optimization: I propose two variance reduction methods for solving very large-scale stochastic combinatorial optimization problems. The first is based on "bagging" from machine learning and is an extension of the "compromise decision" in stochastic linear programming. The second method is based on the computing budget allocation technique from simulation optimization. I formulate the post-processing procedure as a budget allocation problem, in which the final decision is evaluated with lower variance. I show that, with little computational effort, we can find a decision with a strong theoretical guarantee.

Multi-stage Stochastic Linear Programming: I propose the compromise policy, a policy with both variance reduction and bias reduction. The compromise policy can be integrated with SDDP-type algorithms. By exploiting parallel/distributed computing, the compromise policy achieves a tremendous advantage in terms of performance and stability with little extra computational time. In order to take advantage of the compromise policy, I propose a meta-algorithm for MSLP with value-function-based algorithms. An online dual dynamic programming algorithm is used to show the progressive improvement of the value function approximation. Extensive computations show the efficiency and effectiveness of the meta-algorithm.

Cyber-Infrastructure for Stochastic Programming: We build a cyber-infrastructure for computational stochastic programming, named computational Operations Research exchange (cORe). This platform enhances the reproducibility of computational research in operations research areas, especially stochastic programming.
It not only promotes reusability by sharing data, code, methods, and evaluations, but also provides a pedagogical tool for training students.

6.2 Future Research Directions

Stochastic programming has shown its effectiveness in many practical applications, from resource allocation to network problems. Its theory nevertheless requires further investigation, which could provide new insight into modeling and algorithm design. Building on my current research, the following directions are promising for future work.

6.2.1 Multi-stage Dynamic System for Deep Learning

Recent years have witnessed tremendous advances in deep artificial neural networks, which have attracted the attention of a considerable number of researchers and practitioners [103–105]. They play a significant role in many areas, ranging from image classification [106] and natural language processing [107] to data mining and game playing [108].

From one perspective, deep NN architectures can be regarded as a kind of hierarchical representation: as NNs become deeper, the data representations become more abstract. From another perspective, deep NNs are complex, nonlinear, multi-stage dynamic systems, in which the input data are transformed layer by layer into the target space [109, 110]. The commonly used gradient descent algorithm computes only first-order derivatives and ignores higher-order derivative information; it is blind to the curvature of the loss function. When gradient descent is applied to a loss function with ill-conditioned curvature, the search directions may oscillate across iterations, and it becomes difficult to track progress, which makes gradient descent unsuitable for such problems. By contrast, second-order optimization methods take advantage of curvature information and scale the search direction accordingly. This property makes second-order methods scale-invariant and yields a steadier decrease in the loss function, which is well suited to NN problems. Explorations of optimization methods beyond first order include the natural gradient method [111], KFAC [112], and L-BFGS [113, 114]. Implementing second-order optimization methods, e.g., Newton's method, requires the inverse of the Hessian matrix. Most current research on second-order methods treats all weight parameters equally and optimizes them simultaneously, ignoring the hierarchical structure of NNs.

We note that backpropagation with gradient descent in the weight-parameter space can be viewed as an iterative algorithm that alternates between the chain rule and Dynamic Programming (DP), in which gradients are computed and updated layer by layer. These layer-wise updates exploit the hierarchical structure of NNs and reduce the time complexity. From this perspective, it is natural to implement Quasi-Newton optimization methods within the DP framework. In addition, we observe a connection between the backpropagation algorithm and the first-order Differential Dynamic Programming algorithm [115]. Thus, applying algorithms for dynamic systems (e.g., Quasi-Newton Differential Dynamic Programming [116]) to deep learning would be promising; early explorations include the Differential Dynamic Programming Neural Optimizer [117]. We can show that the optimal policy in MSLP is a piecewise affine policy, and applying the differential dynamic programming algorithm to deep learning shows that the optimal policy is an affine policy, in which the feedback term, ignored by first-order algorithms, deserves more attention.
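To illustrate the observation that backpropagation computes gradients stage by stage, in the spirit of a backward DP recursion, here is a minimal NumPy sketch for a tiny fully connected network. The architecture, loss, and step size are illustrative assumptions, not taken from the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny two-layer network: x -> W1 -> tanh -> W2 -> prediction
W1, W2 = rng.standard_normal((4, 3)), rng.standard_normal((1, 4))
x, target = rng.standard_normal((3, 1)), np.array([[1.0]])

# Forward pass: each layer plays the role of one "stage" of a dynamic system.
z1 = W1 @ x          # stage 1 pre-activation
a1 = np.tanh(z1)     # stage 1 state
y = W2 @ a1          # stage 2 output
loss = 0.5 * float((y - target).item() ** 2)

# Backward pass: a backward recursion over stages, analogous to DP.
# delta_k is the sensitivity of the loss with respect to the stage-k pre-activation.
delta2 = y - target                      # d loss / d y
grad_W2 = delta2 @ a1.T                  # gradient for the last stage's parameters
delta1 = (W2.T @ delta2) * (1 - a1**2)   # propagate the sensitivity one stage back (chain rule)
grad_W1 = delta1 @ x.T                   # gradient for the first stage's parameters

# One gradient-descent update, performed layer by layer.
lr = 0.1
W2 -= lr * grad_W2
W1 -= lr * grad_W1
print("loss before update:", loss)
```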
6.2.2 Machine Learning Model for the Policy in MSLP

In the SDLP algorithm [77], the authors consider the basic feasible policy (BFP), a piecewise affine policy that converges to optimality. It provides a proximal point for the quadratic optimization problem at any stage. Both the basic feasible policy and the value function approximation perform better as the iteration count increases. The disadvantage of the BFP is that it may take a long time in the inference stage. One future direction is to build a machine learning model (online learning) to learn the BFP. Since the BFP converges to optimality asymptotically, we can show that the online learning model also converges to optimality asymptotically. In addition, the online learning model can find the proximal point in a shorter time.

We find the following connections between the SDLP algorithm (in MSLP) and the value/policy iteration algorithms (in reinforcement learning). SDLP can be regarded as a value iteration algorithm, which is commonly used in reinforcement learning and approximate dynamic programming: we iteratively update the value function approximation and find the optimal policy by minimizing the pre-decision value function for any given state. However, unlike the Empirical Dynamic Programming (EDP) algorithm [118], which uses parametric basis functions, or deep reinforcement learning [119], which employs a deep neural network, we approximate the value function with piecewise affine functions. In the EDP or DQN methods, the choice of basis functions or the design of the network structure is critical for the accuracy of the function approximation. Basically, we need data consisting of pairs $\{(x_n, z_n)\}_n$, where $x_n$ is the state and $z_n$ is the corresponding function value. These data can be generated by efficient algorithms, such as the temporal difference (TD) algorithm. We use the first elements $\{x_n\}$ as the features and the second as the response, so a regression task results. The fitted regression model is then used to predict the value function during the inference step when given a new state.
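As a toy illustration of this regression-based alternative (not the method used in SDLP), the following sketch fits a least-squares regression with quadratic features to sampled (state, value) pairs and then predicts the value at a new state. The data-generating function and feature map are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Sampled (state, value) pairs {(x_n, z_n)}: here the "true" value function is an
# illustrative convex function of a 2-dimensional state, observed with noise.
X = rng.uniform(0.0, 4.0, size=(200, 2))
z = 3.0 * X[:, 0] ** 2 + (X[:, 1] - 1.0) ** 2 + rng.normal(0.0, 0.1, size=200)

def features(X):
    """Quadratic feature map: [1, x1, x2, x1^2, x2^2, x1*x2]."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1**2, x2**2, x1 * x2])

# Least-squares regression of the sampled values on the features.
beta, *_ = np.linalg.lstsq(features(X), z, rcond=None)

# Inference: predict the approximate value function at a new state.
x_new = np.array([[2.0, 0.5]])
print("predicted value at x_new:", (features(x_new) @ beta).item())
```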
On the contrary, our algorithm uses a different method: it approximates the value function with cutting planes (or minorants). This method follows a strategy of "learning from one's mistakes." We assign trial values to some state variables by simulating a scenario path and acquire the best decision consistent with those states. In the process, we obtain information about the relationship between other states and decisions (through the duality of the linear programming problem). By exploiting this information, we gain more knowledge about the decisions for any state, which helps us reduce the number of decisions we must enumerate to find an optimal solution.

Another comparable algorithm in reinforcement learning is the policy iteration algorithm, which consists of two steps, policy evaluation and policy improvement, run iteratively until the policy converges to optimality. In our algorithm, we implement a heuristic policy, specifically the basic feasible policy, to find the proximal point. Each observed basis provides a policy, and the BFP chooses the best one. In each iteration, we evaluate the decisions associated with all the observed bases and choose the one with the lowest objective value; that is, we evaluate our BFP based on the value function approximation, which is similar to policy evaluation in the policy iteration algorithm. During the backward pass, we solve the stagewise dual problem, which may provide a new basis. Adding the new basis to the observed basis set improves our heuristic policy, which completes the policy improvement step. Similar to the policy iteration algorithm, our algorithm implements the policy evaluation and policy improvement steps iteratively until the BFP converges to optimality.

By exploiting the similarities between MSLP and reinforcement learning, we can adapt new ideas and propose efficient algorithms for problems in MSLP.

References

1. Shapiro, A., Dentcheva, D. & Ruszczyński, A. Lectures on stochastic programming: modeling and theory (SIAM, 2009).
2. Rockafellar, R. T., Uryasev, S., et al. Optimization of conditional value-at-risk. Journal of Risk 2, 21–42 (2000).
3. Ntaimo, L. & Sen, S. The million-variable "march" for stochastic combinatorial optimization. Journal of Global Optimization 32, 385–400 (2005).
4. Powell, W. B. & Topaloglu, H. in Applications of Stochastic Programming 185–215 (SIAM, 2005).
5. Tomasgard, A. & Høeg, E. in Applications of Stochastic Programming 253–276 (SIAM, 2005).
6. Bakker, H., Dunke, F. & Nickel, S. A structuring review on multi-stage optimization under uncertainty: Aligning concepts from theory and practice. Omega 96, 102080 (2020).
7. Powell, W. B. A unified framework for stochastic optimization. European Journal of Operational Research 275, 795–821 (2019).
8. Sen, S. & Higle, J. L. An introductory tutorial on stochastic linear programming models. Interfaces 29, 33–61 (1999).
9. Sen, S., Yu, L. & Genc, T. A stochastic programming approach to power portfolio optimization. Operations Research 54, 55–72 (2006).
10. Shi, J. & Oren, S. S. Wind power integration through stochastic unit commitment with topology control recourse in 2016 Power Systems Computation Conference (PSCC) (2016), 1–7.
11. Xie, S., Li, X. & Ouyang, Y. Decomposition of general facility disruption correlations via augmentation of virtual supporting stations. Transportation Research Part B: Methodological 80, 64–81 (2015).
12. Werner, A., Uggen, K. T., Fodstad, M., Lium, A.-G. & Egging, R. Stochastic mixed-integer programming for integrated portfolio planning in the LNG supply chain. The Energy Journal 35 (2014).
13. Romeijnders, W., Stougie, L. & van der Vlerk, M. H. Approximation in two-stage stochastic integer programming. Surveys in Operations Research and Management Science 19, 17–33 (2014).
14. Arora, S. & Barak, B. Computational complexity: a modern approach (Cambridge University Press, 2009).
15. Carino, D. R., Myers, D. H. & Ziemba, W. T. Concepts, technical issues, and uses of the Russell-Yasuda Kasai financial planning model. Operations Research 46, 450–462 (1998).
16. Golari, M., Fan, N. & Jin, T. Multistage stochastic optimization for production-inventory planning with intermittent renewable energy. Production and Operations Management 26, 409–425 (2017).
17. Shiina, T. & Birge, J. R. Multistage stochastic programming model for electric power capacity expansion problem. Japan Journal of Industrial and Applied Mathematics 20, 379–397 (2003).
18. Gangammanavar, H. & Sen, S. Two-scale stochastic optimization for controlling distributed storage devices.
IEEE Transactions on Smart Grid 9, 2691–2702 (2016).
19. Bertsekas, D. Dynamic programming and optimal control: Volume I (2012).
20. Powell, W. B. Approximate Dynamic Programming: Solving the curses of dimensionality (2007).
21. Shapiro, A. & Philpott, A. A tutorial on stochastic programming. Manuscript. Available at www2.isye.gatech.edu/ashapiro/publications.html 17 (2007).
22. Kleywegt, A. J., Shapiro, A. & Homem-de-Mello, T. The sample average approximation method for stochastic discrete optimization. SIAM Journal on Optimization 12, 479–502 (2002).
23. Deng, Y. & Sen, S. Predictive stochastic programming. Computational Management Science, 1–34 (2021).
24. Nelson, B. L., Swann, J., Goldsman, D. & Song, W. Simple procedures for selecting the best simulated system when the number of alternatives is large. Operations Research 49, 950–963 (2001).
25. Santner, T. J. & Goldsman, D. M. Design and analysis of experiments for statistical selection, screening, and multiple comparisons (Wiley, 1995).
26. Bardossy, M. G. & Raghavan, S. An inexact sample average approximation approach for the stochastic connected facility location problem. Networks 70, 19–33 (2017).
27. Higle, J. L. & Sen, S. Stochastic decomposition: An algorithm for two-stage linear programs with recourse. Mathematics of Operations Research 16, 650–669 (1991).
28. Higle, J. L. & Sen, S. Finite master programs in regularized stochastic decomposition. Mathematical Programming 67, 143–168 (1994).
29. Higle, J. L. & Sen, S. Stochastic decomposition: a statistical method for large scale stochastic linear programming (Springer Science & Business Media, 1996).
30. Mak, W.-K., Morton, D. P. & Wood, R. K. Monte Carlo bounding techniques for determining solution quality in stochastic programs. Operations Research Letters 24, 47–56 (1999).
31. Sen, S. & Liu, Y. Mitigating uncertainty via compromise decisions in two-stage stochastic linear programming: Variance reduction. Operations Research 64, 1422–1437 (2016).
32. Liu, J. & Sen, S. Asymptotic results of stochastic decomposition for two-stage stochastic quadratic programming. SIAM Journal on Optimization 30, 823–852 (2020).
33. CSSE, J. COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University. https://github.com/CSSEGISandData/COVID-19 (accessed Nov. 2, 2020) (2020).
34. Team, I. C.-19 F. Modeling COVID-19 scenarios for the United States. Nature Medicine (2020).
35. Friedman, J., Liu, P., Gakidou, E., COVID, I. & Team, M. C. Predictive performance of international COVID-19 mortality forecasting models. medRxiv (2020).
36. Ray, E. L., Wattanachit, N., Niemi, J., Kanji, A. H., House, K., Cramer, E. Y., et al. Ensemble forecasts of coronavirus disease 2019 (COVID-19) in the US. medRxiv (2020).
37. Santosh, K. COVID-19 prediction models and unexploited data. Journal of Medical Systems 44, 1–4 (2020).
38. COVID, I., Murray, C. J., et al. Forecasting COVID-19 impact on hospital bed-days, ICU-days, ventilator-days and deaths by US state in the next 4 months. medRxiv (2020).
39. Xu, J. & Sen, S. Decision Intelligence for Nationwide Ventilator Allocation During the COVID-19 Pandemic. SN Computer Science 2, 1–15 (2021).
40. Birge, J. R. & Louveaux, F. Introduction to stochastic programming (Springer Science & Business Media, 2011).
41. Brauer, F., Castillo-Chavez, C. & Castillo-Chavez, C. Mathematical models in population biology and epidemiology (Springer, 2012).
42. Hong, Y.-R., Lawrence, J., Williams Jr, D.
& Mainous III, A. Population-level interest and telehealth capacity of US hospitals in response to COVID-19: cross-sectional analysis of Google search and national hospital survey data. JMIR Public Health and Surveillance 6, e18961 (2020).
43. Tsai, T. C., Jacobson, B. H. & Jha, A. K. American hospital capacity and projected need for COVID-19 patient care. Health Affairs Blog (2020).
44. Anastassopoulou, C., Russo, L., Tsakris, A. & Siettos, C. Data-based analysis, modelling and forecasting of the COVID-19 outbreak. PLoS ONE 15, e0230405 (2020).
45. Sameni, R. Mathematical modeling of epidemic diseases; a case study of the COVID-19 coronavirus. arXiv preprint arXiv:2003.11371 (2020).
46. Yang, W., Karspeck, A. & Shaman, J. Comparison of filtering methods for the modeling and retrospective forecasting of influenza epidemics. PLoS Computational Biology 10, e1003583 (2014).
47. Piscitello, G. M., Kapania, E. M., Miller, W. D., Rojas, J. C., Siegler, M. & Parker, W. F. Variation in ventilator allocation guidelines by US state during the coronavirus disease 2019 pandemic: a systematic review. JAMA Network Open 3, e2012606 (2020).
48. Beitler, J. R., Mittel, A. M., Kallet, R., Kacmarek, R., Hess, D., Branson, R., et al. Ventilator sharing during an acute shortage caused by the COVID-19 pandemic. American Journal of Respiratory and Critical Care Medicine 202, 600–604 (2020).
49. Emanuel, E. J., Persad, G., Upshur, R., Thome, B., Parker, M., Glickman, A., et al. Fair allocation of scarce medical resources in the time of Covid-19 (2020).
50. Huang, H.-C., Araz, O. M., Morton, D. P., Johnson, G. P., Damien, P., Clements, B., et al. Stockpiling ventilators for influenza pandemics. Emerging Infectious Diseases 23, 914 (2017).
51. Deng, Y. & Sen, S. Predictive Stochastic Programming. Computational Management Science (2020).
52. Wets, R. J.-B. Solving stochastic programs with simple recourse. Stochastics 10, 219–242 (1983).
53. Breiman, L. Bagging predictors. Machine Learning 24, 123–140 (1996).
54. Defourny, B., Ernst, D. & Wehenkel, L. Planning under uncertainty, ensembles of disturbance trees and kernelized discrete action spaces in 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (2009), 145–152.
55. Chen, C.-H., Lin, J., Yücesan, E. & Chick, S. E. Simulation budget allocation for further enhancing the efficiency of ordinal optimization. Discrete Event Dynamic Systems 10, 251–270 (2000).
56. Kahn, H. & Marshall, A. W. Methods of reducing sample size in Monte Carlo computations. Journal of the Operations Research Society of America 1, 263–278 (1953).
57. Botev, Z. & Ridder, A. Variance reduction. Wiley StatsRef: Statistics Reference Online, 1–6 (2014).
58. Benders, J. F. Partitioning procedures for solving mixed-variables programming problems. Numerische Mathematik 4, 238–252 (1962).
59. Sen, S. & Sherali, H. D. Decomposition with branch-and-cut approaches for two-stage stochastic mixed-integer programming. Mathematical Programming 106, 203–223 (2006).
60. Gade, D., Küçükyavuz, S. & Sen, S. Decomposition algorithms with parametric Gomory cuts for two-stage stochastic integer programs. Mathematical Programming 144, 39–64 (2014).
61. Sen, S. Algorithms for stochastic mixed-integer programming models. Handbooks in Operations Research and Management Science 12, 515–558 (2005).
62. Küçükyavuz, S. & Sen, S. in Leading Developments from INFORMS Communities 1–27 (INFORMS, 2017).
63. Ghadimi, S., Lan, G. & Zhang, H.
Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming A 155, 267–305 (2016).
64. Shawe-Taylor, J., Cristianini, N., et al. Kernel methods for pattern analysis (Cambridge University Press, 2004).
65. Homem-de-Mello, T., De Matos, V. L. & Finardi, E. C. Sampling strategies and stopping criteria for stochastic dual dynamic programming: a case study in long-term hydrothermal scheduling. Energy Systems 2, 1–31 (2011).
66. Chen, C.-H., He, D., Fu, M. & Lee, L. H. Efficient simulation budget allocation for selecting an optimal subset. INFORMS Journal on Computing 20, 579–595 (2008).
67. Ho, Y.-C., Sreenivas, R. S. & Vakili, P. Ordinal optimization of DEDS. Discrete Event Dynamic Systems 2, 61–88 (1992).
68. Dai, L. Convergence properties of ordinal comparison in the simulation of discrete event dynamic systems. Journal of Optimization Theory and Applications 91, 363–388 (1996).
69. Chen, C.-H. A lower bound for the correct subset-selection probability and its application to discrete-event system simulations. IEEE Transactions on Automatic Control 41, 1227–1231 (1996).
70. Stollsteimer, J. F. The effect of technical change and output expansion on the optimum number, size, and location of pear marketing facilities in a California pear producing region (University of California, Berkeley, 1961).
71. Wang, Q., Batta, R. & Rump, C. M. Algorithms for a facility location problem with stochastic customer demand and immobile servers. Annals of Operations Research 111, 17–34 (2002).
72. Louveaux, F. V. & Peeters, D. A dual-based procedure for stochastic facility location. Operations Research 40, 564–573 (1992).
73. Dyer, M. & Stougie, L. Computational complexity of stochastic programming problems. Mathematical Programming 106, 423–432 (2006).
74. Hanasusanto, G. A., Kuhn, D. & Wiesemann, W. A comment on "computational complexity of stochastic programming problems". Mathematical Programming 159, 557–569 (2016).
75. Shapiro, A. On complexity of multistage stochastic programs. Operations Research Letters 34, 1–8 (2006).
76. Pereira, M. V. & Pinto, L. M. Multi-stage stochastic optimization applied to energy planning. Mathematical Programming 52, 359–375 (1991).
77. Gangammanavar, H. & Sen, S. Stochastic Dynamic Linear Programming: A Sequential Sampling-based Multistage Stochastic Programming Algorithm.
78. Dowson, O. & Kapelevich, L. SDDP.jl: a Julia package for stochastic dual dynamic programming. INFORMS Journal on Computing 33, 27–33. doi:https://doi.org/10.1287/ijoc.2020.0987 (1 2021).
79. Donohue, C. J. & Birge, J. R. The abridged nested decomposition method for multistage stochastic linear programs with relatively complete recourse. Algorithmic Operations Research 1 (2006).
80. Philpott, A. B. & Guan, Z. On the convergence of stochastic dual dynamic programming and related methods. Operations Research Letters 36, 450–455 (2008).
81. Dowson, O. The policy graph decomposition of multistage stochastic programming problems. Networks 76, 3–23 (2020).
82. De Matos, V. L., Morton, D. P. & Finardi, E. C. Assessing policy quality in a multistage stochastic program for long-term hydrothermal scheduling. Annals of Operations Research 253, 713–731 (2017).
83. Sen, S. & Liu, Y. Mitigating uncertainty via compromise decisions in two-stage stochastic linear programming: Variance reduction. Operations Research 64, 1422–1437 (2016).
84. Smith, J. E. & Winkler, R. L.
The optimizer's curse: Skepticism and postdecision surprise in decision analysis. Management Science 52, 311–322 (2006).
85. Peer, O., Tessler, C., Merlis, N. & Meir, R. Ensemble Bootstrapping for Q-Learning in Proceedings of the 38th International Conference on Machine Learning, ICML 2021 139 (2021), 8454–8463.
86. Mannor, S., Simester, D., Sun, P. & Tsitsiklis, J. N. Bias and variance approximation in value function estimates. Management Science 53, 308–322 (2007).
87. Rougé, C. & Tilmant, A. Using stochastic dual dynamic programming in problems with multiple near-optimal solutions. Water Resources Research 52, 4151–4163 (2016).
88. Madduri, R., Chard, K., D'Arcy, M., Jung, S. C., Rodriguez, A., Sulakhe, D., et al. Reproducible Big Data Science: A Case Study in Continuous FAIRness. bioRxiv, 268755 (2018).
89. Ralphs, T., Vigerske, S. & Waechter, A. COIN-OR Build Tools 0.8. https://github.com/coin-or-tools/BuildTools/, accessed 28th June 2019 (COR@L Laboratory, Lehigh University, 2018). doi:10.5281/zenodo.1147193.
90. Ahmed, S., Garcia, R., Kong, N., Ntaimo, L., Parija, G., Qiu, F., et al. SIPLIB: A Stochastic Integer Programming Test Problem Library. http://www2.isye.gatech.edu/~sahmed/siplib, accessed 28th June 2019 (2004).
91. Pasupathy, R. & Henderson, S. G. A Testbed of Simulation-Optimization Problems in Proceedings of the 2006 Winter Simulation Conference (eds Perrone, L. F., Lawson, B. G., Liu, J. & Wieland, F. P.) (2006), 255–263.
92. Pasupathy, R. & Henderson, S. G. SimOpt: A Library of Simulation Optimization Problems in Proceedings of the 2011 Winter Simulation Conference (eds Jain, S., Creasey, R., Himmelspach, J., White, K. P. & Fu, M. C.) (2011), 4080–4090.
93. Winston, W. L. Introduction to Mathematical Programming: Applications and Algorithms (Duxbury Resource Center, N. Scituate, Massachusetts, 2003).
94. Schuler, R. E., Kesselman, C. & Czajkowski, K. Accelerating Data-Driven Discovery with Scientific Asset Management in 2016 IEEE 12th International Conference on e-Science (2016), 31–40.
95. Gangammanavar, H., Sen, S. & Zavala, V. M. Stochastic Optimization of Sub-Hourly Economic Dispatch With Wind Energy. IEEE Transactions on Power Systems 31, 949–959 (2016).
96. Xu, J., Deng, Y. & Sen, S. Multi-Dimensional Newsvendor (MDNV) Problem. https://doi.org/10.25551/WCF8, accessed 28th June (2018).
97. Kim, S., Pasupathy, R. & Henderson, S. G. in Handbook of Simulation Optimization (ed Fu, M. C.) 207–243 (Springer, New York, NY, 2015). doi:10.1007/978-1-4939-1384-8_8.
98. Harrison, J. M. & Van Mieghem, J. A. Multi-Resource Investment Strategies: Operational Hedging under Demand Uncertainty. European Journal of Operational Research 113, 17–29 (1999).
99. Nemirovski, A., Juditsky, A., Lan, G. & Shapiro, A. Robust stochastic approximation approach to stochastic programming. SIAM Journal on Optimization 19, 1574–1609 (2009).
100. Sen, S. & Liu, Y. Mitigating Uncertainty via Compromise Decisions in Two-Stage Stochastic Linear Programming: Variance Reduction. Operations Research 64, 1422–1437 (2016).
101. Hillier, F. S. & Lieberman, G. J. Introduction to operations research (McGraw-Hill Science, Engineering and Mathematics, 1995).
102. Deng, Y., Xu, J. & Sen, S. LEO-Wyndor Problem. https://doi.org/10.25551/W1QR, accessed 28th June (2018).
103. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
104. Goodfellow, I., Bengio, Y. & Courville, A. Deep learning (MIT Press, 2016).
105. Schmidhuber, J.
Deep learning in neural networks: An overview. Neural Networks 61, 85–117 (2015).
106. He, K., Zhang, X., Ren, S. & Sun, J. Identity mappings in deep residual networks in European Conference on Computer Vision (2016), 630–645.
107. Deng, L. & Liu, Y. Deep learning in natural language processing (Springer, 2018).
108. Hester, T., Vecerik, M., Pietquin, O., Lanctot, M., Schaul, T., Piot, B., et al. Deep q-learning from demonstrations in Proceedings of the AAAI Conference on Artificial Intelligence 32 (2018).
109. Liu, G.-H. & Theodorou, E. A. Deep learning theory review: An optimal control and dynamical systems perspective. arXiv preprint arXiv:1908.10920 (2019).
110. Weinan, E. A proposal on machine learning via dynamical systems. Communications in Mathematics and Statistics 5, 1–11 (2017).
111. Martens, J. New insights and perspectives on the natural gradient method. arXiv preprint arXiv:1412.1193 (2014).
112. Martens, J. & Grosse, R. Optimizing neural networks with Kronecker-factored approximate curvature in International Conference on Machine Learning (2015), 2408–2417.
113. Berahas, A. S., Nocedal, J. & Takáč, M. A multi-batch L-BFGS method for machine learning in Advances in Neural Information Processing Systems (2016), 1055–1063.
114. Bollapragada, R., Nocedal, J., Mudigere, D., Shi, H.-J. & Tang, P. T. P. A progressive batching L-BFGS method for machine learning in International Conference on Machine Learning (2018), 620–629.
115. Mayne, D. A Second-order Gradient Method for Determining Optimal Trajectories of Non-linear Discrete-time Systems. International Journal of Control 3, 85–95. doi:10.1080/00207176608921369 (1966).
116. Sen, S. & Yakowitz, S. J. A quasi-Newton differential dynamic programming algorithm for discrete-time optimal control. Automatica 23, 749–752 (1987).
117. Liu, G.-H., Chen, T. & Theodorou, E. A. Differential dynamic programming neural optimizer. arXiv e-prints, arXiv–2002 (2020).
118. Haskell, W. B., Jain, R., Sharma, H. & Yu, P. A universal empirical dynamic programming algorithm for continuous state MDPs. IEEE Transactions on Automatic Control 65, 115–129 (2019).
119. Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., et al. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602 (2013).
Abstract
This thesis focuses on computational validation of stochastic programming models and applications, which includes topics in two-stage stochastic linear programming (2-SLP), two-stage stochastic combinatorial optimization (SCO), and multi-stage stochastic linear programming (MSLP). I touch upon compelling applications in the healthcare area for these fundamental building blocks, studying statistical issues for variance reduction in SCO and designing algorithms for distributed computations in MSLP. Specifically, the computational experiments focus on network-related problems, such as resource allocation, facility location problems, and power systems. The thesis begins with an overview of stochastic programming (SP), followed by a contemporary application in resource allocation. Our models and the associated computational results illustrate the power of 2-SLP models for reducing resource shortages nationwide (US) using data for ventilator demand/availability. I demonstrate both their computational practicality and improved decision-making via vastly reduced shortages during the peak of the COVID-19 pandemic. From 2-SLP, I look at SCO, where the challenges include non-convexity (due to discrete decision variables), as well as variability (due to sampling methods). Our practices intend to overcome the combinatorial explosion in the size of realistic instances. Among these approaches to reducing variance, I introduce new concepts, such as kernel methods from statistical learning, compromise decisions from stochastic linear programming, and optimal computing budget allocation (OCBA) from simulation optimization into large-scale SCO models. These methods expand the computational horizons of SCO and improve the solution quality of SCO significantly. A stopping rule is proposed based on bagging and compromise decisions, which provides a probability guarantee of the quality of the final decision. Following the above advances, we move to the next class of challenging SP models, namely, MSLP, where we study a meta-algorithm. The meta-algorithm can incorporate most value function-based algorithms, such as SDDP. A compromise policy is proposed, which exploits distributed computing. I show that the compromise policy can reduce variance and bias. I propose a sequential sampling algorithm called the online dual dynamic programming (ODDP) algorithm. This approach introduces multiple replications for MSLP models to improve the accuracy of the estimated optimum values and decisions. I propose an optimality test using an out-of-sample optimality gap used in the ODDP algorithm. Finally, as part of a team, I have developed a cyber-infrastructure called computational Operation Research exchange (cORe), which promotes reproducibility in computational experiments for stochastic programming.