UNIVERSITY OF SOUTHERN CALIFORNIA
DOCTORAL THESIS

Computational Stochastic Programming with Stochastic Decomposition

Author: Yifan Liu
Supervisor: Dr. Suvrajeet Sen

A thesis submitted in fulfilment of the requirements for the degree of Doctor of Philosophy in the Data Driven Decisions Laboratory, Daniel J. Epstein Department of Industrial and Systems Engineering

May 17, 2016

Declaration of Authorship

I, Yifan Liu, declare that this thesis titled, "Computational Stochastic Programming with Stochastic Decomposition", and the work presented in it are my own. I confirm that:

• This work was done wholly or mainly while in candidacy for a research degree at this University.
• Where any part of this thesis has previously been submitted for a degree or any other qualification at this University or any other institution, this has been clearly stated.
• Where I have consulted the published work of others, this is always clearly attributed.
• Where I have quoted from the work of others, the source is always given. With the exception of such quotations, this thesis is entirely my own work.
• I have acknowledged all main sources of help.
• Where the thesis is based on work done by myself jointly with others, I have made clear exactly what was done by others and what I have contributed myself.

Signed:
Date:

"For wisdom will enter your heart, and knowledge will be pleasant to your soul."
Proverbs 2:10

Abstract

Computational Stochastic Programming with Stochastic Decomposition
by Yifan Liu
Daniel J. Epstein Department of Industrial and Systems Engineering
Doctor of Philosophy

Stochastic Programming (SP) has long been considered a well-justified yet computationally challenging paradigm for practical applications. Computational studies in the literature often involve approximating a large number of scenarios by using a small number of scenarios to be processed via deterministic solvers, or running Sample Average Approximation on some genre of high performance machines so that statistically acceptable bounds can be obtained.
In this dissertation we show that, for a class of stochastic linear programming problems, an alternative approach known as Stochastic Decomposition (SD) can provide solutions of similar quality, in far less computational time, using ordinary desktop or laptop machines of today. In addition to these compelling computational results, we also provide a stronger convergence result for SD, and introduce a new solution concept which we refer to as the compromise decision. This new concept is attractive for algorithms which call for multiple replications in sampling-based convex optimization algorithms. For such replicated optimization, we show that the difference between an average solution and a compromise decision provides a natural stopping rule.

SD is a sequential sampling scheme combined with a Benders-like decomposition method for solving two-stage stochastic linear programs with recourse. It exploits the special structure of the recourse problem to generate function approximations in an efficient manner. However, randomness is only allowed in the second-stage right-hand side and the technology matrix associated with the first-stage decision variables. In the second part of this dissertation, we relax this assumption to accommodate randomness in the second-stage cost coefficients. Finally, our computational results cover a variety of instances from the literature, and we further scale some operations management applications up to a level that was previously considered out of bounds for stochastic programming.

Acknowledgements

I would like to express my deepest gratitude to my advisor Dr. Suvrajeet Sen for his advice, inspiration, support and patience throughout my PhD studies at both Ohio State and USC. This dissertation would not have been possible without his guidance and deep insights. I have been very fortunate to have him as my mentor not only in academics but also in daily life.

I am grateful to Dr. Julie Higle, our department chair, for providing a great research environment in the ISE department. I would also like to thank Dr.
Rahul Jain for serving on my committees and providing valuable feedback. I have had chances to take classes from both professors and I enjoyed their teaching very much.

During my studies, I was fortunate to be funded through research assistantships from NSF and AFOSR grants. I would also like to thank the Epstein Department for providing travel grants for attending academic conferences. Many thanks to the ISE staff members for all their support and help.

I had the chance to get to know many wonderful graduate students in the ISE department, including Dinakar Gade, Harsha Gangammanavar, Praneeth Aketi, Yunwei Qi, Semih Atakan, Chengjiang Qian, Yunxiao Deng, Alp Ozkan and Junyi Liu. Special thanks to Dinakar, Harsha and Semih for many valuable discussions both inside and outside the Data Driven Decisions Lab.

Finally, I would like to thank my parents Qunjing Liu and Fengju Chen, my wife Chen Xie, and her parents Jun Xie and Yu Xiao, for their love and support.

Contents

Declaration of Authorship iii
Abstract vii
Acknowledgements ix

1 Introduction 1
  1.1 Individual Decision-making 1
    1.1.1 Utility Theory 1
    1.1.2 Judgment and Decision-making 3
  1.2 Group Decision-making 4
    1.2.1 Games 4
    1.2.2 Cooperative and Non-cooperative Games 4
  1.3 System-wide Decision-making 5
    1.3.1 Stochastic Programming 5
    1.3.2 Robust Optimization 5
    1.3.3 Simulation Optimization 6
    1.3.4 Markov Decision Processes 7
  1.4 Learning 7
    1.4.1 Support Vector Machines 8
    1.4.2 Neural Network 8
    1.4.3 Multi-armed Bandit Problem 8
  1.5 Contributions of this dissertation 9
  1.6 Conclusion 10
2 Introduction to Stochastic Programming 11
  2.1 An Overview of Stochastic Programming Models 11
  2.2 An Example 15
  2.3 Solution Methods 19
    Deterministic Decomposition 19
    Stochastic Approximation (SA) 21
    Sample Average Approximation (SAA) 23
    Stochastic Decomposition Method 24
  2.4 Conclusion 28
3 Compromise Decisions in Two-Stage Stochastic Linear Programming 29
  3.1 Introduction 29
  3.2 Motivation: Computations with a Practical SLP 33
    3.2.1 An application of Sample Average Approximation to SSN 34
    3.2.2 Stochastic Decomposition (SD) for SSN 36
  3.3 Algorithmic Concepts in Stochastic Decomposition 43
    3.3.1 Algorithmic Details of SD and Convergence 45
    3.3.2 Stopping Rules 52
      In-Sample Stopping Rule 52
      Out-of-Sample Stopping Rule 56
  3.4 Further Computational Results 62
  3.5 Conclusions 67
4 Stochastic Decomposition with Random Cost 71
  4.1 Introduction 71
  4.2 Description of SD Algorithm 72
    Algorithmic Description of SD 73
  4.3 Sparsity Preserving Decomposition of Second-Stage Dual Vectors 74
  4.4 Description of the Dual Vertex Set 75
  4.5 Checking Dual Feasibility 76
    4.5.1 Further Notes on Dual Feasibility Test 77
      Multiple Random Cost Coefficients 78
      Single Random Cost Coefficient 78
  4.6 Algorithm for Recursively Building V_t^k 80
  4.7 Supply Chain - Freight Transportation Application 80
  4.8 Conclusion 82
5 Operations Management Applications 85
  5.1 Introduction 85
  5.2 Operation and production management models 88
    5.2.1 Multiproduct inventory problem with substitution 90
    5.2.2 Multi-location transshipment problem 92
    5.2.3 Supply Chain Operations with Random Demands and Freight Rates 94
  5.3 Computational Experiments 99
  5.4 Conclusion 109
6 Future Work 111
  6.1 Network Structured Random Cost Problem 111
  6.2 SD for Building Support Vector Machines 112
  6.3 Binary First Stage Models - the Chaining Problem 115
A The SSN Instance - Back to the Future 119
B Figures of Dual Vertex Stability 123
C Figures Comparing SD, SAA and SAA with LHS 125
D Pseudocode for RCSD 129
  D.1 Forming Cuts 130
  D.2 Updating Cuts 132
  D.3 Resampling 134
E An Illustrative Example 135
Bibliography 141

List of Figures

2.1 Load duration curve. 16
2.2 Linearization of a load duration curve. 16
2.3 Two realizations of the load duration curve. 18
3.1 Cumulative frequency of SSN objective function estimates for Loose, Nominal and Tight tolerances 42
3.2 Relative/absolute difference between compromise and average decisions calculated for Loose, Nominal and Tight tolerances 44
5.1 Current Framework for Statistical Analytics. 87
5.2 Framework for Integrated Analytics. 87
5.3 Objective lower and upper bounds of SD and SAA for BAA99. 102
5.4 Objective lower and upper bounds of SD and SAA for BAA99-20. 102
5.5 Objective lower and upper bounds of SD and SAA for BAA99-256. 103
5.6 Objective lower and upper bounds of SD and SAA for RETAIL. 103
5.7 Objective lower and upper bounds of SD and SAA for RETAIL14. 104
5.8 Objective lower and upper bounds of SD and SAA for RETAIL140. 104
5.9 Objective lower and upper bounds of SD and SAA for SCFT-6. 105
5.10 Objective lower and upper bounds of SD and SAA for SCFT-46. 105
5.11 Objective lower and upper bounds of SD, SAA and SAA with LHS for BAA99. 109
C.1 Objective lower and upper bounds of SD, SAA and SAA with LHS for BAA99-20. 125
C.2 Objective lower and upper bounds of SD, SAA and SAA with LHS for BAA99-256. 126
C.3 Objective lower and upper bounds of SD, SAA and SAA with LHS for RETAIL. 126
C.4 Objective lower and upper bounds of SD, SAA and SAA with LHS for RETAIL14. 127
C.5 Objective lower and upper bounds of SD, SAA and SAA with LHS for RETAIL140. 127
C.6 Objective lower and upper bounds of SD, SAA and SAA with LHS for SCFT-6. 128
C.7 Objective lower and upper bounds of SD, SAA and SAA with LHS for SCFT-46. 128
E.1 Diamond shaped example with $\omega_5 = 2.5$. 138
E.2 Diamond shaped example with $\omega_5 = 1.5$. 139
E.3 Diamond shaped example with $\omega_5 = 0.5$. 140

List of Tables

3.1 SAA with Latin Hypercube Sampling 36
3.2 Stochastic Decomposition with Common Random Numbers (on a desktop PC with CPLEX 12.4) 39
3.3 Stochastic Decomposition with Common Random Numbers (on a MacBook Air with CPLEX 12.4) 41
3.4 Test instances with SD under Nominal Tolerance 63
3.5 Objective upper and lower bounds and corresponding 95% confidence intervals of all instances solved by SD 65
3.6 Sample size (std dev), solution time for SD (std dev), solution time for Compromise Problem and evaluation time (of compromise decision and average decision) for all instances solved by SD 66
4.1 The structure of dual vertices. 79
4.2 Supply Chain - Freight Transportation instances tested with SD 82
4.3 Computational Results (Objective Value Estimates, and Optimality Gap Estimates) 82
4.4 Computational Results (Sample Sizes and CPU times) 82
5.1 OM instances tested with SD under Loose Tolerance 99
5.2 Objective upper and lower bounds and corresponding 95% confidence intervals of all instances solved by SD with loose tolerance 100
5.3 Objective upper and lower bounds and corresponding 95% confidence intervals of all instances solved by SD with nominal tolerance 100
5.4 Objective upper and lower bounds and corresponding 95% confidence intervals of all instances solved by SD with tight tolerance 101
5.5 Summary of total solution time (CPU seconds) of SD and SAA for all instances 106
5.6 Sample size (std dev), solution time for SD (std dev), solution time for Compromise Problem and validation time (of compromise decision and average decision) for all instances solved by SD with loose tolerance 107
5.7 Sample size (std dev), solution time for SD (std dev), solution time for Compromise Problem and validation time (of compromise decision and average decision) for all instances solved by SD with nominal tolerance 108
5.8 Sample size (std dev), solution time for SD (std dev), solution time for Compromise Problem and validation time (of compromise decision and average decision) for all instances solved by SD with tight tolerance 108
E.1 Information for illustrative purposes. 137
E.2 Stored bases information for $\omega_5 = 2.5$ 138
E.3 Stored bases information for $\omega_5 = 1.5$ 139
E.4 Stored bases information for $\omega_5 = 0.5$ 140

List of Abbreviations

C.I. Confidence Interval
CEP Capacity Expansion Planning
DP Dynamic Programming
i.i.d.
independent and identically distributed
LP Linear Programming
MDP Markov Decision Process
OM Operations Management
PGP Power Generation Planning
RC Random Cost
RCSD Random Cost Stochastic Decomposition
SA Stochastic Approximation
SD Stochastic Decomposition
SP Stochastic Program
SAA Sample Average Approximation
SLP Stochastic Linear Program
SSN Sonet Switched Network

To my parents.

Chapter 1

Introduction

Decision-making under uncertainty is a very broad topic, ranging from the fields of operations research, economics, control theory and computer science to probability and statistics. It would be a daunting task to even touch the surface of each field and try to make connections with the theme of decision making under uncertainty. In the following sections, we provide a brief overview of a small but relevant subset of fields of research that might put this dissertation in the larger context of making decisions under uncertainty. For each subarea, we will point to some seminal work accompanied by more recent references which will give the interested reader a more detailed exposition than is possible in this dissertation.

1.1 Individual Decision-making

1.1.1 Utility Theory

This may be, historically, the most well known framework for modeling the decision making process (von Neumann and Morgenstern (2007)). For a more recent exposition see Howard and Abbas (2015). Earlier development of utility theory focused on people's preferences over alternative choices. For example, for decisions $x \in X$ and $y \in X$, where $X$ is the set of alternative decisions, an asymmetric and transitive relation $\prec$ is defined over the real-valued utility function $u$ as (here assuming alternatives with greater utility value are always preferable to those with less utility value):

$$x \prec y \Rightarrow u(x) < u(y) \tag{1.1}$$

Asymmetry means that if $x \prec y$ is true, then $y \prec x$ must be false. Transitivity ensures that alternatives are ordered in a consistent way, i.e., if $x \prec y$ and $y \prec z$, then $x \prec z$.
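For a finite alternative set, the asymmetry and transitivity required of the relation induced by a utility function can be checked mechanically. The sketch below is illustrative only (the alternatives and utility values are invented, not from the thesis): it derives the strict relation from $u$ via Eq. (1.1) and verifies both properties by enumeration.

```python
# Hypothetical utilities over three alternatives; names are illustrative.
def prefers(u, x, y):
    """x 'precedes' y (y is strictly preferred) iff u(x) < u(y), as in (1.1)."""
    return u[x] < u[y]

def is_asymmetric(u, alternatives):
    # x < y and y < x can never hold simultaneously
    return all(not (prefers(u, x, y) and prefers(u, y, x))
               for x in alternatives for y in alternatives)

def is_transitive(u, alternatives):
    # whenever x < y and y < z hold, x < z must hold as well
    return all(prefers(u, x, z)
               for x in alternatives for y in alternatives for z in alternatives
               if prefers(u, x, y) and prefers(u, y, z))

u = {"walk": 1.0, "bus": 2.5, "taxi": 4.0}
alts = list(u)
print(is_asymmetric(u, alts), is_transitive(u, alts))  # True True
```

Any relation defined through a real-valued $u$ passes both checks automatically, which is exactly why utility functions give a consistent ordering.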
A decision maker might be indifferent between two alternatives $x, y$, and this situation is modeled with the indifference relation $\sim$. Note that the transitivity assumption about the indifference relation is critical for the ordering of the utility values. If $\sim$ is non-transitive, then $x \sim y$ and $x \neq y$ could result in $u(x) < u(y)$ or $u(y) < u(x)$ depending on the utility function. If $\sim$ is transitive and only countably many indifference classes exist, then we have

$$x \prec y \Leftrightarrow u(x) < u(y) \tag{1.2}$$

The early development of utility theory focused on expected utility. Generally speaking, a rational decision maker is supposed to make decisions that maximize a certain form of the utility function which, for instance, could be an expected value utility function. The inclusion of probability into utility theory stimulated the development of theory to describe preference structure under uncertainty. For a probability measure $p$ defined over the simple lottery set $X$ that has finite support, the expected utility is defined as

$$u(p) = \sum_{x \in X} p(x)\, u(x) \tag{1.3}$$

Under such a framework, certain inconsistencies arise, and one of the most notable is the St. Petersburg paradox (Aumann (1977)). This paradox can still be fixed using concave-shaped log utility functions together with the assumption of an upper-bounded lottery prize. Thus terminologies like "risk neutral", "risk averse" and "risk seeking" followed the development of the analysis of the risk involved in the decision making process. Still, certain aspects related to human psychology are not accommodated under this framework. Another famous paradox, the Allais paradox (see Machina (1987)), points out the inconsistency of expected utility theory with respect to the actual human decision making process.

1.1.2 Judgment and Decision-making

With the pioneering work of Kahneman and Tversky (1979) on Prospect Theory, human psychology was introduced as part of the decision making process. It is also one of the attempts to better understand the paradoxes induced by expected utility theory.
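The expected-utility sum in Eq. (1.3), and the effect of the concave log utility on a bounded St. Petersburg lottery, can be sketched numerically. The numbers below are illustrative (a prize cap of $2^{20}$ is assumed, not taken from the thesis):

```python
import math

def expected_utility(lottery, u):
    """Eq. (1.3): lottery maps outcome -> probability, u is a utility function."""
    return sum(p * u(x) for x, p in lottery.items())

# St. Petersburg lottery truncated at prize 2**20:
# prize 2**k occurs with probability 2**-k (mass 2**-20 beyond the cap is dropped).
lottery = {2 ** k: 2 ** -k for k in range(1, 21)}

linear = expected_utility(lottery, lambda x: x)  # grows linearly with the cap: 20.0
log_u = expected_utility(lottery, math.log)      # stays bounded near 2*ln(2)
print(linear, round(log_u, 4))
```

With the identity utility, each term contributes $2^{-k} \cdot 2^{k} = 1$, so the expected value grows without bound as the cap rises; under the log utility the series converges, which is the classical resolution mentioned above.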
In Kahneman and Tversky (1979), empirical evidence is provided contradicting basic assumptions and axioms of utility theory. For example, decisions made by survey subjects on certain alternate options tend not to optimize expected utility. Another interesting example is that subjects prefer certainty while earning profits but seek risk while minimizing losses. Besides, decision makers might focus more on the change of utility relative to a reference point rather than on the absolute utility accumulated at the end of the decision making process. For example, the performance of a stock trader is usually evaluated against standard indices such as the S&P 500.

The core development of Prospect Theory involves the asymmetric S-shaped utility function, which penalizes referential loss more and rewards referential gain less:

$$u(p) = \sum_{x \in X} w(p(x))\, u(x) \tag{1.4}$$

where $w(p)$ is a weighting function that conveys the idea that a small probability event is more likely to emotionally upset or please the decision maker than a large probability event.

1.2 Group Decision-making

1.2.1 Games

In the previous section, we provided a discussion about a single rational agent making expected utility maximization decisions. Now consider a situation in which two or more rational agents try to maximize their own utilities and each agent's action can affect others' utilities. To address this more complex situation, a framework under the name of Game Theory was developed by John von Neumann in 1928. A more comprehensive and updated coverage of the topic can be found in von Neumann and Morgenstern (2007). The formal definition of a (normal-form) game is stated as follows.

Definition 1. Normal-form game. An $n$-person normal-form game is a tuple $(N, A, u)$, where:
• $N$ is a finite set of $n$ players indexed by $i$;
• $A = A_1 \times A_2 \times \cdots \times A_n$, where $A_i$ is a finite set of actions available to player $i$, and a vector $a = (a_1, a_2, \ldots, a_n)$ is an action profile;
• $u = (u_1, u_2, \ldots, u_n)$, where $u_i : A \to \mathbb{R}$ is the utility function for player $i$.
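Definition 1 can be made concrete with a two-player example. The sketch below encodes the textbook Prisoner's Dilemma payoffs (used purely as an illustration, not an example from the thesis) as explicit action sets and utility tables, and computes each player's best response to a fixed opponent action:

```python
A1 = A2 = ("cooperate", "defect")
# u[(a1, a2)] = (u1, u2): utilities for the action profile (a1, a2)
u = {
    ("cooperate", "cooperate"): (-1, -1),
    ("cooperate", "defect"):    (-3,  0),
    ("defect",    "cooperate"): ( 0, -3),
    ("defect",    "defect"):    (-2, -2),
}

def best_response(player, other_action):
    """Utility-maximizing action for `player` (0 or 1) given the opponent's action."""
    own_actions = A1 if player == 0 else A2
    def util(a):
        profile = (a, other_action) if player == 0 else (other_action, a)
        return u[profile][player]
    return max(own_actions, key=util)

# "defect" is a best response to every opponent action, for both players:
print(all(best_response(p, a) == "defect"
          for p in (0, 1) for a in A1))  # True
```

Here each $u_i$ is simply a lookup into the payoff table, matching the finite $u_i : A \to \mathbb{R}$ of the definition.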
If a player decides to fix his action to a single element of the action set, then a pure strategy is implemented. The player can also randomize the action chosen to follow some distribution over the actions of the action set, which is called a mixed strategy. The utility of a mixed strategy is calculated using expected utilities.

1.2.2 Cooperative and Non-cooperative Games

In the previous subsection, our definition of the game is, by default, the definition of a non-cooperative game, in which all players compete with each other with the goal of maximizing their own utilities. However, in some real world applications, coalitions might be formed by players. The word "cooperative" under this framework does not imply that players of a coalition agree on the decision made for the group. The major difference between cooperative and non-cooperative games is that the former focuses on the analysis of the group as a whole while the latter concentrates on the individual decision-making process under competition. With a coalition, information might be shared and group-wide decisions are made during the game.

1.3 System-wide Decision-making

1.3.1 Stochastic Programming

As we can infer from the name, this field of research is an outgrowth of the field of mathematical programming. Problems analyzed under this framework are in general modeled as math programs, based on which further analysis is conducted by integrating stochastic elements into the parameters of the optimization. This is the main topic of the dissertation, and we prepare an extensive introduction solely for SP models and solution methods in Chapter 2. In the remainder of this section, we summarize other optimization approaches which are commonly applicable for system-wide decision-making.

1.3.2 Robust Optimization

The robust optimization framework was first proposed in Soyster (1973). The mathematical formulation of the problem can be stated as follows:

$$\begin{aligned}
\min \;& c^\top x \\
\text{s.t.} \;& \sum_{j=1}^{n} A_j x_j \le b \quad \forall\, A_j \in K_j \\
& x \ge 0
\end{aligned}$$

When the technology matrix $A$ of the above LP is under uncertainty, i.e., $A_j \in K_j$ (where $K_j$ is convex), the author proposed a robust method to identify solutions that are feasible under all data realizations.

Some users may consider the solution of the above problem too conservative. A more recent development from Bertsimas and Sim (2004) defines the term "price of robustness" and aims at making a trade-off between optimality and robustness. The authors provide a mechanism for the user to control the degree of conservatism that should be imposed upon the solution.

1.3.3 Simulation Optimization

A nice introduction to this topic can be found in Fu (2002). The overall procedure of simulation optimization is to find a "design" or "configuration" that optimizes an objective function. The formulation of the problem can be stated as follows:

$$\min_{\theta \in \Theta} J(\theta) = E[L(\theta, \omega)] \tag{1.5}$$

where $\theta$ is the "design" vector and $\omega$ represents a sample path of the simulation process. The parameters denoted $\theta$ are similar to the decision variables $x$ in the optimization literature. However, because of its close ties to the statistics literature, the simulation optimization community treats designs as being denoted by $\theta$, and the notation $x$ is typically used for outcomes of random variables $X$. In order to maintain connections with that body of literature in this section, we adopt their notational conventions. $L$ is the sample performance function and $J(\theta)$ is the objective function. The constraint set $\Theta$ of the problem can be either explicitly specified or implicitly provided. The design vector $\theta$ can be qualitative or quantitative. For the latter case, it can be further categorized into continuous and discrete variables. One of the major advantages of the simulation framework is its capability to model a wide variety of systems using the discrete-event dynamic systems (DEDS) paradigm.
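A minimal sketch of Eq. (1.5) for a small finite design set: estimate $J(\theta) = E[L(\theta, \omega)]$ by a sample average for each design and keep the best estimate. The newsvendor-style loss and the cost parameters below are an illustrative stand-in, not a model from the thesis.

```python
import random

random.seed(0)

def loss(theta, omega):
    # order theta units, observe demand omega: overage cost 1, underage cost 4
    return 1.0 * max(theta - omega, 0) + 4.0 * max(omega - theta, 0)

def estimate_J(theta, n=20000):
    # sample-average estimate of J(theta) = E[L(theta, omega)], demand ~ U(0, 100)
    return sum(loss(theta, random.uniform(0, 100)) for _ in range(n)) / n

Theta = range(0, 101, 10)          # a coarse finite design set
best = min(Theta, key=estimate_J)
print(best)                        # close to the analytic optimum 80 = 100 * 4/(1+4)
```

This brute-force "rank every design" scheme only works because $\Theta$ is tiny; the ranking-and-selection and stochastic-approximation methods mentioned below are what replace it at scale.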
These systems are generalizations of discrete-time systems, which can require very fine discretizations to capture the range of events that might occur in real-world applications. Moreover, the loss functions allowed in these models are often very general, and do not impose restrictions such as linearity and/or convexity. On the other hand, such generality comes at a price, because most simulation optimization models are either unconstrained or limited to easier constraints (e.g. bounds on decision variables).

Models under this framework usually have a so-called "black-box" objective function that might not have a general analytical form for applying typical optimization techniques such as Newton's method or the gradient descent method. Ranking and selection, ordinal optimization, stochastic approximation, response surface methods, etc. (see Banks (1998)) have been devised to solve this genre of problem.

1.3.4 Markov Decision Processes

The framework of stochastic optimal control is widely applied in real world applications such as control systems, communication and finance. The original work on dynamic programming developed by Bellman (1954) laid the foundation of the field. Under this framework, the decision maker is interested in modeling the system in terms of states and transition functions, with the goal of finding an optimal policy that governs the transitions of the system into an "efficient" steady state. The framework relies on the contraction mapping property of the dynamic programming operator in the search for the fixed point solution.

The MDP method works with a dynamic system which typically assumes a Markovian evolution structure evolving over an infinite time horizon. The major concern with this model is what Bellman coined the curse of dimensionality: the number of state variables of the system becomes so large that it would be impractical to model large scale problems under this framework.
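Before the curse of dimensionality bites, a small MDP can be solved exactly by iterating the Bellman operator, whose contraction property (for discount factor $\gamma < 1$) is what the section above refers to. The two-state chain, its rewards, and the discount factor below are invented purely for illustration:

```python
states = ("low", "high")
actions = ("wait", "invest")
gamma = 0.9  # gamma < 1 makes the Bellman operator a contraction

# P[(s, a)] = list of (next_state, probability); R[(s, a)] = expected reward
P = {("low", "wait"):    [("low", 1.0)],
     ("low", "invest"):  [("high", 0.8), ("low", 0.2)],
     ("high", "wait"):   [("high", 0.9), ("low", 0.1)],
     ("high", "invest"): [("high", 1.0)]}
R = {("low", "wait"): 0.0, ("low", "invest"): -1.0,
     ("high", "wait"): 2.0, ("high", "invest"): 1.0}

def q(s, a, V):
    """One-step lookahead value of action a in state s."""
    return R[s, a] + gamma * sum(p * V[s2] for s2, p in P[s, a])

V = {s: 0.0 for s in states}
for _ in range(500):  # value iteration: repeated application of the operator
    V = {s: max(q(s, a, V) for a in actions) for s in states}

# Greedy policy with respect to the (near-)fixed-point values
policy = {s: max(actions, key=lambda a: q(s, a, V)) for s in states}
print(policy)  # {'low': 'invest', 'high': 'wait'}
```

With these particular numbers, paying the investment cost in the low state to reach the rewarding high state is optimal, and coasting ("wait") is optimal once there; the fixed-point values $V$ certify this.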
In order to overcome this hurdle, there has been a move to use approximation techniques which fall under the broad umbrella of Approximate Dynamic Programming (ADP); see, e.g., the books by Warren Powell and Dimitri Bertsekas.

1.4 Learning

Statistical learning theory is less of a decision making technique and more of a prediction method. A "learner" is exposed to a large amount of data in order to come up with a model for prediction tasks such as regression and classification. It can be categorized as supervised, unsupervised and reinforcement learning. A supervised learning model is created from a labeled data set, while an unsupervised learning method finds hidden structures of the problem with an unlabeled data set. Reinforcement learning is also referred to as Approximate Dynamic Programming by operations research and optimal control scholars. The framework includes MDPs and related approximation techniques for dealing with large scale applications that contain large numbers of state variables and complex transition functions.

1.4.1 Support Vector Machines

The SVM is a popular way of building powerful linear/nonlinear classifiers. Unlike the perceptron classifier, the SVM also has the objective of maximizing the margin width of the classifier. Besides, the kernel method can map the data set to a higher dimension in which it might be linearly separable (or at least the error rate can be improved). Since the margin width is commonly modeled as a quadratic term in the objective function, the final problem is a large scale QP problem with nice structure. In the final chapter, we demonstrate how the SVM model may be viewed as a specialized version of SP.

1.4.2 Neural Network

The neural network is widely applied in image and voice recognition applications. It simulates the mechanisms of the brain to train the weights and firing threshold of each neuron. A labeled data set is input into the network, and a backpropagation process with a stochastic gradient descent type algorithm is applied to calculate the optimal weight and bias parameters.
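The training loop just described can be reduced to its bare bones: one linear "neuron" fit by stochastic gradient descent on squared error. The target function, data grid, and step size below are illustrative choices, not details from the thesis.

```python
import random

random.seed(1)
w, b = 0.0, 0.0          # weight and bias, both initialized at zero
lr = 0.05                # learning rate (step size)
# noiseless training data for y = 2x + 1 on a grid in [-2, 2]
data = [(k / 10.0, 2 * (k / 10.0) + 1) for k in range(-20, 21)]

for _ in range(2000):
    x, y = random.choice(data)   # "stochastic": one random sample per step
    err = (w * x + b) - y        # forward pass: prediction error
    w -= lr * err * x            # backward pass: d(err**2 / 2)/dw = err * x
    b -= lr * err                #                d(err**2 / 2)/db = err
print(round(w, 2), round(b, 2))  # approaches 2.0 and 1.0
```

Because the data are noiseless and the model can fit them exactly, the per-sample gradients vanish at the optimum and the iterates settle on the true parameters; real backpropagation applies the same chain-rule updates layer by layer.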
With the advancement of computing power (multiple CPU and GPU cores), more and more deep neural networks are being built, and exciting results are being published.

1.4.3 Multi-armed Bandit Problem

In addition to the above research areas within the field of statistical learning, there is another interesting problem called the multi-armed bandit problem. A classical introduction can be found in Robbins (1952). In this problem, a player is presented with multiple slot machines, and the task for the player is to figure out 1) which levers to pull, 2) how many times each lever should be pulled, and 3) the order of pulling these levers. Each slot machine has its own distribution of reward, and this information is unknown to the decision maker. The player is expected to maximize her/his reward by "learning" while playing with the slot machines.

1.5 Contributions of this dissertation

This dissertation focuses on system-wide decision-making via stochastic programming. The focus of this work is primarily computational, but our computations will also be supported by new concepts and results. We summarize our contributions by chapter below.

In Chapter 3, we introduce the notion of compromise problems to reduce both the computational time and the variance associated with an optimal decision of a stochastic program. The current literature (Bayraksan and Morton (2011)) recommends that when using sample-based algorithms for stochastic programming, one should use replications in order to obtain an estimate of the variance associated with the objective function. Such replications often lead to alternative recommendations, and as a result, one can only choose the best alternative after estimating the objective value of each replicated solution (Linderoth et al. (2006)). For large stochastic linear programming problems, obtaining such an estimate can be extremely time consuming. In this dissertation, we propose a new concept called the compromise solution, which is shown to have better properties than any of the individual replicated solutions.
We present this idea in Chapter 3, and show that, under certain circumstances, the difference between the compromise decision and the average decision not only provides a sound stopping criterion but also helps reduce the variance. This result is proved in Theorem 2. We also provide asymptotic convergence results for individual replications as well as for the mean and compromise decisions themselves.

Much of the work on Stochastic Decomposition (see Section 2.3) has been devoted to the case of deterministic cost in the second stage linear programs. For certain classes of applications, such as financial problems involving uncertainty in interest rates, or supply chain models with fuel cost uncertainty, the cost coefficients of the second stage LP can be highly uncertain. In such cases, the existing SD framework is not appropriate (because of the deterministic cost assumption). In Chapter 4, we provide an algorithmic extension of SD so as to accommodate random cost coefficients in the second stage. We show that even in the presence of continuous random variables, we are able to map the randomness onto a finite number of vectors which include the extreme points of an average dual polyhedron.

Mathematical and algorithmic developments of the earlier chapters are put into practice in Chapter 5. All applications described in this chapter are from the operations management domain. These applications are very common in industry, and we will provide computational evidence to demonstrate that our algorithm can be scaled up to solve much larger SP instances than have been possible in the past.

1.6 Conclusion

In this chapter, we presented a broad overview of the research areas that fall within the context of decision making under uncertainty. This dissertation addresses the solution method for the recourse problem of the stochastic programming framework. The rest of the dissertation is organized as follows. We give an introduction to stochastic (linear) programs in Chapter 2. We introduce the concepts of compromise decisions and the compromise problem in Chapter 3.
An extension of the Stochastic Decomposition (SD) algorithm for solving stochastic linear programs with random cost is described in Chapter 4. In Chapter 5, we present a variety of applications of SD in problems arising in production and operations management. Finally, we conclude this dissertation with a discussion of future research opportunities in Chapter 6.

Chapter 2

Introduction to Stochastic Programming

The field of stochastic programming was introduced in the middle of the last century, following the success of algorithmic developments and applications of linear programming. This chapter focuses on this class of problems and is divided into four sections. The first section introduces the reader to the field of stochastic programming with literature reviews covering techniques and frameworks for modeling uncertainty. A power generation planning (PGP) problem is explained in detail in the second section to illustrate the recourse framework. The third section focuses on solution methods for solving stochastic programs.

2.1 An Overview of Stochastic Programming Models

This dissertation is devoted to computational aspects of two-stage stochastic programming models. Such models trace their origins to the seminal paper by Dantzig (1955). These original models were intended to extend the linear programming "world view", which relies heavily on the assumption that all data for a linear program (LP) is known with certainty. In order to extend the LP framework to allow data uncertainty, these early papers stated the linear program under uncertainty (LP-U) in the following way:

$$\begin{aligned}
\min \;& c^\top x + \sum_{i=1}^{k} p_i\, q_i^\top y_i \\
\text{s.t.} \;& Ax \le b \\
& Cx + Dy_i = \xi(\omega_i), \quad i = 1, 2, \ldots, k
\end{aligned} \tag{LP-U}$$

In Dantzig (1955), models were proposed as "extensions to linear programming" for planning under uncertainty. In this type of model, decisions are made sequentially before uncertain parameters of the model are revealed. Notice that this formulation distinguishes between two types of decisions, $x$ and $y_i$, $i = 1, \ldots, k$.
The first class of decisions, namely $x$, are the planning decisions that are to be implemented "now", whereas the second class of decisions, namely $y_i$, are contingency decisions that depend on the precise scenario $\xi(\omega^i)$ which may be revealed in the future (in the above LP-U formulation, $p_i$ is the probability of scenario $i$, and is assumed to be given as part of the problem data). Uncertainties are usually modeled with random variables whose outcomes will be denoted by $\omega \in \Omega$, and we denote the support by $\Xi$. We will use $h(x, \omega)$ to denote outcomes of a loss associated with decision $x$ under outcome $\omega$. The performance of the system can be measured in the worst case, i.e., robust optimization:

$$\min_{x \in X} \max_{\omega \in \Xi} h(x, \omega) \tag{2.1}$$

or in the average case (expectation):

$$\min_{x \in X} \mathbb{E}[h(x, \tilde{\omega})] \tag{2.2}$$

Note that one might incorporate expected utility functions within the above formulation by using the expectation of $u(h(x, \tilde{\omega}))$.

Risk Averse. The formulation (2.2) is referred to as a risk-neutral model, due to the expectation taken with respect to the second-stage recourse function. A risk-averse model (see Ahmed (2005), Miller and Ruszczyński (2011)) replaces the objective with the following form:

$$\min_{x \in X} \mathbb{E}[h(x, \tilde{\omega})] + \lambda \, \mathbb{D}[h(x, \tilde{\omega})] \tag{2.3}$$

where $\mathbb{D}$ is a dispersion statistic, and $\lambda$ is the parameter controlling the trade-off between expected value and risk. The mean-variance optimization method is an example of using variance as the dispersion statistic, which, unfortunately, is not tractable under the two-stage SP framework. In order to preserve tractability, one usually requires the dispersion measure to be convex. Beyond the static case, it is difficult to ensure that variance remains a convex function of the first-stage variables.

Chance-Constrained Programs. Another model formulation addressing uncertainties that appear specifically among the constraints was introduced as chance-constrained programming in Charnes and Cooper (1959). In this class of models, constraints are not required to be satisfied all the time; they only need to be satisfied with a certain probability.
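A generic chance constraint $P(\text{loss}(x, \tilde{\omega}) \le \bar{t}) \ge \alpha$ can be checked empirically by Monte Carlo simulation. The following is a minimal sketch; the loss function, Gaussian noise model, threshold, and probability level used in the illustration are our own assumptions, not part of any model in this chapter:

```python
import random

def chance_constraint_satisfied(x, loss, threshold, alpha, n_samples=20000, seed=0):
    """Estimate P(loss(x, omega) <= threshold) by Monte Carlo sampling and
    compare the estimate against the required probability level alpha.
    `loss` is a user-supplied function of the decision x and a random draw."""
    rng = random.Random(seed)
    hits = sum(loss(x, rng.gauss(0.0, 1.0)) <= threshold for _ in range(n_samples))
    return hits / n_samples >= alpha

# Illustration (hypothetical data): loss(x, w) = -x * (0.05 + 0.2 * w) is the
# negative return of investing x in an asset with mean return 5% and 20% noise.
ok = chance_constraint_satisfied(1.0, lambda x, w: -x * (0.05 + 0.2 * w),
                                 threshold=0.3, alpha=0.9)
```

In practice one would replace the sampling estimate by a statistically valid bound, since the empirical frequency is itself a random quantity.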
For instance, in a financial application, this model maximizes expected return subject to a constraint on value-at-risk (VaR); i.e., the largest loss $\bar{t}$ in portfolio value that can occur with a given probability $\alpha$ is modeled as:

$$P(q^\top y \le \bar{t}) \ge \alpha \tag{2.4}$$

Recourse Model. In this dissertation, we will focus on two-stage stochastic linear programs with recourse (similar to the formulation introduced in Dantzig (1955)), stated as follows:

$$\min \; f(x) = c^\top x + \mathbb{E}[h(x, \tilde{\omega})] \quad \text{s.t.} \;\; x \in X = \{x : Ax \ge b\} \subseteq \Re^{n_1} \tag{SLP-M}$$

where $\tilde{\omega}$ denotes an appropriately defined (vector) random variable, and the function $h$ is defined as follows:

$$h(x, \omega) = \min \; d(\omega)^\top y \quad \text{s.t.} \;\; Dy = \xi(\omega) - C(\omega)x, \quad y \ge 0, \; y \in \Re^{n_2} \tag{SLP-S}$$

where $\omega$ denotes any outcome of the random variable $\tilde{\omega}$. The fixed-and-relatively-complete recourse assumption of the above model implies that the matrix $D$ is deterministic, and that the function $h$ has finite value for all solutions $x$ satisfying $Ax \ge b$. As is common in decision making under uncertainty, it is necessary to make decisions ($x$) in the first stage before an outcome $\omega$ is observed, and subsequently, the second-stage decisions ($y$) are made. The quantity $\xi(\omega) - C(\omega)x$ often denotes the "position of resources" after the decision $x$ has been made and the outcome $\omega$ has been observed.

The original paper by Dantzig presented the recourse model using discrete random variables. In this case, the model can be set up using linear programming. However, the extension to continuous random variables leads to convex optimization models. Later, we will present the stochastic decomposition algorithm, which takes continuous random variables or scenarios as input and generates confidence intervals for both objective lower and upper bounds.

Multistage Stochastic Linear Programs. We will not go into details regarding multistage SLP, but we want to give readers a glimpse of the problem formulation and solution methods for multistage problems. The following is the formulation of a multistage stochastic linear program:

$$\text{Min: } f(x_1) = c_1 x_1 + \mathbb{E}_{\omega_2}\big[\min \, c_2 x_2(\omega_2) + \cdots + \mathbb{E}_{\omega_H}[\min \, c_H x_H(\omega_H)] \cdots \big] \tag{2.7a}$$
$$D_1 x_1 = \xi_1 \tag{2.7b}$$
$$C_1(\omega) x_1 + D_2 x_2(\omega_2) = \xi_2(\omega) \tag{2.7c}$$
$$\vdots \tag{2.7d}$$
$$C_{H-1}(\omega) x_{H-1}(\omega_{H-1}) + D_H x_H(\omega_H) = \xi_H(\omega) \tag{2.7e}$$
$$x_1 \ge 0, \quad x_t(\omega_t) \ge 0, \;\; t = 2, \dots, H \tag{2.7f}$$

where $\omega^\top = (\xi_t(\omega)^\top, C^{t-1}_1(\omega)^\top, \dots, C^{t-1}_{m_t}(\omega)^\top)$ for $t = 2, \dots, H$. The objective consists of two parts: one is the immediate cost at the first stage ($t = 1$), and the other is the future "cost-to-go" for stages $t = 2, \dots, H$. Notice that the "horizontal dots" inside the second part of the objective function should be interpreted as multiple embedded expectation operations, so there are a total of $H - 1$ expectation operators in the future cost-to-go.

2.2 An Example

Applications of stochastic programming can be found in many areas, for example, transportation and logistics, production planning and scheduling, supply chain optimization, and power production planning. In order to facilitate the understanding of this model paradigm, an example of power generation planning (which appears in Murphy et al. (1982)) is presented.

A commonly used concept in electricity planning is the load duration curve (LDC), shown in Figure 2.1. For a given load L on the vertical axis, the corresponding horizontal value h is interpreted as the duration of that load. One-day (24 hours) and one-year (8760 hours) LDCs are the most often used for electricity planning.

FIGURE 2.1: Load duration curve.

FIGURE 2.2: Linearization of a load duration curve.

In Figure 2.2, the original LDC is approximated by two linearized segments representing two modes of operation, i.e., base mode and peak mode. For example, coal steam and nuclear power can be used to provide most of the base mode operation. Solar, wind and hydroelectric power could be used to satisfy both base and peak mode operations. Electricity acquired from the energy market can be used to satisfy the peak load.

Assume that capacities of two types of technology are under evaluation.
Let $x = (x_A, x_B)$ denote the megawatts of capacity obtained from technologies A and B. After the initial acquisition of capacity, a dispatching rule should be specified. Let $y = (y_{A1}, y_{A2}, y_{B1}, y_{B2})$ denote the capacity acquired from A and B used to satisfy base and peak load; for example, $y_{A1}$ denotes the capacity acquired from type A technology used to satisfy base load.

With the above notation, a linear programming formulation of the problem is the following:

$$\begin{aligned} \min \;\; & c_A x_A + c_B x_B + f_A(\beta_1 y_{A1} + \beta_2 y_{A2}) + f_B(\beta_1 y_{B1} + \beta_2 y_{B2}) \\ \text{s.t. } & -x_A + y_{A1} + y_{A2} \le 0 \\ & -x_B + y_{B1} + y_{B2} \le 0 \\ & \beta_1 y_{A1} + \beta_1 y_{B1} = d_1 \\ & \beta_2 y_{A2} + \beta_2 y_{B2} = d_2 \end{aligned} \tag{2.8}$$

where $(c_A, c_B)$ and $(f_A, f_B)$ are the fixed and operating costs, respectively, and $(\beta_1, \beta_2)$ denote the breakpoints in the load duration curve. The quantities $d_1$ and $d_2$ correspond to the areas labeled 'Base' and 'Peak' of the LDC in Figure 2.2. In reality, the load duration curve for the coming time period is not known with certainty. Let us consider an LDC with two possible realizations that occur with equal probability:

FIGURE 2.3: Two realizations of the load duration curve.

Our goal is to minimize the expected total cost. The following is a generalized formulation of the power generation planning problem. Note that there is a budget constraint (2.9b) and a minimum capacity constraint. The set $I = \{1, \dots, m\}$ denotes the set of steps of the discretized LDC, and the set $J = \{1, \dots, n\}$ denotes the set of possible generation technologies.

$$\min \; f(x) = \sum_{j \in J} c_j x_j + \mathbb{E}[h(x, \tilde{\omega})] \tag{2.9a}$$
$$\text{s.t. } \sum_{j \in J} c_j x_j \le b \tag{2.9b}$$
$$\sum_{j \in J} x_j \ge M \tag{2.9c}$$
$$x_j \ge 0 \quad \forall j \in J \tag{2.9d}$$

$$h(x, \omega) = \min \sum_{i \in I} \sum_{j \in J} f_i^j y_{ij} \tag{2.10a}$$
$$\text{s.t. } -x_j + \sum_{i \in I} y_{ij} \le 0 \quad \forall j \in J \tag{2.10b}$$
$$\sum_{j \in J} y_{ij} = \omega_i \quad \forall i \in I \tag{2.10c}$$
$$y_{ij} \ge 0 \quad \forall i \in I, \, j \in J \tag{2.10d}$$

2.3 Solution Methods

In this section, we discuss some state-of-the-art solution methods (or procedures) for solving two-stage stochastic linear programs.
Deterministic methods, including the L-shaped (Kelley's cutting plane) method and its variants, i.e., the single-cut and multi-cut L-shaped methods, are discussed first. Sampling methods such as stochastic approximation (SA), sample average approximation (SAA) and stochastic decomposition (SD) are covered later in this section.

Deterministic Decomposition

It should be noted that the deterministic equivalent of problem (SLP-M) can be very large and may not be solvable using general-purpose solver systems. Decomposition methods are usually employed to solve large-scale linear programming problems. We will first explain Kelley's cutting plane method for the deterministic equivalent problem. When the deterministic equivalent becomes too large, sampling techniques can be used to approximate the objective of problem (SLP-M). Stochastic Decomposition is introduced as a method of carrying out statistical approximation within the decomposition framework.

Kelley's cutting plane algorithm. In Kelley's cutting plane method (see Kelley (1960)), the function being minimized is:

$$\text{Min: } \; c^\top x + H(x), \quad x \in X \tag{2.11}$$

where $X$ is a compact and convex set and $H$ is a convex function.

1. (Initialize). Let $\epsilon > 0$ and $x_1 \in X$ be given. Let $k \leftarrow 0$ and define $v_0(x) \equiv -\infty$, $u_0 = +\infty$, $l_0 = -\infty$.
2. (Define/update the piecewise-linear approximation). $k \leftarrow k + 1$. Evaluate $H(x_k)$ and $\beta_k \in \partial H(x_k)$. Let $\alpha_k = H(x_k) - \beta_k^\top x_k$.
   (1) $v_k(x) = \max\{v_{k-1}(x), \, \alpha_k + \beta_k^\top x\}$, and $f_k(x) = c^\top x + v_k(x)$.
   (2) $u_k = \min\{u_{k-1}, f(x_k)\}$.
3. (Solve the master problem). Let $x_{k+1} = \operatorname{argmin}\{f_k(x) \mid x \in X\}$.
4. (Stopping rule). $l_k = f_k(x_{k+1})$. If $u_k - l_k \le \epsilon$, stop. Otherwise, repeat from step 2.

The above algorithm can be used to solve the two-stage stochastic linear program with recourse because the objective function of problem (SLP-M) is convex. This modified version of the cutting plane technique is named the L-shaped method. The idea is also called Benders' decomposition, which was first applied to solving two-stage problems with first-stage integers. The cutting plane built by this method is essentially an outer linearization of the objective function.
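The steps of Kelley's method above can be sketched in one dimension, where the master problem (minimizing a piecewise-linear lower model over an interval) can be solved by enumerating interval endpoints and pairwise cut intersections. The quadratic objective used for the illustration is our own choice:

```python
def kelley_1d(f, fprime, lo, hi, eps=1e-6, max_iter=100):
    """Kelley's cutting plane method for min f(x) on [lo, hi], f convex.
    Each cut is f(xk) + f'(xk)(x - xk); in 1-D the master minimizer lies at
    an interval endpoint or at the intersection of two cuts."""
    cuts = []                      # list of (alpha, beta): cut is alpha + beta*x
    x, upper = lo, float("inf")
    for _ in range(max_iter):
        fx, g = f(x), fprime(x)
        upper = min(upper, fx)                 # best objective value seen
        cuts.append((fx - g * x, g))           # new supporting cut
        # candidate master minimizers: endpoints and pairwise cut crossings
        cand = [lo, hi]
        for i in range(len(cuts)):
            for j in range(i + 1, len(cuts)):
                (a1, b1), (a2, b2) = cuts[i], cuts[j]
                if b1 != b2:
                    xi = (a1 - a2) / (b2 - b1)
                    if lo <= xi <= hi:
                        cand.append(xi)
        model = lambda z: max(a + b * z for a, b in cuts)
        x = min(cand, key=model)               # master problem solution
        lower = model(x)                       # lower bound l_k
        if upper - lower <= eps:
            break
    return x, upper, lower

# Illustration: minimize (x - 1)^2 over [-2, 2]; the minimizer is x* = 1.
x_star, ub, lb = kelley_1d(lambda z: (z - 1.0) ** 2,
                           lambda z: 2.0 * (z - 1.0), -2.0, 2.0)
```

In higher dimensions the master problem is the LP in step 3 rather than this enumeration, but the cut bookkeeping is the same.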
There is a dual version of the L-shaped method, Dantzig-Wolfe decomposition, which builds an inner linearization of the objective function.

Single cut. In the L-shaped method, all scenarios are aggregated into a single cut, so the single-cut method is the same as the L-shaped method.

Multiple cuts. Instead of aggregating all scenarios into a single cut, as an alternative, one can formulate each scenario as one cut and add it to the first-stage problem. As a result, multiple cuts are added to the first-stage problem. Based on numerical experiments, this variant of the L-shaped method generally takes fewer iterations to converge. However, the first-stage problem can become very large if multiple cuts are added, and as a result the master problem can be more difficult to solve. A trade-off between the single cut and multiple cuts must be made, and some hybrid methods provide a way to balance the computation.

Regularized L-shaped method. The regularized version of the multi-cut L-shaped method is introduced in Ruszczyński (1986). The regularized decomposition method can greatly stabilize the solution process. It avoids the inefficient initial iterations of the cutting plane method, and it also overcomes the degeneracy issue which mostly occurs toward the end of the iterations of the cutting plane method.

Progressive Hedging method. This method was first introduced in Rockafellar and Wets (1991). It decomposes the problem into different scenario problems and solves each of them individually. An average first-stage solution is then obtained and used to formulate the scenario problems with two terms penalizing deviation from the average solution. This process is repeated until each scenario's first-stage solution is "close" to their average.

Nested Benders Decomposition. After the introduction of the L-shaped method and the multi-cut method, it is not hard to understand the solution method for multistage SLP.
Similarly, we solve LP problems in the "forward pass"; dual multipliers discovered during these solves are used to formulate either a single cut or multiple cuts passed back from the children's stage to the parents' stage during the "backward pass". The corresponding solution method is called the Nested Decomposition for Stochastic Programming Algorithm (NDSPA), which can be found in Birge and F. Louveaux (2011).

Stochastic Approximation (SA)

In Robbins and Monro (1951), a procedure is provided for finding the desired solution of a monotone function $M(x) = \alpha$ by performing successive experiments at $x_1, x_2, \dots$, such that $x_n$ approaches the solution $x^*$ in probability ($\lim_{n\to\infty} P(|x_n - x^*| > \epsilon) = 0$). The difficulty in the recourse problem is that the function $M(x)$ is the expected value at $x$, where the observation is distributed according to $Y = Y(x)$ with $\Pr[Y(x) \le y] = H(y \mid x)$:

$$M(x) = \int_{-\infty}^{\infty} y \, dH(y \mid x). \tag{2.12}$$

The contribution of this paper is that it provides a consistent procedure for estimating a solution $x^*$ under certain restrictions on the nature of $H(y \mid x)$. The new value $x_n$ is obtained as a certain function of $x_1, \dots, x_{n-1}$, $M(x_1), \dots, M(x_{n-1})$, and possibly the derivatives $M'(x_1), \dots, M'(x_{n-1})$. The practical usage of the algorithm depends on the speed of convergence of $\lim_{n\to\infty} x_n = x^*$ and the ease with which the $x_n$ can be computed. The procedure for finding the solution is the following:

$$x_{n+1} - x_n = a_n (\alpha - y_n) \tag{2.13}$$

where $y_n$ is a random variable with conditional cdf

$$\Pr[y_n \le y] = H(y \mid x_n), \tag{2.14}$$

and $\{a_n\}$ is a fixed sequence of positive constants satisfying:

$$0 < \sum_{n=1}^{\infty} a_n^2 = A < \infty \quad \text{and} \quad \sum_{n=1}^{\infty} a_n = \infty. \tag{2.15}$$

Usually a sequence of the type $c/n$ (where $c$ is a constant) will satisfy the above two conditions.

In Polyak and Juditsky (1992), the authors investigated an SA-type recursive algorithm with averaging of the trajectory:

$$\bar{x}_t = \frac{1}{t} \sum_{i=0}^{t-1} x_i. \tag{2.16}$$

A faster convergence rate is established under the assumptions of first-order Lipschitz continuity of the objective function and that the random outcome process is a martingale-difference process.
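The Robbins-Monro recursion (2.13) with the classical step sizes $a_n = c/n$ can be sketched as follows. The regression function and noise model in the illustration ($M(x) = x$ with unit-variance Gaussian noise) are our own assumptions:

```python
import random

def robbins_monro(sample_y, alpha, x0=0.0, n_steps=5000, c=1.0, seed=1):
    """Robbins-Monro recursion x_{n+1} = x_n + a_n (alpha - y_n) with steps
    a_n = c / n, which satisfy sum a_n = inf and sum a_n^2 < inf.
    `sample_y(x, rng)` draws a noisy observation of the regression M(x)."""
    rng = random.Random(seed)
    x = x0
    for n in range(1, n_steps + 1):
        y = sample_y(x, rng)
        x = x + (c / n) * (alpha - y)
    return x

# Illustration: M(x) = x, so the root of M(x) = alpha is x* = alpha = 2;
# each observation is corrupted by unit-variance Gaussian noise.
root = robbins_monro(lambda x, rng: x + rng.gauss(0.0, 1.0), alpha=2.0)
```

With $M(x) = x$ and $c = 1$, the recursion reduces to a running average of the noisy targets, which makes the $O(1/\sqrt{n})$ error behavior easy to see numerically.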
The following requirement on the step size of the algorithm is also needed:

$$(a_n - a_{n+1})/a_n = o(a_n), \quad a_n > 0 \text{ for all } n \tag{2.17}$$

Another method designed to deal with non-differentiability, introduced by Yuri Ermoliev, is the class of Stochastic Quasigradient Methods (SQG). This method makes it possible to solve problems without knowing the exact values of the objective and constraints (let alone higher-order derivatives). In order to handle constrained problems, this class of algorithms also involves the standard projection step that maps the sequence of solutions onto the constraint set:

$$x_{n+1} = \Pi_X [x_n + a_n (\alpha - y_n)] \tag{2.18}$$

In most SA-type schemes, almost sure convergence is guaranteed by choosing an appropriate step size, which is usually required to be diminishing, but not too rapidly. In practice, however, there is very little guidance on how to choose a suitable step length. In Nemirovski et al. (2009), a robust SA scheme is introduced to determine an optimal constant step length that minimizes the theoretical error over a pre-specified number of steps. Mirror-descent generalizations of robust SA are investigated in Lan et al. (2011). The Mirror Descent SA algorithm is as follows. Starting from an initial point $x_1$, new solutions are obtained following the updating equation:

$$x_{t+1} := P_{x_t}(\gamma_t \, G(x_t, \xi_t)) \tag{2.19}$$

where $P_x(y)$ is the prox-mapping $P_x : \Re^n \to X^o$ and $X^o := \{x \in X : \partial w(x) \neq \emptyset\}$. It is assumed that there exists an oracle which, for a given $(x, \xi)$, returns a stochastic subgradient $G(x, \xi)$. The step size $\gamma_t = \gamma$ is considered constant in this illustration. The distance-generating function $w(x)$ is convex and continuous on $X$, and the set $X^o$ is a convex subset of $X$ on which $w(x)$ is continuously differentiable and strongly convex. For example, when $w(x) = \tfrac{1}{2}\|x\|_2^2$ we have $P_x(y) = \Pi_X(x - y)$ and hence:

$$x_{t+1} = \Pi_X (x_t - \gamma_t \, G(x_t, \xi_t)) \tag{2.20}$$

which is referred to as Euclidean SA.
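The Euclidean SA recursion (2.20) with a constant step, together with the trajectory average that the robust SA analysis works with, can be sketched as follows. The one-dimensional test objective, noise level, and step length below are illustrative choices of ours:

```python
import random

def euclidean_sa(grad_sample, project, x0, gamma, n_steps, seed=2):
    """Euclidean (robust) SA: x_{t+1} = Pi_X(x_t - gamma * G(x_t, xi_t))
    with a constant step gamma.  Returns the last iterate and the
    trajectory average, which is the quantity robust SA analyzes."""
    rng = random.Random(seed)
    x, total = x0, 0.0
    for _ in range(n_steps):
        x = project(x - gamma * grad_sample(x, rng))  # projected step
        total += x                                    # accumulate for average
    return x, total / n_steps

# Illustration: minimize (x - 1)^2 over X = [0, 3]; the stochastic
# subgradient oracle returns 2(x - 1) plus Gaussian noise.
last, avg = euclidean_sa(
    lambda x, rng: 2.0 * (x - 1.0) + rng.gauss(0.0, 0.5),
    lambda z: min(3.0, max(0.0, z)),
    x0=3.0, gamma=0.01, n_steps=20000)
```

The last iterate keeps fluctuating around the minimizer at the scale set by $\gamma$, while the average settles much closer to it, which is the point of the averaging analysis.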
Sample Average Approximation (SAA)

Instead of solving the problem (SLP-M) directly, a sampled version of the problem can be solved to approximate the original problem:

$$\min_{x \in X} F_N(x) \equiv c^\top x + \frac{1}{N} \sum_{i=1}^{N} h(x, \omega^i) \tag{2.21}$$

There is a rich literature on the properties of SAA-type problems. Here are some key insights from recent publications. In Mak et al. (1999), it is shown that the sampled problem provides a lower bound on the original problem's optimal value, and that this lower bound increases as the sample size increases.

In Shapiro and Homem-de-Mello (1998), Monte Carlo simulation-based approaches for solving two-stage SLP are discussed. Variance reduction techniques are applied in the calculation of the estimators of the gradient and Hessian of the objective function. Statistical testing is provided for the selection of the sample size. The direction of descent is controlled by making sure that the projection of the gradient of the sampled problem forms an acute angle with the "true" gradient.

In Kleywegt et al. (2002), a two-stage stochastic programming problem with binary first-stage variables solved by SAA is discussed, and issues regarding the selection of the sample size and the number of replications are examined.

The SAA method should be considered a sampling procedure rather than an algorithm. In practice, the user needs to choose the sample size and the number of replications for specific applications. If the sample size turns out to be insufficient, there is a potential cold start of the procedure, i.e., the calculation will restart from scratch with the newly obtained samples. In our view, the ideal algorithm should learn from the instance, stop intelligently under a user-specified tolerance level, and also be able to resume (warm start) using previously obtained information.

Stochastic Decomposition Method

Stochastic Decomposition is a sequentially sampled Benders' decomposition method. It is different from Benders' decomposition not only due to the sequential sampling strategy, but also due to the sub-problem's dual vertex approximation scheme.
Regularization of the first-stage problem also plays a critical role. Without regularization, the unstable cutting planes can cause computational issues. One issue is that a large number of functional approximations are generated and stored, which hinders the efficient solution of the master problem. The other issue is that the progressive update of older cuts affects the overall functional approximation in such a way that the solutions might stray away from the optimal solution periodically.

In SD, the sample price function at iteration $k$ is built as follows:

$$h^{k,k}(x) = \frac{1}{k} \sum_{i=1}^{k} (\pi_i^k)^\top [\xi(\omega^i) - C(\omega^i)x] \tag{2.22}$$

where

$$\pi_i^k \in \operatorname*{argmax}_{\pi \in V_k} \{\pi^\top (\xi(\omega^i) - C(\omega^i)x^k)\}, \quad i = 1, \dots, k \tag{2.23}$$

and $V_k$ is a restricted dual vertex set of the original sub-problem. Also note that if no random cost appears in the sub-problem, i.e., $d(\omega) = \bar{d}$, all sub-problem solutions share the same dual polytope. All previously generated sample price functions are:

$$h^{j,k}(x) = \frac{1}{k} \sum_{i=1}^{j} (\pi_i^k)^\top [\xi(\omega^i) - C(\omega^i)x], \quad j = 1, \dots, k-1 \tag{2.24}$$
$$\phantom{h^{j,k}(x)} = \frac{1}{k} \sum_{i=1}^{j} \big(\alpha_i^k + (\beta_i^k)^\top x\big), \quad j = 1, \dots, k-1 \tag{2.25}$$

where $\alpha_i^k = (\pi_i^k)^\top \xi(\omega^i)$ and $\beta_i^k = -(\pi_i^k)^\top C(\omega^i)$. All of the price functions constructed during the second stage are aggregated into the first stage, and the objective has the following form:

$$f^{k-1}(x) = c^\top x + \max\{h^{j,k-1}(x), \; j \in J_{k-1}\} \tag{2.26}$$

where $J_{k-1}$ is the index set of all sample price functions generated during the previous $k - 1$ iterations. In the un-regularized version of SD, this set is simply $J_{k-1} \equiv \{1, 2, \dots, k-1\}$, with cardinality $|J_{k-1}| = k - 1$.

Thus the optimization problem of the un-regularized SD is the following:

$$x^k \in \operatorname{argmin}\{f^{k-1}(x) \mid x \in X\} \tag{2.27}$$

In the regularized version, the optimization problem is the following:

$$x^k \in \operatorname{argmin}\Big\{f^{k-1}(x) + \frac{\rho_{k-1}}{2}\|x - \hat{x}^{k-1}\|^2 \;\Big|\; x \in X\Big\} \tag{2.28}$$

It has an extra quadratic term regularizing the distance between the decision $\hat{x}^{k-1}$ (which we call the incumbent decision) and the future decision $x$ (which we call the candidate decision).
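The construction in (2.22)-(2.23), picking the best stored dual vertex for each past outcome and averaging, can be sketched as follows. For simplicity the first-stage decision is a scalar here, and the dual vertices and outcomes are hypothetical data; because every $\pi \in V_k$ is dual feasible, each inner maximum underestimates the true sub-problem value, which is what makes the averaged function a valid minorant:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sd_sample_price(V, xis, Cs, x):
    """Sample price function h^{k,k}(x), cf. (2.22)-(2.23): for each of the
    k outcomes the right-hand side is xi(omega_i) - C(omega_i) * x (scalar x
    for simplicity); the best stored dual vertex in V is selected by
    enumeration, and the k maxima are averaged."""
    total = 0.0
    for xi, C in zip(xis, Cs):
        rhs = [a - b * x for a, b in zip(xi, C)]      # xi(w_i) - C(w_i) x
        total += max(dot(pi, rhs) for pi in V)        # best dual vertex in V
    return total / len(xis)

# Hypothetical data: two stored dual vertices, three sampled outcomes.
V = [(1.0, 0.0), (0.0, 1.0)]
xis = [(4.0, 1.0), (2.0, 3.0), (5.0, 0.0)]
Cs = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
val = sd_sample_price(V, xis, Cs, x=1.0)
```

In the actual algorithm, the argmax vertices chosen at the current candidate $x^k$ are then frozen into the affine coefficients $\alpha_i^k, \beta_i^k$ of (2.25), giving a cut valid for all $x$.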
A trust-region method is employed to adjust the value of the parameter $\rho_{k-1}$, and it will be explained in more detail in the following paragraphs. Also note that in this regularized version of SD, the cardinality of $J_{k-1}$ is $n_1 + 3$, where $n_1$ denotes the number of decision variables in the first stage. This will be explained after equation (2.29).

The motivation for implementing regularization is that it greatly stabilizes the solution process, and a stronger proof of convergence can also be given. For the original SD, the proof of convergence (see Higle and Sen (1991)) only shows that there exists a subsequence of the solution sequence whose accumulation point converges to the optimal solution. For the regularized version of SD, the stronger proof (see Higle and Sen (1994)) shows that every accumulation point of the solution sequence converges to the optimal solution.

Choosing the regularizer $\rho_k$. The regularization technique is applied in many areas of optimization with the purpose of stabilizing the solution process. For example, in logistic regression, a convex loss function can be regularized so that the Hessian matrix is positive definite and the objective function becomes strongly convex; thus a unique minimizer can be found efficiently. The strategy applied here for choosing the parameter $\rho$ is motivated by the trust-region method. The search area is expanded (by reducing the value of $\rho$) if the actual improvement achieved falls within a certain range of the predicted improvement. Otherwise, the search area is reduced (by increasing the value of $\rho$).

Limiting the size of the set of sample price functions. In the regularized version of SD, the master problem can be represented in the following way:

$$\text{Min: } c^\top x + \eta + \frac{\rho_k}{2}\|x - \hat{x}^k\|^2 \quad \text{s.t.} \quad Ax \ge b, \qquad \eta \ge \alpha_i^k + (\beta_i^k)^\top x \;\; \forall i \in J_k \tag{2.29}$$

If we write down the KKT conditions of the above problem, we can prove that at most $(n_1 + 1)$ price functions are needed for the solution of the regularized problem.
Thus, together with the one updated at the incumbent solution and the one newly generated, we only need to store $(n_1 + 3)$ price functions at any time during the run.

A high-level description of the SD algorithm can be summarized as follows:

1. Generate a new outcome $\omega^k$, independent of all previous outcomes.
2. Let $\pi^k$ denote the optimal conditional dual vertex for the second-stage LP, given inputs $(x^k, \omega^k)$. Assuming $V_{k-1}$ (possibly empty) is available from previous iterations, set $V_k \leftarrow V_{k-1} \cup \{\pi^k\}$.
3. Derive two sample average price functions, denoted $h^{\nu,k}$ and $h^{0,k}$, with the former representing the candidate and the latter representing the incumbent. We simply present the calculations for $h^{\nu,k}$, and recognize that $h^{0,k}$ is calculated similarly (by replacing $x^k$ by $\hat{x}^{k-1}$ in (2.31)):

$$h^{\nu,k}(x) = \frac{1}{k}\sum_{i=1}^{k} \pi_{i,k}^\top [\xi(\omega^i) - C(\omega^i)x] \tag{2.30}$$

where, for $i = 1, \dots, k$, $\pi_{i,k}$ is a conditional dual vertex for outcome $i$, calculated as follows:

$$\pi_{i,k} \in \operatorname{argmax}\{\pi^\top[\xi(\omega^i) - C(\omega^i)x^k] \mid \pi \in V_k\} \tag{2.31}$$

4. Delete one of the linear functions $h^{j,k-1}$ for which the dual multiplier is zero. Create an updated index set $J_k$. (One can ensure that at most $n_1 + 3$ indices are present in $J_k$.)
5. Let $t(j)$ denote the iteration at which inequality $j \in J_k$ was created. Under the assumption that $h \ge 0$ for all $(x, \omega)$, the functions $j \notin \{0, \nu\}$ are updated as follows:

$$h^{j,k}(x) \leftarrow \left(\frac{t(j)}{k}\right) h^{j,t(j)}(x), \quad j \in J_k \setminus \{0, \nu\}. \tag{2.32}$$

We will go into further details of the SD algorithm in the next chapter.

2.4 Conclusion

We present a brief introduction to the field of stochastic programming, with a focus on techniques and frameworks for modeling the uncertainty. The so-called "recourse problem" is explained in detail through a power generation planning (PGP) problem. Solution methods for solving stochastic programs are also covered at the end. In the next chapter, we provide more details on a particular solution method of SP called Stochastic Decomposition (SD).
Chapter 3

Compromise Decisions in Two-Stage Stochastic Linear Programming

3.1 Introduction

For certain stochastic linear programming (SLP) models, the associated probability space can be so large that identifying a deterministically verifiable optimum is impossible (in reasonable time) using any foreseeable computer. Nevertheless, such models arise in a variety of applications, and new notions of approximate (or near-) optimality, supported by statistically verifiable bounds, are important for decision support. While statistical optimality bounds have been studied in the literature for a while (e.g., Higle and Sen (1991a), Higle and Sen (1996a), Mak et al. (1999), Kleywegt et al. (2002), Bayraksan and Morton (2011), Glynn and Infanger (2013)), their use in identifying near-optimal decisions for realistic instances has been limited. It is important to note the emphasis on "decisions", as opposed to identifying bounds within which the optimal value might belong. Prior attempts to use statistical optimality bounds have either required some genre of high-performance computing, or they have been limited to relatively small instances (few decision variables/constraints and a small number of random variables or scenarios). The goal of this chapter is to demonstrate that for certain classes of two-stage SLP models, we are able to provide statistically verifiable, near-optimal decisions even if uncertainty is modeled using continuous random variables. Building on previous work connected to Stochastic Decomposition (SD), we introduce the notion of a "compromise" decision, which allows us not only to confirm statistical bounds, but also to recommend decisions with significant confidence. We report computational results for a suite of SLP test problems from the literature, and show that statistically verifiable decisions can be obtained within a few minutes of computing on desktops, laptops, and similar "run-of-the-mill" computing devices.
The class of SLP models discussed below falls under a category known as fixed-and-relatively-complete recourse models, and may be stated as follows:

$$\text{Min } f(x) = c^\top x + \mathbb{E}[h(x, \tilde{\omega})] \tag{3.1a}$$
$$\text{s.t. } x \in X = \{x : Ax \ge b\} \subseteq \Re^{n_1} \tag{3.1b}$$

where $\tilde{\omega}$ denotes an appropriately defined (vector) random variable, and the function $h$ is defined as follows:

$$h(x, \omega) = \text{Min } d^\top y \tag{3.2a}$$
$$\text{s.t. } Dy = \xi(\omega) - C(\omega)x \tag{3.2b}$$
$$y \ge 0, \; y \in \Re^{n_2}. \tag{3.2c}$$

The notation $\omega$ denotes any outcome of the random variable $\tilde{\omega}$. The fixed-and-relatively-complete recourse assumption of the above model implies that the matrix $D$ is deterministic, and that the function $h$ has finite value for all solutions $x$ satisfying $Ax \ge b$. Previous work on SD has restricted the second-stage cost vector ($d$) to be fixed, and we continue with this setup. However, our research has established that random elements in the cost vector ($d$) can also be accommodated within SD, and this is reported in the next chapter.

As is common in decision making under uncertainty, it is necessary to make decisions ($x$) in the first stage before an outcome $\omega$ is observed, and subsequently, the second-stage decisions ($y$) are made. The quantity $\xi(\omega) - C(\omega)x$ often denotes the "position of resources" after the decision $x$ has been made and the outcome $\omega$ has been observed. In the terminology of Approximate Dynamic Programming (ADP, Powell (2007)), this vector may be looked upon as the "post-decision state" of a two-stage model.

In order to motivate this chapter, we will begin by presenting some computational results for a well-known instance called SSN (Sonet Switched Network, Sen et al. (1994)). This model has been used for a variety of purposes in the past; it was one of the early SLP models validated using discrete-event simulation; others have used it to illustrate whether certain algorithms scale well (Linderoth et al. (2006) and Nemirovski et al. (2009)); and still others have used it to illustrate what makes certain SLP models difficult for sampling-based algorithms (Shapiro et al. (2002)).
More recently, the Defense Science Board Report (Grasso and LaPlante (2011)) recommends SSN and models of this genre for DoD trade-off studies in which a very large number of contingency scenarios are necessary as part of the analysis accompanying recommendations for investment in new technologies. For these reasons, and because of its roots as an industrial-sized instance (SSN grew out of a 1990's metropolitan network in Atlanta, GA), we use the performance on SSN to illustrate that decision-makers need not shy away from some classes of SLP models; certain current algorithms are up to the task of providing statistical optimality bounds and decisions within reasonable computational time using ordinary computing machinery. This evidence is provided in section 3.2.

Following the discussion of SSN, section 3.3 presents the methodology, which is based on Stochastic Decomposition (SD, Higle and Sen (1991b), Higle and Sen (1994), Higle and Sen (1996b)). While SD has close ties to some simulation optimization approaches (e.g., Shapiro and Homem-de-Mello (1998), and approaches described in Kim et al. (2014)), those methods are more general in scope because they allow "black box" functions, whereas SD requires a two-stage SLP. This focus of SD facilitates computations with large-scale problems arising in practical applications. In addition to the ability to scale up using linear programming, focusing on SLP models also allows us to show that the algorithm produces a sequence which converges to a unique limit (with probability one). Moreover, as outputs of the algorithm, our procedure provides decision-makers with two alternative choices which either reinforce each other, or provide indicators of "indecision". One of the alternative decisions will be the "compromise" decision, and the other will be an "average" solution. We will show that when these decisions are similar (i.e., they reinforce), they are both very close to optimum.
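The agreement test between the average and compromise decisions can be sketched as follows. The vectors and tolerance below are hypothetical; in the methodology of this chapter the inputs would be the first-stage decisions produced by the individual SD replications and by the compromise problem:

```python
def decisions_agree(replication_xs, compromise_x, tol=1e-2):
    """Form the average of the per-replication first-stage decisions and
    report whether it is within `tol` (Euclidean norm) of the compromise
    decision.  Agreement supports stopping with confidence; disagreement
    suggests re-running the replications at higher precision."""
    n = len(compromise_x)
    m = len(replication_xs)
    avg = [sum(xs[i] for xs in replication_xs) / m for i in range(n)]
    gap = sum((a - c) ** 2 for a, c in zip(avg, compromise_x)) ** 0.5
    return avg, gap, gap <= tol

# Hypothetical decisions from three replications, plus a compromise decision.
avg, gap, ok = decisions_agree([[1.0, 2.0], [1.2, 1.8], [0.8, 2.2]],
                               [1.0, 2.0])
```

Here the replication average coincides with the compromise decision, so the test reports agreement.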
On the other hand, if these decisions are not similar, then we suggest greater precision in calculating solutions for each replication, and these re-runs should be undertaken using the "warm starting" capability that is naturally available via SD. When there is indecision between the average and the compromise, we will refer to them both as average and compromise solutions. When they agree with each other (up to some given tolerance), we refer to them as average and compromise decisions. In this sense the approach is different from the Polyak and Juditsky (1992) averaging process commonly used in stochastic approximation.

Following our presentation of the methodology, we present further computational results in section 3.4. The test instances, which are available in the literature, cover a variety of applications ranging from inventory control and supply chain planning to power generation planning, freight transportation, and others. All of the test instances reported here require an algorithmic treatment of multi-dimensional random vectors, and hence instances with the simple recourse property (e.g., newsvendor models) are not included in this study.

This chapter addresses several questions of relevance to the SP community.

1. Given that SP problems can be demanding, greater accuracy may call for re-runs that should re-use previously discovered structures of an instance. How can such structures be re-used for the purposes of warm-starting?
2. Given that SLP models have a very special (convex) structure, how should sampling-based methods be designed to take advantage of such structure?
3. Sampling-based SP methods borrow variance reduction techniques from the simulation literature. Are there other variance reduction techniques that are appropriate for SP but are not considered in the simulation literature?
4. Parallel architectures in SP have traditionally been used to process bunches of scenarios. Are there other ways to use parallel architectures which permit the solution of industrial-strength models?
5.
Should SP algorithms report lower bound estimates for the "true" problem, so that the quality of a recommended decision can be ascertained?

We will present the conclusions of this chapter by placing our contributions in the context of these questions. For now, we begin by applying sampling-based algorithms to SSN.

3.2 Motivation: Computations with a Practical SLP

Formally speaking, SSN is a two-stage stochastic linear programming model. The basic "operations" issue in the SSN model is to recommend link sizes of a given network so that the network will experience the least number of "lost calls" (in expectation), while operating under a given budget constraint. We refer the reader to Appendix B for its mathematical formulation. In the SP literature, such models are often classified as "here-and-now" models, because the link capacities must be decided before actual demands are known. Models of this type, which are based on introducing randomness into linear programming models, must contend with multi-dimensional random vectors, which in the SSN model represent point-to-point demand uncertainty. In this particular example, there are 86 point-to-point pairs, which, by the standards of LP models, is modest. As is common today, these demands are available through forecasting systems, and errors in forecasts may be treated as independent random variables. For the model presented in Sen et al. (1994), each marginal error random variable was deemed to be sufficiently approximated by a discretization using about 5-9 outcomes per demand pair. It is not difficult to see that the total number of scenarios involves an astronomical number of parametric LPs (approximately of magnitude 10^71). Even if one had access to an exascale (10^18 flops) computing platform, it would be pointless to seek a solution whose optimality could be verified in a deterministic sense. It is therefore pragmatic to seek approximate solutions which are near-optimal in a statistical sense. Other approximations to SLP have been suggested via
Other approximations to SLP have been suggested via linear decision rules (Kuhn et al. (2009), Chen et al. (2008)). However, these approaches are motivated by scalability, rather than statistical bounds on optimality.

The remainder of this section uses the SSN model to illustrate the level of computing resources that may be necessary to provide decision support using sampling approaches for two-stage SLP models. The two approaches presented below are Sample Average Approximation (SAA) and the Stochastic Decomposition (SD) algorithm.

3.2.1 An Application of Sample Average Approximation to SSN

In recent years, SAA has become the de facto standard for sampling-based SP methodology. This methodology draws inspiration from statistical inference (see Shapiro et al. (2009)), and is supported by stability results (Römisch (2003)) which justify approximations of SLP, and by calculations of general confidence bounds (Vogel (2008)). The statistical validation procedure for SAA is usually based on Mak et al. (1999). As an overview, the SAA process may be summarized as follows.

1. Choose a sample size N, and choose a number M denoting the number of replications (batches).
2. (Optimization Step). For m = 1, ..., M create an approximation F_N^m(x), and solve an SAA instance. Let \hat{F}_N^m denote the optimal value of replication m.
3. (Statistical Validation Step). Using {\hat{F}_N^m}_{m=1}^M, estimate a Lower Bound confidence interval. Using the M solutions (i.e., potential decisions) from the Optimization Step, estimate the best Upper Bound confidence interval to a specified level of accuracy.
4. If the lower end of the estimated Lower Bound confidence interval is acceptably close to the upper end of the estimated Upper Bound confidence interval, then stop. Else, increase the sample size N and repeat from step 2.

Because the sampling step is independent of the optimization step, SAA is sometimes referred to as an "external" sampling approach.
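The four steps above can be sketched on a toy one-dimensional problem (a hedged illustration: the newsvendor-style cost function, its parameters, and the grid search standing in for an LP solver are all hypothetical, not part of the SSN study):

```python
import random
import statistics

def saa_replication(sample_size, rng, xs):
    """One SAA replication: draw a sample, then minimize the sample average F_N."""
    demand = [rng.gauss(10.0, 2.0) for _ in range(sample_size)]
    # Toy convex recourse: cost(x, d) = 1*x + 4*max(d - x, 0)  (hypothetical)
    def F(x):
        return sum(1.0 * x + 4.0 * max(d - x, 0.0) for d in demand) / sample_size
    x_star = min(xs, key=F)          # Optimization Step (grid search stands in for an LP solver)
    return x_star, F(x_star)

def saa(M=10, N=200, seed=0):
    rng = random.Random(seed)
    xs = [i * 0.1 for i in range(300)]
    sols, vals = zip(*(saa_replication(N, rng, xs) for _ in range(M)))
    # E[min_x F_N(x)] <= min_x E[F_N(x)], so the mean of replication optima
    # estimates a lower bound on the true optimal value (Mak et al. (1999)).
    lb_estimate = statistics.mean(vals)
    return list(sols), lb_estimate
```

The M replication solutions returned by `saa` would then be evaluated on fresh samples to build the Upper Bound interval of step 3.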
This separation between the optimization algorithm and sampling leads to a fairly general setup. Some presentations in the literature refer to the "Optimization Step" as the "Training Step", and the "Statistical Validation Step" as simply the "Validation Step" (see Hastie et al. (2013)). While the notion of replications is often not emphasized in some segments of the literature, decision-makers who have experience with sample-based algorithms (e.g., simulation) seek variance estimates of any metric reported by an algorithm, which in this case consists of lower bound and upper bound estimates.

The calculation in step 4 reflects the worst-case optimality gap, which we refer to as the Pessimistic-Gap. We should also note that variance reduction techniques, such as Latin Hypercube Sampling (LHS), have been found to reduce the variance of SAA estimates (Linderoth et al. (2006)). When M is in the neighborhood of 30, one typically invokes the central limit theorem for estimating a Lower Bound confidence interval; however, when M is much smaller (say 10), then confidence intervals should be derived using the t-distribution.

In order to complete the numerical illustration of SAA for SSN, we now draw upon results from Linderoth et al. (2006). The Optimization Step for their study was performed using the Asynchronous Trust Region algorithm of Linderoth and Wright (2003).

From Mak et al. (1999) it is well known that the optimality gap reduces (in expectation) with increases in sample size (as shown in Table 3.1). However, as the row for N = 1000 illustrates, any particular sequence of runs may not result in improved pessimistic gaps. Nevertheless, with a significant increase in sample size (from 1000 to 5000), we see a significant improvement in the "pessimistic gap" in Table 3.1. Because the bounds in Table 3.1 provide 95% confidence intervals, the chance that the actual gap exceeds the pessimistic gap is less than 10%.
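For small M, the Lower Bound interval and the Pessimistic-Gap can be computed as follows (a sketch; the two-sided t critical value is passed in as a constant, e.g., about 2.262 for M = 10, rather than looked up from a distribution table):

```python
import statistics

def lower_bound_ci(replication_optima, t_crit):
    """95% confidence interval for the lower-bound estimate obtained from
    M replication optimal values, using a Student-t half-width (small M)."""
    M = len(replication_optima)
    mean = statistics.mean(replication_optima)
    half_width = t_crit * statistics.stdev(replication_optima) / M ** 0.5
    return mean - half_width, mean + half_width

def pessimistic_gap(lb_ci, ub_ci):
    # Worst-case optimality gap: upper end of the Upper Bound interval
    # minus the lower end of the Lower Bound interval.
    return ub_ci[1] - lb_ci[0]
```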
The last row, corresponding to a sample size of 5000, yields a pessimistic gap of about 2% of the upper bound, with very high probability.

TABLE 3.1: SAA with Latin Hypercube Sampling

Sample Size (N)   Lower Bound      Upper Bound       Pessimistic-Gap
50                10.10 (±0.81)    11.380 (±0.023)   2.113
100                8.90 (±0.36)    10.542 (±0.021)   2.023
500                9.87 (±0.22)    10.069 (±0.026)   0.445
1000               9.83 (±0.29)     9.996 (±0.025)   0.481
5000               9.84 (±0.10)     9.913 (±0.022)   0.195

The results of Linderoth et al. (2006) were obtained using a computational grid with hundreds of desktop PCs, although no more than one hundred machines were in operation at any one time. Even so, solving each SAA instance of the final row (with N = 5000) required about 30-45 minutes of wall-clock time. As it turned out, the solutions provided by the replications (about 6) were quite disparate, even though these experiments were done using Latin Hypercube Sampling. The use of a computational grid to establish the above results was a remarkable feat, solving millions of LPs on geographically dispersed and architecturally diverse machines. However, it also underscores the challenge of using SAA for real-world applications; if one resorts to sampling/simulation without exploiting the structure of the optimization problem, then the computing resources required to solve these instances can easily outstrip the available resources, thus restricting the potential of the SLP modeling paradigm.

3.2.2 Stochastic Decomposition (SD) for SSN

In keeping with the "high-level" description of SAA above, we provide a "high-level" description of SD. As with SAA, one may choose the number of replications (M), but instead of choosing a sample size, we allow the SD algorithm to determine what is a sufficiently large sample size during the Optimization Step. Unlike SAA, SD does not optimize one sample average function F_N^m; instead it creates successive approximations f_k^m at iteration k.
These approximations satisfy

    f_k^m(x) \le F_k^m(x),    (3.3)

where F_k^m denotes a sample average function with a sample size of k (as in (2.21)). These minorants are referred to as Value Functions (VF), and each VF is the pointwise maximum of finitely many affine functions. Of these affine functions, two are lower bounds on subgradients of F_k^m, denoted h_{0,k} and h_{\nu,k}, while the rest are updated versions of previously generated lower bounding approximations of F_t^m, t < k, and are denoted h_{t,k}. As k grows, the approximations h_{0,k} and h_{\nu,k} become more accurate approximations of subgradients of F_k^m, and at such iterations they can be regarded as sample average subgradient approximations (SASA). In summary, each f_k^m is the maximum of n_1 + 3 affine pieces, with two SASA functions from the current iteration k and at most n_1 + 1 from previous iterations. We denote the index set for all of these affine functions as J_k, with |J_k| \le n_1 + 3.

Because of sampling, there are errors, and if these errors cause the sampled approximation to "cut off" portions of the epigraph of f, then they can thwart asymptotic convergence of the algorithm. Instead of letting the errors persist in future iterations (i), we actively modify these functions so that they satisfy h_{t,i}^m(x) \le F_i^m(x), where i \ge k. For this reason, we prefer not to refer to each piece h_{t,k} as a cut (which is the traditional terminology), and refer to them as minorants.

For iteration k+1, the next sample (of size k+1) will use all k previously generated outcomes and add one more (generated independently of previous outcomes) to arrive at a sample size of k+1. The earliest version of SD (Higle and Sen (1991b)) optimized the VF approximation of iteration k, that is, min{ f_{k-1}^m(x) : x \in X }, to obtain the next candidate solution x^{m,k}. More recent versions, including this chapter, are based on Higle and Sen (1994), where optimization of a regularized version produces the next candidate solution.
This form of the objective function is dual to the objective function commonly used in bundle methods.

A subset of the sequence of candidate solutions, denoted \hat{x}^{m,k}, will be referred to as "incumbents" (or incumbent solutions). In earlier papers, it has been shown that if k \to \infty, then we have asymptotic consistency of the values, i.e., if \hat{x}^{m,k} \to x^m, then

    lim_{k \to \infty} f_k^m(\hat{x}^{m,k}) = lim_{k \to \infty} F_k^m(\hat{x}^{m,k}) = c^T x^m + E[h(x^m, \tilde{\omega})]   (wp1).

A few more details regarding the algorithm are provided in the following section (see also Birge and Louveaux (2011)).

While results like consistency are based on the long-run behavior of an algorithm, one stops each replication after a finite number of steps, which will be based on detecting that the approximations obtained for the current replication have stabilized sufficiently, based on a given tolerance level. This test is known as an In-Sample stopping rule (Higle and Sen (1999)), and signals whether a particular replication has enough information to propose a solution, which we denote by x^m. If m < M (the desired number of replications), then we proceed to the next replication; otherwise SD recommends a "compromise decision" x_c, which presents a compromise between all replicated decisions x^m. Using x_c as the proposed decision, we calculate a 95% confidence interval for the upper bound f(x_c). In addition, a 95% confidence interval for a lower bound estimate on the optimal value is also reported. As with the SAA computations in the previous subsection, we will report the pessimistic gap. A high-level description of SD is provided next.

1. (Initialize). Let m denote a counter of completed replications, and set m = 0.
2. (Out-of-Sample loop). Increment m, and initialize (or continue) the next replication.
3. (In-Sample loop). Update the available sample by adding one sampled outcome (independent of previously generated outcomes), and update empirical frequencies.
4. (Update Value Function (VF) Approximation).
Using previously generated approximations and the new outcome, update the VF approximation f_k^m(x).
5. (Optimization Step). Optimize a regularization of the VF approximation, and update an incumbent solution for the first stage.
6. (In-Sample Stopping Rule). If an In-Sample stopping rule is satisfied, then output a lower bound estimate \hat{\ell}_m := f_m(x^m), an incumbent x^m, and continue to step 7. Else repeat from step 3.
7. (Out-of-Sample Stopping Rule). If the number of replications is less than M, then go to step 2. Else, using the replicated solutions {x^m}, calculate a compromise decision (denoted x_c), and with this solution estimate a 95% Upper Bound confidence interval of specified accuracy. Using the lower bound estimates {\hat{\ell}_m}_{m=1}^M, calculate a 95% Lower Bound confidence interval. If the Pessimistic-Gap is acceptably small, then stop. Else, decrease the tolerance of the In-Sample Stopping Rule, reset m = 0, and resume all M replications from step 2.

It is important to observe that this sequential sampling procedure automatically includes variance reduction from Common Random Numbers (CRN). To see this, we mention that in step 5 above, the candidate replaces the incumbent as the new incumbent if the estimated value of the candidate solution is lower than that of the current incumbent using the same collection of outcomes {\omega_t}_{t=1}^k. This comparison is essentially based on using CRN (see Fu (2002)). In addition, the SASA functions generated in any iteration (one at the incumbent and another at the candidate) use the same stream of outcomes. Hence these two affine functions view all x \in X through the same lens (of outcomes).

This procedure was executed for the SSN instance using three increasingly tighter relative tolerances: loose (0.01), nominal (0.001), and tight (0.0001), and the results for each run appear in Table 3.2.

TABLE 3.2: Stochastic Decomposition with Common Random Numbers (on a desktop PC with CPLEX 12.4)
Stopping    Avg. Sample Size    Lower Bound      Upper Bound     Pessimistic-   CPU Time (s)
Tolerance   (Std Dev)                                            Gap            (Std Dev)
Loose       1030.83 (182.31)    9.345 (±0.240)   9.951 (±0.05)   0.896           30.11 (6.63)
Nominal     2286.90 (341.71)    9.736 (±0.118)   9.927 (±0.05)   0.359           90.50 (20.56)
Tight       3305.47 (617.17)    9.852 (±0.107)   9.923 (±0.05)   0.228          162.01 (55.43)

Upon examining Table 3.2, we first observe that the average sample size (per replication) increases with increased precision, as expected. The solution quality (as seen in the upper bound) does not improve dramatically from the first row to the third. However, the improvements in lower bounds are significant, ultimately mitigating the uncertainty about the quality of the solution. In this sense, it reinforces a common observation in difficult optimization models: proving optimality is what requires extensive computations for difficult instances. Nevertheless, the average CPU time for even the row with "tight" tolerance is under 3 minutes on a desktop PC with the following specifications: Intel Core i7-3770S CPU @ 3.10 GHz (Quad-Core), and 8 GB memory @ 1600 MHz. The processor used for the calculations reported in Table 3.2 is faster than the average processor of 2004/2005 vintage: Pentium IV processors of that era were approximately 2 GHz.

Since these processors are faster than the ones used in the computational grid study (Linderoth et al. (2006)), we also examine the performance on a slower machine. We execute the same SD code on a MacBook Air running an Intel Core i5 CPU @ 1.8 GHz (Dual-Core) with 4 GB memory @ 1600 MHz. Such (laptop) processors of today could be considered on par with (or slightly slower than) the standard Pentium IV processors of 2004/2005 vintage. The average solution times and solution quality for such a processor are reported in Table 3.3. Notice that the solution quality is very similar to that reported in Tables 3.1 and 3.2.
Indeed, the average lower bound as well as the pessimistic lower bound (in Table 3.3) are the best reported to date for SSN. Moreover, the increase in CPU time is marginal with "tight" tolerance, requiring just over 3 minutes of CPU time on average. In contrast, the grid study (Linderoth et al. (2006)) reported wall-clock times of the order of 30-45 minutes per replication using about 100 processors at any given time (for instances with N = 5000).

We recognize that it is impossible to provide a precise characterization of the level of speed-up for several reasons: a) in addition to processor speeds, LP software has also made significant progress since 2004/2005, and b) there is communications overhead involved with using the grid. Nevertheless, it is worth noting, as in a recent PCAST report (Holdren et al. (2010)), that software advances are just as meaningful for challenging numerical problems as improvements in processing power. As a rough estimate of such advances, let \eta represent the fraction of time that processors are either idle or communicating during the shortest run (30 mins. of wall-clock time) on the grid. Then the speed-up factor for software (LP solvers and SD) using a laptop could be approximated as (30/3) × 100(1 − \eta). To see how this might compare with the rate of Moore's Law, first note that the speed of the processors used for Table 3.3 is about the same as, or slower than, standard processors of 2004/2005 vintage. Hence the speed-up can be attributed entirely to software progress, which implies that as long as \eta \le 0.872 (i.e., idling/communications are less than 87.2%), speed-ups in LP/SLP software outpace Moore's Law, which calls for a factor of at least 128 in 10.5 years.

TABLE 3.3: Stochastic Decomposition with Common Random Numbers (on a MacBook Air with CPLEX 12.4)
Stopping    Avg. Sample Size    Lower Bound      Upper Bound      Pessimistic-   CPU Time (s)
Tolerance   (Std Dev)                                             Gap            (Std Dev)
Loose       1023.33 (167.62)    9.366 (±0.244)   9.953 (±0.050)   0.881           32.73 (6.97)
Nominal     2353.43 (343.33)    9.764 (±0.120)   9.928 (±0.050)   0.334          109.96 (26.31)
Tight       3137.50 (605.17)    9.876 (±0.107)   9.925 (±0.050)   0.206          189.79 (74.57)

Let us now return to a closer examination of Table 3.2. The stepped curve in Figure 3.1 shows the spread of objective function estimates obtained for each tolerance level reported in Table 3.2. Despite the fact that SD is an asymptotically convergent algorithm, the objective function estimates (for each terminal incumbent) show variability due to the fact that each run is terminated upon achieving some level of accuracy in finitely many iterations. For each tolerance setting (Loose, Nominal and Tight), the objective function estimates lie in the ranges [10.100, 10.460], [9.994, 10.330] and [9.950, 10.310], respectively, and note that both the upper and lower limits of these ranges move in the appropriate direction (lower). Moreover, from Table 3.2 we notice that the lower bounds increase steadily, starting with 9.345 (for loose tolerance), rising to 9.736 (for nominal tolerance), and finally 9.852 (for tight tolerance). Thus increasing precision in SD leads to less biased estimates of lower bounds. Moreover, comparing these lower bounds to the last two rows (N = 1000, 5000) in Table 3.1, we observe that the lower ends of the confidence intervals for lower bounds in Table 3.1 are similar to those reported in Tables 3.2 and 3.3.
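The back-of-the-envelope comparison with Moore's Law given above can be reproduced directly (a sketch; the 30-minute grid run, 3-minute laptop run, and 100-processor figures are taken from the discussion, while the 18-month doubling period for Moore's Law is an assumption):

```python
def software_speedup(grid_minutes=30.0, laptop_minutes=3.0, processors=100, eta=0.0):
    # Estimated software speed-up: (30/3) * 100 * (1 - eta),
    # where eta is the idle/communication fraction on the grid.
    return (grid_minutes / laptop_minutes) * processors * (1.0 - eta)

def moores_law_factor(years=10.5, doubling_years=1.5):
    # Moore's Law as one doubling every 18 months.
    return 2.0 ** (years / doubling_years)

# Break-even idle fraction: 1000 * (1 - eta) = 128  ->  eta = 0.872
eta_break = 1.0 - moores_law_factor() / software_speedup()
```

Any idle/communication fraction below `eta_break` implies that the software speed-up exceeds the 128x that Moore's Law would deliver over 10.5 years.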
FIGURE 3.1: Cumulative frequency of SSN objective function estimates for Loose, Nominal and Tight tolerances

Figure 3.1 also shows three lines on the far left of the figure. These correspond to the upper bound (average value) reported in Table 3.2; that is, they correspond to objective estimates for the compromise decisions at each tolerance setting. As shown in Figure 3.1, the compromise decision for each setting yields a lower objective function estimate than the incumbent solutions for the corresponding run. The compromise decisions are not only superior, but they also possess another important property: when the compromise decision and the average decision are reasonably close, we can also conclude that both decisions are reasonably good. We will establish this result in the following section (Theorem 2). For now, examining the specific instance at hand (i.e., SSN), we present the maximum relative difference between the compromise and average decisions, that is, 2|x_j^c − \bar{x}_j| / |x_j^c + \bar{x}_j| for all j, in Figure 3.2. For variables that are almost zero (within tolerance) we report the absolute difference |x_j^c − \bar{x}_j| instead of the relative difference. In these figures, the horizontal axis displays the index of the first stage decision variables, and the vertical axis represents the difference between the compromise and average decisions for each first stage variable. Observe that for each tolerance setting, the maximum relative differences are on the order of 10^{-4} to 10^{-5}.
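The discrepancy measure plotted in Figure 3.2 can be sketched as follows (`zero_tol` is a hypothetical threshold for the "almost zero" variables; the thesis does not specify its value):

```python
def decision_discrepancy(x_comp, x_avg, zero_tol=1e-6):
    """Per-coordinate discrepancy between compromise and average decisions:
    relative difference 2|xc - xa| / |xc + xa| in general, but the absolute
    difference |xc - xa| for coordinates that are almost zero."""
    rel, absolute = [], []
    for xc, xa in zip(x_comp, x_avg):
        d = abs(xc - xa)
        if abs(xc) < zero_tol and abs(xa) < zero_tol:
            absolute.append(d)
        else:
            rel.append(2.0 * d / abs(xc + xa))
    return (max(rel) if rel else 0.0), (max(absolute) if absolute else 0.0)
```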
Since the relative difference shown for all tolerance levels is quite small, we can infer high-quality decisions from compromise solutions for each tolerance level, even though the Lower Bound confidence intervals are weaker for the loose and nominal tolerances. Thus even for SSN, an instance considered to be ill-conditioned, the nominal setting provides reasonable accuracy. Such decision support is intended to mitigate the effects of uncertainty, without requiring extensive computational resources.

3.3 Algorithmic Concepts in Stochastic Decomposition

This section provides the algorithmic background for the computational results presented for SSN, as well as the more extensive computations presented in the next section. The algorithmic content will be presented in two subsections: one dealing with algorithm design and convergence, and another on stopping rules. The latter will be divided into two further subsections dealing with the In-Sample and Out-of-Sample aspects of stopping within the SD framework. Before getting into the details, we mention some of the critical assumptions for SD.

Assumptions. The current SD algorithm operates under the following assumptions:

1. The set of first stage solutions, X, and the set of outcomes \Omega are compact.
2. The relatively complete recourse assumption holds.
3. The matrix D is deterministic (i.e., fixed recourse) and the vector d is also deterministic.
4. The recourse function h is non-negative. This assumption can be relaxed in several ways (Higle and Sen (1996b)).
FIGURE 3.2: Relative/absolute difference between compromise and average decisions calculated for Loose, Nominal and Tight tolerances (maximum relative differences: 2.936506e-04, 5.857978e-05 and 4.85385e-05, respectively; maximum absolute differences: 2.99852e-03, 1.99228e-04 and 1.580362e-04, respectively)

3.3.1 Algorithmic Details of SD and Convergence

The earliest paper on SD (Higle and Sen (1991b)) proposed using linear master programs, as is common in Benders' decomposition (Birge and Louveaux (2011)). Subsequently the method was extended to a bundle-type method via a regularized algorithm (Higle and Sen (1994)). In order to keep this presentation self-contained, we provide a brief review together with some results.

For this and the following section, we will suppress the index m because our focus will be on calculations during any one replication. At iteration k, a VF approximation will be given by the pointwise maximum of some linear (formally, affine) functions, that is,

    f_{k-1}(x) = c^T x + max{ h_{j,k-1}(x) : j \in J_{k-1} }.

With each index j, we will record t(j) as the iteration at which the linear function was created. This quantity t(j) will also remind us of the sample size used to create the j-th function. It was shown in Higle and Sen (1994) that one only needs n_1 + 3 indices (at most) in J_{k-1}, where n_1 denotes the number of decision variables in the first stage, and functions that are deleted will be "forgotten" for all future iterations.
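Evaluating this pointwise-maximum form is straightforward; a minimal sketch with plain lists follows (the coefficients in the test values are illustrative, not from the SSN instance):

```python
def vf_value(c, pieces, x):
    """Evaluate the VF approximation f(x) = c'x + max_j (alpha_j + beta_j'x),
    where each piece is a minorant given as a pair (alpha_j, beta_j)."""
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    return dot(c, x) + max(alpha + dot(beta, x) for alpha, beta in pieces)
```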
Any iteration of the SD algorithm works like an election. At iteration k, we start with a previously chosen incumbent decision and a VF approximation f_{k-1}(x). The algorithmic strategy is to present a challenge to the incumbent by finding a solution to the following optimization problem:

    x_k \in argmin{ f_{k-1}(x) + (\rho/2) ||x − \hat{x}_{k-1}||^2 : x \in X }.    (3.4)

The decision x_k is referred to as the candidate. The quadratic term in the above problem is variously referred to as a proximal term or Tikhonov regularization. This form of decomposition (SD, Higle and Sen (1994)) predates recent bundle methods (e.g., Oliveira et al. (2011)) and is different in several ways: a) convergence of SD is proven for both continuous and discrete random variables; b) we work from statistical approximations of the objective function, which in turn requires a completely different convergence concept; c) SD requires only approximate second stage solutions based on previously discovered dual extreme points; d) the master program (3.4) is dual to the one used in the bundle method; and e) SD is able to include statistical stopping rules, as discussed subsequently in section 3.3.2. The quantity \rho \ge 1 is chosen adaptively depending upon the progress observed during the algorithm. Formally, it should also be indexed by k, but since we will not discuss the procedure to update \rho, we prefer to maintain an un-indexed parameter.

We will pit the two competing decisions \hat{x}_{k-1} (incumbent) and x_k (candidate) against each other using an updated value function f_k(x). If the candidate happens to be significantly better (lower value) than the incumbent, then we accept it as the new incumbent. Otherwise, there is no incumbent update. The first question at this point is: how does one calculate and update the VF approximations f_k(x)? We accomplish the update in the following steps:

1. Generate a new outcome \omega_k, independent of all previous outcomes.
2. Let \pi_k denote the optimal conditional dual vertex for the second stage LP, given inputs (x_k, \omega_k).
Assuming V_{k-1} (possibly empty) is available from previous iterations, set V_k \leftarrow V_{k-1} \cup {\pi_k}.
3. Let J_k = {0, 1, ..., \nu} be the index set of SASA functions at iteration k. Derive two SASA functions, denoted h_{\nu,k} and h_{0,k}, with the former representing the candidate and the latter representing the incumbent. We simply present the calculations for h_{\nu,k}, and recognize that h_{0,k} (i.e., \alpha_{0,k} + \beta_{0,k}^T x) is calculated similarly by replacing x_k with \hat{x}_{k-1} in (3.6):

    h_{\nu,k}(x) = (1/k) \sum_{i=1}^{k} \pi_{i,k}^T [\xi(\omega_i) − C(\omega_i) x] \equiv \alpha_{\nu,k} + \beta_{\nu,k}^T x,    (3.5)

where, for i = 1, ..., k, \pi_{i,k} is a conditional dual vertex for outcome i, calculated as follows:

    \pi_{i,k} \in argmax{ \pi^T [\xi(\omega_i) − C(\omega_i) x_k] : \pi \in V_k }.    (3.6)

4. Delete one of those linear functions h_{j,k-1} for which the dual multiplier obtained by solving (3.4) is zero. Create an updated index set J_k. (One can ensure that at most n_1 + 3 indices are present in J_k.)
5. Let t(j) denote the iteration at which inequality j \in J_k was created. Under the assumption that h \ge 0 for all (x, \omega), the functions j \notin {0, \nu} are updated as follows:

    h_{j,k}(x) \leftarrow (t(j)/k) h_{j,t(j)}(x),   j \in J_k \ {0, \nu}.    (3.7)

Remark 1. The approximations used in SD depend critically on the SASA functions (see (3.5)) derived for both the candidate and the incumbent solutions. These functions are similar in spirit to Benders' cuts; however, they are different in several important ways: a) they use the empirical distribution induced by the sample, b) as seen in (3.6) they can be derived by using approximately optimum shadow prices (in V_k), and c) as iterations proceed, the SASA functions that were generated in past iterations are given less weight because they were created using a smaller sample size (than k). Thus, unlike cuts in Benders' decomposition, each SASA function ultimately fades away, thus avoiding the persistence of poor (sampled) approximations.
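The SASA construction in (3.5)-(3.6) can be sketched as follows (a minimal pure-Python sketch, assuming the dual vertex set V_k and the per-outcome data (\xi(\omega_i), C(\omega_i)) are supplied as plain lists; a real implementation would obtain these from the second stage LPs):

```python
def sasa_coefficients(duals, xis, Cs, x):
    """Form one SASA minorant h(x) = alpha + beta'x as in (3.5)-(3.6): for each
    past outcome i, pick the best dual vertex pi in V_k by argmax pi'(xi_i - C_i x),
    then average the resulting affine pieces over the k outcomes."""
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    matvec = lambda A, v: [dot(row, v) for row in A]
    k, n = len(xis), len(x)
    alpha, beta = 0.0, [0.0] * n
    for xi, C in zip(xis, Cs):
        Cx = matvec(C, x)
        # (3.6): best available dual vertex for this outcome at the current x
        pi = max(duals, key=lambda p: dot(p, [a - b for a, b in zip(xi, Cx)]))
        alpha += dot(pi, xi) / k                         # contributes to alpha
        piC = [sum(pi[r] * C[r][j] for r in range(len(C))) for j in range(n)]
        beta = [b - pc / k for b, pc in zip(beta, piC)]  # contributes to beta
    return alpha, beta
```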
Because they do not cut away any part of the epigraph permanently, we do not refer to these approximations as "cuts" (as in traditional Benders' decomposition).

Let \gamma_k = f_{k-1}(x_k) − f_{k-1}(\hat{x}_{k-1}) denote the predicted change under the VF approximation f_{k-1}. We allow \rho \ge 1 to be chosen in such a way that \gamma_k \le 0. Then the winner of the election using the approximation f_k will be the next incumbent \hat{x}_k, as obtained below, with a fixed parameter r \in (0, 1) (set at 0.2 in our experiments):

    \hat{x}_k = x_k   if f_k(x_k) < f_k(\hat{x}_{k-1}) + r \gamma_k;   \hat{x}_k = \hat{x}_{k-1}   otherwise.    (3.8)

In the event that the incumbent changes, the positioning of the incumbent and candidate inequalities in J_k must also be swapped.

Before presenting the new methodological results of this chapter, we summarize some key results from two previous papers in the following proposition. Parts (a) and (b) below pertain to both the linear master problems of Higle and Sen (1991b) as well as the regularized master problems of Higle and Sen (1994).

Proposition 1. Let {f_k}, {x_k}, {\hat{x}_k} be the sequences of objective function approximations, candidate solutions and incumbent solutions, respectively, generated by the algorithm. Let K denote the sequence of iterations at which the incumbent changes.

(a) If {\hat{x}_k}_{k \in K} \to \hat{x}, then {f_k(\hat{x}_k)}_{k \in K} \to f(\hat{x}) (with probability one). Let (\alpha_{0,k}, \beta_{0,k}) denote the coefficients corresponding to the incumbent SASA function (see step 3 of the statement of the regularized SD algorithm). Then every accumulation point of {(\alpha_{0,k}, c + \beta_{0,k})}_{k \in K} defines a support of f at \hat{x} (with probability one).

(b) Let {\hat{x}_k}_{k=0}^{\infty} and {x_k}_{k=1}^{\infty} be the sequences of incumbent and candidate solutions, respectively, generated by the regularized SD algorithm. With probability one there exists a subsequence of iterations, indexed by a set K*, such that lim_{k \in K*} f_{k-1}(x_k) + (1/2)||x_k − \hat{x}_{k-1}||^2 − f_{k-1}(\hat{x}_{k-1}) = 0, and every accumulation point of {\hat{x}_k}_{k \in K*} is an optimal solution of (3.1) (wp1).

PROOF.
(a) See Theorem 2 in Higle and Sen (1991b).
(b) See Theorem 5 in Higle and Sen (1994).

Part (a) of the above proposition shows that the SD algorithm obtains asymptotically accurate objective function values (wp1), whereas part (b) shows the existence of a subsequence whose accumulation points are optimal (wp1). Note that this does not rule out the possibility of other subsequences. In Theorem 1 we show that the entire sequence of incumbents does converge to an optimal solution (wp1). The existence of the limit is particularly important in a stochastic algorithm because the approximations become more dependable in this case. However, observe that the uniqueness of the limit point does not require the instance to have a unique solution.

Theorem 1. Assume X is a compact set, and the fixed-and-relatively-complete recourse assumption holds. In addition, assume that \rho \ge 1, and that the recourse function h(x, \omega) is non-negative for all x \in X almost surely. Then the sequence {\hat{x}_k} generated by the regularized SD algorithm converges to an optimal solution with probability one.

PROOF.
In the course of the proof of part (b) of Proposition 1, Higle and Sen (1994) show that if there are only finitely many distinct incumbent solutions generated by the algorithm, then the final incumbent solution is an optimum with probability one. In this case, there is clearly a unique solution generated by the algorithm. We prove this result for the case of infinitely many incumbents in Theorem 1. Let K denote the sequence of iterations at which the incumbent changes, and for \tau > 1, let k_0, k_1, k_2, ..., k_\tau \in K. Consider the quantity

    \Delta_\tau = (1/\tau) \sum_{\ell=1}^{\tau} [ f_{k_\ell - 1}(\hat{x}_{k_\ell}) − f_{k_\ell - 1}(\hat{x}_{k_{\ell-1}}) ]    (3.9a)
                = (1/\tau) [ f_{k_\tau - 1}(\hat{x}_{k_\tau}) − f_{k_1 - 1}(\hat{x}_{k_0}) ] + (1/\tau) \sum_{\ell=1}^{\tau-1} [ f_{k_\ell - 1}(\hat{x}_{k_\ell}) − f_{k_{\ell+1} - 1}(\hat{x}_{k_\ell}) ].    (3.9b)

Because of Proposition 1 part (a), the summation term in (3.9b) must approach zero (wp1) as \tau \to \infty (see page 129 of Bartle (1976)). Hence

    lim_{\tau \to \infty} \Delta_\tau = lim_{\tau \to \infty} (1/\tau) [ f_{k_\tau - 1}(\hat{x}_{k_\tau}) − f_{k_1 - 1}(\hat{x}_{k_0}) ]   (wp1).
(3.10)

Because of the fixed and relatively complete recourse assumption, and the compactness of X, the term f_{k_\tau - 1}(\hat{x}_{k_\tau}) − f_{k_1 - 1}(\hat{x}_{k_0}) is bounded. Hence as \tau \to \infty, (3.10) implies

    lim_{\tau \to \infty} \Delta_\tau = 0   (wp1).    (3.11)

Now with \rho \ge 1, the optimality conditions for (3.4) at a candidate solution x_k imply that (see equation (2.6) on page 115 of Higle and Sen (1996b))

    f_{k-1}(x_k) − f_{k-1}(\hat{x}_{k-1}) \le −||x_k − \hat{x}_{k-1}||^2 \le 0.    (3.12)

Focusing on those iterations at which incumbents change (as in the index set K above), we note that \hat{x}_{k_\ell - 1} = \hat{x}_{k_{\ell-1}}. Combining (3.9a), (3.11) and (3.12), we conclude that (with probability one)

    0 = lim_{\tau \to \infty} \Delta_\tau = lim_{\tau \to \infty} (1/\tau) \sum_{\ell=1}^{\tau} [ f_{k_\ell - 1}(\hat{x}_{k_\ell}) − f_{k_\ell - 1}(\hat{x}_{k_{\ell-1}}) ]    (3.13a)
      \le −lim_{\tau \to \infty} (1/\tau) \sum_{\ell=1}^{\tau} ||\hat{x}_{k_\ell} − \hat{x}_{k_{\ell-1}}||^2 \le 0,   k_\ell \in K.    (3.13b)

Therefore we conclude that the sample average change between all incumbent solutions vanishes (wp1). Due to Proposition 1(b), there exists a subsequence indexed by K* such that {\hat{x}_{k_\ell}}_{k_\ell \in K*} \to \hat{x}, an optimal solution, with probability one. Hence there exists a positive integer L_1 < \infty such that ||\hat{x}_{k_\ell} − \hat{x}|| < \epsilon/(2\mu) for k_\ell \in K* and \ell > L_1, where \mu is the diameter of the ball containing the compact set X. Let every sequence that converges to \hat{x} be included in the index set \hat{K}*. For example, choose \delta > \epsilon/(2\mu) > 0 sufficiently large, and define \hat{K}* = {k : ||x_k − \hat{x}|| \le \delta, k \in K}. Define \bar{K}* = K \ K*. If the claim of the theorem is false, then there is a positive probability that \bar{K}* is an infinite set of iterations for which 0 < \sqrt{2\epsilon} < ||\hat{x}_{k_\ell} − \hat{x}|| < \mu for k_\ell \in \bar{K}* with \ell > L_2, a positive integer. Denote L = max{L_1, L_2}.
Now note that the summation in (3.13b) can be written as

    (1/\tau) \sum_{\ell=1}^{\tau} ||\hat{x}_{k_\ell} − \hat{x}_{k_{\ell-1}}||^2
      = (1/\tau) [ \sum_{(k_\ell, k_{\ell-1}) \in K*} ||\hat{x}_{k_\ell} − \hat{x}_{k_{\ell-1}}||^2 + \sum_{(k_\ell, k_{\ell-1}) \in \bar{K}*} ||\hat{x}_{k_\ell} − \hat{x}_{k_{\ell-1}}||^2
          + \sum_{k_\ell \in K*, k_{\ell-1} \in \bar{K}*} ||\hat{x}_{k_\ell} − \hat{x}_{k_{\ell-1}}||^2 + \sum_{k_\ell \in \bar{K}*, k_{\ell-1} \in K*} ||\hat{x}_{k_\ell} − \hat{x}_{k_{\ell-1}}||^2 ]    (3.14a)
      \ge (1/\tau) [ \sum_{\ell > L, k_\ell \in K*, k_{\ell-1} \in \bar{K}*} ||(\hat{x}_{k_\ell} − \hat{x}) − (\hat{x}_{k_{\ell-1}} − \hat{x})||^2 + \sum_{\ell > L, k_\ell \in \bar{K}*, k_{\ell-1} \in K*} ||(\hat{x}_{k_\ell} − \hat{x}) − (\hat{x}_{k_{\ell-1}} − \hat{x})||^2 ]    (3.14b)
      \ge (1/\tau) [ \sum_{\ell > L, k_\ell \in K*, k_{\ell-1} \in \bar{K}*} ( ||\hat{x}_{k_\ell} − \hat{x}||^2 + ||\hat{x}_{k_{\ell-1}} − \hat{x}||^2 − 2 ||\hat{x}_{k_\ell} − \hat{x}|| ||\hat{x}_{k_{\ell-1}} − \hat{x}|| )
          + \sum_{\ell > L, k_\ell \in \bar{K}*, k_{\ell-1} \in K*} ( ||\hat{x}_{k_\ell} − \hat{x}||^2 + ||\hat{x}_{k_{\ell-1}} − \hat{x}||^2 − 2 ||\hat{x}_{k_\ell} − \hat{x}|| ||\hat{x}_{k_{\ell-1}} − \hat{x}|| ) ]    (3.14c)
      \ge (1/\tau) [ \sum_{\ell > L, k_\ell \in K*, k_{\ell-1} \in \bar{K}*} \epsilon + \sum_{\ell > L, k_\ell \in \bar{K}*, k_{\ell-1} \in K*} \epsilon ].    (3.14d)

To make the above arguments completely transparent, observe that in going from (3.14a) to (3.14b) we dropped some non-negative terms and added/subtracted \hat{x}; from (3.14b) to (3.14c) we applied the Cauchy-Schwarz inequality; and from (3.14c) to (3.14d) we used the bounds on the appropriate distances between points of K* and \bar{K}*, i.e., ||\hat{x}_{k_\ell} − \hat{x}||^2 > 2\epsilon for k_\ell \in \bar{K}* and ||\hat{x}_{k_\ell} − \hat{x}|| ||\hat{x}_{k_{\ell-1}} − \hat{x}|| < \epsilon/2 for crossover indices. From (3.14), we have

    (1/\tau) \sum_{\ell=1}^{\tau} ||\hat{x}_{k_\ell} − \hat{x}_{k_{\ell-1}}||^2 \ge (C_\tau / \tau) \epsilon,   where C_\tau \in (0, \tau − L],    (3.15)

where C_\tau reflects the number of crossovers between K* and \bar{K}*. Moreover, the quantity C_\tau / \tau reflects an unbiased (empirical) estimator of the probability of crossovers between K* and \bar{K}*. Thus, if the probability of crossovers is non-zero, then lim_{\tau \to \infty} C_\tau / \tau \ne 0 wp1. Hence the left hand side of (3.15) is strictly greater than zero wp1, contradicting (3.13). Therefore the supposition that the set \bar{K}* has positive probability is untenable, which concludes the proof. ∎

A few comments on analytical predictions (as opposed to computational experiments) are also in order.
In this regard, recall that any SAA implementation separates the computational work along two dimensions: numerical optimization and statistical validation. Royset and Szechtman (2013) explore a variety of combinations based on asymptotic rates for using numerical optimization within SAA. A related method, known as Retrospective Approximations (RA), has been studied in Pasupathy (2010), where the sample average function is allowed to use larger sample sizes as iterations proceed. It is easy to see that RA has similarities with both SAA and SD. Like SAA, it seeks a near-optimal solution to an approximate problem of the same form as SAA (2.21); and like SD, it uses a growing set of outcomes. Indeed, RA is perhaps closest in spirit to a precursor of SD known as the Stochastic Cutting Plane method (SCP, Higle and Sen (1996b)), and for smooth problems, one might expect SCP to have convergence rates similar to RA, asymptotically.

3.3.2 Stopping Rules

As with any optimization algorithm, SD must be terminated in finitely many iterations. Because the expectation operator requires multi-dimensional integration, providing deterministic certificates of optimality for practical instances (with several random variables) is intractable in general. As a result, a tandem of stopping rules, one based on In-Sample estimates and the other on Out-of-Sample tests, has been studied previously in a series of papers (Higle and Sen (1991a), Higle and Sen (1996a), and Higle and Sen (1999)). We will comment on the performance of these tests in the appropriate subsections below. At this point, it suffices to say that previously known hurdles (e.g., the inability to reconcile multiple solutions from replications, relatively large gap estimates, and in some cases time-consuming LP-based bootstrap processes) have been overcome through a fresh view of the Out-of-Sample tests. These stopping rules are presented next.

In-Sample Stopping Rule.
As proposed in Higle and Sen (1999), the In-Sample rule is intended to address two issues:

1. (Dual Vertex Stability.) To assess whether the approximation due to (3.6) exhibits any sensitivity to additional information in the form of new dual vertices.

2. (Primal-Dual Gap Stability.) To recognize whether the estimated primal and dual solutions associated with (3.4) are sensitive to variability due to sampling.

Dual Vertex Stability. At iteration $k$, we assess the impact of new information (new outcomes, new first-stage candidate solutions and, most importantly, new dual vertices) on VF approximations. Note however that the first term of any VF approximation is linear ($c^\top x$) and $c$ is known. Hence the predictive capacity of a VF approximation depends on how well the dual vertices in $V_k$ predict the recourse function realizations $h(x,\omega)$ for any pair $(x,\omega)$ encountered by the algorithm. For any set of runs, suppose that we fix a tolerance level denoted $\epsilon$. For such a run, let $q \in [2, k-1]$ define a window of iterations which will be used to ascertain whether further iterations are meaningful for the purpose of improving the approximation of the recourse function. Towards this end we observe the difference in approximation quality when the recourse function is estimated using $V_q$ versus the larger set $V_k$ in (3.6). The value $q$ represents a window of iterations: we look at what might have happened had we stopped collecting dual vertices at iteration $q$. The impact of the extra $(k-q)$ iterations on the VF approximations is estimated using the following ratio:
$$R^k_q(x) = S^k_q(x) / S^k_k(x), \quad\text{where} \qquad (3.16a)$$
$$S^k_q(x) = \sum_{i=1}^{k} \max\{\pi^\top[\xi(\omega^i) - C(\omega^i)x] \mid \pi\in V_q\} \qquad (3.16b)$$
$$S^k_k(x) = \sum_{i=1}^{k} \max\{\pi^\top[\xi(\omega^i) - C(\omega^i)x] \mid \pi\in V_k\}. \qquad (3.16c)$$
These ratios are calculated whenever we calculate a SASA function at either a candidate ($x^k$) or an incumbent ($\hat x^{k-1}$). By assumption, $S^k_k(x) > 0$, and since $S^k_q(x) \le S^k_k(x)$, we have $0 \le R^k_q(x) \le 1$. Then, we assess the stability of $R^k_q(x)$ by examining its sample mean and variance over the most recent $w(\epsilon)$ iterations.
When these measures are sufficiently close to target thresholds (0.95 for the sample mean, and 0.00001 for the sample variance), we declare the set of dual vertices to be stable. Appendix C provides figures showing the evolution of $R^k_q(x)$ associated with candidate and incumbent solutions for the industrial-strength instances in our study. In our implementation, we use $w(\epsilon)\in\{64, 256, 512\}$ for $\epsilon\in\{\text{Loose }(0.01),\ \text{Nominal }(0.001),\ \text{Tight }(0.0001)\}$, respectively. Finally, to ensure an unbiased estimate of the objective value, we verify that
$$h^{0,k}(\hat x^{k-1},\omega^t) = h(\hat x^{k-1},\omega^t), \quad t = 1,\dots,k. \qquad (3.17)$$

Primal-Dual Gap Stability. Recall that (3.4) is defined using several SASA functions, and hence the primal as well as its dual are random objects. We will consider a primal-dual pair of solutions $(\hat x^{k-1}, \hat\theta_k, \hat\lambda_k)$ which are primal and dual optimal for (3.4) in iteration $k$. Here $\hat\theta_k$ denotes the vector of dual multipliers associated with the SASA functions, whereas $\hat\lambda_k$ denotes the vector of dual multipliers associated with first stage constraints. We wish to ascertain whether the variability of the gap estimate is small enough that this pair of primal and dual solutions may be considered close enough to optimality, i.e., within a specified tolerance. To do so, we use the general concept of bootstrapping set forth in Efron (1979) (see also Singh (1981)). In the context of SD, bootstrapping involves re-sampling each SASA function in $J_k$ to create a re-sampled instance of (3.4). This application of bootstrapping was first used in Higle and Sen (1991a), where primal and dual pairs were both linear programs. Because re-sampled versions of linear programming approximations may render the dual multipliers $(\hat\theta_k, \hat\lambda_k)$ infeasible for the re-sampled approximation, that implementation of the bootstrapping procedure required solving each re-sampled dual problem, thus making it somewhat computationally burdensome.
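The dual-vertex stability check above (the ratio (3.16) plus a mean/variance window test) can be sketched in a few lines. This is a minimal illustration under assumed toy data, not the SD implementation; `sasa_estimate`, `stability`, and their arguments are hypothetical names.

```python
import numpy as np

def sasa_estimate(duals, xis, Cs, x):
    """Sum over sampled outcomes of the best value over a given dual-vertex set,
    i.e., the quantity S(x) in (3.16b)/(3.16c) for that vertex set."""
    total = 0.0
    for xi, C in zip(xis, Cs):
        rhs = xi - C @ x                       # xi(omega) - C(omega) x
        total += max(pi @ rhs for pi in duals)
    return total

def stability(ratios, window, mean_tol=0.95, var_tol=1e-5):
    """Declare the dual-vertex set stable when the ratio R_q^k(x) has a high
    sample mean and a tiny sample variance over the most recent `window` values."""
    recent = ratios[-window:]
    return (len(recent) == window
            and np.mean(recent) >= mean_tol
            and np.var(recent) <= var_tol)

# A ratio R_q^k(x) would be formed as sasa_estimate(V_q, ...) / sasa_estimate(V_k, ...),
# and appended to `ratios` each time a SASA function is built.
```

In this sketch the window length plays the role of $w(\epsilon)$, so a run with nominal tolerance would pass `window=256`.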
In a subsequent paper, Higle and Sen (1999) proposed the primal-dual pair of QPs below, which overcome the computational demands posed by the earlier LP counterpart.

Let $\beta_j$ denote the gradient of affine piece $j$ and let $B_k$ denote the corresponding matrix collecting all affine functions in iteration $k$. Accordingly, let $H_k$ denote the vector of scalars $h_{j,k}(\hat x^{k-1})$. As shown in Higle and Sen (1994), the set $J_k$ has cardinality of at most $n_1 + 3$, thus maintaining a finite bound on the size of the primal master problem. Let $A$ and $b$ define the polyhedron $X = \{x \mid Ax \ge b\}$ and let $b_k$ denote the quantity $b - A\hat x^{k-1}$. Then, using primal decisions $d = x - \hat x^{k-1}$, the primal and dual problems may be stated as
$$u = \text{Min } \eta + c^\top d + \frac{\rho}{2}\|d\|^2 \qquad (3.18a)$$
$$\text{s.t. } \eta - \beta_j^\top d \ge h_{j,k}(\hat x^{k-1}) \quad \forall j\in J_k \qquad (3.18b)$$
$$\hat x^{k-1} + d \in X \qquad (3.18c)$$

$$\ell = \text{Max } H_k^\top\theta + b_k^\top\lambda - \frac{1}{2\rho}\|c + B_k^\top\theta - A^\top\lambda\|^2 \qquad (3.19a)$$
$$\text{s.t. } \mathbf{1}^\top\theta = 1, \quad \theta\ge 0, \quad \lambda\ge 0 \qquad (3.19b)$$

With the above formulations, the point $\hat x^{k-1}$ is mapped to $d = 0$ in the primal, and the multipliers $(\hat\theta_k, \hat\lambda_k)$ are used for the dual. The gap estimate for this pair of solutions will be estimated by constructing a primal-dual pair of problems in which each affine piece is represented by a re-sampled estimate. Very briefly, the idea is as follows. Let $\{\omega^1,\dots,\omega^k\}$ be a random i.i.d. sample of size $k$ with distribution $F$, and let $F_k$ denote the empirical distribution of $\{\omega^1,\dots,\omega^k\}$. Define a random object $T(\omega^1,\dots,\omega^k; F)$, which depends on the distribution $F$. The bootstrap method is used to approximate the distribution of $T(\omega^1,\dots,\omega^k; F)$ by that of $T(\omega^{*1},\dots,\omega^{*k}; F_k)$ under $F_k$, where $\{\omega^{*1},\dots,\omega^{*k}\}$ denotes a random sample of size $k$ drawn from the distribution $F_k$. Next, we summarize an important theorem by Singh (1981).

Proposition 2. Let $\mu = E_F[\omega]$, $\bar\omega_k = \frac{1}{k}\sum_{i=1}^k \omega^i$, $\bar\omega^*_k = \frac{1}{k}\sum_{i=1}^k \omega^{*i}$, and assume $E_F[\omega^2] < \infty$. Let $P$ and $P_k$ denote the probabilities under $F$ and $F_k$, respectively. Then
$$\lim_{k\to\infty} \big|\,P\big(k^{1/2}(\bar\omega_k - \mu) \le s\big) - P_k\big(k^{1/2}(\bar\omega^*_k - \bar\omega_k) \le s\big)\big| = 0 \quad \text{a.s.} \qquad (3.20)$$

PROOF.
See Singh (1981). ∎

Basically, Proposition 2 establishes the convergence (to zero) of the discrepancy between the distribution of $k^{1/2}(\bar\omega_k - \mu)$ and its bootstrap approximation. Essentially, pivotal statistics like the sample average are appropriate for bootstrapping because they are based on linear operators. In order to apply this idea to our setting, we re-sample every SASA function defining the primal (3.18) to obtain a re-sampled primal. The dual corresponds to this re-sampled primal, and therefore has the form (3.19) using the re-sampled data. Thus in our re-sampling process we set up primal and dual problems from which we compute the gap associated with the given primal-dual pair. Note that this procedure re-samples pivotal statistics (i.e., sample averages), and not the duality gap directly, because the latter is not necessarily pivotal. Moreover, since the primal solution $\hat x^{k-1}$ and the dual solution $(\hat\theta_k, \hat\lambda_k)$ are feasible for the respective re-sampled problems, we are able to calculate the re-sampled gap estimates by simply computing the primal and dual objective functions for each re-sampled instance. This eliminates the need to solve LPs as in the original bootstrapping method of Higle and Sen (1991a). Let $u_i - \ell_i$ denote the gap obtained for the $i$-th re-sampled instance, and from these we compute the empirical frequency distribution $F_k(\epsilon)$ to estimate $P(u - \ell \le \epsilon)$. Thus when $F_k(\epsilon) \ge 1 - \gamma$, the replication is terminated, and the next SD replication can begin. In fact, the actual implementation uses a relative tolerance, so that a replication terminates when the duality gap is small relative to the current incumbent value. For our computational experiments, we use $\gamma = 0.05$, so that the In-Sample test requires 95% of the bootstrapped gap estimates to pass the test.

Out-of-Sample Stopping Rule.

The idea of replication in stochastic programming has been adopted in many papers dealing with sample-based algorithms (Mak et al. (1999)).
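The computational shortcut described above — re-evaluating, rather than re-solving, the primal and dual objectives at the fixed incumbent pair for each bootstrap resample — can be sketched as follows. This is a schematic under assumed inputs, not the SD code; `primal_obj` and `dual_obj` stand in for evaluations of (3.18) and (3.19) with re-sampled SASA data, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_gap_test(primal_obj, dual_obj, outcomes, n_boot=1000, eps=1e-2, gamma=0.05):
    """In-Sample gap test sketch: resample the k outcomes with replacement,
    re-evaluate the primal and dual objectives at the *fixed* incumbent
    primal-dual pair (no LP re-solve, since that pair stays feasible for every
    resampled instance), and terminate when at least a (1 - gamma) fraction of
    the bootstrapped gaps u_i - l_i falls below eps."""
    k = len(outcomes)
    gaps = []
    for _ in range(n_boot):
        resample = [outcomes[i] for i in rng.integers(0, k, size=k)]
        gaps.append(primal_obj(resample) - dual_obj(resample))
    return np.mean(np.array(gaps) <= eps) >= 1.0 - gamma
```

In practice `primal_obj` and `dual_obj` would close over $(\hat x^{k-1}, \hat\theta_k, \hat\lambda_k)$ and rebuild the sample-average coefficients $H_k$, $B_k$ from the resampled outcomes before evaluating (3.18a) and (3.19a).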
Because sampling introduces randomness into the algorithm, it is important to characterize errors in a manner that provides statistical performance guarantees. As the reader may have observed from Figure 3.1, the variability of solutions as well as objective values can be significant. Indeed, Linderoth et al. (2006) also report wide disparity among solutions of sampled instances of SSN with a sample size of 5000. A common suggestion is to obtain a preliminary objective estimate for solutions obtained from each replication, and then to successively prune (solutions) and refine objective estimates, until the subset of solutions is small enough to recommend a decision (Linderoth et al. (2006)). In order to overcome issues related to the complexity of multiple replications, Bayraksan and Morton (2011) proposed a sequential sampling scheme, where the increase in sample size was controlled. To the best of our knowledge, this idea has been tested on some of the smaller instances (e.g., GBD, PGP2, 4TERM), and the computational results suggest that for instances with higher variability (e.g., PGP2), multiple replications provide more reliable estimates. In contrast, Nesterov and Vial (2008) suggest multiple replications with small increments to the sample size, followed by the use of an average solution, instead of trying to find the best among the replications. This idea has some similarity with the concept of compromise decisions which we report below. Our Out-of-Sample test will leverage not only the primal solutions $\{x^m\}_{m=1}^M$, but also the value function approximations $\{f^m(x)\}_{m=1}^M$.

To motivate the out-of-sample stopping rule, we use certain concepts from prox-mapping theory, which we summarize below. In this context, the master program (3.4) may be looked upon as creating a sequence $\{x^k\}$ such that
$$x^k = \text{prox}(\rho_m;\ f^m_{k-1};\ \hat x^{k-1}),$$
where we write $\rho_m$ to denote the proximal parameter for the $m$-th replication. In contrast, deterministic algorithms use a deterministic function $f^m$, instead of a stochastic approximation $f^m_{k-1}$.
As a result, optimality for deterministic algorithms can be recognized when one arrives at a fixed point $x^*$ of the prox-mapping, that is, $x^* = \text{prox}(\rho_m; f^m; x^*)$, or simply $x^* = \text{prox}(\rho_m; f^m)$. In-sample optimality tests notwithstanding, stochastic algorithms like SD use sampled approximations $f^m_{k-1}$, and the variability does not vanish for finite $k$. Nevertheless, we will use ideas from deterministic optimization to seek reduced-variance solutions to the "true" problem. When the functions $f^m$ are deterministic, and $X$ is common across replications, one enjoys the following identity:
$$\frac{1}{M}\sum_{m=1}^{M}\text{prox}(\rho_m; f^m) = \text{prox}\Big(\frac{1}{M}\sum_{m=1}^{M}\rho_m;\ \frac{1}{M}\sum_{m=1}^{M} f^m\Big). \qquad (3.21)$$
In a stochastic setting, there is variance associated with the expressions on both sides of (3.21). Nevertheless, the variance on the right hand side can be expected to be smaller, and we formalize this idea in Theorem 2 of this section. With this motivation, we propose to solve the following:
$$\text{Min}_{x\in X}\ \frac{1}{M}\sum_{m=1}^{M}\Big[f^m(x) + \frac{\bar\rho}{2}\|x - x^m\|^2\Big], \qquad (3.22)$$
where $\bar\rho$ is the sample average of $\{\rho_m\}$. We refer to (3.22) as the Compromise Problem and to its solution as the Compromise Decision. The Compromise Problem represents two objectives: the average VF approximation and the sum of squared errors. Let $x_c$ denote the compromise decision, and $\bar x = \frac{1}{M}\sum_m x^m$. Intuitively, $\bar x$ is an optimal solution to the squared-errors part of the objective, and if $x_c$ and $\bar x$ agree, then clearly both are optimal for (3.22).

On a related note, we observe that while it might seem that (3.22) is even more onerous than the individual replications, that is not the case. Because the In-Sample rule already provides $x^m$, the sample average solution $\bar x$ is already available, and so are the VF approximations $\{f^m\}$. Hence all that is required is to warm-start the quadratic program (3.22) using $\bar x$, and if it declares optimality at the start, one can directly use $\bar x$ as the compromise decision.
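A minimal sketch of the Compromise Problem (3.22) for one-dimensional replications, with each $f^m$ represented by its collected affine pieces (cuts). The function names and the grid-search solver are illustrative assumptions, not the quadratic-programming approach used in the actual implementation.

```python
import numpy as np

def cut_value(cuts, x):
    """Evaluate a piecewise-linear VF approximation f^m(x) = max_j (a_j + b_j x)."""
    return max(a + b * x for a, b in cuts)

def compromise_decision(models, x_ms, rho_bar, grid):
    """Compromise Problem (3.22) on a 1-D grid: minimize the average of the
    replications' value functions plus the averaged proximal term."""
    def obj(x):
        return np.mean([cut_value(cuts, x) + 0.5 * rho_bar * (x - xm) ** 2
                        for cuts, xm in zip(models, x_ms)])
    return min(grid, key=obj)

# Two toy replications whose cut collections both have their minimum at x = 1.
cuts1 = [(0.0, -1.0), (-2.0, 1.0)]
cuts2 = [(0.5, -1.0), (-1.5, 1.0)]
x_c = compromise_decision([cuts1, cuts2], [1.0, 1.0], 1.0, np.linspace(0.0, 2.0, 201))
```

Comparing `x_c` with `np.mean(x_ms)` mirrors the sufficient optimality condition of Theorem 2: agreement of the compromise and average decisions signals that both solve (3.23).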
However, since the marginal amount of computation needed to solve the compromise problem (independently of the sample average decision) is not very resource intensive, we calculate both $x_c$ and $\bar x$ independently, so that the reader can verify the similarities between these solutions.

Let $K_m(\epsilon)$ denote the sample size used to construct the SASA function at $x^m$ at termination for a given trial $m$ and $\epsilon > 0$. Define $N = \min\{K_m(\epsilon),\ m = 1,\dots,M\}$, the smallest sample size of the SASA functions used by any of the $M$ approximations at their terminal solutions $x^m$. Of course, $N$ depends on $\epsilon$, but we suppress this dependence for notational convenience.

Remark 2. The terminal solution $x^m$ was used to generate a SASA function using $K_m(\epsilon)$ observations. Recall that these observations are i.i.d., and the in-sample stopping rule ensures (3.17). In order to put the subsequent development in the context of statistical asymptotic theory, we summarize a few key concepts (see Greene (2003)).

- The notation $O_p$ denotes stochastic boundedness: $X_n = O_p(a_n)$ means that for any $\epsilon > 0$, there exists a finite $A > 0$ such that $P(|X_n/a_n| > A) < \epsilon$ for all $n$.
- The sample variance is a consistent estimator of the variance, i.e., $\hat\sigma_m \overset{p}{\to} \sigma_m$.
- Lindeberg-Feller Central Limit Theorem. Let $X_1,\dots,X_M$ be independent random variables with $E[X_m] = \mu_m$ and $\text{var}(X_m) = \sigma^2_m < \infty$. Define $\bar X = \frac{1}{M}\sum_{m=1}^M X_m$, $\bar\mu_M = \frac{1}{M}\sum_{m=1}^M \mu_m$, and $\bar\sigma^2_M = \frac{1}{M}\sum_{m=1}^M \sigma^2_m$. If $\lim_{M\to\infty}\bar\sigma_M = \bar\sigma < \infty$, then $\sqrt{M}(\bar X - \bar\mu_M) \overset{d}{\to} N(0, \bar\sigma^2)$. Note that the CLT implies that $\bar X - \bar\mu_M = O_p(M^{-1/2})$.

The Central Limit Theorem (CLT) implies that $[K_m(\epsilon)]^{1/2}\,[f(x^m) - f^m(x^m)]$ is asymptotically normal $N(0, \sigma^2_m)$, where $\sigma^2_m < \infty$ denotes the variance of $f^m(x^m)$. Since $0 < N \le K_m(\epsilon)$, it follows that the error $|f(x^m) - f^m(x^m)|$ is no greater than $O_p(N^{-1/2})$. This property is useful in the following theorem.

Theorem 2. Suppose that assumptions 1-4 stated at the beginning of this section hold. If $x_c = \bar x$,

a) then $x_c$ solves
$$\text{Min}_{x\in X}\ \bar F_M(x) := \frac{1}{M}\sum_{m=1}^M f^m(x), \qquad (3.23)$$

b) and
$$|f(x_c) - \bar F_M(x_c)| = O_p\big((NM)^{-1/2}\big).$$
(3.24)

Proof:

a) To show that $x_c$ solves (3.23), note that the problem in (3.22) is a convex program and its optimality conditions imply that
$$0 \in \frac{1}{M}\sum_{m=1}^M \partial f^m(x_c) + \mathcal N_X(x_c) + \bar\rho\Big(x_c - \frac{1}{M}\sum_{m=1}^M x^m\Big), \qquad (3.25)$$
where $\mathcal N_X(x_c)$ denotes the normal cone at $x_c$. If $x_c = \bar x$, then the above equation reduces to the optimality conditions of (3.23).

b) In order to show (3.24), note that
$$f(x_c) = f(\bar x) \le \frac{1}{M}\sum_{m=1}^M f(x^m) \qquad (3.26a)$$
$$= \frac{1}{M}\sum_{m=1}^M f^m(x^m) + \frac{1}{M}\sum_{m=1}^M\big(f(x^m) - f^m(x^m)\big) \qquad (3.26b)$$
$$\le \frac{1}{M}\sum_{m=1}^M f^m(x_c) + \frac{1}{M}\sum_{m=1}^M\big(f(x^m) - f^m(x^m)\big). \qquad (3.26c)$$
Here (3.26a) is due to the convexity of $f$, and (3.26c) is due to $f^m(x^m) \le f^m(x_c)$. As indicated in Remark 2, the SD algorithm and the CLT ensure that the error terms $|f(x^m) - f^m(x^m)|$ are bounded by an asymptotically distributed normal random variable: $N(0, \sigma^2_m/N)$. Due to assumptions 1 and 2, there exists $\hat\sigma < \infty$ such that $\sigma_m \le \hat\sigma$. Moreover, the replications are also independent. Hence the average of the $M$ error terms, $\frac{1}{M}\sum_m |f(x^m) - f^m(x^m)|$, is bounded by $O_p((NM)^{-1/2})$. Hence, in combination with the definition of $\bar F_M$ in (3.23), equation (3.26c) and the above arguments imply (3.24). ∎

The first claim in the theorem shows that under replication, a sufficient condition for optimality is $x_c = \bar x$, and as with any optimality test, it is customary to verify this condition by checking whether $\|x_c - \bar x\| \le \epsilon$ for a pre-specified $\epsilon$. What is more important in Theorem 2 is that the compromise solution provides a decision whose total (objective value) error, via bias and variance, is of the order of $(NM)^{-1/2}$. In contrast, each solution $x^m$ has a total (objective value) error of the order of $N^{-1/2}$, which is higher for $M \ge 2$. This variance reduction is one of the key properties of the compromise decision. As shown below, it also provides asymptotic consistency. In this regard, note that when $\epsilon \to 0$, we have $K_m(\epsilon)\to\infty$, and hence $N\to\infty$. As the reader might recall, $N$ depends on $\epsilon$, and in fact, so does $\bar F_M$. We make this dependence explicit by using $\bar F_{\epsilon,M}$ below.
The following theorem is a combination of Theorems 1 and 2.

Theorem 3. Let the assumptions of Theorem 1 hold and let $f^*$ denote the optimal objective value. Let $x^m(\epsilon)$, $\bar x(\epsilon)$ and $x_c(\epsilon)$ denote the quantities analogous to those in Theorem 2 for a given $\epsilon$. Then

a) $\lim_{\epsilon\to 0} \|\bar x(\epsilon) - x_c(\epsilon)\| \to 0$ (wp1).

b) $\lim_{\epsilon\to 0} P(|\bar F_{\epsilon,M}(\bar x(\epsilon)) - f^*| \ge t) \to 0$ for all $t > 0$.

Proof of Theorem 3.

a) Since $\epsilon\to 0$, we have $N\to\infty$ and therefore Theorem 1 implies that $f^m_\epsilon(x^m(\epsilon)) \to f^*$ (wp1). Moreover, Theorem 1 also yields $\{x^m(\epsilon)\to x^m,\ \forall m\}$ (wp1), where $x^m$ denotes an optimal solution for replication $m$. Since $x^m$ is unique (by Theorem 1), so is $\bar x = \frac{1}{M}\sum_m x^m$. Since $\bar x\in X$ and $f$ is convex,
$$f^* \le f(\bar x) \le \frac{1}{M}\sum_m f(x^m) = f^* \quad (\text{wp1}). \qquad (3.27)$$
We also note that $f^m_\epsilon$ denotes the minorant for the $m$-th replication for a round with $N_\epsilon$ samples. Hence,
$$0 \le \lim_{\epsilon\to 0}\bar F_{\epsilon,M}(\cdot) = \lim_{\epsilon\to 0}\frac{1}{M}\sum_m f^m_\epsilon(\cdot) \le \frac{1}{M}\sum_m f(\cdot) \quad (\text{wp1}). \qquad (3.28)$$
Together, expressions (3.27) and (3.28) imply epigraphical nesting (Higle and Sen (1995)). As a result of this property, we conclude that $\bar x \in \text{argmin}\{\lim_{\epsilon\to 0}\frac{1}{M}\sum_m f^m_\epsilon(\cdot)\}$ (wp1); see Higle and Sen (1995) or Proposition 7.30 of Rockafellar and Wets (2009). Also $\bar x \in \text{argmin}\{\frac{1}{M}\sum_m \frac{\bar\rho}{2}\|x - x^m\|^2\}$. Hence $\bar x$ solves the compromise problem (3.22), i.e., $x_c = \bar x$ (wp1). This completes the proof of part a).

b) Because i) $X$ is compact, ii) (3.28) implies that there exists a function which majorizes $|\bar F_{\epsilon,M}(\cdot)|$ over $x\in X$, and iii) stochastic equicontinuity holds because the relatively complete recourse assumption implies the existence of a common Lipschitz constant for the VF approximations, the generic uniform law of large numbers applies (see Newey (1991)). Hence,
$$\lim_{\epsilon\to 0} P\big(|\bar F_{\epsilon,M}(\bar x(\epsilon)) - f^*| \ge t\big) \to 0, \quad \forall t > 0. \qquad (3.29)$$
∎

The increased reliability offered by compromise decisions reported for SSN (see Figure 3.2) might come as a surprise for a problem that has been characterized as "ill-conditioned".
The reader might recall that for the loose tolerance run, Figure 3.2 already showed good agreement between the average and compromise decisions, and a decision-maker might have been satisfied with an objective function upper bound of 9.951 (see Table 3.2). However, if the decision-maker chooses to obtain greater accuracy, then it is easy to resume the SD process for the nominal tolerance using all the information obtained in the course of the previous run (with loose tolerance). Thus, while greater reliability may require more sampling (for instances like SSN), the marginal effort is simply the additional iterations, not an entire run from scratch. Such warm-starting is critical for those decision-makers who seek greater accuracy in decision support.

To give the reader a sense of the computational time involved for each tolerance level of SSN, we note that the CPU seconds for the compromise problems were 5.78 (Loose Tol.), 5.86 (Nominal Tol.) and 6.21 (Tight Tol.), respectively. When compared with the effort required to estimate the objective function at one or more of the points $\{x^m\}$, the solution time of the Compromise Problem is minimal. To close this section, we note that there are several instances in the literature (e.g., CEP1 and STORM) for which longer runs (and even replications) are overkill. By monitoring the coefficient of variation of each decision variable across successive replications, several instances can be accurately solved without excessively long runs. We report these computational results in the following section.

3.4 Further Computational Results

Table 3.4 summarizes all test instances solved by SD under nominal tolerance. BAA99 is a single period multi-product inventory model (see Bassok et al. (1999)) and BAA99-20 is a larger version (20 products) of BAA99. CEP is a capacity expansion model which appears in Higle and Sen (1996b). LandS, LandS2 and LandS3 are three versions of a power generation planning problem (F. V. Louveaux and Smeers (1988)). The original LandS instance
The original LandS instance Chapter3.CompromiseDecisionsinTwo-StageStochasticLinearProgramming 63 hasonerandomvariablewith3outcomes. LandS2has3randomvariableswithatotalof 64scenarios. LandS3has3randomvariableseachdiscretizedtoafinernormaldistribution with a total of 10 6 scenarios. PGP2 is the same genre of instances, but has greater objec- tive function variability in value as well as solution variability (Higle and Sen (1996b)). Themulti-locationtransshipmentmodelcalledRETAILcanbefoundinHereretal.(2006). SSNisthenetworkexpansionmodelpresentedatthebeginningofthischapter. STORMis an air freight scheduling model (Mulvey and Ruszczy´ nski (1995)). 4NODE and 20TERM are both freight fleet scheduling problems, with the former data set appearing on a web site due to A. Felt (http://www4.uwsp.edu/math/afelt/slptestset/download.html), and the latter came to us from Infanger (1999). Along with SSN, STORM and 20TERM are based on industrial applications, and are indicated by an asterisk in Table 3.4. One of the largerdatasetsinthissetisSTORM.Whileitislargebothinsizeandnumberofscenarios, thevariabilityofitsobjectivefunctionisrelativelysmall,asconfirmedbytheratiosR k (x) reportedforSTORMinAppendixC. TABLE 3.4: TestinstanceswithSDunderNominalTolerance Instance 1 st stage 2 nd stage #ofrandom Magnitudeof Name variables/constraints variables/constraints variables scenarios BAA99 2/0 7/4 2 615 BAA99-20 20/0 250/40 20 10 34 CEP 8/5 15/7 3 216 LandS 4/2 12/7 1 3 LandS2 4/2 12/7 3 64 LandS3 4/2 12/7 3 10 6 PGP2 4/2 16/7 3 576 RETAIL 7/0 70/22 7 10 11 SSN* 89/1 706/175 86 10 70 STORM* 121/185 1259/528 117 10 81 4NODE 52/14 186/74 12 32768 20TERM* 63/3 764/124 40 10 12 Because the early computational tests in the Stochastic Programming literature were performedusingdeterministicalgorithms,itbecamecustomarytoreportperformanceby creating alternative instances using different sample sizes. For example, Zverovich et al. 
(2012) reported 16 different sample sizes for 4NODE, corresponding to scenario counts ranging over $2^0, 2^1,\dots, 2^{15}$. In our study, we do not consider these as 16 different instances, but as one instance with $2^{15}$ scenarios, and our goal is to provide statistical guarantees for objective value upper and lower bounds, together with recommendations of the compromise and average solutions, with the former being generated without warm-starting (using the average solution). We made this choice to avoid prompting the compromise problem in any way.

In Table 3.5, we report the quality of average solutions as well as compromise solutions for the instances shown in Table 3.4. Since the bounds on the problems are statistically estimated, we report 95% confidence intervals for all upper and lower bounds in the table. The upper bound confidence intervals are calculated for both average decisions as well as compromise decisions. These confidence intervals are derived by running Monte Carlo experiments to estimate $f(\bar x)$ and $f(x_c)$. We chose the sample size for these estimates to ensure that the half-width is within 1% of the value.

The lower bounds in Table 3.5 are calculated using the SD values $f^m(x^m)$, $m = 1,\dots,M$. To do so, let $L_M = \frac{1}{M}\sum_{m=1}^M f^m(x^m)$, and calculate $s^2_M = \frac{1}{M-1}\sum_{m=1}^M (f^m(x^m) - L_M)^2$. Since $M = 30$, we chose$^1$ $z_{\alpha/2}$ such that $P\{N(0,1) \le z_{\alpha/2}\} = 1 - \alpha/2$ and report
$$\Big[L_M - z_{\alpha/2}\frac{s_M}{\sqrt M},\ L_M + z_{\alpha/2}\frac{s_M}{\sqrt M}\Big], \quad \text{where } \alpha = 0.05. \qquad (3.30)$$

We see that for all instances, the confidence intervals obtained using the nominal tolerance are acceptably close to the lower bound confidence interval. Moreover, in comparing the estimated value of the compromise decision with that of the average decision, we find that the former has either equal or better objective value estimates, except for BAA99 and LandS2, where the discrepancy is only in the third decimal place. Such minor deviations are inevitable in sample-based algorithms.
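The lower-bound interval (3.30) is a standard normal confidence interval over the $M$ replication values; a minimal sketch follows (the function name is illustrative):

```python
import numpy as np
from statistics import NormalDist

def lower_bound_ci(fm_values, alpha=0.05):
    """95% CI of (3.30) for the lower-bound estimate built from M replication
    values f^m(x^m): L_M +/- z_{alpha/2} * s_M / sqrt(M)."""
    v = np.asarray(fm_values, dtype=float)
    M = v.size
    L_M = v.mean()
    s_M = v.std(ddof=1)                        # sample standard deviation
    z = NormalDist().inv_cdf(1.0 - alpha / 2)  # ~1.96 for alpha = 0.05
    half = z * s_M / np.sqrt(M)
    return L_M - half, L_M + half
```

With $M = 30$ terminal values from independent SD replications, the returned pair corresponds to the lower-bound column of Table 3.5.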
These solutions are obtained with nominal tolerance, although the user has the option to be more demanding and seek solutions with tighter tolerance without having to re-start the process "from scratch". This is because SD records all dual vertices and SASA functions for each run, and they can be used immediately to continue further iterations if the user desires. For large scale models like SSN, this is a powerful setup. Another aspect of our study that is unique to SD is its ability to report the sample sizes that were necessary to obtain the quality of solution. This is particularly important in large scale applications where optimization and simulation are tightly coupled (see the wind energy integration model for the state of Illinois in Gangammanavar et al. (2015)).

$^1$ As is standard in the statistical literature, $\alpha$ denotes the confidence level, whereas elsewhere in the chapter $\alpha$ denotes constants of the minorants in SD.

TABLE 3.5: Objective upper and lower bounds and corresponding 95% confidence intervals of all instances solved by SD

Instance  | Lower Bound (95% CI)      | Average Decision Upper Bound (95% CI) | Compromise Decision Upper Bound (95% CI)
BAA99     | -240.864 (±5.988)         | -236.204 (±5.451)         | -236.203 (±5.451)
BAA99-20  | -322870.496 (±1337.854)   | -318393.648 (±3994.053)   | -318451.800 (±3994.502)
CEP       | 353080.809 (±10530.916)   | 354175.320 (±1687.537)    | 354175.320 (±1687.537)
LandS     | 381.120 (±1.174)          | 381.761 (±1.309)          | 381.761 (±1.309)
LandS2    | 227.789 (±1.628)          | 227.393 (±0.668)          | 227.395 (±0.668)
LandS3    | 225.712 (±1.319)          | 225.541 (±0.640)          | 225.541 (±0.640)
PGP2      | 447.339 (±2.157)          | 447.955 (±1.406)          | 447.928 (±1.405)
RETAIL    | 153.995 (±2.573)          | 154.411 (±0.772)          | 154.406 (±0.772)
SSN       | 9.736 (±0.118)            | 9.927 (±0.050)            | 9.927 (±0.050)
STORM     | 15493958.503 (±8826.814)  | 15481852.286 (±48193.646) | 15481760.131 (±48208.190)
4NODE     | 446.979 (±0.068)          | 447.058 (±0.392)          | 446.951 (±0.394)
20TERM    | 253649.385 (±168.573)     | 254515.479 (±1005.008)    | 254514.672 (±1004.934)

In Table 3.6 we report the sample sizes used for each instance (together with its
variability over 30 replications). From this table one can observe that although STORM is very large in terms of variables, constraints and scenarios, it requires a sample size of only 300 to get reasonable solutions. This is about the same number of samples required for much smaller instances such as PGP2 and BAA99. From our study, we also observe that 20TERM and RETAIL take more iterations, and in fact, the latter takes longer for objective function estimation even though the instance (RETAIL) is smaller than 20TERM in terms of decision variables and constraints. As expected, it is the variability that makes instances take longer. Finally, it is clear that SSN is by far the most demanding of these instances, but each replication can be completed in about 90 seconds for nominal tolerance. It was shown in Figure 3.2 that this solution has very low error relative to the average solution. From this point of view, the user may decide to accept the decision based on the fact that the difference between the compromise and average solutions is small enough. However, as shown in section 3.2, tighter statistical bounds can also be obtained by demanding greater precision on the bounds. Such solution reports are likely to provide decision makers a great deal of confidence in adopting decisions recommended by the SLP solver.
TABLE 3.6: Sample size (std dev), solution time for SD (std dev), solution time for the Compromise Problem, and evaluation time (of compromise decision and average decision) for all instances solved by SD

Instance  | Avg Sample Size (Std Dev) | Avg CPU secs for SD (Std Dev) | CPU secs, Compromise Problem | CPU secs, Est. Obj of Compromise Decision | CPU secs, Est. Obj of Average Decision
BAA99     | 298.03 (58.82)    | 0.89 (0.23)   | 0.01 | 0.25   | 0.21
BAA99-20  | 330.50 (13.68)    | 1.49 (0.13)   | 0.04 | 0.13   | 0.14
CEP       | 263.03 (6.73)     | 0.79 (0.08)   | 0.01 | 5.61   | 5.43
LandS     | 260.27 (2.77)     | 0.76 (0.07)   | 0.01 | 0.21   | 0.22
LandS2    | 264.27 (4.86)     | 0.83 (0.08)   | 0.01 | 1.29   | 1.26
LandS3    | 263.57 (5.99)     | 0.78 (0.07)   | 0.00 | 0.78   | 0.74
PGP2      | 284.63 (27.60)    | 0.74 (0.10)   | 0.01 | 0.31   | 0.31
RETAIL    | 460.47 (150.55)   | 1.53 (0.67)   | 0.00 | 4.01   | 3.98
SSN       | 2286.90 (341.71)  | 90.50 (20.56) | 5.86 | 112.31 | 111.52
STORM     | 300.50 (5.33)     | 19.56 (1.46)  | 8.83 | 0.06   | 0.05
4NODE     | 406.07 (49.44)    | 5.66 (0.85)   | 3.04 | 0.01   | 0.01
20TERM    | 453.07 (5.98)     | 7.28 (0.39)   | 1.13 | 0.13   | 0.12

3.5 Conclusions

Our contributions to the SP literature may be viewed through several lenses. These points of view are captured by the questions that were posed in section 3.1, and in this context, we offer the following comments.

1. Given that SP problems can be demanding, greater accuracy may call for re-runs that should re-use previously discovered structures of an instance. How can such structures be re-used for the purposes of warm-starting?

• Because SD uses "sampling-on-the-fly" (i.e., adaptive sampling), it creates a sequence of SASA functions based on re-using dual vertices. This approach records information gathered during the course of the algorithm (e.g., dual vertices and SASA functions), so that future iterations can be resumed without having to re-discover them from scratch. This capability is also important in the context of Stochastic MIPs, just as fast LP solvers and warm-starting are important for deterministic MIPs.

2.
Given that SLP models have a very special (convex) structure, how should sampling-based methods be designed to take advantage of such structure?

• Convex non-smooth optimization forms the backbone of SD, which derives its stability from using a proximal term. The latter also suggests a quadratic programming dual problem, which is used for the In-Sample tests (using bootstrapping of primal and dual estimates) as well as the Out-of-Sample tests (using low variance value functions in the Compromise Problem). Such integration of numerical optimization and statistical computing promotes algorithmic synergy for more efficient solution.

3. Sampling-based SP methods borrow variance reduction techniques from the simulation literature. Are there other variance reduction techniques that are appropriate for SP, but are not considered in the simulation literature?

• Variance reduction in simulation is primarily aimed at performance (objective function) estimation. This is available in SD through common random numbers when comparing incumbent and candidate solutions. While this is important for SP models, it needs to go beyond performance estimation because of the emphasis on decision-making. In order to appreciate this point, note that past attempts at solving SSN have not only required high performance computing, but disagreement between alternative runs can leave users in a quandary. Instead, concurrence between compromise and sample average decisions provides a principled and computationally effective approach. In addition, Theorem 2 provides variance reduction of replications from $O_p(N^{-1/2})$ to $O_p((NM)^{-1/2})$. Moreover, the bias is reduced as well. Figure 3.1 highlights the empirical success of this approach.

4. Parallel architectures in SP have traditionally been used to process bunches of scenarios. Are there other ways to use parallel architectures which permit the solution of industrial-strength models?

• The speed of replications can be improved significantly by using parallelization.
This approach to SP (and of course SD) needs greater attention. In the event that computational deadlines preclude a meaningful number of replications, one should verify whether the objective function estimate reported by SD falls in the 95% confidence interval for the upper bound resulting from one run. If it does not belong to the confidence interval, the analyst ought to be careful to consider faster machines, parallel processing, and perhaps tighter tolerances.

5. Should SP algorithms report lower bound estimates for the "true" problem so that the quality of a recommended decision can be ascertained?

• It is difficult to establish the quality of a solution without both upper bound and lower bound estimates. The former is usually a statistical estimation problem, if a decision is available. The latter (lower bounds) is an optimization issue, since it is the optimization step that provides the lower bounding estimate. Clearly this is an essential ingredient of any optimization method.

In summary, we have demonstrated that two-stage SLP models are now solvable to very reasonable levels of accuracy, even for industrial-strength models, in reasonable time, using "run-of-the-mill" computing devices. This bodes well for users of the SLP modeling paradigm.

Chapter 4

Stochastic Decomposition with Random Cost

4.1 Introduction

The class of problems studied in this chapter may be stated as follows.

$$\min\; f(x) = c^\top x + E[h(x,\tilde\omega)] \quad \text{s.t. } x \in X = \{x \mid Ax \le b\} \subseteq \Re^{n_1} \qquad (M)$$

where $\tilde\omega$ denotes an appropriately defined (vector) random variable, and the function $h$ is defined as follows:

$$h(x,\omega) = \min\; d(\omega)^\top y \quad \text{s.t. } Dy = \xi(\omega) - C(\omega)x,\; y \ge 0,\; y \in \Re^{n_2} \qquad (S)$$

Note that unlike previously studied applications of SD, model (S) allows random cost coefficients. As a result, one can no longer map the random outcomes onto a finite number of vertices of the second-stage dual polyhedron. To make this obvious, it is best to examine the dual formulation of the recourse function, which may be written as follows.

$$h(x,\omega) = \max\; \pi^\top[\xi(\omega) - C(\omega)x] \qquad (4.1a)$$
$$\text{s.t.}$$
$$D^\top \pi \le d(\omega) \qquad (4.1b)$$

Note that the right hand side in (4.1b) depends on outcomes $\omega$ of the random variable $\tilde\omega$. Hence, in the case of continuous random variables, or when there is a very large number of scenarios, the number of distinct dual polyhedra can become explosively large. With such instances, the growth in dual vertices can easily overwhelm the current SD methodology. In this chapter, we show that by mapping the randomness to a finite set of commonly shared directions, one can manage this growth, at least in those instances where the number of random variables in $d(\omega)$ is not too large.

In section 4.2, we first describe the original SD algorithm, which works on problems with a deterministic second stage cost. Several key observations regarding the problem with random cost are discussed in the following section. We formalize all key algorithmic ideas in sections 4.3 and 4.4. Feasibility issues are taken care of in section 4.5. The new version of SD dealing with random cost (RCSD) is presented in section 4.6, followed by computational experience with a freight transportation application in section 4.7. We conclude this chapter in section 4.8.

4.2 Description of the SD Algorithm

In the interest of convenience for the reader, we present a high level description of the SD algorithm before proceeding to the advances that are necessary to accommodate random costs in the second stage. In order to simplify the presentation, we set aside certain algorithmic details such as incumbent updating rules and stopping rules. The focus is on the construction of minorants of the recourse function and, in particular, on the approximation of the second stage dual solutions with the uncovered set of bases.

To simplify the notation, in several contexts we denote outcomes of random variables using a superscript over the parameter's name. For instance, $\xi^t$, $C^t$ and $d^t$ are equivalent to $\xi(\omega^t)$, $C(\omega^t)$ and $d(\omega^t)$ correspondingly. The letters $t, i$ are by default the indices of outcomes and bases. The current iteration number is denoted $k$.
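To see concretely how random costs move the dual polyhedron $\{\pi \mid D^\top\pi \le d(\omega)\}$ in (4.1b), the following sketch enumerates the basic feasible solutions of a tiny dual polyhedron for two different cost outcomes. The matrix $D$ and the cost vectors are toy data of our own, not from any instance in this chapter.

```python
import numpy as np
from itertools import combinations

def dual_vertices(D, d, tol=1e-9):
    """Enumerate basic feasible solutions of {pi : D^T pi <= d} by trying
    every basis (every set of m tight constraints)."""
    m, n = D.shape
    verts = []
    for basis in combinations(range(n), m):
        A = D[:, basis].T
        if abs(np.linalg.det(A)) < tol:
            continue                       # not a basis
        pi = np.linalg.solve(A, d[list(basis)])
        if np.all(D.T @ pi <= d + tol):    # keep only dual feasible points
            verts.append(np.round(pi, 6))
    return verts

# Toy fixed recourse matrix with three second-stage variables.
D = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
v1 = dual_vertices(D, np.array([1.0, 2.0, 4.0]))  # one vertex: (1, 2)
v2 = dual_vertices(D, np.array([1.0, 2.0, 2.5]))  # two vertices: (1, 1.5), (0.5, 2)
print(v1, v2)
```

Even on this toy example, changing only the third cost coefficient replaces the vertex set entirely, which is why enumerating dual vertices per outcome does not scale and a shared, basis-indexed representation is needed.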
Algorithmic Description of SD

Assuming a) compactness of the first stage decisions ($X$), b) relatively complete recourse, and c) non-negative recourse values, the SD algorithm is provided in Algorithm 1.

Algorithm 1 SD Algorithm
1: Initialization. Assume that $x^0 \in X$ is given. Let $V^0 \leftarrow \emptyset$ and $k = 0$.
2: $k \leftarrow k+1$. Generate $\omega^k$, an outcome of $\tilde\omega$. Solve (S) instantiated by $\omega^k$ and obtain the associated dual vertex $\pi(x^k,\omega^k)$; set $V^k \leftarrow V^{k-1} \cup \{\pi(x^k,\omega^k)\}$.
3: Define and update the piecewise linear approximation.
Calculate approximated dual multipliers:
$$\pi_t^k \in \arg\max\{\pi^\top(\xi^t - C^t x^k) \mid \pi \in V^k\}, \quad t = 1,\dots,k. \qquad (4.2)$$
Evaluate the cut coefficients $(\alpha_k^k, \beta_k^k)$:
$$\alpha_k^k = \frac{1}{k}\sum_{t=1}^{k}(\xi^t)^\top \pi_t^k, \qquad \beta_k^k = \frac{1}{k}\sum_{t=1}^{k}(-C^t)^\top \pi_t^k \qquad (4.3)$$
Update previous cuts:
$$\alpha_t^k \leftarrow \frac{k-1}{k}\,\alpha_t^{k-1}, \qquad \beta_t^k \leftarrow \frac{k-1}{k}\,\beta_t^{k-1}, \quad t = 1,\dots,k-1 \qquad (4.4)$$
The new cut is added to the master problem and all previous cuts are updated.
4: Solve the master problem to obtain $x^{k+1}$. Repeat from step 2.

The original SD (see Higle and Sen (1991b) and Higle and Sen (1996b)) addresses problems with randomness in the second stage right hand side $\xi$ and technology matrix $C$. In that case, all subproblems share the same dual polyhedron $\Pi = \{\pi \mid D^\top\pi \le d\}$. However, in the presence of random cost coefficients, each outcome of the random variables sets up a subproblem with its own dual polyhedron $\Pi^t = \{\pi \mid D^\top\pi \le d(\omega^t)\}$. The main effort in this chapter is devoted to designing extensions which allow us to map the outcomes $d^t$ to a finite set based on index sets associated with basis matrices of $D$. Given the fixed recourse nature of the problems we are studying, this provides a fruitful avenue to explore.

4.3 Sparsity Preserving Decomposition of Second-Stage Dual Vectors

An important notation used frequently here is the index set of basic variables. Every time a subproblem is solved to optimality, a corresponding optimal basis is discovered. We record the index set of the basis with the notation $B_i$, and accordingly, at iteration $k$ we will have index sets of basic variables denoted $\{B_1, B_2, \dots, B_k\}$.
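Step 3 of Algorithm 1 above can be sketched in a few lines of numpy. The function and variable names and the toy data are ours, and the master-problem side of the algorithm is omitted; this is only a sketch of the argmax selection (4.2), the new cut (4.3), and the rescaling of old cuts (4.4).

```python
import numpy as np

def argmax_dual(V, xi_t, C_t, x):
    """(4.2): pick the stored dual vertex maximizing pi^T (xi_t - C_t x)."""
    scores = [pi @ (xi_t - C_t @ x) for pi in V]
    return V[int(np.argmax(scores))]

def new_cut(V, xis, Cs, x):
    """(4.3): coefficients (alpha, beta) of the k-th cut at iterate x,
    averaging over the k outcomes observed so far."""
    k = len(xis)
    alpha, beta = 0.0, np.zeros_like(x)
    for xi_t, C_t in zip(xis, Cs):
        pi = argmax_dual(V, xi_t, C_t, x)
        alpha += (xi_t @ pi) / k
        beta += -(C_t.T @ pi) / k
    return alpha, beta

def rescale_old_cuts(cuts, k):
    """(4.4): multiply previous cut coefficients by (k-1)/k."""
    return [((k - 1) / k * a, (k - 1) / k * b) for a, b in cuts]

# Tiny demo: two stored dual vertices, one outcome, one iterate.
V = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
xis, Cs = [np.array([2.0, 1.0])], [np.eye(2)]
x = np.array([0.5, 0.5])
alpha, beta = new_cut(V, xis, Cs, x)
print(alpha, beta)  # the vertex (1, 0) wins, so alpha = 2.0, beta = (-1, 0)
```

The resulting affine function $\alpha + \beta^\top x$ is the sample-average minorant that gets appended to the master problem.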
Correspondingly, the index sets of non-basic variables are $\{N_1, N_2, \dots, N_k\}$. We use $D_B$ for the submatrix of $D$ associated with the basic columns and, similarly, $D_N$ for the submatrix of non-basic columns. Thus at the $k^{th}$ iteration, the dual price observed for the $k^{th}$ outcome satisfies:

$$D_{B_k}^\top \pi = d_{B_k}(\omega^k). \qquad (4.5)$$

Letting $\bar d = E_{\tilde\omega}[d(\tilde\omega)]$, we have

$$d(\omega^k) = \bar d + \Delta(\omega^k). \qquad (4.6)$$

Using (4.6), equation (4.5) can be rewritten as

$$D_{B_k}^\top \pi = \bar d_{B_k} + \Delta_{B_k}(\omega^k). \qquad (4.7)$$

For a given basis $B_k$, we can always decompose $\pi^k = \nu^k + \theta_k^k$ such that

$$D_{B_k}^\top(\nu^k + \theta_k^k) = \bar d_{B_k} + \Delta_{B_k}(\omega^k) \qquad (4.8)$$

and

$$D_{B_k}^\top \nu^k = \bar d_{B_k}, \qquad D_{B_k}^\top \theta_k^k = \Delta_{B_k}(\omega^k). \qquad (4.9)$$

Hence, letting

$$\nu^k = (D_{B_k}^\top)^{-1}\,\bar d_{B_k}, \qquad \theta_k^k = (D_{B_k}^\top)^{-1}\,\Delta_{B_k}(\omega^k), \qquad (4.10)$$

we observe that $\nu^k$ is computed only once for each newly discovered basis $B_k$, and we are decomposing $\pi^k = \nu^k + \theta_k^k$ with the plan to compute $\theta_k^k$ using the sparse vector $\Delta_{B_k}(\omega^k)$. This takes advantage of the sparsity of the vector $\Delta_{B_k}^t$ for previous ($t = 1,2,\dots,k-1$) and future ($t > k$) outcomes. However, it is important to note that if we replace the realization $\omega^k$ by some other realization $\omega^t$, $t \ne k$, in (4.10), then the resulting vector $\nu^k + \theta_t^k$ may not be dual feasible, although there may be other realizations $\omega^j$, $j \ne k, t$, for which $\theta_j^k$ may be feasible. To recognize this, we need a convenient characterization of dual feasibility. In the following section, we introduce the notation and key observations for dealing with random cost.

4.4 Description of the Dual Vertex Set

Let us now apply the above decomposition idea to describe the entire dual vertex set. Let $R$ denote the subset of variables in the second stage which have a random cost coefficient, and define $e_j$ as the unit vector with 0's in all positions and 1 in position $j$. Then, for a given basis index set $B$, and $j \in B \cap R$, we define

$$\gamma_j = (D_B^\top)^{-1} e_j \qquad (4.11)$$

as the $j^{th}$ column of the inverse of $D_B^\top$. Now consider a basis index set $B_i$, and let

$$\Gamma^i = \{\gamma_j^i = (D_{B_i}^\top)^{-1} e_j \mid j \in B_i \cap R\} \qquad (4.12)$$

be the set of such columns of the basis inverse.
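The decomposition in (4.10) can be sketched directly with numpy. The recourse matrix, mean cost, perturbation, and basis below are toy data of our own; the point is only that $\nu$ depends on $\bar d$ alone while $\theta$ depends on the (sparse) perturbation alone.

```python
import numpy as np

def decompose_dual(D, basis, d_bar, delta):
    """Split pi = nu + theta per (4.10): nu from the mean cost d_bar,
    theta from the sparse perturbation delta."""
    DBt = D[:, basis].T                          # D_B^T
    nu = np.linalg.solve(DBt, d_bar[basis])      # nu = (D_B^T)^{-1} d_bar_B
    theta = np.linalg.solve(DBt, delta[basis])   # theta = (D_B^T)^{-1} delta_B
    return nu, theta

# Toy instance: 2 constraints, 3 second-stage variables.
D = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
d_bar = np.array([1.0, 2.0, 4.0])
delta = np.array([0.0, 0.5, 0.0])   # only variable 1 has a random cost
basis = [0, 1]

nu, theta = decompose_dual(D, basis, d_bar, delta)
# D_B^T (nu + theta) must equal d_bar_B + delta_B, as in (4.8).
assert np.allclose(D[:, basis].T @ (nu + theta), d_bar[basis] + delta[basis])
print(nu, theta)
```

For a new outcome, only the solve against `delta` has to be repeated; `nu` is reusable across all outcomes sharing the basis, which is the computational point of the decomposition.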
We will refer to these vectors as shifting directions for the corresponding columns with random costs. In general, one may use any basis index set $B_i$ and its shifting directions $\Gamma^i$ to represent $d(\omega^t)$. That is, for any basis index set $B_i$ and a given outcome $\omega^t$ we have $\nu^i + \theta_t^i$, where

$$\theta_t^i = \sum_{j \in B_i \cap R} \Delta_j(\omega^t)\, \gamma_j^i. \qquad (4.13)$$

Since some of these combinations may not be dual feasible, we will need to incorporate a method which identifies (for a given $\omega^t$) only those index sets $B_i$ which admit dual feasibility. This subset of feasible dual vertices (associated with $\omega^t$) will be denoted $V_t^k$, whereas the entire collection (feasible or not) will be given by $\tilde V_t^k$. The entire set of basic solutions uncovered can be described as follows:

$$\tilde V_t^k = \{\nu^i + \theta_t^i \mid i = 1, 2, \dots, k\}, \qquad (4.14)$$

and the subset of these which are dual feasible for each $t \le k$ will be denoted $V_t^k$. The dual feasibility test is described next.

4.5 Checking Dual Feasibility

The following theorem formalizes the dual feasibility test.

Theorem 4. Let $\Pi(0) = \{\pi \mid D^\top\pi \le \bar d\}$ be a nonempty polyhedral set, and let $V(0)$ be the set of basic feasible solutions (BFS) of $\Pi(0)$. Similarly, define $\Pi(\Delta(\omega)) = \{\pi \mid D^\top\pi \le \bar d + \Delta(\omega)\}$ as the perturbed polyhedral set and $V(\omega)$ as the set of BFS of $\Pi(\Delta(\omega))$. Given a feasible basis index set $B$ and a perturbation vector $\Delta(\omega)$, the following statements are equivalent:

1) A BFS $\nu \in V(0)$ can be mapped to a new BFS $\nu + \theta(\omega) \in V(\omega)$, where $\theta = (D_B^\top)^{-1}\Delta_B(\omega)$ is a parametric vector of $\omega$.

2) $G\bar d + G\Delta(\omega) \ge 0$, where $G = [-D_N^\top (D_B^\top)^{-1},\; I]$.

PROOF. 1) $\Longrightarrow$ 2): For $\Pi(\Delta(\omega))$, we have the following system for the given basis $B$ and perturbation $\Delta(\omega)$:

$$D_B^\top \nu + D_B^\top \theta = \bar d_B + \Delta_B(\omega) \qquad (4.15a)$$
$$D_N^\top \nu + D_N^\top \theta \le \bar d_N + \Delta_N(\omega). \qquad (4.15b)$$

From the equality (4.15a) and $\nu = (D_B^\top)^{-1}\bar d_B$, we have

$$\theta = (D_B^\top)^{-1}\Delta_B(\omega). \qquad (4.16)$$

From the inequality (4.15b), we have the following:

$$0 \le \bar d_N + \Delta_N(\omega) - D_N^\top\nu - D_N^\top\theta = \bar d_N + \Delta_N(\omega) - D_N^\top(D_B^\top)^{-1}\bar d_B - D_N^\top(D_B^\top)^{-1}\Delta_B(\omega)$$
$$= [-D_N^\top(D_B^\top)^{-1},\; I]\,[\bar d_B^\top, \bar d_N^\top]^\top + [-D_N^\top(D_B^\top)^{-1},\; I]\,[\Delta_B(\omega)^\top, \Delta_N(\omega)^\top]^\top = G\bar d + G\Delta(\omega). \qquad (4.17)$$

Note that the vectors $\bar d = [\bar d_B^\top, \bar d_N^\top]^\top$ and $\Delta(\omega) = [\Delta_B(\omega)^\top, \Delta_N(\omega)^\top]^\top$ need to be appropriately arranged according to the order of the basis rows.

2) $\Longrightarrow$ 1): We can start from (4.17) and arrive back at (4.15b). ∎

The implication of Theorem 4 is that, for any given basis, $G\bar d$ is a dense matrix-vector multiplication that only needs to be calculated once, whereas $G\Delta(\omega)$ is a sparse matrix-vector multiplication which can be carried out efficiently for a large number of scenarios $\Delta(\omega^1), \dots, \Delta(\omega^k)$. We provide an illustrative example in Appendix E to explain the concept in detail.

4.5.1 Further Notes on the Dual Feasibility Test

The dual feasibility test is conducted using the following inequality:

$$[-D_N^\top(D_B^\top)^{-1},\; I]\,[\Delta_B(\omega)^\top, \Delta_N(\omega)^\top]^\top \ge \bar g \qquad (4.18)$$

where $\bar g = [D_N^\top(D_B^\top)^{-1},\; -I]\,\bar d$ and $\Delta(\omega) = d(\omega) - \bar d$.

In the following discussion, we first consider the general case where randomness occurs in multiple columns, and then we restrict our treatment to the special case where the randomness appears in only one column.

Multiple Random Cost Coefficients

In order to implement Theorem 4 in a computationally effective manner, some streamlining is necessary. Depending on the location of the random cost coefficients, we have the following two cases for a given basis $B$ and outcome $\omega$:

1) Suppose that the random variables appear only in the non-basic components $d_N(\omega)$. Then we have

$$[-D_N^\top(D_B^\top)^{-1},\; I]\,[\Delta_B(\omega)^\top, \Delta_N(\omega)^\top]^\top = [-D_N^\top(D_B^\top)^{-1},\; I]\,[0^\top, \Delta_N(\omega)^\top]^\top = \Delta_N(\omega). \qquad (4.19)$$

From (4.19) it follows that we can verify dual feasibility by checking $\Delta_N(\omega) \ge \bar g$. This check can be carried out very efficiently since any feasible outcome lies inside an orthant.

2) If the random variables appear in places other than $d_N(\omega)$, then we need to explicitly check the feasibility of the remapped basic solutions using the inequality (4.18).
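The test in condition 2) of Theorem 4 can be sketched as follows. The matrix $D$, cost vector, and basis below are toy data of our own; the precomputation of $G$ is done once per basis, and only the cheap product $G\Delta(\omega)$ is repeated per outcome.

```python
import numpy as np

def basis_test_data(D, basis):
    """Per-basis precomputation for Theorem 4: G = [-D_N^T (D_B^T)^{-1}, I].
    Also returns the basic-first column order that G's columns assume."""
    nonbasis = [j for j in range(D.shape[1]) if j not in basis]
    DBt_inv = np.linalg.inv(D[:, basis].T)
    G = np.hstack([-D[:, nonbasis].T @ DBt_inv, np.eye(len(nonbasis))])
    return G, list(basis) + nonbasis

def dual_feasible(G, order, d_bar, delta, tol=1e-9):
    """Condition 2) of Theorem 4: G d_bar + G Delta(omega) >= 0."""
    return bool(np.all(G @ d_bar[order] + G @ delta[order] >= -tol))

# Toy instance: basis {0, 1} of a 2x3 fixed recourse matrix D.
D = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
d_bar = np.array([1.0, 2.0, 4.0])
G, order = basis_test_data(D, [0, 1])
print(dual_feasible(G, order, d_bar, np.zeros(3)))                # True
print(dual_feasible(G, order, d_bar, np.array([0., 0., -2.])))    # False
```

In the second call, lowering the non-basic cost by 2 makes the mapped point violate (4.15b), so the basis no longer yields a dual vertex for that outcome.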
Single Random Cost Coefficient

In some applications, one random cost coefficient may affect multiple decision variables simultaneously. For example, the Federal Reserve's quarterly adjustment of the interest rate might impact multiple investment decisions at the same time. Similarly, the oil price could impact multiple links of a transportation network, and the orange juice price might affect operations of multiple locations of a restaurant chain.

For the case when there is one random cost coefficient $d(\omega)$ which affects a variety of different second stage variables ($y_j$, $j \in J$), that is, the collective cost is modeled as $\sum_{j\in J} d(\omega)\, y_j$, we recommend that the accumulated quantity of these variables be represented as one new variable, say $z = \sum_{j\in J} y_j$, and that $d(\omega)$ then be used as the cost coefficient of the single variable $z$. In other words, scalability of our approach may depend on how the model is constructed, and the modeler is well advised to minimize the number of second stage variables which have random costs associated with them. As a result, the outcome appears either in $d_N(\omega)$ or in $d_B(\omega)$ for any basis. In the non-basic random cost case, the high dimensional orthant check reduces to a single dimensional bound check. In the latter case (i.e., the variable with random cost is basic), the feasibility check requires us to verify whether the outcome belongs to a certain interval, as shown below.

Recall that $\gamma_j = (D_B^\top)^{-1} e_j$, $j \in B \cap R$, and $|B \cap R| = 1$. Hence,

$$[-D_N^\top(D_B^\top)^{-1},\; I]\,[\Delta_B(\omega)^\top, \Delta_N(\omega)^\top]^\top = -D_N^\top \gamma_j\, \Delta_j(\omega) \ge \bar g, \qquad (4.20)$$

which implies that $lb \le \Delta_j(\omega) \le ub$, where

$$lb = \max_{i \in I_1}\left\{\frac{\bar g_i}{(-D_N^\top\gamma_j)_i}\right\} \text{ with } I_1 = \{i \mid (-D_N^\top\gamma_j)_i > 0\},$$
$$ub = \min_{i \in I_2}\left\{\frac{\bar g_i}{(-D_N^\top\gamma_j)_i}\right\} \text{ with } I_2 = \{i \mid (-D_N^\top\gamma_j)_i < 0\}. \qquad (4.21)$$

Note that the subscript $i$ in the above equations indicates the row number of the column vector.

Before closing this section, we present the modification necessary for the "argmax" procedure of SD.
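As a concrete illustration of the interval test (4.21) before turning to the argmax modification, the sketch below computes $[lb, ub]$ for a toy basis. All names, the matrix $D$, and the data are ours; `j_pos` is the position of the random-cost variable within the basis.

```python
import numpy as np

def feasible_interval(D, basis, j_pos, d_bar):
    """(4.21): the range of Delta_j(omega) for which basis B stays dual
    feasible when the single random-cost variable is basic."""
    nonbasis = [c for c in range(D.shape[1]) if c not in basis]
    DBt_inv = np.linalg.inv(D[:, basis].T)
    gamma = DBt_inv[:, j_pos]                 # gamma_j = (D_B^T)^{-1} e_j
    coeff = -(D[:, nonbasis].T @ gamma)       # -D_N^T gamma_j
    order = list(basis) + nonbasis
    G = np.hstack([-D[:, nonbasis].T @ DBt_inv, np.eye(len(nonbasis))])
    g_bar = -(G @ d_bar[order])               # g_bar = [D_N^T(D_B^T)^{-1}, -I] d_bar
    lb = max((g_bar[i] / coeff[i] for i in range(len(coeff)) if coeff[i] > 0),
             default=-np.inf)
    ub = min((g_bar[i] / coeff[i] for i in range(len(coeff)) if coeff[i] < 0),
             default=np.inf)
    return lb, ub

D = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
d_bar = np.array([1.0, 2.0, 4.0])
lb, ub = feasible_interval(D, [0, 1], 0, d_bar)
print(lb, ub)  # -inf 1.0: the basis survives any drop, but at most a rise of 1
```

One can verify the answer directly: for this basis, dual feasibility requires $(1 + \Delta_0) + 2 \le 4$, i.e., $\Delta_0 \le 1$, which is exactly the interval returned.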
By conducting a feasibility check on each element of the set $\tilde V_t^k$ (see (4.14)), we obtain the set of dual basic feasible solutions, denoted $V_t^k$. The argmax procedure is then carried out column-wise with respect to Table 4.1. Note that some combinations in the table might not be feasible (infeasible solutions are crossed out).

TABLE 4.1: The structure of dual vertices. (The entry for basis $B_2$ under $\omega^2$ is crossed out as an example of an infeasible combination.)

$B_i \backslash \omega^t$ | $\omega^1$ | $\omega^2$ | $\omega^3$ | ...
$B_1$ | $\nu^1+\theta_1^1$ | $\nu^1+\theta_2^1$ | $\nu^1+\theta_3^1$ | ...
$B_2$ | $\nu^2+\theta_1^2$ | (infeasible)       | $\nu^2+\theta_3^2$ | ...
$B_3$ | $\nu^3+\theta_1^3$ | $\nu^3+\theta_2^3$ | $\nu^3+\theta_3^3$ | ...

At iteration $k$, for a given $\omega^t$, the argmax procedure (see (4.2)) will go through the $t^{th}$ column and identify

$$\pi_t^k \in \arg\max\{\pi^\top(\xi^t - C^t x^k) \mid \pi \in V_t^k := \text{subset of } \tilde V_t^k \text{ containing dual feasible solutions}\}. \qquad (4.22)$$

4.6 Algorithm for Recursively Building $V_t^k$

In this section, we focus on the algorithmic procedure for recursively building the dual vertex sets $V_t^k$. The method is presented in Algorithm 2. The idea is to construct the new set $V_k^k$ first and then update all previous sets $V_t^{k-1}$ to $V_t^k$ for all $t < k$. A pseudo code for this algorithm appears in Appendix D, and an illustrative example appears in Appendix E.

Algorithm 2 Algorithm for building the dual vertex set $V_t^k$
1: Initialization: $V_k^k \leftarrow \{\pi(x^k,\omega^k)\}$.
2: For the basis uncovered at the $k^{th}$ iteration, calculate and store $\nu^k$, $\Gamma^k$, $G^k$ and $\bar g^k$:
$$\nu^k = (D_{B_k}^\top)^{-1}\bar d_{B_k}, \quad \Gamma^k = \{\gamma_j^k \mid j \in B_k \cap R\}, \quad G^k = [-D_{N_k}^\top(D_{B_k}^\top)^{-1},\; I], \quad \bar g^k = -G^k \bar d,$$
where $\gamma_j^k$ is as defined in (4.12).
3: Define $V_k^k$: $V_k^k \leftarrow \bigcup_{i=1}^{k-1}\{\nu^i + \theta_k^i \mid G^i\Delta(\omega^k) \ge \bar g^i\} \cup V_k^k$, where $\theta_k^i = \sum_{j\in B_i\cap R}\Delta_j(\omega^k)\,\gamma_j^i$.
4: Update $V_t^{k-1}$ to $V_t^k$ for $t = 1,\dots,k-1$: $V_t^k \leftarrow V_t^{k-1} \cup \{\nu^k + \theta_t^k \mid G^k\Delta(\omega^t) \ge \bar g^k\}$, where $\theta_t^k = \sum_{j\in B_k\cap R}\Delta_j(\omega^t)\,\gamma_j^k$.

4.7 Supply Chain - Freight Transportation Application

Consider a supply chain network of manufacturing plants, distribution centers and customers.
The problem is to determine the monthly production and inventory levels of each facility, and the monthly shipping quantities between network nodes, such that the total expected cost of operating the global supply chain is minimized while satisfying customer demands over the specified planning horizon. This problem appears in You et al. (2009). However, the data is not provided in the publication, so we created a version with a similar mathematical formulation. Here we summarize the sequence of events of this system as follows:

1. Inventory, transshipment and production decisions are made at the beginning of the time period.
2. The transshipment decisions are fulfilled immediately at the starting point of the shipping route, while the fulfillment time at the destination depends on the order date, node pair, product index and transportation mode. The production decision is fulfilled immediately in the first stage.
3. Random demands and freight rates are realized in the second stage.
4. Future inventory, transshipment and production decisions are modeled in the second stage.

We formulate this problem with 2 plants, 3 distribution centers, 14 transportation links and 6 customers. This problem is then scaled up to a larger (more realistic) size, similar to that of the problem from You et al. (2009), in which two products are considered. There are a total of 5 plants, 13 distribution centers, 131 transportation links and 46 customers. There are two major differences between the original data set and ours: 1) we formulate the model using three months as the planning horizon, with the first month as the first stage and the remaining months as the second stage; 2) we model around 10% of the freight rates as random variables. The size of the problem is summarized in Table 4.2.

In this experiment, 30 replications of SD runs are conducted. The objective lower and upper bounds and the corresponding 95% confidence intervals for the problem are provided in Table 4.3.
This table also reports pessimistic gaps, computed as $(u + u_{width}) - (l - l_{width})$, as well as the average gap between upper and lower bound ($u - l$). In some cases, when pessimistic gaps are too demanding to reduce, we recommend that decision makers use the average gap as the measure of optimality. The sample size and solution time information are given in Table 4.4. Further details regarding the concept of compromise and average decisions can be found in Chapter 3 of this dissertation.

TABLE 4.2: Supply Chain - Freight Transportation instances tested with SD

Instance | 1st stage vars/constraints | 2nd stage vars/constraints | # of random variables | Magnitude of scenarios
SCFT-6   |  30/18  |  86/36   |  14 | 10^9
SCFT-46  | 602/174 | 348/1480 | 186 | 10^129

TABLE 4.3: Computational Results (Objective Value Estimates and Optimality Gap Estimates)

Instance | Lower Bound (95% CI) | Average Decision Upper Bound (95% CI) | Compromise Decision Upper Bound (95% CI) | Pessimistic Gap | Average Gap
SCFT-6   |  495670.630 (±3414.656) |  495191.826 (±963.145)   |  495191.839 (±963.145)   |  3899.010 (0.79%) |  -478.791 (0.10%)
SCFT-46  | 8551249.099 (±2432.583) | 8585048.702 (±19765.153) | 8585044.672 (±19764.278) | 55992.434 (0.65%) | 33795.573 (0.39%)

TABLE 4.4: Computational Results (Sample Sizes and CPU times)

Instance | Avg. Sample Size (Std Dev) | Avg. CPU secs for SD (Std Dev) | CPU secs: Compromise Problem | CPU secs: Eval. Compromise Decision | CPU secs: Eval. Average Decision
SCFT-6   | 277.43 (13.91) |   1.78 (0.12)   |   0.042 | 2.63 | 2.65
SCFT-46  | 443.90 (40.39) | 864.00 (219.29) | 236.65  | 1.90 | 1.87

4.8 Conclusion

In this chapter, we present an extension of SD for solving two-stage stochastic linear programs with random cost. Our extension uses the structure of linear programming, which suggests that we can map the impact of uncertainty, including continuous random variables, to scaling a finite number of vectors associated with dual bases of the problem.
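The gap columns of Table 4.3 can be reproduced from the bound estimates and their confidence interval half-widths; the helper name below is ours.

```python
def gaps(l, l_width, u, u_width):
    """Pessimistic gap (u + u_width) - (l - l_width) and average gap u - l,
    as reported in Table 4.3."""
    return (u + u_width) - (l - l_width), u - l

# SCFT-6 row of Table 4.3 (compromise-decision upper bound).
pess, avg = gaps(495670.630, 3414.656, 495191.839, 963.145)
print(round(pess, 3), round(avg, 3))  # 3899.01 -478.791
```

Note that the average gap can be negative (as for SCFT-6) because both bounds are statistical estimates; the pessimistic gap guards against this by widening both intervals in the unfavorable direction.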
It can be shown that problems with a single random cost coefficient (e.g., interest rate uncertainty) can be easily managed, and we also show that the feasibility check for such a problem can be carried out efficiently for either continuous or discrete random variables with a large number of outcomes. Based on this observation, users are advised to formulate their models with as few random costs as possible. If multiple random costs exist, they should be grouped as much as possible to reduce the total number of random cost coefficients present in the objective. We should also mention that when the recourse model has a second stage network structure, the tree structure associated with a basis can greatly enhance the efficiency of the calculations proposed above. Computational experiments are carried out for a global supply chain operation problem. For this large scale problem, we are able to obtain high quality solutions based on the percentage pessimistic, as well as average, gaps. All experiments finish within a reasonable amount of computing time on a personal desktop machine. The multiple replications scheme improves the solution quality by solving a compromise problem. Furthermore, when one needs faster turn-around, the decomposition scheme can be implemented to run the replications concurrently on distributed/parallel machines.

Chapter 5

Operations Management Applications

5.1 Introduction

Uncertainty is ubiquitous in practice. Whether we are making decisions regarding energy production, investment planning in financial markets, or production planning under demand uncertainty, the ability to model uncertainty helps managers gain insights into situations which may otherwise go unnoticed and, worse, lead to circumstances which might have been avoided by models which provide decisions that are resilient to data uncertainty. For this reason, the area of operations management has had a long history of studying models which accommodate uncertainty.
Unfortunately, until recently, these managers have often had to rely on models which are somewhat ad hoc in their handling of uncertainty. There are many reasons for this, the most prominent among them being the fact that stochastic programming (SP) models (including two-stage SP models) have been considered just too demanding to address instances which accommodate data uncertainty, correlations among random variables, and especially situations in which there are facility capacity constraints, financial penalties for service level failures, and other conditions arising from the highly volatile nature of many business environments.

As suggested elsewhere in this dissertation, SP has long been recognized as a well justified paradigm for decision making under uncertainty. Also well documented is the assessment that this class of models can be computationally so demanding that approximations and heuristics have become commonplace in many OM applications. This is not to say that heuristics are necessarily unworthy from a managerial perspective; recent research (e.g., Artinger et al. 2015) argues that under widespread uncertainty, simple organizational heuristics might be acceptable, although they are often inferior when one is able to estimate probability and risk through quantitative means. With the explosive growth of computing power, and data sets which are fairly reliable, there is a need to move towards a well justified paradigm such as SP. Unfortunately, the growth of big data has not yet translated into the growth of decisions with big data. In this chapter, we investigate the potential of using SP models in several OM models that have appeared in the literature. Our purpose is to examine the possible advantages of using this formal paradigm to design principled decision-making tools for models which have appeared in the literature.
We will compare results from SP with those from the approximations that have appeared in the literature, and examine whether there are benefits to adopting SP methodology for such decision models.

One of the major reasons for this chapter is that many OM applications are considered resource intensive, and daily savings can add up to a significant edge in competitiveness. For instance, it is well accepted that inventory policy for high end goods (e.g., automobiles) can be a major revenue driver, and using well justified models like SP may lead to major advantages. With new developments such as those reported in this dissertation, we wish to demonstrate that the time is ripe for a re-evaluation of the widespread notion that SP is unable to address applications of modest size, let alone those which are realistic in size.

Nowadays, statistical analysis tends to adopt the framework described in Figure 5.1. For example, if a linear regression model cannot accurately describe the relationship among variables, then more data or a nonlinear regression model might be used to obtain a more accurate description. After the model is created, a user might use a different set of data to test the "goodness of fit", which can be categorized as a validation process.

As proposed in Sen et al. (2015), the integrated analytics work-flow (shown in Figure 5.2) closely represents the framework we apply in this chapter.

FIGURE 5.1: Current Framework for Statistical Analytics. (Flowchart: data feeds descriptive/predictive models (e.g., historical, statistical, probabilistic), which feed validation models (e.g., simulation); feedback loops ask whether the data/model is descriptive enough, whether the predicted quality of the decision is acceptable, and which layer(s) should provide greater fidelity.)

Descriptive analytics
FIGURE 5.2: Framework for Integrated Analytics. (Flowchart: data feeds descriptive/predictive models, then prescriptive models (e.g., decision and optimization), then validation models (e.g., simulation), with the same feedback loops as in Figure 5.1.)

provides knowledge about the structure/pattern of the system using historical data. Predictive analytics utilizes past data to forecast the behavior of the system in the future. Both descriptive and predictive analytics can provide input data to the prescriptive model, which focuses on providing decisions for the system. In this chapter, the SP model represents the prescriptive part of the integrated analytics. The validation process plays a critical role in our analysis. For the final decision provided to the user, we validate its quality using a different set of data. By exploiting the special structure of two-stage problems and analyzing shadow price stability and the resampled approximation's variability, we show that the quality of the solutions is guaranteed empirically by SD's stopping rule. Confidence intervals for the lower and upper bounds are provided to support decision making.

5.2 Operation and production management models

Models for this class of applications arise in a variety of settings. For example, inventory management is one of the most intensely studied issues in operations management, and it has given rise to many important OR paradigms (e.g., EOQ models, (s,S) policies, perishable goods inventory and others). In most of these applications, uncertainty is a major driving force because revenues, market share, and customer goodwill can be severely undermined by the inability to manage the various sources of uncertainty. First off, it is common not to have a firm (deterministic) grasp on demand, and in these circumstances, management resorts to strategies which will help mitigate uncertainty. For instance, in Rao et al. (2004), the authors describe alternative ways in which semiconductor manufacturers, memory chip producers, and others use downward substitution to manage uncertainty.
In such a strategy, higher speed processors (or memory chips) are used in cases where the demand for lower speed/memory machines exceeds expectations. When there is a small number of product lines, mathematical analysis of these inventory models provides insights which can then be used to devise heuristics for larger and more realistic problems in which some of the assumptions could be violated. For instance, it is not uncommon for substitution to hold only partly because of design incompatibilities. In such situations, it is common practice to still use the heuristic, although its quality may not be similar to the analytical approximations used when justifying the heuristic. The model which we include in our study first appeared in Bassok et al. (1999), and back then, it was suggested by the authors that SP approaches were unable to solve instances such as the one they presented (with two normally distributed demands). This chapter will provide evidence to the contrary.

Another class of inventory models where pooling risk has become common arises in situations where inventory is distributed over multiple locations. The variability across the different outlets can be pooled by using a centralized warehouse from which the retailers order periodically; when better estimates of demand are known, the retailers redistribute their holdings so that the system as a whole can better meet demand. In recent years, such models have been solved using simulation optimization (Herer et al. (2006)), where the ordering policy is obtained by optimizing a simulation in which gradient estimates are obtained using infinitesimal perturbation analysis. Using the instance presented in Herer et al. (2006), we will demonstrate that an SP approach not only provides excellent solutions, but also provides estimates of solution quality. The latter is one of the major strengths of SP methodology.

Finally, we address a model which can be used for logistics under uncertainty.
The problem is to determine the monthly production and inventory levels of each facility, and the monthly shipping quantities between network nodes, such that the total expected cost of operating the global supply chain is minimized while satisfying customer demands over the specified planning horizon. In this problem, customer demands and freight rates are random variables.

For each of the above models, we present studies addressing scalability and solution quality using two different approaches to SP. It turns out that the more popular approach (known as Sample Average Approximation) continues to have difficulties with the OM models which we investigate in this chapter. However, the stochastic decomposition method, which has been the focus of this dissertation, is more than capable of providing high quality solutions with reasonable computational effort, on ordinary desktop machines. Based on these studies, it is time to suggest that models of practical size can be addressed using the Stochastic Decomposition method for SLP models.

The main contributions of this chapter can be summarized as follows: 1) we present results for very large scale problems which have never been solved before, at least in the SP community, and most of these large-scale instances are deemed challenging for an SP solver; 2) as a consequence of this study, we also demonstrate the power of SP, as well as of SD.

5.2.1 Multi-product inventory problem with substitution

BAA is a single period multi-product inventory model (see Bassok et al. (1999)). The system described by the authors consists of a retailer making first stage ordering decisions and second stage allocation decisions so that profit is maximized.

The sequence of events of the system can be stated as follows:

1. At the beginning, place orders ($x_i$) for multiple products.
2. The orders are fulfilled immediately (before the start of the second stage), so the ordered quantity ($x_i$) can be used to satisfy demands in the second stage.
3. Random demands $\xi_i(\omega)$ are realized in the second stage.
4.
Based on the observations of the demand levels, allocation decisions ($w_{ij}$) are made to satisfy demands. Satisfied demands are rewarded and unsatisfied demands are penalized in the second stage objective at unit rates $a_{ij}$ and $\pi_i$, respectively.
5. Excess products ($v_i$) are salvaged at the end of the period at unit rate $s_i$.

The following notation is used in the model formulation.

Sets and indices:
$N$ : total number of products, indexed by $i$.

First stage decision variables:
$x_i$ : the amount of product $i$ to be ordered.

First stage data/parameters:
$b_i$ : upper limit on the ordering decision $x_i$.
$c_i$ : cost incurred by ordering $x_i$ units of product $i$.

Second stage decision variables:
$w_{ij}$ : the amount of product $i$ used to satisfy demand for product $j$.
$u_i$ : the amount of unsatisfied demand for product $i$.
$v_i$ : the ending inventory level of product $i$.

Second stage data/parameters:
$a_{ij}$ : revenue earned by the substitution decision $w_{ij}$.
$s_i$ : unit salvage rate for product $i$.
$\pi_i$ : backorder cost for product $i$.

The mathematical formulation of this problem is stated as follows:

$$\max\; f(x) = -\sum_{i=1}^{N} c_i x_i + E[h(x,\tilde\omega)] \quad \text{s.t. } 0 \le x_i \le b_i,\; i = 1,\dots,N, \qquad \text{(BAA-M)}$$

where the first term of the objective represents the cost of production, and $\tilde\omega$ denotes an appropriately defined (vector) random variable affecting the demands $\xi_i$. The function $h$ is defined as follows:

$$h(x,\omega) = \max\; \sum_{i=1}^{N}\sum_{j=1}^{i} a_{ji} w_{ji} + \sum_{i=1}^{N} s_i v_i - \sum_{i=1}^{N} \pi_i u_i$$
$$\text{s.t. } u_i + \sum_{j=1}^{i} w_{ji} = \xi_i(\omega), \quad i = 1,\dots,N,$$
$$v_i + \sum_{j=i}^{N} w_{ij} = x_i, \quad i = 1,\dots,N,$$
$$u_i, v_i \ge 0, \quad i = 1,\dots,N, \qquad w_{ij} \ge 0, \quad 1 \le i \le j \le N. \qquad \text{(BAA-S)}$$

The first term of the objective is the revenue earned by satisfying demands. The second and third terms represent the salvage value of unsold products and the penalty incurred for unsatisfied demands. The two sets of constraints are the supply-demand and inventory balance equations, correspondingly.

In Bassok et al. (1999), the authors present a greedy heuristic for solving this problem. We use SD to solve the 2-product inventory model, which we named BAA99.
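For intuition about the recourse in (BAA-S), a toy greedy allocation in the spirit of the Bassok et al. heuristic can be sketched as below. This is not their exact procedure: the priority order (own product first, then the nearest lower-indexed substitute) and the omission of the revenues $a_{ij}$ are simplifying assumptions of ours; the LP in (BAA-S) optimizes over all $w_{ij}$ jointly.

```python
def greedy_allocate(x, demand):
    """Toy downward-substitution allocation: product i may serve demand j >= i
    (the w_ij of model (BAA-S)). Each demand j is served first from its own
    stock, then from lower-indexed leftovers."""
    n = len(x)
    stock = list(x)
    w = [[0.0] * n for _ in range(n)]
    for j in range(n):
        need = demand[j]
        for i in range(j, -1, -1):       # own product first, then substitutes
            take = min(stock[i], need)
            w[i][j] += take
            stock[i] -= take
            need -= take
    return w, stock                       # allocations w_ij and salvage v_i

w, leftover = greedy_allocate([5.0, 3.0], [4.0, 5.0])
print(w, leftover)  # product 0's spare unit substitutes for product 1
```

In this run, one leftover unit of product 0 covers part of product 1's excess demand; the remaining unit of demand for product 1 would be backordered at rate $\pi_1$ in (BAA-S).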
BAA99-20 is a larger version (20 products) of BAA. We further scale it up to 256 products with BAA99-256.

5.2.2 Multi-location Transshipment Problem

The multi-location transshipment model, simply called "RETAIL", can be found in Herer et al. (2006) and Zhao and Sen (2006). In this problem, multiple retailers make coordinated inventory decisions such that the operational cost is minimized.

The sequence of events of this system is described as follows:

1. Retailers make ordering decisions s_i in the first stage.
2. Orders are fulfilled before the start of the second stage.
3. Random demands ξ_i(ω) are realized in the second stage.
4. Sales decisions f_i and transshipment decisions t_ij are made. It costs c_ij to ship from retailer i to retailer j. Holding and backorder costs are assessed on the ending inventory level e_i and the backlogged amount r_i.

Here is a list of the notation used in this problem.

Sets and indices:
N ≡ total number of retailers, indexed by i.

First-stage decision variables:
s_i ≡ the amount of product retailer i orders up to (the order-up-to level of the inventory system).

Second-stage decision variables:
e_i ≡ the ending inventory of retailer i.
f_i ≡ the amount used to satisfy demand at retailer i.
q_i ≡ the amount retailer i orders to reach the replenishment level s_i.
r_i ≡ the amount of unsatisfied demand at retailer i.
t_ij ≡ the amount transshipped from retailer i to retailer j.

Second-stage data/parameters:
c_ij ≡ the transshipment cost of shipping from retailer i to retailer j.
h_i ≡ the unit inventory holding cost at retailer i.
p_i ≡ the unit backorder cost at retailer i.

The mathematical formulation is stated as follows:

    Min  f(s) = E[h(s, ω̃)]
    s.t. s_i ≥ 0,  i = 1, ..., N,                    (RETAIL-M)

where s_i represents the order-up-to level. Note that in this problem, all retailers (i = 1, 2, ..., N) cooperate with each other to make the replenishment decisions, and ω̃
denotes an appropriately defined (vector) random variable affecting the demands ξ_i. The function h is defined as follows:

    h(s, ω) = Min  ∑_{i=1}^{N} h_i e_i + ∑_{i=1}^{N} ∑_{j≠i} c_ij t_ij + ∑_{i=1}^{N} p_i r_i
    s.t.  f_i + ∑_{j≠i} t_ij + e_i = s_i,  i = 1, ..., N,
          f_i + ∑_{j≠i} t_ji + r_i = ξ_i(ω),  i = 1, ..., N,
          ∑_{i=1}^{N} r_i + ∑_{i=1}^{N} q_i = ∑_{i=1}^{N} ξ_i(ω),
          e_i + q_i = s_i,  i = 1, ..., N,
          e_i, f_i, q_i, r_i ≥ 0,  i = 1, ..., N,
          t_ij ≥ 0,  i = 1, ..., N,  j = 1, ..., N,                    (RETAIL-S)

The first and second sets of constraints are node balance constraints for the retailers and the customers, respectively. The third constraint represents node balance for the whole system (treat the entire transshipment network as a single node and write down its balance constraint). The last set of equality constraints represents the inventory transition for each retailer.

5.2.3 Supply Chain Operations with Random Demands and Freight Rates

Consider a supply chain network of manufacturing plants, distribution centers and customers. The problem is to determine the monthly production and inventory levels of each facility, and the monthly shipping quantities between network nodes, such that the total expected cost of operating the global supply chain is minimized while satisfying customer demands over the specified planning horizon. This problem appears in You et al. (2009). However, the data is not provided in that publication, so we created a version with a similar mathematical formulation.

The sequence of events of this system can be described as follows:

1. Inventory, transshipment and production decisions (I, F, S, W) are made at the beginning of the time period. F denotes inter-facility shipments and S denotes facility-to-customer shipments.
2.
The transshipment decisions are dispatched immediately at the origin of the shipping route, while their fulfillment time at the destination depends on the shipping-time parameter λ, indexed by the node pair (k, k′), the product index j and the transportation mode index m. The production decision W is fulfilled immediately in the first stage.

3. Random demands and freight rates are realized in the second stage.
4. Second-stage inventory, transshipment and production decisions (I, F, S, W) are made for t > 1.

We follow the notation used in You et al. (2009).

Sets and indices:
K ≡ set of plants and distribution centers, indexed by k.
K_P ≡ set of plants, indexed by k.
K_DC ≡ set of distribution centers, indexed by k.
R ≡ set of customers, indexed by r.
J ≡ set of products, indexed by j.
M ≡ set of transportation modes, indexed by m.
T ≡ set of time periods, indexed by t.

First-stage decision variables (t = 1):
F_{k,k′,j,m,t} ≡ the amount of product j shipped between facilities (k, k′) with mode m.
I_{k,j,t} ≡ the inventory level of product j at facility k.
S_{k,r,j,m,t} ≡ the amount of product j shipped between facility k and customer r with mode m.
W_{k,j,t} ≡ the production level of product j at facility k, where k ∈ K_P.

First-stage data/parameters (t = 1):
I⁰_{k,j} ≡ the initial inventory level of product j at facility k.
γ_{k,k′,j,m,t} ≡ the shipping cost of product j from k to k′ with mode m.
γ_{k,r,j,m,t} ≡ the shipping cost of product j from k to r with mode m.
h_{k,j,t} ≡ the inventory cost of product j at facility k.
θ_{k,j,t} ≡ the throughput cost of product j at facility k.
η_{k,j,t} ≡ the backorder cost of product j at facility k.
Q_{k,j} ≡ the capacity of product j at plant k, where k ∈ K_P.
Iᵐ_{k,j,t} ≡ the minimum inventory requirement of product j at facility k.
λ_{k,k′,j,m} ≡ the shipping time of product j from facility k to facility k′ with mode m.
λ_{k,r,j,m} ≡ the shipping time of product j from facility k to customer r with mode m.

Second-stage decision variables (t > 1):
F_{k,k′,j,m,t} ≡ the amount of product j shipped between facilities (k, k′) with mode m.
I_{k,j,t} ≡ the inventory level of product j at facility k.
S_{k,r,j,m,t} ≡ the amount of product j shipped between facility k and customer r with mode m.
W_{k,j,t} ≡ the production level of product j at facility k, where k ∈ K_P.
SF_{r,j,t} ≡ the amount of unsatisfied demand for product j from customer r, where r ∈ R.

Second-stage data/parameters (t > 1):
γ_{k,k′,j,m,t} ≡ the (random) shipping cost of product j from k to k′ with mode m.
γ_{k,r,j,m,t} ≡ the (random) shipping cost of product j from k to r with mode m.

Note that the model assumes that the variability of demand levels and shipping costs is larger for time periods further away from the current time period. We assume all parameters of the current month are deterministic, while shipping costs and customer demands are random in all future months. Here we consider the current month as the first stage, and the next two months as the second stage. The first stage is represented below as (SCP-M), while the second stage is given by the formulation (SCP-S):

    f(I_{t=1}, F_{t=1}, S_{t=1})
    = Min  ∑_{k∈K} ∑_{k′∈K} ∑_{j∈J} ∑_{m∈M} ∑_{t=1} γ_{k,k′,j,m,t} F_{k,k′,j,m,t}
         + ∑_{k∈K} ∑_{r∈R} ∑_{j∈J} ∑_{m∈M} ∑_{t=1} γ_{k,r,j,m,t} S_{k,r,j,m,t}
         + ∑_{k∈K} ∑_{k′∈K} ∑_{j∈J} ∑_{m∈M} ∑_{t=1} θ_{k,j,t} F_{k,k′,j,m,t}
         + ∑_{k∈K} ∑_{k′∈K} ∑_{j∈J} ∑_{m∈M} ∑_{t=1} θ_{k,j,t} S_{k,k′,j,m,t}
         + ∑_{k∈K} ∑_{j∈J} ∑_{t=1} h_{k,j,t} I_{k,j,t}
         + E[h(I_{t≥2}, F_{t≥2}, S_{t≥2}, ω̃)]
    s.t.
      ∑_{k′∈K} ∑_{m∈M} F_{k,k′,j,m,t} + ∑_{r∈R} ∑_{m∈M} S_{k,r,j,m,t}
        = I⁰_{k,j} − I_{k,j,t} + W_{k,j,t} + ∑_{k′∈K} ∑_{m∈M} F_{k′,k,j,m,t−λ_{k′,k,j,m}},  ∀ j, k ∈ K_P, t = 1,
      ∑_{k′∈K} ∑_{m∈M} F_{k,k′,j,m,t} + ∑_{r∈R} ∑_{m∈M} S_{k,r,j,m,t}
        = I⁰_{k,j} − I_{k,j,t} + ∑_{k′∈K} ∑_{m∈M} F_{k′,k,j,m,t−λ_{k′,k,j,m}},  ∀ j, k ∈ K_DC, t = 1,
      ∑_{k∈K} ∑_{m∈M} S_{k,r,j,m,t−λ_{k,r,j,m}} + SF_{r,j,t} ≥ d_{r,j,t},  ∀ r, j, t = 1,
      W_{k,j,t} ≤ Q_{k,j},  ∀ j, t = 1, k ∈ K_P,
      I_{k,j,t} ≥ Iᵐ_{k,j,t},  ∀ k, j, t = 1,
      I_{k,j,t}, F_{k,k′,j,m,t}, S_{k,k′,j,m,t} ≥ 0,  ∀ k, k′, j, m, t = 1.        (SCP-M)

The model includes five types of constraints: mass balance constraints for the production plants, the distribution centers and the customers, together with constraints for production capacity and minimum inventory requirements. And ω̃
denotes an appropriately defined (vector) random variable affecting the customer demands d and the freight rates γ. The function h is defined as follows:

    h(I_{t≥2}, F_{t≥2}, S_{t≥2}, ω)
    = Min  ∑_{k∈K} ∑_{k′∈K} ∑_{j∈J} ∑_{m∈M} ∑_{t≥2} γ_{k,k′,j,m,t}(ω) F_{k,k′,j,m,t}
         + ∑_{k∈K} ∑_{r∈R} ∑_{j∈J} ∑_{m∈M} ∑_{t≥2} γ_{k,r,j,m,t}(ω) S_{k,r,j,m,t}
         + ∑_{k∈K} ∑_{k′∈K} ∑_{j∈J} ∑_{m∈M} ∑_{t≥2} θ_{k,j,t} F_{k,k′,j,m,t}
         + ∑_{k∈K} ∑_{k′∈K} ∑_{j∈J} ∑_{m∈M} ∑_{t≥2} θ_{k,j,t} S_{k,k′,j,m,t}
         + ∑_{k∈K} ∑_{j∈J} ∑_{t≥2} h_{k,j,t} I_{k,j,t}
    s.t.
      ∑_{k′∈K} ∑_{m∈M} F_{k,k′,j,m,t} + ∑_{r∈R} ∑_{m∈M} S_{k,r,j,m,t}
        = I⁰_{k,j} − I_{k,j,t} + W_{k,j,t} + ∑_{k′∈K} ∑_{m∈M} F_{k′,k,j,m,t−λ_{k′,k,j,m}},  ∀ j, k ∈ K_P, t ≥ 2,
      ∑_{k′∈K} ∑_{m∈M} F_{k,k′,j,m,t} + ∑_{r∈R} ∑_{m∈M} S_{k,r,j,m,t}
        = I⁰_{k,j} − I_{k,j,t} + ∑_{k′∈K} ∑_{m∈M} F_{k′,k,j,m,t−λ_{k′,k,j,m}},  ∀ j, k ∈ K_DC, t ≥ 2,
      ∑_{k∈K} ∑_{m∈M} S_{k,r,j,m,t−λ_{k,r,j,m}} + SF_{r,j,t} ≥ d_{r,j,t}(ω),  ∀ r, j, t ≥ 2,
      W_{k,j,t} ≤ Q_{k,j},  ∀ j, t ≥ 2, k ∈ K_P,
      I_{k,j,t} ≥ Iᵐ_{k,j,t},  ∀ k, j, t ≥ 2,
      I_{k,j,t}, F_{k,k′,j,m,t}, S_{k,k′,j,m,t} ≥ 0,  ∀ k, k′, j, m, t ≥ 2.        (SCP-S)

The smaller data set of the SCP instance has 2 plants, 3 distribution centers, 14 transportation links and 6 customers. This problem is then scaled up to a larger (more realistic) size, similar to that of the problem from You et al. (2009), in which two products are considered. This larger instance has a total of 5 plants, 13 distribution centers, 131 transportation links and 46 customers.

TABLE 5.1: OM instances tested with SD under loose tolerance

Instance    1st stage              2nd stage              # of random   Magnitude of
Name        variables/constraints  variables/constraints  variables     scenarios
BAA99       2/0                    7/4                    2             615
BAA99-20    20/0                   250/40                 20            10^34
BAA99-256   256/0                  33409/512              256           10^434
RETAIL      7/0                    70/22                  7             10^11
RETAIL-14   14/0                   70/22                  14            10^23
RETAIL-140  140/0                  20020/421              140           10^237
SCFT-6      30/18                  86/36                  14            10^9
SCFT-46     602/174                348/1480               186           10^129

5.3 Computational Experiments

In Table 5.1, multiple versions of the above three instances and their corresponding dimensions are listed.
The magnitude of scenarios for most of these problems is astronomical due to the large number of independent random variables. All instances are solved with the Stochastic Decomposition algorithm and with Sample Average Approximation (SAA). The algorithmic details regarding SD can be found in Chapter 3. For SAA, we formulate the instance with sample sizes N = 50, 100, 200 and then solve the problem with the CPLEX LP solver. Computations are carried out on a MacBook Pro with the following specifications: Intel Core i7 CPU @ 2.50 GHz (quad-core) and 16 GB memory @ 1600 MHz. Note that the multi-product inventory problems (BAA) have negative lower bounds since the original formulation is a maximization problem (converted to a minimization problem for SD and SAA); a nonzero lower bound is built at the beginning of the SD algorithm for cut-update purposes. Also note that the supply management problems (SCFT) have random cost coefficients, and therefore RCSD is applied to solve those instances.

In Tables 5.2, 5.3 and 5.4, we report the objective lower and upper bound estimates and the corresponding 95% confidence intervals from the SD solves. As in Chapter 3, the pessimistic gaps reflect the difference (u + u_width) − (ℓ − ℓ_width). On average, the gaps reported in our experiments are small, but because of the variance of the estimators, the confidence intervals are large.
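The bounding and validation mechanics just described can be sketched on a toy problem. The snippet below uses a one-product newsvendor (whose sampled problem is solved by a quantile formula rather than CPLEX) purely to illustrate the bookkeeping: a lower-bound estimate from M SAA replications, an upper-bound estimate from evaluating one candidate decision on a fresh sample, and the pessimistic gap (u + u_width) − (ℓ − ℓ_width). All costs and distributions are hypothetical.

```python
import numpy as np

C, P = 1.0, 3.0   # hypothetical unit order cost and shortage penalty

def solve_saa(rng, n):
    """One SAA replication of min_x C*x + E[P*(xi - x)^+]; the sampled
    problem's optimum is the (1 - C/P) sample quantile of demand."""
    xi = rng.exponential(10.0, size=n)
    x = np.quantile(xi, 1.0 - C / P)
    return x, C * x + P * np.maximum(xi - x, 0.0).mean()

rng = np.random.default_rng(0)
M, N = 30, 200
# Lower-bound estimate: average of M independent SAA optimal values.
vals = np.array([solve_saa(rng, N)[1] for _ in range(M)])
lb, lb_hw = vals.mean(), 1.96 * vals.std(ddof=1) / np.sqrt(M)
# Upper-bound estimate: evaluate one candidate decision on a large fresh sample.
x_hat = solve_saa(rng, N)[0]
fresh = rng.exponential(10.0, size=100_000)
cost = C * x_hat + P * np.maximum(fresh - x_hat, 0.0)
ub, ub_hw = cost.mean(), 1.96 * cost.std(ddof=1) / np.sqrt(fresh.size)
pessimistic_gap = (ub + ub_hw) - (lb - lb_hw)
```

The half-widths lb_hw and ub_hw are the 95% CI terms; widening the gap by both of them is what makes it "pessimistic".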
TABLE 5.2: Objective upper and lower bounds and corresponding 95% confidence intervals of all instances solved by SD with loose tolerance

Instance    Lower Bound      Average Decision   Compromise Decision  Pessimistic  Average
Name        (95% CI)         Upper Bound        Upper Bound          Gap          Gap
                             (95% CI)           (95% CI)
BAA99       -237.600         -235.845           -235.787             14.568       1.813
            (±7.338)         (±5.422)           (±5.417)             6.17%        0.77%
BAA99-20    -324925.806      -318590.977        -318423.195          12090.402    6502.611
            (±1615.426)      (±3981.923)        (±3972.365)          3.80%        2.04%
BAA99-256   -6177441.242     -6134964.465       -6128132.013         107549.058   49309.229
            (±9314.228)      (±50319.932)       (±48925.601)         1.76%        0.80%
RETAIL      153.098          154.438            154.501              5.557        1.403
            (±3.381)         (±0.772)           (±0.773)             3.60%        0.91%
RETAIL-14   224.804          225.486            225.505              7.784        0.701
            (±5.955)         (±1.127)           (±1.128)             3.45%        0.31%
RETAIL-140  842.458          860.001            859.703              37.762       17.245
            (±16.219)        (±4.300)           (±4.298)             4.39%        2.01%
SCFT-6      495715.564       495202.792         495193.733           5387.104     -521.831
            (±4945.791)      (±963.185)         (±963.144)           1.09%        0.11%
SCFT-46     8528267.073      8584931.543        8584979.589          80395.179    56712.516
            (±3881.805)      (±19795.430)       (±19800.858)         0.94%        0.66%

TABLE 5.3: Objective upper and lower bounds and corresponding 95% confidence intervals of all instances solved by SD with nominal tolerance

Instance    Lower Bound      Average Decision   Compromise Decision  Pessimistic  Average
Name        (95% CI)         Upper Bound        Upper Bound          Gap          Gap
                             (95% CI)           (95% CI)
BAA99       -241.620         -236.158           -236.155             16.530       5.465
            (±5.616)         (±5.449)           (±5.449)             7.00%        2.31%
BAA99-20    -323042.474      -318444.364        -318443.504          9894.396     4598.970
            (±1300.878)      (±3993.911)        (±3994.548)          3.11%        1.44%
BAA99-256   -6146527.143     -6135514.129       -6135521.989         68180.792    11005.154
            (±5027.272)      (±52160.511)       (±52148.366)         1.11%        0.18%
RETAIL      154.551          154.417            154.419              3.115        -0.132
            (±2.475)         (±0.772)           (±0.772)             2.02%        0.09%
RETAIL-14   224.535          225.592            225.617              5.442        1.082
            (±3.232)         (±1.128)           (±1.128)             2.41%        0.48%
RETAIL-140  853.004          859.370            859.425              18.331       6.421
            (±7.613)         (±4.297)           (±4.297)             2.13%        0.75%
SCFT-6      495670.630       495191.826         495191.839           3899.010     -478.791
            (±3414.656)      (±963.145)         (±963.145)           0.79%        0.10%
SCFT-46     8551249.099      8585048.702        8585044.672          55992.434    33795.573
            (±2432.583)      (±19765.153)       (±19764.278)         0.65%        0.39%

TABLE 5.4: Objective upper and lower bounds and corresponding 95% confidence intervals of all instances solved by SD with tight tolerance

Instance    Lower Bound      Average Decision   Compromise Decision  Pessimistic  Average
Name        (95% CI)         Upper Bound        Upper Bound          Gap          Gap
                             (95% CI)           (95% CI)
BAA99       -241.070         -236.276           -236.276             15.258       4.794
            (±4.993)         (±5.471)           (±5.471)             6.46%        2.03%
BAA99-20    -321399.984      -318513.787        -318513.824          7491.338     2886.160
            (±609.823)       (±3995.247)        (±3995.355)          2.35%        0.91%
BAA99-256   -6141748.618     -6134395.672       -6134393.117         63961.423    7355.501
            (±3618.900)      (±52984.590)       (±52987.022)         1.04%        0.12%
RETAIL      154.771          154.439            154.439              2.157        -0.332
            (±1.717)         (±0.772)           (±0.772)             1.40%        0.21%
RETAIL-14   224.163          225.540            225.540              4.265        1.377
            (±1.760)         (±1.128)           (±1.128)             1.89%        0.61%
RETAIL-140  853.955          859.311            859.253              13.245       5.298
            (±3.651)         (±4.297)           (±4.296)             1.54%        0.62%
SCFT-6      496191.439       495201.726         495201.732           2202.808     -989.707
            (±2229.318)      (±963.197)         (±963.197)           0.44%        0.20%
SCFT-46     8558572.302      8584763.591        8584764.422          48453.926    26192.12
            (±2465.488)      (±19796.397)       (±19796.318)         0.56%        0.31%

In Figures 5.3, 5.4 and 5.5, we plot the objective lower and upper bound estimates of SD and SAA for the different versions of the multi-product inventory problem. Note that each single blue dot in the figures is an upper bound estimate of an SAA solution, each of which requires a large number of LP solves; the blue line connects the best solution among all 30 validated. On the other hand, the upper bound estimate from SD requires only one validation of the compromise solution, whose quality under the loose tolerance is already superior to most of the solutions reported from the SAA solves.
In Figure 5.5, note that SAA is not able to solve the BAA instance with sample sizes of 100 and 200, whereas SD can start from around 200 samples and proceed all the way to around 1500 samples. Similar observations can be made from Figures 5.6, 5.7 and 5.8 for the different versions of the retail instance. Note that in Figure 5.8, SAA is able to solve the larger-scale instance; however, the total validation time is too long, and therefore only lower bound estimates are reported in the figure.

FIGURE 5.3: Objective lower and upper bounds of SD and SAA for BAA99.
FIGURE 5.4: Objective lower and upper bounds of SD and SAA for BAA99-20.
FIGURE 5.5: Objective lower and upper bounds of SD and SAA for BAA99-256.
FIGURE 5.6: Objective lower and upper bounds of SD and SAA for RETAIL.
FIGURE 5.7: Objective lower and upper bounds of SD and SAA for RETAIL14.
FIGURE 5.8: Objective lower and upper bounds of SD and SAA for RETAIL140.
FIGURE 5.9: Objective lower and upper bounds of SD and SAA for SCFT-6.
FIGURE 5.10: Objective lower and upper bounds of SD and SAA for SCFT-46.

From Figures 5.9 and 5.10, we can see that SD can also handle instances with second-stage random cost coefficients. The compromise solutions provided by SD consistently beat all SAA solutions, although the lower bounds reported for these instances are somewhat weaker. Nevertheless, the pessimistic gaps as well as the average gaps are less than one percent, and therefore perfectly acceptable.

In Table 5.5, we summarize the total solution time (solving plus validation) for each method. From the table, we can see that for small and moderate size instances, SAA's performance is satisfactory. However, as we increase the problem size to a larger scale, the procedure takes too much time either solving the SAA problem or evaluating the solution.
Also note that for the times reported in Table 5.5, SD's sample sizes are not listed, since each instance is solved adaptively; the detailed sample sizes and solution time information are presented right after this table.

Tables 5.6, 5.7 and 5.8 summarize the SD running time for each instance under loose, nominal and tight tolerance, respectively. Most of the moderate size instances can be solved within one or two seconds per replication, and for large scale instances, all replications finish within 3 minutes under loose tolerance. Also note that CPU seconds are reported in the tables; in the actual experiments conducted on the quad-core machine, the wall clock time is on average approximately 1/4 of what is reported. SD is still implemented as a sequential algorithm; the reported wall clock speedup is due to the parallelization of CPLEX's LP and QP solvers.

TABLE 5.5: Summary of total solution time (CPU seconds) of SD and SAA for all instances

Instance     SD        SD         SD         SAA       SAA      SAA
Name         (Loose)   (Nominal)  (Tight)    (N=50)    (N=100)  (N=200)
BAA99        17.00     31.7       68.35      7.17      6.97     6.99
BAA99-20     27.74     46.36      772.39     28.51     23.99    50.81
BAA99-256    3074.60   6593.52    39440.95   29343.85  >108000  >108000
RETAIL       25.91     62.92      237.27     109.97    112.36   122.59
RETAIL14     28.90     137.47     4035.56    237.39    241.74   263.18
RETAIL140    17582.66  20484.27   109771.69  >108000   >108000  >108000
SCFT-6       22.97     56.07      97.29      78.72     79.58    81.37
SCFT-46      3526.68   20778.51   55395.71   53.13     152.14   1924.33

TABLE 5.6: Sample size (std dev), solution time for SD (std dev), solution time for the compromise problem, and validation time (of the compromise decision and the average decision) for all instances solved by SD with loose tolerance

Instance     Average        Average        CPU secs    CPU secs        CPU secs
Name         Sample Size    CPU secs       to Solve    to Estimate     to Estimate
             (Std Dev)      for SD         Compromise  Obj of Compr.   Obj of Avg.
                            (Std Dev)      Problem     Decision        Decision
BAA99        158.47(40.53)  0.56(0.15)     0.01        0.19            0.19
BAA99-20     185.10(3.32)   0.92(0.04)     0.04        0.11            0.11
BAA99-256    183.83(5.75)   100.29(7.98)   36.62       29.28           28.72
RETAIL       189.70(47.28)  0.74(0.20)     0.12        3.59            3.63
RETAIL-14    179.80(50.52)  0.72(0.23)     0.03        7.27            7.29
RETAIL-140   195.57(42.41)  55.09(12.53)   7.23        15255.30        15922.73
SCFT-6       125.17(10.48)  0.68(0.05)     0.04        2.53            2.61
SCFT-46      198.93(4.65)   111.70(34.30)  175.52      0.08            0.08

One important feature of SD is its "warm start" capability: the sample sizes and solution times listed in Table 5.6 can be deducted from those in Table 5.7. If a decision maker is not satisfied with the pessimistic gap (or average gap) obtained under the loose tolerance, only marginal additional effort is required to improve the solution quality. The applications of this "warm start" capability are quite broad. Imagine sharing sensitive data among multiple agents for solving stochastic programs: instead of sending the exact scenario realization data (or distributional data) of the system, only the dual vertex sets and the existing functional approximations are passed on to the other party.

Another observation from Table 5.6 is that the validation time for RETAIL-140 is extremely long. This is not unusual; generally speaking, optimization is an "easier" task than validation, and accurately validating a single solution is, in most cases, more expensive than frequently comparing two alternatives. In Fu (2002), the author makes this point clear with regard to the ordinal optimization method of simulation optimization. Variance reduction techniques might help speed up the validation process in this situation. Other compromises, such as increasing the computing power to sample more data at the solution point, or accepting a wider confidence interval, can also help reduce the validation time.
TABLE 5.7: Sample size (std dev), solution time for SD (std dev), solution time for the compromise problem, and validation time (of the compromise decision and the average decision) for all instances solved by SD with nominal tolerance

Instance     Average         Average         CPU secs    CPU secs        CPU secs
Name         Sample Size     CPU secs        to Solve    to Estimate     to Estimate
             (Std Dev)       for SD          Compromise  Obj of Compr.   Obj of Avg.
                             (Std Dev)       Problem     Decision        Decision
BAA99        299.53(58.52)   1.05(0.22)      0.01        0.19            0.20
BAA99-20     329.67(14.91)   1.54(0.10)      0.04        0.12            0.13
BAA99-256    359.53(9.10)    217.85(16.34)   28.80       29.22           29.68
RETAIL       471.90(147.80)  1.98(0.75)      0.01        3.51            3.36
RETAIL-14    582.50(259.38)  4.33(4.18)      0.03        7.54            7.25
RETAIL-140   650.67(241.87)  194.00(75.43)   7.36        14656.91        14710.65
SCFT-6       277.43(13.91)   1.78(0.12)      0.042       2.63            2.65
SCFT-46      443.90(40.39)   864.00(219.29)  236.65      1.90            1.87

TABLE 5.8: Sample size (std dev), solution time for SD (std dev), solution time for the compromise problem, and validation time (of the compromise decision and the average decision) for all instances solved by SD with tight tolerance

Instance     Average          Average           CPU secs    CPU secs        CPU secs
Name         Sample Size      CPU secs          to Solve    to Estimate     to Estimate
             (Std Dev)        for SD            Compromise  Obj of Compr.   Obj of Avg.
                              (Std Dev)         Problem     Decision        Decision
BAA99        582.60(119.74)   2.27(0.81)        0.01        0.24            0.22
BAA99-20     1385.63(451.67)  25.74(21.66)      0.05        0.14            0.13
BAA99-256    1342.30(358.74)  1312.70(508.96)   26.70       33.25           34.48
RETAIL       956.67(224.49)   7.78(4.37)        0.02        3.85            3.86
RETAIL-14    2030.43(564.14)  134.24(105.40)    0.03        8.33            8.49
RETAIL-140   4724.30(824.74)  3085.01(802.43)   12.05       17209.34        16013.92
SCFT-6       529.67(12.16)    3.16(0.31)        0.03        2.46            2.48
SCFT-46      1281.77(235.46)  4590.45(1070.97)  261.34      1.87            1.87

FIGURE 5.11: Objective lower and upper bounds of SD, SAA and SAA with LHS for BAA99.

If SAA is the choice of an OR practitioner, it is recommended to use variance reduction techniques such as the Latin Hypercube Sampling (LHS) method to improve the solution quality.
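As a minimal illustration of why LHS helps (using a generic one-dimensional integrand rather than any of the instances above), the sketch below compares the replication-to-replication variance of a plain Monte Carlo mean estimator against an LHS estimator built with scipy.stats.qmc; the integrand and sample sizes are chosen arbitrarily for demonstration:

```python
import numpy as np
from scipy.stats import qmc

def estimator_variance(n, reps, use_lhs):
    """Variance over `reps` replications of the sample-mean estimate of
    E[U^2] for U ~ Uniform(0, 1); the true value is 1/3."""
    rng = np.random.default_rng(0)
    estimates = []
    for r in range(reps):
        if use_lhs:
            # LHS places exactly one point in each of n equal strata of [0, 1].
            u = qmc.LatinHypercube(d=1, seed=r).random(n).ravel()
        else:
            u = rng.random(n)
        estimates.append(np.mean(u ** 2))
    return float(np.var(estimates))

var_mc = estimator_variance(64, 200, use_lhs=False)
var_lhs = estimator_variance(64, 200, use_lhs=True)
# The stratification drives the LHS estimator variance far below plain MC.
```

The same effect, applied to the sampled-problem setup stage, is what tightens the SAA bounds in Figure 5.11.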
In Figure 5.11, we report the empirical behavior of such a method. Note that in this experiment, LHS is applied only in the sampled-problem setup stage; for validation, plain Monte Carlo sampling is used. From Figure 5.11, improvements can be observed in both the quality and the variance of the bounds. Similar results can be found for the other instances as well; please refer to Appendix C for the variance reduction figures of the other instances.

5.4 Conclusion

In this chapter, we present three operations and production management applications. We also increase the size of each instance to an industrial-strength scale which is considered challenging for SP methods. Objective lower and upper bounds are provided for the decision maker using both the SD and SAA methods. Solution times are also given to showcase the robust performance of the SD algorithm. In addition, we compare the sample sizes and solution times of SD under different tolerances, and discuss the "warm start" capability of SD, which can benefit the decision making process by reusing previously saved solves.

Chapter 6

Future Work

In this dissertation, we introduced the concept of compromise decisions, an extension to SD for solving two-stage SLP with random cost, and several operations management problems that showcase the solution quality provided by SD on desktop machines. In this chapter we present three potential avenues for future research: a) network structured recourse functions, b) support vector machines, and c) binary first stage models. Note that these specific types of models themselves have a large number of applications. Thus, the impact of Stochastic Decomposition can be enhanced considerably with further research on these additional model types.

6.1 Network Structured Random Cost Problem

As shown in Step 2 of Algorithm 2 in Chapter 4, the process of building the dual vertex sets V_t^k, ∀ t, k involves the computation and storage of a series of shifting directional vectors and basis inverse matrices. For certain network structured problems, e.g.
transportation and min-cost flow problems, both the computation and the storage can be simplified significantly. The reason is that the recourse matrix D has basis matrices with a very special network structure: each basis is associated with a tree, for which there are highly specialized data structures for efficient implementation. For example, the supply chain operations application (shown in Chapter 5) has a recourse matrix of the following form:

    D_ij = e_i − e_{m+j},                    (6.1)

where e_i is a unit vector with a one in the i-th coordinate and m is the total number of rows. This matrix is Totally Unimodular (TU), which facilitates calculations involving basis matrices since only {0, −1, +1} appear in the basis. Moreover, each basis matrix represents a tree of the network, which provides another key insight: computations involving links (with random cost) connected to higher degree nodes will potentially be more complicated than those connected to lower degree nodes. It can be verified that a link l_j connected to two lower degree nodes has a smaller set (mostly the same) of shifting directions σ_j^i across the different bases B^i. Suppose instead that the link l_j is connected to two high degree nodes N_1, N_2. As long as all bases discovered by the k-th iteration (B^i, i = 1, ..., k) include only a small subset of the links connected to nodes N_1, N_2, the cardinality of the set of random directions {σ_j^i, i = 1, ..., k} should also remain relatively small.

6.2 SD for Building Support Vector Machines

In the machine learning literature, the Support Vector Machine (SVM) is a very powerful technique proposed in Vapnik's PhD thesis around 1964; the method gained popularity in the early 90s after its successful application to handwritten digit recognition at Bell Labs.

The mathematical formulation of the binary SVM classifier can be stated as follows:

    Min  f(w, b) = (1/2)‖w‖² + (C/N) ∑_{i=1}^{N} ξ_i
    s.t. y_i(wᵀx_i + b) ≥ 1 − ξ_i,  i = 1, ..., N,
         ξ_i ≥ 0,  i = 1, ..., N.                    (SVM Primal)

Note that (x_i, y_i) are the data elements, which can be considered the technology matrix in the recourse formulation of the two-stage stochastic program.
Here, the first-stage decision variables are (w, b), where w gives the weights of the separating hyperplane and b is the bias (intercept) term. The parameter C in the objective weighs the trade-off between the margin width of the separation and the empirical error of the linear classifier. The two-stage problem can be stated as follows:

    Min  f(w, b) = (1/2)‖w‖² + C · E[h((w, b), ω̃)],        (SVM Master-Problem)

where ω̃ denotes an appropriately defined (vector) random variable, and the function h is defined as follows:

    h((w, b), ω) = Min  ξ
    s.t. ξ ≥ 1 − y(ω)xᵀ(ω)w − y(ω)b,
         ξ ≥ 0.                                      (SVM Sub-Problem)

In this formulation, SD can be directly applied to build the function approximation of the error term. Thus the two-stage SP is set up to find the widest-margin separating hyperplane with minimum expected error.

To completely embrace the power of SVM, we should also examine the dual form of the model, which can be derived using the Lagrangian as follows:

    L(w, b, α) = (1/2)wᵀw + (C/N) ∑_{i=1}^{N} ξ_i − ∑_{i=1}^{N} α_i [y_i(wᵀx_i + b) − 1 + ξ_i] − ∑_{i=1}^{N} μ_i ξ_i,        (6.2)

where α_i ≥ 0 and μ_i ≥ 0 are the dual multipliers. Setting the partial derivatives to zero gives

    ∂L/∂w = 0    ⟹  w = ∑_{i=1}^{N} α_i y_i x_i,
    ∂L/∂b = 0    ⟹  ∑_{i=1}^{N} α_i y_i = 0,
    ∂L/∂ξ_i = 0, ∀ i  ⟹  C/N − α_i − μ_i = 0, ∀ i.        (6.3)

We have

    L(w, b, α)                                                                    (6.4a)
    = (1/2)wᵀw + (C/N) ∑_{i=1}^{N} ξ_i − ∑_{i=1}^{N} α_i y_i wᵀx_i − b ∑_{i=1}^{N} α_i y_i
      + ∑_{i=1}^{N} α_i − ∑_{i=1}^{N} α_i ξ_i − ∑_{i=1}^{N} μ_i ξ_i              (6.4b)
    = (1/2)wᵀw + ∑_{i=1}^{N} (C/N − α_i − μ_i) ξ_i − wᵀw + ∑_{i=1}^{N} α_i        (6.4c)
    = ∑_{i=1}^{N} α_i − (1/2)wᵀw                                                  (6.4d)
    = ∑_{i=1}^{N} α_i − (1/2) ∑_{i,j=1}^{N} y_i y_j α_i α_j ⟨x_i, x_j⟩.           (6.4e)

Combining the above objective and constraints, we obtain the following dual problem:

    Max  ∑_{i=1}^{N} α_i − (1/2) ∑_{i,j=1}^{N} y_i y_j α_i α_j ⟨x_i, x_j⟩
    s.t. ∑_{i=1}^{N} α_i y_i = 0,
         0 ≤ α_i ≤ C/N,  i = 1, ..., N,                    (SVM Dual)

where ⟨·, ·⟩ denotes the inner product of two vectors.
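Since the data enter the dual only through the inner products ⟨x_i, x_j⟩, they can be swapped for a kernel evaluation. A minimal numerical check of this substitution, using the quadratic feature map φ(x) = (x₁², √2·x₁x₂, x₂²) whose kernel is ⟨x, z⟩² (the two data points are chosen arbitrarily for illustration):

```python
import numpy as np

def phi(x):
    """Explicit quadratic feature map R^2 -> R^3 with <phi(x), phi(z)> = (x . z)^2."""
    return np.array([x[0] ** 2, np.sqrt(2.0) * x[0] * x[1], x[1] ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

lifted = phi(x) @ phi(z)   # inner product after lifting to 3 dimensions
kernel = (x @ z) ** 2      # same quantity computed entirely in 2 dimensions
# lifted == kernel == 1.0 here; the dual never needs the lifted coordinates.
```

This identity is what lets the dual (and the subsequent classification) work in a high-dimensional feature space without ever computing in it.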
The advantage of the dual formulation is that the kernel method can be applied: a nonlinear classifier in the original (lower-dimensional) space is built from a linear classifier in a higher-dimensional feature space, in which the data can be separated with lower error, or may even be linearly separable. One simply replaces the inner product ⟨x_i, x_j⟩ by ⟨φ(x_i), φ(x_j)⟩, where φ(·) is a feature mapping that lifts the original data point to the higher dimension; the process works in such a way that no computation actually takes place in the high-dimensional space. Moreover, only a small subset of the data set has nonzero dual values, so the subsequent classification also occurs in the dual space in an efficient manner. This idea of storing limited but essential dual information resembles the dual approximation process of SD.

6.3 Binary First Stage Models - the Chaining Problem

While our work has focused on two-stage stochastic linear programming problems, there are many applications of models in which the first-stage decision is binary (or even a mix of continuous and binary variables). One such application is the well-known chaining problem in operations management.

Assume we have m products and n manufacturing plants. In an uncertain market environment, the demand for each of the m products is unknown to the planner, and so the flexibility of the manufacturing system is critical to the manufacturer's success. By flexibility we mean that, instead of producing a single line of products, each plant should be configured to produce multiple types of products. A discussion of this problem appears in Graves and Jordan (1991), where the authors demonstrate the benefits of partial flexibility compared with no flexibility or full flexibility.

The mathematical formulation of the above problem can be stated as follows:

    Min  f(A) = ∑_{i=1}^{m} s_i
    s.t. ∑_{j:(i,j)∈A} y_ij + s_i ≥ d_i,  i = 1, ..., m,
         ∑_{i:(i,j)∈A} y_ij ≤ c_j,  j = 1, ..., n,
         y_ij ≥ 0,  ∀ i, j,                    (Chaining Problem)

where A represents a configuration of the manufacturing system.
The objective of the problem is to find the best configuration A* that minimizes the expected total number of "shortfalls", i.e., ∑_{i=1}^{m} s_i. Note that the above formulation contains no cost for setting up the configuration A, which in most cases is unrealistic and simply implies that full flexibility is the optimal solution (configure each plant to produce all m products). In Graves and Jordan (1991), the authors show that a connected chain structure has a similar expected total number of shortfalls to that of the fully flexible system, which might be expensive to set up and maintain.

This problem can easily be set up as a two-stage recourse problem in which additional first-stage binary decision variables are introduced to represent the system configuration A:

    Min  f(x) = E[h(x, ω̃)]
    s.t. ∑_{(i,j)} f_ij x_ij ≤ F,
         x_ij ∈ {0, 1},  ∀ (i, j),                    (Chaining Master-Problem)

The inclusion of a link pair (i, j) is represented by setting the binary variable x_ij equal to one, and to zero otherwise. There is a fixed setup cost f_ij for link (i, j), and the total setup cost of the network can be bounded above by the parameter F. The expected total number of shortfalls is the value function that SD aims to approximate by sequentially sampling the random variables corresponding to customer demands for the different products. The recourse function is defined as follows:

    h(x, ω) = Min  ∑_{i=1}^{m} s_i
    s.t. y_ij ≤ M x_ij,  ∀ (i, j),
         ∑_{j:(i,j)∈A} y_ij + s_i ≥ d_i(ω),  i = 1, ..., m,
         ∑_{i:(i,j)∈A} y_ij ≤ c_j,  j = 1, ..., n,
         y_ij ≥ 0,  ∀ (i, j),                    (Chaining Sub-problem)

Customer demands are random, and the objective is the total shortfall under design x and scenario ω. The third constraint limits the capacity of plant j. The user should work with a continuous configuration decision (relax x_ij to be continuous) at the beginning of the SD algorithm; once the SD solve stops, the standard Benders decomposition algorithm starts with first-stage binary variables, using the SD-generated dual vertex set. In this way, no further sampling is conducted.
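As a small sketch of the recourse evaluation h(x, ω), the snippet below compares a dedicated configuration against a 2-chain for a hypothetical two-product, two-plant system; the demand and capacity data are invented for illustration, and the LP is solved with scipy's linprog rather than within SD:

```python
import numpy as np
from scipy.optimize import linprog

def chaining_shortfall(links, d, c):
    """Total shortfall min sum_i s_i for open (product i, plant j) pairs `links`,
    product demands d and plant capacities c."""
    m, n = len(d), len(c)
    nv = len(links) + m                      # y for each open link, then s_i
    cost = np.r_[np.zeros(len(links)), np.ones(m)]
    A_ub, b_ub = [], []
    for i in range(m):                       # cover: sum_j y_ij + s_i >= d_i
        row = np.zeros(nv)
        for k, (pi, pj) in enumerate(links):
            if pi == i:
                row[k] = -1.0
        row[len(links) + i] = -1.0
        A_ub.append(row); b_ub.append(-d[i])
    for j in range(n):                       # capacity: sum_i y_ij <= c_j
        row = np.zeros(nv)
        for k, (pi, pj) in enumerate(links):
            if pj == j:
                row[k] = 1.0
        A_ub.append(row); b_ub.append(c[j])
    res = linprog(cost, A_ub=np.array(A_ub), b_ub=b_ub, method="highs")
    return res.fun

d, c = [14.0, 6.0], [10.0, 10.0]
dedicated = chaining_shortfall([(0, 0), (1, 1)], d, c)                # 4.0
chained = chaining_shortfall([(0, 0), (0, 1), (1, 0), (1, 1)], d, c)  # 0.0
```

Even this tiny case shows the Graves-Jordan effect: adding the two cross links lets the under-loaded plant absorb the demand spike, eliminating the shortfall.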
Even though a fixed sample size is used, the dual vertex set approximated during the SD run might provide valuable information for these further calculations.

Appendix A

The SSN Instance - Back to the Future

The subtitle of this appendix ("Back to the Future") is intended to reflect the fact that there are certain instances from the past which need to be fully explored before we can claim to have a thorough understanding of the two-stage SLP paradigm. SSN is one such instance. The subtitle conveys the fact that we went back two decades to revive our interest in the SSN model, and the results of this work have shown us how to solve such problems fairly accurately, without requiring computing platforms that are beyond the ordinary.

SSN is a network planning model which seeks to size links (i.e., choose link capacities) before demand is known with certainty. The model is limited to adding capacity to links that have already been identified for expansion; in this sense, it does not recommend which additional links may be useful. The model does not consider the possibility of adding new links, which would require stochastic integer programming. Thus, given a collection of links, we wish to choose additional capacities under a budget constraint while ensuring that the expected unserved demand is minimized. The model uses the following notation.

First-stage decision variables:
x_j ≡ the amount of capacity to be added to the j-th link.

First-stage data/parameters:
n ≡ the number of links that are to be considered for capacity expansion.
b ≡ the total capacity budget for the entire network.
ω̃ ≡ the m-dimensional random variable that represents the demands associated with the m point-to-point pairs served by the network.

The first-stage problem can be summarized as follows:

    Min  E[h(x, ω̃)]                          (A.1a)
    s.t. ∑_j x_j ≤ b,                         (A.1b)
         x ≥ 0.                               (A.1c)

The function h(x, ω̃) is a random variable representing unserved demand. It is a function of the capacity expansion plan x and the random vector ω̃. For each outcome ω, the
following model is used to estimate unserved demand, and the generic name referring to this piece of the model is the second stage problem. The second stage decision variables are:

f_ir ≡ the number of calls associated with point-to-point pair i that are served via route r ∈ R(i).
s_i ≡ the number of unserved requests associated with point-to-point pair i.

Second stage data/parameters:

m ≡ the number of point-to-point pairs served by the network.
R(i) ≡ the set of routes that can be used to connect point-to-point pair i.
A_ir ≡ an incidence vector in R^n whose j-th element is 1 if link j belongs to route r ∈ R(i), and is 0 otherwise.
e ≡ a vector in R^n of current link capacities.

And the second stage problem is the following:

    h(x, ω) = Min  Σ_{i=1}^m s_i                        (A.2a)
    s.t. Σ_i Σ_{r∈R(i)} A_ir f_ir ≤ x + e               (A.2b)
         Σ_{r∈R(i)} f_ir + s_i = ω_i,   i = 1, ..., m   (A.2c)
         f, s ≥ 0                                       (A.2d)

Constraints (A.1b) limit the expansion plan to a total capacity level b, while constraints (A.2b) ensure that routing is achieved without violating any link capacity. Constraints (A.2c) are flow balance equations of the network.

Appendix B

Figures of Dual Vertex Stability

The following figures show that for the three "industrial-strength" instances included in this paper, the VF approximations are poor representations of the expected recourse function in early iterations. As the SD algorithm proceeds, the ratios R_k approach 1 (see Shadow Price Stability in subsection 3.3.2), indicating vastly improved approximations in the vicinity of candidate and incumbent solutions, and they are all steady by the time that the In-Sample rule accepts an incumbent. The number of data points in these graphs exceeds the number of iterations because SD updates incumbent objective estimates periodically, and the ratios are calculated for those updates too.
[FIGURE: Ratio R_k(x) vs. iteration number k for SSN]

[FIGURE: Ratio R_k(x) vs. iteration number k for 20TERM]

[FIGURE: Ratio R_k(x) vs. iteration number k for STORM]

Appendix C

Figures Comparing SD, SAA and SAA with LHS

FIGURE C.1: Objective lower and upper bounds of SD, SAA and SAA with LHS for BAA99-20.
FIGURE C.2: Objective lower and upper bounds of SD, SAA and SAA with LHS for BAA99-256.
FIGURE C.3: Objective lower and upper bounds of SD, SAA and SAA with LHS for RETAIL.
FIGURE C.4: Objective lower and upper bounds of SD, SAA and SAA with LHS for RETAIL14.
FIGURE C.5: Objective lower and upper bounds of SD, SAA and SAA with LHS for RETAIL140.
FIGURE C.6: Objective lower and upper bounds of SD, SAA and SAA with LHS for SCFT-6.
FIGURE C.7: Objective lower and upper bounds of SD, SAA and SAA with LHS for SCFT-46.

Appendix D

Pseudocode for RCSD

Programming constructs will be presented in typewriter font. For example, π as a dual vertex will appear as pi in the pseudocode.

    d(ω_t) = D_bar + D_obs[t]
    ξ(ω_t) = X_bar + X_obs[t]
    C(ω_t) = C_bar + C_obs[t]                                          (D.1)

    lambda[i][m] = 0           if row m has no random elements
                 = (ν_i)_m     otherwise                               (D.2)

    sigma_X[i]    = ν_i × X_bar
    sigma_C[i]    = ν_i × C_bar
    delta_X[i][t] = lambda[i] × X_obs[t]
    delta_C[i][t] = lambda[i] × C_obs[t]                               (D.3)

    lambda_theta[i][t][m] = 0           if row m has no random elements
                          = (θ^i_t)_m   otherwise                      (D.4)

    lambda_phi[i][m] = 0         if row m has no random elements
                     = (φ_i)_m   otherwise                             (D.5)

    θ^i_t = phi[i] × D_obs[t]                                          (D.6)

    sigma_theta_X[i][t] = θ^i_t × X_bar = phi[i] × X_bar × D_obs[t]
    sigma_theta_C[i][t] = θ^i_t × C_bar = phi[i] × C_bar × D_obs[t]
    delta_theta_X[i][t] = lambda_theta[i][t] × X_obs[t] = phi[i] × X_obs[t] × D_obs[t]
    delta_theta_C[i][t] = lambda_theta[i][t] × C_obs[t] = phi[i] × C_obs[t] × D_obs[t]   (D.7)

    sigma_phi_X[i]    = phi[i] × Xi_bar
    sigma_phi_C[i]    = phi[i] × C_bar
    delta_phi_X[i][t] = lambda_phi[i] × Xi_obs[t]
    delta_phi_C[i][t] = lambda_phi[i] × C_obs[t]                       (D.8)

D.1 Forming Cuts

(We only store sigma_X, sigma_C, delta_X, delta_C, sigma_phi_X, sigma_phi_C, delta_phi_X, delta_phi_C.)

    π^i_t (ξ_t − C_t x_k) = (ν_i + θ^i_t)(ξ_t − C_t x_k)                    (D.9a)
                          = ν_i (ξ_t − C_t x_k) + θ^i_t (ξ_t − C_t x_k)     (D.9b)

procedure argmax(x, t, maxval_alpha, maxval_beta, max_i)
    maxval = −∞
    for i = 1, ...
, k {
        /* First, rescale phi with D_obs[t] to get the correct theta */
        sigma_theta_X[i][t] = sigma_phi_X[i] × D_obs[t]
        sigma_theta_C[i][t] = sigma_phi_C[i] × D_obs[t]
        delta_theta_X[i][t] = delta_phi_X[i][t] × D_obs[t]
        delta_theta_C[i][t] = delta_phi_C[i][t] × D_obs[t]
        /* Then, add the nu part */
        val = sigma_X[i] + delta_X[i][t] − (sigma_C[i] + delta_C[i][t]) × x
        /* And, add the theta part */
        val += sigma_theta_X[i][t] + delta_theta_X[i][t] − (sigma_theta_C[i][t] + delta_theta_C[i][t]) × x
        /* Finally, find and keep the maximum */
        if maxval < val {
            maxval_alpha  = sigma_X[i] + delta_X[i][t];
            maxval_alpha += sigma_theta_X[i][t] + delta_theta_X[i][t];
            maxval_beta   = sigma_C[i] + delta_C[i][t];
            maxval_beta  += sigma_theta_C[i][t] + delta_theta_C[i][t];
            maxval = val;
            max_i = i;
        }
    }

procedure form_cut(x, k, cut)
    /* alpha, beta, sample_size and istar refer to fields of cut */
    alpha = 0; beta = [0, ..., 0];
    for t = 1, ..., k {
        argmax(x, t, maxval_alpha, maxval_beta, istar[t]);
        alpha = alpha × (t−1)/t + maxval_alpha/t;
        beta  = beta × (t−1)/t + maxval_beta/t;
    }

D.2 Updating Cuts

procedure update_incumbent_cut(xbar, k, incumbent_cut)
    /* alpha, beta, sample_size and istar are fields of incumbent_cut */
    alpha = 0; beta = [0, ..., 0];
    for t = 1, ...
, k {
        if V^t_{k−τ} ≠ V^t_k {
            argmax(xbar, t, maxval_alpha, maxval_beta, istar[t]);
            alpha = alpha + maxval_alpha/k;
            beta  = beta + maxval_beta/k;
        }
        else {
            /* First, rescale phi with D_obs[t] to get the correct theta */
            sigma_theta_X[istar[t]][t] = sigma_phi_X[istar[t]] × D_obs[t]
            sigma_theta_C[istar[t]][t] = sigma_phi_C[istar[t]] × D_obs[t]
            delta_theta_X[istar[t]][t] = delta_phi_X[istar[t]][t] × D_obs[t]
            delta_theta_C[istar[t]][t] = delta_phi_C[istar[t]][t] × D_obs[t]
            /* Then, calculate the resampled cut coefficients */
            maxval_alpha  = (sigma_X[istar[t]] + delta_X[istar[t]][t])
            maxval_alpha += (sigma_theta_X[istar[t]][t] + delta_theta_X[istar[t]][t])
            maxval_beta   = (sigma_C[istar[t]] + delta_C[istar[t]][t])
            maxval_beta  += (sigma_theta_C[istar[t]][t] + delta_theta_C[istar[t]][t])
            /* Finally, update the incumbent_cut */
            alpha = alpha + maxval_alpha/k;
            beta  = beta + maxval_beta/k;
        }
    }

D.3 Resampling

procedure resample_cut(k, cut, ahat, bhat)
    /* alpha, beta, sample_size and istar refer to fields of cut */
    ahat = 0; bhat = [0, ..., 0];
    for t = 1, ..., k {
        /* Generate rn, an observation of a r.v. distributed uniformly on the set {1, 2, ..., k} */
        rn = uniform{1, 2, ..., k}
        if rn > sample_size {
            ahat = ahat × (t−1)/t
            bhat = bhat × (t−1)/t
        }
        else {
            /* First, rescale phi with D_obs[rn] to get the correct theta */
            sigma_theta_X[istar[rn]][rn] = sigma_phi_X[istar[rn]] × D_obs[rn]
            sigma_theta_C[istar[rn]][rn] = sigma_phi_C[istar[rn]] × D_obs[rn]
            delta_theta_X[istar[rn]][rn] = delta_phi_X[istar[rn]][rn] × D_obs[rn]
            delta_theta_C[istar[rn]][rn] = delta_phi_C[istar[rn]][rn] × D_obs[rn]
            /* Then, calculate the resampled cut coefficients */
            ahat  = ahat × (t−1)/t + (sigma_X[istar[rn]] + delta_X[istar[rn]][rn])/t
            ahat += (sigma_theta_X[istar[rn]][rn] + delta_theta_X[istar[rn]][rn])/t
            bhat  = bhat × (t−1)/t + (sigma_C[istar[rn]] + delta_C[istar[rn]][rn])/t
            bhat += (sigma_theta_C[istar[rn]][rn] + delta_theta_C[istar[rn]][rn])/t
        }
    }

Appendix E

An Illustrative Example

We illustrate the random cost developments via an example which we refer to as the "Diamond-shaped" example.
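Before working through the example's tables, note that the bookkeeping they perform can be mimicked numerically: for a recourse LP with two rows, every pair of dual constraints defines a candidate basis, and one can test which candidates remain dual feasible as a random cost coefficient shifts one of the constraints. The following is a minimal pure-Python sketch over a generic toy region; the constraint data and function names are illustrative assumptions, not the example's exact numbers:

```python
from itertools import combinations

def solve2x2(a1, b1, a2, b2):
    """Solve the 2x2 system a1.pi = b1, a2.pi = b2; None if singular."""
    det = a1[0] * a2[1] - a1[1] * a2[0]
    if abs(det) < 1e-12:
        return None
    return ((b1 * a2[1] - b2 * a1[1]) / det,
            (a1[0] * b2 - a2[0] * b1) / det)

def dual_vertices(constraints):
    """constraints: list of (a, b) meaning a.pi <= b.

    Returns the basic solutions (pairs of active constraints) that
    satisfy every inequality, i.e., the dual feasible vertices.
    """
    feas = []
    for (a1, b1), (a2, b2) in combinations(constraints, 2):
        pi = solve2x2(a1, b1, a2, b2)
        if pi is None:
            continue
        if all(a[0] * pi[0] + a[1] * pi[1] <= b + 1e-9 for a, b in constraints):
            feas.append(pi)
    return feas

# toy dual region: pi1 + pi2 <= 2, pi1 - pi2 <= 1, pi1 >= 0, pi2 >= 0,
# plus a cut pi1 <= omega5 driven by the random cost coefficient
def region(omega5):
    return [((1, 1), 2), ((1, -1), 1), ((-1, 0), 0), ((0, -1), 0),
            ((1, 0), omega5)]

# as omega5 varies, bases enter and leave the dual feasible collection:
# dual_vertices(region(5.0)) and dual_vertices(region(0.5)) differ
```

Each feasible vertex corresponds to a dual vertex that SD may discover and store; re-running the enumeration as the cost realization changes shows bases entering and leaving the feasible collection, which is the effect the tables of this appendix record for the three outcomes of ω_5.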
Consider the following problem:

    min_{0 ≤ x ≤ 5}  f(x) = 0.75x + E[h(x, ω̃)],                        (E.1)

where h(x, ω) is the value of the following LP:

    h(x, ω) = min  y_1 + 3y_2 + y_3 + y_4 + ω_5 y_5
    s.t.  y_1 + y_2 − y_3 + y_4 + y_5 = ω_6 + (1/2)x
          y_1 + y_2 + y_3 − y_4       = ω_7 + (1/4)x
          y_1, y_2, y_3, y_4, y_5 ≥ 0.                                  (E.2)

The random variable ω̃_5 is uniform over (0, 3). The random variable ω̃_6 is uniformly distributed over the interval (−1, 0) and ω̃_7 is uniformly distributed over (0, 1). For the purposes of this example, all random variables are independent.

The set of dual feasible solutions Π depends on the realization of the random cost coefficient ω̃_5:

    h(x, ω) = max  (ω_6 + x/2)π_1 + (ω_7 + x/4)π_2
    s.t.   π_1 + π_2 ≤ 1
           π_1 + π_2 ≤ 3
          −π_1 + π_2 ≤ 1
           π_1 − π_2 ≤ 1
           π_1 ≤ ω_5.                                                   (E.3)

Now let us focus on the feasible region of problem (E.3). In Table E.1 we present all the bases' information along with three outcomes of the random cost variable, i.e., ω_5 follows the uniform distribution U(0, 3) and ω^1_5 = 2.5, ω^2_5 = 1.5, ω^3_5 = 0.5.

For each basis, we collect the following data:

    ν_i = (D^⊤_{B_i})^{−1} d̄_{B_i}
    θ̄^i = { θ̄^i_j | j ∈ B_i ∩ R }
    G^i = [ −D^⊤_{N_i} (D^⊤_{B_i})^{−1}, I ]
    ḡ^i = G^i d̄.

Note that in the SD algorithm, we will discover this information during the course of the algorithm, and one does not calculate this information a priori. In the matrix G^i, the columns associated with random cost are sufficient for the feasibility check. These columns are highlighted in Table E.1.

From this table, we could infer the range of λ_5(ω) for each basis such that the corresponding basic solution results in a basic feasible solution.

TABLE E.1: Information for illustrative purposes.

    B_1/N_1 = {1,3}/{2,4,5}:  ν^1 = (0, 1);     θ̄^1 = ∅;
        G^1 rows: (−1, 0, 1, 0, 0), (0, 1, 0, 1, 0), (−0.5, 0.5, 0, 0, 1);   ḡ^1 = (2, 2, 1.5);    −1.5 ≤ λ_5(ω)
    B_2/N_2 = {2,3}/{1,4,5}:  ν^2 = (1, 2);     θ̄^2 = ∅;
        G^2 rows: (−1, 0, 1, 0, 0), (0, 1, 0, 1, 0), (−0.5, 0.5, 0, 0, 1);   ḡ^2 = (−2, 2, 0.5);   −0.5 ≤ λ_5(ω)
    B_3/N_3 = {1,4}/{2,3,5}:  ν^3 = (1, 0);     θ̄^3 = ∅;
        G^3 rows: (−1, 0, 1, 0, 0), (0, 1, 0, 1, 0), (−0.5, −0.5, 0, 0, 1);  ḡ^3 = (2, 2, 0.5);    −0.5 ≤ λ_5(ω)
    B_4/N_4 = {2,4}/{1,3,5}:  ν^4 = (2, 1);     θ̄^4 = ∅;
        G^4 rows: (−1, 0, 1, 0, 0), (0, 1, 0, 1, 0), (−0.5, −0.5, 0, 0, 1);  ḡ^4 = (−2, 2, −0.5);  0.5 ≤ λ_5(ω)
    B_5/N_5 = {3,5}/{1,2,4}:  ν^5 = (1.5, 2.5); θ̄^5_5 = (1, 1);
        G^5 rows: (−1, −2, 1, 0, 0), (−1, −2, 0, 1, 0), (1, 0, 0, 0, 1);     ḡ^5 = (−3, −1, 2);    −1.5 ≤ λ_5(ω) ≤ −0.5
    B_6/N_6 = {4,5}/{1,2,3}:  ν^6 = (1.5, 0.5); θ̄^6_5 = (1, 1);
        G^6 rows: (1, −2, 1, 0, 0), (1, −2, 0, 1, 0), (1, 0, 0, 0, 1);       ḡ^6 = (−1, 1, 2);     −0.5 ≤ λ_5(ω) ≤ 0.5
    B_7/N_7 = {2,5}/{1,3,4}:  ν^7 = (1.5, 1.5); θ̄^7_5 = (1, −1);
        G^7 rows: (−1, 0, 1, 0, 0), (−1, 2, 0, 1, 0), (1, −2, 0, 0, 1);      ḡ^7 = (−2, 1, 1);     −0.5 ≤ λ_5(ω) ≤ 0.5
    B_8/N_8 = {1,5}/{2,3,4}:  ν^8 = (1.5, −0.5); θ̄^8_5 = (1, −1);
        G^8 rows: (−1, 0, 1, 0, 0), (−1, 2, 0, 1, 0), (1, −2, 0, 0, 1);      ḡ^8 = (2, 3, −1);     −1.5 ≤ λ_5(ω) ≤ −0.5

[FIGURE E.1: Diamond-shaped example with ω_5 = 2.5]

λ_5(ω^1) = 1.0

TABLE E.2: Stored bases information for ω_5 = 2.5.

    basis                  ν^i + λ_5(ω^1)θ̄^i_5   G^i_{·5} λ_5(ω^1)   ḡ^i            feasible?
    B_1 = {1,3}/{2,4,5}    (0, 1)                 (0, 0, 1)           (2, 2, 1.5)    True
    B_2 = {2,3}/{1,4,5}    (1, 2)                 (0, 0, 1)           (−2, 2, 0.5)   True
    B_3 = {1,4}/{2,3,5}    (1, 0)                 (0, 0, 1)           (2, 2, 0.5)    True
    B_4 = {2,4}/{1,3,5}    (2, 1)                 (0, 0, 1)           (−2, 2, −0.5)  True
    B_5 = {3,5}/{1,2,4}    (2.5, 3.5)             (−2, −2, 0)         (−3, −1, 2)    False
    B_6 = {4,5}/{1,2,3}    (2.5, 1.5)             (−2, −2, 0)         (−1, 1, 2)     False
    B_7 = {2,5}/{1,3,4}    (2.5, 0.5)             (0, 2, −2)          (−2, 1, 1)     False
    B_8 = {1,5}/{2,3,4}    (2.5, −1.5)            (0, 2, −2)          (2, 3, −1)     False

[FIGURE E.2: Diamond-shaped example with ω_5 = 1.5]

λ_5(ω^2) = 0

TABLE E.3: Stored bases information for ω_5 = 1.5.

    basis                  ν^i + λ_5(ω^2)θ̄^i_5   G^i_{·5} λ_5(ω^2)   ḡ^i            feasible?
    B_1 = {1,3}/{2,4,5}    (0, 1)                 (0, 0, 0)           (2, 2, 1.5)    True
    B_2 = {2,3}/{1,4,5}    (1, 2)                 (0, 0, 0)           (−2, 2, 0.5)   True
    B_3 = {1,4}/{2,3,5}    (1, 0)                 (0, 0, 0)           (2, 2, 0.5)    True
    B_4 = {2,4}/{1,3,5}    (2, 1)                 (0, 0, 0)           (−2, 2, −0.5)  False
    B_5 = {3,5}/{1,2,4}    (1.5, 2.5)             (0, 0, 0)           (−3, −1, 2)    False
    B_6 = {4,5}/{1,2,3}    (1.5, 0.5)             (0, 0, 0)           (−1, 1, 2)     True
    B_7 = {2,5}/{1,3,4}    (1.5, 1.5)             (0, 0, 0)           (−2, 1, 1)     True
    B_8 = {1,5}/{2,3,4}    (1.5, −0.5)            (0, 0, 0)           (2, 3, −1)     False

[FIGURE E.3: Diamond-shaped example with ω_5 = 0.5]

λ_5(ω^3) = −1.0

TABLE E.4: Stored bases information for ω_5 = 0.5.

    basis                  ν^i + λ_5(ω^3)θ̄^i_5   G^i_{·5} λ_5(ω^3)   ḡ^i            feasible?
    B_1 = {1,3}/{2,4,5}    (0, 1)                 (0, 0, −1)          (2, 2, 1.5)    True
    B_2 = {2,3}/{1,4,5}    (1, 2)                 (0, 0, −1)          (−2, 2, 0.5)   False
    B_3 = {1,4}/{2,3,5}    (1, 0)                 (0, 0, −1)          (2, 2, 0.5)    False
    B_4 = {2,4}/{1,3,5}    (2, 1)                 (0, 0, −1)          (−2, 2, −0.5)  False
    B_5 = {3,5}/{1,2,4}    (0.5, 1.5)             (2, 2, 0)           (−3, −1, 2)    True
    B_6 = {4,5}/{1,2,3}    (0.5, −0.5)            (2, 2, 0)           (−1, 1, 2)     False
    B_7 = {2,5}/{1,3,4}    (0.5, 2.5)             (0, −2, 2)          (−2, 1, 1)     False
    B_8 = {1,5}/{2,3,4}    (0.5, 0.5)             (0, −2, 2)          (2, 3, −1)     True

Bibliography

Ahmed, S. (2005). "Convexity and decomposition of mean-risk stochastic programs". In: Mathematical Programming 106.3, pp. 433–446. DOI: 10.1007/s10107-005-0638-8. URL: http://link.springer.com/10.1007/s10107-005-0638-8.

Aumann, R. J. (1977). "The St. Petersburg paradox: A discussion of some recent comments". In: Journal of Economic Theory 14.2, pp. 443–445. DOI: 10.1016/0022-0531(77)90143-0. URL: http://linkinghub.elsevier.com/retrieve/pii/0022053177901430.

Banks, J. (1998). Handbook of Simulation. Principles, Methodology, Advances, Applications, and Practice. John Wiley & Sons. URL: http://books.google.com/books?
id=dMZ1Zj3TBgAC&printsec=frontcover&dq=intitle:Handbook+of+Simulation+Principles+Methodology+Advances+Applications+and+Practice&hl=&cd=1&source=gbs_api.

Bartle, R. G. (1976). The Elements of Real Analysis. URL: http://books.google.com/books?id=1GF9NgAACAAJ&dq=1976+(intitle:The+Elements+of+Real+Analisys+inauthor:Bartle)&hl=&cd=2&source=gbs_api.

Bassok, Y., R. Anupindi, and R. Akella (1999). "Single-Period Multiproduct Inventory Models with Substitution". In: Operations Research 47.4, pp. 632–642. DOI: 10.1287/opre.47.4.632. URL: http://pubsonline.informs.org/doi/abs/10.1287/opre.47.4.632.

Bayraksan, G. and D. P. Morton (2011). "A sequential sampling procedure for stochastic programming". In: Operations Research 59.4, pp. 898–913. DOI: 10.1287/opre.1110.0926. URL: http://pubsonline.informs.org/doi/abs/10.1287/opre.1110.0926.

Bellman, R. (1954). "The theory of dynamic programming". In: Bulletin of the American Mathematical Society 60.6, pp. 503–516. DOI: 10.1090/s0002-9904-1954-09848-8. URL: http://www.ams.org/journal-getitem?pii=S0002-9904-1954-09848-8.

Bertsimas, D. and M. Sim (2004). "The Price of Robustness". In: Operations Research 52.1. DOI: 10.1287/opre.1030.0065. URL: http://pubsonline.informs.org/doi/abs/10.1287/opre.1030.0065.

Birge, J. R. and F. Louveaux (2011). Introduction to Stochastic Programming. Springer Series in Operations Research and Financial Engineering. New York, NY: Springer Science & Business Media. DOI: 10.1007/978-1-4614-0237-4. URL: http://link.springer.com/10.1007/978-1-4614-0237-4.

Charnes, A. and W. W. Cooper (1959). "Chance-constrained programming". In: Management Science 6.1, pp. 73–79. DOI: 10.1287/mnsc.6.1.73. URL: http://pubsonline.informs.org/doi/abs/10.1287/mnsc.6.1.73.

Chen, X., M. Sim, P. Sun, and J. Zhang (2008). "A Linear Decision-Based Approximation Approach to Stochastic Programming". In: Operations Research 56.2, pp. 344–357. DOI: 10.1287/opre.1070.0457. URL: http://pubsonline.informs.org/doi/abs/10.1287/opre.1070.0457.
Dantzig, G. B. (1955). "Linear programming under uncertainty". In: Management Science 150. Chapter 1, pp. 1–11. DOI: 10.1007/978-1-4419-1642-6_1. URL: http://link.springer.com/10.1007/978-1-4419-1642-6_1.

Efron, B. (1979). "Bootstrap Methods: Another Look at the Jackknife". In: The Annals of Statistics 7.1, pp. 1–26. DOI: 10.1214/aos/1176344552. URL: http://projecteuclid.org/euclid.aos/1176344552.

Fu, M. C. (2002). "Feature Article: Optimization for simulation: Theory vs. Practice." In: INFORMS Journal on Computing 14.3, pp. 192–215. DOI: 10.1287/ijoc.14.3.192.113. URL: http://dx.doi.org/10.1287/ijoc.14.3.192.113.

Gangammanavar, H., S. Sen, and V. M. Zavala (2015). "Stochastic Optimization of Sub-Hourly Economic Dispatch With Wind Energy". In: IEEE Transactions on Power Systems PP.99, pp. 1–11. DOI: 10.1109/TPWRS.2015.2410301. URL: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=7065337.

Glynn, P. W. and G. Infanger (2013). "Simulation-based confidence bounds for two-stage stochastic programs". In: Mathematical Programming 138.1-2, pp. 15–42. DOI: 10.1007/s10107-012-0621-0. URL: http://link.springer.com/10.1007/s10107-012-0621-0.

Grasso, A. and W. LaPlante (2011). Enhancing adaptability of US military forces: Part A. Main report: Report of the Defense Science Board, 2010 summer study. URL: http://www.acq.osd.mil/dsb/reports/ADA536755.pdf.

Graves, S. C. and W. C. Jordan (1991). An Analytic Approach for Demonstrating the Benefits of Limited Flexibility. URL: http://books.google.com/books?id=yl1cbwAACAAJ&dq=intitle:An+Analytic+Approach+for+Demonstrating+the+Benefits+of+Limited+Flexibility&hl=&cd=1&source=gbs_api.

Greene, W. H. (2003). Econometric Analysis. Pearson Education India.

Hastie, T., R. Tibshirani, and J. Friedman (2013). The Elements of Statistical Learning. Data Mining, Inference, and Prediction. New York, NY: Springer Science & Business Media. DOI: 10.1007/978-0-387-84858-7. URL: http://link.springer.com/book/10.1007/978-0-387-84858-7.

Herer, Y. T., M. Tzur, and E. Yücesan (2006).
"The multilocation transshipment problem". In: IIE Transactions 38.3, pp. 185–200. DOI: 10.1080/07408170500434539. URL: http://www.tandfonline.com/doi/abs/10.1080/07408170500434539.

Higle, J. L. and S. Sen (1995). "Epigraphical Nesting: a unifying theory for the convergence of algorithms". In: Journal of Optimization Theory and Applications 84.2, pp. 339–360.

Higle, J. L. and S. Sen (1991a). "Statistical verification of optimality conditions for stochastic programs with recourse". In: Annals of Operations Research 30.1, pp. 215–239. DOI: 10.1007/BF02204818. URL: http://link.springer.com/10.1007/BF02204818.

Higle, J. L. and S. Sen (1991b). "Stochastic decomposition: An algorithm for two-stage linear programs with recourse". In: Mathematics of Operations Research 16.3, pp. 650–669. DOI: 10.1287/moor.16.3.650. URL: http://pubsonline.informs.org/doi/abs/10.1287/moor.16.3.650.

Higle, J. L. and S. Sen (1994). "Finite master programs in regularized stochastic decomposition". In: Mathematical Programming 67.1-3, pp. 143–168. DOI: 10.1007/BF01582219. URL: http://link.springer.com/10.1007/BF01582219.

Higle, J. L. and S. Sen (1996a). "Duality and statistical tests of optimality for two stage stochastic programs". In: Mathematical Programming 75.2, pp. 257–275. DOI: 10.1007/BF02592155. URL: http://link.springer.com/10.1007/BF02592155.

Higle, J. L. and S. Sen (1996b). Stochastic Decomposition. A Statistical Method for Large Scale Stochastic Linear Programming. Springer Science & Business Media. DOI: 10.1007/978-1-4615-4115-8. URL: http://link.springer.com/book/10.1007/978-1-4615-4115-8.

Higle, J. L. and S. Sen (1999). "Statistical approximations for stochastic linear programming problems". In: Annals of Operations Research 85, pp. 173–193. DOI: 10.1023/A:1018917710373. URL: http://link.springer.com/10.1023/A:1018917710373.

Holdren, J. P., E. Lander, and H. Varmus (2010). Designing a digital future: Federally funded research and development in networking and information technology. Council of Advisors on Science and Technology.
URL: https://www.whitehouse.gov/sites/default/files/microsites/ostp/pcast-nitrd-report-2010.pdf.

Howard, R. A. and A. E. Abbas (2015). Foundations of Decision Analysis. URL: http://scholar.google.com/scholar?q=related:ec1egjPpetQJ:scholar.google.com/&hl=en&num=20&as_sdt=0,5.

Infanger, G. (1999). Private Communication.

Kahneman, D. and A. Tversky (1979). "Prospect theory: An analysis of decisions under risk". In: Econometrica 47.2, p. 263. DOI: 10.2307/1914185. URL: http://www.jstor.org/stable/1914185?origin=crossref.

Kelley, J. E. (1960). "The Cutting-Plane Method for Solving Convex Programs". In: Journal of the Society for Industrial and Applied Mathematics 8.4, pp. 703–712. DOI: 10.1137/0108053. URL: http://epubs.siam.org/doi/abs/10.1137/0108053.

Kim, S., R. Pasupathy, and S. G. Henderson (2014). "A Guide to Sample Average Approximation". In: Handbook of Simulation Optimization. New York, NY: Springer New York, pp. 207–243. DOI: 10.1007/978-1-4939-1384-8_8. URL: http://link.springer.com/10.1007/978-1-4939-1384-8_8.

Kleywegt, A. J., A. Shapiro, and T. Homem-de-Mello (2002). "The sample average approximation method for stochastic discrete optimization". In: SIAM Journal on Optimization 12.2, pp. 479–502. DOI: 10.1137/S1052623499363220. URL: http://epubs.siam.org/doi/abs/10.1137/S1052623499363220.

Kuhn, D., W. Wiesemann, and A. Georghiou (2009). "Primal and dual linear decision rules in stochastic and robust optimization". In: Mathematical Programming 130.1, pp. 177–209. DOI: 10.1007/s10107-009-0331-4. URL: http://link.springer.com/10.1007/s10107-009-0331-4.

Lan, G., A. Nemirovski, and A. Shapiro (2011). "Validation analysis of mirror descent stochastic approximation method". In: Mathematical Programming 134.2, pp. 425–458. DOI: 10.1007/s10107-011-0442-6. URL: http://link.springer.com/10.1007/s10107-011-0442-6.

Linderoth, J., A. Shapiro, and S. Wright (2006). "The empirical behavior of sampling methods for stochastic programming". In: Annals of Operations Research 142.1, pp. 215–241.
DOI: 10.1007/s10479-006-6169-8. URL: http://link.springer.com/10.1007/s10479-006-6169-8.

Linderoth, J. and S. Wright (2003). "Decomposition Algorithms for Stochastic Programming on a Computational Grid". In: Computational Optimization and Applications 24.2-3, pp. 207–250. DOI: 10.1023/A:1021858008222. URL: http://link.springer.com/10.1023/A:1021858008222.

Louveaux, F. V. and Y. Smeers (1988). "Optimal Investments for Electricity Generation: A Stochastic Model and a Test-Problem". In: Numerical Techniques for Stochastic Optimization. Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 445–453. DOI: 10.1007/978-3-642-61370-8_24. URL: http://www.springer.com/us/book/9783642648137.

Machina, M. J. (1987). "Choice under Uncertainty: Problems Solved and Unsolved". In: Journal of Economic Perspectives 1.1, pp. 121–154. DOI: 10.1257/jep.1.1.121. URL: http://pubs.aeaweb.org/doi/abs/10.1257/jep.1.1.121.

Mak, W.-K., D. P. Morton, and R. K. Wood (1999). "Monte Carlo bounding techniques for determining solution quality in stochastic programs". In: Operations Research Letters 24.1-2, pp. 47–56. DOI: 10.1016/s0167-6377(98)00054-6. URL: http://linkinghub.elsevier.com/retrieve/pii/S0167637798000546.

Miller, N. and A. Ruszczyński (2011). "Risk-Averse Two-Stage Stochastic Linear Programming: Modeling and Decomposition". In: Operations Research 59.1, pp. 125–132. DOI: 10.1287/opre.1100.0847. URL: http://pubsonline.informs.org/doi/abs/10.1287/opre.1100.0847.

Mulvey, J. M. and A. Ruszczyński (1995). "A New Scenario Decomposition Method for Large-Scale Stochastic Optimization". In: Operations Research 43.3, pp. 477–490. DOI: 10.1287/opre.43.3.477. URL: http://pubsonline.informs.org/doi/abs/10.1287/opre.43.3.477.

Murphy, F. H., S. Sen, and A. L. Soyster (1982). "Electric Utility Capacity Expansion Planning with Uncertain Load Forecasts". In: AIIE Transactions 14.1, pp. 52–59. DOI: 10.1080/05695558208975038.
URL: http://www.tandfonline.com/doi/abs/10.1080/05695558208975038.

Nemirovski, A., A. B. Juditsky, G. Lan, and A. Shapiro (2009). "Robust Stochastic Approximation Approach to Stochastic Programming". In: SIAM Journal on Optimization 19.4, pp. 1574–1609. DOI: 10.1137/070704277. URL: http://epubs.siam.org/doi/abs/10.1137/070704277.

Nesterov, Y. and J. P. Vial (2008). "Confidence level solutions for stochastic programming". In: Automatica 44.6, pp. 1559–1568. DOI: 10.1016/j.automatica.2008.01.017. URL: http://linkinghub.elsevier.com/retrieve/pii/S0005109808000873.

Newey, W. K. (1991). "Uniform convergence in probability and stochastic equicontinuity". In: Econometrica 59, pp. 1161–1167.

Oliveira, W., C. Sagastizábal, and S. Scheimberg (2011). "Inexact Bundle Methods for Two-Stage Stochastic Programming". In: SIAM Journal on Optimization 21.2, pp. 517–544. DOI: 10.1137/100808289. URL: http://epubs.siam.org/doi/abs/10.1137/100808289.

Pasupathy, R. (2010). "On Choosing Parameters in Retrospective-Approximation Algorithms for Stochastic Root Finding and Simulation Optimization". In: Operations Research 58.4, pp. 889–901. DOI: 10.1287/opre.1090.0773. URL: http://pubsonline.informs.org/doi/abs/10.1287/opre.1090.0773.

Polyak, B. T. and A. B. Juditsky (1992). "Acceleration of stochastic approximation by averaging". In: SIAM Journal on Control and Optimization 30.4, pp. 838–855. DOI: 10.1137/0330046. URL: http://epubs.siam.org/doi/abs/10.1137/0330046.

Powell, W. B. (2007). Approximate Dynamic Programming: Solving the Curses of Dimensionality. Wiley Series in Probability and Statistics. Hoboken, NJ, USA: John Wiley & Sons, Inc. DOI: 10.1002/9780470182963. URL: http://doi.wiley.com/10.1002/9780470182963.

Robbins, H. (1952). "Some aspects of the sequential design of experiments". In: Bulletin of the American Mathematical Society 58.5, pp. 527–536. DOI: 10.1090/S0002-9904-1952-09620-8. URL: http://www.ams.org/journal-getitem?pii=S0002-9904-1952-09620-8.

Robbins, H. and S. Monro (1951). "A stochastic approximation method".
In: The Annals of Mathematical Statistics 22.3, pp. 400–407. DOI: 10.1214/aoms/1177729586. URL: http://projecteuclid.org/euclid.aoms/1177729586.

Rockafellar, R. T. and R. J. B. Wets (2009). Variational Analysis. Springer Science & Business Media.

Rockafellar, R. T. and R. J. B. Wets (1991). "Scenarios and Policy Aggregation in Optimization Under Uncertainty". In: Mathematics of Operations Research 16.1, pp. 119–147. DOI: 10.1287/moor.16.1.119. URL: http://pubsonline.informs.org/doi/abs/10.1287/moor.16.1.119.

Römisch, W. (2003). Stability of Stochastic Programming Problems. Elsevier. DOI: 10.1016/s0927-0507(03)10008-4. URL: http://linkinghub.elsevier.com/retrieve/pii/S0927050703100084.

Royset, J. O. and R. Szechtman (2013). "Optimal Budget Allocation for Sample Average Approximation". In: Operations Research 61.3, pp. 762–776. DOI: 10.1287/opre.2013.1163. URL: http://pubsonline.informs.org/doi/abs/10.1287/opre.2013.1163.

Ruszczyński, A. (1986). "A regularized decomposition method for minimizing a sum of polyhedral functions". In: Mathematical Programming 35.3, pp. 309–333. DOI: 10.1007/BF01580883. URL: http://link.springer.com/10.1007/BF01580883.

Sen, S., C. Barnhart, J. R. Birge, W. B. Powell, and C. A. Shoemaker (2015). "O.R.: Catalyst for Grand Challenges. Opportunities in sustainability". In: ORMS Today 42, pp. 22–26. URL: https://www.informs.org/ORMS-Today/Public-Articles/December-Volume-42-Number-6/O.R.-Catalyst-for-Grand-Challenges.

Sen, S., R. D. Doverspike, and S. Cosares (1994). "Network planning with random demand". In: Telecommunication Systems 3.1, pp. 11–30. DOI: 10.1007/BF02110042. URL: http://link.springer.com/10.1007/BF02110042.

Shapiro, A., D. Dentcheva, and A. Ruszczyński (2009). Lectures on Stochastic Programming: Modeling and Theory (Society for Industrial and Applied Mathematics and the Mathematical Programming Society, ... Society for Industrial and Applied Mathematics. DOI: 10.1137/1.9780898718751. URL: http://epubs.siam.org/doi/book/10.
1137/1.9780898718751.

Shapiro, A. and T. Homem-de-Mello (1998). "A simulation-based approach to two-stage stochastic programming with recourse." In: Mathematical Programming 81.3, pp. 301–325. DOI: 10.1007/BF01580086. URL: http://dx.doi.org/10.1007/BF01580086.

Shapiro, A., T. Homem-de-Mello, and J. Kim (2002). "Conditioning of convex piecewise linear stochastic programs". In: Mathematical Programming 94.1, pp. 1–19. DOI: 10.1007/s10107-002-0313-2. URL: http://link.springer.com/10.1007/s10107-002-0313-2.

Singh, K. (1981). "On the Asymptotic Accuracy of Efron's Bootstrap". In: The Annals of Statistics 9.6, pp. 1187–1195. DOI: 10.1214/aos/1176345636. URL: http://projecteuclid.org/euclid.aos/1176345636.

Soyster, A. L. (1973). "Technical Note—Convex Programming with Set-Inclusive Constraints and Applications to Inexact Linear Programming". In: Operations Research 21.5, pp. 1154–1157. DOI: 10.1287/opre.21.5.1154. URL: http://pubsonline.informs.org/doi/abs/10.1287/opre.21.5.1154.

Vogel, S. (2008). "Universal Confidence Sets for Solutions of Optimization Problems". In: SIAM Journal on Optimization 19.3, pp. 1467–1488. DOI: 10.1137/070680023. URL: http://epubs.siam.org/doi/abs/10.1137/070680023.

von Neumann, J. and O. Morgenstern (2007). Theory of Games and Economic Behavior. Princeton University Press. URL: http://books.google.com/books?id=jCN5aNJ-n-0C&pg=PA1&dq=intitle:Theory+of+Games+and+Economic+Behavior+inauthor:John&hl=&cd=1&source=gbs_api.

You, F., J. M. Wassick, and I. E. Grossmann (2009). "Risk management for a global supply chain planning under uncertainty: Models and algorithms". In: AIChE Journal 55.4, pp. 931–946. DOI: 10.1002/aic.11721. URL: http://doi.wiley.com/10.1002/aic.11721.

Zhao, L. and S. Sen (2006). "A Comparison of Sample-Path-Based Simulation-Optimization and Stochastic Decomposition for Multi-Location Transshipment Problems". In: Proceedings of the 2006 Winter Simulation Conference. IEEE, pp. 238–245. DOI: 10.1109/wsc.2006.323079. URL: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=4117611.
Zverovich, V., C. I. Fábián, E. F. D. Ellison, and G. Mitra (2012). "A computational study of a solver system for processing two-stage stochastic LPs with enhanced Benders decomposition". In: Mathematical Programming Computation 4.3, pp. 211–238. DOI: 10.1007/s12532-012-0038-z. URL: http://link.springer.com/10.1007/s12532-012-0038-z.
Abstract
Stochastic Programming (SP) has long been considered a well-justified yet computationally challenging paradigm for practical applications. Computational studies in the literature often involve approximating a large number of scenarios by using a small number of scenarios to be processed via deterministic solvers, or running Sample Average Approximation on some genre of high performance machines so that statistically acceptable bounds can be obtained. In this dissertation we show that for a class of stochastic linear programming problems, an alternative approach known as Stochastic Decomposition (SD) can provide solutions of similar quality, in far less computational time, using ordinary desktop or laptop machines of today. In addition to these compelling computational results, we also provide a stronger convergence result for SD, and introduce a new solution concept which we refer to as the compromise decision. This new concept is attractive for algorithms which call for multiple replications in sampling-based convex optimization algorithms. For such replicated optimization, we show that the difference between an average solution and a compromise decision provides a natural stopping rule.

SD is a sequential sampling scheme combined with a Benders-like decomposition method for solving two-stage stochastic linear programs with recourse. It exploits the special structure of the recourse problem to generate function approximations in an efficient manner. However, randomness is only allowed in the second stage right-hand side and the technology matrix associated with the first stage decision variables. In the second part of this dissertation, we relax this assumption to accommodate randomness in the second stage cost coefficients. Finally, our computational results cover a variety of instances from the literature, and we further scale some operations management applications up to a level that was previously considered out of bounds for stochastic programming.
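The compromise-decision idea summarized above can be illustrated on a one-dimensional caricature: each replication produces a piecewise-linear (cut-based) approximation of the objective and its own minimizer; the compromise decision minimizes the average of the replicated approximations, and its distance from the average of the individual minimizers serves as a stopping signal. The following is a minimal sketch under simplifying assumptions (grid search instead of an LP master, synthetic cuts; all names and tolerances are illustrative, not the thesis's implementation):

```python
def cut_model(cuts):
    """Piecewise-linear lower approximation f(x) = max_j (a_j + b_j * x)."""
    return lambda x: max(a + b * x for a, b in cuts)

def grid_argmin(f, lo=0.0, hi=10.0, steps=2001):
    """Brute-force minimizer of f over a uniform grid on [lo, hi]."""
    xs = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return min(xs, key=f)

def compromise_decision(replicated_cuts, lo=0.0, hi=10.0):
    """Return (average solution, compromise decision) for the replications."""
    models = [cut_model(c) for c in replicated_cuts]
    # average solution: mean of the per-replication minimizers
    x_bar = sum(grid_argmin(m, lo, hi) for m in models) / len(models)
    # compromise decision: minimizer of the averaged approximation
    avg = lambda x: sum(m(x) for m in models) / len(models)
    x_c = grid_argmin(avg, lo, hi)
    return x_bar, x_c

# three replications whose cuts agree on a minimum near x = 4
reps = [[(0.0, -1.0), (-4.2, 0.1)],
        [(0.5, -1.1), (-4.0, 0.0)],
        [(0.0, -0.9), (-4.5, 0.2)]]
x_bar, x_c = compromise_decision(reps)
# stopping rule: accept when |x_bar - x_c| falls below a tolerance
```

When the replicated approximations are mutually consistent, the two points nearly coincide, which is the behavior the stopping rule exploits; a large gap signals that more sampling (or more replications) is needed.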
Asset Metadata
Creator: Liu, Yifan (author)
Core Title: Computational stochastic programming with stochastic decomposition
School: Viterbi School of Engineering
Degree: Doctor of Philosophy
Degree Program: Industrial and Systems Engineering
Publication Date: 06/16/2016
Defense Date: 05/17/2016
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: OAI-PMH Harvest, optimization, random cost, stochastic decomposition, stochastic programs
Format: application/pdf (imt)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Sen, Suvrajeet (committee chair), Higle, Julie (committee member), Jain, Rahul (committee member)
Creator Email: imliuyifan@gmail.com, yifanl@usc.edu
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c40-252228
Unique identifier: UC11281077
Identifier: etd-LiuYifan-4435.pdf (filename), usctheses-c40-252228 (legacy record id)
Legacy Identifier: etd-LiuYifan-4435.pdf
Dmrecord: 252228
Document Type: Dissertation
Rights: Liu, Yifan
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA