PROBABILISTIC NUMERICAL METHODS FOR FULLY NONLINEAR PDES AND RELATED TOPICS

by Jia Zhuo

A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, in Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (APPLIED MATHEMATICS)

August 2014

Copyright 2014 Jia Zhuo

Dedication

To my family

Acknowledgments

I owe my deepest gratitude to my advisor, Prof. Jianfeng Zhang, for his indefatigable guidance, inspiration and patience in my academic research, as well as his untiring encouragement and concern for my life. I could have achieved nothing in the last five years of my Ph.D. study without him. Prof. Zhang is such a fantastic mathematician that I could not hope for a better soul to work with. His personal charisma and constructive advice not only played a significant role in my research but will also benefit me as an everlasting beacon, for which I will be sincerely grateful my whole life.

I would like to express my special gratitude to Prof. Jin Ma, who, although not my advisor, never ceased helping and encouraging me in various aspects. I sincerely thank Prof. Yilmaz Kocer for serving on my Dissertation Committee, and Prof. Remigijus Mikulevicius and Prof. Sergey Lototsky for serving on my Guidance Committee. Moreover, many thanks go to Prof. Igor Kukavica, Prof. Larry Goldstein, Prof. Remy Mikulevicius and Prof. Chunming Wang for their excellent teaching of many important courses in my graduate study. I am also indebted to Prof. Christian Bender and Prof. Nikolaus Schweizer at Saarland University for their guidance and help with my research, especially in the area of primal-dual algorithms for BSDEs. In addition, I gratefully thank Mr. Haixiong Chen for initiating me into the realm of mathematics.

Without the initial admission offer from USC, I would not have met so many great souls, including all my USC friends, whose help and support have given me five years of joy, and my wife and soul mate Zhe Zhang. I particularly appreciate a few alumni for their inexhaustible assistance and advice during my Ph.D. life, among them Xinyang Wang, Jianfu Chen, Minzhao Tan, Xin Wang, Jie Du, Huanhuan Wang and Yang Huang. Finally, my parents Shaobin Zhuo and Liqun Luo, as well as my sister's family, well deserve my love and thanks.

Table of Contents

Dedication
Acknowledgments
List of Figures
Abstract
Chapter 1: Introduction
Chapter 2: A Monotone Scheme for High Dimensional Fully Nonlinear PDEs
  2.1 Introduction
  2.2 Preliminaries
  2.3 The numerical scheme
    2.3.1 Consistency
    2.3.2 The monotonicity
    2.3.3 Stability
    2.3.4 Boundary condition
    2.3.5 Convergence results
  2.4 Quasilinear PDE and coupled FBSDEs
  2.5 Implementation of the scheme
    2.5.1 Low dimensional case
    2.5.2 Least squares regression
    2.5.3 Monte Carlo simulation
    2.5.4 Some further comments
  2.6 Numerical examples
    2.6.1 Examples under monotonicity condition
    2.6.2 Examples violating monotonicity condition
Chapter 3: Extension to Path-dependent PDEs
  3.1 Introduction
  3.2 Preliminaries
    3.2.1 The canonical setting
    3.2.2 The shifted spaces
  3.3 Monotone scheme for PPDEs
  3.4 An implementable monotone scheme
  3.5 The case with classical solution
  3.6 Implementation of a 1-dimensional numerical scheme
    3.6.1 A one-dimensional PPDE example
Chapter 4: A Primal-Dual Algorithm for BSDEs
  4.1 Introduction
  4.2 Discrete time reflected BSDEs
  4.3 The case of a convex generator
    4.3.1 Upper bounds
    4.3.2 Lower bounds
    4.3.3 Error estimates
  4.4 A primal-dual algorithm for the convex case
    4.4.1 The algorithm
    4.4.2 Control variates
    4.4.3 Numerical examples
  4.5 The case of a non-convex generator
    4.5.1 Upper bounds
    4.5.2 Lower bounds
    4.5.3 Numerical examples
Bibliography

List of Figures

2.1 A 3-dimensional example with various degrees of non-linearity in Example 6.2.
2.2 Comparison with the finite difference method in Example 6.2.
2.3 Convergence of a truncated degenerate PDE in Example 6.2.
2.4 Numerical results using different basis functions in Example 2.6.2.
2.5 Numerical results of a 12-dimensional example in Example 2.6.3.
2.6 Relation between sample size and errors in Example 2.6.3.
2.7 Numerical results for a PDE with unknown viscosity solution in Example 2.6.3.
2.8 A 12-dimensional Isaacs equation with unknown viscosity solution in Example 2.6.4.
2.9 A 12-dimensional coupled FBSDE in Example 2.6.5.
2.10 A 12-dimensional example without monotonicity in Example 2.6.6.
2.11 A 10-dimensional HJB equation in Example 2.6.7.
2.12 A 10-dimensional example with tridiagonal generator in Example 2.6.9.
3.1 A 1-dimensional example of PPDE for Example 3.6.1.
4.1 Influence of the number of inner simulations and the control variate: upper bound without inner control variate, upper bound with inner control variate, and lower bound (from top to bottom).

Abstract

In this dissertation we discuss three topics, starting with a monotone scheme for high dimensional fully nonlinear PDEs, which includes as a special case the quasilinear PDE corresponding to a coupled FBSDE. As specified in [52], this work is strongly motivated by the remarkable work of Fahim, Touzi and Warin [46], and stays in the paradigm of monotone schemes initiated by Barles and Souganidis [6]. It weakens a critical constraint imposed by [46], especially when the generator of the PDE depends only on the diagonal terms of the Hessian matrix. Several numerical examples, up to dimension 12, are reported.

The second part is dedicated to monotone schemes for fully nonlinear path-dependent PDEs. As reported in [87], this is an extension of the seminal work of Barles and Souganidis [6] to the path-dependent case.
Based on the viscosity theory of path-dependent PDEs, developed by Ekren, Keller, Touzi and Zhang [41] and Ekren, Touzi and Zhang [42, 43, 44], we show that a monotone scheme converges to the unique viscosity solution of the (fully nonlinear) parabolic path-dependent PDE. An implementable example of such a monotone scheme is proposed and tested with a 1-dimensional example. Moreover, in the case that the solution is smooth enough, we obtain the rate of convergence of our scheme.

Finally, we propose an extension of the primal-dual algorithm, which is popular in the pricing of early-exercise options, to a backward dynamic programming equation associated with time discretization schemes of (reflected) BSDEs. This is a collaborative project with Bender and Schweizer [10]. Taking as an input some approximate solution of the backward dynamic program, precomputed, e.g., by least-squares Monte Carlo, our methodology allows us to construct a confidence interval for the unknown true solution of the time-discretized (reflected) BSDE at time 0. We numerically demonstrate the practical applicability of our method in two five-dimensional nonlinear pricing problems where tight price bounds were previously unavailable.

This dissertation is a collection of [52, 87, 10] written during my Ph.D. study.

Chapter 1
Introduction

Finding effective numerical methods for fully nonlinear parabolic PDEs of the following form, in high dimensions or with path-dependent features, is a long-standing challenge:

\[ -\partial_t u - G(t,\cdot,u,Du,D^2u) = 0; \qquad u(T,\cdot) = g(\cdot). \tag{0.1} \]

The standard numerical schemes in the PDE literature, e.g. finite difference and finite element methods, work only for low dimensional problems, typically $d \le 3$, due to the well-known curse of dimensionality. However, in many applications, especially in finance, the dimension $d$ can be higher. We thus turn to the probabilistic approach, which is less sensitive to the dimension. In the semilinear case,

\[ G = b(t,x)\cdot Du + \tfrac12 \mathrm{tr}\big[\sigma^2(t,x) D^2u\big] + f\big(t,x,u,\sigma(t,x)Du\big), \]

the PDE (0.1) is associated with a Markovian backward SDE,

\[ \begin{cases} X^{s,x}_t = x + \int_s^t b(r,X^{s,x}_r)\,dr + \int_s^t \sigma(r,X^{s,x}_r)\,dW_r, \\[2pt] Y^{s,x}_t = g(X^{s,x}_T) + \int_t^T f(r,X^{s,x}_r,Y^{s,x}_r,Z^{s,x}_r)\,dr - \int_t^T Z^{s,x}_r\cdot dW_r, \end{cases} \tag{0.2} \]

in the sense that $Y^{s,x}_t := u(t,X^{s,x}_t)$ and $Z^{s,x}_t := Du(t,X^{s,x}_t)\sigma(t,X^{s,x}_t)$ solve (0.2), due to the nonlinear Feynman-Kac formula introduced by Pardoux and Peng [72]. Based on the regularity results for BSDEs established by Zhang [86], the works [86] and Bouchard and Touzi [20] proposed the so-called backward Euler scheme for such BSDEs, and hence for the semilinear PDEs, and obtained the rate of convergence. This scheme approximates the BSDE by a sequence of conditional expectations, of the form

\[ \begin{cases} Y_i = \mathbb{E}_i[Y_{i+1}] + (t_{i+1}-t_i)\, f\Big(t_i, X_i, Y_i, \mathbb{E}_i\Big[\tfrac{W_{i+1}-W_i}{t_{i+1}-t_i}\, Y_{i+1}\Big]\Big), \\[2pt] Y_n = g(X_n). \end{cases} \tag{0.3} \]

Several efficient numerical algorithms have been proposed to compute these conditional expectations, notably Bouchard and Touzi [20], Gobet, Lemor and Warin [50], Bally, Pagès and Printems [4], Bender and Denk [9], and Crisan and Manolarakis [33]. There have been numerous publications on the subject, and the schemes have been extended to more general BSDEs, e.g. reflected BSDEs, which correspond to obstacle PDEs and are appropriate for pricing and hedging American options. Typically these algorithms work for 10- or even higher-dimensional problems.
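To make the recursion (0.3) concrete, the following is a minimal sketch (our own illustration, not taken from the dissertation) of the backward Euler scheme for a toy 1-dimensional semilinear BSDE with X = W, approximating each conditional expectation by polynomial least-squares regression in the spirit of [50]. The generator, terminal condition, polynomial degree and all parameters are arbitrary illustrative choices, and the implicit equation for Y_i is replaced by the common explicit variant.

```python
import numpy as np

# Minimal sketch of the backward Euler scheme (0.3) for a semilinear BSDE,
# with E_i[.] approximated by polynomial least-squares regression (cf. [50]).
# All model choices below (X = Brownian motion, f, g, deg) are toy assumptions.

def backward_euler(n=50, L=100_000, T=1.0, deg=4, seed=0):
    rng = np.random.default_rng(seed)
    h = T / n
    dW = np.sqrt(h) * rng.standard_normal((n, L))            # Brownian increments
    X = np.vstack([np.zeros(L), np.cumsum(dW, axis=0)])      # X_{t_i} = W_{t_i}

    f = lambda y, z: np.cos(y) - 0.5 * z                     # toy Lipschitz generator
    Y = np.maximum(X[-1], 0.0)                               # Y_n = g(X_n), toy terminal

    for i in range(n - 1, -1, -1):
        B = np.vander(X[i], deg + 1)                         # basis 1, x, ..., x^deg
        cy = np.linalg.lstsq(B, Y, rcond=None)[0]            # fits E_i[Y_{i+1}]
        cz = np.linalg.lstsq(B, (dW[i] / h) * Y, rcond=None)[0]  # fits E_i[(dW_i/h) Y_{i+1}]
        Ey, Z = B @ cy, B @ cz
        Y = Ey + h * f(Ey, Z)   # explicit variant of (0.3): Y_i in f replaced by E_i[Y_{i+1}]
    return Y.mean()             # approximation of Y_0

print(backward_euler())
```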
In Chapter 2 we intend to numerically solve PDE (0.1) in the fully nonlinear case, in particular the Hamilton-Jacobi-Bellman equations and the Bellman-Isaacs equations, which are widely used in stochastic control and stochastic differential games. We remark that this was actually one main motivation for the development of second order BSDEs by Cheridito, Soner, Touzi and Victoir [26] and Soner, Touzi and Zhang [81]. Our scheme is strongly inspired by the work of Fahim, Touzi and Warin [46]. Based on the monotone scheme of Barles and Souganidis [6], [46] extended the backward Euler scheme to the fully nonlinear PDE (0.1). In the case that G is convex in $(u,Du,D^2u)$, they obtained the rate of convergence by using the techniques of Krylov [59] and Barles and Jakobsen [5]. They applied the linear regression method, see e.g. [50], to compute the involved conditional expectations, and presented numerical examples up to dimension 5. We remark that the rate of convergence has recently been improved by Tan [82], using purely probabilistic arguments.

There is one critical constraint in [46], though. To ensure the monotonicity of the backward Euler scheme, they assume that the lower and upper bounds of $G_\gamma$, the derivative of G with respect to $D^2u$, satisfy a certain constraint. In a high-dimensional problem, however, this constraint implies that $G_\gamma$ is essentially a constant and thus that the PDE (0.1) is essentially semilinear; see (2.8) in Chapter 2 for more details. This is of course not desirable in practice.

The main contribution of Chapter 2 is to propose a new scheme that relaxes the serious constraint of [46]. In [46] the involved conditional expectations are expressed in terms of Brownian motion, which is unbounded. Our first simple but important observation is that we may replace it with a bounded trinomial tree, which helps to maintain the monotonicity of the scheme. We next modify the scheme by introducing a new kernel for the Hessian approximation, see $K_2(\xi)$ in (3.6) of Chapter 2, while still staying in the paradigm of monotone schemes. In the special case where $G_\gamma$ is diagonal, namely when G involves $D^2u$ only through its diagonal terms, the above constraint is removed completely. The rate of convergence of our scheme is also obtained. Several numerical examples are presented. In the low dimensional case, our scheme is comparable to the finite difference method and superior to the simulation methods. When $G_\gamma$ is diagonal, our scheme works well for 12-dimensional problems.

In Chapter 3 we extend the idea of monotone schemes to the so-called fully nonlinear parabolic path-dependent PDEs with terminal condition $u(T,\omega) = g(\omega)$:

\[ \mathcal{L}u(t,\omega) := -\partial_t u(t,\omega) - G\big(t,\omega,u,\partial_\omega u,\partial^2_{\omega\omega}u\big) = 0, \qquad 0\le t<T. \tag{0.4} \]

Here ω is a continuous path on [0,T], and G is increasing in $\partial^2_{\omega\omega}u$, so the PPDE is parabolic. Such PPDEs provide a convenient tool for non-Markovian models, especially in stochastic control/games with diffusion control and in financial models with volatility uncertainty. Typical examples include martingales as path-dependent heat equations, backward SDEs of Pardoux and Peng [73] as semilinear PPDEs, and G-martingales of Peng [74] and second order backward SDEs of Soner, Touzi and Zhang [81] as path-dependent HJB equations. The notion of PPDE was proposed by Peng [75]. Based on the functional Itô calculus, initiated by Dupire [40] and further developed by Cont and Fournié [30], Ekren, Keller, Touzi and Zhang [41] and Ekren, Touzi and Zhang [42, 43, 44] developed a viscosity theory for PPDEs.
In the Markovian case, namely (0.1), the seminal work of Barles and Souganidis [6] proposed a class of time discretization schemes and showed that, under certain conditions, the discretized approximation converges to the unique viscosity solution of the PDE. Their key assumption is the monotonicity of the scheme, see Theorem 3.1.2 (ii) in Chapter 3, which can roughly be viewed as a comparison principle for the discretized PDE. This work has been extended by many authors, either by improving the error analysis, including the rate of convergence, or by proposing specific algorithms which indeed satisfy the required conditions; see e.g. [5, 15, 46, 52, 59, 82, 83], to mention a few.

Our goal in Chapter 3 is to extend the work [6] to the PPDE (0.4). Notice that the viscosity solution in [41, 42, 43, 44] is defined through an optimal stopping problem under nonlinear expectation, which differs from the standard viscosity theory for PDEs. Consequently, our notion of monotonicity for the scheme also involves the nonlinear expectation, see (3.3) in Chapter 3. This requires some technical estimates for the hitting time involved in the theory. We show that our monotone scheme converges to the unique viscosity solution of the PPDE by following arguments similar to [6]. We next propose a specific implementable scheme which satisfies all the conditions and thus indeed converges. A 1-dimensional example of this kind is presented. In particular, when the PPDE has a smooth enough classical solution, we obtain the rate of convergence of our scheme.

In the semilinear case, there have been many works on numerical methods for the associated backward SDEs, see e.g. [16, 17, 56, 57, 64, 77, 86]. In particular, [56] used the arguments of the viscosity theory of PPDEs. Moreover, [82] studied certain numerical approximations for path-dependent HJB equations, in the language of second order BSDEs. However, we should point out that most of these works, including the theory in Chapter 3, are mainly theoretical studies, and none of them is efficient, especially in high dimensions. Efficient numerical algorithms for path-dependent PDEs remain a challenging problem, which we shall explore further in future research.

Almost all probabilistic schemes discussed in the literature on BSDEs, PDEs and PPDEs involve the computation of conditional expectations, which is usually replaced by some conditional expectation operator. The main difficulty lies in each step backwards in time, when a conditional expectation must be approximated numerically, building on the approximate solution one step ahead. This unavoidably leads to a high order of nesting of conditional expectations. It is therefore important that the approximate conditional expectation operator can be applied repeatedly without exploding computational cost. Among the techniques that have been applied and analyzed in the context of BSDEs driven by a (high-dimensional) Brownian motion are least-squares Monte Carlo by Lemor et al. [62] and Bender and Denk [9], quantization by Bally and Pagès [3], Malliavin Monte Carlo by Bouchard and Touzi [20], cubature on Wiener space by Crisan and Manolarakis [33], and sparse grid methods by Zhang et al. [85]. A standard approach for solving (0.2) numerically would be the approximate dynamic programming approach (0.3). In Chapter 4 we discuss a refinement of such numerical approximations.
In particular, we will focus on the more general form

\[ \begin{cases} Y_i = \max\Big\{S_i,\ \mathbb{E}_i[Y_{i+1}] + (t_{i+1}-t_i)\,f\Big(t_i,X_i,Y_i,\mathbb{E}_i\Big[\tfrac{W_{i+1}-W_i}{t_{i+1}-t_i}\,Y_{i+1}\Big]\Big)\Big\}, \\[2pt] Y_n = S_n := g(X_n). \end{cases} \tag{0.5} \]

This topic has tremendous applications in finance. When $f\equiv0$, (0.5) degenerates to the well-known recursion for the valuation of Bermudan options. After the financial crisis, people became increasingly interested in "small" nonlinearities in pricing. These are due, e.g., to counterparty risk or funding risk, which had largely been neglected in practice. The applications of BSDEs to pricing problems have been steadily growing; to name a few, Bergman [14], Duffie et al. [39], and the examples in El Karoui et al. [45]. More recent examples include funding risk [61, 32], counterparty risk [32, 55], model uncertainty [53, 1], and hedging under transaction costs [53]. The nonlinearity in some of these examples depends on the delta or the gamma of the option, which can be handled in our discrete time setting by choosing β properly. The aim of Chapter 4 is to provide a general and efficient framework for calculating upper and lower price bounds for these pricing problems.

The error analysis of the time discretization (0.5) for BSDEs driven by a Brownian motion has been thoroughly studied in the literature under various regularity assumptions. We refer to Zhang [86], Bouchard and Touzi [20], Gobet and Labart [49], and Gobet and Makhlouf [51] for the non-reflected case (namely $S_i\equiv-\infty$ for $i<n$), and to Bally and Pagès [3], Ma and Zhang [67], as well as Bouchard and Chassagneux [18] for the reflected case. Moreover, the results in Chapter 4 can also be applied to the time discretization schemes for BSDEs driven by a Brownian motion with generators of quadratic growth as in Chassagneux and Richou [24], the time discretization schemes for BSDEs with jumps considered in Bouchard and Elie [19], and the time discretization scheme for fully nonlinear parabolic PDEs by Fahim et al. [46].

Even though convergence rates are usually available in the literature for these different approaches to (0.5), the quality of the numerical approximation in practice is difficult to assess when passing to the limit is impossible. Generalizing the primal-dual methodology introduced by Andersen and Broadie [2] in the context of Bermudan option pricing, in Chapter 4 we propose a confidence interval for (0.5) based on a given numerical approximation. Essentially, we suggest taking the numerical solution of the approximate dynamic program as an input, in order to construct a confidence interval for $Y_0$ via a Monte Carlo approach. In short, we try to find a maximization problem and a minimization problem, each with value $Y_0$, for which optimal controls are available in terms of the true solution $(Y_i)_{i=0,\dots,n}$ of the dynamic program (0.5). Using the approximate solution instead of the true one then yields suboptimal controls for these two optimization problems. If the numerical procedure in the approximate dynamic program was successful, these controls are close to optimal and lead to tight lower and upper bounds for $Y_0$. Unbiased estimators for the lower and the upper bound can finally be computed by plain Monte Carlo, which results in a confidence interval for $Y_0$. Bender and Steiner [12] provide a different a posteriori criterion for BSDEs which is better suited to qualitative convergence analysis than to deriving quantitatively meaningful bounds on $Y_0$. The two approaches are thus complementary.

The rest of this dissertation is organized as follows.
In Chapter 2 we propose a feasible monotone scheme for the high dimensional fully nonlinear PDE (0.1), which includes as a special case the quasilinear PDE corresponding to a coupled FBSDE. Several numerical examples, up to 12-dimensional ones, are presented. Chapter 3 is dedicated to the extension of the theory of [6] to the convergence of numerical schemes for the PPDE (0.4) to its path-dependent viscosity solution. An implementable scheme, as well as a 1-dimensional example, is demonstrated in Chapter 3. In Chapter 4 we provide an extension of the primal-dual methodology to the backward dynamic programming equation (0.5). A confidence interval for the unknown true solution of (0.5) at time 0 is reported. We test the tightness of the price bounds on two 5-dimensional nonlinear pricing problems at the end of Chapter 4.

Chapter 2
A Monotone Scheme for High Dimensional Fully Nonlinear PDEs

2.1 Introduction

In this chapter we are interested in feasible numerical schemes for the following fully nonlinear parabolic PDE on $[0,T]\times\mathbb{R}^d$, especially in the high dimensional case:

\[ -\partial_t u - G(t,x,u,Du,D^2u) = 0; \qquad u(T,\cdot) = g(\cdot). \tag{1.1} \]

Due to the well-known curse of dimensionality, the traditional numerical PDE methods, namely finite difference and finite element methods, do not work well when the spatial dimension of the equation is large (d > 3). We therefore turn to probabilistic approaches, which usually involve conditional expectations. Inspired by the remarkable work of Fahim, Touzi and Warin [46], we propose a numerical scheme for (1.1) that works with less restriction on the nonlinearity of G with respect to $D^2u$. Based on the monotone scheme of Barles and Souganidis [6], their scheme is an extension of the backward Euler scheme, which usually applies to BSDEs, to the fully nonlinear PDE (1.1). One critical constraint in [46], though, is that the monotonicity assumption on the backward Euler scheme effectively limits the nonlinearity of G in their setting. When the dimension is high, this constraint implies that the derivative of G with respect to $D^2u$ is basically a constant, meaning that the PDE (1.1) is in effect semilinear; see (2.8) for a detailed explanation. This clearly reduces the applicability of their scheme in practice.

By replacing the Brownian motion in the conditional expectations of [46] with a bounded trinomial tree, and by introducing a new kernel $K_2(\xi)$ in (3.6) for approximating the Hessian matrix, we manage to relax the serious constraint of [46]. In particular, this constraint is removed entirely when G depends on $D^2u$ only through its diagonal elements. The numerical examples in this chapter compare our scheme with the finite difference method and the simulation method in the low dimensional case, and indicate that our method performs better than the simulation method and is almost as good as the finite difference method. Our scheme works well even for 12-dimensional equations when G is independent of the off-diagonal terms of $D^2u$.

We note that PDE (1.1) covers quasilinear PDEs as a special case, which correspond to coupled forward-backward SDEs via the four step scheme of Ma, Protter and Yong [65]. There are only a few papers on numerical methods for FBSDEs, e.g. Douglas-Ma-Protter [65], Makarov [68], Cvitanic and Zhang [34], Delarue and Menozzi [35], Milstein-Tretyakov [69], Bender and Zhang [13], and Ma, Shen and Zhao [66]. Most of them deal with low dimensional FBSDEs only, except that [13] reported a 10-dimensional numerical example.
However, [13] proved the rate of convergence only for the time discretization; the convergence of the linear regression approximation was not analyzed theoretically. Our scheme works for FBSDEs as well, especially when the diffusion coefficient σ is diagonal. A numerical example for a 12-dimensional coupled FBSDE is reported.

We have also presented a few numerical examples which violate our assumptions, so that the scheme may not be monotone. The numerical results show that our scheme still converges. In particular, we note that our current theoretical result does not cover the G-expectation, a nonlinear expectation introduced by Peng [76]. We nevertheless implement our scheme for a 10-dimensional HJB equation, which corresponds to a second order BSDE and includes the G-expectation as a special case, and it indeed converges to the true solution. It would be very interesting to investigate the convergence of our scheme, or variations of it if necessary, when the monotonicity condition is violated. We leave this for future research. Finally, we note that we have recently extended the idea of monotone schemes to the so-called path-dependent PDEs, see Zhang and Zhuo [87] or Chapter 3.

The rest of the chapter is organized as follows. In Section 2 we present some preliminaries. In Section 3 we propose our scheme and prove the main convergence results. Section 4 is devoted to the study of quasilinear PDEs and the associated coupled FBSDEs. In Section 5 we discuss how to approximate the involved conditional expectations. Finally, in Section 6 we present several numerical examples, up to dimension 12.

2.2 Preliminaries

Let T > 0 be the terminal time, $d\ge1$ the dimension of the state variable x, $\mathbb{S}^d$ the set of $d\times d$ symmetric matrices, and $\mathbb{R}^{d\times d}_+$ the set of nondegenerate $d\times d$ matrices. For $\gamma,\tilde\gamma\in\mathbb{S}^d$, we say $\gamma\le\tilde\gamma$ if $\tilde\gamma-\gamma$ is nonnegative definite. For $x,\tilde x\in\mathbb{R}^d$ and $\gamma,\tilde\gamma\in\mathbb{R}^{d\times d}$, denote

\[ x\cdot\tilde x := \sum_{i=1}^d x_i\tilde x_i, \qquad |x| := \sqrt{x\cdot x}, \qquad \gamma:\tilde\gamma := \mathrm{tr}(\gamma\tilde\gamma^T), \qquad |\gamma| := \sqrt{\gamma:\gamma}, \]

where $^T$ denotes transposition. For any $\gamma=[\gamma_{ij}]\in\mathbb{S}^d$, denote

\[ D[\gamma] := \text{the diagonal matrix whose }(i,i)\text{-th component is } \gamma_{ii}. \tag{2.1} \]

It is clear that, for any $\gamma,\tilde\gamma\in\mathbb{S}^d$,

\[ D[\gamma]:\tilde\gamma = D[\gamma]:D[\tilde\gamma] = \gamma:D[\tilde\gamma]. \tag{2.2} \]

Moreover, we use the same notation 0 to denote the zeros in $\mathbb{R}^d$ and $\mathbb{S}^d$.

Our object of study is PDE (1.1), where $G:(t,x,y,z,\gamma)\in[0,T]\times\mathbb{R}^d\times\mathbb{R}\times\mathbb{R}^d\times\mathbb{S}^d\to\mathbb{R}$ and $g:x\in\mathbb{R}^d\to\mathbb{R}$. We first recall the definition of viscosity solutions: an upper (resp. lower) semicontinuous function u is called a viscosity subsolution (resp. viscosity supersolution) of PDE (1.1) if $u(T,\cdot)\le$ (resp. $\ge$) $g$ and if, for any $(t,x)\in[0,T)\times\mathbb{R}^d$ and any smooth function φ satisfying

\[ [u-\varphi](t,x) = 0 \ \ge\ (\text{resp.}\ \le)\ [u-\varphi](s,y) \quad\text{for all } (s,y)\in[0,T]\times\mathbb{R}^d, \]

we have

\[ \big[-\partial_t\varphi - G(\cdot,\varphi,D\varphi,D^2\varphi)\big](t,x) \ \le\ (\text{resp.}\ \ge)\ 0. \]

For the theory of viscosity solutions we refer to the classical references [31] and [47]. We remark that [6] considers more general discontinuous viscosity solutions, which is unnecessary in our situation due to the regularities we will prove; see also [46] Remark 2.2. We shall always assume the following standing assumptions.

Assumption 2.2.1. (i) $G(\cdot,0,0,0)$ and g are bounded.
(ii) G is continuous in t, uniformly Lipschitz continuous in $(x,y,z,\gamma)$, and g is uniformly Lipschitz continuous in x.
(iii) PDE (1.1) is parabolic, that is, G is nondecreasing in γ.
(iv) The comparison principle for PDE (1.1) holds in the class of bounded viscosity solutions. That is, if $u_1$ and $u_2$ are a bounded viscosity subsolution and a bounded viscosity supersolution of PDE (1.1), respectively, then $u_1\le u_2$.
For notational simplicity, throughout the chapter we assume further that G is differentiable in $(y,z,\gamma)$, so that we can use the notation $G_\gamma$ etc. However, we emphasize that none of the results in the chapter rely on this additional assumption.

Our goal in this chapter is to numerically compute the viscosity solution u. In their seminal work, Barles and Souganidis [6] proposed a monotone scheme in an abstract way and proved its convergence using the viscosity solution approach. To be precise, for any $t\in[0,T)$ and $h\in(0,T-t)$, let $\mathbb{T}^t_h$ be an operator on the set of measurable functions $\varphi:\mathbb{R}^d\to\mathbb{R}$. For $n\ge1$, denote $h := T/n$, $t_i := ih$, $i = 0,1,\dots,n$, and define

\[ u_h(t_n,\cdot) := g(\cdot), \qquad u_h(t,\cdot) := \mathbb{T}^t_{t_i-t}\big[u_h(t_i,\cdot)\big], \quad t\in[t_{i-1},t_i), \tag{2.3} \]

for $i = n,\dots,1$. The following convergence result is due to Fahim, Touzi and Warin [46] Theorem 3.6, which is based on [6].

Theorem 2.2.2. Let Assumption 2.2.1 hold. Assume $\mathbb{T}^t_h$ satisfies the following conditions:
(i) Consistency: for any $(t,x)\in[0,T)\times\mathbb{R}^d$ and any $\varphi\in C^{1,2}([0,T)\times\mathbb{R}^d)$,

\[ \lim_{(t',x',h,c)\to(t,x,0,0)} \frac{\mathbb{T}^{t'}_h[c+\varphi](t'+h,\cdot)(x') - [c+\varphi](t',x')}{h} = \partial_t\varphi(t,x) + G(t,x,\varphi,D\varphi,D^2\varphi). \]

(ii) Monotonicity: $\mathbb{T}^t_h[\varphi] \le \mathbb{T}^t_h[\psi]$ whenever $\varphi\le\psi$.
(iii) Stability: $u_h$ is bounded uniformly in h whenever g is bounded.
(iv) Boundary condition: for any $x\in\mathbb{R}^d$, $\lim_{(t',x',h)\to(T,x,0)} u_h(t',x') = g(x)$.
Then the PDE (1.1) has a unique bounded viscosity solution u, and $u_h$ converges to u locally uniformly as $h\to0$.

We remark that in [46] the monotonicity condition is weakened slightly. Roughly speaking, [46] proposed a scheme $\bar{\mathbb{T}}^t_h$ as follows. Assume there exist $\underline\sigma,\overline\sigma:[0,T]\times\mathbb{R}^d\to\mathbb{R}^{d\times d}_+$ such that

\[ \tfrac12\underline a(t,x) \le G_\gamma(t,x,y,z,\gamma) \le \tfrac12\overline a(t,x) \quad\text{for all } (t,x,y,z,\gamma), \]

where $\underline a := \underline\sigma\,\underline\sigma^T$ and $\overline a := \overline\sigma\,\overline\sigma^T$. Denote

\[ \bar F(t,x,y,z,\gamma) := G(t,x,y,z,\gamma) - \tfrac12\underline a:\gamma, \tag{2.4} \]

and define

\[ \bar{\mathbb{T}}^t_h[\varphi](x) := \bar{\mathcal{D}}^{t,0}_h\varphi(x) + h\bar F\Big(t,x,\bar{\mathcal{D}}^{t,0}_h\varphi(x),\bar{\mathcal{D}}^{t,1}_h\varphi(x),\bar{\mathcal{D}}^{t,2}_h\varphi(x)\Big), \tag{2.5} \]

where, for a d-dimensional standard normal random variable N,

\[ \bar{\mathcal{D}}^{t,i}_h\varphi(x) := \mathbb{E}\big[\varphi(x+\sqrt h\,\underline\sigma N)\bar K_i(N)\big], \quad i = 0,1,2, \]
\[ \bar K_0(N) := 1, \qquad \bar K_1(N) := \frac{\underline\sigma^{-T}N}{\sqrt h}, \qquad \bar K_2(N) := \frac{\underline\sigma^{-T}\big[NN^T - I_d\big]\underline\sigma^{-1}}{h}, \tag{2.6} \]

and $\underline\sigma^{-T} := (\underline\sigma^{-1})^T$. This scheme satisfies the consistency, and the stability follows from the monotonicity. However, to ensure the monotonicity one needs to assume $\bar F_\gamma : \underline a^{-1} \le 1$, see the proof of [46] Lemma 3.12. This essentially requires $\big[\tfrac12\overline a - \tfrac12\underline a\big] : \underline a^{-1} \le 1$, and thus

\[ \overline a : \underline a^{-1} \le d+2. \tag{2.7} \]

In the case $\underline a = \underline\alpha I_d$, $\overline a = \overline\alpha I_d$ for some scalar functions $0<\underline\alpha\le\overline\alpha$, we have

\[ 1 \le \overline\alpha/\underline\alpha \le 1+2/d. \tag{2.8} \]

When d is large, this implies $\overline\alpha\approx\underline\alpha$ and thus that G is essentially semilinear, which of course is not desirable in practice. Our goal in this chapter is to modify the algorithm (2.5)-(2.6) so as to relax the above constraint, mainly in the case that $G_\gamma$ is diagonally dominant. In particular, when $G_\gamma$ is diagonal, we remove this constraint completely.

2.3 The numerical scheme

In this section we present our numerical scheme and study its convergence. Our scheme involves two functions $\sigma_0:[0,T]\times\mathbb{R}^d\to\mathbb{R}^{d\times d}_+$ and $p:[0,T]\times\mathbb{R}^d\to(0,1)$ satisfying

\[ \sigma_0,\ \sigma_0^{-1}\ \text{and}\ p^{-1}\ \text{are bounded.} \tag{3.1} \]

Denote

\[ F(t,x,y,z,\gamma) := G(t,x,y,z,\gamma) - \tfrac12 a_0(t,x):\gamma, \ \text{where } a_0 := \sigma_0\sigma_0^T; \qquad \tilde G_\gamma(t,x,y,z,\gamma) := \sigma_0^{-1}(t,x)\,G_\gamma(t,x,y,z,\gamma)\,\sigma_0^{-T}(t,x). \tag{3.2} \]

For notational simplicity, we will suppress the variables when there is no confusion. Unlike [46], we emphasize that we do not require $\tfrac12 a_0 \le G_\gamma$. Let $(\Omega,\mathcal{F},\mathbb{P})$ be a probability space.
For each (t,x), let $\xi := \xi^{t,x}$ be an $\mathbb{R}^d$-valued random variable whose components $\xi_i$, $i = 1,\dots,d$, are independent and have the identical distribution

\[ \mathbb{P}\Big(\xi_i = \tfrac{1}{\sqrt p}\Big) = \frac p2, \qquad \mathbb{P}\Big(\xi_i = -\tfrac{1}{\sqrt p}\Big) = \frac p2, \qquad \mathbb{P}(\xi_i = 0) = 1-p. \tag{3.3} \]

This implies that

\[ \mathbb{E}[\xi_i] = \mathbb{E}[\xi_i^3] = 0, \qquad \mathbb{E}[\xi_i^2] = 1, \qquad \mathbb{E}[\xi_i^4] = \frac1p. \tag{3.4} \]

We now modify the algorithm (2.5)-(2.6):

\[ \mathbb{T}^t_h[\varphi](x) := \mathcal{D}^{t,0}_h\varphi(x) + hF\Big(t,x,\mathcal{D}^{t,0}_h\varphi(x),\mathcal{D}^{t,1}_h\varphi(x),\mathcal{D}^{t,2}_h\varphi(x)\Big), \tag{3.5} \]

where, recalling (2.1) and suppressing the variables (t,x),

\[ \mathcal{D}^{t,i}_h\varphi(x) := \mathbb{E}\big[\varphi(x+\sqrt h\,\sigma_0\xi)K_i(\xi)\big], \quad i = 0,1,2, \]
\[ K_0(\xi) := 1, \qquad K_1(\xi) := \frac{\sigma_0^{-T}\xi}{\sqrt h}, \qquad K_2(\xi) := \frac{\sigma_0^{-T}\big[(1-p)\xi\xi^T - (1-3p)D[\xi\xi^T] - 2pI_d\big]\sigma_0^{-1}}{(1-p)h}. \tag{3.6} \]

One may check straightforwardly that

\[ \mathbb{E}[K_1(\xi)] = 0, \qquad \mathbb{E}[K_2(\xi)] = 0. \tag{3.7} \]

We recall that the approximating solution $u_h$ is defined by (2.3).

Remark 2.3.1. If we assume $\tfrac12 a_0\le G_\gamma$ and set $p = \tfrac13$, then our scheme is obtained by replacing the normal random variable N in (2.6) with the trinomial random variable ξ. This was in fact already mentioned in [46].

Remark 2.3.2. (i) The seemingly complicated kernel $K_2(\xi)$ is there to ensure the consistency of the scheme, see Lemma 2.3.3 below.
(ii) The $\sigma_0$ is used to construct the forward process on which we will compute the conditional expectations. This is fundamental for the Monte Carlo methods we will use.
(iii) The introduction of p allows us to obtain the monotonicity of our scheme, see Section 2.3.2 below. However, we should point out that the crucial property is (3.4). Additional free parameters, obtained e.g. by replacing the trinomial tree with 5-nomial trees, will not help to weaken the monotonicity condition, Assumption 2.3.4 below.
(iv) An additional advantage of using a trinomial tree (instead of Brownian motion) is that it is bounded, which helps to ensure the monotonicity; see the proof of Lemma 2.3.6 and Remark 2.3.9 (ii) below.

2.3.1 Consistency

We first justify our scheme by checking its consistency.

Lemma 2.3.3. Under Assumption 2.2.1 and (3.1), $\mathbb{T}^t_h$ satisfies the consistency requirement in Theorem 2.2.2 (i).

Proof. Fix $(t,x)\in[0,T)\times\mathbb{R}^d$. Let $\varphi\in C^{1,2}([0,T]\times\mathbb{R}^d)$ and $(t',x',h,c)\to(t,x,0,0)$. A Taylor expansion in h gives, with the right side evaluated at $(t',x')$,

\[ \varphi(t'+h,x'+\sqrt h\,\sigma_0\xi) = \varphi + \partial_t\varphi\,h + \sqrt h\,D\varphi\cdot\sigma_0\xi + \frac h2 D^2\varphi : [\sigma_0\xi][\sigma_0\xi]^T + o(h). \]

We emphasize that, thanks to (3.1), the $o(\cdot)$ in this proof is uniform in $(t',x',h,c)$. By (3.4) and the independence of the $\xi_k$, one may check straightforwardly that

\[ \mathcal{D}^{t',0}_h\varphi(t'+h,\cdot)(x') = \varphi + \partial_t\varphi\,h + \frac h2 D^2\varphi : a_0 + o(h); \qquad \mathcal{D}^{t',1}_h\varphi(t'+h,\cdot)(x') = D\varphi + o(\sqrt h). \tag{3.8} \]

Moreover, for any $i\ne j$,

\[ \mathbb{E}\big[(1-p)\xi\xi^T - (1-3p)D[\xi\xi^T] - 2pI_d\big] = 0; \]
\[ \mathbb{E}\big[\xi_i\big((1-p)\xi\xi^T - (1-3p)D[\xi\xi^T] - 2pI_d\big)\big] = 0; \]
\[ \mathbb{E}\big[\xi_i^2\big((1-p)\xi\xi^T - (1-3p)D[\xi\xi^T] - 2pI_d\big)\big] = 2(1-p)\delta_{i,i}; \]
\[ \mathbb{E}\big[\xi_i\xi_j\big((1-p)\xi\xi^T - (1-3p)D[\xi\xi^T] - 2pI_d\big)\big] = (1-p)(\delta_{i,j}+\delta_{j,i}). \]

Here $\delta_{i,j}$ is the $d\times d$ matrix whose $(i,j)$-th component is 1 and all other components are 0. Then, denoting $A = [a_{i,j}] := \sigma_0^T D^2\varphi\,\sigma_0$,

\[ \mathbb{E}\Big[\big(D^2\varphi:[\sigma_0\xi][\sigma_0\xi]^T\big)\,\sigma_0^{-T}\big[(1-p)\xi\xi^T - (1-3p)D[\xi\xi^T] - 2pI_d\big]\sigma_0^{-1}\Big] \]
\[ = \sigma_0^{-T}\,\mathbb{E}\Big[\big(A:\xi\xi^T\big)\big[(1-p)\xi\xi^T - (1-3p)D[\xi\xi^T] - 2pI_d\big]\Big]\,\sigma_0^{-1} \]
\[ = \sigma_0^{-T}\sum_{i,j=1}^d a_{i,j}\,\mathbb{E}\Big[\xi_i\xi_j\big[(1-p)\xi\xi^T - (1-3p)D[\xi\xi^T] - 2pI_d\big]\Big]\,\sigma_0^{-1} \]
\[ = (1-p)\,\sigma_0^{-T}\sum_{i,j=1}^d a_{i,j}(\delta_{i,j}+\delta_{j,i})\,\sigma_0^{-1} = 2(1-p)\,\sigma_0^{-T}A\,\sigma_0^{-1} = 2(1-p)\,D^2\varphi, \]

and thus

\[ \mathcal{D}^{t',2}_h\varphi(t'+h,\cdot)(x') = D^2\varphi + o(1). \tag{3.9} \]

Plugging (3.8) and (3.9) into (3.5) and recalling (3.7), we have

\[ \mathbb{T}^{t'}_h[c+\varphi](t'+h,\cdot)(x') = c + \varphi + \partial_t\varphi\,h + \frac h2 D^2\varphi : a_0 + o(h) \]
\[ \qquad + hF\Big(t',x',c+\varphi+\partial_t\varphi\,h+\frac h2 D^2\varphi : a_0 + o(h),\ D\varphi + o(\sqrt h),\ D^2\varphi + o(1)\Big). \]
Then, by (3.2),

\[ \frac1h\Big(\mathbb{T}^{t'}_h[c+\varphi](t'+h,\cdot)(x') - [c+\varphi](t',x')\Big) = \partial_t\varphi(t',x') - \frac12 o(1):a_0(t',x') + o(1) \]
\[ \qquad + G\Big(t',x',c+\varphi+\partial_t\varphi\,h+\frac h2 D^2\varphi : a_0 + o(h),\ D\varphi + o(\sqrt h),\ D^2\varphi + o(1)\Big). \]

Sending $(t',x',h,c)\to(t,x,0,0)$, we obtain the consistency immediately.

2.3.2 The monotonicity

To obtain the monotonicity of our scheme, we need to impose an additional assumption. Let $\sigma_0:[0,T]\times\mathbb{R}^d\to\mathbb{R}^{d\times d}_+$, and let $\tilde G_\gamma$ and $D[\tilde G_\gamma]$ be defined by (3.2) and (2.1). Introduce the following scalar functions associated to $\sigma_0$:

\[ \underline\alpha(t,x) := \sup\big\{\alpha>0:\ \alpha I_d \le D[\tilde G_\gamma](t,x,y,z,\gamma),\ \forall(y,z,\gamma)\big\}; \]
\[ \overline\alpha(t,x) := \inf\big\{\alpha>0:\ \alpha I_d \ge D[\tilde G_\gamma](t,x,y,z,\gamma),\ \forall(y,z,\gamma)\big\}; \]
\[ \Lambda(t,x) := \frac{\overline\alpha(t,x)}{\underline\alpha(t,x)}, \qquad \theta(t,x) := \inf\big\{\theta\ge0:\ D[\tilde G_\gamma] \le (1+\theta)\tilde G_\gamma\big\}; \]
\[ \alpha_p := \frac{p(2+3\theta)-\theta}{p(1+\theta)}, \qquad c_p := \sqrt{2p\Lambda+\alpha_p-2p}; \qquad \lambda_p := \frac{\sqrt{p\underline\alpha}}{|\sigma_0^{-1}|}\Big[(1-p+pd)\,c_p - \frac{2pd\Lambda}{c_p}\Big]; \]
\[ \lambda^* := \inf_{(t,x)\in[0,T]\times\mathbb{R}^d}\ \sup_{p\in[\frac{\theta}{2(1+\theta)},\frac13]\cap(0,\frac13]} \lambda_p(t,x). \tag{3.10} \]

We remark that if we rescale $\sigma_0$ by a constant c, then $\underline\alpha$ and $\overline\alpha$ are rescaled by $c^{-2}$. However, Λ, θ, $\alpha_p$, $c_p$, $\lambda_p$ and $\lambda^*$ are all invariant. The following assumption is crucial.

Assumption 2.3.4. There exist $\sigma_0$ and p satisfying (3.1) and:
(i) $\theta(t,x)\le2$ for all (t,x), and $\lambda^*>0$;
(ii) $p\in[\frac{\theta}{2(1+\theta)},\frac13]\cap(0,\frac13]$, $\lambda_p\ge\frac{\lambda^*}{2}$, and $\underline\alpha = c_p^{-2}$.

This assumption is somewhat complicated. We shall provide several remarks concerning it after proving the monotonicity of our scheme. Below we first explain our choices of parameters, which will be used in the proof of the next lemma.

Remark 2.3.5. (i) We need $p\le\frac13$ so that $1-3p$, the coefficient of $D[\xi\xi^T]$, is nonnegative. For $0\le\theta\le2$ we have $\frac{\theta}{2(1+\theta)}\le\frac13$. Moreover, for $p\in[\frac{\theta}{2(1+\theta)},\frac13]\cap(0,\frac13]$, it holds that $\alpha_p\ge2p$.
(ii) To ensure the monotonicity, we shall first choose $\sigma_0$ with $|\sigma_0| = 1$ satisfying Assumption 2.3.4, preferably the one maximizing the corresponding $\lambda^*$. We next choose $p\in[\frac{\theta}{2(1+\theta)},\frac13]\cap(0,\frac13]$ such that $\lambda_p\ge\frac{\lambda^*}{2}$. Finally we rescale $\sigma_0$ to obtain $\underline\alpha = c_p^{-2}$.
(iii) The above choices of p and $\sigma_0$ are somewhat optimal for maintaining the monotonicity. However, given G, they may not be optimal for the convergence of the scheme. For example, a smaller p may help with the monotonicity, but it may increase the variance of the Monte Carlo simulation to be introduced later. In our numerical examples in Section 2.6 below we may choose them slightly differently. It is not clear to us how to choose p and $\sigma_0$ so as to optimize the overall efficiency of the algorithm.

Lemma 2.3.6. Let Assumptions 2.2.1 and 2.3.4 hold, and consider the algorithm with p and $\sigma_0$ as specified in Remark 2.3.5 (ii). Then there exists a constant $h_0>0$, depending on d, T, $\lambda^*$, and the bounds and Lipschitz constants in Assumption 2.2.1 and (3.1), such that $\mathbb{T}^t_h$ satisfies the monotonicity in Theorem 2.2.2 (ii) for all $h\in(0,h_0]$.

Proof. Let $\varphi_1\le\varphi_2$ be bounded and $\psi := \varphi_2-\varphi_1\ge0$. Then by (3.5) we have, at (t,x),

\[ \mathbb{T}^t_h[\varphi_2] - \mathbb{T}^t_h[\varphi_1] = \mathcal{D}^{t,0}_h\psi + h\big[F_y\,\mathcal{D}^{t,0}_h\psi + F_z\cdot\mathcal{D}^{t,1}_h\psi + F_\gamma:\mathcal{D}^{t,2}_h\psi\big]. \tag{3.11} \]

Here the terms $F_y,F_z,F_\gamma$ are defined in an obvious way, and we emphasize that they are deterministic. Plugging (3.6) into the above equality,

\[ \mathbb{T}^t_h[\varphi_2] - \mathbb{T}^t_h[\varphi_1] = \mathbb{E}\Big[\psi(x+\sqrt h\,\sigma_0\xi)\big(1 + h[F_y + F_z\cdot K_1(\xi) + F_\gamma:K_2(\xi)]\big)\Big] \]
\[ = \mathbb{E}\Big[\psi(x+\sqrt h\,\sigma_0\xi)\Big(1 + hF_y + \sqrt h\,F_z\cdot(\sigma_0^{-T}\xi) + \frac{I}{1-p}\Big)\Big], \tag{3.12} \]

where

\[ I := F_\gamma : \sigma_0^{-T}\big[(1-p)\xi\xi^T - (1-3p)D[\xi\xi^T] - 2pI_d\big]\sigma_0^{-1} \]
\[ = \big[\tilde G_\gamma - \tfrac12 I_d\big] : \big[(1-p)\xi\xi^T - (1-3p)D[\xi\xi^T] - 2pI_d\big] \]
\[ = (1-p)\big[\tilde G_\gamma - \tfrac12 I_d\big]:(\xi\xi^T) - (1-3p)\big[D[\tilde G_\gamma] - \tfrac12 I_d\big]:(\xi\xi^T) - 2p\,\mathrm{tr}(\tilde G_\gamma) + pd. \]

Denote $\alpha_i := (\tilde G_\gamma)_{ii}$.
Then it follows from the definition of θ that

\[ I \ge (1-p)\Big[\frac{1}{1+\theta}D[\tilde G_\gamma] - \frac12 I_d\Big]:(\xi\xi^T) - (1-3p)\Big[D[\tilde G_\gamma] - \frac12 I_d\Big]:(\xi\xi^T) - 2p\,\mathrm{tr}(\tilde G_\gamma) + pd \]
\[ = p\alpha_p\,D[\tilde G_\gamma]:(\xi\xi^T) - p|\xi|^2 - 2p\,\mathrm{tr}(\tilde G_\gamma) + pd = pd - p\sum_{i=1}^d\big[\xi_i^2 + 2\alpha_i - \alpha_p\alpha_i\xi_i^2\big]. \]

Note that $\underline\alpha\le\alpha_i\le\overline\alpha$, $\alpha_p\ge2p$ by Remark 2.3.5 (i), $\underline\alpha = c_p^{-2}$ by Remark 2.3.5 (ii), and $\xi_i^2$ takes only the values 0 and $\frac1p$. Then

\[ p\big[\xi_i^2 + 2\alpha_i - \alpha_p\alpha_i\xi_i^2\big] \le (2p\alpha_i)\vee\big[1-(\alpha_p-2p)\alpha_i\big] \le 2p\overline\alpha\vee\big[1-(\alpha_p-2p)\underline\alpha\big] = 2p\overline\alpha = 2p\Lambda c_p^{-2}. \]

Thus

\[ 1-p+I \ge 1-p+pd-2pd\Lambda c_p^{-2} = \frac{\lambda_p\,|\sigma_0^{-1}|}{\sqrt p} \ge \frac{\lambda^*\,|\sigma_0^{-1}|}{2\sqrt p}. \tag{3.13} \]

By the Lipschitz continuity of G in γ, we have

\[ C_0|\sigma_0^{-1}|^2 \ge \overline\alpha = \Lambda c_p^{-2} = \frac{\Lambda}{2p\Lambda+\alpha_p-2p} \ge \frac{1}{\alpha_p} \ge \frac13. \tag{3.14} \]

Note that $p\le\frac13$ and $|\xi_i|$ takes the values 0 or $\frac{1}{\sqrt p}$. Then

\[ \big|hF_y + \sqrt h\,F_z\cdot(\sigma_0^{-1}\xi)\big| \le Ch + \frac{C\sqrt h}{\sqrt p}|\sigma_0^{-1}| \le \frac{C_1\sqrt{h_0}}{\sqrt p}|\sigma_0^{-1}|, \tag{3.15} \]

for some constant $C_1$. Set $h_0 := \big(\frac{3\lambda^*}{4C_1}\big)^2$ and recall again that $p\le\frac13$ and (3.1). Plugging (3.13) and (3.15) into (3.12), we see that

\[ 1 + hF_y + \sqrt h\,F_z\cdot(\sigma_0^{-1}\xi) + \frac{I}{1-p} \ge 0, \]

which proves the monotonicity.

We remark that our algorithm works well when $\tilde G_\gamma$ is diagonally dominant, namely when θ is uniformly small. In this case we have the following simple sufficient condition for Assumption 2.3.4.

Proposition 2.3.7. Let Assumption 2.2.1 hold. Assume there exist $\sigma_0:[0,T]\times\mathbb{R}^d\to\mathbb{R}^{d\times d}_+$ and a small constant $\varepsilon_0>0$ such that $\sigma_0$ and $\sigma_0^{-1}$ are bounded and, for the $C_0$ in (3.14),

\[ \theta \le \frac{\varepsilon_0}{4dC_0} \qquad\text{and}\qquad \frac{\underline\alpha(t,x)}{|\sigma_0^{-1}|^2} \ge \varepsilon_0. \tag{3.16} \]

Then Assumption 2.3.4 holds, and consequently $\mathbb{T}^t_h$ is monotone.

Proof. First, set $p := \frac{\varepsilon_0}{4dC_0}\in[\frac{\theta}{2(1+\theta)},\frac13]\cap(0,\frac13]$. It is clear that (3.1) holds. By the first inequality of (3.14) and the second inequality of (3.16), we have $\Lambda\le\frac{C_0}{\varepsilon_0}$. Then

\[ \alpha_p \ge \frac{p(1+3\theta)}{p(1+\theta)} \ge 1, \qquad c_p \ge \sqrt{\alpha_p} \ge 1, \]

and thus

\[ \lambda_p \ge \sqrt p\,\Big(1-p+pd-2pd\,\frac{C_0}{\varepsilon_0}\Big)\,\varepsilon_0 \ge \sqrt{\frac{\varepsilon_0}{4dC_0}}\cdot\frac12\,\varepsilon_0 = \frac{\varepsilon_0^{3/2}}{C}. \]

This implies Assumption 2.3.4 immediately.

Remark 2.3.8. In this remark we investigate the bound on Λ for our algorithm and compare it with (2.8).
(i) When $\tilde G_\gamma$ is diagonally dominant, in particular when θ = 0, under (3.16) we remove the constraint (2.8) completely and thus improve the result of [46] significantly. We also note that when d = 1 we always have θ = 0, so the bound constraint does not exist in our case.
(ii) Let $0<\theta\le2$ and $d\ge2$. For simplicity, assume $\underline\alpha$, $\overline\alpha$ and θ are all constants. Note that

\[ \lambda_p = \frac{\sqrt{p\underline\alpha}}{c_p|\sigma_0^{-1}|}\Big[(1-p+pd)(2p\Lambda+\alpha_p-2p) - 2pd\Lambda\Big] = 2p(1-p)(d-1)\,\frac{\sqrt{p\underline\alpha}}{c_p|\sigma_0^{-1}|}\Big[\Big(1+\frac{1}{(d-1)p}\Big)\Big(1-\frac{\theta}{2(1+\theta)p}\Big) - \Lambda\Big]. \]

Then our constraint is

\[ \Lambda < \Lambda_\theta := \sup_{p\in[\frac{\theta}{2(1+\theta)},\frac13]}\Big(1+\frac{1}{(d-1)p}\Big)\Big(1-\frac{\theta}{2(1+\theta)p}\Big). \tag{3.17} \]

When $0<\theta\le\frac{2}{d+3}$, one may compute straightforwardly that the optimal $p := \frac{2\theta}{2-(d-3)\theta}\in[\frac{\theta}{2(1+\theta)},\frac13]$, and thus

\[ \Lambda_\theta = 1 + \frac{[2-(d-3)\theta]^2}{8\theta(1+\theta)(d-1)}. \]

Once again, we see that $\Lambda_\theta$ is large when θ is small. In particular, there exists a unique $\theta_d\le\frac{2}{d+3}$ such that $\Lambda_{\theta_d} = 1+\frac2d$. Therefore, when $\theta<\theta_d$, our scheme allows for a larger bound on Λ than (2.8).
(iii) When $\theta\ge\frac{2}{d+3}$, or more generally when $\theta\ge\theta_d$, we may set $p := \frac13$, and our algorithm reduces back to [46], with the Brownian motion replaced by the trinomial tree, see Remark 2.3.1. In this case Assumption 2.3.4 may be violated, but we can still easily obtain the same bound (2.7) as in [46]. That is, under (2.7) our algorithm (with $p = \frac13$) is still monotone, but the proof should follow the arguments in [46] rather than those in Lemma 2.3.6.
(iv) By (3.17), to maintain monotonicity it suffices to choose p such that

\[ \Big(1+\frac{1}{(d-1)p}\Big)\Big(1-\frac{\theta}{2(1+\theta)p}\Big) \ge \Lambda. \]
In particular, when θ = 0, one natural choice is

\[ p := \min\Big\{\frac{1}{(\Lambda-1)(d-1)},\ \frac13\Big\}. \]

We remark further that, in light of Remark 2.3.5 (iii), we may not want to choose a smaller p even though it would also maintain the monotonicity.

Remark 2.3.9. This remark concerns the degeneracy of G.
(i) The second inequality of (3.16) immediately implies the uniform nondegeneracy of $G_\gamma$: $D[\sigma_0^{-1}G_\gamma\sigma_0^{-T}] \ge \varepsilon_0|\sigma_0^{-1}|^2 I_d$. This is mainly due to the term $F_z\cdot(\sigma_0^{-1}\xi)$ in (3.12). In [46], $G_\gamma$ is assumed to be nondegenerate, but not necessarily uniformly, under the additional assumption that $F_z^T F_\gamma^{-1}F_z$ is bounded ($F_\gamma$ is nondegenerate in [46]). If we assume that $|F_z|\le C|\underline\alpha|$, then, following similar arguments, in particular by using a weaker version of monotonicity in the spirit of [46] Lemma 3.12, we may remove the uniform nondegeneracy in (3.16) as well.
(ii) Unlike [46], we do not require $G_\gamma\ge\tfrac12 a_0$, and thus F can be degenerate. This is possible mainly because we use a bounded trinomial tree instead of the unbounded Brownian motion.
(iii) When G is degenerate, namely when $\underline\alpha$ can equal 0, one can approximate the generator G by $G^\varepsilon := G + \varepsilon I_d:\gamma$ and numerically compute the corresponding solution $u^\varepsilon$. By the stability of viscosity solutions, $u^\varepsilon$ converges to u locally uniformly.
(iv) Motivated by the pricing of Asian options, in a recent work Tan [83] investigated the numerical approximation of the following type of PDE with solution u(t,x,y):

\[ -\partial_t u(t,x,y) - G(t,x,y,u,D_xu,D^2_{xx}u) - H(t,x,y,u,D_xu,D_yu) = 0, \]

where G is nondegenerate in $D^2_{xx}u$, but the PDE is always degenerate in $D^2_{yy}u$.

2.3.3 Stability

Given the monotonicity, one may prove stability following standard arguments.

Lemma 2.3.10. Let Assumptions 2.2.1 and 2.3.4 hold. Then for any $h\in(0,h_0]$, $\mathbb{T}^t_h$ satisfies the stability in Theorem 2.2.2 (iii).

Proof. First, it follows from Lemma 2.3.6 that $\mathbb{T}^t_h$ satisfies the monotonicity. Denote $C_n := \sup_{x\in\mathbb{R}^d}|u_h(t_n,x)|$ and $C_i := \sup_{(t,x)\in[t_i,t_{i+1})\times\mathbb{R}^d}|u_h(t,x)|$, $i = n-1,\dots,0$. Since g is bounded, we see that $C_n\le C$. We claim that

\[ C_i \le (1+Ch)C_{i+1} + Ch. \tag{3.18} \]

Then by the discrete Gronwall inequality,

\[ \max_{0\le i\le n-1}C_i \le C(1+Ch)^n[C_n + nh] \le C. \]

This proves the lemma. We now prove (3.18). Let $(t,x)\in[t_i,t_{i+1})\times\mathbb{R}^d$ and denote $h' := t_{i+1}-t\le h$, $\xi := \xi^{t,x}$. Similarly to (3.11), one may easily get

\[ u_h(t,x) = \mathbb{E}\big[u_h(t_{i+1},x+\sqrt{h'}\,\sigma_0\xi)\,I_{i+1}\big] + h'F(t_i,x,0,0,0), \tag{3.19} \]

where, for some deterministic $F_y(t_i),F_z(t_i),F_\gamma(t_i)$ defined in an obvious way,

\[ I_{i+1} := 1 + h'\big[F_y(t_i) + F_z(t_i)\cdot K_1(\xi) + F_\gamma(t_i):K_2(\xi)\big]. \]

The monotonicity in Lemma 2.3.6 exactly means $I_{i+1}\ge0$. Noting that $F(t_i,x,0,0,0) = G(t_i,x,0,0,0)$ is bounded,

\[ |u_h(t,x)| \le \mathbb{E}\big[|u_h(t_{i+1},x+\sqrt{h'}\,\sigma_0\xi)|\,I_{i+1}\big] + Ch' \le C_{i+1}\,\mathbb{E}[I_{i+1}] + Ch'. \]

By (3.7), we see that $\mathbb{E}_{t_i}[I_{i+1}] = 1+h'F_y(t_i) \le 1+Ch'$. Then

\[ |u_h(t,x)| \le (1+Ch')C_{i+1} + Ch' \le (1+Ch)C_{i+1} + Ch. \]

Since (t,x) is arbitrary, we obtain (3.18).

2.3.4 Boundary condition

Lemma 2.3.11. Let Assumptions 2.2.1 and 2.3.4 hold. Then

\[ |u_h(t,x) - g(x)| \le C(T-t)^{\frac12}. \]

Proof. Without loss of generality, we assume $t = t_k$ for some k.
Fix $(t_k,x)$ and denote $X^n_{t_k} := x$, $\mathcal{F}^n_{t_k} := \{\emptyset,\Omega\}$. For $j = k+1,\dots,n$, define recursively

\[ X^n_{t_j} := X^n_{t_{j-1}} + \sqrt h\,\sigma_0(t_{j-1},X^n_{t_{j-1}})\,\xi_j, \qquad \mathcal{F}^n_{t_j} := \mathcal{F}^n_{t_{j-1}}\vee\sigma(\xi_j), \]

where $\xi_j := \xi^{t_{j-1},X^n_{t_{j-1}}}$ is determined by (3.3) and is independent of $\mathcal{F}^n_{t_{j-1}}$. Then it is clear that

\[ u_h(t_{j-1},X^n_{t_{j-1}}) = \mathbb{E}_{t_{j-1}}\big[u_h(t_j,X^n_{t_j})\big] + hF\Big(t_{j-1},X^n_{t_{j-1}},\mathbb{E}_{t_{j-1}}\big[u_h(t_j,X^n_{t_j})\big],\mathbb{E}_{t_{j-1}}\big[u_h(t_j,X^n_{t_j})K_1(\xi_j)\big],\mathbb{E}_{t_{j-1}}\big[u_h(t_j,X^n_{t_j})K_2(\xi_j)\big]\Big). \]

Similarly to the proof of Lemma 2.3.10, we have

\[ u_h(t_{j-1},X^n_{t_{j-1}}) = \mathbb{E}_{t_{j-1}}\big[u_h(t_j,X^n_{t_j})\,I_j\big] + hF(t_{j-1},X^n_{t_{j-1}},0,0,0), \]

where, abusing the notation I slightly,

\[ I_j := 1 + h\big[F_y(t_{j-1}) + F_z(t_{j-1})\cdot K_1(\xi_j) + F_\gamma(t_{j-1}):K_2(\xi_j)\big] \ge 0, \]

and $F_y(t_{j-1}),F_z(t_{j-1}),F_\gamma(t_{j-1})$ are defined in an obvious way. Denote

\[ J_k := 1, \qquad J_i := \prod_{j=k+1}^i I_j, \quad i = k+1,\dots,n. \]

Recalling that $u_h(t_n,x) = g(x)$, by induction we get

\[ u_h(t_k,x) = u_h(t_k,X^n_{t_k}) = \mathbb{E}\Big[g(X^n_{t_n})J_n + h\sum_{j=k}^{n-1}J_j\,F(t_j,X^n_{t_j},0,0,0)\Big]. \]

Since g is bounded and uniformly Lipschitz continuous, we may let $g_\varepsilon$ be a standard smooth mollification of g such that

\[ \|g_\varepsilon-g\|_\infty \le C\varepsilon, \qquad \|Dg_\varepsilon\|_\infty \le C, \qquad \|D^2g_\varepsilon\|_\infty \le C\varepsilon^{-1}. \tag{3.20} \]

Then, noting again that $F(t,x,0,0,0) = G(t,x,0,0,0)$ is bounded,

\[ |u_h(t_k,x)-g(x)| \le \Big|\mathbb{E}\big[g_\varepsilon(X^n_{t_n})J_n - g_\varepsilon(x)\big]\Big| + C\varepsilon\,\mathbb{E}[J_n] + Ch\,\mathbb{E}\Big[\sum_{j=k}^{n-1}J_j\Big] + C\varepsilon \]
\[ = \Big|\sum_{i=k+1}^n\mathbb{E}\big[g_\varepsilon(X^n_{t_i})J_i - g_\varepsilon(X^n_{t_{i-1}})J_{i-1}\big]\Big| + C\varepsilon\,\mathbb{E}[J_n] + Ch\,\mathbb{E}\Big[\sum_{j=k}^{n-1}J_j\Big] + C\varepsilon \]
\[ = \Big|\sum_{i=k+1}^n\mathbb{E}\Big[\big[g_\varepsilon(X^n_{t_i})-g_\varepsilon(X^n_{t_{i-1}})\big]J_i + g_\varepsilon(X^n_{t_{i-1}})J_{i-1}[I_i-1]\Big]\Big| + C\varepsilon\,\mathbb{E}[J_n] + Ch\,\mathbb{E}\Big[\sum_{j=k}^{n-1}J_j\Big] + C\varepsilon. \]

Since $\mathbb{E}_{t_{j-1}}[I_j] = 1+hF_y(t_{j-1})$, we have

\[ \big|\mathbb{E}_{t_{i-1}}[I_i]-1\big| \le Ch \qquad\text{and}\qquad \mathbb{E}[J_i] \le (1+Ch)^{i-k} \le C. \tag{3.21} \]

Thus

\[ |u_h(t_k,x)-g(x)| \le \Big|\sum_{i=k+1}^n\mathbb{E}\big[[g_\varepsilon(X^n_{t_i})-g_\varepsilon(X^n_{t_{i-1}})]J_i\big]\Big| + C(n-k)h + C\varepsilon. \tag{3.22} \]

Moreover, for some appropriate $\mathcal{F}_{t_i}$-measurable $\tilde X^n_{t_i}$,

\[ g_\varepsilon(X^n_{t_i}) - g_\varepsilon(X^n_{t_{i-1}}) = \sqrt h\,Dg_\varepsilon(X^n_{t_{i-1}})\cdot\sigma_0\xi_i + \frac h2 D^2g_\varepsilon(\tilde X^n_{t_i}):(\sigma_0\xi_i)(\sigma_0\xi_i)^T. \]

By (3.1), we have

\[ \big|\mathbb{E}_{t_{i-1}}\big[Dg_\varepsilon(X^n_{t_{i-1}})\cdot\sigma_0\xi_i\,I_i\big]\big| = \big|\mathbb{E}_{t_{i-1}}\big[h\,[Dg_\varepsilon(X^n_{t_{i-1}})\cdot\sigma_0\xi_i][F_z(t_{i-1})\cdot K_1(\xi_i)]\big]\big| \le C\sqrt h; \]
\[ \big|\mathbb{E}_{t_{i-1}}\big[D^2g_\varepsilon(\tilde X^n_{t_i}):(\sigma_0\xi_i)(\sigma_0\xi_i)^T\,I_i\big]\big| \le C\varepsilon^{-1}\,\mathbb{E}_{t_{i-1}}[I_i] \le C\varepsilon^{-1}. \]

Then $\big|\mathbb{E}_{t_{i-1}}\big[[g_\varepsilon(X^n_{t_i})-g_\varepsilon(X^n_{t_{i-1}})]I_i\big]\big| \le Ch[1+\varepsilon^{-1}]$. Plugging this into (3.22) and recalling (3.21), we have

\[ |u_h(t_k,x)-g(x)| \le C(n-k)h[1+\varepsilon^{-1}] + C\varepsilon. \]

Note that $(n-k)h = T-t_k$. Setting $\varepsilon := \sqrt{T-t_k}$, we obtain the result.

2.3.5 Convergence results

First, combining Lemmas 2.3.3, 2.3.6, 2.3.10 and 2.3.11, the following is immediate from Theorem 2.2.2.

Theorem 2.3.12. Let Assumptions 2.2.1 and 2.3.4 hold. Then the PDE (1.1) has a unique bounded viscosity solution u, and $u_h$ converges to u locally uniformly as $h\to0$.

We next study the rate of convergence, first in the case that u is smooth. Let $C^{[2]}_b([0,T]\times\mathbb{R}^d)$ denote the subset of $C^{1,2}([0,T]\times\mathbb{R}^d)$ of functions u such that $u,\partial_tu,Du,D^2u$ are bounded, and let $C^{[4]}_b([0,T]\times\mathbb{R}^d)$ be the set of $u\in C^{[2]}_b([0,T]\times\mathbb{R}^d)$ such that each component of $\partial_tu,Du,D^2u$ is also in $C^{[2]}_b([0,T]\times\mathbb{R}^d)$.

Theorem 2.3.13. Let Assumptions 2.2.1 and 2.3.4 hold and $h\in(0,h_0)$. Assume further that $u\in C^{[4]}_b([0,T]\times\mathbb{R}^d)$ and that G is locally uniformly Lipschitz continuous in x, locally uniformly in $(y,z,\gamma)$. Then there exists a constant C, independent of h (or n), such that $|u_h(t,x)-u(t,x)|\le Ch$ for all (t,x).

Proof. Again, since $h<h_0$, it follows from Lemma 2.3.6 that $\mathbb{T}^t_h$ satisfies the monotonicity. Denote $C_n := \sup_{x\in\mathbb{R}^d}|u_h(t_n,x)-u(t_n,x)|$ and $C_i := \sup_{(t,x)\in[t_i,t_{i+1})\times\mathbb{R}^d}|u_h(t,x)-u(t,x)|$, $i = n-1,\dots,0$. We claim that

\[ C_i \le (1+Ch)C_{i+1} + Ch^2. \tag{3.23} \]
Since $C_n = 0$, the discrete Gronwall inequality then gives

\[ \max_{0\le i\le n-1}C_i \le C(1+Ch)^n[C_n + nh^2] \le Ch, \]

which proves the theorem. We now prove (3.23). Similarly to the proof of Lemma 2.3.10, we shall only estimate $|u_h(t_i,x)-u(t_i,x)|$; the estimate for the general $|u_h(t,x)-u(t,x)|$ is similar. For this purpose, recall (3.5), (3.6) and define

\[ \tilde u_h(t_i,x) := \big[\mathcal{D}^{t_i,0}_h u(t_{i+1},\cdot)\big](x) + hF\Big(t_i,x,\big[\mathcal{D}^{t_i,0}_h u(t_{i+1},\cdot)\big](x),\big[\mathcal{D}^{t_i,1}_h u(t_{i+1},\cdot)\big](x),\big[\mathcal{D}^{t_i,2}_h u(t_{i+1},\cdot)\big](x)\Big). \tag{3.24} \]

Note that the right side above uses the true solution u instead of $u_h$ as in (2.3). Clearly,

\[ |u_h(t_i,x)-u(t_i,x)| \le |u_h(t_i,x)-\tilde u_h(t_i,x)| + |\tilde u_h(t_i,x)-u(t_i,x)|. \tag{3.25} \]

Comparing (2.3) and (3.24), by the first equality of (3.11) we have, at $(t_i,x)$,

\[ u_h(t_i,x)-\tilde u_h(t_i,x) = \mathbb{E}\Big[[u_h-u](t_{i+1},x+\sqrt h\,\sigma_0\xi)\big(1+h[F_y+F_z\cdot K_1(\xi)+F_\gamma:K_2(\xi)]\big)\Big]. \]

Then it follows from arguments similar to the proof of Lemma 2.3.10 that

\[ |u_h(t_i,x)-\tilde u_h(t_i,x)| \le (1+Ch)C_{i+1}. \tag{3.26} \]

Next, since $u\in C^{[4]}_b([0,T]\times\mathbb{R}^d)$, applying a Taylor expansion and using (3.4), one may check straightforwardly that, at $(t_i,x)$,

\[ \big[\mathcal{D}^{t_i,0}_h u(t_{i+1},\cdot)\big](x) = u + \partial_tu\,h + \frac h2 D^2u:a_0 + O(h^2); \]
\[ \big[\mathcal{D}^{t_i,1}_h u(t_{i+1},\cdot)\big](x) = Du + O(h); \qquad \big[\mathcal{D}^{t_i,2}_h u(t_{i+1},\cdot)\big](x) = D^2u + O(h), \]

where $O(\cdot)$ is uniform, thanks to (3.1). Then, again at $(t_i,x)$,

\[ \tilde u_h - u = \partial_tu\,h + \frac h2 D^2u:a_0 + O(h^2) + hF\big(t_i,x,u+O(h),Du+O(h),D^2u+O(h)\big) \]
\[ = \partial_tu\,h + O(h^2) + hG\big(t_i,x,u+O(h),Du+O(h),D^2u+O(h)\big). \]

Since u satisfies the PDE (1.1), recalling (3.2),

\[ \tilde u_h(t_i,x)-u(t_i,x) = O(h^2) + h\Big[G\big(\cdot,u+O(h),Du+O(h),D^2u+O(h)\big) - G(\cdot,u,Du,D^2u)\Big](t_i,x). \]

Since u and its derivatives are bounded, and G is locally uniformly Lipschitz continuous, we have $|\tilde u_h(t_i,x)-u(t_i,x)|\le Ch^2$. Plugging this and (3.26) into (3.25), we obtain

\[ \sup_x|u_h(t_i,x)-u(t_i,x)| \le (1+Ch)C_{i+1} + Ch^2. \]

Similarly we may estimate $\sup_x|u_h(t,x)-u(t,x)|$ for $t\in(t_i,t_{i+1})$, and thus prove (3.23).

We finally study the case when u is only a viscosity solution. Given the monotonicity, our arguments are almost identical to those of [46] Theorem 3.10, which in turn relies on the works of Krylov [59] and Barles and Jakobsen [5]. We therefore present only the result and omit the proof. The result relies on the following additional assumption.

Assumption 2.3.14. (i) G is of Hamilton-Jacobi-Bellman type:

\[ G(t,x,y,z,\gamma) = \inf_{\alpha\in\mathcal{A}}\Big\{\frac12\big[\sigma^\alpha(\sigma^\alpha)^T\big](t,x):\gamma + b^\alpha(t,x)\,y + c^\alpha(t,x)\cdot z + f^\alpha(t,x)\Big\}, \]

where $\sigma^\alpha,b^\alpha,c^\alpha,f^\alpha$ are uniformly bounded, uniformly Lipschitz continuous in x, and uniformly Hölder-$\frac12$ continuous in t, uniformly in α.
(ii) For any $\alpha\in\mathcal{A}$ and $\delta>0$, there exists a finite set $\{\alpha_i\}_{i=1}^{M_\delta}$ such that

\[ \inf_{1\le i\le M_\delta}\big(|\sigma^\alpha-\sigma^{\alpha_i}|_\infty + |b^\alpha-b^{\alpha_i}|_\infty + |c^\alpha-c^{\alpha_i}|_\infty + |f^\alpha-f^{\alpha_i}|_\infty\big) \le \delta. \]

We then have the following result, analogous to [46] Theorem 3.10.

Theorem 2.3.15. Let Assumptions 2.2.1 and 2.3.4 hold and $h\in(0,h_0)$.
(i) Under Assumption 2.3.14 (i), we have $u-u_h\le Ch^{1/4}$.
(ii) Under the full Assumption 2.3.14, we have $-Ch^{1/10}\le u-u_h\le Ch^{1/4}$.

Remark 2.3.16. The arguments in [46] rely heavily on the viscosity properties of the PDE. Very recently, Tan [82] provided purely probabilistic arguments for HJB equations. His argument works in the non-Markovian setting as well and thus provides a discretization for second order BSDEs. Moreover, under his conditions he shows that $|u_h-u|\le Ch^{\frac18}$, which improves the left-side rate in Theorem 2.3.15 (ii).
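Before turning to the quasilinear case, the following short script (our own illustration, not part of the dissertation) offers a numerical sanity check on the mechanics behind Lemma 2.3.3 and (3.7): it enumerates the $3^d$ support points of ξ for a constant p, verifies $\mathbb{E}[K_1(\xi)] = \mathbb{E}[K_2(\xi)] = 0$, and confirms that the Hessian approximation $\mathcal{D}^{t,2}_h$ recovers $D^2\varphi$ exactly for a quadratic φ. The matrix $\sigma_0$ and all parameters are arbitrary test choices.

```python
import itertools
import numpy as np

# Verify the kernel identities (3.7) and the exactness of the Hessian
# approximation D^{t,2}_h on quadratics, by enumerating the 3^d support
# of the trinomial xi in (3.3). sigma0, p, h, A are arbitrary test choices.

d, p, h = 3, 0.25, 0.01
rng = np.random.default_rng(1)
sigma0 = np.eye(d) + 0.1 * rng.standard_normal((d, d))   # nondegenerate sigma_0
s_inv = np.linalg.inv(sigma0)
A = rng.standard_normal((d, d)); A = A + A.T             # D^2 phi = A
phi = lambda x: 0.5 * x @ A @ x                          # quadratic test function

vals = np.array([1/np.sqrt(p), -1/np.sqrt(p), 0.0])      # support of each xi_i
probs = np.array([p/2, p/2, 1 - p])

EK1, EK2, D2 = np.zeros(d), np.zeros((d, d)), np.zeros((d, d))
for idx in itertools.product(range(3), repeat=d):        # all 3^d outcomes of xi
    xi, w = vals[list(idx)], np.prod(probs[list(idx)])
    M = (1-p)*np.outer(xi, xi) - (1-3*p)*np.diag(xi**2) - 2*p*np.eye(d)
    K2 = s_inv.T @ M @ s_inv / ((1 - p) * h)             # kernel K_2 in (3.6)
    EK1 += w * (s_inv.T @ xi) / np.sqrt(h)               # accumulates E[K_1]
    EK2 += w * K2                                        # accumulates E[K_2]
    D2 += w * phi(np.sqrt(h) * (sigma0 @ xi)) * K2       # D^{t,2}_h phi at x = 0

print(np.abs(EK1).max(), np.abs(EK2).max())              # both ~ 0 (float error)
print(np.abs(D2 - A).max())                              # recovers D^2 phi = A
```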
2.4 Quasilinear PDE and coupled FBSDEs

In this section we focus on the following G, which is quasilinear in γ:¹

\[ G = \frac12[\sigma\sigma^T](t,x,y):\gamma + b\big(t,x,y,\sigma(t,x,y)z\big)\cdot z + f\big(t,x,y,\sigma(t,x,y)z\big). \tag{4.1} \]

¹ The idea of rewriting this PDE in the form of (3.2) for numerical purposes was communicated to the second author by Nizar Touzi back in 2003, and was in fact a main motivation for the study of second order BSDEs in [26, 81].

Here f is scalar, b is $\mathbb{R}^d$-valued, and σ is $\mathbb{R}^{d\times m}$-valued for some m. In this case the PDE (1.1) is closely related to the following coupled FBSDE:

\[ \begin{cases} X_t = x + \int_0^t b(s,X_s,Y_s,Z_s)\,ds + \int_0^t \sigma(s,X_s,Y_s)\,dW_s; \\[2pt] Y_t = g(X_T) + \int_t^T f(s,X_s,Y_s,Z_s)\,ds - \int_t^T Z_s\cdot dW_s. \end{cases} \tag{4.2} \]

Here W is an m-dimensional Brownian motion, and (X,Y,Z) is the solution triplet taking values in $\mathbb{R}^d$, $\mathbb{R}$ and $\mathbb{R}^m$, respectively. Due to the four step scheme of Ma, Protter and Yong [65], when the PDE (1.1) has a classical solution, the following nonlinear Feynman-Kac formula holds:

\[ Y_t = u(t,X_t), \qquad Z_t = \sigma\big(t,X_t,u(t,X_t)\big)Du(t,X_t). \tag{4.3} \]

Feasible numerical methods for high dimensional FBSDEs have long been a challenging problem in the literature. There are very few papers on the subject, e.g. [13, 34, 35, 65, 66, 68, 69], and most of them are not feasible in high dimensional cases. To the best of our knowledge, the only work reporting a high dimensional numerical example is Bender and Zhang [13].

Our scheme works for quasilinear PDEs as well, especially when $\sigma\sigma^T$ is diagonally dominant, and is thus appropriate for numerically solving the FBSDE (4.2). We remark that the $\sigma_0(t,x)$ we will choose is different from the $\sigma(t,x,y)$ in (4.1), and the F defined by (3.2) is different from f. We shall present a 12-dimensional example, see Example 2.6.5 below.

One technical point is that the G in (4.1) is not Lipschitz continuous in y, mainly due to the term $\frac12[\sigma\sigma^T](t,x,y):\gamma$. This can be overcome when the PDE has a classical solution $u\in C^{[4]}_b([0,T]\times\mathbb{R}^d)$ (as in Theorem 2.3.13).

Theorem 2.4.1. Let G take the form (4.1). Assume
(i) σ, b, f, g are bounded, continuous in all variables, and uniformly Lipschitz continuous in (x,y,z);
(ii) Assumption 2.3.4 holds;
(iii) the PDE (1.1) has a classical solution $u\in C^{[4]}_b([0,T]\times\mathbb{R}^d)$.
Then $|u_h-u|\le Ch$ when h is small enough.

Proof. We follow the proof of Theorem 2.3.13. Define $C_i$, $i = 0,\dots,n$, and $\tilde u_h$ as in Theorem 2.3.13; again it suffices to prove (3.23). We first estimate $|u_h(t_i,x)-\tilde u_h(t_i,x)|$. Denote

\[ \varphi_h(x) := u_h(t_{i+1},x), \qquad \varphi(x) := u(t_{i+1},x), \qquad \psi := \varphi_h-\varphi. \]

Then, denoting $\mathcal{D}^j := \mathcal{D}^{t_i,j}$, $j = 0,1,2$, and suppressing the variables $(t_i,x)$,

\[ u_h-\tilde u_h = \mathcal{D}^0\psi + hF\big(\cdot,\mathcal{D}^0\varphi_h,\mathcal{D}^1\varphi_h,\mathcal{D}^2\varphi_h\big) - hF\big(\cdot,\mathcal{D}^0\varphi,\mathcal{D}^1\varphi,\mathcal{D}^2\varphi\big) \]
\[ = \mathcal{D}^0\psi - \frac h2 a_0:\mathcal{D}^2\psi + hG\big(\cdot,\mathcal{D}^0\varphi_h,\mathcal{D}^1\varphi_h,\mathcal{D}^2\varphi_h\big) - hG\big(\cdot,\mathcal{D}^0\varphi,\mathcal{D}^1\varphi,\mathcal{D}^2\varphi\big) \]
\[ = \mathcal{D}^0\psi + h\big[L_1\psi + L_2\psi + L_3\psi\big], \]

where, denoting $a(t,x,y) := \sigma\sigma^T(t,x,y)$,

\[ L_1\psi := -\frac12 a_0:\mathcal{D}^2\psi + \frac12 a(\cdot,\mathcal{D}^0\varphi_h):\mathcal{D}^2\psi + b\big(\cdot,\mathcal{D}^0\varphi_h,\sigma(\cdot,\mathcal{D}^0\varphi_h)\mathcal{D}^1\varphi_h\big)\cdot\mathcal{D}^1\psi; \]
\[ L_2\psi := \frac12\big[a(\cdot,\mathcal{D}^0\varphi_h) - a(\cdot,\mathcal{D}^0\varphi)\big]:\mathcal{D}^2\varphi; \]
\[ L_3\psi := \big[b\big(\cdot,\mathcal{D}^0\varphi_h,\sigma(\cdot,\mathcal{D}^0\varphi_h)\mathcal{D}^1\varphi_h\big) - b\big(\cdot,\mathcal{D}^0\varphi,\sigma(\cdot,\mathcal{D}^0\varphi)\mathcal{D}^1\varphi\big)\big]\cdot\mathcal{D}^1\varphi \]
\[ \qquad + f\big(\cdot,\mathcal{D}^0\varphi_h,\sigma(\cdot,\mathcal{D}^0\varphi_h)\mathcal{D}^1\varphi_h\big) - f\big(\cdot,\mathcal{D}^0\varphi,\sigma(\cdot,\mathcal{D}^0\varphi)\mathcal{D}^1\varphi\big). \]

Let η denote a generic function with appropriate dimension which is uniformly bounded and may vary from line to line. Since $u\in C^{[4]}_b([0,T]\times\mathbb{R}^d)$, one may easily check that $\mathcal{D}^0\varphi,\mathcal{D}^1\varphi,\mathcal{D}^2\varphi$ are bounded.
Then

\[ L_1\psi = \frac12 a(\cdot,\mathcal{D}^0\varphi_h):\mathcal{D}^2\psi - \frac12 a_0:\mathcal{D}^2\psi + \eta\cdot\mathcal{D}^1\psi; \qquad L_2\psi = \eta\,\mathcal{D}^0\psi; \]
\[ L_3\psi = \eta\,\mathcal{D}^0\psi + \eta\cdot\big[\sigma(\cdot,\mathcal{D}^0\varphi_h)\mathcal{D}^1\varphi_h - \sigma(\cdot,\mathcal{D}^0\varphi)\mathcal{D}^1\varphi\big] \]
\[ = \eta\,\mathcal{D}^0\psi + \eta\cdot\sigma(\cdot,\mathcal{D}^0\varphi_h)\mathcal{D}^1\psi + \eta\cdot\big[\sigma(\cdot,\mathcal{D}^0\varphi_h)-\sigma(\cdot,\mathcal{D}^0\varphi)\big]\mathcal{D}^1\varphi = \eta\,\mathcal{D}^0\psi + \eta\cdot\mathcal{D}^1\psi. \]

Thus

\[ u_h-\tilde u_h = \mathcal{D}^0\psi + h\big[\eta\,\mathcal{D}^0\psi + \eta\cdot\mathcal{D}^1\psi\big] + \frac h2\big[a-a_0\big]:\mathcal{D}^2\psi \]
\[ = \mathbb{E}\Big[\psi(x+\sqrt h\,\sigma_0\xi)\Big(1 + h\eta + h\,\eta\cdot K_1(\xi) + \frac h2[a-a_0]:K_2(\xi)\Big)\Big]. \]

Now, following the same arguments as in Lemma 2.3.6, for small h we have

\[ 1 + h\eta + h\,\eta\cdot K_1(\xi) + \frac h2[a-a_0]:K_2(\xi) \ge 0. \]

Then it follows from the arguments in Theorem 2.3.13 that $|u_h(t_i,x)-\tilde u_h(t_i,x)|\le(1+Ch)C_{i+1}$. Similarly we may prove $|\tilde u_h(t_i,x)-u(t_i,x)|\le Ch^2$. Thus we prove (3.23) and hence the theorem.

Remark 2.4.2. (i) For the existence of classical solutions of quasilinear PDEs, see [60]. The rationale for the convergence in this case is as follows. Let $C_0$ be a bound for u, Du, $D^2u$, and assume d = 1 for simplicity. Let $\hat z := (-C_0)\vee z\wedge C_0$ and $\hat\gamma := (-C_0)\vee\gamma\wedge C_0$ be the truncations. Consider

\[ \hat G(\cdot,z,\gamma) := \frac12 a:\hat\gamma + b(\cdot,\sigma\hat z)\cdot\hat z + f(\cdot,\sigma\hat z). \tag{4.4} \]

Then u is also a classical solution of the following PDE:

\[ -\partial_t u - \hat G(t,x,u,Du,D^2u) = 0. \tag{4.5} \]

Under the conditions of Theorem 2.4.1, one can easily check that $\hat G$ satisfies Assumption 2.2.1. However, we should point out that $\hat G$ violates the nondegeneracy required in Assumption 2.3.4, so one still cannot apply Theorem 2.3.13 directly to PDE (4.5).
(ii) If the PDE has a classical solution, then by applying the so-called partial comparison (comparison between a classical semisolution and a viscosity semisolution), which is much easier than the comparison principle for viscosity solutions, one easily sees that u is unique in the viscosity sense.
(iii) In the general viscosity solution case, even if the PDE is wellposed in the viscosity sense, we are not able to extend Theorems 2.3.12 and 2.3.15 directly, because G violates the uniform Lipschitz continuity. However, if one can approximate G by some $G^\varepsilon$ such that the PDE with generator $G^\varepsilon$ has a classical solution, then, by the stability of viscosity solutions, we may numerically approximate u in the spirit of Remark 2.3.9 (iii).

2.5 Implementation of the scheme

In this section we discuss how to implement the scheme. Fix $x_0\in\mathbb{R}^d$, $0 = t_0<\dots<t_n = T$, and some desirable $\sigma_0$ and p; our goal is to numerically compute $u_h(t_0,x_0)$. Define $(X^n_{t_i},\mathcal{F}^n_{t_i})$ as in the proof of Lemma 2.3.11. That is, denote $X^n_{t_0} := x_0$, $\mathcal{F}^n_{t_0} := \{\emptyset,\Omega\}$, and define recursively, for $i = 0,\dots,n-1$,

\[ X^n_{t_{i+1}} := X^n_{t_i} + \sqrt h\,\sigma_0(t_i,X^n_{t_i})\,\xi_{i+1}, \qquad \mathcal{F}^n_{t_{i+1}} := \mathcal{F}^n_{t_i}\vee\sigma(\xi_{i+1}), \tag{5.1} \]

where $\xi_{i+1} := \xi^{t_i,X^n_{t_i}}$ is determined by (3.3) (corresponding to $p(t_i,X^n_{t_i})$) and is independent of $\mathcal{F}^n_{t_i}$. We next define $Y^n_{t_n} := g(X^n_{t_n})$ and, for $i = n-1,\dots,0$,

\[ Y^n_{t_i} := \mathbb{E}_{t_i}\big[Y^n_{t_{i+1}}\big] + hF\Big(t_i,X^n_{t_i},\mathbb{E}_{t_i}\big[Y^n_{t_{i+1}}\big],\mathbb{E}_{t_i}\big[Y^n_{t_{i+1}}K_1(\xi_{i+1})\big],\mathbb{E}_{t_i}\big[Y^n_{t_{i+1}}K_2(\xi_{i+1})\big]\Big), \tag{5.2} \]
This method is very efficient and the result is deterministic. It is in fact comparable to the standard finite difference method. We remark that Bonnans and Zidani [15] proposed an improved finite difference scheme for HJB equations. We will implement our algorithm on Example 2.6.1 below with dimension 3. When p and σ 0 vary for different (t,x), the number of nodes in the trinomial tree (X n t i ) 0≤i≤n grows exponentially in n and the above exact method is not feasible anymore. Similarly, the number of nodes will grow exponentially in d and thus this method also becomes infeasible when d is high, even if p and σ 0 are constants. In these cases we will use least square regression combined with Monte Carlo 49 simulation to approximate the conditional expectations in (5.2). This method has been widely used in the literature, see e.g. Longstaff and Schwartz [63] and Gobet, Lemor and Waxin [50], and will be the subject of the next subsections. 2.5.2 Least square regression For each i=0,···,n−1, fix an appropriate set of basis functions e i j : R d → R, j=1,···,J i . Typically we set J i and e i j independent of i, but in general they may vary for different i.For k=0,1,2 and any function ϕ :R d → R,let P i k (ϕ)denote the least regression function of ϕ on the linear span of {e i j ,1 ≤ j ≤ J i } as follows: P i k (ϕ):= J i j=1 α i,k j e i j , where {α i,k j } 1≤j≤J i := argmin {α j } 1≤j≤J i E J i j=1 α j e i j (X n t i )−ϕ(X n t i+1 )K k (ξ i+1 ) 2 . (5.4) We then define u J h (t n ,·):= g,andfor i = n−1,··· ,0, u J h (t i ,x): = P i 0 (u J h (t i+1 ,·)) +hF t i ,x,P i 0 (u J h (t i+1 ,·)),P i 1 (u J h (t i+1 ,·)),P i 2 (u J h (t i+1 ,·)) .(5.5) 50 Assume we have actually chosen a countable set of basis functions (e i j ) j≥0 sat- isfying: lim J→∞ inf {α j } 1≤j≤J E J j=1 α j e i j (X n t i )−E t i [u h (t i+1 ,X n t i+1 )K k (ξ i+1 )] 2 (5.6) =0, ∀(i,k). Then, following the arguments in [28] or [50], one can easily show that lim J→∞ u J h (t 0 ,x 0 )= u h (t 0 ,x 0 ). (5.7) The rate of convergence in (5.7) depends on that in (5.6). Since the focus of this chapter is the monotone scheme (in terms of the time discretization) , we omit the detailed analysis of the convergence (5.7). Clearly, it is crucial to find good basis functions. Notice that the conditional expectations in (5.6) are approximations of u(t i ,·), Du(t i ,·), and D 2 u(t i ,·), respec- tively. Ideally, in the case that the true solution u is smooth, we want to choose (e i j ) 1≤j≤J i whose linear span include u(t i ,·), Du(t i ,·), and D 2 u(t i ,·). This is of course not feasible in practice since u is unknown. Another naive choice is to use the indicator functions of the hypercubes from uniform space discretization. Theoretically this will ensure the convergence very well. However, in this case the number of hypercubes will grow exponentially in dimension d, and thus the curse of dimensionality remains exactly as in standard finite difference method. 51 Intheliterature,peopletypicallyuseorthogonalbasisfunctionssuchasHermite polynomials, which is convenient for solving the optimal arguments in (5.4). There are some efforts to improve the basis functions, see e.g. the martingale basis functions in Bender and Steiner [11], and the local basis functions in Bouchard and Warin [21]. However, overall speaking to find good basis functions is still an open problem and is certainly our interest in future research. 
2.5.3 Monte Carlo simulation As a standard in the literature of BSDE numerics, in high dimensional case we use Monte Carlo approach to approximate the minimum arguments (α i,k j ) 1≤j≤J i in (5.5). To be precise, we simulate L-paths for the forward diffusion X n and the correspondingtrinomialrandomvariablesξ i ,denotedas(X n,l ) 1≤l≤L and(ξ i,l ) 1≤l≤L , i=1,···,n. Then, in the backward induction the expectation in (5.4) is replaced by the sample average: 1 L L l=1 J i j=1 α j e i j (X n,l t i )−ϕ(X n,l t i+1 )K k (ξ i+1,l ) 2 . (5.8) 52 This can be easily solved by linear algebra. Let u J,L h be defined by (5.5), but replacing α i,k j with ¯ α i,k j , the optimal arguments of (5.8). We remark that u J,L h is random, and by the law of large numbers, lim L→∞ u J,L h (t 0 ,x 0 )= u J h (t 0 ,x 0 ), a.s. (5.9) Moreover, one may obtain the rate of convergence in the spirit of the Central Limit Theorem. We refer to [50] for more details. To understandtheconvergence (5.9), itisimportanttounderstand thevariance of u J,L h for given L. One can easily see that the variance in each step of our scheme is O(d/L), which leads to an O(nd/L) variance in total. As the theoretical rate of convergenceforPDEwithsmoothsolutionis1/n,seeTheorem2.3.13,thestandard deviation of the numerical result vanishes in the same rate only when L = O(n 3 ). On the other hand, Glassman and Yu [48] illustrated that the number of paths should be of O(exp(J)),J being the number of basis functions. Hence around O(n 3 exp(J)) paths are supposed to be sampled in theory. This is prohibitive, if not impossible in practice, especially when J is large. However, various examples in next section show that that it’s generally feasible to obtain a desirable rate of convergence with much less paths in practice. Finally we remark that Monte Carlo method is much less sensitive to dimen- sions. For example, it can be seen in next section that we can use Monte Carlo 53 method to approximate a 12-dimensional PDE with 160 time steps and 13333333 paths, while for finite difference method with d = 12, even for 2 time steps the number of grid points already exceeds 13333333. 2.5.4 Some further comments We note that there are three types of errors involved in this algorithm: Total Err = Discretization Err + Regression Err + Simulation Err. The main contribution of this chapter is the introduction of the new monotone scheme and thus we have focused our discussion on the Discretization Error in Sections 2.3 and 2.4. We remark that the analysis of the Regression Error and the Simulation Error is independent of the discretization error. Since this is not the main focus of the present chapter, we shall apply standard procedure for the regression and the simulation steps. In particular, since we know the true solution for many examples below, we will include the true solution (and its derivatives) in the basis functions, thus the numerical results in these examples will reflect the Discretization Error and the Simulation Error only. We have also tested examples where the basis functions do not include the true solution (Example 2.6.2) or the true solution is unknown (Example 2.6.4 and the last part of Example 2.6.3). We shall emphasize though, when the true solution is unknown, the numerical result is an approximation of u J h , which roughly speaking is the least square regression of the true solution u in the span of the basis 54 functions. 
For fixed basis functions, the increase of n and L cannot eliminate the RegressionErrorandthustheconvergence we observe innumerical resultsdoesnot necessarily reflect a small Total Error. Again, a thorough analysis of the regression error, especially a good mechanism for choosing basis functions, is an important open problem. Moreover, although our theoretical results hold true only under the monotonic- ity Assumption 2.3.4, we nevertheless implement our scheme to some examples which violate Assumption 2.3.4 and thus our scheme may not be monotone. It is interesting to observe the convergence in these examples as well. As far as we know, a rigorous analysis of non-monotone schemes is completely open. 2.6 Numerical examples In this section we apply our scheme to various examples 2 . For simplicity, except in Example 2.6.2, we shall choose constant σ 0 and p, and assume the α and α in (3.10) are also constants (or more precisely, use inf (t,x) α(t,x)andsup (t,x) α(t,x) instead). Quite often, we will use the following functions: SIN(t,x):= sin(t+ n i=1 x i ),COS(t,x):= cos(t+ n i=1 x i ). (6.1) 2 All numerical examples below are computed using a personal Laptop, which is a core i5 2.50 GHz processor with 8GB memory. 55 2.6.1 Examples under monotonicity condition In this subsection we consider examples with diagonal G γ and we shall always choose σ 0 diagonal, again except Example 2.6.2, so θ = 0 and thus there is no constraint on Λ, see Remark 2.3.8 (i). We start with a 3-dimensional example for which we can compute its values over the trinomial tree by using the weighted averages. We remark that in this example only the discretization error is involved. Example 2.6.1. A 3-dimensional Fully Nonlinear PDE −∂ t u− 1 2 sup σ≤σ≤σ (σ 2 I d ): D 2 u +f(t,x,u,Du)=0, in [0,T)×R d , u(T,x)=SIN(T,x), on R d , (6.2) where 0<σ < σ are both in R,and f(t,x,y,z)= 1 d d i=1 z i − d 2 inf σ≤σ≤σ σ 2 y . (6.3) We remark that we set f inthis way so that (6.2) hasthe classical solution: u = SIN, with which we can verify the convergence of our numerical approximation. To test its convergence under different nonlinearities, we assume that d=3, σ=1, σ = √ 2, √ 4, or √ 6. Supposing that T =0.5and x 0 =(5,6,7), we know the true solution is u(0,x 0 )=sin(5+6+7)≈−0.750987. 56 According to our scheme, when ˜ G γ is diagonal, θ = 0, which implies α p =2 and, recalling Remark 2.3.8 (iv), we can choose the following parameters: Λ= σ 2 σ 2 ,p=min 1 2(Λ−1) , 1 3 ,α = 1 2pΛ+α p −2p ,σ 0 = σ √ 2α I d . We remark that Λ = 2,4,6 respectively, which violates the constraint (2.8) and thus the algorithm in [46] may not be monotone. Denote the number of time partitions by n. By applying the weighted average method we can obtain the results in Figure 2.1, where the cost in time increases from 0.1 second to 800 seconds exponentially as n increases from 20 to 160 linearly. The table in Figure 2.1 contains the numerical solutions when σ 2 = 2 exclusively, while the graph depicts the errors under three different choices of σ 2 . As we can see from Figure 2.1, the rate of convergence is approximately C · h, whereas the C depends on the structure of G. Therefore, our scheme works generally for large Λ when ˜ G is diagonal or diagonally dominant with a small θ. In Figure 2.2 we compare the convergence of our scheme with that of finite difference method by fixing σ =1, σ = √ 2. It can be seen that our result converges slightly slower than, but is comparable to, the finite difference method in solving low dimensional problems. 
57 20 40 60 80 100 120 140 160 0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 n Absolute Error σ 2 =2 σ 2 =4 σ 2 =6 n Approx. σ 2 =2 20 -0.72984 40 -0.74028 60 -0.74382 80 -0.74667 100 -0.74560 120 -0.74738 140 -0.74790 160 -0.74829 Ans. -0.750987 Figure2.1: A3-dimensionalexamplewithvariousdegreesofnon-linearityinExam- ple 6.2. 20 40 60 80 100 120 140 160 0 0.005 0.01 0.015 0.02 0.025 n Absolute Error Our Algorithm Finite Difference n Ours F.D. 20 -0.72984 -0.76420 40 -0.74028 -0.75785 60 -0.74382 -0.75562 80 -0.74667 -0.75447 100 -0.74560 -0.75379 120 -0.74738 -0.75332 140 -0.74790 -0.75300 160 -0.74829 -0.75274 Ans. -0.75099 -0.75099 Figure 2.2: Comparison with finite difference method in Example 6.2. To see more of our scheme in extreme condition, we assume σ=0. Then we truncate G γ from below with a positive definite matrix εI d > 0. That is, we approximate (6.2) by the following nondegenerate PDE: −∂ t u− 1 2 sup ε≤σ≤σ (σ 2 I d ): D 2 u +f(t,x,u,Du)=0,ε=0.01 58 20 40 60 80 100 120 140 160 0 0.05 0.1 0.15 0.2 0.25 n Absolute Error σ 2 =2 σ 2 =4 σ 2 =6 n Approx. σ 2 =2 20 -0.76285 40 -0.75705 60 -0.75508 80 -0.75401 100 -0.75339 120 -0.75297 140 -0.75269 160 -0.75247 Ans. -0.750987 Figure 2.3: Convergence of a degenerate PDE truncated in Example 6.2. where f is given by (6.3) (with σ=0). Figure 2.3 shows the feasibility of truncation in dealing with σ=0. Example 2.6.2. A 4-dimensional PDE with (σ 0 ,p) depending on (t,x) ⎧ ⎪ ⎪ ⎨ ⎪ ⎪ ⎩ −∂ t u−G(D 2 u)+f(t,x)=0, in [0,T)×R 4 , u(T,x)= SIN(T,x), on R 4 , (6.4) 59 where SIN 2 := 2SIN ×COS,and σ = ⎛ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝ 10 0 0 SIN 10 0 COS SIN 10 SIN COS SIN 1 ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠ ,A = ⎛ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝ 1 2SIN 27 SIN 2 54 SIN 2 54 2SIN 27 1 2SIN 27 SIN 2 54 SIN 2 54 2SIN 27 1 2SIN 27 SIN 2 54 SIN 2 54 2SIN 27 1 ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠ ; Γ= 4 5 · (6−|SIN|)(3−2|SIN|) (3−|SIN|) 2 = 4 5 2− 2|SIN| (3−|SIN|) 2 ∈ [1, 8 5 ]; a := σσ T , a=Γ[σAσ T ],G(M):=max{a :M, a : M}; and f is chosen so that u := SIN is a classical solution of the PDE. We first specify the parameters so that monotonicity Assumption 2.3.4 holds. Set σ 0 := βσ for some scalar functionβ> 0. Then, roughly speaking, ˜ G γ is either 1 β 2 I d or Γ β 2 A. This implies D[ ˜ G γ ] ≤ (1+θ) ˜ G γ for θ := 2|SIN| 9−2|SIN| ≤ 2 7 = 2 d+3 . Next, notice that Λ := α/α=Γ= 4 5 · (6−|SIN|)(3−2|SIN|) (3−|SIN|) 2 and recall 2.3.8 (ii). Set p := 2−θ 6(1+θ) ∈ [ θ 2(1+θ) , 1 3 ]∩(0, 1 3 ]. One can check that 1+ 1 (d−1)p 1− θ 2(1+θ)p > Λ. 60 We remark that here we do not use p := 2θ 2−(d−3)θ as specified in Remark 2.3.8 (ii), because it becomes zero when θ = 0. Finally, β is determined by: β 2 = 1 α = c 2 p =2pΛ+α p −2p = 1944−24|SIN| 2 −1260|SIN| 270(3−|SIN|) ∈ [ 11 9 , 12 5 ]. In particular, we emphasize that here σ 0 and p depend on (t,x). As explained in Section 2.5.1, in this case we cannot use the weighted averages asin previous example. We thus use the combination of least square regression and Monte Carlo simulation. To illustrate the important role of the basis functions, we implement our scheme using three different set of basis functions: • the true solution and its derivatives; • second order polynomials consisting of 1, {x i } d i=1 , {x i x j } 1≤i≤j≤d ; • the local basis functions proposed by Bouchard and Warin [21]. The idea of local basis functions is as follows. Divide the samples at each time step into 3 d local hypercubes, such that there are 3 partitions in each dimension, and there are approximately the same amount of particles in each hypercubes. 
Then we project samples in each hypercubes into a linear polynomial of d+1 degrees of freedom, so there are 3 d ·(1+d) local basis functions in total. Since each linear polynomial has local hypercube support, the corresponding matrix in the regression is sparse, making it easier to solve than a regression problem of dense matrix. 61 Set T =0.1, x 0 =(2,3,4,5), and thus the true solution is sin(2 + 3 + 4 + 5) ≈ 0.9906. As our first example using Monte Carlo regression, we will sample L = 3125n 2 to see how the convergence works. Moreover, we shall repeat the tests identically and independently for K times. The numerical results are reported in Figure 2.4, where the average of the results is denoted as Ans. and the average time (in seconds) is denoted as Cost. Basis functions: Polynomials Local Basis True Solution Basis n L K Ans. Cost Ans. Cost Ans. Cost 3 28125 27 1.094 0.27 1.1057 0.47 1.0959 0.17 5 78125 16 1.0488 1.6 1.0679 2.8 1.0421 0.89 10 312500 8 1.0271 16 1.0390 34 1.0123 8.6 15 703125 6 1.0261 58 1.0311 142 1.0008 30 20 1250000 4 1.0221 137 1.0258 355 1.001 71 25 1953125 4 1.0240 276 1.0247 710 0.9986 142 30 2812500 3 1.0228 444 1.0209 1250 0.9966 243 40 5000000 2 1.0218 897 1.0156 2725 0.9952 567 True Solution 0.9906 0.9906 0.9906 0 5 10 15 20 25 30 35 40 0.001 0.003 0.005 0.007 0.01 0.03 0.05 0.07 0.1 0.2 0.3 0.4 0.5 n Absolute Error Polynomial Basis Local Basis True Solution Basis Figure 2.4: Numerical results using different basis functions in Example 2.6.2. Withoutsurprise, thetruesolutionbasisfunctionsperformthebest. Weremark that the results for the other two sets of basis functions include the regression error 62 as well. From the numerical results, the local basis functions seems to have smaller regressionerrorthanthepolynomials, when nislarge. However, when applyingthe local basis functions it is time consuming to sort the L sample paths and localize them into different hypercubes. When the same number of paths are sampled, the more basis functions we used, the slower simulation will be. More seriously, when the dimension d increases, the number of basis functions increases dramatically, which requires an exponential increase in the number of paths in return(see [48] for a detailed investigation of the relation between basis functions and paths). So further efforts are needed for higher dimensional problems. Our main motivation is to provide an efficient algorithm for high dimensional PDEs. At below we test our scheme on a 12 dimensional example, for which we shall again use the the regression-based Monte Carlo method. Example 2.6.3. A 12-dimensional example Consider the PDE (6.2) with d=12, σ=1, σ = √ 2, f(t,x,y,z)=COS − d 2 inf σ≤σ≤σ σ 2 SIN . (6.5) The true solution is again u = SIN. As explained in Section 2.5.4, in this chapter we want to focus on the discretization error and simulation error, so we 63 rule out the regression error and test our algorithm by using the following perfect set of basis functions: 1,x,SIN(T,x),COS(T,x). To test the result, we fix T =0.2and x 0 = {1,2,...,12}, which implies that the true solution is sin(78) = 0.513978. As the nonlinear term is diagonal, under the same framework as in Example 2.6.1 we take p := min{1/3,1/(1+d(Λ− 1))} = 1/13,σ 0 := I d , which also satisfy the monotonicity condition Assumption 2.3.4. Assuming thatwerepeat Kidentical andindependent tests, andwesample Lpaths in each test. We do not use L = O(N 3 ) in this example and the ones following, since it’s usually not necessary in practice. 
The results are reported in Figure 2.5, where we conduct fewer tests for larger L, because the results are stable enough to draw our conclusion. It can be seen from Figure 2.5 that the error shrinks slightly slower than O(h), which is due to the simulation error. Hence we want to explore the influence of simulation error by using all the parameters as above but fixing n = 40, K = 2,d = 12, T =0.2,n=40,σ=1, σ = √ 2. We increase the sample size L to see how the error reduces in Figure 2.6. While the variance and error decrease with more paths sampled, the cost in time increases linearly with respect to L from 8 seconds to 1400 seconds in Figure 2.6. 64 n L K Avg(Ans.) Var(Avg.) cost (in seconds) 2 2083 160 0.659639 3.53×10 −6 4.48×10 −2 5 13021 64 0.562635 1.99×10 −6 1.46×10 −1 10 52083 32 0.546598 8.41×10 −7 1.17×10 0 20 208333 16 0.530432 8.04×10 −7 1.08×10 1 40 833333 8 0.521343 2.25×10 −7 9.11×10 1 80 3333333 4 0.519701 1.21×10 −7 7.28×10 2 160 13333333 2 0.517363 6.17×10 −8 5.86×10 3 True Solution 0.513978 10 0 10 1 10 2 0.003 0.005 0.007 0.01 0.03 0.05 0.07 0.1 0.2 n Absolute Error Figure 2.5: Numerical results of a 12-dimensional example in Example 2.6.3. 10 5 10 6 10 7 0.004 0.006 0.008 0.01 0.02 0.03 L Absolute Error L Approx. 83333 0.543643 166667 0.526979 416667 0.523897 833333 0.524683 1666667 0.521531 333333 0.521017 6666667 0.520083 13333333 0.518607 Ans.: 0.513978 Figure 2.6: Relation between size of sample and errors in Example 2.6.3. 65 Wehave seen thatourscheme converges tothetrueclassical solutionif itexists. Meanwhile, if the PDE only has a unique viscosity solution, our scheme can render a converging result as well. Let f be zero in (6.2), then this equation has some unknown viscosity solu- tion. However, our numerical results in Figure 2.7 still demonstrate a converging sequence. The number of paths we sampled in 2.7 is the same as that in Figure2.5. This can be also be observed from the decreasing differences between the numeri- cal results. The Δ i−j in Figure 2.7 denotes a numerical result with i partitions in time minus another numerical result with j time steps. We shall remark though in this case our choice of basis functions may not be the best, and roughly speaking the numerical result we obtain is an approximation of the regression of the true solution in the linear span of the basis functions. Again, we leave the analysis of the basis functions to future study. 66 0 20 40 60 80 100 120 140 160 180 0.17 0.18 0.19 0.2 0.21 0.22 0.23 n Numerical Results(NR) Δ 5−2 0.0334337 Δ 10−5 0.0077685 Δ 20−10 0.0076637 Δ 40−20 0.0034146 Δ 80−40 0.0012785 Δ 160−80 0.0002586 Figure 2.7: Numerical results for a PDE with unknown viscosity solution in Exam- ple 2.6.3. It is well known that Isaacs equations have a unique viscosity solution under mildtechnicalconditions. WenexttestourschemeonthefollowingIsaacsequation to see its performance. Example 2.6.4. A 12-dimensional Isaacs equation with unknown vis- cosity solution −u t −G(D 2 u)=0 on [0,T)×R d ,u(T,x)=SIN(T,x) on R d , 67 where G(γ):= d i=1 sup 0≤u≤1 inf 0≤v≤1 1 2 σ 2 (u,v)γ ii +f(u,v) = d i=1 inf 0≤v≤1 sup 0≤u≤1 1 2 σ 2 (u,v)γ ii +f(u,v) , σ 2 (u,v):=(1+u+v),f(u,v):= − u 2 4 + v 2 4 . One can easily simplify G(γ)as: G(γ)= d i=1 g(γ ii )where g(γ ii ):= ⎧ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎨ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪⎩ γ ii − 1 4 ,γ ii ∈ (1,+∞), γ ii 2 + (γ + ii ) 2 4 − (γ − ii ) 2 4 ,γ ii ∈ [−1,1], γ ii + 1 4 ,γ ii ∈ (−∞,−1). 
ThereforeG(γ)isneither concave norconvex when γ=0.SettingT=0.2, d=12, we assign arbitrary initial value x 0 = {x (i) 0 } d i=1 to inspect the outcome. Obviously here a=2I d and a = I d . We then take p=min{1/3,1/(1+d(Λ−1))}=1/13,σ 0 = I d . One example tested here is x (i) 0 =2iπ − T−0.5π d . The number of paths we sampled is 625·n 2 . Though the viscosity solution is unknown, our scheme still renders a converging numerical result in Figure 2.8. We next test our scheme for a 12-dimensional coupled FBSDE. Example 2.6.5. A 12-dimensional coupled FBSDE 68 0 10 20 30 40 50 60 70 80 0.23 0.235 0.24 0.245 0.25 0.255 0.26 0.265 n Numerical Results(NR) Δ 5−2 -0.019953 Δ 10−5 -0.005390 Δ 20−10 -0.003752 Δ 40−20 -0.002893 Δ 80−40 -0.001213 Δ 160−80 -0.000426 Figure 2.8: A 12-dimensional Isaacs equation with unknown viscosity solution in Example 2.6.4. Consider FBSDE (4.2) with m = d=12, σ is diagonal, and b i (t,x,y,z):=cos(y +z i ),σ ii (t,x,y):=1+ 1 3 sin 1 d d j=1 x j +y , f(t,x,y,z):= d 2 SIN(t,x) 1+ 1 3 sin 1 d d i=1 x i +y 2 − 1 d d i=1 z i 1+ 1 3 sin( 1 d d j=1 x j +y) −dCOS(t,x)cos y +COS(t,x) ; g(x):= SIN(T,x). 69 The associated PDE (4.1) looks quite complicated, however, the coefficients are constructed in a way so that u :=SIN is the classical solution. Consequently, the FBSDE has the following solution: denoting ¯ X t := 1 d d j=1 X j t , Y t = sin(t+d ¯ X t ),Z i t =cos(t+d ¯ X t ) 1+ 1 3 sin ¯ X t +sin(t+d ¯ X t ) . For PDE (4.1), we see that G γ = 1 2 σ 2 is diagonal and 2 3 ≤ σ ii ≤ 4 3 for each i. Hence a reasonable choice of parameters that maintains the monotonicity would be σ 0 := 4 9 I d , p := min{ 1 1+d(Λ−1) , 1 3 }=1/37. We note that f is not bounded and not Lipschitz continuous in y, however, since Z is bounded, then f(t,x,y,Z t )is bounded and Lipschitz continuous in y, and thus actually we may still apply The- orem 2.4.1. Set d=12,T =0.2,X 0 =(2,3,4,...,13), and apply the parameters specified before for our scheme. An approximation of Y 0 is shown in Figure 2.9, where the true solution Y t = sin(t+ d i=1 X i t ) has value 0.893997 at t=0. 2.6.2 Examples violating monotonicity condition In this subsection we apply our scheme to some examples which do not satisfy our monotonicity Assumption 2.3.4. So theoretically we do not know if our scheme converges or not. However, our numerical results show that the approximation still converges to the true solution. It will be very interesting to understand the scheme under these situations, and we shall leave it for future research. 70 n L K Avg(Ans.) Var(Avg.) cost (in seconds) 2 2083 160 1.462543 3.35×10 −5 1.56×10 −2 5 13021 64 1.111675 2.30×10 −5 2.36×10 −1 10 52083 32 1.014295 2.48×10 −5 2.43×10 0 20 208333 16 0.925712 8.10×10 −6 2.29×10 1 40 833333 8 0.912373 2.46×10 −6 1.94×10 2 80 3333333 4 0.908013 2.89×10 −7 1.56×10 3 160 13333333 2 0.888747 1.62×10 −8 3.42×10 4 10 1 10 2 0.003 0.005 0.007 0.01 0.03 0.05 0.07 0.1 0.2 0.3 0.4 0.5 n Absolute Error Figure 2.9: A 12-dimensional coupled FBSDE in Example 2.6.5. Example 2.6.6. A 12-dimensional PDE with σ=0 Consider the same setting as Example 2.6.3 except that σ=0. Instead of truncating G γ as we did at the end of Example 6.2, we will pick parameters p and σ 0 as if σ were some small positive number: p := 1/d, σ 0 := 2/(2−p)σI d . Then Assumption 2.3.4 is violated and our scheme is in fact not monotone. Nevertheless, our numerical results show that our approximations still converge to the true solution, if L := 625n 2 paths are used, see Figure 2.10. 
We next apply our scheme to the following HJB equation which is associated with 71 10 0 10 1 10 2 0.01 0.03 0.05 0.07 0.1 0.2 0.3 n Absolute Error n Approx. 2 0.22363 5 0.28971 10 0.38098 20 0.44215 40 0.47712 80 0.49699 160 0.50097 Ans. 0.51398 Figure 2.10: A 12-dimensional example without monotonicity in Example 2.6.6. a Markovian second order BSDEs, introduced by [26, 81]: ⎧ ⎪ ⎪ ⎨ ⎪⎪ ⎩ − ∂u ∂t − 1 2 sup σ≤σ≤σ [σ 2 : D 2 u]−f(t,x)=0, on [0,T)×R d , u(T,x)= g(x), on R d . (6.6) When f = 0, this PDE yields exactly the G-expectation introduced by Peng [76]. We emphasize that, unlike in previous examples, here σ,σ,σ ∈S d are matrices and 0<σ ≤ σ ≤ σ. In particular, G γ is not diagonal anymore. We remark that one has a representation for the solution of this PDE in terms of stochastic control: u(0,x)=sup σ E g(X σ T )+ T 0 f(t,X σ t )dt ,X σ t := x+ t 0 σ s dW s , where W is a d-dimensional Brownian motion, and the control σ is an F W - progressively measurable S d -valued process such that σ ≤ σ ≤ σ. Due to this 72 connection, these kind of PDEs and the related G-expectation and second order BSDEsareimportantinapplicationswithdiffusion control and/orvolatilityuncer- tainty. Example 2.6.7. A 10-dimensional HJB equation Consider the PDE (6.6) with g(x)=sin(T +x 1 + x 2 2 ...+ x d d ) and appropriate f(t,x) so that u(t,x)=sin(t+x 1 + x 2 2 +...+ x d d ) is the true solution to the PDE. We set d=10. To begin our test, we select randomly an initial point X 0 and two 10- dimensional positive definite matrices σ 2 and σ 2 . The parameters used in this PDE are chosen randomly as: X 0 =(0.8870626082, 1.8313582937, 2.1710945122, 2.3703744353, 1.2018847713, 2.6518851292, 2.2648022663, 1.9037585152, 2.336892572579084, 1.1590768112), 73 which gives a true solution -0.99966, σ 2 = ⎛ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝ 2.29, 0.07,−0.48, 0.15, 0.89,−0.06, 0.14, 0.31, 0.59,−0.36 0.07, 1.82, 0.55, 0.32, 0.28, 0.08,−0.30,−0.07,−0.46, 0.66 −0.48, 0.55, 2.54,−0.35, 0.14,−0.25,−0.31, 0.16,−0.71,−0.10 0.15, 0.32,−0.35, 1.71,−0.16, 0.67, 0.20, 1.11,−0.03,−0.64 0.89, 0.28, 0.14,−0.16, 1.36,−0.47,−0.46, 0.07,−0.07,−0.03 −0.06, 0.08,−0.25, 0.67,−0.47, 2.60, 0.26, 0.34,−0.02,−0.67 0.14,−0.30,−0.31, 0.20,−0.46, 0.26, 2.61,−0.26, 0.32, 0.29 0.31,−0.07, 0.16, 1.11, 0.07, 0.34,−0.26, 2.66,−0.19,−1.78 0.59,−0.46,−0.71,−0.03,−0.07,−0.02, 0.32,−0.19, 1.80,−0.43 −0.36, 0.66,−0.10,−0.64,−0.03,−0.67, 0.29,−1.78,−0.43, 2.16 ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠ , and σ 2 = ⎛ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝ 1.53,−0.40,−0.30,−0.20, 0.66,−0.43, 0.38, 0.10, 0.84,−0.31 −0.40, 0.72, 0.58,−0.06,−0.15,−0.28,−0.07,−0.14,−0.32, 0.42 −0.30, 0.58, 1.55,−0.05,−0.07,−0.54,−0.03,−0.20,−0.51, 0.38 −0.20,−0.06,−0.05, 0.55,−0.14, 0.22,−0.09, 0.60,−0.13,−0.37 0.66,−0.15,−0.07,−0.14, 0.61,−0.50,−0.09,−0.10, 0.25,−0.12 −0.43,−0.28,−0.54, 0.22,−0.50, 1.27, 0.15, 0.34,−0.06,−0.21 0.38,−0.07,−0.03,−0.09,−0.09, 0.15, 1.78, 0.13,−0.24, 0.17 0.10,−0.14,−0.20, 0.60,−0.10, 0.34, 0.13, 1.04, 0.16,−0.94 0.84,−0.32,−0.51,−0.13, 0.25,−0.06,−0.24, 0.16, 1.22,−0.56 −0.31, 0.42, 0.38,−0.37,−0.12,−0.21, 0.17,−0.94,−0.56, 1.36 ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠ 74 One can check that σ 2 >σ 2 because the smallest eigenvalue of σ 2 − σ 2 is 0.001634, which is positive. This PDE is not diagonally dominant and typically we cannot find σ 0 and p to make our scheme monotone. 
However, it is very interesting to observe that our scheme converges to the true solution if we choose p := 1/3and σ 0 := d √ d 2 √ d+1 √ σ 2 , see Figure 2.11. We emphasize again that these parameters still do not satisfy Assumption 2.3.4. It will be very interesting to understand further these numerical results and we will leave it for future research. N L K Avg(Ans.) Var(Avg.) cost (in seconds) 2 283 40 -1.1703 9.62×10 −7 0.057 5 1118 16 -1.12773 3.80×10 −6 1.9 10 3162 8 -1.0802 5.98×10 −6 23.6 15 5809 5 -1.0557 2.32×10 −6 103 20 8944 4 -1.0405 1.57×10 −6 291 30 16432 3 -1.0253 9.05×10 −6 1135 40 25298 2 -1.0124 2.16×10 −5 3074 True Solution -0.99966 0 5 10 15 20 25 30 35 40 0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.18 n Numerical Errors Figure 2.11: A 10-dimensional HJB equation in Example 2.6.7. Note that the PDE (6.6) involves the computation of sup σ≤σ≤σ [σ 2 : γ]. We provide some discussion below. 75 Remark 2.6.8. Let σ 2 −σ 2 = LL T be the Cholesky Decomposition, namely L is a d×d lower triangular matrix. Then for any γ ∈ S d , we have sup σ 2 ≤σ 2 ≤σ 2 [σ 2 : γ]= σ 2 : γ + n i=1 ˆ γ + i , where ˆ γ i , i=1,···,d, are the eigenvalues of L T γL. Proof. Obviously, any σ 2 ∈ S d between σ 2 and σ 2 can be expressed as σ 2 = σ 2 +A, where 0 ≤ A ≤ LL T .Then 0 ≤ L −1 AL −T ≤ I d . We make the following eigenvalue decompositions: L −1 AL −T = U ˆ AU T ,L T γL = Pˆ γP T , where UU T = PP T = I d ,and ˆ A and ˆ g diagonal matrices. It is clear that the diagonal terms of ˆ A are ˆ a i ∈ [0,1] and the diagonal terms of ˆ γ are ˆ γ i .Denote Q := U T P.Then σ 2 : γ −σ 2 : γ = A : γ=[L −1 AL −T ]:[L T γL]=[U ˆ AU T ]:[L T γL] = ˆ A:[U T L T γLU]= ˆ A:[Qˆ γQ T ] = d i=1 ˆ a i d j=1 q 2 ij ˆ γ j ≤ d i=1 d j=1 q 2 ij ˆ γ j + . 76 Note that d j=1 q 2 ij = 1. Then by Jensen’s inequality, σ 2 : γ −σ 2 : γ ≤ d i=1 d j=1 q 2 ij ˆ γ j + ≤ d i=1 d j=1 q 2 ij ˆ γ + j = d j=1 ˆ γ + j d i=1 q 2 ij = d j=1 ˆ γ + j . This proves the remark. Moreover, from the proof we see that the equality holds when ˆ a i =1 { d j=1 q 2 ij ˆ γ j >0} and Q = I d . That is, U = P and thus σ 2 = σ 2 + LP ˆ AP T L T ,where ˆ A is the diagonal matrix whose diagonal terms are ˆ a i =1 {ˆ γ i >0} . We remark that the above computation is in fact quite time consuming. At belowweprovide anotherexample whereG γ istridiagonalandthescheme becomes much more efficient. Example 2.6.9. A 10-dimensional example with tridiagonal structure Consider the PDE (1.1) with G(t,x,y,z,γ):= ! 3 d i=1 γ ii + |i−j|=1 1 √ 1+(γ ij ) 2 " +f(t,x), g(x):=SIN(T,x), (6.7) and f is chosen so that u := SIN is the true solution of the PDE. 77 n L K Avg(Ans.) Var(Avg.) cost (in seconds) 2 2500 160 -1.47362 1.0×10 −5 1.2×10 −2 5 15625 64 -1.15004 1.7×10 −6 1.4×10 −1 10 62500 32 -1.06194 9.1×10 −6 1.0×10 0 20 250000 16 -1.04519 2.1×10 −6 8.9×10 0 40 1000000 8 -1.03326 6.2×10 −7 7.1×10 1 80 4000000 4 -1.03092 5.8×10 −8 5.9×10 2 160 16000000 2 -1.01910 3.0×10 −9 1.4×10 4 10 0 10 1 10 2 0.01 0.03 0.05 0.07 0.1 0.2 0.3 0.4 0.5 0.6 n Absolute Error Figure 2.12: A 10-dimensional example with tridiagonal generator in Example 2.6.9. In this case one may check straightforwardly that [G γ ] ii =3 and [G γ (t,x,y,z,γ)] ij = − γ ij (1+γ 2 ij ) 3 2 , |i−j|=1. When d = 10, this example is out of the scope of our monotonicity Assumption 2.3.4, even with our choice of p and σ 0 : p := min 1 3 , 1 (1+d∗(5−1)) =1/41,σ 0 = I d . 
However, if we test it using T =0.2,x 0 =(1,2,...,10), the numerical results show that our scheme still converges to the true solution, sin(55) = −0.999755, as presented in Figure 2.12. 78 We shall remark though that this example is computationally more expensive than Example 2.6.3 because here we need to approximate 3d−2 second derivatives. 79 Chapter 3 Extension to Path-dependent PDEs 3.1 Introduction In this chapter we aim to numerically solve the following fully nonlinear Path Dependent PDE (PPDE, for short) with terminal condition u(T,ω)= g(ω): Lu(t,ω):= −∂ t u(t,ω)−G(t,ω,u,∂ ω u,∂ 2 ωω u)=0, 0 ≤t<T, (1.1) where ω is a continuous path on [0,T], and G is increasing in ∂ 2 ωω u, which indicates that the PPDE is parabolic. This kind of PPDE is a cornerstone of non-Markovian models, e.g. stochastic control or stochastic game with diffusion control, as well as financial models with path-dependent volatility, to mention a few. Formally definedbyPengin[75],PPDEhasenjoyedmanyapplicationsincludingmartingales (path-dependent heat equations), BSDEs(semilinear PPDEs) [73], G−martingales [74] and second-order BSDEs [81]. 80 We shall note that in the Markovian case, namely u(t,ω)= v(t,ω t ),g(ω)= f(ω T ),andG(t,ω,y,z,γ)= F(t,ω t ,y,z,γ)forsomedeterministic functionsv,f,F, the PPDE (1.1) becomes a standard PDE with terminal condition v(T,x)= f(x): Lv(t,x):= −∂ t v(t,x)−F(t,x,v,Dv,D 2 v)=0, 0 ≤t<T. (1.2) In this Chapter we extend the theory by Barles and Souganidis in [6] to (1.1). We will follow the definition of viscosity solutions by Ekren et al. in [41, 42, 43, 44], which is based on the functional Itˆ o calculus initiated by Dupire [40] and further developed by Cont and Fournie [30] as well as Ekren et al. [41, 42, 43, 44]. Since the viscosity solution in [42] is defined through some optimal stopping problem under nonlinear expectation, our definition of monotonicity for a numerical PPDE scheme also reflects this perspective. With a few technical arguments to estimate the hitting time involving in the definition of path-dependent viscosity solution, we will show that our monotone scheme converges to the unique viscosity solution of the PPDE, following a similar arguments as in [6]. To begin with, we recall the main result of (Barles and Souganidis [6]). We shall follow the presentation in Chapter 2. We first recall the definition of viscosity solutions for PDE (1.1): an upper (resp. lower) semicontinuous function v is called 81 a viscosity subsolution (resp. viscosity supersolution) of PDE (1.1) if Lϕ(t,x) ≤ (resp. ≥) 0, for any (t,x) ∈ [0,T)×R d and any smooth function ϕ satisfying: [u−ϕ](t,x)=0 ≥ (resp. ≤)[u−ϕ](s,y), for all (s,y) ∈ [0,T]×R d . (1.3) For the viscosity theory of PDEs, we refer to the classical references [31, 47, 84]. We shall adopt the following standard assumptions: Assumption 3.1.1. (i) F(·,0,0,0) and f are bounded. (ii) F is continuous in t, uniformly Lipschitz continuous in (x,y,z,γ),and f is uniformly Lipschitz continuous in x. (iii) PDE (1.1) is parabolic, that is, F is nondecreasing in γ. (iv) Comparison principle for PDE (1.1) holds in the class of bounded viscosity solutions. That is, if v 1 and v 2 are bounded viscosity subsolution and viscosity supersolution of PDE (1.1), respectively, and v 1 (T,·) ≤ f ≤ v 2 (T,·),then v 1 ≤ v 2 on [0,T]×R d . 
For any t ∈ [0,T)and h ∈ (0,T − t), let T t,x h be an operator on the set of measurable functions ϕ : R d → R.For n ≥ 1, denote h := T n , t i := ih, i=0,1,···,n, and define: v h (t n ,x):= f(x),v h (t,x):= T t,x t i −t [v h (t i ,·)],t ∈ [t i−1 ,t i ),i = n,··· ,1. (1.4) 82 The following convergence result is reported in Chapter 2 Theorem 2.2.2, which is based on [6] and is due to Fahim, Touzi and Warin [46] Theorem 3.6. Theorem 3.1.2. Let Assumption 3.1.1 hold. Assume T t,x h satisfies the following conditions: (i) Consistency: for any (t,x) ∈ [0,T)×R d and any ϕ ∈ C 1,2 ([0,T)×R d ), lim (t ,x ,h,c)→(t,x,0,0) [c+ϕ](t ,x )−T t ,x h [c+ϕ](t +h,·) h = Lϕ(t,x). (ii) Monotonicity: T t,x h [ϕ] ≤T t,x h [ψ] whenever ϕ ≤ ψ. (iii) Stability: v h is bounded uniformly in h whenever f is bounded. (iv) Boundary condition: lim (t ,x ,h)→(T,x,0) v h (t ,x )= f(x) for any x ∈ R d . Then PDE (1.2) with terminal condition v(T,·)= f has a unique bounded viscosity solution v,and v h converges to v locally uniformly as h → 0. In this chapter we propose similar conditions for consistency, monotonicity and stability of a numerical scheme for PPDE and show that they are all one need to guaranttee the convergence of an approximated numerical solution to the unique viscosity solution of the PPDE. Moreover, if the PPDE yields a smooth classical solution, we can obtain a linear rate of convergence of our scheme. An implementable scheme, as well as a testing example of this scheme is proposed later. Weshallpointoutthatsofarproposinganimplementable efficient numerical 83 scheme for PPDEs, especially for high dimensional case (d¿2) is still a challenging problem and deserves further exploration in our future research. The rest of this chapter is organized as follows. In Section 3.2 we introduce path dependent PDEs and its viscosity solutions. In Section 3.3 we prove the main theorem, namely the convergence of monotone schemes. In Section 3.4 we propose an implementable scheme which satisfies all the desired conditions. In Section 3.5 we obtain the rate of convergence of our scheme in the case that the solution is smoothenough. Finallya1-dimensionalexample ofthisscheme istested inSection 3.6. 3.2 Preliminaries In this section, we recall the setup and the notations of [42, 43, 44]. 3.2.1 The canonical setting Let Ω := ω ∈ C([0,T],R d ): ω 0 = 0 , the set of continuous paths starting from the origin, B the canonical process, F the natural filtration generated by B, P 0 the Wiener measure, and Λ := [0,T]×Ω. Here and in the sequel, for notational simplicity, we use 0 to denote vectors or matrices with appropriate dimensions 84 whose components are all equal to 0. Let S d denote the set of d × d symmetric matrices, and x·x := d i=1 x i x i for any x,x ∈R d ,γ : γ := Trace[γγ ] for any γ,γ ∈S d . We define a semi-norm on Ω and a pseudometric on Λ as follows: for any (t,ω),(t ,ω ) ∈ Λ, ω t := sup 0≤s≤t |ω s |, d (t,ω),(t ,ω ) := |t−t |+ # # ω .∧t −ω .∧t # # T . (2.1) Then (Ω,· T ) is a Banach space and (Λ,d) is a complete pseudometric space. Remark 3.2.1. In [42, 43, 44], following [40] we used pseudometric: d ∞ (t,ω),(t ,ω ) := |t−t |+ # # ω .∧t −ω .∧t # # T . Clearly d and d ∞ induce the same topology, and all the results in [42, 43, 44] still hold true under d. However, when we consider the regularity of viscosity solutions, see (4.11) below, it is more natural to use d. Indeed, since B is typically a semimartingale, fort<t we see that √ t −t and B t t are roughly in the same order. 
85 We shall denote by L 0 (F T )and L 0 (Λ) the collection of all F T -measurable ran- dom variables and F-progressively measurable processes, respectively. In partic- ular, for any u ∈ L 0 (Λ), the progressive measurability implies that u(t,ω)= u(t,ω ·∧t ). Let C 0 (Λ) (resp. UC(Λ)) be the subset of L 0 (Λ) whose elements are continuous (resp. uniformly continuous) in (t,ω) underd. The corresponding sub- sets of bounded processes are denoted as C 0 b (Λ) and UC b (Λ). Finally, L 0 (Λ,R d ) denotethespaceofR d -valuedprocesses withentriesinL 0 (Λ), andwedefinesimilar notations for the spaces C 0 , C 0 b , UC,and UC b . We denote by T the set of F-stopping times, andH⊂T the subset of those hitting times h of the form h := inf{t : B t / ∈ O}∧t 0 =inf{t : d(ω t ,O c )=0}∧t 0 , (2.2) for some 0<t 0 ≤ T, and some open and convex set O ⊂R d containing 0. For allL> 0, let P L denote the set of semimartingale measures P on Ω whose drift and diffusion characteristics are bounded by L and √ 2L, respectively, and P ∞ := $ L>0 P L . To be precise, let ˜ Ω:=Ω 2 be an enlarged canonical space with canonical process (B,W). For any P∈P L , there exist an extension of probability measure ˜ P on ˜ Ωand α P ∈L 0 ([0,T]× ˜ Ω,R d ), 0 ≤ β P ∈L 0 ([0,T]× ˜ Ω,S d ) such that W is a ˜ P-Brownian motion and |α P |≤ L, |β P |≤ √ 2L, dB t = α P t dt+β P t dW t , ˜ P-a.s. (2.3) 86 We remark that, when α P and β P are deterministic, especially when they are constants, P is uniquely determined by them. We define the path derivatives via the functional Itˆ o formula. Definition 3.2.2. We say u ∈ C 1,2 (Λ) if u ∈ C 0 (Λ) and there exist ∂ t u ∈ C 0 (Λ), ∂ ω u ∈ C 0 (Λ,R d ), ∂ 2 ωω u ∈ C 0 (Λ,S d ) such that, for any P∈P ∞ , u is a local P- semimartingale and it holds: du = ∂ t udt+∂ ω u·dB t + 1 2 ∂ 2 ωω u : dB t , 0 ≤ t ≤T, P-a.s. (2.4) The above ∂ t u, ∂ ω u and ∂ 2 ωω u, if they exist, are unique. Consequently, we call them the time derivative, the first order and second order space derivatives of u, respectively. Definition 3.2.3. We say u ∈ C 1,2 (Λ) is a classical solution (resp. supersolution, subsolution) of PPDE (1.1) if Lu(t,ω)= (resp. ≥,≤) 0, for all (t,ω) ∈ [0,T)×Ω. 3.2.2 The shifted spaces Fix 0 ≤ t ≤ T. -LetΩ t := ω ∈ C([t,T],R d ): ω t = 0 be the shifted canonical space; B t the shifted canonical process on Ω t ; F t the shifted filtration generated by B t , P t 0 the Wiener measure on Ω t ,andΛ t := [t,T]×Ω t . 87 -For s ∈ [t,T], define · s on Ω t and d on Λ t in the spirit of (2.1), and the sets L 0 (Λ t )etc. inanobviousway. -For s ∈ [0,t], ω ∈ Ω s and ω ∈ Ω t , define the concatenation path ω⊗ t ω ∈ Ω s by: (ω ⊗ t ω )(r):= ω r 1 [s,t) (r)+(ω t +ω r )1 [t,T] (r), for all r ∈ [s,T]. -Let s ∈ [0,T), ξ ∈ L 0 (F s T ), and X ∈ L 0 (Λ s ). For (t,ω) ∈ Λ s , define ξ t,ω ∈ L 0 (F t T )and X t,ω ∈L 0 (Λ t )by: ξ t,ω (ω ):= ξ(ω⊗ t ω ),X t,ω (ω ):= X(ω ⊗ t ω ), for all ω ∈ Ω t . Moreover, for a random time τ, we shall use the notation ξ τ,ω := ξ τ(ω),ω . - Define T t , H t , P t L , P t ∞ ,and C 1,2 (Λ t ) etc. in an obvious manner. It isclear thatu t,ω ∈ C 0 (Λ t )foranyu ∈ C 0 (Λ)and(t,ω) ∈ Λ. Similar property holds for other spaces introduced above. Moreover, for any τ ∈T (resp. h∈H) and any (t,ω) ∈ Λ such thatt<τ(ω) (resp. t< h(ω)), it is clear that τ t,ω ∈T t (resp. h t,ω ∈H t ). Viscosity solutions of PPDEs We first introduce the spaces for viscosity solutions. 88 Definition 3.2.4. Let u ∈L 0 (Λ). 
(i) We say u is right continuous in (t,ω) under d if: for any (t,ω) ∈ Λ and any ε> 0, there existsδ> 0 such that, for any (s, ˜ ω) ∈ Λ t satisfying d((s, ˜ ω),(t,0)) ≤ δ, we have |u t,ω (s,˜ ω)−u(t,ω)|≤ ε. (ii) We say u∈U if u is bounded from above, right continuous in (t,ω) under d, and there exists a modulus of continuity function ρ such that for any (t,ω),(t ,ω ) ∈ Λ: u(t,ω)−u(t ,ω ) ≤ ρ d (t,ω),(t ,ω ) whenever t ≤ t . (2.5) (iii) We say u ∈ U if −u∈U. It is clear that U ∩ U = UC b (Λ). We also recall from [42] Remark 3.2 that Condition (2.5) implies that u has left-limits and positive jumps. We next introduce the nonlinear expectations. Denote by L 1 (F t T ,P t L )theset of ξ ∈L 0 (F t T )withsup P∈P t L E P [|ξ|] < ∞, and define, for ξ ∈L 1 (F t T ,P t L ), E L t [ξ]= sup P∈P t L E P [ξ]and E L t [ξ] = inf P∈P t L E P [ξ]= −E L t [−ξ]. 89 We now define viscosity solutions. For any u ∈ L 0 (Λ), (t,ω) ∈ [0,T)×Ω, and L> 0, let A L u(t,ω):= ϕ ∈ C 1,2 (Λ t ): ∃h∈H t s.t. (ϕ−u t,ω ) t =0= inf τ∈T t E L t (ϕ−u t,ω ) τ∧h , A L u(t,ω):= ϕ ∈ C 1,2 (Λ t ): ∃h∈H t s.t. (ϕ−u t,ω ) t =0= sup τ∈T t E L t (ϕ−u t,ω ) τ∧h . (2.6) Definition 3.2.5. (i) LetL> 0.Wesay u ∈U (resp. U)isaviscosity L- subsolution (resp. L-supersolution) of PPDE (1.1) if, for any (t,ω) ∈ [0,T)× Ω and any ϕ∈A L u(t,ω) (resp. ϕ ∈ A L u(t,ω)): L t,ω ϕ(t,0):= −∂ t ϕ−G t,ω (.,ϕ,∂ ω ϕ,∂ 2 ωω ϕ) (t,0) ≤ (resp. ≥)0. (2.7) (ii) We say u∈U (resp. U) is a viscosity subsolution (resp. supersolution) of PPDE (1.1) if u is a viscosity L-subsolution (resp. L-supersolution) of PPDE (1.1) for someL> 0. (iii) We say u ∈ UC b (Λ) is a viscosity solution of PPDE (1.1) if it is both a viscosity subsolution and a viscosity supersolution. As pointed out in [43] Remark 3.11 (i), without loss of generality in (2.6) we may always set h = h t ε for some smallε> 0: h t ε := inf{s>t : |B t s |≥ ε}∧(t+ε). (2.8) 90 3.3 Monotone scheme for PPDEs Our goal of this section is to extend Theorem 3.1.2 to PPDE (1.1). Similar to Assumption 3.1.1, we assume Assumption 3.3.1. (i) G(·,0,0,0) and g are bounded. (ii) G is continuous in (t,ω), uniformly Lipschitz continuous in (y,z,γ),and g is uniformly continuous in ω.Denoteby L 0 the Lipschitz constant of G in (z,γ). (iii) PPDE (1.1) is parabolic, that is, G is nondecreasing in γ. (iv) Comparisonprinciple for PPDE (1.1) holds in the class of bounded viscosity solutions. That is, if u 1 and u 2 are bounded viscosity subsolution and viscosity supersolution of PPDE (1.1), respectively, and u 1 (T,·) ≤ g ≤ u 2 (T,·),then u 1 ≤ u 2 on Λ. For the comparison principle in (iv) above, we refer to [44] for some sufficient conditions. Now for any (t,ω) ∈ [0,T)×Ωand h ∈ (0,T −t), let T t,ω h be an operator on L 0 (F t t+h ). For n ≥ 1, denote h := T n , t i := ih, i=0,1,···,n, and define: u h (t n ,ω):= g(ω),u h (t,ω):=T t,ω t i −t u h (t i ,·) ,t ∈ [t i−1 ,t i ),i = n,··· ,1. (3.1) where we abuse the notation that: T t,ω h [ϕ]:= T t,ω h [ϕ t,ω ], for ϕ ∈L 0 (F t+h ). 91 The following main result is analogous to Theorem 3.1.2. Theorem 3.3.2. Let Assumption 3.3.1 hold. Assume T t,ω h satisfies the following conditions: (i) Consistency: for any (t,ω) ∈ [0,T)×Ω and ϕ ∈ C 1,2 (Λ t ), lim (t ,ω ,h,c)→(t,0,0,0) [c+ϕ](t ,ω )−T t ,ω⊗tω h [c+ϕ](t +h,·) h = L t,ω ϕ(t,0). (3.2) where (t ,ω ) ∈ Λ t , h ∈ (0,T −t), c ∈R,and L t,ω ϕ is defined in (2.7). 
(ii) Monotonicity: for some constant L ≥ L 0 and any ϕ,ψ ∈UC b (F t t+h ),there exists a modulus of continuity function ρ mon , which depends only on L, d,andthe uniform continuity of ϕ,ψ, but does not depend on the specific ϕ,ψ, such that E L t [ϕ−ψ] ≤ 0 implies T t,ω h [ϕ] ≤T t,ω h [ψ]+hρ mon (h). (3.3) (iii) Stability: u h is uniformly bounded and uniformly continuous in ω,uni- formly on h. Moreover, there exists a modulus of continuity function ρ, indepen- dent of h, such that |u h (t,ω)−u h (t ,ω ·∧t )|≤ ρ (t −t)∨h , for anyt<t and any ω ∈ Ω. (3.4) Then PPDE (1.1) with terminal condition u(T,·)= g has a unique bounded L- viscosity solution u,and u h converges to u locally uniformly as h → 0. 92 Remark 3.3.3. The conditions in Theorem 3.3.2 reflect the features of our defi- nition of viscosity solution for PPDEs. (i) For the consistency condition (3.2), we require the convergence only for t ≥ t. (ii) The monotonicity condition in Theorem 3.1.2 (ii) is due to the maximum condition (1.3) in the definition of viscosity solutions for PDEs. In our path depen- dent case, the monotonicity condition (3.3) is modified in a way to adapt to (2.6). Moreover, typically we have ρ mon =0. (iii) Due to the uniform continuity required in the definition of viscosity solu- tions, the stability condition in Theorem 3.3.2 (iii) is somewhat strong. Note that this condition obviously implies the counterparts of the Stability and Boundary conditions in Theorem 3.1.2. To prove the theorem, we need a technical lemma. Lemma 3.3.4. LetL> 0, h∈H, τ ∈T, τ ≤ h,and X ∈U with modulus of continuity function ρ in (2.5). Assume E L 0 [X τ ]−E L 0 X h ≥c> 0 (3.5) 93 Then there exist constants δ 0 = δ 0 (c,L,d,ρ) > 0, C = C(L,d) > 0,and ω ∗ ∈ Ω such that t ∗ := τ(ω ∗ ) < h(ω ∗ ) and sup P∈P t∗ L P h t∗,ω ∗ −t ∗ ≤ δ ≤ Cδ 2 for all δ ≤ δ 0 . (3.6) Proof. Let h correspond to O and t 0 in (2.2). We first claim there exist δ 0 = δ 0 (c,L,d,ρ)and ω ∗ such that t ∗ := τ(ω ∗ )<t 0 −δ 0 and d(ω ∗ t∗ ,O c ) ≥ δ 1 6 0 . (3.7) In particular, this implies that t ∗ < h(ω ∗ ). Then, for any P∈P t∗ L and δ ≤ δ 0 , P h t∗,ω ∗ −t ∗ ≤ δ = P h t∗,ω ∗ −t ∗ ≤ δ, ω ∗ t∗ +B t∗ h t∗,ω ∗ ∈ O c ≤ P sup t∗≤s≤t∗+δ |B t∗ s |≥ d(ω ∗ t∗ ,O c ) ≤P sup t∗≤s≤t∗+δ |B t∗ s |≥ δ 1 6 0 ≤ δ −1 0 E P sup t∗≤s≤t∗+δ |B t∗ s | 6 ≤Cδ 2 , proving (3.6). We now prove (3.7) by contradiction. Assume (3.7) is not true, then τ ≥ t 0 −δ 0 or d(B τ ,O c )<δ 0 1 6 , ∀ω ∈ Ω. (3.8) 94 By definition of E L 0 , there exists P∈P 0 L such that E L 0 [X τ ] ≤E P [X τ ]+ c 2 . (3.9) Note that B τ (ω) ∈ O whenever τ(ω) < h(ω). Recall (2.3) and let η(ω)denote the unit vector pointing from B τ (ω)to O c .Set η(ω) be a fixed unit vector when τ(ω)= h(ω). Then η∈F τ . Construct ˆ P∈P 0 L as follows: α ˆ P t := α P t 1 [0,τ) (t)+Lη1 [τ,t 0 ) ,β ˆ P t := β P t 1 [0,τ) (t). That is, ˆ P = P on F τ and dB τ(ω) t = Lη(ω)dt, t ≥ τ, ˆ P τ,ω -a.s., where ˆ P τ,ω is the regular conditional probability distribution of P. Then, one can easily see that |B τ(ω) t | = L[t−τ(ω)], h τ,ω −τ(ω)= d(B τ (ω),O c ) L ∧[t 0 −τ(ω)], ˆ P τ,ω -a.s. for all ω ∈ Ω. This, together with (3.8), implies d (τ,ω),(h,ω) = h−τ+sup τ≤t≤h |B τ t | ≤ C[h−τ] ≤ C[ δ 0 1 6 L +δ 0 ] ≤Cδ 0 1 6 , ˆ P τ,ω -a.s. (3.10) 95 Then, by (3.9), (2.5), and (3.10), E L 0 [X τ ]−E L 0 X h ≤E P [X τ ]−E ˆ P [X h ]+ c 2 = E ˆ P [X τ −X h ]+ c 2 ≤ E ˆ P ρ d (τ,ω),(h,ω) + c 2 ≤ ρ Cδ 1 6 0 + c 2 . This contradicts with (3.5) when δ 0 is small enough, and thus (3.7) holds true. Proof of Theorem 3.3.2. By the stability, u h is bounded. 
Define u(t,ω) := liminf h→0 u h (t,ω), u(t,ω) := limsup h→0 u h (t,ω). (3.11) Clearly u(T,ω)= g(ω)= u(T,ω), u ≤ u,and u,u are bounded and uniformly continuous. We shall show that u (resp. u) is a viscosity L-supersolution (resp. L-subsolution) of PPDE (1.1). Then by the comparison principle we see that u ≤ u and thus u := u = u is the unique viscosity solution of PPDE (1.1). The convergence of u h is obvious now, which, together with the uniform regularity of u h and u, implies further the locally uniform convergence. Without loss of generality, we shall only prove by contradiction that u satisfies the viscosity L-supersolution property at (0,0). Assume not, then there exists ϕ 0 ∈ A L u(0,0) with corresponding h∈H such that −c 0 := Lϕ 0 (0,0) < 0. Denote ϕ(t,ω):= ϕ 0 (t,ω)− c 0 2 t. (3.12) 96 Then Lϕ(0,0)= − c 0 2 < 0. (3.13) Denote X 0 := ϕ −u, X h := ϕ −u h ,and E := E L 0 , E := E L 0 . Recall (2.8) and denote h ε := h 0 ε ∧ ε 5 , c ε := 1 3 c 0 ε 5 .Notethat h ε ≤ h for ε small enough, and by [43] (2.8), sup P∈P L P(h ε = ε 5 )= sup P∈P L P(h 0 ε <ε 5 ) ≤CL 4 ε −4 ε 10 ≤Cεc ε . (3.14) Then E[ε 5 −h ε ] ≤ E ε 5 1 {hε =ε 5 } ≤Cεc ε . Thus, for ε small, it follows from ϕ 0 ∈ A L u(0,0)that X 0 0 −E[X 0 hε ] =[ϕ 0 −u] 0 −E (ϕ 0 −u) hε − c 0 2 h ε ≥ E (ϕ 0 −u) hε −E (ϕ 0 −u) hε − c 0 2 h ε ≥E c 0 2 h ε = c 0 ε 5 2 − c 0 2 E[ε 5 −h ε ] ≥ 3cε 2 −Cεc ε ≥ c ε > 0. (3.15) 97 Let h k ↓ 0 be a sequence such that lim k→∞ u h k 0 = u 0 , (3.16) and simplify the notations: u k := u h k , X k := X h k . Then (3.15) leads to c ε ≤ [ϕ 0 −liminf h→0 u h 0 ]−E ϕ hε −liminf h→0 u h hε ≤ [ϕ 0 − lim k→∞ u k 0 ]−E ϕ h ε −liminf k→∞ u k hε . Note that X k is uniformly bounded. Then by (3.14) we have E |X k hε −X k ε 5| ≤Cεc ε . Since u h is uniformly continuous, applying the monotone convergence theorem under nonlinear expectation E, see e.g. [42] Proposition 2.5, we have c ε ≤ lim k→∞ [ϕ 0 −u k 0 ]−E limsup k→∞ [ϕ hε −u k hε ] ≤ lim k→∞ X k 0 −E limsup k→∞ X k ε 5 +Cεc ε = lim k→∞ X k 0 −E lim m→∞ sup k≥m X k ε 5 +Cεc ε = lim k→∞ X k 0 − lim m→∞ E sup k≥m X k ε 5 +Cεc ε ≤ lim k→∞ X k 0 −limsup k→∞ E X k ε 5 +Cεc ε ≤ lim k→∞ X k 0 −limsup k→∞ E X k hε +Cεc ε = liminf k→∞ X k 0 −E X k hε +Cεc ε . 98 Then, for all ε small enough and k large enough, X k 0 −E X k hε ≥ c ε 2 . (3.17) Now for each k, define Y k t (ω):= sup τ∈T t E L t (X k ) t,ω τ∧h t,ω ε ,t ≤ h ε (ω), and τ k := inf{t ≥0: Y k t = X k t }. We remark that here Y k ,τ k depend on ε as well, but we omit the superscript ε for notational simplicity. Applying [42] Theorem 3.6, we know τ k ≤ h ε is an optimal stopping time for Y k 0 and thus 0 < c ε 2 ≤ X k 0 −E X k hε ≤ Y k 0 −E X k hε = E X k τ k −E X k hε By Lemma 3.3.4, for k large enough so that h k ≤ δ 0 ( cε 2 ,L,d,ρ), there exists ω k ∈ Ω such that t k ∗ := τ k (ω k ) < h ε (ω k )and sup P∈P t k ∗ L P h k ε −t k ∗ ≤ δ ≤Cδ 2 for all δ ≤ h k , (3.18) 99 where h k ε := h t k ∗ ,ω k ε .Let {t k i ,i=0,···,n k } denote the time partitioncorresponding to h k , and assume t k i−1 ≤ t k ∗ <t k i .Notethat X k t k ∗ (ω k )= Y k t k ∗ (ω k ) ≥ E L t k ∗ (X k ) t k ∗ ,ω k τ∧h k ε , ∀τ∈T t k ∗ . Set δ k := t k i −t k ∗ ≤ h k and τ := t k i . Combine the above inequality and (3.18) we have [ϕ−u k ](t k ∗ ,ω k ) ≥ E L t k ∗ (ϕ−u k ) t k ∗ ,ω k t k i ∧h k ε ≥ E L t k ∗ (ϕ−u k ) t k ∗ ,ω k t k i −Cδ 2 k . This implies E L t k ∗ ϕ t k ∗ ,ω k t k i −[ϕ−u k ](t k ∗ ,ω k )−Cδ 2 k −(u k ) t k ∗ ,ω k t k i ≤ 0. Recall the stability condition that u k are uniformly continuous, uniformly in k. 
Then by the monotonicity condition (3.3) we have u k (t k ∗ ,ω k )= T t k ∗ ,ω k δ k [u k t k i ] ≤ T t k ∗ ,ω k δ k ϕ t k i −[ϕ−u k ](t k ∗ ,ω k )−Cδ 2 k +δ k ρ mon (δ k ) . (3.19) We next use the consistency condition (3.2). For (t,ω)=(0,0), set t := t k ∗ ,ω := ω k ,h := δ k ,c := −[ϕ−u k ](t k ∗ ,ω k )−Cδ 2 k +δ k ρ mon (δ k ). 100 By first sending k→∞ and then ε → 0, we see that d((t k ∗ ,ω k ),(0,0)) ≤ h ε +sup 0≤t≤hε |ω k t |≤ 2ε → 0,h ≤ h k → 0, which, together with (3.12), (3.16), andthe uniform continuity of ϕ andu k , implies |c|≤ [ϕ−u k ](t k ∗ ,ω k )−[ϕ−u k ](0,0) +|u k 0 −u 0 |+Cδ 2 k +δ k ρ mon (δ k ) → 0. Then, by the consistency condition (3.2) we obtain from (3.19) that 0 ≤ u k (t k ∗ ,ω k )−T t k ∗ ,ω k δ k ϕ t k i −[ϕ−u k ](t k ∗ ,ω k )−Cδ 2 k +δ k ρ mon (δ k ) δ k = [c+ϕ](t k ∗ ,ω k )−T t k ∗ ,ω k δ k [c+ϕ] t k i δ k +Cδ k −ρ mon (δ k )→Lϕ(0,0). This contradicts with (3.13). 3.4 An implementable monotone scheme We first remark that the monotonicity condition (3.3) is solely due to our defi- nition of viscosity solution of PPDEs. It is sufficient but not necessary for the convergence of the scheme. In Markovian case, the PPDE (1.1) is reduced back 101 to PDE (1.1). The schemes proposed in [46] and [52] satisfy the traditional mono- tonicity condition in Theorem 3.1.2, but violate our new monotonicity condition (3.3). However, as proved in [46, 52], we know those schemes do converge. The goal of this section is to propose an implementable scheme which satisfies all the conditions in Theorem 3.3.2 and thus converges. However, to ensure the monotonicity condition (3.3), we will need certain conditions which are purely technical. This conditions imposes a restriction on the non-linearity of the PPDE. Finding monotone schemes for general parabolic PPDEs is a challenging problem and we shall leave it for future research. We denote by [v] i the i−th element of vector v,andby[A] ij the (i,j)−th component of matrix A. Our scheme will involve some parameters: μ i > 0,σ i > 0,μ ∈R d , [μ] i = μ i ,σ ∈R d×d , [σ] ij = σ j ,i,j=1,···,d. (4.1) Given (t,ω) ∈ [0,T)×Ω, recall (2.3) and let ˆ P be a set of probability measures P on Ω t such that α P = ⎛ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝ ξ 1 μ 1 ξ 2 μ 2 . . . ξ d μ d ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠ ,β P = ⎛ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝ η 11 σ 1 ··· η 12 σ 2 η 21 σ 1 ··· η 22 σ 2 . . . ··· . . . η d1 σ 1 ··· η dd σ d ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠ , 102 where ξ i ∈{0,1},η ij ≡ η ji ∈{0,1}, 1 ≤ i,j ≤ d. Obviously, ˆ P hasacardinalityof| ˆ P|=2 1.5d+0.5d 2 .Tofacilitatethepresentation, we introduce the following subsets of ˆ P:for i,j=1,...,d, P i 1 := {P ∈ ˆ P,[α P ] i = μ i }, P i 0 := {P ∈ ˆ P,[α P ] i =0}, P ii 1 := {P ∈ ˆ P,[β P ] ii = σ i }, P ii 0 := {P ∈ ˆ P,[β P ] ii =0}, P ij 1 := {P ∈ ˆ P,[β P ] ij = σ j , [β P ] ji = σ i }, P ij 0 := {P ∈ ˆ P,[β P ] ij =0, [β P ] ji =0}. One can see easily that each of the subsets above has a cardinality of 1 2 | ˆ P|. 
Now for h ∈ (0,T −t)and ϕ ∈L 0 (F t t+h ), define T t,ω h [ϕ]:= D (0) ϕ+h G t,ω,D (0) ϕ,D (1) ϕ,D (2) ϕ − 1 2 μ·D (1) ϕ− 1 4 (σσ T ): D (2) ϕ , (4.2) 103 where D (0) ϕ, D (1) ϕ, D (2) ϕ take values in R, R d , S d , respectively, with each com- ponent defined by D (0) ϕ := 1 | ˆ P| P∈ ˆ P E P [ϕ], D (1) i ϕ := 1 μ i h| ˆ P|/2 ⎛ ⎝ P∈P i 1 E P [ϕ]− P∈P i 0 E P [ϕ] ⎞ ⎠ , D (2) i,i ϕ := 1 σ 2 i h| ˆ P|/4 ⎛ ⎝ P∈P ii 1 E P [ϕ]− P∈P ii 0 E P [ϕ] ⎞ ⎠ , D (2) i,j ϕ := 1 σ i σ j h| ˆ P|/2 ⎧ ⎨ ⎩ ⎛ ⎝ P∈P ij 1 E P [ϕ]− P∈P ij 0 E P [ϕ] ⎞ ⎠ − ⎛ ⎝ P∈P ii 1 E P [ϕ]− P∈P ii 0 E P [ϕ] ⎞ ⎠ − ⎛ ⎝ P∈P jj 1 E P [ϕ]− P∈P jj 0 E P [ϕ] ⎞ ⎠ ⎫ ⎬ ⎭ ,i = j. (4.3) We now verify the conditions in Theorem 3.3.2. Lemma 3.4.1 (Consistency). Under Assumption 3.3.1, T t,ω h satisfies the consis- tency condition (3.2). Proof. Without loss of generality, we assume (t,ω)=(0,0). For any P ∈ ˆ P we define a derivative operator ∂ P := ∂ t + i:α P i =1 μ i ∂ ω i + 1 2 i,j:β P ij =1 σ i σ j ∂ 2 ω i ω j. 104 Let (t ,ω ,h,c) be as in (3.2), and for notational simplicity, at below we write (t ,ω )as(t,ω). Now for ϕ ∈ C 1,2 (Λ), denote ψ := c + ϕ t,ω (t+ h,·) ∈ L 0 (F t t+h ), and ϕ| s t := ϕ t,ω (s,B t ) − ϕ(t,ω). Send (t,ω,h,c) → (0,0,0,0), by the functional Itˆ o formula and the smoothness of ϕ, one can easily check that D (0) ψ = c+ P∈ ˆ P E P [ϕ] | ˆ P| → P∈ ˆ P ϕ(0,0) | ˆ P| = ϕ(0,0), and D (0) ψ −[c+ϕ](t,ω) h = 1 h| ˆ P| P∈ ˆ P E P [ϕ| t+h t ] = 1 h| ˆ P| P∈ ˆ P t+h t E P ∂ P ϕ(s,ω ⊗ t B t ) ds → ! ∂ t ϕ+ 1 2 μ·∂ ω ϕ+ 1 4 (σσ T ): ∂ ωω ϕ " (0,0); Moreover, the approximation of path derivatives are consistent as well: D (1) i ψ = 1 μ i h| ˆ P|/2 ⎛ ⎝ P∈P i 1 E P − P∈P i 0 E P ⎞ ⎠ [ϕ| t+h t ] = 2 μ i h| ˆ P| ⎧ ⎨ ⎩ t+h t P∈P i 1 E P ∂ P ϕ(s,ω ⊗ t B t ) − P∈P i 0 E P ∂ P ϕ(s,ω ⊗ t B t ) ds ⎫ ⎬ ⎭ → ∂ ω iϕ(0,0); 105 D (2) i,i = 4 σ 2 i h| ˆ P| ⎛ ⎝ P∈P ii 1 E P − P∈P ii 0 E P ⎞ ⎠ [ϕ| t+h t ] = 4 σ 2 i h| ˆ P| ⎧ ⎨ ⎩ t+h t P∈P ii 1 E P ∂ P ϕ(s,ω ⊗ t B t ) − P∈P ii 0 E P ∂ P ϕ(s,ω ⊗ t B t ) ds ⎫ ⎬ ⎭ → ∂ 2 ω i ω iϕ(0,0); D (2) i,j ϕ = 2 σ i σ j h| ˆ P| ⎧ ⎨ ⎩ ⎛ ⎝ P∈P ij 1 E P [ϕ| t+h t ]− P∈P ij 0 E P [ϕ| t+h t ] ⎞ ⎠ − ⎛ ⎝ P∈P ii 1 E P [ϕ| t+h t ]− P∈P ii 0 E P [ϕ| t+h t ] ⎞ ⎠ − ⎛ ⎝ P∈P jj 1 E P [ϕ| t+h t ]− P∈P jj 0 E P [ϕ| t+h t ] ⎞ ⎠ ⎫ ⎬ ⎭ → 1 σ i σ j 1 2 σ 2 i ∂ 2 ω i ω i + 1 2 σ 2 j ∂ 2 ω j ω j +σ i σ j ∂ 2 ω i ω j − 1 2 σ 2 i ∂ 2 ω i ω i − 1 2 σ 2 j ∂ 2 ω j ω j ϕ(0,0) → ∂ 2 ω i ω jϕ(0,0); Plug these into (4.2) and (4.3), we obtain (3.2) immediately. Toensurethemonotonicitycondition(3.3),weneedsomeadditionalconditions. 106 Assumption 3.4.2. Assume G is differentiable in (z,γ) and one may choose μ i ,σ i so that the scheme (4.2) is non-decreasing with respect to E P [ϕ] for any P ∈ ˆ P. In other words, there exists some ε 0 ∈ (0,1) such that for any P ∈ ˆ P M P :=1−2 ( i:[α P ] i =0 ! ∂z i G μ i − 1 2 " − j:[α P ] j =1 ! ∂z j G μ j − 1 2 " + i:[β P ] ii =0 2 ∂γ ii G σ 2 i − 1 4 − j:j =i ∂γ ij G σ i σ j − 1 4 − j:[β P ] jj =1 2 ∂γ jj G σ 2 j − 1 4 − i:i =j ∂γ ij G σ i σ j − 1 4 + i =j:[β P ] ij =0 ∂γ ij G σ i σ j − 1 4 − i =j:[β P ] ij =1 ∂γ ij G σ i σ j − 1 4 ≥ ε 0 . (4.4) Remark3.4.3. (i)ThedifferentiabilityofGisjustforconvenience. Fornotational simplicity, at below we shall assume G is differentiable in y as well. (ii) When ∂ γ G is diagonal, (4.4) reduces to that 1− d i=1 2 ∂ z i G μ i −1 − d i=1 4 ∂ γ ii G σ 2 i −1 ≥ ε 0 . 
(iii) A sufficient condition for the inequality in the above assumption can be that ∂z i G μ i − 1 2 ≤ (d+3d 2 ) −1 ( 1−ε 0 2 ), 1 ≤ i ≤ d ∂γ ii G σ 2 i − 1 4 ≤ (d+3d 2 ) −1 ( 1−ε 0 2 ), 1 ≤ i ≤ d ∂γ ij G σ i σ j − 1 4 ≤ (d+3d 2 ) −1 ( 1−ε 0 2 ), 1 ≤ i = j ≤ d. 107 This essentially implies that the fully nonlinear PPDE will degenerate to a semi- linear PPDE under the restriction of monotonicity. An efficient scheme that works with less constraint will be explored in our future research. Lemma 3.4.4 (Monotonicity). Under Assumptions 3.3.1 and 3.4.2, T t,ω h satisfies the monotonicity condition (3.3) with ρ mon =0 for L ≥ L 0 large enough and h small enough. Proof. Without loss of generality, we assume (t,ω)=(0,0) and denote T h := T t,ω h . Assume L ≥ L 0 is large enough so that the P ∈ ˆ P in this section are in P L .Let ϕ 1 ,ϕ 2 ∈UC b (F h )satisfy E L [ψ] ≤ 0, where ψ := ϕ 1 −ϕ 2 . (4.5) Then, recalling (4.2), T h ϕ 1 −T h ϕ 2 = D (0) ψ +h D (0) ψ∂ y G+(∂ z G− μ 2 )·D (1) ψ+(∂ γ G− σσ T 4 ): D (2) ψ . Note that here ∂ y G etc are deterministic. By (4.3) we have T h ϕ 1 −T h ϕ 2 = h∂ y GD (0) ψ + P∈ ˆ P E P [ψ] M P | ˆ P| = P∈ ˆ P E P [ψ] M P +h∂ y G | ˆ P| , (4.6) 108 Note that P∈ ˆ P M P +h∂ y G | ˆ P| =1+h∂ y G. Then one may define the following probability measure: ˆ P := 1 1+h∂ y G M P +h∂ y G | ˆ P| P , (4.7) and rewrite (4.6) as T h ϕ 1 −T h ϕ 2 =(1+h∂ y G)E ˆ P [ψ]. (4.8) Since ∀P ∈ ˆ P,P∈P L , (4.5) implies E P [ψ] ≤ 0,∀P ∈ ˆ P, and thus E ˆ P [ψ] ≤ 0. This leads to (3.3) immediately. Remark 3.4.5. In general ˆ P may not be in P L . However, we still have E L ≤E ˆ P ≤ E L . We now verify the stability condition. Lemma 3.4.6 (Stability). Let Assumptions 3.3.1 and 3.4.2 hold, and assume further that G and g are uniformly Lipschitz continuous in ω.Then T t,ω h satisfies the stability condition in Theorem 3.3.2 for L large enough and h small enough. 109 Proof. We assume L and h are chosen so that T t,ω h satisfies the monotonicity con- dition (3.3). (i) We first show that u h is uniformly bounded. Denote C i := C h i := sup ω∈Ω |u h (t i ,ω)|, ϕ := [u h (t i+1 ,·)] t i ,ω , and recall (3.1). By (4.2) we have u h (t i ,ω)= D (0) ϕ+hG(t i ,ω,0,0,0)−hG(t i ,ω,0,0,0) +h G(t,ω,D (0) ϕ,D (1) ϕ,D (2) ϕ)− 1 2 μ·D (1) ϕ− 1 4 (σσ T ): D (2) ϕ . Following the arguments for (4.6), for some ˆ P defined in the spirit of (4.5)-(4.7), we have u h (t i ,ω)= (1+h∂ y G)E ˆ P [ϕ]+hG(t i ,ω,0,0,0). (4.9) Then |u h (t i ,ω)|≤ (1+h∂ y G) E ˆ P [ϕ] +h|G(t i ,ω,0,0,0)|≤ (1+Ch)C i+1 +Ch. That is, C i ≤ [1+Ch]C i+1 +Ch. 110 Note that C n = g ∞ . Then by the discrete Gronwall inequality we see that max 0≤i≤n C i ≤ C, where the constant C is independent of h. Finally, for t ∈ (t i ,t i+1 ), following similar arguments we can easily show that |u h (t,ω)|≤ [1+Ch]C i+1 +Ch ≤ C. Therefore, u h is uniformly bounded. (ii) We next show that u h is uniformly Lipschitz continuous in ω.Let L i := L i h denote the Lipschitz constant of u h (t i ,·). Given ω 1 ,ω 2 ∈ Ω, denote ψ := [u h (t i+1 ,·)] t i ,ω 1 −[u h (t i+1 ,·)] t i ,ω 2 ,then |ψ|≤ L i+1 ω 1 ⊗ t i B t i −ω 2 ⊗ t i B t i t i+1 = L i+1 ω 1 −ω 2 t i . Note that G is uniform Lipschitz continuous in ω with certain Lipschitz constant L G . Then similar to (i) above, we have |u h (t i ,ω 1 )−u h (t i ,ω 2 )|≤ (1+h∂ y G)E ˆ P [|ψ|]+L G hω 1 −ω 2 t i ≤ L i+1 ω 1 −ω 2 t i [1+Ch]+L G hω 1 −ω 2 t i . Then L i ≤ L i+1 [1+Ch]+L G h. 111 Since L n = L g is the Lipschitz constant of g which is independent of h,weseethat max 0≤i≤n L i is independent of h. 
Finally, as in the end of (i) above we see that u h (t,·) is uniformly Lipschitz continuous in ω, uniformly in t and h. (iii) We now prove the following time regularity in two steps: |u h (t,ω)−u h (t ,ω ·∧t )|≤ C √ t −t+h, for all 0 ≤t<t ≤T. (4.10) Step 1. We first assume t = T and t = t i .For j = i+1,···,n, in the spirit of (4.9), we may define ˆ P j such that ˆ P j+1 = ˆ P j on F t i t j and u h (t j ,ω ⊗ t i B t i ) = [1+hb j ]E ˆ P j+1 u h (t j+1 ,ω ⊗ t i B t i ) F t i t j +hc j , where b j := ∂ y G(t j )and c j := G(t j ,ω ⊗ t i B t i ,0,0,0)arein L ∞ (F t i t j ). Denote Γ i := 1, Γ j+1 := ) j k=i [1+hb k ]. By induction we have u h (t i ,ω)= E ˆ Pn Γ n u h (t n ,ω ⊗ t i B t i )+h n−1 j=i Γ j c j = E ˆ Pn Γ n g(ω ⊗ t i B t i )+h n−1 j=i Γ j c j . One may easily check that |Γ j |≤ C, |Γ j −1|≤ C(j −i)h ≤ C(n−i)h = C(T −t i ). 112 Thus |u h (t i ,ω)−u h (t n ,ω ·∧t i )| = E ˆ Pn Γ n g(ω ⊗ t i B t i )+h n−1 j=i Γ j c j −g(ω ·∧t i ) ≤ E ˆ Pn |Γ n −1||g(ω⊗ t i B t i )|+|g(ω ⊗ t i B t i )−g(ω ·∧t i )|+C(n−i)h ≤ C(T −t i )+CE ˆ Pn [B t i T ]. One can easily show thatE ˆ Pn [B t i T ] ≤ C √ T −t i . Then (4.10) holds in this case. Step 2. Wenowverify thegeneral case. Assume t i−1 ≤t<t i andt j−1 ≤ t <t j , then clearly i ≤ j.Since u h (t i ,·)and u h (t j ,·) are Lipschitz continuous in ω,by (4.9) and following the arguments in Step 1, one can similarly show that |u h (t,ω)−u h (t i ,ω ·∧t )|≤ C √ t i −t ≤ C √ h |u h (t ,ω ·∧t )−u h (t j ,ω ·∧t )|≤ C t j −t ≤ C √ h, |u h (t i ,ω ·∧t )−u h (t j ,ω ·∧t )|≤ C t j −t i ≤ C √ t −t+h. These lead to (4.10) immediately. Combine Lemmas 3.4.1, 3.4.4, and (2.3.10), it follows from Theorem 3.3.2 that Theorem 3.4.7. Assume all the conditions in Lemma 3.4.6 hold. Then u h con- vergeslocallyuniformlyto the unique viscositysolutionu of PPDE(1.1). Moreover, |u(t,ω)−u(t ,ω )|≤ Cd((t,ω),(t ,ω )), for all (t,ω),(t ,ω ) ∈ Λ. (4.11) 113 3.5 The case with classical solution In this section, we obtain the rate of convergence of our scheme, provided that the PPDE has smooth enough solution. Denote C 2,4 b := u ∈ C 1,2 b : ∂ t u,∂ ω u,∂ 2 ωω u ∈ C 1,2 b (Λ) . (5.1) We shall remark though, as we see in Buckdahn, Ma and Zhang [22], in general ∂ t ,∂ ω i ,∂ ω j do not commute, and ∂ 2 ω i ω j u = 1 2 [∂ ω i (∂ ω j u)+∂ ω j (∂ ω i u)]. We first have the following general result, in the spirit of Theorem 3.3.2. Theorem 3.5.1. Let Assumption 3.3.1 hold and the PPDE (1.1) has a classical solution u ∈ C 2,4 b (Λ). Assume a discretization scheme T t,ω h satisfies: (i) For any (t,ω) ∈ [0,T)×Ω and ϕ ∈ C 2,4 b (Λ t ), ϕ(t,ω)−T t,ω h [ϕ(t+h,·)] h −Lϕ(t,ω) ≤Ch, ∀h ∈ (0,T −t). (5.2) (ii) There exists L ≥ L 0 such that, for any (t,ω) ∈ [0,T) × Ω and ϕ,ψ ∈ UC b (Λ t ) uniformly Lipschitz continuous, T t,ω h [ϕ]−T t,ω h [ψ] ≤ (1+Ch)E L t [|ϕ−ψ|]. (5.3) 114 Then we have max 0≤i≤n |u h (t i ,ω)−u(t i ,ω)|≤Ch. (5.4) Proof. Denote ε i := sup ω∈Ω |u h (t i ,ω)−u(t i ,ω)|.Since Lu = 0, then (5.2) implies T t i ,ω h [u(t i+1 ,·)]−u(t i ,ω) ≤Ch 2 . Now by (3.1) and (5.3) we have |[u h −u](t i ,ω)|≤ T t i ,ω h [u h (t i+1 ,·)]−T t i ,ω h [u(t i+1 ,·)] +Ch 2 ≤ (1+Ch)ε i+1 +Ch 2 . This implies ε i ≤ (1+Ch)ε i+1 +Ch 2 . (5.5) Sinceε n = 0, thenbydiscreteGronwallinequality weobtain(5.4)immediately. We next apply the above result to the scheme proposed in Section 3.4. Theorem 3.5.2. Assume all the conditions in Lemma 2.3.10 hold true, and the PPDE (1.1) has a classical solution u ∈ C 2,4 b (Λ). Then for the scheme introduced in Section 3.4, it holds that |u h (t i ,ω)−u(t i ,ω)|≤Ch. 115 Proof. 
First, (5.3) follows directly from (4.6) and Remark 3.4.5. By Theorem 3.5.1 it then suffices to check (5.2). Without loss of generality, we assume $(t,\omega)=(0,0)$. Fix $\varphi\in C^{2,4}_b(\Lambda)$ and set $\psi := \varphi(h,\cdot)$. Recalling the computation in Lemma 3.4.1 with $(t,\omega)=(0,0)$ and $c=0$, we have

$$D^{(0)}\psi = \varphi(0,0) + \frac{1}{|\hat{\mathcal{P}}|}\sum_{\mathbb{P}\in\hat{\mathcal{P}}}\int_0^h \mathbb{E}^{\mathbb{P}}\big[\partial^{\mathbb{P}}\varphi(s,B)\big]\,ds = \varphi(0,0) + \Big[\partial_t\varphi + \frac12\mu\cdot\partial_\omega\varphi + \frac14(\sigma\sigma^{\mathrm{T}}) : \partial^2_{\omega\omega}\varphi\Big](0,0)\,h + O(h^2);$$

$$D^{(1)}_i\psi = \frac{1}{\mu_i h|\hat{\mathcal{P}}|/2}\Big(\sum_{\mathbb{P}\in\mathcal{P}^i_1}\mathbb{E}^{\mathbb{P}} - \sum_{\mathbb{P}\in\mathcal{P}^i_0}\mathbb{E}^{\mathbb{P}}\Big)\big[\varphi|^h_0\big] = \frac{2}{\mu_i h|\hat{\mathcal{P}}|}\int_0^h\Big(\sum_{\mathbb{P}\in\mathcal{P}^i_1}\mathbb{E}^{\mathbb{P}}\big[\partial^{\mathbb{P}}\varphi(s,B)\big] - \sum_{\mathbb{P}\in\mathcal{P}^i_0}\mathbb{E}^{\mathbb{P}}\big[\partial^{\mathbb{P}}\varphi(s,B)\big]\Big)ds = \partial_{\omega_i}\varphi(0,0) + O(h).$$

Similarly, we can show that $D^{(2)}_{i,j}\psi = \partial^2_{\omega_i\omega_j}\varphi(0,0) + O(h)$, $i,j=1,\dots,d$. Plugging all of these into (4.2) and recalling that $G$ is uniformly Lipschitz continuous in $(y,z,\gamma)$, we obtain (5.2), and hence the theorem is proved.

3.6 Implementation of a 1-dimensional numerical scheme

In this section we discuss how to implement the monotone scheme introduced by (3.1), (4.2) and (4.3). Our discussion here is quite preliminary: efficient numerical methods for PPDEs, especially in the high-dimensional setting, are a very challenging problem, which we leave for future research. We will use the least-squares regression approach; see [52] and the references therein. For ease of presentation, we consider the 1-dimensional case only.

Fix a time $t$, a probability $\mathbb{P}$ and a $\mathbb{P}$-Brownian motion $W$. Denote

$$\xi^{(0)}_s := 0, \quad \xi^{(1)}_s := \alpha[s-t], \quad \xi^{(2)}_s := \beta[W_s - W_t], \quad \xi^{(3)}_s := \xi^{(1)}_s + \xi^{(2)}_s, \qquad \forall s\ge t. \tag{6.1}$$

Then we may rewrite (4.3) as

$$D^{(0)}\varphi = \frac14\sum_{i=0}^3 \mathbb{E}^{\mathbb{P}}\big[\varphi(t+h,\omega\otimes_t\xi^{(i)})\big];$$
$$D^{(1)}\varphi = \frac{1}{2\alpha h}\sum_{i=0,2}\mathbb{E}^{\mathbb{P}}\big[\varphi(t+h,\omega\otimes_t\xi^{(i+1)}) - \varphi(t+h,\omega\otimes_t\xi^{(i)})\big];$$
$$D^{(2)}\varphi = \frac{1}{\beta^2 h}\sum_{i=0,1}\mathbb{E}^{\mathbb{P}}\big[\varphi(t+h,\omega\otimes_t\xi^{(i+2)}) - \varphi(t+h,\omega\otimes_t\xi^{(i)})\big]. \tag{6.2}$$

However, if we were to simulate all four branches $\xi^{(j)}$, $j=0,1,2,3$, at each node, the number of branches would grow exponentially. In practice, we instead simulate an $\mathcal{F}_{t+}$-measurable random variable $\theta_t$ that takes the values 0, 1, 2 and 3 with equal probability $\frac14$. Then (6.2) can be transformed into the following scheme:

$$D^{(0)}\varphi(t,\omega) = \mathbb{E}^{\mathbb{P}}\big[\varphi(t+h,\omega\otimes_t\xi^{(\theta_t)})\big];$$
$$D^{(1)}\varphi(t,\omega) = \frac{2}{\alpha h}\,\mathbb{E}^{\mathbb{P}}\big[\sin((\theta_t-0.5)\pi)\,\varphi(t+h,\omega\otimes_t\xi^{(\theta_t)})\big];$$
$$D^{(2)}\varphi(t,\omega) = \frac{4}{\beta^2 h}\,\mathbb{E}^{\mathbb{P}}\big[\mathrm{sign}(\theta_t-1.5)\,\varphi(t+h,\omega\otimes_t\xi^{(\theta_t)})\big]. \tag{6.3}$$

Thus

$$\mathbb{T}^{t,\omega}_h[\varphi] := D^{(0)}\varphi(t,\omega) + h\Big[G\big(\cdot,D^{(0)}\varphi,D^{(1)}\varphi,D^{(2)}\varphi\big) - \frac{\alpha}{2}D^{(1)}\varphi - \frac{\beta^2}{4}D^{(2)}\varphi\Big](t,\omega) \tag{6.4}$$

gives us a viable scheme for the fully nonlinear PPDE, as long as we choose $\alpha$ and $\beta$ such that $\big|\frac{2\partial_z G}{\alpha}-1\big| + \big|\frac{4\partial_\gamma G}{\beta^2}-1\big| \le 1-\varepsilon$ for some $\varepsilon\in(0,1)$; this is the monotonicity assumption of Remark 3.4.3(ii) in the one-dimensional case.

In addition, the conditional expectations in (6.4) can be evaluated by least-squares regression, in a process similar to the one presented in Subsections 2.5.2 and 2.5.3 of Chapter 2. That is, we define a random process $X$ such that

$$X_{t_0} = 0, \qquad X_{t_i+r} = X \otimes_{t_i} \xi^{(\theta_{t_i})}, \quad \forall\, r\in[0,h].$$

We then want to approximate

$$D^{(k)}\varphi(t_i,X) = \mathbb{E}^{\mathbb{P}}\big[\psi^{(k)}(t_{i+1},X,\theta_{t_i}) \,\big|\, (t_i,X)\big], \qquad k=0,1,2,$$

where each $\psi^{(k)}$ is short for the function inside the expectation $\mathbb{E}^{\mathbb{P}}[\cdot]$ in the approximation of $D^{(k)}$ in (6.3). We emphasize again that $\theta_{t_i}$ is $\mathcal{F}_{t_i+}$-measurable. To do this, we project $\psi^{(k)}(t_{i+1},X,\theta_{t_i})$ onto the space spanned by $J$ selected basis functions $e^i_1(t_i,X), e^i_2(t_i,X),\dots,e^i_J(t_i,X)$, in the sense that

$$\{\alpha^i_j\}_{1\le j\le J} = \operatorname*{argmin}_{\{\alpha_j\}_{1\le j\le J}} \mathbb{E}^{\mathbb{P}}\bigg[\Big(\psi^{(k)}(t_{i+1},X,\theta_{t_i}) - \sum_{j=1}^J \alpha_j e^i_j(t_i,X)\Big)^2\bigg] \tag{6.5}$$

and

$$D^{(k)}\varphi(t_i,X) \approx \sum_{j=1}^J \alpha^i_j\, e^i_j(t_i,X).$$

Moreover, in practice we only simulate a finite set of $L$ paths $\{X^\lambda\}_{\lambda=1}^L$ starting from $t=0$ with $X_{t_0}=x_0$, such that $X^\lambda_{t_i+r} - X^\lambda_{t_i}$, $\forall\, r\in[0,h]$, is a sample of $\xi^{(\theta_{t_i})}$.
Then the expectation in (6.5) is replaced by its empirical counterpart

$$\frac{1}{L}\sum_{\lambda=1}^L \Big(\psi^{(k)}\big(t_{i+1},X^\lambda,\theta_{t_i}\big) - \sum_{j=1}^J \alpha_j\, e^i_j(t_i,X^\lambda)\Big)^2.$$

The rest is the standard procedure for solving a linear least-squares regression; we refer to Subsections 2.5.2 and 2.5.3 in Chapter 2 for more details.

3.6.1 A one-dimensional PPDE example

We now consider the following fully nonlinear PPDE:

Example 3.6.1.

$$u_t + \sup_{1\le\sigma\le 2}\big(\sigma\, u_{\omega\omega}\big) = 0, \quad 0\le t<T; \qquad u(T,\omega) = \max_{0\le t\le T}\omega_t + \min_{0\le t\le T}\omega_t,$$

which has an unknown path-dependent viscosity solution $u(t,\omega)$.

Let $T=0.3$, $\omega_0=\frac{\pi}{2}$, $L=100n^3$ and $e_i(X_{t_i\wedge\cdot}) = \{X_{t_0},X_{t_1},\dots,X_{t_i}\}$.

n  | L      | K   | Avg(Ans.) | Var(Avg.)     | cost (in seconds)
3  | 270    | 106 | 3.2230    | 2.37 x 10^-5  | 3.08 x 10^-3
5  | 1250   | 64  | 3.2440    | 1.22 x 10^-5  | 6.13 x 10^-3
10 | 10000  | 32  | 3.2568    | 6.60 x 10^-6  | 6.45 x 10^-2
20 | 80000  | 16  | 3.2704    | 2.52 x 10^-6  | 2.31 x 10^0
30 | 270000 | 10  | 3.2767    | 1.75 x 10^-6  | 2.29 x 10^1
40 | 640000 | 8   | 3.2783    | 2.25 x 10^-6  | 1.02 x 10^2

[Figure 3.1: Numerical results for the one-dimensional PPDE of Example 3.6.1: the table above, together with a plot of the numerical result against $n$.]

Each choice of $n$ is run $K$ times, with the mean of the results from the $K$ tests denoted by "Avg(Ans.)", the variance of that mean denoted by "Var(Avg.)", and the average cost of each test denoted by "cost (in seconds)" in the table in Figure 3.1. Taking $\sigma_0 = 2\sqrt{2}$, we obtain a sequence of converging results by least-squares regression with the above basis functions, as shown in Figure 3.1. It can be seen that the numerical solution converges to a value around 3.28. According to our main theorem, this value is the projection of the unique viscosity solution of the PPDE onto the space spanned by the basis functions. We cannot tell whether the numerical solution converges linearly, since this PPDE may not admit a smooth classical solution.
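To complement the description above, here is a minimal sketch, in Python, of one backward step (6.3)–(6.4) of this scheme at a fixed node. The functions G and phi and the parameter choices are placeholders of ours (in the actual algorithm, phi at step $i+1$ is produced by the regression (6.5)); for simplicity the sketch evaluates phi at the endpoint of the branch only, which suffices in the Markovian case.

```python
# A minimal sketch of one backward step (6.3)-(6.4) at a fixed node.  G, phi and
# the parameters alpha, beta below are placeholder choices, not part of the text.
import numpy as np

rng = np.random.default_rng(0)

def scheme_step(phi, t, h, alpha, beta, G, n_samples=200_000):
    """Monte Carlo evaluation of T_h^{t,omega}[phi] via the branch variable theta."""
    theta = rng.integers(0, 4, size=n_samples)        # P(theta = k) = 1/4, cf. (6.3)
    dW = rng.normal(0.0, np.sqrt(h), size=n_samples)  # W_{t+h} - W_t
    # endpoint of xi^{(theta)} at time t+h, cf. (6.1)
    xi = np.where(theta % 2 == 1, alpha * h, 0.0) + np.where(theta >= 2, beta * dW, 0.0)
    v = phi(t + h, xi)                                # next-step values on each branch
    D0 = v.mean()
    D1 = (2.0 / (alpha * h)) * (np.sin((theta - 0.5) * np.pi) * v).mean()
    D2 = (4.0 / (beta ** 2 * h)) * (np.sign(theta - 1.5) * v).mean()
    return D0 + h * (G(t, D0, D1, D2) - 0.5 * alpha * D1 - 0.25 * beta ** 2 * D2)

# illustration with the generator of Example 3.6.1, G = sup_{1<=s<=2} s*gamma,
# and sigma_0 = beta = 2*sqrt(2) as used there; phi here is an arbitrary test function
G = lambda t, y, z, gam: np.maximum(gam, 2.0 * gam)
print(scheme_step(lambda s, x: x ** 2, t=0.0, h=0.1, alpha=1.0, beta=2 * np.sqrt(2), G=G))
```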
If the numerical approximation provided is close to the true dynamics of Y i , the confidence interval computed by us will be very tight. This can serve as an indicator of how good the numerical approximation is. This chapter is organized as follows: In Section 4.2 we discuss some basic prop- erties of the dynamic programming equation (1.1) and show how it is connected in our two numerical examples, funding risk and counterparty risk. In Section 4.3 we focus on the case of a convex generator f. In Section4.3.1 we first suggest a pathwise approach to the dynamic programming equation (1.1) which avoids the evaluation of conditional expectations in the backward recursion in time. This 123 pathwise approach depends on the choice of a (D+1)−dimensional martingale and leads to the construction of super solutions for (1.1) and to a minimization problem over martingales with value process Y i . We then note in Section 4.3.2 that due to convexity, Y i can also be represented as the supremum over a class of classi- cal optimal stopping problems. This representation can be thought of as a discrete time, reflected counterpart of a result in [45] for continuous time, non-reflected BSDEs driven by a Brownian motion. If we think of this maximization problem as the primal problem, then the pathwise approach can be interpreted as a dual minimization problem in the sense of information relaxation. This type of duality was first introduced independently by [79] and [54] in the contxt of Bermudan option pricing, and was later extended by [23] to general discrete time stochastic control problems. Finally, in Section 4.3.3 we provide some discussion of how the tightness of the bounds depends on the quality of the input approximations used in their construction. In Section 4.4.1 we explain, how to construct confidence intervals for Y 0 by Monte Carlo simulation with the representations for Y 0 as the value of a maxi- mization and a minimization problem. This algorithm generalizes the primal-dual algorithm of [2] from optimal stopping problems (i.e., the case f ≡ 0) to cases with convex generators. Some generic control variates that turn out to be significant in our numerical examples are suggested. Numerical examples for the pricing of a European and a Bermudan option on the maximum of 5 assets under different 124 interest rates for borrowing and lending are presented in Section 4.4.3. The input approximations are constructed by the least-squares Monte Carlo algorithm intro- duced by [62], and the martingale basis variant by [11] with just up to 7 basis functions. It turns out that this is sufficient for achieving a tight 95% confidence intervals with relative error less than 1% between lower and upper confidence bounds in the 5-dimensional examples. The non-convex generators f is studied in Section 4.5.1 and 4.5.2 by applying the input approximation of the approximate dynamic program in an attempt to construct a convex auxiliary generator f up that dominates f, as well as another concave auxiliarygenerator f low dominated byf. The two new auxiliarygenerators canbeconstructed suchthattheyconvergetof,aslongastheinputapproximation of the approximate dynamic program approaches the true solution. Hence the techniques and corresponding result in the previous sections can be applied to the auxiliary generators, to build a confidence interval for Y 0 in the general settings. 
In Section 4.5.3 we test this algorithm in two examples, including the previous example of funding risk and a model of counterparty credit risk where the driver is neither concave or convex. Tight price bounds can be achieved again. 125 4.2 Discrete time reflected BSDEs We consider a discretized scheme of a reflected BSDE of the form (1.1) with the following assumptions: Δ i ,i=0,...,n −1 are positive stepsize in time. S is an adapted process with values in R∪{−∞} such that n−1 i=0 E[|S i I {S i >−∞} |]+E[|S n |] < ∞; The generator f(·,y,z) is adapted for all (y,z) ∈R×R D , and n−1 i=1 E[|f(i,0,0)|] < ∞. Moreover, there are adapted, nonnegative processes α (d) ,d=0,...,D such that the following stochastic Lipschitz condition holds for all (y,z),(y ,z ) ∈R×R D . |f(i,y,z)−f(i,y ,z )|≤ α (0) i |y −y |+ D d=1 α (d) i |z d −z d |. Finally, β is a bounded, adapted R D −valued process and the following relations hold; α (0) i < 1 Δ i , D d=1 α (d) i |β d,i+1 |≤ 1 Δ i . (2.1) 126 Under these assumptions there existis a unique adapted and integrable process Y satisfying (1.1), which is a result of a contraction mapping argument. Example 4.2.1. We will introduce two nonlinear pricing problems to illustrate the setting. These examples will also appear inour numerical experiments: Pricing with different interest rates for borrowing and lending, and pricing in a reduced- form model of counterparty credit risk. The first one is a standard example in the BSDE literature back in [14]. Recently [61] revealed its practical relevance in the context of funding risk. Since the financial crisis there has also been more interest in credit risk models similar to the second example. To name a few, [71, 32, 55] and the references therein. (i) Assume that there is a financial market with 2 riskless assets with bounded and predictable short rates R l t and R b t with R l t <R b t almost surely, a D risky assets X 1 t ,...,X D t evolving according to dX d t = X d t . μ d t dt+ D k=1 σ d,k t dW k t / where W is a standard D−dimensional Brownian motion, and μ and σ are pre- dictable and bounded processes. Moreover, σ is assumed to be invertible with bounded inverse almost surely. The filtration is given by the usual augmented Brownian filtration. The two short rates denote for the lending and borrowing interest rates respectively. Namely, an investor can only hold positive positions 127 in R l t , and only negative ones in R b t . Consider a square-integrable European claim h(X T ) maturing at T. we know that a replicating porfolio for f(X T ) is character- ized by two processes Y t ∈R and Z t ∈R D that solve the BSDE dY t = −f(t,Y t ,Z t )dt+Z T t dW t (2.2) with terminal condition Y T = h(X T ), where f(t,y,z)= −R l t y −z T σ −1 t μ t −R l t 1 +(R b t −R l t ) y −z T σ −1 t 1 − , and1 ∈R D is a vector containing only ones, see [14] or the survey paper of [45] for more details. Obviously the function f here is convex and Lipschitz continuous. Our goal is the fair price of the claim at time 0, also known as Y 0 .z T t σ −1 represents the vector of amounts of money invested in the risky assets at time t. Discretizing time andthen taking conditional expectations leads to a recursion of the form(1.1) forY, see [86] and [20]. In this European case S i ≡−∞ fori<n and S n ≡ h(X T ). The Z−part is approximated by Z i = E i Wt i+1 −Wt i t i+1 −t i Y i+1 , which corresponds to a vector of Malliavin derivatives of Y and is thus naturally related to a delta hedge. 
Comparing with (1.1) this would imply taking β i+1 = Wt i+1 −Wt i t i+1 −t i . Nevertheless, to satisfy assumption (2.1) we have to truncate the Brownian increments at some value. Since Wt i+1 −Wt i t i+1 −t i is a vector of normally distributed random variables with standard deviation of order 1 √ Δ i , where Δ i := t i+1 − t i , the truncation error can 128 be made as small as we want as Δ i → 0, provided the Lipschitiz continuity of f, see e.g. [62]. For a Bermudan or American claim, (2.2) is replaced by a proper reflected BSDE, the discretization of which will contain S i that denotes the payoff if exercised at time t i . (ii)The second example isaspecial caseofthemodel ofcounterparty credit risk due to [39]. We change the setting of (i) by assuming a unique riskless asset with rate R t that can be borrowed or lent. Besides, we assume μ t = R t 1, a risk-neutral valuation setting. With a square-integrable European claim h(X T ) maturing at T, we denote by Y t the claim’s fair price at time t conditioned on no default having occurred yet. The claim’s possible default is modelled through a stopping time which is the first jump time of a Poisson process with intensity Q t := Q(Y t ), a decreasing, continuous and bounded function of Y t . This is due to the fact that if theclaim’svalueislow, defaultbecomes morelikely. Wealsoassume thatifdefault occurs, the holder of the claim will receive a fraction δ ∈ [0,1) of the current value Y t . Following Proposition 3 in [39], the value process is defined by the nonlinear equation Y t =E t T t f(s,Y s )ds+h(X T ) , 129 where f(t,y)= −(1−δ)Q(y)y−R t y. Discretizing the above equation gives us an equation of type (1.1) with β ≡ 0. Assumption (2.1) then reduces to the require- ment that the time discretization is sufficiently fine. The solution Y of the equation (1.1) is also the solution to the optimal stopping problem Y i = esssup τ∈S i E i 0 S τ + τ−1 j=i f(j,Y j ,E j [β j+1 Y j+1 ])Δ j 1 ,i=0,...,n, where S i is the set of stopping times with values no less than i. This optimal stopping problem is unusual in the sense that the reward upon stopping depends on the Snell envelope Y i . Note that one can impose restrictions on the set of admissible stopping times by subtracting the set {(i,ω); S i (ω)= −∞}, at which exercise is never optimal. We can therefore restrict the supremum in this optimal stopping problem to the subset S i ⊂S i of stopping times τ which take values in E(ω)= {i; S i (ω) > −∞}. An optimal stopping time is given by τ ∗ i =inf{j ≥ i; S j ≥E j [Y j+1 ]+f(j,Y j ,E j [β j+1 Y j+1 ])Δ j } (2.3) We also present the following alternative expression of Y i via optimal stopping of a nonlinear functional. 130 Proposition 4.2.2. For every i=0,...,n, Y i = esssup τ∈S i Y (τ) i where (Y (τ) j ) j≥i is the solution to the dynamic programming equation Y (τ) j =E j [Y (τ) j+1 ]+f(j,Y (τ) j ,E j [β j+1 Y (τ) j+1 ])Δ j ,i ≤j<τ, Y (τ) τ = S τ . Moreover, the stopping time τ ∗ i defined in (2.3) is optimal. This characterization is a direct result of the following simple but useful com- parison theorem. For nonreflected discrete time BSDEs similar results can be found in [29] and [27] under different assumptions. Proposition 4.2.3. Suppose there are stopping times σ ≤ τ such that for every i ∈ [σ,τ) Y up i ≥ max S i ,E i [Y up i+1 ]+f(i,Y up i ,E i [β i+1 Y up i+1 ])Δ i Y low i ≤ max S i ,E i [Y low i+1 ]+f(i,Y low i ,E i [β i+1 Y low i+1 ])Δ i and Y up τ ≥ Y low τ . Then under the standing assumptions, Y up i ≥ Y low i holds for any i ∈ [σ,τ]. 
131 Proof. We denote the difference of Y up i and Y low i by ΔY i =(Y up i −Y low i )I {σ≤i≤τ} ,i=1,...,n. Then it suffices to prove that ΔY i ≥ 0 for every i=1,...,n. We prove this assertion by backward induction and note that it holds in the case i = n by assumption. Supposing that ΔY i+1 ≥ 0 is already shown, then on the set {Y low i >S i }∩{σ ≤ i<τ} we obtain by the Lipschitz assumption (2.1) of f that ΔY i ≥ E[ΔY i+1 ]+ f(i,Y up i ,E i [β i+1 Y up i+1 ])−f(i,Y low i ,E i [β i+1 Y low i+1 ]) Δ i ≥ E i 0 ΔY i+1 . 1− D d=1 α (d) i |β d,i+1 |Δ i /1 −α (0) i |ΔY i |Δ i ≥−α (0) i |ΔY i |Δ i , which implies ΔY i ≥ 0. On the set {Y low i ≤ S i }∪{i ≥ τ}∪{i<σ}, ΔY i ≥0for obvious reason and the definition of ΔY · . 4.3 The case of a convex generator In Section 4.3.1 and 4.3.2 we discuss how to construct ”tight” supersolutions and subsolutionstothedynamicprogrammingequation(1.1)whenf isconvexin(y,z). These constructions are based, respectively, on the choice of suitable martingales and control processes. In Section 4.3.3 we present error analysis for the case where 132 f is independent ofz, which illustrates how the choices of auxiliary processes affect the quality of the error bounds generated. 4.3.1 Upper bounds We first consider a pathwise method that generates supersolutions of the dynamic program. This idea is to remove all conditional expectations from equation (1.1) and subtract proper martingale increments. To do this, we will focus on a one- dimensional martingale M 0 and an R D −valued martingale M such that D d=1 n−1 i=0 E[α (d) i |M d,i+1 −M d,i |]Δ i < ∞. The set of all such pairs (M 0 ,M) is denoted by M 1+D . Given (M 0 ,M)∈M 1+D we define the non-adapted process θ up i = θ up i (M 0 ,M)by θ up n = S n θ up i =max S i ,θ up i+1 −(M 0 i+1 −M 0 i ) +f(i,θ up i ,β i+1 θ up i+1 −(M i+1 −M i+1 ))Δ i . (3.1) The recursion above can be solved path by path once the martingales are cho- sen. The integrability of θ up i is a result of the stochastic Lipschitz condition on f and the assumptions on M. Therefore, after solving the recursion, we can take conditional expectation for once, instead of taking nested conditional expectations 133 repeatedly as in the original dynamic program (1.1). The convexity of f ensures that E i [θ up i ] is always an upper bound for Y i , and it can be a representation of Y i if suitable martingales are chosen. Before introducing one of the main theorems in this chapter we recall that the martingale part N of the Doob decomposition of an integrable stochastic process V is given by N i := i−1 j=0 (V j+1 −E j [V j+1 ]),i=0,...,n. Theorem 4.3.1. Suppose f is convex in (y,z). Then, for every i=0,...,n, Y i = essinf (M 0 ,M)∈M 1+D E i [θ up i (M 0 ,M)], where θ up (M 0 ,M) is defined by the pathwise dynamic programming equation (3.1). Moreover, the martingales (M 0,∗ ,M ∗ ), where M 0,∗ and M ∗ are the Doob martin- gales of Y and βY, is optimal even in the sense of pathwise control, i.e. θ up i (M 0,∗ ,M ∗ )= Y i ,P −a.s. Proof. By the convexity of f and of the max-operator as well as the martingale property, we obtain E i [θ up i ] ≥ max{S i ,E i [E i+1 [θ up i+1 ]]+f(i,E i [θ up i ],E i [β i+1 E i+1 [θ up i+1 ]])Δ i }. 134 Consequently E i [θ up i (M 0 ,M)] is a supersolution of (1.1) and by the comparison result of Proposition 4.2.3 E i [θ up i (M 0 ,M)] ≥ Y i . We now choose M 0,∗ and M ∗ as the Doob martingales of Y andYβ respectively. 
By (2.1) we have D d=1 n−1 i=0 E[α (d) i |M d,i+1 −M d,i |]Δ i = D d=1 n−1 i=0 E α (d) i |β d,i+1 Y i+1 −E i [β d,i+1 Y i+1 ]| Δ i ≤ 2 n−1 i=0 E[|Y i+1 |] < ∞, so (M 0,∗ ,M ∗ )∈M 1+D . Now we claim that θ up,∗ i := θ up i (M 0,∗ ,M ∗ )= Y i , a.s. 135 This can be shown by backward induction on i. The case when i = n is trivial. Hence if the statement holds for i+1, then the definition of the Doob martingale implies that θ up,∗ i =max{S i ,Y i+1 −(Y i+1 −E i [Y i+1 ]) +f(i,θ up,∗ i ,β i+1 Y i+1 −(β i+1 Y i+1 −E i [β i+1 Y i+1 ]))Δ i } =max{S i ,E i [Y i+1 ]+f(i,θ up,∗ i ,E i [β i+1 Y i+1 ])Δ i } By the Lipschitz property of f, the solution of the dynamic programming problem above is unique and therefore θ up,∗ i = Y i by definition. This theorem is useful in computing an upper confidence bound for Y 0 . To this end one only needs to choose a (1+D)−dimensional martingale, in the hope of approximating the Doob martingale of (Y,βY). This can be done with e.g. the Doob martingale of an approximation ˜ Y of Y that was pre-computed by an algorithm of one’s choice. Then by solving the pathwise dynamic program in (3.1) one can finally approximate the expectation by averaging over sample paths. We will discuss the details of the implementation in Section 4.4.1. We shall note that the pathwise dynamic program is not explicit in time, since θ up i appears on both sides of the equation. This can be solved by a Picard iteration though. In some situationsthefollowing explicit representation intermsofapathwise maximization problem is a better approach. 136 Proposition 4.3.2. Suppose f is convex in (y,z), and define the convex conjugate in the y-variable by f #y (ω,i,r,z)=sup{ry −f(ω,i,y,z); y ∈R} which is defined on D (i,ω,z) f #y := {r ∈R; f #y (ω,i,r,z) < ∞} ⊂ [−α (0) i (ω),α (0) i (ω)] Then for (M 0 ,M)∈M 1+D and i=0,...,n − 1, the θ up i defined in (3.1) can be rewritten as θ up i =max 2 S i , sup r∈D (i,ω,z up i ) f #y 1 1−rΔ i θ up i+1 −(M 0 i+1 −M 0 i ) −f #y (i,r,β i+1 θ up i+1 −(M i+1 −M i ))Δ i 3 , (3.2) where z up i = β i+1 θ up i+1 −(M i+1 −M i ). Proof. Since f is convex and Lipschitz continuous in y, we know f has both non- decreasing left and non-decreasing right derivatives with respect to y at all points. Denote them as f y− and f y+ respectively. Due to the convexity, it’s obvious that 137 f y− ≤ f y+ , where the equality holds everywhere except at countably many points. Actually D (i,ω,z) f # y ⊂ lim y→−∞ f y− (ω,i,y,z), lim y→+∞ f y+ (ω,i,y,z) . Moreover, if r 0 ∈ [f y− (ω,i,y 0 ,z),f y+ (ω,i,y 0 ,z)], for some y 0 ∈ R then the mono- tonicity of left and right derivatives imply that f #y (ω,i,r 0 ,z)= r 0 y 0 −f(ω,i,y 0 ,z). This indicates that f(ω,i,y 0 ,z)= r 0 y 0 −f #y (ω,i,r 0 ,z) ≤ sup r∈D (i,ω,z) f #y ry 0 −f #y (ω,i,r,z) = f(ω,i,y 0 ,z) #y #r = ε+r ε y 0 −sup y {r ε y −f(ω,i,y,z)} ≤ ε+r ε y 0 −{r ε y 0 −f(ω,i,y 0 ,z)} = ε+f(ω,i,y 0 ,z), where #r denotes the convex conjugate in the r−variable of f #y (i,·). By sending εto0wecanseethat f(ω,i,y 0 ,z)= f(ω,i,y 0 ,z) #y #r , where the supremum on the right hand side is achievable at any r 0 ∈ [f − y (ω,i,y 0 ,z),f + y (ω,i,y 0 ,z)]. 138 Hence θ up i =max 2 S i , sup r∈D (i,ω,z up i ) f #y θ up i+1 −(M 0 i+1 −M 0 i )+rθ up i Δ i −f #y (i,r,β i+1 θ up i+1 −(M i+1 −M i ))Δ i 3 where the supremum is achieved at some r ∗ . Therefore θ up i =max S i ,θ up i+1 −(M 0 i+1 −M 0 i )+r ∗ θ up i Δ i −f #y (i,r ∗ ,β i+1 θ up i+1 −(M i+1 −M i ))Δ i . On one hand, for θ up i >S i , we thus obtain θ up i = 1 1−r ∗ Δ i ! 
θ up i+1 −(M 0 i+1 −M 0 i )−f #y (i,r ∗ ,β i+1 θ up i+1 −(M i+1 −M i ))Δ i " , which means that θ up i is dominated by the right hand side of the assertion. On the other hand, θ up i ≥ S i and the equation θ up i ≥ θ up i+1 −(M 0 i+1 −M 0 i )+rθ up i Δ i −f #y (i,r,β i+1 θ up i+1 −(M i+1 −M i ))Δ i , or its equivalent θ up i ≥ 1 1−rΔ i ! θ up i+1 −(M 0 i+1 −M 0 i )−f #y (i,r,β i+1 θ up i+1 −(M i+1 −M i ))Δ i " , 139 holdsforallr ∈ D (i,ω,z up i ) f #y .Thereforethereverse inequalityintheassertionisproved as well. Example 4.3.3. In Example 4.2.1 (i) f #y (i,r,z)= z T σ −1 i (μ i +r1) and the maximizer must belong to the set {−R b i ,−R l i }, because f(i,·)= (f(i,·) #y ) #r . Therefore for the European option an explicit recursion for θ up will be θ up i =sup r∈{−R b i ,−R l i } 1 1−rΔ i ( θ up i+1 −(M 0 i+1 −M 0 i ) −[β i+1 θ up i+1 −(M i+1 −M i )] T σ −1 i (μ i +r1)Δ i 4 . 4.3.2 Lower bounds We now construct the subsolutions. To get a maximization problem with value process given by Y i , we denote by f # the convex conjugate of f in (y,z), i.e. f # (ω,i,r,ρ)=sup{ry+ρ T z −f(ω,i,y,z); (y,z) ∈R 1+D }, 140 whichisdefinedon D (i,ω) f # := {(r,ρ) ∈R 1+D ;f # (ω,i,r,ρ) < ∞} ⊂ D 5 d=0 [−α (d) i (ω),α (d) i (ω)]. We also define U i (f # ):= {(r j ,ρ j ) j≥i adapted; n−1 j=i E[|f # (j,r j ,ρ j )|]Δ j < ∞}, and note that (f # (j,r j ,ρ j )) j≥i is an adapted process for (r,ρ)∈U i (f # ). The next lemma in particular shows that U i (f # )isnonempty. Lemma 4.3.4. Suppose ( ˜ Y) j≥i is an R−valued adapted and integrable process and ( ˜ Z j ) j≥i is an R D −valued adapted process such that D d=1 n−1 j=i E[α (d) j | ˜ Z d,j |]Δ j < ∞. Then there is a pair (˜ r, ˜ ρ)∈U i (f # ) such that for j = i,...,n−1, ˜ r j ˜ Y j +˜ ρ T j ˜ Z j −f # (j,˜ r j , ˜ ρ j )= f(j, ˜ Y j , ˜ Z j ). 141 Proof. . Similarly to the proof of Proposition 6.1 in [27], we exploit the existence of a measurable subgradient due to Theorem 7.10 in [25], where it is stated that ∀j = i,...,n−1 there exists a F j −measurable random vector (˜ r j , ˜ ρ j ) such that f(j, ˜ Y j +y, ˜ Z j +z)−f(j, ˜ Y j , ˜ Z j ) ≥ ˜ r j y+˜ ρ T j z for any (y,z) ∈ R1+D. Taking the supremum over all (y,z) ∈R 1+D , one has ˜ r j ˜ Y j +˜ ρ T j ˜ Z j −f # (j,˜ r j , ˜ ρ j ) ≥ f(j, ˜ Y j , ˜ Z j ), where (˜ r j (ω),˜ ρ j (ω)) ∈ D (j,ω) f # . The inequality in the other direction follows imme- diately from f ## = f by convexity. So we only need to show that n−1 j=i E[|f # (j,˜ r j , ˜ ρ j )|]Δ j < ∞. It follows from the Lipschitz property of f and the boundedness of D (j,ω) f # that n−1 j=i E[|f # (j,˜ r j , ˜ ρ j )|]Δ j = n−1 j=i E[|˜ r j ˜ Y j +˜ ρ T j ˜ Z j −f(j, ˜ Y j , ˜ Z j )|]Δ j ≤ 2 n−1 j=i E[α (0) j Δ j | ˜ Y j |]+2 D d=1 n−1 j=i E[α (d) j | ˜ Z d,j |]Δ j + n−1 j=i E[|f(j,0,0)|]Δ j < ∞, due to (2.1). 142 The following result is a discrete time reflected analogue of Proposition 3.4 in [45]. For discrete time(reflected) BSDE a similar result (for convex generators in z only) can be found in [27] under different assumptions. Theorem 4.3.5. Suppose f is convex in (y,z). We define θ low i (τ,r,ρ):=Γ i,τ (r,ρ)S τ − τ−1 j=i Γ i,j (r,ρ) f # (j,r j ,ρ j )Δ j 1−r j Δ j ,where Γ i,j (r,ρ):= j−1 5 k=i 1+ρ T k β k+1 Δ k 1−r k Δ k . Then Y i = esssup τ∈S i esssup (r,ρ)∈U i (f # ) E i [θ low i (τ,r,ρ)] The essential supremum can be obtained by any (r ∗ j ,ρ ∗ j ) j≥i such that for j = i,...,n−1, r ∗ j Y j +ρ ∗T j E j [β j+1 Y j+1 ]−f # (j,r ∗ j ,ρ ∗ j )= f(j,Y j ,E j [β j+1 Y j+1 ]) (3.3) and τ ∗ i is defined in (2.3) 143 Proof. 
For a given i∈{0,...,n} and a stopping time τ ∈ S i and a pair (r,ρ) ∈ U i (f # ) we define Y j (τ,r,ρ):=E j θ low j (τ,r,ρ) ,i ≤ j ≤τ. Then Y τ (τ,r,ρ)= S τ and, for i ≤j<τ, Y j (τ,r,ρ) = E j [Y j+1 (τ,r,ρ)]+ ρ T j E j [β j+1 Y j+1 (τ,r,ρ)]+r j Y j (τ,r,ρ)−f # (j,r j ,ρ j ) Δ j ≤ E j [Y j+1 (τ,r,ρ)]+f(j,Y j (τ,r,ρ),E j [β j+1 Y j+1 (τ,r,ρ)])Δ j , where the inequality follows from the fact that f ## = f by convexity. Now a straightforward application of Proposition 4.2.3 and 4.2.2 leads to that Y i (τ,r,ρ) ≤ Y (τ) i ≤ Y i . In order to prove the converse inequality, we note that Lemma 4.3.4 implies the existence of a pair of processes (r ∗ j ,ρ ∗ j ) j≥i ∈U i (f # ) such that (3.3) holds, due to the fact that by (2.1) D d=1 n−1 j=i E[α (d) j |E j [β d,j+1 Y j+1 ]|]Δ j < n−1 j=i E[|Y j+1 |]≤∞. 144 Then by the definition of τ ∗ i we know for i ≤j<τ ∗ i Y j = E j [Y j+1 ]+f(j,Y j ,E j [β j+1 Y j+1 ])Δ j = E j [Y j+1 ]+ r ∗ j Y j +ρ ∗T j E j [β j+1 Y j+1 ]−f # (j,r ∗ ,ρ ∗ j ) Δ j . As Y τ ∗ i = S τ ∗ i we conclude that Y i = Y i (τ ∗ i ,r ∗ ,ρ ∗ ) by the uniqueness of the solution of the dynamic programming in (1.1). Remark 4.3.6. If we consider the representation in Theorem 4.3.5 as a ’primal’ maximizationproblem, thentherepresentationinTheorem4.3.1canbeinterpreted as a dual minimization problem in the sense of information relaxation. This dual approach was introduced for Bermudan option pricing by [79] and [54], and was further developed for discrete time stochastic control problems by [23]. Indeed, given a martingale (M 0 ,M)∈M 1+D , we can define P M 0 ,M : {i,...,n}× n−1 5 j=i D (j,ω) f # → L 1 (Ω,P), (k,(r,ρ)) → k−1 j=i Γ i,j (r,ρ) (M 0 j+1 −M 0 j )+ρ T j (M j+1 −M j )Δ j 1−r j Δ j . 145 Then for every (τ,(r,ρ)) ∈ S i ×U i (f # ), E i [P M 0 ,M (τ,r,ρ)] = 0. (3.4) We next relax the adaptedness property of the controls (τ,(r,ρ)) and observe that, by Theorem 4.3.5 and Equation (3.4), Y i = esssup τ∈S i esssup (r,ρ)∈U i (f # ) E i [θ low i (τ,r,ρ)−P M 0 ,M (τ,r,ρ)] ≤ E i ⎡ ⎢ ⎢ ⎣ max k=i,...,n max (r j ,ρ j )∈D (j,ω) f # j=i,...,k−1 θ low i (k,r,ρ)−P M 0 ,M (k,r,ρ) ⎤ ⎥ ⎥ ⎦ =: E i [ ˜ θ i (M 0 ,M)]. Since the maximum on the right hand side of the inequality is taken pathwise, we may now choose anticipating controls. Though we relax the quantity by allowing anticipating controls, a penalty, i.e. P M 0 ,M is subtracted. This penalty doesn’t take any effect with non-anticipating controls, due to (3.4). A penalty P ∗ is called optimal, if it penalizes anticipating controls such that the pathwise maximum is achieved at a non-anticipating control. This implies Y i = E i ⎡ ⎢ ⎢ ⎣ max k=i,...,n max (r j ,ρ j )∈D (j,ω) f # j=i,...,k−1 θ low i (k,r,ρ)−P ∗ (k,r,ρ) ⎤ ⎥ ⎥ ⎦ . 146 One can show that ˜ θ i (M 0 ,M)= θ up i (M 0 ,M). This can be seen by firstly deriving a recursion formula for ˜ θ i (M 0 ,M)andthen using the arguments following Proposition 4.3.2. The optimal penalty is given by P M 0,∗ ,M ∗ by Theorem 4.3.1 4.3.3 Error estimates Now we will provide some error analysis of the lower and upper bounds for the convex case. This is about how the accuracy of the input approximations affect the rightness of the upper and lower bounds. For simplicity, we only focus on the case where β=0. Though it’s much restrictive, this case does cover many applications of practical interest, such as the BSDEs arising in the credit risk literature, see [32] and[55]. 
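Before turning to the error bounds, the following self-contained sketch (with $\beta\equiv 0$, as assumed in this subsection) illustrates how the upper bound of Theorem 4.3.1 and the lower bound of Theorem 4.3.5 are computed from an imperfect input approximation. The driving binomial noise, the convex generator $f(y)=\max(-R^l y,-R^b y)$ (for which $f^{\#}(r)=0$ on $[-R^b,-R^l]$), the barrier and the perturbed input $\tilde y$ are toy choices of ours, not the examples of Section 4.2; the state is a binomial walk so that the conditional expectations entering the Doob increments are exact.

```python
# A toy sketch of both bounds for beta = 0: the state X is a binomial walk, the
# input approximation ytil is a deliberately perturbed value function, and
# f(y) = max(-Rl*y, -Rb*y) is convex with conjugate f#(r) = 0 on [-Rb, -Rl].
# All names and parameter values are illustrative, not those of the text.
import numpy as np

rng = np.random.default_rng(1)
n, dt, Rl, Rb = 10, 0.1, 0.01, 0.06
f = lambda y: np.maximum(-Rl * y, -Rb * y)
S = lambda i, x: x - 2.0                                 # exercise allowed at every i
ytil = lambda i, x: (x - 2.0) + 0.05 * np.cos(i + x)     # imperfect input approximation

L = 20_000
steps = rng.integers(0, 2, size=(L, n))
x = np.concatenate([np.zeros((L, 1)), steps.cumsum(axis=1)], axis=1)
cont = lambda i, xi: 0.5 * (ytil(i + 1, xi + 1.0) + ytil(i + 1, xi))  # E_i[ytil_{i+1}]

# upper bound, Theorem 4.3.1: pathwise recursion (3.1) driven by the Doob
# martingale increments of ytil; the implicit theta_i is solved by Picard.
theta = S(n, x[:, n])
for i in range(n - 1, -1, -1):
    dM = ytil(i + 1, x[:, i + 1]) - cont(i, x[:, i])     # exact martingale increment
    cand = theta - dM
    theta = cand.copy()
    for _ in range(50):                                  # contraction: Rb * dt < 1
        theta = np.maximum(S(i, x[:, i]), cand + f(theta) * dt)
print("upper bound ~", theta.mean())

# lower bound, Theorem 4.3.5: r_j is a subgradient of f at ytil (so (3.5) holds
# with f#(r_j) = 0), tau is the approximate stopping rule, Gamma = prod 1/(1-r*dt)
low = np.empty(L)
for l in range(L):
    gamma, j = 1.0, 0
    while j < n and S(j, x[l, j]) < cont(j, x[l, j]) + f(ytil(j, x[l, j])) * dt:
        r = -Rl if ytil(j, x[l, j]) > 0 else -Rb
        gamma /= 1.0 - r * dt
        j += 1
    low[l] = gamma * S(j, x[l, j])
print("lower bound ~", low.mean())                       # lower <= Y_0 <= upper
```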
For the lower bound, we assume that the suboptimal controls are derived fromaninput approximation of theprocess Y in exactly the same way, in which we choose these controls in the algorithm presented in Section 4.4. Further analysis of the general case involving a non-trivial β would certainly require additional assumptions on β and is beyond the scope of this chapter. 147 Theorem 4.3.7. (i) Suppose M 0 ∈M 1 , then E i [θ up i (M 0 )]−Y i ≤E i C(i,α (0) )max j=i,...,n |M 0 j −M 0,∗ j | , where C(α (0) ,i)=1+ n−1 5 l=i (1−α (0) l Δ l ) −1 . 1+ n j=i α (0) j Δ j / and M 0,∗ still denotes the Doob martingale of Y. (ii) Suppose ˜ Q j is an adapted and integrable approximation of E j [Y j+1 ] and ˜ Y j is an adapted and integrable approximation of Y j . We define an adapted process rby r j ˜ Y j −f # (j,r j )= f(j, ˜ Y j ),j = i,...,n−1, (3.5) and τ := inf{j ≥ i; S j ≥ ˜ Q j +f(j, ˜ Y j )Δ j }∧n. Then Y i −E i [θ low i (τ,r)] ≤ E i ⎡ ⎣ c(i,α (0) ,τ) ⎛ ⎝ τ∧(n−1) j=i | ˜ Y j −Y j |α (0) j Δ j + τ−1 j=i I A j ( ˜ Q j −E j [Y j+1 ]) + +I A c τ ∩{τ<n} (E τ [Y τ+1 ]− ˜ Q τ ) + /1 , 148 where c(i,α (0) ,τ)= 3 τ∧(n−1) 5 l=i (1−α (0) l Δ l ) −1 , A j = {S j ≥ E j [Y j+1 ]+f(j,Y j )Δ j },j = i,...,τ. Remark 4.3.8. In the nonreflected case, i.e. S i = −∞ fori<n, we have τ = n and A j = ∅ for anyj<n. Therefore the lower bound estimate can be simplified to Y i −E i [θ low i (τ,r)] ≤E i 0 c(i,α (0) ,n) n−1 j=i | ˜ Y j −Y j |α (0) j Δ j 1 . Inthereflected case, theindicatorsI A j andI A c τ ∩{τ<n} correspondtowrongstopping decisions of the approximate stopping time τ compared to the optimal stopping time. In practice, the stopping decision will usually be accurate, when a proper approximation ˆ Q j of the continuation value E j [Y j+1 ] is applied. Therefore the corresponding terms are supposed to be small. For a rigorous statement of this intuition in the case of optimal stopping we refer to Belomestny [7]. 149 Proof. (i) Given a martingale M 0 we define for k∈{0,...,n} and i∈{0,...,k−1} that θ up,k i (M 0 )= θ up,k i+1 (M 0 )+f ! i, max i≤κ≤n θ up,κ i (M 0 ) " Δ i −(M 0 i+1 −M 0 i ), θ up,k k (M 0 )= S k , with the convention that θ up,k i =0fori>k. Backward induction combined with a contraction mapping argument shows that there exist a unique solution θ up,k i (M 0 ) such that E max i≤k≤n |θ up,k i (M 0 )| < ∞. we claim that θ up i (M 0 ) defined via (3.1) is the same as max i≤k≤n θ up,k i (M 0 ). To see this one we note that max i≤k≤n θ up,k n (M 0 )= S n , max i≤k≤n θ up,k i (M 0 )=max ( S i , max i+1≤k≤n θ up,k i+1 (M 0 )+f ! i, max i≤k≤n θ up,k i (M 0 ) " Δ i −(M 0 i+1 −M 0 i ) 4 . Therefore we can conclude that θ up i (M 0 )= max i≤k≤n θ up,k i (M 0 ). 150 due to the uniqueness. Moreover, by Theorem 4.3.1, E i [θ up i (M 0 )] ≤ E i max i≤k≤n θ up,k i (M 0,∗ ) +E i max i≤k≤n |θ up,k i (M 0 )−θ up,k i (M 0,∗ )| (3.6) = Y i +E i max i≤k≤n |θ up,k i (M 0 )−θ up,k i (M 0,∗ )| . To estimate the last term in the equation above, we define Δθ i := max i≤k≤n |θ up,k i (M 0 )−θ up,k i (M 0,∗ )−M 0 i +M 0,∗ i |. Then it follows that Δθ i ≤ Δθ i+1 + f ! i, max i≤k≤n θ up,k i (M 0 )−M 0 i +M 0,∗ i " Δ i −f ! i, max i≤k≤n θ up,k i (M 0,∗ ) " Δ i + f ! i, max i≤k≤n θ up,k i (M 0 )−M 0 i +M 0,∗ i " Δ i +f ! i, max i≤k≤n θ up,k i (M 0 ) " Δ i . Therefore Δθ i ≤ (1−α (0) i Δ i ) −1 Δθ i+1 +α (0) i Δ i |M 0 i −M 0,∗ i | , 151 which implies that Δθ i ≤ n−1 5 l=i (1−α (0) l Δ l ) −1 . Δθ n + n j=i α (0) j Δ j |M 0 j −M 0,∗ j | / . Hence max i≤k≤n |θ up,k i (M 0 )−θ up,k i (M 0,∗ )|≤ C(α (0) ,i)max i≤j≤n |M 0 j −M 0,∗ j |. 
This, together with (3.6) yields the error estimate for the upper bound. (ii) To estimate the lower bound, we note that by Lemma 4.3.4 there exists an adapted process r such that (3.5) holds. Since (3.5) implies that r j (ω) ∈ D (j,ω) f # , we observe that |r j |≤ α (0) j . Define Y low j = E j [θ low j (τ,r)],i ≤ j ≤ τ. Then, using arguments in the proof of Theorem 4.3.5 and the relation between r and ˜ Y, we obtain Y low j = E j [Y low j+1 ]+(Y low j − ˜ Y j )r j Δ j +f(j, ˜ Y j )Δ j ,i ≤j<τ, Y low τ = S τ . Recall that Y j = E j [Y j+1 ]+f(j,Y j )Δ j +(S j −E j [Y j+1 ]−f(j,Y j )Δ j ) + . 152 Hence for i ≤j<τ E i [Y j −Y low j ] ≤ E i [Y j+1 −Y low j+1 ]+2E i [α (0) j | ˜ Y j −Y j |]Δ j +E i [α (0) j |Y j −Y low j |]Δ j +E i [(S j −E j [Y j+1 ]−f(j,Y j )Δ j ) + ]. Since S j < ˜ Q j +f(j, ˜ Y j )Δ j forj<τ we obtain Y i −Y low i ≤ E i 0 τ−1 5 l=i (1−α (0) l Δ l ) −1 (3.7) . Y τ −Y low τ +3 τ−1 j=i α (0) j | ˜ Y j −Y j |Δ j + τ−1 j=i I A j ( ˜ Q j −E j [Y j+1 ]) + /1 In particular, by the definition of τ we have Y τ −Y low τ = I A c τ ∩{τ<n} (E τ [Y τ+1 ]+f(τ,Y τ )Δ τ −S τ ) ≤ I A c τ ∩{τ<n} (E τ [Y τ+1 ]− ˜ Q τ ) + α (0) τ |Y τ − ˜ Y τ |Δ τ . This together with (3.7) leads to the lower bound estimate. 153 4.4 A primal-dual algorithm for the convex case 4.4.1 The algorithm We will explain how to construct an upper biased estimator, a lower biased esti- mator and confidence intervals for Y at time 0, given the results in the last section. This is done in the spirit of the [2] algorithm for Bermudan option pricing, when f is convex in (y,z). Markovian setting and input approximations We will illustrate in a Markovian setting, namely, f(i,·)= F(i,X i ,·)and S i = G i (X i ) depend on ω only through an R N −valued Markovian process X i and the mappings F and G are measurable in the x−component and such that the resulting f and S fulfill the conditions stated in Section 4.2. In particular, β i+1 is assumed to be independent of F i . Then there exist deterministic functions y i (x),q i (x),z d,i (x),d=1,...,D, such that Y i = y i (x i ), E i [Y i+1 ]= q i (X i ), E i [β d,i+1 Y i+1 ]= z d,i (X i ) and E 0 n−1 i=1 . |y i (X i )|+|q i (X i )|+ D d=1 (α (d) i +1)|z d,i (X i )| /1 < ∞ (4.1) 154 We assume that the measurable approximations ˜ y i (x), ˜ q i (x)and˜ z d,i (x)forthese functions are pre-computed by some numerical algorithm and satisfy the integra- bility condition (4.1). Therefore it is integrable random variables that we will draw the samples from in the following numerical algorithm. We assume in our numerical experiments that a least-squares Monte Carlo estimator is conducted for approximating the conditional expectations in (1.1), but other choices are possible as well. Upper biased estimator With the approximations ˜ y i (x), ˜ q i (x)and ˜ z d,i (x), we sample Λ out independent copies (X i (λ),β i (λ); i=0,...,n) λ=1,...,Λ out of (X i ,β i ; i=0,...,n) as ”outer” paths. To give the upper confidence bound we apply Theorem 4.3.1 and calculate θ up i (M 0 ,M) for some martingales M 0 ,M, that are ”close” to the unknown Doob martingales of Y and βY. This can be approximated by the Doob martingales of the approximations ˜ y(X)and β˜ y(X) 155 to Y and βY. Fixing the λth outer path this yields θ up n (λ)= G n (X n (λ)) by (3.1). 
Moreover, for i∈{0,...,n−1} we have θ up i (λ) =max G i (X i (λ)),θ up i+1 (λ)−(˜ y i+1 (X i+1 (λ))−E[˜ y i+1 (X i+1 )|X i = X i (λ)]) +F i,X i (λ),θ up i (λ),β i+1 (λ)θ up i+1 (λ) −(β i+1 (λ)˜ y i+1 (X i+1 (λ))−E[β i+1 ˜ y i+1 (X i+1 )|X i = X i (λ)]))Δ i } (4.2) Finally the estimator for Y 0 is given by ˆ Y up := 1 Λ out Λ out λ=1 θ up 0 (λ), which is a direct result of Theorem 4.3.1. This approximation by averaging over the outer paths has a positive bias. Usually the conditional expectations in (4.2) can not be calculated in closed form, but can be approximated by a conditionally unbiased estimator, which is an average over a set of ”inner” samples. For a given i and an outer path X(λ) we generate Λ in independent copies of (X i+1 ,β i+1 ) under the conditional law provided that X i = X i (λ). These samples are denoted 156 by (X i+1 (λ,l),β i+1 (λ,l)),l=1,...,Λ in . Then the Monte Carlo estimator for the conditional expectations in (4.2) along the λth paths can be defined by ˆ E[˜ y i+1 (X i+1 )|X i = X i (λ)] = Λ in l=1 ˜ y i+1 (X i+1 (λ,l)) Λ in , ˆ E[β i+1 ˜ y i+1 (X i+1 )|X i = X i (λ)] = Λ in l=1 β i+1 (λ,l)˜ y i+1 (X i+1 (λ,l)) Λ in . (4.3) By replacing all conditional expectations in (4.2) by the Monte Carlo estimators above we obtain an approximation θ up,AB (λ)of θ up (λ) by recursive construction. The corresponding upper bound estimator for Y 0 is obviously given by ˆ Y up,AB := 1 Λ out Λ out λ=1 θ up,AB 0 (λ). The superscript ”AB” here stands for Andersen and Broadie, who suggested this method for Bermudan options in 2004. The convexity of the max-operator and f, together with an argument of Jensen’s inequality shows that ˆ Y up,AB has a positive bias as an estimator for Y 0 . There are other constructions for the input martingales for Bermudan option pricing problems, see e.g. [8], [36] and [80]. These constructions can be adapted to the present BSDE setting without much effort. 157 Lower biased estimator In an attempt to construct an estimator for Y 0 with a negative bias, we define a stopping time ˜ τ(λ) along a given outer path (i.e. for λ=1,...,Λ out )by ˜ τ(λ) = inf{j ≥ 0;G j (X j (λ)) ≥ ˜ q j (X j (λ))+F (j,X j (λ),˜ y j (X j (λ)),˜ z j (X j (λ)))Δ j } and controls (˜ r i (λ), ˜ ρ i (λ)) i=0,...,n−1 ∈U 0 (F # ) as the solutions of ˜ r j (λ)˜ y j (X j (λ))+ ˜ ρ j (λ)˜ z j (X j (λ))−F # (j,X j (λ),˜ r j (λ), ˜ ρ j (λ)) =F(j,X j (λ),˜ y j (X j (λ)),˜ z j (X j (λ))), (4.4) as in Lemma 4.3.4. Then by Theorem 4.3.5, a negatively biased Monte Carlo estimator for Y 0 can be given by ˆ Y low,AB := 1 Λ out Λ out λ=1 θ low,AB 0 (λ) θ low,AB 0 (λ):=Γ 0,˜ τ(λ) (˜ r(λ), ˜ ρ(λ))G(˜ τ(λ),X ˜ τ(λ) (λ)) + ˜ τ(λ)−1 j=0 Γ 0,j (˜ r(λ), ˜ ρ(λ)) F # (j,X j (λ),˜ r j (λ), ˜ ρ j (λ))Δ j 1− ˜ r j (λ)Δ j . 158 Confidence intervals With a positively biased estimator and another estimator that is negatively biased, we can construct asymptotic confidence intervals for Y 0 under additional square integrability conditions that E[|θ low,AB 0 (λ)| 2 +|θ up,AB 0 (λ)| 2 ] < ∞. To this end, we assume that E 0 |G(n,X n )| 2 + n−1 i=1 |F(i,X i ,0,0)| 2 +|G(i,X i )| 2 I {G(i,X i )>−∞} 1 < ∞, which implies that n−1 i=1 E 0 |y i (X i )| 2 +|q i (X i )| 2 + D d=1 (α (d) d +1) 2 |z d,i (X i )| 2 1 < ∞. (4.5) Therefore we shall also impose the stronger integrability assumption (4.5) on the pre-computed approximations ˜ y i (x),˜ q i (x)and ˜ z d,i (x). 
With this assumption the (˜ r i (λ), ˜ ρ i (λ)) i=0,...,n−1 defined in (4.4) satisfy n−1 j=0 E[|F # (j,X j (λ),˜ r i (λ), ˜ ρ i (λ))| 2 ]Δ j < ∞, 159 and this square integrability additionally needs to be assumed, if (4.4) holds approximately. Now given the square integrable and independent copies (θ low,AB 0 (λ),θ up,AB 0 (λ)), λ=1,...,Λ out at hand, an (asymptotic) 95% confidence interval I (95) for Y 0 can be constructed by adding (resp. subtracting) 1.96 empirical standard deviations to the upper estimator (from the lower estimator), i.e. I (95) = ⎡ ⎣ ˆ Y low,AB −1.96 . Λ out λ=1 (θ low,AB 0 (λ)− ˆ Y low,AB ) 2 Λ out (Λ out −1) /1 2 , ˆ Y up,AB +1.96 . Λ out λ=1 (θ up,AB 0 (λ)− ˆ Y up,AB ) 2 Λ out (Λ out −1) /1 2 ⎤ ⎦ . This asymptotic confidence interval is valid even if one applies the same bunch of outer paths for both the lower estimator and the upper estimator. This can be seen by noting that using the central limit theorem, P({Y 0 ∈ I (950 }) ≤ P({Y 0 <L Λ out })+P({Y 0 >U Λ out }) ≤ P({E[ ˆ Y low,AB ]<L Λ out })+P({E[ ˆ Y up,AB ]>U Λ out }) → 0.95, Λ out →∞, where L Λ out and U Λ out are the upper and lower bounds of I (95) respectively. 160 4.4.2 Control variates The numerical experiments in the next subsection illustrate that the additional bias of the upper bound estimator due to the inner simulations may be substantial with a moderate number of inner paths(say, 1000). Therefore it seems to be essen- tial to apply variance reduction techniques for the estimation of the conditional expectations in (4.2) by Monte Carlo. We suggest some control variates, for which we only require the closed form of E[β d,i+1 ], E[β d,i+1 β d ,i+1 ],d,d ∈{1,...,D}. Thisise.g. thecasewhenβ d,i+1 is(up toaconstant) given bytruncated increments of independent Brownian motions. In this case we perform an orthogonal projec- tion of ˜ y i+1 (X i+1 ) on the span of the random variables (β 1,i+1 ,...,β D,i+1 ) under the conditional probability given X i . This orthogonal projection is given by β T i+1 B + i+1 E[β i+1 ˜ y i+1 (X i+1 )|X i = x], where B + i+1 is the Moore-Penrose pseudo-inverse of the matrix B i+1 =(E[β d,i+1 β d ,i+1 ]) d, d ∈{1,...,D} . 161 Here we used the fact that β i+1 is independent of F i . If ˜ y and ˜ z are good approx- imations of y and z, then ˜ z i (X i ) is also expected to be a good approximation of E[β i+1 ˜ y i+1 (X i+1 )|X i ]. These considerations motive us to replace the estimators (4.3) for the conditional expectations in (4.2) by ˆ E C [˜ y i+1 (X i+1 )|X i = X i (λ)] = E[β i+1 ] T B + i+1 ˜ z i (X i (λ)) + 1 Λ in Λ in l=1 ˜ y i+1 (X i+1 (λ,l))−β i+1 (λ,l) T B + i+1 ˜ z i (X i (λ)) ˆ E C [β i+1 ˜ y i+1 (X i+1 )|X i = X i (λ)] = E[β i+1 ]˜ q i (X i (λ))+B i+1 B + i+1 ˜ z(X i (λ)) + 1 Λ in Λ in l=1 β i+1 (λ,l) ˜ y i+1 (X i+1 (λ,l))− ˜ q i (X i (λ))−β i+1 (λ,l) T B + i+1 ˜ z i (X i (λ)) , (4.6) which are conditionally unbiased. The variance-reduced estimator ˆ Y up,ABC is then calculated similarly to ˆ Y up,AB , except that it is (4.6) but not (4.3) that we will apply. By Jensen’s inequality, the ’up’-estimator has a positive bias. For the classical optimal stopping problem, a similar control variate for inner simulations was suggested by [8] in the special case when β i+1 are increments of independent Brownian motions. We also recommend to run the lower bound estimator ˆ Y low,AB with a con- trol variate in order to reduce the number of samples Λ out . 
In this regard, if the conditional expectations can be found in closed form, we suggest the use of ˜ τ−1 j=0 Γ 0,j (˜ r, ˜ ρ) ˜ y j+1 (X j+1 )−E[˜ y j+1 (X j+1 )|X j ] 1− ˜ r j Δ j +Γ 0,j (˜ r, ˜ ρ) ˜ ρ T j Δ j (β j+1 ˜ y j+1 (X j+1 )−E[β j+1 ˜ y j+1 (X j+1 )|X j ]) 1− ˜ r j Δ j . (4.7) 162 Otherwise a set of ”inner” simulations will be required for the construction of the upper bound estimator anyway, and this inner sample can be used to estimate the conditional expectations in the control variate (4.7) via (4.6). The resulting estimator with a negative bias is denoted as ˆ Y low,ABC . The construction of asymp- totic confidence intervals is similar to the situation without control variates, as we discussed before. 4.4.3 Numerical examples We apply the algorithm above to the Example 4.2.1 (i). We consider the pricing problem of a European and a Bermudan call spread option with maturity T on the maximum of D assets, which are modeled by independent, identically distributed geometric Brownian motion with drift μ and volatility σ whose values at time t i = T i n ,i=0,...,n are denoted by X d,i . The interest rates R b and R l are constant over time. The generator f is then given by F(i,x,y,z)= −R l y − μ−R l σ D d=1 z d +(R b −R l ) . y − D d=1 z d σ / − . We define β d,i+1 (t i+1 − t i ) as the truncated Brownian increment driving the dth stock over the period [t i ,t i+1 ]. The payoff of the option is given G i (x)= ⎧ ⎪ ⎪ ⎨ ⎪ ⎪ ⎩ (max d=1,...,D x d −K 1 ) + −2(max d=1,...,D x d −K 2 ) + ,i∈E −∞,i ∈E. 163 for strikes K 1 ,K 2 and a set of time points E at which the option can be exercised. Hence, E = {n} gives a European option. For the Bermudan option case we consider the situation of four exercise dates which are equidistant over the time horizon, i.e. E = { n 4 , n 2 , 3n 4 ,n}. Unless otherwise noted, we use the following parameter values: D=5,T =0.25,R l =0.01,R b =0.06,X d,0 = 100, μ=0.05,σ=0.2,K 1 =95,k 2 = 115. We firstly generate a set of approximations ˜ y LGW , ˜ q LGW and ˜ z LGW by the least- squares Monte Carlo algorithm of [62]. As we discussed in the previous chap- ters, this algorithm requires the choice of a set of basis functions. Then an empirical regression on the span of these basis functions is performed with a set of Λ reg samples paths, which are independent of the outer and inner sam- ples required for the primal-dual algorithm later on. In the European option case we apply the following sets of basis functions: For the implementation with b y = 2 basis functions we choose 1 and E[G n (X n )|X i = x] for the computa- tion of ˜ y LGW i (x), ˜ q LGW i (x)and x d d dx d E[G n (X n )|X i = x] for the computation of ˜ z LGW d,i (x),d =1,...,D. For call options on the maximum of D Black-Scholes stocks, closed form expressions for the option price and its delta in terms of a multivariate normal distribution are derived in [58]. In the present setting, this 164 formula can be simplified to an expectation of a function of a one-dimensional standard normal random variable, see e.g. [8]. As a trade-off between computa- tional time and accuracy, we approximate this expectation via quantization of the one-dimensional standard normal distribution with 21 grid points. In the imple- mentation with b y = 7 basis functions we additionally apply x 1 ,...,x 5 as basis functions for ˜ y LGW i (x), ˜ q LGW i (x)and x d as a basis function for ˜ z LGW d,i (x). 
For the Bermudan option case we use six basis functions for ˜ y LGW i (x), ˜ q LGW i (x), namely 1, E[G j (X j )|X i = x],j∈E, and max j∈E,j≥i E[G j (X j )|X i = x]. The corresponding deltas x d d dx d E[G n (X n )|X i = x],j ∈E,j ≥ i are chosen as basis functions for ˜ z LGW d,i (x). In the European option case, this choice of basis functions also allows to apply the martingale basis algorithm of [6], although a slight bias in introduced due to the approximation of the basis functions by the quantization approach. Compared to the generic least-squares Monte Carlo algorithm the use of martingale basis functions allows to compute some conditional expectations in the approximate backward dynamic program explicitly. These closed form computations can be thought of as a perfect control variate within the regression algorithm. For the computation of the upper confidence bounds we use the explicit recur- sion for θ up derived in Example 4.3.3. For the computation of the lower confidence 165 bound we note that the defining equation (4.4) for the approximate controls (˜ r, ˜ ρ) for the lower bound can be solved explicitly as ˜ r i = −R b I {˜ yx(X i )≤σ −1 D d=1 ˜ z d,i (X i )} −R l I {˜ yx(X i )>σ −1 D d=1 ˜ z d,i (X i )} ˜ ρ d,i = −σ −1 (˜ r i +μ). Figure 4.1 illustrates the effectiveness of the control variate for the inner samples in the computation of the upper bounds for the European option case with n=40 time steps. The input approximation is generated by the martingale basis algo- rithm with seven basis functions and Λ reg = 1000 sample paths for the empirical regression. The figure depicts the corresponding upper bound estimator for the option price Y 0 with Λ out = 10000 sample paths as a function of the number of inner samples Λ in . From top to bottom, it shows the upper estimators ˆ Y up,AB (i.e. without inner control variate), ˆ Y up,ABC (i.e. with inner control variate), and for comparison the lower bound estimator ˆ Y low,AB . We immediately observe that the predominant part of the upper bias in ˆ Y up,AB stems from the sub-sampling in the approximate construction of the Doob martin- gales. Without the use of inner control variates, the relative error between upper and lower estimator is about 6% for Λ in = 100 inner samples and decreases to about 1.5% for Λ in = 1000 inner samples. Application of the inner control variates 166 100 200 300 400 500 600 700 800 900 1000 13.7 13.8 13.9 14 14.1 14.2 14.3 14.4 14.5 14.6 Λ in Upper bound without control variate Upper bound with control variate lower bound Figure 4.1: Influence of the number of inner simulations and the control variate: upper bound without inner control variate, upper boundwith inner control variate, and lower bound (from the top to the bottom). reduces this relative error to less than 0.25% even in the case of only Λ in = 100 inner samples. Table 4.1 illustrate the influence of different input approximations. It shows realizations of the lower estimator ˆ Y low,ABC and the upper estimator ˆ Y up,ABC for the option price Y 0 as well as the empirical standard deviations, as the number of time steps increases from n=40to n = 160. The column on the left explains which algorithm is run for the input approximation. Here LGW stands for the Lemor-Gobet-Warin algorithm and MB for the martingale basis algorithm. It also states the number of regression samples and the number of basis functions b y , which are applied in the least-square Monte Carlo. 
Figure 4.1 illustrates the effectiveness of the control variate for the inner samples in the computation of the upper bounds for the European option case with $n=40$ time steps. The input approximation is generated by the martingale basis algorithm with seven basis functions and $\Lambda_{reg} = 1000$ sample paths for the empirical regression. The figure depicts the corresponding upper bound estimator for the option price $Y_0$ with $\Lambda_{out} = 10000$ sample paths as a function of the number of inner samples $\Lambda_{in}$. From top to bottom, it shows the upper estimator $\hat Y^{up,AB}$ (i.e. without inner control variate), $\hat Y^{up,ABC}$ (i.e. with inner control variate), and for comparison the lower bound estimator $\hat Y^{low,AB}$.

[Figure 4.1: Influence of the number of inner simulations and the control variate: upper bound without inner control variate, upper bound with inner control variate, and lower bound (from top to bottom). Horizontal axis: $\Lambda_{in}$ from 100 to 1000; vertical axis: estimated price bounds, ranging from about 13.7 to 14.6.]

We immediately observe that the predominant part of the upper bias in $\hat Y^{up,AB}$ stems from the sub-sampling in the approximate construction of the Doob martingales. Without the use of inner control variates, the relative error between upper and lower estimator is about 6% for $\Lambda_{in} = 100$ inner samples and decreases to about 1.5% for $\Lambda_{in} = 1000$ inner samples. Application of the inner control variates reduces this relative error to less than 0.25% even in the case of only $\Lambda_{in} = 100$ inner samples.

Table 4.1 illustrates the influence of different input approximations. It shows realizations of the lower estimator $\hat Y^{low,ABC}$ and the upper estimator $\hat Y^{up,ABC}$ for the option price $Y_0$, as well as the empirical standard deviations, as the number of time steps increases from $n=40$ to $n=160$. The column on the left explains which algorithm is run for the input approximation: LGW stands for the Lemor-Gobet-Warin algorithm and MB for the martingale basis algorithm. It also states the number of regression samples and the number of basis functions $b_y$ which are applied in the least-squares Monte Carlo. The lower and upper price estimates for the Bermudan option case are presented in the last two lines. In this case, the martingale basis algorithm is not available, and the Lemor-Gobet-Warin algorithm is run with the six basis functions stated above.

Algorithm \ n                40                   80                   120                  160
LGW  (Λ_reg=10^4, b_y=2)  13.7786  13.8339    13.7597  13.8858    13.7583  13.9482    13.7498  14.0149
                          (0.0028) (0.0031)   (0.0033) (0.0041)   (0.0037) (0.0051)   (0.0043) (0.0062)
LGW  (Λ_reg=10^5, b_y=2)  13.7783  13.8172    13.7817  13.8443    13.7848  13.8682    13.7855  13.8967
                          (0.0022) (0.0024)   (0.0022) (0.0027)   (0.0024) (0.0029)   (0.0025) (0.0033)
MB   (Λ_reg=10^2, b_y=2)  13.7850  13.8185    13.7898  13.8435    13.7863  13.8578    13.7904  13.8779
                          (0.0022) (0.0023)   (0.0021) (0.0025)   (0.0022) (0.0025)   (0.0022) (0.0026)
LGW  (Λ_reg=10^5, b_y=7)  13.7818  13.8140    13.7767  13.8321    13.7789  13.8560    13.7764  13.8902
                          (0.0020) (0.0021)   (0.0020) (0.0022)   (0.0022) (0.0025)   (0.0025) (0.0031)
LGW  (Λ_reg=10^6, b_y=7)  13.7829  13.8079    13.7867  13.8233    13.7884  13.8393    13.7867  13.8515
                          (0.0017) (0.0018)   (0.0016) (0.0018)   (0.0017) (0.0020)   (0.0017) (0.0022)
MB   (Λ_reg=10^3, b_y=7)  13.7844  13.8077    13.7897  13.8245    13.7887  13.8353    13.7880  13.8485
                          (0.0017) (0.0017)   (0.0016) (0.0017)   (0.0016) (0.0019)   (0.0017) (0.0021)
LGW Bermudan (Λ_reg=10^5) 15.5362  15.5664    15.5441  15.6160    15.5246  15.6396    15.5342  15.6886
                          (0.0028) (0.0028)   (0.0037) (0.0035)   (0.0041) (0.0042)   (0.0041) (0.0048)
LGW Bermudan (Λ_reg=10^6) 15.5422  15.5684    15.5482  15.6050    15.5441  16.6364    15.5443  15.6694
                          (0.0028) (0.0026)   (0.0032) (0.0033)   (0.0035) (0.0039)   (0.0039) (0.0042)

Table 4.1: Upper and lower price bounds (in each column pair: lower estimate left, upper estimate right) for different time discretizations and input approximations in the European and Bermudan case. Standard deviations are in brackets.

We apply $\Lambda_{out} = 10000$ and $\Lambda_{in} = 100$ samples in all cases. By and large, the table shows that in this 5-dimensional example extremely tight 95% confidence intervals can be computed by the primal-dual algorithm, although the input approximations are based on very few, but well chosen, basis functions. For the martingale basis algorithm as input approximation with just two basis functions and 100 regression paths, the relative error between lower and upper 95% confidence bound is about 0.7% even for $n = 160$ steps in the time discretization. It can be further decreased to less than 0.5% when seven basis functions and 1000 regression paths are applied. If one takes the input approximation of the Lemor-Gobet-Warin algorithm with the same set of basis functions, then the primal-dual algorithm can in principle produce confidence intervals of about the same length as in the case of the martingale basis algorithm. However, in our simulation study the number of regression paths must be increased by a factor of 1000 in order to obtain input approximations of the same quality as those computed by the martingale basis algorithm. Hence our numerical results demonstrate the huge variance reduction effect of the martingale basis algorithm. In the Bermudan option case, the primal-dual algorithm still yields 95% confidence intervals with a relative width of less than 1% for up to $n = 160$ time steps, when the input approximation is computed by the Lemor-Gobet-Warin algorithm with six basis functions and one million regression paths.

4.5 The case of a non-convex generator

In this section we drop the assumption that the generator $f$ is convex and merely assume that the standing assumptions are in force.
In this situation the construction of confidence bounds for $Y_0$ can be based on local approximations of $f$ by convex and concave generators.

4.5.1 Upper bounds

We first turn to the construction of upper bounds. For fixed $i=0,\dots,n-1$ we assume that some approximation $(\tilde Y_j, \tilde Z_j)_{j=i,\dots,n-1}$ of $(Y_j, E_j[\beta_{j+1}Y_{j+1}])_{j=i,\dots,n-1}$ is given. This approximation can be pre-computed by any algorithm. We merely assume that the approximation is adapted and satisfies
\[
\sum_{j=i}^{n-1} E\big[\,|\tilde Y_j| + |\alpha_j\cdot \tilde Z_j|\,\big] < \infty.
\]
The set of such admissible input approximations is denoted by $\mathcal A_i$. We now choose a measurable function $h_{up}:\Omega\times\{0,\dots,n\}\times\mathbb R\times\mathbb R^D\times\mathbb R\times\mathbb R^D\to\mathbb R$ with the following properties:

a) $h_{up}(\cdot,\tilde y,\tilde z;y,z)$ is adapted for every $(\tilde y,\tilde z),(y,z)\in\mathbb R\times\mathbb R^D$. Moreover, $h_{up}$ satisfies the stochastic Lipschitz condition
\[
|h_{up}(i,\tilde y,\tilde z;y,z)-h_{up}(i,\tilde y,\tilde z;y',z')| \le \alpha_i^{(0)}|y-y'| + \sum_{d=1}^D \alpha_i^{(d)}|z_d-z'_d|
\]
for every $(\tilde y,\tilde z),(y,z),(y',z')\in\mathbb R\times\mathbb R^D$ (with the same stochastic Lipschitz constants as $f$).

b) $h_{up}(\cdot,\tilde y,\tilde z;y,z)$ is convex in $(y,z)$, $h_{up}(\cdot,\tilde y,\tilde z;0,0)=0$ for every $(\tilde y,\tilde z)\in\mathbb R\times\mathbb R^D$, and
\[
h_{up}(i,\tilde y,\tilde z;\tilde y-y,\tilde z-z) \ge f(i,y,z)-f(i,\tilde y,\tilde z), \qquad \forall\,(\tilde y,\tilde z),(y,z)\in\mathbb R\times\mathbb R^D.
\]

Remark 4.5.1. Given $h_{up}$ and the approximation $(\tilde Y_i,\tilde Z_i)$ we can define a new generator
\[
f_{up}(i,y,z) := f(i,\tilde Y_i,\tilde Z_i) + h_{up}(i,\tilde Y_i,\tilde Z_i;\tilde Y_i-y,\tilde Z_i-z).
\]
Then $f_{up}(i,y,z)$ is convex in $(y,z)$ and dominates the original generator $f$, i.e. $f_{up}(i,y,z)\ge f(i,y,z)$. Moreover,
\[
E\big[|f_{up}(i,Y_i,E_i[\beta_{i+1}Y_{i+1}])-f(i,Y_i,E_i[\beta_{i+1}Y_{i+1}])|\big] \le 2E\big[\alpha_i^{(0)}|\tilde Y_i-Y_i| + |\alpha_i\cdot(\tilde Z_i-E_i[\beta_{i+1}Y_{i+1}])|\big],
\]
which shows that, evaluated at the true solution $(Y_i,E_i[\beta_{i+1}Y_{i+1}])$, the auxiliary generator $f_{up}$ approximates the true generator $f$ as the approximation $(\tilde Y,\tilde Z)$ approaches the true solution.

A generic choice is the function
\[
h_{|up|}(i,\tilde y,\tilde z;y,z) = \alpha_i^{(0)}|y| + \sum_{d=1}^D \alpha_i^{(d)}|z_d|,
\]
which obviously satisfies properties a) and b) above. We will illustrate in the numerical examples below that it might be beneficial to tailor the function $h_{up}$ to the specific problem instead of applying the generic choice $h_{|up|}$.

Given $h_{up}$ and $(\tilde Y,\tilde Z)$ we define $\Theta_i^{h_{up}} = \Theta_i^{h_{up}}(\tilde Y,\tilde Z)$ via
\[
\Theta_i^{h_{up}} = \max\Big\{S_i,\ \Theta_{i+1}^{h_{up}} - (\tilde Y_{i+1}-E_i[\tilde Y_{i+1}]) + f(i,\tilde Y_i,\tilde Z_i)\Delta_i + h_{up}\big(i,\tilde Y_i,\tilde Z_i;\ \tilde Y_i-\Theta_i^{h_{up}},\ \tilde Z_i-\beta_{i+1}\Theta_{i+1}^{h_{up}}+\beta_{i+1}\tilde Y_{i+1}-E_i[\beta_{i+1}\tilde Y_{i+1}]\big)\Delta_i\Big\} \tag{5.1}
\]
initiated at $\Theta_n^{h_{up}} = S_n$. We then obtain the following minimization problem with value process $Y_i$ in terms of $\Theta_i^{h_{up}}(\tilde Y,\tilde Z)$.

Theorem 4.5.2. For every $i=0,\dots,n$,
\[
Y_i = \operatorname{ess\,inf}_{(\tilde Y,\tilde Z)\in\mathcal A_i} E_i\big[\Theta_i^{h_{up}}(\tilde Y,\tilde Z)\big].
\]
Moreover, a minimizing pair is given by $(Y_j^*,Z_j^*) = (Y_j, E_j[\beta_{j+1}Y_{j+1}])$, which even satisfies the principle of pathwise optimality.
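Before giving the proof, note that $\Theta_i^{h_{up}}$ appears on both sides of (5.1) through the $y$-argument of $h_{up}$, so each backward step is an implicit equation. Since $h_{up}$ is Lipschitz in that argument with constant $\alpha_i^{(0)}$ and $\alpha_i^{(0)}\Delta_i < 1$ on sufficiently fine grids, the step is a contraction and can be solved by Picard iteration. A minimal sketch in our own notation (here `h_up_i` is any user-supplied function with properties a) and b), with $(i,\tilde Y_i,\tilde Z_i)$ already baked in):

```python
import numpy as np

def solve_theta_step(S_i, rhs_const, h_up_i, y_tilde_i, z_arg, dt_i,
                     tol=1e-12, max_iter=200):
    """Solve the implicit one-step equation (5.1) by fixed-point iteration.

    The step has the form
        theta = max(S_i, rhs_const + h_up_i(y_tilde_i - theta, z_arg) * dt_i),
    where rhs_const collects all terms of (5.1) not involving theta_i and
    z_arg is the (already known) z-argument of h_up. As long as
    alpha_i^(0) * dt_i < 1 the right-hand side is a contraction in theta
    (the max with S_i does not increase the Lipschitz constant), so the
    iteration converges geometrically.
    """
    theta = np.maximum(S_i, rhs_const)  # initial guess: drop the h_up term
    for _ in range(max_iter):
        new = np.maximum(S_i, rhs_const + h_up_i(y_tilde_i - theta, z_arg) * dt_i)
        if np.max(np.abs(new - theta)) < tol:
            break
        theta = new
    return theta
```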
Proof. We fix a pair of adapted and integrable processes $(\tilde Y,\tilde Z)$ and define $Y_j^{up}$, $j\ge i$, as
\[
Y_n^{up} = S_n, \qquad Y_j^{up} = \max\Big\{S_j,\ E_j[Y_{j+1}^{up}] + \big(f(j,\tilde Y_j,\tilde Z_j) + h_{up}(j,\tilde Y_j,\tilde Z_j;\ \tilde Y_j-Y_j^{up},\ \tilde Z_j-E_j[\beta_{j+1}Y_{j+1}^{up}])\big)\Delta_j\Big\},
\]
which satisfies $Y_i^{up}\ge Y_i$ by the comparison result in Proposition 4.2.3. Then an application of Theorem 4.3.1, with $Y_i$ replaced by $Y_i^{up}$, yields $E_i[\Theta_i^{h_{up}}(\tilde Y,\tilde Z)]\ge Y_i^{up}$. Hence
\[
Y_i \le \operatorname{ess\,inf}_{(\tilde Y,\tilde Z)\in\mathcal A_i} E_i\big[\Theta_i^{h_{up}}(\tilde Y,\tilde Z)\big].
\]
It now suffices to show that $Y_j = \Theta_j^{h_{up}}(Y, E_\cdot[\beta_{\cdot+1}Y_{\cdot+1}]) =: \Theta_j^{h_{up},*}$, $P$-almost surely for every $j = i,\dots,n$. This is certainly true for $j = n$. Going backwards in time we obtain by induction
\[
\Theta_j^{h_{up},*} = \max\Big\{S_j,\ Y_{j+1} - (Y_{j+1}-E_j[Y_{j+1}]) + f(j,Y_j,E_j[\beta_{j+1}Y_{j+1}])\Delta_j + h_{up}\big(j,Y_j,E_j[\beta_{j+1}Y_{j+1}];\ Y_j-\Theta_j^{h_{up},*},\ E_j[\beta_{j+1}Y_{j+1}]-\beta_{j+1}Y_{j+1}+\beta_{j+1}Y_{j+1}-E_j[\beta_{j+1}Y_{j+1}]\big)\Delta_j\Big\}
\]
\[
= \max\big\{S_j,\ E_j[Y_{j+1}] + \big(f(j,Y_j,E_j[\beta_{j+1}Y_{j+1}]) + h_{up}(j,Y_j,E_j[\beta_{j+1}Y_{j+1}];\ Y_j-\Theta_j^{h_{up},*},\ 0)\big)\Delta_j\big\}.
\]
As $h_{up}(j,Y_j,E_j[\beta_{j+1}Y_{j+1}];0,0) = 0$, we observe that $Y_j$ also solves the above equation. Hence, by uniqueness (due to the Lipschitz assumption on $h_{up}$), we obtain $Y_j = \Theta_j^{h_{up},*}$.

4.5.2 Lower bounds

A maximization problem with value process $Y_i$ can be constructed similarly by bounding $f$ from below by a concave generator. The main difference is that, in place of the results of Section 4.3, we now rely on the following result for the concave case, which is proved at the end of this section:

Theorem 4.5.3. Suppose $f$ is concave in $(y,z)$.

(i) For every $i=0,\dots,n$,
\[
Y_i = \operatorname{ess\,inf}_{M^0\in\mathcal M^1}\ \operatorname{ess\,inf}_{(r,\rho)\in\mathcal U_i((-f)^\#)} E_i\big[\vartheta_i^{up}(r,\rho,M^0)\big],
\]
where
\[
\vartheta_i^{up}(r,\rho,M^0) = \max_{k=i,\dots,n}\Big\{\Gamma_{i,k}(-r,-\rho)S_k + \sum_{j=i}^{k-1}\Gamma_{i,j}(-r,-\rho)\frac{(-f)^\#(j,r_j,\rho_j)\Delta_j}{1+r_j\Delta_j} - (M_k^0-M_i^0)\Big\}.
\]
Minimizers (even in the sense of pathwise optimality) are given by $(r_j^*,\rho_j^*)_{j\ge i}$ satisfying
\[
-r_j^* Y_j - \rho_j^{*,T} E_j[\beta_{j+1}Y_{j+1}] + (-f)^\#(j,r_j^*,\rho_j^*) = f(j,Y_j,E_j[\beta_{j+1}Y_{j+1}]) \tag{5.2}
\]
and $M^{0,*}$ being the martingale part of the Doob decomposition of $(Y_j\Gamma_{i,j}(-r^*,-\rho^*))_{j\ge i}$.

(ii) Given a stopping time $\tau\in\mathcal S_i$ and a martingale $(M^0,M)\in\mathcal M^{1+D}$, define $\vartheta_j^{low} = \vartheta_j^{low}(\tau,M^0,M)$ for $i\le j<\tau$ via
\[
\vartheta_\tau^{low} = S_\tau, \qquad \vartheta_j^{low} = \vartheta_{j+1}^{low} - (M_{j+1}^0-M_j^0) + f\big(j,\vartheta_j^{low},\ \beta_{j+1}\vartheta_{j+1}^{low} - (M_{j+1}-M_j)\big)\Delta_j.
\]
Then
\[
Y_i = \operatorname{ess\,sup}_{\tau\in\mathcal S_i}\ \operatorname{ess\,sup}_{(M^0,M)\in\mathcal M^{1+D}} E_i\big[\vartheta_i^{low}(\tau,M^0,M)\big].
\]
A maximizer (even in the sense of pathwise optimality) is given by the triplet $(\tau_i^*,M^{0,*},M^*)$, where $\tau_i^*$ was defined in (2.3) and $M^{0,*},M^*$ are the Doob martingales of $Y$ and $\beta Y$, respectively.

This result is not completely symmetric to the convex case, because the reflection at a lower barrier (i.e. the application of the maximum operator) is convex. Note that if $f$ is itself concave, then the upper and lower bounds from Theorem 4.5.3 are preferable to the upper bound of Theorem 4.5.2 and to the generic lower bounds which are constructed next. We denote by $h_{low}$ any mapping which satisfies the same properties as $h_{up}$, but with condition b) in Subsection 4.5.1 replaced by

b') $h_{low}(i,\tilde y,\tilde z;y,z)$ is concave in $(y,z)$, $h_{low}(i,\tilde y,\tilde z;0,0)=0$ for every $(\tilde y,\tilde z)\in\mathbb R\times\mathbb R^D$, and
\[
h_{low}(i,\tilde y,\tilde z;\tilde y-y,\tilde z-z) \le f(i,y,z)-f(i,\tilde y,\tilde z)
\]
for every $(\tilde y,\tilde z),(y,z)\in\mathbb R\times\mathbb R^D$.

The generic choice is now
\[
h_{|low|}(i,\tilde y,\tilde z;y,z) = -\alpha_i^{(0)}|y| - \sum_{d=1}^D \alpha_i^{(d)}|z_d|.
\]
Given $h_{low}$, a pair of adapted processes $(\tilde Y,\tilde Z)$ and a stopping time $\tau\in\mathcal S_0$, we define $\Theta_i^{h_{low}} = \Theta_i^{h_{low}}(\tilde Y,\tilde Z,\tau)$ via
\[
\Theta_i^{h_{low}} = \Theta_{i+1}^{h_{low}} - (\tilde Y_{i+1}-E_i[\tilde Y_{i+1}]) + f(i,\tilde Y_i,\tilde Z_i)\Delta_i + h_{low}\big(i,\tilde Y_i,\tilde Z_i;\ \tilde Y_i-\Theta_i^{h_{low}},\ \tilde Z_i-\beta_{i+1}\Theta_{i+1}^{h_{low}}+\beta_{i+1}\tilde Y_{i+1}-E_i[\beta_{i+1}\tilde Y_{i+1}]\big)\Delta_i \tag{5.3}
\]
for $i<\tau$, initiated at $\Theta_\tau^{h_{low}} = S_\tau$. Making use of Theorem 4.5.3 and the same arguments as in Theorem 4.5.2 we obtain the next theorem.

Theorem 4.5.4. For every $i=0,\dots,n$,
\[
Y_i = \operatorname{ess\,sup}_{\tau\in\mathcal S_i}\ \operatorname{ess\,sup}_{(\tilde Y,\tilde Z)\in\mathcal A_i} E_i\big[\Theta_i^{h_{low}}(\tilde Y,\tilde Z,\tau)\big].
\]
Moreover, a maximizing triplet is given by $(Y_j^*,Z_j^*,\tau^*) = (Y_j, E_j[\beta_{j+1}Y_{j+1}], \tau_i^*)$, which even satisfies the principle of pathwise optimality. (We recall that $\tau_i^*$ was defined in (2.3).)
Example 4.5.5. For the generic choices $h_{|up|}$ and $h_{|low|}$, we can apply Proposition 4.3.2 in order to make the recursion formulas in (5.1) and (5.3) explicit. They read
\[
\Theta_i^{h_{|up|}} = \max\bigg\{S_i,\ \sup_{r\in\{-\alpha_i^{(0)},\,\alpha_i^{(0)}\}} \frac{1}{1+r\Delta_i}\Big(\Theta_{i+1}^{h_{|up|}} - (\tilde Y_{i+1}-E_i[\tilde Y_{i+1}]) + f(i,\tilde Y_i,\tilde Z_i)\Delta_i + \Big[\alpha_i^{(0)}|\tilde Y_i| + \sum_{d=1}^D \alpha_i^{(d)}\big|\tilde Z_{d,i}-\beta_{d,i+1}\Theta_{i+1}^{h_{|up|}}+\beta_{d,i+1}\tilde Y_{i+1}-E_i[\beta_{d,i+1}\tilde Y_{i+1}]\big|\Big]\Delta_i\Big)\bigg\}
\]
and
\[
\Theta_i^{h_{|low|}} = \inf_{r\in\{-\alpha_i^{(0)},\,\alpha_i^{(0)}\}} \frac{1}{1+r\Delta_i}\Big(\Theta_{i+1}^{h_{|low|}} - (\tilde Y_{i+1}-E_i[\tilde Y_{i+1}]) + f(i,\tilde Y_i,\tilde Z_i)\Delta_i - \Big[\alpha_i^{(0)}|\tilde Y_i| + \sum_{d=1}^D \alpha_i^{(d)}\big|\tilde Z_{d,i}-\beta_{d,i+1}\Theta_{i+1}^{h_{|low|}}+\beta_{d,i+1}\tilde Y_{i+1}-E_i[\beta_{d,i+1}\tilde Y_{i+1}]\big|\Big]\Delta_i\Big).
\]
The main advantage of the corresponding upper and lower bounds is that they can be calculated generically, without any extra information on $f$ (such as the convex conjugates which were required in the section on convex generators). There is, however, a price to pay for this generic approach. Indeed, given the Lipschitz processes $\alpha_i^{(d)}$, the choice $h_{|up|}, h_{|low|}$ can be shown to lead to the crudest upper and lower bounds among all admissible functions $h_{up}, h_{low}$, i.e.
\[
E_i\big[\Theta_i^{h_{|up|}}(\tilde Y,\tilde Z)\big] \ge E_i\big[\Theta_i^{h_{up}}(\tilde Y,\tilde Z)\big]
\]
for every pair $(\tilde Y,\tilde Z)$, and analogously for the lower bounds. In practice, the generic bounds may be too crude when $D$ is large and the approximation $\tilde Z_i$ of $E_i[\beta_{i+1}Y_{i+1}]$ is not yet very good. In general we therefore recommend choosing the functions $h_{up}$ and $h_{low}$ in such a way that $h_{up}(i,\tilde Y_i,\tilde Z_i;y,z)$ and $h_{low}(i,\tilde Y_i,\tilde Z_i;y,z)$ are close to zero in a neighborhood of zero in the $(y,z)$-coordinates, in which one expects the residuals $(\tilde Y_i-Y_i,\ \tilde Z_i-E_i[\beta_{i+1}Y_{i+1}])$ to be typically located.
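One backward step of these explicit recursions can be sketched as follows (illustrative Python in our own notation, continuing the earlier snippets; the conditional expectations $E_i[\tilde Y_{i+1}]$ and $E_i[\beta_{d,i+1}\tilde Y_{i+1}]$ are assumed to have been estimated already, e.g. in closed form or from the inner sample).

```python
import numpy as np

def generic_theta_step(S_i, theta_next, y_tilde_i, y_next, Ey_next, z_tilde_i,
                       f_val, beta_next, Ebeta_y_next, alpha0, alpha, dt_i,
                       upper=True):
    """One step of the explicit recursion for Theta^{h_|up|} (upper=True) or
    Theta^{h_|low|} (upper=False); the reflection at S_i only enters the
    upper recursion.

    theta_next   : (n_paths,) values Theta_{i+1}
    y_tilde_i    : (n_paths,) tilde Y_i
    y_next, Ey_next : tilde Y_{i+1} and E_i[tilde Y_{i+1}]
    z_tilde_i    : (n_paths, D) tilde Z_i
    f_val        : (n_paths,) values f(i, tilde Y_i, tilde Z_i)
    beta_next    : (n_paths, D) increments beta_{d,i+1}
    Ebeta_y_next : (n_paths, D) estimates of E_i[beta_{d,i+1} tilde Y_{i+1}]
    alpha0, alpha: Lipschitz constants alpha_i^(0) and array (alpha_i^(d))_d
    """
    core = theta_next - (y_next - Ey_next) + f_val * dt_i
    z_term = np.abs(z_tilde_i - beta_next * theta_next[:, None]
                    + beta_next * y_next[:, None] - Ebeta_y_next)
    penalty = (alpha0 * np.abs(y_tilde_i) + (alpha * z_term).sum(axis=-1)) * dt_i
    num = core + penalty if upper else core - penalty
    # sup / inf over r in {-alpha0, +alpha0} of num / (1 + r * dt_i)
    over_r = np.stack([num / (1 + alpha0 * dt_i), num / (1 - alpha0 * dt_i)])
    if upper:
        return np.maximum(S_i, over_r.max(axis=0))
    return over_r.min(axis=0)
```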
We close this section with the proof of Theorem 4.5.3.

Proof. (i) To simplify the notation we denote
\[
A_{i,k}(r,\rho) := \Gamma_{i,k}(-r,-\rho)S_k + \sum_{j=i}^{k-1}\Gamma_{i,j}(-r,-\rho)\frac{(-f)^\#(j,r_j,\rho_j)\Delta_j}{1+r_j\Delta_j}.
\]
Then
\[
E_i[\vartheta_i^{up}(r,\rho,M^0)] = E_i\Big[\max_{k=i,\dots,n}\{A_{i,k}(r,\rho)-(M_k^0-M_i^0)\}\Big] \ge \operatorname{ess\,sup}_{\tau\in\mathcal S_i} E_i\big[A_{i,\tau}(r,\rho)-(M_\tau^0-M_i^0)\big] \ge \operatorname{ess\,sup}_{\tau\in\mathcal S_i} Y_i^{(\tau)} = Y_i, \tag{5.4}
\]
where $Y_i^{(\tau)}$ is defined in Proposition 2.2 and the last equality follows by Proposition 2.2. The second inequality in (5.4) holds by the following reasoning: let $\hat Y_j^\tau := E_j[A_{j,\tau}(r,\rho)]$; then $\hat Y_\tau^\tau = Y_\tau^{(\tau)}$. For all $j$ with $i\le j<\tau$ we have
\[
\hat Y_j^\tau = E_j\Big[\frac{1-\rho_j\beta_{j+1}\Delta_j}{1+r_j\Delta_j}\hat Y_{j+1}^\tau + \frac{(-f)^\#(j,r_j,\rho_j)\Delta_j}{1+r_j\Delta_j}\Big] = E_j[\hat Y_{j+1}^\tau] - \big(\rho_j E_j[\beta_{j+1}\hat Y_{j+1}^\tau] + r_j\hat Y_j^\tau - (-f)^\#(j,r_j,\rho_j)\big)\Delta_j \ge E_j[\hat Y_{j+1}^\tau] - (-f)\big(j,\hat Y_j^\tau, E_j[\beta_{j+1}\hat Y_{j+1}^\tau]\big)\Delta_j.
\]
Thus the inequality in (5.4) follows by Proposition 2.3. Now we only need to verify that
\[
Y_i = E_i[\vartheta_i^{up}(r^*,\rho^*,M^{0,*})]. \tag{5.5}
\]
This can be shown by backward induction, since $Y_n = S_n = E_n[\vartheta_n^{up}(r^*,\rho^*,M^{0,*})]$. Indeed, if (5.5) holds at $i+1$, then
\[
E_i[\vartheta_i^{up}(r^*,\rho^*,M^{0,*})] = E_i\Big[\max_{k=i,\dots,n}\Big\{A_{i,k}(r^*,\rho^*) - \sum_{l=i}^{k-1}(M_{l+1}^{0,*}-M_l^{0,*})\Big\}\Big]
\]
\[
= \max\Big\{S_i,\ \frac{1-\rho_i^*\beta_{i+1}\Delta_i}{1+r_i^*\Delta_i}Y_{i+1} + \frac{(-f)^\#(i,r_i^*,\rho_i^*)\Delta_i}{1+r_i^*\Delta_i} - \Big(\frac{1-\rho_i^*\beta_{i+1}\Delta_i}{1+r_i^*\Delta_i}Y_{i+1} - E_i\Big[\frac{1-\rho_i^*\beta_{i+1}\Delta_i}{1+r_i^*\Delta_i}Y_{i+1}\Big]\Big)\Big\}
\]
\[
= \max\Big\{S_i,\ E_i\Big[\frac{1-\rho_i^*\beta_{i+1}\Delta_i}{1+r_i^*\Delta_i}Y_{i+1}\Big] + \frac{(-f)^\#(i,r_i^*,\rho_i^*)\Delta_i}{1+r_i^*\Delta_i}\Big\}
\]
\[
= \max\Big\{S_i,\ E_i\Big[\frac{1-\rho_i^*\beta_{i+1}\Delta_i}{1+r_i^*\Delta_i}Y_{i+1}\Big] + \frac{f(i,Y_i,E_i[\beta_{i+1}Y_{i+1}]) + r_i^*Y_i + \rho_i^*E_i[\beta_{i+1}Y_{i+1}]}{1+r_i^*\Delta_i}\Delta_i\Big\}
\]
\[
= \max\Big\{S_i,\ \frac{E_i[Y_{i+1}] + f(i,Y_i,E_i[\beta_{i+1}Y_{i+1}])\Delta_i + r_i^*Y_i\Delta_i}{1+r_i^*\Delta_i}\Big\}.
\]
One can easily see that when $Y_i = S_i$, $E_i[\vartheta_i^{up}(r^*,\rho^*,M^{0,*})] = S_i$; when $Y_i = E_i[Y_{i+1}] + f(i,Y_i,E_i[\beta_{i+1}Y_{i+1}])\Delta_i > S_i$, $E_i[\vartheta_i^{up}(r^*,\rho^*,M^{0,*})] = Y_i$. Therefore part (i) is proved.

(ii) By the concavity of $f$, we have for $i\le j<\tau$
\[
E_j[\vartheta_j^{low}] \le E_j[\vartheta_{j+1}^{low}] + f\big(j, E_j[\vartheta_j^{low}], E_j[\beta_{j+1}\vartheta_{j+1}^{low}]\big)\Delta_j, \qquad \vartheta_\tau^{low} = S_\tau.
\]
Following Proposition 2.3, we know $E_j[\vartheta_j^{low}] \le Y_j^{(\tau)}$. Moreover, when $M^{0,*}, M^*$ are the Doob martingales of $Y$ and $\beta Y$, respectively, we claim that $\vartheta_j^{low}(\tau,M^{0,*},M^*) = Y_j^{(\tau)}$. This can be shown by backward induction on $j$, with the trivial initial case $\vartheta_\tau^{low,*} = Y_\tau^{(\tau)}$. Suppose the claim holds at $j+1$; then
\[
\vartheta_j^{low,*} = Y_{j+1}^{(\tau)} - \big(Y_{j+1}^{(\tau)} - E_j[Y_{j+1}^{(\tau)}]\big) + f\big(j, \vartheta_j^{low,*},\ \beta_{j+1}Y_{j+1}^{(\tau)} - (\beta_{j+1}Y_{j+1}^{(\tau)} - E_j[\beta_{j+1}Y_{j+1}^{(\tau)}])\big)\Delta_j = E_j[Y_{j+1}^{(\tau)}] + f\big(j,\vartheta_j^{low,*}, E_j[\beta_{j+1}Y_{j+1}^{(\tau)}]\big)\Delta_j.
\]
Hence $\vartheta^{low}(\tau,M^{0,*},M^*) = Y^{(\tau)}$ follows by the Lipschitz property of $f$ with respect to $y$. By Proposition 2.2 the assertion in this part holds.

4.5.3 Numerical examples

Once the functions $h_{up}$ and $h_{low}$ are chosen, an algorithm for computing confidence intervals for $Y_0$ based on Theorems 4.5.2 and 4.5.4 can be designed analogously to the primal-dual algorithm in Section 4.4.1 for the convex case. We first illustrate the algorithm in the context of Example 4.2.1 (ii). For the underlying, we choose the same five-dimensional geometric Brownian motion as in Section 4.4.3, except that $T=1$ and the drift and risk-free rate equal $R=0.02$. The payoff of the (European) claim is given by $G_n(x) = \min_{d=1,\dots,D} x_d$. For the default risk function $Q$, we assume that there are three regimes, high risk, intermediate risk and low risk: there are thresholds $v_h < v_l$ and rates $\gamma_h > \gamma_l$ such that $Q(y) = \gamma_h$ for $y < v_h$ and $Q(y) = \gamma_l$ for $y > v_l$. Over $[v_h,v_l]$, $Q$ interpolates linearly. The resulting generator $f$ is Lipschitz continuous but in general neither convex nor concave. The candidates for the Lipschitz constant $\alpha^{(0)}$ are the absolute values of the left and right derivatives of $f$ at $v_h$ and $v_l$. In the implementation, we stick to the generic choice
\[
-h_{|low|}(i,\tilde y;y) = h_{|up|}(i,\tilde y;y) = \alpha^{(0)}|y|,
\]
using that the nonlinearity is independent of the $Z$-part in this example. We choose $v_h = 54$, $v_l = 90$, $\gamma_h = 0.2$, $\gamma_l = 0.02$. For the calculation of $\tilde y$, we use the Lemor-Gobet-Warin algorithm with two basis functions, $1$ and $E[G_n(X_n)|X_i = x]$, and $\Lambda_{reg} = 100{,}000$. Moreover, $\Lambda_{out} = 4000$ and $\Lambda_{in} = 1000$. In the absence of default risk, the claim's value is 78.37.
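The intensity $Q$ is straightforward to code. The sketch below is our own illustration; in particular, the assembled generator assumes the form $f(i,y) = -Ry - (1-\delta)Q(y)y$ for Example 4.2.1 (ii), which is only our reading of that example (we note, as a sanity check, that this form reproduces the reported Lipschitz constants $\alpha^{(0)} = 0.41,\ 0.27,\ 0.12$ for $\delta = 0,\ 1/3,\ 2/3$ via the one-sided derivatives at the kinks).

```python
import numpy as np

V_H, V_L, GAMMA_H, GAMMA_L, R = 54.0, 90.0, 0.2, 0.02, 0.02

def Q(y):
    """Piecewise linear default intensity: gamma_h below v_h, gamma_l above
    v_l, linear interpolation on [v_h, v_l]."""
    t = np.clip((np.asarray(y, dtype=float) - V_H) / (V_L - V_H), 0.0, 1.0)
    return GAMMA_H + t * (GAMMA_L - GAMMA_H)

def f_credit(y, delta):
    """Hypothetical generator f(i, y) = -R*y - (1 - delta)*Q(y)*y; an
    assumption on the exact form of Example 4.2.1 (ii), used here only for
    illustration."""
    return -R * y - (1.0 - delta) * Q(y) * y

def alpha0(delta):
    """Candidates for alpha^(0): absolute one-sided derivatives of the
    assumed f at the kinks v_h and v_l; returns the largest one."""
    slope = (GAMMA_L - GAMMA_H) / (V_L - V_H)
    kinks = [GAMMA_H, GAMMA_H + slope * V_H, GAMMA_L + slope * V_L, GAMMA_L]
    return max(abs(-R - (1.0 - delta) * k) for k in kinks)
```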
Table 4.2 displays upper and lower price bounds for different time discretizations and recovery rates $\delta$. As expected, a smaller recovery rate leads to a smaller option value. The relative width of the confidence intervals is well below 0.5% in all cases. For the larger values of $\delta$, the bounds are even tighter: larger values of $\delta$ lead to less nonlinearity in the pricing problem and to smaller Lipschitz constants ($\alpha^{(0)} = 0.41,\ 0.27,\ 0.12$ for $\delta = 0,\ 1/3,\ 2/3$).

δ \ n       40                    80                    120                   160
0      71.6551  71.8589     71.6774  71.8828     71.6664  71.8656     71.6621  71.8659
       (0.0071) (0.0068)    (0.0072) (0.0068)    (0.0070) (0.0068)    (0.0069) (0.0072)
1/3    74.1023  74.2241     74.1010  74.2225     74.1032  74.2229     74.1187  74.2391
       (0.0062) (0.0060)    (0.0065) (0.0062)    (0.0062) (0.0061)    (0.0065) (0.0063)
2/3    76.3335  76.3865     76.3364  76.3886     76.3416  76.3943     76.3290  76.3814
       (0.0057) (0.0057)    (0.0057) (0.0057)    (0.0059) (0.0058)    (0.0061) (0.0059)

Table 4.2: Upper and lower price bounds (lower estimate left, upper estimate right in each column pair) for different recovery rates and time discretizations. Standard deviations are in brackets.

Compared to the example of Section 4.4.3, the bounds are much less dependent on the time discretization. This is due to the fact that no $Z$-part has to be approximated, as is the case for many BSDEs in the credit risk literature; see [32] and [55]. To sum up, the generic approach is perfectly sufficient in this example.

We finally revisit the example of Section 4.4.3. For the input approximation we run the martingale basis algorithm with seven basis functions for $Y$ and 1,000 regression paths, as specified there. The confidence bounds for the European call spread option on the maximum of five Black-Scholes stocks are calculated with $\Lambda_{in} = \Lambda_{out} = 1000$ paths, based on the following choices of $h_{low}$ and $h_{up}$. For the fully generic implementation we apply
\[
-h_{|low|}(i,\tilde y,\tilde z;y,z) = h_{|up|}(i,\tilde y,\tilde z;y,z) = R^b|y| + \frac{\max\{|R^b-\mu|,\,|R^l-\mu|\}}{\sigma}\sum_{d=1}^5 |z_d|.
\]
For the semi-generic implementation we choose
\[
h_{low}(i,\tilde y,\tilde z;y,z) = R^l y + \frac{\mu-R^l}{\sigma}\sum_{d=1}^5 z_d - (R^b-R^l)\Big(y-\sum_{d=1}^5\frac{z_d}{\sigma}\Big)^-,
\qquad
h_{up}(i,\tilde y,\tilde z;y,z) = R^l y + \frac{\mu-R^l}{\sigma}\sum_{d=1}^5 z_d + (R^b-R^l)\Big(y-\sum_{d=1}^5\frac{z_d}{\sigma}\Big)^+.
\]
This choice only partially exploits the structure of the generator. It can be applied to any generator which is a linear function of $(y,z)$ plus a nondecreasing $(R^b-R^l)$-Lipschitz continuous function of a linear combination of $(y,z)$. The specific form of the Lipschitz function is not used in this construction of $h_{low}$ and $h_{up}$, but, of course, the coefficients for the linear combinations must be adjusted to the generator in the obvious way. For this semi-generic case the pathwise recursion formulas for $\Theta^{h_{up}}$ and $\Theta^{h_{low}}$ can be made explicit in time analogously to the generic case, which was discussed in Example 4.5.5.
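Both choices are easy to code side by side; the sketch below (our own notation, reusing the market parameters from the earlier snippets) selects the upper variant with s = +1 and the lower variant with s = -1.

```python
import numpy as np

R_l, R_b, mu, sigma = 0.01, 0.06, 0.05, 0.2

def h_fully_generic(y, z, s):
    """Fully generic choice: s = +1 gives h_|up|, s = -1 gives h_|low|."""
    c = max(abs(R_b - mu), abs(R_l - mu)) / sigma
    return s * (R_b * np.abs(y) + c * np.abs(z).sum(axis=-1))

def h_semi_generic(y, z, s):
    """Semi-generic choice exploiting the linear-plus-Lipschitz structure of
    the two-rates generator: s = +1 gives h_up, s = -1 gives h_low."""
    lin = R_l * y + (mu - R_l) / sigma * z.sum(axis=-1)
    m = y - z.sum(axis=-1) / sigma
    if s > 0:
        return lin + (R_b - R_l) * np.maximum(m, 0.0)   # (.)^+ term of h_up
    return lin - (R_b - R_l) * np.maximum(-m, 0.0)      # (.)^- term of h_low
```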
Table 4.3 shows the resulting low-biased and high-biased estimates for the option price $Y_0$ as well as their empirical standard deviations.

Algorithm \ n     40                    80                    120                   160
fully generic  13.3604  14.1774    12.7905  14.7496    12.0148  15.8512    10.7872  17.5326
               (0.0132) (0.0169)   (0.0332) (0.0407)   (0.0612) (0.0834)   (0.1005) (0.1504)
semi-generic   13.7259  13.8505    13.6984  13.8801    13.6811  13.9136    13.6686  13.9459
               (0.0041) (0.0046)   (0.0053) (0.0059)   (0.0059) (0.0071)   (0.0065) (0.0078)

Table 4.3: Upper and lower price bounds (lower estimate left, upper estimate right in each column pair) for different time discretizations under the fully generic and semi-generic algorithms. Standard deviations are in brackets.

We observe that the generic bounds are not satisfactory in this example. The relative width of the 95% confidence intervals ranges from about 6.5% for $n = 40$ to more than 65% for $n = 160$ time steps. This can be explained by the fact that the approximation of $E_i[\beta_{i+1}Y_{i+1}]$ by $\tilde Z_i$ (which is expressed in terms of just two basis functions) is not yet good enough. The quality of $\tilde Z$ plays an all-important role for the generic bounds due to the appearance of the terms $\sum_{d=1}^5 |z_d|$ in the definitions of $h_{|low|}$ and $h_{|up|}$. In the semi-generic setting the expressions of the form $(y-\sum_{d=1}^5 z_d/\sigma)^\pm$ in $h_{up}$ and $h_{low}$ are much more favorable concerning the approximation error of $E_i[\beta_{i+1}Y_{i+1}]$ by $\tilde Z_i$. Therefore, the semi-generic implementation yields much better 95% confidence intervals, with a relative width of about 1% for $n = 40$ and still less than 2.5% for $n = 160$ time steps. By and large, this example shows that the generic bounds may be too crude if applied to good but not excellent approximations $(\tilde Y,\tilde Z)$, in particular when the $z$-variable of the generator is high-dimensional. Nonetheless, acceptable confidence intervals can still be obtained from the same approximation $(\tilde Y,\tilde Z)$ if some information about the generator is incorporated in the choice of $h_{up}$ and $h_{low}$.

Bibliography

[1] Alanko, S. and Avellaneda, M. (2013). Reducing Variance in the Numerical Solution of BSDEs. C. R. Math. Acad. Sci. Paris 351, 135-138.
[2] Andersen, L. and Broadie, M. (2004). A Primal-Dual Simulation Algorithm for Pricing Multidimensional American Options. Management Sci. 50, 1222-1234.
[3] Bally, V. and Pagès, G. (2003). A Quantization Algorithm for Solving Multidimensional Discrete-Time Optimal Stopping Problems. Bernoulli 9, 1003-1049.
[4] Bally, V., Pagès, G. and Printems, J. (2005). A Quantization Tree Method for Pricing and Hedging Multidimensional American Options. Math. Finance 15, 119-168.
[5] Barles, G. and Jakobsen, E. R. (2007). Error Bounds for Monotone Approximation Schemes for Parabolic Hamilton-Jacobi-Bellman Equations. Math. Comp. 76, 1861-1893.
[6] Barles, G. and Souganidis, P. E. (1991). Convergence of Approximation Schemes for Fully Nonlinear Second Order Equations. Asymptotic Anal. 4, 271-283.
[7] Belomestny, D. (2011). Pricing Bermudan Options by Nonparametric Regression: Optimal Rates of Convergence for Lower Estimates. Finance Stoch. 15, 655-683.
[8] Belomestny, D., Bender, C. and Schoenmakers, J. (2009). True Upper Bounds for Bermudan Products via Non-Nested Monte Carlo. Math. Finance 19, 53-71.
[9] Bender, C. and Denk, R. (2007). A Forward Scheme for Backward SDEs. Stoch. Process. Appl. 117, 1793-1812.
[10] Bender, C., Schweizer, N. and Zhuo, J. (2014). A Primal-Dual Algorithm for BSDEs. arXiv preprint 1310.3694.
[11] Bender, C. and Steiner, J. (2012). Least-Squares Monte Carlo for BSDEs. Numerical Methods in Finance, Springer, Carmona et al. (Eds.), 257-289.
[12] Bender, C. and Steiner, J. (2013). A-Posteriori Estimates for Backward SDEs. SIAM/ASA J. Uncertainty Quantification 1, 139-163.
[13] Bender, C. and Zhang, J. (2008). Time Discretization and Markovian Iteration for Coupled FBSDEs. Annals of Applied Probability 18, 143-177.
[14] Bergman, Y. Z. (1995). Option Pricing with Differential Interest Rates. Rev. Financ. Stud. 8, 475-500.
[15] Bonnans, J. F. and Zidani, H. (2003). Consistency of Generalized Finite Difference Schemes for the Stochastic HJB Equation. SIAM J. Numer. Anal. 41, 1008-1021.
[16] Briand, P., Delyon, B. and Mémin, J. (2001). Donsker-Type Theorem for BSDEs. Electron. Comm. Probab. 6, 1-14 (electronic).
[17] Briand, P. and Labart, C. Simulation of BSDEs by Wiener Chaos Expansion. Annals of Applied Probability, to appear.
[18] Bouchard, B. and Chassagneux, J.-F. (2008). Discrete-Time Approximation for Continuously and Discretely Reflected BSDEs. Stochastic Process. Appl. 118, 2269-2293.
[19] Bouchard, B. and Elie, R. (2008). Discrete-Time Approximation of Decoupled Forward-Backward SDE with Jumps. Stochastic Process. Appl. 118, 53-75.
[20] Bouchard, B. and Touzi, N. (2004). Discrete-Time Approximation and Monte-Carlo Simulation of Backward Stochastic Differential Equations. Stochastic Process. Appl. 111, 175-206.
[21] Bouchard, B. and Warin, X. (2012). Monte-Carlo Valuation of American Options: Facts and New Algorithms to Improve Existing Methods. Springer Proceedings in Mathematics 12, 215-255.
[22] Buckdahn, R., Ma, J. and Zhang, J. Pathwise Taylor Expansions for Random Fields on Multiple Dimensional Paths. Preprint, arXiv:1310.0517.
[23] Brown, D. B., Smith, J. E. and Sun, P. (2010). Information Relaxations and Duality in Stochastic Dynamic Programs. Oper. Res. 58, 785-801.
[24] Chassagneux, J.-F. and Richou, A. (2013). Numerical Simulation of Quadratic BSDEs. arXiv preprint 1307.5741.
[25] Cheridito, P., Kupper, M. and Vogelpoth, N. (2012). Conditional Analysis on R^d. arXiv preprint 1211.0747.
[26] Cheridito, P., Soner, H. M., Touzi, N. and Victoir, N. (2007). Second-Order Backward Stochastic Differential Equations and Fully Non-Linear Parabolic PDEs. Commun. Pur. Appl. Math. 60, 1081-1110.
[27] Cheridito, P. and Stadje, M. (2013). BSΔEs and BSDEs with Non-Lipschitz Drivers: Comparison, Convergence and Robustness. Bernoulli 19, 1047-1085.
[28] Clement, E., Lamberton, D. and Protter, P. (2002). An Analysis of a Least Squares Regression Method for American Option Pricing. Finance Stoch. 6, 449-471.
[29] Cohen, S. N. and Elliott, R. J. (2010). A General Theory of Finite State Backward Stochastic Difference Equations. Stochastic Process. Appl. 120, 442-466.
[30] Cont, R. and Fournie, D. (2013). Functional Itô Calculus and Stochastic Integral Representation of Martingales. Annals of Probability 41, 109-133.
[31] Crandall, M. G., Ishii, H. and Lions, P. L. (1992). User's Guide to Viscosity Solutions of Second Order Partial Differential Equations. Bull. Amer. Math. Soc. (N.S.) 27, 1-67.
[32] Crépey, S., Gerboud, R., Grbac, Z. and Ngor, N. (2013). Counterparty Risk and Funding: The Four Wings of the TVA. Int. J. Theor. Appl. Finance 16, 1350006.
[33] Crisan, D. and Manolarakis, K. (2012). Solving Backward Stochastic Differential Equations Using the Cubature Method: Application to Nonlinear Pricing. SIAM J. Financial Math., no. 1, 534-571.
[34] Cvitanic, J. and Zhang, J. (2005). The Steepest Descent Method for FBSDEs. Electronic Journal of Probability 10, 1468-1495.
[35] Delarue, F. and Menozzi, S. (2006). A Forward-Backward Stochastic Algorithm for Quasi-Linear PDEs. Ann. Appl. Probab. 16, 140-184.
[36] Desai, V. V., Farias, V. F. and Moallemi, C. C. (2012). Pathwise Optimization for Optimal Stopping Problems. Management Sci. 58, 2292-2308.
[37] Di Nunno, G., Øksendal, B. and Proske, F. (2009). Malliavin Calculus for Lévy Processes with Applications in Finance. Springer.
[38] Douglas, J., Ma, J. and Protter, P. (1996). Numerical Methods for Forward-Backward Stochastic Differential Equations. Ann. Appl. Probab. 6, 940-968.
[39] Duffie, D., Schroder, M. and Skiadas, C. (1996). Recursive Valuation of Defaultable Securities and the Timing of Resolution of Uncertainty. Ann. Appl. Probab. 6, 1075-1090.
[40] Dupire, B. (2009). Functional Itô Calculus. papers.ssrn.com.
[41] Ekren, I., Keller, C., Touzi, N. and Zhang, J. (2014). On Viscosity Solutions of Path Dependent PDEs. Annals of Probability 42, 204-236.
[42] Ekren, I., Touzi, N. and Zhang, J. Optimal Stopping under Nonlinear Expectation. Stochastic Processes and Their Applications, to appear.
[43] Ekren, I., Touzi, N. and Zhang, J. Viscosity Solutions of Fully Nonlinear Parabolic Path Dependent PDEs: Part I. Preprint, arXiv:1210.0006.
[44] Ekren, I., Touzi, N. and Zhang, J. Viscosity Solutions of Fully Nonlinear Parabolic Path Dependent PDEs: Part II. Preprint, arXiv:1210.0007.
[45] El Karoui, N., Peng, S. and Quenez, M. C. (1997). Backward Stochastic Differential Equations in Finance. Math. Finance 7, 1-71.
[46] Fahim, A., Touzi, N. and Warin, X. (2011). A Probabilistic Numerical Method for Fully Nonlinear Parabolic PDEs. Ann. Appl. Probab. 21, 1322-1364.
[47] Fleming, W. and Soner, H. M. (2006). Controlled Markov Processes and Viscosity Solutions. 2nd ed., Stochastic Modelling and Applied Probability 25. Springer, New York.
[48] Glasserman, P. and Yu, B. (2004). Number of Paths Versus Number of Basis Functions in American Option Pricing. Annals of Applied Probability 14, 2090-2119.
[49] Gobet, E. and Labart, C. (2007). Error Expansion for the Discretization of Backward Stochastic Differential Equations. Stochastic Process. Appl. 117, 803-829.
[50] Gobet, E., Lemor, J. and Warin, X. (2005). A Regression-Based Monte Carlo Method to Solve Backward Stochastic Differential Equations. Ann. Appl. Probab. 15, 2172-2202.
[51] Gobet, E. and Makhlouf, A. (2010). L_2-Time Regularity of BSDEs with Irregular Terminal Functions. Stochastic Process. Appl. 120, 1105-1132.
[52] Guo, W., Zhang, J. and Zhuo, J. A Monotone Scheme for High Dimensional Fully Nonlinear PDEs. Annals of Applied Probability, to appear.
[53] Guyon, J. and Henry-Labordère, P. (2011). Uncertain Volatility Model: A Monte Carlo Approach. Journal of Computational Finance. Published online 22 Feb 2011.
[54] Haugh, M. and Kogan, L. (2004). Pricing American Options: A Duality Approach. Oper. Res. 52, 258-270.
[55] Henry-Labordère, P. (2012). Cutting CVA's Complexity. Risk Magazine, July 2012, 67-73.
[56] Henry-Labordère, P., Tan, X. and Touzi, N. A Numerical Algorithm for a Class of BSDEs via Branching Process. Stochastic Processes and their Applications, to appear.
[57] Hu, Y., Nualart, D. and Song, X. (2011). Malliavin Calculus for Backward Stochastic Differential Equations and Application to Numerical Solutions. Annals of Applied Probability 21, 2053-2482.
[58] Johnson, H. (1987). Options on the Maximum or the Minimum of Several Assets. J. Financial Quant. Anal. 22, 277-283.
[59] Krylov, N. (1998). On the Rate of Convergence of Finite Difference Approximations for Bellman Equations. St. Petersburg Math. J. 9, 639-650.
[60] Ladyzenskaja, O. A., Solonnikov, V. A. and Ural'ceva, N. N. (1968). Linear and Quasilinear Equations of Parabolic Type. Amer. Math. Soc., Providence, R.I.
[61] Laurent, J.-P., Amzelek, P. and Bonnaud, J. (2012). An Overview of the Valuation of Collateralized Derivative Contracts. Working Paper, Université Paris I.
[62] Lemor, J.-P., Gobet, E. and Warin, X. (2006). Rate of Convergence of an Empirical Regression Method for Solving Generalized Backward Stochastic Differential Equations. Bernoulli 12, 889-916.
[63] Longstaff, F. A. and Schwartz, E. S. (2001). Valuing American Options by Simulation: A Simple Least-Squares Approach. Rev. Financial Studies 14, 113-147.
[64] Ma, J., Protter, P., San Martín, J. and Torres, S. (2002). Numerical Method for Backward Stochastic Differential Equations. Annals of Applied Probability 12, 302-316.
[65] Ma, J., Protter, P. and Yong, J. (1994). Solving Forward-Backward Stochastic Differential Equations Explicitly - A Four Step Scheme. Probab. Theory Relat. Fields 98, 339-359.
[66] Ma, J., Shen, J. and Zhao, Y. (2008). Numerical Method for Forward-Backward Stochastic Differential Equations. SIAM Journal on Numerical Analysis 46, 2636-2661.
[67] Ma, J. and Zhang, J. (2005). Representations and Regularities for Solutions to BSDEs with Reflections. Stochastic Process. Appl. 115, 539-569.
[68] Makarov, R. (2003). Numerical Solution of Quasilinear Parabolic Equations and Backward Stochastic Differential Equations. Russian J. Numer. Anal. Math. Modelling 19, 397-412.
[69] Milstein, G. N. and Tretyakov, M. V. (2007). Discretization of Forward-Backward Stochastic Differential Equations and Related Quasi-Linear Parabolic Equations. J. Numer. Anal. 27, 24-34.
[70] Nualart, D. (2006). The Malliavin Calculus and Related Topics. 2nd ed., Springer.
[71] Pallavicini, A., Perini, D. and Brigo, D. (2012). Funding, Collateral and Hedging: Uncovering the Mechanics and the Subtleties of Funding Valuation Adjustments. arXiv preprint 1210.3811.
[72] Pardoux, E. and Peng, S. (1992). Backward Stochastic Differential Equations and Quasilinear Parabolic Partial Differential Equations. Lecture Notes in CIS 176, Springer, 200-217.
[73] Pardoux, E. and Peng, S. (1990). Adapted Solutions of Backward Stochastic Differential Equations. Systems and Control Letters 14, 55-61.
[74] Peng, S. G-Brownian Motion and Dynamic Risk Measure under Volatility Uncertainty. Preprint, arXiv:0711.2834.
[75] Peng, S. (2010). Backward Stochastic Differential Equation, Nonlinear Expectation and Their Applications. Proceedings of the International Congress of Mathematicians, Hyderabad, India.
[76] Peng, S. (2010). Nonlinear Expectations and Stochastic Calculus under Uncertainty. Preprint, arXiv:1002.4546.
[77] Peng, S. and Xu, M. (2011). Numerical Algorithms for Backward Stochastic Differential Equations with 1-d Brownian Motion: Convergence and Simulations. ESAIM: Mathematical Modelling and Numerical Analysis 45, 335-360.
[78] Rockafellar, R. T. (1970). Convex Analysis. Princeton University Press.
[79] Rogers, L. C. G. (2002). Monte Carlo Valuation of American Options. Math. Finance 12, 271-286.
[80] Schoenmakers, J., Zhang, J. and Huang, J. (2013). Optimal Dual Martingales, Their Analysis, and Application to New Algorithms for Bermudan Products. SIAM J. Financial Math. 4, 86-116.
[81] Soner, H. M., Touzi, N. and Zhang, J. (2012). Well-Posedness of Second Order Backward SDEs. Probability Theory and Related Fields 153, 149-190.
[82] Tan, X. (2011). Discrete-Time Probabilistic Approximation of Path-Dependent Stochastic Control Problems. Annals of Applied Probability, to appear.
[83] Tan, X. (2013). A Splitting Method for Fully Nonlinear Degenerate Parabolic PDEs. Electron. J. Probab. 18, 1-24.
[84] Yong, J. and Zhou, X. (1999). Stochastic Control: Hamiltonian Systems and HJB Equations. Applications of Mathematics: Stochastic Modelling and Applied Probability, Vol. 43, Springer-Verlag.
[85] Zhang, G., Gunzburger, M. and Zhao, W. (2013). A Sparse-Grid Method for Multi-Dimensional Backward Stochastic Differential Equations. J. Comput. Math. 31, 221-248.
[86] Zhang, J. (2004). A Numerical Scheme for Backward Stochastic Differential Equations. Annals of Applied Probability 14, 459-488.
[87] Zhang, J. and Zhuo, J. (2014). Monotone Schemes for Fully Nonlinear Parabolic Path Dependent PDEs. Journal of Financial Engineering, to appear.
Abstract
In this dissertation we discuss three topics, starting with a monotone scheme for high dimensional fully nonlinear PDEs, which include the quasi-linear PDE that corresponds to a coupled FBSDE as a special case. This work is strongly motivated by the remarkable work of Fahim, Touzi and Warin, and stays in the paradigm of monotone schemes initiated by Barles and Souganidis. It weakens a critical constraint imposed in the previous literature, especially when the generator of the PDE depends only on the diagonal terms of the Hessian matrix. Several numerical examples, up to dimension 12, are reported.

The second part is dedicated to monotone schemes for fully nonlinear path-dependent PDEs. This is an extension of the seminal PDE work of Barles and Souganidis to the path-dependent case. Based on the viscosity theory of path-dependent PDEs, developed by Ekren, Keller, Touzi and Zhang, we show that a monotone scheme converges to the unique viscosity solution of the (fully nonlinear) parabolic path-dependent PDE. An implementable example of such a monotone scheme is proposed and tested with a 1-dimensional example. Moreover, in the case that the solution is smooth enough, we obtain the rate of convergence of our scheme.

Finally, we propose an extension of the primal-dual algorithm, which is popular in the pricing of early-exercise options, to a backward dynamic programming equation associated with time discretization schemes of (reflected) BSDEs. This is a collaborative project with Bender and Schweizer. Taking as an input some approximate solution of the backward dynamic program, which was precomputed, e.g., by least-squares Monte Carlo, our methodology allows us to construct a confidence interval for the unknown true solution of the time-discretized (reflected) BSDE at time 0. We numerically demonstrate the practical applicability of our method in two five-dimensional nonlinear pricing problems where tight price bounds were previously unavailable.

This dissertation is a collection of the three publications completed during my Ph.D. study.