USC Digital Library / University of Southern California Dissertations and Theses / Theoretical foundations of approximate Bayesian computation (USC Thesis)
THEORETICAL FOUNDATIONS OF APPROXIMATE BAYESIAN COMPUTATION

by

Alexander Stram

A Thesis Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
MASTER OF ARTS
(MATHEMATICS)

May 2015

Copyright 2015 Alexander Stram

Contents

Abstract

0 Preliminary definitions, theorems and notation
  0.1 Probability spaces and random variables
  0.2 Common probability distributions
  0.3 Convergence in Distribution
  0.4 Convergence in Probability
  0.5 Convergence Almost Surely
  0.6 Strong Law of Large Numbers
  0.7 Central Limit Theorem
  0.8 Bayes' Theorem
  0.9 Markov chains
  0.10 Big-O Notation

1 Monte Carlo Estimates
  1.1 Numerical Integration and the Curse of Dimensionality
  1.2 Monte Carlo Integration
  1.3 Rejection Sampling
  1.4 Markov chain Monte Carlo
  1.5 Importance Sampling
  1.6 Sampling/Importance Resampling

2 Sequential Monte Carlo Estimates
  2.1 Sequential Importance Sampling
  2.2 Sequential Importance Resampling

3 Approximate Bayesian Computation
  3.1 Point Estimates of Likelihood Functions
  3.2 ABC Rejection Sampling
  3.3 ABC Markov chain Monte Carlo
  3.4 ABC Importance Sampling and Sampling/Importance Resampling
  3.5 ABC Sequential Monte Carlo

4 Considerations for Approximate Bayesian Computation - Sequential Monte Carlo
  4.1 Markov kernels
  4.2 Epsilon schedules
  4.3 Distance metrics

Reference List

Abstract

We introduce Monte Carlo estimates with a discussion of numerical integration and the curse of dimensionality, using a toy example of estimating $\pi$ with a $d$-dimensional hyper-sphere embedded in a $d$-dimensional unit hyper-cube. Some common Monte Carlo methods such as Rejection Sampling, Markov chain Monte Carlo, Importance Sampling, and Sequential Monte Carlo are then discussed. We follow with an introduction to a type of Bayesian inference known as Approximate Bayesian Computation (ABC), and apply each of our discussed Monte Carlo algorithms in ABC terms, ultimately arguing that Sequential Monte Carlo methods are best suited for ABC.

Chapter 0

Preliminary definitions, theorems and notation

This chapter serves as a reference for definitions, theorems and notation that will be frequently referred to in the course of this paper.

0.1 Probability spaces and random variables

When studying the respective probabilities of a number of possible outcomes occurring, we denote by $\Omega$ a non-empty set of all possible outcomes, and call $\Omega$ the sample space.
We refer to zero or more outcomes in $\Omega$ occurring as an event, and consider the $\sigma$-algebra $\mathcal{F}$ of $\Omega$, sometimes denoted $\sigma(\Omega)$, the smallest collection of subsets of $\Omega$ such that:

1. $\mathcal{F}$ contains $\Omega$: $\Omega \in \mathcal{F}$
2. $\mathcal{F}$ is closed under complements: $\omega \in \mathcal{F} \implies \omega^C \in \mathcal{F}$
3. $\mathcal{F}$ is closed under countable unions: $\omega_i \in \mathcal{F}$ for $i = 1, 2, 3, \dots \implies \bigcup_i \omega_i \in \mathcal{F}$.

We refer to $\mathcal{F}$ as the set of events, and say that $(\Omega, \mathcal{F})$ is a measurable space. We then introduce a probability measure $P: \mathcal{F} \to [0,1]$, a function such that:

1. The measure of the entire sample space, $\Omega$, is equal to 1: $P(\Omega) = 1$
2. $P$ is countably additive: for $\omega_1, \omega_2, \dots \in \mathcal{F}$ pairwise disjoint, we have $P(\bigcup_i \omega_i) = \sum_i P(\omega_i)$.

For some event $A \in \mathcal{F}$, we say that event $A$ occurs with probability equal to $P(A)$. We then refer to $(\Omega, \mathcal{F}, P)$ as a probability space.

For another measurable space $(\mathcal{X}, \mathcal{G})$, we say that some function $X: \Omega \to \mathcal{X}$ is a random variable if $X$ is a measurable function from $\Omega$ to $\mathcal{X}$. That is, for every $E \in \mathcal{G}$, the preimage of $E$ under $X$ is in $\mathcal{F}$:

$$\{\omega \in \Omega \mid X(\omega) \in E\} \in \mathcal{F} \quad \forall E \in \mathcal{G}$$

Furthermore, for some measurable space $(\Upsilon, \mathcal{H})$ and measurable function $g: \mathcal{X} \to \Upsilon$, we should note that $g(X): \Omega \to \Upsilon$ is also a measurable function, and thus can be considered a random variable itself.

When referring to the probability of $X$ assuming some binary relation $R$ with some $x \in \mathcal{X}$, we define:

$$P(X \mathbin{R} x) = P(\{\omega \in \Omega \mid X(\omega) \mathbin{R} x\})$$

Two random variables, $X: \Omega \to \mathcal{X}$ and $Y: \Omega \to \mathcal{X}$, are said to be independent if and only if, for every $x, y \in \mathcal{X}$, the events $\{X \mathbin{R} x\}$ and $\{Y \mathbin{R} y\}$ satisfy:

$$P(X \mathbin{R} x \cap Y \mathbin{R} y) = P(X \mathbin{R} x)\, P(Y \mathbin{R} y)$$

We define the cumulative distribution function of random variable $X$ as

$$P_X(x) = P(X \le x)$$

If for all $x$ we can write the cumulative distribution function of random variable $X$ as

$$P_X(x) = \sum_{x_i \le x} P(X = x_i)$$

we then say that $X$ is a discrete random variable, with probability mass function $p_X(x) = P(X = x)$. The ability to write $P_X(x) = \sum_{x_i \le x} P(X = x_i)$ for all $x$ implies that the range $\mathcal{X}$ is countable.
If $\mathcal{X}$ is uncountable and $P_X(x)$ is a continuous function for all $x$, we say that $X$ is a continuous random variable. If $\mathcal{X}$ is uncountable and there exists a non-negative probability density function $p_X$ such that

$$P_X(a) = \int_{-\infty}^{a} p_X(x)\, dx$$

we say that $X$ is absolutely continuous [Shi95, p. 171]. For the sake of this paper, we will assume that any continuous random variable $X: \Omega \to \mathcal{X}$ is such that $\mathcal{X} \subseteq \mathbb{R}$. That is, we assume that all continuous random variables are real-valued.

For probability space $(\Omega, \mathcal{F}, P)$ and measurable space $(\mathcal{X}, \mathcal{G})$, we consider random variables $X_i: \Omega \to \mathcal{X}$ for $i = 1, 2, \dots, d$, referring to the $d$-tuple $\mathbf{X} = (X_1, X_2, \dots, X_d)$ as a multivariate random variable. For absolutely continuous random variables, we have an analogous multivariate cumulative distribution and multivariate probability density function with the relationship:

$$P_{\mathbf{X}}(a_1, a_2, \dots, a_d) = \int_{[-\infty, a_1] \times [-\infty, a_2] \times \cdots \times [-\infty, a_d]} p_{\mathbf{X}}(z_1, z_2, \dots, z_d)\, d(z_1, z_2, \dots, z_d)$$

which can trivially be adapted for the discrete case.

For some probability space $(\Omega, \mathcal{F}, P)$ and measurable space $(\mathcal{X}, \mathcal{G})$, we consider some function $g$ of continuous random variable $X: \Omega \to \mathcal{X}$ with probability density function $p_X$. We say that the expected value of $g(X)$ is

$$E[g(X)] = \int_{\mathcal{X}} g(x)\, p_X(x)\, dx. \quad (1)$$

This is trivially adapted for the discrete case. If $g(x) = x$, we refer to $E[g(X)] = E[X]$ as the mean of $X$, denoted by $\mu$. If $g(x) = (x - \mu)^2$, we refer to $E[g(X)] = E[(X - \mu)^2]$ as the variance of $X$, denoted by $\sigma^2$. If $Y$ is a random variable with mean $\nu$, we define the covariance of $X$ and $Y$ as $E[(X - \mu)(Y - \nu)]$. In the case of $n$ random variables, $X_1, X_2, \dots, X_n$, with respective means $\mu_1, \mu_2, \dots, \mu_n$, we define an $n \times n$ covariance matrix $\Sigma$, where the $(i,j)$-th cell of $\Sigma$ is equal to $E[(X_i - \mu_i)(X_j - \mu_j)]$.
0.2 Common probability distributions

When describing an absolutely continuous random variable $X$ with probability density function $p_X$, or a discrete random variable $X$ with probability mass function $p_X$, we make this association by writing $X \sim p_X(\cdot)$, and say that $X$ is generated or simulated from $p_X$. In this paper, we will refer to random variables from three probability distributions: the uniform distribution, the normal distribution, and the negative binomial distribution.

Uniform distribution

For probability space $(\Omega, \mathcal{F}, P)$ and measurable space $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$, we designate the letter $U$ to represent an absolutely continuous random variable $U: \Omega \to \mathbb{R}$ with the probability density function:

$$p_U(u) = \begin{cases} \frac{1}{b-a} & : a \le u \le b \\ 0 & : \text{otherwise} \end{cases}$$

We will call $U$ a continuous Uniform random variable on the interval $[a,b]$, and use the notation $U \sim \mathrm{Unif}[a,b]$. For the multivariate case where $\mathbf{U}$ is composed of $d$ independent and identically distributed $\mathrm{Unif}[a,b]$ random variables, we write $\mathbf{U} \sim \mathrm{Unif}[a,b]^d$.
Normal distribution

For some probability space $(\Omega, \mathcal{F}, P)$ and measurable space $(\mathbb{R}, \mathcal{G})$, we say that some absolutely continuous random variable $Z: \Omega \to \mathbb{R}$ has the normal distribution with mean $\mu$ and variance $\sigma^2$ if for some $\mu, \sigma^2 \in \mathbb{R}$, $Z \sim N(\mu, \sigma^2)$ has the probability density function:

$$p_Z(z) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(z-\mu)^2}{2\sigma^2}}$$

When $\mathbf{Z}$ is composed of $d$ normally distributed random variables $Z_1, Z_2, \dots, Z_d$ with respective means $u_1, u_2, \dots, u_d$ and non-singular covariance matrix $\Sigma$, we say that $\mathbf{Z}$ has a multivariate normal distribution with mean vector $\mathbf{u} = (u_1, u_2, \dots, u_d)$ and covariance matrix $\Sigma$, and use the shorthand notation $\mathbf{Z} \sim N(\mathbf{u}, \Sigma)$. The probability density function of some $\mathbf{Z} \sim N(\mathbf{u}, \Sigma)$ is:

$$p_{\mathbf{Z}}(\mathbf{z}) = \frac{1}{\sqrt{(2\pi)^d |\Sigma|}}\, e^{-\frac{1}{2} (\mathbf{z} - \mathbf{u})^T \Sigma^{-1} (\mathbf{z} - \mathbf{u})}$$

Negative binomial distribution

For some probability space $(\Omega, \mathcal{F}, P)$ and measurable space $(\mathcal{X}, \mathcal{G})$, we say that some discrete random variable $N: \Omega \to \mathcal{X}$, the number of independent and identical trials necessary to obtain $r$ successes given trial success probability $p$, has the negative binomial distribution. For some integer $r$ and $p \in [0,1]$, $N$ has the probability mass function:

$$p_N(n) = \binom{n-1}{r-1} (1-p)^{n-r} p^r$$

We denote this $N \sim \mathrm{NB}(r, p)$.

0.3 Convergence in Distribution

For probability space $(\Omega, \mathcal{F}, P)$, measurable space $(\mathcal{X}, \mathcal{G})$ and a sequence of random variables $X_1, X_2, \dots$ such that $X_i: \Omega \to \mathcal{X}$, with respective cumulative distribution functions $P_{X_i}$, we say that the sequence $X_1, X_2, \dots$ converges in distribution to a random variable $X: \Omega \to \mathcal{X}$ with distribution function $P_X$ if, at every $x \in \mathcal{X}$ where $P_X$ is continuous,

$$\lim_{n \to \infty} P_{X_n}(x) = P_X(x)$$

We denote this $X_n \xrightarrow{D} X$. Convergence in distribution is regarded as the weakest type of convergence of random variables. A stronger version is convergence in probability.
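As an illustrative aside (not part of the original text), the trial-counting description of the negative binomial distribution can be simulated directly; the function name below is our own, and the sketch simply checks the simulated mean against the known value $E[N] = r/p$:

```python
import random
import statistics

def negative_binomial_draw(r, p, rng):
    """Count independent Bernoulli(p) trials until the r-th success,
    so the returned count is distributed NB(r, p)."""
    trials, successes = 0, 0
    while successes < r:
        trials += 1
        if rng.random() < p:
            successes += 1
    return trials

rng = random.Random(0)
draws = [negative_binomial_draw(3, 0.5, rng) for _ in range(20000)]
mean_draws = statistics.mean(draws)
print(mean_draws)  # should be near E[N] = r/p = 6 for r = 3, p = 0.5
```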
0.4 Convergence in Probability

For probability space $(\Omega, \mathcal{F}, P)$, measurable space $(\mathcal{X}, \mathcal{G})$ and a sequence of continuous random variables $X_1, X_2, \dots$ such that $X_i: \Omega \to \mathcal{X}$, we say that the sequence $X_1, X_2, \dots$ converges in probability to a continuous random variable $X: \Omega \to \mathcal{X}$ if for all $\epsilon > 0$,

$$\lim_{n \to \infty} P(|X_n - X| \ge \epsilon) = 0$$

We denote this $X_n \xrightarrow{P} X$. Convergence in probability implies convergence in distribution.

0.5 Convergence Almost Surely

For probability space $(\Omega, \mathcal{F}, P)$, measurable space $(\mathcal{X}, \mathcal{G})$ and a sequence of continuous random variables $X_1, X_2, \dots$ such that $X_i: \Omega \to \mathcal{X}$, we say that the sequence $X_1, X_2, \dots$ converges almost surely to a continuous random variable $X: \Omega \to \mathcal{X}$ if

$$P\left(\lim_{n \to \infty} X_n = X\right) = 1$$

We denote this $X_n \xrightarrow{a.s.} X$. Convergence almost surely implies convergence in probability.

0.6 Strong Law of Large Numbers

When $X_1, X_2, \dots, X_N$ is a sequence of independent and identically distributed random variables such that for all $X_i: \Omega \to \mathcal{X}$, $E[X_i] = \mu < \infty$, we have that, as $N \to \infty$:

$$\frac{1}{N} \sum_{i=1}^{N} X_i \xrightarrow{a.s.} \mu \quad (2)$$

For some measurable function $g: \mathcal{X} \to \Upsilon$, we can consider each $g(X_i)$ a random variable. If $\mu_g = E[g(X_i)] < \infty$, then as $N \to \infty$:

$$\frac{1}{N} \sum_{i=1}^{N} g(X_i) \xrightarrow{a.s.} \mu_g \quad (3)$$

0.7 Central Limit Theorem

If we have a sequence of random variables $X_1, X_2, \dots$ that are independent and identically distributed, with $E[X_i] = \mu$ and $E[(X_i - \mu)^2] = \sigma^2$, then

$$\sqrt{n}\left(\frac{1}{n} \sum_{i=1}^{n} X_i - \mu\right) \xrightarrow{D} N(0, \sigma^2)$$

0.8 Bayes' Theorem

For some probability space $(\Omega, \mathcal{F}, P)$, we may pick two events, $A, B \in \mathcal{F}$, each of which occurs with non-zero probability ($P(A) > 0$ and $P(B) > 0$). If we denote the probability of both events $A$ and $B$ occurring $P(A \cap B)$, then we denote the conditional probability of event $A$ occurring given that event $B$ has occurred $P(A|B)$, and define

$$P(A|B) = \frac{P(A \cap B)}{P(B)};$$

we then have that

$$P(A|B)\, P(B) = P(A \cap B) = P(B \cap A) = P(B|A)\, P(A),$$

and it follows that

$$P(A|B) = \frac{P(B|A)\, P(A)}{P(B)}. \quad (4)$$

Equation (4) is known as Bayes' Theorem.
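The Strong Law of Large Numbers and the Central Limit Theorem can both be observed empirically; the following sketch (our addition, with Unif[0,1] chosen only for convenience) checks that the running mean approaches $\mu = 1/2$ and that the scaled deviation $\sqrt{N}(\bar{X} - \mu)$ has spread close to $\sigma = \sqrt{1/12}$:

```python
import random
import statistics

rng = random.Random(1)

# Strong Law of Large Numbers: the mean of many iid Unif[0,1] draws
# approaches the true mean 1/2.
draws = [rng.random() for _ in range(100000)]
running_mean = statistics.mean(draws)
print(running_mean)  # close to 0.5

# Central Limit Theorem: sqrt(n) * (sample mean - mu) has standard
# deviation close to sigma = sqrt(1/12) for Unif[0,1].
n, reps = 500, 2000
scaled = []
for _ in range(reps):
    m = statistics.mean(rng.random() for _ in range(n))
    scaled.append((n ** 0.5) * (m - 0.5))
scaled_sd = statistics.stdev(scaled)
print(scaled_sd)  # close to sqrt(1/12), about 0.2887
```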
Oftentimes, the denominator of the right-hand side of equation (4) can be considered a normalizing constant, in which case we may ignore $P(B)$ and instead say that $P(A|B)$ is proportional to $P(B|A)\, P(A)$. That is,

$$P(A|B) \propto P(B|A)\, P(A). \quad (5)$$

0.9 Markov chains

We will discuss Markov chains with respect to absolutely continuous random variables. Adaptations can be made for the simpler case of discrete random variables. For probability space $(\Omega, \mathcal{F}, P)$ and measurable space $(\mathcal{X}, \mathcal{G})$, consider a sequence of random variables $X_0, X_1, \dots$ such that $X_i: \Omega \to \mathcal{X}$.

Markov chain

Let $\mathcal{G}_n$ be the $\sigma$-algebra generated by $\{X_0, X_1, \dots, X_n\}$. If for all $i$,

$$P(X_{i+1} \in A \mid \mathcal{G}_i) = P(X_{i+1} \in A \mid X_i),$$

we call the sequence $X_0, X_1, \dots$ a Markov chain, denoted $(X_n)$ [CR04, p. 209].

Markov kernel

A mapping $K: \mathcal{X} \times \mathcal{G} \to [0,1]$ is a Markov kernel [CR04, p. 208] if

1. for all $x$ in $\mathcal{X}$, $K(x, \cdot)$ is a probability measure
2. for all sets $A$ in $\mathcal{G}$, $K(\cdot, A)$ is measurable

Markov chain $(X_n)$ has Markov kernel $K$ if for all $i$,

$$P(X_{i+1} \in A \mid X_i = x_i) = \int_A K(x_i, dx)$$

We denote by $K^n$ the kernel for $n$ transitions, with

$$P(X_{i+n} \in A \mid X_i = x_i) = \int_A K^n(x_i, dx)$$

and define

$$K^n(x, A) = \int_{\mathcal{X}} K^{n-1}(y, A)\, K(x, dy)$$

where $K^1 = K$ [CR04, p. 210].

Stationary distribution

We call probability distribution $\pi$ the stationary distribution [CR04, p. 223] of Markov chain $(X_n)$ with Markov kernel $K$ if for all $B \in \mathcal{G}$,

$$\int_B \pi(x)\, dx = \int_{\mathcal{X}} K(x, B)\, \pi(x)\, dx$$

Irreducibility

Given measure $\psi$, Markov chain $(X_n)$ is $\psi$-irreducible if for every $A \in \mathcal{G}$ such that $\psi(A) > 0$, there exists a finite $n$ such that $K^n(x, A) > 0$ for all $x \in \mathcal{X}$.

Recurrence

A Markov chain $(X_n)$ is recurrent [CR04, p. 220] if

1. there exists a measure $\psi$ such that $(X_n)$ is $\psi$-irreducible, and
2. for every $A \in \mathcal{G}$ such that $\psi(A) > 0$, the expected number of future visits to $A$ from every $x \in A$ is infinite.

If Markov chain $(X_n)$ is irreducible and has stationary distribution $\pi$, the Markov chain is positive recurrent.
If Markov chain $(X_n)$ is irreducible without a stationary distribution, the Markov chain is null recurrent.

Harris recurrence

A Markov chain $(X_n)$ is Harris recurrent if

1. there exists a measure $\psi$ such that $(X_n)$ is $\psi$-irreducible, and
2. for every $A \in \mathcal{G}$ such that $\psi(A) > 0$, the number of future visits to $A$ from every $x \in A$ is infinite with probability 1.

If Markov chain $(X_n)$'s only bounded harmonic functions¹ are constant, then the chain is Harris recurrent [CR04, p. 241].

Aperiodicity

A $\psi$-irreducible Markov chain $(X_n)$ has a cycle of length $d$ if there exists a small set $C$, associated integer $M$, and probability measure $\nu_M$ such that

$$d = \gcd\{m \ge 1 : \exists\, \delta_m > 0 \text{ such that } C \text{ is small for } \nu_m \ge \delta_m \nu_M\}.$$

A Markov chain $(X_n)$ is aperiodic if it has cycle of length $d = 1$ [CR04, p. 220].

Ergodic Markov chain

A Markov chain is called ergodic if it is positive recurrent and aperiodic.

Geometrically ergodic Markov chain

An ergodic Markov chain is geometrically ergodic if it has a stationary distribution $\pi$ and there exists a real-valued function $M$ with $\pi(|M(x)|) < \infty$ and some $r$ with $0 < r < 1$ such that

$$\sup_{A \in \mathcal{G}} \left| K^n(x, A) - \int_A \pi(y)\, dy \right| \le M(x)\, r^n$$

for all $x \in \mathcal{X}$ [Tie94, p. 1713].

¹ A measurable function $h$ is harmonic for Markov chain $(X_n)$ if $E[h(X_{n+1}) \mid x_n] = h(x_n)$ [CR04, p. 240].

0.10 Big-O Notation

When characterizing the asymptotic behavior of a function or the error of an approximation, we may wish to use Big-O notation [Knu97, p. 104]. For functions $f$ and $g$ on $\Omega \subseteq \mathbb{R}$, if there exist some $M > 0$ and $x_0$ such that for all $x > x_0$ we have

$$|f(x)| \le M\, |g(x)|,$$

we say that $f$ is of order $g(x)$ as $x$ tends to infinity, using the notation $O(g(x))$ as shorthand and disregarding the actual values of the constants $M$ and $x_0$.
Chapter 1

Monte Carlo Estimates

1.1 Numerical Integration and the Curse of Dimensionality

For an absolutely continuous random variable $X: \Omega \to \mathcal{X}$, if $\mathcal{X} \subseteq \mathbb{R}$, we characterize $X$ with a cumulative distribution function $P_X: \mathbb{R} \to [0,1]$ and density function $p_X: \mathbb{R} \to \mathbb{R}_+$, with the relation:

$$P_X(a) = \int_{-\infty}^{a} p_X(x)\, dx \quad (1.1)$$

When characterizing $X$, we may then be interested in probabilities such as $P(a < X \le b) = P_X(b) - P_X(a)$, which would be found by integrating the right-hand side of equation (1.1) with the desired limits of integration. Alternatively, we may be interested in the value of $E[g(X)] = \int_{-\infty}^{\infty} g(x)\, p_X(x)\, dx$ for some real-valued function $g$. As we are not guaranteed that our respective integrand is easily integrated by analytic methods, we may resort to numerical approximations.

Simpson's Rule

One well-known method for numerically approximating an integral is Simpson's rule, which states that:

$$\int_a^b p_X(x)\, dx \approx \frac{(b-a)}{6} \left[ p_X(a) + 4 p_X\!\left(\frac{a+b}{2}\right) + p_X(b) \right]$$

In order to increase the accuracy of this approximation, we partition the interval $[a,b]$ into $N$ slices $[x_i, x_{i+1}]$ for $i = 0, 1, \dots, N-1$, with $x_0 = a$, $x_N = b$, and $x_i < x_{i+1}$, so that $[a,b] = [x_0, x_1] \cup [x_1, x_2] \cup \cdots \cup [x_{N-1}, x_N]$, then use the fact that:

$$\int_a^b p_X(x)\, dx = \sum_{i=1}^{N} \int_{x_{i-1}}^{x_i} p_X(x)\, dx$$

A Taylor series expansion of the error term in Simpson's rule can be used to find the error of this composite approximation, which turns out to be of order $O(N^{-4})$ provided that $p_X$ is sufficiently smooth [EC07, p. 219-220].
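The composite rule above can be sketched in a few lines; this is our own illustration (function names are not from the original), applying Simpson's rule on each slice and summing, here against the standard normal density whose integral over $[-1, 1]$ is approximately 0.682689:

```python
import math

def composite_simpson(f, a, b, n_slices):
    """Apply Simpson's rule on each of n_slices subintervals of [a, b]
    and sum the per-slice approximations."""
    total = 0.0
    h = (b - a) / n_slices
    for i in range(n_slices):
        x0, x1 = a + i * h, a + (i + 1) * h
        total += (x1 - x0) / 6 * (f(x0) + 4 * f((x0 + x1) / 2) + f(x1))
    return total

# Standard normal density: integral over [-1, 1] is about 0.682689.
phi = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
result = composite_simpson(phi, -1.0, 1.0, 100)
print(result)
```

With $N = 100$ slices the $O(N^{-4})$ error is already far below the displayed digits.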
If $\mathbf{X} = (X_1, X_2)$ is a two-dimensional random variable, Simpson's rule is applied with an appeal to Fubini's Theorem, which states that for some $p_{\mathbf{X}}(x_1, x_2)$ measurable and¹

$$\int_{\mathcal{X} \times \mathcal{X}} |p_{\mathbf{X}}(x_1, x_2)|\, d(x_1, x_2) < \infty$$

we can write

$$\int_{\mathcal{X} \times \mathcal{X}} p_{\mathbf{X}}(x_1, x_2)\, d(x_1, x_2) = \int_{\mathcal{X}} \int_{\mathcal{X}} p_{\mathbf{X}}(x_1, x_2)\, dx_2\, dx_1 = \int_{\mathcal{X}} \int_{\mathcal{X}} p_{\mathbf{X}}(x_1, x_2)\, dx_1\, dx_2.$$

The approximation of this two-dimensional integral is then a matter of an iterative application of Simpson's rule, and likewise for dimensions $d > 2$. The error of the approximation will increase with $d$, as each of our $N$ points will make a contribution of approximately $N^{1/d}$ slices in each dimension, resulting in an error of order $O(N^{-4/d})$. We must therefore increase $N$ exponentially in order to preserve a certain order of accuracy [FK05]. While we have only considered the specific case of Simpson's rule, this extra effort required to maintain accuracy when approximating multidimensional integrals is known as the "curse of dimensionality", and is applicable to a wide range of high-dimensional problems. The curse of dimensionality is best visualized by considering the relative volume of the largest possible $d$-dimensional sphere embedded in a $d$-dimensional unit cube.

¹ This is guaranteed since $p_{\mathbf{X}}$ is non-negative everywhere and integrates to 1 over the sample space.

Curse of dimensionality: Hyper-sphere example

While the volume of a $d$-dimensional unit cube will always remain 1, the largest possible $d$-dimensional sphere which can be embedded in this unit cube will have a volume which shrinks as $d$ increases. For example, the "volume" (area) of a 2-dimensional sphere (circle) embedded in a 2-dimensional unit cube (square) is $\pi (0.5)^2$, and thus takes approximately 78.5% of the area of the cube. In 3 dimensions, this is $\frac{4}{3}\pi (0.5)^3$, or approximately 52.3% of the cube.
We can generalize the volume of a sphere with radius $r$ in $d > 1$ dimensions, which we will call a $d$-dimensional hyper-sphere, with the equation [Kö00]:

$$V_d(r) = \begin{cases} \dfrac{r^d\, \pi^{d/2}}{(d/2)!} & (d \text{ even}) \\[2ex] \dfrac{r^d\, 2^{(d+1)/2}\, \pi^{(d-1)/2}}{1 \cdot 3 \cdots d} & (d \text{ odd}) \end{cases}$$

Observing that $\lim_{d \to \infty} V_d(r) = 0$ for all finite $r > 0$,² we thus see that while the largest hyper-sphere embedded in a hyper-cube touches the hyper-cube at the extremities of each dimension, its volume becomes increasingly small as $d$ is increased.

Using the above definition of $V_d(r)$, the volume of a hyper-sphere of radius $r$ in $d$ dimensions, we can write an identity for $\pi$. Assuming without loss of generality that $d$ is even, we have:

$$\pi = \left( 2^d \left(\tfrac{d}{2}\right)! \int_{\sqrt{x_1^2 + x_2^2 + \cdots + x_d^2}\, \le\, 0.5} 1\; d(x_1, x_2, \dots, x_d) \right)^{2/d} \quad (1.2)$$

If we attempted to approximate the contained integral using Simpson's rule and $N$ "slices", our approximation of the integral would then have error of order $O(N^{-4/d})$, and in turn our approximation of $\pi$ would have error of order $O(N^{-8/d^2})$. Likewise, the order of error in most other deterministic methods that require "slicing" the $d$-dimensional interval into pieces to evaluate the above integral would be negatively affected by an increasing $d$. Some deterministic methods, such as sparse grids, are dimensionally adaptive and not affected by dimensionality, but no practical information about the error term is available [FK05, p. 1322].

² For $d$ even, we consider Stirling's approximation, under which $\frac{r^d \pi^{d/2}}{(d/2)!} \approx \frac{r^d \pi^{d/2}}{\sqrt{2\pi (d/2)}\, (d/(2e))^{d/2}} \to 0$ as $d \to \infty$. For $d$ odd, we consider that $V_{d-1}(r) > V_d(r)$ and $\lim_{d \to \infty} V_{d-1}(r) = 0$, thus $\lim_{d \to \infty} V_d(r) = 0$.
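As a quick sanity check of identity (1.2), not in the original text, take $d = 2$: the integral is the area of a disk of radius $1/2$,

```latex
% Check of the identity for d = 2: the integral is V_2(0.5) = pi/4, so
\[
\pi
  = \Bigl( 2^{2} \cdot \bigl(\tfrac{2}{2}\bigr)! \cdot
      \int_{\sqrt{x_1^2 + x_2^2}\, \le\, 0.5} 1 \; d(x_1, x_2)
    \Bigr)^{2/2}
  = 4 \cdot 1! \cdot \frac{\pi}{4}
  = \pi .
\]
```

so the constant $2^d (d/2)!$ exactly cancels the volume of the embedded disk.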
Interestingly, if instead of slicing the $d$-dimensional interval into $N$ equally sized pieces, we evaluate the integrand of interest at $N$ random points, we can obtain an estimate of the integral in equation (1.2) whose order of accuracy is not affected by $d$, which will in turn improve our approximation of $\pi$.

1.2 Monte Carlo Integration

Suppose that for some probability space $(\Omega, \mathcal{F}, P)$ and measurable space $(\mathcal{X}, \mathcal{G})$ with absolutely continuous random variable $X: \Omega \to \mathcal{X}$, and $X \sim p_X(\cdot)$, we want to integrate:

$$\int_a^b p_X(x)\, dx$$

Rather than using a deterministic approach such as Simpson's rule, we may instead consider $N$ independent random variables $U_1, U_2, \dots, U_N \sim \mathrm{Unif}[a,b]$. We know then from equation (1) that:

$$E[p_X(U_i)] = \int_a^b p_X(u)\, p_U(u)\, du = \frac{1}{(b-a)} \int_a^b p_X(u)\, du$$

Since $p_X: \mathcal{X} \to \mathbb{R}_+$ is itself a measurable function, we can then consider $p_X(U_1), p_X(U_2), \dots, p_X(U_N)$ as another sequence of random variables. Using the Strong Law of Large Numbers, we then have that for $U \sim \mathrm{Unif}[a,b]$, as $N \to \infty$:

$$\frac{1}{N} \sum_{i=1}^{N} p_X(U_i) \xrightarrow{a.s.} E[p_X(U)]$$

which leads us to the numerical approximation:

$$\frac{(b-a)}{N} \sum_{i=1}^{N} p_X(U_i) \approx \int_a^b p_X(u)\, du \quad (1.3)$$

We refer to approximation (1.3) as a basic form of Monte Carlo integration. We can make some statements about the order of error in Monte Carlo integration using the Central Limit Theorem.

Recall that the Central Limit Theorem tells us that for $X_1, X_2, \dots, X_N$ independent and identically distributed with $E[X_i] = \mu$ and $E[(X_i - \mu)^2] = \sigma^2$,

$$\sqrt{N}\left(\frac{1}{N} \sum_{i=1}^{N} X_i - \mu\right) \xrightarrow{D} N(0, \sigma^2)$$

If we then set $\mu_{p_X} = E[p_X(U)]$ and $\sigma^2_{p_X} = E[(p_X(U) - \mu_{p_X})^2]$, we likewise have that:

$$\sqrt{N}\left(\frac{1}{N} \sum_{i=1}^{N} p_X(U_i) - \mu_{p_X}\right) \xrightarrow{D} N(0, \sigma^2_{p_X})$$

It therefore follows that for large $N$, the error of our approximation is approximately distributed as

$$\frac{(b-a)}{N} \sum_{i=1}^{N} p_X(U_i) - \int_a^b p_X(u)\, du \;\sim\; N\!\left(0,\; \frac{(b-a)^2\, \sigma^2_{p_X}}{N}\right)$$

which implies that the order of this Monte Carlo estimate of $\int_a^b p_X(u)\, du$, using $N$ points, is $O(N^{-1/2})$.
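Approximation (1.3) translates directly into code; the sketch below is our own illustration, averaging the integrand at uniform points and scaling by $(b-a)$, again against the standard normal density over $[-1, 1]$:

```python
import math
import random

def mc_integral(f, a, b, n, rng):
    """Basic Monte Carlo integration, approximation (1.3):
    (b - a) / N times the sum of f at N Unif[a, b] points."""
    return (b - a) / n * sum(f(rng.uniform(a, b)) for _ in range(n))

# Standard normal density: integral over [-1, 1] is about 0.6827.
phi = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
rng = random.Random(2)
est = mc_integral(phi, -1.0, 1.0, 200000, rng)
print(est)  # near 0.6827, with error of order N**(-1/2)
```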
While this exhibits significantly slower convergence than the $O(N^{-4})$ convergence of Simpson's rule, we should note that an estimate of a $d$-dimensional integral would also be of order $O(N^{-1/2})$. Therefore, while $d$ may certainly influence the size of the constant term, our Monte Carlo integral estimate's error converges to 0 at a pace independent of the number of dimensions being integrated over. This makes Monte Carlo approximation an invaluable tool in the approximation of complicated, multi-dimensional distribution functions.

Monte Carlo approximation of $\pi$ in $d$ dimensions

In equation (1.2), we wrote the relation between $\pi$ and some $d$-dimensional integral, for $d$ even:

$$\int_{\sqrt{x_1^2 + x_2^2 + \cdots + x_d^2}\, \le\, 0.5} 1\; d(x_1, x_2, \dots, x_d)$$

A Monte Carlo algorithm for estimating the above integral would be as straightforward as using the piecewise function:

$$g(u_1, u_2, \dots, u_d) = \begin{cases} 1 & : \sqrt{u_1^2 + u_2^2 + \cdots + u_d^2} \le 0.5 \\ 0 & : \text{otherwise} \end{cases}$$

Then, simulating $\mathbf{U}_1, \mathbf{U}_2, \dots, \mathbf{U}_N \sim \mathrm{Unif}[-0.5, 0.5]^d$ independently and evaluating $g(\mathbf{U}_i)$ for each $\mathbf{U}_i$, we have the following estimate, for $d$ even:

$$\int_{\sqrt{x_1^2 + x_2^2 + \cdots + x_d^2}\, \le\, 0.5} 1\; d(x_1, x_2, \dots, x_d) \approx \frac{1}{N} \sum_{i=1}^{N} g(\mathbf{U}_i)$$

whose error term will approach 0 with order $O(N^{-1/2})$, regardless of the size of $d$. Our Monte Carlo estimate of $\pi$, for $d$ even, is thus

$$\left( 2^d \left(\tfrac{d}{2}\right)! \; \frac{1}{N} \sum_{i=1}^{N} g(\mathbf{U}_i) \right)^{2/d}.$$

This estimate of $\pi$ will converge at rate $O(N^{-1/d})$, as opposed to the $O(N^{-8/d^2})$ convergence of an estimate using Simpson's rule.

We have shown an intuitive method for approximating $\pi$: considering random points within a hyper-cube, and considering the ratio of those points which were contained within a centered, embedded hyper-sphere. Those points within the sphere were given a value of "1" (accepted), and those outside the sphere were given a value of "0" (rejected).
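The $\pi$ estimate above can be sketched directly; this is our own illustration of the procedure for even $d$, counting the fraction of uniform points landing inside the embedded hyper-sphere and applying identity (1.2):

```python
import math
import random

def pi_estimate(d, n, rng):
    """Monte Carlo estimate of pi for even d:
    pi ~ (2**d * (d/2)! * fraction_inside_sphere) ** (2/d)."""
    inside = 0
    for _ in range(n):
        u = [rng.uniform(-0.5, 0.5) for _ in range(d)]
        if math.sqrt(sum(x * x for x in u)) <= 0.5:
            inside += 1
    volume = inside / n  # Monte Carlo estimate of V_d(0.5)
    return (2 ** d * math.factorial(d // 2) * volume) ** (2 / d)

rng = random.Random(3)
est = pi_estimate(2, 200000, rng)
print(est)  # close to 3.1416
```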
Had we instead only desired those random variates within the hyper-sphere, we might have discarded those points outside the sphere, in a most basic implementation of the Rejection Sampling algorithm.

1.3 Rejection Sampling

When numerically estimating characteristics of some probability distribution $p_X$, we typically require sample variates simulated from $p_X$. For example, to estimate $E[g(X)]$, we simulate $X_1, X_2, \dots, X_N \sim p_X(\cdot)$ and use the approximation $E[g(X)] \approx \frac{1}{N} \sum_{i=1}^{N} g(X_i)$. If we are unable to directly simulate random variates from $p_X$, we are not out of luck: we can instead simulate $Y_1, Y_2, \dots, Y_N$ from some similar distribution $p_Y$, and accept or reject each variate $Y_i$ according to a probability proportionate to $\frac{p_X(Y_i)}{p_Y(Y_i)}$. Our set of accepted $Y_i$'s will then have probability distribution $p_X$.

To show this, consider the following adaptation of the Fundamental Theorem of Simulation [CR04, p. 47]. For some $U \sim \mathrm{Unif}[0,1]$ and $Y \sim p_Y(\cdot)$ sampled independently, since $p_{U,Y}(u, y) = p_U(u)\, p_Y(y) = p_Y(y)$ for $u \in [0,1]$,

$$P(Y \le x,\; U \le p_X(Y)) = \int_{-\infty}^{x} \int_0^{p_X(y)} p_{U,Y}(u, y)\, du\, dy = \int_{-\infty}^{x} p_X(y)\, p_Y(y)\, dy$$

In the simple case that $p_X$ has support $[0,1]$, we can pick $p_Y = \mathrm{Unif}[0,1]$; the conditioning event then has probability $P(U \le p_X(Y)) = \int p_X(y)\, dy = 1$, and we would have:

$$P(Y \le x \mid U \le p_X(Y)) = \int_{-\infty}^{x} p_X(y) \cdot 1\, dy = P_X(x)$$

We can generalize this result for $p_Y \ne 1$, if $p_Y$ satisfies $\sup\!\left(\frac{p_X}{p_Y}\right) = C$ for some finite $C \ge 1$.³ Then, we condition on $U \le \frac{p_X(Y)}{C\, p_Y(Y)}$, an event of probability $1/C$:

$$P\!\left(Y \le x \;\Big|\; U \le \frac{p_X(Y)}{C\, p_Y(Y)}\right) = C \int_{-\infty}^{x} \frac{p_X(y)}{C\, p_Y(y)}\, p_Y(y)\, dy = P_X(x) \quad (1.4)$$

This result allows us to justify the Rejection Sampling algorithm for simulating random variates with probability distribution $p_X$, Algorithm 1. After applying the Rejection Sampling algorithm, the set of acceptances $\{X_1, X_2, \dots, X_n\}$ will be independent and identically distributed random variables with density function $p_X$.

³ Intuitively, $C$ is used to scale $\frac{p_X}{p_Y}$ such that $0 \le \frac{p_X}{C\, p_Y} \le 1$.
We require that $C$ be finite, which in turn requires that the support of $p_X$ be contained within that of $p_Y$ (note that $C \ge 1$ automatically, since both densities integrate to 1).

Algorithm 1 Rejection Sampling algorithm

for i = 1, 2, ..., n do
  propose: Simulate U ~ Unif[0,1] and Y ~ p_Y(·) independently
  if U ≤ p_X(Y) / (C p_Y(Y)) then
    Set X_i ← Y
  else
    go to propose
  end if
end for

We generally wish that the proportion of rejections occurring in the Rejection Sampling algorithm be minimized, to increase computational efficiency. For some $Y \sim p_Y(\cdot)$, the probability of accepting $Y$ is $E\!\left[\frac{p_X(Y)}{C\, p_Y(Y)}\right] = \int_{-\infty}^{\infty} \frac{p_X(y)}{C\, p_Y(y)}\, p_Y(y)\, dy = 1/C$; thus the number of trials necessary to obtain $n$ acceptances (successes), $N$, with acceptance probability (probability of success) $1/C$, has distribution $N \sim \mathrm{NB}(n, 1/C)$. Therefore, the closer that $p_Y$ resembles the target $p_X$, the fewer rejections should occur, with no rejections occurring when $p_Y = p_X$.

In our Monte Carlo estimate of $\pi$, we implicitly chose both $p_X$ and $p_Y$ as the $\mathrm{Unif}[-0.5, 0.5]^d$ distribution and hence accepted all proposed variates, using the piecewise function $g$ to determine which of our accepted $X_i$'s were contained in the hyper-sphere. If we instead desired only variates uniformly distributed on the hyper-sphere, we would choose, for $d$ even, $p_X(\mathbf{x}) = \frac{2^d (d/2)!}{\pi^{d/2}}\, \mathbf{1}(\lVert \mathbf{x} \rVert \le 0.5)$, set $C = \frac{2^d (d/2)!}{\pi^{d/2}}$, and apply the Rejection Sampling algorithm accordingly. However, we would have that $1/C \to 0$ as $d \to \infty$, hence our expected number of trials to obtain $n$ acceptances would be affected by the curse of dimensionality.

While we have demonstrated that the order of error of an estimate using $n$ variates obtained from Monte Carlo methods is not affected by dimensionality, Rejection Sampling still suffers from the curse of dimensionality with regards to the number of trials necessary to obtain $n$ variates from the targeted $p_X$ distribution.
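Algorithm 1 can be sketched as follows; this is our own illustration (the triangular target $p_X(x) = 2x$ on $[0,1]$ with a Unif[0,1] proposal and $C = 2$ is chosen only for concreteness), and with $1/C = 1/2$ roughly half of all proposals are rejected:

```python
import random

def rejection_sample(p_x, p_y, sample_y, c, n, rng):
    """Rejection Sampling (Algorithm 1): propose Y ~ p_y and accept
    with probability p_x(Y) / (C * p_y(Y)); repeat until n acceptances."""
    accepted = []
    while len(accepted) < n:
        y = sample_y(rng)
        if rng.random() <= p_x(y) / (c * p_y(y)):
            accepted.append(y)
    return accepted

# Target: triangular density p_X(x) = 2x on [0, 1]; proposal: Unif[0, 1].
# C = sup p_X / p_Y = 2.
rng = random.Random(4)
xs = rejection_sample(lambda x: 2 * x, lambda x: 1.0,
                      lambda r: r.random(), 2.0, 50000, rng)
mean_x = sum(xs) / len(xs)
print(mean_x)  # E[X] = 2/3 under p_X(x) = 2x
```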
It is only after obtaining our $n$ variates that we can apply our previous assertion that our Monte Carlo estimates converge with order $O(n^{-1/2})$. We can more efficiently pick each of our $n$ samples through the use of Markov chains.

1.4 Markov chain Monte Carlo

In our example of picking uniform variates from within a hyper-sphere, the Rejection Sampling algorithm was not particularly efficient. If $(0, 0, \dots, 0)$ was proposed and accepted as our $k$-th variate, $X_k$, this would have no bearing on $X_{k+1}$, whose proposals, due to independent sampling, would be equally likely to come from any point in the unit hyper-cube. If our samples $X_1, X_2, \dots$ were not independently distributed, and instead followed a Markov kernel, we could more efficiently pick our candidates. In fact, if we constructed a Markov kernel such that our target distribution, $p_X$, is the stationary distribution of the Markov chain, we could consecutively simulate as many variates from $p_X$ as desired, in a type of estimate known as Markov chain Monte Carlo.

Given some arbitrary $p_X$, we then wish to construct a Markov kernel $K$ such that a Markov chain $(X_n)$ with $K$ as its Markov kernel has $p_X$ as its stationary distribution. If $p_X$ is the stationary distribution of such a chain, we then have that [SC95, p. 328]

$$p_X(y)\, dy = \int_{\mathcal{X}} K(x, dy)\, p_X(x)\, dx$$

and therefore $X_k \sim p_X \implies X_{k+1} \sim p_X$. While it is not immediately obvious how to choose Markov kernel $K$, we should consider that any $K$ which satisfies the detailed balance property,

$$p_X(y)\, K(y, x) = p_X(x)\, K(x, y),$$

has $p_X$ as a stationary distribution, since [CR04, p. 230]

$$\int_{\mathcal{X}} K(y, B)\, p_X(y)\, dy = \int_{\mathcal{X}} \int_B K(y, x)\, p_X(y)\, dx\, dy = \int_{\mathcal{X}} \int_B K(x, y)\, p_X(x)\, dx\, dy = \int_B p_X(x)\, dx$$

In order to find a kernel $K$ which satisfies the detailed balance equation, we will follow a string of arguments articulated in [SC95].
We first pick some conditional density $q(\cdot \mid \cdot)$ whose support contains the support of $p_X$. Since we cannot be sure yet that the detailed balance condition $p_X(x)\, q(y|x) = p_X(y)\, q(x|y)$ is met, we append a function $0 \le \alpha \le 1$, such that if

$$p_X(x)\, q(y|x) > p_X(y)\, q(x|y) \quad (1.5)$$

then:

$$p_X(x)\, q(y|x)\, \alpha(x, y) = p_X(y)\, q(x|y)\, \alpha(y, x)$$

and since $\alpha \le 1$, we must have that $\alpha(y, x) = 1$, thus

$$p_X(x)\, q(y|x)\, \alpha(x, y) = p_X(y)\, q(x|y)$$

and therefore

$$\alpha(x, y) = \frac{p_X(y)\, q(x|y)}{p_X(x)\, q(y|x)}.$$

If instead of inequality (1.5) we have that $p_X(x)\, q(y|x) < p_X(y)\, q(x|y)$, we would instead set $\alpha(x, y) = 1$ and derive $\alpha(y, x)$ accordingly, giving us

$$\alpha(x, y) = \min\!\left( \frac{p_X(y)\, q(x|y)}{p_X(x)\, q(y|x)},\; 1 \right). \quad (1.6)$$

We thus have our Metropolis Hastings Markov kernel, $K(x, dy) = q(y|x)\, \alpha(x, y)\, dy + r(x)\, \delta_x(dy)$, where $\delta_x$ is the Dirac delta function⁴ and $r(x) = 1 - \int_{\mathcal{X}} q(y|x)\, \alpha(x, y)\, dy$ ensures that this kernel integrates to 1. To construct a Markov chain $(X_n)$ with $K$ as its kernel, we then choose some $Y \sim q(\cdot \mid X_i)$ and accept $Y$ with probability $\alpha(X_i, Y)$. If $Y$ is rejected, we set $X_{i+1} = X_i$ to account for $r(x)\, \delta_x(dy)$. We have thus justified the Metropolis Hastings algorithm, Algorithm 2.

Once the Metropolis Hastings algorithm reaches the stationary distribution $p_X$ at some variate $X_k$, all future variates ($X_{k+j}$ for all $j$) will also have distribution $p_X$. By virtue of being related by a Markov chain, these variates are not independently distributed, violating the independence assumptions of the Strong Law of Large Numbers and Central Limit Theorem. In order to make statements about the convergence of Metropolis Hastings estimates, we must therefore adapt the Strong Law of Large Numbers and Central Limit Theorem to the case of Markov chains.

⁴ Defined in [Dir58, p. 58] as $\delta_x(y) = 0$ for $x \ne y$, with the constraint $\int_{-\infty}^{\infty} \delta_x(y)\, dy = 1$.

Algorithm 2 Metropolis Hastings algorithm

Set X_0 arbitrarily
for i = 0, 1, ... do
  Simulate Y ~ q(·|X_i), U ~ Unif[0,1] independently
  if U ≤ α(X_i, Y) = min( p_X(Y) q(X_i|Y) / (p_X(X_i) q(Y|X_i)), 1 ) then
    Set X_{i+1} ← Y
  else
    Set X_{i+1} ← X_i
  end if
end for

Markov chain Ergodic Theorem

If Markov chain $(X_n)$ has stationary distribution $\pi$, the following two are equivalent [CR04, p. 241]:

1. $(X_n)$ is Harris recurrent.
2. If $g \in L^1(\pi)$, then

$$\lim_{N \to \infty} \frac{1}{N} \sum_{i=0}^{N} g(X_i) = \int g(x)\, d\pi(x). \quad (1.7)$$

Since we have constructed our Markov chain $(X_n)$ with predetermined stationary distribution $p_X$, we must therefore show that this chain is Harris recurrent. Since $(X_n)$ is positive recurrent⁵ and aperiodic⁶, there exists an accessible set $A$ such that:

1. $\sum_{n=1}^{\infty} K^n(x_0, A) > 0$ for all $x_0 \in \mathcal{X}$
2. $\inf_{x, y \in A} K^1(x, y) > 0$

then, by a theorem which is due to [KA96],

$$\lim_{n \to \infty} \sup_A \left| K^n(x_0, A) - \int_A p_X(x)\, dx \right| = 0$$

for $p_X$-almost all $x_0$. For some bounded harmonic function $h$ such that $h(x_0) = E[h(X_1) \mid x_0] = E[h(X_n) \mid x_0]$ for all $n$, we then consider that

$$\lim_{n \to \infty} \sup_A \left| \int_A h(x)\, K^n(x_0, x)\, dx - \int_A h(x)\, p_X(x)\, dx \right| = 0$$

for $p_X$-almost all $x_0$, and thus $h(x_0) = E[h(X)]$ for $p_X$-almost all $x_0$; hence $h$ is $p_X$-almost everywhere constant. We then consider that $E[h(X_1) \mid x_0] = \int h(x)\, q(x|x_0)\, \alpha(x_0, x)\, dx + h(x_0)\, r(x_0) = h(x_0)$, thus $E[h(X)](1 - r(x_0)) + r(x_0)\, h(x_0) = h(x_0)$. Since $r(x_0) > 0$ everywhere, we have $h$ is constant everywhere. $(X_n)$ is therefore Harris recurrent, and equation (1.7) applies [CR04, p. 273].

With convergence of our estimate established, we now characterize the rate of convergence with a Central Limit Theorem for Markov chains, which is due to Chan and Geyer [KC94, p. 1751].

⁵ Every chain which is positive is recurrent [CR04, p. 223].
⁶ $P(X_{k+1} = x \mid X_k = x) = r(x) > 0$.
Markov chain Central Limit Theorem. If $(X_n)$ is a geometrically ergodic Markov chain, and $\int |g(x)|^{2+\epsilon} p_X(x) \, dx < \infty$ for some $\epsilon > 0$, then

$$\frac{1}{N+1} \sum_{n=k}^{N+k} g(X_n) - \int g(x) p_X(x) \, dx \xrightarrow{D} N\left(0, \frac{\sigma^2}{N}\right)$$

where

$$\sigma^2 = \gamma_0(g) + 2 \sum_{k=1}^{\infty} \gamma_k(g) \tag{1.8}$$

and $\gamma_k(g) = \iint g(x) g(y) K^k(x, dy) \, p_X(x) \, dx$ is referred to as autocovariance.

While the Metropolis–Hastings algorithm shows promise in overcoming the dimensionality hurdles which the Rejection Sampling algorithm has, it comes with serious practical considerations.

Geometric ergodicity and autocovariance of samples

We have shown that the Metropolis–Hastings Markov chain $(X_n)$ is positive recurrent and aperiodic; it is therefore ergodic. However, it is not necessarily true that $(X_n)$ is geometrically ergodic: proving that a specific chain is geometrically ergodic involves detailed considerations, found in literature such as [YA05], [Jon04] and [SJ00]. Even if geometric ergodicity is shown, our Markov chain Central Limit Theorem is only useful if the sum in equation (1.8) is minimized. This is generally done by only considering every $k$th sample, in order to avoid correlation between neighboring samples. When $k$ is chosen to be large, for example $k = 1000$, this will significantly reduce the efficiency of the algorithm.

Metropolis–Hastings bias towards $X_0$

The Metropolis–Hastings algorithm does not instantly reach its stationary distribution, so we must consider that if $X_k$ is the first variate in the chain such that $X_k \sim p_X$, we must drop variates $X_0, X_1, \ldots, X_{k-1}$ in order to avoid biasing our estimate towards the arbitrarily chosen $X_0$. Since it is not obvious when the chain has in fact reached stationary distribution $p_X$, a conservatively long "burn-in" time is used, in which samples are discarded, again decreasing the efficiency of the algorithm.
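The recipe of algorithm (2), together with the burn-in and thinning practices just described, can be sketched as follows. The standard-normal target and normal random-walk proposal are illustrative assumptions; with a symmetric proposal the $q$ terms in $\alpha$ cancel, and the target density need only be known up to its normalizing constant:

```python
import math
import random

def metropolis_hastings(p, n, x0=0.0, scale=1.0, seed=0):
    """Algorithm 2 with a symmetric normal random-walk proposal q(.|x),
    so alpha(x, y) reduces to min(p(y)/p(x), 1); p need only be
    proportional to the target density p_X."""
    rng = random.Random(seed)
    x, chain = x0, []
    for _ in range(n):
        y = x + rng.gauss(0.0, scale)              # Y ~ q(.|X_i)
        if rng.random() <= min(p(y) / p(x), 1.0):  # U <= alpha(X_i, Y)
            x = y
        chain.append(x)                            # rejection keeps X_{i+1} = X_i
    return chain

# Target: standard normal, supplied without its normalizing constant.
chain = metropolis_hastings(lambda x: math.exp(-x * x / 2), n=50000)
usable = chain[1000::10]   # discard a burn-in, then keep every 10th sample
mean = sum(usable) / len(usable)
```

The slicing step implements both practices at once: the first 1000 variates are dropped to reduce the bias towards $X_0$, and thinning by 10 weakens the autocovariance terms in (1.8) at the price of discarding most of the chain.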
Metropolis–Hastings tends towards local maxima

If it happens that some $X_k$ is at a local maximum of $p_X$, the Metropolis–Hastings algorithm may have difficulty getting out of this local maximum, depending on our choice of conditional density $q(\cdot \mid \cdot)$. If the variance of $q(\cdot \mid X_i)$ is chosen to be small, proposals for $X_{i+1}$ will likely also be in the local maximum. If chosen too large, the algorithm will likely reject the proposal for $X_{i+1}$, and set $X_{i+1} = X_i$, unless a jump to another relatively large local maximum is proposed. We must therefore choose the conditional density $q(\cdot \mid \cdot)$ carefully, making considerations such as in [AG96] and [AG97].

1.5 Importance Sampling

Given our discussed pitfalls of the Metropolis–Hastings algorithm, we see new value in sampling independently from distribution $p_Y$ in order to approximate $p_X$. We therefore consider Importance Sampling as an alternative to both Rejection Sampling and Markov chain Monte Carlo algorithms such as the Metropolis–Hastings algorithm. The principles of Importance Sampling are in fact very closely related to Rejection Sampling. Recall that in Rejection Sampling, to simulate a variate from target distribution $p_X$, we simulated some $Y \sim p_Y$, and accepted $X_i = Y$ with probability $\frac{p_X(Y)}{C p_Y(Y)}$, where $C = \sup(\frac{p_X}{p_Y}) \ge 1$. In Importance Sampling, we instead simulate $Y_i \sim p_Y$ independently from our importance distribution, and weight this random variate according to $w_i = \frac{p_X(Y_i)}{p_Y(Y_i)}$. We claim that for $N$ large, when $X$ is chosen from the set $\{Y_1, Y_2, \ldots, Y_N\}$ with probability proportionate to the weights $\{w_1, w_2, \ldots, w_N\}$, $X$ follows the nominal distribution $p_X(\cdot)$, provided that the support of $p_X$ is contained within the support of $p_Y$. The justification of Importance Sampling is as follows.
Recall from the definition of expected values and the Strong Law of Large Numbers, that for some absolutely continuous $X \sim p_X(\cdot)$:

$$\frac{1}{N} \sum_{i=1}^{N} g(X_i) \xrightarrow{a.s.} E[g(X)] = \int_{-\infty}^{\infty} g(x) p_X(x) \, dx$$

Using ideas pioneered by Rosenbluth and Rosenbluth [MR55], and Hammersley and Morton [JH54], for $Y_i \sim p_Y(\cdot)$ independently, we have that

$$\frac{1}{N} \sum_{i=1}^{N} \frac{p_X(Y_i)}{p_Y(Y_i)} g(Y_i) \xrightarrow{a.s.} E\left[\frac{p_X(Y)}{p_Y(Y)} g(Y)\right] = \int_{-\infty}^{\infty} \frac{p_X(y)}{p_Y(y)} g(y) p_Y(y) \, dy = \int_{-\infty}^{\infty} g(x) p_X(x) \, dx,$$

which gives our Importance Sampling estimate, which will converge almost surely by the Strong Law of Large Numbers, and have error, by the Central Limit Theorem, of order $O(N^{-1/2})$:

$$\frac{1}{N} \sum_{i=1}^{N} \frac{p_X(Y_i)}{p_Y(Y_i)} g(Y_i) \approx E[g(X)]. \tag{1.9}$$

We do not need to know $p_X$ in its entirety: we may in fact use a simulating function $q_X$ such that $p_X = C q_X$, which is invaluable when the normalizing constant of $p_X$, $C$, is unknown. For $Y_i \sim p_Y(\cdot)$, we have that

$$\frac{C}{N} \sum_{i=1}^{N} \frac{q_X(Y_i)}{p_Y(Y_i)} \approx C \int \frac{q_X(y)}{p_Y(y)} p_Y(y) \, dy = \int p_X(x) \, dx = 1$$

and therefore

$$C \approx \frac{1}{\frac{1}{N} \sum_{i=1}^{N} \frac{q_X(Y_i)}{p_Y(Y_i)}}.$$

Algorithm 3 Importance Sampling algorithm
  for $i = 1, 2, \ldots, N$ do
    Simulate $Y_i \sim p_Y(\cdot)$ independently
    Set $w_i \propto \frac{p_X(Y_i)}{p_Y(Y_i)}$
  end for
  Set $w \leftarrow \sum_{i=1}^{N} w_i$
  for $i = 1, 2, \ldots, N$ do
    Set $\hat{w}_i \leftarrow w_i / w$
  end for

We use the set of variates $\{Y_1, Y_2, \ldots, Y_N\}$ and normalized weights $\{\hat{w}_1, \hat{w}_2, \ldots, \hat{w}_N\}$ resulting from the Importance Sampling algorithm, algorithm (3), to make the estimate $E[g(X)] \approx \sum_{i=1}^{N} \hat{w}_i g(Y_i)$. We should note that when $q_X \ne p_X$, there is a small bias in our estimate.

Importance Sampling bias

For unnormalized $w_i$,

$$E[C w_i g(Y_i)] = C \int_{-\infty}^{\infty} \frac{q_X(y)}{p_Y(y)} g(y) p_Y(y) \, dy = \int_{-\infty}^{\infty} g(y) p_X(y) \, dy = E[g(X)]$$

We would then expect that, since $E\left[\frac{1}{\frac{1}{N} \sum_{i=1}^{N} w_i}\right] = C$, we would have $E[\hat{w}_i g(Y_i)] = E[g(X)]$.
However, for $N$ finite,

$$E[\hat{w}_i g(Y_i)] = E\left[\frac{w_i g(Y_i)}{\frac{1}{N} \sum_{j=1}^{N} w_j}\right] \approx E\left[\frac{N \, (p_X(Y_i)/p_Y(Y_i)) \, g(Y_i)}{N + p_X(Y_i)/p_Y(Y_i) - 1}\right] = \int_{-\infty}^{\infty} \frac{p_X(y) g(y)}{1 + \frac{1}{N}\left(p_X(y)/p_Y(y) - 1\right)} \, dy \ne E[g(X)]$$

We therefore have a bias of order $O(N^{-1})$ when using the self-normalizing Importance Sampling algorithm⁷. Luckily, this bias has order smaller than the error of our estimate, and asymptotically, Importance Sampling is unbiased.

Comparison to Rejection Sampling

While similar in concept to Rejection Sampling, Importance Sampling does have some notable advantages over Rejection Sampling. Rejection Sampling requires us not only to know the density function $p_X$ exactly, but to be able to find the constant $C = \sup(\frac{p_X}{p_Y})$. Alternatively, Importance Sampling requires only that we have some $q_X \propto p_X$, which is advantageous when $p_X$'s normalizing constant is not known. Furthermore, there is no need to find $\sup(\frac{p_X}{p_Y})$, which can be difficult to do in high-dimensional cases, and impossible when the normalizing constant of $p_X$ is unknown.

In cases where simulating $Y_i \sim p_Y(\cdot)$ independently or calculating $w_i = \frac{p_X(Y_i)}{p_Y(Y_i)}$ is computationally intensive, Importance Sampling is preferred, as all such $Y_i$ will contribute towards our final estimate, unlike in Rejection Sampling⁸. This is not to say that the larger sample sizes of Importance Sampling make its estimates inherently better: if we consider $n$ accepted variates $\{X_1, X_2, \ldots, X_n\}$ which took $N$ trials involving $Y_1, Y_2, \ldots, Y_N \sim p_Y(\cdot)$ to obtain with the Rejection Sampling algorithm, we must consider that $N$ is a random variable with distribution $NB(n, 1/C)$. The set $\{Y_1, Y_2, \ldots, Y_N\}$ is therefore not independently distributed, so we cannot make inferences about the order of an estimate's error when re-using these samples with the Importance Sampling algorithm [CR04, p. 104]. We may instead consider the effective sample size of an Importance Sampling estimate, which is usually smaller than $N$.

Importance Sampling degeneracy

When using Importance Sampling to estimate some $E[g(X)]$, the variance of our estimate, $\frac{1}{N} \sum_{i=1}^{N} (N \hat{w}_i g(Y_i) - E[g(X)])^2$, is minimized when the variance of our normalized weights, $\text{Var}[\hat{w}]$, is equal to zero, that is, $\hat{w}_i = \frac{1}{N}$ for all $i$, and is greatest when one variate has normalized weight "1" and all others have weight "0". This variance is of course contingent on the resemblance of our importance distribution, $p_Y$, to the nominal distribution $p_X$. When such variance is high, we gauge its impact on our estimates by considering a heuristic summary statistic, effective sample size, as given by [JL95]:

$$N_{\text{eff}} = \frac{N}{1 + \frac{1}{N} \sum_{i=1}^{N} (N \hat{w}_i - 1)^2}$$

If $\hat{w}_i = \frac{1}{N}$ for $i = 1, 2, \ldots, N$, $N_{\text{eff}}$ is equal to $N$. Conversely, if a few samples have large weights and all others are $0$, $N_{\text{eff}}$ is relatively small compared to $N$, a condition known as sample degeneracy, with the variance of our estimate being approximately equal to one obtained with $N_{\text{eff}}$ estimates simulated from the nominal distribution $p_X$.

⁷ As an example, when $N = 1$, normalizing with $\sum_{j=1}^{N} w_j = w_1$ would of course set $\hat{w}_1 = 1$; hence for some $X$ obtained from the Importance Sampling algorithm, $E[X] = E[Y_1]$.
⁸ Alternatively, if calculating $g(Y_i)$ given $Y_i$ is computationally intensive, we may instead prefer Rejection Sampling.
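A minimal sketch of algorithm (3) together with the effective sample size diagnostic. The $N(0,1)$ target, the $N(0, 2^2)$ importance distribution, and $g(x) = x^2$ are illustrative assumptions; the target density is supplied only up to its normalizing constant:

```python
import math
import random

def importance_sampling(q_x, p_y, sample_y, g, N, seed=0):
    """Algorithm 3: self-normalized importance sampling estimate of E[g(X)],
    where q_x need only be proportional to the target density p_X."""
    rng = random.Random(seed)
    ys = [sample_y(rng) for _ in range(N)]
    ws = [q_x(y) / p_y(y) for y in ys]          # unnormalized weights
    total = sum(ws)
    w_hat = [w / total for w in ws]             # normalized weights
    return sum(w * g(y) for w, y in zip(w_hat, ys)), w_hat

def effective_sample_size(w_hat):
    """N_eff = N / (1 + (1/N) sum_i (N w_i - 1)^2), weights summing to 1."""
    n = len(w_hat)
    return n / (1 + sum((n * w - 1) ** 2 for w in w_hat) / n)

# Target: N(0,1) up to a constant; importance distribution: N(0, 2^2).
est, w_hat = importance_sampling(
    q_x=lambda x: math.exp(-x * x / 2),
    p_y=lambda y: math.exp(-y * y / 8) / math.sqrt(8 * math.pi),
    sample_y=lambda rng: rng.gauss(0.0, 2.0),
    g=lambda x: x * x,
    N=50000,
)
# est approximates E[X^2] = 1; N_eff falls below N because p_Y differs from p_X.
```

With equal weights the diagnostic returns $N$ itself, so the gap between $N_{\text{eff}}$ and $N$ directly measures how poorly $p_Y$ resembles $p_X$.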
In order to obtain an estimate with a specific effective sample size with Importance Sampling, we resample.

1.6 Sampling/Importance Resampling

In order to reduce the variance of our Importance Sampling weights, we consider that, for some $U \sim \text{Unif}[0, 1]$, as $N \to \infty$:

$$P(Y_i \le x \mid U \le \hat{w}_i) \xrightarrow{D} P_X(x)$$

We are therefore justified in resampling according to the weights $\hat{w}_i$, so that we can reassign an equal weight, $\frac{1}{n}$, to each of our $n$ resampled variates, making the variance of these resampled weights equal to 0, and hence having $N_{\text{eff}} = n$. If we desire $n$ equally weighted variates from some arbitrary $p_X(\cdot)$ distribution, we may consider the Sampling/Importance Resampling algorithm, as proposed by Rubin [Rub88].

Algorithm 4 Sampling/Importance Resampling algorithm
  Choose $N \ge n$
  for $i = 1, 2, \ldots, N$ do
    Simulate $Y_i \sim p_Y(\cdot)$ independently
    Set $w_i \propto \frac{p_X(Y_i)}{p_Y(Y_i)}$
  end for
  Set $w \leftarrow \sum_{i=1}^{N} w_i$
  for $i = 1, 2, \ldots, N$ do
    Set $\hat{w}_i \leftarrow w_i / w$
  end for
  for $i = 1, 2, \ldots, n$ do
    Choose $X_i$ from $\{Y_1, Y_2, \ldots, Y_N\}$ according to weights $\{\hat{w}_1, \hat{w}_2, \ldots, \hat{w}_N\}$
  end for

We then have

$$\frac{1}{n} \sum_{i=1}^{n} g(X_i) \approx E[g(X)]$$

as our Sampling/Importance Resampling estimate, which will also converge almost surely as $n \to \infty$, with order $O(n^{-1/2})$ [AH96]. Two points of discussion were made by Rubin regarding this algorithm, both of which have dominated the literature since.

Choice of $N$

If $p_Y = p_X$ exactly, we can choose $N$ such that $\frac{N}{n} = 1$. When $p_Y$ is not a close approximation of $p_X$, we may desire that $\frac{N}{n} \to \infty$ for the estimate to be accurate, which of course would be quite computationally inefficient. For a compromise between accuracy and efficiency, we should consider that the shape of $\frac{p_X}{p_Y}$ will have an impact on choosing $N$, with such a choice depending primarily on the right tail of $\frac{p_X}{p_Y}$: if this has infinite variance, $N$ must be very large compared to $n$.
If $\frac{p_X}{p_Y}$ has a moment generating function, $N$ may be of order $O(n \ln(n))$, and when $\frac{p_X}{p_Y}$ is bounded above, $N$ is of order $O(n)$ [Li07]. Choosing $n$ variates from $N$, with or without replacement, is also dependent on the resemblance of $p_Y$ to $p_X$. If $p_Y = p_X$, sampling without replacement is preferable (for example, when $N = n$), but when $p_Y$ is not similar to $p_X$, we can sample with replacement in order to avoid "pigeon-holing" poor variates into our estimate [Rub88, p. 396].

Rather than choose $N$ upfront, we can instead implement a sequential version of the Sampling/Importance Resampling algorithm, in which each sequence contains only our desired sample size $n$, but improvements in these $n$ samples are made with each progressive sequence. Furthermore, we may improve the efficiency of our algorithm by approximating a series of intermediate distributions which converge to our nominal $p_X$.

Chapter 2
Sequential Monte Carlo Estimates

We have seen in the previous sections that one major disadvantage of Rejection Sampling is the inability of our simulating distribution to adapt towards those variates which produce good guesses of the target distribution $p_X$, whereas with Markov chain Monte Carlo methods such an adaptation is made. We can in fact implement Importance Sampling adaptively by constructing a sequence of $T-1$ intermediate distributions, $p_{X^{(1)}}, p_{X^{(2)}}, \ldots, p_{X^{(T-1)}}$, which lead us to our nominal distribution $p_{X^{(T)}} = p_X$. We then simulate from a sequence of $T$ importance distributions, $p_{Y^{(1)}}, p_{Y^{(2)}}, \ldots, p_{Y^{(T)}}$, in order to estimate each intermediate distribution, followed by our nominal distribution $p_X$. Our sequence of intermediate distributions is typically defined such that $p_X$ is approached smoothly [JL01, p. 229]. We may use $p_{X^{(i)}} = p_X$ for all $i$, though this is generally not advisable when $p_X$ is complex [PdM06, p. 416].
We define $p_{Y^{(1)}}$ as desired, then use some Markov kernel $K$ to construct our successive intermediate distributions:

$$p_{Y^{(t)}}(x) = \int_{-\infty}^{\infty} p_{Y^{(t-1)}}(y) K_{t-1}(y, x) \, dy$$

2.1 Sequential Importance Sampling

As a most basic case, consider two importance distributions, $p_{Y^{(1)}}$ and $p_{Y^{(2)}}$, which are used to approximate intermediate distribution $p_{X^{(1)}}$ and nominal distribution $p_{X^{(2)}}$ respectively, where the support of $p_{X^{(1)}}$ is contained in that of $p_{Y^{(1)}}$, and the support of $p_{X^{(2)}}$ is contained in that of $p_{Y^{(2)}}$. We may use the Importance Sampling estimate, with $Y^{(2)}_i \sim p_{Y^{(2)}}(\cdot)$ independently:

$$E_{X^{(2)}}[X] \approx \frac{1}{N} \sum_{i=1}^{N} \frac{p_{X^{(2)}}(Y^{(2)}_i)}{p_{Y^{(2)}}(Y^{(2)}_i)} Y^{(2)}_i$$

In order to simulate $Y^{(2)}_i \sim p_{Y^{(2)}}(\cdot)$, we consider the Monte Carlo estimate of the integral, for $Y^{(1)}_i \sim p_{Y^{(1)}}$:

$$p_{Y^{(2)}}(x) = \int_{-\infty}^{\infty} p_{Y^{(1)}}(y) K_1(y, x) \, dy \approx \frac{1}{N} \sum_{i=1}^{N} K_1(Y^{(1)}_i, x)$$

hence, we can simulate $Y^{(2)}_i \sim K_1(Y^{(1)}_i, \cdot)$ where $Y^{(1)}_i \sim p_{Y^{(1)}}(\cdot)$. We can then extend this for all $t$: to generate $Y^{(t)}_i \sim p_{Y^{(t)}}(\cdot)$, we simulate $Y^{(t)}_i \sim K_{t-1}(Y^{(t-1)}_i, \cdot)$ and assign weight

$$w^{(t)}_i = \frac{p_{X^{(t)}}(Y^{(t)}_i)}{\frac{1}{N} \sum_{j=1}^{N} K_{t-1}(Y^{(t-1)}_j, Y^{(t)}_i)},$$

which is then normalized. We have therefore justified the Sequential Importance Sampling algorithm, algorithm (5).

Recall that Importance Sampling suffers from sample degeneracy: a small effective sample size compared to our selected $N$. Through $T$ generations of Sequential Importance Sampling, this problem is quickly compounded, and within a few sequences, the effective sample size can be reduced to 1 [AD01, p. 10]. We can alleviate this problem through the use of resampling between each generation.
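The resampling step between generations can be sketched as follows (multinomial resampling; the particle values and the degenerate weight set are illustrative assumptions):

```python
import random

def resample(particles, weights, n, seed=0):
    """Draw n particles with replacement according to their normalized
    weights, then reset every weight to 1/n, restoring N_eff = n."""
    rng = random.Random(seed)
    chosen = rng.choices(particles, weights=weights, k=n)
    return chosen, [1.0 / n] * n

particles = [0.1, 0.4, 0.7, 0.9]
weights = [0.70, 0.20, 0.05, 0.05]        # a degenerate weight set
new_particles, new_weights = resample(particles, weights, n=4)
```

The resampled population concentrates on the heavily weighted particles, while the reset weights carry no residual variance into the next generation.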
2.2 Sequential Importance Resampling

Applying a resampling step to Sequential Importance Sampling is trivial, and the following algorithm, introduced under the name "Bootstrap filter" [NG93], follows directly from our section on Sampling/Importance Resampling, with the slight modification that we set our importance distribution $p_{Y^{(t)}} = p_{X^{(t-1)}}$ for $t > 1$ in order to make sense of our resampling step. Our $t$'th estimate is then $E[g(X^{(t)})] \approx \sum_{i=1}^{N} \hat{w}^{(t)}_i g(Y^{(t)}_i)$, which will converge with order $O(N^{-1/2})$.

Algorithm 5 Sequential Importance Sampling algorithm
  for $i = 1, 2, \ldots, N$ do
    Simulate $Y^{(1)}_i \sim p_{Y^{(1)}}(\cdot)$ independently
  end for
  for $t = 2, 3, \ldots, T$ do
    for $i = 1, 2, \ldots, N$ do
      Simulate $Y^{(t)}_i \sim K_{t-1}(Y^{(t-1)}_i, \cdot)$ independently
      Set $w^{(t)}_i \leftarrow \frac{p_{X^{(t)}}(Y^{(t)}_i)}{\frac{1}{N} \sum_{j=1}^{N} K_{t-1}(Y^{(t-1)}_j, Y^{(t)}_i)}$
    end for
    Set $w \leftarrow \sum_{i=1}^{N} w^{(t)}_i$
    for $i = 1, 2, \ldots, N$ do
      Set $\hat{w}^{(t)}_i \leftarrow w^{(t)}_i / w$
    end for
  end for

Algorithm 6 Sequential Importance Resampling algorithm
  for $i = 1, 2, \ldots, N$ do
    Simulate $Y^{(1)}_i \sim p_{Y^{(1)}}(\cdot)$ independently
    Set $w^{(1)}_i \propto \frac{p_{X^{(1)}}(Y_i)}{p_{Y^{(1)}}(Y_i)}$
  end for
  Set $w \leftarrow \sum_{i=1}^{N} w^{(1)}_i$
  for $i = 1, 2, \ldots, N$ do
    Set $\hat{w}^{(1)}_i \leftarrow w^{(1)}_i / w$
  end for
  for $t = 2, 3, \ldots, T$ do
    for $i = 1, 2, \ldots, N$ do
      Choose with replacement $X^{(t-1)}_i$ from $\{Y^{(t-1)}_1, Y^{(t-1)}_2, \ldots, Y^{(t-1)}_N\}$ according to weights $\{\hat{w}^{(t-1)}_1, \hat{w}^{(t-1)}_2, \ldots, \hat{w}^{(t-1)}_N\}$
      Simulate $Y^{(t)}_i \sim K_{t-1}(X^{(t-1)}_i, \cdot)$ independently
      Set $w^{(t)}_i \propto \frac{p_{X^{(t)}}(Y^{(t)}_i)}{\sum_{j=1}^{N} \hat{w}^{(t-1)}_j K_{t-1}(Y^{(t-1)}_j, Y^{(t)}_i)}$
    end for
    Set $w \leftarrow \sum_{i=1}^{N} w^{(t)}_i$
    for $i = 1, 2, \ldots, N$ do
      Set $\hat{w}^{(t)}_i \leftarrow w^{(t)}_i / w$
    end for
  end for

Chapter 3
Approximate Bayesian Computation

For some probability space $(\Omega, \mathcal{F}, P)$, and measurable spaces $(\mathcal{X}, \mathcal{G})$ and $(\mathcal{M}, \mathcal{H})$, we may consider a discrete or absolutely continuous random variable $X: \Omega \to \mathcal{X}$, and a discrete or absolutely continuous random variable $M: \mathcal{X} \to \mathcal{M}$.
Then, for $D \in \mathcal{M}$, we may be concerned with the distribution of:

$$\{X \in \mathcal{X} \mid M(X) = D\} \tag{3.1}$$

If we refer to $M$ as our model, $X$ as the parameters of $M$, and $D$ as some observed data, (3.1) can be regarded as the set of those parameters which best simulate observed data $D$. We may then be interested in the distribution of $X$ in (3.1). If $p_{X \mid D}$ is the posterior probability density or mass function of (3.1), we have from Bayes' Theorem that:

$$p_{X \mid D}(x) = \frac{P(M(x) = D) \, p_X(x)}{P(M(X) = D)}$$

where $P(M(x) = D)$ is the likelihood function, and $p_X$ the prior probability density or mass function. Since for some observed data $D$, $P(M(X) = D)$ does not change, it is treated as a normalizing constant.

The defining characteristic of Approximate Bayesian Computation (ABC) algorithms is approximating the posterior without direct evaluation of the likelihood function.

3.1 Point Estimates of Likelihood Functions

Based on our assumptions¹, our likelihood function $P(M(x) = D)$ will only assume values in $\{0, 1\}$. We can therefore use the indicator function $\mathbb{1}(M(x) = D)$ as our likelihood function, which we will do for the sake of clarity². Due to the complexity of our model $M$, our normalizing constant $P(M(X) = D)$ may be so small that the likelihood function $\mathbb{1}(M(x) = D)$ consistently yields $0$ for nearly all $x$. In that case, we can relax our requirement that $M(X) = D$ exactly, and instead introduce a metric $\rho: \mathcal{M} \times \mathcal{M} \to \mathbb{R}^+$; then, for some $\epsilon \ge 0$, we instead concern ourselves with the distribution of:

$$\{X \in \mathcal{X} \mid \rho(M(X), D) \le \epsilon\}$$

Since we are generally unable to make inferences about the relation between $\{X \in \mathcal{X} \mid \rho(M(X), D) \le \epsilon\}$ and its subset $\{X \in \mathcal{X} \mid M(X) = D\}$ when $\epsilon > 0$, we generally favor that $\epsilon$ be as small as possible (ideally equal to $0$) in order to best approximate $p_{X \mid D}$. We will consider this during our discussion of algorithms which are best suited for Approximate Bayesian Computation.

3.2 ABC Rejection Sampling

When applying the Rejection Sampling algorithm to Approximate Bayesian Computation, we simulate some $X$ from our prior, $p_X(\cdot)$, define the constant $C = \sup\left(\frac{p_{X \mid D}(x)}{p_X(x)}\right)$, and reject according to $\frac{p_{X \mid D}(X)}{C p_X(X)}$, the accepted variates then coming from the $p_{X \mid D}(\cdot)$ distribution. We should note that

$$\frac{p_{X \mid D}(x)}{p_X(x)} \approx \frac{\mathbb{1}(\rho(M(x), D) \le \epsilon) \, p_X(x)}{P(\rho(M(X), D) \le \epsilon) \, p_X(x)} = \frac{\mathbb{1}(\rho(M(x), D) \le \epsilon)}{P(\rho(M(X), D) \le \epsilon)},$$

and $C = \sup\left(\frac{\mathbb{1}(\rho(M(x), D) \le \epsilon)}{P(\rho(M(X), D) \le \epsilon)}\right) = \frac{1}{P(\rho(M(X), D) \le \epsilon)}$. We therefore reject based on the value of $\mathbb{1}(\rho(M(x), D) \le \epsilon)$, which will assume values in $\{0, 1\}$ only, removing the need for some $U \sim \text{Unif}[0, 1]$. We have then justified algorithm (7), as proposed by Tavaré [ST97].

¹ When modeling complex phenomena, we often simplify our model by replacing a number of parameters with a random seed. We may instead use $M: \mathcal{X} \times [0, 1] \to \mathcal{M}$, and each time we simulate data using our model, pick some $U \sim \text{Unif}[0, 1]$. When modeling the flip of a coin, for example, rather than undertaking the impossible task of considering all environmental factors which contribute to the coin landing heads or tails, we commonly use the model $M(p, U) = \mathbb{1}(U \le p)$, where $p \in [0, 1]$ is our sole parameter, and $U$ is the random seed. To quantify the likelihood $P(M(P, U) = 1 \mid P)$ independent of the random seed $U$, we then use the expected value of the likelihood over all possible random seeds: $E[P(M(p, U) = 1)] = \int_0^1 P(M(p, u) = 1) \, du$.
² When using a random seed in our model, we often have that $0 < \int_0^1 P(M(x, u) = D) \, du < 1$. For some $U \sim \text{Unif}[0, 1]$, we can use $\mathbb{1}(M(x, U) = D)$ as an unbiased estimate of our likelihood function, since $E[\mathbb{1}(M(x, U) = D)] = \int_0^1 P(M(x, u) = D) \, du$.
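A runnable sketch of this procedure, in the spirit of the coin-flip model of the footnote. The 20-flip binomial model with observed $D = 14$ heads, the uniform prior, and the exact-match threshold $\epsilon = 0$ are illustrative assumptions; with these choices the posterior is $\text{Beta}(15, 7)$, whose mean is $15/22$:

```python
import random

def abc_rejection(prior_sample, model, rho, D, eps, n, seed=0):
    """Keep prior draws whose simulated data lie within eps of D."""
    rng = random.Random(seed)
    accepted = []
    while len(accepted) < n:
        x = prior_sample(rng)                # X ~ p_X
        if rho(model(x, rng), D) <= eps:     # rho(M(X), D) <= eps
            accepted.append(x)
    return accepted

# Toy model: number of heads in 20 flips of a coin with bias p.
flips = lambda p, rng: sum(rng.random() <= p for _ in range(20))
xs = abc_rejection(prior_sample=lambda r: r.random(), model=flips,
                   rho=lambda a, b: abs(a - b), D=14, eps=0, n=1000)
mean = sum(xs) / len(xs)   # approximates the Beta(15, 7) posterior mean 15/22
```

Even in this one-dimensional example the acceptance probability is only about $1/21$, which previews the inefficiency discussed next.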
Algorithm 7 ABC Rejection Sampling algorithm
  for $i = 1, 2, \ldots, N$ do
    propose: Simulate $X \sim p_X(\cdot)$ independently
    if $\rho(M(X), D) \le \epsilon$ then
      Set $X_i \leftarrow X$
    else
      go to propose
    end if
  end for

While simple to implement, algorithm (7) can potentially suffer from large inefficiency problems when $p_X$ does not closely resemble $p_{X \mid D}$, especially in high-dimensional cases, since $N$, our expected number of trials necessary to find $n$ acceptances, has distribution $N \sim NB(n, P(\rho(M(X), D) \le \epsilon))$. We could lower our expected number of trials by choosing $\epsilon$ large, but this would then compromise our approximation of the posterior $p_{X \mid D}$. When $P(\rho(M(X), D) \le \epsilon)$ is prohibitively small, we may consider a Markov chain Monte Carlo application of Approximate Bayesian Computation.

3.3 ABC Markov chain Monte Carlo

Recall from our discussion of the Metropolis–Hastings algorithm that for some transition kernel $q$ and $Y \sim q(\cdot \mid X_i)$, $U \sim \text{Unif}[0, 1]$ independently, we set $X_{i+1} = Y$ if $U \le \alpha(X_i, Y)$, where $\alpha$ satisfies the detailed balance condition,

$$p_{X \mid D}(x) q(y \mid x) \alpha(x, y) = p_{X \mid D}(y) q(x \mid y) \alpha(y, x),$$

in order to construct a Markov chain with stationary distribution $p_{X \mid D}$. An immediate choice would be $\alpha(x, y) = \min\left(\frac{p_{X \mid D}(y) q(x \mid y)}{p_{X \mid D}(x) q(y \mid x)}, 1\right)$, although by Bayes' theorem we have that

$$\frac{p_{X \mid D}(y)}{p_{X \mid D}(x)} = \frac{\mathbb{1}(\rho(M(y), D) \le \epsilon) \, p_X(y)}{\mathbb{1}(\rho(M(x), D) \le \epsilon) \, p_X(x)}.$$

Assuming that our initial $X_0$ is chosen such that $\rho(M(X_0), D) \le \epsilon$, we instead use

$$\alpha(x, y) = \min\left(\mathbb{1}(\rho(M(y), D) \le \epsilon) \, \frac{p_X(y) q(x \mid y)}{p_X(x) q(y \mid x)}, 1\right)$$

[PM03, p. 15325], as in algorithm (8), an ABC adaptation of the Metropolis–Hastings algorithm.
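A runnable sketch of such a chain, reusing the toy coin model (an illustrative assumption: $D = 14$ heads in 20 flips, uniform prior, exact match $\epsilon = 0$). The proposal is a symmetric normal random walk, so the $q$ terms in $\alpha$ cancel, and a failed match simply keeps $X_{i+1} = X_i$:

```python
import random

def abc_mcmc(x0, model, rho, D, eps, prior_pdf, n, scale=0.1, seed=0):
    """ABC Markov chain Monte Carlo with a symmetric random-walk proposal:
    alpha reduces to the ABC indicator times the prior ratio."""
    rng = random.Random(seed)
    x, chain = x0, []
    for _ in range(n):
        y = x + rng.gauss(0.0, scale)               # Y ~ q(.|X_i), symmetric
        ok = prior_pdf(y) > 0 and rho(model(y, rng), D) <= eps
        if ok and rng.random() <= min(prior_pdf(y) / prior_pdf(x), 1.0):
            x = y
        chain.append(x)                             # rejection keeps X_{i+1} = X_i
    return chain

flips = lambda p, rng: sum(rng.random() <= p for _ in range(20))
chain = abc_mcmc(x0=0.7, model=flips, rho=lambda a, b: abs(a - b), D=14,
                 eps=0, prior_pdf=lambda p: 1.0 if 0.0 <= p <= 1.0 else 0.0,
                 n=30000)
mean = sum(chain) / len(chain)  # approximates the Beta(15, 7) posterior mean
```

All the caveats of ordinary Metropolis–Hastings (burn-in, autocorrelation, kernel choice) carry over unchanged to this likelihood-free variant.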
Algorithm 8 ABC Markov chain Monte Carlo algorithm
  Set $X_0$ such that $\rho(M(X_0), D) \le \epsilon$
  for $i = 0, 1, \ldots$ do
    propose: Simulate $Y \sim q(\cdot \mid X_i)$ independently
    if $\rho(M(Y), D) > \epsilon$ then
      go to propose
    end if
    Simulate $U \sim \text{Unif}[0, 1]$ independently
    if $U \le \alpha(X_i, Y) = \min\left(\frac{p_X(Y) q(X_i \mid Y)}{p_X(X_i) q(Y \mid X_i)}, 1\right)$ then
      Set $X_{i+1} \leftarrow Y$
    else
      Set $X_{i+1} \leftarrow X_i$
    end if
  end for

3.4 ABC Importance Sampling and Sampling/Importance Resampling

To find an ABC Importance Sampling estimate, we pick the prior $p_X$ as our importance distribution, and $p_{X \mid D}$ as our nominal distribution. We then weight each $X_i \sim p_X(\cdot)$ with:

$$w_i = \frac{p_{X \mid D}(X_i)}{p_X(X_i)} \propto \frac{\mathbb{1}(\rho(M(X_i), D) \le \epsilon) \, p_X(X_i)}{p_X(X_i)} = \mathbb{1}(\rho(M(X_i), D) \le \epsilon)$$

We can therefore use algorithm (9) to yield the approximation $E[g(X) \mid D] \approx \sum_{i=1}^{N} \hat{w}_i g(X_i)$.

Algorithm 9 ABC Importance Sampling algorithm
  for $i = 1, 2, \ldots, N$ do
    propose: Simulate $X \sim p_X(\cdot)$ independently
    if $\rho(M(X), D) \le \epsilon$ then
      Set $X_i \leftarrow X$
      Set $\hat{w}_i \leftarrow \frac{1}{N}$
    else
      go to propose
    end if
  end for

Our choice of $\mathbb{1}(\rho(M(\cdot), D) \le \epsilon)$ results in only two possible weights: "$\frac{1}{N}$" and "$0$". By ignoring variates with a weight of "$0$", we have implicitly resampled our estimate, and made this estimate identical to our ABC Rejection Sampling algorithm. We therefore have no indication of which of our accepted variates are "better" than others. We could perhaps observe $\rho(M(X_i), D)$, and rank accordingly, taking only the $n$ estimates with the smallest measurements, for $n < N$. Alternatively, we can implement a Sequential Monte Carlo version of the ABC Importance Sampling algorithm, which will allow us to gradually lower our threshold $\epsilon$ with each new generation.

3.5 ABC Sequential Monte Carlo

In our earlier discussion of Sequential Importance Sampling methods, we allowed the possibility of constructing a sequence of intermediate distributions in order to estimate the nominal distribution.
In the context of Approximate Bayesian Computation, these intermediate distributions will become invaluable, as they can be easily defined, for $\epsilon_1 > \epsilon_2 > \cdots > \epsilon_t > 0$, as:

$$P_{X^{(t)} \mid D}(x) = P(X \le x \mid \rho(M(X), D) \le \epsilon_t)$$

Furthermore, this is a sequence of probability distributions such that, as $t \to \infty$,

$$P_{X^{(t)} \mid D}(x) \xrightarrow{D} P(X \le x \mid \rho(M(X), D) \le 0) = P_{X \mid D}(x).$$

When adapting the Sequential Importance Sampling algorithm to Approximate Bayesian Computation, we will use our prior, $p_X$, as our initial importance distribution (generation $t = 1$), which will be used to find an estimate of our first intermediate distribution $p_{X^{(1)} \mid D}$. In successive generations, when estimating $p_{X^{(t)} \mid D}$, we will reuse variates from our estimate of $p_{X^{(t-1)} \mid D}$, perturbed by a symmetric Markov kernel $K_t$, as our importance distribution. Since this will require selecting candidate $X^{(t)}_i$'s based on their normalized weights, this will double as a resampling step, removing our sample degeneracy concerns. We can still resample our final estimate according to the normalized weights, but in practice this is rarely necessary. We have therefore justified algorithm (10), a Sequential Monte Carlo application of Approximate Bayesian Computation, as publicized by Toni and Stumpf [TT09].
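A sketch of the whole scheme on the toy coin model. The model, the uniform prior, the epsilon schedule $[4, 2, 1, 0]$, and the normal perturbation kernel with twice the previous population's variance are all illustrative assumptions:

```python
import math
import random

def abc_smc(prior_sample, prior_pdf, model, rho, D, eps_schedule, N, seed=0):
    """ABC Sequential Monte Carlo sketch with a normal perturbation kernel
    whose variance is twice the previous population's weighted variance."""
    rng = random.Random(seed)
    xs, ws = [], []
    while len(xs) < N:                      # generation 1: ABC rejection
        x = prior_sample(rng)
        if rho(model(x, rng), D) <= eps_schedule[0]:
            xs.append(x)
            ws.append(1.0 / N)
    for eps in eps_schedule[1:]:
        m = sum(w * x for x, w in zip(xs, ws))
        var = max(sum(w * x * x for x, w in zip(xs, ws)) - m * m, 1e-12)
        scale = math.sqrt(2 * var)          # K_t = N(0, 2 * sigma^2)
        new_xs, new_ws = [], []
        while len(new_xs) < N:
            x_star = rng.choices(xs, weights=ws)[0]     # resample an ancestor
            y = x_star + rng.gauss(0.0, scale)          # perturb with K_t
            if prior_pdf(y) > 0 and rho(model(y, rng), D) <= eps:
                denom = sum(w * math.exp(-(y - xj) ** 2 / (4 * var))
                            for xj, w in zip(xs, ws))   # sum_j w_j K_t(x_j, y)
                new_xs.append(y)
                new_ws.append(prior_pdf(y) / denom)
        total = sum(new_ws)
        xs, ws = new_xs, [w / total for w in new_ws]
    return xs, ws

# Toy coin model: D = 14 heads out of 20 flips, uniform prior on the bias p.
flips = lambda p, rng: sum(rng.random() <= p for _ in range(20))
xs, ws = abc_smc(prior_sample=lambda r: r.random(),
                 prior_pdf=lambda p: 1.0 if 0.0 <= p <= 1.0 else 0.0,
                 model=flips, rho=lambda a, b: abs(a - b), D=14,
                 eps_schedule=[4, 2, 1, 0], N=500)
mean = sum(w * x for x, w in zip(xs, ws))   # weighted posterior mean
```

Note that the unnormalized kernel density in `denom` is harmless, since its normalizing constant is shared by all ancestors within a generation and cancels when the weights are renormalized.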
Algorithm 10 ABC Sequential Monte Carlo algorithm
  for $i = 1, 2, \ldots, N$ do
    propose prior: Simulate $X \sim p_X(\cdot)$ independently
    if $\rho(M(X), D) \le \epsilon_1$ then
      Set $X^{(1)}_i \leftarrow X$
      Set $\hat{w}^{(1)}_i \leftarrow \frac{1}{N}$
    else
      go to propose prior
    end if
  end for
  for $t = 2, 3, \ldots, T$ do
    for $i = 1, 2, \ldots, N$ do
      propose: Choose with replacement $X^*$ from $\{X^{(t-1)}_1, X^{(t-1)}_2, \ldots, X^{(t-1)}_N\}$ according to weights $\{\hat{w}^{(t-1)}_1, \hat{w}^{(t-1)}_2, \ldots, \hat{w}^{(t-1)}_N\}$
      Simulate $X^{**} \sim K_{t-1}(X^*, \cdot)$ independently
      if $\rho(M(X^{**}), D) \le \epsilon_t$ then
        Set $X^{(t)}_i \leftarrow X^{**}$
        Set $w^{(t)}_i \propto \frac{p_X(X^{(t)}_i)}{\sum_{j=1}^{N} \hat{w}^{(t-1)}_j K_{t-1}(X^{(t-1)}_j, X^{(t)}_i)}$
      else
        go to propose
      end if
    end for
    Set $w \leftarrow \sum_{i=1}^{N} w^{(t)}_i$
    for $i = 1, 2, \ldots, N$ do
      Set $\hat{w}^{(t)}_i \leftarrow w^{(t)}_i / w$
    end for
  end for

Chapter 4
Considerations for Approximate Bayesian Computation - Sequential Monte Carlo

We have laid the theoretical foundations for three classes of Monte Carlo algorithms which are suitable for Approximate Bayesian Computation applications: Rejection Sampling, Markov chain Monte Carlo, and Importance Sampling. Rejection Sampling, while simple to implement, requires approximately $N \sim NB(n, P(\rho(M(X), D) \le \epsilon))$ trials to make an estimate with error of order $O(n^{-1/2})$. In high-dimensional cases, we often find that the probability of observing some data $D$ within our chosen threshold $\epsilon$ is exceedingly small, all but disqualifying Rejection Sampling. We may instead rely on a Markov chain Monte Carlo algorithm, such as the Metropolis–Hastings algorithm, which will more efficiently generate samples from our target distribution. When considerations are made for burn-in time, auto-correlation of samples, Markov kernel choice, and the mere existence of Central Limit Theorems for the desired estimate, the efficiency of Markov chain Monte Carlo algorithms can be undermined by the caution necessary when using the resulting estimates. Seeking a Monte Carlo algorithm other than Rejection Sampling which produces independent samples, we consider Importance Sampling.
While in general cases Importance Sampling does not offer much efficiency advantage over Rejection Sampling, the ability to implement a Sequential Monte Carlo variant of Importance Sampling shows much promise, especially with respect to Approximate Bayesian Computation. Given some observed $D \in \mathcal{M}$ and a model $M: \mathcal{X} \to \mathcal{M}$, we have shown that in theory we can obtain a Sequential Monte Carlo estimate of the posterior distribution $p_{X \mid D}$ through the use of a smoothly decreasing epsilon schedule, $\epsilon_1 > \epsilon_2 > \cdots \ge 0$, a symmetric Markov kernel $K$, and a metric function $\rho: \mathcal{M} \times \mathcal{M} \to \mathbb{R}^+$. As our concluding remarks, we make some practical considerations regarding each of these three components.

4.1 Markov kernels

The symmetric Markov kernel used can make or break the efficiency of an ABC Sequential Monte Carlo estimate. Ideally, we seek a Markov kernel which closely resembles our target distribution, $p_{X \mid D}$, though practically speaking, either a time-heterogeneous uniform or normal kernel is used, with the kernel's parameters changing after each generation. Filippi and Stumpf [SF13] have shown through empirical observations that it is generally most effective to use a kernel whose variance is twice that of the previous generation's samples. If $(\sigma^{(t)})^2$ is the variance of estimate $P_{X^{(t)} \mid D}$, we may then use a uniform kernel $K_t$ with variance $2(\sigma^{(t)})^2$, or $K_t = N(0, 2(\sigma^{(t)})^2)$, to propose variates for $P_{X^{(t+1)} \mid D}$. In the multivariate case, we may instead construct a covariance matrix $\Sigma^{(t)}$, and use $K_t = N(\mathbf{0}, 2\Sigma^{(t)})$, in order to take advantage of possible correlations between parameters.

4.2 Epsilon schedules

When picking our epsilon schedule, $\epsilon_1 > \epsilon_2 > \cdots \ge 0$, we may generally pick $\epsilon_1$ large, for example $\epsilon_1 = \infty$, then base each successive $\epsilon_t$ on the distribution of measurements of the previous generation's parameter estimates. We may, for example, use a certain quantile measurement as our $\epsilon_t$.
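Such a quantile rule can be sketched in a few lines (the median default is an illustrative assumption):

```python
def next_epsilon(distances, q=0.5):
    """Pick the next threshold epsilon_t as the q-quantile (the median by
    default) of the previous generation's accepted distances rho(M(X), D)."""
    s = sorted(distances)
    return s[int(q * (len(s) - 1))]

eps = next_epsilon([5.0, 1.0, 3.0, 2.0, 4.0])  # median of the distances
```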
We again must exercise caution: if the quantile is set too aggressively, we may lack "smoothness" in our intermediate distributions, resulting in inefficiency. If it is set too lax, we may end up generating excessively many intermediate distributions which are nearly identical, which is also inefficient. Silk and Stumpf [DS13] have argued against the use of quantile-based epsilon schedules altogether, instead favoring an unscented transform to adaptively predict the shape of the ε/acceptance-rate curve in order to maximize efficiency.

Most importantly, we must consider that reaching an estimate of p_{X^(t)|D}, where ε_t = 0, may be impossibly time-consuming. We must then pick some terminal ε_t > 0 such that p_{X^(t)|D} is considered a sufficient approximation of p_{X|D}. It is often not clear what choice will suffice, and we may end our estimates when the number of trials per acceptance of our likelihood estimate becomes exponentially large, or when our intermediate distributions begin looking nearly identical. In the case of multiple observed data sets D_1, D_2, …, D_n, we may examine the distribution of ρ(D_i, D_j) for i ≠ j to give context to our choice of an appropriate terminal ε_t.

4.3 Distance metrics

As a general rule, the Euclidean metric is a good candidate for ρ : 𝒟 × 𝒟 → ℝ⁺. Oftentimes we may not be interested in modeling our observed data exactly, but are instead concerned with modeling data that yields a summary statistic T : 𝒟 → 𝒯 similar or equal to that of our observed data. We may then use the induced distance ρ(T(·), T(·)) in order to estimate the posterior distribution P(X ≤ x | ρ(T(M(X)), T(D)) ≤ ε) for some ε ≥ 0.

Reference List

[AD01] A. Doucet, N. de Freitas, N. Gordon. Sequential Monte Carlo Methods in Practice. Springer, 2001.
[AG96] A. Gelman, G. Roberts, W. Gilks. Efficient Metropolis jumping rules. Bayesian Statistics, 5:599–607, 1996.
[AG97] A. Gelman, W. Gilks, G. Roberts. Weak convergence and the optimal scaling of random walk Metropolis algorithms. The Annals of Applied Probability, 7.1:110–120, 1997.
[AH96] A. Hektoen, L. Holden. Bayesian modelling of sequence stratigraphic bounding surfaces. Geostatistics Wollongong, 1996.
[CR04] C. Robert, G. Casella. Monte Carlo Statistical Methods. Springer, 2nd edition, 2004.
[Dir58] P. Dirac. The Principles of Quantum Mechanics. Oxford University Press, 4th edition, 1958.
[DS13] D. Silk, M. Stumpf. Optimizing threshold-schedules for sequential approximate Bayesian computation: applications to molecular systems. Statistical Applications in Genetics and Molecular Biology, 12.5:603–618, 2013.
[EC07] E. Cheney, D. Kincaid. Numerical Mathematics and Computing. Cengage Learning, 6th edition, 2007.
[FK05] F. Kuo, I. Sloan. Lifting the curse of dimensionality. Notices of the American Mathematical Society, 52.11:1320–1328, 2005.
[JH54] J. Hammersley, K. Morton. Poor man's Monte Carlo. Journal of the Royal Statistical Society, 16.1:23–38, 1954.
[JL95] J. Liu, R. Chen. Blind deconvolution via sequential imputations. Journal of the American Statistical Association, 90.430:567–576, 1995.
[JL01] J. Liu, R. Chen, T. Logvinenko. A theoretical framework for sequential importance sampling with resampling. In A. Doucet, N. de Freitas, N. Gordon, editors, Sequential Monte Carlo Methods in Practice. Springer, 2001.
[Jon04] G. Jones. On the Markov chain central limit theorem. Probability Surveys, 1:299–320, 2004.
[KA96] K. Athreya, J. Sethuraman. On the convergence of the Markov chain simulation method. The Annals of Statistics, 24.1:69–100, 1996.
[KC94] K. Chan, C. Geyer. Discussion of 'Markov chains for exploring posterior distributions'. The Annals of Statistics, 22.4:1747–1758, 1994.
[Knu97] D. Knuth. The Art of Computer Programming, volume 1. Addison-Wesley, 1997.
[Kö00] M. Köppen. The curse of dimensionality. 5th Online World Conference on Soft Computing in Industrial Applications, pages 1–4, 2000.
[Li07] K. Li. Pool size selection for the sampling/importance resampling algorithm. Statistica Sinica, 17:895–907, 2007.
[MR55] M. Rosenbluth, A. Rosenbluth. Monte Carlo calculation of the average extension of molecular chains. The Journal of Chemical Physics, 23.2:356–359, 1955.
[NG93] N. Gordon, D. Salmond, A. Smith. Novel approach to nonlinear/non-Gaussian Bayesian state estimation. Radar and Signal Processing, IEE Proceedings F, 140.2:107–113, 1993.
[PdM06] P. del Moral, A. Doucet, A. Jasra. Sequential Monte Carlo samplers. Journal of the Royal Statistical Society, 68.3:411–436, 2006.
[PM03] P. Marjoram, S. Tavaré. Markov chain Monte Carlo without likelihoods. Proceedings of the National Academy of Sciences, 100.26:15324–15328, 2003.
[Rub88] D. Rubin. Using the SIR algorithm to simulate posterior distributions. Bayesian Statistics, 3:395–402, 1988.
[SC95] S. Chib, E. Greenberg. Understanding the Metropolis-Hastings algorithm. American Statistician, 49.4:327–335, 1995.
[SF13] S. Filippi, M. Stumpf. On optimality of kernels for approximate Bayesian computation using sequential Monte Carlo. Statistical Applications in Genetics and Molecular Biology, 12.1:1–21, 2013.
[Shi95] A. Shiryaev. Probability. Springer, 2nd edition, 1995.
[SJ00] S. Jarner, E. Hansen. Geometric ergodicity of Metropolis algorithms. Stochastic Processes and their Applications, 85.2:341–361, 2000.
[ST97] S. Tavaré, P. Donnelly. Inferring coalescence times from DNA sequence data. Genetics, 145.2:505–518, 1997.
[Tie94] L. Tierney. Markov chains for exploring posterior distributions. The Annals of Statistics, 22.4:1701–1728, 1994.
[TT09] T. Toni, M. Stumpf. Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. Journal of the Royal Society Interface, 6.31:187–202, 2009.
[YA05] Y. Atchadé, F. Perron. On the geometric ergodicity of Metropolis-Hastings algorithms. Statistics, 41:77–84, 2005.
Abstract
We introduce Monte Carlo estimates with discussion of numerical integration and the curse of dimensionality, using a toy example of estimating π using a d-dimensional hypersphere embedded in a d-dimensional unit hypercube. Some common Monte Carlo methods such as Rejection Sampling, Markov chain Monte Carlo, Importance Sampling, and Sequential Monte Carlo are then discussed. We follow with an introduction to a type of Bayesian inference known as Approximate Bayesian Computation (ABC), and apply each of our discussed Monte Carlo algorithms in ABC terms, ultimately arguing that Sequential Monte Carlo methods are best suited for ABC.
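The toy example mentioned in the abstract can be sketched for d = 2 as follows. This is a minimal illustration under our own naming, not the thesis's code:

```python
import math
import random

random.seed(1)

def pi_estimate(n, d=2):
    # Draw n points uniformly from the cube [-1, 1]^d and count those
    # falling inside the inscribed unit-radius hypersphere.  For d = 2
    # the hit fraction estimates pi / 4.
    hits = sum(
        1 for _ in range(n)
        if sum(random.uniform(-1.0, 1.0) ** 2 for _ in range(d)) <= 1.0
    )
    return 4.0 * hits / n  # rescaling valid for d = 2 only

approx = pi_estimate(100_000)  # close to math.pi
```

As d grows, the hypersphere occupies a vanishing fraction of the hypercube, so almost no draws land inside it; this is exactly the curse-of-dimensionality point the abstract raises.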
Asset Metadata

Creator: Stram, Alexander H. (author)
Core Title: Theoretical foundations of approximate Bayesian computation
School: College of Letters, Arts and Sciences
Degree: Master of Arts (Mathematics)
Publication Date: 04/23/2015
Defense Date: 03/24/2015
Publisher: University of Southern California
Language: English
Advisor: Lototsky, Sergey V. (committee chair); Marjoram, Paul (committee member); Mikulevičius, Remigijus (committee member)
Tags: approximate Bayesian computation; Bayesian statistics; importance sampling; Monte Carlo; sequential importance sampling; sequential Monte Carlo
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c3-558978
Repository: USC Digital Library, University of Southern California