TWO ESSAYS IN ECONOMETRICS: LARGE N, T PROPERTIES OF IV, GMM, MLE AND LEAST SQUARES MODEL SELECTION/AVERAGING

by Junwei Zhang

A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, in Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (ECONOMICS)

August 2013

Copyright 2013 Junwei Zhang

Dedication

To my grandparents!

Acknowledgments

I would like to thank all my committee members for their precious time and for all the guidance they gave me during my PhD study. I would especially like to thank Professor Hsiao for countless discussions and emails on the topics I worked on throughout the years. I also want to thank my family for supporting my PhD study and my staying in school for so many years; without them my achievements would be impossible. And finally, thanks to all my friends and to the people who helped me during my study abroad. With them, my PhD life became enjoyable and memorable.

Table of Contents

Dedication
Acknowledgments
List of Tables
Abstract
Chapter 1: IV, GMM or Likelihood for Dynamic Panel With Large N, T
  1.1 Introduction
  1.2 Model Setup and Estimators
  1.3 Simple IV
  1.4 GMM
  1.5 (Quasi) MLE
  1.6 Monte Carlo Simulation
  1.7 Concluding Remarks
Chapter 2: Least Squares Model Selection and Averaging
  2.1 Introduction
  2.2 Least Squares Model Selection
    2.2.1 Mallows Criterion
    2.2.2 Asymptotic Loss Efficiency
    2.2.3 Feasible Implementation
  2.3 Least Squares Model Averaging
    2.3.1 Hansen (2007)
    2.3.2 Wan et al. (2010)
  2.4 Conclusion
References
Appendix A
Appendix B

List of Tables

1.1 GMM with Many IVs
1.2 MLE with Stationary Mean and Variance
1.3 Fully Fledged MLE
1.4 Simple IV
1.5 GMM with Single IV
1.6 Crude GMM
1.7 MLE Treating Initial Value Fixed
1.8 GMM with Many IVs, Chi2
1.9 MLE with Stationary Mean and Variance, Chi2
1.10 Fully Fledged MLE, Chi2
1.11 Simple IV, Chi2
Abstract

In this dissertation, the issues of large N, T properties of IV, GMM and MLE estimators and of least squares model selection/averaging are studied. The first part is based on joint work with Professor Hsiao. We examine the asymptotic properties of IV, GMM and MLE estimators of dynamic panel data models when either N or T or both are large. We show that the Anderson & Hsiao (1981, 1982) simple instrumental variable estimator (IV) and the quasi-maximum likelihood estimator (QMLE) treating the initial value as stochastic are asymptotically unbiased whether N or T or both tend to infinity. On the other hand, the QMLE treating the initial value as fixed is asymptotically unbiased only if N is fixed and T is large. If both N and T are large and $N/T \to c \neq 0$ as $T \to \infty$, the QMLE treating initial values as fixed is asymptotically biased of order $\sqrt{N/T}$. Likewise, the Arellano & Bond (1991) type GMM estimator is asymptotically biased of order $\sqrt{T/N}$ if $T/N \to c \neq 0$ as $N \to \infty$, even if we restrict the number of instruments used. Monte Carlo studies show that whether an estimator is asymptotically biased or not has important implications for the actual size of the conventional t-test.

The second part of this dissertation is on least squares model selection and model averaging. By developing a new Whittle's inequality, we generalize the asymptotic loss efficiency results for the Mallows criterion in different model selection and model averaging settings to the case when the error terms are autocorrelated. In particular, we show that the optimality results remain true for the model selection problem studied by Li (1987), the model averaging problem studied by Hansen (2007), and the model averaging problem studied by Wan et al. (2010) in a time series framework with autocorrelated errors.

Chapter 1
IV, GMM or Likelihood Approach to Estimating Dynamic Panels when either N or T or both are Large

1.1 Introduction

Panel data involve at least two dimensions: a cross-sectional dimension of size N and a time series dimension of size T. Multi-dimensional asymptotics are much more complicated than traditional one-dimensional asymptotics. As Phillips & Moon (1999) point out, sequentially applying one-dimensional asymptotics can be misleading when both N and T increase at the same rate. For instance, in a dynamic panel data model of the form

$y_{it} = \gamma y_{i,t-1} + \alpha_i + u_{it}, \quad |\gamma|<1, \quad i=1,\dots,N, \; t=1,\dots,T,$   (1.1)

it is well known that the maximum likelihood estimator treating the individual-specific effects $\alpha_i$ and the initial values as fixed constants is the covariance estimator

$\hat\gamma_{cv} = \dfrac{\sum_{i=1}^N\sum_{t=1}^T (y_{it}-\bar y_i)(y_{i,t-1}-\bar y_{i,-1})}{\sum_{i=1}^N\sum_{t=1}^T (y_{i,t-1}-\bar y_{i,-1})^2},$   (1.2)

where $\bar y_i = \frac1T\sum_{t=1}^T y_{it}$ and $\bar y_{i,-1} = \frac1T\sum_{t=1}^T y_{i,t-1}$. The estimator $\hat\gamma_{cv}$ is consistent, and $\sqrt{T}(\hat\gamma_{cv}-\gamma)$ is asymptotically normally distributed with mean 0 if N is fixed and T is large; so is the Arellano & Bond (1991) GMM estimator. However, if $T/N \to c$, $0<c<\infty$, Hahn & Kuersteiner (2002) show that the maximum likelihood estimator is asymptotically biased of order $\sqrt{N/T}$ when both the cross-sectional dimension N and the time series dimension T go to infinity with $T/N \to c \neq 0$. Alvarez & Arellano (2003) show that the generalized method of moments (GMM) estimator of a dynamic panel data model is asymptotically biased of order $\sqrt{T/N}$ if $T/N \to c^* \neq 0$ as $N \to \infty$.

Two issues arise in statistical inference for dynamic panel data models: the presence of individual-specific effects and the treatment of the initial observations.
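To fix ideas, here is a minimal sketch of the covariance estimator (1.2). It is a NumPy illustration written for this exposition only; the array layout, with `y[:, 0]` holding the initial observations $y_{i0}$, is our own convention and not part of the original derivation.

```python
import numpy as np

def covariance_estimator(y):
    """Within-group (covariance) estimator of gamma in (1.2).

    y : (N, T+1) array; column 0 holds the initial observations y_{i0}.
    """
    y_t, y_lag = y[:, 1:], y[:, :-1]                  # y_{it} and y_{i,t-1}, t = 1,...,T
    y_t_dm = y_t - y_t.mean(axis=1, keepdims=True)    # subtract individual means over t
    y_lag_dm = y_lag - y_lag.mean(axis=1, keepdims=True)
    return np.sum(y_t_dm * y_lag_dm) / np.sum(y_lag_dm ** 2)
```

For fixed T this estimator carries a bias of order $1/T$, which is what the scaled $\sqrt{N/T}$ bias results quoted above capture.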
The process of removing individual-specific effects creates correlations between the lagged regressors and the transformed errors. The standard approach to purging the correlations between the regressors and the errors of the equation is to use instrumental variables. However, the way the sample moments are constructed to approximate the population moments can have important implications for the asymptotic distribution of the resulting estimators. For instance, the Arellano & Bond (1991) type GMM uses cross-sectional averages to approximate the population moments, which leads to a bias of order $1/N$. When the GMM estimator is multiplied by the scale factor $\sqrt{NT}$, the procedure of equating sample moments to population moments creates an asymptotic bias of order $\sqrt{T/N}$. On the other hand, the simple IV and the QMLE use all NT observations to approximate the population moments; hence they are asymptotically unbiased independent of the way N or T goes to infinity.

The difference between the likelihood functions treating the initial value as a fixed constant and as a random variable for an individual with T time series observations is of order $1/T$. Therefore, if N is fixed and T goes to infinity, the QMLE based on fixed or random initial observations is asymptotically unbiased. However, if N also goes to infinity, mistreating the initial observation as a fixed constant creates an asymptotic bias of order $\sqrt{N/T}$ if $N/T \to c^* \neq 0$ as $T \to \infty$.

Whether an estimator is asymptotically biased or not has important implications for statistical inference. In this paper, we wish to explore the source of the asymptotic bias and to find robust estimators that are asymptotically unbiased independent of the way N or T goes to infinity. We present the model and estimators in Section 2. Sections 3, 4, and 5 discuss the asymptotic properties of the simple IV, GMM, and (quasi) MLE, respectively, when either N or T or both go to infinity. Monte Carlo simulations of the properties of the different estimators under different combinations of N and T are in Section 6. Concluding remarks are in Section 7. Proofs of the asymptotic results are in the Appendix.

1.2 Model Setup and Estimators

We consider a dynamic panel data model of the form

$y_{it} = y_{i,t-1}\gamma + \tilde x_{it}'\tilde\beta + \alpha_i + u_{it},$   (1.3)

where the $\tilde x_{it}$ are uncorrelated with the individual-specific effects $\alpha_i$ and with the errors of the equation $u_{it}$. Since whether a dynamic panel data estimator is asymptotically biased or not depends only on the way the instruments are used to purge the correlations between the regressors and the errors of the equation, there is no loss of generality in considering the simple model

$y_{it} = \alpha_i + \gamma y_{i,t-1} + u_{it}, \quad |\gamma|<1.$   (1.4)

Assumption 1. $u_{it}$ is independent of $\alpha_i$ and is independently, identically distributed (i.i.d.) across i and t with mean 0, variance $\sigma^2$ and finite fourth moment.

Assumption 2. The $\alpha_i$ are i.i.d. across individuals with $E[\alpha_i]=0$, $\mathrm{var}(\alpha_i)=\sigma^2_\alpha$ and finite fourth-order moments.

Let the observed sample consist of $y_{it}$, $i=1,\dots,N$ and $t=0,1,\dots,T$. Stacking the $T\times 1$ observed values of $y_{it}$ as $y_i=(y_{i1},\dots,y_{iT})'$ yields

$y_i = y_{i,-1}\gamma + \tau\alpha_i + u_i, \quad i=1,\dots,N,$   (1.5)

where $y_{i,-1}=(y_{i0},\dots,y_{i,T-1})'$, $u_i=(u_{i1},\dots,u_{iT})'$ and $\tau$ is a $T\times 1$ vector of ones, $\tau=(1,\dots,1)'$.

The two popular approaches to estimating the common parameter $\gamma$ are the instrumental variable (IV) approach and the likelihood approach.
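Before turning to these estimators, a minimal simulation sketch of the DGP (1.4) may help fix ideas. The burn-in length and the function name are our illustrative choices, not part of the model.

```python
import numpy as np

def simulate_panel(N, T, gamma, sigma_alpha=1.0, sigma_u=1.0, burn=200, rng=None):
    """Draw y_{it} = alpha_i + gamma*y_{i,t-1} + u_{it}, |gamma| < 1, i = 1..N, t = 0..T.

    A long burn-in is discarded so that the retained y_{i0} are (approximately)
    draws from the stationary distribution of the process.
    """
    rng = np.random.default_rng(rng)
    alpha = sigma_alpha * rng.standard_normal(N)
    y = np.zeros((N, burn + T + 1))
    for t in range(1, burn + T + 1):
        y[:, t] = alpha + gamma * y[:, t - 1] + sigma_u * rng.standard_normal(N)
    return y[:, burn:]          # (N, T+1); column 0 is y_{i0}
```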
The simple IV estimator proposed by Anderson & Hsiao (1981, 1982) first differences (1.4) to yield

$\Delta y_{it} = \Delta y_{i,t-1}\gamma + \Delta u_{it}, \quad i=1,2,\dots,N, \; t=2,\dots,T,$   (1.6)

where $\Delta = (1-L)$ and $L$ denotes the lag operator, $Ly_{it}=y_{i,t-1}$, and then uses either $y_{i,t-2}$ or $\Delta y_{i,t-2}$ as the instrument:

$\hat\gamma_{IV} = \dfrac{\sum_{i=1}^N\sum_{t=2}^T \Delta y_{it}\,y_{i,t-2}}{\sum_{i=1}^N\sum_{t=2}^T \Delta y_{i,t-1}\,y_{i,t-2}} = \gamma + \dfrac{\sum_{i=1}^N\sum_{t=2}^T \Delta u_{it}\,y_{i,t-2}}{\sum_{i=1}^N\sum_{t=2}^T \Delta y_{i,t-1}\,y_{i,t-2}},$   (1.7)

or

$\hat\gamma_{IV} = \dfrac{\sum_{i=1}^N\sum_{t=3}^T \Delta y_{it}\,\Delta y_{i,t-2}}{\sum_{i=1}^N\sum_{t=3}^T \Delta y_{i,t-1}\,\Delta y_{i,t-2}} = \gamma + \dfrac{\sum_{i=1}^N\sum_{t=3}^T \Delta u_{it}\,\Delta y_{i,t-2}}{\sum_{i=1}^N\sum_{t=3}^T \Delta y_{i,t-1}\,\Delta y_{i,t-2}}.$   (1.8)

The Arellano & Bond (1991) (or Arellano & Bover (1995)) GMM approach first eliminates the individual-specific effect $\alpha_i$ through a $(T-1)\times T$ deviation operator $A$ that satisfies $A\tau=0$, where $\tau$ is a $T\times 1$ vector of ones, and then uses instruments that satisfy

$E[Z_i A u_i] = 0.$   (1.9)

For instance, the operator

$A = \begin{pmatrix} -1 & 1 & 0 & \cdots & 0 & 0 \\ 0 & -1 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & \ddots & & \vdots \\ 0 & 0 & \cdots & -1 & 1 & 0 \\ 0 & 0 & \cdots & 0 & -1 & 1 \end{pmatrix}$   (1.10)

yields the $T-1$ first-difference equations

$\Delta y_i = \Delta y_{i,-1}\gamma + \Delta u_i, \quad i=1,\dots,N,$   (1.11)

where $\Delta y_i=(\Delta y_{i2},\dots,\Delta y_{iT})'$ and $\Delta y_{i,-1}=(\Delta y_{i1},\dots,\Delta y_{i,T-1})'$. Then

$E[Au_iu_i'A'] = \sigma^2\begin{pmatrix} 2 & -1 & \cdots & 0 & 0 \\ -1 & 2 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 2 & -1 \\ 0 & 0 & \cdots & -1 & 2 \end{pmatrix} = \sigma^2\Omega_0.$   (1.12)

Arellano & Bover (1995) suggest using an upper-triangular forward orthogonal deviation operator that satisfies $A\tau=0$, $A'A=Q=I_T-\frac1T\tau\tau'$ and $AA'=I_{T-1}$; then $u^*_i=Au_i$ with

$u^*_{it} = c_t\big[u_{it} - \tfrac{1}{T-t}(u_{i,t+1}+\dots+u_{iT})\big], \quad t=1,\dots,T-1,$   (1.13)

where $c_t^2=\frac{T-t}{T-t+1}$, and $E[u^*_iu^{*\prime}_i]=\sigma^2 I_{T-1}$.

Let $Z_i$ be the block-diagonal matrix that satisfies condition (1.9). When $A$ takes the form (1.10), $Z_i$ takes the form $Z_i=(q_{it})$, where $q_{it}=(y_{i0},\dots,y_{i,t-2})'$. When $A$ takes the form of forward deviations, $q_{it}=(y_{i0},\dots,y_{i,t-1})'$. The Arellano & Bond (1991) generalized method of moments (GMM) estimator solves for $\gamma$ by minimizing

$\Big(\tfrac1N\sum_{i=1}^N Z_iAu_i\Big)'\Big(\tfrac1N\sum_{i=1}^N Z_iAu_iu_i'A'Z_i'\Big)^{-1}\Big(\tfrac1N\sum_{i=1}^N Z_iAu_i\Big).$   (1.14)

The likelihood approach notes that the initial observation $y_{i0}$ is a random variable and considers the joint distribution of $(y_{i0},y_{i1},\dots,y_{iT})$, $i=1,\dots,N$ (e.g. Anderson & Hsiao (1981, 1982)). Assuming the data generating process is the same for all $y_{it}$,

$y_{i0} = \gamma y_{i,-1} + \alpha_i + u_{i0} = (1+\gamma+\dots)\alpha_i + \sum_{j=0}^{\infty} u_{i,-j}\gamma^j.$   (1.15)

Under Assumptions 1 and 2, the (quasi) likelihood function takes the form

$\prod_{i=1}^N |\Omega|^{-1/2}\exp\big\{-\tfrac12(y_{i0}-a,\,y_{i1}-\gamma y_{i0},\dots,y_{iT}-\gamma y_{i,T-1})\,\Omega^{-1}(y_{i0}-a,\,y_{i1}-\gamma y_{i0},\dots,y_{iT}-\gamma y_{i,T-1})'\big\},$   (1.16)

where $\Omega$ is a $(T+1)\times(T+1)$ matrix of the form

$\Omega = \begin{pmatrix}\sigma_0^2 & \sigma_1^2\tau' \\ \sigma_1^2\tau & V\end{pmatrix}, \quad V=\sigma^2 I_T+\sigma^2_\alpha\tau\tau', \quad \sigma_0^2=\mathrm{var}(y_{i0}), \quad \sigma_1^2=\mathrm{cov}(y_{i0},v_{it}),$

where $v_{it}=\alpha_i+u_{it}$. If $y_{i0}$ has reached stationarity, we have $a=0$ and $\sigma_1^2=\frac{1}{1-\gamma}\sigma^2_\alpha$. Conditional on $\Omega$, the (quasi) maximum likelihood estimator of $\gamma$ is the solution to the following first-order condition (Anderson & Hsiao (1981, 1982), Bhargava & Sargan (1983)):

$\sum_{i=1}^N (y_{i0}-a,\,y_{i1}-\gamma y_{i0},\dots,y_{iT}-\gamma y_{i,T-1})\,\Omega^{-1}(0,\,y_{i,-1}')' = 0.$   (1.17)

Since both the simple IV and the GMM use the differenced form to eliminate $\alpha_i$, we also consider the first-differenced form of the QMLE. The first-differenced form QMLE is called the transformed MLE by Hsiao et al. (2002) for the estimation of dynamic panel data models when the $\alpha_i$ are treated as fixed constants.
Under the assumption that $y_{i0}$ is random (1.15),

$\Delta y_{i1} = \Delta y_{i0}\gamma + \Delta u_{i1}.$   (1.18)

In other words, $\Delta y_{i1}$ is generated by the same data generating process as $\Delta y_{it}$. We assume:

Assumption 3. $E[\Delta y_{i1}]=b$ and $\mathrm{var}(\Delta y_{i1})=\omega\sigma^2$. (If the process starts from the infinite past, then $E[\Delta y_{i1}]=0$ and $\mathrm{var}(\Delta y_{i1})=\frac{2\sigma^2}{1+\gamma}$, and the QMLE of $\gamma$ is then the solution of a quadratic equation in $\gamma$. Since our focus is on showing that the QMLE is asymptotically unbiased, for ease of exposition we treat $b$ and $\omega$ as arbitrary constants.)

Hsiao et al. (2002) suggest estimating $\gamma$ by maximizing the (quasi) joint log-likelihood function of $(\Delta y_{i1},\dots,\Delta y_{iT})$, $i=1,\dots,N$ (when the $\alpha_i$ are treated as fixed constants),

$\ell = -\tfrac{NT}{2}\ln(2\pi)-\tfrac{NT}{2}\ln(\sigma^2)-\tfrac{N}{2}\ln[1+T(\omega-1)]-\tfrac{1}{2\sigma^2}\sum_{i=1}^N(\Delta u_i^*)'\Omega^{*-1}\Delta u_i^*,$   (1.19)

where $\Delta u_i^*=(\Delta y_{i1}-b,\ \Delta v_i')'$, $\Delta v_i=(\Delta u_{i2},\dots,\Delta u_{iT})'$, and

$\Omega^* = \begin{pmatrix} \omega & -1 & 0 & \cdots & 0 \\ -1 & 2 & -1 & \cdots & 0 \\ 0 & -1 & 2 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 2 \end{pmatrix}$

is a $T\times T$ matrix.

1.3 Simple IV

The Anderson & Hsiao (1981, 1982) simple IV estimator uses either $\frac{1}{N(T-1)}\sum_{i=1}^N\sum_{t=2}^T y_{i,t-2}\Delta u_{it}=0$ or $\frac{1}{N(T-2)}\sum_{i=1}^N\sum_{t=3}^T \Delta y_{i,t-2}\Delta u_{it}=0$ to approximate the moment condition

$E[y_{i,t-2}\Delta u_{it}]=0,$   (1.20)

or

$E[\Delta y_{i,t-2}\Delta u_{it}]=0.$   (1.21)

Condition (1.20) or (1.21) implies the following.

Theorem 1.3.1. The Anderson & Hsiao (1981, 1982) simple IV estimator is asymptotically unbiased whether N or T or both go to infinity, and $\sqrt{NT}(\hat\gamma_{IV}-\gamma)\sim N(0,\sigma^2_{IV})$, where $\sigma^2_{IV}=\frac{4(1+\gamma^2)^2}{(1-\gamma^2)(1-\gamma)}$ when using the difference instruments, and $\sigma^2_{IV}=2(1+\gamma)\big[\frac{1}{1-\gamma}+\frac{1+\gamma}{(1-\gamma)^2}\frac{\sigma^2_\alpha}{\sigma^2}\big]$ when using the level instruments.

1.4 GMM

The differences between the Anderson-Hsiao (AH) simple IV and the Arellano-Bond (AB) type GMM are twofold: the number of moment conditions used and the way the sample moments are used to approximate the population moments. The AH estimator uses one moment condition while the AB type estimator uses $O(T^2)$ moment conditions. AH uses the sample mean of all NT observations to approximate the single moment condition (1.20) or (1.21), while AB uses cross-sectional means to approximate the $O(T^2)$ moment conditions (1.9). Alvarez & Arellano (2003) show that the AB type estimator is asymptotically biased.

Theorem 1.4.1. Under Assumptions 1 and 2, as both N and T tend to infinity, provided $(\log T)^2/N \to 0$, $\hat\gamma_{GMM}$ is consistent. Moreover, provided $T/N\to c$, $0\le c<\infty$,

$\sqrt{NT}\big(\hat\gamma_{GMM}-[\gamma-\tfrac1N(1+\gamma)]\big)\to_d N(0,1-\gamma^2).$   (1.22)

Proof: See Alvarez & Arellano (2003).

We show in this section that the source of the bias is the Arellano & Bond (1991) type GMM's use of the cross-sectional sample average $\frac1N\sum_{i=1}^N Z_i\Delta u_i=0$ to approximate $E[Z_i\Delta u_i]=0$, not the number of moments used. We illustrate this by considering a single IV, $\Delta y_{i,t-2}$. Under the Anderson & Hsiao (1981, 1982) simple IV, it implies one moment condition (1.21). Under the Arellano & Bond (1991) type formulation, it implies $T-1$ moment conditions. We show that the restricted GMM remains asymptotically biased of order $\sqrt{T/N}$ because the covariance between $\frac1N\sum_{i=1}^N Z_iAy_{i,-1}$ and $\frac1N\sum_{i=1}^N Z_iAu_i$ is of order $\frac1N$. The impact of this correlation is aggravated if T also goes to infinity. Let $A$ be of the form (1.10) and $q_{it}=\Delta y_{i,t-2}$; then the GMM estimator is

$\hat\gamma^*_{GMM}-\gamma = \dfrac{(\sum_{i=1}^N Z_i'\Delta y_{i,-1})'(\sum_{i=1}^N Z_i'Z_i)^{-1}(\sum_{i=1}^N Z_i'\Delta u_i)}{(\sum_{i=1}^N Z_i'\Delta y_{i,-1})'(\sum_{i=1}^N Z_i'Z_i)^{-1}(\sum_{i=1}^N Z_i'\Delta y_{i,-1})} \triangleq \dfrac{B}{A}.$   (1.23)

Theorem 1.4.2. The GMM estimator using only one instrument is consistent.
Moreover,

$\sqrt{NT}\Big(\hat\gamma_{GMM}-\Big[\gamma-\tfrac1N\tfrac{(1+\gamma)^2}{(1-\gamma)^2}\tfrac{1+\eta}{1-\eta}b_0\Big]\Big)\to_d N(0,\sigma^2_{GMM}),$   (1.24)

where $b_0 = q(0)-\eta q(1)+\eta^2 q(2)-\frac{(\gamma^2\eta)^3}{1-(\gamma^2\eta)^2}(1-\gamma^2\eta)c_q$, with $q(0)=-\frac{2}{1+\gamma}$, $q(1)=-\frac{1}{1+\gamma}(-2+3\gamma-3\gamma^2)$, $q(2)=\frac{1}{1+\gamma}(\gamma-5\gamma^2+5\gamma^3-3\gamma^4)$, $c_q=\frac{1}{1+\gamma}(2\gamma^{-3}-6\gamma^{-2}+5\gamma^{-1}-3)$, and $\sigma^2_{GMM}=\frac{1+\gamma}{1-\gamma}\frac{(1+\eta)^2}{\eta}$. Here $\eta=\frac{w_1-\sqrt{w_1^2-4}}{2}$ and $w_1=\frac{4}{1-\gamma}$.

The root of the asymptotic bias in the Arellano & Bond (1991) or Arellano & Bover (1995) GMM estimator lies in the purging of the correlations between $Ay_{i,-1}$ and $Au_i$ through the instruments $Z_i$, in which only cross-sectional information is used. The covariance between $\frac1N\sum_{i=1}^N Z_iAy_{i,-1}$ and $\frac1N\sum_{i=1}^N Z_iAu_i$ is of order $\frac1N$. When $N\to\infty$, $\frac1N\sum_i Z_iAy_{i,-1}$ and $\frac1N\sum_i Z_iAu_i$ are asymptotically uncorrelated; hence when T is fixed and $N\to\infty$, the GMM estimator is asymptotically unbiased. However, if T also goes to infinity and $T/N\to c\neq 0$ is finite, multiplying the GMM estimator by the scale factor $\sqrt{NT}$ is equivalent to multiplying the covariance between $\frac1N\sum_i Z_iAy_{i,-1}$ and $\frac1N\sum_i Z_iAu_i$ by the scale factor $\sqrt{c}\,N$, which creates a bias term of order $\sqrt{c}\,O(1)$. Therefore, unless $T/N\to 0$ as $N,T\to\infty$, the GMM estimator is asymptotically biased. This issue can be seen more clearly by considering the Alvarez & Arellano (2003) crude GMM estimator

$\hat\gamma_{CGMM} = \dfrac{\sum_{t=3}^T(\sum_{i=1}^N z_{it}\Delta y_{i,t-1})'(\sum_{i=1}^N z_{it}z_{it}')^{-1}(\sum_{i=1}^N z_{it}\Delta y_{it})}{\sum_{t=3}^T(\sum_{i=1}^N z_{it}\Delta y_{i,t-1})'(\sum_{i=1}^N z_{it}z_{it}')^{-1}(\sum_{i=1}^N z_{it}\Delta y_{i,t-1})} = \sum_{t=3}^T w_t\,\hat\gamma_t,$   (1.25)

where $\hat\gamma_t = \frac{\sum_{i=1}^N z_{it}\Delta y_{it}}{\sum_{i=1}^N z_{it}\Delta y_{i,t-1}}$. The bias of $\hat\gamma_t$ is of order $\frac1N$. The standard deviation of each $\hat\gamma_t$ is of order $\frac{1}{\sqrt N}$. Averaging across t reduces the order of the standard deviation to $\frac{1}{\sqrt{NT}}$ without affecting the order of the bias of $\hat\gamma_{CGMM}$. Hence $\hat\gamma_{CGMM}$ divided by its standard deviation has a bias of order $\sqrt{c}\,O(1)$ if $T/N\to c<\infty$.

Theorem 1.4.3. The crude GMM using only one instrument is consistent but asymptotically biased, with limiting distribution

$\sqrt{NT}\Big(\hat\gamma_{CGMM}-\Big[\gamma-\tfrac1N\tfrac{2(1+\gamma)}{(1-\gamma)^2}\Big]\Big)\to_d N(0,\sigma^2_{CGMM})$   (1.26)

when N and T go to infinity at the same rate, where $\sigma^2_{CGMM}=\frac{2(1+\gamma)(3-\gamma)}{(1-\gamma)^2}$.

Theorems 1.4.2 and 1.4.3 say that if T goes to infinity at least as fast as N, i.e. $T/N\to c\neq 0$ as $N\to\infty$, the GMM estimator can be severely asymptotically biased.

1.5 (Quasi) MLE

The asymptotic properties of the QMLE when $\alpha_i$ is random under different combinations of N and T are discussed in Anderson & Hsiao (1982). In particular, Alvarez & Arellano (2003) show that the QMLE is asymptotically unbiased when both N and T go to infinity. In this section, we discuss the properties of the first-difference QMLE. The first-order conditions for $\sigma^2$ and $\gamma$ of the (quasi) log-likelihood function are

$\hat\sigma^2 = \tfrac{1}{NT}\sum_{i=1}^N(\Delta u_i^*)'\Omega^{*-1}\Delta u_i^*,$   (1.27)

$\sum_{s,t=2}^T\rho^{st}\sum_{i=1}^N\Delta y_{is}\Delta y_{i,t-1} + \sum_{t=2}^T\rho^{1t}\sum_{i=1}^N(\Delta y_{i1}-b)\Delta y_{i,t-1} = \gamma\sum_{s,t=2}^T\rho^{st}\sum_{i=1}^N\Delta y_{i,s-1}\Delta y_{i,t-1}.$   (1.28)

From equation (1.28), we have

$\hat\gamma_{MLE} = \Big[\sum_{s,t=2}^T\rho^{st}\sum_{i=1}^N\Delta y_{i,s-1}\Delta y_{i,t-1}\Big]^{-1}\Big[\sum_{s,t=2}^T\rho^{st}\sum_{i=1}^N\Delta y_{is}\Delta y_{i,t-1}+\sum_{t=2}^T\rho^{1t}\sum_{i=1}^N(\Delta y_{i1}-b)\Delta y_{i,t-1}\Big],$   (1.29)

where $\rho^{st}$ is the $(s,t)$ element of $\Omega^{*-1}$. Then

$\hat\gamma_{MLE}-\gamma = A_{MLE}^{-1}(B_{MLE}+C_{MLE}),$   (1.30)

where $A_{MLE}=\sum_{i=1}^N\sum_{s,t=2}^T\rho^{st}\Delta y_{i,s-1}\Delta y_{i,t-1}$, $B_{MLE}=\sum_{i=1}^N\sum_{s,t=2}^T\rho^{st}\Delta u_{is}\Delta y_{i,t-1}$, and $C_{MLE}=\sum_{i=1}^N\sum_{t=2}^T\rho^{1t}(\Delta y_{i1}-b)\Delta y_{i,t-1}$.

Lemma 1.5.1. Let $B_{MLE,i}=\sum_{s,t=2}^T\rho^{st}\Delta u_{is}\Delta y_{i,t-1}$ and $C_{MLE,i}=\sum_{t=2}^T\rho^{1t}(\Delta y_{i1}-b)\Delta y_{i,t-1}$. Then

$E[B_{MLE,i}+C_{MLE,i}]=0.$   (1.31)

Theorem 1.5.1.
The MLE treating the initial value as random is consistent and asymptotically unbiased whether N or T or both go to infinity, independent of the relative speed of $N/T$.

The relative weight of the initial value distribution $y_{i0}$ in the joint distribution of $(y_{i0},y_{i1},\dots,y_{iT})$ is $O(1/T)$. In other words, when $T\to\infty$, it appears that there is no difference between treating $y_{i0}$ as random or as a fixed constant in obtaining the QMLE. However, this assertion is true only if N is fixed. If N also goes to infinity as T goes to infinity, and $N/T\to c^*\neq 0$ is finite, then the QMLE treating the initial value as a fixed constant is again asymptotically biased.

The QMLE of $\gamma$ treating the initial value $y_{i0}$ as a fixed constant is

$\tilde\gamma = \Big(\sum_{i=1}^N\Delta y_{i,-1}'\Omega_0^{-1}\Delta y_{i,-1}\Big)^{-1}\Big(\sum_{i=1}^N\Delta y_{i,-1}'\Omega_0^{-1}\Delta y_i\Big),$   (1.32)

where $\Omega_0^{-1}=(p_{st})_{s,t=1}^{T-1}$ and $p_{st}=\frac1T m_{st}(T-M_{st})$, with $m_{st}=\min(s,t)$ and $M_{st}=\max(s,t)$. For notational ease, we shall call the QMLE treating initial values as fixed constants the GLS. Then $\hat\gamma_{GLS}-\gamma = A_{GLS}^{-1}B_{GLS}$, where $A_{GLS}=\sum_{i=1}^N\sum_{s=1}^{T-1}\sum_{t=1}^{T-1}p_{st}\Delta y_{is}\Delta y_{it}$ and $B_{GLS}=\sum_{i=1}^N\sum_{s=1}^{T-1}\sum_{t=1}^{T-1}p_{st}\Delta u_{i,s+1}\Delta y_{it}$. Define $A_{GLS,i}=\sum_{s=1}^{T-1}\sum_{t=1}^{T-1}p_{st}\Delta y_{is}\Delta y_{it}$ and $B_{GLS,i}=\sum_{s=1}^{T-1}\sum_{t=1}^{T-1}p_{st}\Delta u_{i,s+1}\Delta y_{it}$.

Lemma 1.5.2. When N, T go to infinity, the leading T-order term of the mean of $A_{GLS,i}$ is

$E[A_{GLS,i}] = \tfrac{1}{1-\gamma^2}\sigma^2 T + o(T);$   (1.33)

the asymptotic orders of the mean and variance of $B_{GLS,i}$ are

$E[B_{GLS,i}] = -\tfrac{1}{1-\gamma}\sigma^2 + o(1)$   (1.34)

and

$\mathrm{Var}(B_{GLS,i}) = \tfrac{\sigma^4}{1-\gamma^2}T + o(T).$   (1.35)

Theorem 1.5.2. The QMLE treating initial values as fixed (GLS) is inconsistent when T is finite and N goes to infinity, and consistent when T goes to infinity. When both N and T go to infinity, the asymptotic distribution of the GLS estimator is

$\sqrt{NT}\big[\hat\gamma_{GLS}-(\gamma-\tfrac1T(1+\gamma))\big]\to_d N(0,1-\gamma^2).$   (1.36)

In other words, the asymptotic bias of the GLS (the QMLE treating the initial value as a fixed constant) is $O(\sqrt{N/T})$. We see that the asymptotic properties of the GLS estimator are the same as those of the within-group (WG) estimator (Alvarez & Arellano (2003), Hahn & Kuersteiner (2002)).

1.6 Monte Carlo Simulation

In this section, we use Monte Carlo simulation to demonstrate the performance of the IV, GMM, and likelihood estimators under different combinations of N and T. The data are generated according to

$y_{it} = \gamma y_{i,t-1} + \alpha_i + u_{it}, \quad t=0,\dots,T, \; i=1,\dots,N,$   (1.37)

where $\gamma=0.2,\,0.5,\,0.8$, and $u_{it}\sim N(0,1)$ or $u_{it}$ is generated from a standardized (mean and standard deviation adjusted) chi-square distribution with 2 degrees of freedom. The individual effects are generated from a standardized normal or chi-square distribution correspondingly. To avoid the issue of initial values affecting the distribution of $y_{it}$, we generate $200+T$ observations of $y_{it}$ and discard the first 200. We consider various combinations of N and T: small N, large T (N=25, T=100); and large N and T (N=T=50; N=50, T=100; N=T=100; N=T=1000; N=100, T=1000; or N=1000, T=100) (for large N, small T cases, see Hsiao et al. (2002)). However, for the Arellano & Bond (1991) type GMM, we restrict the comparison to T up to 100, because even with T=100 there are $(100-1)(100-2)/2 = 4851$ moment conditions. We replicate each experiment 1000 times to obtain the empirical bias and empirical size of the different estimators.

We use the Arellano & Bond (1991) GMM as the base for comparison. Table 1.1 presents the GMM estimated mean; mean square error; actual size of the t-test, where the critical value for a 5% significance level is 1.96 from the standard normal table; bias; and
Table 1 presents the GMM estimated mean; mean square error; actual size of the t-test where the critical value for a 5% signi¯cance level is set from the standard normal table of 1.96; bias; and 2 For large N, small T cases, see Hsiao et al. (2002). 13 bias percentage (bias/true gamma) for di®erent combinations of N and T. Tables for other estimators present the mean; mean square error; actual size; bias; theoretical bias (except it is zero); relative mean square error (to the GMM mean square error) and relative bias (to the GMM bias) when T=50 or 100. For T greater than 100, we only report the actual MSE. (i) There is a signi¯cant size distortion for the GMM estimator when N and T are of similar magnitude. The distortion of size is greater when ° is close to one. For instance, when ° is 0.2, a 5% signi¯cance level test has actual size of over 20%, but for ° =0:8, it is over 50% (N=50, T=100) and can be as high as 80% (N=100, T=100) (Table I, VIII). (ii) The QMLE has negligible bias and actual size close to the nominal size for whatever combinations of N and T and whether ° is small or close to one (Table II, III). The QMLE is also the most e±cient (having MSE less than one-half of the GMM MSE) whether the error is normally distributed (N(0,1)) or skewed (chi-square with 2 degrees of freedom) (Table IX, X). (iii)TheAnderson&Hsiao(1981,1982)simpleIValsohasnegligiblebiasandactual sizeclosetothenominalsizeforwhatevercombinationsofNandT.However,thesimple IV is much less e±cient than the Arellano & Bond (1991) GMM. Therefore, if N and T are of similar order, there is a trade-o® between the e±ciency gain and getting correct statistical inference (Table IV, XI). (iv) The QMLE treating initial values as ¯xed constants has signi¯cant size distor- tions. In the case of N=50 or 100 and T=50 or 100, the size distortion is of similar magnitude as the GMM. The size distortion is smaller when ° is small, N is small and T is very large (N=25, T=100 or N=100, T=1000). However, if ° is close to 1 (° = 0:8), the size distortion remains quite signi¯cant even when N=100, T=1000. The actual size is over 17% while the nominal size is 5% (Table VII). However, compared to the GMM, it is relatively less biased and has smaller MSE. 14 (v) Even in the case we restrict q it =M y i;t¡2 for the GMM estimator, the size distortion is of the same magnitude as the Arellano-Bond GMM (Table V). Neither does the crude GMM perform any better (Table VI). (vi) When the sample size becomes very large, say N=1000 and T=1000, the actual bias becomes close to the theoretical bias derived in this paper. (vii) The performance of GMM or IV heavily depends on the relative magnitude of ¾ 2 ® to ¾ 2 u if level variables are used as IV. However, if ¯rst di®erenced variables are used as IV, the performance is not a®ected by the magnitude of ¾ 2 ® =¾ 2 u (e.g. Hsiao et al. (2002)). 
Table 1.1: GMM with Many IVs

N  T  gamma  Mean  MSE  Actual Size  Bias  Bias Percentage
50  50  0.2  0.1750  0.0011  0.2290  -0.0250  -0.1251
50  50  0.5  0.4671  0.0015  0.3860  -0.0329  -0.0657
50  50  0.8  0.7536  0.0024  0.7630  -0.0464  -0.0580
50  100  0.2  0.1872  0.0004  0.1380  -0.0128  -0.0639
50  100  0.5  0.4831  0.0005  0.2350  -0.0169  -0.0339
50  100  0.8  0.7790  0.0005  0.5120  -0.0210  -0.0263
100  100  0.2  0.1879  0.0003  0.2090  -0.0121  -0.0603
100  100  0.5  0.4837  0.0004  0.4170  -0.0163  -0.0326
100  100  0.8  0.7794  0.0005  0.8130  -0.0206  -0.0258
25  100  0.2  0.1869  0.0006  0.1030  -0.0131  -0.0656
25  100  0.5  0.4830  0.0006  0.1450  -0.0170  -0.0341
25  100  0.8  0.7790  0.0006  0.3490  -0.0210  -0.0262

Table 1.2: MLE with Stationary Mean and Variance

N  T  gamma  Mean  MSE  Actual Size  Bias  Relative MSE  Relative Bias
50  50  0.2  0.1999  0.0004  0.0480  -0.0001  0.3551  0.0039
50  50  0.5  0.4997  0.0003  0.0470  -0.0003  0.1975  0.0097
50  50  0.8  0.7999  0.0001  0.0400  -0.0001  0.0604  0.0014
50  100  0.2  0.2000  0.0002  0.0540  0.0000  0.5266  0.0036
50  100  0.5  0.5003  0.0002  0.0480  0.0003  0.3237  -0.0188
50  100  0.8  0.7997  0.0001  0.0520  -0.0003  0.1220  0.0163
100  100  0.2  0.2002  0.0001  0.0500  0.0002  0.3806  -0.0189
100  100  0.5  0.4996  0.0001  0.0490  -0.0004  0.2071  0.0249
100  100  0.8  0.7996  0.0000  0.0530  -0.0004  0.0805  0.0188
25  100  0.2  0.1991  0.0004  0.0530  -0.0009  0.6910  0.0724
25  100  0.5  0.4995  0.0003  0.0480  -0.0005  0.4768  0.0279
25  100  0.8  0.7999  0.0001  0.0520  -0.0001  0.2256  0.0063
100  1000  0.2  0.2001  0.0000  0.0540  0.0001
100  1000  0.5  0.5000  0.0000  0.0490  0.0000
100  1000  0.8  0.8000  0.0000  0.0530  0.0000
1000  100  0.2  0.2002  0.0000  0.0530  0.0002
1000  100  0.5  0.5000  0.0000  0.0600  0.0000
1000  100  0.8  0.8000  0.0000  0.0490  0.0000
1000  1000  0.2  0.2000  0.0000  0.0510  0.0000
1000  1000  0.5  0.5000  0.0000  0.0490  0.0000
1000  1000  0.8  0.8000  0.0000  0.0500  0.0000

Table 1.3: Fully Fledged MLE

N  T  gamma  Mean  MSE  Actual Size  Bias  Relative MSE  Relative Bias
50  50  0.2  0.1999  0.0004  0.0470  -0.0001  0.3725  0.0036
50  50  0.5  0.4996  0.0003  0.0550  -0.0004  0.2275  0.0121
50  50  0.8  0.8011  0.0003  0.0530  0.0011  0.1049  -0.0244
50  100  0.2  0.1998  0.0002  0.0490  -0.0002  0.5553  0.0173
50  100  0.5  0.5006  0.0002  0.0570  0.0006  0.3372  -0.0344
50  100  0.8  0.8001  0.0001  0.0490  0.0001  0.1849  -0.0050
100  100  0.2  0.2001  0.0001  0.0480  0.0001  0.3919  -0.0108
100  100  0.5  0.4997  0.0001  0.0540  -0.0003  0.2194  0.0196
100  100  0.8  0.7998  0.0000  0.0460  -0.0002  0.0982  0.0118
25  100  0.2  0.1989  0.0004  0.0490  -0.0011  0.7396  0.0809
25  100  0.5  0.4992  0.0003  0.0440  -0.0008  0.5286  0.0462
25  100  0.8  0.7998  0.0002  0.0520  -0.0002  0.3629  0.0085

Table 1.4: Simple IV

N  T  gamma  Mean  MSE  Actual Size  Bias  Relative MSE  Relative Bias
50  50  0.2  0.2011  0.0043  0.0470  0.0011  4.0396  -0.0440
50  50  0.5  0.5067  0.0125  0.0430  0.0067  8.5183  -0.2044
50  50  0.8  0.8444  0.1008  0.0550  0.0444  41.1601  -0.9572
50  100  0.2  0.1994  0.0020  0.0490  -0.0006  5.3755  0.0439
50  100  0.5  0.5029  0.0056  0.0560  0.0029  11.9963  -0.1707
50  100  0.8  0.8144  0.0466  0.0450  0.0144  85.3687  -0.6860
100  100  0.2  0.2004  0.0011  0.0480  0.0004  4.2987  -0.0368
100  100  0.5  0.5019  0.0029  0.0500  0.0019  8.3923  -0.1138
100  100  0.8  0.8148  0.0206  0.0430  0.0148  42.9504  -0.7158
25  100  0.2  0.2019  0.0041  0.0560  0.0019  7.1513  -0.1455
25  100  0.5  0.5032  0.0121  0.0510  0.0032  18.9890  -0.1883
25  100  0.8  0.8441  0.0974  0.0470  0.0441  157.6636  -2.1020
100  1000  0.2  0.2001  0.0001  0.0480  0.0001
100  1000  0.5  0.5004  0.0003  0.0520  0.0004
100  1000  0.8  0.8015  0.0020  0.0560  0.0015
1000  100  0.2  0.2000  0.0001  0.0460  0.0000
1000  100  0.5  0.4996  0.0003  0.0490  -0.0004
1000  100  0.8  0.8004  0.0021  0.0440  0.0004
1000  1000  0.2  0.2001  0.0000  0.0470  0.0001
1000  1000  0.5  0.5000  0.0000  0.0460  0.0000
1000  1000  0.8  0.8001  0.0002  0.0550  0.0001

Table 1.5: GMM with Single IV

N  T  gamma  Mean  MSE  Actual Size  Bias  Theoretical Bias  Relative MSE  Relative Bias
50  50  0.2  0.1210  0.0096  0.2980  -0.0790  -0.0963  8.9918  3.1574
50  50  0.5  0.3015  0.0461  0.6660  -0.1985  -0.2860  31.4771  6.0392
50  50  0.8  0.1846  0.3952  0.9980  -0.6154  -1.9168  161.3757  13.2699
50  100  0.2  0.1192  0.0081  0.5260  -0.0808  -0.0963  21.5539  6.3297
50  100  0.5  0.2995  0.0432  0.9520  -0.2005  -0.2860  92.0090  11.8381
50  100  0.8  0.1760  0.3967  1.0000  -0.6240  -1.9168  725.9301  29.6827
100  100  0.2  0.1581  0.0027  0.2760  -0.0419  -0.0482  10.5957  3.4749
100  100  0.5  0.3843  0.0155  0.7130  -0.1157  -0.1430  44.1177  7.1076
100  100  0.8  0.3321  0.2252  1.0000  -0.4679  -0.9584  469.3852  22.6985
25  100  0.2  0.0587  0.0223  0.8340  -0.1413  -0.1926  39.2729  10.7684
25  100  0.5  0.1785  0.1080  0.9920  -0.3215  -0.5720  169.2581  18.8786
25  100  0.8  0.0664  0.5465  1.0000  -0.7336  -3.8335  884.4152  34.9520
100  1000  0.2  0.1566  0.0020  0.9950  -0.0434  -0.0482
100  1000  0.5  0.3823  0.0140  1.0000  -0.1177  -0.1430
100  1000  0.8  0.3298  0.2217  1.0000  -0.4702  -0.9584
1000  100  0.2  0.1954  0.0001  0.0740  -0.0046  -0.0048
1000  100  0.5  0.4860  0.0005  0.1400  -0.0140  -0.0143
1000  100  0.8  0.7150  0.0089  0.5490  -0.0850  -0.0958
1000  1000  0.2  0.1955  0.0000  0.3130  -0.0045  -0.0048
1000  1000  0.5  0.4863  0.0002  0.7500  -0.0137  -0.0143
1000  1000  0.8  0.7138  0.0076  1.0000  -0.0862  -0.0958

Table 1.6: Crude GMM

N  T  gamma  Mean  MSE  Actual Size  Bias  Theoretical Bias  Relative MSE  Relative Bias
50  50  0.2  0.1345  0.0077  0.2050  -0.0655  -0.075  7.2714  2.6167
50  50  0.5  0.3220  0.0385  0.5830  -0.1780  -0.24  26.3296  5.4146
50  50  0.8  0.2024  0.3736  0.9970  -0.5976  -1.8  152.5558  12.8855
50  100  0.2  0.1323  0.0062  0.3750  -0.0677  -0.075  16.4683  5.2977
50  100  0.5  0.3199  0.0356  0.8930  -0.1801  -0.24  75.8377  10.6353
50  100  0.8  0.1941  0.3743  1.0000  -0.6059  -1.8  685.0466  28.8221
100  100  0.2  0.1657  0.0021  0.2110  -0.0343  -0.0375  8.4204  2.8395
100  100  0.5  0.3983  0.0125  0.5880  -0.1017  -0.12  35.5099  6.2442
100  100  0.8  0.3500  0.2087  0.9990  -0.4500  -0.9  435.0491  21.8283
25  100  0.2  0.0779  0.0174  0.6720  -0.1221  -0.15  30.6689  9.3030
25  100  0.5  0.2019  0.0937  0.9860  -0.2981  -0.48  146.9297  17.5020
25  100  0.8  0.0785  0.5288  1.0000  -0.7215  -3.6  855.7009  34.3766
100  1000  0.2  0.1645  0.0014  0.9490  -0.0355  -0.0375
100  1000  0.5  0.3962  0.0110  1.0000  -0.1038  -0.12
100  1000  0.8  0.3475  0.2053  1.0000  -0.4525  -0.9
1000  100  0.2  0.1963  0.0001  0.0670  -0.0037  -0.00375
1000  100  0.5  0.4879  0.0004  0.1160  -0.0121  -0.012
1000  100  0.8  0.7194  0.0082  0.4960  -0.0806  -0.09
1000  1000  0.2  0.1964  0.0000  0.2010  -0.0036  -0.00375
1000  1000  0.5  0.4882  0.0002  0.6280  -0.0118  -0.012
1000  1000  0.8  0.7182  0.0068  1.0000  -0.0818  -0.09

Table 1.7: MLE Treating Initial Value Fixed

N  T  gamma  Mean  MSE  Actual Size  Bias  Theoretical Bias  Relative MSE  Relative Bias
50  50  0.2  0.1756  0.0010  0.2440  -0.0244  -0.024  0.9195  0.9731
50  50  0.5  0.4692  0.0013  0.4190  -0.0308  -0.03  0.8609  0.9377
50  50  0.8  0.7614  0.0017  0.7880  -0.0386  -0.036  0.6844  0.8316
50  100  0.2  0.1878  0.0004  0.1360  -0.0122  -0.012  0.9365  0.9526
50  100  0.5  0.4843  0.0004  0.2310  -0.0157  -0.015  0.8801  0.9279
50  100  0.8  0.7815  0.0004  0.5000  -0.0185  -0.018  0.7863  0.8814
100  100  0.2  0.1883  0.0002  0.2150  -0.0117  -0.012  0.9353  0.9734
100  100  0.5  0.4843  0.0003  0.4230  -0.0157  -0.015  0.9231  0.9631
100  100  0.8  0.7816  0.0004  0.8190  -0.0184  -0.018  0.7932  0.8935
25  100  0.2  0.1871  0.0006  0.1040  -0.0129  -0.012  0.9984  0.9823
25  100  0.5  0.4841  0.0006  0.1510  -0.0159  -0.015  0.8929  0.9363
25  100  0.8  0.7811  0.0005  0.3220  -0.0189  -0.018  0.8380  0.8984
100  1000  0.2  0.1989  0.0000  0.0750  -0.0011  -0.0012
100  1000  0.5  0.4985  0.0000  0.0770  -0.0015  -0.0015
100  1000  0.8  0.7981  0.0000  0.1700  -0.0019  -0.0018
1000  100  0.2  0.1880  0.0002  0.9780  -0.0120  -0.012
1000  100  0.5  0.4848  0.0002  1.0000  -0.0152  -0.015
1000  100  0.8  0.7813  0.0004  1.0000  -0.0187  -0.018
1000  1000  0.2  0.1988  0.0000  0.2120  -0.0012  -0.0012
1000  1000  0.5  0.4985  0.0000  0.4120  -0.0015  -0.0015
1000  1000  0.8  0.7982  0.0000  0.8500  -0.0018  -0.0018

Table 1.8: GMM with Many IVs, Chi2

N  T  gamma  Mean  MSE  Actual Size  Bias  Bias Percentage
50  50  0.2  0.1755  0.0010  0.2160  -0.0245  -0.1226
50  50  0.5  0.4657  0.0015  0.4100  -0.0343  -0.0687
50  50  0.8  0.7532  0.0025  0.7680  -0.0468  -0.0585
50  100  0.2  0.1868  0.0004  0.1620  -0.0132  -0.0662
50  100  0.5  0.4840  0.0004  0.2350  -0.0160  -0.0320
50  100  0.8  0.7779  0.0006  0.6010  -0.0221  -0.0277
100  100  0.2  0.1879  0.0003  0.2160  -0.0121  -0.0605
100  100  0.5  0.4841  0.0003  0.4170  -0.0159  -0.0317
100  100  0.8  0.7789  0.0005  0.7560  -0.0211  -0.0264
25  100  0.2  0.1871  0.0006  0.1020  -0.0129  -0.0643
25  100  0.5  0.4846  0.0006  0.1370  -0.0154  -0.0308
25  100  0.8  0.7790  0.0006  0.3280  -0.0210  -0.0263

Table 1.9: MLE with Stationary Mean and Variance, Chi2

N  T  gamma  Mean  MSE  Actual Size  Bias  Relative MSE  Relative Bias
50  50  0.2  0.2000  0.0004  0.0510  0.0000  0.4123  0.0008
50  50  0.5  0.4982  0.0003  0.0530  -0.0018  0.2205  0.0516
50  50  0.8  0.7993  0.0002  0.0540  -0.0007  0.0677  0.0151
50  100  0.2  0.1999  0.0002  0.0510  -0.0001  0.5879  0.0089
50  100  0.5  0.5002  0.0002  0.0540  0.0002  0.3746  -0.0121
50  100  0.8  0.7995  0.0001  0.0510  -0.0005  0.1285  0.0225
100  100  0.2  0.2000  0.0001  0.0500  0.0000  0.3633  -0.0029
100  100  0.5  0.4998  0.0001  0.0470  -0.0002  0.2384  0.0143
100  100  0.8  0.7997  0.0000  0.0490  -0.0003  0.0786  0.0135
25  100  0.2  0.1999  0.0004  0.0520  -0.0001  0.6991  0.0084
25  100  0.5  0.5002  0.0004  0.0580  0.0002  0.6227  -0.0156
25  100  0.8  0.7990  0.0002  0.0530  -0.0010  0.2470  0.0491

Table 1.10: Fully Fledged MLE, Chi2

N  T  gamma  Mean  MSE  Actual Size  Bias  Relative MSE  Relative Bias
50  50  0.2  0.1989  0.0004  0.0500  -0.0011  0.3989  0.0432
50  50  0.5  0.4980  0.0003  0.0610  -0.0020  0.2180  0.0572
50  50  0.8  0.8002  0.0003  0.0530  0.0002  0.1093  -0.0033
50  100  0.2  0.1988  0.0002  0.0430  -0.0012  0.6119  0.0912
50  100  0.5  0.4995  0.0002  0.0530  -0.0005  0.3841  0.0335
50  100  0.8  0.7994  0.0001  0.0470  -0.0006  0.1729  0.0262
100  100  0.2  0.1992  0.0001  0.0500  -0.0008  0.4189  0.0627
100  100  0.5  0.5001  0.0001  0.0540  0.0001  0.2467  -0.0032
100  100  0.8  0.8001  0.0000  0.0480  0.0001  0.0822  -0.0049
25  100  0.2  0.1980  0.0004  0.0500  -0.0020  0.7125  0.1532
25  100  0.5  0.4993  0.0004  0.0550  -0.0007  0.6390  0.0479
25  100  0.8  0.7988  0.0002  0.0600  -0.0012  0.3524  0.0559

Table 1.11: Simple IV, Chi2

N  T  gamma  Mean  MSE  Actual Size  Bias  Relative MSE  Relative Bias
50  50  0.2  0.2017  0.0043  0.0570  0.0017  4.1232  -0.0697
50  50  0.5  0.5007  0.0126  0.0470  0.0007  8.1627  -0.0192
50  50  0.8  0.8441  0.0999  0.0510  0.0441  40.1981  -0.9441
50  100  0.2  0.1999  0.0022  0.0510  -0.0001  5.9289  0.0091
50  100  0.5  0.5057  0.0060  0.0570  0.0057  14.1844  -0.3572
50  100  0.8  0.8311  0.0466  0.0470  0.0311  79.0659  -1.4058
100  100  0.2  0.2005  0.0011  0.0550  0.0005  4.1889  -0.0420
100  100  0.5  0.5014  0.0034  0.0450  0.0014  9.9881  -0.0855
100  100  0.8  0.8035  0.0225  0.0560  0.0035  44.2868  -0.1660
25  100  0.2  0.2017  0.0043  0.0520  0.0017  7.7273  -0.1333
25  100  0.5  0.5072  0.0135  0.0570  0.0072  24.0277  -0.4688
25  100  0.8  0.8404  0.0994  0.0520  0.0404  157.4906  -1.9226

1.7 Concluding Remarks

In this part we discuss the relative merits of the simple IV, GMM and likelihood-based estimators for dynamic panel data models when N and T are of different or the same magnitude. We show that although all three approaches yield consistent estimators, their asymptotic distributions are different. When the estimators are multiplied by the scale factor $\sqrt{NT}$ (the inverse of the order of magnitude of the standard error of the estimator), the GMM estimator is asymptotically biased of order $\sqrt{T/N}$, while the simple IV and the likelihood-based estimators are asymptotically unbiased. Whether an estimator is asymptotically biased or not has important implications in hypothesis testing, as it can severely distort the size of the test. Our Monte Carlo studies show that when N and T are of similar magnitude, the size distortions of the GMM and of the MLE treating initial values as fixed constants are very significant. For a 5% significance level, the actual size can be as high as 50% when $\gamma=0.5$ and over 80% when $\gamma=0.8$. On the other hand, the Anderson-Hsiao simple IV and the QMLE have actual sizes close to the nominal size. Moreover, from the MSE point of view, the QMLE is much more efficient than the GMM when N and T are of similar magnitude.

The source of the bias in the GMM estimator is that the GMM uses only one-dimensional averages to estimate the population moments while the data are multi-dimensional. The GMM estimator is an instrumental-variable-type estimator in which the instruments are used to purge the correlations between the regressors and the errors of the equations by taking cross-sectional averages only. The resulting correlations between the sample moments are of order $1/N$. When T is fixed, the scale factor is proportional to $\sqrt N$, which yields an asymptotically unbiased estimator as $N\to\infty$. However, if T also increases with N so that $T/N\to c\neq 0$, then the scale factor becomes $\sqrt{NT}=\sqrt{c}\,N$, which leads to an asymptotic bias of order $\sqrt c$. The simple IV and the likelihood-based estimators, in contrast, purge the correlation between the regressors and the errors of the equations by averaging over all NT observations; hence they are asymptotically unbiased whether N or T or both tend to infinity, and independently of the way N or T tends to infinity. Moreover, the MLE uses the transformed regressor itself as the instrument, so its asymptotic efficiency is independent of the magnitude of $\gamma$ and of the magnitude of $\sigma^2_\alpha$ relative to $\sigma^2_u$, while the GMM can run into weak-instrument problems if $\gamma$ is close to one or $\sigma^2_\alpha/\sigma^2_u$ is large, as demonstrated in the Monte Carlo studies of Hsiao et al. (2002) and Binder et al. (2005).
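To make the mechanism just described concrete, here is a minimal sketch of the crude GMM estimator (1.25) with the single instrument $\Delta y_{i,t-2}$: every period-t moment is approximated by a cross-sectional sum only, so each per-period ratio inherits an $O(1/N)$ bias that the averaging over t does not remove. The array convention matches the earlier sketches; this is an illustration of the formula, not a definitive implementation.

```python
import numpy as np

def crude_gmm(y):
    """Crude GMM (1.25) with z_it = Delta y_{i,t-2}: a weighted average of
    period-by-period IV ratios built from cross-sectional sums only."""
    dy = np.diff(y, axis=1)
    z, x, d = dy[:, :-2], dy[:, 1:-1], dy[:, 2:]   # Dy_{i,t-2}, Dy_{i,t-1}, Dy_{it}
    A = np.sum(z * x, axis=0)                      # sum_i z_it * Dy_{i,t-1}, one entry per t
    B = np.sum(z * d, axis=0)                      # sum_i z_it * Dy_{it}
    C = np.sum(z * z, axis=0)                      # sum_i z_it^2
    return np.sum(A * B / C) / np.sum(A * A / C)
```

Replacing the per-period sums by pooled sums over both i and t before taking the ratio gives back the simple IV estimator (1.8), which is why the latter escapes the $\sqrt{T/N}$ bias.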
Chapter 2
Asymptotic Loss Efficiency of Mallows Criterion for Least Squares Model Selection and Averaging With Autocorrelated Errors

2.1 Introduction

In the least squares model selection/averaging literature, there are several model selection criteria that can be used to choose the best model. The most frequently used include, for example, Akaike's information criterion (AIC), the Bayesian information criterion (BIC), the Hannan-Quinn information criterion (HQIC), and their variations. These information criteria have similar forms, i.e.

$AIC = 2\ell_T(\hat\theta) - \text{penalty}.$

However, all these criteria require the model to give a full specification of the data generating process, which is very restrictive for practical use. Another criterion, the Mallows criterion, has the advantage that it does not require the distribution of the data generating process. The original Mallows criterion was derived by minimizing the "scaled sum of squared errors" $L=\frac{1}{\hat\sigma^2}\sum(\hat\mu-\mu)^2$ (cf. Mallows (1973), Kennard (1971) and Ronchetti & Staudte (1994)).

Asymptotic loss efficiency is one of the most important properties that we want a criterion to have. Li (1987) established the asymptotic optimality of the Mallows criterion for least squares model selection with independent and homoscedastic error terms. Andrews (1991) generalized the results to the case when the error terms are heteroscedastic. As for model averaging, Hansen (2007) established the asymptotic optimality of the optimal weights chosen by the Mallows criterion in least squares model averaging with independent and homoscedastic error terms. To relax the limitations of Hansen (2007) (strictly nested models indexed by a discrete set), Wan et al. (2010) extended Hansen (2007)'s model averaging to the case of non-nested models with continuous weights, using stronger assumptions. All the results cited above consider only the case when the error terms are independent across observations. However, if the least squares regression is in a time series framework, the error terms are usually autocorrelated. For example, Ing & Wei (2005) considered the problem of selecting the autoregressive (AR) order to approximate an AR($\infty$) process.

In this paper, we establish the asymptotic optimality of the Mallows criterion for least squares model selection and averaging with autocorrelated errors. The results are based on a new Whittle's inequality (Whittle (1960)) for the weighted inner product of two vectors with no autocorrelation, which is proved using arguments similar to those in Whittle (1960).

In Section 2, we derive the Mallows criterion when the error terms are autocorrelated and give the asymptotic loss efficiency result for the model selection problem studied by Li (1987). In Section 3, we generalize the model averaging problems studied in Hansen (2007) and Wan et al. (2010) to autocorrelated errors and derive the asymptotic loss efficiency results. Proofs of the main results are in the appendix.

2.2 Least Squares Model Selection

The model selection/averaging problem we study is the same as in Li (1987), Andrews (1991), and Hansen (2007), except that we consider a time series model and the error terms are autocorrelated. Let $(y_t, x_t)$, $t=1,\dots,T$, be a random sample. Assume that we have countably infinitely many regressors $x_t=(x_{t1},x_{t2},\dots)$. Consider the data

$y_t = \mu_t + e_t,$   (2.1)

where $\mu_t=\sum_{j=1}^{\infty}\beta_j x_{tj}$ and $E[e_t]=0$. For simplicity, we consider the case when the error terms follow a moving average process of order 1 (MA(1)).
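For concreteness, here is a small sketch of data generated according to (2.1) with MA(1) errors, anticipating the error structure spelled out in (2.2) below. The regressor design, function name and parameters are placeholders chosen purely for illustration.

```python
import numpy as np

def simulate_ma1_regression(T, beta, rho, sigma=1.0, seed=0):
    """Generate y_t = sum_j beta_j * x_tj + e_t with MA(1) errors e_t = eta_t + rho*eta_{t-1}."""
    rng = np.random.default_rng(seed)
    beta = np.asarray(beta, dtype=float)
    X = rng.standard_normal((T, beta.size))   # illustrative regressors (an assumption)
    eta = sigma * rng.standard_normal(T + 1)  # i.i.d. innovations
    e = eta[1:] + rho * eta[:-1]              # MA(1) error terms
    return X @ beta + e, X
```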
It is straightforward to show that our proofs apply to the case when the error terms are generated by an MA($q_0$) process, as long as $q_0$ is a finite number that does not increase with the number of observations. By assumption, the error terms are

$e_t = \eta_t + \rho\eta_{t-1},$   (2.2)

where the $\eta_t$ are independent with mean 0 and variance $\sigma^2$. For a sequence of approximating models $m=1,2,\dots,M(T)$, the m-th model uses $k_m$ regressors chosen from the available infinitely many regressors:

$y_t = \sum_{j=1}^{k_m}\beta_j(m)x_{tj}(m) + \varepsilon_t.$   (2.3)

Assuming the invertibility of $X(m)'X(m)$ for each approximating model, the least squares estimate of $\mu$ by the m-th model is

$\hat\mu_m = P_m y,$   (2.4)

where $P_m = X(m)(X(m)'X(m))^{-1}X(m)'$ is the projection matrix for the m-th model.

2.2.1 Mallows Criterion

Denote the covariance matrix of the error terms by $V=\mathrm{Var}(e)$, and let $Q_m=I-P_m$. The loss function is

$L(m) = \|\mu-\hat\mu_m\|^2.$   (2.5)

The residual sum of squares (RSS) is

$RSS(m) = \|y-\hat\mu_m\|^2.$   (2.6)

The expectations of the loss function and the RSS have the following bias-variance decompositions:

$E[L(m)] = \|E[\mu-\hat\mu_m]\|^2 + \mathrm{Var}(\mu-\hat\mu_m),$   (2.7)

$E[RSS(m)] = \|E[y-\hat\mu_m]\|^2 + \mathrm{Var}(y-\hat\mu_m).$   (2.8)

Since the bias terms are equal, $\mathrm{Var}(\mu-\hat\mu_m)=\mathrm{tr}(P_mV)$, and $\mathrm{Var}(y-\hat\mu_m)=\mathrm{tr}(Q_mV)$, the risk $R(m)=E[L(m)]$ satisfies

$R(m) = E[RSS(m)] + E[\mathrm{tr}(P_mV)-\mathrm{tr}(Q_mV)] = E[RSS(m)+2\,\mathrm{tr}(P_mV)-\mathrm{tr}(V)].$   (2.9)

Following Mallows' definition (Mallows (1973)), we define the Mallows criterion as follows.

Definition 2.2.1. The Mallows criterion for least squares model selection is

$C_L(m) = RSS(m) + 2\,\mathrm{tr}(P_mV) - \mathrm{tr}(V).$   (2.10)

(Since the term $\mathrm{tr}(V)$ does not change with the model, $C_L(m)=RSS(m)+2\,\mathrm{tr}(P_mV)$ can be used in numerical calculations.)

Proposition 2.2.1. The Mallows criterion $C_L(m)$ is an unbiased estimator of the risk $R(m)$.

2.2.2 Asymptotic Loss Efficiency

The proofs of the asymptotic loss efficiency of the Mallows criterion in the literature (Li (1987), Andrews (1991), and Hansen (2007)) depend heavily on Whittle's inequalities (Whittle (1960)), which give upper bounds for the inner product $b'e$ and the quadratic form $e'Ae$, assuming the elements of the random vector $e$ are independent. For dependent error terms, we usually do not have similar results. However, given the structure of the error terms, we can still prove asymptotic loss efficiency if we have an upper bound for the form $\tau'A\chi$ with two random vectors $\tau$ and $\chi$. The following inequality is valid, with a proof similar to that of Whittle (1960) (see the appendix).

Lemma 2.2.1. Assume $A$ is a symmetric matrix, the elements of $\tau=(\tau_1,\dots,\tau_T)$ are independent, and the elements of $\chi=(\chi_1,\dots,\chi_T)$ are independent. Let $\gamma_t(q)=E[|\tau_t|^q]^{1/q}$ and $\rho_t(q)=E[|\chi_t|^q]^{1/q}$. Then

$E[|\tau'A\chi - E[\tau'A\chi]|^q] \le C\Big(\sum_{t,i=1}^T a_{t,i}^2\gamma_t^2(2q)\rho_i^2(2q)\Big)^{q/2},$

provided that $q\ge 2$ and $\gamma_t^2(2q)$, $\rho_t^2(2q)$ exist for every t.

Using the above Whittle's inequality for the form $\tau'A\chi$, we have the following efficiency result.

Theorem 2.2.1. Let $m^*=\arg\inf_m C_L(m)$. Suppose the error term can be decomposed as $e=\tau+\chi$, with $\tau_t$ and $\chi_t$ independent across time conditional on X. Assume

[A.1] $\lim_{T\to\infty}\sup_m \lambda(P_m) < \infty$,
[A.2.1] $\sup_t E[\tau_t^{4h}] < \infty$ and $\sup_t E[\chi_t^{4h}] < \infty$ for some integer h,
[A.2.2] $\inf_t \mathrm{var}(\tau_t) > 0$ and $\inf_t \mathrm{var}(\chi_t) > 0$,
[A.3]' $\inf_m R_T(m) \to \infty$.
It's possible to relax the assumption that e can be separated into twoserieswithnoconditionalautocorrelations. Infact,it'sstraightforwardtogeneralize the above results to the case when the error terms can be separated into N e series with no conditional autocorrelations as long as N e does not grow with the sample size. Proof Outline: The di®erence between the residual sum of squares and the loss function is RSS(m)¡L(m)=jjejj 2 +2<Q m ¹ m ;e>¡2<P m e;e>: (2.11) So C L (m) =L ( m)¡2(e 0 P m e¡tr(P m V)) +(e 0 e¡tr(V))+2<Q m ¹ m ;e>: (2.12) It's su±cient to show that the following lemmas are true. The di®erence from Li (1987) is that the errors are no longer conditional independent. Instead, e =¿ +Â. Please see the appendix for the proofs. Lemma 2.2.2. sup m je 0 P m e¡tr(P m V)j=R(m)!0 in pr. Lemma 2.2.3. sup m je 0 e¡tr(V)j=R(m)!0 in pr. Lemma 2.2.4. sup m j<Q m ¹ m ;e>j=R(m)!0 in pr. Lemma 2.2.5. sup m jL m =R m ¡1j!1 in pr. 29 2.2.3 Feasible Implementation Whenthecovariancematrixoftheerrortermsisunknown,weneedtoestimateitinstead. For the AR(1) error terms, the covariance matrix is tridiagonal with two unknowns. V = 0 B B B B B B B B B B @ (1+½ 2 )¾ 2 ½¾ 2 ½¾ 2 (1+½ 2 )¾ 2 ½¾ 2 . . . . . . . . . ½¾ 2 (1+½ 2 )¾ 2 ½¾ 2 ½¾ 2 (1+½ 2 )¾ 2 1 C C C C C C C C C C A If we assume that the model using the most regressors increases its number of regressors withthesamplesizeandgetscloserandclosertothetruemodel, thenwecansimplyuse the residuals ^ e of the model M(T) to estimate the covariance matrix. In particular, we canuse 1 T¡1 P T t=1 ^ e 2 t toreplacethediagonalelements,and 1 T¡1 P T¡1 t=1 ^ e t ^ e t+1 toreplacethe subdiagonal and superdiagonal elements to get a consistent estimator of the covariance matrix. We have the following corollary. Corollary 2.2.1. (Unknown Variances) The above asymptotic loss e±cient result still holds when each the variance-covariance matrix V is replaced with a consistent estimator b V. Proof. This is true because b V ! p V by elements uniformly implies sup m tr(P m ( b V ¡V)) R(m) !0: (2.13) ¤ 2.3 Least Squares Model Averaging Theorem 2.2.1 in the previous section generalized the results by Li (1987) to the case when the error terms are autocorrelated. In this section, we want to show that the 30 Mallows criterion is also asymptotic loss e±cient for least squares model averaging with autocorrelated errors. 2.3.1 Hansen (2007) Fist, we consider the model averaging in Hansen (2007). One important assumption in Hansen (2007) is that there is an ordering of the in¯nitely many regressors, and the m th modelusesthe¯stk m regressors,wherem=1;2;:::;M,and0·k 1 <k 2 <:::<k M . Let W =fw2 [0;1] M k P M m=1 w m = 1g be the unit simplex. Another restriction in Hansen (2007) is that the weights belong to a discrete subset W N of W. To be speci¯c, all the weights inW N belong tof0; 1 N ; 2 N ;:::;1g for some integer N. For a weighting vector w, the model averaging estimator is ^ ¯(w)= M X m=1 w m 0 @ ^ ¯ m 0 1 A : (2.14) Thenthemodelaveragingestimateof¹is ^ ¹(w)= P M m=1 w m ^ ¹ m =P(w)y,whereP(w)= P M m=1 w m P m . We de¯ne the loss function, the risk and the residual sum of squares similarly. The Mallows criterion for model averaging is C L (w)=RSS(w)+2tr(P(w)V)¡tr(V): (2.15) Again, we have the following proposition. Proposition 2.3.1. The Mallows' criterion C L (w) for model averaging is an unbiased estimator of the risk R n (w). The asymptotic loss e±ciency result in Hansen (2007) is proved by showing that the assumptions [A.1], [A.2], and [A.3] 0 in Li (1987) are satis¯ed. 
It is easy to see that the assumptions of Theorem 2.2.1 are also satisfied, since the only difference is that we now have [A.2.1] and [A.2.2] instead of [A.2]. As a result, we have the following theorem.

Theorem 2.3.1. Let $w^*=\arg\inf_w C_L(w)$. Suppose the error term can be decomposed as $e=\tau+\chi$, with $\tau_t$ and $\chi_t$ independent across time conditional on X. Assume [A.2.1], [A.2.2], and
[A.4] $\inf_w R_T(w)\to\infty$.
Then $C_L(w)$ is asymptotically loss efficient, i.e. $\frac{L(w^*)}{\inf_w L(w)}\to_p 1$ as $T\to\infty$.

2.3.2 Wan et al. (2010)

Wan et al. (2010) considered least squares model averaging with non-nested models and a continuous index set. Specifically, there is no ordering of the importance of the regressors, and the m-th model can use any $k_m$ regressors chosen from the available infinitely many regressors. Also, the weights $w\in W$ can be any vector in the unit simplex. Using techniques similar to those in the proof of Theorem 2.2.1, we can show that the asymptotic loss efficiency result also holds when the error terms are autocorrelated; we just need to modify the original proof in Wan et al. (2010) to account for the autocorrelation, as in the proof of Theorem 2.2.1.

Theorem 2.3.2. Let $w^*=\arg\inf_w C_L(w)$. Suppose the error term can be decomposed as $e=\tau+\chi$, with $\tau_t$ and $\chi_t$ independent across time conditional on X. Assume [A.2.1], [A.2.2], and
[A.5] $M\xi_T^{-2G}\sum_{m=1}^M(R_T(w_m^0))^G\to 0$, where $\xi_T=\inf_{w\in W}R_T(w)$ and $w_m^0$ is the zero-one weight vector with only the m-th element equal to one.
Then $C_L(w)$ is asymptotically loss efficient, i.e. $\frac{L(w^*)}{\inf_w L(w)}\to_p 1$ as $T\to\infty$.

Although assumption [A.5] is stronger than its counterpart [A.4] in Hansen (2007), an example in which [A.5] is satisfied is given in Wan et al. (2010).

2.4 Conclusion

In this chapter, we studied least squares model selection and model averaging with autocorrelated errors. If the error terms can be separated into a finite number of vectors that do not have autocorrelation, as in the case of finite-order moving average errors, we show that the asymptotic loss efficiency results are still true. This generalization extends the usefulness of the existing results in the literature to accommodate autocorrelation, which is quite common in time series models. It is also possible to show that cross-validation (CV) and generalized cross-validation (GCV) are asymptotically loss efficient under regularity conditions, as discussed in Li (1987) and Andrews (1991).

Bibliography

Alvarez, J. & Arellano, M. (2003), 'The time series and cross-section asymptotics of dynamic panel data estimators', Econometrica 71(4), 1121–1159.
Anderson, T. W. & Hsiao, C. (1981), 'Estimation of dynamic models with error components', Journal of the American Statistical Association 76(375), 598–606.
Anderson, T. W. & Hsiao, C. (1982), 'Formulation and estimation of dynamic models using panel data', Journal of Econometrics 18(1), 47–82.
Andrews, D. (1991), 'Asymptotic optimality of generalized C_L, cross-validation, and generalized cross-validation in regression with heteroskedastic errors', Journal of Econometrics 47, 359–377.
Arellano, M. & Bond, S. (1991), 'Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations', The Review of Economic Studies 58(2), 277–297.
Arellano, M. & Bover, O. (1995), 'Another look at the instrumental variable estimation of error-components models', Journal of Econometrics 68(1), 29–51.
Bhargava, A. & Sargan, J.
(1983), 'Estimating dynamic random effects models from panel data covering short time periods', Econometrica 51(6).
Binder, M., Hsiao, C. & Pesaran, M. H. (2005), 'Estimation and inference in short panel vector autoregressions with unit roots and cointegration', Econometric Theory 21, 795–837.
Hahn, J. & Kuersteiner, G. (2002), 'Asymptotically unbiased inference for a dynamic panel model with fixed effects when both n and T are large', Econometrica 70(4), 1639–1657.
Hansen, B. E. (2007), 'Least squares model averaging', Econometrica 75(4), 1175–1189.
Hsiao, C. (2003), Analysis of Panel Data, Cambridge University Press.
Hsiao, C., Pesaran, M. H. & Tahmiscioglu, A. K. (2002), 'Maximum likelihood estimation of fixed effects dynamic panel data models covering short time periods', Journal of Econometrics 109(1), 107–150.
Ing, C.-K. & Wei, C.-Z. (2005), 'Order selection for same-realization predictions in autoregressive processes', The Annals of Statistics 33(5), 2423–2474.
Kennard, R. (1971), 'A note on the Cp statistic', Technometrics 13(4), 899–900.
Li, K.-C. (1987), 'Asymptotic optimality for Cp, CL, cross-validation and generalized cross-validation: Discrete index set', The Annals of Statistics 15(3), 958–975.
Mallows, C. L. (1973), 'Some comments on Cp', Technometrics 15(4), 661–675.
Phillips, P. C. B. & Moon, H. R. (1999), 'Linear regression limit theory for nonstationary panel data', Econometrica 67(5), 1057–1111.
Rao, C. R. (1973), Linear Statistical Inference and Its Applications, Wiley Series in Probability and Mathematical Statistics, Wiley.
Ronchetti, E. & Staudte, R. G. (1994), 'A robust version of Mallows's Cp', Journal of the American Statistical Association 89(426), 550–559.
Wan, A. T., Zhang, X. & Zou, G. (2010), 'Least squares model averaging by Mallows criterion', Journal of Econometrics 156(2), 277–283.
Whittle, P. (1960), 'Bounds for the moments of linear and quadratic forms in independent variables', Theory of Probability and its Applications 5(3), 302–305.

Appendix A
Proofs of Chapter 1

Following Alvarez and Arellano (2003), we derive our results using the fundamental lemma that convergence in quadratic mean implies convergence in probability (Rao (1973)).

GMM

Proof of Theorem 1.4.2

We note that (1.23) can be rewritten as

$\hat\gamma_{GMM}-\gamma = \dfrac{\mathrm{tr}\big[(\sum_{i=1}^N Z_i'Z_i)^{-1}(\sum_{i=1}^N Z_i'\Delta u_i)(\sum_{i=1}^N Z_i'\Delta y_{i,-1})'\big]}{\mathrm{tr}\big[(\sum_{i=1}^N Z_i'Z_i)^{-1}(\sum_{i=1}^N Z_i'\Delta y_{i,-1})(\sum_{i=1}^N Z_i'\Delta y_{i,-1})'\big]}.$   (16)

From $E[\Delta y_{it}^2]=\frac{2\sigma^2}{1+\gamma}$ and $E[\Delta y_{it}\Delta y_{i,t-1}]=-\frac{1-\gamma}{1+\gamma}\sigma^2$, it follows that

$\Big(\tfrac1N\sum_{i=1}^N Z_i'Z_i\Big)^{-1} \to_p \left(\tfrac{1-\gamma}{1+\gamma}\sigma^2\begin{pmatrix} w_1 & 1 & & & \\ 1 & w_1 & 1 & & \\ & \ddots & \ddots & \ddots & \\ & & 1 & w_1 & 1 \\ & & & 1 & w_1 \end{pmatrix}\right)^{-1} \triangleq \tfrac{1}{\sigma^2}\tfrac{1+\gamma}{1-\gamma}M^{(1)}_{T-2},$

where $w_1=\frac{4}{1-\gamma}$, and the general term of the inverse matrix $M^{(1)}_{T-2}$ is

$m^{(1)}_{st} = (-1)^{s+t}\dfrac{\cosh[(T-1-|t-s|)\rho]-\cosh[(T-1-t-s)\rho]}{2\sinh(\rho)\sinh[(T-1)\rho]},$   (17)

where $\cosh(x)=(e^x+e^{-x})/2$ and $\sinh(x)=(e^x-e^{-x})/2$ are the hyperbolic functions. If we take $\eta=e^{\rho}<1$, i.e.
´ = w 1 ¡ p w 2 1 ¡4 2 , then the (s,t)-th element can be rewritten as m (1) st =(¡1) s+t ´ T¡1¡jt¡sj +´ ¡(T¡1¡jt¡sj) ¡´ T¡1¡t¡s ¡´ ¡(T¡1¡t¡s) (´¡´ ¡1 )(´ T¡1 ¡´ ¡(T¡1) ) : (18) Consistency lim N;T!1 ^ °¡° = lim N;T!1 1 NT [( P N i=1 Z 0 i My i;¡1 ) 0 ( P N i=1 Z 0 i Z i ) ¡1 ( P N i=1 Z 0 i Mu i )] 1 NT ( P N i=1 Z 0 i My i;¡1 ) 0 ( P N i=1 Z 0 i Z i ) ¡1 ( P N i=1 Z 0 i My i;¡1 ) = lim N;T!1 1 T trf(lim N!1 1 N P N i=1 Z 0 i Z i ) ¡1 1 N 2 [( P N i=1 Z 0 i Mu i )( P N i=1 Z 0 i My i;¡1 ) 0 ]g 1 T (lim N!1 1 N P N i=1 Z 0 i My i;¡1 ) 0 (lim N!1 1 N P N i=1 Z 0 i Z i ) ¡1 (lim N!1 1 N P N i=1 Z 0 i My i;¡1 ) = lim N;T!1 1 T trf(lim N!1 1 N P N i=1 Z 0 i Z i ) ¡1 [ 1 N 2 P N i=1 Z 0 i Mu i My 0 i;¡1 Z i + 1 N 2 P i6=j Z 0 i Mu i My 0 j;¡1 Z j ]g 1 T (lim N!1 1 N P N i=1 Z 0 i My i;¡1 ) 0 (lim N!1 1 N P N i=1 Z 0 i Z i ) ¡1 (lim N!1 1 N P N i=1 Z 0 i My i;¡1 ) WewanttoknowthelimitingpropertiesofthematrixC ne = P i6=j Z 0 i Mu i My 0 j;¡1 Z j . The (s;t) element of the matrix is P i6=j M y is M u i;s+2 M y j;t+1 M y jt . Since i6= j, we know E[M y is M u i;s+2 M y j;t+1 M y jt ] = E[M y is M u i;s+2 ]E[M y j;t+1 M y jt ] = 0. So the mean of C ne is E[C ne ]=0. 37 The variance of the general term is Var(Cne)=Var( X i6=j MyisMui;s+2Myj;t+1Myjt) =E[( X i6=j MyisMui;s+2Myj;t+1Myjt) 2 ] =4E[( N X i=1 N X j=i+1 MyisMui;s+2Myj;t+1Myjt) 2 ] =4 N X i=1 N X j=i+1 E[My 2 is Mu 2 i;s+2 My 2 j;t+1 My 2 jt ] +4 N X i=1 j6=k X j;k¸i+1 E[My 2 is Mu 2 i;s+2 Myj;t+1MyjtMy k;t+1 My kt ] =4 N X i=1 N X j=i+1 E[My 2 is ]E[Mu 2 i;s+2 ]E[My 2 j;t+1 My 2 jt ] +4 N X i=1 j6=k X j;k¸i+1 E[My 2 is ]E[Mu 2 i;s+2 ]E[Myj;t+1Myjt]E[My k;t+1 My kt ] =O(N 3 ): This means the mean square convergence 1 N 2 C ne ! m:s: 0. Consequently it implies the convergence in probability 1 N 2 C ne ! p 0. So lim N;T!1 ^ °¡° = lim N;T!1 1 T trf(lim N!1 1 N P N i=1 Z 0 i Z i ) ¡1 ( 1 N 2 P N i=1 Z 0 i Mu i My 0 i;¡1 Z i )g 1 T (lim N!1 1 N P N i=1 Z 0 i My i;¡1 ) 0 (lim N!1 1 N P N i=1 Z 0 i Z i ) ¡1 (lim N!1 1 N P N i=1 Z 0 i My i;¡1 ) = lim N;T!1 1 NT trf(lim N!1 1 N P N i=1 Z 0 i Z i ) ¡1 ( 1 N P N i=1 Z 0 i Mu i My 0 i;¡1 Z i )g 1 T (lim N!1 1 N P N i=1 Z 0 i My i;¡1 ) 0 (lim N!1 1 N P N i=1 Z 0 i Z i ) ¡1 (lim N!1 1 N P N i=1 Z 0 i My i;¡1 ) , 1 NT B GMM 1 T A GMM : Now we want to show that both A GMM and B GMM are of the order O(T), and conse- quently lim N;T!1 ^ °¡° = 1 N O(1) O(1) =0. For the Denominator: 38 We know 1 N P N i=1 Z 0 i Myi;¡1 = 0 B B B B @ 1 N P N i=1 Myi1Myi2 . . . 1 N P N i=1 Myi;T¡2Myi;T¡1 1 C C C C A !p¡ 1¡° 1+° ¾ 2 ¶;. As a result, A GMM ! p (1¡°) 2 (1+°) 2 ¾ 4 ¶ 0 ( 1 ¾ 2 1+° 1¡° M (1) T¡2 )¶= 1¡° 1+° ¾ 2 T¡2 X s;t=1 m (1) st : (19) P T¡2 s;t=1 m (1) st = P T¡2 s=1 m (1) ss + P s6=t m (1) st : We analyze each term separately. Let d T = (´¡´ ¡1 )(´ T¡1 ¡´ ¡(T¡1) ). (1) For the sum of the diagonal terms, d T T¡2 X s=1 m (1) ss = T¡2 X s=1 (´ T¡1 +´ ¡(T¡1) ¡´ T¡1¡2s ¡´ ¡(T¡1¡2s) ) = (T ¡2)´ T¡1 +(T ¡2)´ ¡(T¡1) ¡´ T¡1 ´ ¡2(T¡2) ¡1 1¡´ 2 ¡´ 1¡T ´ 2 ¡´ 2T¡2 1¡´ 2 : Multiplying both sides by ´ T¡1 , we know the leading order T term of (´ ¡ ´ ¡1 )(´ 2(T¡1) ¡1) P T¡2 s=1 m (1) ss is T, so the leading order T term of P T¡2 s=1 m (1) ss is ´ 1¡´ 2 T. (2)Withoutlossofgenerality, letusassumethatT¡2=2T 0 . Forthegeneralterms, we consider the cases when s, l are odd or even numbers respectively. There are four terms that need to be analyzed. 
X s6=t m (1) st =2 T¡3 X s=1 T¡2¡s X l=1 m (1) s;s+l =2 T 0 ¡1 X s 0 =1 [ T 0 ¡s 0 X k=1 m 2s 0 ;2s 0 +2k + T 0 ¡s 0 X k=1 m 2s 0 ;2s 0 +2k¡1 ] +2 T 0 X s 0 =1 [ T 0 ¡s 0 X k=1 m 2s 0 ¡1;2s 0 ¡1+2k + T 0 ¡s 0 +1 X k=1 m 2s 0 ¡1;2s 0 ¡1+2k¡1 ]: 39 For the ¯rst term d T T 0 ¡s 0 X k=1 m 2s 0 ;2s 0 +2k = T 0 ¡s 0 X k=1 (´ 2T 0 +1¡2k +´ ¡(2T 0 +1¡2k) ¡´ 2T 0 +1¡2k¡4s 0 ¡´ ¡(2T 0 +1¡2k¡4s 0 ) ) = ´ 2s 0 +1 ¡´ 2T 0 +1 1¡´ 2 + ´ 1¡2T 0 ¡´ 1¡2s 0 1¡´ 2 ¡ ´ 1¡2s 0 ¡´ 2T 0 +1¡4s 0 1¡´ 2 ¡ ´ 1¡2T 0 +4s 0 ¡´ 1+2s 0 1¡´ 2 : It can be simpli¯ed as (1¡´ 2 )d T T 0 ¡1 X s 0 =1 T 0 ¡s 0 X k=1 m 2s 0 ;2s 0 +2k = ´ ´ 2 ¡´ 2T 0 1¡´ 2 ¡(T 0 ¡1)´ 2T 0 +1 +(T 0 ¡1)´ 1¡2T 0 ¡´ ´ ¡2(T 0 ¡1) ¡1 1¡´ 2 ¡´ ´ ¡2(T 0 ¡1) ¡1 1¡´ 2 +´ 2T 0 +1 ´ ¡4(T 0 ¡1) ¡1 1¡´ 2 ¡´ 1¡2T 0 ´ 4 ¡´ 4T 0 1¡´ 4 +´ ´ 2 ¡´ 2T 0 1¡´ 2 : Multiplying both sides by ´ 2T 0 +1 yields (1 ¡ ´ 2 )(´ ¡ ´ ¡1 )(´ 2(2T 0 +1) ¡ 1) P T 0 ¡1 s 0 =1 P T 0 ¡s 0 k=1 m 2s 0 ;2s 0 +2k =T 0 ´ 2 +o(T 0 ), we have the order T leading term T 0 ¡1 X s 0 =1 T 0 ¡s 0 X k=1 m 2s 0 ;2s 0 +2k = ´ 3 (1¡´ 2 ) 2 T 0 +o(T 0 ): (20) Similarly, T 0 ¡1 X s 0 =1 T 0 ¡s 0 X k=1 m 2s 0 ;2s 0 +2k¡1 =¡ ´ 2 (1¡´ 2 ) 2 T 0 +o(T 0 ); (21) T 0 ¡s 0 X k=1 m 2s 0 ¡1;2s 0 ¡1+2k = ´ 3 (1¡´ 2 ) 2 T 0 +o(T 0 ); (22) and T 0 ¡s 0 X k=1 m 2s 0 ¡1;2s 0 ¡1+2k¡1 =¡ ´ 2 (1¡´ 2 ) 2 T 0 +o(T 0 ): (23) 40 Summing them together yields P s6=t m (1) st = 2(2 ´ 3 (1¡´ 2 ) 2 T 0 ¡ 2 ´ 2 (1¡´ 2 ) 2 T 0 ) + o(T 0 ) = ¡2 ´ 2 (1¡´)(1+´) 2 T +o(T 0 ): Adding the diagonal terms and o®-diagonal terms together yields T¡2 X s;t=1 m (1) st =¡2 ´ 2 (1+´)(1¡´ 2 ) T + ´ 1¡´ 2 T +o(T)= ´ (1+´) 2 T +o(T): (24) Hence the order T term of A GMM is 1¡° 1+° ´ (1+´) 2 T¾ 2 : For the Numerator: We need to calculate B GMM = trf(lim N!1 1 N P N i=1 Z 0 i Z i ) ¡1 E[ 1 N ( P N i=1 Z 0 i M u i M y 0 i;¡1 Z i ]g=trf 1 ¾ 2 1+° 1¡° M (1) T¡2 Qg= 1 ¾ 2 1+° 1¡° tr(M (1) T¡2 Q), where Q=E[Z 0 i Mu i My 0 i;¡1 Z i ]. The Matrix Q: We know q st = E[M y i;s¡1 M u i;s+1 M y i;t¡1 M y it ]. It has di®erent expected values depending on s and t. (i) When t·s¡1, q st =0. (ii) When t=s, q ss = E[My 2 i;s¡1 My is Mu i;s+1 ] = ° 2 E[My 2 i;s¡2 Mu is Mu i;s+1 ]+2°E[Mu i;s¡2 Mu i;s¡1 Mu is Mu i;s+1 ] +E[Mu 2 i;s¡1 Mu is Mu i;s+1 ] = ¡ 2¾ 4 1+° : (iii) When t=s+k, q s;s+k = E[My i;s¡1 Mu i;s+1 My i;s+k¡1 My i;s+k ] = E[My i;s¡1 Mu i;s+1 (° k My i;s¡1 + k¡1 X j=0 ° j Mu i;s+k¡1¡j ) £(° k+1 My i;s¡1 + k X j=0 ° j Mu i;s+k¡j )] , q(1;1)+q(2;1)+q(1;2)+q(2;2): 41 It's easy to see that the ¯rst term is zero. The remaining three terms can be shown as: (1) Let q s;s+k (2;2) = E[M y i;s¡1 M u i;s+1 ( P k¡1 j=0 ° j M u i;s+k¡1¡j )( P k j=0 ° j M u i;s+k¡j )]. It can be shown that q s;s+1 (2;2) = (¡2 + °)¾ 4 , q s;s+2 (2;2) = (° ¡ 4° 2 + ° 3 )¾ 4 , q s;s+3 (2;2)=(2° 3 ¡4° 4 +° 5 )¾ 4 . For k¸4, q s;s+k (2;2) = E[My i;s¡1 Mu i;s+1 ( k¡1 X j=0 ° j Mu i;s+k¡1¡j )( k X j=0 ° j Mu i;s+k¡j )] = ° 2k¡1 E[My i;s¡1 Mu i;s+1 Mu 2 is ]+2° 2k¡2 E[My i;s¡1 Mu 2 i;s+1 Mu is ] +2° 2k¡3 E[My i;s¡1 Mu i;s+1 Mu is Mu i;s+2 ] = (2° 2k¡3 ¡4° 2k¡2 +° 2k¡1 )¾ 4 : (2) Let q s;s+k (1;2) =E[My i;s¡1 Mu i;s+1 (° k My i;s¡1 )( P k j=0 ° j Mu i;s+k¡j )] =° k E[M y 2 i;s¡1 Mu i;s+1 ( P k j=0 ° j Mu i;s+k¡j )]. We have q s;s+1 (1;2)=2 (2°¡° 2 ) 1+° ¾ 4 , q s;s+2 (1;2)=¡2 (1¡°) 2 1+° ° 2 ¾ 4 . 
For k¸2, q s;s+k (1;2) = ° k E[My 2 i;s¡1 Mu i;s+1 ( k X j=0 ° j Mu i;s+k¡j )] = ° k E[My 2 i;s¡1 Mu i;s+1 (° k¡2 u i;s+2 +° k¡1 u i;s+1 +° k u is )] = ¡2 (1¡°) 2 1+° ° 2(k¡1) ¾ 4 : (3) Let q s;s+k (2;1) = E[M y i;s¡1 M u i;s+1 ( P k¡1 j=0 ° j M u i;s+k¡1¡j )(° k+1 M y i;s¡1 )] = ° k+1 E[My 2 i;s¡1 Mu i;s+1 ( P k¡1 j=0 ° j Mu i;s+k¡1¡j )], then q s;s+1 (2;1)=° 2 E[My 2 i;s¡1 Mu i;s+1 Mu i;s ]=¡2 ° 2 1+° ¾ 4 , q s;s+2 (2;1)=° 3 E[My 2 i;s¡1 Mu i;s+1 (Mu i;s+1 +°Mu is )]=2 (2°¡° 2 ) 1+° ° 2 ¾ 4 , q s;s+3 (2;1)=° 4 E[My 2 i;s¡1 Mu i;s+1 (Mu i;s+2 +°Mu i;s+1 +° 2 Mu is )]=¡2 (1¡°) 2 1+° ° 4 ¾ 4 . For k¸3, q s;s+k (1;2)=¡2 (1¡°) 2 1+° ° 2(k¡1) ¾ 4 . 42 After calculating each term and adding them together, it yields q s;s+1 =[(¡2+°)+ 2 (2°¡° 2 ) 1+° ¡2 ° 2 1+° ]¾ 4 = 1 1+° (¡2+3°¡3° 2 )¾ 4 , q s;s+2 = [(°¡4° 2 +° 3 )¡2 (1¡°) 2 1+° ° 2 + 2 (2°¡° 2 ) 1+° ° 2 ]¾ 4 = 1 1+° (°¡5° 2 +5° 3 ¡3° 4 )¾ 4 . For k ¸ 3, q s;s+k = [(2° 2k¡3 ¡4° 2k¡2 + ° 2k¡1 )¡2 (1¡°) 2 1+° ° 2(k¡1) ¡2 (1¡°) 2 1+° ° 2(k¡1) ]¾ 4 = 1 1+° (2° 2k¡3 ¡6° 2k¡2 +5° 2k¡1 ¡3° 2k )¾ 4 . So, q s;s+k =q(k)¾ 4 = 8 > > > > > > > > > > > > > > < > > > > > > > > > > > > > > : 0; k <0 ¡ 2 1+° ¾ 4 ; k =0 1 1+° (¡2+3°¡3° 2 )¾ 4 ; k =1 1 1+° (°¡5° 2 +5° 3 ¡3° 4 )¾ 4 ; k =2 ° 2k 1+° (2° ¡3 ¡6° ¡2 +5° ¡1 ¡3)¾ 4 ; k¸3 Analysis of the Trace: Without loss of generality, we assume again T ¡2=2T 0 . tr(M (1) T¡2 Q) = T¡2 X s=1 T¡2 X l=1 m (1) sl q ls = T¡2 X s=1 t¡1 X r=0 m (1) s;s¡r q s¡r;s = T 0 ¡1 X s 0 =1 [ s 0 ¡1 X r 0 =0 m (1) 2s 0 ;2s 0 ¡2r 0 q 2s 0 ¡2r 0 ;2s 0 + s 0 ¡1 X r 0 =0 m (1) 2s 0 ;2s 0 ¡(2r 0 +1) q 2s 0 ¡(2r 0 +1);2s 0 ] + T 0 X s 0 =1 [ s 0 ¡1 X r 0 =0 m (1) 2s 0 ¡1;2s 0 ¡1¡2r 0 q 2s 0 ¡1¡2r 0 ;2s 0 ¡1 + s 0 ¡2 X r 0 =0 m (1) 2s 0 ¡1;2s 0 ¡1¡(2r 0 +1) q 2s 0 ¡1¡(2r 0 +1);2s 0 ¡1 ]: 43 Let c q = 1 1+° (2° ¡3 ¡6° ¡2 +5° ¡1 ¡3). There are four terms to be analyzed. 
For the ¯rst term, d T s 0 ¡1 X r 0 =0 m (1) 2s 0 ;2s 0 ¡2r 0 q 2s 0 ¡2r 0 ;2s 0 = [´ 2T 0 +1 +´ ¡(2T 0 +1) ¡´ 2T 0 +1¡4s 0 ¡´ ¡(2T 0 +1¡4s 0 ) ]q(0)¾ 4 +[´ 2T 0 ¡1 +´ ¡(2T 0 ¡1) ¡´ 2T 0 +3¡4s 0 ¡´ ¡(2T 0 +3¡4s 0 ) ]q(2)¾ 4 + s 0 ¡1 X r 0 =2 [´ 2T 0 +1¡2r 0 +´ ¡(2T 0 +1¡2r 0 ) ¡´ 2T 0 +1+2r 0 ¡4s 0 ¡´ ¡(2T 0 +1+2r 0 ¡4s 0 ) ]c q ° 4r 0 ¾ 4 : Multiplying both sides by ´ 2T 0 +1 , we have (´¡´ ¡1 )(´ 2(2T 0 +1) ¡1) s 0 ¡1 X r 0 =0 m (1) 2s 0 ;2s 0 ¡2r 0 q 2s 0 ¡2r 0 ;2s 0 = [´ 2(2T 0 +1) +1¡´ 2(2T 0 +1)¡4s 0 ¡´ 4s 0 ]q(0)¾ 4 +[´ 4T 0 +´ 2 ¡´ 4T 0 +4¡4s 0 ¡´ ¡2+4s 0 ]q(2)¾ 4 + s 0 ¡1 X r 0 =2 [´ 2(2T 0 +1)¡2r 0 +´ 2r 0 ¡´ 2(2T 0 +1)+2r 0 ¡4s 0 ¡´ ¡2r 0 +4s 0 ]c q ° 4r 0 ¾ 4 : The third term equals to c q ¾ 4 s 0 ¡1 X r 0 =2 [(´ 2(2T 0 +1) ¡´ 4s 0 )(° 2 ´ ¡1 ) 2r 0 +(1¡´ 2(2T 0 +1)¡4s 0 )(° 2 ´) 2r 0 ] = c q ¾ 4 [(´ 2(2T 0 +1) ¡´ 4s 0 ) (° 2 ´ ¡1 ) 4 ¡(° 2 ´ ¡1 ) 2s 0 1¡(° 2 ´ ¡1 ) 2 +(1¡´ 2(2T 0 +1)¡4s 0 ) (° 2 ´) 4 ¡(° 2 ´) 2s 0 1¡(° 2 ´) 2 = c q ¾ 4 [ ° 8 ´ 4T 0 ¡2 ¡° 4s 0 ´ 4T 0 +2¡2s 0 ¡° 8 ´ 4s 0 ¡4 +° 4s ´ 2s 1¡(° 2 ´ ¡1 ) 2 + ° 8 ´ 4 ¡° 4s 0 ´ 2s 0 ¡° 8 ´ 4T 0 +6¡4s 0 +° 4s 0 ´ 4T 0 +2¡2s 0 1¡(° 2 ´) 2 ]: 44 So (´¡´ ¡1 )(´ 2(2T 0 +1) ¡1) P T 0 ¡1 s 0 =1 P s 0 ¡1 r 0 =0 m (1) 2s 0 ;2s 0 ¡2r 0 q 2s 0 ¡2r 0 ;2s 0 = [q(0)+´ 2 q(2)+ (° 2 ´) 4 1¡(° 2 ´) 2 c q ]¾ 4 T 0 + o(T 0 ), and consequently P T 0 ¡1 s 0 =1 P s 0 ¡1 r 0 =0 m (1) 2s 0 ;2s 0 ¡2r 0 q 2s 0 ¡2r 0 ;2s 0 = ´ 1¡´ 2 [q(0)+´ 2 q(2)+ (° 2 ´) 4 1¡(° 2 ´) 2 c q ]¾ 4 T 0 +o(T 0 ): Similarly, we know s 0 ¡1 X r 0 =0 m (1) 2s 0 ;2s 0 ¡(2r 0 +1) q 2s 0 ¡(2r 0 +1);2s 0 =¡ ´ 1¡´ 2 [´q(1)+ (° 2 ´) 3 1¡(° 2 ´) 2 c q ]¾ 4 T 0 +o(T 0 ); s 0 ¡1 X r 0 =0 m (1) 2s 0 ¡1;2s 0 ¡1¡2r 0 q 2s 0 ¡1¡2r 0 ;2s 0 ¡1 = ´ 1¡´ 2 [q(0)+´ 2 q(2)+ (° 2 ´) 4 1¡(° 2 ´) 2 c q ]¾ 4 T 0 +o(T 0 ); and s 0 ¡2 X r 0 =0 m (1) 2s 0 ¡1;2s 0 ¡1¡(2r 0 +1) q 2s 0 ¡1¡(2r 0 +1);2s 0 ¡1 = ´ 1¡´ 2 [´q(1)+ (° 2 ´) 3 1¡(° 2 ´) 2 c q ]¾ 4 T 0 +o(T 0 ): Sotr(M (1) T¡2 Q)= ´ 1¡´ 2 [q(0)¡´q(1)+´ 2 q(2)¡ (° 2 ´) 3 1¡(° 2 ´) 2 (1¡° 2 ´)c q ]¾ 4 T+o(T);andB GMM = 1+° 1¡° ´ 1¡´ 2 b 0 ¾ 2 T +o(T); where b 0 =q(0)¡´q(1)+´ 2 q(2)¡ (° 2 ´) 3 1¡(° 2 ´) 2 (1¡° 2 ´)c q . Asymptotic Bias The limit of the expectation of the bias is lim N;T!1 E[ p NT(^ °¡°)] = lim N;T!1 E[ lim N!1 p NT(^ °¡°)] = lim N;T!1 r T N E[ 1 N ( P N i=1 Z 0 i My i;¡1 ) 0 (lim N!1 1 N P N i=1 Z 0 i Z i ) ¡1 ( P N i=1 Z 0 i Mu i ) (lim N!1 1 N P N i=1 Z 0 i My i;¡1 ) 0 (lim N!1 1 N P N i=1 Z 0 i Z i ) ¡1 (lim N!1 1 N P N i=1 Z 0 i My i;¡1 ) ] = lim N;T!1 r T N 1 N E[( P N i=1 Z 0 i My i;¡1 ) 0 (lim N!1 1 N P N i=1 Z 0 i Z i ) ¡1 ( P N i=1 Z 0 i Mu i )] (lim N!1 1 N P N i=1 Z 0 i My i;¡1 ) 0 (lim N!1 1 N P N i=1 Z 0 i Z i ) ¡1 (lim N!1 1 N P N i=1 Z 0 i My i;¡1 ) = lim N;T!1 r T N trf(lim N!1 1 N P N i=1 Z 0 i Z i ) ¡1 1 N E[( P N i=1 Z 0 i Mu i )( P N i=1 Z 0 i My i;¡1 ) 0 ]g (lim N!1 1 N P N i=1 Z 0 i My i;¡1 ) 0 (lim N!1 1 N P N i=1 Z 0 i Z i ) ¡1 (lim N!1 1 N P N i=1 Z 0 i My i;¡1 ) = lim N;T!1 r T N trf(lim N!1 1 N P N i=1 Z 0 i Z i ) ¡1 E[ 1 N ( P N i=1 Z 0 i Mu i My 0 i;¡1 Z i ]g (lim N!1 1 N P N i=1 Z 0 i My i;¡1 ) 0 (lim N!1 1 N P N i=1 Z 0 i Z i ) ¡1 (lim N!1 1 N P N i=1 Z 0 i My i;¡1 ) , lim N;T!1 r T N B GMM A GMM : 45 The second line is due to the interchange of the limit sign and expectation sign. The fourth line is because the numerator is a constant, which can be moved out of the expectation sign. 
Therefore lim N;T!1 E[ p NT(^ °¡°)]= lim N;T!1 r T N B GMM A GMM » r T N 1+° 1¡° ´ 1¡´ 2 b 0 ¾ 2 T 1¡° 1+° ´ (1+´) 2 T¾ 2 = r T N (1+°) 2 (1¡°) 2 1+´ 1¡´ b 0 : where b 0 =q(0)¡´q(1)+´ 2 q(2)¡ (° 2 ´) 3 1¡(° 2 ´) 2 (1¡° 2 ´)c q , q(0)=¡ 2 1+° , q(1)=¡ 1 1+° (¡2+ 3°¡3° 2 ), q(2)= 1 1+° (°¡5° 2 +5° 3 ¡3° 4 ), and c q = 1 1+° (2° ¡3 ¡6° ¡2 +5° ¡1 ¡3). Asymptotic Variance As shown in Hsiao (2003) (pp. 88, (4.3.47)), the asymptotic variance of the scaled estimator equals to (NT)¾ 2 [( P N i=1 Z 0 i M y i;¡1 ) 0 ( P N i=1 Z 0 i Z i ) ¡1 ( P N i=1 Z 0 i M y i;¡1 )] ¡1 . So when both N and T go to in¯nity, we know ¾ 2 GMM = lim N;T!1 T¾ 2 [A GMM ] ¡1 = 1+° 1¡° (1+´) 2 ´ : (25) ¤ 46 Crude GMM Proof of Theorem 1.4.3 In the case of using one di®erence instrument, we have Zi = 0 B B B B B B B @ Myi2 Myi3 . . . Myi;T¡2 1 C C C C C C C A , then Z 0 i Zi = 0 B B B B B B B B B B B @ My 2 i2 0 0 0 0 My 2 i3 0 . . . 0 . . . . . . . . . 0 . . . 0 My 2 i;T¡3 0 0 0 0 My 2 i;T¡2 : 1 C C C C C C C C C C C A Then since E[My 2 it ]= 2¾ 2 1+° , we know ( 1 N N X i=1 Z 0 i Z i ) ¡1 ! p ( 2¾ 2 1+° I T¡2 ) ¡1 , 1+° 2¾ 2 I T¡2 : (26) Consistency The consistency follows directly from the proof of Alvarez & Arellano (2003). The equa- tion (A110) on page 1151 of their paper becomes 1 NT P T¡1 i=1 v 0 t M t v t ! p 1 NT P T¡1 t=1 ¾ 2 = T¡1 NT ¾ 2 !0, since E[v 0 t M t v t ]=¾ 2 if we only use one instrument. Asymptotic Bias Similar to the proof of the GMM case, we have lim N;T!1 E[ p NT(^ °¡°)] = lim N;T!1 r T N trf(lim N!1 1 N P N i=1 Z 0 i Z i ) ¡1 E[ 1 N ( P N i=1 Z 0 i Mu i My 0 i;¡1 Z i ]g (lim N!1 1 N P N i=1 Z 0 i My i;¡1 ) 0 (lim N!1 1 N P N i=1 Z 0 i Z i ) ¡1 (lim N!1 1 N P N i=1 Z 0 i My i;¡1 ) , lim N;T!1 r T N B A : 47 For the Numerator: We need to calculate B = trf(lim N!1 1 N P N i=1 Z 0 i Z i ) ¡1 E[ 1 N ( P N i=1 Z 0 i M u i M y 0 i;¡1 Z i ]g = trf 1+° 2¾ 2 I T¡2 Qg = 1+° 2¾ 2 tr(Q), where Q = E[Z 0 i M u i M y 0 i;¡1 Z i ]. Since tr(Q)= P T¡2 s=1 q ss =¡ 2 1+° ¾ 4 (T ¡2), we know B = 1+° 2¾ 2 [¡ 2 1+° ¾ 4 (T ¡2)]+o(T)=¡(T ¡2)¾ 2 +o(T): (27) For the Denominator: Since 1 N P N i=1 Z 0 i My i;¡1 = 0 B B B B @ 1 N P N i=1 My i1 My i2 . . . 1 N P N i=1 My i;T¡2 My i;T¡1 1 C C C C A ! p ¡ 1¡° 1+° ¾ 2 ¶; we know A! p (1¡°) 2 (1+°) 2 ¾ 4 ¶ 0 ( 1+° 2¾ 2 I T¡2 )¶= (1¡°) 2 2(1+°) ¾ 2 (T ¡2): (28) So the asymptotic bias is lim N;T!1 E[ p NT(^ °¡°)]=¡ q T N 2(1+°) (1¡°) 2 . Asymptotic Variance the asymptotic variance of the scaled estimator equals to A ¡1 [( P N i=1 Z 0 i M y i;¡1 ) 0 ( P N i=1 Z 0 i Z i ) ¡1 ( P N i=1 Z 0 i Z i )( P N i=1 Z 0 i Z i ) ¡1 ( P N i=1 Z 0 i My i;¡1 )]A ¡1 . Similarly to the case of GMM, we know 48 ¾ 2 CGMM = lim T!1 1 T [ (1¡°) 2 2(1+°) ] ¡1 [¡ 1¡° 1+° ¶ 0 ][ 2 1+° I] ¡1 [ 1¡° 1+° 0 B B B B B B B B B B @ w 1 1 0 1 w 1 1 . . . . . . . . . 1 w 1 1 1 w 1 1 C C C C C C C C C C A ][ 2 1+° I] ¡1 [¡ 1¡° 1+° ¶] = lim T!1 1 T 1+° 1¡° [(T ¡2)w 1 +2(T ¡3)] = 2 (1+°)(3¡°) (1¡°) 2 : ¤ MLE Treating Initial Value as a Fixed Constant Proof of Lemma 1.5.2 The Denominator ~ A T;i (i) De¯ne à tk = E[M y it M y i;t+k ]. Then à 10 = E[M y 2 i1 ] = w¾ 2 +b 2 , and à t0 = ° 2 à t¡1;0 +2(1¡°)¾ 2 . So à t0 = 2¾ 2 1+° +° 2(t¡1) (w¾ 2 +b 2 ¡ 2¾ 2 1+° ), for t¸1. Also à t1 = E[M y it M y i;t+1 ] = ¡ 1¡° 1+° ¾ 2 +° 2t¡1 (w¾ 2 +b 2 ¡ 2¾ 2 1+° ). Since for k ¸ 2, à tk =°Ã t;k¡1 +E[My it Mu i;t+k ]=°Ã t;k¡1 , we have à tk =° k¡1 à t1 . So à tk = 8 > > < > > : 2¾ 2 1+° +° 2(t¡1) ¾ 2 1 ; k =0 ° k¡1 [¡ 1¡° 1+° ¾ 2 +° 2t¡1 ¾ 2 1 ]; k¸1 where ¾ 2 1 =w¾ 2 +b 2 ¡ 2¾ 2 1+° . 
(ii) The expectation of the denominator is E[ ~ A T;i ] = P T¡1 t=1 p tt E[M y 2 it ] + 2 P T¡2 t=1 P T¡t¡1 k=1 p t;t+k E[My it My i;t+k ]: 49 Using the results in (i), we know TE[ ~ A T;i ]= T¡1 X t=1 t(T ¡t)[ 2¾ 2 1+° +° 2(t¡1) ¾ 2 1 ] +2 T¡2 X t=1 T¡1¡t X k=1 t(T ¡t¡k)° k¡1 [¡ 1¡° 1+° ¾ 2 +° 2t¡1 ¾ 2 1 ] » 2¾ 2 1+° ( 1 6 T 3 ¡ 1 6 T)¡2 1¡° 1+° ¾ 2 T¡2 X t=1 t[ T ¡t¡1 1¡° ¡ °¡° T¡t (1¡°) 2 ] +2¾ 2 1 T¡2 X t=1 t[ T ¡t¡1 1¡° ° 2t¡1 ¡ ° 2t ¡° T+t¡1 (1¡°) 2 ] » 2¾ 2 1+° ( 1 6 T 3 ¡ 1 6 T)¡2 1¡° 1+° ¾ 2 [ 1 1¡° ( 1 6 T 3 ¡ 1 2 T 2 + 1 3 T) ¡ ° (1¡°) 2 (T ¡2)(T ¡1) 2 ]; So O(T 3 ) terms are 2¾ 2 1+° T 3 6 ¡ 2 1 1¡° (¡ 1¡° 1+° )¾ 2T 3 6 = 0, and O(T 2 ) terms are ¡2 1¡° 1+° ¾ 2 (¡ 1 2 1 1¡° ¡ 1 2 ° (1¡°) 2 )= 1 1¡° 2 ¾ 2 . Hence E[ ~ A T;i ]=O(T), and 1 N ~ A T ! p 1 1¡° 2 ¾ 2 T =O(T). The Numerator ~ B T;i E[ ~ B T;i ] = E[ T¡1 X s=1 T¡1 X t=1 p st Mu i;s+1 My it ] = T¡3 X s=1 [p ss (¡¾ 2 )+p s;s+1 (2¡°)¾ 2 + T¡1 X t=s+2 p st (¡° t¡2¡s (1¡°) 2 ¾ 2 )] +p T¡2;T¡2 (¡¾ 2 )+p T¡2;T¡1 (2¡°)¾ 2 +p T¡1;T¡1 (¡¾ 2 ) = ¾ 2 T f T¡3 X s=1 [s(T ¡s)(¡1)+s(T ¡s¡1)(2¡°) + T¡1 X t=s+2 s(T ¡t)(¡° t¡s¡2 (1¡°) 2 )] +2(T ¡2)(¡1)+(T ¡2)(2¡°)+(T ¡1)(¡1)g: 50 Since P T¡1 t=s+2 (T ¡t)° t¡s¡2 = T¡2¡s 1¡° ¡ °¡° T¡1¡s (1¡°) 2 , it follows that E[ ~ B T;i ] = ¾ 2 T f T¡3 X s=1 s[¡(T ¡s)+(2¡°)(T ¡s¡1)¡(1¡°)(T ¡2¡s) +°(1¡° T¡2¡s )]+2(T ¡2)(¡1)+(T ¡2)(2¡°)+(T ¡1)(¡1)g = ¾ 2 T f¡ T¡3 X s=1 s° T¡1¡s ¡°(T ¡2)¡(T ¡1)g = ¾ 2 (1¡°) 2 (1¡° T )¡(1¡°)T T : The Variance of ~ B T;i =Mu 0 i ¡1 0 My i;¡1 By iteration,My it =° t¡1 My i1 + P t j=2 ° t¡j Mu ij , Mu i =(Mu i2 ;:::;Mu iT ) 0 ,My i;¡1 =(My i1 ;:::;My i;T¡1 ) 0 . We can write this in matrix form: M y i;¡1 = q T M y i1 + R T M u i , where q T = (1;°;:::;° T¡2 ) 0 , and R T = 0 B B B B B B B B B B B B B B @ 0 0 0 ::: 0 0 1 0 0 ::: 0 0 ° 1 0 ::: 0 0 ° 2 ° 1 ::: 0 0 . . . . . . . . . . . . . . . . . . ° T¡3 ° T¡4 ° T¡5 ::: 1 0 1 C C C C C C C C C C C C C C A . Also we knowMu i =Cu i , whereu i =(u i1 ;u i2 ;:::;u iT ) is aT dimensional vector and C = 0 B B B B B B B B B B B B B B B B B @ ¡1 1 0 ::: 0 0 0 0 ¡1 1 ::: 0 0 0 0 0 ¡1 ::: 0 0 0 . . . . . . . . . . . . . . . . . . . . . 0 0 0 ::: 1 0 0 0 0 0 ::: ¡1 1 0 0 0 0 ::: 0 ¡1 1 1 C C C C C C C C C C C C C C C C C A is a (T ¡1)£T matrix. 51 So E[ ~ B 2 T;i ] = E[(u 0 i C 0 ¡1 0 q T My i1 +u 0 i C 0 ¡1 0 R T Cu i ) 2 ] = E[(My i1 a 0 u i ) 2 ]+E[(u 0 i Hu i ) 2 ]+2E[My i1 a 0 u i u 0 i Hu i ] ´ U 1 +U 2 +2U 12 ; where H =C 0 ¡1 0 R T C, and a 0 =q 0 T ¡1 0 C. I. The Matrix H: We know r st = 8 > > < > > : 0; s·t ° s¡t¡1 ; s¸t+1 Let F = ¡1 0 R T . Then f st = P T¡1 k=1 p sk r kt = 8 > > < > > : P T¡1 k=t+1 p sk ° k¡t¡1 ; t<T ¡1 0; t=T ¡1 Denote F = 0 B B B B B B B @ F 0 1 F 0 2 . . . F 0 T¡1 1 C C C C C C C A and C =(C1;C2;:::;CT). Then C 0 F = 0 B B B B B B B B B B @ ¡F 0 1 F 0 1 ¡F 0 2 . . . F 0 T¡2 ¡F 0 T¡1 F 0 T¡1 1 C C C C C C C C C C A and H = C 0 FC = 0 B B B B B B B B B B @ ¡F 0 1 C 1 ¡F 0 1 C 2 ::: ¡F 0 1 C T¡1 ¡F 0 1 C T (F 0 1 ¡F 0 2 )C 1 (F 0 1 ¡F 0 2 )C 2 ::: (F 0 1 ¡F 0 2 )C T¡1 (F 0 1 ¡F 0 2 )C T . . . . . . . . . . . . . . . 
(F 0 T¡2 ¡F 0 T¡1 )C 1 (F 0 T¡2 ¡F 0 T¡1 )C 2 ::: (F 0 T¡2 ¡F 0 T¡1 )C T¡1 (F 0 T¡2 ¡F 0 T¡1 )C T F 0 T¡1 C 1 F 0 T¡1 C 2 ::: F 0 T¡1 C T¡1 F 0 T¡1 C T 1 C C C C C C C C C C A : (i) The general term for 2·s;t·T ¡1 is h st = T¡1 X k=1 (f s¡1;k ¡f sk )c kt =f st ¡f s;t¡1 ¡f s¡1;t +f s¡1;t¡1 = T¡1 X k=t+1 p sk ° k¡1¡t ¡ T¡1 X k=t p sk ° k¡t ¡ T¡1 X k=t+1 p s¡1;k ° k¡1¡t + T¡1 X k=t p s¡1;k ° k¡t : 52 We have the following cases: (1) s·t, Th st = T¡1 X k=t+1 s(T ¡k)° k¡1¡t ¡ T¡1 X k=t s(T ¡k)° k¡t ¡ T¡1 X k=t+1 (s¡1)(T ¡k)° k¡1¡t + T¡1 X k=t (s¡1)(T ¡k)° k¡t = T¡1 X k=t+1 (T ¡k)° k¡1¡t ¡ T¡1 X k=t (T ¡k)° k¡t = ¡ 1¡° T¡t 1¡° : (2) s>t, Th st = s¡1 X k=t+1 k(T ¡s)° k¡1¡t + T¡1 X k=s s(T ¡k)° k¡1¡t ¡ s¡1 X k=t k(T ¡s)° k¡t ¡ T¡1 X k=s s(T ¡k)° k¡t ¡ s¡1 X k=t+1 k(T ¡s+1)° k¡1¡t ¡ T¡1 X k=s (s¡1)(T ¡k)° k¡1¡t + s¡1 X k=t k(T ¡s+1)° k¡t + T¡1 X k=s (s¡1)(T ¡k)° k¡t = (s¡1)° s¡1¡t ¡ s¡2 X k=t ° k¡t +(T ¡s)° s¡1¡t ¡° T¡1¡t ¡ T¡2 X k=s ° k¡t = T° s¡1¡t ¡T° s¡t ¡1+° T¡t 1¡° : So h st = 8 > > < > > : ¡ 1 T 1¡° T¡t 1¡° ; s·t ° s¡t¡1 ¡ 1 T 1¡° T¡t 1¡° ; s>t where 2·s;t·T ¡1. (ii) The values at the border when s, t equals to 1 or T are: h 11 =f 11 = P T¡1 k=2 p 1k ° k¡2 = 1 1¡° ¡ 1 T 2¡°¡° T¡1 (1¡°) 2 , 53 h TT =f T¡1;T¡1 =0, h 1T =¡f 1;T¡1 =0, and h T;1 =¡f T¡1;1 =¡ P T¡1 k=2 p T¡1;k ° k¡2 = ° T¡2 1¡° ¡ 1 T 2¡°¡° T¡1 (1¡°) 2 . Since the last column of R T is a zero vector, we know h sT =0, for 1·s·T. For 2·s;t·T ¡1, (a) h 1t = f 1;t¡1 ¡f 1t = T¡1 X k=t p 1k ° k¡t ¡ T¡1 X k=t+1 p 1k ° k¡t¡1 = 1 T [ T¡1 X k=t (T ¡k)° k¡t ¡ T¡1 X k=t+1 (T ¡k)° k¡t¡1 ] = 1 T (° T¡1¡t + T¡2 X k=t ° k¡t ) = 1 T 1¡° T¡t 1¡° : (b) h s1 = f s¡1;1 ¡f s;1 = T X k=2 p s¡1;k ° k¡2 ¡ T¡1 X k=2 p sk ° k¡2 : We know h 21 = 1 T [ T¡1 X k=2 (T ¡k)° k¡2 ¡ T¡1 X k=2 2(T ¡k)° k¡2 ] = ¡ 1 T T¡1 X k=2 (T ¡k)° k¡2 =¡h 11 : For s¸3, we have 54 h s1 = s¡1 X k=2 p s¡1;k ° k¡2 + T¡1 X k=s p s¡1;k ° k¡2 ¡ s¡1 X k=2 p sk ° k¡2 ¡ T¡1 X k=s p sk ° k¡2 = 1 T [ s¡1 X k=2 k(T ¡s+1)° k¡2 + T¡1 X k=s (s¡1)(T ¡k)° k¡2 ¡ s¡1 X k=2 k(T ¡s)° k¡2 ¡ T¡1 X k=s s(T ¡k)° k¡2 ] = 1 T [ s¡1 X k=2 k° k¡2 ¡ T¡1 X k=s (T ¡k)° k¡2 ] = 1 T 2¡°¡° T¡1 (1¡°) 2 ¡ ° s¡2 1¡° : This formula is consistent with the case when s=2. (c) Finally, h Tt = f T¡1;t¡1 ¡f T¡1;t = T¡1 X k=t p T¡1;k ° k¡t ¡ T¡1 X k=t+1 p T¡1;k ° k¡t¡1 = 1 T ( T¡1 X k=t k° k¡t ¡ T¡1 X k=t+1 k° k¡t¡1 ) = ° T¡1¡t ¡ 1 T 1¡° T¡t 1¡° : II. The Vector a: We know a 0 =q 0 T ¡1 0 C. 55 Denote g =q 0 T ¡1 0 . Then g t = T¡1 X k=1 ° k¡1 p kt = 1 T [ t X k=1 ° k¡1 k(T ¡t)+ T¡1 X k=t+1 ° k¡1 t(T ¡t)] = 1 T [(T ¡t)( 1¡° t (1¡°) 2 ¡ t° t 1¡° )+t( (T ¡t¡1)° t 1¡° ¡ ° t+1 ¡° T (1¡°) 2 )] = 1 T (T ¡t)¡T° t +t° T (1¡°) 2 : So for 2·t·T ¡1, a t = g t¡1 ¡g t = 1 T (T +1¡t)¡T° t¡1 +(t¡1)° T (1¡°) 2 ¡ 1 T (T ¡t)¡T° t +t° T (1¡°) 2 = 1 T 1¡° T (1¡°) 2 ¡ ° t¡1 1¡° : Since a 1 =¡g 1 = 1 T 1¡° T (1¡°) 2 ¡ 1 1¡° , and a T =g T¡1 = 1 T 1¡° T (1¡°) ¡ ° T¡1 1¡° , we see generally we have a t = 1 T 1¡° T (1¡°) 2 ¡ ° t¡1 1¡° , for 1·t·T. Analysis of the Variance of ~ B T;i (i) U 1 = E[(My i1 a 0 u i ) 2 ] = E[My 2 i1 ( T X k=1 a k u ik ) 2 ] = E[My 2 i1 ( T X k=1 a 2 k u 2 ik )] = a 2 1 E[My 2 i1 u 2 i1 ]+ T X k=2 a 2 k E[My 2 i1 u 2 ik ] = a 2 1 f¹ 4 +[b 2 +(w¡1)¾ 2 ]¾ 2 g+ T X k=2 a 2 k f¾ 4 +[b 2 +(w¡1)¾ 2 ]¾ 2 g: So U 1 » P T t=1 a 2 t » P T t=1 ( ° t¡1 1¡° ) 2 =O(1). 
56 (ii) U 2 = E[(u 0 i Hu i ) 2 ] = E[( T X s=1 T X t=1 h st u s u t ) 2 ] = E[ T X s=1 T X t=1 h 2 st u 2 s u 2 t + X s6=t h ss h tt u 2 s u 2 t ] = ( T X s=1 h 2 ss )¹ 4 + X s6=t h 2 st ¾ 4 + X s6=t h ss h tt ¾ 4 : We have (1) P T s=1 h 2 ss »h 2 11 + P T¡1 s=2 h 2 ss +h 2 TT =O( 1 T )+O( 1 T )=O( 1 T ). (2) X s6=t h 2 st = T¡1 X s=2 h 2 s1 +h 2 T1 + T¡1 X t=2 h 2 1t + T¡1 X t=2 h 2 T;t + X 2·s<t·T¡1 h 2 st + X 2·t<s·T¡1 h 2 st = O(1)+O( 1 T )+O( 1 T )+O(1)+L 1 +L 2 ; with L 1 » P T¡1 s=2 P T¡1 t=s+1 1 (1¡°) 2 T 2 =O(1); L 2 » P T¡1 t=1 P T¡1 s=t+1 ° 2(s¡t¡1) =( T¡1 1¡° 2 ¡ 1¡° 2(T¡1) (1¡° 2 ) 2 )» T 1¡° 2 =O(T): So P s6=t h 2 st » T 1¡° 2 . (3) P s6=t h ss h tt » P s6=t 1 (1¡°) 2 T 2 =O(1): From the above analysis, we know U 2 » T¾ 4 1¡° 2 : 57 (iii) U 12 = E[My i1 a 0 u i u 0 i Hu i ]=E[( T X k=1 a k u ik My i1 )( T X s=1 T X t=1 h st u s u t )] = E[( T X k=1 a k u ik u i1 )( T X s=1 T X t=1 h st u s u t )] = E[a 1 u 2 i1 T X s=1 h ss u 2 is + T X k=2 a k u ik u i1 ( T X l=2 h l1 u il u i1 + T X l=2 h 1l u i1 u il )] = a 1 h 11 ¹ 4 +( T X s=2 a 1 h ss + T X s=2 a s h s1 + T X s=2 a s h 1s )¾ 4 : Since a 1 h 11 =O(1)£O(1)=O(1); (29) T X s=2 a 1 h ss » T X s=2 (¡ 1 1¡° )(¡ 1 T(1¡°) )=O(1); (30) T X s=2 a s h s1 » T X s=2 (¡ ° s¡1 1¡° )(¡ ° s¡2 1¡° )=O(1); (31) T X s=2 a s h 1s » T X s=2 (¡ ° s¡1 1¡° ) 1 T(1¡°) =O( 1 T ); (32) we know U 12 =O(1). So E[ ~ B T;i ]=O(1) implies var( ~ B T;i )»E[ ~ B 2 T;i ]» T¾ 4 1¡° 2 . ¤ MLE Treating Initial Value as Random Proof of Lemma 1.5.1 Let ½ ¤ st = D(T;w)½ st . Denote B ¤ T;i = D(T;w)B T;i , C ¤ T;i = D(T;w)C T;i . i) De¯ne ' k =E[Mu is My i;s+k ], s¸2. ' k =0, for k·¡2, ' ¡1 =¡¾ 2 , ' 0 =(2¡°)¾ 2 , ' 1 =°' 0 +' ¡1 =¡(1¡°) 2 ¾ 2 , 58 ' k = E[M u is M y i;s+k ] = E[M u is (° M y i;s+k¡1 + M u i;s+k )] = °' k¡1 +E[M u is M u i;s+k ], Since E[Mu is Mu i;s+k ]=0, for k¸2, we know ' k =° k¡1 ' 1 =¡° k¡1 (1¡°) 2 ¾ 2 . So ' k = 8 > > > > > > > > > > < > > > > > > > > > > : 0; k·¡2 ¡¾ 2 ; k =¡1 (2¡°)¾ 2 ; k =0 ¡° k¡1 (1¡°) 2 ¾ 2 ; k¸1 ii) De¯ne Á t =E[(My i1 ¡b)My i;t¡1 ]. It's easy to check Á 2 =E[(My i1 ¡b)My i;1 ]= w¾ 2 . Since M y i1 = y i1 ¡y i0 = ® i +(°¡1)y i0 +u i1 , and M u i2 = u i2 ¡u i1 , we know E[(My i1 ¡b)Mu i2 ]=¡¾ 2 : Also Á 3 =E[(My i1 ¡b)(°My i1 +Mu i2 )]=°Á 2 +E[(My i1 ¡b)Mu i2 ]=(°w¡1)¾ 2 . Since E[M y i1 M u i;t¡1 ] = 0, for t ¸ 3, Á t = E[(M y i1 ¡b)(° M y i;t¡2 +M u i;t¡1 )] = °Á t¡1 +E[(My i1 ¡b)Mu i;t¡1 ]=°Á t¡1 , for t¸3. In Summary, Á t = 8 > > < > > : w¾ 2 ; t=2 (°w¡1)¾ 2 ; t¸3: iii) Using the above expressions, we have E[B ¤ T;i ] = T¡2 X s=2 [½ ¤ ss (¡¾ 2 )+½ ¤ s;s+1 (2¡°)¾ 2 + T X t=s+2 ½ ¤ st (¡° t¡2¡s (1¡°) 2 ¾ 2 )] +½ ¤ T¡1;T¡1 (¡¾ 2 )+½ ¤ T¡1;T (2¡°)¾ 2 +½ ¤ T;T (¡¾ 2 ); and E[C ¤ T;i ] = ½ ¤ 12 w¾ 2 + T X t=3 ½ ¤ 1t (°w¡1)° t¡3 ¾ 2 : 59 Substitute the expression of the elements of the matrix inverse, it follows that E[B ¤ T;i ]=¾ 2 = T¡2 X s=2 [(s¡1)w¡(s¡2)]f¡(T +1¡s)+(T ¡s)(2¡°) ¡(T ¡s¡1)(1¡°)+°(1¡° T¡s¡1 )g +2[(T ¡2)w¡(T ¡3)](¡1)+[(T ¡2)w¡(T ¡3)](2¡°) +[(T ¡1)w¡(T ¡2)](¡1) = T¡2 X s=2 [(s¡1)w¡(s¡2)](¡° T¡s ) ¡(T ¡2)w°+(T ¡3)°¡(T ¡1)w+(T ¡2); which implies (1¡°) 2 E[B ¤ T;i ]=¾ 2 = ¡w(T ¡3)° 2 +w(T ¡2)° 3 ¡w° T +(T ¡4)° 2 ¡(T ¡3)° 3 +° T¡1 ¡(T ¡2)w°(1¡°) 2 +(T ¡3)°(1¡°) 2 ¡(T ¡1)w(1¡°) 2 +(T ¡2)(1¡°) 2 = ¡w° T +° T¡1 ¡[(T ¡1)¡Tw]°+[(T ¡2)¡(T ¡1)w]: Also (1¡°) 2 E[C ¤ T;i ]=¾ 2 = (T ¡1)w(1¡°) 2 +(°w¡1)(T ¡2)(1¡°) ¡(°w¡1)°(1¡° T¡2 ) = w° T ¡° T¡1 ¡[Tw¡(T ¡1)]°+[(T ¡1)w¡(T ¡2)]: Summing them together yields E[B ¤ T;i +C ¤ T;i ]=0. ¤ 60 Appendix B Proofs of Chapter 2 Proof of Lemma 2.2.1. We take ~ x i.i.d. 
with x, and ~ y i.i.d. with y. Then following Whittle's proof: E[jx 0 Ay¡E[x 0 Ay]j q ] =E[jx 0 Ay¡~ x 0 A~ yj q ] = 1 2 q E[j(x+~ x) 0 A(y¡ ~ y)+(x¡~ x) 0 A(y+ ~ y)j q ] · 1 2 q fE[j(x+~ x) 0 A(y¡ ~ y)j q ] 1=q +E[j(y+ ~ y) 0 A(x¡~ x)j q ] 1=q g q : (33) One of the two terms: E[j(x+~ x) 0 A(y¡ ~ y)j q ] 1=q =E[j» 0 ´j q ] 1=q ·C 1 ( X t E[j» t j 2q ] 1=q E[j´ t j 2q ] 1=q ) q=2 : (34) Since E[j» t j 2q ] = E[( P s a ts (x s + ~ x s ) 2q ] · C 2 ( P s a 2 ts ° 2 s (2q)) q and E[j´ t j 2q ] = E[(y t ¡ ~ y t ) 2q ] · 2 2q E[y 2q t ] = 2 2q ½ t (2q) 2q , we have E[j(x + ~ x) 0 A(y ¡ ~ y)j q ] 1=q · C 3 [ P a 2 ts ° 2 s (2q)½ 2 t (2q)] q=2 . Similarly,fortheothertermwehaveE[j(y+~ y) 0 A(x¡~ x)j q ] 1=q · C 4 [ P a 2 ts ½ 2 s (2q)° 2 t (2q)] q=2 : 61 Since A is symmetric, let C 5 =max(C 3; C 4 ), then E[jx 0 Ay¡E[x 0 Ay]j q ] · 1 2 q f2(C 5 [ X a 2 ts ° 2 s (2q)½ 2 t (2q)] q=2 ) 1=q g q =C 6 [ X a 2 ts ° 2 s (2q)½ 2 t (2q)] q=2 : (35) ¤ Besides Whittle's inequalities (Whittle (1960)), we need the following lemmas which is easy to prove. Lemma .0.1. Let A and B be two symmetric matrices, and B is positive semide¯nite, then ¸ min (A)tr(B)·tr(AB)·¸ max tr(B): (36) Proof of Theorem 2.2.1. It's su±cient to show that the four lemmas hold true with correlated errors, i.e. e = ¿ +Â. To illustrate how the original proof in Li (1987) would change, we prove Lemma 2.2.2 in details here. We know that R(m)=E[L(m)]=tr(P m V)+jjQ m ¹ m jj 2 : (37) So R(m) ¸ tr(P m V) and R m ¸ jjQ m ¹ m jj 2 . By assumption (A.2.2), we know R(m) ¸ C 0 tr(P m ), for some constant C 0 . For Lemma 2.2.2, P(sup m je 0 P m e¡tr(P m V)j=R(m)>±) · X m E[(e 0 P m e¡tr(P m V)) 2h ]=(±R(m)) 2h : (38) 62 By Minkowski's inequality E[(e 0 P m e¡tr(P m V)) 2h ] =E[f(¿ 0 P m ¿¡E[¿ 0 P m ¿]) +2(¿ 0 P m ¡E[¿ 0 P m Â]) +( 0 P m ¡E[ 0 P m Â])g 2h ] ·fE[(¿ 0 P m ¿¡E[¿ 0 P m ¿]) 2h ] 1=2h +2E[(¿ 0 P m ¡E[¿ 0 P m Â]) 2h ] 1=2h +E[( 0 P m ¡E[ 0 P m Â]) 2h ] 1=2h g 2h : (39) Denote ° 1;t (h) = E[¿ h t ] and ° 2;t (h) = E[ h t ]. Since P m is symmetric, by Whittle's inequality for x 0 Ax (Whittle (1960)) and x 0 Ay (Lemma 2.2.1), we have E[(¿ 0 P m ¿¡E[¿ 0 P m ¿]) 2h ] ·C( T X t;s a 2 ts ° 2 1;t (4h)° 2 1;t (4h)) h ; (40) E[(¿ 0 P m ¡E[¿ 0 P m Â]) 2h ] ·C( T X t;s a 2 ts ° 2 1;t (4h)° 2 2;t (4h)) h ; (41) and E[( 0 P m ¡E[ 0 P m Â]) 2h ] ·C( T X t;s a 2 ts ° 2 2;t (4h)° 2 2;t (4h)) h : (42) 63 So E[(e 0 P m e ¡ tr(P m V)) 2h ] · C 1 ( P T t;s a 2 ts ) h = C 1 tr(P m P 0 m ) = C 1 tr(P m ), and P m E[(e 0 P m e¡tr(P m V)) 2h ]=(±R(m)) 2h ·C 1 ± ¡2h P m R(m) ¡h !0. Hence P(sup m je 0 P m e¡tr(P m V)j=R(m)>±)!0; which means sup m je 0 P m e¡tr(P m V)j=R(m)!0 in probability. Lemma 2.2.3 is obvious since je 0 e¡tr(V)j does not change with m and R(m) goes to in¯nity. For Lemma 2.2.4, Since E[< Q m ¹ m ;¿ >] = 0 and E[< Q m ¹ m ; >] = 0 by Whittle (1960)'s inequality for the linear form, we have E[(¹ 0 m Q 0 m e) 2h ] =E[(¹ 0 m Q 0 m ¿ +¹ 0 m Q 0 m Â) 2h ] ·f(E[(¹ 0 m Q 0 m ¿) 2h ]) 1=2h +(E[(¹ 0 m Q 0 m Â) 2h ]) 1=2h g 2h ·f(C 1 [ T X t=1 (Q m ¹ m ) 2 t ° 2 1;t (2h)] h ) 1=2h +(C 2 [ T X t=1 (Q m ¹ m ) 2 t ° 2 2;t (2h)] h ) 1=2h g 2h ·C 3 [ T X t=1 (Q m ¹ m ) 2 t ] h =C 3 jjQ m ¹ m jj 2h ; (43) where (Q m ¹ m ) t is the t th element of the vector Q m ¹ m . We know P(j¹ 0 m Q 0 m e=R m j>±) ·E[(¹ 0 m Q 0 m e) 2h ]=(±R m ) 2h ·C 3 jjQ m ¹ m jj 2h =(±R m ) 2h ·C 3 =(± 2h R h w )!0; (44) and P(j¹ 0 m Q 0 m e=R m j>±)=E[P(j¹ 0 m Q 0 m e=R m j>±)]!0. 
64 Since L m ¡R m =(jjP m ejj 2 ¡tr(P m V))¡2<¹ m Q 0 m ;P m e>, to prove Lemma 2.2.5, we need to show that jjP m ejj 2 ¡tr(P m V)=R m ! p 0 and < ¹ m Q 0 m ;P m e > =R m ! p 0. TheformerisexactlyLemma2.2.2andthelattercanbeprovedthesamewayasLemma 2.2.4. ¤ ProofofTheorem2.3.2. TheprooffollowsthatinWanetal.(2010),whichconsidered only the case when error terms are independent. C L (w)=L(w)+2hQ(w)¹;ei+(e 0 e¡tr(V))¡2(e 0 P(w)e¡tr(P(w)V)) (45) Since the term e 0 e¡tr(V) is irrelevant with w, it's su±cient to show that as n!1, sup w2W je 0 P(w)e¡tr(P(w)V)j=R(w)! p 0; (46) sup w2W jhQ(w)¹;eij=R(w)! p 0; (47) sup w2W jL(w)=R(w)¡1j! p 0: (48) To prove (46), Pfsup w2W je 0 P(w)e¡tr(P(w)V)j=R(w)>±g ·Pfsup w2W M X m=1 w m je 0 P(m)e¡tr(P(m)V)j>±» T g ·± ¡2L » ¡2L T M X m=1 E[(e 0 P(m)e¡tr(P(m)V)) 2L ]: (49) 65 Since E[e 0 P(m)e] = tr(P(m)V), e = ¿ + Â, we can apply Whittle's bound to the quadratic form (Whittle (1960)) and weighted inner product (Lemma 2.2.1) as in the proof of Lemma (2.2.3). It follows that E[(e 0 P(m)e¡tr(P(m)V)) 2L ]·C 5 R(m) L : (50) So Pfsup w2W je 0 P(w)e¡tr(P(w)V)j=R(w) > ±g · C 6 » ¡2L T P M m=1 R(m) L , which con- verge to zero by assumption [A.5]. To show (47), Pfsup w2W jhQ(w)¹;eij=R(w)>±g ·Pfsup w2W jh M X m=1 w m Q(m)¹;eij>±» T g ·± ¡2L » ¡2L T M X m=1 E[hQ(m)¹;ei 2L ]: (51) Since E[hQ(m)¹;ei]=0 and e=¿+Â, by Whittle's result Whittle (1960) on the higher moments of linear forms, using similar arguments as (43), it follows that Pfsup w2W jhQ(w)¹;eij=R(w)>±g·C 2 » ¡2L T M X m=1 R(m) L : (52) By Assumption [A.5], we know (47) holds true. Now for the last part (48), since L(w) ¡ R(w) = kP(w)ek 2 ¡ tr(P(w)V) ¡ 2hQ(w)¹;P(w)ei, we need only to show that sup w2W jkP(w)ek 2 ¡tr(P(w)V)j=R(w)! p 0; (53) and sup w2W jhQ(w)¹;P(w)eij=R(w)! p 0; (54) The claim (53) is the same as (46) and (54) can be proved similarly to (47). ¤ 66
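To give a concrete sense of the asymptotic loss efficiency established above, the following is a minimal numerical sketch, in Python, of Mallows-type least squares model averaging with autocorrelated errors. It is not part of the original proofs: the design (four nested regressors, T = 200, MA(1) errors with coefficient 0.5), the grid search over the simplex, and all function and variable names are illustrative assumptions of ours, and the penalty uses the textbook homoskedastic form 2*sigma^2*tr(P(w)) as a simple stand-in for tr(P(w)V).

```python
import numpy as np

# Minimal sketch of Mallows model averaging (Hansen 2007 / Wan et al. 2010 style)
# with MA(1) errors, which satisfy the decomposition e = tau + chi used above.
rng = np.random.default_rng(0)
T, M = 200, 4                        # sample size and number of nested candidate models
X = rng.normal(size=(T, M))
beta = np.array([1.0, 0.7, 0.3, 0.1])
mu = X @ beta                        # conditional mean
eps = rng.normal(size=T + 1)
e = eps[1:] + 0.5 * eps[:-1]         # MA(1), hence autocorrelated, errors
y = mu + e

# Hat matrices of the nested candidate models m = 1,...,M (first m regressors)
P = []
for m in range(1, M + 1):
    Xm = X[:, :m]
    P.append(Xm @ np.linalg.solve(Xm.T @ Xm, Xm.T))
P = np.array(P)                      # shape (M, T, T)

sigma2 = np.var(y - P[-1] @ y, ddof=M)   # rough error-variance estimate from the largest model

def mallows(w):
    """Averaging criterion ||y - P(w)y||^2 + 2*sigma2*tr(P(w)) (homoskedastic form)."""
    Pw = np.tensordot(w, P, axes=1)
    resid = y - Pw @ y
    return resid @ resid + 2.0 * sigma2 * np.trace(Pw)

def loss(w):
    """Squared estimation loss L(w) = ||mu - P(w)y||^2."""
    Pw = np.tensordot(w, P, axes=1)
    diff = mu - Pw @ y
    return diff @ diff

# Crude search over a step-0.1 grid on the unit simplex (fine for this illustration)
grid = [w for w in np.ndindex(*(11,) * M) if sum(w) == 10]
weights = np.array(grid) / 10.0
w_hat = weights[np.argmin([mallows(w) for w in weights])]
w_opt = weights[np.argmin([loss(w) for w in weights])]
print("selected weights:", w_hat)
print("loss ratio L(w_hat)/inf L(w):", loss(w_hat) / loss(w_opt))
```

Under designs of this sort, the realized ratio L(w_hat)/inf_w L(w) is typically close to one and moves toward one as T grows, which is the behavior that Theorems 2.3.1 and 2.3.2 formalize.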
Abstract
In this dissertation, the large N, T properties of IV, GMM and MLE estimators and least squares model selection/averaging are studied. The first part is based on joint work with Professor Hsiao. We examine the asymptotic properties of IV, GMM and MLE estimators of dynamic panel data models when either N or T or both are large. We show that the Anderson-Hsiao simple instrumental variable estimator (IV) and the quasi-maximum likelihood estimator (QMLE) treating the initial value as stochastic are asymptotically unbiased whether N or T or both tend to infinity. On the other hand, the QMLE treating the initial value as fixed is asymptotically unbiased only if N is fixed and T is large. If both N and T are large and the ratio N/T goes to a nonzero constant, the QMLE treating the initial values as fixed is asymptotically biased of order the square root of N/T. Likewise, the Arellano-type GMM estimator is asymptotically biased of order the square root of T/N if T/N goes to a nonzero constant, even if we restrict the number of instruments used. Monte Carlo studies show that whether an estimator is asymptotically biased or not has important implications for the actual size of the conventional t-test. The second part of this dissertation is on least squares model selection and model averaging. By developing a new Whittle-type inequality, we generalize the asymptotic loss efficiency results for the Mallows criterion in different model selection and model averaging settings to the case in which the error terms are autocorrelated. In particular, we show that the optimality results remain true for the model selection studied by Li (1987), the model averaging studied by Hansen (2007), and the model averaging studied by Wan et al. (2010) in a time series framework with autocorrelated errors.
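As a companion to the Monte Carlo evidence summarized above, the following is a minimal, self-contained Python sketch of the kind of experiment described for the first part. It is not the dissertation's actual simulation code: the design (N, T, the value of the autoregressive parameter, the stationary start, the number of replications) and all function names are illustrative assumptions. It contrasts the Anderson-Hsiao simple IV estimator with a first-difference, Arellano-type one-step GMM estimator that uses all available lagged levels as instruments.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, gamma, reps = 100, 10, 0.5, 200   # illustrative design, not the dissertation's

def simulate_panel():
    """y_it = alpha_i + gamma*y_{i,t-1} + u_it, u_it ~ N(0,1), started from stationarity."""
    alpha = rng.normal(size=N)
    y = np.empty((N, T + 1))
    y[:, 0] = alpha / (1 - gamma) + rng.normal(size=N) / np.sqrt(1 - gamma**2)
    for t in range(1, T + 1):
        y[:, t] = alpha + gamma * y[:, t - 1] + rng.normal(size=N)
    return y

def anderson_hsiao(y):
    """Simple IV: instrument dy_{i,t-1} with the lagged level y_{i,t-2}."""
    dy = np.diff(y, axis=1)
    num = np.sum(y[:, :-2] * dy[:, 1:])    # sum over i,t of y_{i,t-2} * dy_{i,t}
    den = np.sum(y[:, :-2] * dy[:, :-1])   # sum over i,t of y_{i,t-2} * dy_{i,t-1}
    return num / den

def instrument_matrix(yi):
    """Lagged levels y_{i0},...,y_{i,t-2} as instruments for the equation in dy_{i,t}."""
    K = T * (T - 1) // 2
    Z = np.zeros((T - 1, K))
    col = 0
    for e in range(T - 1):                 # equation for dy_{i,e+2}
        Z[e, col:col + e + 1] = yi[:e + 1]
        col += e + 1
    return Z

def difference_gmm(y):
    """One-step first-difference GMM with all available lagged levels as instruments."""
    dy = np.diff(y, axis=1)
    H = 2 * np.eye(T - 1) - np.eye(T - 1, k=1) - np.eye(T - 1, k=-1)
    K = T * (T - 1) // 2
    A = np.zeros((K, K)); zdy = np.zeros(K); zdx = np.zeros(K)
    for i in range(N):
        Z = instrument_matrix(y[i])
        A += Z.T @ H @ Z
        zdy += Z.T @ dy[i, 1:]             # dependent variable dy_{i,2},...,dy_{i,T}
        zdx += Z.T @ dy[i, :-1]            # regressor dy_{i,1},...,dy_{i,T-1}
    W = np.linalg.pinv(A)
    return (zdx @ W @ zdy) / (zdx @ W @ zdx)

iv_est, gmm_est = [], []
for _ in range(reps):
    y = simulate_panel()
    iv_est.append(anderson_hsiao(y))
    gmm_est.append(difference_gmm(y))
print("Anderson-Hsiao IV    mean bias:", np.mean(iv_est) - gamma)
print("First-difference GMM mean bias:", np.mean(gmm_est) - gamma)
```

In designs of this sort with N substantially larger than T, the simple IV estimator's bias is typically negligible while the many-instrument GMM estimator shows a noticeable bias, in line with the unbiasedness-versus-square-root-of-T/N comparison stated above; the exact magnitudes depend on the design.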