Three Essays on the Statistical Inference of Dynamic Panel Models

by

Qiankun Zhou

A Dissertation Presented to the
Faculty of the Graduate School
University of Southern California
in partial fulfillment of the
requirements for the degree of
Doctor of Philosophy
(Economics)

August 2015

Copyright 2015 Qiankun Zhou

To my wife and my parents

Acknowledgements

On August 1st, 2010, I came to the United States. Now, as the end of March 2015 approaches, it has been almost five years since I arrived at the University of Southern California (USC). During my stay at USC, I struggled with coursework, term papers and tedious mathematical derivations; fortunately, I not only survived, I also succeeded in my PhD studies. I have to admit that these have been the best five years of my life, since I tried my best to pave the road that leads to my future career. Before celebrating the completion of my PhD degree, I would like to express my deep gratitude to my parents, Mr. Guangbing Zhou and Mrs. Fengqun Xu. As one of the first generation of college students in my whole family, and the first to come to the United States for a PhD degree, I have received tremendous support from my parents over the past 20 years. Without their selfless dedication and deep love, I cannot imagine that I could have finished my PhD degree at USC. I still remember that in the year 2000 my father earned only $100 a month, but he still encouraged me to attend college and paid my tuition with a personal loan. We Chinese have a saying that the sheep has the filial piety of kneeling to suckle and the crow has the righteousness of feeding its parents in return; I will always bear in mind the selfless dedication of my parents. My second appreciation goes to my deeply loved wife, Mrs. Jia Xie. During my studies at USC, she has shown great encouragement and given me endless help.
Whenever I encounter difficulties in my research, she always encourages me not to give up. Whenever I hesitate to do things, she always encourages me to try. Whenever I fail, she always encourages me to stand up. It is my great fortune to have such a considerate and supportive wife. My third thanks goes to my advisors, Dr. Cheng Hsiao, Dr. M. Hashem Pesaran and Dr. Roger Moon. I would like to thank them for their continuous guidance and advice. They are such great advisors! I have benefited greatly from them, not only in conducting academic research, but also in understanding the value of devotion to teaching and mentoring students. I would also like to thank Dr. Yingying Fan for her kindness in serving as the outside member of my dissertation committee. All faculty and staff of the economics department are deeply appreciated for their help and support during my studies at USC. My last thanks goes to my friends from China and from USC. I do not list their names here because I am afraid there are too many to list. I would like to thank them for their support and encouragement during my studies at USC. It is my pleasure, and I am honored, to have such a wonderful group of friends. They are all valuable assets in my life.

Abstract

Dynamic panel models have very wide economic applications in labor economics, health economics and development economics. However, dynamic panel models have several unique features: (i) the presence of time-invariant individual-specific effects raises the issue of incidental parameters, whether the specific effects are treated as random or fixed; (ii) the formulation of the initial observations; and (iii) the multi-dimensional nature of panel data. For the first issue, a transformation has to be used to remove the individual effects. After transformation, method of moments estimation can be applied to estimate the lag coefficient.
However, it has been shown in the literature that the generalized method of moments (GMM) estimator is asymptotically biased. Since the reliability of statistical inference depends critically on whether the estimator is asymptotically unbiased, valid inference requires an unbiased estimator in the first place. In this dissertation, we propose two approaches that lead to unbiased estimators for dynamic panel models. The first approach is maximum likelihood estimation (MLE) for dynamic panel models with serially uncorrelated errors, and the second approach is jackknife instrumental variables estimation (JIVE) for dynamic panel models with serially correlated errors. It is shown that both the MLE and JIVE estimators are asymptotically unbiased and asymptotically normally distributed. Monte Carlo simulations are conducted to examine the finite sample properties of the MLE and JIVE, and the simulation results confirm our theoretical findings.
Table of Contents

Acknowledgements ............................................................ iv
Abstract ..................................................................... vi
Chapter 1  Introduction ....................................................... 1
Chapter 2  Elimination of individual effects in dynamic panel models and method of moments estimation ... 5
  2.1  Elimination of individual effects ....................................... 5
  2.2  GMM estimation .......................................................... 6
  2.3  Asymptotics of GMM using all lags as instruments ........................ 8
    2.3.1  GMM based on forward demeaning ..................................... 8
    2.3.2  Arellano-Bond GMM based on first difference using all available instruments ... 9
  2.4  Asymptotics of GMM using one lag as instrument ......................... 10
    2.4.1  GMM based on FOD .................................................. 10
    2.4.2  Arellano-Bond GMM based on first difference using one lag as instrument ... 12
    2.4.3  Simple IV estimation .............................................. 14
  2.5  Monte Carlo Simulation ................................................ 15
  2.6  Conclusion ............................................................ 17
Chapter 3  Maximum likelihood estimation of dynamic panel models with serially uncorrelated errors ... 24
  3.1  MLE for simple dynamic panel models ................................... 24
    3.1.1  MLE and its asymptotics ........................................... 26
    3.1.2  Monte Carlo Simulation ............................................ 28
    3.1.3  Conclusion ........................................................ 28
  3.2  Extension to panel dynamic simultaneous equation models ............... 31
    3.2.1  MLE for panel dynamic simultaneous equation models ................ 31
    3.2.2  Simulation ........................................................ 33
    3.2.3  Conclusion ........................................................ 34
Chapter 4  Jackknife instrumental variables estimation (JIVE) for dynamic panel models with serially correlated errors ... 41
  4.1  Models for dynamic panel regression with serially correlated errors ... 42
    4.1.1  Models and Assumptions ............................................ 42
    4.1.2  2SLS and its asymptotics .......................................... 46
    4.1.3  JIVE and its asymptotics .......................................... 50
  4.2  Generalization ........................................................ 55
    4.2.1  FOD and FD Transformations ........................................ 57
    4.2.2  2SLS estimation and its asymptotics ............................... 57
    4.2.3  JIVE and its asymptotics .......................................... 59
  4.3  Monte Carlo Simulation ................................................ 59
  4.4  Conclusion ............................................................ 62
Chapter 5  Summary ........................................................... 63
References ................................................................... 64
Appendix A: Mathematical proofs .............................................. 70
  A.1  Mathematical proofs for Chapter 2 ..................................... 70
    A.1.1  Asymptotics of GMM based on FOD using all first differenced lags ... 70
    A.1.2  Asymptotics of GMM based on FOD using one level lag ............... 78
    A.1.3  Asymptotics of GMM based on FOD using one first differenced lag ... 83
    A.1.4  Asymptotic bias of Arellano-Bond GMM using one level lag as instrument ... 84
  A.2  Mathematical proofs for Chapter 3 ..................................... 86
    A.2.1  Mathematical proofs for Chapter 3.1 ............................... 87
    A.2.2  Mathematical proofs for Chapter 3.2 ............................... 89
  A.3  Mathematical proofs for Chapter 4 ..................................... 92
    A.3.1  Proof of univariate model ......................................... 92
    A.3.2  Proof of multivariate model ...................................... 126

Chapter 1

Introduction

Dynamic panel models have a wide range of applications in economics, such as Euler equations for household consumption and empirical models of economic growth (Bond (2002)). As Nerlove (2000) notes, "all interesting economic behavior is inherently dynamic; dynamic models are the only relevant models." However, the presence of individual-specific effects in dynamic panel models creates correlation between the unobserved individual effects and all current and past realized endogenous variables. Transformations such as first difference and forward demeaning can be used to eliminate the individual effects. After transformation, a distinctive feature of dynamic panel models is that all lagged variables are valid instruments for the differenced model. Thus, instrumental variables (IV) estimation (Anderson and Hsiao (1981, 1982)) and generalized method of moments (GMM) estimation (Arellano and Bond (1991) and Alvarez and Arellano (2003)) can be used to estimate the lag coefficient. However, as shown by Alvarez and Arellano (2003), the GMM estimator is asymptotically biased. Since the reliability of statistical inference depends critically on whether an estimator is asymptotically unbiased, inference based on a biased estimator can be problematic. In order to obtain an unbiased estimator for dynamic panel models, this dissertation proposes two approaches that lead to unbiased estimators for dynamic panel models.
The first approach is maximum likelihood estimation (MLE) for dynamic panel models with serially uncorrelated errors, and the second approach is jackknife instrumental variables estimation (JIVE) for dynamic panel models with serially correlated errors. It is shown in this dissertation that both the MLE and JIVE estimators are asymptotically unbiased and asymptotically normally distributed. The second chapter, which is based on the work of Zhou and Hsiao (2014), discusses different transformations, such as first difference (FD) (Anderson and Hsiao (1981, 1982)) and forward demeaning (FOD) (Arellano and Bover (1995), Alvarez and Arellano (2003)), to remove individual effects, and studies the associated method of moments estimators, such as the generalized method of moments (GMM) estimator as well as the simple instrumental variables (IV) estimator. In the case where $T$ is fixed and $N$ tends to infinity, any of these transformations yields a moment estimator of the lag coefficient in dynamic panel models that is asymptotically normally distributed and centered at the true value of the lag coefficient. However, if $N$ and $T$ are of similar magnitude, as shown by Alvarez and Arellano (2003), the Arellano-Bond (1991) type generalized method of moments (GMM) estimator is asymptotically biased. In this chapter, we show that the asymptotic properties of the method of moments estimators based on first difference (FD) and forward demeaning (FOD) can be different.
We show that when all available instruments are used, the Arellano-Bond type GMM based on either differencing method is biased of order $\sqrt{c}$, where $c = T/N < 1$ as $(N,T)\to\infty$. However, if only a fixed number of instruments is used, the Arellano-Bond type GMM based on FD remains asymptotically biased of order $\sqrt{c}$, while the GMM based on FOD is asymptotically unbiased even when $c \neq 0$ as $(N,T)\to\infty$. We also explore the trade-off (bias, efficiency of the estimators, distortion in the size of tests) between GMM using all available instruments and GMM using only a fixed number of instruments, such as the Anderson-Hsiao simple instrumental variables estimator. Monte Carlo simulations confirm our theoretical findings in this chapter. The third chapter, which is based on the work of Hsiao and Zhou (2015a, 2015b), discusses maximum likelihood estimation (MLE) for simple dynamic panel data models as well as panel dynamic simultaneous equations models with serially uncorrelated errors. Moreover, we propose to use the long difference (Grassetti (2011) and Hahn et al. (2007)), which subtracts the initial observation from every observation, to remove the individual effects. The reason for using the long difference transformation instead of another transformation, such as first difference or forward demeaning, is that the long difference transforms the fixed effects model into a random effects model; it is well known that the variance-covariance matrix of the random effects model is easy to calculate, which in turn simplifies the computation of the MLE, since the MLE requires knowledge of the variance-covariance matrix. For the MLE based on the long difference, it is shown in this chapter that the MLE is unbiased and asymptotically normally distributed as long as $N$ is large or $T$ is large or both $N$ and $T$ are large. It is also shown that the MLE based on the long difference can be easily extended to panel dynamic simultaneous equations models.
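The long-difference transformation can be checked numerically. Below is a minimal sketch (ours, not the chapter's MLE; illustrative parameter values $\gamma = 0.5$, $\sigma_\eta = \sigma_u = 1$) showing that, when the process starts from stationarity, subtracting the initial observation removes the individual effect, leaving only a random-effects-type error component:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, gamma = 5000, 10, 0.5

eta = rng.normal(size=N)                      # individual effects
y = np.empty((N, T + 1))
# stationary start: y_i0 = eta_i / (1 - gamma) + eps_i0
y[:, 0] = eta / (1 - gamma) + rng.normal(scale=1 / np.sqrt(1 - gamma**2), size=N)
for t in range(1, T + 1):
    y[:, t] = eta + gamma * y[:, t - 1] + rng.normal(size=N)

# long difference: subtract the initial observation from every observation
ld = y[:, 1:] - y[:, [0]]

# under stationarity the individual effect drops out of the long difference,
# so the sample correlation between eta_i and y_iT - y_i0 should be near zero
corr = np.corrcoef(eta, ld[:, -1])[0, 1]
print(round(abs(corr), 2))
```

The long-differenced errors still share the common component $-(1-\gamma^t)\varepsilon_{i0}$ across $t$, which is exactly the random-effects structure the chapter exploits.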
Monte Carlo simulations are conducted to compare the finite sample properties of the MLE, IV and GMM estimators for dynamic panel models. The fourth chapter, which is based on the work of Lee et al. (2014), considers the estimation of dynamic panel models with serially correlated errors. To illustrate how serially correlated errors arise in dynamic panel models, we consider linear dynamic panel models with measurement errors, since the observed data then obey a dynamic panel model with serially correlated errors. For this type of model, we consider two-stage least squares (2SLS) estimation of the lag coefficients. It is shown in this chapter that, in our framework, the two 2SLS estimators (one based on the FOD transformation and the other based on the FD transformation) suffer from a bias due to many IVs when $T$ is large; the 2SLS estimator is asymptotically biased, and the bias can be quite significant if the time dimension $T$ is large. For the asymptotics, we use an alternative asymptotic framework that allows $N, T \to\infty$ with $T^3/N$ converging to a constant. In order to obtain an unbiased estimator for dynamic panel models with serially correlated errors, we also investigate how to reduce (or correct) the asymptotic bias of the two 2SLS estimators due to many IVs. For this purpose, we consider the jackknife instrumental variables estimation (JIVE) of Phillips and Hale (1977) and Angrist, Imbens, and Krueger (1999). In the existing literature, the JIVE has been studied in general IV or GMM frameworks (e.g., Angrist, Imbens, and Krueger (1999), Chao et al. (2012), and Hansen and Kozbur (2014)), but, as far as we know, not for estimating a dynamic panel regression with IVs. As our main theoretical contribution, we show that under the alternative asymptotics where $N, T \to\infty$ with $T^3/N$ converging to a constant, the JIVE is asymptotically normal without an asymptotic bias due to many IVs. Finite sample properties of the 2SLS and associated JIVE estimators are examined by Monte Carlo simulations.
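The leave-one-out idea behind the JIVE can be sketched in a simple cross-sectional IV setting. This is a generic illustration in the spirit of Angrist, Imbens, and Krueger (1999), not the chapter's panel estimator; all names and parameter values are ours:

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta = 2000, 1.0

z = rng.normal(size=(n, 1))                  # instrument
v = rng.normal(size=n)
x = z[:, 0] + v                              # endogenous regressor
y = beta * x + v + rng.normal(size=n)        # structural error correlated with x

# first-stage fitted values, leaving observation i out of its own fit
Z = np.column_stack([np.ones(n), z])
ZtZ = Z.T @ Z
ZtX = Z.T @ x
xhat_loo = np.empty(n)
for i in range(n):
    zi = Z[i]
    # leave-one-out first-stage coefficients: (Z'Z - z_i z_i')^{-1} (Z'x - z_i x_i)
    pi_i = np.linalg.solve(ZtZ - np.outer(zi, zi), ZtX - zi * x[i])
    xhat_loo[i] = zi @ pi_i

# JIVE: use the leave-one-out fitted value as the instrument for x
beta_jive = (xhat_loo @ y) / (xhat_loo @ x)
print(round(beta_jive, 2))
```

Because observation $i$ is excluded from its own first stage, the fitted instrument is independent of $u_i$, which is the mechanism that removes the many-IV bias of 2SLS.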
A summary is provided in the last chapter, and all mathematical proofs for these chapters are provided in the appendix.

Chapter 2

Elimination of individual effects in dynamic panel models and method of moments estimation

In this chapter¹, we consider methods to eliminate the individual effects in dynamic panel models, and the method of moments estimation of the transformed model. As noted earlier, the presence of individual effects in dynamic panel models creates a problem for estimation; certain transformations need to be used to remove the individual effects.

2.1 Elimination of individual effects

To begin with, consider the simple dynamic panel
$$y_{it} = \eta_i + \gamma y_{i,t-1} + u_{it}, \quad i = 1,\ldots,N;\ t = 1,\ldots,T. \tag{2.1}$$
For ease of notation, we assume that $y_{i0}$ is observable. We make the following assumptions:

Assumption 2.1: $0 < |\gamma| < 1$.
Assumption 2.2: $u_{it} \sim IID(0, \sigma_u^2)$ with finite fourth moments.
Assumption 2.3: $\eta_i \sim IID(0, \sigma_\eta^2)$ with finite fourth moments, independently distributed of $u_{it}$.

Remark 1 The homoscedasticity assumption on $u_{it}$ is only for ease of deriving the exact asymptotic bias; the main results can easily be extended to the heteroscedastic case.

By continuous substitution,
$$y_{i0} = \frac{\eta_i}{1-\gamma} + \varepsilon_{i0}, \tag{2.2}$$
where $\varepsilon_{i0} \sim IID(0, \sigma_\varepsilon^2)$ across $i$ with finite fourth moments, if the process started in the past and has reached stationarity at time zero.

¹ This chapter is based on the work of Zhou and Hsiao (2014).
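Model (2.1) with a stationary start is straightforward to simulate. The sketch below (illustrative parameter values, ours) checks that the simulated series has the implied stationary variance $\sigma_\eta^2/(1-\gamma)^2 + \sigma_u^2/(1-\gamma^2)$:

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 20000, 8
gamma, sig_eta, sig_u = 0.5, 1.0, 1.0

eta = rng.normal(scale=sig_eta, size=N)
y = np.empty((N, T + 1))
# stationary start (2.2): y_i0 = eta_i/(1-gamma) + eps_i0
y[:, 0] = eta / (1 - gamma) + rng.normal(scale=sig_u / np.sqrt(1 - gamma**2), size=N)
for t in range(1, T + 1):
    y[:, t] = eta + gamma * y[:, t - 1] + rng.normal(scale=sig_u, size=N)

# implied stationary variance of y_it
v_implied = sig_eta**2 / (1 - gamma)**2 + sig_u**2 / (1 - gamma**2)
v_sample = y[:, -1].var()
print(round(v_implied, 2), round(v_sample, 2))
```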
Stacking the $i$th individual's $T$ time series equations in vector form yields
$$\mathbf{y}_i = \gamma \mathbf{y}_{i,-1} + \mathbf{1}_T \eta_i + \mathbf{u}_i, \quad i = 1,\ldots,N, \tag{2.3}$$
where $\mathbf{y}_i = (y_{i1}, y_{i2}, \ldots, y_{iT})'$, $\mathbf{y}_{i,-1} = (y_{i0}, y_{i1}, \ldots, y_{i,T-1})'$, $\mathbf{u}_i = (u_{i1},\ldots,u_{iT})'$, and $\mathbf{1}_T$ is the $T \times 1$ vector $(1,1,\ldots,1)'$. Let $\mathbf{A}$ be a $(T-1) \times T$ matrix of rank $T-1$ satisfying $\mathbf{A}\mathbf{1}_T = \mathbf{0}$; then
$$\mathbf{A}\mathbf{y}_i = \gamma \mathbf{A}\mathbf{y}_{i,-1} + \mathbf{A}\mathbf{u}_i, \quad i=1,\ldots,N. \tag{2.4}$$
In general, there are two approaches to removing the individual effects $\eta_i$ from model (2.1): first difference (e.g., Anderson and Hsiao (1981, 1982)) and forward demeaning (Alvarez and Arellano (2003) and Arellano and Bover (1995)). For the first difference, defining $\Delta y_{it} = y_{it} - y_{i,t-1}$, we have
$$\Delta y_{it} = \gamma \Delta y_{i,t-1} + \Delta u_{it}, \quad i=1,\ldots,N;\ t=2,\ldots,T.$$
The forward demeaning of model (2.1) is given by
$$y_{it}^{(f)} = \gamma y_{i,t-1}^{(f)} + u_{it}^{(f)}, \quad i=1,\ldots,N;\ t=1,\ldots,T-1, \tag{2.5}$$
where $y_{it}^{(f)} = c_t \left( y_{it} - \frac{1}{T-t}\sum_{s=t+1}^{T} y_{is} \right)$ and $y_{i,t-1}^{(f)} = c_t \left( y_{i,t-1} - \frac{1}{T-t}\sum_{s=t}^{T-1} y_{is} \right)$ with $c_t^2 = \frac{T-t}{T-t+1}$.

2.2 GMM estimation

For the first differenced or forward demeaned model, to apply method of moments estimation we need moment conditions for the transformed model. Let $\mathbf{W}_i$ be an $M \times (T-1)$ matrix satisfying
$$E(\mathbf{W}_i \mathbf{A} \mathbf{u}_i) = \mathbf{0}. \tag{2.6}$$
The Arellano-Bond (1991) type GMM estimator finds $\hat{\gamma}$ to minimize the quadratic form
$$\left( \frac{1}{N}\sum_{i=1}^N \mathbf{W}_i \mathbf{A}\mathbf{u}_i \right)' \left( \frac{1}{N^2}\sum_{i=1}^N \mathbf{W}_i \mathbf{A}\mathbf{u}_i \mathbf{u}_i'\mathbf{A}'\mathbf{W}_i' \right)^{-1} \left( \frac{1}{N}\sum_{i=1}^N \mathbf{W}_i \mathbf{A}\mathbf{u}_i \right). \tag{2.7}$$
First differencing uses the transformation matrix
$$\mathbf{A} = \begin{pmatrix} -1 & 1 & 0 & \cdots & 0 \\ 0 & -1 & 1 & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & -1 & 1 \end{pmatrix}, \tag{2.8}$$
and $\mathbf{W}_i = \mathrm{diag}(\mathbf{q}_{i2}, \ldots, \mathbf{q}_{iT})$, where $\mathbf{q}_{it} = (y_{i0}, \ldots, y_{i,t-2})'$ for $t \geq 2$. The Arellano-Bover (1995) forward demeaning uses
$$\mathbf{A} = \begin{pmatrix} c_1 & 0 & \cdots & 0 \\ 0 & c_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & c_{T-1} \end{pmatrix} \begin{pmatrix} 1 & -\frac{1}{T-1} & -\frac{1}{T-1} & \cdots & -\frac{1}{T-1} \\ 0 & 1 & -\frac{1}{T-2} & \cdots & -\frac{1}{T-2} \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & 1 & -1 \end{pmatrix}, \tag{2.9}$$
where $c_t^2 = \frac{T-t}{T-t+1}$ for $t=1,\ldots,T-1$, and $\mathbf{W}_i = \mathrm{diag}(\mathbf{q}_{i1},\ldots,\mathbf{q}_{i,T-1})$, where $\mathbf{q}_{it} = (y_{i0},\ldots,y_{i,t-1})'$ for $t\geq 1$. The advantage of the Arellano-Bover (1995) forward demeaning transformation is that the transformed error term $\mathbf{u}_i^{*} = \mathbf{A}\mathbf{u}_i$ remains iid:
$$E(\mathbf{u}_i^{*} \mathbf{u}_i^{*\prime}) = \sigma_u^2 \mathbf{I}_{T-1}. \tag{2.10}$$
More specifically, for the first difference transformation of model (2.1), we have
$$\Delta y_{it} = \gamma \Delta y_{i,t-1} + \Delta u_{it}, \tag{2.11}$$
and the Arellano-Bond type GMM estimator based on (2.11) using all available level instruments is given by (Arellano (2003), Arellano and Bond (1991), Hsiao (2003) and Hsiao and Zhang (2013))
$$\hat{\gamma}^{AB}_{GMM,level} = \left[ \left( \sum_{i=1}^N \mathbf{W}_i \Delta\mathbf{y}_{i,-1} \right)' \left( \sum_{i=1}^N \mathbf{W}_i \mathbf{H} \mathbf{W}_i' \right)^{-1} \left( \sum_{i=1}^N \mathbf{W}_i \Delta\mathbf{y}_{i,-1} \right) \right]^{-1} \left[ \left( \sum_{i=1}^N \mathbf{W}_i \Delta\mathbf{y}_{i,-1} \right)' \left( \sum_{i=1}^N \mathbf{W}_i \mathbf{H}\mathbf{W}_i' \right)^{-1} \left( \sum_{i=1}^N \mathbf{W}_i \Delta\mathbf{y}_i \right) \right], \tag{2.12}$$
where $\Delta\mathbf{y}_i = (\Delta y_{i2},\ldots,\Delta y_{iT})'$, $\Delta\mathbf{y}_{i,-1} = (\Delta y_{i1},\ldots,\Delta y_{i,T-1})'$, and $\mathbf{W}_i$ is the $\frac{T(T-1)}{2} \times (T-1)$ matrix
$$\mathbf{W}_i = \begin{pmatrix} \mathbf{q}_{i2} & 0 & \cdots & 0 \\ 0 & \mathbf{q}_{i3} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \mathbf{q}_{iT} \end{pmatrix}, \tag{2.13}$$
with $\mathbf{q}_{it} = (y_{i0},\ldots,y_{i,t-2})'$ the vector of all available instruments, since $E(\mathbf{q}_{it}\Delta u_{it}) = \mathbf{0}$ for $t = 2,\ldots,T$. $\mathbf{H}$ is the $(T-1)\times(T-1)$ symmetric matrix
$$\mathbf{H} = \begin{pmatrix} 2 & -1 & 0 & \cdots & 0 \\ -1 & 2 & -1 & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & -1 & 2 & -1 \\ 0 & \cdots & 0 & -1 & 2 \end{pmatrix}. \tag{2.14}$$
There is no loss of generality in deriving the asymptotic properties of GMM when $L$ ($L \geq 1$) lags are used as instruments by letting $L = 1$ for ease of exposition; we can then replace $\mathbf{W}_i$ of (2.13) by $\mathbf{W}_i^{1L} = \mathrm{diag}(y_{i0},\ldots,y_{i,T-2})$. The GMM estimator of $\gamma$ based on forward demeaning is given by (for example, Alvarez and Arellano (2003))
$$\hat{\gamma}^{FOD}_{GMM,level} = \left( \sum_{t=1}^{T-1} \mathbf{y}_{t-1}^{(f)\prime} \mathbf{P}_{t-1} \mathbf{y}_{t-1}^{(f)} \right)^{-1} \sum_{t=1}^{T-1} \mathbf{y}_{t-1}^{(f)\prime} \mathbf{P}_{t-1} \mathbf{y}_{t}^{(f)}, \tag{2.15}$$
where $\mathbf{y}_t^{(f)} = (y_{1t}^{(f)},\ldots,y_{Nt}^{(f)})'$ and $\mathbf{P}_{t-1} = \mathbf{Z}_{t-1}(\mathbf{Z}_{t-1}'\mathbf{Z}_{t-1})^{-1}\mathbf{Z}_{t-1}'$, with $\mathbf{Z}_{t-1} = (\mathbf{y}_0,\ldots,\mathbf{y}_{t-1})$ the $N \times t$ matrix of instruments and $\mathbf{y}_t = (y_{1t},\ldots,y_{Nt})'$. When one level lag is used as instrument, $\mathbf{Z}_{t-1} = \mathbf{y}_{t-1}$.

2.3 Asymptotics of GMM using all lags as instruments

2.3.1 GMM based on forward demeaning

For the GMM estimator (2.15) using all available level lags, i.e. $\mathbf{Z}_{t-1} = (\mathbf{y}_0,\ldots,\mathbf{y}_{t-1})$, note that
$$\sqrt{NT}\left(\hat{\gamma}^{FOD}_{GMM,level} - \gamma\right) = \left( \frac{1}{NT}\sum_{t=1}^{T-1} \mathbf{y}_{t-1}^{(f)\prime}\mathbf{P}_{t-1}\mathbf{y}_{t-1}^{(f)} \right)^{-1} \frac{1}{\sqrt{NT}} \sum_{t=1}^{T-1} \mathbf{y}_{t-1}^{(f)\prime}\mathbf{P}_{t-1}\mathbf{u}_t^{(f)},$$
and it is shown by Alvarez and Arellano (2003) that

Theorem 1 For model (2.1) and the GMM estimator (2.15), under Assumptions 2.1-2.3 and as $(N,T)\to\infty$, we have
$$E\left[\sqrt{NT}\left(\hat{\gamma}^{FOD}_{GMM,level} - \gamma\right)\right] = -(1+\gamma)\sqrt{c} + o(1), \tag{2.16}$$
where $c = \lim_{(N,T)\to\infty} T/N$, and
$$\sqrt{NT}\left(\hat{\gamma}^{FOD}_{GMM,level} - \gamma + \frac{1}{N}(1+\gamma)\right) \to_d N\left(0, 1-\gamma^2\right). \tag{2.17}$$
The proof is provided in Alvarez and Arellano (2003). Alternatively, for model (2.5), we have $E(\Delta y_{i,s}\, u_{it}^{(f)}) = 0$ for $s < t$, so we can use first differenced lags as instruments. The GMM estimator can then be rewritten as
$$\hat{\gamma}^{FOD}_{GMM,FD} = \left( \sum_{t=1}^{T-1} \mathbf{y}_{t-1}^{(f)\prime} \tilde{\mathbf{P}}_{t-1} \mathbf{y}_{t-1}^{(f)} \right)^{-1} \sum_{t=1}^{T-1} \mathbf{y}_{t-1}^{(f)\prime} \tilde{\mathbf{P}}_{t-1} \mathbf{y}_t^{(f)}, \tag{2.18}$$
where $\tilde{\mathbf{P}}_{t-1} = \Delta\mathbf{Z}_{t-1}(\Delta\mathbf{Z}_{t-1}'\Delta\mathbf{Z}_{t-1})^{-1}\Delta\mathbf{Z}_{t-1}'$, with $\Delta\mathbf{Z}_{t-1} = (\Delta\mathbf{y}_1,\ldots,\Delta\mathbf{y}_{t-1})$ the $N\times(t-1)$ matrix of instruments and $\Delta\mathbf{y}_t = (\Delta y_{1t},\ldots,\Delta y_{Nt})'$. For the GMM estimator (2.18) based on FOD using all first differenced lags, we have

Lemma 2 For model (2.1) and the GMM estimator (2.18) using all first differenced lags, under Assumptions 2.1-2.3 and as $N\to\infty$ followed by $T\to\infty$, we have
$$E\left[\sqrt{NT}\left(\hat{\gamma}^{FOD}_{GMM,FD} - \gamma\right)\right] = -(1+\gamma)\sqrt{c} + o(1), \tag{2.19}$$
where $c = \lim_{(N,T)\to\infty} T/N$, and
$$\sqrt{NT}\left(\hat{\gamma}^{FOD}_{GMM,FD} - \gamma + \frac{1}{N}(1+\gamma)\right) \to_d N\left(0, 1-\gamma^2\right). \tag{2.20}$$

2.3.2 Arellano-Bond GMM based on first difference using all available instruments

For the Arellano-Bond GMM using all available level instruments, i.e. $\mathbf{W}_i$ given by (2.13), under homoscedasticity and cross-sectional independence we have
$$\sqrt{NT}\left(\hat{\gamma}^{AB}_{GMM,level} - \gamma - \frac{1}{N} b_{level}\right) \to_d N\left(0, \Omega^{AB}_{GMM,level}\right), \quad \text{as } (N,T)\to\infty,$$
where $b_{level}$ is some constant and $\Omega^{AB}_{GMM,level}$ can be approximated by (e.g., Hsiao (2003))
$$\sigma_u^2 \lim_{(N,T)\to\infty} \left[ \left( \frac{1}{NT}\sum_{i=1}^N \mathbf{W}_i\Delta\mathbf{y}_{i,-1} \right)' \left( \frac{1}{NT}\sum_{i=1}^N \mathbf{W}_i\mathbf{H}\mathbf{W}_i' \right)^{-1} \left( \frac{1}{NT}\sum_{i=1}^N \mathbf{W}_i\Delta\mathbf{y}_{i,-1} \right) \right]^{-1}, \tag{2.21}$$
where $\mathbf{H}$ is given by (2.14). Similarly, if all first differenced lags are used as instruments, we can replace $\mathbf{W}_i$ in (2.12) by $\Delta\mathbf{W}_i = \mathrm{diag}(\Delta\mathbf{q}_{i2},\ldots,\Delta\mathbf{q}_{iT})$ with $\Delta\mathbf{q}_{it} = (\Delta y_{i1},\ldots,\Delta y_{i,t-2})'$ for $t \geq 3$; denoting the resulting estimator by $\hat{\gamma}^{AB}_{GMM,FD}$, we have
$$\sqrt{NT}\left(\hat{\gamma}^{AB}_{GMM,FD} - \gamma - \frac{1}{N} b_{FD}\right) \to_d N\left(0, \Omega^{AB}_{GMM,FD}\right), \quad \text{as } (N,T)\to\infty,$$
where $b_{FD}$ is some constant and $\Omega^{AB}_{GMM,FD}$ can be approximated by (e.g., Hsiao (2003))
$$\sigma_u^2 \lim_{(N,T)\to\infty}\left[ \left( \frac{1}{NT}\sum_{i=1}^N \Delta\mathbf{W}_i\Delta\mathbf{y}_{i,-1} \right)' \left( \frac{1}{NT}\sum_{i=1}^N \Delta\mathbf{W}_i\mathbf{H}\Delta\mathbf{W}_i' \right)^{-1} \left( \frac{1}{NT}\sum_{i=1}^N \Delta\mathbf{W}_i\Delta\mathbf{y}_{i,-1} \right) \right]^{-1}. \tag{2.22}$$
For the asymptotic bias of the Arellano-Bond type GMM using all level lags, our simulations suggest that the asymptotic bias is of order $\sqrt{T/N}$.

Remark 2 It should be noted that it is difficult to compare the variance of the Arellano-Bond type GMM, (2.21) or (2.22), with the variance of the GMM based on forward demeaning, (2.17), using all lags as instruments. However, as pointed out by Blundell et al. (2001), the Arellano-Bond type GMM is asymptotically efficient in the class of estimators based on the moment conditions (2.6). Also, since Alvarez and Arellano (2003) show that the GMM based on forward demeaning in (2.17) attains asymptotic efficiency, we can regard the Arellano-Bond type GMM as equivalent to the GMM based on forward demeaning as long as $T/N < 0.5$. Indeed, in our simulations we find that the Arellano-Bond type GMM (2.12) is very close to the GMM based on forward demeaning (2.15) in terms of estimates, RMSE and interquartile range (iqr). It should also be noted that the Arellano-Bond type GMM is computationally much more demanding than the GMM based on FOD, especially when $T$ is large, because the Arellano-Bond type GMM requires the inverse of a matrix of dimension $O(T^2)$, while the GMM based on FOD only requires the inverse of a matrix of dimension $O(T)$ at most.
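The role of $\mathbf{H}$ in the GMM weighting can be verified directly: for the first-difference matrix $\mathbf{A}$ of (2.8), $\mathbf{A}\mathbf{A}'$ equals the $\mathbf{H}$ of (2.14), so $\mathrm{Var}(\mathbf{A}\mathbf{u}_i) = \sigma_u^2\mathbf{H}$ under homoscedasticity, and $\mathbf{A}\mathbf{1}_T = \mathbf{0}$ confirms the individual effects are removed. A small numerical check (ours):

```python
import numpy as np

T = 6
# first-difference transformation matrix A of (2.8), (T-1) x T
A = np.zeros((T - 1, T))
for t in range(T - 1):
    A[t, t], A[t, t + 1] = -1.0, 1.0

# H of (2.14): tridiagonal with 2 on the diagonal and -1 off it
H = 2 * np.eye(T - 1) - np.eye(T - 1, k=1) - np.eye(T - 1, k=-1)

# A @ A.T reproduces H, i.e. Var(A u_i) = sigma_u^2 * H
print(np.allclose(A @ A.T, H))         # True
# the FD transform annihilates the individual effect: A 1_T = 0
print(np.allclose(A @ np.ones(T), 0))  # True
```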
2.4 Asymptotics of GMM using one lag as instrument

In the previous section we discussed the asymptotic bias of GMM estimation using all lags as instruments. For the dynamic model, a fixed number of lags can also be used as instruments, e.g., Anderson and Hsiao (1981, 1982). In this section we assume that a single lagged variable is used as the instrument, and we study the asymptotics of the resulting GMM estimators.

2.4.1 GMM based on FOD

For GMM based on forward orthogonal deviations (FOD), if one lagged level variable instead of all lags is used as the instrument, the estimator is
$$\hat{\rho}^{FOD}_{GMM,1level} = \left( \sum_{t=1}^{T-1} y^{(f)\prime}_{t-1} P^{1L}_{t-1} y^{(f)}_{t-1} \right)^{-1} \left( \sum_{t=1}^{T-1} y^{(f)\prime}_{t-1} P^{1L}_{t-1} y^{(f)}_{t} \right), \qquad (2.23)$$
where $y^{(f)}_t = (y^f_{1t},\ldots,y^f_{Nt})'$, $P^{1L}_{t-1} = y_{t-1}\left(y'_{t-1} y_{t-1}\right)^{-1} y'_{t-1}$ with $y_{t-1} = (y_{1,t-1},\ldots,y_{N,t-1})'$. Following the derivation in the appendix, we have
$$E\left[\sqrt{NT}\left(\hat{\rho}^{FOD}_{GMM,1level} - \rho\right)\right] = o(1), \quad \text{as } (N,T)\to\infty,$$
which means that $\hat{\rho}^{FOD}_{GMM,1level}$, based on forward demeaning with one level lag as instrument, is asymptotically unbiased.

For the limiting distribution of $\hat{\rho}^{FOD}_{GMM,1level}$, note that
$$\sqrt{NT}\left(\hat{\rho}^{FOD}_{GMM,1level} - \rho\right) = \left(\frac{1}{NT}\sum_{t=1}^{T-1} y^{(f)\prime}_{t-1} P^{1L}_{t-1} y^{(f)}_{t-1}\right)^{-1} \frac{1}{\sqrt{NT}} \sum_{t=1}^{T-1} y^{(f)\prime}_{t-1} P^{1L}_{t-1} u^{(f)}_t,$$
where $u^{(f)}_t = (u^{(f)}_{1t},\ldots,u^{(f)}_{Nt})'$. For the denominator, by the results of (A.A.9), as $(N,T)\to\infty$,
$$\frac{1}{NT}\sum_{t=1}^{T-1} y^{(f)\prime}_{t-1} P^{1L}_{t-1} y^{(f)}_{t-1} \to_p \left(\frac{\sigma_u^2}{1-\rho^2}\right)^2 \left[\frac{\sigma_u^2}{1-\rho^2} + \frac{\sigma_\alpha^2}{(1-\rho)^2}\right]^{-1}.$$
For the numerator, by the results of (A.A.14), as $(N,T)\to\infty$,
$$\frac{1}{\sqrt{NT}}\sum_{t=1}^{T-1} y^{(f)\prime}_{t-1} P^{1L}_{t-1} u^{(f)}_t \to_d \frac{\sigma_u^2}{1-\rho^2}\left[\frac{\sigma_u^2}{1-\rho^2}+\frac{\sigma_\alpha^2}{(1-\rho)^2}\right]^{-1} N\left(0,\ \sigma_u^2\left[\frac{\sigma_u^2}{1-\rho^2}+\frac{\sigma_\alpha^2}{(1-\rho)^2}\right]\right).$$

Lemma 3 For the GMM estimator based on FOD using only one level lag as instrument, under Assumptions 2.1-2.3 and as $N\to\infty$ followed by $T\to\infty$, we have
$$\sqrt{NT}\left(\hat{\rho}^{FOD}_{GMM,1level} - \rho\right) \to_d N\left(0,\ \frac{(1-\rho^2)\sigma_u^2 + (1+\rho)^2\sigma_\alpha^2}{\sigma_u^2}\right). \qquad (2.24)$$

Similarly, if a first-differenced lag is used as the instrument, the FOD GMM estimator is asymptotically unbiased. For the limiting distribution, we have
$$\hat{\rho}^{FOD}_{GMM,1FD} = \left( \sum_{t=1}^{T-1} y^{(f)\prime}_{t-1} P^{1FD}_{t-1} y^{(f)}_{t-1} \right)^{-1} \left( \sum_{t=1}^{T-1} y^{(f)\prime}_{t-1} P^{1FD}_{t-1} y^{(f)}_{t} \right), \qquad (2.25)$$
where $P^{1FD}_{t-1} = \Delta y_{t-1}\left(\Delta y'_{t-1} \Delta y_{t-1}\right)^{-1} \Delta y'_{t-1}$ with $\Delta y_{t-1} = (\Delta y_{1,t-1},\ldots,\Delta y_{N,t-1})'$. Following the derivation in the appendix, by the results of (A.A.15), as $(N,T)\to\infty$,
$$\frac{1}{NT}\sum_{t=1}^{T-1} y^{(f)\prime}_{t-1} P^{1FD}_{t-1} y^{(f)}_{t-1} \to_p \frac{\sigma_u^2}{2(1+\rho)},$$
and by the results of (A.A.16), as $(N,T)\to\infty$,
$$\frac{1}{\sqrt{NT}}\sum_{t=1}^{T-1} y^{(f)\prime}_{t-1} P^{1FD}_{t-1} u^{(f)}_t \to_d N\left(0,\ \frac{\sigma_u^4}{2(1+\rho)}\right).$$

Lemma 4 For the GMM estimator based on FOD using only one first-differenced lag as instrument, under Assumptions 2.1-2.3 and as $N\to\infty$ followed by $T\to\infty$, we have
$$\sqrt{NT}\left(\hat{\rho}^{FOD}_{GMM,1FD} - \rho\right) \to_d N\left(0,\ 2(1+\rho)\right). \qquad (2.26)$$

Remark 3 Note that the variance in (2.24) can be rewritten as
$$Var\left(\sqrt{NT}\,\hat{\rho}^{FOD}_{GMM,1level}\right) = \frac{(1-\rho^2)\sigma_u^2 + (1+\rho)^2\sigma_\alpha^2}{\sigma_u^2} = (1+\rho)\left[(1-\rho) + (1+\rho)\frac{\sigma_\alpha^2}{\sigma_u^2}\right] = (1+\rho)\left[1 + \frac{\sigma_\alpha^2}{\sigma_u^2} + \rho\left(\frac{\sigma_\alpha^2}{\sigma_u^2} - 1\right)\right].$$
Consequently, if
$$1 + \frac{\sigma_\alpha^2}{\sigma_u^2} + \rho\left(\frac{\sigma_\alpha^2}{\sigma_u^2} - 1\right) > 2, \quad \text{or equivalently } \frac{\sigma_\alpha^2}{\sigma_u^2} > 1, \qquad (2.27)$$
then
$$Var\left(\sqrt{NT}\,\hat{\rho}^{FOD}_{GMM,1level}\right) > Var\left(\sqrt{NT}\,\hat{\rho}^{FOD}_{GMM,1FD}\right). \qquad (2.28)$$
As a result, when only one lag is used as the instrument, $\Delta y_{i,t-1}$ should be used if (2.27) holds.

2.4.2 Arellano-Bond GMM based on first difference using one lag as instrument

For the Arellano-Bond GMM estimator (2.12) based on first differences, assume as in Hsiao and Zhang (2013) that only one lag is used as the instrument. It can be shown that
$$E\left[\sqrt{NT}\left(\hat{\rho}^{AB}_{GMM,1L} - \rho\right)\right] = \frac{1}{\sqrt{NT}}\,O(T) = O\left(\sqrt{c}\right).$$
As a result, as long as $T/N \to 0$, we have $E\left[\sqrt{NT}\left(\hat{\rho}^{AB}_{GMM,1L} - \rho\right)\right] = o(1)$, which means that the Arellano-Bond type GMM is asymptotically unbiased with one lag as instrument only under the condition $T/N \to 0$. For the Arellano-Bond type GMM using one differenced lag as instrument, the order of the asymptotic bias remains $\sqrt{T/N}$ (Hsiao and Zhang (2013)).

For the limiting distribution of the Arellano-Bond GMM with only one lag as instrument, we have
$$\sqrt{NT}\left(\hat{\rho}^{AB}_{GMM,1L} - \rho - \frac{b_{1L}}{N}\right) \to_d N\left(0,\ \Sigma^{AB}_{GMM,1L}\right), \quad \text{as } (N,T)\to\infty,$$
where $b_{1L}$ is a certain constant and $\Sigma^{AB}_{GMM,1L}$ can be approximated by
$$\sigma_u^2 \lim_{(N,T)\to\infty} \left[\left(\frac{1}{NT}\sum_{i=1}^N W^{1L}_i \Delta y_{i,-1}\right)'\left(\frac{1}{NT}\sum_{i=1}^N W^{1L}_i H W^{1L\prime}_i\right)^{-1}\left(\frac{1}{NT}\sum_{i=1}^N W^{1L}_i \Delta y_{i,-1}\right)\right]^{-1}, \qquad (2.29)$$
where $W^{1L}_i = \mathrm{diag}(y_{i0},\ldots,y_{i,T-2})$. Similarly, when only one first-differenced lag is used as instrument, we have
$$\sqrt{NT}\left(\hat{\rho}^{AB}_{GMM,1FD} - \rho - \frac{b_{1FD}}{N}\right) \to_d N\left(0,\ \Sigma^{AB}_{GMM,1FD}\right), \quad \text{as } (N,T)\to\infty,$$
where $b_{1FD}$ is given in Hsiao and Zhang (2013) and $\Sigma^{AB}_{GMM,1FD}$ can be approximated by
$$\sigma_u^2 \lim_{(N,T)\to\infty} \left[\left(\frac{1}{NT}\sum_{i=1}^N W^{1FD}_i \Delta y_{i,-1}\right)'\left(\frac{1}{NT}\sum_{i=1}^N W^{1FD}_i H W^{1FD\prime}_i\right)^{-1}\left(\frac{1}{NT}\sum_{i=1}^N W^{1FD}_i \Delta y_{i,-1}\right)\right]^{-1}, \qquad (2.30)$$
where $W^{1FD}_i = \mathrm{diag}(\Delta y_{i1},\ldots,\Delta y_{i,T-2})$.

2.4.3 Simple IV estimation

For model (2.11), we can consider the simple IV estimator of Anderson and Hsiao (1981, 1982),
$$\hat{\rho}_{IV,level} = \left(\sum_{i=1}^N\sum_{t=2}^T y_{i,t-2}\,\Delta y_{i,t-1}\right)^{-1}\sum_{i=1}^N\sum_{t=2}^T y_{i,t-2}\,\Delta y_{it}. \qquad (2.31)$$
It is shown by Anderson and Hsiao (1981, 1982) that $\hat{\rho}_{IV,level}$ is asymptotically unbiased as either $N$ or $T$ or both tend to infinity. Moreover, we have

Lemma 5 For the simple IV estimator (2.31) for model (2.11), under Assumptions 2.1-2.3 and as $(N,T)\to\infty$,
$$\sqrt{NT}\left(\hat{\rho}_{IV,level} - \rho\right) \to_d N\left(0,\ 2(1+\rho)\right). \qquad (2.32)$$
The proof is provided in Phillips and Han (2014). Lemmas 4 and 5 show that the GMM based on FOD with one FD lag as instrument and the simple IV with one level lag as instrument yield asymptotically identical results.

Alternatively, if a first-differenced lag is used as the instrument, then we have
$$\hat{\rho}_{IV,FD} = \left(\sum_{i=1}^N\sum_{t=3}^T \Delta y_{i,t-2}\,\Delta y_{i,t-1}\right)^{-1}\sum_{i=1}^N\sum_{t=3}^T \Delta y_{i,t-2}\,\Delta y_{it}. \qquad (2.33)$$
It is also shown by Hsiao and Zhang (2013) that $\hat{\rho}_{IV,FD}$ is asymptotically unbiased as either $N$ or $T$ or both tend to infinity. Moreover, we have

Lemma 6 For the simple IV estimator (2.33) for model (2.11), under Assumptions 2.1-2.3 and as $(N,T)\to\infty$,
$$\sqrt{NT}\left(\hat{\rho}_{IV,FD} - \rho\right) \to_d N\left(0,\ \frac{2(1+\rho)(3-\rho)}{(1-\rho)^2}\right). \qquad (2.34)$$
The proof is provided in Phillips (2014).
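To make the estimator in (2.31) concrete, the following is a minimal simulation sketch, not the code used for the chapter's experiments; the panel generator, sample sizes, seed and burn-in length are illustrative assumptions:

```python
import numpy as np

def simulate_panel(N, T, rho, sigma_alpha=1.0, seed=0):
    # AR(1) panel with individual effects: y_it = alpha_i + rho*y_{i,t-1} + u_it
    rng = np.random.default_rng(seed)
    alpha = sigma_alpha * rng.standard_normal(N)
    y = np.zeros((N, T + 101))
    for t in range(1, T + 101):
        y[:, t] = alpha + rho * y[:, t - 1] + rng.standard_normal(N)
    return y[:, 100:]  # discard the burn-in so the process is near-stationary

def iv_level(y):
    # Anderson-Hsiao simple IV: difference out alpha_i, then instrument
    # dy_{i,t-1} with the level y_{i,t-2}, as in (2.31)
    dy = np.diff(y, axis=1)          # dy[:, k] = y_{i,k+1} - y_{i,k}
    z = y[:, :-2]                    # y_{i,t-2}, aligned with t = 2,...,T
    num = np.sum(z * dy[:, 1:])      # sum of y_{i,t-2} * dy_{it}
    den = np.sum(z * dy[:, :-1])     # sum of y_{i,t-2} * dy_{i,t-1}
    return num / den

y = simulate_panel(N=500, T=50, rho=0.5)
print(iv_level(y))   # should be close to the true rho = 0.5
```

The same alignment trick (shifting columns of the differenced array) gives the FD-instrument version (2.33) by replacing `z` with `dy[:, :-2]`.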
We note from (2.32) and (2.34) that
$$\frac{Var\left(\sqrt{NT}\,\hat{\rho}_{IV,FD}\right)}{Var\left(\sqrt{NT}\,\hat{\rho}_{IV,level}\right)} = \frac{3-\rho}{(1-\rho)^2} > 1,$$
since $|\rho| < 1$ under Assumption 2.1. This means that the simple IV estimator based on the level lag is always more efficient than the simple IV estimator based on the first-differenced lag.

If reliability of statistical inference is a consideration, one should use an asymptotically unbiased estimator. The GMM based on FOD is asymptotically unbiased if a fixed number of instruments is used. In that case, however, GMM based on FOD using one FD variable as instrument yields the same asymptotic distribution as the Anderson-Hsiao simple IV using the level variable as instrument.

2.5 Monte Carlo Simulation

In this section we investigate the finite-sample properties of the GMM and IV estimators of $\rho$ for the dynamic panel model. We consider the following data generating process:
$$y_{it} = \alpha_i + \rho y_{i,t-1} + u_{it},$$
where $\alpha_i \sim IIDN(0, \sigma_\alpha^2)$ and $u_{it} \sim IIDN(0,1)$ for all $i$ and $t$. For the value of $\rho$, we let $\rho_0 = 0.5$.

DGP 1. (Asymptotic bias) This DGP is set to verify the asymptotic bias of the GMM estimators based on FOD and FD. For simplicity, we let $\sigma_\alpha^2 = 1$. To allow different relative rates of $T$ and $N$, we consider two cases: (1) $T/N = 0.4$; (2) $T^2/N = 0.4$. We let $T$ vary over 25, 50 and 75. Note that in case (2) we have $T/N \to 0$.

DGP 2. (GMM based on FOD using all level or FD lags) This DGP is set to show the equivalence of GMM based on FOD using all level lags or all FD lags. We set $\sigma_\alpha^2 = 1$, $N = 300, 500$ and $T = 50, 100$. Theoretically, GMM based on FOD using all lagged levels is equivalent to GMM based on FOD using all lagged first differences.

DGP 3. (GMM based on FOD with individual-effects variation) This DGP is set to compare the efficiency of one-lag GMM based on FOD as $\sigma_\alpha^2$ varies. We set $\sigma_\alpha^2 = 1$ and $\sigma_\alpha^2 = 5$, and let $N = 100, 200$ and $T = 50, 100$.
Note that the variance of $\hat{\rho}^{FOD}_{GMM,1FD}$ is close to that of $\hat{\rho}^{FOD}_{GMM,1level}$ in the former case, but $\hat{\rho}^{FOD}_{GMM,1FD}$ should be more efficient than $\hat{\rho}^{FOD}_{GMM,1level}$ in the latter case.

DGP 4. (Simple IV efficiency comparison) This DGP is set to compare the efficiency of the simple IV estimators using one lag. For simplicity, we let $\sigma_\alpha^2 = 1$, and let $N = 100, 200$ and $T = 50, 100$. Note that $\hat{\rho}_{IV,level}$ should be more efficient than $\hat{\rho}_{IV,FD}$.

For DGP 1 we consider five estimators: GMM based on FOD using all lags and using one lag as instruments, the Arellano-Bond type GMM based on FD using all lags and using one lag as instruments, and the simple IV estimator based on the first-differenced equation using one lag as instrument. The simulation results are summarized in Tables 1-2, where Table 2 reports the summary statistics for $\sqrt{NT}(\hat\rho - \rho_0)$. The simulation results for DGP 2 to DGP 4 are summarized in Tables 3-5, respectively. For illustration, we also plot the empirical density of $\sqrt{NT}(\hat\rho - \rho_0)$ for DGP 1 to show its sensitivity to the relative rates of $T$ and $N$.

From Tables 1-2, we find that the Arellano-Bond type GMM estimator using all lags is asymptotically biased of order $O(\sqrt{c})$: its sample mean in Table 1 stays essentially unchanged in case 1 as $T$ increases. Similarly, the Arellano-Bond type GMM estimator using one lag and the GMM estimator based on FOD using all lags are also asymptotically biased of order $O(\sqrt{c})$. Clearly, when only one lag is used, the GMM estimator based on FOD and the simple IV estimator are asymptotically unbiased and have correct size. Similar findings appear in Figs. 1-4: the Arellano-Bond type GMM and the FOD GMM using all lags are asymptotically biased of order $O(\sqrt{c})$, while the one-lag FOD GMM and the simple IV estimator are asymptotically unbiased. The same pattern shows in the sample means of $\sqrt{NT}(\hat\rho - \rho_0)$ reported in Table 2.
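As a rough illustration of the DGP 4 comparison, the sketch below contrasts the level-instrument and FD-instrument versions of the simple IV on simulated panels; the design constants here (N = 100, T = 50, 200 replications, the seed scheme) are illustrative assumptions rather than the chapter's exact settings:

```python
import numpy as np

def simulate_panel(N, T, rho, seed=0):
    # y_it = alpha_i + rho*y_{i,t-1} + u_it, with a 100-period burn-in
    rng = np.random.default_rng(seed)
    alpha = rng.standard_normal(N)
    y = np.zeros((N, T + 101))
    for t in range(1, T + 101):
        y[:, t] = alpha + rho * y[:, t - 1] + rng.standard_normal(N)
    return y[:, 100:]

def iv_estimates(y):
    dy = np.diff(y, axis=1)
    # level instrument y_{i,t-2} vs differenced instrument dy_{i,t-2}
    lvl = np.sum(y[:, :-2] * dy[:, 1:]) / np.sum(y[:, :-2] * dy[:, :-1])
    fd = np.sum(dy[:, :-2] * dy[:, 2:]) / np.sum(dy[:, :-2] * dy[:, 1:-1])
    return lvl, fd

reps = 200
est = np.array([iv_estimates(simulate_panel(100, 50, 0.5, seed=r))
                for r in range(reps)])
rmse = np.sqrt(((est - 0.5) ** 2).mean(axis=0))
print(rmse)  # level column should be well below the FD column
```

The gap should track the theoretical variance ratio $(3-\rho)/(1-\rho)^2$, which equals 10 at $\rho = 0.5$.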
For the iqr (the 75th percentile minus the 25th percentile), it is obvious that the Arellano-Bond type GMM is more concentrated than the GMM based on forward demeaning and the simple IV estimator.

From Table 3, we notice that for GMM based on FOD using all lags, the results with level variables and with first-differenced variables are identical, which confirms our theoretical finding. From Table 4, when $\sigma_\alpha^2 = 1$ we have $\sigma_\alpha^2 = \sigma_u^2$, and the RMSEs of the two one-lag FOD GMM estimators (level and first-differenced instrument) are very close to each other. However, when $\sigma_\alpha^2 = 5$ we have $\sigma_\alpha^2 > \sigma_u^2$, and the RMSE of the GMM using one level lag is much larger than that of the GMM using one first-differenced lag, which confirms the theoretical finding of section 2.4.1. Finally, Table 5 shows that the simple IV using one level variable is much more efficient than the simple IV using one first-differenced lag, which confirms the theoretical finding of section 2.4.3.

2.6 Conclusion

In this chapter, we considered different GMM estimators for the dynamic panel model, based on either the forward demeaning or the first-difference transformation to eliminate the individual-specific effects, using either all lags or one lag as instruments, when $N$ and $T$ are of similar magnitude. We showed that the Arellano-Bond type GMM is asymptotically biased of order $\sqrt{c}$ using either all lags or one lag as instruments, where $c = \lim_{(N,T)\to\infty} T/N < \infty$. For GMM based on FOD, we showed that it is asymptotically biased of order $\sqrt{c}$ when using all lags, but asymptotically unbiased when using only a fixed number of lags as instruments. We also showed that, for GMM based on FOD using one lag as instrument, one FD lag is more efficient than one level lag. On the other hand, for the simple IV estimator, using one level variable is much more efficient than using one first-differenced lag. Monte Carlo simulations confirm the theoretical findings of this chapter.

Table 1: Sample mean, rmse, iqr and size of different estimators of $\rho$ for DGP 1

                        IV       FOD-GMM            AB-GMM
        T     stat            all lags  one lag  all lags  one lag
case 1  25    mean   0.5010   0.4669   0.4869   0.4681   0.4429
              rmse   0.1028   0.0452   0.0578   0.0448   0.1070
              iqr    0.1028   0.0419   0.0752   0.0430   0.1291
              size   4.9%     19%      5.3%     17%      9.6%
        50    mean   0.5026   0.4864   0.4986   0.4862   0.4758
              rmse   0.0467   0.0189   0.0241   0.0191   0.0496
              iqr    0.0636   0.0188   0.0328   0.0174   0.0561
              size   4.9%     20%      3.8%     21%      9%
        75    mean   0.5001   0.4908   0.4984   0.4909   0.4793
              rmse   0.0312   0.0124   0.0159   0.0124   0.0360
              iqr    0.0412   0.0113   0.0199   0.0114   0.0388
              size   5.3%     21%      5.2%     20%      11%
case 2  25    mean   0.5011   0.4986   0.5002   0.4986   0.4982
              rmse   0.0199   0.0066   0.0104   0.0067   0.0189
              iqr    0.0199   0.0085   0.0146   0.0087   0.0255
              size   5.3%     5.3%     4.6%     5.4%     5.2%
        50    mean   0.4998   0.4997   0.4998   0.4997   0.4993
              rmse   0.0068   0.0020   0.0035   0.0020   0.0068
              iqr    0.0090   0.0025   0.0046   0.0027   0.0086
              size   5.2%     6.2%     5.9%     6%       5.2%
        75    mean   0.5003   0.4998   0.4999   0.4999   0.4999
              rmse   0.0035   0.0010   0.0019   0.0010   0.0033
              iqr    0.0049   0.0014   0.0027   0.0014   0.0045
              size   4.5%     5.7%     4.8%     4%       4.8%

Note: 1. IV refers to simple instrumental variable estimation, FOD-GMM refers to GMM based on forward demeaning, AB-GMM refers to Arellano-Bond type GMM based on first difference; 2. "all lags" and "one lag" refer to the set of instruments used; 3. iqr refers to the interquantile range (75%-25%).
Table 2: Sample mean and iqr of $\sqrt{NT}(\hat\rho - \rho)$ for different estimators, DGP 1

                        IV       FOD-GMM            AB-GMM
        T     stat            all lags  one lag  all lags  one lag
case 1  25    mean   0.0405  -1.3132  -0.5189  -1.2676  -2.2667
              iqr    5.4301   1.6643   2.9835   1.7071   5.1224
        50    mean   0.2073  -1.0803  -0.1074  -1.9014  -1.9195
              iqr    5.0517   1.4922   2.6004   1.3837   4.4546
        75    mean   0.0131  -1.0925  -0.1945  -1.0805  -2.4553
              iqr    4.8948   1.3420   2.3615   1.3582   4.6073
case 2  25    mean   0.2082  -0.2712   0.0372  -0.2727  -0.3646
              iqr    5.2767   1.6736   2.8854   1.7143   5.0492
        50    mean  -0.1245  -0.1813  -0.1123  -0.1741  -0.3703
              iqr    5.0437   1.3878   2.5949   1.5080   4.8109
        75    mean   0.2588  -0.1792  -0.1010  -0.1491  -0.1132
              iqr    5.0255   1.4124   2.7557   1.4659   4.6043

Note: 1. IV refers to simple instrumental variable estimation, FOD-GMM refers to GMM based on forward demeaning, AB-GMM refers to Arellano-Bond type GMM based on first difference; 2. "all lags" and "one lag" refer to the set of instruments used; 3. iqr refers to the interquantile range (75%-25%).

Table 3: Sample mean, iqr and rmse of GMM based on FOD using all lags, DGP 2

              N = 300            N = 500
T     stat    level    FD       level    FD
50    mean    0.4934   0.4934   0.4961   0.4961
      rmse    0.0108   0.0109   0.0078   0.0078
      iqr     0.0112   0.0112   0.0092   0.0092
100   mean    0.4942   0.4942   0.4965   0.4965
      rmse    0.0081   0.0082   0.0057   0.0057
      iqr     0.0076   0.0077   0.0058   0.0057

Note: 1. "level" refers to using level lags as instruments, "FD" refers to using first-differenced lags as instruments; 2. iqr refers to the interquantile range (75%-25%).

Table 4: Sample mean, iqr and rmse of GMM estimators, DGP 3

                        N = 100                                N = 200
             $\sigma_\alpha^2 = 1$   $\sigma_\alpha^2 = 5$     $\sigma_\alpha^2 = 1$   $\sigma_\alpha^2 = 5$
T     stat   level    FD      level    FD        level    FD      level    FD
50    mean   0.4952   0.5000  0.4848   0.5000    0.4976   0.4991  0.4921   0.4991
      rmse   0.0300   0.0270  0.0591   0.0270    0.0200   0.0194  0.0396   0.0194
      iqr    0.0402   0.0357  0.0735   0.0357    0.0265   0.0266  0.0526   0.0266
100   mean   0.4983   0.4996  0.4941   0.4966    0.4989   0.4997  0.4964   0.4997
      rmse   0.0189   0.0186  0.0364   0.0186    0.0134   0.0130  0.0263   0.0130
      iqr    0.0253   0.0254  0.0455   0.0254    0.0180   0.0173  0.0346   0.0173

Note: 1.
The estimators are GMM based on FOD using only one lag as instrument; 2. "level" refers to using the level lag as instrument, "FD" refers to using the first-differenced lag as instrument; 3. iqr refers to the interquantile range (75%-25%).

Table 5: Sample mean, iqr and rmse of simple IV estimators, DGP 4

              N = 100            N = 200
T     stat    level    FD       level    FD
50    mean    0.5006   0.5064   0.5004   0.5018
      rmse    0.0262   0.0797   0.0183   0.0555
      iqr     0.0351   0.1075   0.0252   0.0776
100   mean    0.5007   0.5025   0.4999   0.4998
      rmse    0.0182   0.0561   0.0127   0.0390
      iqr     0.0240   0.0741   0.0162   0.0528

Note: 1. "level" refers to using the level lag as instrument, "FD" refers to using the first-differenced lag as instrument; 2. iqr refers to the interquantile range (75%-25%).

Fig. 1: Empirical density of $\sqrt{NT}(\hat\rho - \rho_0)$, GMM using all lags as instruments, case 1. [Figure: densities of GMM-AB and GMM-FOD for T = 25, 50, 75, with a vertical reference line at x = 0.]

Fig. 2: Empirical density of $\sqrt{NT}(\hat\rho - \rho_0)$, GMM and IV using one lag as instrument, case 1. [Figure: densities of GMM-AB, GMM-FOD and IV for T = 25, 50, 75, with a vertical reference line at x = 0.]

Fig. 3: Empirical density of $\sqrt{NT}(\hat\rho - \rho_0)$, GMM using all lags as instruments, case 2. [Figure: densities of GMM-AB and GMM-FOD for T = 25, 50, 75, with a vertical reference line at x = 0.]

Fig. 4: Empirical density of $\sqrt{NT}(\hat\rho - \rho_0)$, GMM and IV using one lag as instrument, case 2. [Figure: densities of GMM-AB, GMM-FOD and IV for T = 25, 50, 75, with a vertical reference line at x = 0.]

Chapter 3
Maximum likelihood estimation of dynamic panel models with serially uncorrelated errors

The unique feature of dynamic panel models is that a transformation has to be used to remove the individual effects. As discussed in the previous chapter, after transformation, GMM estimation can be applied to estimate the lag coefficient.
As shown by Alvarez and Arellano (2003), however, the GMM estimator is asymptotically biased. Since the reliability of statistical inference depends critically on whether an estimator is asymptotically unbiased, inference based on a biased estimator can be problematic. To obtain an unbiased estimator for dynamic panel models, in this chapter we propose maximum likelihood estimation (MLE) for simple dynamic panel models as well as for panel simultaneous equations models. Moreover, we propose to use the long difference, which subtracts the initial observation from every observation, to remove the individual effects. The MLE based on the long difference is shown in this chapter to be unbiased and asymptotically normally distributed. Monte Carlo simulations are conducted to compare the finite-sample properties of the MLE, IV and GMM estimators for dynamic panel models. We demonstrate that the reliability of statistical inference depends critically on whether an estimator is asymptotically unbiased.

3.1 MLE for simple dynamic panel models

In this section² we consider the MLE for simple dynamic panel models. The extension to panel dynamic simultaneous equations models is discussed in the next section. Consider the simple dynamic panel
$$y_{it} = \alpha_i + \rho y_{i,t-1} + u_{it}, \qquad (3.1)$$
under the following assumptions.

Assumption 3.1: $0 < \rho < 1$.

Assumption 3.2: $\alpha_i \sim IID(0, \sigma_\alpha^2)$ has finite fourth moment and is independently distributed of $u_{it}$.

Assumption 3.3: $u_{it} \sim IID(0, \sigma_u^2)$ has finite fourth moment.

Remark 4 The homoskedasticity assumption on $u_{it}$ is only for the purpose of deriving the exact asymptotic bias; the main result can be easily extended to the heteroskedastic case.

² This section is based on the work of Hsiao and Zhou (2015b).

For model (3.1), it is well known that the presence of the individual-specific effects $\alpha_i$ creates a challenge for the estimation of $\rho$ (see, for example, Hsiao (2003)).
Different approaches have been applied to eliminate the individual-specific effects, and several types of GMM estimators have been proposed in the literature. Alternatively, for model (3.1), we can consider quasi-maximum likelihood estimation. One candidate transformation for implementing maximum likelihood is the first difference; however, the likelihood of the first-differenced model is not well defined for the initial observation since the process generating $y_{i0}$ is unspecified. Also, as can be seen in Hsiao and Zhang (2013), the inverse of the variance-covariance matrix of the first-differenced model is quite complicated, which makes maximum likelihood based on first differences unattractive.

Another way of eliminating the individual effects is the so-called long difference (Grassetti (2011) and Hahn et al. (2007)), which subtracts the initial observation from each observation. For the initial observation, we notice that
$$y_{i0} = \alpha_i + \rho y_{i,-1} + u_{i0},$$
which is equivalent to $(1-\rho L) y_{i0} = \alpha_i + u_{i0}$, so that
$$(1-\rho) y_{i0} = \alpha_i + (1-\rho)(1-\rho L)^{-1} u_{i0}. \qquad (3.2)$$
Define $\eta_i = (1-\rho)(1-\rho L)^{-1} u_{i0}$, and note that $\eta_i$ is independent of $\alpha_i$ and of $u_{it}$ for $t > 0$ by construction. Subtracting (3.2) from (3.1), we obtain
$$y_{it} - (1-\rho) y_{i0} = \alpha_i + \rho y_{i,t-1} - \alpha_i - \eta_i + u_{it},$$
which is equivalent to
$$y_{it} - y_{i0} = \rho\,(y_{i,t-1} - y_{i0}) - \eta_i + u_{it}.$$
Letting $\tilde{y}_{it} = y_{it} - y_{i0}$, the model can be rewritten as
$$\tilde{y}_{it} = \rho \tilde{y}_{i,t-1} - \eta_i + u_{it}, \qquad (3.3)$$
with the vector form
$$\tilde{y}_i = \rho \tilde{y}_{i,-1} - \eta_i 1_T + u_i, \qquad (3.4)$$
where $1_T$ is a vector of ones of length $T$ and $\tilde{y}_i = (\tilde{y}_{i1},\ldots,\tilde{y}_{iT})'$, $\tilde{y}_{i,-1} = (\tilde{y}_{i0},\ldots,\tilde{y}_{i,T-1})'$, $u_i = (u_{i1},\ldots,u_{iT})'$.

3.1.1 MLE and its asymptotics

The long-differenced model (3.4) is a random-effects model by construction, with variance-covariance matrix
$$\Omega = Var\left(-\eta_i 1_T + u_i\right) = \sigma_\eta^2\, 1_T 1_T' + \sigma_u^2 I_T.$$
The log-likelihood function for model (3.4) is
$$\log L = -\frac{N}{2}\log|\Omega| - \frac{1}{2}\sum_{i=1}^N \left(\tilde{y}_i - \rho\tilde{y}_{i,-1}\right)'\Omega^{-1}\left(\tilde{y}_i - \rho\tilde{y}_{i,-1}\right), \qquad (3.5)$$
so we obtain the (quasi-)MLE of $\rho$ as
$$\hat{\rho}_{MLE} = \left(\sum_{i=1}^N \tilde{y}'_{i,-1}\Omega^{-1}\tilde{y}_{i,-1}\right)^{-1}\left(\sum_{i=1}^N \tilde{y}'_{i,-1}\Omega^{-1}\tilde{y}_i\right). \qquad (3.6)$$
A feasible MLE is obtained by replacing $\Omega$ with a consistent estimator $\hat\Omega$. The components $\sigma_\eta^2$ and $\sigma_u^2$ can be estimated consistently by (Hsiao (2003))
$$\hat\sigma_u^2 = \frac{1}{2N(T-1)}\sum_{i=1}^N\sum_{t=2}^T\left(\Delta\tilde{y}_{it} - \hat\rho\,\Delta\tilde{y}_{i,t-1}\right)^2, \qquad \hat\sigma_\eta^2 = \frac{1}{N}\sum_{i=1}^N\left(\bar{\tilde{y}}_i - \hat\rho\,\bar{\tilde{y}}_{i,-1}\right)^2 - \frac{1}{T}\hat\sigma_u^2,$$
$$\hat\Omega = \hat\sigma_\eta^2\, 1_T 1_T' + \hat\sigma_u^2 I_T,$$
for any given consistent estimator $\hat\rho$ (the simple IV estimator, for example)³, where $\bar{\tilde{y}}_i = \frac{1}{T}\sum_{t=1}^T \tilde{y}_{it}$.

We now discuss the asymptotics of the MLE (3.6). First note that
$$\sqrt{NT}\left(\hat\rho_{MLE} - \rho\right) = \left(\frac{1}{NT}\sum_{i=1}^N \tilde{y}'_{i,-1}\Omega^{-1}\tilde{y}_{i,-1}\right)^{-1}\left(\frac{1}{\sqrt{NT}}\sum_{i=1}^N \tilde{y}'_{i,-1}\Omega^{-1}\left(-\eta_i 1_T + u_i\right)\right). \qquad (3.7)$$
It is shown in the appendix that, as $(N,T)\to\infty$,
$$\frac{1}{NT}\sum_{i=1}^N \tilde{y}'_{i,-1}\Omega^{-1}\tilde{y}_{i,-1} \to_p \frac{1}{1-\rho^2}, \qquad \frac{1}{\sqrt{NT}}\sum_{i=1}^N \tilde{y}'_{i,-1}\Omega^{-1}\left(-\eta_i 1_T + u_i\right) \to_d N\left(0,\ \frac{1}{1-\rho^2}\right).$$
Consequently, we have

Theorem 7 For model (3.1) after the long-difference transformation, let the (quasi-)MLE be given by (3.6). Under Assumptions 3.1-3.3 and as $(N,T)\to\infty$, the MLE (3.6) is asymptotically unbiased, and
$$\sqrt{NT}\left(\hat\rho_{MLE} - \rho\right) \to_d N\left(0,\ 1-\rho^2\right).$$

Remark 5 The theorem shows that under normality the MLE attains the Cramér-Rao lower bound. If the homoskedasticity assumption does not hold, the variance takes the usual sandwich form.

Remark 6 The MLE is invariant to the transformation used: whether first differencing (Hsiao and Zhang (2013)) or the long difference is applied, the MLE is asymptotically unbiased with the same limiting distribution.

Remark 7 If model (3.1) has a unit root, then following Binder et al. (2005) the MLE still provides a consistent estimator of $\rho$.

³ The simple IV estimator for the dynamic panel is obtained from the orthogonality condition $E(y_{i,t-2}\Delta u_{it}) = 0$, giving
$$\hat\rho_{IV} = \left(\sum_{i=1}^N\sum_{t=2}^T y_{i,t-2}\,\Delta y_{i,t-1}\right)^{-1}\sum_{i=1}^N\sum_{t=2}^T y_{i,t-2}\,\Delta y_{it};$$
for more discussion see, e.g., Anderson and Hsiao (1981, 1982).

3.1.2 Monte Carlo Simulation

In this section we investigate the finite-sample properties of the estimators of $\rho$ for the dynamic panel model. We consider the following data generating process:
$$y_{it} = \alpha_i + \rho y_{i,t-1} + u_{it},$$
with $\alpha_i \sim IIDN(0,1)$ for all $i$. For the value of $\rho$, we let $\rho_0 = 0.1, 0.5, 0.9$. We let $T = 25, 50$ and $N = 300, 500$.⁴ We generate $T + 100$ observations and discard the first 100. For the error term, we assume $u_{it} \sim IIDN(0, \sigma_i^2)$ for all $i, t$, where $\sigma_i^2 \sim IID\ 0.5(1 + 0.5\chi^2(2))$ for $i = 1,\ldots,N$, with $\chi^2(2)$ denoting the chi-squared distribution with 2 degrees of freedom.

⁴ This is to ensure that $N > 2T$, which is required for implementing the GMM based on forward demeaning.

For this DGP, we consider the MLE, the simple IV, the GMM based on forward demeaning (Alvarez and Arellano (2003)), and the Arellano-Bond type GMM based on first differences (Arellano and Bond (1991)). The number of replications is set to $R = 1000$, and we report the sample mean, bias, root MSE (RMSE), iqr and size for these estimators.
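The feasible version of (3.6) is straightforward to implement: estimate $\rho$ by a preliminary IV, back out the two variance components of $\Omega$, and run GLS on the long-differenced data. The following is a minimal sketch under homoskedasticity; the generator, sample sizes and seed are illustrative assumptions, not the chapter's exact design:

```python
import numpy as np

def simulate_panel(N, T, rho, seed=0):
    # y_it = alpha_i + rho*y_{i,t-1} + u_it, with a 100-period burn-in;
    # column 0 of the returned array plays the role of y_i0
    rng = np.random.default_rng(seed)
    alpha = rng.standard_normal(N)
    y = np.zeros((N, T + 101))
    for t in range(1, T + 101):
        y[:, t] = alpha + rho * y[:, t - 1] + rng.standard_normal(N)
    return y[:, 100:]

def long_diff_mle(y):
    N, T = y.shape[0], y.shape[1] - 1
    yt = y - y[:, [0]]                      # long difference: y_it - y_i0
    Y, Ylag = yt[:, 1:], yt[:, :-1]         # (N, T) each
    # step 1: preliminary consistent estimate (Anderson-Hsiao simple IV)
    dy = np.diff(y, axis=1)
    rho0 = np.sum(y[:, :-2] * dy[:, 1:]) / np.sum(y[:, :-2] * dy[:, :-1])
    # step 2: estimate the variance components of Omega = s_eta*J + s_u*I
    e = Y - rho0 * Ylag                     # residual = -eta_i + u_it
    du = np.diff(e, axis=1)                 # differencing kills eta_i
    s_u = (du ** 2).sum() / (2 * N * (T - 1))
    s_eta = max((e.mean(axis=1) ** 2).mean() - s_u / T, 1e-8)
    Omega_inv = np.linalg.inv(s_eta * np.ones((T, T)) + s_u * np.eye(T))
    # step 3: GLS with the estimated Omega, as in (3.6)
    num = np.einsum('it,ts,is->', Ylag, Omega_inv, Y)
    den = np.einsum('it,ts,is->', Ylag, Omega_inv, Ylag)
    return num / den

print(long_diff_mle(simulate_panel(500, 50, 0.5)))   # close to 0.5
```

The factor $1/(2N(T-1))$ in step 2 reflects that $Var(\Delta u_{it}) = 2\sigma_u^2$ under serial uncorrelatedness.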
For comparison, we also plot the empirical densities for the case $\rho = 0.5$, $T = 25$ and $N = 150$.

3.1.3 Conclusion

In this section we considered the maximum likelihood estimation of dynamic panel models. We proposed the long difference to remove the individual effects. The MLE based on the long difference is shown to be unbiased and asymptotically normally distributed. Monte Carlo studies compare the performance of the QMLE, the simple IV estimator and the GMM estimators. We demonstrate that the reliability of statistical inference depends critically on whether an estimator is asymptotically unbiased.

Table 6: Sample mean, bias, RMSE, iqr and size of $\hat\rho$ when $\rho = 0.1$

                         N = 300                              N = 500
T    stat       MLE      IV       FOD      AB        MLE      IV       FOD      AB
25   estimate   0.1009   0.1268   0.0947   0.0946    0.0999   0.1118   0.0958   0.0959
     bias       0.0009   0.0268  -0.0053  -0.0054   -0.0001   0.0118  -0.0042  -0.0041
     rmse       0.0128   0.2323   0.0160   0.0161    0.0103   0.1743   0.0125   0.0123
     iqr        0.0164   0.2822   0.0205   0.0204    0.0144   0.2188   0.0173   0.0160
     size       5.6%     5.4%     6.4%     6.8%      5%       5.2%     5.5%     6.8%
50   estimate   0.1004   0.1112   0.0957   0.0957    0.0999   0.1053   0.0970   0.0966
     bias       0.0004   0.0112  -0.0043  -0.0043   -0.0001   0.0053  -0.0030  -0.0034
     rmse       0.0093   0.1484   0.0112   0.0110    0.0069   0.1148   0.0084   0.0083
     iqr        0.0129   0.2126   0.0141   0.0134    0.0086   0.1485   0.0100   0.0101
     size       5%       4.9%     6.4%     7%        6.1%     5.2%     8%       9%

Note: 1. MLE refers to maximum likelihood estimation, IV refers to simple instrumental variable estimation, FOD refers to GMM based on forward demeaning, AB refers to Arellano-Bond type GMM based on first difference; 2. iqr refers to the interquantile range (75%-25%).
Table 7: Sample mean, bias, RMSE, iqr and size of $\hat\rho$ when $\rho = 0.5$

                         N = 300                              N = 500
T    stat       MLE      IV       FOD      AB        MLE      IV       FOD      AB
25   estimate   0.5005   0.5016   0.4906   0.4906    0.4999   0.4997   0.4931   0.4931
     bias       0.0005   0.0016  -0.0094  -0.0094   -0.0001  -0.0003  -0.0069  -0.0069
     rmse       0.0106   0.0525   0.0183   0.0181    0.0086   0.0414   0.0144   0.0137
     iqr        0.0124   0.0729   0.0213   0.0209    0.0108   0.0569   0.0170   0.0161
     size       6.5%     5.4%     9.2%     8.8%      6.5%     4.2%     9.5%     11%
50   estimate   0.5004   0.5013   0.4931   0.4931    0.4997   0.4999   0.4952   0.4949
     bias       0.0004   0.0013  -0.0069  -0.0069   -0.0003  -0.0001  -0.0048  -0.0051
     rmse       0.0077   0.0345   0.0121   0.0120    0.0057   0.0268   0.0090   0.0090
     iqr        0.0098   0.0490   0.0135   0.0132    0.0072   0.0357   0.0099   0.0099
     size       6.3%     4.7%     10%      12%       6%       5.8%     9.7%     11%

Note: 1. MLE refers to maximum likelihood estimation, IV refers to simple instrumental variable estimation, FOD refers to GMM based on forward demeaning, AB refers to Arellano-Bond type GMM based on first difference; 2. iqr refers to the interquantile range (75%-25%).

Table 8: Sample mean, bias, RMSE, iqr and size of $\hat\rho$ when $\rho = 0.9$

                         N = 300                              N = 500
T    stat       MLE      IV       FOD      AB        MLE      IV       FOD      AB
25   estimate   0.9001   0.9015   0.8557   0.8558    0.8986   0.8976   0.8678   0.8682
     bias       0.0001   0.0015  -0.0443  -0.0442   -0.0014  -0.0024  -0.0322  -0.0318
     rmse       0.0053   0.0893   0.0502   0.0502    0.0053   0.0686   0.0378   0.0372
     iqr        0.0052   0.1142   0.0314   0.0309    0.0038   0.0904   0.0275   0.0263
     size       3.7%     4.3%     46%      45%       5.6%     5.7%     36%      38%
50   estimate   0.9002   0.9004   0.8789   0.8790    0.8999   0.8998   0.8857   0.8856
     bias       0.0002   0.0004  -0.0211  -0.0210   -0.0001  -0.0002  -0.0143  -0.0144
     rmse       0.0029   0.0450   0.0232   0.0230    0.0020   0.0335   0.0162   0.0162
     iqr        0.0036   0.0588   0.0132   0.0127    0.0030   0.0465   0.0102   0.0099
     size       4.3%     4.6%     58%      59%       5.1%     4.7%     47%      48%

Note: 1. MLE refers to maximum likelihood estimation, IV refers to simple instrumental variable estimation, FOD refers to GMM based on forward demeaning, AB refers to Arellano-Bond type GMM based on first difference; 2.
iqr refers to the interquantile range (75%-25%).

Fig. 5: Empirical densities of $\hat\rho$ when $\rho = 0.5$ for $T = 25$ and $N = 150$. [Figure: densities of MLE, IV, GMM-AB and GMM-FOD. "GMM-AB" denotes Arellano-Bond type GMM using all lags; "GMM-FOD" denotes GMM based on forward demeaning.]

3.2 Extension to panel dynamic simultaneous equation models

3.2.1 MLE for panel dynamic simultaneous equation models

The MLE discussed in the previous section extends readily to panel dynamic simultaneous equation models⁵. Consider the following dynamic simultaneous equations model with two variables:
$$y_{1,it} = \gamma y_{2,it} + \beta_{11} y_{1i,t-1} + \alpha_{1i} + u_{1,it}, \qquad (3.8)$$
$$y_{2,it} = \beta_{21} y_{1i,t-1} + \beta_{22} y_{2i,t-1} + \alpha_{2i} + u_{2,it},$$
which can be rewritten in reduced form as
$$\begin{pmatrix} y_{1,it} \\ y_{2,it} \end{pmatrix} = \Pi \begin{pmatrix} y_{1i,t-1} \\ y_{2i,t-1} \end{pmatrix} + \begin{pmatrix} \eta_{1i} \\ \eta_{2i} \end{pmatrix} + \begin{pmatrix} v_{1,it} \\ v_{2,it} \end{pmatrix},$$
with
$$\Pi = \begin{pmatrix} \pi_{11} & \pi_{12} \\ \pi_{21} & \pi_{22} \end{pmatrix} = \begin{pmatrix} \beta_{11} + \gamma\beta_{21} & \gamma\beta_{22} \\ \beta_{21} & \beta_{22} \end{pmatrix} = B^{-1}\begin{pmatrix} \beta_{11} & 0 \\ \beta_{21} & \beta_{22} \end{pmatrix}, \qquad B = \begin{pmatrix} 1 & -\gamma \\ 0 & 1 \end{pmatrix},$$
and
$$\eta_i = \begin{pmatrix} \eta_{1i} \\ \eta_{2i} \end{pmatrix} = B^{-1}\begin{pmatrix} \alpha_{1i} \\ \alpha_{2i} \end{pmatrix}, \qquad v_{it} = \begin{pmatrix} v_{1,it} \\ v_{2,it} \end{pmatrix} = B^{-1}\begin{pmatrix} u_{1,it} \\ u_{2,it} \end{pmatrix}.$$
Defining $y_{it} = (y_{1,it}, y_{2,it})'$, the system can be rewritten as
$$y_{it} = \Pi y_{i,t-1} + \eta_i + v_{it}. \qquad (3.9)$$
In order to obtain asymptotic results for model (3.9), we assume the process is stationary, i.e., all the roots of the characteristic equation $|I_2 - \Pi L| = 0$ lie outside the unit circle.

⁵ This section is based on the work of Hsiao and Zhou (2015a).
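Given the triangular structure of $B$ and the lag-coefficient matrix in (3.8), the stationarity condition is easy to verify numerically from the reduced-form matrix $\Pi = B^{-1}\Gamma$. The sketch below uses the parameter values of the Monte Carlo design in section 3.2.2 ($\gamma = 0.5$, $\beta_{11} = 0.5$, $\beta_{21} = 0$, $\beta_{22} = 0.3$); the variable names are illustrative:

```python
import numpy as np

gamma, b11, b21, b22 = 0.5, 0.5, 0.0, 0.3
B = np.array([[1.0, -gamma],
              [0.0,  1.0]])
Gamma = np.array([[b11, 0.0],
                  [b21, b22]])
Pi = np.linalg.solve(B, Gamma)     # reduced-form coefficient matrix B^{-1} Gamma

# Stationarity: all eigenvalues of Pi inside the unit circle
# (equivalently, all roots of |I - Pi*L| = 0 outside it).
eigs = np.sort(np.abs(np.linalg.eigvals(Pi)))
print(eigs)   # both moduli lie inside the unit circle, so the process is stationary
```

With $\beta_{21} = 0$ the matrix $\Pi$ is upper triangular, so its eigenvalues are just $\beta_{11} + \gamma\beta_{21} = 0.5$ and $\beta_{22} = 0.3$.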
Let $\tilde{y}_{it} = y_{it} - y_{i0}$. Following the discussion in the previous section, we have
$$\tilde{Y}_i = \left(I_2 \otimes \left[\tilde{Y}_{1i,-1},\ \tilde{Y}_{2i,-1}\right]\right)\mathrm{vec}(\Pi') + \tilde{V}_i, \qquad i = 1,2,\ldots,N, \qquad (3.10)$$
where $\tilde{Y}_i = (\tilde{Y}'_{1i}, \tilde{Y}'_{2i})'$ with $\tilde{Y}_{1i} = (\tilde{y}_{1,i1},\ldots,\tilde{y}_{1,iT})'$ and $\tilde{Y}_{1i,-1} = (\tilde{y}_{1,i0},\ldots,\tilde{y}_{1,i,T-1})'$ (and analogously for the second variable), and $\tilde{V}_i = V_i + \varepsilon_i \otimes 1_T$ collects the reduced-form errors and the long-difference initial-condition terms. Its covariance matrix is
$$\Omega_{\tilde{V}} = E\left(\tilde{V}_i\tilde{V}'_i\right) = E(V_iV'_i) + E\left(\varepsilon_i\varepsilon'_i\right)\otimes 1_T 1_T' = \Omega_v \otimes I_T + \Omega_\varepsilon \otimes 1_T 1_T' = \Omega_v \otimes Q + \bar{\Omega} \otimes \bar{J},$$
where $\bar{J} = \frac{1}{T} 1_T 1_T'$, $Q = I_T - \bar{J}$, $\bar{\Omega} = \Omega_v + T\Omega_\varepsilon$, and
$$\Omega_v = E(v_{it}v'_{it}) = \begin{pmatrix} \omega_{v,11} & \omega_{v,12} \\ \omega_{v,21} & \omega_{v,22} \end{pmatrix}, \qquad \Omega_\varepsilon = E(\varepsilon_i\varepsilon'_i) = \begin{pmatrix} \omega_{\varepsilon,11} & \omega_{\varepsilon,12} \\ \omega_{\varepsilon,21} & \omega_{\varepsilon,22} \end{pmatrix}.$$
Then, conditional on $\Omega_v$ and $\Omega_\varepsilon$, the MLE of $\pi = \mathrm{vec}(\Pi') = (\pi_{11}, \pi_{12}, \pi_{21}, \pi_{22})'$ is
$$\hat{\pi} = \left\{\left(I_2\otimes\left[\tilde{Y}_{1,-1},\tilde{Y}_{2,-1}\right]\right)'\Omega^{-1}_{\tilde{V}}\left(I_2\otimes\left[\tilde{Y}_{1,-1},\tilde{Y}_{2,-1}\right]\right)\right\}^{-1}\left\{\left(I_2\otimes\left[\tilde{Y}_{1,-1},\tilde{Y}_{2,-1}\right]\right)'\Omega^{-1}_{\tilde{V}}\tilde{Y}\right\},$$
where $\tilde{Y}$, $\tilde{Y}_{1,-1}$ and $\tilde{Y}_{2,-1}$ stack the corresponding individual vectors over $i = 1,\ldots,N$, and
$$\Omega^{-1}_{\tilde{V}} = \Omega_v^{-1}\otimes Q + \bar{\Omega}^{-1}\otimes\bar{J}.$$
Thus, following Magnus and Neudecker (2007, Ch. 16), we can establish that
$$\sqrt{NT}\left(\hat{\pi} - \pi\right) \to_d N(0, \Sigma_\pi), \qquad \Sigma_\pi = \left[-E\left(\frac{1}{NT}\frac{\partial^2 \log L}{\partial\pi\partial\pi'}\right)\right]^{-1},$$
where
$$\log L = -\frac{N}{2}\log\left|\Omega_{\tilde{V}}\right| - \frac{1}{2}\sum_{i=1}^N\left[\tilde{Y}_i - \left(I_2\otimes\left[\tilde{Y}_{1i,-1},\tilde{Y}_{2i,-1}\right]\right)\pi\right]'\Omega^{-1}_{\tilde{V}}\left[\tilde{Y}_i - \left(I_2\otimes\left[\tilde{Y}_{1i,-1},\tilde{Y}_{2i,-1}\right]\right)\pi\right].$$
A detailed proof of the asymptotic unbiasedness of the MLE is given in the appendix.

The structural parameter is recovered from the relation $\gamma = \pi_{12}/\pi_{22}$, so the MLE of $\gamma$ is simply $\hat{\gamma} = \hat{\pi}_{12}/\hat{\pi}_{22}$. From
$$\hat{\gamma} - \gamma = \frac{\hat{\pi}_{12}}{\hat{\pi}_{22}} - \frac{\pi_{12}}{\pi_{22}} = \frac{\pi_{22}\left(\hat{\pi}_{12} - \pi_{12}\right) - \pi_{12}\left(\hat{\pi}_{22} - \pi_{22}\right)}{\hat{\pi}_{22}\,\pi_{22}}, \qquad (3.11)$$
the delta method yields
$$\sqrt{NT}\left(\hat{\gamma} - \gamma\right) = \frac{\pi_{22}\sqrt{NT}\left(\hat{\pi}_{12} - \pi_{12}\right) - \pi_{12}\sqrt{NT}\left(\hat{\pi}_{22} - \pi_{22}\right)}{\pi_{22}^2} + o_p(1). \qquad (3.12)$$
Thus $\sqrt{NT}(\hat{\gamma} - \gamma)$ is asymptotically normally distributed with mean 0.

3.2.2 Simulation

We conduct small-scale Monte Carlo simulations to examine the finite-sample properties of various estimators in this section.
Following Akashi and Kunitomo (2012), we consider a dynamic simultaneous equations model of the form
$$y_{1,it} = \gamma y_{2,it} + \beta_{11} y_{1,it-1} + \alpha_{1i} + u_{1,it},$$
$$y_{2,it} = \beta_{21} y_{1,it-1} + \beta_{22} y_{2,it-1} + \alpha_{2i} + u_{2,it},$$
with $\gamma = 0.5$, $\beta_{11} = 0.5$, $\beta_{21} = 0$, $\beta_{22} = 0.3$. In data generating process 1 (DGP 1), we assume that
$$\begin{pmatrix} \alpha_{1i} \\ \alpha_{2i} \end{pmatrix} \sim iid\ N\left(0, \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\right), \qquad \begin{pmatrix} u_{1,it} \\ u_{2,it} \end{pmatrix} \sim iid\ N\left(0, \begin{pmatrix} \sigma^2_{u_1} & 0.2\,\sigma_{u_1}\sigma_{u_2} \\ 0.2\,\sigma_{u_1}\sigma_{u_2} & \sigma^2_{u_2} \end{pmatrix}\right),$$
where $\sigma^2_{u_1,i}$ and $\sigma^2_{u_2,i}$ are independent random draws from $0.5(1 + 0.5\chi^2(2))$ for $i = 1,\ldots,N$, and $(\alpha_{1i}, \alpha_{2i})'$ and $(u_{1,it}, u_{2,it})'$ are independent over $i$ and $t$. In DGP 2, the $\alpha$'s are distributed as in DGP 1, while $u_{1,it} \sim iid\ \chi^2(1) - 1$ and $u_{2,it} \sim iid\ \chi^2(1) - 1$ with $Cov(u_{1,it}, u_{2,it}) = 0.2$; as before, $(\alpha_{1i}, \alpha_{2i})'$ and $(u_{1,it}, u_{2,it})'$ are independent over $i$ and $t$.

We generate $100 + T$ observations of $y_{it}$, starting from zero, and take $y_{i0}$ to be the 100th observation. We report the bias, root mean square error (RMSE), iqr (75%-25% interquantile range) and size for the MLE, IV and GMM estimators of $\gamma$ and $\beta_{11}$ when $N = 100, 200$ and $T = 25, 50$: DGP 1 in Tables 9-10 and DGP 2 in Tables 11-12. The number of replications is set at 2000. For illustration, we also plot the empirical densities of the estimators of $\gamma$ and $\beta_{11}$ for DGP 1 when $(N,T) = (200, 25)$.

The results show clearly that the actual sizes of the IV and MLE are close to the nominal size. However, the size distortion of the Arellano-Bond type GMM is significant: for a 5% significance-level test, its actual size for the coefficient of the jointly dependent variable, $\gamma$, can be near 100%, and for the lagged dependent variable near 75%. The empirical distribution of the Arellano-Bond type GMM estimator of $\gamma$ is more concentrated than those of the other estimators, but it is not centered at the true value. Overall, our findings suggest that the MLE proposed in this section is preferred for the estimation of, and inference on, panel dynamic simultaneous equations models, in terms of bias, RMSE, and actual test size.

3.2.3 Conclusion

We considered the estimation of panel dynamic simultaneous equations models in this section. The presence of time-invariant individual-specific effects affects the asymptotic properties of the estimators when the cross-sectional dimension $N$ and the time-series dimension $T$ are of the same magnitude. We considered both the likelihood and the method-of-moments approaches to inference. We showed that the treatment of initial values plays a pivotal role in the likelihood approach: the asymptotic distribution of the quasi-maximum likelihood estimator (QMLE) is centered at the true value independent of the way $N$ or $T$ or both go to infinity if the distribution of initial values is properly formulated. For the method-of-moments estimators, the treatment of initial values plays no role; however, the asymptotic distribution depends critically on how the population moments are approximated by sample moments. The suggested panel instrumental variable estimator (IV) is consistent and asymptotically unbiased independent
35 Table 9: Simulation results of for DGP1 N T MLE IV GMM 25 estimate 0.4979 0.4973 0.6903 bias -0.0021 -0.0027 0.1903 rmse 0.0855 0.1396 0.1947 iqr 0.1176 0.1738 0.0494 100 size 5.1% 5.2% 99% 50 estimate 0.5003 0.4983 0.6854 bias 0.0003 -0.0017 0.1854 rmse 0.0590 0.0904 0.1864 iqr 0.0799 0.1168 0.0249 size 5.1% 5.2% 100% 25 estimate 0.5005 0.4991 0.6893 bias 0.0005 -0.0009 0.1893 rmse 0.0608 0.0951 0.1936 iqr 0.0807 0.1268 0.0470 200 size 4.9% 4.95% 99% 50 estimate 0.5005 0.4987 0.6868 bias 0.0005 -0.0013 0.1868 rmse 0.0398 0.0643 0.1879 iqr 0.0503 0.0861 0.0240 size 4.9% 5.4% 100% Note: 1. The true value of in this case is = 0:5; 2. For estimators, MLE refers to MLE, IV refers to IV estimation, GMM refers to Arellano-Bond type GMM estimator; 3. iqr is the 75th-25th interquartile range; 4. The number of replication is set atR = 2000; and the 95% confidence interval for size 5% is [4%, 6%]; 36 Table 10: Simulation results of 11 for DGP1 N T MLE 11 IV 11 GMM 11 25 estimate 0.5000 0.5004 0.4322 bias 0.0000 0.0004 -0.0678 rmse 0.0214 0.0418 0.0735 iqr 0.0284 0.0536 0.0261 100 size 4.7% 5.95% 66% 50 estimate 0.4998 0.5002 0.4502 bias -0.0002 0.0002 -0.0498 rmse 0.0143 0.0282 0.0519 iqr 0.0196 0.0381 0.0165 size 5.05% 4.55% 93% 25 estimate 0.4998 0.5008 0.4478 bias -0.0002 0.0008 -0.0522 rmse 0.0152 0.0294 0.0570 iqr 0.0205 0.0392 0.0202 200 size 5.05% 5.35% 63% 50 estimate 0.5002 0.5008 0.4592 bias 0.0002 0.0008 -0.0408 rmse 0.0096 0.0195 0.0426 iqr 0.0131 0.0261 0.0128 size 4.75% 4.75% 91% Note: 1. The true value of 11 in this case is 11 = 0:5; 2. For estimators, MLE refers to MLE, IV refers to IV estimation, GMM refers to Arellano-Bond type GMM estimator; 3. iqr is the 75th-25th interquartile range; 4. 
Table 11: Simulation results of γ for DGP2

  N    T    statistic     MLE        IV        GMM
 100   25   estimate    0.4978    0.4946    0.7003
            bias       -0.0022   -0.0054    0.2003
            rmse        0.0817    0.1341    0.2054
            iqr         0.1073    0.1756    0.0543
            size        5.35%     5.35%     99%
 100   50   estimate    0.4975    0.4928    0.6949
            bias       -0.0025   -0.0074    0.1949
            rmse        0.0571    0.0933    0.1965
            iqr         0.0767    0.1221    0.0310
            size        4.7%      4.6%      100%
 200   25   estimate    0.4980    0.4965    0.6949
            bias       -0.0020   -0.0035    0.1949
            rmse        0.0584    0.0926    0.1999
            iqr         0.0772    0.1224    0.0472
            size        5.3%      5.15%     99%
 200   50   estimate    0.4977    0.4959    0.6930
            bias       -0.0023   -0.0041    0.1930
            rmse        0.0408    0.0640    0.1944
            iqr         0.0552    0.0836    0.0270
            size        5.3%      5.3%      100%

Note: 1. The true value of γ in this case is γ = 0.5; 2. MLE refers to the maximum likelihood estimator, IV to the instrumental variable estimator, and GMM to the Arellano-Bond type GMM estimator; 3. iqr is the 75th-25th interquartile range; 4. The number of replications is set at R = 2000, and the 95% confidence interval for a size of 5% is [4%, 6%].

Table 12: Simulation results of β₁₁ for DGP2

  N    T    statistic     MLE        IV        GMM
 100   25   estimate    0.4995    0.5007    0.4333
            bias       -0.0005    0.0007   -0.0667
            rmse        0.0191    0.0368    0.0717
            iqr         0.0256    0.0494    0.0234
            size        5.35%     5.25%     72%
 100   50   estimate    0.4998    0.5010    0.4489
            bias       -0.0002    0.0010   -0.0511
            rmse        0.0128    0.0253    0.0529
            iqr         0.0165    0.0326    0.0150
            size        5.6%      5%        96%
 200   25   estimate    0.5003    0.5009    0.4500
            bias        0.0003    0.0009   -0.0500
            rmse        0.0136    0.0258    0.0543
            iqr         0.0188    0.0353    0.0182
            size        5.1%      4.45%     66%
 200   50   estimate    0.5001    0.5005    0.4584
            bias        0.0001    0.0005   -0.0416
            rmse        0.0089    0.0178    0.0431
            iqr         0.0119    0.0236    0.0112
            size        5.35%     5.9%      96%

Note: 1. The true value of β₁₁ in this case is β₁₁ = 0.5; 2. MLE refers to the maximum likelihood estimator, IV to the instrumental variable estimator, and GMM to the Arellano-Bond type GMM estimator; 3. iqr is the 75th-25th interquartile range; 4. The number of replications is set at R = 2000, and the 95% confidence interval for a size of 5% is [4%, 6%].
Figure 6: Empirical densities of the MLE, IV, and GMM estimators of γ for DGP1 when (N, T) = (200, 25). [Density plot omitted in transcript.] These empirical densities are drawn based on 2000 replications of DGP1; the true value of γ is 0.5.

Figure 7: Empirical densities of the MLE, IV, and GMM estimators of β₁₁ for DGP1 when (N, T) = (200, 25). [Density plot omitted in transcript.] These empirical densities are drawn based on 2000 replications of DGP1; the true value of β₁₁ is 0.5.

Chapter 4
Jackknife Instrumental Variables Estimation (JIVE) for Dynamic Panel Models with Serially Correlated Errors

In this chapter, we consider the estimation of dynamic panel models with serially correlated errors. As seen above, when the errors of a dynamic panel model are serially correlated, the MLE can still be used to estimate the lag coefficient. However, as shown by Hsiao and Zhang (2013), the inverse of the variance-covariance matrix of serially correlated errors is quite complicated, which in turn complicates the computation of the MLE. To overcome this difficulty, we propose a bias reduction for the GMM estimation of dynamic panel models with serially correlated errors, namely the jackknife instrumental variables estimation (JIVE) of Phillips and Hale (1977) and Angrist, Imbens, and Krueger (1999). As an illustration of dynamic panel models with serially correlated errors, we consider a linear measurement error model in which the observed variable is a linear function of the latent variable of interest. In the presence of (linear) measurement error, the linear dynamic panel regression faces two kinds of endogeneity of the lagged dependent regressor. One is the correlation due to individual-specific effects, which causes the incidental parameter problem (e.g., Nickell (1981)).
The other is the correlation due to measurement error. The estimator of interest in this chapter is the 2SLS estimator (2SLSE) based on IVs of further lagged dependent variables, whose number is proportional to an order of T². (This chapter is based on the work of Lee, Moon, and Zhou (2014).) We consider the 2SLSE mainly because it is the most frequently used IV estimator and is simple to compute, although the 2SLSE is not an efficient estimator in dynamic panel regression with measurement error. Our first finding is that when measurement error is present in the observed dynamic panel and T is large, the 2SLSEs based on the two transformations suffer from a bias due to many IVs. We show that under the alternative asymptotics where N, T → ∞ with T³/N → κ, the JIVE is asymptotically normal without an asymptotic bias due to many IVs.

4.1 Models for Dynamic Panel Regression with Serially Correlated Errors

4.1.1 Models and Assumptions

We start with a simple dynamic panel regression in which the only regressor is the lagged dependent variable, for expositional purposes. A generalization will be discussed later in Section 4.2. As an illustration, we consider a dynamic model with measurement errors. Suppose that the panel of the (latent) variable of interest y*_it is generated by

    y*_it = ρ y*_{i,t-1} + α_i + v_it,   (4.1)

where v_it ~ iid with zero mean across i and t, and α_i is an unobserved individual effect. Our particular interest is in the case where y*_it is not directly observed and only a proxy variable y_it is observed. Suppose that the relationship between the latent variable y*_it and the proxy variable y_it is linear:

    y_it = λ_{i0} + λ₁ y*_it + ε_it.   (4.2)

We assume that ε_it ~ iid with zero mean across i and t and independent of v_js for all j, s. The linear measurement error model in (4.2) was introduced by Bollinger and Chandra (2005) and Kim and Solon (2005) and has been used in many empirical applications; see, for example, Kim and Solon (2005) and Lee (2009) for the study of income dynamics.
In many survey datasets, researchers often observe λ₁ < 1, which is why the model is sometimes called a mean-reverting measurement error model. A widely accepted behavioral explanation is that survey respondents tend to respond with average values: those below the average tend to over-report, while those above the average tend to under-report (e.g., see Bound, Brown, and Mathiowetz, 2001). However, in this chapter we do not impose the restriction that λ₁ < 1. Notice that the classical measurement error is a special case of the measurement error model (4.2), because when λ_{i0} = 0 and λ₁ = 1 we have y_it = y*_it + ε_it. In this case, the measurement error ε_it = y_it − y*_it has zero mean and is uncorrelated with the latent variable y*_it. However, when λ_{i0} ≠ 0 or λ₁ ≠ 1, model (4.2) allows for non-classical measurement error: the measurement error y_it − y*_it = λ_{i0} + (λ₁ − 1) y*_it + ε_it is then correlated with the latent variable y*_it, and its mean is not necessarily zero. From (4.1) and (4.2), we can write the observed panel y_it as

    y_it = ρ y_{i,t-1} + α̃_i + u_it,   (4.3)

where

    α̃_i = λ₁ α_i + (1 − ρ) λ_{i0},   u_it = λ₁ v_it + ε_it − ρ ε_{i,t-1}.   (4.4)

For the error term of the reduced form (4.3), notice that

    E(u_it u_{i,t-1}) = E[(λ₁ v_it + ε_it − ρ ε_{i,t-1})(λ₁ v_{i,t-1} + ε_{i,t-1} − ρ ε_{i,t-2})] = −ρ E(ε²_{i,t-1}) ≠ 0.

Thus, a noticeable feature of the panel regression error u_it of (4.3) is that it is a composite of the latent panel regression error v_it and the measurement errors ε_it and ε_{i,t-1}, and that it is serially correlated. Hence, for dynamic panel models with measurement errors, the associated reduced form is indeed a dynamic panel model with serially correlated errors. In this chapter, we mainly focus on model (4.3). The goal is to estimate the dynamic parameter ρ from the observed panel y_it when both N and T are large. Before we introduce the estimators and derive their asymptotic properties, we introduce the assumptions that will be maintained until we generalize the simple model later.
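The nonzero lag-one autocovariance of the composite error u_it = λ₁v_it + ε_it − ρε_{i,t-1} can be confirmed in a quick simulation. The following sketch is our illustration (not part of the original derivation); the parameter values ρ = 0.5, λ₁ = 1 and unit variances are assumed for concreteness:

```python
import numpy as np

rng = np.random.default_rng(0)
rho, lam1 = 0.5, 1.0            # assumed illustrative values of rho and lambda_1
sigma_v, sigma_eps = 1.0, 1.0   # standard deviations of v_it and eps_it
n = 1_000_000                   # long simulated horizon for one cross-sectional unit

v = rng.normal(0.0, sigma_v, n)
eps = rng.normal(0.0, sigma_eps, n + 1)
# composite reduced-form error: u_t = lambda_1 * v_t + eps_t - rho * eps_{t-1}
u = lam1 * v + eps[1:] - rho * eps[:-1]

gamma1 = np.mean(u[1:] * u[:-1])        # sample lag-one autocovariance
print(gamma1, -rho * sigma_eps**2)      # the two numbers should nearly agree
```

This serial correlation is precisely what rules out the one-period-lagged instruments that would be valid in the standard dynamic panel model without measurement error.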
Assumption 4.1. Conditional on α_i and {y*_{i,t−s} : s ≥ 1}, v_it ~ iid(0, σ_v²) and ε_it ~ iid(0, σ_ε²) with finite eighth moments. We also assume that v_it and ε_js are independent for all i, j, s, t.

Assumption 4.2. (i) α_i is independent of {u_it} and has finite fourth moment; (ii) λ_{i0} has finite fourth moment and is independent of ε_it and v_it for all i and t.

Assumption 4.3. (i) The parameter set of ρ is |ρ| ≤ 1 − δ for some small δ > 0 (but δ < 1/2). (ii) The coefficient satisfies |λ₁| > δ > 0.

Assumption 4.4. Assume that N, T → ∞ with T³/N → κ, where 0 < κ < ∞.

Assumptions 4.1 and 4.2 are quite standard. Assumption 4.3 is made for identification of ρ: under this assumption, the IVs that will be introduced in the following section have significant correlation with the endogenous regressor. The alternative asymptotics used in this chapter is stated in Assumption 4.4; it differs from the conventional alternative asymptotics of the existing panel literature, where N, T → ∞ with T/N → c. For model (4.3), it is well known that the presence of the individual-specific effects α̃_i causes the so-called incidental parameter problem when T is small (e.g., Nickell (1981)). In the conventional dynamic panel regression model, where the error term has no serial correlation, this problem disappears when T is large (e.g., see Hahn and Kuersteiner (2002)). However, when the composite error term u_it is serially correlated due to the measurement error, the lagged dependent regressor y_{i,t-1} is correlated with the error term u_it, so the within estimator of ρ remains biased even when T tends to infinity. In this chapter, we take the approach of transforming the panel data model to eliminate α̃_i and then, to resolve the endogeneity, using properly lagged transformed regressors as IVs. A common practice to overcome the incidental parameter problem caused by α̃_i is to transform the panel data to eliminate the individual-specific effects.
We consider the two transformations most frequently used in applications: (i) the forward orthogonal demeaning of Arellano and Bover (1995) and Alvarez and Arellano (2003), and (ii) the first difference of Anderson and Hsiao (1981, 1982) and Arellano and Bond (1991), for example.

Forward orthogonal demeaning (FOD): The FOD transformation subtracts the average of the future observations. For t = 1, ..., T − 1, define

    y^f_it = c_t [y_it − (1/(T−t)) Σ_{s=t+1}^T y_is],
    x^f_{i,t-1} = c_t [y_{i,t-1} − (1/(T−t)) Σ_{s=t+1}^T y_{i,s-1}],
    u^f_it = c_t [u_it − (1/(T−t)) Σ_{s=t+1}^T u_is],

where c_t = √((T−t)/(T−t+1)). Then we have

    y^f_it = ρ x^f_{i,t-1} + u^f_it,   i = 1, ..., N, t = 2, ..., T−1,   (4.5)

where

    u^f_it = λ₁ v^f_it + ε^f_it − ρ ε^f_{i,t-1},   (4.6)

and v^f_it, ε^f_it, ε^f_{i,t-1} are the forward orthogonally demeaned variables of v_it, ε_it, and ε_{i,t-1}, respectively. Notice that, due to the transformation and the serial correlation in u_it, the transformed regressor x^f_{i,t-1} and the transformed error term u^f_it are correlated. However, under Assumption 4.1, we have, for 0 ≤ s ≤ t − 2,

    E(y_is u^f_it) = E[y_is (λ₁ v^f_it + ε^f_it − ρ ε^f_{i,t-1})] = 0.

This implies that

    z_{i,t-2} = (y_{i,t-2}, y_{i,t-3}, ..., y_{i0})′   (4.7)

satisfies the orthogonality condition with respect to the transformed error u^f_it in (4.6). Also, under Assumption 4.3, z_{i,t-2} has significant correlation with the transformed regressor x^f_{i,t-1}. In this chapter, we use z_{i,t-2} in (4.7) as IVs when the FOD transformation is used.

First difference (FD): An alternative transformation widely used in applications is the first time difference. Let Δ denote the first difference of a time series, so that Δy_it = y_it − y_{i,t-1}, for example. Then

    Δy_it = ρ Δy_{i,t-1} + Δu_it,   i = 1, ..., N, t = 3, ..., T.   (4.8)

Under Assumption 4.1, we have E(y_is Δu_it) = 0 for 0 ≤ s ≤ t − 3. From this, we choose

    z_{i,t-3} = (y_{i,t-3}, y_{i,t-4}, ..., y_{i0})′   (4.9)

as IVs. The difference between the IVs in (4.7) and (4.9) is that the IVs for the FOD include y_{i,t-2} and further lagged variables, while the IVs for the FD include y_{i,t-3} and further lagged variables.
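For concreteness, the two transformations can be written as short functions operating on an N × T data matrix. This is a sketch of ours (the function names are not from the original text); a useful property of the FOD weights c_t is that the implied T × (T−1) transformation matrix F is orthonormal, F′F = I_{T−1}:

```python
import numpy as np

def forward_demean(y):
    """Forward orthogonal demeaning of an N x T panel.

    Column t of the output is c_t * (y_t - mean of y_{t+1}, ..., y_T),
    with c_t = sqrt((T - t) / (T - t + 1)); the result is N x (T - 1).
    """
    N, T = y.shape
    out = np.empty((N, T - 1))
    for t in range(T - 1):                         # 0-based t here is 1-based t - 1
        remaining = T - t - 1                      # number of future periods averaged
        c_t = np.sqrt(remaining / (remaining + 1.0))
        out[:, t] = c_t * (y[:, t] - y[:, t + 1:].mean(axis=1))
    return out

def first_difference(y):
    """First differences of an N x T panel; the result is N x (T - 1)."""
    return y[:, 1:] - y[:, :-1]
```

Both maps send any time-invariant effect α̃_i to zero, which is how the incidental parameters are removed; applying `forward_demean` to the identity matrix recovers the FOD transformation matrix itself.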
In what follows, we sometimes use the following general notation for simplicity of presentation. We use Y_it as a general notation for y^f_it and Δy_it; X_it for x^f_{i,t-1} and Δy_{i,t-1}; U_it for u^f_it and Δu_it; and Z_it for z_{i,t-2} and z_{i,t-3}, depending on whether the FOD or the FD transformation is used. We use bold characters to denote the vector or matrix that stacks the cross sections of the panel. For example, define Y_t = (Y_1t, Y_2t, ..., Y_Nt)′, X_t = (X_1t, X_2t, ..., X_Nt)′, U_t = (U_1t, U_2t, ..., U_Nt)′. We also denote z_t = (z_1t, z_2t, ..., z_Nt)′ and p_t = z_t (z_t′ z_t)⁻¹ z_t′.

4.1.2 2SLS and its Asymptotics

In this section, we investigate the 2SLS estimator of ρ. Although ρ̂_2SLS is not an efficient estimator when the transformed error term U_it is not homoskedastic, we are still interested in it because it is the most widely used estimator in IV applications, it is simple to compute, and its closed form is available. The 2SLS estimator is

    ρ̂_2SLS = [Σ_t Σ_{i,j=1}^N X_it Z_it′ (Σ_{i=1}^N Z_it Z_it′)⁻¹ Z_jt X_jt]⁻¹ [Σ_t Σ_{i,j=1}^N X_it Z_it′ (Σ_{i=1}^N Z_it Z_it′)⁻¹ Z_jt Y_jt]
           = ρ + [Σ_t Σ_{i,j=1}^N X_it Z_it′ (Σ_{i=1}^N Z_it Z_it′)⁻¹ Z_jt X_jt]⁻¹ [Σ_t Σ_{i,j=1}^N X_it Z_it′ (Σ_{i=1}^N Z_it Z_it′)⁻¹ Z_jt U_jt],   (4.10)

or, equivalently,

    ρ̂_2SLS = (Σ_t X_t′ P_t X_t)⁻¹ (Σ_t X_t′ P_t Y_t) = ρ + (Σ_t X_t′ P_t X_t)⁻¹ (Σ_t X_t′ P_t U_t).   (4.11)

Here t runs from 2 to T−1 in the FOD case and from 3 to T in the FD case, with P_t = p_{t-2} or p_{t-3} accordingly. We write ρ̂^f_2SLS and ρ̂^Δ_2SLS depending on whether the FOD or the FD transformation is used.

We derive the asymptotics of these two 2SLS estimators separately. First, consider the asymptotics of ρ̂^f_2SLS. By definition, we have

    ρ̂^f_2SLS = (Σ_{t=2}^{T-1} x^f′_{t-1} p_{t-2} x^f_{t-1})⁻¹ (Σ_{t=2}^{T-1} x^f′_{t-1} p_{t-2} y^f_t) = ρ + (Σ_{t=2}^{T-1} x^f′_{t-1} p_{t-2} x^f_{t-1})⁻¹ (Σ_{t=2}^{T-1} x^f′_{t-1} p_{t-2} u^f_t).
Under strict stationarity of y_it, we have

    y_it = α̃_i/(1−ρ) + λ₁ ξ_it + ε_it,   where ξ_it = ρ ξ_{i,t-1} + v_it = Σ_{s=0}^∞ ρ^s v_{i,t−s}.

When the FOD transformation is used, we can express

    X_it = x^f_{i,t-1} = c_t W_{i,t-1} − c_t W̄_{i,tT},   (4.12)

where

    W_it = y_it − α̃_i/(1−ρ) = λ₁ ξ_it + ε_it,   W̄_{i,tT} = (1/(T−t)) Σ_{s=t+1}^T W_{i,s-1}.

This decomposition can also be found in Alvarez and Arellano (2003) and can be rewritten in vector form as

    x^f_{t-1} = c_t W_{t-1} − c_t W̄_{tT},   (4.13)

where W_{t-1} = (W_{1,t-1}, W_{2,t-1}, ..., W_{N,t-1})′ and W̄_{tT} = (W̄_{1,tT}, W̄_{2,tT}, ..., W̄_{N,tT})′. By the definitions in (4.4) and (4.6), we have

    u_t = λ₁ v_t + ε_t − ρ ε_{t-1},   U_t = u^f_t = λ₁ v^f_t + ε^f_t − ρ ε^f_{t-1},

where v_t = (v_1t, ..., v_Nt)′, ε_t = (ε_1t, ..., ε_Nt)′, v^f_t = (v^f_1t, ..., v^f_Nt)′, and ε^f_t = (ε^f_1t, ..., ε^f_Nt)′. For notational simplicity, we also let η_t = λ₁ v_t + ε_t and let η^f_t denote the forward demeaning of η_t. Here the projection matrix of the IVs is P_t = p_{t-2} = z_{t-2}(z′_{t-2} z_{t-2})⁻¹ z′_{t-2}.

First, Lemma (21) in the appendix shows that the denominator of ρ̂^f_2SLS has the probability limit

    (1/NT) Σ_{t=2}^{T-1} x^f′_{t-1} p_{t-2} x^f_{t-1} →p λ₁²σ_v²/(1−ρ²) + σ_ε².   (4.14)

Next, for the numerator of the 2SLS estimator (4.10), using the alternative asymptotics where N, T → ∞ with T³/N → κ, 0 < κ < ∞ (Assumption 4.4), we show that

    (1/√NT) Σ_{t=2}^{T-1} x^f′_{t-1} p_{t-2} u^f_t = (1/√NT) Σ_{t=2}^{T-1} c_t λ₁ ξ′_{t-1} p_{t-2} u_t − σ_ε²√κ + o_p(1),   (4.15)

where ξ_{t-1} = (ξ_{1,t-1}, ξ_{2,t-1}, ..., ξ_{N,t-1})′ (see Lemma (16) in the appendix). Then we derive (see Lemma (22))

    (1/√NT) Σ_{t=2}^{T-1} c_t λ₁ ξ′_{t-1} p_{t-2} u_t →d N(0, 2λ₁²σ_v² [λ₁²σ_v² + (1−ρ²)σ_ε²]/(1−ρ²)).   (4.16)

Combining (4.15) and (4.16), we have

    (1/√NT) Σ_{t=2}^{T-1} x^f′_{t-1} p_{t-2} u^f_t →d N(−σ_ε²√κ, 2λ₁²σ_v² [λ₁²σ_v² + (1−ρ²)σ_ε²]/(1−ρ²)).   (4.17)

The above results are summarized in the following theorem.

Theorem 8. For model (4.1) with (4.2), with the 2SLS estimator defined in (4.10), under Assumptions 4.1-4.4, for forward orthogonal demeaning, we have √NT(ρ̂^f_2SLS − ρ)
→d N( −(1−ρ²)σ_ε²√κ / [λ₁²σ_v² + (1−ρ²)σ_ε²], 2(1−ρ²)λ₁²σ_v² / [λ₁²σ_v² + (1−ρ²)σ_ε²] ).

If there is no measurement error in the model, i.e., σ_ε² = 0 and λ₁ = 1, then

    √NT(ρ̂^f_2SLS − ρ) →d N(0, 2(1−ρ²)).   (4.18)

It is interesting to note that the variance in (4.18) is larger than AA's variance (which is 1−ρ²). The reason for the variance difference is that ρ̂^f_2SLS uses the IVs only up to lag t−2 because of the measurement error, while the GMM of AA uses IVs up to lag t−1. Therefore, when there is no measurement error, AA's GMM is more efficient, but it becomes inconsistent when there is measurement error.

Next we derive the limiting distribution of ρ̂^Δ_2SLS. By definition, we have

    ρ̂^Δ_2SLS = (Σ_{t=3}^T Δy′_{t-1} p_{t-3} Δy_{t-1})⁻¹ (Σ_{t=3}^T Δy′_{t-1} p_{t-3} Δy_t) = ρ + (Σ_{t=3}^T Δy′_{t-1} p_{t-3} Δy_{t-1})⁻¹ (Σ_{t=3}^T Δy′_{t-1} p_{t-3} Δu_t).

Notice that Δy_{t-1} = W_{t-1} − W_{t-2}. Recall the definition Δu_t = u_t − u_{t-1} with u_t = λ₁ v_t + ε_t − ρ ε_{t-1}. Then, according to part (b) of Lemma (21) in the appendix, the denominator of ρ̂^Δ_2SLS has the probability limit

    (1/NT) Σ_{t=3}^T Δy′_{t-1} p_{t-3} Δy_{t-1} = (1/NT) Σ_{t=3}^T (W_{t-1} − W_{t-2})′ p_{t-3} (W_{t-1} − W_{t-2}) →p 2(1−ρ)² [λ₁²σ_v²/(1−ρ²) + σ_ε²].   (4.19)

Also, for the numerator, by definition we can write

    (1/√NT) Σ_{t=3}^T Δy′_{t-1} p_{t-3} Δu_t
      = (1/√NT) Σ_{t=3}^T (W_{t-1} − W_{t-2})′ p_{t-3} (u_t − u_{t-1})
      = (1/√NT) Σ_{t=3}^T [λ₁(ρ−1)ξ_{t-2} + λ₁ v_{t-1} + ε_{t-1} − ε_{t-2}]′ p_{t-3} (u_t − u_{t-1})
      = λ₁(ρ−1) (1/√NT) Σ_{t=3}^T ξ′_{t-2} p_{t-3} (u_t − u_{t-1}) + (1/√NT) Σ_{t=3}^T (λ₁ v_{t-1} + ε_{t-1} − ε_{t-2})′ p_{t-3} (u_t − u_{t-1}).

In Lemma (22), we show that

    (1/√NT) Σ_{t=3}^T ξ′_{t-2} p_{t-3} (u_t − u_{t-1}) →d N(0, 2σ_v² [λ₁²σ_v² + (1+ρ)σ_ε²]/(1+ρ)).   (4.20)

Also, by a similar argument as in the forward demeaning case, using the alternative asymptotics where N, T → ∞ with T³/N → κ, 0 < κ < ∞ (Assumption 4.4), we show that

    (1/√NT) Σ_{t=3}^T (λ₁ v_{t-1} + ε_{t-1} − ε_{t-2})′ p_{t-3} (λ₁ v_t − λ₁ v_{t-1} + ε_t − (1+ρ)ε_{t-1} + ρ ε_{t-2}) →p −[λ₁²σ_v² + (1+2ρ)σ_ε²]√κ.

Consequently, we have (1/√NT) Σ_{t=3}^T Δy′_{t-1} p_{t-3} Δu_t
→d N( −[λ₁²σ_v² + (1+2ρ)σ_ε²]√κ, 2(1−ρ)²λ₁²σ_v² [λ₁²σ_v² + (1+ρ)σ_ε²]/(1+ρ) ).   (4.21)

The above results are summarized in the following theorem.

Theorem 9. For model (4.1) with (4.2), with the 2SLS estimator defined in (4.10), under Assumptions 4.1-4.4, for first differencing, we have

    √NT(ρ̂^Δ_2SLS − ρ) →d N( −(1+ρ)[λ₁²σ_v² + (1+2ρ)σ_ε²]√κ / {2(1−ρ)[λ₁²σ_v² + (1−ρ²)σ_ε²]}, 2(1+ρ)λ₁²σ_v² [λ₁²σ_v² + (1+ρ)σ_ε²] / {4[λ₁²σ_v² + (1−ρ²)σ_ε²]²} ).

Notice that if there is no measurement error, i.e., σ_ε² = 0 and λ₁ = 1, then

    √NT(ρ̂^Δ_2SLS − ρ) →d N( −(1+ρ)√κ/(2(1−ρ)), 2(1+ρ)/4 ).

4.1.3 JIVE and its Asymptotics

In the sections above we showed that the 2SLS estimators based on either forward demeaning or first differencing are asymptotically biased for a dynamic panel with measurement error. In order to reduce the asymptotic bias, we can consider the JIVE estimator, and we analyze its asymptotic properties in this section. In the IV literature (e.g., Angrist, Imbens, and Krueger (1999) and Newey and Windmeijer (2009)), it is well known that the bias of the IV estimator ρ̂_2SLS is driven by

    b_NT = (1/√NT) Σ_t [ Σ_{i=1}^N X_it z′_it (Σ_{i=1}^N z_it z′_it)⁻¹ z_it U_it ],   (4.22)

where z_it denotes z_{i,t-2} or z_{i,t-3} for instruments based on forward demeaning and first differencing, respectively. As shown above, when there is measurement error in the model, the 2SLS estimator is asymptotically biased. In order to correct the bias, we can consider the JIVE estimator proposed by Angrist, Imbens, and Krueger (1999), which corrects for the bias term b_NT. The JIVE is given by

    ρ̂_JIVE = [Σ_t Σ_{i≠j} X_it z′_it (Σ_{i=1}^N z_it z′_it)⁻¹ z_jt X_jt]⁻¹ [Σ_t Σ_{i≠j} X_it z′_it (Σ_{i=1}^N z_it z′_it)⁻¹ z_jt Y_jt].   (4.23)

Again, we use the notation ρ̂^f_JIVE and ρ̂^Δ_JIVE depending on whether we use the FOD or the FD transformation of the panel.
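The difference between (4.10) and (4.23) is only the deletion of the i = j (own-observation) terms, which removes the bias term b_NT. The following sketch is our illustration (not the authors' code); it computes both estimators from per-period arrays and, on simulated data with valid instruments, both come out close to the true coefficient:

```python
import numpy as np

def two_sls_and_jive(X, Z, Y):
    """Pooled 2SLS and JIVE from lists of per-period arrays.

    X[t]: (N,) endogenous regressor, Z[t]: (N, L_t) instruments, Y[t]: (N,).
    2SLS uses X'PX and X'PY; the JIVE deletes the i = j terms, i.e. subtracts
    the diagonal P_ii contributions from both numerator and denominator.
    """
    num2 = den2 = numJ = denJ = 0.0
    for Xt, Zt, Yt in zip(X, Z, Y):
        Pt = Zt @ np.linalg.solve(Zt.T @ Zt, Zt.T)   # projection on the IVs
        d = np.diag(Pt)                              # leverage terms P_ii
        den2 += Xt @ Pt @ Xt
        num2 += Xt @ Pt @ Yt
        denJ += Xt @ Pt @ Xt - np.sum(d * Xt * Xt)   # drop own-observation terms
        numJ += Xt @ Pt @ Yt - np.sum(d * Xt * Yt)
    return num2 / den2, numJ / denJ
```

When the number of instruments grows with t, the deleted diagonal terms Σ_i P_ii X_it U_it are the many-IV bias channel captured by b_NT in (4.22), so deleting them is the whole correction.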
To begin with, we study the asymptotics of ρ̂^f_JIVE. By definition, we have

    √NT(ρ̂^f_JIVE − ρ) = [ (1/NT) Σ_{t=2}^{T-1} Σ_{i≠j} x^f_{i,t-1} z′_{i,t-2} (Σ_{i=1}^N z_{i,t-2} z′_{i,t-2})⁻¹ z_{j,t-2} x^f_{j,t-1} ]⁻¹ (1/√NT) Σ_{t=2}^{T-1} Σ_{i≠j} x^f_{i,t-1} z′_{i,t-2} (Σ_{i=1}^N z_{i,t-2} z′_{i,t-2})⁻¹ z_{j,t-2} u^f_{j,t}.

Write the denominator as

    (1/NT) Σ_{t=2}^{T-1} Σ_{i≠j} x^f_{i,t-1} z′_{i,t-2} (Σ_i z_{i,t-2} z′_{i,t-2})⁻¹ z_{j,t-2} x^f_{j,t-1}
      = (1/NT) Σ_{t=2}^{T-1} x^f′_{t-1} p_{t-2} x^f_{t-1} − (1/NT) Σ_{t=2}^{T-1} Σ_{i=1}^N x^f_{i,t-1} z′_{i,t-2} (Σ_i z_{i,t-2} z′_{i,t-2})⁻¹ z_{i,t-2} x^f_{i,t-1}.   (4.24)

As shown in Lemma (21), the first term has the probability limit

    (1/NT) Σ_{t=2}^{T-1} x^f′_{t-1} p_{t-2} x^f_{t-1} →p λ₁²σ_v²/(1−ρ²) + σ_ε².   (4.25)

On the other hand, we show in Lemma (18) that

    (1/NT) Σ_{t=2}^{T-1} Σ_{i=1}^N x^f_{i,t-1} z′_{i,t-2} (Σ_i z_{i,t-2} z′_{i,t-2})⁻¹ z_{i,t-2} x^f_{i,t-1} = o_p(1).   (4.26)

Consequently, by combining (4.25) and (4.26), we have

    (1/NT) Σ_{t=2}^{T-1} Σ_{i≠j} x^f_{i,t-1} z′_{i,t-2} (Σ_i z_{i,t-2} z′_{i,t-2})⁻¹ z_{j,t-2} x^f_{j,t-1} →p λ₁²σ_v²/(1−ρ²) + σ_ε².   (4.27)

For the limiting distribution of the numerator, notice that

    (1/√NT) Σ_{t=2}^{T-1} Σ_{i≠j} x^f_{i,t-1} z′_{i,t-2} (Σ_i z_{i,t-2} z′_{i,t-2})⁻¹ z_{j,t-2} u^f_{j,t}
      = (1/√NT) Σ_{t=2}^{T-1} x^f′_{t-1} p_{t-2} u^f_t − (1/√NT) Σ_{t=2}^{T-1} Σ_{i=1}^N x^f_{i,t-1} z′_{i,t-2} (Σ_i z_{i,t-2} z′_{i,t-2})⁻¹ z_{i,t-2} u^f_{i,t}.   (4.28)

In part (d) of Lemma (18), we show that

    (1/√NT) Σ_{t=2}^{T-1} Σ_{i=1}^N x^f_{i,t-1} z′_{i,t-2} (Σ_i z_{i,t-2} z′_{i,t-2})⁻¹ z_{i,t-2} u^f_{i,t} →p −σ_ε²√κ   (4.29)

under the alternative asymptotics in Assumption 4.4. Combining the results (4.17) and (4.29), we have the following limit of the numerator:

    (1/√NT) Σ_{t=2}^{T-1} Σ_{i≠j} x^f_{i,t-1} z′_{i,t-2} (Σ_i z_{i,t-2} z′_{i,t-2})⁻¹ z_{j,t-2} u^f_{j,t} →d N(0, 2λ₁²σ_v² [λ₁²σ_v² + (1−ρ²)σ_ε²]/(1−ρ²)).   (4.30)

Consequently, we have the following theorem for the JIVE when forward demeaning is used.

Theorem 10. For model (4.1) with (4.2), with the JIVE defined in (4.23), under Assumptions 4.1-4.4, for forward demeaning, we have √NT(ρ̂^f_JIVE − ρ)
→d N( 0, 2(1−ρ²)λ₁²σ_v² / [λ₁²σ_v² + (1−ρ²)σ_ε²] ).

When forward demeaning is used, the limiting distribution of the JIVE estimator of ρ shows that ρ̂^f_JIVE is consistent and asymptotically normally distributed: the JIVE eliminates the bias of the 2SLS estimator, as desired.

Standard Error Computation

Under the homoskedasticity in Assumption 4.1, and given that the JIVE estimator is consistent and asymptotically unbiased, we can calculate the variance of the JIVE (and of the 2SLS estimators as well) by using the following moment conditions:

    E(u²_it) = λ₁²σ_v² + (1+ρ²)σ_ε²,   E(u_it u_{i,t-1}) = −ρσ_ε²,

which give

    σ_ε² = −E(u_it u_{i,t-1})/ρ,   λ₁²σ_v² = E(u²_it) + [(1+ρ²)/ρ] E(u_it u_{i,t-1}).

Also, since

    û^f_it = y^f_it − x^f_{i,t-1} ρ̂ = c_t [u_it − (1/(T−t)) Σ_{s=t+1}^T u_is] − x^f_{i,t-1}(ρ̂ − ρ),

then for any fixed t and as T → ∞, it is obvious that û^f_it = u_it + o_p(1). Consequently, we have

    σ̂_ε² = −(1/ρ̂^f_JIVE) (1/NT) Σ_{t=2}^{T-1} Σ_{i=1}^N û^f_it û^f_{i,t-1},
    (λ₁²σ_v²)^ = (1/NT) Σ_{t=2}^{T-1} Σ_{i=1}^N (û^f_it)² − (1 + (ρ̂^f_JIVE)²) σ̂_ε²,

and a consistent estimator of the variance of ρ̂^f_JIVE can be obtained by replacing ρ, σ_ε², and λ₁²σ_v² by their estimates ρ̂^f_JIVE, σ̂_ε², and (λ₁²σ_v²)^.

Now we turn to the asymptotics of ρ̂^Δ_JIVE. By definition, we have

    √NT(ρ̂^Δ_JIVE − ρ) = [ (1/NT) Σ_{t=3}^T Σ_{i≠j} Δy_{i,t-1} z′_{i,t-3} (Σ_i z_{i,t-3} z′_{i,t-3})⁻¹ z_{j,t-3} Δy_{j,t-1} ]⁻¹ (1/√NT) Σ_{t=3}^T Σ_{i≠j} Δy_{i,t-1} z′_{i,t-3} (Σ_i z_{i,t-3} z′_{i,t-3})⁻¹ z_{j,t-3} Δu_{j,t}.

The numerator of √NT(ρ̂^Δ_JIVE − ρ) is

    (1/√NT) Σ_{t=3}^T Σ_{i≠j} Δy_{i,t-1} z′_{i,t-3} (Σ_i z_{i,t-3} z′_{i,t-3})⁻¹ z_{j,t-3} Δu_{j,t}
      = (1/√NT) Σ_{t=3}^T Δy′_{t-1} p_{t-3} Δu_t − (1/√NT) Σ_{t=3}^T Σ_{i=1}^N Δy_{i,t-1} z′_{i,t-3} (Σ_i z_{i,t-3} z′_{i,t-3})⁻¹ z_{i,t-3} Δu_{i,t}.

For the last term, we show in part (b) of Lemma (19) that

    (1/√NT) Σ_{t=3}^T Σ_{i=1}^N Δy_{i,t-1} z′_{i,t-3} (Σ_i z_{i,t-3} z′_{i,t-3})⁻¹ z_{i,t-3} Δu_{i,t} →p −[λ₁²σ_v² + (1+2ρ)σ_ε²]√κ   (4.31)

under the alternative asymptotics as N, T → ∞ with T³/N → κ (Assumption 4.4). Consequently, using the results (4.31) and (4.21) from the 2SLS estimation, we have

    (1/√NT) Σ_{t=3}^T Σ_{i≠j} Δy_{i,t-1} z′_{i,t-3} (Σ_i z_{i,t-3} z′_{i,t-3})⁻¹ z_{j,t-3} Δu_{j,t} →d N( 0, 2(1−ρ)²λ₁²σ_v² [λ₁²σ_v² + (1+ρ)σ_ε²]/(1+ρ) ).   (4.32)

The denominator of the JIVE under first differencing can be handled along the same lines as in the forward demeaning case; noting that

    (1/NT) Σ_{t=3}^T Σ_{i=1}^N Δy_{i,t-1} z′_{i,t-3} (Σ_i z_{i,t-3} z′_{i,t-3})⁻¹ z_{i,t-3} Δy_{i,t-1} = o_p(1)   (4.33)

by the results of Lemma (19), we have

    (1/NT) Σ_{t=3}^T Σ_{i≠j} Δy_{i,t-1} z′_{i,t-3} (Σ_i z_{i,t-3} z′_{i,t-3})⁻¹ z_{j,t-3} Δy_{j,t-1} →p 2(1−ρ)² [λ₁²σ_v²/(1−ρ²) + σ_ε²].   (4.34)

Consequently, we have the following theorem.

Theorem 11. For model (4.1) with (4.2), with the JIVE defined in (4.23), under Assumptions
; (4.32) by using the results (4.31) and (4.21) in the 2SLS estimation. The denominator of the JIVE when first difference is used can be shown in a similar line as in the forward demeaning case and by noticing 1 NT T X t=3 N X i=1 y it1 z 0 it3 N X i=1 z it3 z 0 it3 ! 1 z it3 y it1 =o p (1) (4.33) by using the results of lemma (19), then we have 1 p NT T X t=3 N X i6=j y it1 z 0 it3 N X i=1 z it3 z 0 it3 ! 1 z jt3 y jt1 (4.34) ! p 2 (1 ) 2 2 1 2 v 1 2 + 2 : (4.35) Consequently, we have the following theorem. Theorem 11 For model (4:1) with (4:2), and let the JIVE defined in (4.23), under Assumption 54 4.1-4.4, for first difference, we have p NT ^ JIVE ! d N 0; 2 (1 + ) [ 2 1 2 v + (1 + ) 2 ] 2 1 2 v 4 ( 2 1 2 v + (1 2 ) 2 ) 2 : When first difference is used, the limiting distribution of the JIVE estimator of ; ^ JIVE ; shows that the ^ JIVE is consistent and asymptotically normally distributed. The JIVE estimator ^ JIVE eliminates the bias of 2SLS estimator as desired. Similarly, if there is no measurement error in the mode and if we use the IVsz it1 ; than we have p NT ^ JIVE ! d N 0; 2 (1 + ) 4 : Under homoskedasticity assumption (4.1) and given that the JIVE estimator is asymptotically consistent and unbiased, similar approach as in the FOD case of how to calculate the variance can be applied here. 4.2 Generalization In this section we extend the simple dynamic panel regression model we have analyzed so far by allowing additional predetermined regressors. Suppose that y it and y it are ak vector of panel variables, y it = y 1;it ;:::;y k;it 0 and y it = (y 1;it ;:::;y k;it ) 0 : For ease of exposition, we shall assume thatk = 2; and extension to other cases is straightforward. 
Suppose that y*_it is a latent variable of interest generated by the following panel VAR model,

    y*_it = Φ y*_{i,t-1} + α_i + v_it,   (4.36)

where Φ is a 2 × 2 matrix, α_i is a 2 × 1 vector, and v_it ~ iid with zero mean across i and t. Assume that y*_it is measured by a proxy variable y_it with a measurement error ε_it:

    y_it = λ_{i0} + Λ₁ y*_it + ε_it,   (4.37)

where ε_it ~ iid with zero mean across i and t and independent of v_js for all j, s; also, λ_{i0} is independent of ε_it and v_it for all i and t, and Λ₁ = diag{λ₁, λ₂} is a diagonal matrix. Then the observed panel y_it follows

    y_it = Φ y_{i,t-1} + Λ₁ α_i + (I₂ − Φ) λ_{i0} + Λ₁ v_it + ε_it − Φ ε_{i,t-1}.   (4.38)

By letting

    α̃_i = Λ₁ α_i + (I₂ − Φ) λ_{i0},   u_it = e_it + ε_it − Φ ε_{i,t-1},   e_it = Λ₁ v_it,

we can rewrite (4.38) as

    y_it = Φ y_{i,t-1} + α̃_i + u_it,   (4.39)

where it is obvious that the error term u_it follows a multivariate MA(1) process. Now suppose the parameter of interest is the first row of Φ, denoted by φ = (φ₁₁, φ₁₂)′. Our interest is to estimate the first equation of (4.39), given by

    y_{1,it} = y′_{i,t-1} φ + α̃_{1,i} + u_{1,it},   (4.40)

where

    u_{1,it} = λ₁ v_{1,it} + ε_{1,it} − φ₁₁ ε_{1,i,t-1} − φ₁₂ ε_{2,i,t-1}.

Here y_{i,t-1} = (y_{1,i,t-1}, y_{2,i,t-1})′ is a vector of multiple regressors. For model (4.40), we make the following assumptions.

Assumption 4.5. Conditional on α_i and {y*_{i,t−s} : s ≥ 1}, we assume that v_it ~ iid(0, Σ_v) and ε_it ~ iid(0, Σ_ε) with finite eighth moments, where Σ_v > 0 and Σ_ε ≥ 0. We also assume that v_it and ε_js are independent for all i, j, s, t.

Assumption 4.6. We assume that α_i are independent of {v_is, ε_is} for all i, t, s and have bounded fourth moments; λ_{i0} are independent of {α_i, u_is, ε_is} for all i, t, s and have bounded fourth moments.

Assumption 4.7. We assume that Λ₁ ≠ 0, that all the roots of the characteristic equation |I₂ − Φz| = 0 lie outside the unit circle, and that Φ has full rank.

In Assumption 4.5, the restriction Σ_ε ≥ 0 allows measurement error to be present only in subcomponents of y_it. Assumption 4.7 ensures that model (4.36) is stationary. Also, the
Also, the 56 restrictions of Assumption 4.7 are required for identification of the parameter: 4.2.1 FOD and FD Transformations To estimate the regression coefficient in (4.40), the forward orthogonal demeaning can also be defined similarly, y f it = x f it1 + u f it ; i = 1;:::;N;t = 2;:::;T 1; (4.41) and the first equation of the model given by y f 1;it = x f0 it1 +u f 1;it ; i = 1;:::;N;t = 2;:::;T 1; with u f 1;it = 1 v f 1;it + f 1;it 11 f 1;it1 12 f 2;it1 : Then, we have E y is u f 1;it = 0 forst 2; and we can use y is as instruments for the above model whenst 2. Denote z it2 = (y it2 ;:::; y i0 ) 0 Similarly, for multi-dimensional model (4.40), the model after first-difference can be rewritten as y 1;it = y 0 it1 + u 1;it ; i = 1;:::;N;t = 3;:::;T; (4.42) The instrumental variables are y is as instruments for the above model whenst 3: Denote z it3 = (y it3 ;:::; y i0 ) 0 : 4.2.2 2SLS estimation and its asymptotics For model (4.40) when forward demeaning is used, we have y f 1;it = x f0 it1 +u f 1;it (4.43) then we can use z t2 = y (1) 0 ; y (2) 0 ;:::; y (1) t2 ; y (2) t2 ; as instruments for the above model with y (j) s = (y j;1s ;:::;y j;Ns ) 0 forj = 1; 2 ands = 0;:::;t 2: 57 It can also be noted that z t is aN 2 (t 1) matrix of instruments. Then the 2SLS estimator is given by ^ f 2SLS = T1 X t=2 X f0 t1 z t2 z 0 t2 z t2 1 z 0 t2 X f t1 ! 1 T1 X t=2 X f0 t1 z t2 z 0 t2 z t2 y (1)f t (4.44) where X f t1 = x f 1;t1 ;:::; x f N;t1 0 being a N 2 matrix of regressors with x f i;t1 = x f 1;it1 ;x f 2;it1 0 and y (1)f t = y f 1;1t ;:::;y f 1;Nt 0 . For the multivariate model (4.40), the model after first-difference can be rewritten as y 1;it = y 0 it1 + u 1;it ; (4.45) The instrumental variables are y is as instruments for the above model whenst 3: Then the 2SLS estimator for is given by ^ 2SLS = T X t=3 Y 0 t1 z t3 z 0 t3 z t3 1 z 0 t3 Y t1 ! 
It can also be noted that z_{t-2} is an N × 2(t−1) matrix of instruments. Then the 2SLS estimator is given by

    φ̂^f_2SLS = [Σ_{t=2}^{T-1} X^f′_{t-1} z_{t-2} (z′_{t-2} z_{t-2})⁻¹ z′_{t-2} X^f_{t-1}]⁻¹ Σ_{t=2}^{T-1} X^f′_{t-1} z_{t-2} (z′_{t-2} z_{t-2})⁻¹ z′_{t-2} y^{(1)f}_t,   (4.44)

where X^f_{t-1} = (x^f_{1,t-1}, ..., x^f_{N,t-1})′ is an N × 2 matrix of regressors with x^f_{i,t-1} = (x^f_{1,i,t-1}, x^f_{2,i,t-1})′, and y^{(1)f}_t = (y^f_{1,1t}, ..., y^f_{1,Nt})′.

For the multivariate model (4.40), the model after first differencing can be rewritten as

    Δy_{1,it} = Δy′_{i,t-1} φ + Δu_{1,it},   (4.45)

with instruments y_is for s ≤ t−3. Then the 2SLS estimator of φ is given by

    φ̂^Δ_2SLS = [Σ_{t=3}^T ΔY′_{t-1} z_{t-3} (z′_{t-3} z_{t-3})⁻¹ z′_{t-3} ΔY_{t-1}]⁻¹ Σ_{t=3}^T ΔY′_{t-1} z_{t-3} (z′_{t-3} z_{t-3})⁻¹ z′_{t-3} Δy^{(1)}_t,   (4.46)

where ΔY_{t-1} = (Δy_{1,t-1}, ..., Δy_{N,t-1})′ is an N × 2 matrix of regressors and Δy^{(1)}_t = (Δy_{1,1t}, ..., Δy_{1,Nt})′ with Δy_{1,it} = y_{1,it} − y_{1,i,t-1}.

The main results for the 2SLS estimators in the multivariate model are summarized as follows.

Theorem 12. Suppose that Assumptions 4.4-4.7 hold. As N, T → ∞ with T³/N → κ, we have

    √NT(φ̂^f_2SLS − φ) →d N(B^f, V^f),   (4.47)

where

    B^f = −√κ [Φ(Γ₀ + Σ_ε)Φ′]⁻¹ (φ₁₁ σ_ε,11 + φ₁₂ σ_ε,12, φ₂₁ σ_ε,21 + φ₂₂ σ_ε,22)′,
    V^f = [Φ(Γ₀ + Σ_ε)Φ′]⁻¹ Ψ₁ [Φ(Γ₀ + Σ_ε)Φ′]⁻¹,

and

    √NT(φ̂^Δ_2SLS − φ) →d N(B^Δ, V^Δ),   (4.48)

where

    B^Δ = −√κ Ξ⁻¹ ((1 + 2φ₁₁) σ_ε,11 + 2φ₁₂ σ_ε,12 + σ_u,11, (1 + 2φ₁₁) σ_ε,21 + 2φ₁₂ σ_ε,22 + σ_u,12)′,
    V^Δ = Ξ⁻¹ (I₂ − Φ) Ψ₂ (I₂ − Φ)′ Ξ⁻¹,

and Γ₀ = Σ_{i=0}^∞ Φ^i Σ_e Φ^{i′}, Σ_e = E(e_it e′_it) = Λ₁ Σ_v Λ₁′, Σ_ε = E(ε_it ε′_it), Ξ = (I₂ − Φ)(Γ₀ + Σ_ε)(I₂ − Φ)′, and Ψ₁ and Ψ₂ are defined in (A.66) and (A.67), respectively.

Under Assumptions 4.5 and 4.7, we have Γ₀ > 0 and Σ_ε ≥ 0, which give Γ₀ + Σ_ε > 0 and Φ(Γ₀ + Σ_ε)Φ′ > 0, since Φ has full rank under Assumption 4.7. Similarly, we have Ξ = (I₂ − Φ)(Γ₀ + Σ_ε)(I₂ − Φ)′ > 0, since I₂ − Φ ≠ 0 and Φ has full rank under Assumption 4.7. Then both Φ(Γ₀ + Σ_ε)Φ′ and Ξ are positive definite, and hence invertible.

4.2.3 JIVE and its Asymptotics

As shown above, for model (4.41), the 2SLS estimator of the first row of Φ is asymptotically biased in general unless Σ_ε = 0. In order to eliminate the bias, consider the JIVE estimator of φ, which is given by

    φ̂^f_JIVE = [Σ_{t=2}^{T-1} Σ_{i≠j} x^f_{i,t-1} z′_{i,t-2} (z′_{t-2} z_{t-2})⁻¹ z_{j,t-2} x^f′_{j,t-1}]⁻¹ Σ_{t=2}^{T-1} Σ_{i≠j} x^f_{i,t-1} z′_{i,t-2} (z′_{t-2} z_{t-2})⁻¹ z_{j,t-2} y^{(1)f}_{j,t}.   (4.49)

Similarly, for model (4.42), the 2SLS estimator of the first row of Φ is asymptotically biased even if Σ_ε = 0. To eliminate the asymptotic bias, we can consider the JIVE for (4.42), which is given by

    φ̂^Δ_JIVE = [Σ_{t=3}^T Σ_{i≠j} Δy_{i,t-1} z′_{i,t-3} (z′_{t-3} z_{t-3})⁻¹ z_{j,t-3} Δy′_{j,t-1}]⁻¹ Σ_{t=3}^T Σ_{i≠j} Δy_{i,t-1} z′_{i,t-3} (z′_{t-3} z_{t-3})⁻¹ z_{j,t-3} Δy^{(1)}_{j,t}.   (4.50)

The results for the JIVE in the multivariate model are given by the following theorem.

Theorem 13. Suppose that Assumptions 4.4-4.7 hold.
As N, T → ∞ with T³/N → κ, we have

    √NT(φ̂^f_JIVE − φ) →d N(0, V^f),   (4.51)

where V^f = [Φ(Γ₀ + Σ_ε)Φ′]⁻¹ Ψ₁ [Φ(Γ₀ + Σ_ε)Φ′]⁻¹, and

    √NT(φ̂^Δ_JIVE − φ) →d N(0, V^Δ),   (4.52)

where V^Δ = Ξ⁻¹ (I₂ − Φ) Ψ₂ (I₂ − Φ)′ Ξ⁻¹.

4.3 Monte Carlo Simulation

In this section, we investigate the finite sample properties of the 2SLS and JIVE estimators for the simple dynamic model with measurement error in a simple Monte Carlo simulation design. The data generating process is

    y*_it = ρ y*_{i,t-1} + α_i + v_it,   y_it = y*_it + ε_it,

where α_i ~ iid N(0, 1), v_it ~ iid N(0, 1), and ε_it ~ iid N(0, 1) for all i and t. We use different combinations of N and T by letting N = 500 and 1000, and T = 10, 25, and 50. For the values of ρ, we consider ρ ∈ {0.2, 0.5, 0.8}. To eliminate α_i, we use the FOD and the FD transformations. The finite sample properties of the estimators are examined by the bias, median, and interquartile range (iqr). The number of replications is set at 1000. Simulation results of the 2SLS and the JIVE based on the FOD transformed panel and the FD transformed panel are summarized in Tables 13 (N = 500) and 14 (N = 1000) and Tables 15 (N = 500) and 16 (N = 1000), respectively. The simulations show that there is significant bias in the two 2SLSEs. However, the bias of the JIVE is almost negligible in both the FOD and the FD cases.

Table 13: Simulation results of forward orthogonal demeaning when N = 500

   ρ     T    2SLS m.b.   2SLS iqr   JIVE m.b.   JIVE iqr
  0.2   10    -0.1715     0.1838      0.0126     0.4283
        25    -0.1367     0.0751     -0.0011     0.2804
        50    -0.1199     0.0374     -0.0005     0.2279
  0.5   10    -0.0761     0.0988      0.0042     0.1162
        25    -0.0896     0.0436      0.0004     0.0562
        50    -0.1108     0.0240      0.0010     0.0403
  0.8   10    -0.0543     0.0705      0.0019     0.0795
        25    -0.0441     0.0230     -0.0007     0.0239
        50    -0.0507     0.0118     -0.0001     0.0124

Note: "m.b." refers to median bias; "iqr" refers to the interquartile range (75% quantile − 25% quantile).

Table 14: Simulation results of forward orthogonal demeaning when N = 1000

   ρ     T    2SLS m.b.   2SLS iqr   JIVE m.b.   JIVE iqr
iqr 10 -0.1114 0.1628 0.0042 0.2605 0:2 25 -0.1018 0.0624 0.0116 0.1470 50 -0.1012 0.0336 0.0024 0.1205 10 -0.0426 0.0736 -0.0010 0.0796 0:5 25 -0.0510 0.0306 0.0005 0.0357 50 -0.0694 0.0198 -0.0002 0.0244 10 -0.0270 0.0503 0.0015 0.0524 0:8 25 -0.0228 0.0148 -0.0001 0.0150 50 -0.0271 0.0080 0.0001 0.0081 Note: "m.b." refers to median bias, "iqr" refers to inter quantile range (75% quantile-25% quantile) 61 Table 15 Simulation results of first difference whenN = 500 2SLS JIVE T m.b. iqr m.b. iqr 10 -0.4537 0.1934 -0.1168 0.9999 0:2 25 -0.5662 0.0644 -0.0837 0.8364 50 -0.6107 0.0305 -0.0991 0.7986 10 -0.3276 0.1439 0.0360 0.3312 0:5 25 -0.5566 0.0648 0.0147 0.2633 50 -0.6965 0.0328 -0.0028 0.2584 10 -0.3771 0.1623 0.0315 0.3437 0:8 25 -0.6669 0.0731 -0.0032 0.2822 50 -0.8564 0.0337 -0.0092 0.2493 Note: "m.b." refers to median bias, "iqr" refers to inter quantile range (75% quantile-25% quantile) Table 16 Simulation results of first difference whenN = 1000 2SLS JIVE T m.b. iqr m.b. iqr 10 -0.3464 0.1868 0.0323 0.6476 0:2 25 -0.4999 0.0657 0.0062 0.4619 50 -0.5703 0.0313 -0.0187 0.4325 10 -0.2023 0.1244 0.0087 0.1998 0:5 25 -0.4003 0.0632 0.0030 0.1531 50 -0.5641 0.0327 0.0024 0.1353 10 -0.2237 0.1155 0.0155 0.1707 0:8 25 -0.4614 0.0584 0.0061 0.1334 50 -0.6722 0.0341 -0.0004 0.1263 Note: "m.b." refers to median bias, "iqr" refers to inter quantile range (75% quantile-25% quantile) 4.4 Conclusion In this chapter we investigate estimation of linear dynamic panel regression with serial correlated errors. As an illustration, we consider linear dynamic panel models with measurement error. Using the alternative asymptoticsN;T!1 with T 3 N !; we characterize the many IVs bias of the 2SLSE of the dynamic coefficient. To reduce the bias of the 2SLSEs, we consider the JIVE and establish its asymptotics. 62 Chapter 5 Summary In this article, we consider the estimation and inference of dynamic panel models. 
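The Monte Carlo design of Section 4.3 can be sketched in a few lines of code. The following is a minimal illustration under assumed simplifications, not the program behind Tables 13-16: it writes ρ for the autoregressive coefficient, and it replaces the chapter's full instrument sets by a single lagged level, y_{i,t-3}, as instrument for the first-differenced equation (with N(0,1) measurement error, levels dated t-2 or later are correlated with the differenced residual, so the instrument must be lagged at least three periods).

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_panel(N, T, rho, burn=50):
    """Latent AR(1) with fixed effects, observed with N(0,1) measurement error:
    y*_{it} = rho * y*_{i,t-1} + alpha_i + v_{it},   y_{it} = y*_{it} + eps_{it}."""
    alpha = rng.standard_normal(N)
    w = np.zeros(N)
    cols = []
    for t in range(burn + T):           # burn-in so the latent process is near-stationary
        w = rho * w + alpha + rng.standard_normal(N)
        if t >= burn:
            cols.append(w.copy())
    ystar = np.column_stack(cols)       # N x T latent panel
    return ystar + rng.standard_normal((N, T))

def fd_iv_estimate(y):
    """Pooled IV on first differences: regress dy_t on dy_{t-1}, instrumenting with
    the level y_{t-3}, which predates every error term in the differenced residual."""
    dy = np.diff(y, axis=1)             # dy[:, t] corresponds to y_{t+1} - y_t
    num = den = 0.0
    for t in range(2, dy.shape[1]):
        z = y[:, t - 2]                 # level dated three periods before dy[:, t]
        num += float(z @ dy[:, t])
        den += float(z @ dy[:, t - 1])
    return num / den

rho_hat = fd_iv_estimate(simulate_panel(N=5000, T=30, rho=0.5))
print(round(rho_hat, 2))
```

With a single instrument per period there is no many-instrument problem here; the chapter's JIVE targets the bias that arises when the instrument count grows with T.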
For dynamic panel models, the presence of time-invariant individual-specific effects creates problems for estimation, and a transformation has to be used to remove the individual effects. In the first chapter, we consider transformations such as the first difference and the forward orthogonal difference to remove the individual effects, together with the associated method of moments estimation. However, it is shown in the literature that the resulting generalized method of moments (GMM) estimator is asymptotically biased, and the reliability of statistical inference depends critically on whether the estimator is asymptotically unbiased. In order to obtain unbiased estimators for dynamic panel models, two estimation approaches are proposed in this article. The second chapter considers maximum likelihood estimation (MLE) for dynamic panel models with serially uncorrelated errors, and the third chapter discusses jackknife instrumental variables estimation (JIVE) for dynamic panel models with serially correlated errors. It is shown that both the MLE and the JIVE are asymptotically unbiased and asymptotically normally distributed. Monte Carlo simulations are conducted to examine the finite-sample properties of the MLE and the JIVE, and the simulation results confirm the theoretical findings. Several issues remain unsolved and are left for future research: one is the estimation and inference of dynamic panel models with cross-sectional dependence, and another is the estimation and inference of dynamic panel VAR models with cross-sectional dependence.

References

Akashi, K., and N. Kunitomo, 2012, Some properties of the LIML estimator in a dynamic panel structural equation, Journal of Econometrics 166, 167-183;

Alvarez, J., and M. Arellano, 2003, The time series and cross-section asymptotics of dynamic panel data estimators, Econometrica 71, 1121-1159;

Anderson, T.W., and C.
Hsiao, 1981, Estimation of dynamic models with error components, Journal of the American Statistical Association 76, 598-606;

Anderson, T.W., and C. Hsiao, 1982, Formulation and estimation of dynamic models using panel data, Journal of Econometrics 18, 47-82;

Andrews, D.W.K., and X. Cheng, 2012, Estimation and inference with weak, semi-strong, and strong identification, Econometrica 80, 2153-2211;

Angrist, J.D., G.W. Imbens, and A.B. Krueger, 1999, Jackknife instrumental variables estimation, Journal of Applied Econometrics 14, 57-67;

Arellano, M., 2003, Panel Data Econometrics, Oxford University Press;

Arellano, M., and S. Bond, 1991, Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations, Review of Economic Studies 58, 277-297;

Arellano, M., and O. Bover, 1995, Another look at the instrumental variable estimation of error-components models, Journal of Econometrics 68, 29-51;

Arellano, M., and J. Hahn, 2007, Understanding bias in nonlinear panel models: some recent developments, in Advances in Economics and Econometrics: Theory and Applications, 9th World Congress, Volume 3;

Avery, R., 1977, Error components and seemingly unrelated regressions, Econometrica 45, 199-209;

Baltagi, B., 2008, Econometric Analysis of Panel Data, 4th ed., Wiley;

Bhargava, A., and D. Sargan, 1983, Estimating dynamic random effects models from panel data covering short time periods, Econometrica 51, 1635-1659;

Binder, M., C. Hsiao, and M.H. Pesaran, 2005, Estimation and inference in short panel vector autoregressions with unit roots and cointegration, Econometric Theory 21, 795-837;

Biorn, E., 2013, Panel data dynamics with mis-measured variables: modeling and GMM estimation, working paper;

Blundell, R., Bond, S., and F.
Windmeijer, 2000, Estimation in dynamic panel data models: improving on the performance of the standard GMM estimator, in Nonstationary Panels, Panel Cointegration and Dynamic Panels, published online, 53-91;

Bond, S., 2002, Dynamic panel data models: a guide to microdata methods and practice, Portuguese Economic Journal 1, 141-162;

Bollinger, C., and A. Chandra, 2005, Iatrogenic specification error: a cautionary tale of cleaning data, Journal of Labor Economics 23(2), 235-257;

Bound, J., C. Brown, and N. Mathiowetz, 2001, Measurement error in survey data, in J.J. Heckman and E. Leamer (eds.), Handbook of Econometrics, Volume 5, Amsterdam: North Holland Press;

Cattaneo, M., M. Jansson, and W. Newey, 2012, Alternative asymptotics and the partially linear model with many regressors, working paper;

Chao, J., N. Swanson, J. Hausman, W. Newey, and T. Woutersen, 2012, Asymptotic distribution of JIVE in a heteroskedastic IV regression with many instruments, Econometric Theory 28, 42-86;

Fang, Y., K. Loparo, and X. Feng, 1994, Inequalities for the trace of matrix product, IEEE Transactions on Automatic Control 39, 2489-2490;

Grassetti, L., 2011, A note on transformed likelihood approach in linear dynamic panel models, Statistical Methods & Applications 20, 221-240;

Hahn, J., J. Hausman, and G. Kuersteiner, 2002, Bias corrected instrumental variables estimation for dynamic panel models with fixed effects, working paper;

Hahn, J., J. Hausman, and G. Kuersteiner, 2007, Long difference instrumental variables estimation for dynamic panel models with fixed effects, Journal of Econometrics 140, 574-617;

Hahn, J., and G. Kuersteiner, 2002, Asymptotically unbiased inference for a dynamic panel model with fixed effects when both n and T are large, Econometrica 70, 1639-1657;

Hansen, C., and D. Kozbur, 2014, Instrumental variables estimation with many weak instruments using regularized JIVE, Journal of Econometrics 182, 290-308;

Holtz-Eakin, D., W. Newey, and H.S.
Rosen, 1988, Estimating vector autoregressions with panel data, Econometrica 56, 1371-1395;

Hsiao, C., 1983, Identification, in Handbook of Econometrics, Volume 1, edited by Z. Griliches and M. Intriligator, pp. 223-283, Amsterdam;

Hsiao, C., 2014, Analysis of Panel Data, Cambridge University Press;

Hsiao, C., Pesaran, M.H., and K. Tahmiscioglu, 2002, Maximum likelihood estimation of fixed effects dynamic panel data models covering short time periods, Journal of Econometrics 109, 107-150;

Hsiao, C., and J. Zhang, 2013, IV, GMM or likelihood approach to estimate dynamic panel models when either N or T or both are large, working paper;

Hsiao, C., and Q. Zhou, 2015a, Statistical inference for panel dynamic simultaneous equations models, working paper;

Hsiao, C., and Q. Zhou, 2015b, Quasi-maximum likelihood estimation of dynamic panels by using long difference transformation, working paper;

Hujer, R., P. Rodrigues, and C. Zeiss, 2005, Serial correlation in dynamic panel models with weakly exogenous regressors and fixed effects, working paper;

Kim, B., and G. Solon, 2005, Implications of mean-reverting measurement error for longitudinal studies of wages and employment, Review of Economics and Statistics 87(1), 193-196;

Kulkarni, D., D. Schmidt, and S-K. Tsui, 1999, Eigenvalues of tridiagonal pseudo-Toeplitz matrices, Linear Algebra and its Applications 297, 63-80;

Phillips, P., 2014, Dynamic panel GMM with roots near unity, working paper;

Phillips, P., and C. Han, 2014, On the limit distributions of first difference least squares and the Anderson-Hsiao IV estimators in panel autoregression, working paper;

Lee, N., R. Moon, and Q. Zhou, Many IVs estimation of dynamic panel regression models with measurement errors, working paper;

Maddala, G.S., 1971, The use of variance components models in pooling cross section and time series data, Econometrica 39, 341-358;

Magnus, J., and H.
Neudecker, 2007, Matrix Differential Calculus with Applications in Statistics and Econometrics, 3rd edition, John Wiley & Sons;

Meijer, E., L. Spierdijk, and T. Wansbeek, 2013, Consistent estimation of linear panel data models with measurement error, working paper;

Meijer, E., L. Spierdijk, and T. Wansbeek, 2013, Measurement error in the linear dynamic panel data model, in ISS-2012 Proceedings Volume on Longitudinal Data Analysis Subject to Measurement Errors, Missing Values, and/or Outliers, Lecture Notes in Statistics, 77-92;

Miller, K., 1981, On the inverse of the sum of matrices, Mathematics Magazine 54, 67-72;

Moon, H.R., and P.C.B. Phillips, 2000, Estimation of autoregressive roots near unity using panel data, Econometric Theory 16, 927-997;

Moon, H.R., B. Perron, and P.C.B. Phillips, 2013, Incidental parameters and dynamic panel modeling, forthcoming in Handbook of Panel Data;

Newey, W., and F. Windmeijer, 2009, Generalized method of moments with many weak moment conditions, Econometrica 77, 687-719;

Nerlove, M., 2000, An essay on the history of panel data econometrics, working paper;

Nickell, S., 1981, Biases in dynamic models with fixed effects, Econometrica 49, 1417-1426;

Phillips, G.D.A., and C. Hale, 1977, The bias of instrumental variable estimators of simultaneous equation systems, International Economic Review 18, 219-228;

Phillips, P.C.B., and D. Sul, 2007, Bias in dynamic panel estimation with fixed effects, incidental trends and cross section dependence, Journal of Econometrics 137, 162-188;

White, H., 2001, Asymptotic Theory for Econometricians, revised edition, Academic Press;

Zhou, Q., and C. Hsiao, 2014, First difference or forward demeaning: implications for the method of moments estimators, working paper;

Appendix A: Mathematical proofs

This appendix includes the derivations for the proofs in the chapters.
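The derivations below repeatedly use the second moments of the first-differenced stationary AR(1) process. For reference, this is the standard AR(1) algebra behind those limits, written here with ρ for the autoregressive coefficient and σ_u² for the innovation variance:

```latex
% Stationary AR(1): w_{it} = \rho\, w_{i,t-1} + u_{it}, \ |\rho| < 1, \ \mathrm{Var}(u_{it}) = \sigma_u^2.
\begin{align*}
\mathrm{Var}(w_{it}) &= \frac{\sigma_u^2}{1-\rho^2}, \\
\mathbb{E}\big[(\Delta w_{it})^2\big]
  &= 2(1-\rho)\,\mathrm{Var}(w_{it}) = \frac{2\sigma_u^2}{1+\rho}, \\
\mathbb{E}\big[\Delta w_{it}\,\Delta w_{i,t+k}\big]
  &= -\rho^{\,k-1}(1-\rho)^2\,\mathrm{Var}(w_{it})
   = -\sigma_u^2\,\rho^{\,k-1}\,\frac{1-\rho}{1+\rho}, \qquad k \ge 1.
\end{align*}
```

The variance and lag-k autocovariance of the differences are exactly the probability limits of the cross-sectional averages of differenced products that appear in the proofs for the FOD-based GMM estimators below.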
A.1 Mathematical proofs for Chapter 2 section In order to derive the asymptotics of GMM based on forward demeaning, we first note that under strictly stationarity condition ofy it ; we have w it =y it i 1 ; wherew it follows an AR(1) process asw it = w i;t1 +u it : Consequently, we have y (f) it =w (f) it =c t (w it w t+1;T ); where w t+1;T = 1 Tt P T s=t+1 w is , the above holds since the individual-specifc effects i ( i 1 here) is eliminated by forward demeaning: A.1.1 Asymptotics of GMM based on FOD using all first differenced lags For the GMM based on FOD using all first differenced lags, we have that P t1 = Z t1 Z 0 t1 Z t1 1 Z 0 t1 with Z t1 = (y t1 ;:::; y 1 ) is theN (t 1) matrix of instruments, then p NT ^ FOD GMM;FD = 1 NT T1 X t=1 y (f)0 t1 P t1 y (f) t1 ! 1 1 p NT T1 X t=1 y (f)0 t1 P t1 u (f) t : To obtain the limit of the denominator, we notice that y t = w t w t1 = w t ; then we have 1 NT T1 X t=1 w 0 t1 P t1 w t1 = 1 T T1 X t=1 " 1 N w 0 t1 Z t1 1 N Z 0 t1 Z t1 1 1 N Z 0 t1 w t1 # ; 70 where 1 N Z 0 t1 Z t1 = 1 N 0 @ w 0 t1 . . . w 0 1 1 A (w t1 ;:::; w 1 ) = 1 N 0 B B @ w 0 t1 w t1 w 0 t1 w t2 w 0 t1 w 1 w 0 t2 w t1 w 0 t2 w t2 w 0 t2 w 1 . . . . . . . . . . . . w 0 1 w t1 w 0 1 w t2 w 0 1 w 1 1 C C A ! p 0 B B B B @ 2 2 u 1+ 1 1+ 2 u 2 u 1 1+ t3 1 1+ 2 u 2 2 u 1+ 2 u 1 1+ t4 . . . . . . . . . . . . 2 u 1 1+ t3 2 u 1 1+ t4 2 2 u 1+ 1 C C C C A = 2 u 1 + 0 B B @ 1 t2 1 t3 . . . . . . . . . . . . t2 t3 1 1 C C A 2 u 1 + 0 B B @ 1 1 t3 1 1 t2 . . . . . . . . . . . . t3 t2 1 1 C C A = 2 u 1 + (A 1t +A 2t ); since under stationary assumption 2.1, we have, for 1st 1 1 N w 0 s w s = 1 N N X i=1 (w is w i;s1 ) 2 ! p E (w is w i;s1 ) 2 = 2 2 u 1 + ; and for 1s 1 <s 2 t 1; 1 N w 0 s 1 w s 2 = 1 N N X i=1 (w is 1 w i;s 1 1 ) (w is 2 w i;s 2 1 ) ! p E [(w is 1 w i;s 1 1 ) (w is 2 w i;s 2 1 )] = 2 u 1 1 + s 2 s 1 1 ; and 1 N w 0 t1 Z t1 = 1 N w 0 t1 (w t1 ;:::; w 1 ) ! p 2 u 1 + 1;:::; t2 ; where A 1t = 0 B B @ 1 t2 1 t3 . . . . . . . . . . . . 
t2 t3 1 1 C C A ; andA 1 1t = 1 1 2 0 B B B B @ 1 0 0 1 + 2 0 . . . . . . . . . . . . . . . 0 1 + 2 0 0 1 1 C C C C A ; 71 then by using the sequential limit, namely,N!1 thenT!1; we obtain 1 NT T1 X t=1 w 0 t1 P t1 w t1 = 1 T T1 X t=1 2 4 2 u 1 + 1;:::; t2 (A 1t A 2t ) 1 0 @ 1 . . . t2 1 A 3 5 +o p (1) = 1 T T1 X t=1 2 4 2 u 1 + 1;:::; t2 A 1 1t 0 @ 1 . . . t2 1 A 3 5 1 T T1 X t=3 2 4 2 u 1 + 1;:::; t2 A 1 1t A 1 2t +A 1 2t 1 A 1 1t 0 @ 1 . . . t2 1 A 3 5 +o p (1); with A 1 1t 0 @ 1 . . . t2 1 A = 1 1 2 0 B B B B @ 1 0 0 1 + 2 0 . . . . . . . . . . . . . . . 0 1 + 2 0 0 1 1 C C C C A 0 @ 1 . . . t2 1 A = 1 1 2 0 B B @ 1 2 0 . . . 0 1 C C A = 0 B B @ 1 0 . . . 0 1 C C A ; and 1;:::; t2 A 1 1t 0 @ 1 . . . t2 1 A = 1; and 1;:::; t2 A 1 1t A 1 2t +A 1 2t 1 A 1 1t 0 @ 1 . . . t2 1 A = (1; 0;:::; 0) A 1 2t +A 1 2t 1 0 B B @ 1 0 . . . 0 1 C C A = A 1 2t +A 1 2t 1 (1;1) ; whereA (1;1) denotes the (1; 1)-th element of matrixA: For the above% t 11 = A 1 2t +A 1 2t 1 (1;1) for 3tT 1; by substitution, we have, % t 11 = (t 1) 1 (t 1) t 1 = (t 1) t 1 +t (t 1) t 1 = 1 + t t ( 1) 1 = 1 + 1 1 1 + + 1 t ( 1) 1 = 1 + 1 1 + 1 t ( 1) 1 ; 72 consequently, we have 1 T T1 X t=1 2 4 2 u 1 + 1;:::; t2 A 1 1t A 1 2t +A 1 2t 1 A 1 1t 0 @ 1 . . . t2 1 A 3 5 = 2 u 1 + 1 T T1 X t=1 1 + 1 1 + 1 t ( 1) 1 = 2 u 1 2 +o (1); combining these results, we can obain 1 NT T1 X t=1 w 0 t1 P t1 w t1 ! p 2 u 1 + + 2 u 1 2 = 2 u 1 2 : then by following Alvarez and Arellano (2003) that 1 NT T1 X t=1 y (f)0 t1 P t1 y (f) t1 = 1 NT T1 X t=2 w 0 t1 P t1 w t1 +o p (1) ! 
p 2 u 1 2 : For the numerator, we have 1 p NT T1 X t=1 h y (f)0 t1 P t1 u (f) t i = 1 p NT T1 X t=1 c 2 t w 0 t1 w 0 t;T1 P t1 (u t u t+1;T ) = 1 p NT T1 X t=1 c 2 t w 0 t1 P t1 u t 1 p NT T1 X t=1 c 2 t w 0 t1 P t1 u t+1;T 1 p NT T1 X t=1 c 2 t w 0 t;T1 P t1 u t + 1 p NT T1 X t=1 c 2 t w 0 t;T1 P t1 u t+1;T I 1 +I 2 +I 3 +I 4 ; where For the first term, we have E (I 1 ) = 1 p NT T1 X t=1 c 2 t E w 0 t1 P t1 u t = 0; by the independence of y t1 and u t ; andI 1 contributes to the limiting distribution (Alvarez and Arellano (2003), p1144). For the second term, we have E (I 2 ) = 1 p NT T1 X t=1 c 2 t E w 0 t1 P t1 u t+1;T = 0; by using the similar argument above. 73 Similarly, forI 3 ; we have E (I 3 ) = 1 p NT T1 X t=1 c 2 t E w 0 t;T1 P t1 u t = 1 p NT T1 X t=1 c 2 t Tt T1 X s=t E w 0 s P t1 u t = 1 p NT T1 X t=1 1 Tt + 1 T1 X s=t E " st+1 w t1 + s X l=t sl u l ! 0 P t1 u t # = 1 p NT T1 X t=1 1 Tt + 1 T1 X s=t st E u 0 t P t1 u t = 1 p NT T1 X t=1 2 u (t 1) Tt + 1 T1 X s=t st = 2 u p NT T1 X t=1 t 1 Tt + 1 1 Tt 1 = 2 u p NT T1 X t=1 t 1 Tt + 1 1 1 + 2 u p NT T1 X t=1 t Tt + 1 Tt+1 1 = 2 u p NT T1 X t=1 t 1 Tt + 1 1 1 + 2 u (1 ) p NT T2 X t=2 (Tt) t t = 2 u p NT T1 X t=1 t 1 Tt + 1 1 1 + T 2 u (1 ) p NT T2 X t=2 t t +O p 1 p NT ; where P T1 t=1 t t is convergent forj j< 1: 8 8 By the ratio test, the series converges whenj j< 1: 74 For the last term, we have E (I 4 ) = 1 p NT T1 X t=1 c 2 t E w 0 t;T1 P t1 u t+1;T = 1 p NT T1 X t=1 1 (Tt) (Tt + 1) X s 1 s 2 E w 0 s 1 P t1 u s 2 = 1 p NT T1 X t=1 1 (Tt) (Tt + 1) X s 1 s 2 s 1 s 2 E u 0 s 2 P t1 u s 2 = 1 p NT T1 X t=1 2 u (t 1) (Tt) (Tt + 1) T1 X s 2 =t+1 T1 X s 1 =s 2 s 1 s 2 = 1 p NT T1 X t=1 2 u (t 1) (Tt) (Tt + 1) T1 X s 2 =t+1 1 Ts 2 1 = 1 p NT T1 X t=1 2 u 1 (t 1) (Tt 1) (Tt) (Tt + 1) 1 p NT T1 X t=1 2 u (t 1) (Tt) (Tt + 1) Tt1 X s=1 s 1 = 1 p NT T1 X t=1 2 u 1 (t 1) (Tt 1) (Tt) (Tt + 1) 2 u (1 ) 2 p NT T1 X t=2 (t 1) 1 Tt1 (Tt) (Tt + 1) ; where P T1 t=1 t t 2 and P T1 t=1 t t are convergent forj j< 1: Since 1 
p NT T1 X t=1 2 u 1 (t 1) (Tt 1) (Tt) (Tt + 1) = 1 p NT T1 X t=1 2 u 1 t 1 (Tt + 1) 2 p NT T1 X t=1 2 u 1 t 1 (Tt) (Tt + 1) = 2 p NT T1 X t=1 2 u 1 t 1 (Tt + 1) 1 p NT T1 X t=1 2 u 1 t 1 Tt = 2 p NT T1 X t=1 2 u 1 Tt t 1 p NT T1 X t=1 2 u 1 Tt t + 1 p NT T1 X t=1 2 u 1 1 t = 1 p NT T1 X t=1 2 u 1 Tt t 2 p NT 2 u (T 1) 1 +O p logT p NT = 1 p NT T1 X t=1 2 u 1 t 1 (Tt + 1) T 1 p NT 2 2 u 1 +O p logT p NT ; because 1 p NT T1 X t=1 2 u 1 1 t = 2 u 1 logT p NT =o p (1); 75 and 2 u (1 ) 2 p NT T1 X t=1 (t 1) 1 Tt1 (Tt) (Tt + 1) = 2 u (1 ) 2 p NT T1 X t=1 Tt 1 t (t + 1) 2 u (1 ) 2 p NT T1 X t=1 (Tt 1) t1 t (t + 1) = 2 u (T 1) (1 ) 2 p NT T1 X t=1 1 t (t + 1) 2 u (1 ) 2 p NT T1 X t=1 1 t + 1 2 u (T 1) (1 ) 2 p NT T1 X t=1 t1 t (t + 1) + 2 u (1 ) 2 p NT T1 X t=1 t1 t + 1 = 2 u (T 1) (1 ) 2 p NT 2 u (T 1) (1 ) 2 p NT T1 X t=1 t1 t (t + 1) + 2 u (1 ) 2 p NT T1 X t=1 t1 t + 1 +O p logT p NT ; then we have E (I 4 ) = 1 p NT T1 X t=1 2 u 1 t 1 (Tt + 1) T 1 p NT 2 u 1 2 u (T 1) (1 ) 2 p NT + 2 u (T 1) (1 ) 2 p NT T1 X t=1 t1 t (t + 1) +o p (1) By defining 2 0 = lim (N;T )!1 T N ; and 1 = lim T!1 T1 X t=1 t t ; and by noticing that T1 X t=1 t1 t (t + 1) = T1 X t=1 t1 t T1 X t=1 t1 t + 1 = T1 X t=1 t1 t 1 T1 X t=0 t t + 1 1 ! = 1 T1 X t=1 t1 t + 1 ! p 1 2 1 + 1 ; 76 then we have E (I 3 +I 4 ) = (T 3) 2 u (1 ) p NT T1 X t=1 t t T 1 p NT 2 u 1 2 u (T 1) (1 ) 2 p NT + 2 u (T 1) (1 ) 2 p NT T1 X t=2 t1 t (t + 1) +o p (1) = 2 u 0 ( 1 ) (1 ) 2 u 0 1 2 u 0 (1 ) 2 + 2 u 0 (1 ) 2 1 2 1 + 1 = ( 1 ) (1 ) (1 ) 2 + ( 1) 1 + (1 ) 2 2 u 0 = 1 1 2 u 0 ; which is equivalent to E (I 3 +I 4 ) = 1 1 2 u r T N +o p (1): (A.A.1) As a result, the asymptotic bias of ^ GMM is given by E h p NT ^ FOD GMM;FD i =O r T N ! 
: For the limiting distribution of the numerator, by following the proof of Alvarez and Arellano (2003), we can show that I 2 =o p (1); I 3 =o p (1); I 4 =o p (1); then 1 p NT T1 X t=1 h y (f)0 t1 P t1 u (f) t i = 1 p NT T1 X t=1 w 0 t1 P t1 u t +o p (1); and 1 p NT T1 X t=2 E w 0 t1 P t1 u t = 0; by construction of forward demeaning, and Var 1 p NT T1 X t=1 w 0 t1 P t1 u t ! = 1 NT T1 X t=1 E w 0 t1 P t1 u t u 0 t P t1 w t1 = 2 u NT T1 X t=1 E w 0 t1 P t1 w t1 ! p 4 u 1 2 ; by following the derivation of the denominator. 77 Consequently, we have for the GMM based on forward demeaning using all first differenced lags, p NT ^ FOD GMM;FD 1 N (1 + ) ! d N 0; 1 2 ; which is the same as the GMM based on forward demeaning using all level lags. Note: In order to calculate the (1; 1)-th element of A 1 2t +A 1 2t 1 ;% t 11 ; for 3 t T 1; we note that by recursive calculation, we have % 3 11 = 2 1 2 4 ; fort = 3; % 4 11 = 3 1 3 5 ; fort = 4; % 5 11 = 4 1 4 6 ; fort = 5; % 6 11 = 5 1 5 7 ; fort = 6; % 7 11 = 6 1 6 8 ; fort = 7; . . . then we have the (1; 1)-th element of A 1 2t +A 1 2t 1 for 3tT 1 has the general form of % t 11 = (t 1) 1 (t 1) t 1 : A.1.2 Asymptotics of GMM based on FOD using one level lag For the GMM based on FOD using one level lag, we have P 1L t1 = y t1 y 0 t1 y t1 1 y 0 t1 ; then for the numerator of p NT ^ FOD GMM;1L we have 1 p NT T1 X t=1 h y (f)0 t1 P 1L t1 u (f) t i = 1 p NT T1 X t=1 c 2 t w 0 t1 w 0 t;T1 P 1L t1 (u t u t+1;T ) = 1 p NT T1 X t=1 c 2 t w 0 t1 P 1L t1 u t 1 p NT T1 X t=1 c 2 t w 0 t1 P 1L t1 u t+1;T 1 p NT T1 X t=1 c 2 t w 0 t;T1 P 1L t1 u t + 1 p NT T1 X t=1 c 2 t w 0 t;T1 P 1L t1 u t+1;T II 1 +II 2 +II 3 +II 4 ; where w 0 t;T1 = 1 Tt P T1 s=t w s and u t+1;T = 1 Tt P T s=t+1 u s together with w t = (w 1t ;:::;w Nt ) 0 78 andw it =y it i 1 : For the first two terms, we have E (II 1 ) = 1 p NT T1 X t=1 c 2 t E w 0 t1 P 1L t1 u t = 0; E (II 2 ) = 1 p NT T1 X t=1 c 2 t E w 0 t1 P 1L t1 u t+1;T = 0; by the independence of y t1 and u t . 
Similarly, forI 3 ; we have E (II 3 ) = 1 p NT T1 X t=1 c 2 t E w 0 t;T1 P 1L t1 u t = 1 p NT T1 X t=1 c 2 t Tt T1 X s=t E w 0 s P 1L t1 u t = 1 p NT T1 X t=1 c 2 t Tt + 1 T1 X s=t E " st+1 w t1 + s X l=t sl u l ! 0 P 1L t1 u t # = 1 p NT T1 X t=1 c 2 t Tt + 1 T1 X s=t st E u 0 t P 1L t1 u t = 1 p NT T1 X t=1 2 u c 2 t Tt + 1 T X s=t st = 2 u (1 ) p NT T1 X t=1 c 2 t Tt + 1 +O 1 p NT ; where P T1 t=1 t t is convergent forj j< 1 and sincetr P 1L t1 = 1: 79 For the last term, we have E (II 4 ) = 1 p NT T1 X t=1 c 2 t E w 0 t;T1 P 1L t1 u t+1;T = 1 p NT T1 X t=1 c 2 t (Tt) (Tt + 1) X s 1 s 2 E w 0 s 1 P 1L t1 t u s 2 = 1 p NT T1 X t=1 c 2 t (Tt) (Tt + 1) X s 1 s 2 s 1 s 2 E u 0 s 2 P 1L t1 u s 2 = 1 p NT T1 X t=1 2 u c 2 t (Tt) (Tt + 1) T2 X s 2 =t+1 T1 X s 1 =s 2 s 1 s 2 = 1 p NT T1 X t=1 2 u c 2 t (Tt) (Tt + 1) T2 X s 2 =t+1 1 Ts 2 1 = 2 u 1 1 p NT T1 X t=1 c 2 t (Tt + 1) +O 1 p NT ; then we have E (II 3 +II 4 ) =O 1 p NT =o (1); (A.A.2) which is equivalent to E 1 p NT T1 X t=2 y (f)0 t1 P t u (f) t ! =o (1): which gives the asymptotic unbiasness of ^ FOD GMM;1L For the limiting distribution of ^ FOD GMM;1L ; p NT ^ FOD GMM;1L = 1 NT T1 X t=1 y (f)0 t1 P 1L t1 y (f) t1 ! 1 1 p NT T1 X t=1 y (f)0 t1 P 1L t1 u (f) t ; For the denominator, we have 1 NT T1 X t=1 y (f)0 t1 P 1L t1 y (f) t1 = 1 NT T1 X t=1 c 2 t w 0 t1 w 0 t;T1 P 1L t1 (w t1 w t;T1 ) = 1 NT T1 X t=1 c 2 t w 0 t1 P 1L t1 w t1 1 NT T1 X t=1 c 2 t w 0 t1 P 1L t1 w t;T1 1 NT T1 X t=1 c 2 t w 0 t;T1 P 1L t1 w t1 + 1 NT T1 X t=1 c 2 t w 0 t;T1 P 1L t1 w t;T1 A 1 +A 2 +A 3 +A 4 ; (A.A.3) 80 where w t = (w 1t ;:::;w Nt ) 0 withw it =w i;t1 +u it and w t+1;T = 1 Tt P T s=t+1 w is . For the first termA 1 ; we first note that, for 1tT 1; 1 N y 0 t1 y t1 = 1 N N X i=1 y 2 i;t1 ! p E y 2 1;t1 = 2 u 1 2 + 2 (1 ) 2 ; (A.A.4) and 1 N w 0 t1 y t1 = 1 N N X i=1 w i;t1 y i;t1 ! 
p E (w i;t1 y i;t1 ) = 2 u 1 2 ; (A.A.5) since E (w i;t1 y i;t1 ) =E w i;t1 w i;t1 + i 1 =E w 2 i;t2 = 2 u 1 2 ; and 1 N E ( w 0 t y t1 ) = 1 N N X i=1 1 Tt T1 X s=t E (w is y i;t1 ) = 1 N N X i=1 1 Tt T1 X s=t E (w is w i;t1 ) = 1 N N X i=1 1 Tt T1 X s=t st+1 E w 2 i;t1 = 2 u 1 2 Tt 1 Tt 1 ; Then, we have A 1 = 1 NT T1 X t=1 c 2 t w 0 t1 P 1L t1 w t1 = 1 T T1 X t=1 c 2 t 1 N w 0 t1 y t1 1 N y 0 t1 y t1 1 1 N y 0 t1 w t1 = 2 u 1 2 + 2 (1 ) 2 1 1 T T1 X t=1 c 2 t 1 N w 0 t1 y t2 1 N y 0 t2 w t1 +o p (1) ! p 2 u 1 2 2 2 u 1 2 + 2 (1 ) 2 1 ; (A.A.6) where the mean-square convergence can be obtained by following Alvarez and Arellano (2003). 81 ForA 2 ; we have A 2 = 1 NT T1 X t=1 c 2 t w 0 t1 P 1L t1 w t;T1 = 1 T T1 X t=1 c 2 t 1 N w 0 t1 y t1 1 N y 0 t1 y t1 1 1 N y 0 t1 w t;T1 = 2 u 1 2 2 u 1 2 + 2 (1 ) 2 1 1 T T1 X t=2 2 u 1 2 2 Tt 1 Tt 1 +o p (1) = o p (1); (A.A.7) similarly, we have A 3 =o p (1) andA 4 =o p (1): (A.A.8) Consequently, by substituting (A.A.6)-(A.A.8) to (A.A.3), we obtain. 1 NT T1 X t=1 y (f)0 t1 P 1L t1 y (f) t1 ! p 2 u 1 2 2 2 u 1 2 + 2 (1 ) 2 1 ; (A.A.9) as (N;T )!1: For the numerator, we have 1 p NT T1 X t=1 y (f)0 t1 P 1L t1 u (f) t = 1 p NT T1 X t=1 2 t w 0 t1 P 1L t1 u t 1 p NT T1 X t=1 2 t w 0 t1 P 1L t1 u tT 1 p NT T1 X t=1 2 t w 0 t1;T P 1L t1 u t + 1 p NT T1 X t=1 2 t w 0 t1;T P 1L t1 u tT = B 1 +B 2 +B 3 +B 4 ; (A.A.10) where w t1;T = 1 Tt (w t +::: +w T1 ) and u tT = 1 Tt (u t+1 +::: +u T ): By following Alvarez and Arellano (2003) and the derivation above, it can be shown that B j =o p (1); forj = 2; 3; 4; (A.A.11) andB 1 will contribute to the limiting distribution. ForB 1 ; we have B 1 = 1 p NT T1 X t=1 1 N w 0 t1 y t1 1 N y 0 t1 y t1 1 y 0 t1 u t +o p (1) = 2 u 1 2 2 u 1 2 + 2 (1 ) 2 1 1 p NT T1 X t=1 y 0 t2 u t1 +o p (1) ! d 2 u 1 2 2 u 1 2 + 2 (1 ) 2 1 N 0; 2 u 2 u 1 2 + 2 (1 ) 2 ;(A.A.12) 82 since Var 1 p NT T1 X t=1 y 0 t1 u t ! 
= 1 NT X s;t E y 0 t1 u t y 0 s1 u s = 1 NT N X i=1 X s;t E (y i;t1 u i;t y i;s1 u i;s ) = 1 NT N X i=1 T1 X t=1 E y 2 i;t1 u 2 i;t + 2 NT N X i=1 X s<t E (y i;t1 u i;t y i;s1 u i;s ) = 2 u 2 u 1 2 + 2 (1 ) 2 ; (A.A.13) then by combining (A.A.10)-(A.A.13), we have 1 p NT T1 X t=1 y (f)0 t1 P 1L t1 u (f) t ! d N 0; (1 2 ) 2 2 u 2 u 1 2 + 2 (1 ) 2 1 ! : (A.A.14) A.1.3 Asymptotics of GMM based on FOD using one first differenced lag For the GMM based on FOD using one first differenced lag, we have P 1FD t1 = y t1 y 0 t1 y t1 1 y 0 t1 ; then by following the derivation above, we have we have following the derivation above, we have p NT ^ FO GMM;1FD = 1 NT T1 X t=2 y (f)0 t1 P 1FD t1 y (f) t1 ! 1 1 p NT T1 X t=2 y (f)0 t1 P 1FD t1 u (f) t ; with 1 NT T1 X t=2 y (f)0 t1 P 1FD t1 y (f) t1 ! p 2 u 2 (1 + ) ; (A.A.15) since for 2tT; 1 N y 0 t1 y t1 = 1 N N X i=1 y 2 i;t1 ! p E y 2 1;t2 = 2 2 u 1 + ; and 1 N w 0 t1 y t1 = 1 N N X i=1 w i;t1 y i;t1 ! p E (w i;t1 y i;t1 ) = 2 u 1 + ; since E (w i;t1 y i;t1 ) = E [w i;t1 (w i;t1 w i;t2 )] = (1 )E w 2 i;t2 = 2 u 1 + : 83 For the convergence of the numerator, we notice that Var 1 p NT T1 X t=2 y 0 t1 u t ! = 1 NT X s;t E y 0 t1 u t y 0 s1 u s = 1 NT N X i=1 X s;t E (y i;t1 u i;t y i;s1 u i;s ) = 1 NT N X i=1 T1 X t=2 E y 2 i;t1 u 2 i;t + 2 NT N X i=1 X s<t E (y i;t1 u i;t y i;s1 u i;s ) = 2 u 2 2 u 1 + ; consequently, we have 1 p NT T1 X t=2 y (f)0 t1 P 1FD t1 u (f) t ! d N 0; 2 u 2 2 u 1 + : (A.A.16) A.1.4 Asymptotic bias of Arellano-Bond GMM using one level lag as instrument For the Arellano-Bond GMM estimator (2.12) based on first difference and using fixed lags as instruments, we shall assume that only one lag is used as instruments as in Hsiao and Zhang (2013), p NT ^ AB GMM;1L = 2 4 1 T 1 N N X i=1 W 1L i y i;1 ! 0 1 NT N X i=1 W 1L i HW 1L0 i ! 1 1 N N X i=1 W 1L i y i;1 ! 3 5 1 2 4 1 N N X i=1 W 1L i y i;1 ! 0 1 NT N X i=1 W 1L i HW 1L0 i ! 1 1 p NT N X i=1 W 1L i u i ! 3 5 : where W 1L i =diag (y i0 ;:::;y iT2 ). 
It can be shown that (Arellano and Bond (1991), and Hsiao and Zhang (2013)) 1 T 1 N N X i=1 W 1L i y i;1 ! 0 1 NT N X i=1 W 1L i HW 1L0 i ! 1 1 N N X i=1 W 1L i y i;1 ! ! p K; as (N;T )!1; whereK is a positive constant. 84 Also, we have 1 NT N X i=1 W 1L i HW 1L0 i = 1 N N X i=1 0 B B B B B @ 2y 2 i0 y i0 y i1 0 0 y i0 y i1 2y 2 i1 y i1 y i2 0 0 . . . . . . . . . . . . . . . . . . y i;T4 y i;T3 2y 2 i;T3 y i;T3 y i;T2 0 0 y i;T3 y i;T2 2y 2 i;T2 1 C C C C C A ; under the assumption of homoscedasticity and cross-sectionally independence, we have 1 N N X i=1 W 1L i HW 1L0 i ! p E W 1L i HW 1L0 i = 2 (1 ) 2 0 B B B B B @ 2 1 0 0 1 2 1 0 0 . . . . . . . . . . . . . . . . . . 1 2 1 0 0 1 2 1 C C C C C A + 2 u 1 2 0 B B B B B @ 2 0 0 2 0 0 . . . . . . . . . . . . . . . . . . 2 0 0 2 1 C C C C C A ; because, under strict stationarity condition, E y 2 it = 2 (1 ) 2 + 2 u 1 2 ; E (y i;t y i;t1 ) = 2 (1 ) 2 + 2 u 1 2 : As a result, the asymptotic bias of Arellano-Bond GMM using one lag as instruments is given by E h p NT ^ AB GMM;1L i = K 1 E 2 4 1 N N X i=1 W 1L i y i;1 ! 0 1 N N X i=1 W 1L i HW 1L0 i ! 1 1 p NT N X i=1 W 1L i u i ! 3 5 +o p (1); where E 2 4 1 N N X i=1 W 1L i y i;1 ! 0 1 N N X i=1 W 1L i HW 1L0 i ! 1 1 p NT N X i=1 W 1L i u i ! 3 5 = tr ( E W 1L i HW 1L0 i 1 E " 1 p NT N X i=1 W 1L i u i ! 1 N N X i=1 W 1L i y i;1 ! 0 #) +o p (1) = tr ( E W 1L i HW 1L0 i 1 E 1 N 1 p NT N X i=1 W 1L i u i y 0 i;1 W 1L0 i !) +o p (1); 85 and E 1 N 3=2 T 1=2 N X i=1 W 1L i u i y 0 i;1 W 1L0 i ! = 1 N 1=2 T 1=2 E W 1L i u i y 0 i;1 W 1L0 i ; where W 1L i u i y 0 i;1 W 1L0 i = 0 B B @ y i1 u i3 y i2 y i1 y i1 u i3 y i3 y i2 y i1 u i3 y i;T1 y i;T2 y i2 u i4 y i2 y i1 y i2 u i4 y i3 y i2 y i2 u i4 y i;T1 y i;T2 . . . . . . . . . . . . 
y i;T2 u iT y i2 y i1 y i;T2 u iT y i3 y i2 y i;T2 u iT y i;T1 y i;T2 1 C C A ; then E W 1L i u i y 0 i;1 W 1L0 i = 0 B B B B B B @ 2 2 (1 ) 2 + 2 u 1 2 E (y i1 u i3 y i3 y i2 ) E (y i1 u i3 y i;T1 y i;T2 ) 0 2 2 (1 ) 2 + 2 u 1 2 E (y i2 u i4 y i;T1 y i;T2 ) . . . . . . . . . . . . 0 0 2 2 (1 ) 2 + 2 u 1 2 1 C C C C C C A since, for 1tT 2; E (y it u i;t+2 y i;t+1 y it ) = E (u i;t+2 u i;t+1 ) (y i;t+1 y it ) (y it ) 2 = E u i;t+1 y i;t+1 y 2 it =E u 2 i;t+1 E (y it ) 2 = 2 u 2 (1 ) 2 + 2 u 1 2 : Consequently, by using the trace inequality for the product of matrices 9 and noticing that E W 1L i HW 1L0 i is a positive definite symmetric matrix with bounded eigenvalues 10 under our assumption andE W 1L i HW 1L0 i 0; we have that tr ( E W 1L i HW 1L0 i 1 E 1 N N X i=1 W 1L i u i y 0 i;1 W 1L0 i !) =O (T ): which gives E h p NT ^ AB GMM;1L i = 1 p NT O (T ) =O r T N ! : A.2 Mathematical proofs for Chapter 3 section 9 For any positive semi-definite matrices matricesA andB; we have min (A)tr (B)tr (AB) max (A)tr (B); (for example, Fang et al (1994)): 10 For 1 h T 1; the h-th eigenvalue of E W 1L i HW 1L0 i is given by by 2 2 (1 ) 2 + 2 2 u 1 2 + 2 2 (1 ) 2 + 2 u 1 2 cos h T (Kulkarni et al (1999)). 86 A.2.1 Mathematical proofs for Chapter 3.1 In order to derive the asymptotics of the MLE, for the numerator of (3.7), we have 1 NT N X i=1 ~ y 0 i;1 1 ~ y i;1 = 1 2 u 1 NT N X i=1 ~ y 0 i;1 " I T 2 2 u +T 2 1 T 1 0 T #! ~ y i;1 = 1 2 u 1 NT N X i=1 T1 X t=1 ~ y 2 it1 2 2 u 2 u +T 2 1 NT N X i=1 T1 X t=1 ~ y it1 ! 2 ; (A.A.21) since (e.g., Hsiao (2003)) 1 = 2 1 T 1 0 T + 2 u I T 1 = 1 2 u " I T 2 2 u +T 2 1 T 1 0 T # : For the first term of (A.A.21), we have 1 2 u 1 NT N X i=1 T1 X t=1 ~ y 2 it1 = 1 2 u 1 NT N X i=1 T1 X t=1 i 1 t 1 + t X s=1 ts u is ! 2 = 1 2 u 1 NT N X i=1 T1 X t=1 (1 t ) 2 (1 ) 2 2 i + 1 2 u 1 NT N X i=1 T1 X t=1 t X s=1 ts u is ! 
2 2 1 2 u 1 NT N X i=1 T1 X t=1 t X s=1 1 t 1 i ts u is = 1 (1 ) 2 2 u 1 N N X i=1 2 i + 1 2 u 1 NT N X i=1 T1 X t=1 t X s=1 ts u is ! 2 +o p (1) ! p 2 (1 ) 2 2 u + 1 1 2 ; (A.A.22) where the first equation comes from the fact that ~ y it = i 1 t 1 + t X s=1 ts u is ; and for the second term of (A.A.21), we note that for alli 1 T T1 X t=1 ~ y it1 = i 1 T T1 X t=1 1 t 1 + 1 T T1 X t=1 t X s=1 ts u is ; 87 with 1 N N X i=1 1 T T1 X t=1 ~ y it1 ! 2 = 1 N N X i=1 i 1 T T1 X t=1 1 t 1 + 1 T T1 X t=1 t X s=1 ts u is ! 2 = 1 N N X i=1 2 i 1 T T1 X t=1 1 t 1 ! 2 + 1 N N X i=1 1 T T1 X t=1 t X s=1 ts u is ! 2 1 1 1 N N X i=1 2 i 1 T T1 X t=1 t X s=1 ts u is +o p (1) ! p 2 (1 ) 2 ; as (N;T )!1: Then, for the second term of (A.A.21), we have T 2 2 u 2 u +T 2 1 N N X i=1 1 T T1 X t=1 ~ y it1 ! 2 ! p 2 2 u (1 ) 2 ; (A.A.23) consequently, by combining (A.A.22) and (A.A.23), we have 1 NT N X i=1 ~ y 0 i;1 1 ~ y i;1 ! p 1 1 2 ; as (N;T )!1: For the limit of the numerator of (3.7), we first note that E ~ y 0 i;1 1 i 1 T = E " ~ y 0 i;1 2 u I T 2 2 u I T 1 T 1 0 T I T 1 +T 2 2 u ! i 1 T # = 2 u E ~ y 0 i;1 i 1 T T 2 4 u 1 +T 2 2 u E ~ y 0 i;1 1 T i = 2 u 1 +T 2 2 u E ~ y 0 i;1 1 T i = 2 2 u 1 +T 2 2 u T1 X t=1 (1 t ) 1 since ~ y i;0 = 0; and from (3.3), ~ y it = i 1 t 1 + t X s=1 ts u is ; E (~ y it i ) = 2 1 t 1 ; 88 and E ~ y 0 i;1 1 u i = E " ~ y 0 i;1 2 u I T 2 4 u 1 T 1 0 T 1 +T 2 2 u ! u i # = 2 u E ~ y 0 i;1 u i 2 4 u 1 +T 2 2 u E ~ y 0 i;1 1 T 1 0 T u i = 2 4 u 1 +T 2 2 u E T X t;s=1 ~ y i;t1 u is ! = 2 4 u 1 +T 2 2 u E X ts+1 ~ y i;t1 u is ! = 2 2 u 1 +T 2 2 u T1 X s=1 1 Ts 1 = 2 2 u 1 +T 2 2 u T1 X t=1 1 t 1 ; consequently, we have E ~ y 0 i;1 1 ( i 1 T + u i ) = 2 2 u 1 +T 2 2 u T1 X t=1 (1 t ) 1 2 2 u 1 +T 2 2 u T1 X t=1 (1 t ) 1 = 0; this suggests that the MLE of is asymptotically unbiased eitherN orT or both tend to infinity. 
Furthermore, under assumptions 3.1-3.3, the variance of the MLE ^ MLE is given by V 1 MLE = 1 NT E @ 2 logL @ 2 = 1 NT N X i=1 E ~ y 0 i;1 1 ~ y i;1 = 1 1 2 : A.2.2 Mathematical proofs for Chapter 3.2 Conditional on v and ; the MLE ofvec ( 0 ) = = ( 11 ; 12 ; 21 ; 22 ) 0 is given by ^ = (" I 2 ~ Y 0 1;1 ~ Y 0 2;1 # 1 ~ V h I 2 ~ Y 1;1 ; ~ Y 2;1 i ) 1 (" I 2 ~ Y 0 1;1 ~ Y 0 2;1 # 1 ~ V ~ Y ) : (A.A.24) where ~ Y 1 = ~ Y 0 1;1 ;:::; ~ Y 0 1;N 0 with ~ Y 1;i = (~ y 1;i1 ;:::; ~ y 1;iT ) 0 ; ~ Y 1;1 = ~ Y 0 11;1 ;:::; ~ Y 0 1N;1 0 with ~ Y 1i;1 = (0; ~ y 1;i2 ;:::; ~ y 1;iT1 ) 0 , and 1 ~ V = 1 v Q + 1 J: Thus, p NT (^ ) = ( 1 NT " I 2 ~ Y 0 1;1 ~ Y 0 2;1 # 1 ~ V h I 2 ~ Y 1;1 ; ~ Y 2;1 i ) 1 ( 1 p NT " I 2 ~ Y 0 1;1 ~ Y 0 2;1 # 1 ~ V ~ V ) ; (A.A.25) It’s easy to see the first term on the right hand size of (A.A.25) converges to a nonsingular 89 constant matrix as (N;T )!1: Let 1 v = 11 v 12 v 12 v 22 v ; 1 = w 11 w 12 w 12 w 22 : Then, the numerator of (A.A.25) can be rewritten as 1 p NT P N i=1 # i where # i = (# i;1 ;# i;2 ;# i;3 ;# i;4 ) 0 with # i;1 = 11 v ~ Y 0 1i;1 QV 1i + 12 v ~ Y 0 1i;1 QV 2i +w 11 ~ Y 0 1i;1 J (1 T 1i + V 1i ) +w 12 ~ Y 0 1i;1 J (1 T 2i + V 2i ) # i;2 = 11 v ~ Y 0 2i;1 QV 1i + 12 v ~ Y 0 2i;1 QV 2i +w 11 ~ Y 0 2i;1 J (1 T 1i + V 1i ) +w 12 ~ Y 0 2i;1 J (1 T 2i + V 2i ) # i;3 = 21 v ~ Y 0 1i;1 QV 1i + 22 v ~ Y 0 1i;1 QV 2i +w 21 ~ Y 0 1i;1 J (1 T 1i + V 1i ) +w 22 ~ Y 0 1i;1 J (1 T 2i + V 2i ) # i;4 = 21 v ~ Y 0 2i;1 QV 1i + 22 v ~ Y 0 2i;1 QV 2i +w 21 ~ Y 0 2i;1 J (1 T 1i + V 1i ) +w 22 ~ Y 0 2i;1 J (1 T 2i + V 2i ); (A.A.26) We note that ~ y 1;it ~ y 2;it = ~ y 1i;t1 ~ y 2i;t1 + i + v it (A.A.27) = (I 2 ) 1 I 2 t i + t X j=0 j v i;tj ; then E ~ y 1;it1 ~ y 2;it1 0 i = (I 2 ) 1 I 2 t1 ;11 ;12 ;12 ;22 ; E ~ y 1;it1 ~ y 2;it1 v 0 i;tj = j1 v;11 v;12 v;12 v;22 for 1jt 1; and E ( ~ Y 0 1i;1 ~ Y 0 2i;1 J1 T ( 1i ; 2i ) ) =E ( T1 X j=0 ~ y 1;iT1j ~ y 2;iT1j ( 1i ; 2i ) ) = (I 2 ) 2 (T 1)I 2 T + T ;11 ;12 ;12 ;22 : Thus E ( ~ Y 0 1i;1 ~ Y 0 2i;1 Q 
(V 1i ;V 2i ) ) = 1 T E ( T1 X j=0 ~ y 1;iT1j ~ y 2;iT1j T1 X j=0 (v 1;iTj ;v 2;iTj ) ) = 1 T (I 2 ) 2 (T 1)I 2 T + T v;11 v;12 v;12 v;22 ; 90 and E ( ~ Y 0 1i;1 ~ Y 0 2i;1 J (V 1i ;V 2i ) ) = 1 T E ( T1 X j=0 ~ y 1;iT1j ~ y 2;iT1j T1 X j=0 (v 1;iTj ;v 2;iTj ) ) = 1 T (I 2 ) 2 (T 1)I 2 T + T v;11 v;12 v;12 v;22 : Let (I 2 ) 2 (T 1)I 2 T + T = a 11 a 12 a 21 a 22 : Then E 11 v ~ Y 0 1i;1 QV 1i + 12 v ~ Y 0 1i;1 QV 2i = 1 T a 11 11 v v;11 + 12 v v;21 +a 12 11 v v;12 + 12 v v;22 = a 11 T ; since 11 v v;11 + 12 v v;21 = 1 and 11 v v;12 + 12 v v;22 = 0 from the fact that v 1 v =I 2 : Also, E w 11 ~ Y 0 1i;1 J1 T 1i +w 12 ~ Y 0 1i;1 J1 T 2i = a 11 w 11 ;11 +w 12 ;21 +Ta 12 w 11 ;12 +w 12 ;22 ; and E w 11 ~ Y 0 1i;1 JV 1i +w 12 ~ Y 0 1i;1 JV 2i = 1 T fa 11 [w 11 v;11 +w 12 v;12 ] +a 12 [w 11 v;21 +w 12 v;22 ]g: Combining these two equations we have E w 11 ~ Y 0 1i;1 J (1 T 1i + V 1i ) +w 12 ~ Y 0 1i;1 J (1 T 2i + V 2i ) = a 11 T w 11 ( v;11 +T ;11 ) +w 12 ( v;12 +T ;21 ) + a 12 T w 11 ( v;12 +T ;12 ) +w 12 ( v;22 +T ;22 ) = a 11 T ; Thus E h 11 v ~ Y 0 1i;1 QV 1i + 12 v ~ Y 0 1i;1 QV 2i +w 11 ~ Y 0 1i;1 J (1 T 1i + V 1i ) +w 12 ~ Y 0 1i;1 J (1 T 2i + V 2i ) i = 0; 91 or E (# i;1 ) = 0: Similarly, we can show that# i;2 ;# i;3 ;# i;4 have zero mean. Following Magnus and Neudecker (2007, Ch16), we can establish that p NT (^ ) d !N (0; ); where =E 1 NT @ 2 logL @@ 0 ; and logL = NT 2 logj ~ V j 1 2 N X i=1 h ~ Y i I 2 ~ Y 1i;1 ; ~ Y 2i;1 i 1 ~ V h ~ Y i I 2 ~ Y 1i;1 ; ~ Y 2i;1 i 0 : A.3 Mathematical proofs for Chapter 4 section A.3.1 Proof of univariate model LetkAk = p tr (AA 0 ) denote the Frobenius norm, min (A) to denote the minimum eigenvalue ofA; and max (A) to denote the maximum eigenvalue ofA: We also letC denote a generic finite constant, whose value may vary case by case. Lemma 14 LetA andB are symmetric, positive semidefinite matrices. Then, the following holds. (a)j min (A) min (B)j;j max (A) max (B)jkABk: Also, if ^ AA ! 
p 0 and 1=C min (A) max (A)C; then, w.p.a.1, 1=2C min ^ A max ^ A 2C: (b) min (A +B) min (A) + min (B); max (A +B) max (A) + max (B): Lemma 14(a) is Lemma A0 of Newey and Windmeijer (2009, Supplementary appendix). Lemma 15 Let 3;v and 4;v be the third and fourth order cumulants ofv it : Also, let d t and d s be N 1 vectors containing the diagonal elements of p t and p s ; respectively, so that tr(p t ) = d 0 t 1 N = t and tr(p s ) = d 0 s 1 N = s; and d 0 t d s min (t;s); then under assumptions 4.1-4.3, for 92 lrt;pqs andts Cov v 0 l p t v r ; v 0 p p s v q = 8 > > < > > : 4;v d 0 t d s + 2 4 u s ( 4;v + 2 4 u )s ifl =r =p =q 4 v s ifl =p6=r =q 3;v E (d 0 t p s v q ) ifl =r =p6=q<t 0 otherwise andjE (d 0 t p s v q )j (st) 1=2 v : Similar argument can also be applied to it : Proof can be found at Alvarez and Arellano (2003). Lemma 16 Assumption 4.1-4.4, then the following holds for forward demeaning case 1 p NT T1 X t=2 x f0 t1 p t2 u f t = 1 p NT T1 X t=2 t 1 0 t1 p t2 u t 2 p +o p (1); Proof. For the numerator of the 2SLS estimator, we have 1 p NT T1 X t=2 x f0 t1 p t2 u f t = 1 p NT T1 X t=2 h t W 0 t1 p t2 u f t i 1 p NT T1 X t=2 h t W 0 tT p t2 u f t i = 1 p NT T1 X t=2 2 t 1 0 t1 p t2 u t + 1 p NT T1 X t=2 2 t 0 t1 p t2 u t 1 p NT T1 X t=2 2 t W 0 t1 p t2 u tT 1 p NT T1 X t=2 h 2 t W 0 tT p t2 u f t i = 1;NT + 2;NT + 3;NT + 4;NT ; (A.A.1) here u f t = t (u t u tT ) and u tT = 1 Tt+1 (u t+1 +::: +u T ) with u t = 1 v t + t t1 = t t1 : For the first term 1;NT , it will contribute to the limiting distribution. 
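The forward-demeaned error driving this decomposition subtracts a scaled mean of future observations. A minimal numerical sketch of why this transformation is convenient, assuming the standard Arellano–Bover forward orthogonal deviations weights c_t with c_t^2 = (T−t)/(T−t+1) (the text's normalization may differ): the (T−1)×T transform matrix has orthonormal rows and annihilates constants, so it removes the individual effect while keeping white-noise errors serially uncorrelated. The function name `fod_matrix` and the choice T = 6 are illustrative, not from the text.

```python
import numpy as np

def fod_matrix(T):
    """(T-1) x T forward orthogonal deviations matrix (standard Arellano-Bover form)."""
    A = np.zeros((T - 1, T))
    for t in range(T - 1):                      # 0-indexed period t corresponds to period t+1
        c_t = np.sqrt((T - t - 1) / (T - t))    # c_t^2 = (T-t)/(T-t+1) in 1-indexed notation
        A[t, t] = c_t
        A[t, t + 1:] = -c_t / (T - t - 1)       # subtract the mean of all future observations
    return A

T = 6
A = fod_matrix(T)
print(np.allclose(A @ A.T, np.eye(T - 1)))  # orthonormal rows: white errors stay white
print(np.allclose(A @ np.ones(T), 0))       # constants (individual effects) are annihilated
```

Both checks are exact up to floating point: row t has squared norm c_t^2 (1 + 1/(T−t)) = 1, distinct rows are orthogonal, and each row's coefficients sum to zero.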
For 2;NT ; we have 2;NT = 1 p NT T1 X t=2 2 t 0 t1 p t2 u t = 1 p NT T1 X t=2 2 t 0 t1 p t2 ( 1 v t + t t1 ) = 1 p NT T1 X t=2 1 2 t 0 t1 p t2 v t + 1 p NT T1 X t=2 2 t 0 t1 p t2 t p NT T1 X t=2 2 t 0 t1 p t2 t1 = 21;NT + 22;NT + 23;NT ; then E ( 21;NT ) =E ( 22;NT ) = 0; 93 by the independence of t1 and t ; and E ( 23;NT ) = p NT T1 X t=2 2 t E 0 t1 p t2 t1 = 2 p NT T1 X t=2 2 t (t 1) = 2 p +o p (1): Also, for the variance, we have Var ( 21;NT ) = 1 NT T1 X s;t=2 2 t 2 s E 0 t1 p t2 v t 0 s1 p s2 v s = 1 NT T1 X t=2 4 t E v 0 t p t2 0 t1 t1 p t2 v t = 2 v 2 NT T1 X t=2 (t 1) +o p (1) = o p (1); and Var ( 22;NT ) = 1 NT T1 X s;t=2 2 t 2 s E 0 t1 p t2 t 0 s1 p s2 s = 1 NT T1 X t=2 4 t E 0 t1 p t2 t 0 t p t2 t1 = 4 NT T1 X t=2 (t 1) = o p (1); 94 and Var ( 23;NT ) = 1 NT T1 X s;t=2 E 2 t 0 t1 p t2 t1 2 s 0 s1 p s2 s1 1 NT T1 X t=2 E 2 t 0 t1 p t2 t1 ! 2 = 1 NT T1 X t=2 2 4 t E 0 t1 p t2 t1 0 t1 p t2 t1 + 1 NT X s6=t 2 2 t 2 s 4 (t 1) (s 1) 1 NT T1 X t=2 2 t 2 (t 1) ! 2 = 1 NT T X t=2 2 4 t 4; (t 1) + 2 4 (t 1) + 1 NT T1 X t=2 2 t 2 (t 1) ! 2 1 NT T1 X t=2 2 t 2 (t 1) ! 2 = o p (1); since E 0 t1 p t2 t1 = 2 (t 1); and E 0 t1 p t2 t1 0 t1 p t2 t1 = X i 1 ;i 2 ;i 3 ;i 4 p (i 1 ;i 2 ) t2 p f(i 3 ;i 4 ) t2 E ( i 1 ;t1 i 2 ;t1 i 3 ;t1 i 4 ;t1 ) = N X i=1 p (i;i) t2 p (i;i) t2 E 4 i;t1 + X i 1 6=i 2 p (i 1 ;i 1 ) t2 p (i 2 ;i 2 ) t2 E 2 i 1 ;t1 E 2 i 2 ;t1 +2 X i 1 6=i 2 p (i 1 ;i 2 ) t2 p (i 1 ;i 2 ) t1 E 2 i 1 ;t1 E 2 i 2 ;t1 = 4; N X i=1 p (i;i) t2 p (i;i) t2 + 4 X i 1 6=i 2 p (i 1 ;i 1 ) t2 p (i 2 ;i 2 ) t2 + 2 4 X i 1 6=i 2 p (i 1 ;i 2 ) t2 p (i 1 ;i 2 ) t1 = 4; (t 1) + 4 (t 1) 2 + 2 (t 1) ; where P (i;j) t denotes (i;j)-th element of P t : Combining these gives us that 21;NT ! p 0; 22;NT ! p 0; 23;NT ! p 2 p ; 95 consequently, we have 2;NT ! 
p 2 p : (A.A.2) For 3;NT ; we have 3;NT = 1 p NT T1 X t=2 2 t W 0 t1 p t2 u tT = 1 p NT T1 X t=2 2 t W 0 t1 p t2 tT p NT T1 X t=2 2 t W 0 t1 p t2 t1 31;NT + 32;NT ; Obviously, we have E ( 31;NT ) =E ( 32;NT ) = 0; by the construction of forward demeaning. For the variance of 31;NT ; we have Var ( 31;NT ) = 1 NT T1 X s;t=2 2 t 2 s E W 0 t1 p t2 tT W 0 s1 p t2 sT = 1 NT T1 X t=2 4 t E W 0 t1 p t2 tT W 0 t1 p t2 tT + 2 NT X s<t 2 t 2 s E W 0 t1 p t2 tT W 0 s1 p s2 sT ; where, fort =s; E W 0 t1 p t2 tT W 0 t1 p t2 tT = 2 1 2 v + 2 Tt + 1 E W 0 t1 p t2 W t1 = 2 1 2 v + 2 Tt + 1 E 0 t1 p t2 t1 + 2 (t 1) Tt + 1 : and fort>s; E W 0 t1 p t2 tT W 0 s1 p s2 sT = E W 0 t1 p t2 E t ( tT 0 sT ) p s2 W s1 = 2 1 2 v + 2 Ts + 1 E W 0 t1 p s2 W s1 = 2 1 2 v + 2 Ts + 1 E " ts s1 + ts X l=0 l v tl + t1 ! 0 p s2 s1 + s1 # = ( 2 1 2 v + 2 ) ts Ts + 1 E 0 s1 p s2 s1 ; 96 since we haveW it = it + it fort>s: Then Var ( 31;NT ) = 1 NT T1 X t=2 4 t 2 (t 1) Tt + 1 + 1 NT X s<t 2 t 2 s ( 2 1 2 v + 2 ) ts Ts + 1 E 0 s1 p s2 s1 C NT X s<t ts Ts + 1 E 0 s1 p s2 s1 +O p logT N ; C T T1 X t=2 t (1 t1 ) Tt + 1 +o p (1) = O p 1 T =o p (1); asT!1; where the last inequality holds since E 0 s1 p s2 s1 CE 0 s1 s1 =C N X i=1 E 2 i;s1 = O p (N): Similarly, Var ( 32;NT ) = 1 NT T1 X s;t=2 2 t 2 s E W 0 t1 p t2 tT W 0 s1 p s2 sT = o p (1); Combining these results gives us that 3;NT =o p (1): (A.A.3) Finally, for 4;NT , we have 4;NT = 1 p NT T1 X t=2 h W 0 tT p t2 u f t i = 1 p NT T1 X t=2 2 t W 0 tT p t2 u t 1 p NT T1 X t=2 2 t W 0 tT p t2 u tT 41;NT + 42;NT ; 97 where jE ( 41;NT )j 1 p NT T1 X t=2 E 2 t W 0 tT p t2 u t = 1 p NT T1 X t=2 t Tt jE (W t p t2 u t +::: + W T1 p t2 u t )j C p NT T1 X t=2 1 Tt 1 + + + Tt 2 v + 2 (t 1) C r T N T1 X t=2 1 Tt C r T 3 N logT T = o p (1); under Assumption 4.4, where u t = (u 1t ;u 2t ;:::;u Nt ) 0 ; u tT = 1 Tt+1 (u t +::: + u T ) and U t = t (u t u tT ); and because E W 0 tT p t2 u t = 1 Tt E W 0 t p t2 u t + + W 0 T1 p t2 u t C Tt 1 + + + Tt 2 v + 2 (t 1); with forst; 
E (W 0 s p t2 u t ) = E ( 1 s + s ) 0 p t2 (v t + t t1 ) = 1 st E (v 0 t p t2 v t ) +E ( 0 t p t2 t ) C st 2 v + 2 (t 1): Also, jE ( 42;NT )j = 1 p NT T1 X t=2 t E W 0 tT p t2 u tT = 1 p NT T1 X t=2 t (Tt + 1) (Tt) T1 X s=t E W 0 tT p t2 u s C p NT T1 X t=2 1 (Tt) (Tt + 1) T1 X s=t 1 + + + Ts 2 v + 2 (t 1) = o p (1); 98 by using the similar argument above. For the variance of 41;NT and 42;NT ; we have Var ( 41;NT ) = 1 NT Var T1 X t=2 2 t W 0 tT p t2 u t ! = 1 NT T1 X s;t=2 2 t 2 s E W 0 tT p t2 u t W 0 sT p s2 u s E 1 p NT T1 X t=2 E 2 t W 0 t p t2 u t !! 2 = 1 NT T1 X s;t=2 2 t 2 s E W 0 tT p t2 u t W 0 sT p s2 u s +o p (1); where, forts; E W 0 tT p t2 u t W 0 sT p s2 u s = 1 Tt 1 Ts T1 X l=s T1 X m=t E (W 0 m p t2 u t W 0 l p s2 u s ); then Var ( 41;NT ) = 1 NT T1 X s;t=2 2 t 2 s (Tt) (Ts) T1 X l=s T1 X m=t E (W 0 m p t2 u t W 0 l p s2 u s ) +o p (1) = 1 NT T1 X t=2 4 t (Tt) 2 T1 X l;m=t E (W 0 m p t2 u t W 0 l p s2 u t ) + 2 NT X s<t 2 t 2 s (Tt) (Ts) T1 X l=s T1 X m=t E (W 0 m p t2 u t W 0 l p s2 u s ) +o p (1); where, forl;mt andt =s; E (W 0 m p t2 u t W 0 l p s2 u t ) = E ( 1 m + m ) 0 p t2 (v t + t t1 ) ( 1 l + l ) 0 p t2 (v t + t t1 ) = E ( 1 0 m p t2 v t + 1 0 m p t2 t 1 0 m p t2 t1 + 0 m p t2 v t + 0 m p t2 t 0 m p t2 t1 ) ( 1 0 l p t2 v t + 1 0 l p t2 t 1 0 l p t2 t1 + 0 l p t2 v t + 0 l p t2 t 0 l p t2 t1 ) = 2 1 E ( 0 m p t2 v t 0 l p t2 v t ) + 2 1 E ( 0 m p t2 t 0 l p t2 t ) + 1 E ( 0 m p t2 t 0 t p t2 v t ) + 2 2 1 E ( 0 m p t2 t1 0 l p t2 t1 ) + 1 E ( 0 t p t2 v t 0 l p t2 t ) +E ( 0 l p t2 v t 0 l p t2 v t ) +E ( 0 l p t2 t 0 l p t2 t ) + 2 E ( 0 l p t2 t1 0 l p t2 t1 ); 99 where the last 3 terms hold only whenl =m; we have E ( 0 m p t2 v t 0 l p t2 v t ) = E " mt t + m X s=t ms v s ! 
0 p t2 v t v 0 t p t2 lt t + l X s=t ls v s !# = l+m2t E ( 0 t p t2 v t v 0 t p t2 t ) + 2 l+m2t E (v 0 t p t2 v t v 0 t p t2 v t ) + min(l;m) X s=t l+m2s E (v s p t2 v t v 0 t p t2 v s ); then we have T1 X l;m=t E ( 0 m p t2 v t 0 l p t2 v t ) = T1 X l;m=t l+m2t E ( 0 t p t2 v t v 0 t p t2 t ) + T1 X l;m=t 2 l+m2t E (v 0 t p t2 v t v 0 t p t2 v t ) + T1 X l;m=t min(l;m) X s=t l+m2s E (v s p t2 v t v 0 t p t2 v s ) O p (t (Tt)): in view of the lemma (15) and the proof of the variance of 31;NT . Similar argument can also be applied to other terms, then we have 1 NT T1 X t=2 4 t (Tt) 2 T1 X l;m=t E (W 0 m p t2 u t W 0 l p s2 u s ) 1 NT T1 X t=2 4 t (Tt) 2 O p (t (Tt)) = o p (1); and forls;mt andt>s; E (W 0 m p t2 u t W 0 l p s2 u s ) = E ( 1 m + m ) 0 p t2 (v t + t t1 ) ( 1 l + l ) 0 p s2 (v s + s s1 ) = E ( 1 0 m p t2 v t + 1 0 m p t2 t 1 0 m p t2 t1 + 0 m p t2 v t + 0 m p t2 t 0 m p t2 t1 ) ( 1 0 l p s2 v s + 1 0 l p s2 s 1 0 l p s2 s1 + 0 l p s2 v s + 0 l p s2 s 0 l p s2 s1 ) = 2 1 E ( 0 m p t2 v t 0 l p s2 v s ) + 1 E ( 0 m p t2 t 0 l p s2 v s ) 2 1 E ( 0 m p t2 t1 0 l p s2 s ) E ( 0 m p t2 t1 0 l p s2 s ); we have that forts; P T1 l=s P T1 m=t E (W 0 m p t2 u t W 0 l p s2 u s ) is at most of orderO (ts) in view 100 of the above derivation; then Var ( 41;NT ) = C NT T1 X s;t=2 2 t 2 s (Tt) (Ts) O (ts) +o p (1); = o p (1); under Assumption 4.4. As a result, we have 41;NT =o p (1); by using the similar argument above, we have 42;NT =o p (1): Consequently, we have 4;NT =o p (1): (A.A.4) Substituting (A.A.2)-(A.A.4) to (A.A.1) gives the lemma as required. Lemma 17 Assumption 4.1-4.4, then the following hold for allt = 2;:::;T 1. (a) ~ B Nt B Nt 2 =O t 2 N ; (b) min ~ B Nt C > 0; (c) min (B Nt )C > 0; whereB Nt = 1 N P N i=1 z it2 z 0 it2 ; ~ B Nt = 1 N P N i=1 E z it2 z 0 it2 Proof. Part (a). 
we first notice that z it2 = i 1 t1 + 1 i (t 2) + i (t 2); where i = ~ i 1 ; i (t 2) = i;t2 ;:::; i;0 0 and i (t 2) = ( i;t2 ;:::; i;0 ) 0 : ~ B Nt = 1 N N X i=1 E z it2 z 0 it2 = 2 (1 ) 2 1 t1 1 0 t1 + 2 1 2 v t + 2 I t1 ; where t is the (t 1) (t 1) autoregressive matrix of it whose (j;k)-th element is given by jjkj 1 2 forj;k = 0; 2;:::;t 2. 101 By using the triangular inequality, we have ~ B Nt B Nt 2 = 1 N N X i=1 ( i 1 t1 + 1 i (t 2) + i (t 2)) i 1 0 t1 + 1 0 i (t 2) + 0 i (t 2) E ( i 1 t1 + 1 i (t 2) + i (t 2)) i 1 0 t1 + 1 0 i (t 2) + 0 i (t 2) 2 = 1 N N X i=1 8 < : i 1 t1 ( 1 0 i (t 2) + 0 i (t 2)) + i ( 1 i (t 2) + i (t 2)) 1 0 t1 + 2 1 [ i (t 2) 0 i (t 2)E ( i (t 2) 0 i (t 2))] + [ i (t 2) 0 i (t 2)E ( i (t 2) 0 i (t 2))] 9 = ; 2 C 1 N N X i=1 i 1 t1 ( 0 i (t 2) + 0 i (t 2)) 2 + 1 N N X i=1 2 1 [ i (t 2) 0 i (t 2)E ( i (t 2) 0 i (t 2))] 2 + 1 N N X i=1 [ i (t 2) 0 i (t 2)E ( i (t 2) 0 i (t 2))] 2 ; and note that 1 N N X i=1 i 1 t1 ( 1 0 i (t 2) + 0 i (t 2)) 2 = 2 1 N 2 N X i;j=1 tr 1 t1 ( i (t 2) + i (t 2)) ( 0 i (t 2) + 0 i (t 2)) 1 0 t1 = 2 1 N 2 N X i;j=1 tr ( i (t 2) + i (t 2)) ( 0 i (t 2) + 0 i (t 2)) 1 0 t1 1 t1 = (t 1) 2 1 N 2 N X i;j=1 tr (( i (t 2) + i (t 2)) ( 0 i (t 2) + 0 i (t 2))) = (t 1)t 2 1 N 1 p Nt N X i=1 ( i (t 2) + i (t 2)) ( 0 i (t 2) + 0 i (t 2)) 2 = O t 2 N ; and 1 N N X i=1 2 1 [ i (t 2) 0 i (t 2)E ( i (t 2) 0 i (t 2))] 2 = t 2 1 N 1 p Nt N X i=1 i (t 2) i (t 2) 0 E i (t 2) i (t 2) 0 2 = O t 2 N ; 102 and 1 N N X i=1 [ i (t 2) 0 i (t 2)E ( i (t 2) 0 i (t 2))] 2 = t 2 N 1 p Nt N X i=1 [ i (t 2) 0 i (t 2)E ( i (t 2) 0 i (t 2))] 2 = O t 2 N ; combining these gives us the results as required. Part (b). By Lemma 14(b), we have min ~ B Nt = min 2 (1 ) 2 1 t1 1 0 t1 + 2 v t + 2 I t1 min 2 v t C > 0; where the second inequality holds becauseJ t1 is an (t 1) (t 1) positive semi-definite matrix with rank at most 1 and the third inequality follows the fact thatE i (t) i (t) 0 is positive definite with trace of orderO (N): Part (c). 
From (a), if t 2 N ! p 0 as N goes to infinity for all t; then ~ B Nt B Nt ! p 0: Consequently, by using the result of Lemma 14(a), we have min (B Nt ) 1 2 C > 0; as required. Lemma 18 Assume Assumption 4.1-4.4 hold, then the followings hold as (N;T )!1 for for- ward orthogonal demeaning. (a) 1 NT P T1 t=2 P N i=1 x f it1 z 0 it2 P N i=1 z it2 z 0 it2 1 z it2 x f it1 =o p (1): (b) 1 p NT P T1 t=2 P N i=1 x f it1 z 0 it2 P N i=1 z it2 z 0 it2 1 z it2 u f it ! p 2 p : Proof. Part (a). Recall that X it = x f it = t W i;t1 t W i;tT ; W it = 1 it + it ; and W i;tT = 1 Tt T1 X s=t W i;s ; z it = i 1 t1 + 1 i (t 2) + i (t 2); 103 where i (t 2) = i;t2 ;:::; i0 0 ; i (t 2) = ( i;t2 ;:::; i0 ) 0 ; and it is an AR(1) process. We have 1 N E 0 @ x f it1 z 0 it2 1 N N X i=1 z it2 z 0 it2 ! 1 z it2 x f it1 1 A C N E x f2 it1 z 0 it2 z it2 ; by using part (c) of lemma (17). Also, E x f2 it1 z 0 it2 z it2 = E t W i;t1 t W i;tT 2 ( i 1 t1 + 1 i (t 2) + i (t 2)) 0 ( i 1 t1 + 1 i (t 2) + i (t 2)) = E 2 t W 2 i;t1 2 2 t W i;t1 W i;tT + 2 t W 2 i;tT 2 i (t 1) + P t2 s=0 W 2 is + 2 i 1 1 0 t1 i (t 2) + 2 i 1 0 t1 i (t 2) = (t 1)E 2 i 2 t E W 2 i;t1 +E 2 t W 2 i;t1 t2 X s=0 W 2 is ! 2 (t 1) 2 t E 2 i E W i;t1 W i;tT E 2 2 t W i;t1 W i;tT t2 X s=0 W 2 is ! + (t 1)E 2 i 2 t E W 2 i;tT +E 2 t W 2 i;tT t2 X s=0 W 2 is ! 
III 1 +III 2 +III 3 +III 4 +III 5 +III 6 ; where III 1 = (t 1)E 2 i 2 t E W 2 i;t1 = (t 1)E 2 i 2 t 2 1 2 v 1 2 + 2 ; and III 2 = 2 t t2 X s=0 E W 2 i;t1 W 2 is = 2 t t2 X s=0 E h 1 i;t1 + i;t1 2 1 i;s + i;s 2 i = 2 t t2 X s=0 E 2 1 2 i;t1 + 2 i;t1 + 2 1 i;t1 i;t1 2 1 2 i;s + 2 i;s + 2 1 i;s i;s = 2 t t2 X s=0 E 4 1 2 i;t1 2 i;s + 2 1 2 i;t1 2 i;s + 2 1 2 i;t1 2 i;s + 2 i;t1 2 i;s = 2 t t2 X s=0 4 1 E 2 i;t1 2 i;s + 2 t 2 2 1 2 v 2 1 2 + 4 (t 1); 104 and III 3 = 2 2 t (t 1)E 2 i E W i;t1 W i;tT = 2 2 t E ( 2 i ) (t 1) Tt T1 X s=t E (W i;t1 W i;s ) = 2 2 t E ( 2 i ) (t 1) Tt T1 X s=t E 1 i;t1 + i;t1 1 i;s + i;s = 2 2 t E ( 2 i ) 2 1 (t 1) Tt T1 X s=t E i;t1 i;s = 2 2 t E ( 2 i ) 2 1 (t 1) Tt 2 v 1 2 T1 X s=t st ; and III 4 = E 2 2 t W i;t1 W i;tT t2 X s=0 W 2 is ! = 2 2 t (Tt) 2 t2 X s 1 =0 T X s 2 =t E W i;t1 W 2 i;s 1 W i;s 2 = 2 2 t (Tt) 2 t2 X s 1 =0 T X s 2 =t E 1 i;t1 + i;t1 1 i;s 1 + i;s 1 2 1 i;s 2 + i;s 2 = 2 2 t (Tt) 2 t2 X s 1 =0 T X s 2 =t E 2 1 i;t1 i;s 2 2 1 2 i;s 1 + 2 i;s 1 = 2 2 t (Tt) 2 t2 X s 1 =0 T X s 2 =t E 4 1 i;t1 i;s 2 2 i;s 1 + 2 2 t 2 1 (Tt) 2 t2 X s 1 =0 T X s 2 =t E i;t1 i;s 2 E 2 i;s 1 = 2 2 t 4 1 (Tt) 2 t2 X s 1 =0 T X s 2 =t E i;t1 i;s 2 2 i;s 1 + 2 2 t 2 1 2 (Tt) T X s 2 =t E i;t1 i;s 2 1 ; and III 5 = E ( 2 i ) (t 1) 2 t (Tt) 2 E W 2 i;tT = E ( 2 i ) (t 1) 2 t (Tt) 2 T1 X s 1 ;s 2 =t E (W i;s 1 W i;s 2 ) = E ( 2 i ) (t 1) 2 t (Tt) 2 T X s=t E W 2 i;s + E ( 2 i ) (t 1) 2 t (Tt) 2 T1 X s 1 6=s 2 =t E 1 i;s 1 + i;s 1 1 i;s 2 + i;s 2 = E ( 2 i ) (t 1) 2 t (Tt) 2 T X s=t E W 2 i;s + E ( 2 i ) (t 1) 2 t (Tt) 2 T1 X s 1 6=s 2 =t E 2 1 i;s 1 i;s 2 = E ( 2 i ) (t 1) 2 t (Tt) 2 1 2 v 1 2 + 2 + E ( 2 i ) (t 1) 2 t (Tt) 2 2 1 2 v 1 2 T1 X s 1 6=s 2 =t js 1 s 2 j ; 105 and III 6 = E 2 t W 2 i;tT t2 X s=0 W 2 is ! 
= 2 t (Tt) 2 T1 X s 1 ;s 2 =t t2 X s 3 =0 E W i;s 1 W i;s 2 W 2 i;s 3 = 2 t (Tt) 2 T1 X s 1 ;s 2 =t t2 X s 3 =0 E 1 i;s 1 + i;s 1 1 i;s 2 + i;s 2 1 i;s 3 + i;s 3 2 = 2 t (Tt) 2 T1 X s 1 =t t2 X s 3 =0 E 1 i;s 1 + i;s 1 2 1 i;s 3 + i;s 3 2 + 2 t (Tt) 2 T1 X s 1 6=s 2 =t t2 X s 3 =0 E 1 i;s 1 + i;s 1 1 i;s 2 + i;s 2 1 i;s 3 + i;s 3 2 = 2 t (Tt) 2 T1 X s 1 =t t2 X s 3 =0 E 4 1 2 i;s 1 2 i;s 3 + 2 1 2 i;s 1 2 i;s 3 + 2 1 2 i;s 1 2 i;s 3 + 2 i;s 1 2 i;s 3 + 2 t (Tt) 2 T1 X s 1 6=s 2 =t t2 X s 3 =0 E 4 1 i;s 1 i;s 2 2 i;s 3 + 2 1 i;s 1 i;s 2 2 i;s 3 = 2 t 4 1 (Tt) 2 T1 X s 1 =t t2 X s 3 =0 E 2 i;s 1 2 i;s 3 + 2 t (t 1) (Tt) 2 2 1 2 v 2 1 2 + 4 : In sum, for the termsIII 1 toIII 6 above, we can observe that they are of orderO (t 1) at most, consequently, 1 N E 0 @ x f it1 z 0 it2 1 N N X i=1 z it2 z 0 it2 ! 1 z it2 x f it1 1 A C N (t 1); and 1 NT T1 X t=2 N X i=1 E 0 @ x f it1 z 0 it2 N X i=1 z it2 z 0 it2 ! 1 z it2 x f it1 1 A C N 2 T T1 X t=2 N X i=1 (t 1) = O p T N =o p (1); (A.A.5) under Assumption 4.4. 106 Moreover, for the variance, we have Var 0 @ 1 NT T1 X t=2 N X i=1 x f it1 z 0 it2 N X i=1 z it2 z 0 it2 ! 1 z it2 x f it1 1 A = 1 N 2 T 2 T1 X s;t=2 N X i;j=1 E 0 @ x f2 it1 z 0 it2 N X i=1 z it2 z 0 it2 ! 1 z it2 x f2 js1 z 0 js2 N X i=1 z is2 z 0 is2 ! 1 z js2 1 A +o p (1) (A.A.6) = 1 N 2 T 2 T1 X s;t=2 N X i=1 E 0 @ x f2 it1 z 0 it2 N X i=1 z it2 z 0 it2 ! 1 z it2 x f2 is1 z 0 is2 N X i=1 z is2 z 0 is2 ! 1 z is2 1 A + 1 N 2 T 2 T1 X s;t=2 X i6=j E 0 @ x f2 it1 z 0 it2 N X i=1 z it2 z 0 it2 ! 1 z it2 x f2 js1 z 0 js2 N X i=1 z is2 z 0 is2 ! 
1 z js2 1 A +o p (1) (A.A.7) C N 4 T 2 T1 X s;t=2 N X i=1 E x f2 it1 z 0 it2 z it2 x f2 is1 z 0 is2 z is2 + C N 4 T 2 T X s;t=2 X i6=j E x f2 it1 z 0 it2 z it2 E x f2 js1 z 0 js2 z js2 +o p (1) = o p (1); (A.A.8) under assumptions 4.1-4.4 and as long as the eighth moments of it and" it are finite, where the last inequality holds by using the derivations above, and because C N 4 T 2 T1 X s;t=2 N X i=1 E x f2 it1 z 0 it2 z it2 x f2 is1 z 0 is2 z is2 C N 4 T 2 T1 X s;t=2 N X i=1 h E x f8 it1 E z 0 it2 z it2 4 E x f8 is1 E z 0 is2 z is2 4 i 1=4 = C N 4 T 2 N X i=1 T1 X t=2 h E x f8 it1 E z 0 it2 z it2 4 i 1=4 ! 2 = o p (1): and E x f8 it1 = E h t W i;t1 t W i;tT 8 i C 8 t E W 8 i;t1 + W 8 i;tT C 8 t E 8 1 8 i;t1 + 8 i;t1 = O p (1); 107 Consequently, combining (A.A.5) and (A.A.8) gives 1 NT T1 X t=2 N X i=1 x f it1 z 0 it2 N X i=1 z it2 z 0 it2 ! 1 z it2 x f it1 =o p (1); as required. Part (b), we first note that E 0 @ x f it1 z 0 it2 N X i=1 z it2 z 0 it2 ! 1 z it2 u f it 1 A = E 2 4 t W i;t1 t W i;tT z 0 it2 N X i=1 z it2 z 0 it2 ! 1 z it2 1 v f it + f it f i;t1 3 5 = E 2 4 t i;t1 z 0 it2 N X i=1 z it2 z 0 it2 ! 1 z it2 i;t1 3 5 t E 0 @ W i;tT z 0 it2 N X i=1 z it2 z 0 it2 ! 1 z it2 1 v f it + f it f i;t1 1 A I 1 +I 2 ; (A.A.9) and 1 p NT T1 X t=2 N X i=1 I 1 = 1 p NT T1 X t=2 N X i=1 t E 2 4 i;t1 z 0 it2 N X i=1 z it2 z 0 it2 ! 1 z it2 i;t1 3 5 = 2 p NT T1 X t=2 t E 2 4 N X i=1 z 0 it2 N X i=1 z it2 z 0 it2 ! 1 z it2 3 5 = 2 p NT T1 X t=2 t (t 1) ! p 2 p ; (A.A.10) 108 and Var 1 p NT T1 X t=2 N X i=1 I 1 ! = 2 NT Var 0 @ T1 X t=2 t N X i=1 2 i;t1 z 0 it2 N X i=1 z it2 z 0 it2 ! 1 z it2 1 A = 2 NT T1 X s;t=2 t s N X i 1 ;i 2 =1 E 0 @ 2 i 1 ;t1 z 0 it2 N X i=1 z it2 z 0 it2 ! 1 z it2 2 i 2 ;s1 z f0 i 2 s N X i=1 z f i 2 s z f0 i 2 s ! 1 z f i 2 s 1 A 2 4 +o p (1) = 2 NT T1 X s;t=2 t s N X i=1 E 0 @ 2 i;t1 z f0 it N X i=1 z f it z f0 it ! 1 z f it 2 i;s1 z f0 is N X i=1 z f is z f0 is ! 
1 z f is 1 A + 2 NT T1 X s;t=2 t s X i 1 6=i 2 E 0 @ 2 i;t1 z f0 i 1 t N X i=1 z f it z f0 it ! 1 z f i 1 t 1 A E 0 @ 2 i 2 ;s1 z 0 i 2 t2 N X i=1 z it2 z 0 it2 ! 1 z i 2 t2 1 A 2 4 +o p (1) = 2 4; NT T1 X t=2 2 t N X i=1 E 0 @ z 0 it2 N X i=1 z it2 z 0 it2 ! 1 z it2 z 0 it2 N X i=1 z it2 z 0 it2 ! 1 z it2 1 A + 2 4 NT T1 X s6=t t s N X i=1 E 0 @ z 0 it2 N X i=1 z it2 z 0 it2 ! 1 z it2 z 0 is2 N X i=1 z is2 z 0 is2 ! 1 z is2 1 A + 2 4 2 4 +o p (1) = o p (1); (A.A.11) since 2 4; NT T1 X t=2 2 t N X i=1 E 0 @ z 0 it2 N X i=1 z it2 z 0 it2 ! 1 z it2 z 0 it2 N X i=1 z it2 z 0 it2 ! 1 z it2 1 A C N 3 T T X t=2 N X i=1 E z 0 it2 z it2 z 0 it2 z it2 C N 3 T T X t=2 N X i=1 (t 2) 2 = O p T 2 N 2 =o p (1): As a result, by combining (A.A.10) and (A.A.11), we have 1 p NT T1 X t=2 N X i=1 t i;t1 z 0 it2 N X i=1 z it2 z 0 it2 ! 1 z it2 i;t1 ! p 2 p : (A.A.12) 109 For the second termI 2 ; we have I 2 = t W i;tT z 0 it2 N X i=1 z it2 z 0 it2 ! 1 z it2 1 v f it + f it f i;t1 (A.A.13) = t 1 Tt T X s=t W is z 0 it2 N X i=1 z it2 z 0 it2 ! 1 z it2 v f it + t Tt T X s=t W is z 0 it2 N X i=1 z it2 z 0 it2 ! 1 z it2 f it t Tt T X s=t W is z 0 it2 N X i=1 z it2 z 0 it2 ! 1 z it2 f i;t1 = I 21 +I 22 +I 23 ; (A.A.14) then, E (I 21 ) = t 1 Tt T1 X s=t E 0 @ W is z 0 it2 N X i=1 z it2 z 0 it2 ! 1 z it2 v f it 1 A = 2 t 1 Tt T1 X s=t E 0 @ ( 1 is + is )z 0 it2 N X i=1 z it2 z 0 it2 ! 1 z it2 (v it v i;tT ) 1 A = 2 t 1 Tt T1 X s=t E 0 @ st+1 1 i;t1 + st v it z 0 it2 N X i=1 z it2 z 0 it2 ! 1 z it2 v it 1 A 2 t 2 1 (Tt) 2 T1 X s 1 =t T1 X s 2 =t+1 E 0 @ i;s 1 z 0 it2 N X i=1 z it2 z 0 it2 ! 1 z it2 v i;s 2 1 A = 2 t 2 v 1 Tt T1 X s=t st E 0 @ z 0 it2 N X i=1 z it2 z 0 it2 ! 1 z it2 1 A 2 t 2 1 (Tt) 2 T1 X s 2 s 1 =t+1 E 0 @ s 1 s 2 +1 i;s 2 1 + s 1 s 2 v i;s 2 z 0 it2 N X i=1 z it2 z 0 it2 ! 1 z it2 v i;s 2 1 A = 2 t 2 v 1 Tt T1 X s=t st E 0 @ z 0 it2 N X i=1 z it2 z 0 it2 ! 1 z it2 1 A 2 t 2 v 2 1 (Tt) 2 T1 X s 2 s 1 =t+1 s 1 s 2 E 0 @ z 0 it2 N X i=1 z it2 z 0 it2 ! 
1 z it2 1 A ; (A.A.15) 110 consequently, we have 1 p NT T1 X t=2 N X i=1 E (I 21 ) = 1 p NT T1 X t=2 N X i=1 2 t 2 v Tt T1 X s=t st E 0 @ z 0 it2 N X i=1 z it2 z 0 it2 ! 1 z it2 1 A 1 p NT T1 X t=2 N X i=1 2 t 2 v (Tt) 2 T1 X s 2 s 1 =t+1 s 1 s 2 E 0 @ z 0 it2 N X i=1 z it2 z 0 it2 ! 1 z it2 1 A = 1 p NT T1 X t=2 2 t 2 v (t 1) Tt T1 X s=t st 1 p NT T1 X t=2 2 t 2 v (t 1) (Tt) 2 T1 X s 2 s 1 =t+1 s 1 s 2 = o p (1); (A.A.16) under Assumption 4.4. For variance, we have Var 1 p NT T1 X t=2 N X i=1 I 21 ! = 1 NT T1 X s;t=2 N X i 1 ;i 2 =1 t s (Tt) (Ts) T1 X s 1 =t 1 T1 X ss=t 2 E 0 @ W i 1 s 1 z 0 i 1 t 1 2 P N i=1 z it 1 2 z 0 it 1 2 1 z i 1 t 1 2 v f i 1 t 1 W i 2 s 2 z 0 i 2 t 2 2 P N i=1 z it 2 2 z 0 it 2 2 1 z i 2 t 2 2 v f i 2 t 2 1 A +o p (1) (A.A.17) = 1 NT T1 X s;t=2 N X i=1 t s (Tt) (Ts) T1 X s 1 =t 1 T1 X ss=t 2 E 0 @ W is 1 z 0 it 1 2 P N i=1 z it 1 2 z 0 it 1 2 1 z it 2 2 v f it 1 W is 2 z 0 it 2 2 P N i=1 z it 2 2 z 0 it 2 2 1 z it 2 2 v f it 2 1 A +o p (1) (A.A.18) = 1 NT N X i=1 E 2 4 T1 X t=2 T1 X s=t t (Tt) W is z 0 it 1 2 N X i=1 z it 1 2 z 0 it 1 2 ! 1 z it 2 2 v f it 3 5 2 +o p (1) C N N X i=1 T1 X t=2 T1 X s=t 2 t Tt E 0 @ W 2 is v f2 it 0 @ z 0 it 1 2 N X i=1 z it 1 2 z 0 it 1 2 ! 1 z it 2 2 1 A 2 1 A +o p (1) C N 3 N X i=1 T1 X t=2 E z 0 it 1 2 z it 2 2 2 2 t Tt T1 X s=t E W 2 is v f2 it +o p (1) C N 3 N X i=1 T1 X t=2 E z 0 it 1 2 z it 2 2 2 2 t Tt T1 X s=t h E W 4 is E v f4 it i 1=2 +o p (1) C N 3 N X i=1 T1 X t=2 (t 1) 2 +o p (1) = o p (1); (A.A.19) 111 as a result, we have (from (A.A.16) and (A.A.19)) 1 p NT T X t=2 N X i=1 I 21 =o p (1): (A.A.20) ForI 22 ; we have E (I 22 ) = t Tt T1 X s=t E 0 @ W is z 0 it 1 2 N X i=1 z it 1 2 z 0 it 1 2 ! 1 z it 2 2 ( it i;tT ) 1 A = t 2 Tt E 0 @ z 0 it 1 2 N X i=1 z it 1 2 z 0 it 1 2 ! 1 z it 2 2 1 A t 2 (Tt + 1) E 0 @ z 0 it 1 2 N X i=1 z it 1 2 z 0 it 1 2 ! 1 z it 2 2 1 A = t 2 (Tt) (Tt + 1) E 0 @ z 0 it 1 2 N X i=1 z it 1 2 z 0 it 1 2 ! 
1 z it 2 2 1 A ; consequently, we have 1 p NT T1 X t=2 N X i=1 E (I 22 ) = 1 p NT T1 X t=2 N X i=1 t 2 (Tt) (Tt + 1) E 0 @ z 0 it 1 2 N X i=1 z it 1 2 z 0 it 1 2 ! 1 z it 2 2 1 A = 1 p NT T1 X t=2 t 2 (t 1) (Tt) (Tt + 1) = o p (1); 112 and Var 1 p NT T1 X t=2 N X i=1 I 22 ! = 1 NT T1 X t 1 ;t 2 =2 N X i 1 ;i 2 =1 t 1 t 2 (Tt 1 ) (Tt 2 ) T1 X s 1 =t 1 T1 X s 2 =t 2 E 0 @ W i 1 s 1 z 0 i 1 t 1 2 P N i=1 z it 1 2 z 0 it 1 2 1 z i 1 t 1 2 ( i 1 t 1 i 1 ;t 1 T ) W i 2 s 2 z 0 i 2 t 2 2 P N i=1 z it 2 2 z 0 it 2 2 1 z i 2 t 2 2 ( i 2 t 2 i 2 ;t 2 T ) 1 A +o p (1) = 1 NT T1 X t 1 ;t 2 =2 N X i=1 t 1 t 2 (Tt 1 ) (Tt 2 ) T1 X s 1 =t 1 T1 X s 2 =t 2 E 0 @ W is 1 z 0 it 1 2 P N i=1 z it 1 2 z 0 it 1 2 1 z it 2 2 ( it 1 i;t 1 T ) W is 2 z 0 it 2 2 P N i=1 z it 2 2 z 0 it 2 2 1 z it 2 2 ( it 2 i;t 2 T ) 1 A +o p (1) = 1 NT N X i=1 E 2 4 T1 X t=2 T1 X s=t t Tt 0 @ W is ( it i;tT )z 0 it 1 2 N X i=1 z it 1 2 z 0 it 1 2 ! 1 z it 2 2 1 A 3 5 2 +o p (1) C N N X i=1 T1 X t=2 T1 X s=t 2 t Tt E 0 @ W 2 is ( it i;tT ) 2 0 @ z 0 it 1 2 N X i=1 z it 1 2 z 0 it 1 2 ! 1 z it 2 2 1 A 2 1 A +o p (1) C N 3 N X i=1 T1 X t=2 1 Tt E z 0 it 1 2 z it 2 2 2 T1 X s=t E W 2 is ( it i;tT ) 2 +o p (1) C N 3 N X i=1 T1 X t=2 1 Tt E z 0 it 1 2 z it 2 2 2 T1 X s=t E W 4 is E ( it i;tT ) 4 1=2 +o p (1) C N 3 N X i=1 T1 X t=2 (t 1) 2 +o p (1) = o p (1): Consequently, we have 1 p NT T1 X t=2 N X i=1 I 22 =o p (1): (A.A.21) Similarly, we have 1 p NT T1 X t=2 N X i=1 I 23 =o p (1): (A.A.22) Combining (A.A.20)-(A.A.22), we have 1 p NT T1 X t=2 N X i=1 t W i;tT z 0 it 1 2 N X i=1 z it 1 2 z 0 it 1 2 ! 1 z it 2 2 1 v f it + f it f i;t1 =o p (1); (A.A.23) 113 and combining (A.A.12) and (A.A.23), we have 1 p NT T1 X t=2 N X i=1 x f it1 z 0 it2 N X i=1 z it2 z 0 it2 ! 1 z it2 u f it ! p 2 p ; as required. Lemma 19 Assume Assumption 4.1-4.4 hold, then the followings hold as (N;T )!1 for first difference. 
(a) 1 NT P T t=3 P N i=1 y it1 z 0 it3 P N i=1 z it3 z 0 it3 1 z it3 y it1 =o p (1): (b) 1 p NT P T t=3 P N i=1 y it1 z 0 it3 P N i=1 z it3 z 0 it3 1 z it3 u it ! p ( 2 1 2 v + (1 + 2 ) 2 ) p : Proof. Part (a), same as above. Part (b), we notice that E 0 @ 1 p NT T X t=3 N X i=1 y it1 z 0 it3 N X i=1 z it3 z 0 it3 ! 1 z it3 u it 1 A = 1 p NT T X t=3 N X i=1 E 0 @ (W i;t1 W i;t2 ) ( 1 v it + it it1 )z 0 it3 N X i=1 z it3 z 0 it3 ! 1 z it3 1 A = 1 p NT T X t=3 N X i=1 E 0 @ W i;t1 ( 1 v it it it1 )z 0 it3 N X i=1 z it3 z 0 it3 ! 1 z it3 1 A p NT T X t=3 N X i=1 E 0 @ 2 i;t2 z 0 it3 N X i=1 z it3 z 0 it3 ! 1 z it3 1 A = 1 p NT T X t=3 N X i=1 E 0 @ 2 1 i;t1 v i;t1 (1 + ) 2 i;t1 z 0 it3 N X i=1 z it3 z 0 it3 ! 1 z it3 1 A p NT T X t=3 N X i=1 E 0 @ 2 i;t2 z 0 it3 N X i=1 z it3 z 0 it3 ! 1 z it3 1 A ! p 2 1 2 v + (1 + 2 ) 2 p ; and the mean-square convergence follows from the previous proof of (18). Lemma 20 Assume Assumption 4.1-4.4 hold, then the following holds as (N;T )!1; 1 NT T X t=2 W 0 t2 p t2 W t2 ! p 2 1 2 v 1 2 + 2 : Proof. To prove this result, we notice that W t = Y t where = ( 1 ;:::; N ) 0 with i = ~ i 1 : 114 Let t be theN 1 vector of errors of the population linear projection of on Z t ; then we have t = Z t t with t = [E (Z it Z 0 it )] 1 E (Z it i ): By using the decomposition, we have W it = 1 it + it ;y it = i + 1 it + it ; z it = i 1 t1 + 1 i (t 2) + i (t 2); where i (t 2) = i;t2 ;:::; i0 0 ; i (t 2) = ( i;t2 ;:::; i0 ) 0 ; and it is an AR(1) process. 
Let t be the (t 1) (t 1) autoregressive matrix of it whose (j;k)-th element is given by jjkj 1 2 forj;k = 0; 2;:::;t 2; then we have E (z it z 0 it ) = 2 1 t1 1 0 t1 + 2 1 2 v t + 2 I t1 ; E (z it i ) = 2 1 t1 : To get an exact expression of the inverse ofE (z it z 0 it ); we notice that [E (z it z 0 it )] 1 = 2 1 t1 1 0 t1 + 2 1 2 v t + 2 I t1 1 = 2 1 2 v t + 2 I t1 1 2 ( 2 1 2 v t + 2 I t1 ) 1 1 t1 1 0 t1 ( 2 1 2 v t + 2 I t1 ) 1 1 + 2 1 0 t1 ( 2 1 2 v t + 2 I t1 ) 1 1 t1 ; and 2 1 2 v t + 2 I t1 1 = 2 1 2 v 1 t { 1 1 +{ 1 [(t 1) + 2 (t 3)] 2 t ; 115 by using the formula of Miller (1981), where{ 1 = 2 2 1 2 v . Substituting back, we have [E (z it z 0 it )] 1 = 2 1 2 v 1 t { 1 1 +{ 1 [(t 1) + 2 (t 3)] 2 t 2 2 1 2 v 1 t { 1 1+{ 1[(t1)+ 2 (t3)] 2 t 1 t1 1 0 t1 2 1 2 v 1 t { 1 1+{ 1 [(t1)+ 2 (t3)] 2 t 1 + 2 1 0 t1 2 1 2 v 1 t { 1 1+{ 1 [(t1)+ 2 (t3)] 2 t 1 t1 = 2 1 2 v 1 t 2 1 2 v { 1 1 + { 1 (1 2 ) [(t 1) + 2 (t 3)] 2 t + 2 4 1 4 v { 1 1+{ 1 [(t1)+ 2 (t3)] 1 t 1 t1 1 0 t1 2 t 2 4 1 4 v 1 t 1 t1 1 0 t1 1 t 1 + 2 2 1 2 v 1 2 + (t 1) (1 ) 2 { 1 1+{ 1 [(t1)+ 2 (t3)] 2 + (1 ) 2 (t 3) + 2 4 1 4 v { 1 1+{ 1 [(t1)+ 2 (t3)] 2 t 1 t1 1 0 t1 1 t 2 4 1 4 v { 1 1+{ 1 [(t1)+ 2 (t3)] 2 2 t 1 t1 1 0 t1 2 t 1 + 2 2 1 2 v 1 2 + (t 1) (1 ) 2 { 1 1+{ 1 [(t1)+ 2 (t3)] 2 + (1 ) 2 (t 3) then t = 2 1 t1 1 0 t1 + 2 1 2 v t + 2 I t1 1 2 i 1 t1 = 2 1 2 v 2 i 1 t 1 t1 2 v 2 i { 1 1 +{ 1 [(t 1) + 2 (t 3)] 2 t 1 t1 + 4 i 4 v { 1 1+{ 1 [(t1)+ 2 (t3)] 1 t 1 t1 1 0 t1 2 t 1 t1 4 i 4 v 1 t 1 t1 1 0 t1 1 t 1 t1 1 + 2 i 2 v 1 2 + (t 1) (1) 2 { 1 1+{ 1 [(t1)+ 2 (t3)] 2 + (1 ) 2 (t 3) + 4 i 4 v { 1 1+{ 1 [(t1)+ 2 (t3)] 2 t 1 t1 1 0 t1 1 t 1 t1 4 i 4 v { 1 1+{ 1 [(t1)+ 2 (t3)] 2 2 t 1 t1 1 0 t1 2 t 1 t1 1 + 2 i 2 v 1 2 + (t 1) (1 ) 2 { 1 1+{ 1 [(t1)+ 2 (t3)] 2 + (1 ) 2 (t 3) = { 2 1 t 1 t1 { 2 ~ { t 2 t 1 t1 +{ 1 t { 2 2 ~ { t 2 + (1 ) 2 (t 3) 1 2 + (t 1) (1 ) 2 1 t 1 t1 +{ 1 t { 2 2 ~ { t 1 2 + (t 1) (1 ) 2 ~ { 2 t 2 + (1 ) 2 (t 3) 2 t 1 t1 = { 2 1 t 1 t1 { 2 ~ { t 2 t 1 t1 +{ 2 { 1 t 1 1 t 1 t1 { 2 ~ { 
t { 1 t 1 2 t 1 t1 = { 2 { 1 t 1 t 1 t1 { 2 ~ { t { 1 t 2 t 1 t1 since { 1 t { 2 ~ { t 2 + (1 ) 2 (t 3) 1 2 + (t 1) (1 ) 2 ={ 1 t 1 116 where{ 2 = 2 2 1 2 v ; { t = 1 +{ 2 1 2 + (t 1) (1 ) 2 ~ { t 2 + (1 ) 2 (t 3) ; ~ { t = { 1 1 +{ 1 [(t 1) + 2 (t 3)] : and{ t =O p (t) and ~ { t =O p 1 t : Consequently, we have, for theith element of t ; it = i z 0 it t = i 1 1 0 t1 t 0 i (t 2) t 0 i (t 2) t : It’s obvious that E ( it ) = 0 since it is the linear combination of zero mean random variables and by the independence of i ; it and it . Also, forE ( 2 it ); we note; E 2 it = 2 1 1 0 t1 t 2 + 2 1 0 t E [ i (t 2) 0 i (t 2)] t + 0 t E [ i (t 2) 0 i (t 2)] t ; where E [ i (t 2) 0 i (t 2)] = 2 v t and E [ i (t 2) 0 i (t 2)] = 2 I t1 : since i;t = i;t1 +v it by construction. For the first term 2 1 1 0 t1 t 2 ; we notice that 1 0 t1 t = { 2 { 1 t 1 0 t1 1 t 1 t1 { 2 ~ { t { 1 t 1 0 t1 2 t 1 t1 = { 1 t { 2 1 2 + (t 1) (1 ) 2 { 2 ~ { t { 1 t 2 + (1 ) 2 (t 3) = { 1 t { 2 1 2 + (t 1) (1 ) 2 ~ { t 2 + (1 ) 2 (t 3) = 1{ 1 t : Consequently, we have 2 1 1 0 t1 t 2 = 2 { 2 t =O p 1 t 2 : For the second term, 2 1 0 t E [ i (t 2) 0 i (t 2)] t = 2 1 2 v 0 t t t =O p 1 t ; 117 since 0 t t t = { 2 { 1 t 1 t 1 t1 { 2 ~ { t { 1 t 2 t 1 t1 0 V t { 2 { 1 t 1 t 1 t1 { 2 ~ { t { 1 t 2 t 1 t1 = { 2 { 1 t 1 0 t1 1 t { 2 ~ { t { 1 t 1 0 t1 2 t { 2 { 1 t 1 t1 { 2 ~ { t { 1 t 1 t 1 t1 = { 2 2 { 2 t 1 0 t1 1 t 1 t1 2{ 2 2 { 2 t ~ { t 1 0 t1 2 t 1 t1 +{ 2 2 ~ { 2 t { 2 t 1 0 t1 3 t 1 t1 = O p 1 t : Finally, for the third term, we have 0 t E [ i (t 2) 0 i (t 2)] t = 2 0 t t = { 2 { 1 t 1 t 1 t1 { 2 ~ { t { 1 t 2 t 1 t1 0 { 2 { 1 t 1 t 1 t1 { 2 ~ { t { 1 t 2 t 1 t1 = O p 1 t : by using the similar argument above. 
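The order bounds in this proof lean on standard properties of the instrument projection p_t = Z_t (Z_t' Z_t)^{-1} Z_t': it is symmetric and idempotent, its trace equals the number of instruments, and its eigenvalues lie in [0, 1], which is why quadratic forms such as E(ε' p_t ε) grow only like tr(p_t). A quick numerical confirmation with hypothetical dimensions (N = 50 cross-section units, k = 7 instruments; neither is from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
N, k = 50, 7                            # hypothetical sample size and instrument count
Z = rng.standard_normal((N, k))
P = Z @ np.linalg.solve(Z.T @ Z, Z.T)   # projection onto the column space of Z

print(np.allclose(P, P.T))              # symmetric
print(np.allclose(P @ P, P))            # idempotent
print(np.isclose(np.trace(P), k))       # tr(P) = number of instruments
eigs = np.linalg.eigvalsh(P)
print(eigs.min() > -1e-8 and eigs.max() < 1 + 1e-8)  # eigenvalues in [0, 1]
```

With tr(p_t) = t − 1 instruments at period t, summing these traces over t is what produces the T²-type terms that Assumption 4.4 controls relative to N.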
As a result, we have E 2 it =O p 1 t : Moreover, given the existence of fourth moment of i ;" it and it ; we also have E 4 it =O p 1 t 2 : Now, for the decomposition, W 0 t2 p t2 W t2 = W 0 t2 W t2 W 0 t2 (I N p t2 ) W t2 = W 0 t2 W t2 0 t (I N p t2 ) t ; following Alvarez and Arellano (2003), then 1 NT T X t=2 E W 0 t2 p t2 W t2 = 1 NT T X t=2 E W 0 t2 W t2 E ( 0 t (I N p t2 ) t ) = 1 NT N X i=1 T X t=2 E w 2 i;t2 1 NT T X t=2 E ( 0 t (I N p t2 ) t ) ! p 2 1 2 v 1 2 + 2 ; 118 which holds since E ( 0 t (I N p t2 ) t )E ( max ((I N p t2 )) 0 t t )E ( 0 t t ); and 1 NT T X t=2 E ( 0 t (I N p t2 ) t ) 1 T T X t=2 E 2 it 1 T T X t=2 O 1 t = logT T ! 0: we have 1 NT T X t=2 E W 0 t2 p t2 W t2 ! p 2 1 2 v 1 2 + 2 : Moreover, the mean square convergence follows Alvarez and Arellano (2003), then we have 1 NT T X t=2 W 0 t2 p t2 W t2 ! p 2 1 2 v 1 2 + 2 ; as required. Lemma 21 Assume Assumption 4.1-4.4 hold, then the following holds as (N;T )!1; (a) for forward demeaning, we have 1 NT T1 X t=2 x f0 t1 p t2 x f t1 = 1 NT T1 X t=2 2 t W 0 t1 p t2 W t1 +o p (1) ! p 2 2 1 2 v 1 2 + 2 : (b) for first difference, we have 1 NT T X t=3 y 0 t1 p t3 y t1 ! p 2 (1 ) 2 2 1 2 v 1 2 + 2 : Proof. (a) We first note that 1 NT T1 X t=2 x f0 t1 p t2 x f t1 = 1 NT T1 X t=2 2 t W 0 t1 p t2 W t1 2 NT T1 X t=2 2 t W 0 t1 p t2 W tT + 1 NT T1 X t=2 2 t W 0 tT p t2 W tT = II 1 +II 2 +II 3 ; (A.A.24) 119 where the first term converges to II 1 = 1 NT T1 X t=2 2 t W 0 t1 p t2 W t1 = 1 NT T1 X t=2 2 t [ W t2 + 1 v t1 + t1 t2 ] 0 p t2 [ W t2 + 1 v t1 + t1 t2 ] = 2 NT T1 X t=2 2 t W 0 t2 p t2 W t2 + 2 NT T1 X t=2 2 t W 0 t2 p t2 ( 1 v t1 + t1 t2 ) + 2 NT T1 X t=2 2 t ( 1 v t1 + t1 t2 ) 0 p t2 ( 1 v t1 + t1 t2 ) ! p 2 2 1 2 v 1 2 + 2 ; (A.A.25) by using the results of lemma (20), and lemma (15) and the fact that 2 t ! 
p 1: For the second term, we notice that E W 0 t1 p t2 W tT = 1 Tt T1 X s=t E W 0 t1 p t2 W s = 1 Tt T1 X s=t E 1 t1 + t1 0 p t2 ( 1 s + s ) = 2 1 Tt T1 X s=t E 0 t1 p t2 s = 2 1 Tt T1 X s=t st+1 E 0 t1 p t2 t1 ; then E (II 2 ) C NT T1 X t=2 1 Tt T1 X s=t st+1 E 0 t1 p t2 t1 C T T1 X t=2 Tt+1 Tt = o p (1); (A.A.26) by using the similar argument above, and Var (II 2 ) = E 1 NT T1 X t=2 2 t E W 0 t1 p t2 W tT ! 2 +o p (1) = 1 N 2 T 2 T1 X s;t=2 2 t 2 s E W 0 t1 p t2 W tT W 0 s1 p s2 W sT +o p (1) = o p (1); (A.A.27) 120 by following the proof of 31;NT . Then, from (A.A.26) and (A.A.27), we have II 2 ! p 0: (A.A.28) Similarly, for the third term, we have E (II 3 ) = E 1 NT T1 X t=2 2 t W 0 tT p t2 W tT ! = 1 NT T1 X t=2 2 t E W 0 tT p t2 W tT = 1 NT T1 X t=2 2 t 1 (Tt) 2 T1 X t 1 ;t 2 =t E W 0 t 1 p t2 W t 2 ; with T1 X t 1 ;t 2 =t E W 0 t 1 p t2 W t 2 = T1 X t 1 ;t 2 =t E 1 t 1 + t 1 0 p t2 1 t 2 + t 2 = T1 X s=t E ( 1 s + s ) 0 p t2 ( 1 s + s ) + 2 2 1 X t=t 1 <t 2 t 2 t 1 E 0 t 1 p t2 t 1 = 2 1 T1 X s=t E ( 0 s p t2 s ) + T1 X s=t E ( 0 s p t2 s ) + 2 2 1 X t=t 1 <t 2 t 2 t 1 E 0 t 1 p t2 t 1 then jE (II 3 )j C NT T1 X t=2 1 (Tt) 2 T1 X s=t E ( 0 s p t2 s ) + T1 X s=t E ( 0 s p t2 s ) + 2 X t=t 1 <t 2 t 2 t 1 E 0 t 1 p t2 t 1 ! = o p (1); (A.A.29) by using the similar arguments above and proof of (17). For the variance, we have Var 1 NT T1 X t=2 2 t W 0 tT p t2 W tT ! = 1 N 2 T 2 T1 X s;t=2 E 2 t 2 s W 0 tT p t2 W tT W 0 sT p s2 W sT +o p (1) = 1 N 2 T 2 T1 X s;t=2 T1 X t 1 ;t 2 =t T1 X s 1 ;s 2 =s 1 (Tt) 2 (Ts) 2 E W 0 t 1 p t2 W t 2 W 0 s 1 p s2 W s 2 +o p (1) = o p (1); (A.A.30) by using the similar argument of 31;NT . Then, from (A.A.29) and (A.A.30), we have II 3 ! p 0: (A.A.31) 121 Consequently, by substituting (A.A.25), (A.A.28) and (A.A.31) back to (A.A.24), we have 1 NT T1 X t=2 x f0 t1 p t2 x f t1 = 1 NT T1 X t=2 2 t W 0 t1 p t2 W t1 +o p (1) ! p 2 2 1 2 v 1 2 + 2 : as required. 
(b) when first difference is used, we note that instrumentsz it3 = (y it3 ;y it4 ;:::;y i0 ) 0 ; which contains the information up tot 3; then by following the proof of lemma (21), we have 1 NT T X t=3 W 0 t3 p t3 W t3 ! p 2 1 2 v 1 2 + 2 : (A.A.32) Also, we have 1 NT T X t=3 y 0 t1 p t3 y t1 = 1 NT T X t=3 W 0 t1 p t3 W t1 + 1 NT T X t=3 W 0 t2 p t3 W t2 2 NT T X t=3 W 0 t1 p t3 W t2 = A 1 +A 2 +A 3 ; (A.A.33) where A 1 = 1 NT T X t=3 W 0 t1 p t3 W t1 = 1 NT T X t=3 h 1 t1 + t1 0 p t3 1 t1 + t1 i = 1 NT T X t=3 1 2 t3 + 2 t3 + 1 v t2 + 1 v t1 + t1 2 t3 0 p t3 1 2 t3 + 2 t3 + 1 v t2 + 1 v t1 + t1 2 t3 = 1 NT T X t=3 ( 2 W t3 + 1 v t2 + 1 v t1 + t1 2 t3 ) 0 p t3 ( 2 W t3 + 1 v t2 + 1 v t1 + t1 2 t3 ) = 4 1 NT T X t=3 W 0 t3 p t3 W t3 +o p (1) ! p 4 2 1 2 v 1 2 + 2 ; (A.A.34) where the penultimate equation follows Assumption 4.4 and the proof of lemma (21). Similarly, 122 forA 2 ; we have A 2 = 1 NT T X t=3 W 0 t2 p t3 W t2 = 1 NT T X t=3 (W t3 + 1 v t2 + t2 t3 ) 0 p t3 (W t3 + 1 v t2 + t2 t3 ) = 2 1 NT T X t=3 W 0 t3 p t3 W t3 +o p (1) ! p 4 2 1 2 v 1 2 + 2 : (A.A.35) And forA 3 ; we have A 3 = 2 NT T X t=3 W 0 t1 p t3 W t2 = 2 NT T X t=3 ( 2 W t3 + 1 v t2 + 1 v t1 + t1 2 t3 ) 0 p t3 ( W t3 + 1 v t2 + t2 t3 ) = 2 3 1 NT T X t=3 W 0 t3 p t3 W t3 +o p (1) = 2 3 2 1 2 v 1 2 + 2 : (A.A.36) By substituting (A.A.34)-(A.A.36) back to (A.A.32), we can obtain 1 NT T X t=3 y 0 t1 p t3 y t1 ! p 2 (1 ) 2 2 1 2 v 1 2 + 2 ; for first difference as required. Lemma 22 As (N;T )!1; under Assumption 4.1-4.4, (a) For forward demeaning, we have 1 p NT T X t=2 0 t1 p t2 u t ! d N 0; ( 2 1 2 v + (1 2 ) 2 ) 2 v 1 2 ; where u t = 1 v t + t t1 (b) For first difference, we have 1 p NT T X t=3 0 t2 p t3 u t ! d N 0; 2 [ 2 1 2 v + (1 + ) 2 ] 2 v 1 + : Proof. (a) We have 1 p NT T X t=2 0 t1 p t2 u t = 1 p NT T X t=2 0 t1 u t + 1 p NT T X t=2 0 t1 (I N p t2 ) u t ; (A.A.37) 123 with E 1 p NT T X t=2 0 t1 (I N p t2 ) u t ! = 0; (A.A.38) and Var 1 p NT T X t=2 0 t1 (I N p t2 ) u t ! 
= 1 NT T X t=2 Var 0 t1 (I N p t2 ) u t + 2 NT X s<t Cov 0 t1 (I N p t2 ) u t ; 0 s1 (I N p s2 ) u s = C NT T X t=2 E 0 t1 (I N p t2 ) t1 + 2 NT T X t=2 E 0 t1 (I N p t2 ) u t 0 t2 (I N p t3 ) u t1 = 2 C NT T X t=2 E 0 t2 (I N p t2 ) t2 +o p (1) = 2 C NT T X t=2 O 1 t +o p (1) = o p (1); (A.A.39) by using the result from the proof of (20): Consequently, from (A.A.38) and (A.A.39), we have 1 p NT T X t=2 0 t1 p t3 u t = 1 p NT T X t=2 0 t1 u t +o p (1): (A.A.40) Furthermore, we have E 0 t1 u t = 0; (A.A.41) by the independence of t1 and u t : Also, forts; we have 1 N E 0 t1 u t 0 s1 u s = 8 > < > : ( 2 1 2 v +(1+ 2 ) 2 ) 2 v 1 2 ift =s 2 2 2 v 1 2 ift =s + 1 0 otherwise : then it can be easily verified that 0 t1 u t ;F t is an adapted mixingale with size1; whereF t the 124 -algebra generated by the entire current and past history of 0 t1 u t : 11 Furthermore, we have Var 1 p NT T X t=2 0 t1 u t ! = 1 NT T X t=2 E 0 t1 u t 0 t1 u t + 2 NT T X t=2 E 0 t1 u t 0 t2 u t1 ! p ( 2 1 2 v + (1 + 2 ) 2 ) 2 v 1 2 2 2 2 2 v 1 2 = ( 2 1 2 v + (1 2 ) 2 ) 2 v 1 2 : (A.A.42) As a result, by applying the CLT for dependent process (for example, White (2001)), we have 1 p NT T X t=2 0 t1 u t ! d N 0; ( 2 1 2 v + (1 2 ) 2 ) 2 v 1 2 : (A.A.43) As a result, substituting (A.A.43) to (A.A.40) gives result as required. (b) The proof follows from the above derivation, and by noticing that, forts 1 N E 0 t2 u t 0 s2 u s = 8 > > > > < > > > > : (2 2 1 2 v +2(1+ + 2 ) 2 ) 2 v 1 2 ift =s ( 2 1 2 v +(1+ ) 2 2 ) 2 v 1 2 ift =s + 1 3 2 2 v 1 2 ift =s + 2 0 otherwise ; then Var 1 p NT T X t=3 0 t2 u t ! = 1 NT T X t=3 E 0 t2 u t 0 t2 u t + 2 NT T X t=3 E 0 t2 u t 0 t3 u t1 + 2 NT T X t=3 E 0 t2 u t 0 t4 u t2 ! p (2 2 1 2 v + 2 (1 + + 2 ) 2 ) 2 v 1 2 2 2 1 2 v + (1 + ) 2 2 2 v 1 2 + 2 3 2 2 v 1 2 = 2 [(1 ) 2 1 2 v + (1 2 ) 2 ] 2 v 1 2 = 2 [ 2 1 2 v + (1 + ) 2 ] 2 v 1 + ; then we have 1 p NT T X t=3 0 t2 p t3 u t ! d N 0; 2 [ 2 1 2 v + (1 + ) 2 ] 2 v 1 + ; as required. 
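The univariate lemmas above establish the limits of the quadratic forms for the forward-demeaned (forward orthogonal deviations) model estimated by 2SLS with the instrument set z_{t-2} = (y_{i0}, ..., y_{i,t-2}) growing with t. As a numerical illustration only, the following sketch simulates a stationary AR(1) panel with fixed effects and computes that estimator; the scaling c_t = sqrt((T-t)/(T-t+1)) is the standard Arellano–Bover choice, and the design (normal errors, alpha = 0.5, the names N, T, Y) is assumed for illustration, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, alpha = 4000, 8, 0.5

# Stationary AR(1) panel with fixed effects:
#   y_{it} = alpha * y_{i,t-1} + eta_i + v_{it},  t = 1..T.
eta = rng.standard_normal(N)
Y = np.empty((N, T + 1))
Y[:, 0] = eta / (1 - alpha) + rng.standard_normal(N) / np.sqrt(1 - alpha**2)
for t in range(1, T + 1):
    Y[:, t] = alpha * Y[:, t - 1] + eta + rng.standard_normal(N)

def fod(M):
    """Forward orthogonal deviations along time: (N, T) -> (N, T-1).

    Column t gets c_t * (M[:, t] - mean of the remaining future columns),
    with c_t = sqrt((T-t)/(T-t+1)); the transform annihilates eta_i.
    """
    n, Tm = M.shape
    out = np.empty((n, Tm - 1))
    for t in range(Tm - 1):
        c = np.sqrt((Tm - t - 1) / (Tm - t))
        out[:, t] = c * (M[:, t] - M[:, t + 1:].mean(axis=1))
    return out

yf = fod(Y[:, 1:])   # transformed dependent variable, equations t = 1..T-1
xf = fod(Y[:, :-1])  # transformed lag y_{i,t-1} for the same equations

# 2SLS with period-specific projections: instruments y_0..y_{t-2} at period t.
num = den = 0.0
for t in range(2, T):
    Z = Y[:, :t - 1]
    x, y = xf[:, t - 1], yf[:, t - 1]
    Px = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]  # P_{t-2} x, no explicit inverse
    den += x @ Px
    num += Px @ y
alpha_hat = num / den
print(round(alpha_hat, 2))  # close to 0.5 for large N
```

Because the forward transform keeps the error at period t a function of v_{it}, ..., v_{iT} only, lagged levels remain valid instruments; the instrument dating up to t-2 here mirrors the text's choice rather than the maximal valid set.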
11 For the details on the definition of adapted mixingale, please refer to White (2001). Here, we can think of" 0 t1 U t as a MA(1) process. 125 A.3.2 Proof of Multivariate model For the multi-dimension model after FOD, we have y f 1;it = y f0 it1 +u f 1;it and we can use z t2 = y (1) 0 ; y (2) 0 ;:::; y (1) t2 ; y (2) t2 ; as instruments for the above model with y (j) s = (y j;1s ;:::;y j;Ns ) 0 forj = 1; 2 ands = 0;:::;t 2: It can also be noted that z t2 is aN 2 (t 1) matrix of instruments. Then the 2SLS estimator is given by ^ f 2SLS = T1 X t=2 X f0 t1 z t2 z 0 t2 z t2 1 z 0 t2 X f t1 ! 1 T1 X t=2 X f0 t1 z t2 z 0 t2 z t2 y (1)f t = T X t=2 X f0 t1 p M;t2 X f t1 ! 1 T1 X t=2 X f0 t1 p M;t2 y (1)f t ; where X f t1 = x f 1;t1 ;:::; x f N;t1 0 being a N 2 matrix of regressors with x f i;t1 = x f 1;it1 ;x f 2;it1 0 and p M;t2 = z t2 z 0 t2 z t2 1 z 0 t2 being the projection matrix. Under strict stationarity assumption (4.7) of y it ; we have y it = y it1 + ~ i + e it + it it1 ; then y it = (I 2 ) 1 ~ i + (I 2 L) 1 e it + it ; where e it = 1 v it withe 1;it = 1 v 1;it ande 2;it = 2 v 2;it : Consequently, we can decompose y it as y it = (I 2 ) 1 ~ i +& it + it ; where& it = & it1 + e it as in the univariate case. Following the steps as in the univariate case, we define h it = y it (I 2 ) 1 ~ i =& it + it ; 126 then the forward demeaning transformation is given by h f it = y f it =& f it + f it ; and y f it = t h it 1 Tt T X s=t+1 h is ! = t h it t h i;t+1T ; where h i;t+1T = 1 Tt T X s=t+1 h is : In matrix form by stacking overi; we have Y f t1 = t H t1 t H tT ; (A.A.44) where H t1 = (h 1;t1 ; h 2;t1 ;:::; h N;t1 ) 0 and H tT = h 1;tT ; h 2;tT ;:::; h N;tT 0 . For the above estimator, we have p NT ^ f 2SLS = 1 NT T1 X t=2 X f0 t1 p M;t2 X f t1 ! 1 1 p NT T1 X t=2 X f0 t1 p M;t2 u (1)f t ; where u (1)f t = u f 1;1t ;:::;u f 1;Nt 0 : It is shown in the appendix that 1 NT T1 X t=2 X f0 t1 p M;t2 X f t1 ! 
p ( 0 + ) 0 ; by using lemma (25) where 0 =E (& it & 0 it ) = P 1 i=0 i e i0 : Also, from lemma (26) we have 1 p NT T1 X t=2 X f0 t1 p M;t2 u (1)f t = 1 p NT T1 X t=2 0 t1 p M;t2 u (1) t 2 11 ;11 + 12 ;12 21 ;21 + 22 ;22 p +o p (1); where t1 = (& 1t1 ;:::;& Nt1 ) 0 and 1 p NT T1 X t=2 0 t1 p M;t2 u (1) t ! d N (0; 1 ); where 1 = lim (N;T )!1 1 NT T1 X t=2 h E 0 t1 u (1) t u (1)0 t t1 +E 0 t1 u (1) t u (1)0 t1 t2 i ; from lemma (31). Combining the above results gives (4.47) as required. 127 For the above 2SLS estimator of based on first difference, we have p NT ^ CIV = 1 NT T X t=3 Y 0 t1 z t3 z 0 t3 z t3 1 z 0 t3 Y t1 ! 1 1 p NT T X t=3 Y 0 t1 z t3 z 0 t3 z t3 z 0 t3 u (1) t = 1 NT T X t=3 Y 0 t1 p M;t3 Y t1 ! 1 1 p NT T X t=3 Y 0 t1 p M;t3 u (1) t ; where Y t1 = y 0 1t1 ;:::; y 0 N;t1 0 ; u (1) t = (u 1;1t ;:::; u 1;Nt ) 0 and p M;t3 = z t3 z 0 t3 z t3 1 z 0 t3 : For the denominator, we have 1 NT T X t=3 Y 0 t1 p M;t3 Y t1 ! p = (I 2 ) ( 0 + ) ( 0 I 2 ) 0 ; by using part (b) of the lemma (25). Also, we have 1 p NT T X t=3 Y 0 t1 p M;t3 u (1) t = 1 p NT T X t=3 h (H t1 H t2 ) 0 p M;t3 u (1) t i = 1 p NT T X t=3 h t2 0 + E t1 + t1 t2 t2 0 p M;t3 u (1) t i = 1 p NT (I 2 ) T X t=2 0 t2 p M;t3 u (1) t + 1 p NT T X t=2 h U t1 + t1 t2 0 p M;t3 u (1) t i ; where E t1 = (e 1t1 ;:::; e Nt ) 0 ; and 1 p NT T X t=3 0 t2 p M;t3 u (1) t ! d N (0; 2 ): with 2 = lim (N;T )!1 1 NT T X t=3 2 4 E 0 t1 u (1) t u (1)0 t t1 +E 0 t1 u (1) t u (1)0 t1 t2 +E 0 t1 u (1) t u (1)0 t2 t3 3 5 : by using the lemma (31). It can be shown that, by using the similar argument as in the forward 128 demeaning case and univariate case, 1 p NT T X t=3 h E t1 + t1 t2 0 p M;t3 u (1) t i ! p 2 (1 + 2 11 ) ;11 + 2 12 ;12 + u;11 (1 + 2 11 ) ;21 + 2 12 ;22 + u;12 p ; then we have 1 p NT T X t=3 Y 0 t1 p M;t3 u (1) t ! d N 2 (1 + 2 11 ) ;11 + 2 12 ;12 + u;11 (1 + 2 11 ) ;21 + 2 12 ;22 + u;12 p ; (I 2 ) 2 (I 2 0 ) ; Combining the above results gives (4.48) as required. 
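The first-difference counterpart works with Delta y_{it} = alpha * Delta y_{i,t-1} + Delta v_{it} and, as in the text, instruments with levels dated t-3 and earlier. A univariate sketch under an illustrative stationary AR(1) design (the names and parameter values are assumptions of this sketch):

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, alpha = 4000, 10, 0.5

# Stationary AR(1) panel with fixed effects eta_i.
eta = rng.standard_normal(N)
Y = np.empty((N, T + 1))
Y[:, 0] = eta / (1 - alpha) + rng.standard_normal(N) / np.sqrt(1 - alpha**2)
for t in range(1, T + 1):
    Y[:, t] = alpha * Y[:, t - 1] + eta + rng.standard_normal(N)

dY = np.diff(Y, axis=1)  # column j holds Delta y_{i,j+1}

# 2SLS on Delta y_t = alpha * Delta y_{t-1} + Delta v_t,
# instrumented by z_{t-3} = (y_{i,t-3}, ..., y_{i0}) as in the text.
num = den = 0.0
for t in range(3, T + 1):
    Z = Y[:, :t - 2]                   # levels y_0 .. y_{t-3}
    x, y = dY[:, t - 2], dY[:, t - 1]  # Delta y_{t-1}, Delta y_t
    Px = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    den += x @ Px
    num += Px @ y
alpha_hat_fd = num / den
print(round(alpha_hat_fd, 2))  # close to 0.5 for large N
```

The t-3 dating matters in the text because the MA(1) measurement error makes y_{i,t-2} correlated with the differenced disturbance; in the pure AR(1) design simulated here y_{i,t-2} would also be a valid instrument.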
For the JIVE (4.49) based on forward demeaning, we have p NT ^ f JIVE = 1 NT T1 X t=2 X i6=j x f i;t1 z it2 z 0 t2 z t2 1 z 0 jt2 x f0 j;t1 ! 1 1 p NT T1 X t=2 X i6=j x f i;t1 z it2 z 0 t2 z t2 1 z 0 jt2 u (1)f j;t : It is obvious that 1 NT T1 X t=2 X i6=j x f i;t1 z it2 z 0 t2 z t2 1 z 0 jt2 x f0 j;t1 = 1 NT T1 X t=2 X f0 t1 p M;t2 X f t1 1 NT T1 X t=2 N X i=1 x f i;t1 z it2 z 0 t2 z t2 1 z 0 it2 x f0 i;t1 ; and from the proof in the lemma (30), we have 1 NT T1 X t=2 N X i=1 x f i;t1 z it2 z 0 t2 z t2 1 z 0 it2 x f0 i;t1 =o p (1): Aslo, we have 1 p NT T1 X t=2 X i6=j x f i;t1 z it2 z 0 t2 z t2 1 z 0 jt2 u (1)f j;t = 1 p NT T1 X t=2 X f0 t1 p M;t2 u (1)f t 1 p NT T1 X t=2 N X i=1 x f i;t1 z it2 z 0 t2 z t2 1 z 0 it2 u (1)f it ; and as shown in lemma (27), we have 1 p NT T1 X t=2 N X i=1 x f i;t1 z it2 z 0 t2 z t2 1 z 0 it2 w (1)f it =2 11 ;11 + 12 ;12 21 ;21 + 22 ;22 p +o p (1): 129 Finally, by noticing that p NT ^ f JIVE = 1 NT T1 X t=2 X f0 t1 p M;t2 X f t1 ! 1 1 p NT T1 X t=2 X f0 t1 p M;t2 u (1)f t 2 1 NT T1 X t=2 X f0 t1 p M;t2 X f t1 ! 1 11 ;11 + 12 ;12 21 ;21 + 22 ;22 p +o p (1); and combining the above we can have (4.51) as required. For the JIVE (4.50) based on first difference, we have p NT ^ JIVE = 1 NT T X t=3 X i6=j y i;t1 z it3 z 0 t3 z t3 1 z 0 jt3 y 0 j;t1 ! 1 1 p NT T X t=3 X i6=j y it1 z it3 z 0 t3 z t3 1 z 0 jt3 y (1) j;t : For the denominator, we have 1 NT T X t=3 X i6=j y i;t1 z it3 z 0 t3 z t3 1 z 0 jt3 y 0 j;t1 = 1 NT T X t=3 Y 0 t1 p M;t3 Y t1 1 NT T X t=3 N X i=1 y i;t1 z it3 z 0 t3 z t3 1 z 0 it3 y 0 it1 ; and 1 NT T X t=3 N X i=1 y i;t1 z it3 z 0 t3 z t3 1 z 0 it3 y 0 it1 =o p (1); by using the proof in lemma (30). 
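The JIVE quadratic forms in this derivation are the 2SLS ones with the i = j terms deleted: for one period, sum_{i != j} x_i P_{ij} y_j = x'Py - sum_i P_{ii} x_i y_i, and it is exactly the deleted diagonal that carries the many-instrument bias terms characterized in the surrounding lemmas. A small sketch of this identity (names are illustrative, not from the text):

```python
import numpy as np

def jive_cross_moment(x, y, Z):
    """sum_{i != j} x_i * P_{ij} * y_j for P = Z (Z'Z)^{-1} Z'.

    Computed as the full quadratic form x' P y minus the diagonal
    contribution sum_i P_ii x_i y_i (P_ii are the leverages).
    """
    G = np.linalg.solve(Z.T @ Z, Z.T)  # (Z'Z)^{-1} Z'
    h = np.einsum("ij,ji->i", Z, G)    # diag(P), the leverages P_ii
    return x @ (Z @ (G @ y)) - np.sum(h * x * y)

# Sanity check against the explicit double sum on a tiny example.
rng = np.random.default_rng(2)
N, q = 6, 2
x, y = rng.standard_normal(N), rng.standard_normal(N)
Z = rng.standard_normal((N, q))
P = Z @ np.linalg.solve(Z.T @ Z, Z.T)
brute = sum(x[i] * P[i, j] * y[j] for i in range(N) for j in range(N) if i != j)
print(np.isclose(jive_cross_moment(x, y, Z), brute))  # True
```

The leverage/diagonal form avoids materializing the N x N projection matrix, which is what makes the jackknife correction cheap even when N is large.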
Also, for the numerator, we have 1 p NT T X t=3 X i6=j y it1 z it3 z 0 t3 z t3 1 z 0 jt3 u (1) j;t = 1 p NT T X t=3 Y 0 t1 p M;t3 u (1) t 1 p NT T X t=3 N X i=1 y it1 z it3 z 0 t3 z t3 1 z 0 it3 u (1) it ; 130 and as shown later, we have 1 p NT T X t=3 N X i=1 y it1 z it3 z 0 t3 z t3 1 z 0 it3 u (1) it = 2 (1 + 2 11 ) ;11 + 2 12 ;12 + u;11 (1 + 2 11 ) ;21 + 2 12 ;22 + u;12 p +o p (1); consequently, by noticing that p NT ^ JIVE = 1 NT T X t=3 Y 0 t1 p M;t3 Y t1 ! 1 1 p NT T X t=3 Y 0 t1 p M;t3 u (1) t +o p (1) 2 1 NT T X t=3 Y 0 t1 p M;t3 Y t1 ! 1 (1 + 2 11 ) ;11 + 2 12 ;12 + u;11 (1 + 2 11 ) ;21 + 2 12 ;22 + u;12 p ; and combining the above we have (4.52) as required. We first present a lemma which is similar to (15) of the univariate case. Lemma 23 Let d t and d s beN 1 vectors containing the diagonal elements of p M;t and p M;s for multivariate model; respectively, so that tr(p M;t ) = d 0 t 1 N = 2 (t 1) and tr(p M;s ) = d 0 s 1 N = 2 (s 1); and d 0 t d s 2 min (t 1;s 1); then under assumptions 4.5-4.7, for l r t; pqs andts Cov v (a)0 l p M;t v (b) r ; v (a)0 p p M;s v (b) q = 8 > > > < > > > : m (3) +m (2) 2s +m (0) E (d 0 t d s ) ifl =r =p =q E v (a)2 it v (b) it E d 0 t p M;s v (b) q ifl =r =p6=q<t m (3) 2s ifl =p6=r =q 0 otherwise and E d 0 t p M;s v (b) q 2 h stE v (b)2 it i 1=2 ; and m (1) = E h v (a)2 it v (b)2 it i ;m (2) = E h v (a) it v (b) it i 2 ; m (3) = E h v (a)2 it i E h v (b)2 it i ;m (0) =m (1) 2m (2) m (3) ; and v (a) t ; v (b) t takes any pair ofN 1 vectors from random variablesv (g) it (g = 1; 2): Proof can be found in Akashi and Kunitomo (2012). Lemma 24 Under Assumptions 4.4-4.7, as (N;T )!1; we have 1 NT T X t=2 H 0 t2 p M;t2 H t2 = 1 NT T X t=2 H 0 t2 H t2 +o p (1) ! p 0 + ; where 0 =E (& it & 0 it ) = P 1 i=0 i e i0 : 131 Proof. The proof follows (20) and Akashi and Kunitomo (2012). Here, we provide an sketch of the proof. 
Also, to simply the proof, we shall assume (I 2 ) 1 ~ i = i ; whereVar ( i ) = 2 and is a vector, similar approach has also been adapted by Akashi and Kunitomo (2012). Let M 0 = ( 1 ;:::; N ) 0 being theN 2 matrix of individual effects, then H t = Y t M : Let t be theN 2 vector of errors of the population linear projection of M on Z t ; then we have t = M z t t with t = [E (z it z 0 it )] 1 E (z it ~ 0 i ) = 1 t ; 2 t ; where z t = z t 0t is of dimension N 2 (t 1) with 0t a as the 2 (t 1) 2 (t 1) block-diagonal matrix whose 2 2 diagonal blocks are lower triangular matrix L 10 such that 0 = LL 0 with 0 = E (& it & 0 it ) = P 1 i=0 i e i0 (these approaches can be found in Akashi and Kunitomo (2012)); and z it is of dimension 2 (t 1)1 and has the form of z it = (I t1 L 1 ) z it : Let also A t be the 2 (t 1) 2 (t 1) symmetric matrix A t = 0 B B @ I 2 0 0t2 I 2 0t1 . . . . . . . . . . . . t2 t1 I 2 1 C C A ; where = L 1 L. We shall also let l t = ( 0 L 10 ;:::; 0 L 10 ) 0 : Then, by following the proof of Akashi and Kunitomo (2012), we have, forg = 1; 2; (g) t = h 2 l t l 0 t + A t +I t1 ~ i 1 2 (g) l t ; where ~ = L 1 L 10 with =E ( it 0 it ); and A t is defined above. Then the following proof follows that of Akashi and Kunitomo (2012)) except that there is one extra term ofI t1 ~ in the (A.7) of Akashi and Kunitomo (2012). 
To examine the effect of the presence of this extra term, we notice that [E (z it z 0 it )] 1 = 2 l t l 0 t + A t +I t1 ~ 1 = A t +I t1 ~ 1 A t +I t1 ~ 1 2 l t l 0 t A t +I t1 ~ 1 1 + 2 l 0 t A t +I t1 ~ 1 l t ; 132 then, forg = 1; 2; we have (g) it = i (g) Z it (g) t = i (g) i l 0 t (g) t h L 1 h i0 0 ;:::; L 1 h it2 0 i (g) t = i (g) 2 (g) i l 0 t h 2 l t l 0 t + A t +I t1 ~ i 1 l t 2 (g) h L 1 h i0 0 ;:::; L 1 h it2 0 ih 2 l t l 0 t + A t +I t1 ~ i 1 l t ; it’s obvious thatE (g) it = 0 forg = 1; 2; and Var (g) it = 2 (g)2 1 2 l 0 t h 2 l t l 0 t + A t +I t1 ~ i 1 l t 2 + 4 (g)2 l 0 t h 2 l t l 0 t + A t +I t1 ~ i 1 A t +I t1 ~ h 2 l t l 0 t + A t +I t1 ~ i 1 l t ; by noting that 1 2 l 0 t h 2 l t l 0 t + A t +I t1 ~ i 1 l t = 1 2 0 B @ l 0 t A t +I t1 ~ 1 l t 2 l 0 t A t +I t1 ~ 1 l t l 0 t A t +I t1 ~ 1 l t 1 + 2 l 0 t A t +I t1 ~ 1 l t 1 C A = 1 2 l 0 t A t +I t1 ~ 1 l t 1 + 2 l 0 t A t +I t1 ~ 1 l t = 1 1 + 2 l 0 t A t +I t1 ~ 1 l t ; and l 0 t h 2 l t l 0 t + A t +I t1 ~ i 1 A t +I t1 ~ h 2 l t l 0 t + A t +I t1 ~ i 1 l t = l 0 t h 2 l t l 0 t + A t +I t1 ~ i 1 2 l t l 0 t + A t +I t1 ~ 2 l t l 0 t h 2 l t l 0 t + A t +I t1 ~ i 1 l t = l 0 t h 2 l t l 0 t + A t +I t1 ~ i 1 l t 2 l 0 t h 2 l t l 0 t + A t +I t1 ~ i 1 l t l 0 t h 2 l t l 0 t + A t +I t1 ~ i 1 l t = l 0 t h 2 l t l 0 t + A t +I t1 ~ i 1 l t 1 2 l 0 t h 2 l t l 0 t + A t +I t1 ~ i 1 l t = l 0 t h 2 l t l 0 t + A t +I t1 ~ i 1 l t 1 + 2 l 0 t A t +I t1 ~ 1 l t ; 133 and Var (g) it = 2 (g)2 1 2 l 0 t h 2 l t l 0 t + A t +I t1 ~ i 1 l t 2 + 4 (g)2 l 0 t h 2 l t l 0 t + A t +I t1 ~ i 1 l t 1 2 l 0 t h 2 l t l 0 t + A t +I t1 ~ i 1 l t = 2 (g)2 1 2 l 0 t h 2 l t l 0 t + A t +I t1 ~ i 1 l t 1 2 l 0 t h 2 l t l 0 t + A t +I t1 ~ i 1 l t + 2 l 0 t h 2 l t l 0 t + A t +I t1 ~ i 1 l t = 2 (g)2 1 2 l 0 t h 2 l t l 0 t + A t +I t1 ~ i 1 l t = 2 (g)2 1 + 2 l 0 t A t +I t1 ~ 1 l t ; then we have Var (g) it =O p 1 t ; consequently, the above suggests that the extra termI t1 ~ doesn’t affect the property of 
Akashi and Kunitomo (2012). Consequently, the remaining proof follows Akashi and Kunitomo (2012), and we have 1 NT T X t=2 H 0 t2 p M;t2 H t2 = 1 NT T X t=2 H 0 t2 H t2 +o p (1); with 1 NT T X t=2 H 0 t2 H t2 = 1 NT N X i=1 T X t=2 h it h 0 it = 1 NT N X i=1 T X t=2 (& it + it ) (& it + it ) 0 ! p 0 + : Lemma 25 Under Assumptions 4.4-4.7, as (N;T )!1; (a) for forward demeaning, we have 1 NT T X t=2 X f0 t1 p M;t2 X f t1 = 1 NT T X t=2 H 0 t1 p M;t2 H t1 +o p (1) ! p ( 0 + ) 0 : 134 (b) for first difference, we have 1 NT T X t=3 Y 0 t1 p M;t3 Y t1 ! p (I 2 ) ( 0 + ) ( 0 I 2 ) 0 : Proof. (a) To prove this lemme, we first note that X f t1 = t H t1 t H tT ; then we have 1 NT T X t=2 X f0 t1 p M;t2 X f t1 = 1 NT T X t=2 2 t H t1 H tT 0 p M;t2 H t1 H tT = 1 NT T X t=2 2 t H 0 t1 p M;t2 H t1 1 NT T X t=2 2 t H 0 t1 p M;t2 H tT 1 NT T X t=2 2 t H 0 tT p M;t2 H t1 + 1 NT T X t=2 2 t H 0 tT p M;t2 H tT = 1 NT T X t=2 2 t H 0 t1 p M;t2 H t1 +o p (1); (A.A.45) the proof follows the univariate case since we note that H t1 is aN 2 matrix, which can be decomposed as H t1 = H (1) t1 ; H (2) t1 with H (1) t1 and H (2) t1 beingN 1 vectors, then we have 1 NT T X t=2 2 t H 0 t1 p M;t2 H tT = 1 NT T X t=2 2 t H (1)0 t1 p M;t2 H (1) tT H (1)0 t1 p M;t2 H (2) tT H (2)0 t1 p M;t2 H (1) tT H (2)0 t1 p M;t2 H (2) tT ! ; which can be shown by following the univariate case that the scalar 1 NT T X t=2 2 t H (g)0 t1 p M;t2 H (h) tT =o p (1); forg;h = 1; 2; similar arguments can also be applied to other terms. Consequently, we have 1 NT T X t=2 X f0 t1 p M;t2 X f t1 = 1 NT T X t=2 H 0 t1 p M;t2 H t1 +o p (1); Also, since H t1 = t1 + t1 = H t2 0 + U t1 + t1 t1 0 ; 135 where t1 = (& 1t1 ;:::;& Nt ) 0 and t1 = 1t1 ;:::; Nt 0 ; then we have 1 NT T X t=2 H 0 t1 p M;t2 H t1 = 1 NT T X t=2 H t2 0 + E t1 + t1 t1 0 0 p M;t2 H t2 0 + E t1 + t1 t1 0 = 1 NT T X t=2 H 0 t2 p M;t2 H t2 ! 0 + 1 NT T X t=2 H 0 t2 p M;t2 E t1 + t1 t1 0 ! + 1 NT T X t=2 E t1 + t1 t1 0 0 p M;t2 H t2 ! 
0 + 1 NT T X t=2 E t1 + t1 t1 0 0 p M;t2 E t1 + t1 t1 0 = 1 NT T X t=2 H 0 t2 p M;t2 H t2 ! 0 +o p (1) ! p ( 0 + ) 0 ; (A.A.46) where the penultimate equation follows the similar line of the univariate case and lemma (23). Combining (A.A.45) and (A.A.46) gives the lemma as required. (b) When first difference is used, we have 1 NT T X t=3 H 0 t3 p M;t3 H t3 ! p 0 + ; by following the proof of (24) since the instruments are up tot 3 as in the univariate case. 1 NT T X t=3 Y 0 t1 p M;t3 Y t1 = 1 NT T X t=3 (H t1 H t2 ) 0 p M;t3 (H t1 H t2 ) = 1 NT T X t=3 H 0 t2 p M;t3 H t2 + 1 NT T1 X t=3 H 0 t1 p M;t3 H t1 1 NT T X t=3 H 0 t1 p M;t3 H t2 1 NT T X t=3 H 0 t2 p M;t3 H t1 = 1 + 2 + 3 + 4 ; (A.A.47) 136 where 1 = 1 NT T1 X t=3 H 0 t2 p M;t3 H t2 = 1 NT T1 X t=3 t3 0 + t3 0 + E t2 + t2 t3 0 0 p M;t3 t3 0 + t3 0 + E t2 + t2 t3 0 = 1 NT T1 X t=3 h H t3 0 + E t2 + t2 t3 0 0 p M;t3 H t3 0 + E t2 + t2 t3 0 i = 1 NT T1 X t=3 H 0 t3 p M;t3 H t3 0 +o p (1) ! p ( 0 + ) 0 ; (A.A.48) by using the similar argument as in the univariate case. For the second term, we have 2 = 1 NT T X t=3 H 0 t1 p M;t3 H t2 = 1 NT T X t=3 t2 0 + t2 0 + U t1 + t1 t2 0 0 p M;t3 t2 0 + t2 0 + U t1 + t1 t2 0 ! p 2 ( 0 + ) 20 : (A.A.49) Similarly, we have 3 = 1 NT T X t=3 H 0 t1 p M;t3 H t2 = 1 NT T X t=3 h t2 0 + U t1 + t1 0 p M;t3 H t2 i = 1 NT T X t=3 h t2 0 + t2 0 + U t1 + t1 t2 0 0 p M;t3 H t2 i = 1 NT T X t=3 H 0 t2 p M;t3 H t2 + 2 NT T X t=3 h U t1 + t1 t2 0 0 p M;t3 H t2 i ! p 2 ( 0 + ) 0 ; (A.A.50) and 4 = 1 NT T X t=3 H 0 t2 p M;t3 H t1 ! p ( 0 + ) 20 ; (A.A.51) by substituting (A.A.48)-(A.A.51) back to (A.A.47) we can obtain 1 NT T X t=3 y 0 t1 p M;t3 y t1 ! p (I 2 ) ( 0 + ) ( 0 I 2 ) 0 ; 137 as required. Lemma 26 Under Assumptions 4.4-4.7, for forward demeaning case, we have 1 p NT T1 X t=2 X f0 t1 p M;t2 u (1)f t = 1 p NT T1 X t=2 0 t1 p M;t2 u (1) t 2 11 ;11 + 12 ;12 21 ;21 + 22 ;22 p +o p (1): Proof. 
By following the proof in the univariate case, we have 1 p NT T1 X t=2 X f0 t1 p M;t2 u (1)f t = 1 p NT T X t=2 2 t 0 t1 p M;t2 u (1) t 1 p NT T X t=2 2 t h 0 t1 p M;t2 u (1) t i 1 p NT T X t=2 2 t h H 0 t1 p M;t2 u (1) t+1T i + 1 p NT T X t=2 2 t h H 0 tT p M;t2 u (1)f t i I 1 +I 2 +I 3 +I 4 ; (A.A.52) For the first termI 1 , it will contribute to the limiting distribution, andI 2 will contribute to the bias. To derive the bias term, we notice that the original model can be rewritten as y 1;it = y 0 it1 + ~ 1;i +u 1;it ; y 2;it = y 0 it1 2 + ~ 2;i +u 2;it ; where u 1;it = 1 v 1;it + 1;it 11 1;it1 12 2;it1 ; u 2;it = 2 v 1;it + 2;it 21 1;it1 22 2;it1 : and forst y is = (I 2 ) 1 I 2 st+1 (1 ) 1 ~ i + s X s 0 =t ss 0 (& is 0 1 +! is 0 + is 0): We shall also denote 11 (s) = ( s ) (1;1) and 12 (s) = ( s ) (1;2) being the (1; 1) and (1; 2) element of s ; respectively. Then the random component of y (1) is containing e it and it is given by h 11 (st) e (1) it + (1) it + 12 (st) e (2) it + (2) it i ; moreover, since& it = & it + e it ; then (1) t = 11 (1) t1 + 12 (2) t1 + E (1) t ; (2) t = 21 (1) t1 + 22 (2) t1 + E (2) t : 138 Then, we have 1 p NT T X t=2 2 t h 0 t1 p M;t2 u (1) t i = 1 p NT T X t=2 (1)0 t1 (2)0 t1 p M;t2 u (1) t = 1 p NT T X t=2 (1)0 t1 p M;t2 u (1) t (2)0 t1 p M;t2 u (1) t ; (A.A.53) then by following the proof in the univariate case, we have E 1 p NT T X t=2 (1)0 t1 p M;t2 u (1) t ! = 1 p NT T X t=2 c 2 t E h (1)0 t1 p M;t2 u (1) t i = 1 p NT T X t=2 c 2 t E h (1)0 t1 p M;t2 11 (1) t1 12 (2) t1 i = 11 1 p NT T X t=2 c 2 t E h (1)0 t1 p M;t2 (1) t1 i 12 1 p NT T X t=2 c 2 t E h (1)0 t1 p M;t2 (2) t1 i = 11 ;11 + 12 ;12 p NT T X t=2 E (p M;t2 ) = 11 ;11 + 12 ;12 p NT T X t=2 2 (t 1) ! p 2 ( 11 ;11 + 12 ;12 ) p ; and by using the similar argument from univariate case, we Var 1 p NT T X t=2 (1)0 t1 p M;t2 u (1)f t ! =o p (1); then we have 1 p NT T X t=2 (1)0 t1 p M;t2 u (1) t ! 
p 2 ( 11 ;11 + 12 ;12 ) p ; (A.A.54) similarly, 1 p NT T X t=2 (2)0 t1 p M;t2 u (1) t ! p 2 ( 21 ;21 + 22 ;22 ) p : (A.A.55) For the remaining two termsI 3 andI 4 ; by using the same strategy above and following the proof for the univariate case, we can show I 3 =o p (1) andI 4 =o p (1): (A.A.56) 139 Consequently, by combining (A.A.52)-(A.A.56), we have 1 p NT T1 X t=2 X f0 t1 p M;t2 y (1)f t = 1 p NT T X t=2 0 t1 p M;t2 u (1) t 2 11 ;11 + 12 ;12 21 ;21 + 22 ;22 p +o p (1); as required. Lemma 27 Under Assumptions 4.4-4.7, for forward demeaning case, we have 1 p NT T1 X t=2 N X i=1 x f i;t1 z it2 z 0 t2 z t2 1 z 0 it2 u (1)f it =2 11 ;11 + 12 ;12 21 ;21 + 22 ;22 p +o p (1); Proof. By using the notations in the above lemma, we have 1 p NT T1 X t=2 N X i=1 X f i;t1 z it2 z 0 t2 z t2 1 z 0 it2 u (1)f it = 1 p NT T1 X t=2 N X i=1 x (1)f i;t1 z it2 z 0 t2 z t2 1 z 0 it2 u (1)f it x (2)f i;t1 z it2 z 0 t2 z t2 1 z 0 it2 u (1)f it ; and 1 p NT T1 X t=2 N X i=1 E x (1)f i;t1 z it2 z 0 t2 z t2 1 z 0 it2 u (1)f it = 1 p NT T1 X t=2 N X i=1 2 t E z it2 z 0 t2 z t2 1 z 0 it2 (y 1;it1 y 1;tT ) (u 1;it u 1;t+1T ) = 1 p NT T1 X t=2 N X i=1 2 t E z it2 z 0 t2 z t2 1 z 0 it2 y 1;it1 u 1;it 1 p NT T1 X t=2 N X i=1 2 t E z it2 z 0 t2 z t2 1 z 0 it2 y 1;it1 u 1;t+1T 1 p NT T1 X t=2 N X i=1 2 t E z it2 z 0 t2 z t2 1 z 0 it2 y 1;tT u 1;it + 1 p NT T1 X t=2 N X i=1 2 t E z it2 z 0 t2 z t2 1 z 0 it2 y 1;tT u 1;t+1T ; then it can be shown by following the above lemma and the proof in the univariate case that 1 p NT T X t=2 N X i=1 E x (1)f i;t1 z it2 z 0 t2 z t2 1 z 0 it2 u (1)f it =2 ( 11 ;11 + 12 ;12 ) p +o p (1); and similarly, 1 p NT T X t=2 N X i=1 E x (2)f i;t1 z it2 z 0 t2 z t2 1 z 0 it2 u (1)f it =2 ( 21 ;21 + 22 ;22 ) p +o p (1): Combining these results gives 1 p NT T X t=2 N X i=1 E x f i;t1 z it2 z 0 t2 z t2 1 z 0 it2 u (1)f it =2 11 ;11 + 12 ;12 21 ;21 + 22 ;22 p +o p (1); as required. 
140 Lemma 28 Under Assumptions 4.4-4.7, for first difference case, we have 1 p NT T X t=3 h U t1 + t1 t2 0 p M;t3 u (1) t i = 2 (1 + 2 11 ) ;11 + 2 12 ;12 + u;11 (1 + 2 11 ) ;21 + 2 12 ;12 + u;12 p +o p (1); Proof. We first note that 1 p NT T X t=3 h E t1 + t1 t2 0 p M;t3 u (1) t i = 1 p NT T X t=3 h 0 t1 p M;t3 u (1) t i 1 p NT T X t=3 h 0 t2 p M;t3 u (1) t i + 1 p NT T X t=3 h E 0 t1 p M;t3 u (1) t i ; (A.A.57) and u (1) t = U (1) t + (1) t 11 (1) t1 12 (2) t1 ; also, we note that 1 p NT T X t=3 h 0 t1 p M;t3 u (1) t i (A.A.58) = 1 p NT T X t=3 (1)0 t1 p M;t3 E (1) t + (1) t 11 (1) t1 12 (2) t1 (2)0 t1 p M;t3 E (1) t + (1) t 11 (1) t1 12 (2) t1 ; By following the proof in the univariate case, we can obtain 1 p NT T X t=3 h 0 t1 p M;t3 u (1) t i =2 ;11 + 11 ;11 + 12 ;12 ;21 + 11 ;21 + 12 ;22 p +o p (1); (A.A.59) similarly, 1 p NT T X t=3 h 0 t2 p M;t3 u (1) t i = 1 p NT T X t=3 (1)0 t2 p M;t3 E (1) t + (1) t 11 (1) t1 12 (2) t1 (2)0 t2 p M;t3 E (1) t + (1) t 11 (1) t1 12 (2) t1 = 2 11 ;11 + 12 ;12 11 ;21 + 12 ;22 p +o p (1); (A.A.60) 141 and 1 p NT T X t=3 h U 0 t1 p M;t3 u (1) t i = 1 p NT T X t=3 U (1)0 t1 p M;t3 E (1) t + (1) t 11 (1) t1 12 (2) t1 U (2)0 t1 p M;t3 E (1) t + (1) t 11 (1) t1 12 (2) t1 = 2 u;11 u;12 p +o p (1); (A.A.61) By combining these results (A.A.59)-(A.A.61), we have 1 p NT T X t=3 h E t1 + t1 t2 0 p M;t3 u (1) t i = 2 (1 + 2 11 ) ;11 + 2 12 ;12 + u;11 (1 + 2 11 ) ;21 + 2 12 ;22 + u;12 p +o p (1); as required. Lemma 29 Under Assumptions 4.4-4.7, for first difference case, we have 1 p NT T X t=3 N X i=1 y it1 z it3 z 0 t3 z t3 1 z 0 it3 u (1) it = 2 (1 + 2 11 ) ;11 + 2 12 ;12 + u;11 (1 + 2 11 ) ;21 + 2 12 ;22 + u;12 p +o p (1); Proof. 
This proof is similar to the univariate case, and by noticing that 1 p NT T X t=3 N X i=1 y it1 z it3 z 0 t3 z t3 1 z 0 it3 u (1) it = 1 p NT T X t=3 N X i=1 y (1) it1 z it3 z 0 t3 z t3 1 z 0 it3 u (1) i;t y (2) it1 z it3 z 0 t3 z t3 1 z 0 it3 u (1) i;t ; and 1 p NT T X t=3 N X i=1 E y (1) it1 z it3 z 0 t3 z t3 1 z 0 it3 u (1) i;t = 1 p NT T X t=3 N X i=1 E & 1;it1 + 1;it1 & 1;it2 1;it2 z it3 z 0 t3 z t3 1 z 0 it3 u 1;it + 1;it 11 1;it1 12 2;it1 u 1;it1 1;it1 + 11 1;it2 + 12 2;it2 = 2 [(1 + 2 11 ) ;11 + 2 12 ;12 + u;11 ] p +o p (1); 142 and 1 p NT T X t=3 N X i=1 E y (2) it1 z it3 z 0 t3 z t3 1 z 0 it3 u (1) i;t = 1 p NT T X t=3 N X i=1 E & 2;it1 + 2;it1 & 2;it2 2;it2 z it3 z 0 t3 z t3 1 z 0 it3 u 1;it + 1;it 11 1;it1 12 2;it1 u 1;it1 1;it1 + 11 1;it2 + 12 2;it2 = 2 [(1 + 2 11 ) ;21 + 2 12 ;22 + u;12 ] p +o p (1); which gives 1 p NT T X t=3 N X i=1 E y i;t1 z it3 z 0 t3 z t3 1 z 0 it3 u (1) i;t = 2 (1 + 2 11 ) ;11 + 2 12 ;12 + u;11 (1 + 2 11 ) ;21 + 2 12 ;22 + u;12 p +o p (1); as required. Lemma 30 Under Assumptions 4.4-4.7, we have (a) for the forward othergonal demeaning, we have 1 NT T1 X t=2 N X i=1 x f i;t1 z it2 z 0 t2 z t2 1 z 0 it2 x f0 i;t1 =o p (1); (b) for the first difference, we have 1 NT T X t=3 N X i=1 y i;t1 z it3 z 0 t3 z t3 1 z 0 it3 y 0 i;t1 =o p (1): Proof. (a) We notice that 1 NT T1 X t=2 N X i=1 x f i;t1 z it2 z 0 t2 z t2 1 z 0 it2 x f0 i;t1 = 1 NT T1 X t=2 N X i=1 x (1)f i;t1 z it2 z 0 t2 z t2 1 z 0 it2 x (1)f i;t1 x (1)f i;t1 z it2 z 0 t2 z t2 1 z 0 it2 x (2)f i;t1 x (2)f i;t1 z it2 z 0 t2 z t2 1 z 0 it2 x (1)f i;t1 x (2)f i;t1 z it2 z 0 t2 z t2 1 z 0 it2 x (2)f i;t1 ! 
; then by using the similar argument in the univariate case, we can show that 1 NT T1 X t=2 N X i=1 x (1)f i;t1 z it2 z 0 t2 z t2 1 z 0 it2 x (1)f i;t1 = o p (1); 1 NT T X t=2 N X i=1 x (1)f i;t1 z it2 z 0 t2 z t2 1 z 0 it2 x (1)f i;t1 = o p (1); and by using Cauchy-Swartz inequality, we can show that 1 NT T X t=2 N X i=1 x (1)f i;t1 z it2 z 0 t2 z t2 1 z 0 it2 x (2)f i;t1 =o p (1); 143 consequently, we have 1 NT T X t=2 N X i=1 x (2)f i;t1 z it2 z 0 t2 z t2 1 z 0 it2 x (1)f i;t1 =o p (1); as required. (b) Similar to (a). Lemma 31 Under Assumptions 4.4-4.7, as (N;T )!1; (a) for the forward othergonal demeaning, we have 1 p NT T1 X t=2 0 t1 p M;t2 u (1) t ! d N (0; 1 ); where 1 = lim (N;T )!1 1 NT P T1 t=2 h E 0 t1 u (1) t u (1)0 t t1 +E 0 t1 u (1) t u (1)0 t1 t2 i : (b) for the first difference, we have 1 p NT T X t=3 0 t2 p M;t3 u (1) t ! d N (0; 2 ): where 2 = lim (N;T )!1 1 NT T X t=3 2 4 E 0 t1 u (1) t u (1)0 t t1 +E 0 t1 u (1) t u (1)0 t1 t2 +E 0 t1 u (1) t u (1)0 t2 t3 3 5 : Proof. (a) We have 1 p NT T1 X t=2 0 t1 p M;t2 u (1) t = 1 p NT T1 X t=2 h 0 t1 u (1) t i + 1 p NT T1 X t=2 h 0 t1 (I N p M;t2 ) u (1) t i ; (A.A.62) with E 1 p NT T1 X t=2 h 0 t1 (I N p M;t2 ) u (1) t i ! = 0 and Var 1 p NT T1 X t=2 h 0 t1 (I N p M;t2 ) u (1) t i ! =Var 1 p NT P T1 t=2 (1)0 t1 (I N p M;t2 ) u (1) t 1 p NT P T1 t=2 (2)0 t1 (I N p M;t2 ) u (1) t ! ; and Var 1 p NT P T1 t=2 (1)0 t1 (I N p M;t2 ) u (1) t 1 p NT P T1 t=2 (2)0 t1 (I N p M;t2 ) u (1) t ! = 0 B B B B @ Var 1 p NT P T1 t=2 (1)0 t1 (I N p M;t2 ) u (1) t Cov 1 p NT P T1 t=2 (1)0 t1 (I N p M;t2 ) u (1) t ; 1 p NT P T1 t=2 (2)0 t1 (I N p M;t2 ) u (1) t ! Cov 1 p NT P T1 t=2 (1)0 t1 (I N p M;t2 ) u (1) t ; 1 p NT P T1 t=2 (2)0 t1 I N p f M;t2 u (1) t ! Var 1 p NT P T1 t=2 (2)0 t1 (I N p M;t2 ) u (1) t 1 C C C C A ; 144 it can be shown by following the proof in the univariate case that Var 1 p NT T1 X t=2 (1)0 t1 (I N p M;t2 ) u (1) t ! = o p (1); Var 1 p NT T1 X t=2 (2)0 t1 (I N p M;t2 ) u (1) t ! 
= o p (1); and by using Cauchy-Swarthz inequality, we can also show that the covariance is alsoo p (1); consequently, we have Var 1 p NT T1 X t=2 h 0 t1 (I N p M;t2 ) w (1) t i ! =o p (1); then we can obtain 1 p NT T1 X t=2 h 0 t1 (I N p M;t2 ) u (1) t i =o p (1): (A.A.63) the result from the proof of (20): Consequently, from (A.A.62) and (A.A.63), we have 1 p NT T1 X t=2 0 t1 p M;t2 u (1) t = 1 p NT T1 X t=2 h 0 t1 u (1) t i +o p (1): (A.A.64) Furthermore, we have E h 0 t1 u (1) t i = 0; by the independence of t1 and w (1) t : Also, we have E 0 t1 w (1) t u (1)0 t t1 =E 0 @ (1)0 t1 u (1) t 2 (1)0 t1 u (1) t (2)0 t1 u (1) t (1)0 t1 u (1) t (2)0 t1 u (1) t (2)0 t1 u (1) t 2 1 A ; and E 0 t1 u (1) t u (1)0 t1 t2 =E (1)0 t1 u (1) t u (1)0 t1 (1) t2 (1)0 t1 u (1) t u (1)0 t1 (2) t2 (1)0 t1 u (1) t (2)0 t2 u (1) t1 (2)0 t1 u (1) t (2)0 t2 u (1) t1 ! ; and E 0 t1 u (1) t u (1)0 ts ts1 = 0; fors> 1 by construction, where u (1) t = 1 v (1) t + (1) t 11 (1) t1 12 (2) t1 ; (1) t = 11 (1) t1 + 12 (2) t1 + 1 v (1) t ; (2) t = 21 (1) t1 + 22 (2) t1 + 2 v (2) t : 145 As a result, by applying the CLT for dependent process (for example, White (2001)) and following the line in the univariate case, we have 1 p NT T1 X t=2 h 0 t1 u (1) t i ! d N (0; 1 ); (A.A.65) where 1 = lim (N;T )!1 1 NT T1 X t=2 h E 0 t1 u (1) t u (1)0 t t1 +E 0 t1 u (1) t u (1)0 t1 t2 i ; (A.A.66) as required. Combining (A.A.64) and (A.A.65) we can obtain the lemma as required. (b) The proof follows from the above derivation, and by E 0 t2 u (1) t u (1)0 t t2 =E 0 @ (1)0 t2 u (1) t 2 (1)0 t2 u (1) t (2)0 t2 u (1) t (1)0 t2 u (1) t (2)0 t2 u (1) t (2)0 t2 u (1) t 2 1 A ; and E 0 t2 u (1) t u (1)0 t1 t3 =E (1)0 t2 u (1) t u (1)0 t1 (1) t3 (1)0 t2 u (1) t u (1)0 t1 (2) t3 (1)0 t2 u (1) t (2)0 t3 u (1) t1 (2)0 t2 u (1) t (2)0 t3 u (1) t1 ! ; and E 0 t2 u (1) t u (1)0 t2 t4 =E (1)0 t2 u (1) t u (1)0 t2 (1) t4 (1)0 t2 u (1) t u (1)0 t2 (2) t4 (1)0 t2 u (1) t (2)0 t4 u (1) t2 (2)0 t2 u (1) t (2)0 t4 u (1) t2 ! 
; and E 0 t2 u (1) t u (1)0 ts ts2 = 0; fors> 2; then 1 p NT T X t=3 h 0 t2 u (1) t i ! d N (0; 2 ); where 2 = lim (N;T )!1 1 NT T X t=3 2 4 E 0 t2 u (1) t u (1)0 t t2 +E 0 t2 u (1) t u (1)0 t1 t3 +E 0 t2 u (1) t u (1)0 t2 t4 3 5 ; (A.A.67) as required. 146
Abstract
Dynamic panel models have wide economic application in labor economics, health economics, and development economics. However, dynamic panel models have unique features: (i) the presence of time-invariant individual-specific effects raises the issue of incidental parameters, whether the specific effects are treated as random or fixed
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
Essays on econometrics analysis of panel data models
Essays on estimation and inference for heterogeneous panel data models with large N and short T
Large N, T asymptotic analysis of panel data models with incidental parameters
Three essays on the identification and estimation of structural economic models
Essays on beliefs, networks and spatial modeling
Three essays on supply chain networks and R&D investments
Statistical inference for stochastic hyperbolic equations
Essays on causal inference
Essays on high-dimensional econometric models
Essays on the econometric analysis of cross-sectional dependence
Essays on health economics
Essays in panel data analysis
A structural econometric analysis of network and social interaction models
Three essays on linear and non-linear econometric dependencies
Statistical inference of stochastic differential equations driven by Gaussian noise
Modeling and vibration analysis of wheelchair-users
Assessment of the impact of second-generation antipsychotics in Medi-Cal patients with bipolar disorder using panel data fixed effect models
Panel data forecasting and application to epidemic disease
Bayesian analysis of stochastic volatility models with Levy jumps
Essays on family planning policies
Asset Metadata
Creator: Zhou, Qiankun (author)
Core Title: Three essays on the statistical inference of dynamic panel models
School: College of Letters, Arts and Sciences
Degree: Doctor of Philosophy
Degree Program: Economics
Publication Date: 06/18/2015
Defense Date: 03/26/2015
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: dynamic panel models, first difference, forward difference, GMM estimation, JIVE, maximum likelihood estimation, OAI-PMH Harvest
Format: application/pdf (imt)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Hsiao, Cheng (committee chair), Moon, Hyungsik Roger (committee member), Pesaran, M. Hashem (committee member)
Creator Email: qiankunz@usc.edu
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c3-574125
Unique identifier: UC11300206
Identifier: etd-ZhouQianku-3483.pdf (filename), usctheses-c3-574125 (legacy record id)
Legacy Identifier: etd-ZhouQianku-3483.pdf
Dmrecord: 574125
Document Type: Dissertation
Rights: Zhou, Qiankun
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA