Essays on estimation and inference for heterogeneous panel data models with large n and short T
(USC Thesis Other)
ESSAYS ON ESTIMATION AND INFERENCE FOR HETEROGENEOUS PANEL DATA MODELS WITH LARGE n AND SHORT T

by Liying Yang

A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (ECONOMICS)

May 2023

Copyright 2023 Liying Yang

To myself and my parents.

Acknowledgements

I am deeply indebted to my advisor M. Hashem Pesaran for his tremendous time and effort in training and mentoring me. His rigorous approach to and enthusiasm for research will always inspire me to become a better researcher. I would like to express my sincere gratitude to Cheng Hsiao and Geert Ridder for their generous guidance, continuous feedback on my research, and constant support on my job market. I wish to thank Adel Javanmard and Michael Leung, who served as my dissertation/qualifying committee members, and Matthew Kahn, Guofu Tan and Pablo Kurlat for their valuable discussions and suggestions on my research and career. My thanks also go to Alexander Karnazes, Annie Le, Akiko Matsukiyo, and Young Miller for their wonderful assistance, which relieved me of administrative concerns during my study.

I am forever thankful for endless love from my parents, Xianghua Wei and Bingfeng Yang, my brother, and my family. My heartfelt gratitude goes to my friend Yufan Zhong, who always calms me down and cheers me up, and my dearest Ph.D. classmates, particularly Yukun Ding, Yue Fang, Rajat Kochhar, Xiongfei Li, Ruozi Song, Jingyi Tian, and Shaoshuang Yang, for their unwavering support and companionship throughout the journey.

Above all, tomorrow is another day. All the best with our future endeavors.

Table of Contents

Dedication
Acknowledgements
List of Tables
List of Figures
Abstract
Chapter 1: Introduction
Chapter 2: Jackknife Bias-Corrected Mean Group Estimator for Heterogeneous Dynamic Panels with Short T: An Application to Effects of Minimum Wages on Employment
  2.1 Introduction
  2.2 Related Literature
  2.3 Dynamic Panel Data Models with Heterogeneous Coefficients
  2.4 Mean Group Estimators with Split-panel Jackknife Bias Correction
  2.5 Existence of Moments of OLS Estimator in an ARX(1) Model
  2.6 Monte Carlo Simulation
    2.6.1 Data-generating Process of an ARX(1) Panel Data Model
    2.6.2 Simulation Results
  2.7 Empirical Applications of the Minimum Wage Policy
    2.7.1 Literature Review of Minimum Wage Studies
    2.7.2 Average Effects of Real Minimum Wages on Employment
  2.8 Conclusions
Chapter 3: Trimmed Mean Group Estimation of Average Treatment Effects in Ultra Short T Panels with Correlated Heterogeneous Coefficients
  3.1 Introduction
  3.2 Heterogeneous linear panel data models
    3.2.1 Properties of mean group estimator in short T panels
    3.2.2 A comparison of MG and FE estimators
      3.2.2.1 Conditions for $\sqrt{n}$-consistency of FE estimator
      3.2.2.2 Relative efficiency of FE and MG estimators
  3.3 Irregular mean group estimators
    3.3.1 Existing literature on trimming
  3.4 A new trimmed mean group estimator
  3.5 Asymptotic properties of the TMG estimator
    3.5.1 The choice of the trimming threshold
    3.5.2 Trimming condition
    3.5.3 Robust estimation of the covariance matrix of the trimmed MG estimator
  3.6 Heterogeneous panel data models with time effects
    3.6.1 TMG-TE estimator with $T = k$
    3.6.2 TMG-C estimator when $T > k$
  3.7 A Hausman-type test of the validity of the FE estimator
  3.8 Monte Carlo evidence on small sample properties
    3.8.1 Designs of Monte Carlo experiments
      3.8.1.1 Data generating process of the outcome variable and regressor
      3.8.1.2 Data generating process of heterogeneous coefficients
      3.8.1.3 Baseline and other experiments
    3.8.2 Monte Carlo findings
      3.8.2.1 Comparison of TMG, FE, and MG estimators
      3.8.2.2 Comparison of TMG, GP, and SU estimators
      3.8.2.3 Monte Carlo evidence for Hausman-type test of correlated heterogeneity
  3.9 Empirical application
  3.10 Conclusions
Chapter 4: Heterogeneous Autoregressions in Short T Panel Data Models
  4.1 Introduction
  4.2 Model and assumptions
  4.3 Neglected heterogeneity bias
  4.4 Identification of moments of the AR coefficients
    4.4.1 Identification conditions
  4.5 Panel autoregressions with group heterogeneity
  4.6 Estimation of moments of autoregressive coefficients
    4.6.1 Method of moments estimator based on autocorrelations
    4.6.2 Generalized method of moments estimator based on autocovariances
      4.6.2.1 Generalized method of moments estimator of $E(\phi_i)$
      4.6.2.2 Generalized method of moments estimator of $E(\phi_i^2)$
    4.6.3 Plug-in estimator of $\operatorname{Var}(\phi_i)$
  4.7 Monte Carlo experiments
    4.7.1 Data generating process
    4.7.2 Comparison of FDAC and HetroGMM estimators
    4.7.3 Comparison of FDAC and HomoGMM estimators
    4.7.4 Comparison of FDAC and MSW estimators
    4.7.5 Initializations, FDAC and HomoGMM estimators
    4.7.6 Simulation results for the categorical distribution parameters
  4.8 Empirical application: heterogeneity in earnings dynamics
    4.8.1 Literature review of estimation of earnings dynamics
    4.8.2 A heterogeneous panel AR(1) model of earnings dynamics with linear trends
  4.9 Conclusion
References
Appendix A: Appendix to Chapter 2
  A.1 Auxiliary Lemmas for Proofs
  A.2 Mathematical Proofs
    A.2.1 Proof of Lemma 1
    A.2.2 Proof of Theorem 1
    A.2.3 Proof of Lemma 2
    A.2.4 Proof of Lemma 3
  A.3 Dependent Relationship between the Numerator and Denominator
  A.4 Time Effects and Cluster-robust Standard Errors in Heterogeneous Dynamic Panels
    A.4.1 Mean Group Estimator with Time Effects when $T > k$
    A.4.2 Individual Estimator with Time Fixed Effects
    A.4.3 Cluster-Robust Variance of the Fixed Effects Estimators
  A.5 Extensive Empirical Results of Minimum Wage Policy
    A.5.1 Robustness Check with Different Lags in the County-level Analysis
    A.5.2 Effects of Real Minimum Wages on Total Employment across States
Appendix B: Appendix to Chapter 3
  B.1 Lemmas
  B.2 Proof of Propositions and Theorems
    B.2.1 Proof of Proposition 1
    B.2.2 Proof of Proposition 3
    B.2.3 Proof of Theorem 4
  B.3 Proof of Theorem 5 (Asymptotic distribution of TMG-TE estimator)
Appendix C: Appendix to Chapter 4
  C.1 Introduction
  C.2 Neglected heterogeneity bias in AB and BB estimators
  C.3 Initialization of panel AR(1) processes
  C.4 Monte Carlo evidence
    C.4.1 Comparison of FDAC and HetroGMM estimators for moments
    C.4.2 Simulation results of FDAC, FDLS, AH, AAH, AB, and BB estimators with non-Gaussian errors
    C.4.3 Simulation results of FDAC and MSW estimators
    C.4.4 Robustness of FDAC estimator to different error processes
  C.5 Empirical application results for other sub-periods of the PSID

List of Tables

2.1 Bias, RMSE, and size ($\times 100$) of FE, MG, MG-HJK and MG-TJK estimators of the coefficient of interest (true value 0.5) in an ARX(1) model
2.2 Bias, RMSE, and size ($\times 100$) of FE, MG, MG-HJK and MG-TJK estimators of the coefficient of interest (true value 0.5) in an ARX(1) model
2.3 Short-term and long-term effects of real minimum wages on total employment across U.S. counties in an ARDL(2,2) model with time effects using the FE and MG estimators
2.4 Long-term effects of real minimum wages on total employment across U.S. counties with time effects
2.5 Short-term and long-term effects of real minimum wages on teenage employment across U.S. counties in an ARDL(2,2) model with time effects using the FE and MG estimators
2.6 Long-term effects of real minimum wages on teenage employment across U.S. counties with time effects
3.1 Bias, RMSE and size of FE, MG and TMG estimators of the coefficient of interest (true value 1) (The error processes for the $y_{it}$ and $x_{it}$ equations are chi-squared and Gaussian, and there is a heterogeneous autoregression in the $x_{it}$ equation.)
3.2 Bias, RMSE, and size of TMG and alternative estimators of the coefficient of interest (true value 1) in the case of correlated heterogeneity (The error processes for the $y_{it}$ and $x_{it}$ equations are chi-squared and Gaussian, and there is not an interactive effect in the $x_{it}$ equation.)
3.3 Bias, RMSE, and size of TMG-TE and alternative estimators of the coefficient of interest (true value 1) with a time effect in the model in the case of correlated heterogeneity (The error processes for the $y_{it}$ and $x_{it}$ equations are chi-squared and Gaussian, and there is not an interactive effect in the $x_{it}$ equation.)
3.4 Bias, RMSE, and size of TMG-TE and GP estimators of the time effect (true value 1) in the case of correlated heterogeneity (The error processes for the $y_{it}$ and $x_{it}$ equations are chi-squared and Gaussian, and there is not an interactive effect in the $x_{it}$ equation.)
3.5 Empirical size and power of the Hausman-type test of correlated heterogeneous slope coefficients (The error processes for the $y_{it}$ and $x_{it}$ equations are chi-squared and Gaussian, and there is a heterogeneous autoregression in the $x_{it}$ equation.)
3.6 Hausman-type test of correlated heterogeneity applied to the average effect of household's total expenditure on calorie demand without time effects
3.7 Estimates of the average effect of household's total expenditure on calorie demand without time effects
3.8 Estimates of the average effect of household's total expenditure on calorie demand and time effects
4.1 Bias, RMSE, and size of FDAC, FDLS, AH, AAH, AB, and BB estimators of $\phi$ ($\phi_0 = 0.4$) in a homogeneous panel AR(1) model with Gaussian errors and GARCH effects
4.2 Bias, RMSE, and size of FDAC, FDLS, AH, AAH, AB, and BB estimators in a heterogeneous panel AR(1) model with uniformly distributed autoregressive coefficients, $\phi_i = \mu_\phi + v_i$, $\mu_\phi = 0.4$, and $v_i \sim IIDU(-0.3, 0.3)$, Gaussian errors, and GARCH effects
4.3 Bias, RMSE, and size of FDAC, FDLS, AH, AAH, AB, and BB estimators in a heterogeneous panel AR(1) model with uniformly distributed autoregressive coefficients, $\phi_i = \mu_\phi + v_i$, $\mu_\phi = 0.4$, and $v_i \sim IIDU(-0.5, 0.5)$, Gaussian errors, and GARCH effects
4.4 Bias, RMSE, and size of FDAC and MSW estimators in a heterogeneous panel AR(1) model where $\phi_i = \mu_\phi + v_i$ and $v_i \sim IIDU(-a, a)$ with $\mu_\phi = 0.4$, $a \in \{0.3, 0.5\}$, and Gaussian errors with GARCH effects
4.5 Bias, RMSE, and size of FDAC, FDLS, AH, AAH, AB, and BB estimators of $\phi$ ($\phi_0 = 0.4$) in a homogeneous panel AR(1) model under non-stationary ($M = 1, 2$) and stationary ($M = \infty$) initialization
4.6 Bias, RMSE, and size of FDAC, FDLS, AH, AAH, AB, and BB estimators in a heterogeneous panel AR(1) model under non-stationary ($M = 1, 2$) and stationary ($M = \infty$) initialization with $\phi_i = \mu_\phi + v_i$, $\mu_\phi = 0.4$, and $v_i \sim IIDU(-0.3, 0.3)$
4.7 Bias, RMSE, and size of FDAC, FDLS, AH, AAH, AB, and BB estimators in a heterogeneous panel AR(1) model under non-stationary ($M = 1, 2$) and stationary ($M = \infty$) initialization with $\phi_i = \mu_\phi + v_i$, $\mu_\phi = 0.4$, and $v_i \sim IIDU(-0.5, 0.5)$
4.8 Bias and RMSE of FDAC estimator of categorical distribution parameters $(\phi_L, \phi_H, \pi)'$ with Gaussian errors and GARCH effects
4.9 Estimates of mean persistence ($\mu_\phi = E(\phi_i)$) of log real earnings in a panel AR(1) model with a common linear trend using PSID data over the sub-periods 1991–1995 and 1986–1995
A.1 Estimated effects of real minimum wages on total employment across counties in an ARDL(1,1) model with time effects
A.2 Estimated effects of real minimum wages on total employment across counties in an ARDL(1,2) model with time effects
A.3 Estimated effects of real minimum wages on total employment across counties in an ARDL(2,1) model with time effects
A.4 Estimated effects of real minimum wages on teenage employment across counties in an ARDL(1,1) model with time effects
A.5 Estimated effects of real minimum wages on teenage employment across counties in an ARDL(1,2) model with time effects
A.6 Estimated effects of real minimum wages on teenage employment across counties in an ARDL(2,1) model with time effects
A.7 Descriptive statistics of state characteristics
A.8 Estimates of the short-run effects of the minimum wage on employment across states with an ARDL(1,1) model
A.9 Estimates of the long-run effect of the minimum wage on employment across states with an ARDL(1,1) model
C.1 Bias, RMSE, and size of FDAC and HetroGMM estimators of $E(\phi_i)$ in a heterogeneous panel AR(1) model with Gaussian errors and GARCH effects
C.2 Bias, RMSE, and size of FDAC and HetroGMM estimators of $\operatorname{Var}(\phi_i)$ in a heterogeneous panel AR(1) model with Gaussian errors and GARCH effects
C.3 Bias, RMSE, and size of FDAC, FDLS, AH, AAH, AB, and BB estimators of $\phi$ ($\phi_0 = 0.4$) in a homogeneous panel AR(1) model with non-Gaussian errors and GARCH effects
C.4 Bias, RMSE, and size of FDAC, FDLS, AH, AAH, AB, and BB estimators in a heterogeneous panel AR(1) model where $\phi_i = \mu_\phi + v_i$ and $v_i \sim IIDU(-a, a)$ with $\mu_\phi = 0.4$, $a = 0.3$, non-Gaussian errors, and GARCH effects
C.5 Bias, RMSE, and size of FDAC, FDLS, AH, AAH, AB, and BB estimators in a heterogeneous panel AR(1) model where $\phi_i = \mu_\phi + v_i$ and $v_i \sim IIDU(-a, a)$ with $\mu_\phi = 0.4$, $a = 0.5$, non-Gaussian errors, and GARCH effects
C.6 Bias, RMSE, and size of FDAC and MSW estimators of $\phi$ in homogeneous panels with Gaussian errors and GARCH effects
C.7 Bias, RMSE, and size of FDAC and MSW estimators in a heterogeneous panel AR(1) model under non-stationary ($M = 1, 2$) and stationary ($M = \infty$) initialization with $\phi_i = \mu_\phi + v_i$, $\mu_\phi = 0.4$, and $v_i \sim IIDU(-0.3, 0.3)$
C.8 Bias, RMSE, and size of FDAC estimator of $\mu_\phi = E(\phi_i)$ in a heterogeneous panel AR(1) model with Gaussian errors
C.9 Bias, RMSE, and size of FDAC estimator of $\operatorname{Var}(\phi_i)$ in a heterogeneous panel AR(1) model with Gaussian errors
C.10 Bias, RMSE, and size of FDAC estimator of $E(\phi_i)$ in a heterogeneous panel AR(1) model with Gaussian errors and GARCH effects
C.11 Bias, RMSE, and size of FDAC estimator of $\operatorname{Var}(\phi_i)$ in a heterogeneous panel AR(1) model with Gaussian errors and GARCH effects
C.12 Bias, RMSE, and size of FDAC estimator of $E(\phi_i)$ in a heterogeneous panel AR(1) model with non-Gaussian errors
C.13 Bias, RMSE, and size of FDAC estimator of $\operatorname{Var}(\phi_i)$ in a heterogeneous panel AR(1) model with non-Gaussian errors
C.14 Bias, RMSE, and size of FDAC estimator of $E(\phi_i)$ in a heterogeneous panel AR(1) model with non-Gaussian errors and GARCH effects
C.15 Bias, RMSE, and size of FDAC estimator of $\operatorname{Var}(\phi_i)$ in a heterogeneous panel AR(1) model with non-Gaussian errors and GARCH effects
C.16 Bias and RMSE of FDAC estimator of categorical distribution parameters $(\phi_L, \phi_H, \pi)'$ with Gaussian errors
C.17 Bias and RMSE of FDAC estimator of categorical distribution parameters $(\phi_L, \phi_H, \pi)'$ with non-Gaussian errors
C.18 Bias and RMSE of FDAC estimator of categorical distribution parameters $(\phi_L, \phi_H, \pi)'$ with non-Gaussian errors with GARCH effects
C.19 Distribution of individual observation numbers by year
C.20 Estimates of mean persistence ($\mu_\phi = E(\phi_i)$) of log real earnings in a panel AR(1) model with a common linear trend using PSID data over the sub-periods 1976–1980, 1981–1985 and 1986–1990
C.21 Estimates of mean persistence ($\mu_\phi = E(\phi_i)$) of log real earnings in a panel AR(1) model with a common linear trend using PSID data over the sub-periods 1976–1985 and 1981–1990
C.22 Estimates of variance of heterogeneous persistence ($\operatorname{Var}(\phi_i)$) of log real earnings in a panel AR(1) model with a common linear trend using PSID data over the sub-periods 1991–1995 and 1986–1995
C.23 Estimates of variance of heterogeneous persistence ($\operatorname{Var}(\phi_i)$) of log real earnings in a panel AR(1) model with a common linear trend using PSID data over the sub-periods 1976–1985 and 1981–1990
C.24 Estimates of variance of heterogeneous persistence ($\operatorname{Var}(\phi_i)$) of log real earnings in a panel AR(1) model with a common linear trend using PSID data over the sub-periods 1976–1980, 1981–1985, and 1986–1990

List of Figures

2.1 Empirical rejection frequencies (at 5% nominal level) of FE, MG, MG-HJK and MG-TJK estimators of the coefficient of interest (true value 0.5) in an ARX(1) model with $T = 12$ and $n = 50, 1000, 2000$
2.2 Empirical rejection frequencies (at 5% nominal level) of FE, MG, MG-HJK and MG-TJK estimators of the coefficient of interest (true value 0.5) in an ARX(1) model with $T = 50$ and $n = 50, 1000, 2000$
3.1 Empirical power functions of TMG and FE estimators of the coefficient of interest (true value 1) in both cases of uncorrelated and correlated heterogeneity (The error processes for the $y_{it}$ and $x_{it}$ equations are chi-squared and Gaussian, and there is a heterogeneous autoregression in the $x_{it}$ equation with $n = 10{,}000$.)
3.2 Empirical power functions of TMG, GP and SU estimators of the coefficient of interest (true value 1) in the case of correlated heterogeneity (The error processes for the $y_{it}$ and $x_{it}$ equations are chi-squared and Gaussian, and there is not an interactive effect in the $x_{it}$ equation with $T = 2$.)
3.3 Empirical power functions of MG, TMG and GP estimators of the coefficient of interest (true value 1) in the case of correlated heterogeneity (The error processes for the $y_{it}$ and $x_{it}$ equations are chi-squared and Gaussian, and there is not an interactive effect in the $x_{it}$ equation with $T = 3$.)
3.4 Empirical power functions of TMG estimators of the coefficient of interest (true value 1) using thresholds with different orders of $n$ in the case of correlated heterogeneity (The error processes for the $y_{it}$ and $x_{it}$ equations are chi-squared and Gaussian, and there is not an interactive effect in the $x_{it}$ equation with $T = 2$.)
3.5 Empirical power functions of GP estimators of the coefficient of interest (true value 1) using thresholds with different orders of $n$ in the case of correlated heterogeneity (The error processes for the $y_{it}$ and $x_{it}$ equations are chi-squared and Gaussian, and there is not an interactive effect in the $x_{it}$ equation with $T = 2$.)
3.6 Empirical power functions of TMG-TE, GP, and SU estimators of the coefficient of interest (true value 1) with time effects in the case of correlated heterogeneity (The error processes for the $y_{it}$ and $x_{it}$ equations are chi-squared and Gaussian, and there is an interactive effect in the $x_{it}$ equation with $T = 2$.)
3.7 Empirical power functions of TMG-TE, TMG-C, and GP estimators of the coefficient of interest (true value 1) with time effects in the case of correlated heterogeneity (The error processes for the $y_{it}$ and $x_{it}$ equations are chi-squared and Gaussian, and there is an interactive effect in the $x_{it}$ equation with $T = 3$.)
4.1 Empirical power functions for FDAC estimator in homogeneous and heterogeneous ($a = 0.5$) panel AR(1) models where errors are Gaussian and non-Gaussian distributed with GARCH effects
C.1 Empirical power functions for FDAC and HetroGMM estimators of $E(\phi_i) = 0.4$ in a heterogeneous AR(1) panel where $\phi_i$ is uniformly distributed, $\phi_i = \mu_\phi + v_i$, $\mu_\phi = E(\phi_i) = 0.4$, and $v_i \sim IIDU(-0.5, 0.5)$, with Gaussian errors and GARCH effects
C.2 Empirical power functions for FDAC and HetroGMM estimators of $\operatorname{Var}(\phi_i) = 0.083$ in a heterogeneous AR(1) panel where $\phi_i$ is uniformly distributed, $\phi_i = \mu_\phi + v_i$, $\mu_\phi = E(\phi_i) = 0.4$, and $v_i \sim IIDU(-0.5, 0.5)$, with Gaussian errors and GARCH effects
C.3 Empirical power functions for FDAC and HetroGMM estimators of $E(\phi_i) = 0.62$ in a heterogeneous panel AR(1) model where $\phi_i$ is categorical distributed with Gaussian errors and GARCH effects
C.4 Empirical power functions for FDAC and HetroGMM estimators of $\operatorname{Var}(\phi_i) = 0.076$ in a heterogeneous panel AR(1) model where $\phi_i$ is categorical distributed with Gaussian errors and GARCH effects
C.5 Empirical power functions for FDAC and FDLS estimators of $\phi_0 = 0.4$ in a homogeneous AR(1) panel with Gaussian errors and GARCH effects

Abstract

This dissertation aims to develop new estimation methods for heterogeneous panel data models with a large number of cross-sectional units ($n$) but a limited number of periods ($T$). In the second chapter, for average treatment effects, the mean group (MG) estimator proposed by Pesaran and Smith (1995) is extended to short $T$ heterogeneous dynamic panels by the Jackknife (JK). The MG-JK estimator is root-$n$ consistent as $n$ and $T$ tend to infinity and $n/T^4$ converges to zero.
For the validity of the Jackknife in finite samples, a sufficient condition for the existence of the $r$-th moment of the MG estimator is derived. Using the MG-JK estimator, we find a close-to-zero effect of minimum wages on total employment but a negative impact on teenage employment in the U.S. The third chapter (with M. Hashem Pesaran) proposes a new trimmed mean group (TMG) estimator for average treatment effects in large $n$ static panels with correlated heterogeneous coefficients where $T$ can be as small as the number of regressors. The TMG estimator is consistent and asymptotically normally distributed. A suitable trimming process is chosen by a bias/efficiency trade-off. A Hausman-type test is proposed to diagnose the validity of the fixed effects estimation. The fourth chapter (with M. Hashem Pesaran) considers a first-order autoregressive panel data model with individual-specific effects and a heterogeneous autoregressive coefficient. New estimators are proposed for the moments of the random autoregressive coefficients. The conditions under which the probability distribution of the autoregressive coefficients is identified, assuming a categorical distribution, are investigated.

Chapter 1
Introduction

This dissertation contributes to the literature on estimation and inference for heterogeneous panel data models with a large number of cross-sectional units ($n$) but a limited number of time periods ($T$).

The second chapter considers estimation of average treatment effects in heterogeneous dynamic panel data models with weakly exogenous regressors. The mean group (MG) estimator proposed by Pesaran and Smith (1995) is extended to short $T$ dynamic panels by the split-panel Jackknife (JK). It is shown that, once the first-order bias is eliminated, the MG-JK estimator is root-$n$ consistent as both $n$ and $T$ tend to infinity and $n/T^4$ converges to zero. For the validity of the Jackknife in finite samples, a sufficient condition for the existence of the $r$-th moment of the MG estimator is derived for an ARX(1) model. Moreover, I revisit the long-running dispute over the disemployment effects of minimum wages. Using the MG-JK estimators and county-level data in the United States during 2002–2011, I find a close-to-zero effect of real minimum wages on total employment but a substantial negative impact on teenage employment in the long run.

The third chapter (joint with M. Hashem Pesaran) proposes a new trimmed mean group (TMG) estimator for consistent estimation of average treatment effects in panels with strictly exogenous regressors and correlated heterogeneous coefficients where $n$ is large but $T$ can be as small as the number of regressors. The asymptotic distribution of the TMG estimator is derived, and the estimator is shown to be consistent and asymptotically normal. A suitable trimming process is chosen by a trade-off between bias and efficiency. Compared with the existing trimmed estimators, our TMG estimator has greater power. Moreover, a Hausman-type test is proposed to diagnose the validity of the fixed effects estimation. The TMG approach is applied to a short panel of households in poor rural communities in Nicaragua to estimate the average effect of household expenditure on calorie demand, and the results provide evidence of correlated heterogeneity in the sample.

In the fourth chapter (joint with M. Hashem Pesaran), we consider a first-order autoregressive panel data model with individual-specific effects and a heterogeneous autoregressive coefficient. It is shown that the standard generalized method of moments estimators obtained under homogeneous slopes are biased. For the moments of the cross-sectional distribution of the autoregressive coefficients, new estimators are proposed in a random coefficient model. The conditions under which the probability distribution of the autoregressive coefficients is identified, assuming a categorical distribution with a finite number of categories, are also investigated.
Small sample properties of the proposed estimators are examined by Monte Carlo experiments both under homogeneous and heterogeneous slopes and for both stationary and non-stationary processes. In an application to earnings dynamics, the heterogeneous approach shows a clear upward pattern in the mean persistence of earnings by the level of education.

Chapter 2
Jackknife Bias-Corrected Mean Group Estimator for Heterogeneous Dynamic Panels with Short T: An Application to Effects of Minimum Wages on Employment

2.1 Introduction

This paper considers estimation and inference of heterogeneous dynamic panel data models with a large number of cross-sectional units ($n$) and a relatively small number of time periods ($T$). When multiple observations for an individual are made available by panel data, researchers can exploit both within-unit and cross-sectional variations to estimate the average effects. In particular, individual time-invariant unobservables can be taken into account by including an individual-specific intercept in regressions. With homogeneous slope coefficients, random effects and fixed effects (FE) models are widely used in empirical studies with large $n$ and short $T$ panels. To tackle the incidental parameter problem in dynamic panel data models, various estimation methods have been developed based on the generalized method of moments or likelihoods, for example, Anderson and Hsiao (1982), Arellano and Bond (1991), Blundell and Bond (1998), Hsiao et al. (2002), and more recently Chudik and Pesaran (2022), where the estimators are unbiased for a fixed $T$. Another approach is to correct for the small-$T$ bias of an existing estimator, for example, the bias of FE estimators analyzed in Nickell (1981). This approach has been taken by Alvarez and Arellano (2003), Hahn and Newey (2004), Hahn and Kuersteiner (2010), Hahn and Kuersteiner (2011), and Chudik et al. (2018) in both linear and nonlinear panels, which requires $n/T \to c$ instead of $n/T \to 0$ for $\sqrt{nT}$-consistency.
In general, these approaches hinge on homogeneous specifications such as the homogeneity in autoregressive coefficients, such that common parameters can be separately identified from incidental parameters in dynamic panels.

Relaxing the homogeneity restriction with multi-dimensional heterogeneity may better resemble reality. More importantly, a reliable estimation approach to evidence-based policy and program evaluations needs to be robust to possibly heterogeneous effects when estimating the average treatment effects. Advances have been made in econometric methods of identifying and estimating the average effects for fixed $T$ static panel data models with correlated random coefficients, like Chamberlain (1992), Wooldridge (2005), Arellano and Bonhomme (2012), Graham and Powell (2012), and Bonhomme (2012). Nonetheless, the methods have not been extended to the estimation of average effects with dynamic heterogeneity in the outcome processes and treatment effects of continuous variables, whose presence is plausible, particularly in quasi-experiments, and invalidates the FE and difference-in-differences (DiD) estimators.

Some evidence can be found in Allegretto et al. (2011) and Meer and West (2016), where the two-way fixed effects (FE-TE) estimator for the minimum wage's effect on employment is sensitive to different group-specific time trends used as controls. The possibility of cross-sectional dynamic heterogeneity can reconcile the discordant conclusions drawn on the disemployment effect of minimum wage policy in the existing literature.

When both $n$ and $T$ are large, researchers have proposed estimation methods for panel data models with multiple individual-specific parameters, based on which substantial variations in time trends and dynamics of outcome variables across groups can be found. Pesaran and Smith (1995) studied linear panel data models with heterogeneous regression coefficients across individuals.
They showed that when regressors are weakly exogenous and heterogeneity in coefficients is non-negligible, the pooled LS and standard FE estimators for the means of heterogeneous slope coefficients are biased for all finite $T$ and as $T \to \infty$. Thus, they proposed a mean group (MG) estimator that is calculated by averaging over individual OLS estimates, and in dynamic heterogeneous panels, it is root-$n$ consistent when both $n$ and $T$ tend to infinity and $\sqrt{n}/T \to 0$. But for short $T$ dynamic panels, since the individual OLS estimators have systematic small-$T$ bias,[1] as an average over these estimators, the MG estimator has small-$T$ bias as well.

For large $n$ and relatively short $T$ panels, this paper recommends Jackknife bias-corrected mean group (MG-JK) estimators obtained by splitting the entire panel along the time dimension, as studied in Dhaene and Jochmans (2015). The Jackknife bias-correction approach does not require calculation or approximation of bias formulas, while analytical bias correction relies on modeling assumptions regarding the underlying data-generating processes of both the outcome variable and regressors. For example, Bao and Ullah (2007) and Kiviet and Phillips (2012) have derived analytical results on the second-order bias of estimators in different time-series models, like autoregressive models with other regressors (ARX) and vector autoregressive models. In general, approximations of bias are computed conditional on the entire $T \times 1$ vectors of strictly exogenous regressors and/or initial observations of weakly exogenous regressors, assuming Gaussian distributed errors. Hence, analytical bias correction might be more likely to have misspecification problems when it is not known whether some of the regressors are strictly or weakly exogenous and whether the underlying errors follow Gaussian or asymmetric distributions.
Under the primitive assumptions provided in the paper, we first show that a stochastic expansion of the individual OLS estimator can be written as a polynomial function of $T$ based on the Nagar (1959)-type expansion, which enables the construction of the MG-JK estimators. Since a valid inference of the MG estimators requires $T$ to increase with $n$, we can consider both the first- and second-order bias corrections, which are simple to implement through the split-panel Jackknife approach, and derive their asymptotic distributions when $n, T \to \infty$ and either $n/T^4 \to 0$ or $n/T^6 \to 0$.

[1] It is known in time series analysis that the OLS estimators of autoregressive coefficients have a small-$T$ downward bias, whose formulas were first derived by Kendall (1954) for AR(1) models without and with an intercept.

Note that to eliminate biases of different orders in $T$, the validity of the bias approximation of certain orders is related to the existence of some moments of the error distributions, which was shown by Sargan (1974) for the OLS estimator in a simultaneous equation model with Gaussian distributed errors. Moreover, the Jackknife approach for correcting higher-order bias relies on well-defined individual estimators based on shorter sub-panels, whose variances need to be finite. Thus, a sufficient condition is provided in the paper to ensure the existence of moments of the individual OLS estimates in a dynamic model, where the minimum number of time-series observations, $T_0$, used in the calculation is pinned down with respect to the number of regressors. Such conditions, $T > T_0$, are explicitly imposed for the validity of Jackknife bias-corrected estimators in each sub-panel, for example, by Dhaene and Jochmans (2015) and Chudik et al. (2018), where $T_0$ remains unsolved in these papers.

Through Monte Carlo experiments the finite sample properties of the MG-JK estimators are examined in an ARX(1) model with a strictly exogenous regressor.
For different combinations of $n$ and $T$ and different initial processes, the MG-JK estimators outperform the FE and MG estimators with the smallest biases and root mean squared errors. The FE estimators have the greatest biases and size distortions for both the slope coefficients, which become even more severe when $T$ increases while those of the MG and MG-JK estimators shrink.

To examine to what extent the minimum wage policy affects employment in the United States, the paper measures the average impacts of real minimum wages on two outcomes: total employment and teenage employment in the private sector, using the data from the Quarterly Workforce Indicators (QWI) in Dube et al. (2016) with more than two thousand counties between 2002 and 2011. We consider heterogeneous autoregressive distributed lag (ARDL) models with individual fixed effects and time effects and estimate both the short-term and long-term effects of real minimum wages on the two employment outcomes. Note that with heterogeneous autoregressive coefficients, the MG estimation allows for county-specific employment dynamics over time. In an ARDL(2,2) model with time fixed effects, for the long-term effects on total employment, the MG-HJK estimate is 0.055 with an estimated standard error (s.e.) of 0.041, while the FE-HJK estimate is significantly negative at -0.585 (s.e. 0.247).[2] Furthermore, when controlling for the county total employment, the FE-HJK estimate of the average effect on teenage employment is -0.033 (s.e. 0.042), but the MG-HJK estimate is -0.238 (s.e. 0.061), which shows a considerable negative impact on teenage employment in the long run. The discrepancies between the FE-HJK and MG-HJK estimation results indicate that there may exist short-term heterogeneity in the employment dynamics captured by its own lags and the dynamic effects of real minimum wages.[3]

The rest of the paper is organized as follows.
Section 2.2 provides a detailed review of bias-corrected and other estimators in large $n$ and short $T$ panels with dynamics and cross-sectional heterogeneity. Section 2.3 sets out a heterogeneous dynamic panel data model with covariates. Section 2.4 introduces the MG-JK estimators for the means of heterogeneous slope coefficients and shows the asymptotic distributions. In Section 2.5, a sufficient condition for the moment existence of MG estimators is derived for an ARX(1) model. In Section 2.6, the performance of the FE, MG, and MG-JK estimators is contrasted in Monte Carlo experiments. In Section 2.7, the MG-HJK estimator with time effects is applied to evaluate the average effects of the minimum wage policy on employment. Section 2.8 concludes and remarks on future work. Proofs of theoretical results, details of the estimation methods, and extensive analyses of minimum wages on employment are included in the appendix.

Notations: Generic positive finite constants are denoted by $C$ when large, and $c$ when small. They can take different values at different instances. If $\{f_n\}_{n=1}^{\infty}$ is any real sequence and $\{g_n\}_{n=1}^{\infty}$ is a sequence of positive real numbers, then $f_n = O(g_n)$ if there exists $C$ such that $|f_n|/g_n \le C$ for all $n$, and $f_n = o(g_n)$ if $f_n/g_n \to 0$ as $n \to \infty$. Similarly, $f_n = O_p(g_n)$ if $f_n/g_n$ is stochastically bounded, and $f_n = o_p(g_n)$ if $f_n/g_n \to_p 0$. For a matrix $A$, $A(a:b, c:d)$ denotes a block in the matrix $A$ containing all the elements in the $a$-th to the $b$-th rows and the $c$-th to $d$-th columns of $A$. Given $A = (A_1 \,\vdots\, A_2)$, $A_1$ and $A_2$ denote the non-overlapping left and right blocks of the matrix $A$ respectively, where elements in $A_1$ and $A_2$ will be further specified.

[2] The estimated standard errors of the FE and FE-HJK estimators are clustered at the state level.

[3] The aforementioned estimation results correspond to the model with time fixed effects.
2.2 Related Literature

For static panel data models, several new DiD estimators are proposed for different cases of treatment variables (binary or continuous), treatment assignments (staggered or non-staggered), and the number of time periods (two or multiple). Callaway et al. (2021) and de Chaisemartin et al. (2022) also consider continuous treatment variables, where for identification they impose generalized/conditional parallel trend assumptions and staggered adoption design with fixed $T$ panels. This paper shows that it is possible to identify average treatment effects without these assumptions, but it requires a minimum number of time periods with heterogeneous autoregressive coefficients in panel data models.

The incidental parameter problem in such panels has long been noted in the literature, in particular the small-$T$ bias in estimating large $n$ dynamic panels with individual fixed effects and homogeneous slope coefficients. Alvarez and Arellano (2003) showed that when $n/T \to c$, there are asymptotically negative biases in the FE, GMM, and LIML estimators for an autoregressive panel model with random effects. A lot of effort has been put into developing bias correction methods for valid statistical inference with fixed effects. For the mean of the fixed effects and common parameters in static nonlinear panels, Hahn and Newey (2004) consider the maximum likelihood estimation (MLE) with analytical bias correction, and show that its asymptotic distribution would center at zero when $n/T^3 \to 0$. A leave-one-out Jackknife approach is discussed in their paper as well. Hahn and Kuersteiner (2011) further construct analytical bias correction for MLE of dynamic nonlinear models with individual fixed effects, whose asymptotic properties are examined when $n/T \to c$. Dhaene and Jochmans (2015) have systematically studied the split-panel Jackknife bias correction for MLE in nonlinear models with incidental parameters.
The aforementioned papers mainly cover the case where a scalar individual-specific effect is included in the model. With multiple effects in dynamic nonlinear models, Fernández-Val and Lee (2013) consider various analytical bias correction approaches on GMM estimators where they allow for heterogeneous coefficients of strictly exogenous regressors but homogeneous coefficients of weakly exogenous regressors in linear and nonlinear panels, and Arellano and Hahn (2016) exploit a modified likelihood-based objective function to approximate bias when $n/T \to c$. Nonetheless, none of the above papers study linear panel data models with individual fixed effects and heterogeneous slope coefficients of weakly exogenous regressors. This paper provides a concrete example illustrating the effects of different asymptotic arrays on the estimation and inference of dynamic panels with multiple individual-specific parameters, where the small-$T$ biases can be substantial, and an increase in $n$ leads to greater size distortion.

There are some other studies of identification and estimation strategies for fixed $T$ heterogeneous dynamic panels, and the proposed estimators rely on assumptions imposed on the functional forms of random coefficients in the model or their distributions conditional on available strictly exogenous independent variables, including Sasaki (2015) and Mavroeidis et al. (2015). These papers have adopted a likelihood-based deconvolution approach, which in general will encounter the curse of dimensionality when there are many covariates in the model.

Other than the average effects, people are interested in distribution characteristics of unit-specific parameters, where Jackknife bias-corrected estimators are also considered to reduce small-$T$ bias when estimation relies on individual estimates.
For example, Okui and Yanagi (2019) estimate the cross-sectional distribution function, quantile function, and moments of the heterogeneous means, autocovariances, and autocorrelations of a scalar variable by the sample analogue based on individual time-series estimates. They use Jackknife bias correction with Bootstrap inference for large $n$ and short $T$ panels. Okui and Yanagi (2020) also consider estimation of the density function by the nonparametric kernel-smoothing method with Jackknife, where for consistency and asymptotic normality of the kernel estimator, a greater $T$ is required for a given $n$ to shrink the asymptotic bias compared with the parametric estimators.

Nonlinear models are popular in analyses of dynamic processes with multiple heterogeneous effects. Arellano and Bonhomme (2016) develop an iterative simulation-based approach for the estimation of quantile regressions with short $T$ panels, including autoregressive models. In particular, for analyzing heterogeneous dynamic income processes in the PSID data set, Arellano et al. (2017) assume the persistent component follows a general first-order Markov process and estimate the income process via quantile regressions, which provides evidence of heterogeneity through the quantile autoregressions. Fernández-Val et al. (2022) apply dynamic heterogeneous distribution panel regressions to the labor income processes in the PSID and estimate by the FE methods, where the variance is estimated by Bootstrap under the strong heterogeneity assumption.

While heterogeneous dynamic effects are often studied in nonlinear panels, the goal of this paper is to develop an implementable estimation method for the average treatment effects in short $T$ panels under relatively general assumptions that permit heterogeneity in treatment effects and dynamic processes of the outcome variable. The MG estimation is designed for the mean of heterogeneous regression coefficients in linear panel data models.
By attaching a uniform weight to each unit-specific estimator, the MG estimator in effect converges to the average treatment effects of the population when more cross-sectional units are observed in the data based on a random sampling scheme. It is crucial to examine both the asymptotic and finite sample properties of the MG and FE estimators and provide guidance for (i) when the MG estimator is preferred to the FE estimator, and (ii) when bias correction is required for a wide range of commonly used sample sizes where $n$ is larger than $T$, such that researchers can draw reliable conclusions.

2.3 Dynamic Panel Data Models with Heterogeneous Coefficients

Consider a linear panel data model given by

$$y_{it} = \alpha_i + \phi_i y_{i,t-1} + \beta_i' x_{it} + u_{it}, \tag{2.1}$$

for $i = 1, 2, \ldots, n$ and $t = 1, 2, \ldots, T$, where $\alpha_i$ is an individual fixed effect, $x_{it}$ is a $k_x \times 1$ vector of regressors that could be weakly exogenous with respect to the idiosyncratic errors, $u_{it}$, and the $k \times 1$ vector of slope coefficients $\theta_i = (\phi_i, \beta_i')'$ is heterogeneous across individuals. The parameters of interest are the average short-run or long-term effects, as the mean of heterogeneous coefficients

$$\theta_0 = \operatorname*{plim}_{n \to \infty} \left( n^{-1} \sum_{i=1}^{n} \theta_{i0} \right). \tag{2.2}$$

In a fixed $T$ panel with individual fixed effects and nontrivial heterogeneity in slope coefficients of weakly exogenous regressors, it is indispensable to impose assumptions on the joint or conditional distributions of $(\phi_i, \beta_i')'$ and initial conditions $(y_{i0}, x_{i0}')'$ for identification and estimation of even the mean of $(\phi_i, \beta_i')'$, regardless of how general or restrictive those assumptions can be.[4] Instead, we consider estimation and inference of $\theta_0$ in large $n$ panels where $T$ grows at a much slower rate than $n$ such that $T$ is relatively short. Pesaran and Smith (1995) showed that the FE estimators are biased even when $T \to \infty$ in heterogeneous dynamic panels.
Thus, we resort to the MG estimator they proposed for large $n$ and large $T$ panels given by

$$\hat{\theta}_{MG} = n^{-1} \sum_{i=1}^{n} \hat{\theta}_i, \tag{2.3}$$

and for $i = 1, 2, \ldots, n$,

$$\hat{\theta}_i = (W_i' M_T W_i)^{-1} W_i' M_T y_i, \tag{2.4}$$

where $w_{it} = (y_{i,t-1}, x_{it}')'$, $W_i = (w_{i1}, w_{i2}, \ldots, w_{iT})'$, and $M_T = I_T - T^{-1} \tau_T \tau_T'$ with a $T \times T$ identity matrix $I_T$ and a $T \times 1$ vector of ones $\tau_T$.

[4] See Chamberlain (2022) for an example of identification problems with a discrete weakly exogenous regressor on pp. 10–12.

With the presence of heterogeneity in coefficients, it is later shown that under the asymptotic rectangular arrays $n, T \to \infty$ and $\sqrt{n}/T \to 0$, the MG estimator is $\sqrt{n}$-consistent in dynamic panels,

$$\sqrt{n}\,(\hat{\theta}_{MG} - \theta_0) = \underbrace{\frac{1}{\sqrt{n}} \sum_{i=1}^{n} (\theta_{i0} - \theta_0)}_{=O_p(1)} + \underbrace{\frac{1}{\sqrt{n}} \sum_{i=1}^{n} (\hat{\theta}_i - \theta_{i0})}_{=O_p(1/\sqrt{T})},$$

and normally distributed with mean zero, provided its second-order moment exists. For the small-$T$ bias in the individual OLS estimator, Kendall (1954) first derived the first-order bias formula of the AR(1) coefficient given by

$$\hat{\phi}_i - \phi_{i0} = -\frac{1 + 3\phi_{i0}}{T} + o_p\!\left(\frac{1}{T}\right).$$

Without the subscript $i$, it is also known as the Nickell (1981) bias in panels with fixed effects. Since the individual OLS estimators that the MG estimator averages over have biases in the same direction, we consider bias correction by the split-panel Jackknife method studied in Dhaene and Jochmans (2015) for large $n$ and short $T$ panels. To establish the validity of the Jackknife approach, the primitive assumptions for the dynamic linear panel data models are the following.

Assumption 1 (Errors) For $i = 1, 2, \ldots, n$, the errors, $u_{it}$, in (2.1) are cross-sectionally independent with zero means, $E(u_{it}) = 0$, and possibly cross-sectionally heteroskedastic with $E(u_{it}^2) = \sigma_i^2$ and $0 < c < \sigma_i^2 < C$.

Assumption 2 (Stable processes) For all $i$ and $t$, the $k_x \times 1$ vector of regressors, $x_{it}$, for $t = 1, 2, \ldots, T$, can be either (a) strictly exogenous: $E(u_{it} x_{i,t-h}) = 0_{k_x}$ for $h \in \mathbb{Z}$, or (b) weakly exogenous: $E(u_{it} x_{i,t-h}) = 0_{k_x}$ for $h \in \mathbb{Z}_+$.
(a) For all $i$ and $t$, $x_{it}$ is strictly exogenous with $\operatorname{Rank}(X_i) = k_x$ where $X_i = (x_{i1}, x_{i2}, \ldots, x_{iT})'$, and $\sup_i |\phi_i| < 1$.

(b) For all $i$ and $t$, $x_{it}$ is weakly exogenous and assumed to have the following linear process

$$x_{it} = \alpha_{ix} + \gamma_i y_{i,t-1} + u_{it,x}. \tag{2.5}$$

For $w_{it} = (y_{i,t-1}, x_{it}')'$, combining (2.1) and (2.5) we have

$$\Phi_{0i} w_{it} = \alpha_{iw} + \Phi_{1i} w_{i,t-1} + u_{it,w}, \tag{2.6}$$

where $\alpha_{iw} = (\alpha_i, \alpha_{ix}')'$,

$$\Phi_{0i} = \begin{pmatrix} 1 & 0_{1 \times k_x} \\ -\gamma_i & I_{k_x} \end{pmatrix}, \quad \text{and} \quad \Phi_{1i} = \begin{pmatrix} \phi_i & \beta_i' \\ 0_{k_x \times 1} & 0_{k_x \times k_x} \end{pmatrix},$$

with serially uncorrelated errors $u_{it,w} = (u_{i,t-1}, u_{it,x}')' \sim IID(0, V_i)$, and $0 < c < \lambda_{\min}(V_i) < \lambda_{\max}(V_i) < C$. Given (2.6) and that $\Phi_{0i}$ is invertible, a reduced-form VAR(1) representation can be obtained as follows

$$w_{it} = \mu_{iw} + \Phi_i w_{i,t-1} + e_{it,w}, \tag{2.7}$$

where $\mu_{iw} = \Phi_{0i}^{-1} \alpha_{iw}$, $\Phi_i = \Phi_{0i}^{-1} \Phi_{1i}$, and $e_{it,w} = \Phi_{0i}^{-1} u_{it,w}$. The support of the spectral radius $\varrho(\Phi_i)$ is strictly inside the unit circle for $i = 1, 2, \ldots, n$.

Assumption 3 (Initialization) If the $k \times 1$ regressors, $w_{it}$, are weakly exogenous, the processes of $\{w_{it}\}$ start from a distant past.

Remark 1 A subset of the regressors can be strictly exogenous, and in this case, they could follow a nonlinear process with serially correlated errors. To establish the asymptotic properties of the MG-JK estimators with $x_{it}$ being weakly exogenous, case (b) in Assumption 2 is imposed, and combined with Assumption 3, they imply that $w_{it}$ are covariance stationary, which ensures that the following stochastic expansion of the bias of $\hat{\theta}_i$ is valid. The regressors need not be stationary when they are strictly exogenous. A necessary condition for consistency would be the remaining bias of the MG-JK estimators being of a lower order, like $O_p(T^{-3/2})$. For example, Chudik et al. (2018) provide a condition in equation (19) under which the remaining bias of the two-way fixed effects estimator in a homogeneous dynamic panel is $o_p(T^{-1})$ after first-order Jackknife bias correction.
Assumption 4 (Regularity) For $i = 1, 2, \ldots, n$, there exists a finite $T_0$ such that for $T > T_0$,

(i) the second-order moment of $\hat{\theta}_i$ exists,

(ii) the $k \times k$ matrix

$$\Psi_{iT} = T^{-1} W_i' M_T W_i \tag{2.8}$$

is positive definite and bounded for $i = 1, 2, \ldots, n$ with $\sup_i E\|\Psi_{iT}\| < C$,

(iii) and for $s \ge 5$, up to the $s$-th moments of $\Psi_{iT}$ exist.

(iv) As $T \to \infty$, $\operatorname{plim}_{T \to \infty} \Psi_{iT} = \Psi_i$, where $\Psi_i$ is positive definite, and $\sup_i \|\Psi_i\| < C$.

Remark 2 For Assumption 4 (i), the value of $T_0$ (relative to $k$) will be derived in the following section for an ARX(1) model with strictly exogenous $x_{it}$ such that the OLS estimator, $\hat{\theta}_i$, has at least finite second-order moments with $T > T_0$ time-series observations.

Remark 3 Assumption 4 (iii) will be satisfied if up to the $2s$-th moments of $u_{it}$ in (2.1) are finite with strictly exogenous $x_{it}$ for $i = 1, 2, \ldots, n$.

Assumption 5 (Smoothness) For $i = 1, 2, \ldots, n$, let $z_{iT}$ be a finite vector containing the distinctive components of $W_i' M_T W_i$ and $W_i' M_T y_i$ (i.e., the scalar elements in these two matrices), and $z_i = E(z_{iT})$. Then

$$\hat{\theta}_i = f_i(z_{iT}), \quad \text{and} \quad \theta_{i0} = f_i(z_i),$$

where $f_i$ is a vector-valued continuous function, and $f_i(\cdot)$ is differentiable up to the $l$-th order with derivatives that are uniformly bounded in a neighborhood of $z_i$ as $T \to \infty$ for $l \ge 5$.

Assumption 6 (Heterogeneous regression coefficients) For $i = 1, 2, \ldots, n$, the $k \times 1$ regression coefficients follow the random coefficient model

$$\theta_i = \theta_0 + \eta_i, \quad \eta_i \sim IID(0, \Omega),$$

where $\theta_0$ is given by (2.2) with $\|\theta_0\| < C$, and $\Omega$ is a $k \times k$ symmetric nonnegative definite matrix with $\|\Omega\| < C$.

Lemma 1 (Expansion of asymptotic biases of individual OLS estimators) Consider the panel data model (2.1), for $i = 1, 2, \ldots, n$ and $t = 1, 2, \ldots, T$. Suppose Assumptions 1–4 hold and there exists a finite constant $T_0$ such that for $T > T_0$, $\|E(\hat{\theta}_i)\| < C$. Then as $T \to \infty$,

$$E(\hat{\theta}_i) - \theta_{i0} = \frac{B_{i1}}{T} + \frac{B_{i2}}{T^2} + O\!\left(\frac{1}{T^3}\right), \tag{2.9}$$

where $B_{i1}$ and $B_{i2}$ are vectors containing finite constants. See Appendix A.2.1 for proof.
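As a hedged illustration of how an expansion like (2.9) can be exploited, suppose (an assumption made only for this sketch, plausible under covariance stationarity) that the same constants $B_{i1}$ and $B_{i2}$ apply to the OLS estimator computed on each of two non-overlapping half-panels of length $T/2$, with $T$ even. Writing $\hat{\theta}_i^{(1)}$ and $\hat{\theta}_i^{(2)}$ for these half-panel estimators (notation introduced here for illustration only), the first-order term cancels in the following linear combination:

```latex
\begin{align*}
E\!\left[ 2\hat{\theta}_i - \tfrac{1}{2}\big(\hat{\theta}_i^{(1)} + \hat{\theta}_i^{(2)}\big) \right] - \theta_{i0}
  &= 2\left( \frac{B_{i1}}{T} + \frac{B_{i2}}{T^{2}} \right)
     - \left( \frac{B_{i1}}{T/2} + \frac{B_{i2}}{(T/2)^{2}} \right) + O\!\left(T^{-3}\right) \\
  &= -\frac{2 B_{i2}}{T^{2}} + O\!\left(T^{-3}\right).
\end{align*}
```

The weights sum to one, so the target $\theta_{i0}$ is preserved, while the leading bias falls from order $T^{-1}$ to order $T^{-2}$, at the cost of relying on estimates from shorter sub-panels.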
Remark 4 The analytical formula of the $O(T^{-1})$ bias of the autoregressive coefficients in the ARX(1) and VAR(1) models can be found in Bao and Ullah (2007) with non-Gaussian errors, and the formula of up to $O(T^{-2})$ bias for the entire vector of regression coefficients with strictly exogenous $x_{it}$ can be found in Kiviet and Phillips (2012) with Gaussian distributed errors.

Given (2.9), the Jackknife bias correction can be done by a linear combination of the MG estimates of the full panel and different sub-panels which are split along the time dimension.

2.4 Mean Group Estimators with Split-panel Jackknife Bias Correction

To minimize the remaining bias of a higher order, we split the entire panel along the time dimension into non-overlapping sub-panels.[5] Let $h$ denote the minimum number of sub-panels needed for reducing up to the $O(T^{-h+1})$ bias with $h \ge 2$, and $H_{h,j}$ denote the $j$-th sub-panel that contains $T/h$ time-series observations for $j = 1, 2, \ldots, h$.[6] Provided the individual estimators in each sub-panel are well-defined, the MG estimators with half-panel and one-third-panel Jackknife bias correction (MG-HJK and MG-TJK respectively) can be calculated as follows.

[5] See Dhaene and Jochmans (2015) and Chambers (2013), where different sub-sampling schemes to apply the Jackknife method are compared.

[6] For illustration purposes, we assume $T/h$ is an integer. In cases where $T/h$ is not an integer, we recommend throwing away the first few observations to construct the respective full panel and non-overlapping sub-panels.

The MG-HJK estimator is given by

$$\hat{\theta}_{MG\text{-}HJK} = \frac{1}{n} \sum_{i=1}^{n} \hat{\theta}_{i,HJK}, \tag{2.10}$$

where

$$\hat{\theta}_{i,HJK} = 2\hat{\theta}_i - \frac{1}{2}\left( \hat{\theta}_{iH_{2,1}} + \hat{\theta}_{iH_{2,2}} \right), \tag{2.11}$$

and $\hat{\theta}_i$, $\hat{\theta}_{iH_{2,1}}$, and $\hat{\theta}_{iH_{2,2}}$ are the OLS estimators based on individual $i$'s all ($t = 1, 2, \ldots, T$), the first half ($t = 1, 2, \ldots, T/2$), and the second half ($t = T/2 + 1, T/2 + 2, \ldots, T$) of time-series observations.
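The half-panel construction in (2.10) and (2.11) can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions rather than the implementation used in the paper: the function names (`within_ols`, `hjk_unit`, `mean_group`, `simulate_unit`) are hypothetical, the simulated design is a simplified ARX(1) with one strictly exogenous regressor chosen only for this example, and standard errors use the cross-sectional dispersion of the unit estimates scaled by $1/(n(n-1))$.

```python
import numpy as np

def within_ols(y, W):
    """OLS of y on W after demeaning over time (removes the unit intercept)."""
    yd = y - y.mean()
    Wd = W - W.mean(axis=0)
    return np.linalg.lstsq(Wd, yd, rcond=None)[0]

def hjk_unit(y, W):
    """Return (full-sample OLS, half-panel Jackknife estimate) for one unit."""
    T = len(y)
    if T % 2:                      # odd T: drop the earliest observation
        y, W = y[1:], W[1:]
        T -= 1
    half = T // 2
    full = within_ols(y, W)
    first = within_ols(y[:half], W[:half])
    second = within_ols(y[half:], W[half:])
    return full, 2.0 * full - 0.5 * (first + second)

def mean_group(est):
    """Mean group estimate and standard errors from an (n, k) array."""
    n = est.shape[0]
    mean = est.mean(axis=0)
    dev = est - mean
    cov = dev.T @ dev / (n * (n - 1))
    return mean, np.sqrt(np.diag(cov))

def simulate_unit(rng, T, phi, beta, burn=50):
    """Illustrative ARX(1) unit: y_t = a + phi*y_{t-1} + beta*x_t + u_t."""
    a = rng.normal(0.5, 0.5)
    x = 0.5 * rng.normal(size=T + burn)      # strictly exogenous regressor
    u = rng.normal(size=T + burn)
    ys = np.empty(T + burn)
    prev = a / (1.0 - phi)
    for t in range(T + burn):
        prev = a + phi * prev + beta * x[t] + u[t]
        ys[t] = prev
    y = ys[burn:]                            # y_1, ..., y_T
    W = np.column_stack([ys[burn - 1:-1], x[burn:]])  # (y_{t-1}, x_t)
    return y, W

rng = np.random.default_rng(0)
n, T = 1000, 20
phis = rng.uniform(0.2, 0.8, n)              # mean autoregressive coefficient 0.5
betas = rng.uniform(0.5, 1.5, n)
pairs = [hjk_unit(*simulate_unit(rng, T, p, b)) for p, b in zip(phis, betas)]
ols = np.array([p[0] for p in pairs])
jk = np.array([p[1] for p in pairs])
mg, mg_se = mean_group(ols)
mg_hjk, mg_hjk_se = mean_group(jk)
print("MG    :", mg, mg_se)
print("MG-HJK:", mg_hjk, mg_hjk_se)
```

In this toy design the uncorrected MG estimate of the mean autoregressive coefficient should sit visibly below 0.5, and the half-panel correction should move it closer, usually with a somewhat larger standard error because each half-panel estimate uses only $T/2$ observations.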
Analogously, the MG-TJK estimator is given by

$$\hat{\theta}_{MG\text{-}TJK} = \frac{1}{n} \sum_{i=1}^{n} \hat{\theta}_{i,TJK}, \tag{2.12}$$

where

$$\hat{\theta}_{i,TJK} = 3\hat{\theta}_i - \frac{3}{2}\left( \hat{\theta}_{iH_{2,1}} + \hat{\theta}_{iH_{2,2}} \right) + \frac{1}{3}\left( \hat{\theta}_{iH_{3,1}} + \hat{\theta}_{iH_{3,2}} + \hat{\theta}_{iH_{3,3}} \right). \tag{2.13}$$

Compared with analytical bias correction, the Jackknife bias correction approach does not require knowledge about the bias formulas and is easy to implement when eliminating higher-order biases.

With the presence of non-negligible heterogeneity in slope coefficients, the asymptotic variance of the MG-JK estimators can be nonparametrically and consistently estimated based on the sample variance of $\hat{\theta}_{i,JK}$ for $i = 1, 2, \ldots, n$, irrespective of the order of bias correction, given by

$$\widehat{\operatorname{Var}}(\hat{\theta}_{MG\text{-}JK}) = \frac{1}{n(n-1)} \sum_{i=1}^{n} (\hat{\theta}_{i,JK} - \hat{\theta}_{MG\text{-}JK})(\hat{\theta}_{i,JK} - \hat{\theta}_{MG\text{-}JK})'. \tag{2.14}$$

Note that to eliminate higher-order bias like $O(T^{-2})$, the split-panel Jackknife approach relies on individual OLS estimators of shorter non-overlapping sub-panels, and at least $(T_0 + 1)$ time-series observations are required for each sub-panel such that these estimates can be well defined. Hence, it is more demanding on the length of panels when we eliminate higher-order biases.

Remark 5 Dhaene and Jochmans (2015) consider split-panel Jackknife bias correction for maximum likelihood estimators of linear and nonlinear panel data models with individual fixed effects. Compared with other partition schemes of panels, they show that with equal partitions of the entire panels into non-overlapping sub-panels, the magnitude of the remaining higher-order bias terms can be minimized. This result is also found by Chambers (2013) in the Monte Carlo simulation results of Jackknife bias-corrected estimators for an AR($p$) model.

Theorem 1 (Asymptotic distribution of the MG-JK estimators in dynamic panels) Consider the panel data model (2.1), for $i = 1, 2, \ldots, n$ and $t = 1, 2, \ldots, T$. Suppose Assumptions 1–5 hold and there exists a finite constant $T_0$ such that for $T/3 > T_0$, $\|\operatorname{Var}(\hat{\theta}_{i,HJK})\| < C$.
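Then as $T \to \infty$,

$$\sup_{1 \le i \le n} \operatorname*{plim}_{T \to \infty} \left( \hat{\theta}_{i,HJK} - \theta_{i0} \right) = O\!\left(\frac{1}{T^2}\right) \tag{2.15}$$

and

$$\sup_{1 \le i \le n} \operatorname*{plim}_{T \to \infty} \left( \hat{\theta}_{i,TJK} - \theta_{i0} \right) = O\!\left(\frac{1}{T^3}\right). \tag{2.16}$$

Under Assumption 6, as $n, T \to \infty$,

(i) if $n/T^4 \to 0$, then

$$\sqrt{n}\,(\hat{\theta}_{MG\text{-}HJK} - \theta_0) \to_d N(0_k, \Omega); \tag{2.17}$$

(ii) and if $n/T^6 \to 0$, then

$$\sqrt{n}\,(\hat{\theta}_{MG\text{-}TJK} - \theta_0) \to_d N(0_k, \Omega). \tag{2.18}$$

See Appendix A.2.2 for proof.

Theorem 2 (Consistent estimator of the asymptotic variance of the MG-JK estimators) Consider the panel data model (2.1), for $i = 1, 2, \ldots, n$ and $t = 1, 2, \ldots, T$. Suppose Assumptions 1–6 hold and there exists a finite constant $T_0$ such that for $T/3 > T_0$, $\|\operatorname{Var}(\hat{\theta}_{i,HJK})\| < C$. Then as $n, T \to \infty$,

(i) if $n/T^4 \to 0$, then

$$n\,\widehat{\operatorname{Var}}(\hat{\theta}_{MG\text{-}HJK}) = \frac{1}{n-1} \sum_{i=1}^{n} (\hat{\theta}_{i,HJK} - \hat{\theta}_{MG\text{-}HJK})(\hat{\theta}_{i,HJK} - \hat{\theta}_{MG\text{-}HJK})' \to_p \Omega; \tag{2.19}$$

(ii) and if $n/T^6 \to 0$, then

$$n\,\widehat{\operatorname{Var}}(\hat{\theta}_{MG\text{-}TJK}) = \frac{1}{n-1} \sum_{i=1}^{n} (\hat{\theta}_{i,TJK} - \hat{\theta}_{MG\text{-}TJK})(\hat{\theta}_{i,TJK} - \hat{\theta}_{MG\text{-}TJK})' \to_p \Omega. \tag{2.20}$$

See Chudik and Pesaran (2019) for proof.

2.5 Existence of Moments of OLS Estimator in an ARX(1) Model

The MG-JK estimators require at least second-order moments of all individual estimators over which they average to exist. In this section, we first establish sufficient conditions for the existence of moments of $\hat{\phi}_i$ with strictly exogenous $x_{it}$ and solve for $T_0$. For the estimator of the other coefficients of strictly exogenous regressors, $\hat{\beta}_i$, a less stringent condition is required for the existence of its moments, which will be satisfied if the former condition holds, that is, $T > T_0$.

Model (2.1) can be written compactly as

$$y_i = \alpha_i \tau_T + \phi_i y_{i,-1} + X_i \beta_i + u_i,$$

where $y_i = (y_{i1}, y_{i2}, \ldots, y_{iT})'$, $y_{i,-1} = (y_{i0}, y_{i1}, \ldots, y_{i,T-1})'$, and $u_i = (u_{i1}, u_{i2}, \ldots, u_{iT})'$ are $T \times 1$ vectors, and $X_i = (x_{i1}, x_{i2}, \ldots, x_{iT})'$ is a $T \times k_x$ matrix. Let $G_l$ be a $T \times (T+1)$ selection matrix defined by $G_l = \left( 0_{T \times (1-l)} \,\vdots\, I_T \,\vdots\, 0_{T \times l} \right)$, where $l = 0, 1$ refers to the order of lag. Thus,

$$y_{i,-1} = G_1 \tilde{y}_i, \quad \text{and} \quad y_i = G_0 \tilde{y}_i,$$

where $\tilde{y}_i = (y_{i0}, y_{i1}, \ldots, y_{iT})'$.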
Then the OLS estimator of $\phi_{i0}$ can be written as

$$\hat{\phi}_i = \frac{y_{i,-1}' M_i y_i}{y_{i,-1}' M_i y_{i,-1}} = \frac{(G_1 \tilde{y}_i)' M_i G_0 \tilde{y}_i}{(G_1 \tilde{y}_i)' M_i G_1 \tilde{y}_i} = \frac{\tilde{y}_i' A_i \tilde{y}_i}{\tilde{y}_i' B_i \tilde{y}_i}, \tag{2.21}$$

where

$$A_i = G_1' M_i G_0 = \begin{pmatrix} 0_T & M_i \\ 0 & 0_T' \end{pmatrix}, \quad B_i = G_1' M_i G_1 = \begin{pmatrix} M_i & 0_T \\ 0_T' & 0 \end{pmatrix},$$

and $M_i = I_T - Z_i (Z_i' Z_i)^{-1} Z_i'$ are $T \times T$ matrices with $Z_i = (\tau_T, X_i)$. To pin down $T_0$, we first show that, conditional on the strictly exogenous regressors, $\tilde{y}_i$ has a non-degenerate distribution given by (2.22).

Lemma 2 (Non-degenerate distribution of the outcome variable) Suppose Assumptions 1 and 2 (a) hold, and the process of $y_{it}$ for $t = 1, 2, \ldots, T$ is generated by (2.1) with a random initial condition with $E(y_{i0} \mid X_i) = \mu_{i0}$, $\operatorname{Var}(y_{i0} \mid X_i) = \omega_{i0}^2$, and $0$ […] $r$. (2.24)

For the existence of variance (i.e., $r = 2$) of $\hat{\phi}_i$, it requires

$$T > T_0 = k_x + 3, \tag{2.25}$$

that is, at least $(k_x + 4)$ time-series observations used in estimation, where the initial observation, $y_{i0}$, is used as a regressor in an ARX(1) model. See Appendix A.2.4 for proof.

Example 1 For $x_{it}$ being a scalar strictly exogenous regressor ($k_x = 1$) in an ARX(1) model, the existence of the first and second moments of $\hat{\phi}_i$ requires at least $T \ge 5h$ with $h$ non-overlapping sub-panels for inference based on the MG-JK estimator correcting for the $O(T^{-h+1})$ bias.

Remark 6 Lemma A.1 generalizes the conditions for the existence of the ratio of quadratic forms of normal random vectors provided by Magnus (1986).[9] It is consistent with the results in Smith (1988), which are utilized by Pesaran and Timmermann (2005) to establish conditions of moment existence for estimators of coefficients in the AR(1) model.

[9] See Section 6 in Magnus (1986).

Remark 7 In the Appendix, we have shown that the numerator and denominator of $\hat{\phi}_i$ are not independent of each other, thus the condition of moment existence of $\hat{\phi}_i$ cannot simply rest on the existence of certain moments of its denominator.
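The arithmetic behind (2.25) and Example 1 can be written as a small helper. This is only a sketch of the counting rule stated above, for the variance case $r = 2$ with strictly exogenous regressors: each of the $h$ sub-panels needs more than $T_0 = k_x + 3$ observations, so the full panel needs at least $h(k_x + 4)$. The function name `min_periods_for_variance` is hypothetical.

```python
def min_periods_for_variance(k_x: int, h: int = 1) -> int:
    """Smallest T such that every one of h equal sub-panels has T/h > T_0,
    with T_0 = k_x + 3 as in (2.25) (existence of the variance, r = 2)."""
    if k_x < 0 or h < 1:
        raise ValueError("k_x must be non-negative and h must be at least 1")
    t0 = k_x + 3
    return h * (t0 + 1)

# One strictly exogenous regressor: full panel, half-panel, one-third-panel.
print([min_periods_for_variance(1, h) for h in (1, 2, 3)])
```

With $k_x = 1$ this reproduces the requirement $T \ge 5h$ in Example 1, which makes explicit how quickly the minimum panel length grows with the order of bias correction.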
2.6 Monte Carlo Simulation

In the Monte Carlo experiments, we consider a heterogeneous ARX(1) panel model with a strictly exogenous regressor. With different sample sizes, we first compare the performance of the FE, MG, MG-HJK, and MG-TJK estimators of both the autoregressive coefficient and the coefficient of the strictly exogenous regressor. For the MG estimators without and with the Jackknife, the Monte Carlo experiments also examine the effectiveness of bias reductions and the validity of inference with different means of autoregressive coefficients and different initial conditions. Simulation results with time effects in regressions are also provided.

2.6.1 Data-generating Process of an ARX(1) Panel Data Model

The outcome variable and regressor, $y_{it}$ and $x_{it}$, are generated as

$$y_{it} = \alpha_i + \delta_t + \phi_i y_{i,t-1} + \beta_i x_{it} + u_{it} \tag{2.26}$$

for $t = -m, -m+1, \ldots, 0, 1, 2, \ldots, T$ with

$$u_{it} = c_u \sigma_{iu} e_{it}, \tag{2.27}$$

and

$$x_{it} = \alpha_{ix} + v_{it}, \quad \text{for } t = -50, -49, \ldots, -1, 0, 1, \ldots, T, \tag{2.28}$$

with

$$v_{it} = \rho_{ix} v_{i,t-1} + (1 - \rho_{ix}^2)^{1/2} c_v \sigma_{iv} e_{it,x}, \tag{2.29}$$

where $e_{it} \sim IID(0, 1)$ is drawn from Gaussian or chi-squared distributions and $e_{it,x} \sim IIDN(0, 1)$ with heteroskedastic variances: $\sigma_{iu}^2 \sim IID\ \tfrac{1}{2}(z_i^2 + 1)$ with $z_i \sim IIDN(0, 1)$ and $\sigma_{iv}^2 \sim IID\ \tfrac{1}{3}(z_{iv} + 1)$ with $z_{iv} \sim \chi_2^2$ such that $E(\sigma_{iu}^2) = E(\sigma_{iv}^2) = 1$, and $c_u = 0.5$ and $c_v = 2$. We generate the strictly exogenous regressor $x_{it}$ starting 50 periods before the initial state of $y_{it}$, such that for $t = -m, -m+1, \ldots, T$, the $x_{it}$ used to generate $y_{it}$ is stationary.

As well documented in the literature, like Anderson and Hsiao (1981), different underlying processes of initial conditions will render the FE and other MLE estimators inconsistent when $n$ and $T$ do not tend to infinity simultaneously in homogeneous autoregressive panels with individual fixed effects. Moreover, Kiviet and Phillips (2012) find that the accuracy of bias-corrected estimators based on approximation formulas is sensitive to different processes of the initial observations in their Monte Carlo simulations.
Hence, in the baseline case, we start the process of $y_{it}$ from period 0, i.e., $m = 0$, where the initial state is generated with $\kappa = 1$ as

$$y_{i0} = \frac{\kappa}{1 - \phi_i}\left(\alpha_i + \beta_i \alpha_{ix} + u_{i0}\right). \qquad (2.30)$$

The individual fixed effects for $y_{it}$ and $x_{it}$ are generated jointly as

$$\begin{pmatrix} \alpha_i \\ \alpha_{ix} \end{pmatrix} \sim IIDN\left( \begin{pmatrix} 0.5 \\ 0.5 \end{pmatrix}, \begin{pmatrix} 1/4 & 1/16 \\ 1/16 & 1/4 \end{pmatrix} \right).$$

The heterogeneous slope coefficients are generated as $\beta_i \sim IIDU(0, 1)$ and $\rho_{ix} \sim IIDU(0, 0.95)$. For the DGP of $\phi_i$, we experiment with three degrees of average persistence:

(i) low average persistence, $\phi_i \sim IIDU(0, 0.6)$ with $\phi_0 = E(\phi_i) = 0.3$,
(ii) medium average persistence, $\phi_i \sim IIDU(0.05, 0.95)$ with $\phi_0 = 0.5$, and
(iii) high average persistence, $\phi_i \sim IIDU(0.45, 0.95)$ with $\phi_0 = 0.7$,

to illustrate how severe the biases of the FE and MG estimators can be even in the presence of a strictly exogenous regressor.

We first consider estimation without time effects ($\delta_t = 0$ for all $t$) and later evaluate estimation with non-zero time effects, where $\delta_t$ is drawn randomly from $\{-0.4, -0.3, -0.2, -0.1, 0.1, 0.2, 0.3, 0.4\}$ for $t = 1, 2, \ldots, T-1$ under the normalization $\boldsymbol{\delta}'\boldsymbol{\tau}_T = 0$ with $\delta_0 = 0$. The experiments are conducted for all combinations of $n \in \{50, 200, 1000, 2000\}$ and $T \in \{12, 16, 20, 30, 50\}$ with 2,000 replications.

2.6.2 Simulation Results

Tables 2.1 and 2.2 report the bias, root mean squared error (RMSE), and size of the FE, MG, MG-HJK, and MG-TJK estimators for the means of the heterogeneous coefficients, $\phi_0$ and $\beta_0$, in the ARX(1) model given by (2.26). With weakly exogenous regressors and heterogeneity in the respective slope coefficients, the FE estimator is inconsistent for all combinations of $n$ and $T$. The magnitude of its bias increases markedly with $T$ for both coefficients. Hence, the FE estimator may provide even more misleading results with more time-period observations.¹⁰ In contrast, the bias of the MG estimator shrinks as $T$ grows, its RMSE decreases in both $n$ and $T$, and its size tends to the 5% level as $T$ increases for a fixed $n$.
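The data-generating process of Section 2.6.1 can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the code used for the reported experiments. The function name `simulate_arx1_panel`, the symbol names (`alpha`, `phi`, `beta`, `rho`, `kappa`), the choice of a standardized chi-squared distribution with 2 degrees of freedom for the non-Gaussian errors, the zero starting value of the regressor's AR(1) component, and the exact form of the initial condition are assumptions made for the illustration. Time effects are set to zero.

```python
import numpy as np

def simulate_arx1_panel(n, T, phi_low=0.05, phi_high=0.95, kappa=1.0,
                        burn_x=50, seed=0):
    """Simulate a heterogeneous ARX(1) panel for t = 0, 1, ..., T (no time effects)."""
    rng = np.random.default_rng(seed)

    # Correlated fixed effects of y and x: means 0.5, variances 1/4, covariance 1/16.
    fe = rng.multivariate_normal([0.5, 0.5], [[0.25, 1 / 16], [1 / 16, 0.25]], size=n)
    alpha, alpha_x = fe[:, 0], fe[:, 1]

    # Heterogeneous coefficients.
    phi = rng.uniform(phi_low, phi_high, n)   # autoregressive coefficients
    beta = rng.uniform(0.0, 1.0, n)           # slopes of x
    rho = rng.uniform(0.0, 0.95, n)           # persistence of x

    # Heteroskedastic error variances, each with mean one.
    sig_u = np.sqrt(0.5 * (rng.standard_normal(n) ** 2 + 1.0))
    sig_v = np.sqrt((rng.chisquare(2, n) + 1.0) / 3.0)
    c_u, c_v = 0.5, 2.0

    # Regressor: AR(1) deviations around alpha_x, started 50 periods before t = 0.
    v = np.zeros(n)                           # assumed starting value
    x = np.empty((n, T + 1))
    for t in range(-burn_x, T + 1):
        v = rho * v + np.sqrt(1.0 - rho ** 2) * c_v * sig_v * rng.standard_normal(n)
        if t >= 0:
            x[:, t] = alpha_x + v

    # Non-Gaussian errors: chi-squared(2) standardized to mean 0 and variance 1 (assumed df).
    e = (rng.chisquare(2, size=(n, T + 1)) - 2.0) / 2.0
    u = c_u * sig_u[:, None] * e

    # Outcome: assumed initial condition, then the ARX(1) recursion.
    y = np.empty((n, T + 1))
    y[:, 0] = kappa / (1.0 - phi) * (alpha + beta * alpha_x + u[:, 0])
    for t in range(1, T + 1):
        y[:, t] = alpha + phi * y[:, t - 1] + beta * x[:, t] + u[:, t]
    return y, x, phi, beta
```

Calling `simulate_arx1_panel(n=200, T=12)` returns arrays of shape `(200, 13)` whose first column holds the initial observations, together with the unit-specific coefficients whose means the estimators target.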
In both Tables 2.1 and 2.2, the magnitudes of the biases of all the reported estimators depend only on $T$ and do not vary with $n$. Thus, for a fixed $T$, size distortions become larger as $n$ increases, because the standard errors shrink with $n$ while the bias does not. In line with the theory, the MG estimator of $\phi_0$ in Table 2.1 is always downward biased, and the FE estimator is upward biased.¹¹ In Table 2.2, the magnitudes of the bias in the FE and MG estimators of $\beta_0$ are smaller than those of $\phi_0$, since in the DGP $x_{it}$ is strictly exogenous and $\beta_i$ is distributed independently of the regressors. In the case where (a) there is feedback from past outcomes to the current treatment and heterogeneity in treatment effects, or (b) heterogeneity in treatment effects is correlated with covariates and/or initial states of the outcome variable, estimation of $\beta_0$ by the FE method could be more problematic.

Comparing the MG estimators without and with bias corrections, the Jackknife method effectively reduces the bias across all the sample sizes considered. Note that even when $T = 30$ and $n = 50$, there is a non-negligible size distortion in the MG estimator of $\phi_0$ under our Monte Carlo design.

Comparing the MG-HJK and MG-TJK estimators, we find that the MG-TJK estimator does not necessarily have a smaller bias than the MG-HJK estimator with non-stationary initial conditions generated by (2.30). As for the RMSE, the MG estimators in general require a relatively large $n$ to shrink the estimation errors, especially the MG-JK estimators, where individual estimators based on shorter sub-panels are used.

¹⁰ The FE estimator is still biased even when one of the two slope coefficients is homogeneous across $i$, as long as the other is heterogeneous.
¹¹ We do not consider the FE estimator with half-panel Jackknife bias correction in the experiments with heterogeneous slope coefficients in dynamic panels; its estimates of $\phi_0$ would be even more biased upward than those of the FE estimator without bias correction.
The differences between the RMSE of the MG-HJK and MG-TJK estimators are noticeable, especially when $T$ is relatively short ($T = 12, 16$) and $n$ is small ($n = 50, 200$). Since there are fewer time-series observations in each one-third panel, the respective individual estimators, and thus the MG-TJK estimator based on such sub-panels, are more likely to be irregular. Moreover, when $n$ is also small, the MG-TJK estimator is easily affected by outlying individual estimates and is more likely to yield unreasonable results. Thus, we recommend the MG-HJK estimator over the MG-TJK estimator for its robustness to different initial conditions and its smaller estimation errors.

Figures 2.1 and 2.2 illustrate the respective empirical power functions for $\phi_0$ with $T = 12$ and $T = 50$. The empirical power functions of the FE estimator are not centered around the true value. When $n \geq 1000$, the FE estimator always rejects the true value. With larger $T$, the empirical power functions of the MG, MG-HJK, and MG-TJK estimators are centered closer to the true value. Compared with the MG-HJK estimator, the empirical power curve of the MG-TJK estimator is much wider and has weaker power against alternative values. When $T = 50$, there is still a gap between the MG estimator and the true value, and the differences between the empirical power curves of the MG-HJK and MG-TJK estimators become negligible.
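The comparison between the MG and MG-HJK estimators can be made concrete with a short sketch. It assumes the standard half-panel Jackknife form, in which each unit's estimate is corrected as twice the full-sample estimate minus the average of the two half-sample estimates, and the mean group estimate is the simple average across units with a standard error based on the cross-sectional dispersion of the individual estimates. The function names `ols_arx1` and `mg_hjk` are hypothetical, and the code is an illustration rather than the implementation behind Tables 2.1 and 2.2. It also assumes an even number of periods and no time effects.

```python
import numpy as np

def ols_arx1(y, x):
    """OLS of y_t on (1, y_{t-1}, x_t) for one unit; y and x cover t = 0, ..., T."""
    Z = np.column_stack([np.ones(len(y) - 1), y[:-1], x[1:]])
    coef, *_ = np.linalg.lstsq(Z, y[1:], rcond=None)
    return coef[1:]  # (phi_hat, beta_hat); the intercept is dropped

def mg_hjk(Y, X):
    """Mean group estimator with half-panel Jackknife correction (assumed standard form).

    Y and X have shape (n, T + 1); column 0 holds the initial observations.
    Returns the MG-HJK estimates of (phi_0, beta_0) and their standard errors.
    """
    n, T_plus_1 = Y.shape
    T = T_plus_1 - 1
    if T % 2 != 0:
        raise ValueError("this sketch assumes an even number of periods T")
    h = T // 2
    est = np.empty((n, 2))
    for i in range(n):
        full = ols_arx1(Y[i], X[i])                     # regression periods 1..T
        first = ols_arx1(Y[i, :h + 1], X[i, :h + 1])    # regression periods 1..T/2
        second = ols_arx1(Y[i, h:], X[i, h:])           # regression periods T/2+1..T
        est[i] = 2.0 * full - 0.5 * (first + second)    # half-panel Jackknife
    mean = est.mean(axis=0)
    se = est.std(axis=0, ddof=1) / np.sqrt(n)           # cross-sectional dispersion
    return mean, se
```

Each half-panel regression uses only $T/2$ observations per unit, which is why the individual Jackknife estimates become noisy when $T$ is short; splitting into thirds leaves even fewer observations per sub-panel.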
Table 2.1: Bias, RMSE, and size (×100) of FE, MG, MG-HJK and MG-TJK estimators of the mean autoregressive coefficient ($\phi_0 = 0.5$) in an ARX(1) model

T n | Bias: FE, MG, MG-HJK, MG-TJK | RMSE: FE, MG, MG-HJK, MG-TJK | Size (×100): FE, MG, MG-HJK, MG-TJK
12 50 | 0.121 -0.072 -0.005 0.014 | 0.140 0.085 0.058 0.137 | 55.2 36.8 6.1 5.3
12 200 | 0.131 -0.072 -0.003 0.017 | 0.136 0.075 0.028 0.111 | 96.9 90.2 4.8 5.2
12 1000 | 0.134 -0.072 -0.003 0.017 | 0.135 0.073 0.013 0.047 | 100.0 100.0 4.8 8.1
12 2000 | 0.135 -0.072 -0.003 0.017 | 0.135 0.072 0.009 0.030 | 100.0 100.0 6.2 13.2
16 50 | 0.150 -0.055 0.001 0.013 | 0.165 0.069 0.047 0.139 | 72.5 24.2 4.5 4.7
16 200 | 0.162 -0.054 0.002 0.013 | 0.166 0.058 0.024 0.108 | 99.7 71.9 5.1 5.7
16 1000 | 0.166 -0.055 0.001 0.013 | 0.167 0.056 0.011 0.036 | 100.0 100.0 5.7 6.2
16 2000 | 0.166 -0.055 0.002 0.013 | 0.166 0.055 0.008 0.028 | 100.0 100.0 4.7 9.2
20 50 | 0.172 -0.043 0.002 0.005 | 0.185 0.060 0.046 0.065 | 81.3 19.1 6.6 5.9
20 200 | 0.182 -0.044 0.003 0.008 | 0.185 0.048 0.022 0.033 | 100.0 58.8 5.3 5.9
20 1000 | 0.186 -0.044 0.003 0.007 | 0.187 0.045 0.010 0.016 | 100.0 100.0 6.0 7.8
20 2000 | 0.187 -0.044 0.003 0.008 | 0.188 0.044 0.008 0.013 | 100.0 100.0 7.9 12.0
30 50 | 0.202 -0.029 0.003 0.003 | 0.213 0.049 0.041 0.047 | 92.8 12.1 5.7 5.5
30 200 | 0.215 -0.030 0.002 0.003 | 0.218 0.036 0.021 0.024 | 100.0 32.4 5.5 5.5
30 1000 | 0.218 -0.030 0.003 0.004 | 0.218 0.031 0.010 0.011 | 100.0 91.8 7.3 6.8
30 2000 | 0.219 -0.029 0.003 0.004 | 0.219 0.030 0.007 0.009 | 100.0 99.6 8.1 9.4
50 50 | 0.228 -0.019 0.001 0.000 | 0.237 0.042 0.038 0.041 | 96.8 8.3 5.1 5.5
50 200 | 0.242 -0.018 0.001 0.001 | 0.244 0.026 0.020 0.021 | 100.0 16.2 5.0 5.0
50 1000 | 0.247 -0.018 0.001 0.001 | 0.248 0.020 0.009 0.009 | 100.0 56.0 5.6 5.1
50 2000 | 0.247 -0.018 0.002 0.001 | 0.247 0.019 0.006 0.007 | 100.0 84.9 7.0 6.5

Notes: The model is given by $y_{it} = \alpha_i + \phi_i y_{i,t-1} + \beta_i x_{it} + u_{it}$ with $\phi_i \sim IIDU(0.05, 0.95)$, $\beta_i \sim IIDU(0, 1)$, $\kappa = 1$ in the initial conditions, and non-Gaussian errors. "FE" denotes the fixed effects estimator of panel data models. "MG," "MG-HJK," and "MG-TJK" denote the mean group estimators without bias correction, with half-panel Jackknife bias correction eliminating the first-order bias, and with one-third-panel Jackknife bias correction eliminating up to the second-order bias.

Table 2.2: Bias, RMSE, and size (×100) of FE, MG, MG-HJK and MG-TJK estimators of the mean coefficient of the exogenous regressor ($\beta_0 = 0.5$) in an ARX(1) model

T n | Bias: FE, MG, MG-HJK, MG-TJK | RMSE: FE, MG, MG-HJK, MG-TJK | Size (×100): FE, MG, MG-HJK, MG-TJK
12 50 | -0.023 0.007 0.007 0.010 | 0.058 0.046 0.051 0.103 | 8.6 5.7 5.5 4.2
12 200 | -0.024 0.007 0.006 0.009 | 0.037 0.025 0.027 0.077 | 15.8 7.0 6.2 4.9
12 1000 | -0.025 0.007 0.006 0.010 | 0.028 0.013 0.013 0.043 | 49.6 11.9 7.8 6.9
12 2000 | -0.025 0.007 0.006 0.008 | 0.027 0.010 0.010 0.021 | 79.2 16.1 11.4 8.2
16 50 | -0.033 0.006 0.003 0.006 | 0.064 0.045 0.047 0.119 | 13.7 6.4 5.5 5.2
16 200 | -0.032 0.006 0.004 0.009 | 0.042 0.022 0.023 0.073 | 23.3 5.5 4.9 4.8
16 1000 | -0.032 0.006 0.005 0.009 | 0.034 0.012 0.011 0.028 | 74.1 10.6 7.6 6.6
16 2000 | -0.032 0.007 0.005 0.010 | 0.033 0.010 0.009 0.023 | 95.5 16.4 10.4 9.1
20 50 | -0.035 0.007 0.005 0.006 | 0.064 0.043 0.045 0.057 | 14.1 6.1 6.2 6.5
20 200 | -0.038 0.006 0.004 0.004 | 0.047 0.023 0.023 0.029 | 30.9 6.3 6.5 6.0
20 1000 | -0.038 0.006 0.004 0.004 | 0.039 0.012 0.011 0.013 | 88.3 10.6 7.4 6.2
20 2000 | -0.038 0.006 0.004 0.004 | 0.039 0.009 0.008 0.010 | 99.6 15.5 9.3 8.1
30 50 | -0.047 0.003 0.001 0.000 | 0.069 0.042 0.042 0.046 | 18.4 5.5 5.5 5.9
30 200 | -0.048 0.005 0.002 0.001 | 0.055 0.022 0.021 0.023 | 46.1 5.8 5.8 5.3
30 1000 | -0.048 0.005 0.002 0.001 | 0.050 0.011 0.010 0.010 | 98.4 8.3 5.8 5.3
30 2000 | -0.048 0.005 0.002 0.001 | 0.049 0.008 0.007 0.007 | 100.0 11.5 6.9 5.7
50 50 | -0.057 0.002 0.000 -0.001 | 0.077 0.042 0.042 0.043 | 24.9 5.8 5.7 5.9
50 200 | -0.058 0.004 0.002 0.001 | 0.064 0.021 0.021 0.021 | 63.9 5.5 5.1 4.9
50 1000 | -0.059 0.004 0.001 0.000 | 0.061 0.010 0.010 0.010 | 99.9 7.0 5.7 5.1
50 2000 | -0.060 0.003 0.001 0.000 | 0.061 0.007 0.007 0.007 | 100.0 7.2 4.7 4.7

Notes: The model is given by $y_{it} = \alpha_i + \phi_i y_{i,t-1} + \beta_i x_{it} + u_{it}$ with $\phi_i \sim IIDU(0.05, 0.95)$, $\beta_i \sim IIDU(0, 1)$, $\kappa = 1$ in the initial conditions, and non-Gaussian errors. "FE" denotes the fixed effects estimator of panel data models. "MG," "MG-HJK," and "MG-TJK" denote the mean group estimators without bias correction, with half-panel Jackknife bias correction eliminating the first-order bias, and with one-third-panel Jackknife bias correction eliminating up to the second-order bias.

Figure 2.1: Empirical rejection frequencies (at the 5% nominal level) of FE, MG, MG-HJK and MG-TJK estimators of the mean autoregressive coefficient ($\phi_0 = 0.5$) in an ARX(1) model with $T = 12$ and $n = 50, 1000, 2000$

Figure 2.2: Empirical rejection frequencies (at the 5% nominal level) of FE, MG, MG-HJK and MG-TJK estimators of the mean autoregressive coefficient ($\phi_0 = 0.5$) in an ARX(1) model with $T = 50$ and $n = 50, 1000, 2000$

2.7 Empirical Applications of the Minimum Wage Policy

2.7.1 Literature Review of Minimum Wage Studies

This section reviews the existing literature on the impact of minimum wages on employment in the United States, which is most closely related to the analyses in this paper. In estimating the effects of minimum wages on employment, the FE-TE and DID estimators in panel data models are often used. Using state-level data, Neumark and Wascher (1992) find a significant and negative minimum wage elasticity of teenage employment of around -0.14.¹² Using the Quarterly Workforce Indicators (QWI) data at the county level during 1996–2000, Thompson (2009) obtained a substantial negative impact on teenage employment in counties where the minimum wage was likely binding, with estimated minimum wage elasticities of teenage employment of around -0.3.¹³ To control for spatial heterogeneity in employment trends due to local economic conditions, which can be correlated with state-level minimum wage variations, Dube et al. (2010) compared pairs of contiguous counties in the United States that straddle a state border.
They found that the effect of minimum wages on total employment in the private sector is indistinguishable from zero using the Quarterly Census of Employment and Wages (QCEW) data. Similar conclusions were derived in Dube et al. (2016) for teenage employment and the employment of restaurant workers using the QWI data. However, by comparing counties within a cross-border commuting zone, Jha et al. (2022) found adverse effects of minimum wages on the employment of restaurant workers, with an elasticity of -0.141, using the QCEW data.¹⁴

Moreover, there is disagreement about whether it is appropriate to include additional region-specific time effects and trends in fixed effects regressions: Allegretto et al. (2011) and Meer and West (2016) find that the FE-TE estimators are not robust once additional region-specific time effects are used as controls. Based on the state-level data of the Current Population Survey (CPS),¹⁵ Allegretto et al. (2011) obtained an insignificant elasticity of teenage employment of 0.047 with additional region-specific time effects and trends.¹⁶ For total employment, Meer and West (2016) obtain a significant long-term elasticity of -0.074 using the FE-TE estimator and the Business Dynamics Statistics (BDS) state-level data.¹⁷

To control for such time-varying heterogeneity in panels, Powell (2022) proposes a generalized synthetic control estimator for average effects in fixed $n$ and large $T$ panels, where the estimated minimum wage elasticity of teenage employment across states is -0.178 using the CPS data.¹⁸ Totty (2017) applies static panel data models with unknown factors to control for the correlation between regressors and unobserved policy shocks.

¹² See column (2) in Table 2 on p. 63 of Neumark and Wascher (1992).
¹³ The DID estimators in Thompson (2009) are calculated using two-year observations. See Table 5 on p. 354.
¹⁴ See column (4) in Table 1 on p. 5 of Jha et al. (2022).
Based on the common correlated effects mean group method proposed by Chudik and Pesaran (2015), the estimated minimum wage elasticity of teenage employment is 0.001 using the CPS data and -0.089 using the QWI data over the period 1990–2013.¹⁹

The debate over the employment effects of minimum wages remains unsettled, especially for teenage employment and the employment of workers in restaurants and retail stores.²⁰ In particular, the following issues are still outstanding in the literature: (i) the sensitivity of estimates to the different controls used to deal with possibly heterogeneous pre-existing trends, and (ii) the possibility of correlation between minimum wages and other policy shocks to low-wage workers.

This paper investigates whether interactions between observed heterogeneous effects and dynamics in the short run would result in different estimates of both the short-term and long-term average treatment effects. How can MG estimation in a heterogeneous autoregressive distributed lag panel overcome the above concerns? With heterogeneous distributed lag (DL) coefficients of minimum wages, the model can capture differential dynamic treatment effects across individuals, as in Cengiz et al. (2019), where the employment effects are correlated with wage levels. Both the levels and the growth rates of the outcome variable are allowed to vary among cross-sectional units.

¹⁵ Note that the CPS is a survey data set, while the BDS, QCEW, and QWI are census data.
¹⁶ See column (4) in Table 3 on p. 218 of Allegretto et al. (2011).
¹⁷ See column (1) in Table 4 on p. 516 of Meer and West (2016), using the sample from the Business Dynamics Statistics between 1977 and 2011.
¹⁸ See Table 3 of Powell (2022) on p. 1312.
¹⁹ See Table 5 of Totty (2017) on p. 1724.
²⁰ See Allegretto et al. (2017), Neumark and Wascher (2017), and Neumark (2019). Summaries of some estimation results can be found in Neumark and Wascher (2017), Table 2 on p. 603, and Totty (2017), Table 1 on p. 1715.
Furthermore, with heterogeneous autoregressive (AR) coefficients, the MG estimation does not impose the parallel trend assumption or a staggered treatment design to identify the average treatment effects, as assumed in a number of studies in the literature. Some policy shocks are also absorbed by the lagged outcome variables. In the following, we use the county-level QWI data over a relatively short period, 2002–2011, to reduce the possibility of time variation in the regression parameters, and contrast the MG-HJK and FE-HJK estimation results.

2.7.2 Average Effects of Real Minimum Wages on Employment

The short-term and long-term effects of minimum wages on employment are estimated based on a heterogeneous autoregressive distributed lag (ARDL($p$,$q$)) model with time effects given by

$$\ln(\text{Emply}_{it}) = \alpha_i + \delta_t + \sum_{j=1}^{p} \phi_{i,j} \ln(\text{Emply}_{i,t-j}) + \sum_{j=0}^{q} \beta_{i,j} \ln(\text{MW}_{i,t-j}) + \gamma_i' z_{it} + u_{it}, \qquad (2.31)$$

where $\ln(\text{Emply}_{it})$ is the logarithm of employment, and the independent variable of interest is the logarithm of real minimum wages (MW), which are possibly weakly exogenous. For a given choice of $p$ and $q$, an ARDL model with weakly exogenous regressors corresponds to the behavioral assumption that people perceive the impact of past disturbances on the realized real minimum wages when making their employment decisions. It is assumed that there exists heterogeneity in the short-run coefficients while the long-run coefficient is homogeneous, and thus the long-run effect is estimated based on the estimates of the average short-run effects.²¹ Note that a possible cyclical pattern is allowed in the process of employment through the second-order autoregressive coefficient. Moreover, when the lags of minimum wages are included, the model can measure the effects of both the levels of and the changes in minimum wages across different periods. Thus, we set $p = q = 2$ as the main specification, such that no zero restrictions are imposed on the coefficients of lagged variables of the same order, and there are sufficient degrees of freedom in the time-series observations for the half-panel Jackknife unit-specific estimators.

The paper studies the effects of minimum wages on two outcome variables: total employment and teenage employment in the private sector across U.S. counties. For regressions of total employment, the time-varying control variables, $z_{it}$, include the logarithm of population, the share of the population aged 15–59, and the logarithm of real gross product per capita by county. For regressions of teenage employment, the county-level control variables include the logarithm of population, the logarithm of the teenage population, and the logarithm of total private sector employment. To check the robustness of the estimates to dynamic specifications, different numbers of lagged outcome variables ($p$) and of lagged real minimum wages ($q$) are considered for the ARDL($p$,$q$) specification in (2.31).

The data on the outcome variables and on two of the control variables, population and teenage population, are from the Quarterly Workforce Indicators (QWI) sample in Dube et al. (2016). The data on real minimum wages are from Meer and West (2016) and are adjusted to 2011 U.S. dollars using the CPI-Urban adjustment factor.²² For the other control variables, the data on the share of the population aged 15–59 by county are from the Bureau of Economic Analysis, and the data on real GDP per capita by county are from the Census.²³ After merging the different data sets and eliminating missing observations, the sample contains more than two thousand counties in 41 states observed over 38 quarters from 2002Q2 to 2011Q4.

For total employment across U.S. counties, Table 2.3 reports the estimated short-term and long-term effects of real minimum wages in an ARDL(2,2) model. The FE-TE (HJK) estimation results show significantly higher persistence in the employment process, with both the estimated first- and second-order autoregressive coefficients being positive. However, based on the MG-TE (HJK) estimation results, the estimated first-order AR coefficient is smaller, at around 0.62, and the estimated second-order AR coefficient is negative, at around -0.09, indicating shorter runs above and below the mean and a slight oscillation in the dynamic process of total employment. In effect, the relative magnitudes of the FE-TE, MG-TE, and MG-TE (HJK) estimates of the autoregressive coefficients are aligned with the Monte Carlo simulation results. While the downward small-$T$ bias in the autoregressive coefficients is known in the literature and is confirmed by comparing the estimates without and with the Jackknife, it is not clear whether the biases in the coefficients of other possibly weakly exogenous regressors are positive or negative.

²¹ In time-series regressions, it is shown by Bewley (1986) and Banerjee et al. (1990) that when the processes of both the outcome and the weakly exogenous regressors are stationary, estimation using different transformations of an ARDL model, such as the error correction model, with the respective consistent estimator yields the same estimates of the long-run coefficients. For panel data models with heterogeneous long-run effects, the MG-JK estimators can be applied to the error correction model or to other transformations in which the long-run effect is the slope coefficient of some regressor.
²² The minimum wages in San Francisco are different from those in the other counties in California and are adjusted accordingly.
²³ The observations of employment and minimum wages are at the county-quarter level, and the observations of the other control variables are at the county-annual level. The sample excludes counties with "distorted" observations (data quality flag = 9).
Table 2.3 shows that the signs and magnitudes of the MG-TE (HJK) estimates of the short-run dynamic effects of real minimum wages differ from those of all the other estimators. In particular, the MG-TE (HJK) results show that there is no immediate drop in total employment following an increase in the current real minimum wage, probably due to the income effect in equilibrium, but there is a significantly negative impact of the minimum wage of the previous period, where responses from the demand side of the labor market might have a dominating effect. Though both the MG-TE and MG-TE (HJK) estimates of the long-run effect are close to zero, the MG-TE estimate is still negative and significant, while the MG-TE (HJK) estimate is positive and indistinguishable from zero.

Table 2.4 reports the estimated long-term effects of real minimum wages for different choices of $p$ and $q$. The FE-TE (HJK) estimates range from -0.585 to -0.290 across the different specifications, and are all negative and statistically significant even when cluster-robust standard errors are used for the short-term coefficients. In contrast, the estimates obtained using the MG-TE (HJK) estimator range from -0.093 to 0.055 and are statistically insignificant. Since the minimum wage policy does not target the total employment rate, it seems plausible for it to have a limited effect on the entire distribution of jobs. In short, the MG-TE (HJK) estimation results confirm that while there exist significant short-term effects on total employment, the average effects on mean employment for the total workforce are close to zero in the long run.

Table 2.5 reports the estimated short-term and long-term minimum wage effects on teenage employment across U.S. counties for the ARDL(2,2) model with time effects. For teenage employment across counties, the MG-TE (HJK) estimates show substantially negative effects of both the current and the first-lagged real minimum wages. Consequently, there is a significant long-term impact of real minimum wages on teenage employment of around -0.24.
With the logarithmic transformation of both the outcome and the real minimum wage, the estimate implies that, on average, a 10% increase in real minimum wages induces a 2.4% decrease in teenage employment. Table 2.6 reports the estimated long-term effects on teenage employment, where the FE-TE and MG-TE estimation results also differ. The FE-TE (HJK) estimates of the long-term effects on teenage employment are not significant, ranging from -0.065 to -0.033. In contrast, the MG-TE (HJK) estimates show significant long-term disemployment effects on teenage workers, ranging between -0.266 and -0.219. As teenage workers generally have low-wage jobs, they face a higher risk of being laid off when minimum wages rise. In other words, the wages of jobs taken by teenage workers may be concentrated in a neighborhood of the current real minimum wage within the entire wage distribution, and thus these workers tend to be more vulnerable to changes in the minimum wage policy.

Table 2.3: Short-term and long-term effects of real minimum wages on total employment across U.S. counties in an ARDL(2,2) model with time effects using the FE and MG estimators

$n$ = 2,647 and $T$ = 38 (2002 Quarter 3–2011 Quarter 4)

Coefficient | FE-TE | MG-TE | FE-TE (HJK) | MG-TE (HJK)
Short-term effects
$\hat{\phi}_1$ | 0.682 (0.013) | 0.418 (0.008) | 0.793 (0.014) | 0.616 (0.010)
$\hat{\phi}_2$ | 0.045 (0.013) | -0.141 (0.009) | 0.132 (0.014) | -0.090 (0.010)
$\hat{\beta}_0$ | -0.021 (0.017) | -0.017 (0.007) | -0.017 (0.019) | 0.078 (0.015)
$\hat{\beta}_1$ | 0.034 (0.022) | 0.025 (0.009) | 0.031 (0.023) | -0.083 (0.014)
$\hat{\beta}_2$ | -0.043 (0.020) | -0.040 (0.008) | -0.058 (0.021) | 0.031 (0.013)
Log(Population) | 0.162 (0.012) | 0.437 (0.034) | 0.013 (0.018) | 0.395 (0.077)
Share of population aged 15–59 | 0.002 (0.000) | 0.012 (0.002) | 0.002 (0.000) | 0.005 (0.009)
Log(Real GDP per capita) | 0.329 (0.059) | 1.004 (0.150) | 0.282 (0.104) | 1.244 (0.273)
Long-term effect of minimum wages
$\hat{\theta}$ | -0.111 (0.041) | -0.044 (0.014) | -0.585 (0.247) | 0.055 (0.041)

Notes: The table reports both short-term and long-term estimates of the minimum wage effects on total employment for an ARDL(2,2) model given by (2.31). "FE-TE" denotes the two-way fixed effects estimator, and "FE-TE (HJK)" denotes the FE-TE estimator with the half-panel Jackknife proposed in Chudik et al. (2018). "MG" denotes the mean group estimator proposed in Pesaran and Smith (1995), where the estimation formula with time effects (TE) is given by (A.20), and the "MG-TE (HJK)" estimator is proposed in the paper. For short-term effects, the estimated standard errors of the FE-TE and FE-TE (HJK) estimators are robust to heteroskedasticity and clustered at the state level as in Liang and Zeger (1986), given by (A.27), assuming zero serial correlation in dynamic panels. The standard errors of both MG-TE estimators are estimated as the sample counterpart of the variance of the respective individual estimators. For long-term effects, the estimates are computed as $\hat{\theta} = (\hat{\beta}_0 + \hat{\beta}_1 + \cdots + \hat{\beta}_q) / (1 - \hat{\phi}_1 - \hat{\phi}_2 - \cdots - \hat{\phi}_p)$, and the asymptotic standard errors are computed by the Delta method. Estimated standard errors are reported in brackets.
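The long-run formula in the notes to Table 2.3 can be checked directly against the reported short-term estimates. The sketch below recomputes the long-term effect as the sum of the minimum wage coefficients divided by one minus the sum of the autoregressive coefficients, using the rounded point estimates from Table 2.3. The names `long_run_effect` and `table_2_3` and the labels `phi` and `beta` are chosen for this illustration only. Because the inputs are rounded to three decimals, the results agree with the reported long-term estimates only up to rounding error, and the Delta-method standard errors are not reproduced since they require the covariance matrix of the short-term estimates.

```python
def long_run_effect(beta, phi):
    """Long-run effect of an ARDL(p, q): sum of DL coefficients over 1 - sum of AR coefficients."""
    return sum(beta) / (1.0 - sum(phi))

# Rounded short-term point estimates from Table 2.3 (total employment, ARDL(2,2)).
table_2_3 = {
    "FE-TE":       {"phi": [0.682, 0.045],  "beta": [-0.021, 0.034, -0.043], "reported": -0.111},
    "MG-TE":       {"phi": [0.418, -0.141], "beta": [-0.017, 0.025, -0.040], "reported": -0.044},
    "FE-TE (HJK)": {"phi": [0.793, 0.132],  "beta": [-0.017, 0.031, -0.058], "reported": -0.585},
    "MG-TE (HJK)": {"phi": [0.616, -0.090], "beta": [0.078, -0.083, 0.031],  "reported": 0.055},
}

# Recompute the long-term effect for each estimator.
long_run = {name: long_run_effect(est["beta"], est["phi"])
            for name, est in table_2_3.items()}

for name, value in long_run.items():
    print(f"{name:12s} recomputed = {value: .3f}   reported = {table_2_3[name]['reported']: .3f}")
```

The FE-TE (HJK) case shows why its long-term estimate is so large in magnitude: the two autoregressive estimates sum to 0.925, so the denominator is only 0.075 and small differences in the short-term coefficients are magnified roughly thirteenfold.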
Table 2.4: Long-term effects of real minimum wages on total employment across U.S. counties with time effects

$n$ = 2,647 and $T$ = 38 (2002 Quarter 3–2011 Quarter 4)

ARDL($p$,$q$) | FE-TE | MG-TE | FE-TE (HJK) | MG-TE (HJK)
$\hat{\theta}$, ($p$,$q$) = (1,1) | -0.081 (0.036) | -0.047 (0.012) | -0.290 (0.135) | -0.093 (0.040)
$\hat{\theta}$, ($p$,$q$) = (1,2) | -0.107 (0.039) | -0.077 (0.013) | -0.369 (0.143) | 0.003 (0.045)
$\hat{\theta}$, ($p$,$q$) = (2,1) | -0.084 (0.038) | -0.023 (0.012) | -0.469 (0.228) | -0.002 (0.039)
$\hat{\theta}$, ($p$,$q$) = (2,2) | -0.111 (0.041) | -0.044 (0.014) | -0.585 (0.247) | 0.055 (0.041)

Notes: The table reports long-term estimates of the minimum wage effects on total employment in different ARDL($p$,$q$) models given by (2.31) for $p, q = 1, 2$. The outcome variable is total private sector employment, and the regressors include lagged employment, contemporaneous and lagged real minimum wages, and three time-varying controls: population, the share of the population aged 15–59, and real GDP per capita. Details of the model and estimators can be found in the notes under Table 2.3.

Table 2.5: Short-term and long-term effects of real minimum wages on teenage employment across U.S. counties in an ARDL(2,2) model with time effects using the FE and MG estimators

$n$ = 2,367 and $T$ = 38 (2002 Quarter 3–2011 Quarter 4)

Coefficient | FE-TE | MG-TE | FE-TE (HJK) | MG-TE (HJK)
Short-term effects
$\hat{\phi}_1$ | 0.423 (0.011) | 0.322 (0.009) | 0.555 (0.006) | 0.345 (0.009)
$\hat{\phi}_2$ | 0.069 (0.009) | -0.063 (0.008) | 0.160 (0.006) | -0.047 (0.009)
$\hat{\beta}_0$ | -0.071 (0.015) | -0.053 (0.015) | -0.040 (0.015) | -0.112 (0.032)
$\hat{\beta}_1$ | 0.013 (0.017) | -0.060 (0.018) | 0.028 (0.016) | -0.068 (0.031)
$\hat{\beta}_2$ | -0.007 (0.013) | -0.001 (0.017) | 0.003 (0.013) | 0.013 (0.031)
Log(Population) | -0.467 (0.047) | -0.396 (0.091) | -0.523 (0.037) | -0.752 (0.167)
Log(Teen Population) | 0.135 (0.020) | 0.260 (0.041) | 0.066 (0.021) | 0.456 (0.083)
Log(Total Employment) | 0.751 (0.040) | 1.015 (0.021) | 0.571 (0.015) | 0.848 (0.027)
Long-term effect of minimum wages
$\hat{\theta}$ | -0.128 (0.023) | -0.155 (0.028) | -0.033 (0.042) | -0.238 (0.061)

Notes: The table reports both short-term and long-term estimates of the minimum wage effects on teenage employment for an ARDL(2,2) model given by (2.31). Details of the estimators can be found in the notes under Table 2.3.

Table 2.6: Long-term effects of real minimum wages on teenage employment across U.S. counties with time effects

$n$ = 2,367 and $T$ = 38 (2002 Quarter 3–2011 Quarter 4)

ARDL($p$,$q$) | FE-TE | MG-TE | FE-TE (HJK) | MG-TE (HJK)
$\hat{\theta}$, ($p$,$q$) = (1,1) | -0.123 (0.021) | -0.176 (0.026) | -0.058 (0.030) | -0.243 (0.055)
$\hat{\theta}$, ($p$,$q$) = (1,2) | -0.128 (0.022) | -0.167 (0.029) | -0.065 (0.032) | -0.219 (0.060)
$\hat{\theta}$, ($p$,$q$) = (2,1) | -0.125 (0.021) | -0.162 (0.025) | -0.034 (0.040) | -0.266 (0.056)
$\hat{\theta}$, ($p$,$q$) = (2,2) | -0.128 (0.023) | -0.155 (0.028) | -0.033 (0.042) | -0.238 (0.061)

Notes: The table reports long-term estimates of the minimum wage effects on teenage employment in different ARDL($p$,$q$) models given by (2.31) for $p, q = 1, 2$. The outcome variable is teenage employment in the private sector, and the regressors include lagged teenage employment, contemporaneous and lagged real minimum wages, and three time-varying controls: population, teenage population, and total private sector employment. Details of the estimators can be found in the notes under Table 2.3.

2.8 Conclusions

In the real world, cross-sectional heterogeneity and state dependence are inherent features of the decision processes of different individuals. When people make consumption decisions under uncertainty about their future incomes, they tend to smooth their consumption levels conditional on their past income or consumption levels over time. Thus, it might be of interest to measure and evaluate dynamic processes of consumption, where a reversed relationship between the current and the last period's consumption levels and heterogeneity across households are likely to exist, especially for the assessment of global shocks and for policy evaluation. Another example is the elasticity of cigarette consumption studied in Baltagi and Levin (1986). New insights can also be found when re-estimating dynamic models allowing for multiple individual-specific effects, such as the dynamic employment equation in Arellano and Bond (1991). It is essential to use models that appropriately take both cross-sectional heterogeneity and dynamic effects into account.

Without knowledge of the underlying treatment processes, the potential misspecification bias in the FE estimators might be substantial. For dynamic panel data models with large $n$ and relatively short $T$, this paper proposes the MG-JK estimation method for the average of heterogeneous effects under relatively general assumptions. The asymptotic distributions are derived in the paper. We also establish conditions for the $r$-th moment of mean group estimators to exist, which impose a restriction on the minimum number of observed periods. This provides a guide for the regularity of the MG estimators in sub-panels, where at least the second-order moments need to be finite.
Through comprehensive Monte Carlo experiments, this paper illustrates different estimation results of the FE, MG, and MG-JK estimators applied to a heterogeneous ARX(1) model with different ranges of $n$ and $T$.

In the study of the minimum wage policy, the discrepancies between the MG-HJK and FE-HJK estimation results with time effects confirm that allowing for heterogeneous slope coefficients will result in different conclusions of the long-run elasticity of employment. Our analyses focus on more disaggregated cross-sectional units, counties in addition to states, [24] in a relatively short period so that the short-run effects of minimum wages are less susceptible to structural breaks. For the ongoing debate about the average effects of minimum wages on employment, we highlight the possibility of cross-sectional heterogeneous dynamics in the employment process and the treatment effects of minimum wage, and we provide econometric-theory-based explanations for contradictory findings in the existing studies.

To summarize, the MG estimation method, unlike the FE estimators, assigns a uniform weight to every individual estimate such that the weights are independent of the individual's characteristics. Note that based on the theory and Monte Carlo simulations, the MG-TE estimators are more robust than the FE-TE estimators to possibly heterogeneous dynamic effects, and in particular, the MG-TE estimators do not impose the parallel trend assumption in the pre-treatment period or a staggered treatment scheme. Researchers can resort to the MG-TE (HJK) estimator with time effects in large $n$ and relatively short $T$ panels, when the FE-TE estimators are likely to provide misleading results with the presence of heterogeneity in the dynamic process of the outcome variable and treatment effects irrespective of being bias-corrected or not. Thus, in large $n$ and relatively short $T$ panels, the MG-HJK estimator with time effects can be a useful tool for program evaluation to accommodate possibly divergent trends in outcomes among different cross-sectional units, which has been a notable concern, particularly for studies of quasi-experiments using survey data.

[24] An analysis of minimum wages on state-level total employment is included in the appendix, where we compare the MG-HJK estimator with the FE estimator with time effects.

Some extensions can be done in the future. As suggested by the results in the Monte Carlo experiments and empirical applications, individual estimators in shorter sub-panels are less precise, thus it may be valuable to solve for an "optimal" bias-correction scheme that minimizes estimation errors, in particular with a sample size where different Jackknife procedures can provide valid inference. To further study heterogeneity in treatment effects, it is also of interest to estimate the distributional characteristics of individual estimators, and conditional or unconditional quantile treatment effects within and out of the support of the observed covariates.

Chapter 3

Trimmed Mean Group Estimation of Average Treatment Effects in Ultra Short T Panels with Correlated Heterogeneous Coefficients [1]

3.1 Introduction

Fixed effects estimation of average treatment effects has been predominantly utilized for program and policy evaluation. For static panel data models where slope heterogeneity is uncorrelated with treatment effects, standard fixed and time effects (FE-TE) estimators are consistent and if used in conjunction with robust standard errors lead to valid inference in short $T$ (time dimension) panels when the number of groups or cross section dimension ($n$) is sufficiently large. However, when the slope heterogeneity is correlated with the treatment and/or control variables the FE-TE estimators become inconsistent even if both $T$ and $n \to \infty$.
[2] Such correlated heterogeneity arises endogenously in the case of dynamic panel data models considered by Pesaran and Smith (1995) even if the slope heterogeneity itself is purely random.

In the case of static panels, correlated heterogeneity could arise when treatment effects are correlated with the treatment itself and/or the control variables. For example, in estimation of returns to education, the choice of educational level is likely to be correlated with expected returns to education. In a review of active policies in labor markets, Crépon and Van Den Berg (2016) emphasize that when estimating the average impacts on workers' productivity and earnings, correlated heterogeneity should be accounted for, to better encourage enrollment in training programs. Banerjee et al. (2015) summarize results of six randomized evaluations of micro-credit, and raise the question of how to identify and estimate heterogeneous treatment effects with behavioral responses that deviate from rationality, which is also noted by Bastagli et al. (2019) in studies of anti-poverty cash transfer programs. [3] Another important example of correlated slope heterogeneity arises in the case of panels with interactive effects where the usual de-meaning techniques do not eliminate time effects. In these contexts, the bias of FE-TE estimators could be substantial and does not vanish with random sampling, even if both $n$ and $T$ tend to infinity. [4] In a recent survey de Chaisemartin and D'Haultfoeuille (2022) document empirical evidence of misleading results that can be obtained when FE-TE estimates are used when policy effects are heterogeneous between groups or over time.

[1] This chapter is joint work with M. Hashem Pesaran.
[2] The concept of the correlated random coefficient model is due to Heckman and Vytlacil (1998). Wooldridge (2005) shows that FE-TE estimators continue to be consistent if slope heterogeneity is mean-independent of all the de-trended covariates. See also condition (3.20) given below.
Pesaran and Smith (1995) proposed mean group (MG) estimators for estimation of dynamic heterogeneous panel data models. It was later shown that for panels with strictly exogenous regressors, the MG estimator is $\sqrt{n}$-consistent irrespective of possible dependence of heterogeneous coefficients on the covariates even if $T$ is fixed as $n \to \infty$, so long as $T$ is sufficiently large such that at least the second-order moment of the MG estimator exists. When allowing for multiple individual-specific effects in fixed $T$ panel data models, for identification, an a priori restriction that $T$ should be strictly larger than the number of parameters has to be imposed, which is well-documented in the literature. See, for example, Chamberlain (1992), Arellano and Bonhomme (2012), and Bonhomme (2012). Chamberlain (1992) calculated efficiency bounds for models defined by conditional moment restrictions with a nonparametric component, and proposed a $\sqrt{n}$-consistent Generalized Method of Moments (GMM) estimator for the mean of correlated random coefficients in panel data models provided that certain rank and moment conditions hold. [5] Assuming the panel errors follow an autoregressive moving average process, Arellano and Bonhomme (2012) provide rank conditions under which the GMM estimators they propose for variances and densities of correlated random coefficients can be regularly identified. [6]

This paper investigates identification issues in ultra short linear panels where $T$ could be as small as the number of regressors ($k$), and all the regressors are continuous. Estimating heterogeneous effects of treatment doses can facilitate evaluation of the effectiveness of policy and programs, which is beyond the scope of comparing outcomes of being treated or not.

[3] Reviews of recent advancements in econometric methods for heterogeneous treatment effects of binary variables can be found in Athey and Imbens (2017) and Abadie and Cattaneo (2018).
[4] See, for example, Browning and Carro (2007), and Ferraro and Miranda (2017).
Building on the pioneering work of Chamberlain (1992), Graham and Powell (2012) focus on panels with $T = k$, where identification issues of time effects and the mean coefficients arise especially when there are insufficient within-individual variations in regressors for some units. They derive an irregular estimator of the mean coefficients by excluding individual estimates from the MG estimators whose corresponding variance of regressors is smaller than a threshold. Exploiting the subpopulation of "stayers" with no time variations in regressors, they then propose an estimator of time effects. [7] Later, Sasaki and Ura (2021) propose a method for robust inference of the MG estimators across various distributions of within-individual variations in regressors, especially when there is a mass point of "stayers". The conditional mean of heterogeneous coefficients of "stayers" is estimated by local polynomial regressions.

The above papers adopt a nonparametric approach to address identification and estimation of average treatment effects. Some researchers consider regular estimators by imposing additional restrictions on correlation between regressors and heterogeneous coefficients. Wooldridge (2005) proposes an alternative estimator for models with nonlinear individual-specific unobserved effects, where he imposes a condition that random coefficients are mean independent of the idiosyncratic deviations in regressors. To estimate the average effects of binary treatment variables for the sub-population of "stayers", Verdier (2020) explicitly models selection into treatment. Assuming random coefficients are independent of regressors, Lee and Sul (2022) apply a double-sided trimming scheme to the MG estimators for static panels with common correlated effects developed by Chudik and Pesaran (2015), so as to eliminate effects of outlying individual estimates whose sample variances of regressors are too large or too small.

[5] The GMM estimator proposed by Chamberlain (1992) turns out to be the same as the MG estimator. See equation (4.8b) in Chamberlain (1992).
[6] An unknown parameter, $\theta_0$, is said to be regularly identified if there exists an estimator that converges to $\theta_0$ in probability at the rate of $\sqrt{n}$. Any estimator which converges to its true value at a rate slower than $\sqrt{n}$ is said to be irregularly identified.
[7] Graham and Powell (2012) establish identification results based on moment equations conditional on the sub-population of "stayers", namely individuals with no time variations in their realized covariates. But in estimation, a sub-sample of "near stayers" is used instead.

There is another strand of the literature that considers fixed $T$ nonseparable panel data models with multiple individual-specific effects, including Bester and Hansen (2009), Bonhomme (2012), Hoderlein and White (2012), and Chernozhukov et al. (2013). Specifically, Bonhomme (2012) proposes $\sqrt{n}$-consistent method of moments estimators based on the functional differencing approach, which can be applied to the random coefficients model. However, his analysis has not been extended to models with heterogeneous slope coefficients, and the non-surjectivity condition imposed for identification and estimation requires $T > k$. [8] In general, the aforementioned studies are motivated by and focus on nonlinear panel data models and models where regressors have discrete supports, hence our paper can be viewed as complementary to theirs.

In this paper, we first carefully examine asymptotic properties of the MG and FE estimators in large $n$ and short $T$ heterogeneous static panel data models, and provide respective sufficient conditions under which the MG and FE estimators are regular, in the sense that they are $\sqrt{n}$-consistent. In cases where these conditions are not met, we propose a new trimmed mean group (TMG) estimator which makes use of additional information not exploited by Graham and Powell (2012), and as a result, it is shown to perform much better in small samples.
We also show that trimming is innocuous when $T$ is sufficiently large. For an appropriate trimming process, we trade off the asymptotic bias and variance of the TMG estimator, and then establish its asymptotic distribution. Our trimming approach is easy to implement and shown to be effective whether the regressors follow Gaussian or other distributions. The regressors could contain a factor structure, or follow heterogeneous autoregressions. The panel errors could be non-Gaussian, serially correlated, or correlated heteroskedastic. Since our TMG estimator is derived in this way, our paper is also related to the literature on optimal inference with bias in estimators of linear regressions such as Armstrong and Kolesár (2018). While they consider the problem of constructing shortest confidence intervals, this paper aims to provide a consistent and robust estimation method for the average treatment effects in ultra short $T$ panels, where heterogeneous coefficients can be potentially correlated with regressors.

[8] See page 1347 in Bonhomme (2012).

Since the FE estimator is valid in the presence of uncorrelated heterogeneity and in such cases has the regular convergence rate of $\sqrt{n}$, the use of TMG is to be recommended over FE only if the assumption of uncorrelated heterogeneity is rejected. Hence, we also provide a test of validity of the FE estimator in the presence of slope heterogeneity. By Monte Carlo simulation, we demonstrate that the TMG estimator not only has the correct size but also achieves better finite sample properties compared with other trimmed estimators in a number of experiments with different data generating processes for the regressors and errors. The simulation results also confirm that the Hausman-type test has power against the alternative of correlated heterogeneity.
Last but not least, the finite sample performance of the TMG estimator and the Hausman-type test is illustrated through an empirical analysis, which studies the average effect of household's total expenditure on calorie demand with a short panel of households in poor rural communities in Nicaragua.

The rest of the paper is organized as follows: Section 3.2 sets out the heterogeneous panel data model and investigates the asymptotic properties of the MG and FE estimators. Section 3.3 begins with a brief discussion of an existing trimming approach, then Section 3.4 formally introduces our proposed TMG estimator. Section 3.5 establishes its asymptotic properties. Section 3.6 extends the TMG estimation method to panel data models with time effects. Section 3.7 constructs the Hausman-type test of correlated heterogeneous slope coefficients. Section 3.8 describes the Monte Carlo experiments and reports simulation results. Section 3.9 presents the empirical application. The final section concludes.

Notations: Generic positive finite constants are denoted by $C$ when large, and $c$ when small. They can take different values at different instances. $\lambda_{\max}(A)$ and $\lambda_{\min}(A)$ denote the maximum and minimum eigenvalues of matrix $A$. $A \succ 0$ and $A \succeq 0$ denote that $A$ is a positive definite and a non-negative definite matrix, respectively. $\|A\| = \lambda_{\max}^{1/2}(A'A)$ and $\|A\|_1$ denote the spectral and column norms of matrix $A$, respectively. $A^*$ denotes the adjoint of $A$, such that $A^{-1} = d^{-1}A^*$, and $d = \det(A)$. $\|x\|_p = [E(\|x\|^p)]^{1/p}$. If $\{f_n\}_{n=1}^{\infty}$ is any real sequence and $\{g_n\}_{n=1}^{\infty}$ is a sequence of positive real numbers, then $f_n = O(g_n)$ if there exists $C$ such that $|f_n|/g_n \le C$ for all $n$. $f_n = o(g_n)$ if $f_n/g_n \to 0$ as $n \to \infty$. Similarly, $f_n = O_p(g_n)$ if $f_n/g_n$ is stochastically bounded, and $f_n = o_p(g_n)$ if $f_n/g_n \to_p 0$. The operator $\to_p$ denotes convergence in probability, and $\to_d$ denotes convergence in distribution.
3.2 Heterogeneous linear panel data models

Consider the panel data model where the outcome variable $y_{it}$ for unit $i$ at time $t$ is explained linearly in terms of the $k \times 1$ vector of covariates $w_{it}$:

$$y_{it} = \theta_i' w_{it} + u_{it}, \quad \text{for } i = 1, 2, \ldots, n, \text{ and } t = 1, 2, \ldots, T, \tag{3.1}$$

where $\theta_i$ is a $k \times 1$ vector of unknown unit-specific coefficients and $u_{it}$ is the error term. Stacking by time we have

$$y_i = W_i \theta_i + u_i, \quad i = 1, 2, \ldots, n, \tag{3.2}$$

where $y_i = (y_{i1}, y_{i2}, \ldots, y_{iT})'$, $W_i = (w_{i1}, w_{i2}, \ldots, w_{iT})'$, and $u_i = (u_{i1}, u_{i2}, \ldots, u_{iT})'$. The parameter of interest is the $k \times 1$ vector of average treatment effects, $\theta_0$, defined by

$$\theta_0 = \operatorname*{plim}_{n \to \infty} \left( n^{-1} \sum_{i=1}^{n} \theta_i \right). \tag{3.3}$$

When $T \ge k$, $\theta_0$ can be estimated by the mean group estimator, $\hat{\theta}_{MG}$, computed as a simple average of the least squares estimates of $\theta_i$, namely (see Pesaran and Smith (1995))

$$\hat{\theta}_{MG} = \frac{1}{n} \sum_{i=1}^{n} \hat{\theta}_i, \tag{3.4}$$

where

$$\hat{\theta}_i = (W_i' W_i)^{-1} W_i' y_i. \tag{3.5}$$

As we shall see, the MG estimator is consistent even in the presence of correlation between $\theta_i$ and $w_{it}$. To investigate the asymptotic properties of the MG estimator when $T$ is short and $n \to \infty$, we make the following assumptions:

Assumption 8 (Errors) Conditional on $W_i$, (a) the errors, $u_{it}$, in (3.1) are cross-sectionally independent, (b) $E(u_i \mid W_i) = 0$, for $i = 1, 2, \ldots, n$; and (c) $E(u_i u_i' \mid W_i) = H_i(W_i) = H_i$, where $H_i$ is a $T \times T$ bounded matrix with $0 < c < \lambda_{\min}(H_i) < \lambda_{\max}(H_i) < C$.

Assumption 9 (Regression coefficients) The $k \times 1$ vector of coefficients, $\theta_i$, is allowed to depend on the distribution of $W_i$ with $\operatorname{rank}(W_i) = k$. This dependence could be (a) deterministic with $\theta_i$ fixed and bounded or (b) stochastic, with $\theta_i$ jointly determined with $W_i$.

(a) $\theta_i$ are deterministic with $\sup_i \|\theta_i\| < C$ for $i = 1, 2, \ldots, n$. The mean of coefficients converges to a finite limit $\theta_0$, the average treatment effects, i.e.,

$$\bar{\theta}_n = n^{-1} \sum_{i=1}^{n} \theta_i \to \theta_0, \quad \text{with } \|\theta_0\| < C. \tag{3.6}$$

(b) $\theta_i$ are independent draws from a distribution with $E(\theta_i) = \theta_0$ and bounded variances for $i = 1, 2, \ldots, n$, where $\|\theta_0\| < C$, and $\sup_i E\|\theta_i\|^4 < C$.
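As an illustration of (3.4) and (3.5), the following is a minimal sketch of the MG estimator for the special case $w_{it} = (1, x_{it})'$, so that $k = 2$. The function names `ols_unit` and `mean_group`, the data layout, and the toy panel are assumptions made for this example only and are not taken from the text; the code uses the closed-form $2 \times 2$ adjoint rather than a general matrix inverse.

```python
from typing import List, Sequence, Tuple

# One unit's data: (x_i, y_i), each a sequence of length T (hypothetical layout)
Unit = Tuple[Sequence[float], Sequence[float]]


def ols_unit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least squares estimate (3.5) for w_it = (1, x_it)': (intercept, slope)."""
    T = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    sxy = sum(a * b for a, b in zip(x, y))
    d = T * sxx - sx * sx  # d_i = det(W_i' W_i)
    if d == 0.0:
        raise ValueError("W_i'W_i is singular: x has no time variation for this unit")
    # (W'W)^{-1} W'y written out through the 2x2 adjoint of W'W
    intercept = (sxx * sy - sx * sxy) / d
    slope = (T * sxy - sx * sy) / d
    return intercept, slope


def mean_group(panel: List[Unit]) -> Tuple[float, float]:
    """MG estimator (3.4): simple average of the unit-by-unit OLS estimates."""
    estimates = [ols_unit(x, y) for x, y in panel]
    n = len(estimates)
    return (sum(e[0] for e in estimates) / n, sum(e[1] for e in estimates) / n)


# Noise-free toy panel with heterogeneous coefficients (a_i, b_i) and T = 3
coefs = [(1.0, 2.0), (0.0, -1.0), (2.0, 5.0)]
xs = [[0.0, 1.0, 2.0], [1.0, 3.0, 4.0], [-1.0, 0.0, 2.0]]
panel = [(x, [a + b * v for v in x]) for (a, b), x in zip(coefs, xs)]
theta_mg = mean_group(panel)
print(theta_mg)  # close to the average of the true coefficients, (1.0, 2.0)
```

Because the toy data contain no errors, each unit's estimate reproduces its own coefficients and the MG estimate equals their simple average; with noisy data and small $d_i$ the individual estimates can be very imprecise, which is the issue taken up in the rest of the chapter.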
Remark 8 Under Assumption 8, the $k \times 1$ vector of covariates, $w_{it}$, for $i = 1, 2, \ldots, n$ are strictly exogenous, but it allows the conditional variance of $u_i$ to depend on $W_i$, and for the errors, $u_{it}$, to be serially (over time $t$) correlated.

Remark 9 Assumption 9 is an identification condition for $\theta_0$. Under Assumption 9(b) where $\theta_i$ follows a random coefficients model, $E(\bar{\theta}_n) = \theta_0$, and $n^{-1} \sum_{i=1}^{n} \theta_i \to_p \theta_0$.

3.2.1 Properties of mean group estimator in short T panels

Substituting (3.2) in (3.5) we have

$$\hat{\theta}_i = \theta_i + \xi_{iT}, \tag{3.7}$$

where

$$\xi_{iT} = R_i' u_i, \tag{3.8}$$

and $R_i = W_i (W_i' W_i)^{-1}$. Averaging both sides of (3.7) over $i$, we have

$$\hat{\theta}_{MG} = \bar{\theta}_n + \bar{\xi}_{nT}, \tag{3.9}$$

where

$$\bar{\theta}_n = n^{-1} \sum_{i=1}^{n} \theta_i, \quad \text{and} \quad \bar{\xi}_{nT} = n^{-1} \sum_{i=1}^{n} \xi_{iT}. \tag{3.10}$$

Then under Assumption 8, $E(\bar{\xi}_{nT}) = n^{-1} \sum_{i=1}^{n} E(\xi_{iT}) = n^{-1} \sum_{i=1}^{n} E[R_i' E(u_i \mid W_i)] = 0$. Then using (3.9), $E(\hat{\theta}_{MG}) = E(\bar{\theta}_n) + E(\bar{\xi}_{nT}) = \theta_0$, namely $\hat{\theta}_{MG}$ is an unbiased estimator of $\theta_0$ irrespective of the possible dependence of $\theta_i$ on $W_i$. However, the MG estimator is likely to have a large variance when $T$ is too small. This arises, for example, when the variance of $\bar{\xi}_{nT}$ does not exist or is very large. The conditions under which $\hat{\theta}_{MG}$ converges to $\theta_0$ at the regular $n^{1/2}$ rate are given in the following proposition:

Proposition 1 (Sufficient conditions for $\sqrt{n}$-consistency of $\hat{\theta}_{MG}$) Suppose that $y_{it}$ for $i = 1, 2, \ldots, n$ and $t = 1, 2, \ldots, T$ are generated by model (3.2), and Assumptions 8 and 9 hold. Then as $n \to \infty$, the MG estimator given by (3.4) is $\sqrt{n}$-consistent for fixed $T$ panels if

$$\sup_i E\left(d_i^{-2}\right) < C, \quad \text{and} \quad \sup_i E\left[\left\|(W_i' W_i)^*\right\|_1^2\right] < C, \tag{3.11}$$

where $d_i = \det(W_i' W_i)$, and $(W_i' W_i)^*$ is the adjoint of $W_i' W_i$.

For a proof see Section B.2.1 of the Appendix.

Example 2 Consider the simple case where $k = 2$, $w_{it} = (1, x_{it})'$, and $\theta_i = (\alpha_i, \beta_i)'$.
Suppose $E(u_i u_i' \mid x_i) = \sigma_i^2 I_T$ for $i = 1, 2, \ldots, n$ with $x_i = (x_{i1}, x_{i2}, \ldots, x_{iT})'$. Then the individual OLS estimator of the slope coefficient, $\hat{\beta}_i = (x_i' M_T x_i)^{-1} x_i' M_T y_i$, has first and second order moments if $E(u_{it}^2) < C$ and $E(d_{ix}^{-2}) < C$, where $d_{ix} = \det(x_i' M_T x_i)$, $M_T = I_T - T^{-1} \tau_T \tau_T'$, and $\tau_T = (1, 1, \ldots, 1)'$. In the case where $x_{it}$ are Gaussian distributed with mean zero and a finite variance, $\sigma_x^2$, it follows that $d_{ix}^{-1} = \left(\sigma_x^2 \chi^2_{T-1}\right)^{-1}$, where $\chi^2_{T-1}$ is a Chi-squared variable with $T - 1$ degrees of freedom. Hence, $E(d_{ix}^{-2})$ exists if $T - 1 > 4$, or if $T > 5$. For panels with $T < 5$, the MG estimator would be irregular when first- and/or second-order moments of some individual estimates do not exist.

3.2.2 A comparison of MG and FE estimators

Consider a panel data model with individual fixed effects, $\alpha_i$, and heterogeneous slope coefficients, $\beta_i$:

$$y_{it} = \alpha_i + \beta_i' x_{it} + u_{it}, \quad \text{for } i = 1, 2, \ldots, n, \text{ and } t = 1, 2, \ldots, T, \tag{3.12}$$

where $x_{it}$ is a $k' \times 1$ vector of regressors ($k' = k - 1$). In matrix notation

$$y_i = \alpha_i \tau_T + X_i \beta_i + u_i, \tag{3.13}$$

where $X_i = (x_{i1}, x_{i2}, \ldots, x_{iT})'$. The FE and MG estimators of $\beta_0$ are given by

$$\hat{\beta}_{FE} = \left( n^{-1} \sum_{i=1}^{n} X_i' M_T X_i \right)^{-1} \left( n^{-1} \sum_{i=1}^{n} X_i' M_T y_i \right), \tag{3.14}$$

and

$$\hat{\beta}_{MG} = \frac{1}{n} \sum_{i=1}^{n} \hat{\beta}_i, \tag{3.15}$$

where $\hat{\beta}_i = (X_i' M_T X_i)^{-1} X_i' M_T y_i$. In this setting the parameter of interest is given by $\beta_0 = \operatorname{plim}_{n \to \infty} n^{-1} \sum_{i=1}^{n} \beta_i$. One of the main advantages of the FE estimator is its robustness to the dependence between $\alpha_i$ and the regressors. $\hat{\beta}_{FE}$ is also well defined even if $T = k$ so long as the following standard assumption is met:

Assumption 10 (Data pooling assumption) Let $\bar{\Psi}_n = n^{-1} \sum_{i=1}^{n} \Psi_{ix}$, where $\Psi_{ix} = X_i' M_T X_i$. For $T \ge k$, there exists $n_0$ such that for all $n > n_0$, $\bar{\Psi}_n$ is positive definite,

$$\bar{\Psi}_n \to_p \lim_{n \to \infty} n^{-1} \sum_{i=1}^{n} E(\Psi_{ix}) = \bar{\Psi} \succ 0, \tag{3.16}$$

and

$$\bar{\Psi}_n^{-1} = \bar{\Psi}^{-1} + o_p(1). \tag{3.17}$$

3.2.2.1 Conditions for $\sqrt{n}$-consistency of FE estimator

Under the heterogeneous specification (3.12) and noting that $M_T \tau_T = 0$, we have

$$\hat{\beta}_{FE} - \beta_0 = \bar{\Psi}_n^{-1} \left[ n^{-1} \sum_{i=1}^{n} X_i' M_T X_i (\beta_i - \beta_0) \right] + \bar{\Psi}_n^{-1} \left( n^{-1} \sum_{i=1}^{n} X_i' M_T u_i \right). \tag{3.18}$$

Then by Assumption 8, $E(X_i' M_T u_i) = E[E(X_i' M_T u_i \mid X_i)] = E[X_i' M_T E(u_i \mid X_i)] = 0$. Under Assumptions 8, 9 and 10,

$$\hat{\beta}_{FE} - \beta_0 \to_p \bar{\Psi}^{-1} \lim_{n \to \infty} n^{-1} \sum_{i=1}^{n} E\left( X_i' M_T X_i \eta_{i\beta} \right),$$

where $\eta_{i\beta} = \beta_i - \beta_0$, and $\hat{\beta}_{FE}$ is a consistent estimator of the average treatment effect, $\beta_0$, if

$$\lim_{n \to \infty} n^{-1} \sum_{i=1}^{n} E\left( X_i' M_T X_i \eta_{i\beta} \right) = 0. \tag{3.19}$$

This condition is clearly met if

$$E\left( X_i' M_T X_i \eta_{i\beta} \right) = 0, \quad \text{for all } i = 1, 2, \ldots, n, \tag{3.20}$$

and has already been derived by Wooldridge (2005). [9] But it is too restrictive, since it is possible for the average condition in (3.19) to hold even though condition (3.20) is violated for some units as $n \to \infty$. What is required is that a sufficiently large number of units satisfy condition (3.20). Specifically, denote the number of units that do not satisfy (3.20) by $m_n = \ominus(n^{a})$ and note that $n^{-1} \sum_{i=1}^{n} E(X_i' M_T X_i \eta_{i\beta}) = \ominus(n^{a-1})$, so that condition (3.19) is met if $a < 1$. But for $\hat{\beta}_{FE}$ to be a regular $\sqrt{n}$-consistent estimator of $\beta_0$ a much more restrictive condition on $a$ is required. Using (3.18) note that

$$\sqrt{n}\left( \hat{\beta}_{FE} - \beta_0 \right) = \bar{\Psi}_n^{-1} \left( n^{-1/2} \sum_{i=1}^{n} X_i' M_T X_i \eta_{i\beta} \right) + \bar{\Psi}_n^{-1} \left( n^{-1/2} \sum_{i=1}^{n} X_i' M_T u_i \right),$$

and $\sqrt{n}(\hat{\beta}_{FE} - \beta_0) \to_p 0$, if $n^{-1/2} \sum_{i=1}^{n} X_i' M_T X_i \eta_{i\beta} \to_p 0$. The bias term can be written as

$$n^{-1/2} \sum_{i=1}^{n} X_i' M_T X_i \eta_{i\beta} = n^{-1/2} \sum_{i=1}^{n} \left[ X_i' M_T X_i \eta_{i\beta} - E\left( X_i' M_T X_i \eta_{i\beta} \right) \right] + n^{-1/2} \sum_{i=1}^{n} E\left( X_i' M_T X_i \eta_{i\beta} \right).$$

The first term tends to zero in probability if $X_i' M_T X_i \eta_{i\beta}$ are weakly cross-correlated over $i$. For the second term to tend to zero we must have $m_n n^{-1/2} \to 0$, that is, $a < 1/2$.

[9] See equation (12) on page 387 of Wooldridge (2005).

Proposition 2 (Condition for $\sqrt{n}$-consistency of the FE estimator) Suppose that $y_{it}$ for $i = 1, 2, \ldots, n$ and $t = 1, 2, \ldots, T$ are generated by the heterogeneous panel data model (3.13), and Assumptions 8, 9 and (3.16) hold. Then the FE estimator given by (3.14) is $\sqrt{n}$-consistent if

$$n^{-1/2} \sum_{i=1}^{n} E\left[ X_i' M_T X_i (\beta_i - \beta_0) \right] \to 0, \tag{3.21}$$

and this condition is met if $a < 1/2$, with $a$ defined by $m_n = \ominus(n^{a})$, where $m_n$ denotes the number of units that are subject to correlated heterogeneity.

3.2.2.2 Relative efficiency of FE and MG estimators

Suppose now that conditions (3.21) and (3.11) hold and both FE and MG estimators are $\sqrt{n}$-consistent. The choice between the two estimators will then depend on their relative efficiency, which we measure in terms of their asymptotic covariances, given by

$$\operatorname{Var}\left( \sqrt{n}\, \hat{\beta}_{MG} \mid X \right) = \Omega_\beta + n^{-1} \sum_{i=1}^{n} \Psi_{ix}^{-1} X_i' M_T H_i M_T X_i \Psi_{ix}^{-1},$$

and

$$\operatorname{Var}\left( \sqrt{n}\, \hat{\beta}_{FE} \mid X \right) = \bar{\Psi}_n^{-1} \left( n^{-1} \sum_{i=1}^{n} \Psi_{ix} \Omega_\beta \Psi_{ix} \right) \bar{\Psi}_n^{-1} + \bar{\Psi}_n^{-1} \left[ n^{-1} \sum_{i=1}^{n} X_i' M_T H_i M_T X_i \right] \bar{\Psi}_n^{-1},$$

where $X = \{X_1, X_2, \ldots, X_n\}$, $\Omega_\beta = \operatorname{Var}(\beta_i \mid X_i) \succeq 0$, $H_i = E(u_i u_i' \mid X_i)$, $\Psi_{ix} = X_i' M_T X_i$, and $\bar{\Psi}_n = n^{-1} \sum_{i=1}^{n} \Psi_{ix}$. Hence

$$\operatorname{Var}\left( \sqrt{n}\, \hat{\beta}_{MG} \mid X \right) - \operatorname{Var}\left( \sqrt{n}\, \hat{\beta}_{FE} \mid X \right) = A_n + B_n, \tag{3.22}$$

where

$$A_n = \Omega_\beta - \bar{\Psi}_n^{-1} \left( n^{-1} \sum_{i=1}^{n} \Psi_{ix} \Omega_\beta \Psi_{ix} \right) \bar{\Psi}_n^{-1}, \tag{3.23}$$

and

$$B_n = \left( n^{-1} \sum_{i=1}^{n} \Psi_{ix}^{-1} X_i' M_T H_i M_T X_i \Psi_{ix}^{-1} \right) - \bar{\Psi}_n^{-1} \left( n^{-1} \sum_{i=1}^{n} X_i' M_T H_i M_T X_i \right) \bar{\Psi}_n^{-1}. \tag{3.24}$$

$A_n$ and $B_n$ capture the effects of two different types of heterogeneity, namely slope heterogeneity and regressors/errors heterogeneity. The superiority of the FE over the MG estimator is readily established when the slope coefficients and error variances are homogeneous across $i$, and the errors are serially uncorrelated, namely if $\Omega_\beta = 0$ and $H_i = \sigma^2 I_T$ for all $i$. In this case $A_n = 0$, and we have

$$\sigma^{-2} \left[ \operatorname{Var}\left( \sqrt{n}\, \hat{\beta}_{MG} \mid X \right) - \operatorname{Var}\left( \sqrt{n}\, \hat{\beta}_{FE} \mid X \right) \right] = n^{-1} \sum_{i=1}^{n} \Psi_{ix}^{-1} - \bar{\Psi}_n^{-1},$$

which is the difference between the inverse of the harmonic mean of $\Psi_{ix}$ and the inverse of its arithmetic mean, and is a non-negative definite matrix. [10] However, this result may be reversed when we allow for heterogeneity, $\Omega_\beta \succ 0$, and/or if $H_i \neq \sigma^2 I_T$. The following proposition summarizes the results of the comparison between the FE and MG estimators.

Proposition 3 (Relative efficiency of MG and FE estimators) Suppose that $y_{it}$ for $i = 1, 2, \ldots, n$ and $t = 1, 2, \ldots, T$ are generated by the heterogeneous panel data model (3.13), and Assumptions 8, 9 and (3.16) hold, and the uncorrelated heterogeneity condition (3.21) is met.
Then $\operatorname{Var}(\sqrt{n}\, \hat{\beta}_{MG} \mid X) - \operatorname{Var}(\sqrt{n}\, \hat{\beta}_{FE} \mid X) = A_n + B_n$, where $A_n$ and $B_n$ are given by (3.23) and (3.24), respectively. $A_n$ is a non-positive definite matrix, and the sign of $B_n$ is indeterminate. Under uncorrelated heterogeneity, the FE estimator, $\hat{\beta}_{FE}$, is asymptotically more efficient than the MG estimator if the benefit from pooling (i.e. when $B_n \succ 0$) outweighs the loss in efficiency due to slope heterogeneity (since $A_n \preceq 0$).

For a proof see Section B.2.2 of the Appendix.

Example 3 Consider a simple case where $k' = 1$, $\Psi_{ix} = \psi_{ix}$ and $\Omega_\beta = \sigma_\beta^2$ are scalars, and suppose that $H_i(X_i) = E(u_i u_i' \mid X_i) = \sigma^2 \psi_{ix} I_T$. Then

$$\operatorname{Var}\left( \sqrt{n}\, \hat{\beta}_{MG} \mid X \right) - \operatorname{Var}\left( \sqrt{n}\, \hat{\beta}_{FE} \mid X \right) = -\left( \sigma_\beta^2 + \sigma^2 \right) \left[ \frac{n^{-1} \sum_{i=1}^{n} \left( \psi_{ix} - \bar{\psi}_n \right)^2}{\bar{\psi}_n^2} \right],$$

where $\bar{\psi}_n = n^{-1} \sum_{i=1}^{n} \psi_{ix}$. In this simple case the MG estimator is more efficient than the FE estimator even if $\sigma_\beta^2 = 0$.

[10] For a proof see the Appendix to Pesaran et al. (1996).

In general, with uncorrelated heterogeneous coefficients, the relative efficiency of the MG and FE estimators depends on the relative magnitude of the two components in (3.22). Since $A_n \preceq 0$, the outcome depends on the sign and the magnitude of $B_n$, which in turn depends on the heterogeneity of error variances, $H_i(X_i)$, and of $\Psi_{ix}$ over $i$.

3.3 Irregular mean group estimators

So far we have argued that the MG estimator is robust to correlated heterogeneity, and its performance is comparable to the FE estimator even under uncorrelated heterogeneity. However, since the MG estimator is based on the individual estimates, $\hat{\theta}_i$ for $i = 1, 2, \ldots, n$, its optimality and robustness critically depend on how well the individual coefficients can be estimated. This is particularly important when $T$ is ultra short, which is the primary concern of this paper. In cases where $T$ is small and/or the observations on $w_{it}$ are highly correlated, or are slowly moving, then $d_i = \det(W_i' W_i)$ is likely to be close to zero in finite samples for a large number of units $i = 1, 2, \ldots, n$.
As a result, $\hat{\theta}_i$ is likely to be a poor estimate of $\theta_i$ for some $i$, and including it in $\hat{\theta}_{MG}$ could be problematic, rendering the MG estimator inefficient and unreliable. However, as discussed above, $\hat{\theta}_{MG}$ continues to be an unbiased estimator of $\theta_0$, even if $\theta_i$ are correlated with $W_i$ so long as the stochastic component of $w_{it}$ is strictly exogenous with respect to $u_{it}$. By averaging over $\hat{\theta}_i$ for $i = 1, 2, \ldots, n$, as $n \to \infty$, the MG estimator converges to $\theta_0$ if $T$ is sufficiently large such that $\hat{\theta}_i$ have at least second order moments for all $i$. The existence of first order moments of $\hat{\theta}_i$ is required for the MG estimator to be unbiased, and we need $\hat{\theta}_i$ to have second order moments for $\sqrt{n}$-consistent estimation and valid inference about the average effects, $\theta_0$. Accordingly, we need to distinguish between cases where $\hat{\theta}_i$ have first and second order moments for all $i$, as compared to cases where some $\hat{\theta}_i$ may not even have first order moments. We refer to the MG estimator based on individual estimates without first or second order moments as the "irregular MG estimator", which is the focus of our analysis. We consider the irregular MG estimator both for models with and without time effects and show how our proposed estimator relates to the literature.

3.3.1 Existing literature on trimming

For panels with $T = k$, Graham and Powell (2012) propose a trimmed GMM estimator (denoted as "GP") whereby individual estimates with $|\det(W_i)|$ smaller than a given threshold value, $h_n$, are omitted from the estimation of $\theta_0$. For now, abstracting from time effects, the GP estimator can be viewed as a trimmed MG estimator given by

$$\hat{\theta}_{GP} = \frac{\sum_{i=1}^{n} 1\{d_i > h_n^2\}\, \hat{\theta}_i}{\sum_{i=1}^{n} 1\{d_i > h_n^2\}}. \tag{3.25}$$

In the special case where $T = k$, $d_i = |\det(W_i)|^2$, and the trimming procedure based on $|\det(W_i)| > h_n$ is algebraically the same as the one used in (3.25). GP also suggest setting $h_n$ as

$$h_n = b_1 n^{-1/3},$$

where $b_1 = \frac{1}{2} \min\left( \hat{\sigma}_D, \hat{r}_D / 1.34 \right)$, and $\hat{\sigma}_D$ and $\hat{r}_D$ are the respective sample standard deviation and interquartile range of $\det(W_i)$.
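The exclusion rule in (3.25) and the bandwidth $h_n = b_1 n^{-1/3}$ can be sketched as follows. This is an illustration only: the function names `gp_threshold` and `gp_estimate` and the input layout are hypothetical, and the quartile convention used for the interquartile range (the default of Python's `statistics.quantiles`) is an assumption, since the text does not specify one.

```python
import statistics
from typing import Sequence, Tuple


def gp_threshold(dets: Sequence[float]) -> float:
    """h_n = b_1 * n^(-1/3), with b_1 = 0.5 * min(std, IQR / 1.34) of det(W_i)."""
    n = len(dets)
    # Quartile convention is an assumption (default "exclusive" method)
    q1, _, q3 = statistics.quantiles(dets, n=4)
    b1 = 0.5 * min(statistics.stdev(dets), (q3 - q1) / 1.34)
    return b1 * n ** (-1.0 / 3.0)


def gp_estimate(
    theta_hats: Sequence[Tuple[float, ...]], d: Sequence[float], h_n: float
) -> Tuple[float, ...]:
    """Trimming by exclusion (3.25): average the estimates with d_i > h_n^2."""
    kept = [th for th, di in zip(theta_hats, d) if di > h_n ** 2]
    if not kept:
        raise ValueError("all units are trimmed; the estimator is undefined")
    k = len(kept[0])
    return tuple(sum(th[j] for th in kept) / len(kept) for j in range(k))


# Toy inputs: the second unit has almost no variation in its regressor,
# so its (wild) estimate is dropped entirely by the exclusion rule.
theta_hats = [(1.0, 2.0), (100.0, 200.0), (3.0, 4.0)]
d = [4.0, 1e-4, 9.0]
theta_gp = gp_estimate(theta_hats, d, h_n=0.1)
print(theta_gp)  # (2.0, 3.0)
```

The toy example makes the cost of exclusion visible: the trimmed unit contributes nothing at all, regardless of how much information its data carry.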
We refer to this approach as trimming by exclusion. However, this approach to trimming overlooks the information contained in $(W_i' W_i)^*$ which is well defined even if $d_i$ is close to zero. [11] In the case where $T = k = 2$,

$$W_i = \begin{pmatrix} 1 & x_{i1} \\ 1 & x_{i2} \end{pmatrix}, \quad y_i = \begin{pmatrix} y_{i1} \\ y_{i2} \end{pmatrix},$$

and it readily follows that $d_i = \det(W_i' W_i) = (x_{i2} - x_{i1})^2 = (\Delta x_{i2})^2$, and

$$\hat{\alpha}_i = \frac{\Delta x_{i2} \left( x_{i2} y_{i1} - x_{i1} y_{i2} \right)}{(\Delta x_{i2})^2} = \frac{\operatorname{sign}(\Delta x_{i2}) \left( x_{i2} y_{i1} - x_{i1} y_{i2} \right)}{|\Delta x_{i2}|},$$

$$\hat{\beta}_i = \frac{\Delta x_{i2}\, \Delta y_{i2}}{(\Delta x_{i2})^2} = \frac{\operatorname{sign}(\Delta x_{i2})\, \Delta y_{i2}}{|\Delta x_{i2}|}.$$

It is clear that there is information in the sign of $\Delta x_{i2}$ which can be exploited even when $d_i$ is below a small threshold $a_n = o(n)$. This is reminiscent of rank correlation where the sign of $\Delta y_{i2} \Delta x_{i2}$ is used to identify the strength of dependence between $\Delta y_{i2}$ and $\Delta x_{i2}$, particularly in the case of fat-tailed distributions.

[11] Although the estimator proposed by Sasaki and Ura (2021) (denoted by "SU") uses the information of "slow movers", the individual estimates of "slow movers" by local polynomial regressions are less precise, which is illustrated by Monte Carlo experiments in the paper.

3.4 A new trimmed mean group estimator

To motivate our proposed trimmed mean group (TMG) estimator we introduce the following trimmed estimator of $\theta_i$:

$$\tilde{\theta}_i = \hat{\theta}_i, \text{ if } d_i > a_n, \quad \text{and} \quad \tilde{\theta}_i = \hat{\theta}_i^*, \text{ if } d_i \le a_n,$$

where as before $\hat{\theta}_i = (W_i' W_i)^{-1} W_i' y_i$, $d_i = \det(W_i' W_i)$,

$$\hat{\theta}_i^* = a_n^{-1} (W_i' W_i)^* W_i' y_i, \quad \text{and} \quad a_n = C_n n^{-\alpha}, \tag{3.26}$$

with $\alpha > 0$, and $C_n > 0$ bounded in $n$. The choice of $\alpha$ and $C_n$ will be discussed below. This estimator can be written more compactly as

$$\tilde{\theta}_i = 1\{d_i > a_n\}\, \hat{\theta}_i + 1\{d_i \le a_n\}\, \hat{\theta}_i^* = (1 + \delta_i)\, \hat{\theta}_i, \tag{3.27}$$

where $\delta_i$ is given by

$$\delta_i = \left( \frac{d_i - a_n}{a_n} \right) 1\{d_i \le a_n\} \le 0. \tag{3.28}$$

We considered two versions of the TMG estimator depending on how individual trimmed estimators, $\tilde{\theta}_i$, are combined.
An obvious choice was to use a simple average of $\tilde{\theta}_i$, namely

$$\bar{\tilde{\theta}}_n = n^{-1} \sum_{i=1}^{n} \tilde{\theta}_i = n^{-1} \sum_{i=1}^{n} (1 + \delta_i)\, \hat{\theta}_i, \tag{3.29}$$

which can also be viewed as a weighted average estimator with the weights $w_i = (1 + \delta_i)/n \le 1/n$. But it is easily seen that these weights do not add up to unity, and it might be desirable to use the scaled weights $w_i / (1 + \bar{\delta}_n) = n^{-1} (1 + \delta_i) / (1 + \bar{\delta}_n)$, where $\bar{\delta}_n = n^{-1} \sum_{i=1}^{n} \delta_i$. Using these modified weights we also consider the following estimator

$$\hat{\theta}_{TMG} = n^{-1} \sum_{i=1}^{n} \left( \frac{1 + \delta_i}{1 + \bar{\delta}_n} \right) \hat{\theta}_i = \frac{\bar{\tilde{\theta}}_n}{1 + \bar{\delta}_n}. \tag{3.30}$$

Although the difference between the two TMG estimators is small for sufficiently large $n$, it turns out that $\hat{\theta}_{TMG}$ behaves much better in small samples and will be the focus of this paper.

To relate $\bar{\tilde{\theta}}_n$ to the GP estimator given by (3.25), using the above results we note that

$$\bar{\tilde{\theta}}_n = (1 - \pi_n) \left( \frac{\sum_{i=1}^{n} 1\{d_i > a_n\}\, \hat{\theta}_i}{\sum_{i=1}^{n} 1\{d_i > a_n\}} \right) + \pi_n \left( \frac{\sum_{i=1}^{n} 1\{d_i \le a_n\}\, \hat{\theta}_i^*}{\sum_{i=1}^{n} 1\{d_i \le a_n\}} \right), \tag{3.31}$$

where $\pi_n$ is the fraction of the estimates being trimmed,

$$\pi_n = \frac{\sum_{i=1}^{n} 1\{d_i \le a_n\}}{n}. \tag{3.32}$$

Compared to $\bar{\tilde{\theta}}_n$, the GP estimator ignores the second term in (3.31), and hence places zero weights on the estimates with $d_i \le a_n$. In contrast, both $\bar{\tilde{\theta}}_n$ and hence $\hat{\theta}_{TMG}$ place non-zero weights on all the individual estimates, $\hat{\theta}_i$.

3.5 Asymptotic properties of the TMG estimator

To investigate the asymptotic properties of the TMG estimator, $\hat{\theta}_{TMG}$, we introduce the following additional assumptions:

Assumption 11 For $i = 1, 2, \ldots, n$, denote by $d_i = \det(W_i' W_i)$, where $W_i = (w_{i1}, w_{i2}, \ldots, w_{iT})'$ is the $T \times k$ matrix of observations on $w_{it}$ in the heterogeneous panel data model (3.2). $\inf_i (d_i) > 0$, $\inf_i \lambda_{\min}(W_i' W_i) > c > 0$, and $\sup_i E\left[ \|(W_i' W_i)^*\|^2 \right] < C$, where $(W_i' W_i)^* = d_i (W_i' W_i)^{-1}$ is the adjoint of $W_i' W_i$.
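The construction in (3.26) to (3.30) can be sketched for the case $w_{it} = (1, x_{it})'$ as follows. The function name `tmg`, its arguments, and the default values `alpha = 1/3` and `c_n = 1.0` are assumptions made for illustration; the code takes $C_n$ as a user-supplied constant. The sketch works with the adjoint $(W_i' W_i)^* W_i' y_i$ directly, so a trimmed unit never requires division by a near-zero $d_i$.

```python
from typing import List, Sequence, Tuple

# One unit's data: (x_i, y_i), each a sequence of length T (hypothetical layout)
Unit = Tuple[Sequence[float], Sequence[float]]


def tmg(
    panel: List[Unit], alpha: float = 1.0 / 3.0, c_n: float = 1.0
) -> Tuple[Tuple[float, float], List[float]]:
    """Return (theta_TMG, [1 + delta_i]) for the model with w_it = (1, x_it)'."""
    n = len(panel)
    a_n = c_n * n ** (-alpha)  # trimming threshold a_n = C_n * n^(-alpha), eq. (3.26)
    total = [0.0, 0.0]
    weights = []
    for x, y in panel:
        T = len(x)
        sx, sy = sum(x), sum(y)
        sxx = sum(v * v for v in x)
        sxy = sum(a * b for a, b in zip(x, y))
        d = T * sxx - sx * sx  # d_i = det(W_i' W_i)
        adj_wy = (sxx * sy - sx * sxy, T * sxy - sx * sy)  # (W'W)^* W'y
        trimmed = d <= a_n
        scale = a_n if trimmed else d  # theta_hat* divides by a_n, theta_hat by d_i
        weights.append(d / a_n if trimmed else 1.0)  # 1 + delta_i, eq. (3.28)
        total[0] += adj_wy[0] / scale  # accumulates theta_tilde_i, eq. (3.27)
        total[1] += adj_wy[1] / scale
    denom = sum(weights)  # equals n * (1 + mean of delta_i)
    if denom == 0.0:
        raise ValueError("every unit has d_i = 0; the estimator is undefined")
    return (total[0] / denom, total[1] / denom), weights


# Noise-free toy panel with common coefficients (1, 2) and T = 2;
# the second unit barely moves, so d_i = 0.01 falls below a_n = 3^(-1/3).
xs = [[0.0, 1.0], [0.0, 0.1], [0.0, 2.0]]
panel = [(x, [1.0 + 2.0 * v for v in x]) for x in xs]
theta_tmg, w = tmg(panel)
print(theta_tmg, w)
```

In this toy panel the trimmed unit keeps a small positive weight $1 + \delta_i = d_i / a_n$ rather than being dropped, which is the difference from trimming by exclusion highlighted in (3.31).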
Assumption 12 (Distribution of $d_i$) For $i = 1, 2, \ldots, n$, $d_i$ are random draws from the probability distribution function, $F_d(u)$, with the continuously differentiable density function, $f_d(u)$, over $u \in (0, \infty)$, such that $F_d(0) = 0$, $f_d(\bar{a}_n) < C$, and $|f_d'(\bar{a}_n)| < C$, where $f_d'(\bar{a}_n)$ is the first derivative of $f_d(u)$ evaluated at $\bar{a}_n \in (0, a_n)$.

Assumption 13 (Characterization of correlation between $\theta_i$ and $d_i$) For $i = 1, 2, \ldots, n$, the dependence of $\theta_i = (\theta_{i1}, \theta_{i2}, \ldots, \theta_{ik})'$ on $d_i$ is characterized by

(a):

$$\theta_i = E(\theta_i \mid d_i) + \epsilon_i, \tag{3.33}$$

where $E(\epsilon_i \mid d_i) = 0$, and $\sup_i E\|\epsilon_i\|^4 < C$.

(b): Denoting

$$\eta_i = \theta_i - \theta_0, \tag{3.34}$$

$$\mu_i = E(\eta_i \mid d_i) = B_i \left\{ g(d_i) - E[g(d_i)] \right\}, \tag{3.35}$$

where $g(u) = (g_1(u), g_2(u), \ldots, g_k(u))'$ and $g_j(u)$ for $j = 1, 2, \ldots, k$ are bounded and continuously differentiable functions of $u$ on $(0, \infty)$, and $B_i$ are bounded $k \times k$ matrices of fixed constants with $\sup_i \|B_i\| < C$.

(c) $\epsilon_i$ are distributed independently over $i$.

Remark 10 Under Assumption 11, by imposing $\inf_i (d_i) > 0$ and $F_d(0) = 0$, we do not consider the case where there is a positive mass of "stayers" in the population, which is the focus of Sasaki and Ura (2021).

Remark 11 Under Assumption 12, $d_i$ are distributed independently over $i$, which also implies that $\delta_i$, defined by (3.28), are also distributed independently over $i$.

Remark 12 Under Assumption 13, $\eta_i$ can be written as

$$\eta_i = \mu_i + \epsilon_i, \tag{3.36}$$

where $\mu_i$ represents the part of the heterogeneity of $\theta_i$ that is correlated with $d_i$, and $\epsilon_i$ represents random or idiosyncratic heterogeneity which is distributed independently of $d_i$, with $E(\epsilon_i) = 0$, for all $i$.

Remark 13 Assumptions 12 and 13 can be relaxed by requiring $\delta_i$ and $\epsilon_i$ to be weakly cross-sectionally correlated. The cross-sectional independence assumption is maintained to simplify the mathematical exposition.

Remark 14 Under Assumption 13, it also follows that $(1 + \delta_i)\, \eta_i$ are distributed independently over $i$, although in general $E(\delta_i \eta_i) \neq 0$.
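Using (3.7) and (3.34) in (3.27) we have $\tilde{\theta}_i = (1+\delta_i)\theta_0 + \zeta_{iT}$, where $\zeta_{iT} = (1+\delta_i)(\eta_i + \xi_{iT})$, and $\hat{\theta}_{TMG}$ defined by (3.30) can be written as
$$\hat{\theta}_{TMG} - \theta_0 = \left(\frac{1}{1+\bar{\delta}_n}\right)\bar{\zeta}_{nT}, \tag{3.37}$$
where $\bar{\zeta}_{nT} = n^{-1}\sum_{i=1}^{n}\zeta_{iT}$. (3.37) can be written equivalently as
$$\hat{\theta}_{TMG} - \theta_0 = \left(\frac{1+E\,\bar{\delta}_n}{1+\bar{\delta}_n}\right)\left(b_n + n^{-1}\sum_{i=1}^{n}\left[p_i - E(p_i)\right] + \bar{q}_{nT}\right), \tag{3.38}$$
where
$$b_n = n^{-1}\sum_{i=1}^{n}E(p_i) = \frac{n^{-1}\sum_{i=1}^{n}E(\delta_i\eta_i)}{1+E\,\bar{\delta}_n}, \quad\text{and}\quad \bar{q}_{nT} = n^{-1}\sum_{i=1}^{n}q_{iT}, \tag{3.39}$$
with
$$p_i = \frac{(1+\delta_i)\eta_i}{1+E\,\bar{\delta}_n}, \quad\text{and}\quad q_{iT} = \frac{(1+\delta_i)\xi_{iT}}{1+E\,\bar{\delta}_n}. \tag{3.40}$$
Under Assumptions 11, 12 and 13, $\delta_i - E(\delta_i)$ and $p_i - E(p_i)$ are distributed independently over $i$ with zero means and bounded variances, and we have
$$\bar{\delta}_n - E\,\bar{\delta}_n = n^{-1}\sum_{i=1}^{n}\left[\delta_i - E(\delta_i)\right] = O_p(n^{-1/2}), \quad\text{and}\quad n^{-1}\sum_{i=1}^{n}\left[p_i - E(p_i)\right] = O_p(n^{-1/2}).$$
Furthermore, by Lemma B.1, $E(\delta_i) = O(a_n)$, $E(\delta_i\eta_i) = O(a_n)$, and it follows that
$$b_n = \frac{1}{1+E\,\bar{\delta}_n}\left[n^{-1}\sum_{i=1}^{n}E(\delta_i\eta_i)\right] = \frac{O(a_n)}{1+O(a_n)} = O(a_n), \tag{3.41}$$
and
$$\frac{1+E\,\bar{\delta}_n}{1+\bar{\delta}_n} = 1 - \frac{\bar{\delta}_n - E\,\bar{\delta}_n}{1+E\,\bar{\delta}_n + \left(\bar{\delta}_n - E\,\bar{\delta}_n\right)} = 1 + O_p(n^{-1/2}). \tag{3.42}$$
Also, conditional on $W_i$, $q_{iT}$ are distributed independently with mean zero, and since $\bar{q}_{nT} = n^{-1}\sum_{i=1}^{n}q_{iT} = \left(\frac{1}{1+E(\bar{\delta}_n)}\right)\bar{\xi}_{\delta,nT}$, where $\bar{\xi}_{\delta,nT} = n^{-1}\sum_{i=1}^{n}(1+\delta_i)\xi_{iT}$, using results in Lemma B.2 we have $E(\bar{q}_{nT}) = 0$ and
$$Var(\bar{q}_{nT}) = \left(\frac{1}{1+E\,\bar{\delta}_n}\right)^{2}Var\left(\bar{\xi}_{\delta,nT}\right) = O(n^{-1+\alpha}).$$
Hence, $\bar{q}_{nT} = O_p\left(n^{-1/2+\alpha/2}\right)$. Using these results in (3.38) we have
$$\hat{\theta}_{TMG} - \theta_0 = O(n^{-\alpha}) + O_p\left(n^{-\frac{(1-\alpha)}{2}}\right). \tag{3.43}$$
Hence $\hat{\theta}_{TMG}$ asymptotically converges to $\theta_0$, so long as $0<\alpha<1$, as $n\to\infty$. The convergence rate of $\hat{\theta}_{TMG}$ to $\theta_0$ will depend on the trade-off between the asymptotic bias and variance of $\hat{\theta}_{TMG}$. Although it is possible to reduce the bias of $\hat{\theta}_{TMG}$ by choosing a value of $\alpha$ close to unity, it will be at the expense of a large variance. In what follows we shed light on the choice of $\alpha$ by considering the conditions under which the asymptotic distribution of $\hat{\theta}_{TMG}$ is centered around $\theta_0$ such that $Var(\hat{\theta}_{TMG})$ also tends to zero at a reasonably fast rate.

3.5.1 The choice of the trimming threshold

We begin by assuming that the rate at which $\hat{\theta}_{TMG}$ converges to $\theta_0$ is given by $n^{\gamma}$, where $\gamma$ is set in relation to $\alpha$.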
Given the irregular nature of the individual estimators of $\theta_i$ when $T$ is ultra short (for example $T = k$), we expect the rate, $n^{\gamma}$, to be below the standard rate of $n^{1/2}$. [12] Using (3.38) we have
$$n^{\gamma}\left(\hat{\theta}_{TMG} - \theta_0\right) = \left(\frac{1+E\,\bar{\delta}_n}{1+\bar{\delta}_n}\right)\left\{n^{\gamma}b_n + n^{\gamma-(1-\alpha)/2}\left[n^{-(1+\alpha)/2}\sum_{i=1}^{n}\left[p_i - E(p_i)\right] + n^{-(1+\alpha)/2}\sum_{i=1}^{n}q_{iT}\right]\right\},$$
which in view of (3.42) yields (recall that $a_n = \ominus(n^{-\alpha})$)
$$n^{\gamma}\left(\hat{\theta}_{TMG} - \theta_0\right) = n^{\gamma}b_n + n^{\gamma-(1-\alpha)/2}\left[n^{-(1+\alpha)/2}\sum_{i=1}^{n}\left[p_i - E(p_i)\right] + n^{-(1+\alpha)/2}\sum_{i=1}^{n}q_{iT}\right] + o_p(1). \tag{3.44}$$
To ensure that the asymptotic distribution of $\hat{\theta}_{TMG}$ is correctly centered, we must have $n^{\gamma}b_n\to 0$ as $n\to\infty$. Since $n^{\gamma}b_n = O(n^{\gamma}a_n) = O(n^{\gamma-\alpha})$, this condition is ensured if $\gamma<\alpha$. Turning to the second term of the above, we also note that to obtain a non-degenerate distribution we need to set $\gamma = (1-\alpha)/2$. Combining these two requirements yields
$$\frac{1-\alpha}{2} < \alpha, \quad\text{or}\quad \alpha > 1/3, \tag{3.45}$$
which implies that at most the convergence rate of $\hat{\theta}_{TMG}$ can be $n^{1/3}$, well below the standard convergence rate, $n^{1/2}$, which is achieved only if individual estimators of $\theta_i$ have at least second order moments for all $i$. In practice, we suggest setting $\alpha$ at the boundary value of $1/3$ or just above $1/3$, which yields the familiar non-parametric convergence rate of $1/3$.

[12] This issue has also been addressed by Graham and Powell (2012) and Sasaki and Ura (2021).

3.5.2 Trimming condition

The condition $\alpha>1/3$, whilst necessary, is not sufficient. It is also required that the asymptotic variance of $n^{\gamma}(\hat{\theta}_{TMG}-\theta_0)$ tends to a positive definite matrix. To this end, setting $\gamma = (1-\alpha)/2$ we first write (3.44) as
$$n^{(1-\alpha)/2}\left(\hat{\theta}_{TMG} - \theta_0\right) = n^{(1-\alpha)/2}b_n + z_{p,n} + z_{q,nT} + o_p(1),$$
where $z_{p,n} = n^{-(1+\alpha)/2}\sum_{i=1}^{n}\left[p_i - E(p_i)\right]$, and $z_{q,nT} = n^{-(1+\alpha)/2}\sum_{i=1}^{n}q_{iT}$. Recall also that $n^{(1-\alpha)/2}b_n = O\left(n^{(1-3\alpha)/2}\right)$, which becomes negligible since $\alpha>1/3$, and under Assumption 13, $p_i$ are cross-sectionally independent and we have $Var(z_{p,n}) = n^{-\alpha}\left[n^{-1}\sum_{i=1}^{n}Var(p_i)\right] = O(n^{-\alpha})$. Since $E(z_{p,n}) = 0$, it follows that $z_{p,n}\to_p 0$ at the rate of $a_n^{1/2}$ as $n\to\infty$, and hence
$$n^{(1-\alpha)/2}\left(\hat{\theta}_{TMG} - \theta_0\right) = z_{q,nT} + O_p(n^{-\alpha/2}) + o_p(1).$$
The first term can be written as $z_{q,nT} = n^{(1-\alpha)/2}\left(\frac{1}{1+E(\bar{\delta}_n)}\right)\bar{\xi}_{\delta,nT}$, and by result (B.14) of Lemma B.2 and recalling that $E\,\bar{\delta}_n = O(a_n)$, we have $Var(z_{q,nT}) = n^{(1-\alpha)}\left(\frac{1}{1+O(a_n)}\right)^{2}O(n^{-1+\alpha}) = O(1)$, and the asymptotic distribution of $\hat{\theta}_{TMG}$ is determined by that of $z_{q,nT}$. Under Assumption 17, conditional on $W_i$, $q_{iT}$ are independently distributed over $i$ with zero means and $z_{q,nT}$ tends to a normal distribution if $\lim_{n\to\infty}Var(z_{q,nT})$ is a positive definite matrix. Using (B.13) of Lemma B.2 we note that
$$Var(z_{q,nT}) = \left(\frac{1}{1+E\,\bar{\delta}_n}\right)^{2}\left\{n^{-1-\alpha}\sum_{i=1}^{n}E\left[1\{d_i>a_n\}R_i'H_iR_i\right]\right\} + \left(\frac{1}{1+E\,\bar{\delta}_n}\right)^{2}\left\{n^{-1-\alpha}\sum_{i=1}^{n}a_n^{-2}E\left[d_i^{2}1\{d_i\le a_n\}R_i'H_iR_i\right]\right\}, \tag{3.46}$$
which can be written equivalently as
$$Var(z_{q,nT}) = \frac{C_n^{-1}}{\left(1+E\,\bar{\delta}_n\right)^{2}}\left[n^{-1}\sum_{i=1}^{n}E\left(a_n1\{d_i>a_n\}R_i'H_iR_i\right)\right] + \frac{C_n^{-1}}{\left(1+E\,\bar{\delta}_n\right)^{2}}\left[n^{-1}\sum_{i=1}^{n}a_n^{-1}E\left(d_i^{2}1\{d_i\le a_n\}R_i'H_iR_i\right)\right]. \tag{3.47}$$
By (B.15) in Lemma B.2, $E\left[n^{-1}\sum_{i=1}^{n}a_n^{-1}d_i^{2}1\{d_i\le a_n\}R_i'H_iR_i\right] = O\left(a_n^{1/2}\right)$, and since $E\,\bar{\delta}_n = O(a_n)$ it then follows that (recall that $0<C_n<C$)
$$\lim_{n\to\infty}Var(z_{q,nT}) = C^{-1}\lim_{n\to\infty}\left[n^{-1}\sum_{i=1}^{n}E\left(a_n1\{d_i>a_n\}R_i'H_iR_i\right)\right].$$
To establish conditions under which $\lim_{n\to\infty}Var(z_{q,nT})\succ 0$, note that a $k\times k$ symmetric matrix $A$ is positive definite if $p'Ap>0$, for all non-zero vectors $p\in\mathbb{R}^{k}$. Accordingly, consider
$$p'\left[n^{-1}\sum_{i=1}^{n}a_n1\{d_i>a_n\}R_i'H_iR_i\right]p = n^{-1}\sum_{i=1}^{n}a_n1\{d_i>a_n\}\,p'R_i'H_iR_ip,$$
for some $p$ such that $p'p>0$. Note that
$$n^{-1}\sum_{i=1}^{n}a_n1\{d_i>a_n\}\,p'R_i'H_iR_ip \ge n^{-1}\sum_{i=1}^{n}a_n1\{d_i>a_n\}\left(p'R_i'R_ip\right)\lambda_{\min}(H_i),$$
and since $R_i'R_i = (W_i'W_i)^{-1} = d_i^{-1}(W_i'W_i)^{*}$, then
$$p'\left[n^{-1}\sum_{i=1}^{n}a_n1\{d_i>a_n\}R_i'H_iR_i\right]p \ge \left(p'p\right)n^{-1}\sum_{i=1}^{n}\frac{a_n}{d_i}1\{d_i>a_n\}\lambda_{\min}\left[(W_i'W_i)^{*}\right]\lambda_{\min}(H_i).$$
But by assumption $\inf_i\lambda_{\min}(H_i)>c>0$, and $\inf_i\lambda_{\min}\left[(W_i'W_i)^{*}\right]>c>0$ (see Assumptions 17 and 11).
Hence, a necessary and sufficient condition for $Var(z_{q,nT})$ to tend to a positive definite matrix is given by
$$\lim_{n\to\infty}\left[n^{-1}\sum_{i=1}^{n}\frac{a_n}{d_i}1\{d_i>a_n\}\right] > 0. \tag{3.48}$$

Assumption 14 (Trimming condition). $d_i$ and $(W_i'W_i)^{*}$ are jointly distributed such that
$$\lim_{n\to\infty}n^{-1}\sum_{i=1}^{n}E\left[\frac{a_n}{d_i}1\{d_i>a_n\}\right] > 0, \tag{3.49}$$
where $a_n = C_nn^{-\alpha}$, for $\alpha>1/3$ and $0<C_n<C$.

Theorem 3 (Asymptotic distribution of TMG estimator). Suppose that for $i = 1,2,\ldots,n$ and $t = 1,2,\ldots,T$, $y_{it}$ are generated by the heterogeneous panel data model (3.2), and Assumptions 17-14 hold. Then as $n\to\infty$, for $\alpha>1/3$, we have
$$n^{(1-\alpha)/2}\left(\hat{\theta}_{TMG} - \theta_0\right)\to_d N(0_k, V_\theta), \tag{3.50}$$
where $\hat{\theta}_{TMG}$ is given by (3.30), and
$$V_\theta = \lim_{n\to\infty}\left(\frac{1}{1+E\,\bar{\delta}_n}\right)^{2}n^{-(1+\alpha)}\sum_{i=1}^{n}E\left[(1+\delta_i)^{2}R_i'H_iR_i\right], \tag{3.51}$$
where $H_i = H_i(W_i) = E(u_iu_i'\,|\,W_i)$, $R_i = W_i(W_i'W_i)^{-1}$, $E\,\bar{\delta}_n = n^{-1}\sum_{i=1}^{n}E(\delta_i)$, $(1+\delta_i)^{2} = 1\{d_i>a_n\} + a_n^{-2}d_i^{2}1\{d_i\le a_n\}$, and $d_i = \det(W_i'W_i)$.

3.5.3 Robust estimation of the covariance matrix of the trimmed MG estimator

As with standard MG estimation, consistent estimation of $V_\theta$ using (3.51) requires knowledge of $H_i$, which cannot be estimated consistently when $T$ is short. Here we follow the literature and propose a robust covariance estimator of $V_\theta$ which is asymptotically unbiased for a wide class of error variances, $E(u_iu_i'\,|\,W_i) = H_i(W_i)$, thus allowing for serially correlated and conditionally heteroskedastic errors. The main result is summarized in the following theorem.

Theorem 4 (Robust covariance matrix of TMG estimator). Suppose Assumptions 11-14 hold, and $\theta_0$ is estimated by $\hat{\theta}_{TMG}$ given by (3.30). Then as $n\to\infty$, for $\alpha>1/3$,
$$\lim_{n\to\infty}\left[n\,Var\left(\hat{\theta}_{TMG}\right)\right] = \operatorname*{plim}_{n\to\infty}\left[n^{-1}\sum_{i=1}^{n}\left(\tilde{\theta}_i - \hat{\theta}_{TMG}\right)\left(\tilde{\theta}_i - \hat{\theta}_{TMG}\right)'\right], \tag{3.52}$$
and $Var(\hat{\theta}_{TMG})$ can be consistently estimated by $n^{-2}\sum_{i=1}^{n}(\tilde{\theta}_i - \hat{\theta}_{TMG})(\tilde{\theta}_i - \hat{\theta}_{TMG})'$. See Section B.2.3 of the Appendix for a proof.
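The estimator in Theorem 4 only needs the trimmed individual estimates and the TMG point estimate, so it is straightforward to compute. The sketch below is illustrative rather than the authors' implementation; the function name and inputs (`theta_tilde` as an $n\times k$ array of $\tilde{\theta}_i$, `theta_tmg` as the $k$-vector $\hat{\theta}_{TMG}$) are assumptions made for the example.

```python
import numpy as np

def tmg_robust_cov(theta_tilde, theta_tmg):
    """Sketch of the Theorem 4 estimator of Var(theta_TMG):
    n^{-2} * sum_i (theta_tilde_i - theta_TMG)(theta_tilde_i - theta_TMG)'.

    theta_tilde: (n, k) trimmed individual estimates (hypothetical layout).
    theta_tmg:   (k,) TMG point estimate.
    """
    n = theta_tilde.shape[0]
    dev = theta_tilde - theta_tmg          # deviations, shape (n, k)
    return dev.T @ dev / n**2              # k x k covariance estimate

# Tiny worked example with two units and k = 2.
cov = tmg_robust_cov(np.array([[1.0, 0.0], [3.0, 0.0]]), np.array([2.0, 0.0]))
standard_errors = np.sqrt(np.diag(cov))
```

No estimate of $H_i$ enters the formula, which is why the estimator remains usable when $T$ is too short to estimate the error covariance of each unit.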
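Remark 15. Following the literature on MG estimation, here we also consider the following bias-adjusted and scaled version
$$\widehat{Var}(\hat{\theta}_{TMG}) = \frac{1}{n(n-1)(1+\bar{\delta}_n)^{2}}\sum_{i=1}^{n}\left(\tilde{\theta}_i - \hat{\theta}_{TMG}\right)\left(\tilde{\theta}_i - \hat{\theta}_{TMG}\right)'. \tag{3.53}$$

The above results can be readily extended to panel data models with time effects.

3.6 Heterogeneous panel data models with time effects

Setting $w_{it} = (1, x_{it}')'$ in (3.1), the panel data model with time effects can be written as
$$y_{it} = \alpha_i + x_{it}'\beta_i + \phi_t + u_{it}, \tag{3.54}$$
where $\phi_t$ for $t = 1,2,\ldots,T$ are common time effects under the normalization $\tau_T'\phi = 0$. [13] To identify $\phi\in\mathbb{R}^{T}$ with $T\ge k$, we make the following assumption. [14]

Assumption 15.
$$E\left(x_{it}'\eta_{i\beta}\right) = E\left(x_{is}'\eta_{i\beta}\right), \text{ for all } t,s = 1,2,\ldots,T, \tag{3.55}$$
where $\eta_{i\beta} = \beta_i - \beta_0$, and $\|\beta_0\|<C$.

Remark 16. Assumption 15 allows for dependence between $x_{it}$ and $\eta_{i\beta}$, but requires this dependence to be time-invariant.

To estimate $\theta_0$, initially we suppose $\phi$ is known. Let
$$Q_i = (1+\delta_i)W_i\left(W_i'W_i\right)^{-1}. \tag{3.56}$$
Then the trimmed estimator of $\theta_i = (\alpha_i,\beta_i')'$ is given by
$$\tilde{\theta}_i(\phi) = Q_i'(y_i - \phi) = \tilde{\theta}_i - Q_i'\phi,$$
and the associated TMG-TE estimator follows as
$$\hat{\theta}_{TMG\text{-}TE}(\phi) = n^{-1}\sum_{i=1}^{n}\left(1+\bar{\delta}_n\right)^{-1}\tilde{\theta}_i(\phi) = \hat{\theta}_{TMG} - \bar{Q}_n'\phi,$$
where $\hat{\theta}_{TMG}$ is given by (3.30), and
$$\bar{Q}_n = \frac{1}{1+\bar{\delta}_n}\left(n^{-1}\sum_{i=1}^{n}Q_i\right). \tag{3.57}$$
From our earlier analysis, it is clear that for a known $\phi$, $\hat{\theta}_{TMG\text{-}TE}(\phi)$ has the same asymptotic distribution as $\hat{\theta}_{TMG}$ with $y_i$ replaced by $y_i - \phi$. We first propose an estimator of $\phi$ for the case where $T\ge k$, and then following Chamberlain (1992) we consider an alternative estimator of $\phi$ with better small sample properties when $T>k$.

[13] It is well known that $\alpha_0$ and $\phi = (\phi_1,\phi_2,\ldots,\phi_T)'$ are identified subject to a normalization. In the analysis by Graham and Powell (2012), they set $\phi_1 = 0$. Here we consider the alternative normalization $\tau_T'\phi = 0$, which is popular in the literature. The choice of normalization is innocuous for the estimation of the parameters of interest $\theta_0 = (\alpha_0,\beta_0')'$.

[14] Note that the irregular identification of $\phi$ when $T = k$ in Graham and Powell (2012) is based on moments conditional on the sub-population of "stayers". Under Assumption 11, $d_i>0$ for all $i$, i.e., there are no "stayers" in the population, so this identification strategy cannot be used. Moreover, they assume that the joint distribution of $(u_{it},\eta_i')'$ given $W_i$ does not depend on $t$, which is similar to Assumption 15. See interpretations of Assumption 1.1 part (ii) on page 2111 in Graham and Powell (2012).

3.6.1 TMG-TE estimator with $T\ge k$

Averaging (3.54) over $i$,
$$\bar{y}_{\cdot t} = \bar{\alpha}_n + \phi_t + \bar{x}_{\cdot t}'\beta_0 + \bar{\nu}_{\cdot t}, \tag{3.58}$$
where $\bar{\nu}_{\cdot t} = n^{-1}\sum_{i=1}^{n}\nu_{it}$, $\nu_{it} = x_{it}'\eta_{i\beta} + u_{it}$, $\bar{y}_{\cdot t} = n^{-1}\sum_{i=1}^{n}y_{it}$, $\bar{x}_{\cdot t} = n^{-1}\sum_{i=1}^{n}x_{it}$, $\bar{u}_{\cdot t} = n^{-1}\sum_{i=1}^{n}u_{it}$, and $\bar{\alpha}_n = n^{-1}\sum_{i=1}^{n}\alpha_i$. Averaging over $t$, under the normalization $\sum_{t=1}^{T}\phi_t = 0$, we have
$$\bar{y}_{\cdot\cdot} = \bar{\alpha}_n + \bar{x}_{\cdot\cdot}'\beta_0 + n^{-1}\sum_{i=1}^{n}\bar{x}_{i\cdot}'\eta_{i\beta} + \bar{u}_{\cdot\cdot}, \tag{3.59}$$
where $\bar{y}_{\cdot\cdot} = T^{-1}\sum_{t=1}^{T}\bar{y}_{\cdot t}$, $\bar{x}_{\cdot\cdot} = T^{-1}\sum_{t=1}^{T}\bar{x}_{\cdot t}$ and $\bar{u}_{\cdot\cdot} = T^{-1}\sum_{t=1}^{T}\bar{u}_{\cdot t}$. Subtracting (3.59) from (3.58) yields (noting that $(\bar{x}_{\cdot t} - \bar{x}_{\cdot\cdot})'\beta_0 = (\bar{w}_{\cdot t} - \bar{w}_{\cdot\cdot})'\theta_0$)
$$\phi_t = \left(\bar{y}_{\cdot t} - \bar{y}_{\cdot\cdot}\right) - \left(\bar{w}_{\cdot t} - \bar{w}_{\cdot\cdot}\right)'\theta_0 - \left(\bar{\nu}_{\cdot t} - \bar{\nu}_{\cdot\cdot}\right), \text{ for } t = 1,2,\ldots,T, \tag{3.60}$$
where $\bar{\nu}_{\cdot t} - \bar{\nu}_{\cdot\cdot} = (\bar{u}_{\cdot t} - \bar{u}_{\cdot\cdot}) + n^{-1}\sum_{i=1}^{n}(x_{it} - \bar{x}_{i\cdot})'\eta_{i\beta}$. Under Assumptions 17, 13, and 15, $\bar{\nu}_{\cdot t} - \bar{\nu}_{\cdot\cdot} = O_p(n^{-1/2})$, [15] which suggests the following estimator of $\phi_t$
$$\hat{\phi}_t = \left(\bar{y}_{\cdot t} - \bar{y}_{\cdot\cdot}\right) - \left(\bar{w}_{\cdot t} - \bar{w}_{\cdot\cdot}\right)'\hat{\theta}_{TMG\text{-}TE}, \text{ for } t = 1,2,\ldots,T, \tag{3.61}$$
where
$$\hat{\theta}_{TMG\text{-}TE} = \hat{\theta}_{TMG} - \bar{Q}_n'\hat{\phi}. \tag{3.62}$$
Stacking the equations in (3.61) over $t = 1,2,\ldots,T$ we have
$$\hat{\phi} = M_T\left(\bar{y} - \bar{W}\hat{\theta}_{TMG\text{-}TE}\right), \tag{3.63}$$
where $M_T = I_T - T^{-1}\tau_T\tau_T'$, $\bar{y} = n^{-1}\sum_{i=1}^{n}y_i$, and $\bar{W} = n^{-1}\sum_{i=1}^{n}W_i$. The above system of equations can now be solved in terms of $\hat{\theta}_{TMG}$ if $I_T - M_T\bar{W}\bar{Q}_n'$ is non-singular. Under this condition we have
$$\hat{\phi} = \left(I_T - M_T\bar{W}\bar{Q}_n'\right)^{-1}M_T\left(\bar{y} - \bar{W}\hat{\theta}_{TMG}\right), \tag{3.64}$$
and substituting $\hat{\phi}$ from (3.63) in (3.62) we have
$$\hat{\theta}_{TMG\text{-}TE} = \left(I_k - \bar{Q}_n'M_T\bar{W}\right)^{-1}\left(\hat{\theta}_{TMG} - \bar{Q}_n'M_T\bar{y}\right). \tag{3.65}$$

Remark 17. Note that $M_T\bar{W}\bar{Q}_n'$ and $\bar{Q}_n'M_T\bar{W}$ have the same $k$ ($T\ge k$) non-zero eigenvalues and $\det(I_T - M_T\bar{W}\bar{Q}_n') = \det(I_k - \bar{Q}_n'M_T\bar{W})$, so if $I_T - M_T\bar{W}\bar{Q}_n'$ is invertible then so is $I_k - \bar{Q}_n'M_T\bar{W}$.

[15] For a proof see Lemma B.3.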
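The closed-form solution in (3.63) and (3.65) can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' code: it takes $1+\delta_i = 1$ if $d_i>a_n$ and $d_i/a_n$ otherwise, uses the threshold $a_n = \bar{d}_n n^{-\alpha}$, and assumes that the first column of each $W_i$ is the intercept. The function name and array layout are hypothetical.

```python
import numpy as np

def tmg_te(y, W, alpha=1/3):
    """Sketch of the TMG-TE estimator, eqs. (3.63) and (3.65).

    y: (n, T) outcomes; W: (n, T, k) regressors with an intercept column
    (hypothetical layout). Returns (theta_te, phi_hat).
    """
    n, T, k = W.shape
    WtW = np.einsum("itj,itl->ijl", W, W)                 # W_i'W_i
    d = np.linalg.det(WtW)                                # d_i
    a_n = d.mean() * n ** (-alpha)                        # assumed threshold
    opd = np.where(d > a_n, 1.0, d / a_n)                 # 1 + delta_i
    # Q_i = (1 + delta_i) W_i (W_i'W_i)^{-1}, shape (n, T, k), eq. (3.56)
    Q = opd[:, None, None] * np.einsum("itj,ijl->itl", W, np.linalg.inv(WtW))
    Q_bar = Q.mean(axis=0) / opd.mean()                   # eq. (3.57), T x k
    theta_tmg = np.einsum("itj,it->j", Q, y) / (n * opd.mean())  # eq. (3.30)
    M = np.eye(T) - np.ones((T, T)) / T                   # M_T
    y_bar, W_bar = y.mean(axis=0), W.mean(axis=0)         # cross-section means
    A = np.eye(k) - Q_bar.T @ M @ W_bar                   # must be invertible
    theta_te = np.linalg.solve(A, theta_tmg - Q_bar.T @ M @ y_bar)  # eq. (3.65)
    phi_hat = M @ (y_bar - W_bar @ theta_te)              # eq. (3.63)
    return theta_te, phi_hat
```

Because `phi_hat` is premultiplied by $M_T$, the estimated time effects sum to zero by construction, matching the normalization $\tau_T'\phi = 0$.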
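The following theorem provides a summary of the results for estimation of $\theta_0$ and $\phi_0$, and their asymptotic distribution.

Theorem 5 (Asymptotic distribution of $\hat{\theta}_{TMG\text{-}TE}$ and the time effects, $\hat{\phi}$, when $T\ge k$). Suppose that for $i = 1,2,\ldots,n$ and $t = 1,2,\ldots,T$, $y_{it}$ are generated by (3.54), $T\ge k$, Assumptions 17-15 hold, and $I_k - \bar{Q}_n'M_T\bar{W}$ is invertible, where $\bar{Q}_n$ is given by (3.57), and $\bar{W} = n^{-1}\sum_{i=1}^{n}W_i$. Then as $n\to\infty$, for $\alpha>1/3$,
$$n^{(1-\alpha)/2}\left(\hat{\theta}_{TMG\text{-}TE} - \theta_0\right)\to_d N\left(0_k, V_{\theta,TMG\text{-}TE}\right), \tag{3.66}$$
where $\hat{\theta}_{TMG\text{-}TE}$ is given by (3.65),
$$V_{\theta,TMG\text{-}TE} = \left(I_k - G_w\right)^{-1}V_\theta(\phi)\left(I_k - G_w'\right)^{-1},$$
$G_w = \lim_{n\to\infty}\bar{Q}_n'M_T\bar{W}$, and $V_\theta(\phi) = \lim_{n\to\infty}Var\left[n^{(1-\alpha)/2}\hat{\theta}_{TMG\text{-}TE}(\phi)\right]$. Also
$$\hat{\phi} = M_T\left(\bar{y} - \bar{W}\hat{\theta}_{TMG\text{-}TE}\right) = M_T\left(\bar{y} - \bar{X}\hat{\beta}_{TMG\text{-}TE}\right).$$
(a) If $\operatorname{plim}_{n\to\infty}M_T\bar{X} = 0$, we have
$$\sqrt{n}\left(\hat{\phi} - \phi_0\right)\to_d N\left(0_T, M_T\Omega_\nu M_T\right), \tag{3.67}$$
where $\Omega_\nu = \lim_{n\to\infty}n^{-1}\sum_{i=1}^{n}E(\nu_i\nu_i')$, $\nu_i = (\nu_{i1},\nu_{i2},\ldots,\nu_{iT})'$, and $\nu_{it} = u_{it} + x_{it}'\eta_{i\beta}$.
(b) If $\operatorname{plim}_{n\to\infty}M_T\bar{X}\ne 0$, for $\alpha>1/3$, we have
$$n^{(1-\alpha)/2}\left(\hat{\phi} - \phi_0\right)\to_d N\left(0_T, V_\phi\right), \tag{3.68}$$
where $V_\phi = \operatorname{plim}_{n\to\infty}\left[M_T\bar{X}\,Var\left(n^{(1-\alpha)/2}\hat{\beta}_{TMG\text{-}TE}\right)\bar{X}'M_T\right]$.

A proof is given in Section B.3 of the Appendix. Using results similar to the ones employed to establish Theorem 4, robust covariance matrices for $\hat{\theta}_{TMG\text{-}TE}$ and $\hat{\phi}$ are given by (B.36) and (B.39), respectively, in Section B.3 of the Appendix. In particular, the asymptotic variance of $\hat{\phi}$ is applicable to both cases (a) and (b) of Theorem 5 and does not require knowing whether $\operatorname{plim}_{n\to\infty}M_T\bar{X} = 0$ or not.

Example 4. As an example of case (a) in the above theorem, consider $x_{it} = \alpha_{ix} + u_{x,it}$, where $u_{x,it}$ are distributed independently over $i$ with zero means. Then $\bar{x}_{\cdot t} - \bar{x}_{\cdot\cdot} = \bar{u}_{x,\cdot t} - \bar{u}_{x,\cdot\cdot}\to_p 0$, and we have $\operatorname{plim}_{n\to\infty}M_T\bar{X} = 0$. An example of case (b) arises when $x_{it}$ contains an interactive effect, namely $x_{it} = \alpha_{ix} + \gamma_i'f_t + u_{x,it}$. In this case $\bar{x}_{\cdot t} - \bar{x}_{\cdot\cdot} = \bar{\gamma}'(f_t - \bar{f}) + \bar{u}_{x,\cdot t} - \bar{u}_{x,\cdot\cdot}$, where $\bar{\gamma} = n^{-1}\sum_{i=1}^{n}\gamma_i\to_p\gamma$, and it follows that $\bar{x}_{\cdot t} - \bar{x}_{\cdot\cdot}\to_p\gamma'(f_t - \bar{f})$, which is non-zero if $f_t$ varies over time and $\gamma\ne 0$, namely at least one of the factors has loadings with non-zero means.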
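The two cases in Example 4 are easy to check numerically. The short simulation below is only an illustration: the distributions, sample sizes, and the single linear factor $f_t = t$ are assumptions chosen for the sketch, and the helper name is hypothetical.

```python
import numpy as np

def demeaned_cross_section_means(x):
    """Return M_T x_bar for an (n, T) array: cross-section means by period,
    expressed as deviations from their average over time."""
    x_bar = x.mean(axis=0)
    return x_bar - x_bar.mean()

rng = np.random.default_rng(0)
n, T = 20000, 4
alpha_x = rng.normal(1.0, 1.0, size=(n, 1))     # individual effects (assumed N(1, 1))
u_x = rng.standard_normal((n, T))               # idiosyncratic shocks
f = np.arange(1, T + 1, dtype=float)            # one factor, f_t = t (assumed)
gamma = rng.uniform(0.0, 2.0, size=(n, 1))      # loadings with mean 1 (assumed)

# Case (a): no interactive effect, so M_T x_bar shrinks towards zero with n.
case_a = demeaned_cross_section_means(alpha_x + u_x)
# Case (b): interactive effect, so M_T x_bar approaches gamma_bar * (f_t - f_bar).
case_b = demeaned_cross_section_means(alpha_x + gamma * f + u_x)
print(case_a)
print(case_b)
```

In case (b) the limit is non-zero only because the loadings have a non-zero mean, so a design with mean-zero loadings would fall back into case (a).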
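3.6.2 TMG-C estimator when $T>k$

When $T>k$, we can follow Chamberlain (1992) and eliminate the time effects by the de-meaning transformation $\tilde{M}_i = I_T - \tilde{X}_i(\tilde{X}_i'\tilde{X}_i)^{-1}\tilde{X}_i'$, where $\tilde{X}_i = M_TX_i$. Under the normalization $\tau_T'\phi = 0$, $M_T\phi = \phi$ and we have $M_Ty_i = M_TX_i\beta_i + \phi + M_Tu_i$. Then $\tilde{M}_iM_Ty_i = \tilde{M}_i\phi + \tilde{M}_iM_Tu_i$, and averaging over $i$ we obtain
$$n^{-1}\sum_{i=1}^{n}\tilde{M}_iM_Ty_i = \left(n^{-1}\sum_{i=1}^{n}\tilde{M}_i\right)\phi + n^{-1}\sum_{i=1}^{n}\tilde{M}_iM_Tu_i.$$
Hence, $\phi$ can be estimated if $\bar{M}_n = n^{-1}\sum_{i=1}^{n}\tilde{M}_i$ is a positive definite matrix, and interestingly this does not require knowing $\theta_0$. This comes at a cost: it requires $T>k$, since $\bar{M}_n$ is singular if $T = k$. Therefore, to implement the Chamberlain estimation approach we require the following additional assumption:

Assumption 16. When $T>k$,
$$\bar{M}_n = n^{-1}\sum_{i=1}^{n}\tilde{M}_i\to_p M, \tag{3.69}$$
where $\tilde{X}_i = M_TX_i$, $\tilde{M}_i = I_T - \tilde{X}_i(\tilde{X}_i'\tilde{X}_i)^{-1}\tilde{X}_i'$, and $M$ is a $T\times T$ positive definite matrix.

Under this assumption $\phi$ can be estimated by
$$\hat{\phi}_C = \left(n^{-1}\sum_{i=1}^{n}\tilde{M}_i\right)^{-1}\left(n^{-1}\sum_{i=1}^{n}\tilde{M}_iM_Ty_i\right), \tag{3.70}$$
and its asymptotic distribution follows straightforwardly, namely $\sqrt{n}(\hat{\phi}_C - \phi_0)\to_d N(0, V_{\phi,C})$, where $V_{\phi,C} = M^{-1}\lim_{n\to\infty}E\left[n^{-1}\sum_{i=1}^{n}\tilde{M}_iM_Tu_iu_i'M_T\tilde{M}_i\right]M^{-1}$. Since $\tilde{M}_iM_Tu_i = \tilde{M}_iM_T(y_i - \phi)$, $Var(\hat{\phi}_C)$ can be consistently estimated by
$$\widehat{Var}\left(\hat{\phi}_C\right) = n^{-1}\bar{M}_n^{-1}\left[n^{-1}\sum_{i=1}^{n}\tilde{M}_iM_T\left(y_i - \hat{\phi}_C\right)\left(y_i - \hat{\phi}_C\right)'M_T\tilde{M}_i\right]\bar{M}_n^{-1}. \tag{3.71}$$
Using $\hat{\phi}_C$, the TMG-C estimator of $\theta_0$ is now given by
$$\hat{\theta}_{TMG\text{-}C} = \frac{1}{1+\bar{\delta}_n}\left[n^{-1}\sum_{i=1}^{n}Q_i'\left(y_i - \hat{\phi}_C\right)\right]. \tag{3.72}$$
Also, since $\hat{\theta}_{TMG\text{-}C} = \hat{\theta}_{TMG\text{-}C}(\phi) - \bar{Q}_n'(\hat{\phi}_C - \phi)$, the asymptotic covariance of $\hat{\theta}_{TMG\text{-}C}$ can be consistently estimated by
$$\widehat{Var}\left(\hat{\theta}_{TMG\text{-}C}\right) = \widehat{Var}\left(\hat{\theta}_{TMG\text{-}C}(\phi)\right) + \bar{Q}_n'\,\widehat{Var}\left(\hat{\phi}_C\right)\bar{Q}_n,$$
where
$$\widehat{Var}\left(\hat{\theta}_{TMG\text{-}C}(\phi)\right) = \frac{1}{n(n-1)(1+\bar{\delta}_n)^{2}}\sum_{i=1}^{n}\left(\tilde{\theta}_{i,C} - \hat{\theta}_{TMG\text{-}C}\right)\left(\tilde{\theta}_{i,C} - \hat{\theta}_{TMG\text{-}C}\right)',$$
and $\tilde{\theta}_{i,C} = Q_i'(y_i - \hat{\phi}_C)$.

3.7 A Hausman-type test of the validity of the FE estimator

As summarized by Theorem 2, the validity of the FE estimator critically depends on the independence of slope heterogeneity, $\eta_{i\beta} = \beta_i - \beta_0$, from the covariates, $X_i$.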
Here we propose a Hausman-type test of this condition when $T$ is ultra short by comparing the FE and TMG estimators. We consider the following null hypothesis
$$H_0: E\left(\eta_{i\beta}\,|\,X_i\right) = 0, \text{ for all } i. \tag{3.73}$$
It is clear that the homogeneous alternative $\eta_{i\beta} = 0$, for all $i$, and the uncorrelated alternative ($E(X_i'M_TX_i\eta_{i\beta}) = 0$, for all $i$), are covered by the above null hypothesis. The above null hypothesis can be relaxed somewhat by allowing $E(\eta_{i\beta}\,|\,X_i)\ne 0$, for $i = 1,2,\ldots,n^{\alpha_\eta}$, so long as $\alpha_\eta<1/2$, namely the number of violations of the null is relatively small. This is in line with condition (3.21) that requires $n^{-1/2}\sum_{i=1}^{n}E(X_i'M_TX_i\eta_{i\beta})\to 0$, which is the implicit null of the Hausman-type test. But to simplify the derivations we derive the tests under $H_0$.

Consider the FE and TMG estimators defined by (3.14) and (3.30), respectively. Then a Hausman-type test of $H_0$ can be constructed based on the difference $\hat{\Delta}_\beta = \hat{\beta}_{FE} - \hat{\beta}_{TMG}$. Such a test has been considered by Pesaran et al. (1996) and Pesaran and Yamagata (2008), assuming the MG estimator has at least a second-order moment. [16] Here we extend this test to cover cases when $T$ is ultra short and the sufficient condition (3.11) of Proposition 1 is not met. Also, the earlier tests were derived under the null of homogeneity (namely $\eta_{i\beta} = 0$, for all $i$), whilst the null that we are considering is more general and covers the null of homogeneity as a special case.

First recall from (3.18) and (3.30) that
$$\hat{\beta}_{FE} - \beta_0 = \bar{\Psi}_n^{-1}\left(\frac{1}{n}\sum_{i=1}^{n}X_i'M_T\nu_i\right) \quad\text{and}\quad \hat{\beta}_{TMG} - \beta_0 = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{1+\delta_i}{1+\bar{\delta}_n}\right)\Psi_{ix}^{-1}X_i'M_T\nu_i,$$
where $\bar{\Psi}_n = n^{-1}\sum_{i=1}^{n}\Psi_{ix}$, $\Psi_{ix} = X_i'M_TX_i$, $\delta_i$ is given by (3.28), and $\nu_i = u_i + X_i\eta_{i\beta}$. Also by Assumption 10, $\bar{\Psi}_n\to_p\lim_{n\to\infty}n^{-1}\sum_{i=1}^{n}E(\Psi_{ix}) = \bar{\Psi}\succ 0$, and $\bar{\Psi}_n^{-1} - \bar{\Psi}^{-1} = o_p(1)$. Using these results it follows that
$$\sqrt{n}\,\hat{\Delta}_\beta = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}G_i'M_T\nu_i + o_p(1),$$
where $G_i' = \left[\bar{\Psi}^{-1} - \left(\frac{1+\delta_i}{1+\bar{\delta}_n}\right)\Psi_{ix}^{-1}\right]X_i'$.

[16] See pp. 160-162 of Pesaran et al. (1996), and p. 53 of Pesaran and Yamagata (2008).
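Under $H_0$, $E(\nu_{it}\,|\,G_i) = 0$ for all $i$ and $t$, and since by Assumptions 17 and part (c) of Assumption 13 $u_{it}$ and $\eta_{i\beta}$ are cross-sectionally independent, then conditional on $X_i$, $\nu_i$ are also cross-sectionally independent and we have $\sqrt{n}\,\hat{\Delta}_\beta\to_d N(0, V_\Delta)$, as $n\to\infty$, so long as
$$V_\Delta = \lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}E\left[G_i'M_TE\left(\nu_i\nu_i'\,|\,X_i\right)M_TG_i\right] = \lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}E\left(G_i'M_TV_{\nu i}M_TG_i\right)\succ 0,$$
where $V_{\nu i} = H_i + X_i\Omega_\eta X_i'$, and $\Omega_\eta = E(\eta_{i\beta}\eta_{i\beta}')$. Hence
$$H = n\,\hat{\Delta}_\beta'V_\Delta^{-1}\hat{\Delta}_\beta\to_d\chi^{2}_{k'},$$
where $\chi^{2}_{k'}$ is a chi-squared distribution with $k' = \dim(\beta)$ degrees of freedom. Note that $V_\Delta$ can be written equivalently as
$$V_\Delta = \lim_{n\to\infty}n^{-1}\sum_{i=1}^{n}\sum_{t=1}^{T}\sum_{t'=1}^{T}E\left(g_{it}g_{it'}'\tilde{\nu}_{it}\tilde{\nu}_{it'}\right),$$
where $\tilde{\nu}_{it} = \nu_{it} - \bar{\nu}_{i\cdot}$, and $g_{it}$ is the $t$-th column of $G_i'$. For fixed $T$, a consistent estimator of $V_\Delta$, which is robust to the choices of $H_i$ and $\Omega_\eta$, is given by
$$\hat{V}_\Delta = \frac{1}{n}\sum_{i=1}^{n}\sum_{t=1}^{T}\sum_{t'=1}^{T}\hat{g}_{it}\hat{g}_{it'}'\hat{\nu}_{it}\hat{\nu}_{it'}, \tag{3.74}$$
where $\hat{\nu}_{it} = (y_{it} - \bar{y}_{i\cdot}) - \hat{\beta}_{FE}'(x_{it} - \bar{x}_{i\cdot})$, $\bar{y}_{i\cdot} = T^{-1}\sum_{t=1}^{T}y_{it}$, $\bar{x}_{i\cdot} = T^{-1}\sum_{t=1}^{T}x_{it}$, with $\hat{g}_{it}$ being the $t$-th column of $\hat{G}_i'$, defined by
$$\hat{G}_i' = \left[\left(n^{-1}\sum_{i=1}^{n}X_i'M_TX_i\right)^{-1} - \left(\frac{1+\delta_i}{1+\bar{\delta}_n}\right)\left(X_i'M_TX_i\right)^{-1}\right]X_i'. \tag{3.75}$$
Using the above estimator of $V_\Delta$, the Hausman-type test statistic for testing $H_0$ (null of uncorrelated heterogeneity) is given by
$$\hat{H} = n\left(\hat{\beta}_{FE} - \hat{\beta}_{TMG}\right)'\hat{V}_\Delta^{-1}\left(\hat{\beta}_{FE} - \hat{\beta}_{TMG}\right). \tag{3.76}$$
Under the alternative hypothesis that $H_1: \lim_{n\to\infty}n^{-1}\sum_{i=1}^{n}E(X_i'M_TX_i\eta_{i\beta}) = Q\succ 0$, $\hat{H}\to_p\infty$, as $n\to\infty$, and the test is consistent.

3.8 Monte Carlo evidence on small sample properties

This section assesses small-sample properties of the TMG estimator as compared to the FE, MG, GP, and SU [17] estimators for average treatment effects, as well as time effects when included, in heterogeneous static panel models through Monte Carlo experiments. The finite-sample performance of the Hausman-type test is also examined for a comprehensive set of sample sizes.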
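For readers who want to experiment with the Hausman-type statistic in (3.76) before reading the simulation design, the following NumPy sketch computes it from (3.74) and (3.75). It is an illustration, not the authors' code. It assumes the trimming weights $1+\delta_i = 1$ if $d_i>a_n$ and $d_i/a_n$ otherwise, with $d_i = \det(W_i'W_i)$, $W_i = (\tau_T, X_i)$ and $a_n = \bar{d}_n n^{-\alpha}$. The function name and array layout are hypothetical.

```python
import numpy as np

def hausman_fe_vs_tmg(y, X, alpha=1/3):
    """Sketch of the Hausman-type statistic (3.76) comparing FE and TMG slopes.

    y: (n, T) outcomes; X: (n, T, kx) regressors without intercept
    (hypothetical layout). Returns (H, diff, V_hat).
    """
    n, T, kx = X.shape
    M = np.eye(T) - np.ones((T, T)) / T              # M_T
    Xd = np.einsum("ts,isj->itj", M, X)              # M_T X_i
    yd = y @ M                                       # rows hold (M_T y_i)'
    Psi = np.einsum("itj,itl->ijl", Xd, Xd)          # Psi_ix = X_i' M_T X_i
    Xdy = np.einsum("itj,it->ij", Xd, yd)            # X_i' M_T y_i
    Psi_bar = Psi.mean(axis=0)
    beta_fe = np.linalg.solve(Psi_bar, Xdy.mean(axis=0))   # FE estimator
    # Trimming weights built from d_i = det(W_i'W_i), W_i = (tau_T, X_i).
    W = np.concatenate([np.ones((n, T, 1)), X], axis=2)
    d = np.linalg.det(np.einsum("itj,itl->ijl", W, W))
    a_n = d.mean() * n ** (-alpha)                   # assumed threshold
    w = np.where(d > a_n, 1.0, d / a_n)
    w = w / w.mean()                                 # (1+delta_i)/(1+delta_bar)
    Psi_inv = np.linalg.inv(Psi)
    beta_i = np.einsum("ijl,il->ij", Psi_inv, Xdy)   # individual slope estimates
    beta_tmg = (w[:, None] * beta_i).mean(axis=0)    # TMG slopes
    diff = beta_fe - beta_tmg
    resid = yd - np.einsum("itj,j->it", Xd, beta_fe) # FE residuals nu_hat_it
    A = np.linalg.inv(Psi_bar)[None] - w[:, None, None] * Psi_inv  # bracket in (3.75)
    Xr = np.einsum("itj,it->ij", Xd, resid)          # X_i' M_T nu_hat_i
    s = np.einsum("ijl,il->ij", A, Xr)               # sum_t g_hat_it * nu_hat_it
    V_hat = s.T @ s / n                              # eq. (3.74)
    H = n * diff @ np.linalg.solve(V_hat, diff)      # eq. (3.76)
    return H, diff, V_hat
```

The triple sum in (3.74) collapses to an average of outer products of $\hat{G}_i'\hat{\nu}_i$, which is what `s.T @ s / n` computes. Under $H_0$ the returned `H` would be compared with a $\chi^2_{k'}$ critical value.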
3.8.1 Designs of Monte Carlo experiments

3.8.1.1 Data generating process of the outcome variable and regressor

The dependent variable is generated as
$$y_{it} = \alpha_i + \phi_t + \beta_ix_{it} + \kappa\sigma_{it}e_{it}, \text{ for } i = 1,2,\ldots,n, \text{ and } t = 1,2,\ldots,T, \tag{3.77}$$
where we consider both correlated and uncorrelated effects of $\theta_i = (\alpha_i,\beta_i)'$. We allow for heteroskedastic and serially correlated errors and generate $e_{it}$ as the first order autoregressive (AR(1)) process
$$e_{it} = \rho_{ie}e_{i,t-1} + \left(1-\rho_{ie}^{2}\right)^{1/2}\varsigma_{it}.$$
We generate $x_{it}$ as a factor-augmented AR(1) process
$$x_{it} = \alpha_{ix}(1-\rho_{ix}) + \gamma_{ix}f_t + \rho_{ix}x_{i,t-1} + \left(1-\rho_{ix}^{2}\right)^{1/2}\sigma_{ix}e_{x,it}, \tag{3.78}$$
for $i = 1,2,\ldots,n$, and $t = 1,2,\ldots,T$. For future reference we set
$$u_{it} = \kappa\sigma_{it}e_{it} \quad\text{and}\quad u_{x,it} = \sigma_{ix}e_{x,it}. \tag{3.79}$$

[17] We are grateful for the codes of the SU estimator being made available from the authors of Sasaki and Ura (2021), which can be used only in the case $T = k = 2$.

For the shocks to the outcome equation, $\varsigma_{it}$, we consider both Gaussian and chi-squared distributed errors: $\varsigma_{it}\sim IIDN(0,1)$ and $\varsigma_{it}\sim IID\,\tfrac{1}{2}(\chi^{2}_{2}-2)$. For the variance of $u_{it}$, we allow for cross-sectional heteroskedasticity, where $\sigma^{2}_{it}$ are generated independently of $e_{it}$: $\sigma^{2}_{it}\sim IID\,\tfrac{1}{2}(1+z^{2}_{iu})$ with $z_{iu}\sim IIDN(0,1)$ in the baseline case, and two other cases shown in Section 3.8.1.3 where $\sigma^{2}_{it}$ can depend on $x_i$. Unconditionally we set $E(\sigma^{2}_{it}) = 1$, then $Var(u_{it}) = \kappa^{2}E(\sigma^{2}_{it}) = \kappa^{2}$. [18] For the $x_{it}$ equation, the individual effects, $\alpha_{ix}$, are generated by $\alpha_{ix}\sim IIDN(1,1)$, and the errors are generated with mean-zero shocks independently distributed across individuals, $e_{x,it}\sim IID(0,1)$, and cross-sectional heteroskedasticity, $\sigma^{2}_{ix} = \tfrac{1}{2}(1+z^{2}_{ix})$ with $z_{ix}\sim IIDN(0,1)$, which yields $E(\sigma^{2}_{ix}) = 1$. When time effects are included in the model, we set $\phi_t = 1$ for $t = 1,2,\ldots,T-1$ and $\phi_T = 1-T$ such that $\tau_T'\phi = 0$.

Our aim is to consider different distributions of $d_i$ that play a key role in the effectiveness of the trimming approach, where $d_i = \det(W_i'W_i)$, $W_i = (\tau_T, x_i)$, $\tau_T = (1,1,\ldots,1)'$ and $x_i = (x_{i1},x_{i2},\ldots,x_{iT})'$.
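[18] Note that $Var(u_{it})$ does not change with serial correlation, $E(e^{2}_{it}) = E(\rho^{2}_{ie})E(e^{2}_{i,t-1}) + E(1-\rho^{2}_{ie})E(\varsigma^{2}_{it}) = E(\varsigma^{2}_{it})$.

Specifically, we generate heterogeneous $\theta_i$ as functions of the shocks to $x_i$, $e_{ix} = (e_{x,i1},e_{x,i2},\ldots,e_{x,iT})'$, described in sub-section 3.8.1.2, and we consider two distributions for $e_{x,it}$, namely Gaussian and uniform distributions. We also calibrate the heterogeneity of $\beta_i$ and the error distribution of the $y_{it}$ processes to achieve different levels of correlation between $\beta_i$ and $d_i$, and different degrees of overall fit, $PR^{2}$, defined below. To control for the fit, we set the scaling parameter $\kappa$. Section 3.8.1.3 summarizes parameters and details of the baseline case and other experiments where there is a heterogeneous autoregression in $\{u_{it}\}$ or $\{x_{it}\}$, or an interactive effect in $\{x_{it}\}$. When there are feedbacks in $\{x_{it}\}$ or $\{e_{it}\}$, we generate $x_{it}$ or $e_{it}$ for $t = -49,-48,\ldots,-1,0,1,\ldots,T$, then drop the first 50 observations. The data on $(y_{it},w_{it}')'$ with $w_{it} = (1,x_{it})'$ for $i = 1,2,\ldots,n$ and $t = 1,2,\ldots,T$ are used as the sample for estimation.

3.8.1.2 Data generating process of heterogeneous coefficients

The heterogeneous coefficients $\alpha_i$ and $\beta_i$ are generated as
$$\theta_i = \begin{pmatrix}\alpha_i\\ \beta_i\end{pmatrix} = \begin{pmatrix}\alpha_0\\ \beta_0\end{pmatrix} + \begin{pmatrix}\eta_{i\alpha}\\ \eta_{i\beta}\end{pmatrix} = \theta_0 + \eta_i, \tag{3.80}$$
and
$$\eta_i = \begin{pmatrix}\eta_{i\alpha}\\ \eta_{i\beta}\end{pmatrix} = \begin{pmatrix}\psi_\alpha\\ \psi_\beta\end{pmatrix}\lambda_i + \begin{pmatrix}\epsilon_{i\alpha}\\ \epsilon_{i\beta}\end{pmatrix} = \psi\lambda_i + \epsilon_i, \tag{3.81}$$
where $\psi = (\psi_\alpha,\psi_\beta)'$, $\epsilon_i = (\epsilon_{i\alpha},\epsilon_{i\beta})'$ and
$$\lambda_i = \frac{e_{ix}'M_Te_{ix} - E\left(e_{ix}'M_Te_{ix}\right)}{\sqrt{Var\left(e_{ix}'M_Te_{ix}\right)}}. \tag{3.82}$$
Since $e_{ix} = (e_{x,i1},e_{x,i2},\ldots,e_{x,iT})'\sim IID(0,I_T)$, it also follows that $\lambda_i$ is $IID(0,1)$. The random components of $\theta_i$, namely $\epsilon_i$, are generated independently of $W_i$, as $\epsilon_i\sim IIDN(0,V_\epsilon)$, where $V_\epsilon = Diag(\sigma^{2}_{\epsilon\alpha},\sigma^{2}_{\epsilon\beta})$. Hence, $E(\eta_i) = 0$, and
$$V_\eta = E\left(\eta_i\eta_i'\right) = \begin{pmatrix}\sigma^{2}_\alpha & \sigma_{\alpha\beta}\\ \sigma_{\alpha\beta} & \sigma^{2}_\beta\end{pmatrix} = \psi\psi' + V_\epsilon.$$
The degree of correlated heterogeneity is controlled by $\psi'\psi$, and it is zero if $\psi = 0$. Also $Cov(\alpha_i,\beta_i) = \psi_\alpha\psi_\beta$ will be non-zero when both $\psi_\alpha$ and $\psi_\beta$ are non-zero.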
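Specifically, $\sigma^{2}_\alpha = \psi^{2}_\alpha + \sigma^{2}_{\epsilon\alpha}$, $\sigma_{\alpha\beta} = \psi_\alpha\psi_\beta$ and $\sigma^{2}_\beta = \psi^{2}_\beta + \sigma^{2}_{\epsilon\beta}$. Thus, the degrees of correlation between heterogeneous $\theta_i$ and $\lambda_i$ are given by
$$\rho_\beta = Corr(\beta_i,\lambda_i) = \frac{\psi_\beta}{\sqrt{\psi^{2}_\beta + \sigma^{2}_{\epsilon\beta}}}, \tag{3.83}$$
and
$$\rho_\alpha = Corr(\alpha_i,\lambda_i) = \frac{\psi_\alpha}{\sqrt{\psi^{2}_\alpha + \sigma^{2}_{\epsilon\alpha}}}.$$
Solving the above equations for $\psi_\beta$ and $\psi_\alpha$, we have
$$\psi_\beta = \left(\frac{\rho^{2}_\beta}{1-\rho^{2}_\beta}\right)^{1/2}\sigma_{\epsilon\beta}, \quad\text{and}\quad \psi_\alpha = \left(\frac{\rho^{2}_\alpha}{1-\rho^{2}_\alpha}\right)^{1/2}\sigma_{\epsilon\alpha}. \tag{3.84}$$
Also recall that $\sigma^{2}_\alpha = \psi^{2}_\alpha + \sigma^{2}_{\epsilon\alpha}$, and $\sigma^{2}_\beta = \psi^{2}_\beta + \sigma^{2}_{\epsilon\beta}$; then
$$\sigma^{2}_{\epsilon\alpha} = \left(1-\rho^{2}_\alpha\right)\sigma^{2}_\alpha, \quad\text{and}\quad \sigma^{2}_{\epsilon\beta} = \left(1-\rho^{2}_\beta\right)\sigma^{2}_\beta.$$
Hence, the key parameters relating to heterogeneity can be set in terms of $\rho^{2}_\alpha$, $\rho^{2}_\beta$, $\sigma^{2}_\alpha$, and $\sigma^{2}_\beta$. The scaling parameter $\kappa$ in (3.77) is set to achieve a given level of fit as measured by the pooled $PR^{2}$ defined in the online supplement of Pesaran and Yang (2023) and calibrated by stochastic simulation due to the non-linear relationship between $\beta_i$ and $x_i$. As we shall see, the extent of trimming will depend on $\kappa$ corresponding to the overall measure of fit, the degree of correlated heterogeneity in slopes given by $\rho_\beta$, and the distribution of $d_i$.

3.8.1.3 Baseline and other experiments

We set $\alpha_0 = \beta_0 = 1$, and try two values for $PR^{2}$, a low value of $PR^{2} = 0.2$ and a medium value of $PR^{2} = 0.4$. We set $\sigma^{2}_\beta = 0.5$, $\sigma^{2}_\alpha = 0.2$, and $\rho_\alpha = Corr(\alpha_i,\lambda_i) = 0.25$. Two distributions are considered to generate the error terms $e_{x,it}$: (i) Gaussian distribution, $e_{x,it}\sim IIDN(0,1)$, and (ii) uniform distribution, $e_{x,it} = \sqrt{12}(\mathfrak{z}_i - 1/2)$ with $\mathfrak{z}_i\sim IIDU(0,1)$.

In the baseline cases, we generate the errors in the outcome process by a chi-squared distribution without serial correlation, i.e. $\rho_{ie} = 0$. We generate $x_{it}$ by a heterogeneous AR(1) process with $\rho_{ix}\sim IIDU(0,0.95)$ and set $\gamma_{ix} = 0$ with $x_{i,-50} = 0$ such that there is no interactive effect in the $x_{it}$ equation. We fix $PR^{2} = 0.2$, and consider three degrees of correlated heterogeneity measured by $\rho_\beta$ in (3.83), including (1) zero correlation: $\rho_\beta = 0$, (2) medium correlation: $\rho_\beta = 0.25$, and (3) large correlation: $\rho_\beta = 0.5$.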
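The calibration in (3.82) to (3.84) can be illustrated with a short simulation of the slope coefficients. The sketch below is an assumption-laden illustration rather than the simulation code used in this chapter: it covers only the Gaussian case for $e_{x,it}$, for which $E(e_{ix}'M_Te_{ix}) = T-1$ and $Var(e_{ix}'M_Te_{ix}) = 2(T-1)$, and the function name and argument values are hypothetical.

```python
import numpy as np

def draw_slopes(e_x, beta0, sigma2_beta, rho_beta, rng):
    """Sketch of beta_i = beta_0 + psi_beta * lambda_i + eps_i_beta.

    e_x: (n, T) Gaussian shocks to x_it (assumed N(0, 1)); requires |rho_beta| < 1.
    Returns (beta, lam).
    """
    n, T = e_x.shape
    M = np.eye(T) - np.ones((T, T)) / T
    q = np.einsum("it,ts,is->i", e_x, M, e_x)          # e_ix' M_T e_ix
    # Standardise with the Gaussian population moments, as in (3.82).
    lam = (q - (T - 1)) / np.sqrt(2.0 * (T - 1))
    sigma_eps = np.sqrt((1.0 - rho_beta**2) * sigma2_beta)          # sd of eps
    psi = np.sqrt(rho_beta**2 / (1.0 - rho_beta**2)) * sigma_eps    # eq. (3.84)
    beta = beta0 + psi * lam + sigma_eps * rng.standard_normal(n)
    return beta, lam

rng = np.random.default_rng(0)
e_x = rng.standard_normal((50000, 3))
beta, lam = draw_slopes(e_x, beta0=1.0, sigma2_beta=0.5, rho_beta=0.5, rng=rng)
sample_corr = np.corrcoef(beta, lam)[0, 1]   # should be near rho_beta
sample_var = beta.var()                      # should be near sigma2_beta
```

Combining (3.84) with $\sigma^{2}_{\epsilon\beta} = (1-\rho^{2}_\beta)\sigma^{2}_\beta$ gives $\psi_\beta = \rho_\beta\sigma_\beta$, so the total slope variance stays at $\sigma^{2}_\beta$ whatever value of $\rho_\beta$ is chosen.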
Note that for each case, the values of $\kappa^{2}$ are set according to the results of stochastic simulation, $\kappa^{2}_T$, described in the online supplement of Pesaran and Yang (2023), such that the overall fit of the regressions, $PR^{2}$, remains fixed at a given level when certain parameters change.

In the case of uncorrelated heterogeneity, $\rho_\beta = 0$, to examine the relative efficiency of the TMG and FE estimators, we consider deterministic cross-sectional heteroskedasticity of errors in the outcome equation given by $\sigma_{it} = \varrho_u\lambda_i + (1-\varrho_u)\left[\tfrac{1}{2}(1+z^{2}_{iu})\right]^{1/2} + c_u$, where $z_{iu}\sim IIDN(0,1)$ and $\lambda_i$ is given by (3.82), with $\varrho_u = \tfrac{2}{3}$, $c_u = \tfrac{\sqrt{5}-1}{3}$, such that $E(\sigma^{2}_{it}) = 1$. We also experiment with correlated heteroskedasticity of $u_{it}$ as $\sigma^{2}_{it} = \tfrac{x^{2}_{it}}{3}$, when there is no interactive effect or dynamic process in the $x_{it}$ equation, i.e., $\gamma_{ix} = 0$ and $\rho_{ix} = 0$, such that $E(x^{2}_{it}) = E(\alpha^{2}_{ix}) + E(\sigma^{2}_{ix})E(e^{2}_{x,it}) = 3$ and $E(u^{2}_{it}) = E\left(\kappa^{2}\tfrac{x^{2}_{it}}{3}\right) = \kappa^{2}$.

To check the robustness of our TMG estimators, the following variations in the data generating processes (DGP) of the regressors and errors are also considered. When the shocks to the outcome process are serially correlated, we set $\rho_{ie}\sim IIDU(0,0.95)$, and $e_{i0}\sim IIDN(0,1)$ for all $i$. When there is an interactive effect in the $x_{it}$ equation, $\gamma_{ix}\sim IIDU(0,2)$ for all $i$, and $f_t = t$ for $t = 1,2,\ldots,T$.

For Monte Carlo experiments of the Hausman-type test, we allow for correlated heterogeneity in individual fixed effects, $\alpha_i$, under both the null and alternative hypotheses, and also try a higher degree of heterogeneity in slope coefficients, $\sigma^{2}_\beta = 0.75$.

3.8.2 Monte Carlo findings

3.8.2.1 Comparison of TMG, FE, and MG estimators

We first contrast the performance of the TMG estimator with the FE and MG estimators in both the cases of uncorrelated and correlated heterogeneous slope coefficients for $n = 1000, 2000, 5000, 10000$ and $T = 2,3,4,5,6,8$. Table 3.1 reports bias, root mean squared errors (RMSE) and size of estimating $\beta_0$ in simulations with 2,000 replications.
The left panel shows that with uncorrelated heterogeneity, the FE, MG, and TMG estimators are all unbiased when their first moments exist, and their size is around the 5% nominal level, except for the MG estimator when $T = k = 2$. The right panel reports estimation results with correlated heterogeneity. In this case, the FE estimator is no longer consistent: the bias is substantial and does not diminish when $T$ increases. Size distortions of the FE estimator are more severe with larger $n$ for a given $T$, which is not the case when errors in the $x_{it}$ equation follow a uniform distribution. In contrast, the MG and TMG estimators are robust to correlation between heterogeneous slope coefficients and functions of regressors, specifically $d_i$, as they consistently deliver size around the nominal level of 5%. In our design, the correlated heterogeneity in slope coefficients is generated in a way that our TMG estimator would be biased with a randomly chosen trimming threshold. Nonetheless, it does not rule out other sources or functional forms of endogeneity such that FE and other trimmed estimators would be biased.

Note that when $T = k = 2$ the bias and RMSE of the MG estimator are outsized, which demonstrates the irregularity of the MG estimator in ultra short $T$ panels, where estimation errors in some individual estimates are not bounded such that the MG estimator cannot be relied upon. Moreover, the RMSE of the TMG estimator is smaller than that of the MG estimator in both cases, as trimming based on $d_i$ gets rid of outlying individual estimates whose denominators are close to zero. Hence, the TMG estimator we propose is not affected by individual estimates whose first- or second-order moments do not exist and retains the robust properties of the MG estimator.

Figure 3.1 plots empirical power functions of the TMG and FE estimators. In the left column, with uncorrelated heterogeneity the empirical power functions of both estimators center around the true value, $\beta_0$.
The FE estimator has stronger power, but the difference between the power of the FE and TMG estimators shrinks and becomes negligible as $T$ increases. In the right column, with correlated heterogeneity, the empirical power functions of the FE estimator shift dramatically away from the true value. Moreover, when $T$ increases, inference based on the FE estimator yields a higher probability of rejecting the true value. In contrast, the empirical power functions of the TMG estimator are almost indifferent in these two cases.

To summarize, in the case of uncorrelated heterogeneity, the relative efficiency of the TMG and FE estimators depends on the underlying DGP. More importantly, the TMG estimator always provides valid inference with size around the 5% nominal level whether there is correlated heterogeneity in slope coefficients or not.

Table 3.1: Bias, RMSE and size of FE, MG and TMG estimators of $\beta$ ($\beta_0 = 1$)
(The error processes for the $y_{it}$ and $x_{it}$ equations are chi-squared and Gaussian, and there is a heterogeneous autoregression in the $x_{it}$ equation.)

Uncorrelated heterogeneity: $\rho_\beta = 0$

            Prop. trim.          Bias                      RMSE               Size (x100)
     T         TMG         FE       MG      TMG       FE      MG    TMG      FE    MG   TMG
n = 1,000
     2        31.2      -0.004   -7.362  -0.002     0.17  354.71   0.33     5.0   2.1   4.7
     3        16.5       0.001   -0.011  -0.002     0.11    0.57   0.19     4.9   4.0   4.7
     4        10.4      -0.002   -0.001  -0.002     0.09    0.21   0.14     4.8   5.1   5.5
     5         7.1      -0.002   -0.001   0.000     0.08    0.13   0.11     5.6   4.0   3.6
     6         5.2       0.002    0.000   0.001     0.07    0.11   0.10     4.5   4.5   4.7
     8         3.2       0.002    0.000   0.000     0.06    0.08   0.08     4.9   4.5   4.6
n = 2,000
     2        28.5      -0.005   -2.399  -0.001     0.12  149.93   0.26     4.6   2.0   5.5
     3        14.1       0.002    0.007  -0.002     0.08    0.44   0.15     4.9   4.3   5.5
     4         8.4       0.000   -0.003  -0.003     0.07    0.15   0.11     4.9   4.3   4.5
     5         5.6       0.001   -0.004  -0.001     0.06    0.10   0.09     5.5   5.1   5.1
     6         4.0       0.000   -0.001  -0.001     0.05    0.08   0.07     3.8   4.2   4.3
     8         2.4       0.000    0.000   0.000     0.04    0.06   0.06     4.2   5.0   4.5
n = 5,000
     2        24.7       0.002   -4.506   0.000     0.08  423.67   0.17     5.2   1.8   4.3
     3        10.8       0.001   -0.006   0.000     0.05    0.28   0.10     4.3   4.7   4.8
     4         5.8       0.001    0.000   0.000     0.04    0.10   0.07     5.1   4.9   5.7
     5         3.5       0.000   -0.001   0.000     0.04    0.07   0.06     5.1   4.5   4.0
     6         2.3      -0.001    0.000   0.000     0.03    0.05   0.05     4.7   5.1   5.2
     8         1.2       0.000    0.000   0.000     0.03    0.04   0.04     5.2   5.0   5.1
n = 10,000
     2        22.1      -0.001   -1.562  -0.004     0.05  175.36   0.13     5.3   2.2   4.7
     3         8.8      -0.002    0.004   0.000     0.04    0.19   0.07     5.1   4.2   4.9
     4         4.4       0.000    0.000   0.001     0.03    0.07   0.05     4.6   4.6   4.6
     5         2.5       0.000   -0.001  -0.001     0.03    0.05   0.04     4.5   4.4   4.5
     6         1.6       0.000    0.000   0.000     0.02    0.04   0.03     5.6   4.2   4.0
     8         0.8       0.000    0.000   0.001     0.02    0.03   0.03     5.1   4.9   4.8

Correlated heterogeneity: $\rho_\beta = 0.5$

            Prop. trim.          Bias                      RMSE               Size (x100)
     T         TMG         FE       MG      TMG       FE      MG    TMG      FE    MG   TMG
n = 1,000
     2        31.2       0.444   -7.489   0.048     0.48  361.04   0.34    69.0   2.1   4.9
     3        16.5       0.322   -0.011   0.023     0.34    0.58   0.20    75.6   4.1   5.2
     4        10.4       0.265    0.000   0.013     0.28    0.21   0.15    78.5   5.3   5.2
     5         7.1       0.230    0.000   0.009     0.24    0.14   0.12    78.8   4.2   3.8
     6         5.2       0.211    0.000   0.008     0.22    0.11   0.10    81.5   4.7   5.1
     8         3.2       0.179    0.000   0.003     0.19    0.08   0.08    81.0   4.6   4.5
n = 2,000
     2        28.5       0.445   -2.451   0.044     0.46  153.08   0.26    92.2   2.0   5.4
     3        14.1       0.323    0.007   0.018     0.34    0.45   0.16    95.7   4.5   5.4
     4         8.4       0.266   -0.003   0.008     0.27    0.15   0.11    96.2   4.8   4.7
     5         5.6       0.233   -0.004   0.006     0.24    0.11   0.09    96.8   5.2   5.1
     6         4.0       0.208   -0.001   0.004     0.21    0.08   0.07    98.0   4.0   4.3
     8         2.4       0.178    0.000   0.003     0.18    0.06   0.06    97.9   4.7   4.7
n = 5,000
     2        24.7       0.452   -4.601   0.037     0.46  432.56   0.18   100.0   1.8   4.5
     3        10.8       0.323   -0.006   0.016     0.33    0.29   0.11   100.0   4.7   5.3
     4         5.8       0.265    0.000   0.008     0.27    0.10   0.08   100.0   5.3   5.7
     5         3.5       0.231   -0.001   0.004     0.23    0.07   0.06   100.0   4.3   4.0
     6         2.3       0.207    0.000   0.003     0.21    0.05   0.05   100.0   5.1   5.5
     8         1.2       0.178    0.000   0.001     0.18    0.04   0.04   100.0   5.1   5.1
n = 10,000
     2        22.1       0.449   -1.595   0.029     0.45  179.03   0.14   100.0   2.2   5.7
     3         8.8       0.321    0.004   0.013     0.32    0.20   0.08   100.0   4.3   5.4
     4         4.4       0.265    0.000   0.007     0.27    0.07   0.05   100.0   4.7   4.5
     5         2.5       0.231   -0.001   0.002     0.23    0.05   0.04   100.0   4.7   4.8
     6         1.6       0.208    0.000   0.002     0.21    0.04   0.04   100.0   4.4   4.1
     8         0.8       0.177    0.000   0.001     0.18    0.03   0.03   100.0   4.9   4.6

Notes: The model is given by $y_{it} = \alpha_i + \beta_ix_{it} + u_{it}$. The sample data are generated according to the baseline case (1) with $\rho_\beta = 0$ and case (3) with $\rho_\beta = 0.5$. "Prop. trim." denotes the trimmed proportion of individual estimates in the trimmed mean group (TMG) estimator. Numbers in the columns of trimmed proportion are in per cent. "FE" denotes the fixed effects estimator of panel data models calculated based on (3.14). "MG" denotes the mean group estimator calculated based on (3.4) without trimming. "TMG" denotes the new trimmed mean group estimator proposed in the paper, calculated based on (3.30) and (3.53). The trimming threshold of the TMG estimator, $a_n$, is defined on $d_i = \det(W_i'W_i)$ with $W_i = (w_{i1},w_{i2},\ldots,w_{iT})'$ and $w_{it} = (1,x_{it}')'$, and is given by $a_n = \bar{d}_nn^{-\alpha}$, where $\bar{d}_n = \frac{1}{n}\sum_{i=1}^{n}d_i$ and $\alpha = 1/3$ when $T\ge k$.

Figure 3.1: Empirical power functions of TMG and FE estimators of $\beta$ ($\beta_0 = 1$) in both cases of uncorrelated and correlated heterogeneity
(The error processes for the $y_{it}$ and $x_{it}$ equations are chi-squared and Gaussian, and there is a heterogeneous autoregression in the $x_{it}$ equation with $n = 10{,}000$.)

3.8.2.2 Comparison of TMG, GP, and SU estimators

Now we focus on the case of correlated heterogeneity and compare the performance of the TMG estimator with the GP and SU estimators.
Table 3.2 summarizes bias, RMSE, and size of the FE, MG, TMG, GP, and SU estimators for $T = 2, 3$ and $n$ = 1,000, 2,000, 5,000, 10,000. Though the TMG estimator has larger bias, it has much smaller RMSE and stronger power than the GP and SU estimators; the empirical power functions are compared in Figure 3.2 for $T = 2$ and in Figure 3.3 for $T = 3$. Note first that the threshold of the TMG estimator on $d_i$ takes into account the bias-variance trade-off in the asymptotic distribution. As a result, a greater proportion of individual estimates is trimmed in our TMG estimator, as shown in Table 3.2. Second, the information from the trimmed units is still utilized in our TMG estimator, so that in effect our TMG estimator is not largely affected by $\alpha$, that is, the rate of $n$ in the threshold, which is not the case for the GP estimator with different threshold values. This difference can be seen by comparing Figure 3.4 with Figure 3.5. Though SU also exploit the information of the sub-population of "slow movers", their aim is to correct for bias, especially when there is a mass point of "stayers" in the population. When there are no "stayers", by local polynomial regression they effectively eliminate the small-sample bias but incur a cost of more estimation errors, which results in greater RMSE in Table 3.2 and inferior power in Figure 3.2.

Last but not least, including time effects in panel data models does not affect the finite sample performance of our TMG estimator for the slope coefficients. In Table 3.3, the bias, RMSE, and size of the TMG-TE and TMG-C estimators are almost identical to the previous results of the TMG estimator with no time effects in Table 3.2, since the time effects can be precisely estimated based on the moment condition we provide. This is further confirmed by the empirical power functions in Figure 3.6. Table 3.4 reports bias, RMSE, and size of the TMG-TE and GP estimators for time effects with $T = 2, 3$ and $k = 2$.
As we can see, the magnitudes of bias and RMSE of the estimated time effects are much smaller than those of the TMG-TE estimates of the mean coefficients. Thus estimation of time effects makes no difference to the estimation of the mean coefficients. Moreover, for time effects, the TMG-TE estimator has both smaller bias and smaller RMSE than the GP estimator. Note that a similar version of the moment condition in (15) is also imposed by Graham and Powell (2012) in order to solve the irregular identification and estimation issues of time effects when $T = k$, and thus the condition we impose is not restrictive. When $T > k$, Figure 3.7 shows that when there is an interactive effect in the $x_{it}$ equation, the TMG-C estimator has stronger power than the TMG-TE estimator, which suggests that it might be better to combine Chamberlain's estimator of time effects with the TMG estimator of the mean of heterogeneous coefficients.

Table 3.2: Bias, RMSE, and size of TMG and alternative estimators of $\beta_0$ ($\beta_0 = 1$) in the case of correlated heterogeneity (The error processes for the $y_{it}$ and $x_{it}$ equations are chi-squared and Gaussian, and there is no interactive effect in the $x_{it}$ equation.)

| n | Estimator | T = 2: Prop. | T = 2: Bias | T = 2: RMSE | T = 2: Size (×100) | T = 3: Prop. | T = 3: Bias | T = 3: RMSE | T = 3: Size (×100) |
|---|---|---|---|---|---|---|---|---|---|
| 1,000 | FE | n/a | 0.498 | 0.52 | 91.9 | n/a | 0.352 | 0.37 | 92.9 |
| 1,000 | MG | 0.0 | -1.449 | 152.67 | 1.7 | 0.0 | 0.000 | 0.35 | 3.5 |
| 1,000 | TMG | 27.3 | 0.054 | 0.26 | 4.8 | 11.9 | 0.018 | 0.15 | 4.9 |
| 1,000 | GP | 4.1 | -0.003 | 0.58 | 5.7 | 0.5 | 0.000 | 0.22 | 4.2 |
| 1,000 | SU | 4.1 | -0.007 | 1.10 | 5.0 | ... | ... | ... | ... |
| 2,000 | FE | n/a | 0.503 | 0.51 | 99.7 | n/a | 0.353 | 0.36 | 99.7 |
| 2,000 | MG | 0.0 | -30.842 | 1330.42 | 1.7 | 0.0 | -0.003 | 0.29 | 4.7 |
| 2,000 | TMG | 24.5 | 0.060 | 0.21 | 7.0 | 9.7 | 0.016 | 0.12 | 5.7 |
| 2,000 | GP | 3.2 | 0.023 | 0.45 | 4.5 | 0.3 | -0.001 | 0.17 | 5.4 |
| 2,000 | SU | 3.2 | 0.004 | 0.91 | 5.3 | ... | ... | ... | ... |
| 5,000 | FE | n/a | 0.501 | 0.51 | 100.0 | n/a | 0.353 | 0.36 | 100.0 |
| 5,000 | MG | 0.0 | -0.262 | 87.72 | 2.4 | 0.0 | -0.004 | 0.19 | 3.7 |
| 5,000 | TMG | 21.2 | 0.037 | 0.14 | 6.9 | 7.2 | 0.013 | 0.08 | 4.3 |
| 5,000 | GP | 2.4 | 0.000 | 0.35 | 5.7 | 0.2 | -0.001 | 0.11 | 4.2 |
| 5,000 | SU | 2.4 | 0.011 | 0.70 | 5.8 | ... | ... | ... | ... |
| 10,000 | FE | n/a | 0.500 | 0.50 | 100.0 | n/a | 0.354 | 0.36 | 100.0 |
| 10,000 | MG | 0.1 | -14.577 | 525.26 | 1.9 | 0.0 | 0.000 | 0.13 | 4.0 |
| 10,000 | TMG | 18.9 | 0.038 | 0.11 | 7.2 | 5.8 | 0.010 | 0.06 | 5.3 |
| 10,000 | GP | 1.9 | 0.009 | 0.27 | 5.5 | 0.1 | 0.001 | 0.08 | 4.4 |
| 10,000 | SU | 1.9 | 0.004 | 0.54 | 5.9 | ... | ... | ... | ... |

Notes: The model is given by $y_{it} = \alpha_i + \beta_i x_{it} + u_{it}$ with correlated heterogeneity, $\rho_\beta = 0.5$, and the parameter values are set according to case (3). "Prop." denotes the trimmed proportion of individual estimates for the "GP", "SU" and "TMG" rows. In the "MG" rows, "Prop." denotes the proportion out of $R$ = 2,000 replications in which the mean group estimator is not well-defined since some individual estimates do not exist. Numbers under "Prop." are in per cent. "FE" denotes the fixed effects estimator of panel data models, calculated based on (3.14). "MG" denotes the mean group estimator without trimming, calculated based on (3.4). "GP" denotes the trimmed mean group estimator by exclusion in Graham and Powell (2012). "SU" denotes the trimmed mean group estimator with local polynomial regression in Sasaki and Ura (2021). "TMG" denotes the new trimmed mean group estimator proposed in the paper, calculated based on (3.30) and (3.53). We define one threshold on $d_i = \det(W_i' W_i)$ with $W_i = (w_{i1}, w_{i2}, \ldots, w_{iT})'$ and $w_{it} = (1, x_{it}')'$ for our TMG estimator as $a_n = \bar{d}_n n^{-\alpha}$, where $\bar{d}_n = n^{-1} \sum_{i=1}^{n} d_i$ and $\alpha = 1/3$ when $T \ge k$. "n/a" denotes not applicable. "..." denotes that the estimation algorithms are not available.

Figure 3.2: Empirical power functions of TMG, GP and SU estimators of $\beta_0$ ($\beta_0 = 1$) in the case of correlated heterogeneity (The error processes for the $y_{it}$ and $x_{it}$ equations are chi-squared and Gaussian, and there is no interactive effect in the $x_{it}$ equation, with $T = 2$.)

Figure 3.3: Empirical power functions of MG, TMG and GP estimators of $\beta_0$ ($\beta_0 = 1$) in the case of correlated heterogeneity (The error processes for the $y_{it}$ and $x_{it}$ equations are chi-squared and Gaussian, and there is no interactive effect in the $x_{it}$ equation, with $T = 3$.)
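The threshold rule described in the table notes can be sketched in a few lines. This is a minimal illustration, not the thesis code: it assumes a single regressor plus an intercept ($k = 2$), so that $d_i = \det(W_i' W_i) = T \sum_t x_{it}^2 - (\sum_t x_{it})^2$, and it assumes a unit counts as trimmed when $d_i < a_n$. The function names and the toy data are hypothetical.

```python
def det_gram(x_i):
    """d_i = det(W_i' W_i) when each row of W_i is (1, x_it), i.e. k = 2."""
    T = len(x_i)
    s1 = sum(x_i)
    s2 = sum(x * x for x in x_i)
    return T * s2 - s1 * s1


def trimming_threshold(d, alpha=1.0 / 3.0):
    """a_n = d_bar_n * n^(-alpha), with d_bar_n the cross-sectional mean of d_i."""
    n = len(d)
    d_bar = sum(d) / n
    return d_bar * n ** (-alpha)


def proportion_trimmed(d, a_n):
    # Assumption for this sketch: unit i is treated as trimmed when d_i < a_n.
    return sum(1 for d_i in d if d_i < a_n) / len(d)


# Toy panel with n = 8 units and T = 2 periods (hypothetical data).
x = [(0.0, 0.1), (0.0, 1.0), (0.0, 2.0), (0.0, 3.0),
     (0.0, 0.2), (0.0, 1.5), (0.0, 2.5), (0.0, 0.5)]
d = [det_gram(x_i) for x_i in x]
a_n = trimming_threshold(d)
prop = proportion_trimmed(d, a_n)
print(f"a_n = {a_n:.3f}, proportion trimmed = {100 * prop:.1f} per cent")
```

With $T = 2$ the determinant reduces to $(x_{i2} - x_{i1})^2$, so units whose regressor barely moves between the two periods are the ones that fall below the threshold.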
Figure 3.4: Empirical power functions of TMG estimators of $\beta_0$ ($\beta_0 = 1$) using thresholds with different orders of $n$ in the case of correlated heterogeneity (The error processes for the $y_{it}$ and $x_{it}$ equations are chi-squared and Gaussian, and there is no interactive effect in the $x_{it}$ equation, with $T = 2$.)

Figure 3.5: Empirical power functions of GP estimators of $\beta_0$ ($\beta_0 = 1$) using thresholds with different orders of $n$ in the case of correlated heterogeneity (The error processes for the $y_{it}$ and $x_{it}$ equations are chi-squared and Gaussian, and there is no interactive effect in the $x_{it}$ equation, with $T = 2$.)

Table 3.3: Bias, RMSE, and size of TMG-TE and alternative estimators of $\beta_0$ ($\beta_0 = 1$) with a time effect in the model in the case of correlated heterogeneity (The error processes for the $y_{it}$ and $x_{it}$ equations are chi-squared and Gaussian, and there is no interactive effect in the $x_{it}$ equation.)

| n | Estimator | T = 2: Prop. | T = 2: Bias | T = 2: RMSE | T = 2: Size (×100) | T = 3: Prop. | T = 3: Bias | T = 3: RMSE | T = 3: Size (×100) |
|---|---|---|---|---|---|---|---|---|---|
| 1,000 | FE-TE | n/a | 0.498 | 0.52 | 92.0 | n/a | 0.351 | 0.37 | 92.8 |
| 1,000 | MG-TE | 0.0 | 0.913 | 117.51 | 3.8 | 0.0 | 0.000 | 0.35 | 3.7 |
| 1,000 | TMG-TE | 27.3 | 0.055 | 0.26 | 4.9 | 11.9 | 0.019 | 0.15 | 4.8 |
| 1,000 | TMG-C | n/a | n/a | n/a | n/a | 11.9 | 0.018 | 0.15 | 4.9 |
| 1,000 | GP | 4.1 | -0.005 | 0.59 | 4.8 | 0.5 | 0.000 | 0.22 | 4.1 |
| 1,000 | SU | 4.1 | -0.013 | 1.11 | 4.9 | ... | ... | ... | ... |
| 2,000 | FE-TE | n/a | 0.503 | 0.51 | 99.7 | n/a | 0.353 | 0.36 | 99.7 |
| 2,000 | MG-TE | 0.0 | 5.510 | 631.45 | 2.8 | 0.0 | -0.004 | 0.19 | 3.8 |
| 2,000 | TMG-TE | 21.2 | 0.038 | 0.14 | 6.8 | 7.2 | 0.013 | 0.08 | 4.4 |
| 2,000 | TMG-C | n/a | n/a | n/a | n/a | 7.2 | 0.013 | 0.08 | 4.4 |
| 2,000 | GP | 2.4 | -0.001 | 0.36 | 5.5 | 0.2 | -0.001 | 0.11 | 4.3 |
| 2,000 | SU | 2.4 | 0.011 | 0.71 | 5.9 | ... | ... | ... | ... |
| 5,000 | FE-TE | n/a | 0.501 | 0.51 | 100.0 | n/a | 0.353 | 0.36 | 100.0 |
| 5,000 | MG-TE | 0.0 | 0.076 | 118.02 | 3.0 | 0.0 | -0.004 | 0.29 | 4.5 |
| 5,000 | TMG-TE | 24.5 | 0.060 | 0.21 | 7.1 | 9.7 | 0.015 | 0.12 | 5.8 |
| 5,000 | TMG-C | n/a | n/a | n/a | n/a | 9.7 | 0.016 | 0.12 | 5.7 |
| 5,000 | GP | 3.2 | 0.026 | 0.46 | 4.2 | 0.3 | -0.001 | 0.17 | 5.2 |
| 5,000 | SU | 3.2 | 0.016 | 0.92 | 5.1 | ... | ... | ... | ... |
| 10,000 | FE-TE | n/a | 0.500 | 0.50 | 100.0 | n/a | 0.354 | 0.36 | 100.0 |
| 10,000 | MG-TE | 0.1 | -5.250 | 121.80 | 2.6 | 0.0 | 0.000 | 0.13 | 4.0 |
| 10,000 | TMG-TE | 18.9 | 0.038 | 0.11 | 7.2 | 5.8 | 0.010 | 0.06 | 5.3 |
| 10,000 | TMG-C | n/a | n/a | n/a | n/a | 5.8 | 0.010 | 0.06 | 5.3 |
| 10,000 | GP | 1.9 | 0.009 | 0.27 | 5.5 | 0.1 | 0.001 | 0.08 | 4.2 |
| 10,000 | SU | 1.9 | 0.003 | 0.55 | 6.2 | ... | ... | ... | ... |

Notes: The model is given by $y_{it} = \alpha_i + \beta_i x_{it} + \phi_t + u_{it}$ with correlated heterogeneity, $\rho_\beta = 0.5$, and under the normalization $\tau_T' \phi = 0$. "Prop." denotes the trimmed proportion of individual estimates for the "TMG-TE", "TMG-C", "GP" and "SU" rows. In the "MG-TE" rows, "Prop." denotes the proportion out of $R$ = 2,000 replications in which the MG-TE estimator is not well-defined. Numbers under "Prop." are in per cent. "FE-TE" denotes the two-way fixed effects estimator of panel data models, calculated based on (A.4.3). "MG-TE" denotes the mean group estimator with the time effects estimated based on the moments we construct for $T \ge k$. "GP" includes the trimmed mean group estimator by exclusion in Graham and Powell (2012); with time effects, calculations are based on the GP-TE estimator when $T = k$ and the GP-C estimator when $T > k$. "SU" denotes the trimmed mean group estimator with local polynomial regression in Sasaki and Ura (2021); with time effects, the calculation is based on the SU-TE estimator when $T = k$. "TMG-TE" denotes the new trimmed mean group estimator we propose, calculated based on (3.62) and (3.64). "TMG-C" denotes the new trimmed mean group estimator we propose given by (3.72), with time effects estimated by (A.18), which is only applicable when $T > k$. We define one threshold on $d_i = \det(W_i' W_i)$ with $W_i = (w_{i1}, w_{i2}, \ldots, w_{iT})'$ and $w_{it} = (1, x_{it}')'$ for the TMG, TMG-TE, and TMG-C estimators as $a_n = \bar{d}_n n^{-\alpha}$, where $\bar{d}_n = n^{-1} \sum_{i=1}^{n} d_i$ and $\alpha = 1/3$ when $T \ge k$. "n/a" denotes not applicable. "..." denotes that the estimation algorithms are not available.
Table 3.4: Bias, RMSE, and size of TMG-TE and GP estimators of the time effect $\phi_1 = 1$ in the case of correlated heterogeneity (The error processes for the $y_{it}$ and $x_{it}$ equations are chi-squared and Gaussian, and there is no interactive effect in the $x_{it}$ equation.)

| n | Estimator | T = 2: Prop. trimmed | T = 2: Bias | T = 2: RMSE | T = 2: Size (×100) | T = 3: Prop. trimmed | T = 3: Bias | T = 3: RMSE | T = 3: Size (×100) |
|---|---|---|---|---|---|---|---|---|---|
| 1,000 | TMG-TE | 27.3 | -0.001 | 0.094 | 5.4 | 11.9 | 0.003 | 0.109 | 6.2 |
| 1,000 | GP | 4.1 | -0.004 | 0.582 | 7.4 | 0.5 | 0.003 | 0.148 | 5.3 |
| 2,000 | TMG-TE | 24.5 | -0.003 | 0.066 | 4.6 | 9.7 | 0.002 | 0.078 | 5.9 |
| 2,000 | GP | 3.2 | -0.001 | 0.474 | 4.8 | 0.3 | 0.002 | 0.106 | 5.2 |
| 5,000 | TMG-TE | 21.2 | 0.001 | 0.042 | 5.5 | 7.2 | 0.000 | 0.048 | 5.7 |
| 5,000 | GP | 2.4 | 0.013 | 0.375 | 6.4 | 0.2 | -0.002 | 0.067 | 5.5 |
| 10,000 | TMG-TE | 18.9 | 0.000 | 0.029 | 4.9 | 5.8 | -0.001 | 0.033 | 4.4 |
| 10,000 | GP | 1.9 | -0.005 | 0.295 | 6.0 | 0.1 | 0.000 | 0.046 | 5.2 |

Notes: The model is given by $y_{it} = \alpha_i + \beta_i x_{it} + \phi_t + u_{it}$ with correlated heterogeneity, $\rho_\beta = 0.5$, and under the normalization of time effects $\tau_T' \phi = 0$. "Prop. trimmed" denotes the trimmed proportion of individual estimates. Numbers in the columns of trimmed proportion are in per cent. "TMG-TE" denotes the new trimmed mean group estimator we propose for both $\beta_0$ and $\phi$, calculated based on (3.62) and (3.64). For the TMG-TE estimator, we define one threshold on $d_i = \det(W_i' W_i)$ with $W_i = (w_{i1}, w_{i2}, \ldots, w_{iT})'$ and $w_{it} = (1, x_{it}')'$ as $a_n = \bar{d}_n n^{-\alpha}$, where $\bar{d}_n = n^{-1} \sum_{i=1}^{n} d_i$ and $\alpha = 1/3$ when $T \ge k$. "GP" includes the trimmed mean group estimator by exclusion in Graham and Powell (2012); with time effects, calculations are based on the GP-TE estimator when $T = k$ and the GP-C estimator when $T > k$.

Figure 3.6: Empirical power functions of TMG-TE, GP, and SU estimators of $\beta_0$ ($\beta_0 = 1$) with time effects in the case of correlated heterogeneity (The error processes for the $y_{it}$ and $x_{it}$ equations are chi-squared and Gaussian, and there is an interactive effect in the $x_{it}$ equation, with $T = 2$.)
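For readers who want to reproduce summary statistics of the kind reported in these tables, the following is a minimal sketch of how bias, RMSE, and empirical size can be computed from Monte Carlo output. It is an illustration under stated assumptions, not the thesis code: it assumes a two-sided test at the 5 per cent nominal level using the normal critical value 1.96, and the function name `mc_summary` and the four toy replications are hypothetical.

```python
import math


def mc_summary(estimates, std_errors, true_value, crit=1.96):
    """Bias, RMSE and empirical size (in per cent) across replications.

    Size is the rejection frequency of H0: parameter = true_value when the
    data are generated under H0 (assumed two-sided test with normal critical value).
    """
    R = len(estimates)
    errors = [b - true_value for b in estimates]
    bias = sum(errors) / R
    rmse = math.sqrt(sum(e * e for e in errors) / R)
    rejections = sum(1 for e, s in zip(errors, std_errors) if abs(e / s) > crit)
    size = 100.0 * rejections / R
    return bias, rmse, size


# Four hypothetical replications of an estimator of a parameter whose true value is 1.
estimates = [0.9, 1.1, 1.0, 1.3]
std_errors = [0.1, 0.1, 0.1, 0.1]
bias, rmse, size = mc_summary(estimates, std_errors, true_value=1.0)
print(f"bias = {bias:.3f}, RMSE = {rmse:.3f}, size = {size:.1f} per cent")
```

In this toy example only the last replication has a t-ratio above 1.96 in absolute value, so one in four replications rejects. With 2,000 replications, as in the experiments above, an empirical size close to 5 per cent indicates that the test is correctly sized.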
Figure 3.7: Empirical power functions of TMG-TE, TMG-C, and GP estimators of $\beta_0$ ($\beta_0 = 1$) with time effects in the case of correlated heterogeneity (The error processes for the $y_{it}$ and $x_{it}$ equations are chi-squared and Gaussian, and there is an interactive effect in the $x_{it}$ equation, with $T = 3$.)

3.8.2.3 Monte Carlo evidence for Hausman-type test of correlated heterogeneity

Table 3.5 reports empirical size and power of the Hausman-type test of correlated heterogeneous slopes given by (3.76). The left, middle, and right panels report the results under homogeneity, uncorrelated heterogeneity, and correlated heterogeneity in slope coefficients, respectively. In the left and middle panels, the size of the test is around the nominal level of 5%. As shown in the paper, when $x_{it}$ is strictly exogenous and the slope coefficients are mean independent of $X_i$, the FE, MG, and TMG estimators are all consistent under both implicit nulls, and thus the Hausman-type test does not have power against uncorrelated heterogeneity. However, in the presence of heterogeneous slopes that are correlated with regressors, the MG and TMG estimators are consistent when they have at least finite second moments, while the FE estimator is biased for all $T$, which gives the test power against the alternative of correlated heterogeneous slope coefficients, as shown in the right panel. Moreover, the power of the test becomes stronger as $n$ increases for a given fixed $T$.

Table 3.5: Empirical size and power of the Hausman-type test of correlated heterogeneous slope coefficients (The error processes for the $y_{it}$ and $x_{it}$ equations are chi-squared and Gaussian, and there is a heterogeneous autoregression in the $x_{it}$ equation.)

Left and middle panels: under $H_0$: $\rho_\beta = 0$, with homogeneity ($\sigma_\beta^2 = 0$) and uncorrelated heterogeneity ($\sigma_\beta^2 = 0.5$). Right panel: under $H_1$: $\rho_\beta = 0.5$, with correlated heterogeneity ($\sigma_\beta^2 = 0.5$).

| T / n | Homog. 1,000 | Homog. 2,000 | Homog. 5,000 | Homog. 10,000 | Uncorr. 1,000 | Uncorr. 2,000 | Uncorr. 5,000 | Uncorr. 10,000 | Corr. 1,000 | Corr. 2,000 | Corr. 5,000 | Corr. 10,000 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 4.1 | 4.4 | 5.3 | 5.5 | 5.6 | 5.5 | 4.8 | 5.9 | 26.0 | 39.0 | 69.8 | 90.8 |
| 3 | 4.5 | 5.7 | 5.8 | 4.8 | 5.3 | 4.5 | 5.7 | 4.7 | 41.0 | 61.5 | 91.3 | 99.6 |
| 4 | 4.9 | 5.7 | 5.6 | 4.9 | 4.8 | 4.5 | 5.5 | 4.7 | 53.5 | 77.2 | 97.8 | 100.0 |
| 5 | 5.3 | 4.6 | 4.9 | 5.2 | 4.3 | 4.6 | 4.4 | 4.3 | 63.5 | 86.1 | 99.5 | 100.0 |
| 6 | 4.8 | 5.2 | 5.5 | 5.3 | 5.2 | 4.4 | 4.6 | 4.4 | 72.0 | 91.7 | 100.0 | 100.0 |
| 8 | 5.0 | 4.8 | 5.0 | 5.3 | 4.6 | 4.9 | 6.0 | 5.7 | 81.1 | 96.2 | 100.0 | 100.0 |

Notes: The model is given by $y_{it} = \alpha_i + \beta_i x_{it} + u_{it}$. Two cases are considered under the implicit null: a case with homogeneous coefficients, $\sigma_\beta^2 = 0$, and another case with uncorrelated heterogeneous coefficients, $\sigma_\beta^2 = 0.5$ and $\rho_\beta = 0$, where $\sigma_\beta^2$ measures the degree of heterogeneity in $\beta_i$, and $\rho_\beta$ measures the degree of correlation between $\beta_i$ and the regressors. The alternative of correlated heterogeneity is given by $\sigma_\beta^2 = 0.5$ and $\rho_\beta = 0.5$. The sample data are generated separately for the three cases. The Hausman-type test statistic in (3.76) is a quadratic form in the difference between the FE and TMG estimators of the slope coefficients, $\hat{\beta}_{FE} - \hat{\beta}_{TMG}$, with the variance estimator $\hat{V}$ given by (3.74). Under the null hypothesis, the Hausman-type test statistic is asymptotically distributed as $\chi^2_{k-1}$ when $n \to \infty$. Size and power are in per cent. The number of replications is 2,000.

3.9 Empirical application

In this section, we apply our methods to estimate the average of the heterogeneous effects of household's total expenditure on calorie demand, based on a sample of households from poor rural communities in Nicaragua that participated in a pilot of the conditional cash transfer program Red de Protección Social (RPS). The panel data set includes 1,358 households whose observations are available for 2000, 2001, and 2002, and was originally studied by Graham and Powell (2012).
We consider a linear model of calorie demand that varies with household's total expenditure, allowing for correlated heterogeneous coefficients, individual fixed effects, and time effects:

$$\ln(\mathrm{Cal}_{it}) = \alpha_i + \beta_i \ln(\mathrm{Exp}_{it}) + \phi_t + u_{it}, \qquad (3.85)$$

where $\ln(\mathrm{Cal}_{it})$ denotes the logarithm of household calorie availability per capita in year $t$ of household $i$, and $\ln(\mathrm{Exp}_{it})$ denotes the logarithm of real household expenditure per capita (in thousands of 2001 cordobas) in year $t$ of household $i$. For cash transfer programs designed to resolve certain factors contributing to a poverty trap, such as malnutrition, the average effect of household's income on the household's expenditure on a certain good or service is a crucial measure, which facilitates evaluating the effectiveness of programs in the presence of heterogeneity in treatment effects. In particular, individuals may respond strategically to treatments. For example, households would allocate different portions of their budget to food consumption according to variations in their total resources, even with exogenous shocks to income levels. Since such correlation between heterogeneous treatment effects and regressors cannot be ruled out in many economic studies, it is essential to provide an estimation method for the mean of such heterogeneous effects that is consistent in this case.

To illustrate, we compare the estimation results of our TMG estimator with the FE, MG, GP, and SU estimators in four regression models: (i) $T = 2$ and $k = 2$ with no time effects; (ii) $T = 3$ and $k = 2$ with no time effects; (iii) $T = 2$ and $k = 2$ with time effects; and (iv) $T = 3$ and $k = 2$ with time effects, where $k$ is the number of regressors in the model including the intercept but not the time effects.$^{19}$ Table 3.6 reports the Hausman-type test results of correlated heterogeneity in the effects of household's total expenditure on calorie demand, which clearly reject the null hypothesis.
We have shown in Monte Carlo experiments that the test only has power against the alternative of correlated heterogeneity. Therefore, these results provide strong evidence of non-negligible correlation between the heterogeneous treatment effects and the regressors.

Table 3.7 reports estimates of the average effect of household's total expenditure on calorie demand. Estimation results of our TMG estimator are shown in column (3). Estimation results of the FE, MG, GP, and SU$^{20}$ estimators are reported in columns (1), (2), (4), and (5), respectively. The FE estimates are greater than the estimates of all the other mean group estimators, without or with trimming. Combined with the result of the Hausman-type test, this suggests that the FE estimates are upward biased, which exaggerates the estimated income elasticity of calorie consumption. Thus, if social planners rely on the FE estimates, they may be over-optimistic about the extent of malnutrition cured by the cash transfer program. When $T = k$, the substantial magnitude of the estimated standard errors of the MG estimator demonstrates the issue of irregular identification and highlights the necessity of trimming. When $T > k$, the magnitudes of the estimate and the estimated standard errors for the MG estimator are more reasonable. In both cases, the MG estimate forms a lower bound for the other estimates given a regression model, as it is the most affected by outlying individual estimates. In the third column, we can see that for estimates with two and three waves, the levels of the TMG estimates are relatively stable, with the largest proportion of individual estimates being trimmed, and the TMG estimates also have the smallest estimated standard errors compared with the MG, GP, and SU estimates. Moreover, note that the SU estimator is proposed specifically for eliminating the finite-sample bias of the GP estimator. Since our TMG estimates are very close to the estimates of the SU method, our estimation method effectively eliminates the small-sample bias due to trimming.

Table 3.8 reports estimates of $\beta_0$ and $\phi$ when time effects are included in the model. As shown in the previous sections, we consider two estimators of time effects, for $T \ge k$ and for $T > k$. The MG and TMG estimators combined with these two estimators of time effects are reported in columns (2), (3), (4), and (5), respectively. The previous comparisons of $\hat{\beta}_0$ between different estimators still hold. When $T = 3$, the results of the TMG-TE and TMG-C estimators of time effects are very similar. Moreover, using $\hat{\phi}$ or $\hat{\phi}_C$ does not make a difference to the (trimmed) mean group estimates of the average effect. When $T = 2$, adding time effects to the model does not change the estimated results of the average effect. With $T = 3$, the estimated time effects are more substantial, which implies that there was a negative shock in 2001. Thus, after removing the negative common time effect, the average effect of the household's total expenditure on calorie demand was greater over a longer time period.

19 Graham and Powell (2012) used observations of 2001 and 2002 to showcase irregular identification of the mean group estimators when $T = k = 2$, which we follow here.

20 We are grateful to the authors, Sasaki and Ura (2021), for making available the code for the SU estimator, which can be used only in the case $T = k = 2$.

Table 3.6: Hausman-type test of correlated heterogeneity applied to the average effect of household's total expenditure on calorie demand without time effects

| | (i) T = 2 and k = 2 | (ii) T = 3 and k = 2 |
|---|---|---|
| Hausman-type test | 5.918 | 7.626 |
| p-value | 0.015 | 0.006 |
| Prop. trimmed (×100) | 27.1 | 10.9 |

Notes: Estimates are based on the panel of $n$ = 1,358 households in the 2001 and 2002 samples when $T = 2$, and the 2000, 2001 and 2002 samples when $T = 3$. "Prop. trimmed" denotes the trimmed proportion of individual estimates in the TMG estimator proposed in the paper.
The Hausman-type test statistic is constructed as a quadratic form in the difference between the FE estimator of panel data models and the TMG estimator, $\hat{\beta}_{FE} - \hat{\beta}_{TMG}$, with the variance estimator $\hat{V}$ given by (3.74). Under the null, the Hausman-type test statistic is asymptotically distributed as $\chi^2_{k-1}$, as $n \to \infty$.

Table 3.7: Estimates of the average effect of household's total expenditure on calorie demand without time effects

| | (1) FE | (2) MG | (3) TMG | (4) GP$^{21}$ | (5) SU |
|---|---|---|---|---|---|
| (i) T = 2 and k = 2 | | | | | |
| $\hat{\beta}_0$ | 0.6568 | 0.2444 | 0.5623 | 0.4549 | 0.5582 |
| (s.e.) | (0.0287) | (0.9684) | (0.0425) | (0.1003) | (0.1152) |
| Prop. trimmed (×100) | n/a | 0 | 27.1 | 3.8 | 9.5 |
| (ii) T = 3 and k = 2 | | | | | |
| $\hat{\beta}_0$ | 0.6588 | 0.5498 | 0.5900 | 0.5938 | ... |
| (s.e.) | (0.0233) | (0.0474) | (0.0284) | (0.0396) | ... |
| Prop. trimmed (×100) | n/a | 0 | 10.9 | 0.4 | ... |

Notes: Estimates are based on the panel of $n$ = 1,358 households in the 2001 and 2002 samples when $T = 2$, and the 2000, 2001 and 2002 samples when $T = 3$. "Prop. trimmed" denotes the trimmed proportion of individual estimates in mean group estimators without or with trimming, whose numbers are in per cent. Underneath the estimates, the numbers in brackets are the respective estimated standard errors. "FE" denotes the fixed effects estimator of panel data models. "MG" denotes the mean group estimator without trimming. "GP" denotes the trimmed mean group estimator by exclusion in Graham and Powell (2012). "SU" denotes the trimmed mean group estimator with local polynomial regression in Sasaki and Ura (2021). "TMG" denotes the new trimmed mean group estimator we propose in the paper. We define one threshold on $d_i = \det(W_i' W_i)$ with $W_i = (w_{i1}, w_{i2}, \ldots, w_{iT})'$ and $w_{it} = (1, x_{it}')'$ for our TMG estimator as $a_n = \bar{d}_n n^{-\alpha}$, where $\bar{d}_n = n^{-1} \sum_{i=1}^{n} d_i$ and $\alpha = 1/3$, regardless of $T = k$ or $T > k$. "n/a" denotes not applicable. "..." denotes that the estimation algorithms are not available.

21 The standard errors of the GP estimator are estimated by equation (30) on p. 2126 in Graham and Powell (2012).
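The p-values in Table 3.6 can be checked directly from the reported statistics. With $k = 2$ the limiting distribution is $\chi^2_{k-1} = \chi^2_1$, and for one degree of freedom the upper tail probability has the closed form $P(\chi^2_1 > x) = \mathrm{erfc}(\sqrt{x/2})$. The sketch below uses only the Python standard library; the helper name `chi2_sf_1df` is hypothetical.

```python
import math


def chi2_sf_1df(x):
    """Upper tail probability P(X > x) for X ~ chi-squared with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


# Hausman-type statistics reported in Table 3.6 (k = 2, so k - 1 = 1 degree of freedom).
p_T2 = chi2_sf_1df(5.918)  # model (i): T = 2
p_T3 = chi2_sf_1df(7.626)  # model (ii): T = 3
print(f"p-value (T = 2): {p_T2:.3f}")
print(f"p-value (T = 3): {p_T3:.3f}")
```

Both p-values are below 0.05, in line with the rejection of the null reported in the text.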
Table 3.8: Estimates of the average effect of household's total expenditure on calorie demand and time effects

| | (1) FE-TE | (2) MG-TE | (3) MG-C | (4) TMG-TE | (5) TMG-C | (6) GP | (7) SU |
|---|---|---|---|---|---|---|---|
| (iii) T = 2 and k = 2 with time effects | | | | | | | |
| $\hat{\beta}_0$ | 0.6554 | 0.1338 | n/a | 0.5612 | n/a | 0.4629 | 0.5588 |
| (s.e.) | (0.0284) | (1.0234) | n/a | (0.0424) | n/a | (0.1025) | (0.1154) |
| $\hat{\phi}_{2001}$ | -0.0172 | -0.0206 | n/a | -0.0178 | n/a | 0.0181 | ... |
| (s.e.) | (0.0063) | (0.0100) | n/a | (0.0064) | n/a | (0.0296) | ... |
| Prop. trimmed (×100) | n/a | 0 | n/a | 27.1 | n/a | 3.8 | 9.5 |
| (iv) T = 3 and k = 2 with time effects | | | | | | | |
| $\hat{\beta}_0$ | 0.6968 | 0.6013 | 0.5979 | 0.6370 | 0.6338 | 0.6333 | ... |
| (s.e.) | (0.0211) | (0.0433) | (0.0430) | (0.0263) | (0.0261) | (0.0361) | ... |
| $\hat{\phi}_{2001}$ | -0.1793 | -0.1744 | -0.1636 | -0.1762 | -0.1636 | -0.1636 | ... |
| (s.e.) | (0.0087) | (0.0091) | (0.0123) | (0.0088) | (0.0123) | (0.0123) | ... |
| $\hat{\phi}_{2002}$ | 0.0727 | 0.0697 | 0.0682 | 0.0708 | 0.0682 | 0.0682 | ... |
| (s.e.) | (0.0080) | (0.0081) | (0.0108) | (0.0080) | (0.0108) | (0.0108) | ... |
| Prop. trimmed (×100) | n/a | 0 | 0 | 10.9 | 10.9 | 0.4 | ... |

Notes: Estimates are based on the panel of $n$ = 1,358 households in the 2001 and 2002 samples when $T = 2$, and the 2000, 2001 and 2002 samples when $T = 3$, under the normalization of time effects $\tau_T' \phi = 0$. "Prop. trimmed" denotes the trimmed proportion of individual estimates in MG estimators without or with trimming. The numbers in brackets are the respective estimated standard errors. "FE-TE" denotes the two-way fixed effects estimator of panel regressions. "GP" denotes the estimator proposed by Graham and Powell (2012). "SU" denotes the estimator proposed by Sasaki and Ura (2021). "TMG-TE" denotes the new trimmed mean group estimator we propose in the paper, where the time effects are estimated based on the moment conditions we construct for $T \ge k$. "TMG-C" denotes the new trimmed mean group estimator we propose in the paper, where Chamberlain's (1992) estimator of the time effects is used, which is only applicable when $T > k$. We use the same notation for the MG estimator with time effects. We define one threshold on $d_i = \det(W_i' W_i)$ with $W_i = (w_{i1}, w_{i2}, \ldots, w_{iT})'$ and $w_{it} = (1, x_{it}')'$ for our TMG, TMG-TE, and TMG-C estimators as $a_n = \bar{d}_n n^{-\alpha}$, where $\bar{d}_n = n^{-1} \sum_{i=1}^{n} d_i$ and $\alpha = 1/3$, regardless of whether $T = k$ or $T > k$, and whether time effects are included in the model or not. "n/a" denotes not applicable. "..." denotes that the estimation algorithms are not available.

3.10 Conclusions

This paper studies estimation of average treatment effects in panel data models with possibly correlated heterogeneous coefficients, when the number of cross-sectional units is large but the number of time periods can be as small as the number of regressors. The FE estimator is inconsistent under correlated heterogeneity, and the MG estimator can have unbounded first or second moments in such ultra-short panels. We therefore propose a new trimmed mean group estimator, where the trimming process is derived from a careful examination of the bias/efficiency trade-off in the asymptotic distribution. Conditions under which the TMG estimator is consistent and asymptotically normally distributed are provided. We also develop a new estimator for time effects when $T \ge k$, assuming the dependence between heterogeneous slope coefficients and regressors is time-invariant. Moreover, based on the difference between the TMG and FE estimators, a Hausman-type test is constructed to detect the presence of correlated slope heterogeneity, which invalidates the use of FE estimators in some empirical applications.

Through Monte Carlo experiments, we provide evidence for the bias in the FE estimator under correlated heterogeneity. The Monte Carlo experiments also illustrate that the TMG estimator has desirable finite sample performance under various processes of regressors and errors. In particular, since the TMG estimator effectively utilizes information in the data and trades off bias and efficiency, it has the smallest RMSE, correct size, and strongest power compared with the other trimmed estimators.
The power of the Hausman-type test against correlated heterogeneity is also demonstrated by Monte Carlo evidence, even when $T$ is ultra short. When the TMG estimator and the Hausman-type test of correlated heterogeneity are applied to a panel of households in poor rural communities in Nicaragua, the results provide clear evidence of correlated heterogeneity in the average effect of the household's total expenditure on calorie demand. The TMG estimator and the Hausman-type test can be readily applied to panel data models with time effects. Further extensions and generalizations are desirable and implementable.

Chapter 4

Heterogeneous Autoregressions in Short T Panel Data Models$^{1}$

1 This chapter is joint work with M. Hashem Pesaran.

4.1 Introduction

The importance of cross-sectional heterogeneity in panel regressions is becoming increasingly recognized in the literature. When the time dimension of the panel, $T$, is short, significant advances have been made in the case of random coefficient models with strictly exogenous regressors, for example, Chamberlain (1992), Wooldridge (2005), Arellano and Bonhomme (2012), Bonhomme (2012) and Graham and Powell (2012). A trimmed version of the mean group estimator proposed by Pesaran and Smith (1995) can also be applied to ultra short $T$ panels when the regressors are strictly exogenous. See Pesaran and Yang (2023).

In contrast, there are only a few papers that consider the estimation of heterogeneous dynamic panels when the time dimension is short. There are some limitations to applying existing estimation methods to such heterogeneous short $T$ dynamic panels. The generalized method of moments (GMM) estimators applied after first-differencing by Anderson and Hsiao (1981, 1982), Arellano and Bond (1991), Blundell and Bond (1998), and Chudik and Pesaran (2021) allow for intercept heterogeneity but not for heterogeneity in the autoregressive (AR) coefficients which, as shown in this paper, leads to biased estimates and distorted inference. Gu and Koenker (2017) and Liu (2023) consider estimation of panel AR(1) models with exogenous regressors using Bayesian techniques. While they assume random coefficients on strictly exogenous regressors, they still impose homogeneity on the AR coefficients. The mean group and hierarchical Bayesian estimators proposed by Hsiao et al. (1999) allow for heterogeneity but require that $T$ be reasonably large relative to $n$. Baltagi et al. (2008) provide a review of the estimators for heterogeneous linear panel data models with a moderate size $T$.

With reasonably large $T$, Okui and Yanagi (2019, 2020) propose non-parametric estimators for the distribution of the sample mean, autocovariances, and their density functions. Also for moderate values of $T$, analytical, bootstrap, and jackknife bias correction approaches have been proposed to deal with the small sample bias of the mean group and other related estimators. See Pesaran and Zhao (1999) on the mean group estimator, as well as Okui and Yanagi (2019, 2020). Even with bias corrections, $n$ cannot be too large compared with $T$, since valid inference based on the asymptotic distribution often requires $n T^{-c} \to 0$ for some constant $c > 2$. In short, none of the above approaches is appropriate, and they can lead to seriously biased estimates and distorted inference, when $T$ is small and fixed with $n \to \infty$.

Nonetheless, heterogeneity in dynamics can play an important role in many empirical studies using panel data models with moderately short $T$, for example, earnings dynamics studied by Meghir and Pistaferri (2004), unemployment dynamics by Browning and Carro (2014), and firm growth by Liu (2023). Parametric approaches are widely used to take account of dynamic heterogeneity, particularly in the analyses of earnings dynamics using the Panel Study of Income Dynamics (PSID) data.
Meghir and Pistaferri (2004) categorized individuals into three educational groups and assumed that autoregressive coefficients are heterogeneous across groups but homogeneous within groups.$^{2}$ Browning et al. (2010) focused on white males with a high-school degree and showed that allowing for heterogeneity makes a substantial difference to the estimates. Alan et al. (2018) developed a structural model allowing for heterogeneous parameters in both consumption and income dynamic processes, and used mean group estimation to deal with heterogeneity. Browning and Carro (2014) studied the unemployment dynamics of Danish workers and found evidence of heterogeneity in individual unemployment dynamics. These studies should be commended for their explicit treatment of heterogeneity, yet many empirical studies abstract from heterogeneity in dynamics, not because it is absent, but because it is difficult to accommodate in dynamic panels when $T$ is short.

2 However, the within-group homogeneity assumption is not supported by the data. See Section 28.11.8 of Pesaran (2015).

The paper first shows that existing GMM estimators of panel AR(1) models are asymptotically biased under heterogeneity of the AR(1) coefficients, $\phi_i$, and derives analytical expressions for their bias in simple cases. It then proposes estimators for the moments of $\phi_i$, using cross-sectional averages of the autocorrelation coefficients of first differences, rather than the cross-sectional average of the estimates of $\phi_i$ used in mean group estimation. In terms of the estimation approach, the most relevant paper to ours is by Robinson (1978), who considered a random coefficient AR(1) model without fixed effects. He proposed identifying the moments of $\phi_i$ as functions of the autocovariances of different orders, which he then used to estimate the unknown parameters of an assumed parametric distribution for $\phi_i$. In our analysis, we allow for both individual fixed effects and heterogeneous AR coefficients.
We eliminate the fixed effects by first differencing, then derive two estimators for the moments $E(\phi_i^s)$ for $s = 1, 2, \ldots$: a relatively simple estimator based on autocorrelations of first differences, denoted by FDAC, and a generalized method of moments estimator based on autocovariances of first differences, denoted by HetroGMM. We do not make any assumptions about the fixed effects, $\alpha_i$, and allow them to have arbitrary correlations with $\phi_i$. We do not need to impose a priori assumptions on the joint distribution of heterogeneous parameters conditional on initial observations or on the distribution of errors, but require the underlying AR(1) processes to be stationary. We also provide estimators for the distribution of $\phi_i$ assuming its underlying distribution is categorical. It is possible to extend our analysis to higher-order panel AR processes, and possibly to dynamic panels with exogenous regressors. However, these important extensions are outside the scope of the present paper.

We compare our proposed estimator to the kernel-weighting likelihood estimator by Mavroeidis et al. (2015), which we refer to as the MSW estimator. Based on the deconvolution technique, Mavroeidis et al. (2015) propose a likelihood estimator for the cross-sectional distribution of $\phi_i$ conditional on the initial observations, $y_{i1}$. Assuming independently distributed Gaussian errors with cross-sectional heteroskedasticity, MSW show that the unknown distribution of heterogeneous coefficients can be identified provided the linear operator that maps the unknown distribution to the joint distribution of data is complete (or "invertible"). Mavroeidis et al. (2015) provide an estimation algorithm for the parametric version of their estimator, assuming the heterogeneous coefficients ($\alpha_i$ and $\phi_i$) follow a multivariate normal distribution. The estimation algorithm becomes computationally very demanding if the parametric assumptions about the distribution of $\phi_i$ are relaxed.
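To see why autocorrelations of first differences carry information about the AR coefficient, consider a single stationary AR(1) process $y_t = \alpha + \phi y_{t-1} + u_t$ with $|\phi| < 1$ and error variance $\sigma^2$. Standard calculations give $\mathrm{Var}(\Delta y_t) = 2\sigma^2/(1+\phi)$ and $\mathrm{Cov}(\Delta y_t, \Delta y_{t-1}) = -\sigma^2(1-\phi)/(1+\phi)$, so the first-order autocorrelation of $\Delta y_t$ is $-(1-\phi)/2$, which depends on neither $\alpha$ nor $\sigma^2$, and $\phi = 1 + 2\rho_{\Delta}(1)$. The simulation below is only a check of this population identity for one unit with a long time series; it is not the FDAC estimator itself, which is designed for short $T$ and large $n$. The function names and parameter values are hypothetical.

```python
import random


def simulate_ar1(phi, alpha, T, burn_in=200, seed=0):
    """Simulate y_t = alpha + phi * y_{t-1} + u_t with standard normal errors."""
    rng = random.Random(seed)
    y = alpha / (1.0 - phi)  # start at the stationary mean
    path = []
    for t in range(burn_in + T):
        y = alpha + phi * y + rng.gauss(0.0, 1.0)
        if t >= burn_in:  # discard the burn-in draws
            path.append(y)
    return path


def first_difference_autocorr(y):
    """First-order sample autocorrelation of the first differences of y."""
    dy = [b - a for a, b in zip(y, y[1:])]
    num = sum(u * v for u, v in zip(dy[1:], dy[:-1]))
    den = sum(u * u for u in dy)
    return num / den


phi_true = 0.5
y = simulate_ar1(phi_true, alpha=2.0, T=100_000)
rho_1 = first_difference_autocorr(y)
phi_implied = 1.0 + 2.0 * rho_1  # invert rho_1 = -(1 - phi) / 2
print(f"autocorrelation of first differences: {rho_1:.3f}")
print(f"implied AR coefficient: {phi_implied:.3f}")
```

Because first differencing removes $\alpha$ and the ratio removes $\sigma^2$, the identity holds unit by unit, which is what makes cross-sectional averaging of such autocorrelations informative about the moments of $\phi_i$.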
There are also Bayesian approaches in the literature that we do not pursue in this paper. Liu et al. (2017) provide a recent example that builds on Hsiao et al. (1999), and develop a hierarchical Bayesian approach for panel AR(1) models with correlated random coefficients. [3]

[Footnote 3] Liu et al. (2017) also consider non-stationary initial values but require them to be normally distributed. See p. 1545 in Liu et al. (2017).

We investigate the small sample properties of the proposed FDAC estimators of $E(\phi_i)$ and $E(\phi_i^2)$ using Monte Carlo experiments. The simulations show that the relatively simple FDAC estimator performs better than the HetroGMM estimator uniformly across different sample sizes, and is robust to non-Gaussian errors and conditional error heteroskedasticity. The latter is particularly relevant as heteroskedastic error variances play an important role in empirical studies of earnings dynamics. See, for example, MaCurdy (1982), Abowd and Card (1989), and Gu and Koenker (2017). We then compare the small sample properties of the FDAC estimator of $E(\phi_i)$ with a number of GMM estimators derived under homogeneity (denoted by HomoGMM), including the popular Arellano and Bond (1991), AB, and Blundell and Bond (1998), BB, estimators. The simulation results confirm the neglected heterogeneity bias of HomoGMM estimators, and show that the FDAC estimator of $E(\phi_i)$ performs well for all values of $T = 4, 6, 10$ and $n = 100$, $1{,}000$ and $5{,}000$, so long as the underlying processes are stationary. This is true for bias, root mean square errors, and the size of the tests of the hypotheses involving the first and the second order moments of $\phi_i$. It is, however, worth highlighting that the FDAC estimator can result in biased estimates under heterogeneous AR(1) coefficients if there are major departures from the stationary distribution.
Using Monte Carlo experiments we also provide a limited comparison of MSW and FDAC estimators, and find that the small sample properties of the MSW estimator are very sensitive to the degree of heterogeneity and the underlying distribution of $\phi_i$. The MSW estimator can be severely biased when the degree of heterogeneity is relatively high. The small sample properties of the MSW estimator also depend on the assumed distribution of $\phi_i$. The plug-in estimator for the parameters of the categorical distribution is also shown to be large-$n$ consistent, with the root mean squared errors shrinking steadily in $n$. But precise estimation of these parameters requires very large values of $n$, since they are functions of the inverse of estimated variances, which could be close to zero in finite samples. The fact that very large values of $n$ are required for reliable estimation of the categorical distribution has also been observed by Gao and Pesaran (2023) in the context of pure cross section regressions with heterogeneous coefficients.

We also provide an empirical application using five and ten yearly samples from the PSID dataset over the 1976-1995 period to estimate the persistence of real earnings. To this end we extend the basic panel AR(1) model to allow for linear trends. Following the empirical literature we report estimates for three educational categories (high school dropouts, high school graduates, and college graduates) and all three categories combined. We find comparable estimates for the linear trend coefficients across sub-periods and educational categories, around 2 per cent per annum. The FDAC estimates of mean persistence for the sub-periods 1991-1995 and 1986-1995 fall in the range of 0.570-0.734, and tend to rise with the level of educational attainment, with college graduates showing the highest degree of persistence. No such patterns are observed for other estimates, which are around 0.3, 0.9 and 0.41 for AB, BB and MSW estimators, respectively.
The FDAC estimates of $\mathrm{Var}(\phi_i)$ for all three categories combined are statistically significant and are given by 0.100 (0.042) and 0.129 (0.023) for the sub-periods 1991-1995 and 1986-1995, respectively, providing further evidence of heterogeneity in real earnings persistence.

The rest of the paper is set out as follows. Section 4.2 sets out the model and establishes the identification conditions for the moments of the heterogeneous autoregressive coefficients. Section 4.3 shows that standard GMM estimators of dynamic panels are biased when the AR(1) coefficients are heterogeneous. Section 4.4 derives conditions under which the moments of $\phi_i$ can be identified. Section 4.5 considers group heterogeneity and how the group parameters can be estimated. Section 4.6 proposes the FDAC and HetroGMM estimators for moments of the heterogeneous autoregressive coefficients. The respective asymptotic distributions are also derived. Section 4.7 evaluates the performance of the FDAC, GMM (under homogeneity), and MSW estimators by Monte Carlo simulations. Section 4.8 presents the empirical application results for the earnings dynamic process. Section 4.9 concludes. Additional Monte Carlo evidence and empirical results can be found in Appendix C.

4.2 Model and assumptions

Consider the following first-order autoregressive panel data model

$$ y_{it} = \alpha_i + \phi_i y_{i,t-1} + u_{it}, \qquad (4.1) $$

where $y_{it}$ is observed across the $n$ cross section units $i = 1, 2, \ldots, n$ over the time periods $t = 2, 3, \ldots, T$ with a total of $T$ observations $\{y_{i1}, y_{i2}, \ldots, y_{iT}\}$. We introduce the following assumptions.

Assumption 17 (errors) (a) The idiosyncratic errors, $u_{it} \sim IID(0, \sigma_i^2)$, are cross-sectionally and serially independent over $i$ and $t$, and $E(\sigma_i^2) = \sigma^2$ for all $i$.
Assumption 18 (autoregressive coefficients) (a) The autoregressive coefficients, $\phi_i$, for $i = 1, 2, \ldots, n$ are independent draws from the probability density function $f(\phi \mid \boldsymbol{\vartheta})$, defined over the bounded support, $|\phi| < \kappa < 1$, with mean $\mu_\phi$ and variance $\sigma_\phi^2 \geq 0$; and $E|\phi|^s < C \varrho^s$, for all $s = 1, 2, \ldots$, some $C > 0$, and $0 < \varrho < 1$. (b) $\phi_i$ are distributed independently of the error variances, $\sigma_i^2$. (c) $\phi_i$ and $u_{it}$ are distributed independently.

Assumption 19 (initialization) The dynamic processes in (4.1) are started from a long time prior to date $t = 1$.

Assumption 20 (individual effects) The individual specific effects, $\alpha_i$, are bounded, $\sup_i |\alpha_i| < C$, but could be correlated with $\phi_i$ and/or $u_{it}$.

Assumption 17 is standard in short $T$ dynamic panels, but it rules out the possibility of unconditional time series heteroskedasticity, namely it does not allow $E(u_{it}^2)$ to differ across $t$. However, this assumption does not rule out conditional heteroskedasticity, such as GARCH effects. Assumptions 18 and 19 are required for the identification of the moments of $\phi_i$, which are the parameters of interest. Assumption 20 imposes minimal restrictions on the fixed effects. Admittedly, the assumption that all processes $\{y_{it}, i = 1, 2, \ldots, n\}$ are initialized from a distant past is restrictive, but its relaxation is beyond the scope of the present paper.

Before introducing our identification and estimation strategy, we illustrate the asymptotic bias of some existing GMM estimators of $E(\phi_i)$ that neglect the heterogeneity of $\phi_i$ over $i$ and proceed assuming that homogeneity of $\phi_i$ is a reasonably satisfactory working assumption.

4.3 Neglected heterogeneity bias

In the homogeneous case where $\phi_i = \phi$ for all $i$, $\phi$ can be consistently estimated by the method of moments after eliminating $\alpha_i$ by first-differencing (4.1). We begin our analysis of the heterogeneous case by showing that the standard GMM estimators proposed in the literature are biased when $\phi_i$ are heterogeneous. The extent of the bias depends on the degree of heterogeneity.
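To simplify the exposition, and without loss of generality, we consider the simple case where $T = 4$, the minimum required to identify $E(\phi_i)$. For example, in the case of the Anderson-Hsiao (AH) estimator, $\hat{\phi}_{AH} = \left( \sum_{i=1}^n \Delta y_{i4} \Delta y_{i2} \right) / \left( \sum_{i=1}^n \Delta y_{i3} \Delta y_{i2} \right)$, using (4.1) we have

$$ \hat{\phi}_{AH} = \frac{\sum_{i=1}^n \phi_i \Delta y_{i3} \Delta y_{i2}}{\sum_{i=1}^n \Delta y_{i3} \Delta y_{i2}} + \frac{\sum_{i=1}^n \Delta u_{i4} \Delta y_{i2}}{\sum_{i=1}^n \Delta y_{i3} \Delta y_{i2}} \to_p \frac{\lim_{n \to \infty} n^{-1} \sum_{i=1}^n E\left( \phi_i \Delta y_{i3} \Delta y_{i2} \right)}{\lim_{n \to \infty} n^{-1} \sum_{i=1}^n E\left( \Delta y_{i3} \Delta y_{i2} \right)}. \qquad (4.2) $$

Also under Assumptions 18 and 19, for a given $\phi_i$ we have $\Delta y_{it} = \sum_{\ell=0}^{\infty} \phi_i^{\ell} \Delta u_{i,t-\ell}$, which can be written equivalently in terms of $u_{i,t-\ell}$ as

$$ \Delta y_{it} = u_{it} - (1 - \phi_i) \sum_{\ell=1}^{\infty} \phi_i^{\ell-1} u_{i,t-\ell}. \qquad (4.3) $$

It is now easily seen that, for $h > 0$, we have

$$ E\left( \Delta y_{it} \Delta y_{i,t-h} \mid \phi_i, \sigma_i^2 \right) = E\left[ \left( u_{it} - (1-\phi_i) \sum_{\ell=1}^{\infty} \phi_i^{\ell-1} u_{i,t-\ell} \right) \left( u_{i,t-h} - (1-\phi_i) \sum_{\ell=1}^{\infty} \phi_i^{\ell-1} u_{i,t-\ell-h} \right) \,\Big|\, \phi_i, \sigma_i^2 \right] = -\sigma_i^2 \frac{(1-\phi_i)\,\phi_i^{h-1}}{1+\phi_i}, $$

and hence

$$ E\left( \Delta y_{it} \Delta y_{i,t-h} \right) = -E\left[ \sigma_i^2 \left( \frac{1-\phi_i}{1+\phi_i} \right) \phi_i^{h-1} \right], \text{ for } h = 1, 2, \ldots \qquad (4.4) $$

Also $E\left( \phi_i \Delta y_{it} \Delta y_{i,t-h} \right) = -E\left[ \sigma_i^2 \left( \frac{1-\phi_i}{1+\phi_i} \right) \phi_i^{h} \right]$. Using these results in (4.2) for $t = 3$ and $h = 1$ now yields (as $n \to \infty$)

$$ \hat{\phi}_{AH} \to_p \frac{\lim_{n \to \infty} n^{-1} \sum_{i=1}^n E\left( \phi_i \Delta y_{i3} \Delta y_{i2} \right)}{\lim_{n \to \infty} n^{-1} \sum_{i=1}^n E\left( \Delta y_{i3} \Delta y_{i2} \right)} = \frac{E\left[ \sigma_i^2 \left( \frac{1-\phi_i}{1+\phi_i} \right) \phi_i \right]}{E\left[ \sigma_i^2 \left( \frac{1-\phi_i}{1+\phi_i} \right) \right]}. $$

In the homogeneous case ($\phi_i = \phi$), we have $\hat{\phi}_{AH} \to_p \phi$, as expected. Under heterogeneity, $\hat{\phi}_{AH}$ is clearly not a consistent estimator of $E(\phi_i)$. The extent of the bias depends on the joint distribution of $\phi_i$ and $\sigma_i^2$. When $\phi_i$ and $\sigma_i^2$ are independently distributed we obtain the following expression for the asymptotic bias of $\hat{\phi}_{AH}$ [4]

$$ \operatorname{plim}_{n \to \infty} \left[ \hat{\phi}_{AH} - E(\phi_i) \right] = \frac{E\left\{ \left( \frac{1-\phi_i}{1+\phi_i} \right) \left[ \phi_i - E(\phi_i) \right] \right\}}{E\left( \frac{1-\phi_i}{1+\phi_i} \right)} = \frac{2 \left[ 1 + E(\phi_i) \right] \left[ \frac{1}{1+E(\phi_i)} - E\left( \frac{1}{1+\phi_i} \right) \right]}{E\left( \frac{1-\phi_i}{1+\phi_i} \right)}. $$

Since $\sup_i |\phi_i| < 1$, then $1 + E(\phi_i) \geq 0$ and $E\left( \frac{1-\phi_i}{1+\phi_i} \right) > 0$, and since $1/(1+\phi_i)$ is a convex function of $\phi_i$, then by Jensen's inequality $E\left( \frac{1}{1+\phi_i} \right) \geq \frac{1}{1+E(\phi_i)}$, and it follows that $\operatorname{plim}_{n \to \infty} \hat{\phi}_{AH} \leq E(\phi_i)$, namely we expect the AH estimator to be downward biased. The equality holds if and only if $\phi_i = E(\phi_i) = \phi$ for all $i$. The magnitude of the asymptotic bias depends on the distribution of $\phi_i$.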
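For example, suppose $\phi_i$ are random draws from a uniform distribution centered at $E(\phi_i) = \mu_\phi$, then we have $\phi_i = \mu_\phi + v_i$ where $v_i \sim IID\,\mathrm{Uniform}(-a, a)$, for $a > 0$. To ensure that $\sup_i |\phi_i| < 1$ we also require that $a < 1 - |\mu_\phi|$. The homogeneous case arises when $a = 0$. The degree of heterogeneity of $\phi_i$ measured by its standard deviation is $a/\sqrt{3}$. For this distribution we obtain

$$ \operatorname{plim}_{n \to \infty} \left[ \hat{\phi}_{AH} - E(\phi_i) \right] = \frac{2 (1 + \mu_\phi) \left[ \frac{1}{1+\mu_\phi} - \frac{1}{2a} \ln\left( \frac{1+\mu_\phi+a}{1+\mu_\phi-a} \right) \right]}{\frac{1}{a} \ln\left( \frac{1+\mu_\phi+a}{1+\mu_\phi-a} \right) - 1} = \frac{2 (1 + \mu_\phi) \left[ \epsilon - \frac{1}{2} \ln\left( \frac{1+\epsilon}{1-\epsilon} \right) \right]}{\ln\left( \frac{1+\epsilon}{1-\epsilon} \right) - a}, $$

where $\epsilon = a/(1 + \mu_\phi) < 1$. It is easily seen that $\hat{\phi}_{AH} - E(\phi_i) \to 0$ with $\epsilon \to 0$. The magnitude of the asymptotic bias of the AH estimator for $\mu_\phi = 0.4$ and $a = 0.3, 0.5$ will be around $-0.070$ and $-0.186$, respectively. The asymptotic bias of AB and BB estimators under heterogeneous slopes is derived in Section C.2 of Appendix C. Asymptotic bias, even if small, can lead to substantial size distortions when $n$ is sufficiently large. See Section 4.7.3 for Monte Carlo evidence on the bias and size distortions of AH and other (homo) GMM estimators.

[Footnote 4] Note that $(1-\phi_i)/(1+\phi_i) = 2/(1+\phi_i) - 1$ and $\phi_i (1-\phi_i)/(1+\phi_i) = 2\left[ 1 - 1/(1+\phi_i) \right] - \phi_i$.

4.4 Identification of moments of the AR coefficients

Based on the representation (4.3), moments of $\phi_i$ can be identified by constructing moment equations, where moments of $\phi_i$ are functions of covariances of transformed data. First, the conditional second moment of the first-differenced process can be calculated as

$$ E\left[ (\Delta y_{it})^2 \mid \phi_i, \sigma_i^2 \right] = \frac{2 \sigma_i^2}{1 + \phi_i}. $$

Since by Assumption 18(b), $\sigma_i^2$ and $\phi_i$ are distributed independently, we then have

$$ E\left[ (\Delta y_{it})^2 \right] = E\left( \frac{2 \sigma_i^2}{1 + \phi_i} \right) = 2 E\left( \sigma_i^2 \right) E\left( \frac{1}{1 + \phi_i} \right) = 2 \sigma^2 E\left( \frac{1}{1 + \phi_i} \right). \qquad (4.5) $$

Similarly, using (4.4)

$$ E\left( \Delta y_{it} \Delta y_{i,t-h} \right) = -E\left[ \sigma_i^2 \left( \frac{1-\phi_i}{1+\phi_i} \right) \phi_i^{h-1} \right] = -\sigma^2 E\left[ \left( \frac{1-\phi_i}{1+\phi_i} \right) \phi_i^{h-1} \right]. \qquad (4.6) $$

It is also instructive to write the above expressions more explicitly as

$$ E\left[ (\Delta y_{it})^2 \right] = 2 \sigma^2 \int_{|\phi| < \kappa} \left( \frac{1}{1+\phi} \right) f(\phi \mid \boldsymbol{\vartheta}) \, d\phi, \qquad (4.7) $$

and

$$ E\left( \Delta y_{it} \Delta y_{i,t-h} \right) = -\sigma^2 \int_{|\phi| < \kappa} \left( \frac{1-\phi}{1+\phi} \right) \phi^{h-1} f(\phi \mid \boldsymbol{\vartheta}) \, d\phi. \qquad (4.8) $$

As can be seen, $E\left[ (\Delta y_{it})^2 \right]$ and $E\left( \Delta y_{it} \Delta y_{i,t-h} \right)$ are general functions of $\boldsymbol{\vartheta}$ through the probability density $f(\phi \mid \boldsymbol{\vartheta})$.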
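Returning to the uniform example of Section 4.3, the asymptotic bias of the AH estimator depends on the distribution of the AR coefficients only through $E[1/(1+\phi_i)]$, which is available in closed form for the uniform case. The following is a minimal sketch, assuming independence of the AR coefficients and the error variances; the function name `ah_asymptotic_bias` is hypothetical and introduced only for illustration.

```python
import math


def ah_asymptotic_bias(mu, a):
    """plim(phi_hat_AH) - E(phi_i) when phi_i ~ Uniform(mu - a, mu + a).

    Illustrative helper (not from the source); assumes phi_i is independent
    of the error variances and that a < 1 - |mu|.
    """
    if a == 0.0:
        return 0.0  # homogeneous case: the AH estimator is consistent
    # E[1/(1+phi)] in closed form for the uniform distribution
    e_inv = math.log((1 + mu + a) / (1 + mu - a)) / (2 * a)
    numerator = 2 * (1 + mu) * (1 / (1 + mu) - e_inv)
    denominator = 2 * e_inv - 1  # equals E[(1-phi)/(1+phi)]
    return numerator / denominator


for a in (0.0, 0.3, 0.5):
    print(f"a = {a:.1f}: asymptotic AH bias = {ah_asymptotic_bias(0.4, a):.3f}")
```

With a mean of 0.4 this gives biases of roughly -0.070 and -0.186 for a = 0.3 and a = 0.5, in line with the values quoted in Section 4.3.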
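4.4.1 Identification conditions

In this section, we formally establish the identification conditions for $E(\phi_i^s)$ in terms of the minimum number of periods used in estimation, $T_s$. Since $|\phi_i| < \kappa < 1$ under Assumption 18, we have bounded moments of polynomial functions of $\phi_i$, i.e., $E|\phi_i|^s < \kappa^s$, and it follows that

$$ E\left( \frac{1}{1+\phi_i} \right) \leq \left( \sum_{s=0}^{\infty} E|\phi_i|^s \right) < \frac{1}{1-\kappa} < \infty. $$

Since the distribution of $\phi_i$ has a bounded support with $f(\phi \mid \boldsymbol{\vartheta}) > 0$ for all $|\phi| < \kappa$ and $0 < \kappa < 1$, it also follows that

$$ E\left( \frac{1}{1+\phi_i} \right) = \int_{-\kappa}^{\kappa} \frac{1}{1+\phi} f(\phi \mid \boldsymbol{\vartheta}) \, d\phi = \int_{0}^{\kappa} \frac{1}{1-\phi} f(-\phi \mid \boldsymbol{\vartheta}) \, d\phi + \int_{0}^{\kappa} \frac{1}{1+\phi} f(\phi \mid \boldsymbol{\vartheta}) \, d\phi > 0. $$

Combining the above results with $\sigma^2 > 0$, it follows from (4.5) that $0 < E\left[ (\Delta y_{it})^2 \right] < C$. Denote the $h$-th order autocorrelation coefficients of first differences as $\rho_h$, given by

$$ \rho_h = \frac{E\left( \Delta y_{it} \Delta y_{i,t-h} \right)}{E\left[ (\Delta y_{it})^2 \right]}, \qquad (4.9) $$

for $h = 1, 2, \ldots$, with $|\rho_h| \leq 1$. Using (4.5) and (4.6), for $h = 1, 2, \ldots$, $\rho_h$ can be written as

$$ \rho_h = -\frac{E\left[ \left( \frac{1-\phi_i}{1+\phi_i} \right) \phi_i^{h-1} \right]}{2 E\left( \frac{1}{1+\phi_i} \right)}. \qquad (4.10) $$

Suppose that $\rho_h$ can be consistently estimated using the moment estimators of $E\left( \Delta y_{it} \Delta y_{i,t-h} \right)$ and $E\left[ (\Delta y_{it})^2 \right]$. Then the identification condition of $E(\phi_i^s)$ can be derived from the system of equations in (4.10). For $h = 1$,

$$ 2 E\left( \frac{1}{1+\phi_i} \right) \rho_1 = -E\left( \frac{1-\phi_i}{1+\phi_i} \right) = 1 - 2 E\left( \frac{1}{1+\phi_i} \right), $$

which can be equivalently written as

$$ 2 E\left( \frac{1}{1+\phi_i} \right) = \frac{1}{1+\rho_1}. \qquad (4.11) $$

Also for $h = 2$,

$$ 2 E\left( \frac{1}{1+\phi_i} \right) \rho_2 = -E\left( \frac{\phi_i - \phi_i^2}{1+\phi_i} \right) = -2 + E(\phi_i) + 2 E\left( \frac{1}{1+\phi_i} \right), $$

which upon using (4.11) yields

$$ E(\phi_i) = \frac{1 + 2\rho_1 + \rho_2}{1 + \rho_1}. \qquad (4.12) $$

Similarly, for $h = 3$ we have

$$ 2 E\left( \frac{1}{1+\phi_i} \right) \rho_3 = E\left( \phi_i^2 - 2\phi_i + 2 - \frac{2}{1+\phi_i} \right), $$

which yields

$$ E\left( \phi_i^2 \right) = \frac{1 + 2\rho_1 + 2\rho_2 + \rho_3}{1 + \rho_1}. \qquad (4.13) $$

The variance of $\phi_i$ is now given by

$$ \mathrm{Var}(\phi_i) = 1 + \frac{\rho_1 + 2\rho_2 + \rho_3}{1 + \rho_1} - \left( 1 + \frac{\rho_1 + \rho_2}{1 + \rho_1} \right)^2. \qquad (4.14) $$

For $h = 4$,

$$ 2 E\left( \frac{1}{1+\phi_i} \right) \rho_4 = -E\left( 2\phi_i^2 - \phi_i^3 - 2\phi_i + 2 - \frac{2}{1+\phi_i} \right), $$

and upon using the results of the lower-order moments we obtain

$$ E\left( \phi_i^3 \right) = \frac{1 + 2\rho_1 + 2\rho_2 + 2\rho_3 + \rho_4}{1 + \rho_1}. \qquad (4.15) $$

Higher-order moments of $\phi_i$ can be obtained similarly. To identify the $s$-th order moment of $\phi_i$ requires consistent estimation of $\rho_h$ for $h = 1, 2, \ldots, s+1$. Suppose now that observations on the $i$-th unit are available over the period $t = 1, 2, \ldots, T$.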
Then we have data on $\Delta y_{it}$ over the period $t = 2, 3, \ldots, T$, and a consistent estimator of $\rho_h$ is given by

$$ \hat{\rho}_{h,nT} = \frac{n^{-1} (T-h-1)^{-1} \sum_{i=1}^n \sum_{t=h+2}^T \Delta y_{it} \Delta y_{i,t-h}}{n^{-1} (T-1)^{-1} \sum_{i=1}^n \sum_{t=2}^T (\Delta y_{it})^2}, \text{ for } h = 1, 2, \ldots, T-2. \qquad (4.16) $$

Therefore, we must have $T_s \geq s + 3$, as $n \to \infty$, to identify $E(\phi_i^s)$ from available observations.

Remark 18 It is interesting to note that under the homogeneity assumption that $\phi_i = \phi$ for all $i$, using (4.10) we have

$$ \rho_h = \frac{E\left( \Delta y_{it} \Delta y_{i,t-h} \right)}{E\left[ (\Delta y_{it})^2 \right]} = -\frac{\left( \frac{1-\phi}{1+\phi} \right) \phi^{h-1}}{2 \left( \frac{1}{1+\phi} \right)} = -\frac{1}{2} \phi^{h-1} (1 - \phi), \text{ for } h = 1, 2, \ldots, T-2. \qquad (4.17) $$

For $h = 1$ under homogeneity, $\rho_1 = -(1-\phi)/2$ and $\phi$ can be estimated by $\hat{\phi}_{Homo} = 1 + 2\hat{\rho}_{1,nT}$. In this case for identification of $\phi$, we need $T \geq 2$. This result also follows if we let $\rho_h = \phi^{h-1} \rho_1$ in (4.12)

$$ E(\phi_i) = \phi = 1 + \frac{\rho_1 + \phi \rho_1}{1 + \rho_1}, $$

which is satisfied when $\rho_1 = -(1-\phi)/2$. It also follows from (4.14) that under homogeneity

$$ \mathrm{Var}(\phi_i) = 1 + \frac{\rho_1 + 2\phi\rho_1 + \phi^2 \rho_1}{1 + \rho_1} - \phi^2 = 0, $$

as it must. The restrictions in (4.17) can be used to test the homogeneity hypothesis that $\phi_i = \phi$ for all $i$ under the assumptions in Section 4.2.

4.5 Panel autoregressions with group heterogeneity

In many empirical applications, it is of further interest to go beyond moment estimators and learn about the nature of heterogeneity. One particular feature is heterogeneity across groups. When group characteristics are known, the dynamic panel can be estimated over sub-groups, or the panel could be augmented with group-specific interactive effects. But when individual characteristics are not observed, it is still possible to estimate group-specific probabilities centered on a suitable partition of the parameter space. In the case of $\phi_i$ we could postulate the following categorical distribution, assuming that possible outcomes of $\phi$ can be grouped into $G$ categories:

$$ \phi_i = \begin{cases} \phi^{(1)} & \text{with probability } \pi_1 \\ \phi^{(2)} & \text{with probability } \pi_2 \\ \;\vdots & \\ \phi^{(G)} & \text{with probability } \pi_G \end{cases} $$

with $0 < \pi_g < 1$, $\sum_{g=1}^G \pi_g = 1$, and $|\phi^{(g)}| < \kappa < 1$ for $g = 1, 2, \ldots, G$.
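For a categorical distribution such as the one just introduced, the expectations in (4.10) are finite sums, which gives a quick numerical check of the identification formulas (4.12), (4.13) and (4.15) and of the zero-variance result in Remark 18. The sketch below is illustrative only: the function names and the two-point example (values 0.2 and 0.8 with probabilities 0.3 and 0.7) are assumptions chosen for the check, not part of the derivation.

```python
def rho(h, phis, probs):
    """Population autocorrelation of first differences, as in (4.10),
    for a categorical distribution with support `phis` and weights `probs`."""
    num = sum(p * (1 - f) / (1 + f) * f ** (h - 1) for f, p in zip(phis, probs))
    den = 2 * sum(p / (1 + f) for f, p in zip(phis, probs))
    return -num / den


def moments_from_rho(r1, r2, r3, r4):
    """First three moments of phi_i implied by (4.12), (4.13) and (4.15)."""
    m1 = (1 + 2 * r1 + r2) / (1 + r1)
    m2 = (1 + 2 * r1 + 2 * r2 + r3) / (1 + r1)
    m3 = (1 + 2 * r1 + 2 * r2 + 2 * r3 + r4) / (1 + r1)
    return m1, m2, m3


# Heterogeneous two-point example (illustrative values).
phis, probs = [0.2, 0.8], [0.3, 0.7]
m1, m2, m3 = moments_from_rho(*(rho(h, phis, probs) for h in (1, 2, 3, 4)))
print(f"E(phi) = {m1:.4f}, E(phi^2) = {m2:.4f}, Var(phi) = {m2 - m1 ** 2:.4f}")

# Homogeneous case: a single support point, so the implied variance is zero.
h1, h2, _ = moments_from_rho(*(rho(h, [0.5], [1.0]) for h in (1, 2, 3, 4)))
print(f"homogeneous: E(phi) = {h1:.4f}, Var(phi) = {h2 - h1 ** 2:.4f}")
```

The recovered moments coincide with the direct sums over the support points, which is what identification from the autocorrelations of first differences requires.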
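Under this specification, the object of the exercise is to estimate the $2G - 1$ unknowns $\boldsymbol{\vartheta} = \left( \phi^{(1)}, \phi^{(2)}, \ldots, \phi^{(G)}, \pi_1, \pi_2, \ldots, \pi_{G-1} \right)'$, with $\pi_G = 1 - \sum_{g=1}^{G-1} \pi_g$. To identify $\boldsymbol{\vartheta}$, we first note that

$$ E\left( \phi_i^s \right) = \sum_{g=1}^G \pi_g \left( \phi^{(g)} \right)^s, \text{ for } s = 1, 2, \ldots, S. $$

Hence we must have $S \geq 2G - 1$ to identify the $2G - 1$ unknown parameters. Also to identify the first $S$ moments, we need $T \geq S + 3$. Combining these two inequalities, we have $T \geq 2G - 1 + 3 = 2(G + 1)$. These are order conditions, and we still require rank conditions that ensure a unique solution for $\boldsymbol{\vartheta}$. But it is clear that the number of groups that can be entertained is closely related to the size of $T$, and rises linearly in $T$.

In the simplest possible case with $G = 2$, let $\phi^{(1)} = \phi_L$, $\phi^{(2)} = \phi_H$, $\pi_1 = \pi$ and $\pi_2 = 1 - \pi$, and we need the following moment conditions

$$ E(\phi) = \pi \phi_L + (1-\pi) \phi_H, \quad E\left( \phi^2 \right) = \pi \phi_L^2 + (1-\pi) \phi_H^2, \quad E\left( \phi^3 \right) = \pi \phi_L^3 + (1-\pi) \phi_H^3, $$

where the moments $E(\phi_i^s)$, for $s = 1, 2, 3$, can be consistently estimated using the pooled estimator of $\rho_h$ given by $\hat{\rho}_{h,nT}$ in (4.16) with $T \geq 6$. Since by assumption $\phi_H - \phi_L > 0$, we obtain three solutions for $\pi$ that must coincide:

$$ \pi = \frac{\phi_H - E(\phi)}{\phi_H - \phi_L} = \frac{\phi_H^2 - E\left( \phi^2 \right)}{\phi_H^2 - \phi_L^2} = \frac{\phi_H^3 - E\left( \phi^3 \right)}{\phi_H^3 - \phi_L^3}. \qquad (4.18) $$

Let $\theta_s = E(\phi_i^s)$. Eliminating the common factor $\phi_H - \phi_L$ from the denominators of the above yields

$$ \phi_H - \theta_1 = \frac{\phi_H^2 - \theta_2}{\phi_L + \phi_H}, \quad \text{and} \quad \phi_H - \theta_1 = \frac{\phi_H^3 - \theta_3}{\phi_H^2 + \phi_H \phi_L + \phi_L^2}, $$

or

$$ \theta_1 A - B = \theta_2 \quad \text{and} \quad A \left( \theta_1 A - B \right) - \theta_1 B = \theta_3, $$

where $A = \phi_L + \phi_H$ and $B = \phi_L \phi_H$. The above equations have the unique solution

$$ \phi_L + \phi_H = \frac{\theta_3 - \theta_1 \theta_2}{\theta_2 - \theta_1^2}, \quad \text{and} \quad \phi_L \phi_H = \frac{\theta_1 \theta_3 - \theta_2^2}{\theta_2 - \theta_1^2}, \qquad (4.19) $$

and

$$ \theta_2 - \theta_1^2 = E\left( \phi_i^2 \right) - \left[ E(\phi_i) \right]^2 = \mathrm{Var}(\phi_i) > 0. \qquad (4.20) $$

Therefore, $\phi_L$ and $\phi_H$ can be obtained as the solutions to the following quadratic equation

$$ \phi^2 - \left( \frac{\theta_3 - \theta_1 \theta_2}{\theta_2 - \theta_1^2} \right) \phi + \frac{\theta_1 \theta_3 - \theta_2^2}{\theta_2 - \theta_1^2} = 0, \qquad (4.21) $$

given the moments $\theta_1$, $\theta_2$ and $\theta_3$. [5] Namely

$$ \phi_L = \frac{(\theta_3 - \theta_1 \theta_2) - \sqrt{(\theta_3 - \theta_1 \theta_2)^2 - 4 (\theta_2 - \theta_1^2)(\theta_1 \theta_3 - \theta_2^2)}}{2 (\theta_2 - \theta_1^2)}, \quad \phi_H = \frac{(\theta_3 - \theta_1 \theta_2) + \sqrt{(\theta_3 - \theta_1 \theta_2)^2 - 4 (\theta_2 - \theta_1^2)(\theta_1 \theta_3 - \theta_2^2)}}{2 (\theta_2 - \theta_1^2)}. \qquad (4.22) $$

Clearly, $\phi_H$ and $\phi_L$ are identified from the moments if condition (4.20) holds. For real solutions to exist, it is required that $A^2 - 4B > 0$. Once $\phi_L$ and $\phi_H$ are obtained, $\pi$ can then be identified using (4.18).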
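The two-group solution in (4.18) to (4.22) is mechanical once the first three moments are available, so it can be sketched in a few lines. This is a minimal illustration under the conditions stated above (positive variance and a positive discriminant); the function name `two_group_from_moments` and the argument names `m1`, `m2`, `m3` (standing for the first three moments of the AR coefficients) are hypothetical.

```python
import math


def two_group_from_moments(m1, m2, m3):
    """Solve for (phi_L, phi_H, pi) in the G = 2 categorical model.

    m1, m2, m3 are E(phi), E(phi^2), E(phi^3). Illustrative sketch of
    (4.18)-(4.22); raises if the identification conditions fail.
    """
    var = m2 - m1 ** 2                      # condition (4.20)
    if var <= 0:
        raise ValueError("Var(phi) must be positive for identification")
    a_sum = (m3 - m1 * m2) / var            # A = phi_L + phi_H, from (4.19)
    b_prod = (m1 * m3 - m2 ** 2) / var      # B = phi_L * phi_H, from (4.19)
    disc = a_sum ** 2 - 4 * b_prod
    if disc <= 0:
        raise ValueError("A^2 - 4B must be positive for distinct real roots")
    root = math.sqrt(disc)
    phi_low = (a_sum - root) / 2            # roots of the quadratic (4.21)
    phi_high = (a_sum + root) / 2
    pi = (phi_high - m1) / (phi_high - phi_low)   # first equality in (4.18)
    return phi_low, phi_high, pi


# Exact two-point moments: values 0.2 and 0.8 with probabilities 0.3 and 0.7.
print(two_group_from_moments(0.62, 0.46, 0.3608))

# Two-group approximation to a uniform distribution on [0, 1],
# whose s-th moment is 1 / (1 + s).
print(two_group_from_moments(1 / 2, 1 / 3, 1 / 4))
```

In practice the population moments would be replaced by estimates, in which case the two guards matter: in small samples the estimated variance or discriminant can be negative even when the population values are positive.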
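A consistent estimator of $(\phi_L, \phi_H, \pi)'$ can now be obtained by replacing $\theta_s = E(\phi_i^s)$, for $s = 1, 2, 3$, by their estimators suggested in Sections 4.6.1 and 4.6.2.

[Footnote 5] See also Gao and Pesaran (2023) for a similar solution in the case of heterogeneous cross-sectional regressions.

Remark 19 Other parametric distributions for $\phi$ can also be considered. Prominent choices are uniform and beta distributions. In the case of a uniform distribution it is important that $\phi$ is defined on a bounded region that does not include 1, otherwise the condition $E|\phi|^s < C \varrho^s$ with $\varrho < 1$ could be violated. For example, suppose that $\phi$ is distributed uniformly over $(-\kappa, \kappa)$, then

$$ E\left( |\phi|^s \right) = \frac{1}{2\kappa} \int_{-\kappa}^{\kappa} |\phi|^s \, d\phi = \frac{1}{2\kappa} \int_{-\kappa}^{0} |\phi|^s \, d\phi + \frac{1}{2\kappa} \int_{0}^{\kappa} |\phi|^s \, d\phi = \frac{1}{\kappa} \int_{0}^{\kappa} \phi^s \, d\phi = \frac{\kappa^s}{1+s}. $$

It is clear that when $\kappa = 1$, the condition $E|\phi|^s < C \varrho^s$ is not met.

Remark 20 Further complications arise when $y_{it}$ starts from a finite past, or if we wish to allow for exogenous regressors, even if the regressors are strictly exogenous.

Example 5 As an example, suppose that $\phi$ is distributed uniformly over $[0, \kappa]$ with $0 < \kappa < 1$, but the categorical model with $G = 2$ is used as an approximation. In this setting

$$ \theta_s = E\left( \phi^s \right) = \frac{1}{\kappa} \int_{0}^{\kappa} \phi^s \, d\phi = \frac{\kappa^s}{1+s}. $$

Using this result in (4.19) we have

$$ A = \frac{\theta_3 - \theta_1 \theta_2}{\theta_2 - \theta_1^2} = \frac{\frac{\kappa^3}{4} - \frac{\kappa^3}{2 \times 3}}{\frac{\kappa^2}{3} - \frac{\kappa^2}{4}} = \kappa, \quad B = \frac{\theta_1 \theta_3 - \theta_2^2}{\theta_2 - \theta_1^2} = \frac{\frac{\kappa^4}{2 \times 4} - \frac{\kappa^4}{9}}{\frac{\kappa^2}{3} - \frac{\kappa^2}{4}} = \frac{1}{6} \kappa^2, $$

and the two solutions are given by $\phi_L = 0.2113\kappa$ and $\phi_H = 0.7887\kappa$. Also, $\pi = \frac{\phi_H - E(\phi)}{\phi_H - \phi_L} = 0.5$.

4.6 Estimation of moments of autoregressive coefficients

4.6.1 Method of moments estimator based on autocorrelations

When the moments of $\phi_i$ are just identified or $T$ is very short, estimation of moments can be carried out straightforwardly by the method of moments using the sample analogues of $\rho_h$ given by (4.16). We denote this estimator by FDAC.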
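Let $\boldsymbol{\theta} = (\theta_1, \theta_2, \theta_3)' = \left( E(\phi_i), E(\phi_i^2), E(\phi_i^3) \right)'$, then the FDAC estimator of $\boldsymbol{\theta}$ is given by

$$ \hat{\theta}_{1,FDAC} = \widehat{E(\phi_i)} = \frac{1 + 2\hat{\rho}_{1,nT} + \hat{\rho}_{2,nT}}{1 + \hat{\rho}_{1,nT}}, \text{ for } T \geq 4, \qquad (4.23) $$

$$ \hat{\theta}_{2,FDAC} = \widehat{E\left( \phi_i^2 \right)} = \frac{1 + 2\hat{\rho}_{1,nT} + 2\hat{\rho}_{2,nT} + \hat{\rho}_{3,nT}}{1 + \hat{\rho}_{1,nT}}, \text{ for } T \geq 5, \qquad (4.24) $$

and

$$ \hat{\theta}_{3,FDAC} = \widehat{E\left( \phi_i^3 \right)} = \frac{1 + 2\hat{\rho}_{1,nT} + 2\hat{\rho}_{2,nT} + 2\hat{\rho}_{3,nT} + \hat{\rho}_{4,nT}}{1 + \hat{\rho}_{1,nT}}, \text{ for } T \geq 6, \qquad (4.25) $$

where $\hat{\rho}_{h,nT}$, for $h = 1, 2, 3, 4$, are given by (4.16).

4.6.2 Generalized method of moments estimator based on autocovariances

The FDAC estimator combines equally-weighted time averages of available data points to estimate the different $\rho_h$ for $h = 1, 2, \ldots$, and then plugs them into equations (4.23), (4.24) and (4.25). An alternative and arguably more efficient approach would base the estimation on the sample moments of $E\left( \Delta y_{it} \Delta y_{i,t-h} \right)$ rather than $\rho_h$, which allows us to consider the optimum weighting of the moment conditions at different periods.

4.6.2.1 Generalized method of moments estimator of $E(\phi_i)$

Denote the first moment as $\theta_1 = E(\phi_i)$. Given (4.9), the moment condition (4.12) can be written equivalently as

$$ \theta_1 \left\{ E\left[ (\Delta y_{it})^2 \right] + E\left( \Delta y_{it} \Delta y_{i,t-1} \right) \right\} = E\left[ (\Delta y_{it})^2 \right] + 2 E\left( \Delta y_{it} \Delta y_{i,t-1} \right) + E\left( \Delta y_{it} \Delta y_{i,t-2} \right), \qquad (4.26) $$

which yields a total of $T - 3$ moment conditions for $t = 4, 5, \ldots, T$, requiring that $T \geq 4$. The FDAC estimator can now be used to obtain initial estimates for the generalized method of moments (HetroGMM) estimator. When $T$ is small, note that $\hat{\theta}_{1,FDAC}$ may use more information from the data than $\hat{\theta}_{1,HetroGMM}$, as estimating $E\left( \Delta y_{it} \Delta y_{i,t-h} \right)$ for $h = 0, 1, 2$ uses the respective $T - h - 1$ data points rather than the subset of $T - 3$ data points. The $T - 3$ moment conditions in (4.26) can be written as

$$ E\left[ m_{nt}(\theta_{1,0}) \right] = 0, \text{ for } t = 4, 5, \ldots, T, $$

where $\theta_{1,0}$ is the true value of $\theta_1$ and, for a given value of $\theta_1$,

$$ m_{nt}(\theta_1) = n^{-1} \sum_{i=1}^n \left[ (\Delta y_{it})^2 + 2 \Delta y_{it} \Delta y_{i,t-1} + \Delta y_{it} \Delta y_{i,t-2} \right] - \theta_1 \, n^{-1} \sum_{i=1}^n \left[ (\Delta y_{it})^2 + \Delta y_{it} \Delta y_{i,t-1} \right]. $$

To optimally combine these moment conditions, let

$$ \mathbf{h}_{iT} = \begin{pmatrix} (\Delta y_{i4})^2 + \Delta y_{i4} \Delta y_{i3} \\ (\Delta y_{i5})^2 + \Delta y_{i5} \Delta y_{i4} \\ \vdots \\ (\Delta y_{iT})^2 + \Delta y_{iT} \Delta y_{i,T-1} \end{pmatrix}, \quad \text{and} \quad \mathbf{g}_{iT} = \begin{pmatrix} (\Delta y_{i4})^2 + 2 \Delta y_{i4} \Delta y_{i3} + \Delta y_{i4} \Delta y_{i2} \\ (\Delta y_{i5})^2 + 2 \Delta y_{i5} \Delta y_{i4} + \Delta y_{i5} \Delta y_{i3} \\ \vdots \\ (\Delta y_{iT})^2 + 2 \Delta y_{iT} \Delta y_{i,T-1} + \Delta y_{iT} \Delta y_{i,T-2} \end{pmatrix}. $$

Then

$$ \mathbf{m}_{nT}(\theta_1) = \begin{pmatrix} m_{n,4}(\theta_1) \\ m_{n,5}(\theta_1) \\ \vdots \\ m_{n,T}(\theta_1) \end{pmatrix} = \bar{\mathbf{g}}_{nT} - \bar{\mathbf{h}}_{nT} \theta_1, $$

where

$$ \bar{\mathbf{g}}_{nT} = n^{-1} \sum_{i=1}^n \mathbf{g}_{iT} \quad \text{and} \quad \bar{\mathbf{h}}_{nT} = n^{-1} \sum_{i=1}^n \mathbf{h}_{iT}. $$

Using (4.26), it readily follows that $E\left[ \mathbf{m}_{nT}(\theta_{1,0}) \right] = \mathbf{0}$. The HetroGMM estimator of $\theta_1$ is given by

$$ \hat{\theta}_{1,HetroGMM} = \arg\min_{\theta_1} \left( \bar{\mathbf{g}}_{nT} - \bar{\mathbf{h}}_{nT} \theta_1 \right)' \mathbf{A}_{nT} \left( \bar{\mathbf{g}}_{nT} - \theta_1 \bar{\mathbf{h}}_{nT} \right), $$

where $\mathbf{A}_{nT}$ is a $(T-3) \times (T-3)$ positive definite stochastic weight matrix, and for any $T \geq 4$, it tends to a non-stochastic positive definite matrix $\mathbf{A}_T$ as $n \to \infty$. The most efficient HetroGMM estimator is given by

$$ \hat{\theta}_{1,HetroGMM}(\mathbf{A}_T) = \left( \bar{\mathbf{h}}_{nT}' \mathbf{A}_T \bar{\mathbf{h}}_{nT} \right)^{-1} \bar{\mathbf{h}}_{nT}' \mathbf{A}_T \bar{\mathbf{g}}_{nT}, \qquad (4.27) $$

where $\mathbf{A}_T = \mathbf{S}_T^{-1}(\theta_1)$ is the optimal weight matrix with

$$ \mathbf{S}_T(\theta_1) = \mathrm{Var}\left[ \sqrt{n} \, \mathbf{m}_{nT}(\theta_1) \right] = n \mathrm{Var}\left( \bar{\mathbf{g}}_{nT} - \theta_1 \bar{\mathbf{h}}_{nT} \right) = n \mathrm{Var}\left[ n^{-1} \sum_{i=1}^n \left( \mathbf{g}_{iT} - \theta_1 \mathbf{h}_{iT} \right) \right]. $$

In view of (4.26), $E\left( \mathbf{g}_{iT} - \theta_{1,0} \mathbf{h}_{iT} \right) = \mathbf{0}$, and $\mathbf{g}_{iT} - \theta_{1,0} \mathbf{h}_{iT}$ are cross-sectionally independent, then

$$ \mathbf{S}_T(\theta_{1,0}) = \frac{1}{n} \sum_{i=1}^n E\left[ \left( \mathbf{g}_{iT} - \theta_{1,0} \mathbf{h}_{iT} \right) \left( \mathbf{g}_{iT} - \theta_{1,0} \mathbf{h}_{iT} \right)' \right]. $$

It is difficult to derive an analytical expression for $\mathbf{S}_T(\theta_{1,0})$, but for a given value of $\theta_1$, $\mathbf{S}_T(\theta_1)$ can be consistently estimated by its sample mean as

$$ \hat{\mathbf{S}}_T(\theta_1) = \frac{1}{n} \sum_{i=1}^n \left( \mathbf{g}_{iT} - \theta_1 \mathbf{h}_{iT} \right) \left( \mathbf{g}_{iT} - \theta_1 \mathbf{h}_{iT} \right)', \text{ for } n > T - 3. \qquad (4.28) $$

A standard two-step GMM estimator of $\theta_1$ can now be obtained using $\hat{\theta}_{1,FDAC}$ to estimate the optimal weight matrix in the first step. When $T > 4$, substituting $\hat{\theta}_{1,FDAC}$ into (4.28) yields the following two-step HetroGMM estimator

$$ \hat{\theta}_{1,HetroGMM} = \left[ \bar{\mathbf{h}}_{nT}' \hat{\mathbf{S}}_T^{-1}\left( \hat{\theta}_{1,FDAC} \right) \bar{\mathbf{h}}_{nT} \right]^{-1} \left[ \bar{\mathbf{h}}_{nT}' \hat{\mathbf{S}}_T^{-1}\left( \hat{\theta}_{1,FDAC} \right) \bar{\mathbf{g}}_{nT} \right], \qquad (4.29) $$

where

$$ \hat{\mathbf{S}}_T\left( \hat{\theta}_{1,FDAC} \right) = \frac{1}{n} \sum_{i=1}^n \left( \mathbf{g}_{iT} - \hat{\theta}_{1,FDAC} \mathbf{h}_{iT} \right) \left( \mathbf{g}_{iT} - \hat{\theta}_{1,FDAC} \mathbf{h}_{iT} \right)'. \qquad (4.30) $$

It is also possible to obtain an iterated version of the above, where $\hat{\theta}_{1,HetroGMM}$ is used to obtain a new estimate of $\hat{\mathbf{S}}_T(\cdot)$, namely $\hat{\mathbf{S}}_T(\hat{\theta}_{1,HetroGMM})$, and so on.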
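But there seems little gain in doing so since $\hat{\theta}_{1,HetroGMM}$ is asymptotically efficient. The asymptotic distribution of $\hat{\theta}_{1,HetroGMM}$ is given by

$$ \sqrt{n} \left( \hat{\theta}_{1,HetroGMM} - \theta_{1,0} \right) \to_d N\left( 0, V_{\theta_1} \right), \qquad (4.31) $$

where $V_{\theta_1}$ can be consistently estimated by

$$ \hat{V}_{\theta_1} = \left[ \bar{\mathbf{h}}_{nT}' \hat{\mathbf{S}}_T^{-1}\left( \hat{\theta}_{1,HetroGMM} \right) \bar{\mathbf{h}}_{nT} \right]^{-1}, \qquad (4.32) $$

where $\hat{\mathbf{S}}_T\left( \hat{\theta}_{1,HetroGMM} \right) = \frac{1}{n} \sum_{i=1}^n \left( \mathbf{g}_{iT} - \hat{\theta}_{1,HetroGMM} \mathbf{h}_{iT} \right) \left( \mathbf{g}_{iT} - \hat{\theta}_{1,HetroGMM} \mathbf{h}_{iT} \right)'$.

Remark 21 The above estimators should work fine asymptotically under $E(u_{it}^2) = \sigma_{it}^2$, so long as the time variation of $\sigma_{it}^2$ is stationary, in the sense that $E(\sigma_{it}^2) = \sigma_i^2$. One important example is when $u_{it}$ has a stationary GARCH specification, for example, if

$$ h_{it}^2 = E\left( u_{it}^2 \mid \mathcal{I}_{i,t-1} \right) = \sigma_i^2 (1 - \psi_0 - \psi_1) + \psi_0 h_{i,t-1}^2 + \psi_1 u_{i,t-1}^2, $$

where $\sup_i |\psi_0 + \psi_1| < 1$, $\mathcal{I}_{it} = (u_{it}, u_{i,t-1}, \ldots)$, and the processes of $u_{it}$ have started in a distant past. Note that for the above GARCH processes, $E(u_{it}^2) = \sigma_i^2$.

4.6.2.2 Generalized method of moments estimator of $E(\phi_i^2)$

Similarly, the HetroGMM estimator of $\theta_2 = E\left( \phi_i^2 \right)$ can also be obtained based on the equation below for $t = 5, 6, \ldots, T$,

$$ \theta_2 \left\{ E\left[ (\Delta y_{it})^2 \right] + E\left( \Delta y_{it} \Delta y_{i,t-1} \right) \right\} = E\left[ (\Delta y_{it})^2 \right] + 2 E\left( \Delta y_{it} \Delta y_{i,t-1} \right) + 2 E\left( \Delta y_{it} \Delta y_{i,t-2} \right) + E\left( \Delta y_{it} \Delta y_{i,t-3} \right). \qquad (4.33) $$

Let

$$ \mathbf{h}_{2,iT} = \begin{pmatrix} (\Delta y_{i5})^2 + \Delta y_{i5} \Delta y_{i4} \\ (\Delta y_{i6})^2 + \Delta y_{i6} \Delta y_{i5} \\ \vdots \\ (\Delta y_{iT})^2 + \Delta y_{iT} \Delta y_{i,T-1} \end{pmatrix}, \quad \text{and} \quad \mathbf{g}_{2,iT} = \begin{pmatrix} (\Delta y_{i5})^2 + 2 \Delta y_{i5} \Delta y_{i4} + 2 \Delta y_{i5} \Delta y_{i3} + \Delta y_{i5} \Delta y_{i2} \\ (\Delta y_{i6})^2 + 2 \Delta y_{i6} \Delta y_{i5} + 2 \Delta y_{i6} \Delta y_{i4} + \Delta y_{i6} \Delta y_{i3} \\ \vdots \\ (\Delta y_{iT})^2 + 2 \Delta y_{iT} \Delta y_{i,T-1} + 2 \Delta y_{iT} \Delta y_{i,T-2} + \Delta y_{iT} \Delta y_{i,T-3} \end{pmatrix}. $$

Denote $\bar{\mathbf{g}}_{2,nT} = n^{-1} \sum_{i=1}^n \mathbf{g}_{2,iT}$ and $\bar{\mathbf{h}}_{2,nT} = n^{-1} \sum_{i=1}^n \mathbf{h}_{2,iT}$, where $\bar{\mathbf{g}}_{2,nT}$ and $\bar{\mathbf{h}}_{2,nT}$ are $(T-4) \times 1$ vectors (with $T > 4$).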
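Then, the two-step HetroGMM estimator of the second moment can be derived as

$$ \hat{\theta}_{2,HetroGMM} = \left[ \bar{\mathbf{h}}_{2,nT}' \hat{\mathbf{S}}_{2,T}^{-1}\left( \hat{\theta}_{2,FDAC} \right) \bar{\mathbf{h}}_{2,nT} \right]^{-1} \left[ \bar{\mathbf{h}}_{2,nT}' \hat{\mathbf{S}}_{2,T}^{-1}\left( \hat{\theta}_{2,FDAC} \right) \bar{\mathbf{g}}_{2,nT} \right], \qquad (4.34) $$

where the initial estimator can be the FDAC estimator of $\theta_2$ given by equation (4.24), and

$$ \hat{\mathbf{S}}_{2,T}(\theta_2) = \frac{1}{n} \sum_{i=1}^n \left( \mathbf{g}_{2,iT} - \theta_2 \mathbf{h}_{2,iT} \right) \left( \mathbf{g}_{2,iT} - \theta_2 \mathbf{h}_{2,iT} \right)'. \qquad (4.35) $$

Finally, the asymptotic distribution of $\hat{\theta}_{2,HetroGMM}$ can be derived as

$$ \sqrt{n} \left( \hat{\theta}_{2,HetroGMM} - \theta_{2,0} \right) \to_d N\left( 0, V_{\theta_2} \right), \qquad (4.36) $$

where $\theta_{2,0}$ is the true value of $\theta_2$, and $V_{\theta_2}$ can be consistently estimated by

$$ \hat{V}_{\theta_2} = \left[ \bar{\mathbf{h}}_{2,nT}' \hat{\mathbf{S}}_{2,T}^{-1}\left( \hat{\theta}_{2,HetroGMM} \right) \bar{\mathbf{h}}_{2,nT} \right]^{-1}. \qquad (4.37) $$

4.6.3 Plug-in estimator of $\mathrm{Var}(\phi_i)$

Consider now the estimation of $\mathrm{Var}(\phi_i)$. By definition, it can be written as $\mathrm{Var}(\phi_i) = E(\phi_i^2) - [E(\phi_i)]^2 = \sigma_\phi^2 = \theta_2 - \theta_1^2$. By plugging the estimators $\hat{\boldsymbol{\theta}} = (\hat{\theta}_1, \hat{\theta}_2)'$ into the above formula, we derive a consistent estimator of the variance given by

$$ \widehat{\mathrm{Var}}(\phi_i) = \hat{\theta}_2 - \hat{\theta}_1^2. \qquad (4.38) $$

Note that $\widehat{\mathrm{Var}}(\phi_i)$ is a valid estimator if $\hat{\theta}_2 - \hat{\theta}_1^2 > 0$, which requires $n$ to be sufficiently large. Suppose the asymptotic distribution of $\hat{\boldsymbol{\theta}} = (\hat{\theta}_1, \hat{\theta}_2)'$ is $\sqrt{n} (\hat{\boldsymbol{\theta}} - \boldsymbol{\theta}) \to_d N(\mathbf{0}, \mathbf{V}_{\theta})$. Then the asymptotic distribution of the plug-in estimator $\widehat{\mathrm{Var}}(\phi_i)$ is given by

$$ \sqrt{n} \left[ \widehat{\mathrm{Var}}(\phi_i) - \mathrm{Var}(\phi_i) \right] \to_d N\left( 0, V_{\sigma_\phi^2} \right), \qquad (4.39) $$

where $V_{\sigma_\phi^2} = (-2\theta_1, 1) \, \mathbf{V}_{\theta} \, (-2\theta_1, 1)'$ is derived by the Delta method, and can be consistently estimated by $\hat{V}_{\sigma_\phi^2} = (-2\hat{\theta}_1, 1) \, \hat{\mathbf{V}}_{\theta} \, (-2\hat{\theta}_1, 1)'$, where $\hat{\mathbf{V}}_{\theta}$ is a consistent estimator of $\mathbf{V}_{\theta}$.

4.7 Monte Carlo experiments

4.7.1 Data generating process

The dependent variable is generated as

$$ y_{it} = \mu_i (1 - \phi_i) + \phi_i y_{i,t-1} + u_{it}, $$

for $t = -50, -49, \ldots, 0, 1, \ldots, T$ and $i = 1, 2, \ldots, n$, where $\mu_i = \alpha_i/(1-\phi_i)$ measures the mean of the stationary $\{y_{it}\}$, and $u_{it} = h_{it} \varepsilon_{it}$. We consider both Gaussian errors, $\varepsilon_{it} \sim IIDN(0, 1)$, and non-Gaussian errors, $\varepsilon_{it} = (e_{it} - 2)/2$, where $e_{it} \sim IID\chi_2^2$, and $\chi_2^2$ is a chi-squared variate with two degrees of freedom.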
$\{h_{it}\}$ captures both cross-sectional and conditional heteroskedasticity, generated as GARCH(1,1), given by $h_{it}^2 = \sigma_i^2 (1 - \psi_0 - \psi_1) + \psi_0 h_{i,t-1}^2 + \psi_1 u_{i,t-1}^2$, with $\psi_0 = 0.6$, $\psi_1 = 0.2$, $\sigma_i^2 \sim IID\left( 0.5 + 0.5 z_i^2 \right)$, and $z_i \sim IIDN(0, 1)$. [6] The case where errors are conditionally homoskedastic is obtained as a special case by setting $\psi_0 = \psi_1 = 0$. The $\{y_{it}\}$ and $\{h_{it}^2\}$ processes are generated with the initial values $y_{i,-51} = 0$, $\varepsilon_{i,-51} = 0$, and $h_{i,-51} = 0$. The first 51 time series observations ($t = -50, -49, \ldots, 0$) are discarded, and the estimation of the moments of $\phi_i$ is based on $\{y_{i1}, y_{i2}, \ldots, y_{iT}\}$ for $i = 1, 2, \ldots, n$.

[Footnote 6] The coefficients in the GARCH(1,1) model can be heterogeneous across $i$. In this case, the FDAC and HetroGMM estimators are still applicable.

We experiment with two distributions for $\phi_i$: uniform with a medium and a high degree of heterogeneity, and a categorical distribution with two groups, as follows.

(a) Uniform distributions: $\phi_i = \mu_\phi + v_i$, $v_i \sim IIDU(-a, a)$, with $\mu_\phi = 0.4$ and $a = \{0.3, 0.5\}$, which gives $E(\phi_i) = \mu_\phi = 0.4$ and $\mathrm{Var}(\phi_i) = a^2/3 = \{0.030, 0.083\}$, respectively.

(b) Categorical distribution: $\phi_i = \phi_L$ with probability $\pi$, and $\phi_i = \phi_H$ with probability $(1 - \pi)$. We set $\pi = 0.3$, $\phi_L = 0.2$, and $\phi_H = 0.8$, thus yielding $E(\phi_i) = \mu_\phi = \pi \phi_L + \phi_H (1 - \pi) = 0.62$, and $\mathrm{Var}(\phi_i) = \pi \phi_L^2 + \phi_H^2 (1 - \pi) - [E(\phi_i)]^2 = 0.076$.

Individual fixed effects are generated as $\alpha_i = \phi_i + \eta_i$ with $\eta_i \sim IIDN(0, 1)$, which allows for non-zero correlations between $\alpha_i$ and $\phi_i$.

We carry out 2,000 replications for the experiments that compare the small sample performances of FDAC, HetroGMM, and a number of estimators proposed in the literature for the homogeneous slope case, specifically the estimators proposed by Anderson and Hsiao (1981, 1982) (AH), Arellano and Bond (1991) (AB), Blundell and Bond (1998) (BB), and the augmented Anderson-Hsiao (AAH) estimator proposed by Chudik and Pesaran (2021), as well as the FDLS estimator due to Han and Phillips (2010).
[7] For experiments that compare our proposed estimator with the MSW estimator proposed by Mavroeidis et al. (2015), we use 1,000 replications as it takes a substantial amount of time to compute the MSW estimator. [8]

[Footnote 7] We have downloaded the codes for the computation of AH, AB, BB, and AAH estimators from the supplementary materials of Chudik and Pesaran (2021) using the link: https://www.econ.cam.ac.uk/people-files/emeritus/mhp1/fp21/CP_AAH_paper_July_2021_codes_and_data.zip. We are grateful to Alexander Chudik for making the codes publicly available.

[Footnote 8] We have downloaded the codes of the MSW estimator used in the empirical application from the supplementary materials of Mavroeidis et al. (2015) using the link: https://drive.google.com/file/d/1hdRFpcWo3r88YV_5Kc40ur-siCYGSBDN/view?usp=sharing. We are grateful to Yuya Sasaki for also sharing the codes of the MSW estimator used in their Monte Carlo experiments by private correspondence.

4.7.2 Comparison of FDAC and HetroGMM estimators

Detailed results of the Monte Carlo experiments are summarized in Appendix C. Tables C.1 to C.2 of Appendix C give bias, root mean square errors (RMSE), and size of the FDAC and HetroGMM estimators for $E(\phi_i)$ and $\mathrm{Var}(\phi_i)$ with $T = 4, 5, 6, 8, 10$ and $n = 100, 200, 500, 1000, 5000$, in the case of Gaussian errors with GARCH effects. The results cover both cases where $\phi_i$ are generated as uniform with $\mu_\phi = 0.4$ and $a = 0.5$ (see (a) above), and categorical as specified under (b). The associated empirical power functions are displayed in Figures C.1 to C.4 of Appendix C. As can be seen, the FDAC estimator has uniformly smaller bias across all sample sizes, and lower RMSE and greater power when $T = 4, 5, 6$. This could be because the FDAC estimator uses averages of the individual sample moments both over time and across units, and is not subject to the many moments problem. Most importantly, tests based on the FDAC estimator are not adversely affected as $T$ is increased with $n$ relatively small, and its size is mostly around the nominal size of five per cent. But tests based on the HetroGMM estimator tend to over-reject as $T$ is increased when $n$ is relatively small ($n = 100$). These results are in line with the results obtained in the literature when GMM is applied to homogeneous dynamic panels.

The simulation results reported in Tables C.8–C.15 in Appendix C also show that the performance of the FDAC estimator is reasonably robust to non-Gaussian errors and/or GARCH effects. The RMSE and size distortion of the FDAC estimator increase only slightly as we move from Gaussian to non-Gaussian errors and allow for GARCH effects.

Overall, the FDAC estimator outperforms the HetroGMM estimator and seems to be reasonably robust to non-Gaussian errors and GARCH effects. It is also simple to compute. In what follows we focus on comparing the FDAC estimator with a number of GMM estimators proposed for the homogeneous case as well as the MSW estimator that allows for slope heterogeneity. We refer to the former group of estimators as HomoGMM.

4.7.3 Comparison of FDAC and HomoGMM estimators

Since it is not known if the heterogeneity bias is serious, it is natural to ask if the FDAC estimator continues to perform equally well under homogeneity ($\phi_i = \phi$), and if its performance under homogeneity is comparable to the AB, BB, and other HomoGMM estimators of $\phi$. Tables 4.1, 4.2, and 4.3 report the bias, RMSE, and size of the FDAC, FDLS, AH, AAH, AB, and BB estimators under slope homogeneity ($a = 0$), and under two uniformly distributed heterogeneous slope cases with $a = 0.3$ and $0.5$. All experiments allow for unrestricted heterogeneous intercepts (fixed effects). The results in these tables are based on Gaussian errors and allow for GARCH effects for the sample sizes $T = 4, 6, 10$, and $n = 100, 1000, 5000$. Results for non-Gaussian errors with GARCH effects are provided in Tables C.3–C.5 in Appendix C.
As can be seen from the results in Table 4.1, the FDAC estimator continues to perform well even under slope homogeneity. Its bias is close to zero and it shows only a small degree of size distortion when n = 100. It is closest to the FDLS estimator since both estimators assume the initial values, $\{y_{i0}, i = 1, 2, \ldots, n\}$, are drawn from the steady state distribution of $\{y_{it}\}$ and combine the moments by averaging them over both i and t. Figure C.5 in Appendix C compares the empirical power functions of FDAC and FDLS estimators. Compared to the FDLS estimator, the FDAC estimator makes use of higher order autocorrelations of first differences that are not needed for identification of $E(\phi_i)$ under homogeneity. As a result, the FDLS estimator is marginally more powerful than the FDAC for small T, but suffers from size distortion when both n and T are small.

However, when comparing the FDAC and the other HomoGMM estimators (such as AAH, BB, or AB) one needs to be cautious, since these estimators do allow for the distribution of $y_{i0}$ to depart from the steady state distribution of $\{y_{it}\}$. With this in mind, we note that the FDAC estimator performs well when compared to AH and AB estimators, although it is marginally less efficient when compared to the AAH and BB estimators. Also, the FDAC estimator has less size distortion and better power performance compared to all HomoGMM estimators as T is increased. These results demonstrate that the FDAC estimator is reliable and has desirable small-sample performance even in homogeneous panels with stationary outcome processes. Note that the AH and AAH estimators can be applied to homogeneous dynamic panel data models with less restrictive assumptions, including the unit root case and time series heteroskedasticity.

For the heterogeneous case, Table 4.2 gives the summary of the results when heterogeneity is moderate (namely a = 0.3), and Table 4.3 provides the results when a = 0.5.
It is clear that the performance of the HomoGMM estimators deteriorates quite rapidly as the degree of heterogeneity is increased, but the FDAC estimator continues to have satisfactory properties irrespective of the degree of heterogeneity. With a moderate degree of heterogeneity (a = 0.3), the FDAC estimator continues to have close to zero bias and the correct size for all sample sizes under consideration. But for the HomoGMM estimators, the magnitudes of the bias are much larger and the size distortions are much more serious. In the case of high heterogeneity (a = 0.5), the FDAC estimator has the smallest RMSE and has the correct size, whilst HomoGMM estimators all suffer from large size distortions. The simulation results also confirm that there is a downward bias in the AH estimator in heterogeneous panel AR(1) models. Note that with n = 5,000 and T = 4, the simulated bias of the AH estimator is very close to the analytical value derived in Section 4.3. Also, the bias of HomoGMM estimators does not diminish with increases in n and/or T, and as a result, the size distortions of HomoGMM estimators become even more pronounced as n and/or T are increased.

Figure 4.1 displays empirical power functions for the FDAC estimator in the case of homogeneous and heterogeneous panel AR(1) models with both Gaussian and non-Gaussian errors, and GARCH effects. The power functions become steeper as n and T increase. In general, the power of the FDAC estimator is similar under heterogeneous and homogeneous $\phi_i$. But with non-Gaussian errors, the power functions become noticeably flatter, and the size distortions become more pronounced for n = 100.
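The data generating process described in the notes to Table 4.1 can be sketched as follows. This is a hedged illustration under stated assumptions and not the code used in the thesis: the function name `simulate_panel` is hypothetical, the GARCH coefficients are called `psi0` and `psi1` here, and the distribution used for the unit-specific means (`mu`) is a placeholder, since the actual design is described elsewhere in the thesis.

```python
import numpy as np

def simulate_panel(n, T, mu_phi=0.4, a=0.3, psi0=0.6, psi1=0.2,
                   burn_in=50, rng=None):
    """Simulate y_it = mu_i (1 - phi_i) + phi_i y_{i,t-1} + u_it with
    GARCH(1,1) Gaussian errors, for t = -burn_in, ..., T.

    Returns (y, phi) where y has shape (n, T) and holds y_i1, ..., y_iT.
    """
    rng = np.random.default_rng(rng)
    # Uniform heterogeneity: phi_i = mu_phi + v_i, v_i ~ U(-a, a); a = 0 is homogeneous.
    phi = mu_phi + a * rng.uniform(-1.0, 1.0, size=n)
    # ASSUMPTION: placeholder distribution for the unit-specific means.
    mu = rng.normal(1.0, 1.0, size=n)
    # Cross-sectional heteroskedasticity: sigma_i^2 = 0.5 + 0.5 z_i^2, z_i ~ N(0, 1).
    sigma2 = 0.5 + 0.5 * rng.standard_normal(n) ** 2

    # Zero initial values one period before the start of the recursion.
    y_prev = np.zeros(n)
    u_prev = np.zeros(n)
    h2_prev = np.zeros(n)
    y = np.empty((n, T))
    for t in range(-burn_in, T + 1):
        h2 = sigma2 * (1.0 - psi0 - psi1) + psi0 * h2_prev + psi1 * u_prev ** 2
        u = np.sqrt(h2) * rng.standard_normal(n)
        y_t = mu * (1.0 - phi) + phi * y_prev + u
        if t >= 1:
            y[:, t - 1] = y_t  # keep only the observations used in estimation
        y_prev, u_prev, h2_prev = y_t, u, h2
    return y, phi

y, phi = simulate_panel(n=100, T=4, a=0.5, rng=12345)
```

Passing an integer seed through `rng` makes each replication reproducible. Drawing a fresh set of unit-specific parameters on every call mirrors the statement in the table notes that they are generated differently across replications.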
Table 4.1: Bias, RMSE, and size of FDAC, FDLS, AH, AAH, AB, and BB estimators of $\phi$ ($\phi_0 = 0.4$) in a homogeneous panel AR(1) model with Gaussian errors and GARCH effects

T  n  |  Bias: FDAC  FDLS  AH  AAH  AB  BB  |  RMSE: FDAC  FDLS  AH  AAH  AB  BB  |  Size (×100): FDAC  FDLS  AH  AAH  AB  BB
4  100  |  -0.002  0.006  0.453  0.096  -0.037  0.014  |  0.189  0.183  14.925  0.326  0.290  0.171  |  8.1  9.8  8.0  14.1  12.6  18.4
4  1,000  |  -0.001  0.001  0.024  0.072  -0.004  0.002  |  0.062  0.058  0.213  0.222  0.088  0.056  |  5.4  5.6  4.7  15.9  6.6  6.8
4  5,000  |  -0.001  -0.001  0.003  0.020  -0.002  0.000  |  0.029  0.026  0.088  0.121  0.039  0.025  |  5.5  5.3  4.3  7.1  5.0  4.4
6  100  |  -0.001  0.004  -0.059  0.028  -0.052  0.014  |  0.117  0.130  0.215  0.163  0.158  0.107  |  7.6  8.1  18.8  27.4  21.6  28.8
6  1,000  |  -0.002  0.000  -0.007  0.000  -0.006  0.002  |  0.038  0.041  0.067  0.035  0.047  0.032  |  4.2  5.6  6.5  7.4  7.0  8.2
6  5,000  |  -0.001  0.000  -0.001  0.001  -0.001  0.001  |  0.018  0.019  0.031  0.016  0.021  0.014  |  4.8  4.3  5.3  5.3  4.2  5.0
10  100  |  0.001  0.003  -0.041  0.008  -0.038  0.004  |  0.081  0.093  0.099  0.083  0.090  0.068  |  8.3  8.2  39.8  53.3  45.5  54.5
10  1,000  |  -0.001  0.000  -0.005  0.001  -0.005  0.001  |  0.026  0.030  0.030  0.020  0.026  0.019  |  4.8  6.0  9.5  12.3  10.7  12.4
10  5,000  |  0.000  0.000  -0.001  0.000  -0.001  0.000  |  0.012  0.014  0.014  0.009  0.012  0.009  |  5.5  5.5  6.0  5.9  5.5  5.8

Notes: The DGP is given by $y_{it} = \mu_i(1-\phi_i) + \phi_i y_{i,t-1} + u_{it}$ for i = 1, 2, ..., n, and t = -50, -49, ..., T with $\mu_i = \alpha_i/(1-\phi_i)$, where errors, $u_{it} = h_{it}\varepsilon_{it}$, are generated to be Gaussian distributed and cross-sectionally heteroskedastic with GARCH effects: $\varepsilon_{it} \sim IIDN(0, 1)$, and $h_{it}^2 = \sigma_i^2(1-\psi_0-\psi_1) + \psi_0 h_{i,t-1}^2 + \psi_1 u_{i,t-1}^2$ with $\sigma_i^2 \sim IID(0.5 + 0.5 z_i^2)$, $z_i \sim IIDN(0, 1)$, $\psi_0 = 0.6$, and $\psi_1 = 0.2$. The initial values are given by $y_{i,-51} = 0$, $\varepsilon_{i,-51} = 0$, and $h_{i,-51} = 0$. The AR(1) coefficients are generated to be homogeneous: $\phi_i = \phi$ for i = 1, 2, ..., n with $\phi_0 = 0.4$. For each experiment, $(\alpha_i, \sigma_i)'$ are generated differently across replications. The FDAC estimator is calculated by (4.23), and its asymptotic variance is estimated by the Delta method.
"FDLS" denotes the first-difference least square estimator proposed by Han and Phillips (2010). "AH", "AAH", "AB", and "BB" denote the 2-step GMM estimators proposed by Anderson and Hsiao (1981, 1982), Chudik and Pesaran (2021), Arellano and Bond (1991), and Blundell and Bond (1998). The estimation is based on $\{y_{i1}, y_{i2}, \ldots, y_{iT}\}$ for i = 1, 2, ..., n. The nominal size of the tests is set to 5 per cent. The number of replications is 2,000.

Table 4.2: Bias, RMSE, and size of FDAC, FDLS, AH, AAH, AB, and BB estimators in a heterogeneous panel AR(1) model with uniformly distributed autoregressive coefficients, $\phi_i = \mu_\phi + v_i$, $\mu_\phi = 0.4$, and $v_i \sim IIDU(-0.3, 0.3)$, Gaussian errors, and GARCH effects

T  n  |  Bias: FDAC  FDLS  AH  AAH  AB  BB  |  RMSE: FDAC  FDLS  AH  AAH  AB  BB  |  Size (×100): FDAC  FDLS  AH  AAH  AB  BB
4  100  |  -0.001  -0.012  -0.035  0.064  -0.074  0.007  |  0.192  0.188  9.664  0.304  0.339  0.180  |  8.6  10.2  9.8  14.8  14.0  19.8
4  1,000  |  -0.001  -0.021  -0.051  0.021  -0.030  -0.013  |  0.064  0.064  0.203  0.182  0.102  0.061  |  5.3  6.6  9.8  16.0  9.0  7.4
4  5,000  |  0.000  -0.022  -0.068  -0.024  -0.025  -0.017  |  0.029  0.035  0.108  0.062  0.049  0.032  |  5.2  12.2  17.8  14.5  10.5  10.5
6  100  |  0.000  -0.016  -0.101  0.011  -0.084  0.006  |  0.123  0.135  0.235  0.153  0.184  0.115  |  7.5  8.8  22.7  23.9  26.0  30.1
6  1,000  |  -0.002  -0.022  -0.054  -0.010  -0.030  -0.007  |  0.040  0.048  0.087  0.038  0.061  0.036  |  4.3  8.3  17.0  8.0  12.3  9.6
6  5,000  |  0.000  -0.021  -0.048  -0.010  -0.022  -0.006  |  0.019  0.029  0.057  0.018  0.032  0.017  |  5.3  19.4  36.4  8.9  17.2  7.8
10  100  |  0.002  -0.018  -0.063  -0.002  -0.056  0.001  |  0.086  0.099  0.117  0.088  0.107  0.078  |  8.3  7.8  45.0  52.1  49.0  58.4
10  1,000  |  -0.001  -0.022  -0.028  0.001  -0.019  0.003  |  0.028  0.038  0.044  0.023  0.035  0.022  |  5.6  11.7  23.0  8.7  17.0  12.3
10  5,000  |  0.000  -0.022  -0.024  0.003  -0.013  0.006  |  0.013  0.026  0.029  0.010  0.019  0.011  |  5.3  34.8  40.1  4.3  19.1  10.7

Notes: The DGP is given by $y_{it} = \mu_i(1-\phi_i) + \phi_i y_{i,t-1} + u_{it}$ for i = 1, 2, ..., n, and t = -50, -49, ..., T with $\mu_i = \alpha_i/(1-\phi_i)$ featuring Gaussian distributed errors with GARCH effects.
The heterogeneous AR(1) coefficients are generated as case (a): $\phi_i = \mu_\phi + v_i$ with $\mu_\phi = 0.4$, $v_i \sim IIDU(-a, a)$, and a = 0.3. The FDAC estimator is calculated based on (4.23), and its asymptotic variance is estimated by the Delta method. "FDLS" denotes the first-difference least square estimator proposed by Han and Phillips (2010). "AH", "AAH", "AB", and "BB" denote the 2-step GMM estimators proposed by Anderson and Hsiao (1981, 1982), Chudik and Pesaran (2021), Arellano and Bond (1991), and Blundell and Bond (1998). The estimation is based on $\{y_{i1}, y_{i2}, \ldots, y_{iT}\}$ for i = 1, 2, ..., n. The nominal size of the tests is set to 5 per cent. The number of replications is 2,000. See also the notes to Table 4.1.

Table 4.3: Bias, RMSE, and size of FDAC, FDLS, AH, AAH, AB, and BB estimators in a heterogeneous panel AR(1) model with uniformly distributed autoregressive coefficients, $\phi_i = \mu_\phi + v_i$, $\mu_\phi = 0.4$, and $v_i \sim IIDU(-0.5, 0.5)$, Gaussian errors, and GARCH effects

T  n  |  Bias: FDAC  FDLS  AH  AAH  AB  BB  |  RMSE: FDAC  FDLS  AH  AAH  AB  BB  |  Size (×100): FDAC  FDLS  AH  AAH  AB  BB
4  100  |  0.000  -0.048  -0.092  0.013  -0.162  0.012  |  0.201  0.200  3.690  0.287  0.423  0.219  |  8.2  10.3  14.5  18.4  19.5  26.7
4  1,000  |  -0.001  -0.061  -0.172  -0.053  -0.112  -0.047  |  0.067  0.088  0.245  0.135  0.170  0.083  |  5.2  17.0  28.4  23.0  23.2  15.8
4  5,000  |  0.000  -0.062  -0.185  -0.072  -0.104  -0.056  |  0.031  0.068  0.200  0.078  0.118  0.064  |  5.7  59.4  70.2  62.9  51.2  47.5
6  100  |  0.001  -0.053  -0.185  -0.017  -0.167  0.011  |  0.133  0.151  0.286  0.146  0.253  0.144  |  8.0  10.9  34.7  23.4  39.0  39.8
6  1,000  |  -0.002  -0.062  -0.143  -0.031  -0.112  -0.029  |  0.043  0.077  0.160  0.048  0.130  0.049  |  4.6  29.1  58.9  14.8  50.6  19.4
6  5,000  |  -0.001  -0.061  -0.138  -0.031  -0.102  -0.027  |  0.020  0.065  0.142  0.035  0.107  0.032  |  5.5  86.3  98.9  39.2  94.0  36.8
10  100  |  0.002  -0.056  -0.121  -0.028  -0.110  0.010  |  0.095  0.119  0.166  0.101  0.155  0.104  |  8.5  12.2  60.9  52.3  63.3  66.1
10  1,000  |  -0.001  -0.062  -0.091  -0.021  -0.079  -0.010  |  0.031  0.070  0.099  0.036  0.088  0.030  |  5.3  45.6  75.7  16.3  71.1  17.8
10  5,000  |  0.000  -0.062  -0.087  -0.019  -0.072  -0.003  |  0.014  0.064  0.089  0.023  0.074  0.014  |  5.4  98.4  99.8  29.4  99.0  11.6

Notes: The DGP is given by $y_{it} = \mu_i(1-\phi_i) + \phi_i y_{i,t-1} + u_{it}$ for i = 1, 2, ..., n, and t = -50, -49, ..., T with $\mu_i = \alpha_i/(1-\phi_i)$ featuring Gaussian distributed errors with GARCH effects. The heterogeneous AR(1) coefficients are generated as case (a): $\phi_i = \mu_\phi + v_i$ and $v_i \sim IIDU(-a, a)$ with $\mu_\phi = 0.4$ and a = 0.5. For each experiment, $(\alpha_i, \phi_i, \sigma_i)'$ are generated differently across replications. The FDAC estimator is calculated by (4.23), and its asymptotic variance is estimated by the Delta method. "FDLS" denotes the first-difference least square estimator proposed by Han and Phillips (2010). "AH", "AAH", "AB", and "BB" denote the 2-step GMM estimators proposed by Anderson and Hsiao (1981, 1982), Chudik and Pesaran (2021), Arellano and Bond (1991), and Blundell and Bond (1998). The estimation is based on $\{y_{i1}, y_{i2}, \ldots, y_{iT}\}$ for i = 1, 2, ..., n. The nominal size of the tests is set to 5 per cent. The number of replications is 2,000. See also the notes to Table 4.1.

Figure 4.1: Empirical power functions for FDAC estimator in homogeneous and heterogeneous (a = 0.5) panel AR(1) models where errors are Gaussian and non-Gaussian distributed with GARCH effects

4.7.4 Comparison of FDAC and MSW estimators

This section compares the small-sample performance of the FDAC estimator with the kernel-weighted likelihood (MSW) estimator proposed by Mavroeidis et al. (2015). Table 4.4 reports bias, RMSE, and size of the FDAC and MSW estimators for $E(\phi_i)$ for T = 4, 6, 10, and n = 100, 1000, under heterogeneous $(\alpha_i, \phi_i)$ with $\mu_\phi = 0.4$, and Gaussian errors with GARCH effects. The left panel of the table reports the results for a = 0.3, and the right panel for a = 0.5. The performance of the FDAC estimator is in line with the results already discussed and, as noted earlier, is not affected by the degree of heterogeneity. In contrast, the performance of the MSW estimator seems to depend critically on the degree of heterogeneity.
When a = 0.3, and T = 4 or 8, the MSW estimator performs better than the FDAC estimator in terms of bias and RMSE, but exhibits mild size distortions when n = 100 and T = 6 or 10. However, the MSW estimator breaks down if we consider the results for a = 0.5. For example, the RMSE of the MSW estimator for T = 4 and n = 1,000 rises from 0.030 when a = 0.3 to 0.246 when a = 0.5. The size of the MSW estimator also rises from 10.7 per cent to 99.7 per cent as a is increased from 0.3 to 0.5, for the same sample sizes. [9]

[9] The performance of the MSW estimator under homogeneity is investigated in Appendix C, with the results summarized in Table C.6. Compared with the FDAC estimator, the MSW estimator has much greater bias, higher RMSE, and noticeable size distortions for most of the sample sizes, and for the values of $\phi$ we consider.

Table 4.4: Bias, RMSE, and size of FDAC and MSW estimators in a heterogeneous panel AR(1) model where $\phi_i = \mu_\phi + v_i$ and $v_i \sim IIDU(-a, a)$ with $\mu_\phi = 0.4$, $a \in \{0.3, 0.5\}$, and Gaussian errors with GARCH effects

Left block: medium degree of heterogeneity, a = 0.3. Right block: high degree of heterogeneity, a = 0.5.

T  n  |  a = 0.3: Bias (FDAC  MSW)  RMSE (FDAC  MSW)  Size ×100 (FDAC  MSW)  |  a = 0.5: Bias (FDAC  MSW)  RMSE (FDAC  MSW)  Size ×100 (FDAC  MSW)
4  100  |  -0.005  0.003  0.165  0.084  6.8  6.8  |  -0.005  0.248  0.168  0.300  7.3  34.4
4  1,000  |  0.001  0.007  0.050  0.030  3.8  10.7  |  0.001  0.240  0.052  0.246  4.4  99.7
6  100  |  0.002  0.012  0.099  0.084  6.0  4.4  |  0.004  0.260  0.106  0.310  5.8  36.2
6  1,000  |  0.000  0.023  0.031  0.036  5.1  15.5  |  0.000  0.261  0.034  0.266  4.9  100.0
10  100  |  0.000  0.020  0.070  0.089  6.4  4.7  |  0.001  0.265  0.078  0.315  6.5  37.4
10  1,000  |  0.001  0.024  0.022  0.038  4.3  15.2  |  0.001  0.263  0.024  0.269  4.5  100.0

Notes: The DGP is given by $y_{it} = \mu_i(1-\phi_i) + \phi_i y_{i,t-1} + u_{it}$ for i = 1, 2, ..., n, and t = -50, -49, ..., T with $\mu_i = \alpha_i/(1-\phi_i)$ featuring Gaussian errors with GARCH effects. The heterogeneous AR(1) coefficients are generated as case (a): $\phi_i = \mu_\phi + v_i$ and $v_i \sim IIDU(-a, a)$ with $\mu_\phi = 0.4$ and $a \in \{0.3, 0.5\}$. The FDAC estimator is calculated by (4.23), and its asymptotic variance is estimated by the Delta method.
"MSW" denotes the kernel-weighted likelihood estimator proposed by Mavroeidis et al. (2015) and calculated based on the assumption that $(\alpha_i, \phi_i) \mid y_{i1}$ follows a multivariate normal distribution $N(\mu, V)$ with initial values given by $\mu = (5, 0.5)$, $\sigma_\alpha = 2$, $\sigma_\phi = 0.4$, $corr(\alpha_i, \phi_i) = 0.5$ with $\sigma_u = 0.5$. The nominal size of the tests is set to 5 per cent. Due to the extensive computations required for the implementation of the MSW estimator the number of replications is set to 1,000. See also the notes to Table 4.1.

4.7.5 Initializations, FDAC and HomoGMM estimators

One of the key assumptions behind the FDAC estimator is the stationarity of $\{y_{it}\}$. This assumption requires that the initial values $\{y_{i0}\}$ are drawn from the steady state distribution of the underlying processes, which is given by

$$y_{i0} = \mu_i + v_i, \quad \text{and} \quad v_i \sim IID\left(0, \frac{\sigma_i^2}{1-\phi_i^2}\right), \qquad (4.40)$$

where $\mu_i = \alpha_i/(1-\phi_i)$. As shown in Section C.3 of Appendix C, the moment conditions (4.5) and (4.6) can also be derived under the above initial distribution, but need not hold if the distribution of $y_{i0}$ departs from the steady state distribution. The same is also true for some of the HomoGMM estimators. It is, therefore, of interest to investigate the sensitivity of the FDAC and HomoGMM estimators to departures from the steady state distribution in (4.40). Here we model this departure by assuming that $\{y_{it}\}$ starts from M periods before the first observation, $y_{i1}$, used in the estimation process. Assuming that all n AR(1) processes are generated from $y_{i,-M}$ ($|y_{i,-M}| < C$, $|\phi_i| < 1$) we have [10]

$$y_{i0} = \mu_i\left(1-\phi_i^M\right) + \phi_i^M y_{i,-M} + \sum_{s=0}^{M-1} \phi_i^s u_{i,-s}. \qquad (4.41)$$

It is clear that as $M \to \infty$, the distribution of $y_{i0}$ converges to the steady state distribution given by (4.40). This suggests that departures from the steady state distribution can be conveniently represented by using relatively small values of M.
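The role of M in equation (4.41) can be checked numerically with a short sketch. The helper names below are hypothetical. `initial_moments` assumes a fixed start value $y_{i,-M} = 0$ and shocks with constant variance, which is a simplification relative to the GARCH design, and corresponds to the normal distribution for $y_{i0}$ stated in the notes to Table 4.5.

```python
import numpy as np

def initial_value_closed_form(mu, phi, y_start, shocks):
    """Closed form for y_i0 when the process starts at y_{i,-M} = y_start.

    shocks[s] holds u_{i,-s} for s = 0, ..., M-1.
    """
    shocks = np.asarray(shocks, dtype=float)
    M = shocks.size
    weights = phi ** np.arange(M)
    return mu * (1.0 - phi ** M) + phi ** M * y_start + np.sum(weights * shocks)

def initial_value_by_recursion(mu, phi, y_start, shocks):
    """Same quantity obtained by iterating the AR(1) recursion forward M times."""
    shocks = np.asarray(shocks, dtype=float)
    y = y_start
    for s in range(shocks.size - 1, -1, -1):  # from u_{i,-(M-1)} up to u_{i,0}
        y = mu * (1.0 - phi) + phi * y + shocks[s]
    return y

def initial_moments(mu, phi, sigma2, M):
    """Mean and variance of y_i0 when y_{i,-M} = 0 and shocks have variance sigma2."""
    mean = mu * (1.0 - phi ** M)
    var = sigma2 * (1.0 - phi ** (2 * M)) / (1.0 - phi ** 2)
    return mean, var

# With phi = 0.4 the moments are already close to their steady state values for moderate M.
for M in (1, 2, 51):
    print(M, initial_moments(mu=1.5, phi=0.4, sigma2=1.0, M=M))
```

For M = 51 the mean and variance are numerically indistinguishable from $\mu_i$ and $\sigma_i^2/(1-\phi_i^2)$, which is why that case is treated as the stationary benchmark.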
To investigate the effects of departures from the steady state distribution we carry out a number of Monte Carlo experiments with M = 1 and M = 2, and setting $y_{i,-M} = 0$ for both values of M. The choice of $y_{i,-M}$ is not consequential when $M \to \infty$, but matters when M is small. Tables 4.5, 4.6, and 4.7 summarize bias, RMSE, and sizes of the FDAC, FDLS, AH, AAH, AB, and BB estimators for homogeneous and heterogeneous panels with M = 1, 2 and M = 51 (denoted by "∞") and Gaussian errors, where $\phi_i = \mu_\phi + v_i$ and $v_i \sim IIDU(-a, a)$ with $\mu_\phi = 0.4$ and $a \in \{0, 0.3, 0.5\}$, respectively.

In the homogeneous case (a = 0), when M = 1, the biases of FDAC, FDLS, and BB estimators are sizeable and do not vanish as n increases, and the size distortions also increase with n for a given T. This is particularly noticeable for the BB estimator. The AH, AAH, and AB estimators are robust to the initialization and have very similar performance across different values of M. This is in line with the underlying theory of these estimators.

In the case of heterogeneous panels, the FDAC estimator is adversely affected when M = 1 or 2, and displays bias and size distortions, particularly when there is a high degree of heterogeneity (namely a = 0.5). But, as is to be expected, the bias and size distortions disappear as M is increased. For the HomoGMM estimators, the neglected heterogeneity bias is much greater and the size distortions are much more serious when M = 1 or 2 as compared to the stationary case, shown as M = ∞. [11] It is clearly a challenge to simultaneously deal with heterogeneity of $\phi_i$ and the non-stationarity that arises when $y_{i0}$ are not drawn from the steady state distribution.

[10] One can also allow for heterogeneity in the way different processes are initialized, for example, by starting $y_{it}$ from $M_i$ periods in the past, with $M_i$ drawn randomly for each i from the set of integers (1 to 50).
[11] Table C.7 in Appendix C reports the simulation results comparing the FDAC and MSW estimators in the case of non-stationary outcome processes (M = 1, 2, 51), and confirms a similar pattern, with the performance of the MSW estimator deteriorating as we move from stationary to non-stationary processes.

Table 4.5: Bias, RMSE, and size of FDAC, FDLS, AH, AAH, AB, and BB estimators of $\phi$ ($\phi_0 = 0.4$) in a homogeneous panel AR(1) model under non-stationary (M = 1, 2) and stationary (M = ∞) initialization

In each estimator block the three columns correspond to M = 1, 2, ∞.

T  n  |  FDAC  |  FDLS  |  AH  |  AAH  |  AB  |  BB

Bias
4  100  |  0.016  0.004  0.002  |  0.012  0.003  0.001  |  1.709  -0.034  0.202  |  0.100  0.098  0.100  |  -0.050  -0.036  -0.030  |  0.151  0.061  0.015
4  1,000  |  0.015  0.003  0.001  |  0.010  0.001  -0.001  |  0.009  0.008  0.008  |  0.050  0.052  0.051  |  -0.005  -0.004  -0.003  |  0.165  0.061  0.002
4  5,000  |  0.015  0.003  0.000  |  0.011  0.003  0.001  |  0.004  0.003  0.003  |  0.003  0.004  0.004  |  0.000  0.000  0.000  |  0.169  0.063  0.001
6  100  |  0.008  0.001  0.000  |  0.008  0.003  0.002  |  -0.049  -0.049  -0.049  |  0.023  0.022  0.022  |  -0.054  -0.046  -0.041  |  0.091  0.036  0.012
6  1,000  |  0.006  0.000  -0.001  |  0.005  0.001  0.000  |  -0.004  -0.004  -0.004  |  0.000  0.000  0.000  |  -0.006  -0.005  -0.004  |  0.094  0.027  0.001
6  5,000  |  0.008  0.002  0.000  |  0.006  0.001  0.000  |  0.000  0.000  0.000  |  0.000  0.000  0.000  |  -0.001  0.000  0.000  |  0.096  0.027  0.001
10  100  |  0.005  0.002  0.001  |  0.004  0.002  0.001  |  -0.033  -0.033  -0.033  |  0.007  0.006  0.006  |  -0.035  -0.033  -0.031  |  0.034  0.015  0.007
10  1,000  |  0.003  0.000  -0.001  |  0.003  0.001  0.000  |  -0.004  -0.004  -0.004  |  0.000  0.000  0.000  |  -0.004  -0.004  -0.003  |  0.033  0.009  0.001
10  5,000  |  0.004  0.001  0.000  |  0.003  0.001  0.000  |  0.000  0.000  0.000  |  0.000  0.000  0.000  |  0.000  0.000  0.000  |  0.034  0.009  0.000

RMSE
4  100  |  0.166  0.167  0.167  |  0.147  0.146  0.146  |  53.877  10.797  4.182  |  0.308  0.306  0.308  |  0.340  0.296  0.269  |  0.207  0.161  0.145
4  1,000  |  0.056  0.054  0.054  |  0.048  0.047  0.047  |  0.160  0.158  0.157  |  0.185  0.189  0.186  |  0.102  0.087  0.079  |  0.172  0.080  0.048
4  5,000  |  0.028  0.024  0.024  |  0.024  0.021  0.021  |  0.069  0.068  0.068  |  0.050  0.051  0.051  |  0.044  0.038  0.034  |  0.170  0.067  0.022
6  100  |  0.099  0.099  0.099  |  0.101  0.100  0.100  |  0.177  0.175  0.174  |  0.143  0.142  0.142  |  0.157  0.144  0.136  |  0.141  0.105  0.094
6  1,000  |  0.033  0.032  0.032  |  0.033  0.033  0.032  |  0.056  0.055  0.055  |  0.029  0.029  0.029  |  0.047  0.044  0.041  |  0.100  0.041  0.028
6  5,000  |  0.016  0.014  0.014  |  0.016  0.015  0.015  |  0.025  0.025  0.025  |  0.013  0.013  0.013  |  0.021  0.020  0.019  |  0.097  0.031  0.013
10  100  |  0.066  0.066  0.066  |  0.073  0.073  0.073  |  0.089  0.088  0.088  |  0.067  0.066  0.066  |  0.087  0.084  0.081  |  0.073  0.062  0.059
10  1,000  |  0.021  0.021  0.021  |  0.024  0.023  0.023  |  0.026  0.026  0.026  |  0.016  0.016  0.016  |  0.024  0.023  0.022  |  0.038  0.019  0.016
10  5,000  |  0.010  0.009  0.009  |  0.011  0.010  0.010  |  0.011  0.011  0.011  |  0.007  0.007  0.007  |  0.010  0.010  0.010  |  0.035  0.012  0.007

Size (×100)
4  100  |  7.3  7.3  7.4  |  7.0  6.8  6.6  |  7.1  6.9  6.9  |  15.7  15.6  15.7  |  10.7  9.8  9.2  |  44.0  22.5  14.2
4  1,000  |  6.7  5.9  5.4  |  5.6  5.0  5.0  |  4.8  5.0  4.7  |  11.5  12.0  11.7  |  5.8  6.0  6.2  |  96.4  32.6  6.0
4  5,000  |  9.0  5.0  4.8  |  8.6  5.2  4.6  |  4.3  4.4  4.5  |  5.7  5.6  5.7  |  5.0  4.8  4.8  |  100.0  82.5  5.3
6  100  |  6.6  6.3  6.4  |  6.3  6.4  6.4  |  13.8  13.7  13.7  |  20.8  20.6  20.5  |  17.7  17.1  16.9  |  45.9  28.7  24.1
6  1,000  |  5.6  5.3  5.2  |  5.1  5.4  5.7  |  6.2  6.2  6.0  |  7.1  7.2  7.1  |  6.0  6.3  6.3  |  89.7  22.8  7.6
6  5,000  |  8.6  5.1  4.9  |  7.2  5.8  5.8  |  5.7  5.7  5.8  |  5.4  5.4  5.8  |  5.0  4.9  4.8  |  100.0  60.2  5.7
10  100  |  6.3  5.7  5.8  |  6.9  6.6  6.8  |  33.1  32.9  33.2  |  44.9  44.6  44.5  |  40.2  39.5  39.7  |  54.4  47.8  46.7
10  1,000  |  5.7  5.0  5.1  |  6.1  5.9  5.9  |  7.8  7.6  7.6  |  9.4  9.6  9.4  |  8.2  8.1  7.9  |  60.1  15.0  8.9
10  5,000  |  6.6  4.8  4.6  |  5.2  4.6  4.8  |  5.0  5.0  5.1  |  5.9  5.8  5.8  |  5.1  5.1  5.0  |  99.3  27.0  5.4

Notes: The DGP is given by $y_{it} = \mu_i(1-\phi_i) + \phi_i y_{i,t-1} + u_{it}$ for i = 1, 2, ..., n, and t = -50, -49, ..., T with Gaussian errors, where $\phi_i = \phi$ for i = 1, 2, ..., n with $\phi_0 = 0.4$. The initial values are generated as $y_{i0} \sim IIDN\left(\mu_i(1-\phi_i^M), \sigma_i^2(1-\phi_i^{2M})/(1-\phi_i^2)\right)$ with M = 1, 2 for the non-stationary case and M = 51 for the stationary case (denoted by "∞"). "FDLS" denotes the first-difference least square estimator proposed by Han and Phillips (2010).
"AH", "AAH", "AB", and "BB" denote the 2-step GMM estimators proposed by Anderson and Hsiao (1981, 1982), Chudik and Pesaran (2021), Arellano and Bond (1991), and Blundell and Bond (1998). The number of replications is 2,000. See also the notes to Table 4.1.

Table 4.6: Bias, RMSE, and size of FDAC, FDLS, AH, AAH, AB, and BB estimators in a heterogeneous panel AR(1) model under non-stationary (M = 1, 2) and stationary (M = ∞) initialization with $\phi_i = \mu_\phi + v_i$, $\mu_\phi = 0.4$, and $v_i \sim IIDU(-0.3, 0.3)$

In each estimator block the three columns correspond to M = 1, 2, ∞.

T  n  |  FDAC  |  FDLS  |  AH  |  AAH  |  AB  |  BB

Bias
4  100  |  0.081  0.034  0.001  |  0.034  0.002  -0.019  |  1.243  -0.180  -0.047  |  0.075  0.069  0.063  |  -0.211  -0.156  -0.069  |  0.243  0.137  0.005
4  1,000  |  0.078  0.032  0.001  |  0.029  -0.002  -0.022  |  -0.086  -0.072  -0.062  |  0.007  0.003  -0.001  |  -0.159  -0.111  -0.032  |  0.271  0.157  -0.015
4  5,000  |  0.077  0.031  0.000  |  0.030  -0.001  -0.021  |  -0.091  -0.076  -0.067  |  -0.008  -0.019  -0.027  |  -0.152  -0.105  -0.027  |  0.274  0.161  -0.018
6  100  |  0.045  0.018  -0.001  |  0.011  -0.007  -0.019  |  -0.121  -0.105  -0.094  |  0.020  0.011  0.006  |  -0.151  -0.126  -0.077  |  0.158  0.079  0.003
6  1,000  |  0.044  0.017  -0.001  |  0.008  -0.010  -0.021  |  -0.071  -0.060  -0.052  |  0.007  -0.003  -0.010  |  -0.094  -0.074  -0.031  |  0.177  0.077  -0.007
6  5,000  |  0.045  0.018  0.000  |  0.008  -0.010  -0.021  |  -0.068  -0.056  -0.049  |  0.007  -0.003  -0.010  |  -0.088  -0.069  -0.026  |  0.182  0.079  -0.007
10  100  |  0.026  0.011  0.002  |  -0.004  -0.014  -0.020  |  -0.072  -0.062  -0.055  |  0.007  0.002  -0.001  |  -0.082  -0.072  -0.050  |  0.058  0.029  0.004
10  1,000  |  0.023  0.009  0.000  |  -0.006  -0.015  -0.021  |  -0.040  -0.033  -0.028  |  0.010  0.005  0.001  |  -0.047  -0.040  -0.020  |  0.057  0.027  0.003
10  5,000  |  0.024  0.009  0.000  |  -0.006  -0.016  -0.022  |  -0.037  -0.030  -0.025  |  0.011  0.006  0.002  |  -0.043  -0.036  -0.016  |  0.061  0.030  0.005

RMSE
4  100  |  0.187  0.173  0.171  |  0.156  0.151  0.151  |  52.302  10.484  4.483  |  0.280  0.286  0.290  |  0.464  0.389  0.293  |  0.284  0.216  0.157
4  1,000  |  0.095  0.063  0.055  |  0.057  0.049  0.053  |  0.183  0.169  0.160  |  0.108  0.126  0.136  |  0.200  0.154  0.092  |  0.274  0.167  0.053
4  5,000  |  0.081  0.040  0.025  |  0.037  0.022  0.030  |  0.115  0.101  0.093  |  0.027  0.032  0.038  |  0.161  0.115  0.047  |  0.275  0.163  0.029
6  100  |  0.114  0.106  0.104  |  0.107  0.105  0.106  |  0.221  0.204  0.194  |  0.128  0.130  0.135  |  0.230  0.203  0.161  |  0.206  0.147  0.103
6  1,000  |  0.056  0.038  0.034  |  0.035  0.035  0.040  |  0.094  0.083  0.076  |  0.029  0.029  0.031  |  0.109  0.090  0.055  |  0.183  0.088  0.030
6  5,000  |  0.048  0.024  0.015  |  0.018  0.018  0.026  |  0.073  0.062  0.055  |  0.015  0.014  0.017  |  0.092  0.073  0.033  |  0.183  0.082  0.015
10  100  |  0.076  0.072  0.072  |  0.078  0.079  0.080  |  0.121  0.113  0.106  |  0.071  0.071  0.071  |  0.127  0.117  0.100  |  0.102  0.081  0.067
10  1,000  |  0.032  0.024  0.023  |  0.026  0.029  0.033  |  0.050  0.044  0.040  |  0.021  0.019  0.019  |  0.056  0.049  0.033  |  0.062  0.035  0.019
10  5,000  |  0.026  0.014  0.010  |  0.012  0.019  0.024  |  0.039  0.033  0.028  |  0.014  0.010  0.009  |  0.045  0.038  0.019  |  0.062  0.032  0.009

Size (×100)
4  100  |  9.3  7.7  7.8  |  7.9  6.7  7.3  |  9.8  9.3  9.2  |  13.5  15.4  16.3  |  18.3  16.0  11.7  |  66.9  40.6  16.9
4  1,000  |  29.3  9.2  5.6  |  9.8  5.6  7.4  |  13.8  12.0  10.8  |  7.0  10.2  12.8  |  29.8  23.5  9.3  |  100.0  88.8  7.9
4  5,000  |  88.5  24.6  4.9  |  29.6  5.1  16.4  |  28.2  23.0  20.3  |  7.0  13.4  20.1  |  77.4  61.5  12.7  |  100.0  100.0  13.5
6  100  |  9.4  6.8  6.3  |  6.5  6.4  6.0  |  20.0  18.4  17.8  |  18.0  18.8  20.3  |  29.0  27.0  20.8  |  62.7  39.6  26.4
6  1,000  |  28.3  8.6  5.0  |  5.8  6.6  10.7  |  25.2  20.6  17.7  |  5.8  6.3  8.2  |  41.0  33.6  13.4  |  99.1  69.8  8.4
6  5,000  |  84.2  23.2  5.7  |  9.7  10.7  29.6  |  69.2  57.8  48.6  |  7.3  5.7  11.9  |  91.9  82.4  27.3  |  100.0  99.6  10.0
10  100  |  8.8  6.6  6.3  |  6.3  6.9  7.0  |  42.8  40.4  38.9  |  40.6  41.1  43.5  |  52.6  50.0  45.3  |  60.3  52.6  49.4
10  1,000  |  17.4  7.0  5.8  |  6.2  9.8  14.8  |  34.8  28.6  23.2  |  9.4  6.8  6.3  |  45.6  37.6  18.3  |  81.0  39.8  9.7
10  5,000  |  64.8  15.4  5.2  |  7.8  29.5  51.0  |  80.3  66.8  54.5  |  20.6  7.6  3.6  |  91.4  83.2  30.2  |  100.0  90.9  9.2

Notes: The DGP is given by $y_{it} = \mu_i(1-\phi_i) + \phi_i y_{i,t-1} + u_{it}$ for i = 1, 2, ..., n, and t = -50, -49, ..., T with Gaussian errors, where $\phi_i = \mu_\phi + v_i$ and $v_i \sim IIDU(-a, a)$ with $\mu_\phi = 0.4$ and a = 0.3. M = 1, 2 denotes the non-stationary case for initial conditions, and "∞" denotes the case of stationary initial conditions. The FDAC estimator is calculated based on (4.23), and its asymptotic variance is estimated by the Delta method.
"FDLS" denotes the first-difference least square estimator proposed by Han and Phillips (2010). "AH", "AAH", "AB", and "BB" denote the 2-step GMM estimators proposed by Anderson and Hsiao (1981, 1982), Chudik and Pesaran (2021), Arellano and Bond (1991), and Blundell and Bond (1998). The number of replications is 2,000. See also the notes to Table 4.1.

Table 4.7: Bias, RMSE, and size of FDAC, FDLS, AH, AAH, AB, and BB estimators in a heterogeneous panel AR(1) model under non-stationary (M = 1, 2) and stationary (M = ∞) initialization with $\phi_i = \mu_\phi + v_i$, $\mu_\phi = 0.4$, and $v_i \sim IIDU(-0.5, 0.5)$

In each estimator block the three columns correspond to M = 1, 2, ∞.

T  n  |  FDAC  |  FDLS  |  AH  |  AAH  |  AB  |  BB

Bias
4  100  |  0.282  0.195  0.001  |  0.133  0.069  -0.057  |  0.136  -0.283  -0.096  |  0.076  0.047  0.002  |  -0.201  -0.235  -0.163  |  0.405  0.331  0.007
4  1,000  |  0.282  0.195  0.001  |  0.126  0.062  -0.061  |  -0.391  -0.302  -0.180  |  0.038  0.007  -0.064  |  -0.165  -0.244  -0.123  |  0.444  0.380  -0.052
4  5,000  |  0.280  0.193  0.000  |  0.125  0.062  -0.061  |  -0.391  -0.304  -0.185  |  0.034  0.003  -0.073  |  -0.182  -0.255  -0.115  |  0.448  0.386  -0.060
6  100  |  0.210  0.145  -0.001  |  0.080  0.035  -0.058  |  -0.378  -0.328  -0.181  |  0.091  0.055  -0.023  |  -0.155  -0.187  -0.163  |  0.326  0.247  0.005
6  1,000  |  0.210  0.146  -0.001  |  0.075  0.030  -0.061  |  -0.336  -0.279  -0.142  |  0.078  0.046  -0.032  |  -0.094  -0.144  -0.116  |  0.391  0.280  -0.031
6  5,000  |  0.211  0.146  0.000  |  0.075  0.030  -0.061  |  -0.332  -0.275  -0.139  |  0.078  0.046  -0.033  |  -0.091  -0.141  -0.111  |  0.403  0.289  -0.030
10  100  |  0.138  0.096  0.001  |  0.028  0.000  -0.059  |  -0.232  -0.224  -0.113  |  0.044  0.021  -0.025  |  -0.086  -0.108  -0.107  |  0.191  0.143  0.012
10  1,000  |  0.136  0.094  0.000  |  0.024  -0.003  -0.061  |  -0.212  -0.204  -0.093  |  0.028  0.012  -0.022  |  -0.069  -0.093  -0.082  |  0.171  0.092  -0.011
10  5,000  |  0.136  0.094  0.000  |  0.024  -0.004  -0.062  |  -0.207  -0.200  -0.090  |  0.028  0.012  -0.021  |  -0.064  -0.088  -0.078  |  0.176  0.090  -0.007

RMSE
4  100  |  0.337  0.268  0.177  |  0.222  0.184  0.167  |  11.466  5.309  1.785  |  0.233  0.232  0.265  |  0.686  0.604  0.371  |  0.429  0.371  0.198
4  1,000  |  0.288  0.203  0.057  |  0.138  0.083  0.080  |  0.430  0.341  0.225  |  0.066  0.058  0.108  |  0.321  0.310  0.165  |  0.446  0.383  0.077
4  5,000  |  0.282  0.195  0.025  |  0.128  0.067  0.065  |  0.398  0.312  0.194  |  0.041  0.024  0.077  |  0.222  0.267  0.125  |  0.448  0.387  0.065
6  100  |  0.247  0.193  0.112  |  0.153  0.129  0.125  |  0.456  0.398  0.252  |  0.163  0.137  0.129  |  0.330  0.313  0.231  |  0.365  0.301  0.130
6  1,000  |  0.215  0.151  0.037  |  0.086  0.050  0.071  |  0.345  0.288  0.153  |  0.084  0.057  0.045  |  0.152  0.169  0.128  |  0.396  0.289  0.045
6  5,000  |  0.212  0.147  0.016  |  0.077  0.035  0.064  |  0.334  0.277  0.142  |  0.079  0.048  0.036  |  0.108  0.147  0.114  |  0.404  0.291  0.033
10  100  |  0.168  0.132  0.080  |  0.100  0.092  0.103  |  0.271  0.260  0.155  |  0.115  0.097  0.086  |  0.186  0.181  0.149  |  0.238  0.194  0.092
10  1,000  |  0.139  0.098  0.026  |  0.039  0.029  0.067  |  0.217  0.208  0.099  |  0.040  0.030  0.033  |  0.090  0.104  0.088  |  0.182  0.104  0.027
10  5,000  |  0.136  0.095  0.011  |  0.027  0.014  0.063  |  0.208  0.201  0.092  |  0.031  0.017  0.024  |  0.070  0.091  0.080  |  0.178  0.093  0.013

Size (×100)
4  100  |  38.8  22.4  7.3  |  15.9  9.7  8.6  |  17.0  17.3  15.5  |  6.9  8.8  18.6  |  21.9  22.1  18.1  |  90.3  78.2  24.9
4  1,000  |  99.7  91.8  5.3  |  61.6  21.0  22.6  |  60.7  52.6  34.1  |  5.1  3.6  28.5  |  37.5  46.2  28.2  |  100.0  100.0  20.4
4  5,000  |  100.0  100.0  4.5  |  99.9  71.1  76.0  |  99.5  98.2  85.4  |  24.0  3.4  80.7  |  63.9  90.3  70.1  |  100.0  100.0  66.7
6  100  |  40.5  25.9  6.7  |  11.6  7.8  10.0  |  50.5  48.1  31.6  |  26.8  19.8  19.9  |  42.8  42.4  35.8  |  85.4  74.4  37.1
6  1,000  |  99.7  94.3  5.1  |  45.0  12.4  39.9  |  98.6  96.6  70.8  |  75.0  36.1  17.8  |  41.7  58.9  62.4  |  100.0  99.8  21.3
6  5,000  |  100.0  100.0  5.4  |  97.8  39.7  96.6  |  100.0  100.0  100.0  |  100.0  88.9  58.1  |  69.3  97.0  99.3  |  100.0  100.0  55.0
10  100  |  33.3  21.4  6.3  |  8.5  7.3  14.2  |  76.7  77.9  56.3  |  48.9  45.2  44.2  |  60.1  62.5  61.0  |  83.4  76.4  61.5
10  1,000  |  99.1  90.5  5.9  |  13.6  5.3  62.8  |  99.8  99.8  86.0  |  28.3  13.8  17.4  |  54.9  74.5  80.2  |  98.7  83.9  17.2
10  5,000  |  100.0  100.0  5.2  |  42.2  5.8  100.0  |  100.0  100.0  100.0  |  70.1  21.8  45.1  |  88.6  99.7  100.0  |  100.0  99.9  14.8

Notes: The DGP is given by $y_{it} = \mu_i(1-\phi_i) + \phi_i y_{i,t-1} + u_{it}$ for i = 1, 2, ..., n, and t = -50, -49, ..., T with Gaussian errors, where $\phi_i = \mu_\phi + v_i$ and $v_i \sim IIDU(-a, a)$ with $\mu_\phi = 0.4$ and a = 0.5. M = 1, 2 denotes the non-stationary case for initial conditions, and "∞" denotes the case of stationary initial conditions.
The FDAC estimator is calculated based on (4.23), and its asymptotic variance is estimated by the Delta method. "FDLS" denotes the first-difference least square estimator proposed by Han and Phillips (2010). "AH", "AAH", "AB", and "BB" denote the 2-step GMM estimators proposed by Anderson and Hsiao (1981, 1982), Chudik and Pesaran (2021), Arellano and Bond (1991), and Blundell and Bond (1998). The number of replications is 2,000. See also the notes to Table 4.1.

4.7.6 Simulation results for the categorical distribution parameters

Section 4.5 has already shown that, assuming $\phi_i$ follow a categorical distribution with a finite number of categories, the parameters of the underlying categorical distribution can be identified from the moments of $\phi_i$. Here we investigate the finite sample performance of estimating the parameters in the simple case of two categories, namely $\phi_L$, $\phi_H$ and $\pi$. Since the procedure for estimating these parameters is based on the first three moments, we need $T \geq 6$. See equation (4.25). Precise estimation of $\phi_L$ and $\phi_H$ also requires quite large samples. Accordingly, we consider the following sample sizes: T = 6, 8, 10, and n = 2,000, 5,000, 10,000, 50,000.

Table 4.8 reports the bias and RMSE of the plug-in estimators given by (4.18) and (4.22), using the FDAC estimators of the moments of $\phi_i$. Compared with the simulation results for $E(\phi_i)$ and $Var(\phi_i)$, the FDAC estimators of $\phi_L$, $\phi_H$, and $\pi$ have much larger RMSEs. Also, the magnitude of bias and RMSE of $\phi_L$ and $\phi_H$ are much larger than those of $\pi$, particularly when T = 6. Since the moment conditions of the categorical distribution are linear in $\pi$ but nonlinear in $\phi_L$ and $\phi_H$, the nonlinearity plays a crucial role here. Given equation (4.22), the solutions of $\phi_L$ and $\phi_H$ are ratios of functions of moments. The denominator is given by the variance of $\phi_i$, which could be close to zero in finite samples, and thus, minor estimation errors in the denominator could have large adverse effects on the precision with which $\phi_L$ and $\phi_H$ can be estimated.
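To illustrate why the variance in the denominator matters, the following sketch recovers the parameters of a two-point distribution from its first three raw moments. It uses the standard method-of-moments solution for a two-point distribution and is offered only as an illustration: it is not claimed to reproduce equations (4.18) and (4.22) term by term, and the function name `two_point_from_moments` is hypothetical.

```python
import math

def two_point_from_moments(m1, m2, m3):
    """Recover (phi_L, phi_H, pi) with Pr(phi = phi_L) = pi from raw moments.

    phi_L and phi_H are the roots of x^2 + b1 x + b0 = 0, where b0 and b1
    solve  m2 + b1 m1 + b0 = 0  and  m3 + b1 m2 + b0 m1 = 0.
    Both coefficients are ratios whose denominator is the variance m2 - m1^2.
    """
    var = m2 - m1 ** 2
    if var <= 0.0:
        raise ValueError("estimated variance is not positive")
    b1 = -(m3 - m1 * m2) / var
    b0 = (m1 * m3 - m2 ** 2) / var
    disc = b1 ** 2 - 4.0 * b0
    if disc <= 0.0:
        raise ValueError("moments are not consistent with two distinct points")
    root = math.sqrt(disc)
    phi_L = (-b1 - root) / 2.0
    phi_H = (-b1 + root) / 2.0
    pi = (phi_H - m1) / (phi_H - phi_L)
    return phi_L, phi_H, pi

# Exact moments for phi_L = 0.2, phi_H = 0.8, pi = 0.3 (the categorical design).
pi0, lo, hi = 0.3, 0.2, 0.8
moments = [pi0 * lo ** k + (1 - pi0) * hi ** k for k in (1, 2, 3)]
print(two_point_from_moments(*moments))
```

Because the variance enters both ratios as a divisor, a small estimation error in the second moment shifts the implied support points by a much larger amount when the variance itself is small.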
RMSEs of the FDAC estimators of $\phi_L$, $\phi_H$, and $\pi$ decline rapidly with n for a given T. For example, RMSE of $\phi_H$ declines from 0.217 for T = 8 and n = 2,000, to 0.068 for T = 8 and n = 10,000. Similar results are obtained by Gao and Pesaran (2023) in the case of pure cross-sectional regressions where it is also shown that relatively large values of n are required for precise estimation of the parameters of the categorical distribution.

[12] On the precision with which $\phi_L$ and $\phi_H$ can be estimated, see also Remark 15 on p. 26 in Gao and Pesaran (2023).

Table 4.8: Bias and RMSE of FDAC estimator of categorical distribution parameters $(\phi_L, \phi_H, \pi)'$ with Gaussian errors and GARCH effects

Block 1: $\phi_i = 0.4 + v_i$, $v_i \sim IIDU(-0.3, 0.3)$, with $\phi_L = 0.11$, $\phi_H = 0.69$, $\pi = 0.5$.
Block 2: $\phi_i = 0.4 + v_i$, $v_i \sim IIDU(-0.5, 0.5)$, with $\phi_L = 0.11$, $\phi_H = 0.69$, $\pi = 0.5$.
Block 3: categorically distributed $\phi_i$, with $\phi_L = 0.2$, $\phi_H = 0.8$, $\pi = 0.3$.
Within each block the columns are (Bias, RMSE) for $\phi_L$, then $\phi_H$, then $\pi$.

T  n  |  Block 1: $\phi_L$  $\phi_H$  $\pi$  |  Block 2: $\phi_L$  $\phi_H$  $\pi$  |  Block 3: $\phi_L$  $\phi_H$  $\pi$
6  2,000  |  -0.108 0.663   0.225 2.782   0.019 0.309  |  0.434 15.414   0.710 17.534   0.021 0.442  |  -0.373 5.974   0.321 2.476   0.103 0.336
6  5,000  |  -0.036 0.193   0.073 0.264   0.016 0.239  |  0.034 9.148   1.786 58.043   0.020 0.405  |  -0.042 0.271   0.084 0.678   0.068 0.256
6  10,000  |  -0.017 0.129   0.035 0.145   0.012 0.188  |  -0.216 1.438   0.405 2.856   0.015 0.375  |  -0.018 0.183   0.031 0.119   0.043 0.191
6  50,000  |  -0.006 0.058   0.006 0.054   0.001 0.094  |  -0.047 0.163   0.061 0.211   0.003 0.277  |  -0.004 0.084   0.005 0.035   0.010 0.084
8  2,000  |  -0.041 0.216   0.089 0.784   0.020 0.252  |  -0.460 7.728   3.726 166.514   0.023 0.412  |  -0.067 0.678   0.130 1.028   0.079 0.272
8  5,000  |  -0.012 0.120   0.033 0.134   0.015 0.179  |  -0.099 1.337   0.296 4.758   0.023 0.367  |  -0.010 0.172   0.029 0.104   0.045 0.185
8  10,000  |  -0.005 0.085   0.017 0.084   0.011 0.134  |  -0.077 0.267   0.188 0.864   0.027 0.329  |  -0.007 0.121   0.012 0.061   0.022 0.130
8  50,000  |  -0.002 0.039   0.003 0.035   0.002 0.063  |  -0.018 0.100   0.030 0.115   0.011 0.215  |  0.000 0.055   0.002 0.021   0.006 0.054
10  2,000  |  -0.023 0.208   0.054 0.235   0.016 0.214  |  -0.280 5.066   0.211 9.146   0.021 0.396  |  -0.021 0.225   0.053 0.218   0.065 0.233
10  5,000  |  -0.010 0.098   0.019 0.098   0.009 0.150  |  -0.104 0.318   0.374 5.242   0.017 0.344  |  -0.006 0.138   0.017 0.068   0.031 0.147
10  10,000  |  -0.004 0.068   0.010 0.064   0.006 0.110  |  -0.052 0.193   0.100 0.314   0.019 0.299  |  -0.005 0.099   0.008 0.044   0.015 0.103
10  50,000  |  -0.001 0.031   0.002 0.027   0.002 0.050  |  -0.011 0.077   0.020 0.082   0.012 0.182  |  -0.001 0.045   0.002 0.017   0.003 0.044

Notes: The DGP is given by $y_{it} = \mu_i(1-\phi_i) + \phi_i y_{i,t-1} + u_{it}$ for i = 1, 2, ..., n, and t = -50, -49, ..., T with $\mu_i = \alpha_i/(1-\phi_i)$ featuring Gaussian errors with GARCH effects. The heterogeneous AR(1) coefficients are generated by case (a): uniform distribution $\phi_i = \mu_\phi + v_i$ and $v_i \sim IIDU(-a, a)$ with $\mu_\phi = 0.4$ and $a \in \{0.3, 0.5\}$, and case (b): categorical distribution $\Pr(\phi_i = \phi_L) = \pi$ and $\Pr(\phi_i = \phi_H) = 1 - \pi$ with $\pi = 0.3$, $\phi_L = 0.2$, and $\phi_H = 0.8$ such that $E(\phi_i) = 0.62$. See Section S.2.1 for the description of the DGP. The FDAC estimator is calculated by (4.18) and (4.22). The number of replications is 2,000.

4.8 Empirical application: heterogeneity in earnings dynamics

4.8.1 Literature review of estimation of earnings dynamics

Estimating earnings equations is crucial for answering some of the most important economic questions. [13] Variance of earnings has been modeled and decomposed to measure income uncertainties in Lillard and Weiss (1979), MaCurdy (1982), Carroll and Samwick (1997), Meghir and Pistaferri (2004), and Altonji et al. (2013), and to quantify earnings mobility in Lillard and Willis (1978) and Geweke and Keane (2000). The covariance structures between earnings and other household characteristics, for example, work hours, consumption, and savings, have been studied by Abowd and Card (1989), Hubbard et al. (1995), Guvenen (2007), and Alan et al. (2018).

Among these studies, a homogeneous AR or ARMA process is often used as a component when modeling innovations in earnings processes. Based on the Restricted Income Profiles model that assumes homogeneous linear trends proposed in MaCurdy (1982), MaCurdy (1982) and Hubbard et al.
(1995) obtained estimates of the AR(1) coefficient close to a unit root, ranging from 0.946 to 0.998.[14] Following this literature, a unit root assumption was imposed in Carroll and Samwick (1997) and Meghir and Pistaferri (2004). On the other hand, using the Heterogeneous Income Profiles model, which assumes unit-specific linear trends, Lillard and Weiss (1979) obtained estimates of the AR(1) coefficient (assumed to be homogeneous) ranging from 0.153 to 0.860 for a sample of PhD holders. Guvenen (2009), using PSID data, obtained estimates ranging from 0.809 to 0.899.[15]

There are also a number of studies that allow for heterogeneity in the AR(1) coefficients. Prominent examples are Browning et al. (2010), Alan et al. (2018), and Gu and Koenker (2017). These studies are typically based on panels with a moderate time dimension and make parametric assumptions regarding the distribution of the AR(1) coefficients, often using a Bayesian framework.[16] The application of the FDAC estimator to earnings equations allows for heterogeneity in the AR(1) coefficients without making any strong parametric assumptions, even when $T$ is as small as 5. Also, because of first-differencing prior to estimation, the FDAC estimator is robust to observed individual-specific characteristics and is not subject to the mis-specification bias that could arise when log real wages are filtered for individual-specific characteristics before investigating the dynamics of the earnings process.

[13] See p. 58 in Guvenen (2009) for a brief summary of several economic inquiries hinging on the estimation of earnings functions.
[14] See Table 5 on p. 111 in MaCurdy (1982) using an ARMA(1,1) process. See Table 2 on p. 380 in Hubbard et al. (1995) based on an AR(1) process.
[15] See Tables 2, 4, 6 and 7 in Lillard and Weiss (1979), Table 1 on p. 64 in Guvenen (2009), and the abstract of Gu and Koenker (2017).
4.8.2 A heterogeneous panel AR(1) model of earnings dynamics with linear trends

We consider estimating the earnings equation with fixed effects and heterogeneous autoregressive coefficients, without imposing any restrictions on the joint distribution of $\alpha_i$, $\phi_i$, and $y_{i0}$. However, to accommodate growth in real earnings, we extend our baseline model in (4.1) to allow for linear trends:

$$y_{it} = \alpha_i + g_i(1-\phi_i)t + \phi_i y_{i,t-1} + u_{it}, \qquad (4.42)$$

where $y_{it} = \log(earnings_{it}/p_t)$, $earnings_{it}$ is the reported earnings of individual $i$ in year $t$, $p_t$ is a general price index, and $g_i$ denotes the growth rate of real earnings for individual $i$. The above equation can be written equivalently as

$$\tilde{y}_{it}(g_i) = b_i + \phi_i \tilde{y}_{i,t-1}(g_i) + u_{it},$$

where $\tilde{y}_{it}(g_i) = y_{it} - g_i t$ and $b_i = \alpha_i - g_i\phi_i$. The steady state distribution of $y_{it}$ can now be derived using

$$y_{it} = \frac{b_i}{1-\phi_i} + g_i t + \sum_{s=0}^{\infty} \phi_i^{s} u_{i,t-s}. \qquad (4.43)$$

[16] See pp. 227–232 in Browning and Ejrnæs (2013) for a comprehensive survey of heterogeneity in parameters of earnings functions.

When $T$ is sufficiently large, the individual-specific growth rates, $g_i$, can be estimated $\sqrt{T}$-consistently by running individual least squares regressions of $y_{it}$ on an intercept and a linear trend, and the residuals from these regressions can then be used to estimate the moments of $\phi_i$. This approach requires $n$ and $T$ to be both large. In the present empirical application, where $T$ is short (5 or 10), we provide estimates of the moments of $\phi_i$ assuming that $g_i = g$ for individuals within a given group, but allow $g$ to differ across groups, classified by educational attainment.
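As a check on the detrended representation of (4.42), the algebra can be written out step by step. This is a sketch under stated assumptions: $\alpha_i$ is taken to denote the individual effect and $\phi_i$ the AR(1) coefficient in (4.42) (symbol names inferred from context), and the last line additionally assumes that the detrended process is in its steady state with mean-zero errors.

```latex
\begin{align*}
\tilde{y}_{it}(g_i) + g_i t
  &= \alpha_i + g_i(1-\phi_i)t
     + \phi_i\left[\tilde{y}_{i,t-1}(g_i) + g_i(t-1)\right] + u_{it} \\
  &= \alpha_i - g_i\phi_i + g_i t + \phi_i \tilde{y}_{i,t-1}(g_i) + u_{it}, \\
\tilde{y}_{it}(g_i)
  &= \underbrace{(\alpha_i - g_i\phi_i)}_{b_i}
     + \phi_i \tilde{y}_{i,t-1}(g_i) + u_{it}, \\
\Delta y_{it}
  &= g_i + \Delta \tilde{y}_{it}(g_i),
  \qquad E\left[\Delta \tilde{y}_{it}(g_i)\right] = 0
  \;\Rightarrow\; E(\Delta y_{it}) = g_i .
\end{align*}
```

The third line shows that the detrended series follows an AR(1) without a trend, and the last line shows why, when $g_i = g$, the average first difference of $y_{it}$ is a natural estimator of the common trend.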
$\sqrt{n}$-consistent estimators of $g$ can be obtained either from the pooled regression of $y_{it}$ on fixed effects and a common linear trend, namely

$$\hat{g}_{FE} = \frac{\sum_{t=1}^{T} (\bar{y}_{\circ t} - \bar{y})\, t}{\sum_{t=1}^{T} \left(t - \frac{T+1}{2}\right)^{2}}, \qquad (4.44)$$

with $\bar{y}_{\circ t} = n^{-1}\sum_{i=1}^{n} y_{it}$ and $\bar{y} = T^{-1} n^{-1} \sum_{i=1}^{n}\sum_{t=1}^{T} y_{it}$, or after first-differencing of (4.43) (with $g_i = g$) by

$$\hat{g}_{FD} = \frac{\sum_{t=2}^{T}\sum_{i=1}^{n} \Delta y_{it}}{n(T-1)}. \qquad (4.45)$$

For small values of $T$ there is little to choose between these two estimators, and they are identical when $T = 2$. Given either of the above estimators, generically denoted by $\hat{g}$, $\tilde{y}_{it}(\hat{g}) = y_{it} - \hat{g}t$ can now be used to estimate the moments of $\phi_i$ using the FDAC or MSW procedures.[17]

[17] Consistent estimation of $E(\phi_i)$ in the presence of heterogeneity in both $\phi_i$ and $g_i$ requires moderate to large values of $T$. The approach used in the empirical literature whereby $y_{it}$ are first de-meaned and de-trended for each $i$ prior to the estimation of $E(\phi_i)$ is subject to Nickell (1981) bias in the case of short $T$ panels, even if $E(\phi_i) = \phi$.

In addition to the FDAC estimates, we also present estimates based on four other methods: the AAH, AB, and BB estimators proposed by Chudik and Pesaran (2021), Arellano and Bond (1991), and Blundell and Bond (1998), respectively, which assume homogeneous slope coefficients, and the MSW estimator of Mavroeidis et al. (2015). Following Meghir and Pistaferri (2004), individuals in each time series sample are divided into three education categories, where “HSD” refers to high school dropouts with less than 12 years of education, “HSG” refers to high school graduates with at least 12 but less than 16 years of education, and “CLG” refers to college graduates with at least 16 years of education.[18] To allow for possible time variations in the estimates of mean earnings persistence, we provide estimates for five and ten yearly non-overlapping sub-periods. The five yearly samples are 1976–1980, 1981–1985, 1986–1990, and 1991–1995. The ten yearly samples are 1976–1985 and 1986–1995.
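The two trend estimators in (4.44) and (4.45) are simple enough to sketch in code. The snippet below is an illustration only, not the author's implementation: the function names `trend_fe` and `trend_fd` and the simulated panel are assumptions introduced here, and the data are taken to be stored as an $n \times T$ array of $y_{it}$ with columns ordered $t = 1, \ldots, T$.

```python
import numpy as np


def trend_fe(y):
    """Pooled fixed-effects trend estimator in the spirit of (4.44).

    y is an (n, T) array whose columns correspond to t = 1, ..., T.
    """
    n, T = y.shape
    t = np.arange(1, T + 1)
    ybar_t = y.mean(axis=0)  # cross-section average for each period
    ybar = y.mean()          # overall average
    num = np.sum((ybar_t - ybar) * t)
    den = np.sum((t - (T + 1) / 2) ** 2)
    return num / den


def trend_fd(y):
    """First-difference trend estimator in the spirit of (4.45)."""
    n, T = y.shape
    dy = np.diff(y, axis=1)  # Delta y_it for t = 2, ..., T
    return dy.sum() / (n * (T - 1))


if __name__ == "__main__":
    # Hypothetical panel: individual effects plus a common trend g = 0.02.
    rng = np.random.default_rng(0)
    n, T, g = 500, 5, 0.02
    t = np.arange(1, T + 1)
    y = rng.normal(size=(n, 1)) + g * t + 0.1 * rng.normal(size=(n, T))
    print(trend_fe(y), trend_fd(y))
```

Note that `trend_fd` only uses the first and last cross-section averages, since the sum of first differences telescopes, whereas `trend_fe` uses all periods. With $T = 2$ the two functions return the same value.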
For each sub-period, we provide estimates for all categories combined, as well as separate estimates for the three educational sub-categories.[19] To save space, the results for the last five and ten yearly samples are given in the paper. The estimates for the earlier sub-periods are provided in Appendix C.

Table 4.9 gives the estimates of mean earnings persistence, $E(\phi_i)$, and the common linear trend coefficient, $g$, for the sub-periods 1991–1995 ($T = 5$) and 1986–1995 ($T = 10$). The estimates of $g$ are on average around 2 per cent per annum, with some modest variations across the sub-samples and educational categories. The HomoGMM estimates (AAH, AB and BB) differ a great deal, both over sub-periods and across educational categories. The AAH estimates are all around 0.50 and show little variation across the two sub-periods and the educational categories. The AB estimates tend to be quite low and are not statistically significant for two of the educational categories in the shorter sub-period ($T = 5$). In contrast, the BB estimates are much larger and in many instances are close to unity. For example, for the sub-period 1986–1995 ($T = 10$), the BB estimates of earnings persistence for the three educational categories HSD, HSG and CLG are 0.923 (0.003), 0.914 (0.003) and 0.992 (0.004), respectively, with standard errors in brackets.

We also find sizeable differences in the estimates of mean earnings persistence when we consider the FDAC and MSW estimators. The MSW estimates are all around 0.45 and do not vary with the level of educational attainment. In contrast, the FDAC estimates are somewhat larger (in the range 0.567–0.734) and rise with the level of educational attainment. This pattern can be seen in both sub-periods.
For example, for the longer sub-period (1986–1995), the mean persistence for the HSD, HSG and CLG categories is estimated to be 0.580 (0.071), 0.611 (0.028) and 0.734 (0.040), respectively. Similar results are obtained for the other sub-periods. See Tables 4.9, C.20, and C.21 of Appendix C. It is interesting that the higher earnings persistence of the college graduate category is a prominent feature of the FDAC estimates for all sub-periods. This result is also in line with a number of theoretical arguments advanced in the literature in terms of the higher mobility of college graduates and their relative job stability. For example, see Carroll and Samwick (1997) and Carneiro et al. (2023).

Although we have not developed a formal statistical test of the heterogeneity of $\phi_i$, the estimates of $Var(\phi_i)$ provide a good indication of the degree of within-group heterogeneity. Estimates of $Var(\phi_i)$ based on the MSW and FDAC procedures for the various sub-periods are given in Tables C.22–C.24 of Appendix C. The FDAC estimates are much larger than the MSW estimates. For example, for the sub-period 1991–1995 the MSW estimates of $Var(\phi_i)$ are all around 0.011 with standard errors in the range of 0.004–0.009, whilst the FDAC estimates of $Var(\phi_i)$ for the same sub-period are 0.241 (0.100), 0.081 (0.054) and 0.091 (0.09) for the three educational categories of HSD, HSG and CLG, respectively. The degree of within-group heterogeneity also seems to vary over time. For example, for the longer sub-period (1986–1995), the FDAC estimates of $Var(\phi_i)$ are generally larger with lower standard errors for all educational categories.

[18] The sample for all individuals in both 5 and 10 yearly samples covered 3,113 individuals with consecutive observations of nine years or more, and 36,325 individual-year observations.
[19] From 1997 PSID data are updated every two years. We confine our analysis to the years 1976 to 1995 to construct panels with 5 and 10 consecutive years.
Table 4.9: Estimates of mean persistence ($\mu_\phi = E(\phi_i)$) of log real earnings in a panel AR(1) model with a common linear trend using PSID data over the sub-periods 1991–1995 and 1986–1995

                              1991–1995, T = 5                        1986–1995, T = 10
                         All       Category by education         All       Category by education
                      categories    HSD      HSG      CLG      categories    HSD      HSG      CLG
Homogeneous slopes
AAH                      0.526     0.490    0.547    0.447       0.546     0.569    0.535    0.522
                        (0.046)   (0.072)  (0.061)  (0.072)     (0.028)   (0.024)  (0.033)  (0.038)
AB                       0.278     0.105    0.320   -0.013       0.311     0.310    0.335    0.232
                        (0.081)   (0.147)  (0.097)  (0.133)     (0.039)   (0.045)  (0.044)  (0.070)
BB                       0.488     0.872    0.602    0.964       0.880     0.923    0.914    0.992
                        (0.059)   (0.031)  (0.042)  (0.074)     (0.004)   (0.003)  (0.003)  (0.004)
Heterogeneous slopes
FDAC                     0.586     0.582    0.567    0.635       0.636     0.580    0.611    0.734
                        (0.042)   (0.132)  (0.056)  (0.065)     (0.023)   (0.071)  (0.028)  (0.040)
MSW                      0.437     0.431    0.436    0.452       0.458     0.459    0.452    0.460
                        (0.040)   (0.044)  (0.043)  (0.045)     (0.054)   (0.038)  (0.046)  (0.063)
Common linear trend      0.023     0.008    0.027    0.020       0.019     0.024    0.020    0.013
n                        1,366       127      832      407       1,139       109      689      341

Notes: The estimates are based on the heterogeneous panel AR(1) model with a common linear trend, $y_{it} = \alpha_i + g(1-\phi_i)t + \phi_i y_{i,t-1} + u_{it}$, where $y_{it} = \log(earnings_{it}/p_t)$, using PSID data over the sub-periods 1991–1995 and 1986–1995. “HSD” refers to high school dropouts with less than 12 years of education, “HSG” refers to high school graduates with at least 12 but less than 16 years of education, and “CLG” refers to college graduates with at least 16 years of education. The estimate of the trend, $\hat{g}_{FD}$, is computed by (4.45). The estimation of $\mu_\phi$ is then based on $\tilde{y}_{it} = y_{it} - \hat{g}_{FD}t$ for $t = 1, 2, \ldots, T$. “AAH”, “AB”, and “BB” denote different 2-step GMM estimators proposed by Chudik and Pesaran (2021), Arellano and Bond (1991), and Blundell and Bond (1998). The FDAC estimator is calculated by (4.23), and its asymptotic variance is estimated by the Delta method. “MSW” denotes the kernel-weighted estimator in Mavroeidis et al. (2015) and is calculated based on a parametric assumption that $(\alpha_i, \phi_i) \mid y_{i1}$ follows a multivariate normal distribution $N(\mu, V)$ with initial values given by $\mu = (5, 0.5)$, $\sigma_\alpha = 2$, $\sigma_\phi = 0.4$, $corr(\alpha_i, \phi_i) = 0.5$ with $\sigma_u = 0.5$.

4.9 Conclusion

This paper considers the estimation of heterogeneous panel AR(1) models with short (and fixed) $T$, as $n \to \infty$. It allows for fixed effects and proposes estimating the moments of the AR(1) coefficients, $E(\phi_i^s)$, for $s = 1, 2, \ldots, S$, using the autocorrelation function of first differences. We also show how estimates of $E(\phi_i^s)$ can be used to identify the underlying distribution of $\phi_i$, assuming they follow categorical distributions. It is also shown that the standard GMM estimators proposed in the literature for short $T$ panels are inconsistent in the presence of slope heterogeneity. Analytical expressions for this bias are derived and shown to be very close to estimates obtained from stochastic simulations.

The small sample properties of the proposed estimators are investigated using Monte Carlo experiments. It is shown that the estimator based on the autocorrelations of first differences (denoted by FDAC and set out in sub-section 4.6.1) performs much better than the GMM estimator based on autocovariances of the first differences (set out in sub-section 4.6.2). Focussing on the FDAC estimator, we find that it performs well even under homogeneous AR(1) coefficients. The magnitudes of bias and RMSE of the FDAC estimators are comparable to those of the HomoGMM estimators, and the size of the tests based on the FDAC estimator is mostly around the 5 per cent nominal level. The simulation results also confirm the presence of neglected heterogeneity bias in the case of HomoGMM estimators, which can result in substantial size distortions. The FDAC estimator is shown to be reliable for stationary heterogeneous panel AR(1) models.
The simulation results also show that the FDAC estimator is robust to different distributions of autoregressive coefficients and error processes. However, when the initialization of the outcome process deviates from the steady state distribution, the FDAC estimator could suffer from bias and size distortions. We also find that large sample sizes are required for estimation of the parameters of the categorical distribution.

The utility of the FDAC estimator is illustrated by applying it to the 1976–1995 PSID data using heterogeneous AR(1) panels in log real earnings with a common linear trend. We provide estimates of $E(\phi_i)$ and $Var(\phi_i)$ over a number of 5 and 10 yearly sub-periods, with 3 educational groupings. The estimates of $E(\phi_i)$ differ systematically across the education groups, with the mean persistence of real earnings rising with the level of educational attainment (high school dropouts, high school graduates, and college graduates). The estimates of $Var(\phi_i)$ differ across periods and levels of educational attainment, but do not display any particular patterns.

It is, however, important to acknowledge that the scope of the present paper is limited, with a number of remaining challenges: (a) allowing for individual-specific time-varying covariates, such as heterogeneous time trends, and (b) simultaneously dealing with heterogeneity and non-stationary initialization. It is not clear that such extensions will be possible without relaxing the assumption that $T$ is short and fixed, as $n \to \infty$. But these are clearly important topics for future research.

References

Abadie, A. and M. D. Cattaneo (2018). Econometric methods for program evaluation. Annual Review of Economics 10, 465–503.
Abowd, J. M. and D. Card (1989). On the covariance structure of earnings and hours changes. Econometrica 57, 411–445.
Alan, S., M. Browning, and M. Ejrnæs (2018). Income and consumption: a micro semistructural analysis with pervasive heterogeneity. Journal of Political Economy 126, 1827–1864.
Allegretto, S., A. Dube, M. Reich, and B. Zipperer (2017). Credible research designs for minimum wage studies: a response to Neumark, Salas, and Wascher. ILR Review 70, 559–592.
Allegretto, S. A., A. Dube, and M. Reich (2011). Do minimum wages really reduce teen employment? Accounting for heterogeneity and selectivity in state panel data. Industrial Relations: A Journal of Economy and Society 50, 205–240.
Altonji, J. G., A. A. Smith Jr, and I. Vidangos (2013). Modeling earnings dynamics. Econometrica 81, 1395–1454.
Alvarez, J. and M. Arellano (2003). The time series and cross-section asymptotics of dynamic panel data estimators. Econometrica 71, 1121–1159.
Anderson, T. W. and C. Hsiao (1981). Estimation of dynamic models with error components. Journal of the American Statistical Association 76, 598–606.
Anderson, T. W. and C. Hsiao (1982). Formulation and estimation of dynamic models using panel data. Journal of Econometrics 18, 47–82.
Arellano, M., R. Blundell, and S. Bonhomme (2017). Earnings and consumption dynamics: a nonlinear panel data framework. Econometrica 85, 693–734.
Arellano, M. and S. Bond (1991). Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations. The Review of Economic Studies 58, 277–297.
Arellano, M. and S. Bonhomme (2012). Identifying distributional characteristics in random coefficients panel data models. The Review of Economic Studies 79, 987–1020.
Arellano, M. and S. Bonhomme (2016). Nonlinear panel data estimation via quantile regressions. The Econometrics Journal 3, C61–C94.
Arellano, M. and J. Hahn (2016). A likelihood-based approximate solution to the incidental parameter problem in dynamic nonlinear models with multiple effects. Global Economic Review 45, 251–274.
Armstrong, T. B. and M. Kolesár (2018). Optimal inference in a class of regression models. Econometrica 86, 655–683.
Athey, S. and G. W. Imbens (2017). The state of applied econometrics: causality and policy evaluation. Journal of Economic Perspectives 31, 3–32.
Baltagi, B. H., G. Bresson, and A. Pirotte (2008). To pool or not to pool? In The Econometrics of Panel Data: Fundamentals and Recent Developments in Theory and Practice, Chapter 16, pp. 517–546. Springer.
Baltagi, B. H. and D. Levin (1986). Estimating dynamic demand for cigarettes using panel data: The effects of bootlegging, taxation and advertising reconsidered. The Review of Economics and Statistics 68, 148–155.
Banerjee, A., J. W. Galbraith, J. Dolado, et al. (1990). Dynamic specification and linear transformations of the autoregressive-distributed lag model. Oxford Bulletin of Economics and Statistics 52, 95–104.
Banerjee, A., D. Karlan, and J. Zinman (2015). Six randomized evaluations of microcredit: introduction and further steps. American Economic Journal: Applied Economics 7, 1–21.
Bao, Y. and A. Ullah (2007). The second-order bias and mean squared error of estimators in time-series models. Journal of Econometrics 140, 650–669.
Bastagli, F., J. Hagen-Zanker, L. Harman, V. Barca, G. Sturge, and T. Schmidt (2019). The impact of cash transfers: a review of the evidence from low- and middle-income countries. Journal of Social Policy 48, 569–594.
Bester, C. A. and C. Hansen (2009). Identification of marginal effects in a nonparametric correlated random effects model. Journal of Business & Economic Statistics 27, 235–250.
Bewley, R. (1986). Allocation Models: Specification, Estimation, and Applications. Ballinger, Cambridge, MA.
Blundell, R. and S. Bond (1998). Initial conditions and moment restrictions in dynamic panel data models. Journal of Econometrics 87, 115–143.
Bonhomme, S. (2012). Functional differencing. Econometrica 80, 1337–1385.
Browning, M. and J. Carro (2007). Heterogeneity and microeconometrics modeling. In R. Blundell, W. Newey, and T. Persson (Eds.), Advances in Economics and Econometrics: Theory and Applications, Ninth World Congress, Econometric Society Monographs. Cambridge University Press.
Browning, M. and J. M. Carro (2014). Dynamic binary outcome models with maximal heterogeneity. Journal of Econometrics 178, 805–823.
Browning, M. and M. Ejrnæs (2013). Heterogeneity in the dynamics of labor earnings. Annual Review of Economics 5, 219–245.
Browning, M., M. Ejrnæs, and J. Alvarez (2010). Modelling income processes with lots of heterogeneity. The Review of Economic Studies 77, 1353–1381.
Callaway, B., A. Goodman-Bacon, and P. H. Sant’Anna (2021). Difference-in-differences with a continuous treatment. arXiv preprint arXiv:2107.02637.
Carneiro, A., P. Portugal, P. Raposo, and P. M. Rodrigues (2023). The persistence of wages. Journal of Econometrics 233, 596–611.
Carroll, C. D. and A. A. Samwick (1997). The nature of precautionary wealth. Journal of Monetary Economics 40, 41–71.
Cengiz, D., A. Dube, A. Lindner, and B. Zipperer (2019). The effect of minimum wages on low-wage jobs. The Quarterly Journal of Economics 134, 1405–1454.
Chamberlain, G. (1992). Efficiency bounds for semiparametric regression. Econometrica 60, 567–596.
Chamberlain, G. (2022). Feedback in panel data models. Journal of Econometrics 226, 4–20.
Chambers, M. J. (2013). Jackknife estimation of stationary autoregressive models. Journal of Econometrics 172, 142–157.
Chernozhukov, V., I. Fernández-Val, J. Hahn, and W. Newey (2013). Average and quantile effects in nonseparable panel models. Econometrica 81, 535–580.
Chudik, A., G. Kapetanios, and M. H. Pesaran (2018). A one covariate at a time, multiple testing approach to variable selection in high-dimensional linear regression models. Econometrica 86, 1479–1512.
Chudik, A. and M. H. Pesaran (2015). Common correlated effects estimation of heterogeneous dynamic panel data models with weakly exogenous regressors. Journal of Econometrics 188, 393–420.
Chudik, A. and M. H. Pesaran (2019). Mean group estimation in presence of weakly cross-correlated estimators. Economics Letters 175, 101–105.
Chudik, A. and M. H. Pesaran (2021). An augmented Anderson–Hsiao estimator for dynamic short-T panels. Econometric Reviews 41, 1–32.
Chudik, A. and M. H. Pesaran (2022). An augmented Anderson–Hsiao estimator for dynamic short-T panels. Econometric Reviews 41, 416–447.
Crépon, B. and G. J. Van Den Berg (2016). Active labor market policies. Annual Review of Economics 8, 521–546.
de Chaisemartin, C., X. d’Haultfoeuille, F. Pasquier, and G. Vazquez-Bare (2022). Difference-in-differences estimators for treatments continuously distributed at every period. arXiv preprint arXiv:2201.06898.
Dhaene, G. and K. Jochmans (2015). Split-panel jackknife estimation of fixed-effect models. The Review of Economic Studies 82, 991–1030.
Dube, A., T. W. Lester, and M. Reich (2010). Minimum wage effects across state borders: Estimates using contiguous counties. The Review of Economics and Statistics 92, 945–964.
Dube, A., T. W. Lester, and M. Reich (2016). Minimum wage shocks, employment flows, and labor market frictions. Journal of Labor Economics 34, 663–704.
Durbin, J. and G. S. Watson (1950). Testing for serial correlation in least squares regression: I. Biometrika 37, 409–428.
Fernández-Val, I., W. Y. Gao, Y. Liao, and F. Vella (2022). Dynamic heterogeneous distribution regression panel models, with an application to labor income processes. arXiv preprint.
Fernández-Val, I. and J. Lee (2013). Panel data models with nonadditive unobserved heterogeneity: Estimation and inference. Quantitative Economics 4, 453–481.
Ferraro, P. J. and J. J. Miranda (2017). Panel data designs and estimators as substitutes for randomized controlled trials in the evaluation of public programs. Journal of the Association of Environmental and Resource Economists 4, 281–317.
Gao, Z. and M. H. Pesaran (2023). Identification and estimation of categorical random coefficient models. Empirical Economics, a Special Issue in Honor of Peter Schmidt, 1–46.
Geweke, J. and M. Keane (2000). An empirical analysis of earnings dynamics among men in the PSID: 1968–1989. Journal of Econometrics 96, 293–356.
Graham, B. S. and J. L. Powell (2012). Identification and estimation of average partial effects in “irregular” correlated random coefficient panel data models. Econometrica 80, 2105–2152.
Gu, J. and R. Koenker (2017). Unobserved heterogeneity in income dynamics: an empirical Bayes perspective. Journal of Business & Economic Statistics 35, 1–16.
Guvenen, F. (2007). Learning your earning: are labor income shocks really very persistent? American Economic Review 97, 687–712.
Guvenen, F. (2009). An empirical investigation of labor income processes. Review of Economic Dynamics 12, 58–79.
Hahn, J. and G. Kuersteiner (2010). Stationarity and mixing properties of the dynamic tobit model. Economics Letters 107, 105–111.
Hahn, J. and G. Kuersteiner (2011). Bias reduction for dynamic nonlinear panel models with fixed effects. Econometric Theory 27, 1152–1191.
Hahn, J. and W. Newey (2004). Jackknife and analytical bias reduction for nonlinear panel models. Econometrica 72, 1295–1319.
Han, C. and P. C. Phillips (2010). GMM estimation for dynamic panels with fixed effects and strong instruments at unity. Econometric Theory 26, 119–151.
Heckman, J. and E. Vytlacil (1998). Instrumental variables methods for the correlated random coefficient model: estimating the average rate of return to schooling when the return is correlated with schooling. Journal of Human Resources 33, 974–987.
Hoderlein, S. and H. White (2012). Nonparametric identification in nonseparable panel data models with generalized fixed effects. Journal of Econometrics 168, 300–314.
Hsiao, C., M. H. Pesaran, and A. Tahmiscioglu (1999). Bayes estimation of short-run coefficients in dynamic panel data models. In Analysis of Panel Data and Limited Dependent Variable Models. Cambridge University Press.
Hsiao, C., M. H. Pesaran, and A. K. Tahmiscioglu (2002). Maximum likelihood estimation of fixed effects dynamic panel data models covering short time periods. Journal of Econometrics 109, 107–150.
Hubbard, R. G., J. Skinner, and S. P. Zeldes (1995). Precautionary saving and social insurance. Journal of Political Economy 103, 360–399.
Jha, P., D. Neumark, and A. Rodriguez-Lopez (2022). What’s across the border? Re-evaluating the cross-border evidence on minimum wage effects. CESifo Working Paper Series.
Kendall, M. G. (1954). Note on bias in the estimation of autocorrelation. Biometrika 41, 403–404.
Kiviet, J. F. and G. D. Phillips (2012). Higher-order asymptotic expansions of the least-squares estimation bias in first-order dynamic regression models. Computational Statistics & Data Analysis 56, 3705–3729.
Lee, Y. and D. Sul (2022). Trimmed mean group estimation. In Essays in Honor of M. Hashem Pesaran: Panel Modeling, Micro Applications, and Econometric Methodology. Emerald Publishing Limited.
Liang, K.-Y. and S. L. Zeger (1986). Longitudinal data analysis using generalized linear models. Biometrika 73, 13–22.
Lillard, L. A. and Y. Weiss (1979). Components of variation in panel earnings data: American scientists 1960-70. Econometrica: Journal of the Econometric Society 47, 437–454.
Lillard, L. A. and R. J. Willis (1978). Dynamic aspects of earning mobility. Econometrica 46, 985–1012.
Liu, F., P. Zhang, I. Erkan, and D. S. Small (2017). Bayesian inference for random coefficient dynamic panel data models. Journal of Applied Statistics 44, 1543–1559.
Liu, L. (2023). Density forecasts in panel data models: a semiparametric Bayesian perspective. Journal of Business & Economic Statistics 41, 1–15.
MaCurdy, T. E. (1982). The use of time series processes to model the error structure of earnings in a longitudinal data analysis. Journal of Econometrics 18, 83–114.
Magnus, J. R. (1986). The exact moments of a ratio of quadratic forms in normal variables. Annales d’Economie et de Statistique, 95–109.
Mavroeidis, S., Y. Sasaki, and I. Welch (2015). Estimation of heterogeneous autoregressive parameters with short panel data. Journal of Econometrics 188, 219–235.
Meer, J. and J. West (2016). Effects of the minimum wage on employment dynamics. Journal of Human Resources 51, 500–522.
Meghir, C. and L. Pistaferri (2004). Income variance dynamics and heterogeneity. Econometrica 72, 1–32.
Nagar, A. L. (1959). The bias and moment matrix of the general k-class estimators of the parameters in simultaneous equations. Econometrica: Journal of the Econometric Society 27, 575–595.
Neumark, D. (2019). The econometrics and economics of the employment effects of minimum wages: Getting from known unknowns to known knowns. German Economic Review 20, 293–329.
Neumark, D. and W. Wascher (1992). Employment effects of minimum and subminimum wages: panel data on state minimum wage laws. ILR Review 46, 55–81.
Neumark, D. and W. Wascher (2017). Reply to “credible research designs for minimum wage studies”. ILR Review 70, 593–609.
Nickell, S. (1981). Biases in dynamic models with fixed effects. Econometrica: Journal of the Econometric Society 49, 1417–1426.
Okui, R. and T. Yanagi (2019). Panel data analysis with heterogeneous dynamics. Journal of Econometrics 212, 451–475.
Okui, R. and T. Yanagi (2020). Kernel estimation for panel data with heterogeneous dynamics. The Econometrics Journal 23, 156–175.
Pesaran, H., R. Smith, and K. S. Im (1996). Dynamic linear models for heterogenous panels. In L. Mátyás and P. Sevestre (Eds.), The Econometrics of Panel Data: A Handbook of the Theory with Applications. Springer Netherlands.
Pesaran, M. H. (2015). Time Series and Panel Data Econometrics. Oxford University Press.
Pesaran, M. H. and R. Smith (1995). Estimating long-run relationships from dynamic heterogeneous panels. Journal of Econometrics 68, 79–113.
Pesaran, M. H. and A. Timmermann (2005). Small sample properties of forecasts from autoregressive models under structural breaks. Journal of Econometrics 129, 183–217.
Pesaran, M. H. and T. Yamagata (2008). Testing slope homogeneity in large panels. Journal of Econometrics 142, 50–93.
Pesaran, M. H. and L. Yang (2023). Trimmed mean group estimation of average treatment effects in ultra short-T panels with correlated heterogeneous coefficients. (work in progress).
Pesaran, M. H. and Z. Zhao (1999). Bias reduction in estimating long-run relationships from dynamic heterogeneous panels. In Analysis of Panel Data and Limited Dependent Variable Models. Cambridge University Press.
Powell, D. (2022). Synthetic control estimation beyond comparative case studies: Does the minimum wage reduce employment? Journal of Business & Economic Statistics 40, 1302–1314.
Roberts, L. A. (1995). On the existence of moments of ratios of quadratic forms. Econometric Theory 11, 750–774.
Robinson, P. M. (1978). Statistical inference for a random coefficient autoregressive model. Scandinavian Journal of Statistics 5, 163–168.
Sargan, J. (1974). The validity of Nagar’s expansion for the moments of econometric estimators. Econometrica: Journal of the Econometric Society 42, 169–176.
Sasaki, Y. (2015). Heterogeneity and selection in dynamic panel data. Journal of Econometrics 188, 236–249.
Sasaki, Y. and T. Ura (2021). Slow movers in panel data. arXiv preprint arXiv:2110.12041.
Smith, M. D. (1988). Convergent series expressions for inverse moments of quadratic forms in normal variables. Australian Journal of Statistics 30, 235–246.
Thompson, J. P. (2009). Using local labor market data to re-examine the employment effects of the minimum wage. ILR Review 62, 343–366.
Totty, E. (2017). The effect of minimum wages on employment: a factor model approach. Economic Inquiry 55, 1712–1737.
Verdier, V. (2020). Average treatment effects for stayers with correlated random coefficient models of panel data. Journal of Applied Econometrics 35, 917–939.
Von Neumann, J. (1941). Distribution of the ratio of the mean square successive difference to the variance. The Annals of Mathematical Statistics 12, 367–395.
Wooldridge, J. M. (2005). Fixed-effects and related estimators for correlated random-coefficient and treatment-effect panel data models.
Review of Economics and Statistics 87, 385–390.

Appendix A

Appendix to Chapter 2

A.1 Auxiliary Lemmas for Proofs

Lemma A.1 (Existence of Ratios of Quadratic Forms) Let $y$ be a $T \times 1$ non-degenerate random vector with a bounded density function $f_y(y)$ almost everywhere on its support $S$. Let $A$ be a symmetric $T \times T$ matrix and let $B$ be a positive semi-definite $T \times T$ matrix. Let $B$ have the spectral representation $B = E \Lambda_B E'$, where $E$ is a $T \times T$ matrix containing the eigenvectors of $B$ as columns and $\Lambda_B$ is a diagonal matrix containing the corresponding eigenvalues. We order the eigenvalues such that the first $b$ are non-zero, as $\mathrm{rank}(B) = b$. Let $E_1$ be a $T \times b$ matrix, the columns of which are the first $b$ eigenvectors with non-zero eigenvalues. Let $E_2$ be a $T \times (T-b)$ matrix whose columns are the remaining eigenvectors with zero eigenvalues. Suppose one of the following assumptions holds: (i) $y$ has an elliptically symmetric distribution, (ii) $y$ has a compact support, or (iii) there exists a function $g(\cdot) \geq 0$ such that
\[
f_y(y) \leq g(|y_2|) \quad \text{with } y_2 \in \mathcal{N}_0(B) \cap S, \tag{A.1}
\]
and
(a) when $r \leq s$,
\[
\int_0^{\infty} g(\rho)\, d\rho < \infty; \tag{A.2}
\]
(b) or when $r > s$, as $\rho \to \infty$, $\exists\, \epsilon > 0$,
\[
g(\rho) = O\left(\rho^{-2(u-v)-1-\epsilon}\right), \tag{A.3}
\]
where $\mathcal{N}_0(B)$ is the null space of $B$. Since $B$ is positive semi-definite, $\mathcal{N}_0(B) = E_2$. Then
(i) when $E_1' A E_2 = 0$ and $E_2' A E_2 = 0$, the moment $E\left[\frac{(y'Ay)^r}{(y'By)^s}\right]$ exists if $\mathrm{rank}(B) = b > 2(s-r)$,
(ii) when $E_1' A E_2 \neq 0$ and $E_2' A E_2 = 0$, the moment $E\left[\frac{(y'Ay)^r}{(y'By)^s}\right]$ exists if $\mathrm{rank}(B) = b > 2s - r$, and
(iii) when $E_1' A E_2 \neq 0$ and $E_2' A E_2 \neq 0$, the moment $E\left[\frac{(y'Ay)^r}{(y'By)^s}\right]$ exists if $\mathrm{rank}(B) = b > 2s$.

Proof. Roberts (1995).

Remark A.1 According to Lemma A.1, when $r > s$, the integral $E\left[\frac{(y'Ay)^r}{(y'By)^s}\right]$ converges to a finite value if and only if the $2(r-s)$th moment of $y$ exists, for which equation (A.3) in the appendix provides a sufficient but not necessary condition. See Section 4.1 in Roberts (1995). When $r \leq s$, the condition is only on the existence of some integrable function $g(\cdot)$ that bounds the density function $f_y(y)$, with no requirement on the rate of decay of $g(\cdot)$ or on the existence of certain moments of $y$.

A.2 Mathematical Proofs

A.2.1 Proof of Lemma 1

Proof.
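Under Assumptions 1, 2, and 4 (ii), $\tilde{\Psi}_{iT} = \Psi_{iT} - \Psi_i = O_p(T^{-1/2})$. Suppose Assumption 5 holds for $l = 5$, then we can apply the Nagar-type expansion (Nagar (1959)) to the inverse of $\Psi_{iT}$ in (2.8):
\[
\Psi_{iT}^{-1} = \left[\Psi_i + \tilde{\Psi}_{iT}\right]^{-1} = \Psi_i^{-1}\left[I_k + \tilde{\Psi}_{iT}\Psi_i^{-1}\right]^{-1}
= \Psi_i^{-1}\left[I_k - \tilde{\Psi}_{iT}\Psi_i^{-1} + \left(\tilde{\Psi}_{iT}\Psi_i^{-1}\right)^2 - \left(\tilde{\Psi}_{iT}\Psi_i^{-1}\right)^3 + \left(\tilde{\Psi}_{iT}\Psi_i^{-1}\right)^4 + O_p\left(T^{-5/2}\right)\right], \tag{A.4}
\]
where $\|\Psi_i^{-1}\| < C$, and $\tilde{\Psi}_{iT}\Psi_i^{-1} = O_p(T^{-1/2})$. Moreover, note that by Assumption 4 (i),
\[
T^{-1} W_i' M_T u_i = O_p\left(T^{-1/2}\right).
\]
Then by collecting the terms of the respective orders we have
\[
\hat{\theta}_i - \theta_{i0} = \xi_{i,1/2} + \xi_{i,2/2} + \xi_{i,3/2} + \xi_{i,4/2} + \xi_{i,5/2} + o_p\left(T^{-5/2}\right),
\]
where for $j = 1, 2, 3, 4, 5$, the term $\xi_{i,j/2}$ is of order $O_p(T^{-j/2})$:
\[
\begin{aligned}
\xi_{i,1/2} &= \Psi_i^{-1}\left(T^{-1} W_i' M_T u_i\right) = O_p\left(T^{-1/2}\right), \\
\xi_{i,2/2} &= -\Psi_i^{-1}\tilde{\Psi}_{iT}\Psi_i^{-1}\left(T^{-1} W_i' M_T u_i\right) = O_p\left(T^{-1}\right), \\
\xi_{i,3/2} &= \Psi_i^{-1}\left(\tilde{\Psi}_{iT}\Psi_i^{-1}\right)^2\left(T^{-1} W_i' M_T u_i\right) = O_p\left(T^{-3/2}\right), \\
\xi_{i,4/2} &= -\Psi_i^{-1}\left(\tilde{\Psi}_{iT}\Psi_i^{-1}\right)^3\left(T^{-1} W_i' M_T u_i\right) = O_p\left(T^{-2}\right), \\
\xi_{i,5/2} &= \Psi_i^{-1}\left(\tilde{\Psi}_{iT}\Psi_i^{-1}\right)^4\left(T^{-1} W_i' M_T u_i\right) = O_p\left(T^{-5/2}\right).
\end{aligned}
\]
Suppose Assumption 4 (iii) holds for $m = 5$, that is, the moments of $\Psi_{iT}$ or $\tilde{\Psi}_{iT}$ exist up to the fifth order. Then, evaluating the order of the expectations of the approximate components, we have $E\left(\xi_{i,1/2}\right) = 0_k$, and
\[
E(\hat{\theta}_i) - \theta_{i0} = \frac{B_{i1}}{T} + \frac{B_{i2}}{T^2} + O\left(\frac{1}{T^3}\right).
\]

A.2.2 Proof of Theorem 1

Proof. Equations (2.15) and (2.16) immediately follow from Lemma 1. Under Assumption 6,
\[
\hat{\theta}_{MG\text{-}HJK} - \theta_0 = \frac{1}{n}\sum_{i=1}^{n}(\theta_{i0} - \theta_0) + \frac{1}{n}\sum_{i=1}^{n}(\hat{\theta}_{i,HJK} - \theta_{i0}) = O_p\left(\frac{1}{\sqrt{n}}\right) + O_p\left(\frac{1}{T^2}\right),
\]
thus given the rectangular array $n/T^4 \to 0$, we have
\[
\sqrt{n}\left(\hat{\theta}_{MG\text{-}HJK} - \theta_0\right) = O_p(1) + O_p\left(\sqrt{\frac{n}{T^4}}\right) = O_p(1).
\]
For $i = 1, 2, \ldots, n$, under Assumptions 1–4,
\[
\mathrm{Var}(\zeta_{i,HJK}) = O\left(T^{-2}\right), \quad \text{where } \zeta_{i,HJK} = \hat{\theta}_{i,HJK} - \theta_{i0}.
\]
Moreover, by Assumption 1, $\zeta_{i,HJK}$ is independent across $i$, thus as $T \to \infty$,
\[
\left\|\mathrm{Var}(\bar{\zeta}_{nT})\right\| \leq \left\| n^{-1}\sum_{i=1}^{n}\mathrm{Var}(\zeta_{i,HJK})\right\| \leq \sup_{1 \leq i \leq n}\left\|\mathrm{Var}(\zeta_{i,HJK})\right\| \to 0, \quad \text{where } \bar{\zeta}_{nT} = \frac{1}{n}\sum_{i=1}^{n}\zeta_{i,HJK}.
\]
Hence under Assumption 6,
\[
\sqrt{n}\left(\hat{\theta}_{MG\text{-}HJK} - \theta_0\right) \to_d N(0_k, \Omega_\theta).
\]
The asymptotic distribution of $\hat{\theta}_{MG\text{-}TJK}$ in (2.18) can be shown similarly for $n/T^6 \to 0$.

A.2.3 Proof of Lemma 2

Proof. Under Assumption 2, the process of $y_{it}$ is stable with $|\phi_i| < 1$.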
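Suppose a random initial condition is given by
\[
y_{i0} = \mu_{i0} + \beta_i' x_{i0} + \omega_{i0}\varepsilon_{i0}. \tag{A.5}
\]
Combining (2.1) and (A.5), we can rewrite the equation of $y_{it}$ for $t = 1, 2, \ldots, T$ as follows:
\[
\begin{aligned}
y_{i1} &= (\alpha_i + \phi_i\mu_{i0}) + \beta_i'(\phi_i x_{i0} + x_{i1}) + (\phi_i\omega_{i0}\varepsilon_{i0} + \sigma_i\varepsilon_{i1}), \\
y_{i2} &= (\alpha_i + \phi_i\alpha_i) + \phi_i^2\mu_{i0} + \beta_i'\left(\phi_i^2 x_{i0} + \phi_i x_{i1} + x_{i2}\right) + \left(\phi_i^2\omega_{i0}\varepsilon_{i0} + \phi_i\sigma_i\varepsilon_{i1} + \sigma_i\varepsilon_{i2}\right), \\
&\;\;\vdots
\end{aligned}
\]
Then for $t = 1, 2, \ldots, T$,
\[
y_{it} = \sum_{s=1}^{t}\phi_i^{t-s}\alpha_i + \phi_i^{t}\mu_{i0} + \sum_{s=0}^{t}\phi_i^{t-s}\beta_i' x_{is} + \phi_i^{t}\omega_{i0}\varepsilon_{i0} + \sigma_i\sum_{s=1}^{t}\phi_i^{t-s}\varepsilon_{is}. \tag{A.6}
\]
Let $\tilde{y}_i = (y_{i0}, y_{i1}, \ldots, y_{iT})'$, $\tilde{\varepsilon}_i = (\varepsilon_{i0}, \varepsilon_{i1}, \ldots, \varepsilon_{iT})'$, and $\tilde{X}_i = (x_{i0}, x_{i1}, \ldots, x_{iT})'$. Then we can write the equations of $\tilde{y}_i$ in matrix notation as
\[
\tilde{y}_i = \tilde{\mu}_{i,T+1} + H_i\tilde{X}_i\beta_i + H_i V_i\tilde{\varepsilon}_i, \tag{A.7}
\]
where $\tilde{\mu}_{i,T+1} = \alpha_i H_i\tau_{T+1} + \mu_{i0}F_i$ and $F_i = (1, \phi_i, \ldots, \phi_i^T)'$ are $(T+1) \times 1$ vectors, and
\[
H_i = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ \phi_i & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ \phi_i^T & \phi_i^{T-1} & \cdots & 1 \end{pmatrix}
\quad \text{and} \quad
V_i = \begin{pmatrix} \omega_{i0}/\sigma_i & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}
\]
are $(T+1) \times (T+1)$ matrices. The distribution of $\tilde{y}_i$ conditional on $\tilde{X}_i$ can be written as follows:
\[
\tilde{y}_i \mid \tilde{X}_i = (c_i + H_i V_i\tilde{\varepsilon}_i) \mid \tilde{X}_i = c_i \mid \tilde{X}_i + (H_i V_i \mid \tilde{X}_i)\,\tilde{\varepsilon}_i \sim (\tilde{\mu}_i, \tilde{\Sigma}_i),
\]
where $c_i = \tilde{\mu}_{i,T+1} + H_i\tilde{X}_i\beta_i$. Since $\tilde{y}_i \mid \tilde{X}_i$ is linear in $\tilde{\varepsilon}_i$, its mean and variance are given by
\[
\tilde{\mu}_i = c_i \mid \tilde{X}_i, \quad \text{and} \quad \tilde{\Sigma}_i = H_i V_i V_i' H_i' \mid \tilde{X}_i.
\]
Furthermore, for all elements $\tilde{\omega} \in \tilde{\Sigma}_i$,
\[
|\tilde{\omega}| \leq \frac{1 - \phi_i^{2(T-1)}}{1 - \phi_i^2} + \frac{\omega_{i0}^2\left(1 - \phi_i^{2(T-1)}\right)}{\sigma_i^2\left(1 - \phi_i^2\right)},
\]
which implies that $y_{it}$ has a bounded variance for $t = 0, 1, \ldots, T$, provided $|\phi_i| < 1$, $\omega_{i0} < C$, and $0 < \sigma_i^2 < C$.

A.2.4 Proof of Lemma 3

Proof. Since $\tilde{\Sigma}_i$ has a full rank of $T + 1$, given (2.22), $\tilde{y}_i \mid \tilde{X}_i \sim (\tilde{\mu}_i, \tilde{\Sigma}_i)$ has a non-degenerate distribution. Under Assumption 7 on the distribution of shocks to the outcome process, the distribution of $\tilde{y}_i \mid \tilde{X}_i$ will satisfy the respective assumptions in Lemma A.1. Now we are ready to apply Lemma A.1 to (2.23).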
In the following, we calculate the rank of $B_i$ and show that result (ii) in Lemma A.1 follows. Note that both $B_i$ and $M_i$ are idempotent and symmetric matrices with
\[
\mathrm{rank}(B_i) = \mathrm{rank}(M_i) = T - k_x - 1. \tag{A.8}
\]
Let $E_M$ and $E_B$ be orthogonal matrices whose column vectors are composed of eigenvectors of $M_i$ and $B_i$ respectively, so that $M_i = E_M\Lambda_M E_M'$ and $B_i = E_B\Lambda_B E_B'$. We order the eigenvalues such that the first $T - k_x - 1$ eigenvalues are non-zero. Then we have the following partitions
\[
E_M = (E_{M,1} \;\vdots\; E_{M,2}), \quad \text{and} \quad E_B = (E_{B,1} \;\vdots\; E_{B,2}),
\]
where $E_{M,1}$ and $E_{B,1}$ are $T \times (T - k_x - 1)$ and $T \times (T - k_x - 2)$ matrices whose columns are composed of eigenvectors with non-zero eigenvalues of $M_i$ and $B_i$ respectively, and $E_{M,2}$ and $E_{B,2}$ are $T \times (k_x + 1)$ and $T \times (k_x + 2)$ matrices whose columns are composed of eigenvectors with zero eigenvalues of $M_i$ and $B_i$ respectively. Given the structure of $B_i$, we can verify that
\[
E_{B,1} = \begin{pmatrix} E_{M,1} \\ 0'_{T-k_x-1} \end{pmatrix}
\quad \text{and} \quad
E_{B,2} = \begin{pmatrix} E_{M,2} & 0_T \\ 0'_{k_x+1} & 1 \end{pmatrix}.
\]
Given (2.23), if $E_{B,1}'(A_i + A_i')E_{B,2} \neq 0$ and $E_{B,2}'(A_i + A_i')E_{B,2} = 0$, then we can apply result (ii) in Lemma A.1. By algebra, we can show that the following equations hold:
\[
B_i E_{B,2} = \begin{pmatrix} M_i E_{M,2} & 0_T \\ 0'_{k_x+1} & 0 \end{pmatrix} = 0_{(T+1) \times (k_x+2)}, \tag{A.9}
\]
\[
A_i' E_{B,2} = \begin{pmatrix} 0'_{k_x+1} & 0 \\ M_i E_{M,2} & 0_T \end{pmatrix} = 0_{(T+1) \times (k_x+2)}, \tag{A.10}
\]
\[
A_i E_{B,2} \neq 0_{(T+1) \times (k_x+2)}. \tag{A.11}
\]
Since $A_i$ is not symmetric, equation (A.11) holds as long as $Z_i$ has full column rank. Equation (A.10) implies that for any $l \times (T+1)$ matrix $P$ ($l = 1, 2, \ldots$), we have $(P A_i' E_{B,2})' = 0$.

Next, we prove that $E_{B,1}' A_i E_{B,2} \neq 0_{(T-k_x-1) \times (k_x+1)}$ by contradiction, with
\[
E_{B,1}' A_i E_{B,2} = \begin{pmatrix} E_{M,1}'\begin{pmatrix} E_{M,2}(2:T-1,\, 1:k_x) \\ 0'_{k_x} \end{pmatrix} & E_{M,1}'(1:T-k_x-2,\, T-1) \end{pmatrix}.
\]
Suppose $E_{B,1}' A_i E_{B,2} = 0$, then we have
\[
E_{M,1}'\begin{pmatrix} E_{M,2}(2:T-1,\, 1:k_x) \\ 0'_{k_x} \end{pmatrix} = 0_{(T-k_x-2) \times k_x}, \tag{A.12}
\]
and
\[
E_{M,1}'(1:T-k_x-2,\, T-1) = 0_{(T-k_x-2) \times 1}. \tag{A.13}
\]
Let
\[
\bar{E}_{M,2} = \begin{pmatrix} E_{M,2}(2:T-1,\, 1:k_x) \\ 0'_{k_x} \end{pmatrix}.
\]
Equation (A.12) implies that the columns of $\bar{E}_{M,2}$ are orthogonal to $E_{M,1}$, which spans the range space of $M_i$, thus $\bar{E}_{M,2}$ should span the null space of $M_i$. Then the $(T-1) \times (T-1)$ orthogonal matrix whose columns are composed of all eigenvectors of $M_i$ is given by
\[
\bar{E}_M = (E_{M,1} \;\vdots\; \bar{E}_{M,2}).
\]
Combining the above equation with equation (A.13), the last row of $\bar{E}_M$ would be a zero vector, which contradicts it being a full rank orthogonal matrix.

A.3 Dependent Relationship between the Numerator and Denominator

In this section, we first establish in Lemma A.2 the conditions under which a ratio of quadratic forms is distributed independently of its denominator. Then we show that in terms of $\hat{\phi}_i$, the conditions are not satisfied. Assuming $y_t \sim N(\mu, \sigma^2)$ for $t = 1, \ldots, T$, Von Neumann (1941) has shown that the ratio
\[
\frac{\sum_{t=1}^{T-1}(y_{t+1} - y_t)^2}{\sum_{t=1}^{T}(y_t - \bar{y})^2}
\]
is distributed independently of its denominator, such that the moment of the ratio equals the ratio of moments. Following his arguments, we formalize the conditions needed.

Lemma A.2 (Independence between Ratio and its Denominator) Let $y$ be a $T \times 1$ random vector with a marginal distribution $y_t \sim N(\mu, \sigma^2)$ for $t = 1, 2, \ldots, T$. Suppose $A$ is a $T \times T$ real symmetric matrix, and $B$ is a $T \times T$ positive semi-definite and idempotent matrix whose spectral representation is given by
\[
B = E_B\Lambda_B E_B',
\]
where the columns of $E_B$ are composed of eigenvectors of $B$, and $\Lambda_B$ is a diagonal matrix that contains all the respective eigenvalues. Denote $b = \mathrm{Rank}(B)$.
The eigenvalues are ordered so that the first $b$ eigenvalues are non-zero with the corresponding eigenvectors contained in $E_{B,1}$, and
\[
E_B = (E_{B,1} \;\vdots\; E_{B,2}).
\]
For a quadratic ratio given by
\[
R = \frac{y'Ay}{y'By}, \tag{A.14}
\]
$R$ is distributed independently of $y'By$ if (i) $\mathrm{Rank}(E_{B,1}'AE_{B,1}) = b$, (ii) $E_{B,1}'AE_{B,2} = 0$, and (iii) $E_{B,2}'AE_{B,2} = 0$.

Proof. Since $B$ is symmetric and idempotent, its non-zero eigenvalues all equal one, $E_B' = E_B^{-1}$, and $E_B'y$ is an orthogonal transformation of $y$ such that
\[
y'By = y'E_B\Lambda_B E_B'y = x'\Lambda_B x = \sum_{j=1}^{b}x_j^2.
\]
Then we can write down the following equation:
\[
E_B'AE_B = \begin{pmatrix} E_{B,1}'AE_{B,1} & E_{B,1}'AE_{B,2} \\ E_{B,2}'AE_{B,1} & E_{B,2}'AE_{B,2} \end{pmatrix}
= \begin{pmatrix} D_{b \times b} & 0_{b \times (T-b)} \\ 0_{(T-b) \times b} & 0_{(T-b) \times (T-b)} \end{pmatrix}, \tag{A.15}
\]
where $D_{b \times b}$ is a $b \times b$ symmetric matrix with full rank $b$. Then the quadratic ratio of $y$ can be written in terms of the $b \times 1$ vector $x = E_{B,1}'y = (x_1, \ldots, x_b)'$,
\[
R = \frac{y'Ay}{y'By} = \frac{x'Dx}{x'x}, \tag{A.16}
\]
where $x_j \sim iid\, N(0, \sigma_x^2)$. Next, we follow the argument by Von Neumann (1941) (see Section 2 of Von Neumann (1941)). The distribution of $R$ can be obtained first by determining the distribution of $R$ over every spherical surface $x'x = r^2$, and then averaging these distributions with the weights $\omega(r)\,dr$, where $\omega(r)\,dr$ is the probability of the spherical shell from $r$ to $r + dr$ with respect to the distribution law of $(x_1, \ldots, x_b)'$. Since the $(x_1, \ldots, x_b)'$ distribution law is spherically symmetric in these variables, the first-mentioned distributions over the spherical surfaces are readily obtained by assigning each piece of the surfaces in question its relative, $b$-dimensional area as weight. Since $R$ is a homogeneous function of $(x_1, \ldots, x_b)'$ of order zero, these spherical surface distributions of $R$ are the same for all $r$. Consequently, we can replace all these $r$ by, say, $r = 1$, and the subsequent averaging over the $r$ may be omitted altogether. Finally, since we restrict ourselves to $r = 1$ (i.e. to the spherical surface $x'x = 1$), the denominator of $R$ can be omitted.

In terms of $\hat{\phi}_i$ in (2.23), we do the following calculation:
\[
\begin{aligned}
E_B'(A_i + A_i')E_B &= (E_{B,1}, E_{B,2})'(A_i + A_i')(E_{B,1}, E_{B,2}) \\
&= \begin{pmatrix} E_{B,1}'A_iE_{B,1} & E_{B,1}'A_iE_{B,2} \\ E_{B,2}'A_iE_{B,1} & E_{B,2}'A_iE_{B,2} \end{pmatrix}
+ \begin{pmatrix} E_{B,1}'A_i'E_{B,1} & E_{B,1}'A_i'E_{B,2} \\ E_{B,2}'A_i'E_{B,1} & E_{B,2}'A_i'E_{B,2} \end{pmatrix} \\
&= \begin{pmatrix} E_{B,1}'A_iE_{B,1} & E_{B,1}'A_iE_{B,2} \\ 0 & 0 \end{pmatrix}
+ \begin{pmatrix} E_{B,1}'A_i'E_{B,1} & 0 \\ E_{B,2}'A_i'E_{B,1} & 0 \end{pmatrix} \\
&= \begin{pmatrix} 2E_{B,1}'A_i'E_{B,1} & E_{B,1}'A_iE_{B,2} \\ E_{B,2}'A_i'E_{B,1} & 0 \end{pmatrix}.
\end{aligned}
\]
In the proof of Lemma 3 above, we already showed that $E_{B,1}'A_iE_{B,2} \neq 0$, so $E_B'(A_i + A_i')E_B$ cannot be written as the right-hand side of equation (A.15). Thus, $\hat{\phi}_i$ is not distributed independently of its denominator.

Remark A.2 We can show that $E_{B,1}'A_i'E_{B,1} = E_{M,1}'(1:T-k_x-1,\, 1:T-1)\,E_{M,1}(2:T,\, 1:T-k_x-1)$, which is, in general, a full rank matrix. Since $E_{B,1}'A_iE_{B,2} \neq 0$, the ratio is not a homogeneous function with respect to $x$ of order zero.

Remark A.3 The conditions in Lemma A.2 overlap with the conditions in case (i) of Lemma A.1 and are augmented with two more requirements: $B$ is idempotent, and $E_{B,1}'AE_{B,1}$ is a full rank matrix.

Remark A.4 Lemma A.2 can be applied to the general ratio $\frac{(y'Ay)^r}{(y'By)^s}$ when $s = r$. It helps us to better understand why the condition is reduced to $b > 2(s-r)$ in case (i) of Lemma A.1, as
\[
\frac{(y'Ay)^r}{(y'By)^s} = \left(\frac{y'Ay}{y'By}\right)^r (y'By)^{-(s-r)}.
\]
There is another special case in which $(y'Ay)^r$ and $(y'By)^s$ are distributed independently, where we replace the condition of $E_{B,1}'AE_{B,1}$ being a full rank matrix by $E_{B,1}'AE_{B,1} = 0$.

Remark A.5 Durbin and Watson (1950) applied a similar argument as in Von Neumann (1941) to the test statistics of serial correlations in a linear model with simultaneous equations.
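The independence property behind Lemma A.2 can be illustrated numerically for the von Neumann ratio itself. The following is a minimal simulation sketch, assuming independent draws $y_t \sim N(\mu, \sigma^2)$; the function name `von_neumann_draws` and all parameter values are hypothetical choices made for illustration and are not part of the thesis. It checks two necessary implications of independence: the ratio is uncorrelated with its denominator, and the mean of the ratio is close to the ratio of the means.

```python
import numpy as np

def von_neumann_draws(T=10, reps=200_000, mu=1.0, sigma=2.0, seed=0):
    """Simulate numerator, denominator and ratio of the von Neumann statistic.

    ASSUMPTION: y_t are independent N(mu, sigma^2) draws; all defaults are illustrative.
    """
    rng = np.random.default_rng(seed)
    y = rng.normal(mu, sigma, size=(reps, T))
    # Numerator: sum of squared successive differences, t = 1, ..., T-1.
    num = np.sum(np.diff(y, axis=1) ** 2, axis=1)
    # Denominator: sum of squared deviations from the sample mean.
    den = np.sum((y - y.mean(axis=1, keepdims=True)) ** 2, axis=1)
    return num, den, num / den

num, den, ratio = von_neumann_draws()
corr = np.corrcoef(ratio, den)[0, 1]
print(f"corr(R, denominator)  = {corr:.4f}")
print(f"mean(R)               = {ratio.mean():.4f}")
print(f"mean(num) / mean(den) = {num.mean() / den.mean():.4f}")
```

Zero correlation is only a necessary condition for independence, so the simulation illustrates the lemma rather than verifying it. Repeating the exercise with a ratio that violates condition (ii) of Lemma A.2 would be expected to break the equality between the mean of the ratio and the ratio of the means.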
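A.4 Time Effects and Cluster-robust Standard Errors in Heterogeneous Dynamic Panels

With time effects, the heterogeneous dynamic panel model is given by
\[
y_i = \alpha_i\tau_T + \delta_T + W_i\theta_i + u_i, \tag{A.17}
\]
where $\delta_T$ is a $T \times 1$ vector of time fixed effects with the normalization $\tau_T'\delta_T = 0$. The MG estimation formula of $\theta_0$ for regressions with time effects is provided below. For bias correction of the MG estimator with split-panel Jackknife, we replace $\hat{\theta}_{MG}$ by $\hat{\theta}_{MG\text{-}HJK}$ or $\hat{\theta}_{MG\text{-}TJK}$ wherever it enters the calculations.

A.4.1 Mean Group Estimator with Time Effects when $T > k$

When $T > k$, the time effects can be estimated by (see Chamberlain (1992))
\[
\hat{\delta}_T = \left(n^{-1}\sum_{i=1}^{n}\tilde{M}_i\right)^{-1}\left(n^{-1}\sum_{i=1}^{n}\tilde{M}_i M_T y_i\right), \tag{A.18}
\]
where $\tilde{M}_i = I_T - M_T W_i(W_i'M_T W_i)^{-1}W_i'M_T$, then
\[
\hat{\theta}_{i,TE} = (W_i'M_T W_i)^{-1}W_i'M_T\left(y_i - \hat{\delta}_T\right), \tag{A.19}
\]
and the MG estimator of $\theta_0$ is now given by
\[
\hat{\theta}_{MG\text{-}TE} = n^{-1}\sum_{i=1}^{n}\hat{\theta}_{i,TE}. \tag{A.20}
\]
Its variance can be consistently estimated by
\[
\widehat{\mathrm{Var}}\left[\hat{\theta}_{MG\text{-}TE}\right] = \widehat{\mathrm{Var}}\left[\hat{\theta}_{MG\text{-}TE}(\delta_T)\right] + \bar{R}_w'\,\widehat{\mathrm{Var}}(\hat{\delta}_T)\,\bar{R}_w, \tag{A.21}
\]
where
\[
\widehat{\mathrm{Var}}\left[\hat{\theta}_{MG\text{-}TE}(\delta_T)\right] = \frac{1}{n(n-1)}\sum_{i=1}^{n}\left(\hat{\theta}_{i,TE} - \hat{\theta}_{MG\text{-}TE}\right)\left(\hat{\theta}_{i,TE} - \hat{\theta}_{MG\text{-}TE}\right)', \quad
\bar{R}_w = \frac{1}{n}\sum_{i=1}^{n}M_T W_i(W_i'M_T W_i)^{-1},
\]
and
\[
\widehat{\mathrm{Var}}(\hat{\delta}_T) = \frac{1}{n}\left(n^{-1}\sum_{i=1}^{n}\tilde{M}_i\right)^{-1}\left[\frac{1}{n}\sum_{i=1}^{n}\tilde{M}_i M_T\left(y_i - \hat{\delta}_T\right)\left(y_i - \hat{\delta}_T\right)'M_T\tilde{M}_i\right]\left(n^{-1}\sum_{i=1}^{n}\tilde{M}_i\right)^{-1}.
\]

A.4.2 Individual Estimator with Time Fixed Effects

When there is a non-zero correlation between the level of regressors and heterogeneous slope coefficients, differencing out the time fixed effects by the FE-TE estimation approach will also induce bias in the estimate. An individual estimator with common time effects differenced out as in the FE-TE estimator is given by
\[
\tilde{\theta}_{i,TE} = \ddot{\Psi}_{iT}^{-1}\left(T^{-1}\ddot{W}_i'M_T\ddot{y}_i\right), \tag{A.22}
\]
where
\[
\ddot{\Psi}_{iT} = T^{-1}\ddot{W}_i'M_T\ddot{W}_i, \tag{A.23}
\]
with $\ddot{W}_i = W_i - \bar{W}_T$, $\ddot{y}_i = y_i - \bar{y}_T$, $\bar{W}_T = \frac{1}{n}\sum_{i=1}^{n}W_i$, and $\bar{y}_T = \frac{1}{n}\sum_{i=1}^{n}y_i$. We have
\[
\tilde{\theta}_{i,TE} - \theta_i = \ddot{\Psi}_{iT}^{-1}\left(T^{-1}\ddot{W}_i'M_T\ddot{u}_i\right) + \ddot{\Psi}_{iT}^{-1}\left[T^{-1}\ddot{W}_i'M_T\left(\bar{W}_T\theta_i - \frac{1}{n}\sum_{j=1}^{n}W_j\theta_j\right)\right], \tag{A.24}
\]
with $\ddot{u}_i = u_i - \bar{u}_T$ and $\bar{u}_T = \frac{1}{n}\sum_{i=1}^{n}u_i$. The first term is the usual small-$T$ bias of the OLS estimator in models with weakly exogenous regressors, provided errors in regressions are independent or weakly correlated over $i$. Hence $\tilde{\theta}_{i,TE}$ is consistent if
\[
\underset{T \to \infty}{\mathrm{plim}}\left[T^{-1}\ddot{W}_i'M_T\left(\bar{W}_T\theta_i - n^{-1}\sum_{j=1}^{n}W_j\theta_j\right)\right] = 0. \tag{A.25}
\]
The quantity on the left-hand side of (A.25) can be further written as
\[
T^{-1}\ddot{W}_i'M_T\left(\bar{W}_T\theta_i - n^{-1}\sum_{j=1}^{n}W_j\theta_j\right) = T^{-1}\ddot{W}_i'M_T\left[n^{-1}\sum_{j=1}^{n}W_j(\theta_i - \theta_j)\right].
\]
Then under the assumption that $W_j$ and $\theta_i$ are cross-sectionally independent for $i \neq j$, condition (A.25) will be violated if the correlation between $\theta_i$ and the levels of individual $i$'s regressors does not diminish as $T$ grows, that is,
\[
\underset{T \to \infty}{\mathrm{plim}}\; T^{-1}W_i'(\theta_i - \theta_0) \neq 0. \tag{A.26}
\]
Note that the same transformation for differencing out the time fixed effects, $z_i - n^{-1}\sum_{i=1}^{n}z_i$, is utilized in the FE-TE estimation. For an autoregressive model, condition (A.25) is violated due to the dependence between $\phi_i$ and $y_{i,t-1}$, so the $\tilde{\theta}_{i,TE}$ and FE-TE estimators are biased.

A.4.3 Cluster-Robust Variance of the Fixed Effects Estimators

The fixed effects estimator with time effects (FE-TE) can be written as
\[
\hat{\theta}_{FE\text{-}TE} = \left(\frac{1}{nT}\sum_{i=1}^{n}\ddot{W}_i'M_T\ddot{W}_i\right)^{-1}\left(\frac{1}{nT}\sum_{i=1}^{n}\ddot{W}_i'M_T\ddot{y}_i\right).
\]
In dynamic panels, errors are assumed to be serially uncorrelated for all $t$. For units $i$ and $j \neq i$, cross-sectional correlations in errors are non-zero if $i$ and $j$ are in the same cluster $N_g$, with $G$ clusters in total, and the correlations are zero between units in different clusters. The cluster-robust estimator of variance is calculated as in Liang and Zeger (1986),
\[
\widehat{\mathrm{Var}}(\hat{\theta}_{FE\text{-}TE}) = \left(\sum_{i=1}^{n}\ddot{W}_i'M_T\ddot{W}_i\right)^{-1}\hat{V}_{CR}\left(\sum_{i=1}^{n}\ddot{W}_i'M_T\ddot{W}_i\right)^{-1}, \tag{A.27}
\]
where
\[
\hat{V}_{CR} = \sum_{g=1}^{G}\sum_{i,j \in N_g}\sum_{t=1}^{T}\ddot{w}_{it}\ddot{w}_{jt}'\,\hat{u}_{it}\hat{u}_{jt} \quad \text{and} \quad \hat{u}_{it} = \ddot{y}_{it} - \hat{\theta}_{FE\text{-}TE}'\ddot{w}_{it},
\]
with $\ddot{y}_{it} = y_{it} - \bar{y}_{i\cdot} - \bar{y}_{\cdot t} + \bar{y}$ and $\ddot{w}_{it} = w_{it} - \bar{w}_{i\cdot} - \bar{w}_{\cdot t} + \bar{w}$.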
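The sandwich formula in (A.27) can be sketched in a few lines of NumPy. The code below is an illustrative sketch under stated assumptions, not the implementation used in the thesis: the function names `two_way_demean`, `cluster_meat`, and `fe_te_cluster_robust` are hypothetical, the panel is assumed balanced with regressors stored as an array of shape (n, T, k), and cluster labels are supplied as one label per unit. It uses the fact that, within a cluster, the double sum over $i, j$ at a common $t$ equals the outer product of the cluster's summed scores at that $t$.

```python
import numpy as np

def two_way_demean(z):
    """Subtract unit and time averages and add back the grand average.

    z has shape (n, T) or (n, T, k); axis 0 indexes units, axis 1 indexes time.
    """
    return (z - z.mean(axis=1, keepdims=True)
              - z.mean(axis=0, keepdims=True)
              + z.mean(axis=(0, 1), keepdims=True))

def cluster_meat(w, u, clusters):
    """Middle matrix: sum over clusters, pairs (i, j) in a cluster, and t of w_it w_jt' u_it u_jt."""
    clusters = np.asarray(clusters)
    score = w * u[:, :, None]                    # (n, T, k): w_it * u_it
    k = w.shape[2]
    meat = np.zeros((k, k))
    for g in np.unique(clusters):
        s = score[clusters == g].sum(axis=0)     # (T, k): cluster score at each t
        meat += s.T @ s                          # sum over t of s_t s_t'
    return meat

def fe_te_cluster_robust(w_raw, y_raw, clusters):
    """Two-way fixed effects estimate and its cluster-robust variance (sketch)."""
    w = two_way_demean(w_raw)
    y = two_way_demean(y_raw)
    bread = np.einsum("itk,itl->kl", w, w)       # sum_i W_i' W_i after demeaning
    coef = np.linalg.solve(bread, np.einsum("itk,it->k", w, y))
    resid = y - w @ coef
    bread_inv = np.linalg.inv(bread)
    return coef, bread_inv @ cluster_meat(w, resid, clusters) @ bread_inv
```

Because the two-way demeaned regressors already have zero time averages for every unit, applying $M_T$ again leaves them unchanged, which is why the sketch omits it. Serial correlation is ignored by construction, in line with the assumption of serially uncorrelated errors stated above.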
The formulas of the half-panel Jackknife fixed effects (FE-HJK) estimator with time effects can be found in Chudik et al. (2018), and we use a similar cluster-robust estimator of variance in the paper. [Footnote 4]

[Footnote 3] The symbol “$\cdot$” denotes the average over the respective index.
[Footnote 4] See equation (8) on p.820 for the FE-HJK estimate and equation (24) on p.823 for the robust estimator of its variance without clustering in Chudik et al. (2018).

A.5 Extensive Empirical Results of Minimum Wage Policy

A.5.1 Robustness Check with Different Lags in the County-level Analysis

For the county-level analysis using the QWI data, Tables A.1–A.6 report estimation results in an ARDL($p$,$q$) model with different numbers of lags, $p$ and $q$, for both total employment and teenage employment in the private sector. Our findings in the comparisons between the FE-HJK and MG-HJK estimators with time effects are consistent with the previous results in the ARDL(2,2) model.

Table A.1: Estimated effects of real minimum wages on total employment across counties in an ARDL(1,1) model with time effects

n = 2,647 and T = 38 (2002 Quarter 3–2011 Quarter 4)

                              FE-TE      MG-TE      FE-TE (HJK)   MG-TE (HJK)
Short-term effects
$\hat{\phi}_1$                0.714      0.388      0.882         0.568
                              (0.010)    (0.006)    (0.012)       (0.008)
$\hat{\beta}_0$               -0.023     -0.036     -0.018        0.035
                              (0.017)    (0.008)    (0.020)       (0.015)
$\hat{\beta}_1$               0.000      0.007      -0.016        -0.076
                              (0.016)    (0.007)    (0.017)       (0.012)
Log(Population)               0.174      0.367      0.047         0.239
                              (0.011)    (0.026)    (0.018)       (0.066)
Share of aged 15–59           0.002      0.011      0.002         -0.007
                              (0.000)    (0.002)    (0.000)       (0.009)
Log(Real GDP per capita)      0.328      0.890      0.283         0.595
                              (0.059)    (0.128)    (0.104)       (0.253)
Long-term effect of minimum wages
$\hat{\theta}$                -0.081     -0.047     -0.290        -0.093
                              (0.036)    (0.012)    (0.135)       (0.040)

Notes: The table reports both estimated short-term and long-term coefficients in an ARDL(1,1) model given by (2.31), where the outcome variable is total private sector employment, and the regressors include lagged employment, contemporary and lagged real minimum wages, and three time-varying controls: population, the share of the population
aged 15–59, and real GDP per capita. “FE-TE” denotes the fixed effects estimator with time effects, and “FE-TE (HJK)” denotes the FE-TE estimator with half-panel Jackknife proposed in Chudik et al. (2018). “MG” denotes the mean group estimator proposed in Pesaran and Smith (1995) where the estimation formula with time effects (TE) is given by (A.20), and the “MG-TE (HJK)” estimator is proposed in the paper. For short-term effects, the estimated standard errors of the FE-TE and FE-TE (HJK) estimators are robust to heteroskedasticity and clustered at the state level as in Liang and Zeger (1986) given by (A.27), assuming zero serial correlations in dynamic panels. The standard errors of both the MG-TE estimators are estimated as the sample counterpart of the variance of the respective individual estimators. For long-term effects, the estimates are computed as $\hat{\theta} = \frac{\hat{\beta}_0 + \hat{\beta}_1 + \ldots + \hat{\beta}_q}{1 - \hat{\phi}_1 - \hat{\phi}_2 - \ldots - \hat{\phi}_p}$, and the asymptotic standard errors are computed by the Delta method. Estimated standard errors are reported in brackets.

Table A.2: Estimated effects of real minimum wages on total employment across counties in an ARDL(1,2) model with time effects

n = 2,647 and T = 38 (2002 Quarter 3–2011 Quarter 4)

                              FE-TE      MG-TE      FE-TE (HJK)   MG-TE (HJK)
Short-term effects
$\hat{\phi}_1$                0.714      0.386      0.882         0.581
                              (0.010)    (0.006)    (0.012)       (0.008)
$\hat{\beta}_0$               -0.020     -0.039     -0.016        0.055
                              (0.017)    (0.008)    (0.020)       (0.016)
$\hat{\beta}_1$               0.034      0.044      0.033         -0.063
                              (0.022)    (0.008)    (0.023)       (0.013)
$\hat{\beta}_2$               -0.044     -0.052     -0.061        0.009
                              (0.020)    (0.007)    (0.021)       (0.015)
Log(Population)               0.173      0.364      0.047         0.219
                              (0.011)    (0.027)    (0.018)       (0.072)
Share of aged 15–59           0.002      0.010      0.002         -0.001
                              (0.000)    (0.002)    (0.000)       (0.009)
Log(Real GDP per capita)      0.332      0.845      0.287         0.902
                              (0.059)    (0.130)    (0.104)       (0.257)
Long-term effect of minimum wages
$\hat{\theta}$                -0.107     -0.077     -0.369        0.003
                              (0.039)    (0.013)    (0.143)       (0.045)

Notes: The table reports both estimated short-term and long-term coefficients in an ARDL(1,2) model given by (2.31), where the outcome variable is total private sector employment, and the regressors include lagged employment, contemporary and lagged real minimum wages, and three time-varying controls: population, the share of the population aged 15–59, and real GDP per capita. “FE-TE” denotes the fixed effects estimator with time effects, and “FE-TE (HJK)” denotes the FE-TE estimator with half-panel Jackknife proposed in Chudik et al. (2018). “MG” denotes the mean group estimator proposed in Pesaran and Smith (1995) where the estimation formula with time effects (TE) is given by (A.20), and the “MG-TE (HJK)” estimator is proposed in the paper. For short-term effects, the estimated standard errors of the FE-TE and FE-TE (HJK) estimators are robust to heteroskedasticity and clustered at the state level as in Liang and Zeger (1986) given by (A.27), assuming zero serial correlations in dynamic panels. The standard errors of both the MG-TE estimators are estimated as the sample counterpart of the variance of the respective individual estimators. For long-term effects, the estimates are computed as $\hat{\theta} = \frac{\hat{\beta}_0 + \hat{\beta}_1 + \ldots + \hat{\beta}_q}{1 - \hat{\phi}_1 - \hat{\phi}_2 - \ldots - \hat{\phi}_p}$, and the asymptotic standard errors are computed by the Delta method. Estimated standard errors are reported in brackets.
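The long-run formula in the table notes and its Delta-method standard error can be sketched as follows. This is an illustrative sketch only: the function name `long_run_effect` and the labels `phi` (coefficients on lagged employment) and `beta` (coefficients on current and lagged minimum wages) are assumptions made for the example, and the covariance matrix used below is a hypothetical placeholder, because the tables report standard errors but not the covariances between the short-run estimates.

```python
import numpy as np

def long_run_effect(phi, beta, cov):
    """Long-run effect sum(beta) / (1 - sum(phi)) and its Delta-method standard error.

    cov is the covariance matrix of the stacked estimates (phi_1..phi_p, beta_0..beta_q).
    """
    phi = np.asarray(phi, dtype=float)
    beta = np.asarray(beta, dtype=float)
    cov = np.asarray(cov, dtype=float)
    denom = 1.0 - phi.sum()
    theta = beta.sum() / denom
    # Gradient: d theta / d phi_j = theta / denom, d theta / d beta_j = 1 / denom.
    grad = np.concatenate([np.full(phi.size, theta / denom),
                           np.full(beta.size, 1.0 / denom)])
    se = float(np.sqrt(grad @ cov @ grad))
    return float(theta), se

# Short-run MG-TE estimates from Table A.2 (ARDL(1,2)).
phi_hat = [0.386]
beta_hat = [-0.039, 0.044, -0.052]
# ASSUMPTION: covariances are not reported, so a diagonal matrix built from the
# reported standard errors is used purely as a placeholder.
cov_placeholder = np.diag(np.square([0.006, 0.008, 0.008, 0.007]))
theta_mg, se_mg = long_run_effect(phi_hat, beta_hat, cov_placeholder)
print(f"long-run effect = {theta_mg:.3f}")  # prints -0.077, the MG-TE entry in Table A.2
```

The point estimate reproduces the reported long-term effect up to rounding. The standard error from the placeholder covariance is not expected to match the reported one, since the off-diagonal terms are set to zero here.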
Table A.3: Estimated effects of real minimum wages on total employment across counties in an ARDL(2,1) model with time effects

n = 2,647 and T = 38 (2002 Quarter 3–2011 Quarter 4)

                              FE-TE      MG-TE      FE-TE (HJK)   MG-TE (HJK)
Short-term effects
$\hat{\phi}_1$                0.682      0.416      0.793         0.589
                              (0.013)    (0.008)    (0.014)       (0.009)
$\hat{\phi}_2$                0.045      -0.132     0.133         -0.050
                              (0.013)    (0.009)    (0.014)       (0.010)
$\hat{\beta}_0$               -0.023     -0.016     -0.020        0.101
                              (0.017)    (0.007)    (0.019)       (0.015)
$\hat{\beta}_1$               0.000      0.000      -0.015        -0.102
                              (0.016)    (0.008)    (0.017)       (0.013)
Log(Population)               0.163      0.437      0.013         0.371
                              (0.012)    (0.032)    (0.018)       (0.072)
Share of aged 15–59           0.002      0.013      0.002         -0.002
                              (0.000)    (0.002)    (0.000)       (0.009)
Log(Real GDP per capita)      0.325      1.019      0.278         0.998
                              (0.059)    (0.148)    (0.104)       (0.267)
Long-term effect of minimum wages
$\hat{\theta}$                -0.084     -0.023     -0.469        -0.002
                              (0.038)    (0.012)    (0.228)       (0.039)

Notes: The table reports both estimated short-term and long-term coefficients in an ARDL(2,1) model given by (2.31), where the outcome variable is total private sector employment, and the regressors include lagged employment, contemporary and lagged real minimum wages, and three time-varying controls: population, the share of the population aged 15–59, and real GDP per capita. “FE-TE” denotes the fixed effects estimator with time effects, and “FE-TE (HJK)” denotes the FE-TE estimator with half-panel Jackknife proposed in Chudik et al. (2018). “MG” denotes the mean group estimator proposed in Pesaran and Smith (1995) where the estimation formula with time effects (TE) is given by (A.20), and the “MG-TE (HJK)” estimator is proposed in the paper. For short-term effects, the estimated standard errors of the FE-TE and FE-TE (HJK) estimators are robust to heteroskedasticity and clustered at the state level as in Liang and Zeger (1986) given by (A.27), assuming zero serial correlations in dynamic panels. The standard errors of both the MG-TE estimators are estimated as the sample counterpart of the variance of the respective individual estimators.
For long-term effects, the estimates are computed as $\hat{\theta} = \frac{\hat{\beta}_0 + \hat{\beta}_1 + \ldots + \hat{\beta}_q}{1 - \hat{\phi}_1 - \hat{\phi}_2 - \ldots - \hat{\phi}_p}$, and the asymptotic standard errors are computed by the Delta method. Estimated standard errors are reported in brackets.

Table A.4: Estimated effects of real minimum wages on teenage employment across counties in an ARDL(1,1) model with time effects

n = 2,367 and T = 38 (2002 Quarter 3–2011 Quarter 4)

                              FE-TE      MG-TE      FE-TE (HJK)   MG-TE (HJK)
Short-term effects
$\hat{\phi}_1$                0.460      0.314      0.629         0.341
                              (0.014)    (0.008)    (0.006)       (0.008)
$\hat{\beta}_0$               -0.068     -0.067     -0.046        -0.098
                              (0.015)    (0.014)    (0.015)       (0.028)
$\hat{\beta}_1$               0.002      -0.054     0.024         -0.062
                              (0.015)    (0.017)    (0.014)       (0.026)
Log(Population)               -0.445     -0.516     -0.435        -0.727
                              (0.048)    (0.088)    (0.036)       (0.158)
Log(Teen Population)          0.135      0.261      0.063         0.461
                              (0.021)    (0.040)    (0.021)       (0.077)
Log(Total Employment)         0.757      1.021      0.580         0.809
                              (0.039)    (0.021)    (0.015)       (0.024)
Long-term effect of minimum wages
$\hat{\theta}$                -0.123     -0.176     -0.058        -0.243
                              (0.021)    (0.026)    (0.030)       (0.055)

Notes: The table reports both estimated short-term and long-term coefficients in an ARDL(1,1) model given by (2.31), where the outcome variable is teenage employment in the private sector, and the regressors include lagged teenage employment, contemporary and lagged real minimum wages, and three time-varying controls: population, teenage population, and total private sector employment. “FE-TE” denotes the fixed effects estimator with time effects, and “FE-TE (HJK)” denotes the FE-TE estimator with half-panel Jackknife proposed in Chudik et al. (2018). “MG” denotes the mean group estimator proposed in Pesaran and Smith (1995) where the estimation formula with time effects (TE) is given by (A.20), and the “MG-TE (HJK)” estimator is proposed in the paper. For short-term effects, the estimated standard errors of the FE-TE and FE-TE (HJK) estimators are robust to heteroskedasticity and clustered at the state level as in Liang and Zeger (1986) given by (A.27), assuming zero serial correlations in dynamic panels.
The standard errors of both the MG-TE estimators are estimated as the sample counterpart of the variance of the respective individual estimators. For long-term effects, the estimates are computed as $\hat{\theta} = \frac{\hat{\beta}_0 + \hat{\beta}_1 + \ldots + \hat{\beta}_q}{1 - \hat{\phi}_1 - \hat{\phi}_2 - \ldots - \hat{\phi}_p}$, and the asymptotic standard errors are computed by the Delta method. Estimated standard errors are reported in brackets.

Table A.5: Estimated effects of real minimum wages on teenage employment across counties in an ARDL(1,2) model with time effects

n = 2,367 and T = 38 (2002 Quarter 3–2011 Quarter 4)

                              FE-TE      MG-TE      FE-TE (HJK)   MG-TE (HJK)
Short-term effects
$\hat{\phi}_1$                0.460      0.316      0.629         0.334
                              (0.014)    (0.008)    (0.006)       (0.008)
$\hat{\beta}_0$               -0.068     -0.063     -0.043        -0.167
                              (0.015)    (0.015)    (0.015)       (0.030)
$\hat{\beta}_1$               0.013      -0.063     0.033         -0.075
                              (0.017)    (0.017)    (0.016)       (0.028)
$\hat{\beta}_2$               -0.014     0.012      -0.014        0.096
                              (0.013)    (0.015)    (0.012)       (0.033)
Log(Population)               -0.445     -0.514     -0.437        -0.785
                              (0.048)    (0.090)    (0.036)       (0.165)
Log(Teen Population)          0.135      0.279      0.064         0.454
                              (0.021)    (0.040)    (0.021)       (0.079)
Log(Total Employment)         0.757      1.027      0.579         0.850
                              (0.039)    (0.021)    (0.015)       (0.026)
Long-term effect of minimum wages
$\hat{\theta}$                -0.128     -0.167     -0.065        -0.219
                              (0.022)    (0.029)    (0.032)       (0.060)

Notes: The table reports both estimated short-term and long-term coefficients in an ARDL(1,2) model given by (2.31), where the outcome variable is teenage employment in the private sector, and the regressors include lagged teenage employment, contemporary and lagged real minimum wages, and three time-varying controls: population, teenage population, and total private sector employment. “FE-TE” denotes the fixed effects estimator with time effects, and “FE-TE (HJK)” denotes the FE-TE estimator with half-panel Jackknife proposed in Chudik et al. (2018). “MG” denotes the mean group estimator proposed in Pesaran and Smith (1995) where the estimation formula with time effects (TE) is given by (A.20), and the “MG-TE (HJK)” estimator is proposed in the paper.
For short-term effects, the estimated standard errors of the FE-TE and FE-TE (HJK) estimators are robust to heteroskedasticity and clustered at the state level as in Liang and Zeger (1986) given by (A.27), assuming zero serial correlations in dynamic panels. The standard errors of both the MG-TE estimators are estimated as the sample counterpart of the variance of the respective individual estimators. For long-term effects, the estimates are computed as $\hat{\theta} = \frac{\hat{\beta}_0 + \hat{\beta}_1 + \ldots + \hat{\beta}_q}{1 - \hat{\phi}_1 - \hat{\phi}_2 - \ldots - \hat{\phi}_p}$, and the asymptotic standard errors are computed by the Delta method. Estimated standard errors are reported in brackets.

Table A.6: Estimated effects of real minimum wages on teenage employment across counties in an ARDL(2,1) model with time effects

n = 2,367 and T = 38 (2002 Quarter 3–2011 Quarter 4)

                              FE-TE      MG-TE      FE-TE (HJK)   MG-TE (HJK)
Short-term effects
$\hat{\phi}_1$                0.423      0.319      0.554         0.358
                              (0.011)    (0.009)    (0.006)       (0.009)
$\hat{\phi}_2$                0.069      -0.057     0.160         -0.066
                              (0.009)    (0.008)    (0.006)       (0.009)
$\hat{\beta}_0$               -0.071     -0.057     -0.043        -0.015
                              (0.015)    (0.015)    (0.015)       (0.032)
$\hat{\beta}_1$               0.008      -0.063     0.033         -0.174
                              (0.015)    (0.017)    (0.014)       (0.033)
Log(Population)               -0.467     -0.423     -0.521        -0.557
                              (0.047)    (0.090)    (0.037)       (0.163)
Log(Teen Population)          0.135      0.243      0.065         0.479
                              (0.020)    (0.041)    (0.021)       (0.082)
Log(Total Employment)         0.751      1.014      0.571         0.815
                              (0.040)    (0.021)    (0.015)       (0.026)
Long-term effect of minimum wages
$\hat{\theta}$                -0.125     -0.162     -0.034        -0.266
                              (0.021)    (0.025)    (0.040)       (0.056)

Notes: The table reports both estimated short-term and long-term coefficients in an ARDL(2,1) model given by (2.31), where the outcome variable is teenage employment in the private sector, and the regressors include lagged teenage employment, contemporary and lagged real minimum wages, and three time-varying controls: population, teenage population, and total private sector employment. “FE-TE” denotes the fixed effects estimator with time effects, and “FE-TE (HJK)” denotes the FE-TE estimator with half-panel Jackknife proposed in Chudik et al. (2018).
“MG” denotes the mean group estimator proposed in Pesaran and Smith (1995) where the estimation formula with time effects (TE) is given by (A.20), and the “MG-TE (HJK)” estimator is proposed in the paper. For short-term effects, the estimated standard errors of the FE-TE and FE-TE (HJK) estimators are robust to heteroskedasticity and clustered at the state level as in Liang and Zeger (1986) given by (A.27), assuming zero serial correlations in dynamic panels. The standard errors of both the MG-TE estimators are estimated as the sample counterpart of the variance of the respective individual estimators. For long-term effects, the estimates are computed as $\hat{\theta} = \frac{\hat{\beta}_0 + \hat{\beta}_1 + \ldots + \hat{\beta}_q}{1 - \hat{\phi}_1 - \hat{\phi}_2 - \ldots - \hat{\phi}_p}$, and the asymptotic standard errors are computed by the Delta method. Estimated standard errors are reported in brackets.

A.5.2 Effects of Real Minimum Wages on Total Employment across States

Meer and West (2016) estimate the effects of the minimum wage policy on employment across all 50 states plus the District of Columbia in the United States, using a sample from Business Dynamics Statistics (BDS) from 1977 to 2011, in a panel model with two-way fixed effects and homogeneous slope coefficients. The means and standard deviations of the above variables in different periods are reported in Table A.7. As shown in the table, although the nominal minimum wages are continuously raised by the federal and state governments, the average of real minimum wages across states has experienced both increases and decreases because of erosion by inflation. Over the sample periods, the means of all the variables have relatively small variations compared with their standard deviations.
Table A.7: Descriptive statistics of state characteristics

                                            1977     1984     1991     1998     2005     2011
log(Employment)                             13.55    13.70    13.88    14.05    14.14    14.13
                                            (1.06)   (1.07)   (1.06)   (1.06)   (1.03)   (1.01)
log(Minimum wage)                           2.20     2.02     1.79     1.91     1.87     2.03
                                            (0.04)   (0.03)   (0.09)   (0.04)   (0.11)   (0.05)
log(Population)                             14.79    14.86    14.92    14.98    15.07    15.12
                                            (1.04)   (1.02)   (1.03)   (1.04)   (1.04)   (1.04)
Share of population aged 15–59              0.61     0.61     0.61     0.62     0.63     0.61
                                            (0.02)   (0.02)   (0.02)   (0.02)   (0.01)   (0.02)
log(Real gross state product per capita)    10.38    10.44    10.52    10.67    10.77    10.78
                                            (0.23)   (0.26)   (0.24)   (0.23)   (0.25)   (0.26)

Notes: The sample is a balanced panel of 50 states plus the District of Columbia in the United States over the period 1977–2011 from the sample of Business Dynamics Statistics in Meer and West (2016). Numbers in brackets are standard deviations of variables.

While as a benchmark the FE-TE estimate shows a significantly negative effect of the minimum wage on employment, the estimate becomes close to zero when state-specific time trends are included in the regression. Meer and West (2016) suspect there is an attenuation bias in the estimate with state-specific time trends due to a staggered dynamic treatment process. Later, they provide further evidence to support the substantial negative effects of the minimum wage policy based on the pooled estimates of long differences and distributed lag regressions.

Table A.8 reports estimated results of the short-run effects of minimum wages on total employment across states in the United States from 1977 to 2011 based on an ARDL(1,1) model. Consistent with the theory, estimates for the mean of heterogeneous autoregressive coefficients are greater after Jackknife bias correction, and the differences between the MG and MG-HJK estimators of the contemporary and lagged effects of minimum wages are marginal. While the values of the MG and MG-HJK estimates of the first-order autoregressive coefficient, $\phi_i$, are reasonable, the FE estimates are much greater.
Moreover, contrary to the FE and FE-HJK estimates, the MG-HJK results show a negative eect of the total population and a stronger and positive eect of productivity on total employment. Table A.9 reports estimated results of the long-run elasticity of minimum wages on the overall employment, where the MG-HJK estimates are insignicant and close to zero, but the FE estimates are signicantly negative. 180 Table A.8: Estimates of the short-run eects of the minimum wage on employment across states with an ARDL(1,1) model FE FE-HJK MG MG-HJK Short-run coecients withn = 51 andT = 34 ^ 0.829 0.836 0.868 0.947 0.932 0.961 0.473 0.554 0.543 0.548 0.735 0.750 (0.032) (0.023) (0.020) (0.063) (0.050) (0.029) (0.043) (0.036) (0.040) (0.049) (0.053) (0.057) ^ 1;0 -0.029 -0.038 -0.024 -0.033 -0.030 0.003 -0.005 0.001 -0.013 0.002 -0.002 -0.017 (0.016) (0.013) (0.013) (0.037) (0.031) (0.032) (0.024) (0.023) (0.024) (0.027) (0.026) (0.026) ^ 1;1 -0.036 -0.030 -0.007 -0.005 -0.019 0.001 -0.056 -0.038 -0.023 -0.056 -0.039 -0.026 (0.016) (0.015) (0.011) (0.039) (0.044) (0.038) (0.022) (0.019) (0.017) (0.024) (0.022) (0.020) ^ 2 0.120 0.065 0.036 0.039 0.026 0.003 0.094 0.003 0.030 -0.147 -0.144 -0.066 (0.034) (0.020) (0.014) (0.049) (0.045) (0.025) (0.123) (0.086) (0.080) (0.147) (0.104) (0.108) ^ 3 0.721 0.461 0.267 0.714 0.479 0.154 1.411 0.301 0.171 0.733 0.051 -0.415 (0.184) (0.130) (0.105) (0.246) (0.201) (0.126) (0.285) (0.213) (0.193) (0.430) (0.321) (0.279) ^ 4 0.045 0.068 0.036 -0.040 0.013 -0.003 0.322 0.254 0.170 0.361 0.250 0.174 (0.023) (0.021) (0.018) (0.030) (0.039) (0.026) (0.026) (0.027) (0.025) (0.034) (0.031) (0.029) H 0 : i1 = 1 ; i1 = 1 Hausman-type test - - - - - - - - - 128.559 20.785 17.761 p-value - - - - - - - - - 0.000 0.000 0.000 Time xed eects Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Region-specic time trends - Yes - - Yes - - Yes - - Yes - Division-specic time trends - - Yes - - Yes - - Yes - - Yes Notes: This table reports estimation results of an 
ARDL(1,1) panel data model given by (2.31) using the BDS sample in 1977–2011 in Meer and West (2016) with $n = 51$ and $T = 34$. The MG estimator with time effects is calculated based on equation (A.20). Three cases of time trends are considered with states categorized into 4 regions and 9 divisions according to the census. "-" denotes not applicable.

Table A.9: Estimates of the long-run effect of the minimum wage on employment across states with an ARDL(1,1) model

                                      FE                        FE-HJK                      MG                       MG-HJK
Long-run coefficients with n = 51 and T = 34
$\hat{\theta}_1$              -0.381  -0.416  -0.235    -0.723  -0.723   0.100    -0.115  -0.082  -0.080    -0.118  -0.157  -0.174
                             (0.084) (0.106) (0.088)   (1.368) (0.438) (1.089)   (0.057) (0.052) (0.049)   (0.074) (0.110) (0.114)
Time fixed effects              Yes     Yes     Yes       Yes     Yes     Yes       Yes     Yes     Yes       Yes     Yes     Yes
Region-specific time trends      -      Yes      -         -      Yes      -         -      Yes      -         -      Yes      -
Division-specific time trends    -       -      Yes        -       -      Yes        -       -      Yes        -       -      Yes

Notes: This table reports estimation results of an ARDL(1,1) panel data model given by (2.31) using the BDS sample in 1977–2011 in Meer and West (2016) with $n = 51$ and $T = 34$. Three cases of time trends are considered with states categorized into 4 regions and 9 divisions according to the census. The MG estimator with time effects is calculated based on equation (A.20). For the long-term effects, the estimates are computed based on the respective estimated means of short-run coefficients as
$$\hat{\theta}_1 = \frac{\hat{\beta}_{1,0} + \hat{\beta}_{1,1} + \cdots + \hat{\beta}_{1,q}}{1 - \hat{\phi}_1 - \hat{\phi}_2 - \cdots - \hat{\phi}_p},$$
and the asymptotic standard errors are calculated based on the Delta method. "-" denotes not applicable.

Appendix B
Appendix to Chapter 3

B.1 Lemmas

Lemma B.1 Suppose that Assumptions 9, 12, and 13 hold. Then for each $i$, we have
$$E\left[d_i^{s}\,\mathbf{1}\{d_i \le a_n\}\right] = O(a_n^{s+1}), \quad \text{for } s = 1, 2, \ldots, \tag{B.1}$$
$$E(\delta_i) = O(a_n), \quad \text{and} \quad E(\delta_i^2) = O(a_n), \tag{B.2}$$
$$E(\delta_i \eta_i) = O(a_n), \quad \text{and} \quad E(\delta_i^2 \eta_i) = O(a_n), \tag{B.3}$$
$$n^{-1}\sum_{i=1}^{n}\left\{E\left[d_i^{2}\,\mathbf{1}\{d_i \le a_n\}\right]\right\}^{1/2} = O(a_n^{3/2}), \tag{B.4}$$
$$n^{-1}\sum_{i=1}^{n}\left\{E\left[d_i^{-2}\,\mathbf{1}\{d_i > a_n\}\right]\right\}^{1/2} = O(a_n^{-1}). \tag{B.5}$$

Proof.
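By the mean value theorem (MVT), under Assumption 12, we have
$$F_d(a_n) = F_d(0) + f_d(\bar{a}_n)\,a_n = f_d(\bar{a}_n)\,a_n = O(a_n), \tag{B.6}$$
where $\bar{a}_n$ lies on the line segment between $0$ and $a_n$. Similarly, let $\psi(a_n) = \int_0^{a_n} u^{s} f_d(u)\,du$ and note that $\psi'(a_n) = a_n^{s} f_d(a_n)$. Then by the MVT $\psi(a_n) = \psi(0) + \left[\bar{a}_n^{s} f_d(\bar{a}_n)\right] a_n$, and we have
$$E\left[d_i^{s}\,\mathbf{1}\{d_i \le a_n\}\right] = \int_0^{a_n} u^{s} f_d(u)\,du = \bar{a}_n^{s} f_d(\bar{a}_n)\,a_n = O(a_n^{s+1}), \quad \text{for } s = 1, 2, \ldots. \tag{B.7}$$
Using the above results,
$$E(\delta_i) = E\left[\left(\frac{d_i - a_n}{a_n}\right)\mathbf{1}\{d_i \le a_n\}\right] = a_n^{-1}E\left[d_i\,\mathbf{1}\{d_i \le a_n\}\right] - E\left[\mathbf{1}\{d_i \le a_n\}\right] = a_n^{-1}O(a_n^{2}) - F_d(a_n) = O(a_n), \tag{B.8}$$
and
$$E(\delta_i^2) = E\left[\left(\frac{d_i - a_n}{a_n}\right)^{2}\mathbf{1}\{d_i \le a_n\}\right] = a_n^{-2}E\left[d_i^{2}\,\mathbf{1}\{d_i \le a_n\}\right] + E\left[\mathbf{1}\{d_i \le a_n\}\right] - 2a_n^{-1}E\left[d_i\,\mathbf{1}\{d_i \le a_n\}\right]$$
$$= a_n^{-2}O(a_n^{3}) + F_d(a_n) - 2a_n^{-1}O(a_n^{2}) = O(a_n). \tag{B.9}$$
Consider now the terms involving the products of $\delta_i$ and $\eta_i$:
$$E(\delta_i \eta_i) = B_i\,E\left\{\left(\frac{d_i - a_n}{a_n}\right)\mathbf{1}\{d_i \le a_n\}\left[g(d_i) - E\,g(d_i)\right]\right\}. \tag{B.10}$$
Since $B_i$ is bounded and does not depend on $d_i$, without loss of generality we set $B_i = I_k$ and consider the $j$-th term of (B.10), namely
$$s_j(a_n) = E\left\{\left(\frac{d_i - a_n}{a_n}\right)\mathbf{1}\{d_i \le a_n\}\left[g_j(d_i) - E\,g_j(d_i)\right]\right\}$$
$$= \frac{1}{a_n}\int_0^{a_n} u\,g_j(u) f_d(u)\,du - \int_0^{a_n} g_j(u) f_d(u)\,du - E\left[g_j(d_i)\right]\frac{1}{a_n}\int_0^{a_n} u f_d(u)\,du + E\left[g_j(d_i)\right]\int_0^{a_n} f_d(u)\,du.$$
By Assumption 13, $E[g_j(d_i)] < C$, and using (B.6) and (B.1) we have
$$\int_0^{a_n} f_d(u)\,du = O(a_n), \quad \text{and} \quad a_n^{-1}\int_0^{a_n} u f_d(u)\,du = O(a_n).$$
Also by the mean value theorem
$$\int_0^{a_n} g_j(u) f_d(u)\,du = g_j(\bar{a}_n) f_d(\bar{a}_n)\,a_n = O(a_n),$$
$$\frac{1}{a_n}\int_0^{a_n} u\,g_j(u) f_d(u)\,du = \frac{1}{a_n}\left[\bar{a}_n\,g_j(\bar{a}_n) f_d(\bar{a}_n)\,a_n\right] = O(a_n).$$
Hence, $E(\delta_i \eta_i) = O(a_n)$.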
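Similarly, the $j$-th term of $E(\delta_i^2 \eta_i)$ (setting $B_i = I_k$) is given by
$$s_{j2}(a_n) = E\left\{\left(\frac{d_i - a_n}{a_n}\right)^{2}\mathbf{1}\{d_i \le a_n\}\left[g_j(d_i) - E\,g_j(d_i)\right]\right\}$$
$$= E\left\{\left[\frac{d_i^{2}}{a_n^{2}} - 1 - 2\left(\frac{d_i - a_n}{a_n}\right)\right]\mathbf{1}\{d_i \le a_n\}\left[g_j(d_i) - E\,g_j(d_i)\right]\right\}. \tag{B.11}$$
Consider the first term:
$$E\left\{\frac{d_i^{2}}{a_n^{2}}\,\mathbf{1}\{d_i \le a_n\}\left[g_j(d_i) - E\,g_j(d_i)\right]\right\} = \frac{1}{a_n^{2}}\int_0^{a_n} u^{2} g_j(u) f_d(u)\,du - \frac{1}{a_n^{2}}E\left[g_j(d_i)\right]E\left[d_i^{2}\,\mathbf{1}\{d_i \le a_n\}\right],$$
and again by the mean value theorem $a_n^{-2}\int_0^{a_n} u^{2} g_j(u) f_d(u)\,du = O(a_n)$, $E[g_j(d_i)] < C$, and using (B.7), $E\left[d_i^{2}\,\mathbf{1}\{d_i \le a_n\}\right] = O(a_n^{3})$. Hence, the first term of (B.11) is $O(a_n)$. For its second term, we have
$$E\left\{\mathbf{1}\{d_i \le a_n\}\left[g_j(d_i) - E\,g_j(d_i)\right]\right\} = \int_0^{a_n} g_j(u) f_d(u)\,du - E\left[g_j(d_i)\right]\int_0^{a_n} f_d(u)\,du = O(a_n),$$
and the order of the third term is already established to be $O(a_n)$. Hence, it follows that $E(\delta_i^2 \eta_i) = O(a_n)$. Finally, result (B.4) follows from (B.1), and (B.5) follows noting that $d_i^{-2}\,\mathbf{1}\{d_i > a_n\} \le a_n^{-2}$.

Lemma B.2 Suppose that Assumptions 17, 11, 9, 12 and 13 hold. Let
$$\bar{\xi}_{\delta,nT} = n^{-1}\sum_{i=1}^{n}(1 + \delta_i)\,\xi_{iT},$$
where $\delta_i = \left(\frac{d_i - a_n}{a_n}\right)\mathbf{1}\{d_i \le a_n\}$, $a_n = C_n n^{-\alpha}$, $C_n < C$, $d_i = \det(W_i'W_i)$, and $\xi_{iT} = R_i'u_i$. Then
$$E\left(\bar{\xi}_{\delta,nT}\right) = 0, \tag{B.12}$$
$$Var\left(\bar{\xi}_{\delta,nT}\right) = n^{-2}\sum_{i=1}^{n}E\left[\mathbf{1}\{d_i > a_n\}R_i'H_iR_i\right] + n^{-2}\sum_{i=1}^{n}a_n^{-2}E\left[d_i^{2}\,\mathbf{1}\{d_i \le a_n\}R_i'H_iR_i\right], \tag{B.13}$$
$$Var\left(\bar{\xi}_{\delta,nT}\right) = O\left(n^{-1+\alpha}\right), \tag{B.14}$$
and
$$E\left[n^{-1}\sum_{i=1}^{n}a_n^{-1}d_i^{2}\,\mathbf{1}\{d_i \le a_n\}R_i'H_i(W_i)R_i\right] = O\left(a_n^{1/2}\right). \tag{B.15}$$

Proof. Under Assumption 17 and conditional on $W_i$ (and hence on $d_i$), $(1 + \delta_i)\xi_{iT}$ are distributed independently over $i$ and
$$E\left(\bar{\xi}_{\delta,nT} \mid W_i\right) = n^{-1}\sum_{i=1}^{n}(1 + \delta_i)R_i'E(u_i \mid W_i) = 0,$$
$$Var\left(\bar{\xi}_{\delta,nT} \mid W_i\right) = n^{-2}\sum_{i=1}^{n}(1 + \delta_i)^{2}E\left(\xi_{iT}\xi_{iT}' \mid W_i\right) = n^{-2}\sum_{i=1}^{n}(1 + \delta_i)^{2}R_i'E\left(u_iu_i' \mid W_i\right)R_i = n^{-2}\sum_{i=1}^{n}(1 + \delta_i)^{2}R_i'H_iR_i,$$
where $H_i = E(u_iu_i' \mid W_i)$. We have suppressed the dependence of $H_i$ on $W_i$ to simplify the exposition. Hence, $E\left(\bar{\xi}_{\delta,nT}\right) = 0$, and $Var\left(\bar{\xi}_{\delta,nT}\right) = E\left[Var\left(\bar{\xi}_{\delta,nT} \mid W_i\right)\right]$.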
To establish (B.13) note that (1 + i ) 2 = 1fd i >a n g +a 2 n d 2 i 1fd i a n g; (B.16) and Var ;nT =n 2 n X i=1 E 1fd i >a n gR 0 i H i R i +n 2 E " n X i=1 a 2 n d 2 i 1fd i a n gR 0 i H i R i # : SinceH i is positive denite and by Assumption 17sup i max (H i )<C R 0 i H i R i max (H i ) R 0 i R i = max (H i ) (W 0 i W i ) 1 <Cd 1 i (W 0 i W i ) : (B.17) and Var ;nT Cn 2 n X i=1 E 1fd i >a n gd 1 i (W 0 i W i ) +Cn 2 E " n X i=1 a 2 n d i 1fd i a n g (W 0 i W i ) # : 187 By Cauchy-Schwarz inequality E 1fd i >a n gd 1 i (W 0 i W i ) E d 2 i 1fd i >a n g 1=2 n E h W 0 i W i 2 io 1=2 E d i 1fd i a n g max (H i ) (W 0 i W i ) E d 2 i 1fd i a n g 1=2 n E h W 0 i W i 2 io 1=2 ; and since by Assumption 11sup i E h (W 0 i W i ) 2 i <C, then Var ;nT C " n 2 n X i=1 E d 2 i 1fd i >a n g 1=2 +a 2 n n 2 n X i=1 E d 2 i 1fd i a n g 1=2 # : Now using results (B.4) and(B.5) of Lemma B.1, we have n 2 n X i=1 E d 2 i 1fd i >a n g 1=2 = O(n 1 a 1 n ); andn 2 n X i=1 E d 2 i 1fd i a n g 1=2 = O(n 1 a 3=2 n ); then Var ;nT = O(n 1 a 1 n ) +O(n 1 a 1=2 n ), and since a n = C n n result (B.14) follows. To establish (B.15) using (B.17) we have E n 1 n X i=1 a 1 n d 2 i 1fd i a n gR 0 i H i R i Cn 1 n X i=1 a 1 n d i 1fd i a n g (W 0 i W i ) ; (B.18) and by Cauchy-Schwarz inequality E n 1 n X i=1 a 1 n d 2 i 1fd i a n gR 0 i H i R i Cn 1 n X i=1 a 1 n E d 2 i 1fd i a n g 1=2 h E W 0 i W i 2 i 1=2 : 188 Under Assumption 11,sup i E (W 0 i W i ) 2 <C; and we have E n 1 n X i=1 a 1 n d 2 i 1fd i a n gR 0 i H i (W i )R i C " n 1 n X i=1 E a 2 n d 2 i 1fd i a n g 1=2 # : Now using (B.1) we haveE a 2 n d 2 i 1fd i a n g =a 2 n O(a 3 n ) =O (a n ); and result (B.15) follows. LemmaB.3 Let v it v i = u it u i + (x it x i ) 0 i ; fori = 1; 2;:::;n;t = 1; 2;:::;T; v t v = n 1 n X i=1 (v it v i ) = ( u t u ) +n 1 n X i=1 (x it x i ) 0 i ; where i = i 0 , and suppose that Assumptions 17, 13 and 15 hold. 
Then E (v it v i ) = 0, fori = 1; 2;:::;n;t = 1; 2;:::;T; (B.19) v t v =O p (n 1=2 ); fort = 1; 2;:::;T; (B.20) and (noting thatT is xed asn!1) p n ( v T v T )! d N(0; ); (B.21) where v T = ( v 1 ; v 2 ;:::; v T ) 0 =n 1 P n i=1 i , i = ( i1 ; i2 ;:::; iT ) 0 ; = M " lim n!1 n 1 n X i=1 E i 0 i # M ; (B.22) and M = I T T 1 T 0 T . 189 Proof. Under Assumptions 17 and 15,E(u it ) = 0 andE x 0 it i =E x 0 is i for allt ands. Hence E (u it u i ) = 0 E (x it x i ) 0 i = E x 0 it i T 1 " T X t 0 =1 E x 0 it 0 i # = 0; and result (B.19) follows. Result (B.20) also follows noting that under Assumptions 17 and 13,v it v i , for i = 1; 2;:::;n, are cross-sectionally independent with mean zero and nite variances. To establish B.21 we rst note that v =T 1 ( 0 T v T ), and hence p n ( v T v T ) = M p n v T =n 1=2 n X i=1 M i ; where M i is aT 1 vector (T is xed) with elements zero means and nite variances, and by As- sumption 13 are cross-sectionally independent. Therefore, result B.21 follows by standard central limit theorems for independent but not identically distributed random variables. B.2 ProofofPropositionsandTheorems B.2.1 ProofofProposition1 Proof. Under Assumption 9, ^ MG ! p 0 if nT ! p 0. A sucient (but not necessary) condition for the latter to hold can be obtained by applying Markov inequality to nT , i.e., for any xed > 0, Pr nT Ek nT k 2 2 . Thus for nT ! p 0, it is sucient to show thatE nT 2 ! 0. In what follows we nd conditions under whichE nT 2 =O(n 1 ), and hence establish that nT ! p 0 at the regular rate ofn 1=2 . Note that nT 2 =n 2 n X i=1 iT 2 =n 2 n X i=1 iT ! 0 n X i=1 iT ! =n 2 n X i=1 n X j=1 0 iT jT : 190 HenceE nT 2 = n 2 P n i=1 P n j=1 E 0 iT jT . Since under Assumption 17u 0 it s are cross-sectionally independent and we have E nT 2 =n 2 n X i=1 E 0 iT iT : (B.23) Then using (3.8), E 0 iT iT =E E u 0 i R i R 0 i u i jW i =E E Tr R 0 i u i u 0 i R i jW i =Tr R 0 i H i (W i )R i ; where by Assumption 17,H i (W i ) =E(u i u 0 i jW i ). 
Also Tr R 0 i H i (W i )R i max [H i (W i )]Tr R 0 i R i = max [H i (W i )]Tr h W 0 i W i 1 i max [H i (W i )] n k max h W 0 i W i 1 io : Sincek is nite, and under Assumption 17, max [H i (W i )] is bounded for eachi . Then given (B.23) we have E nT 2 =n 2 n X i=1 Tr R 0 i H i (W i )R i Cn 2 n X i=1 E n max h W 0 i W i 1 io : Hence,E nT 2 =O n 1 ; if sup i E n max h W 0 i W i 1 io <C <1: (B.24) It is also worth noting that condition (B.24) can be written in terms of column or row norms of (W 0 i W i ) 1 which is easier to use in practice. SinceW 0 i W i is a symmetric matrix then it follows that max (W 0 i W i ) 1 (W 0 i W i ) 1 1 ; wherekAk 1 denotes the column norm ofA. Also (W 0 i W i ) 1 = d 1 i (W 0 i W i ) , whered i = det(W 0 i W i );and (W 0 i W i ) is the adjoint ofW 0 i W i . 191 Then max (W 0 i W i ) 1 d 1 i k(W 0 i W i ) k 1 ; and by Cauchy–Schwarz inequality E max (W 0 i W i ) 1 E d 2 i 1=2 n E h (W 0 i W i ) 2 1 io 1=2 ; hence equation (B.24) will hold under the following conditions E d 2 i <C, andE h (W 0 i W i ) 2 1 i <C, fori = 1; 2;:::;n: Under the above conditions ^ MG converges in probability to 0 at the regular rate ofn 1=2 , irrespective of whether i are correlated with the regressors or not, and it is robust to error serial correlation and conditional heteroskedasticity. B.2.2 ProofofProposition3 Proof. Consider n A n n = n n n 1 n X i=1 ix ix ! ; and without loss of generality suppose that is positive denite. Then n A n n = " n 1 n X i=1 P i P 0 i P n P 0 n # =n 1 n X i=1 P i P n P i P n 0 ; whereP i = ix 1=2 and P n =n 1 P n i=1 P i . HenceA n = 1 n V P n 1 n ; where V P n = " n 1 n X i=1 P i P n P i P n 0 # : 192 It is clear that V P n is semi-positive denite and by assumption n is positive denite. Then it follows that 1 n V P n 1 n is also semi-positive denite and henceA n is non-positive denite,A n 0. 
ForB n we have n B n n = n " n 1 n X i=1 1 ix X 0 i M T H i (X i )M T X i 1 ix # n " n 1 n X i=1 X 0 i M T H i (X i )M T X i # ; and in general it is not possible to sign n B n n . The outcome depends on the heterogeneity of error variances and their interactions with the heterogeneity of regressors. We have already seen thatB n 0 whenV i = 2 I T , but this result need not hold in a more general setting whereH i (X i ) varies acrossi. B.2.3 ProofofTheorem4 Proof. Using (3.5), (3.34), and (3.27), we have ~ i 0 = i 0 + iT ; (B.25) where iT = (1 + i ) i + (1 + i ) iT , and using (3.37) ^ TMG 0 = 1 1 + n nT ; (B.26) where nT =n 1 P n i=1 i +n 1 P n i=1 i i +n 1 P n i=1 (1 + i ) iT :Substracting (B.26) from (B.25) now yields ~ i ^ TMG = iT + i 0 1 1 + n nT ; 193 and we have n 1 n X i=1 ~ i ^ TMG ~ i ^ TMG 0 = n 1 n X i=1 iT 0 iT + n 1 n X i=1 2 i ! 0 0 0 + n 1 n X i=1 i iT ! 0 0 + 0 n 1 n X i=1 i 0 iT ! + " 1 1 + n 2 2 1 1 + n # nT 0 nT n 1 + n 0 0 nT n 1 + n nT 0 0 (B.27) By the results in Lemma B.1,E n = O(a n ), andE nT = E ( i i ) = O(a n ), n = O(a n ) +o p (1) andn 1 P n i=1 2 i =O(a n ) +o p (1). Also using (3.43) we have nT =O(n ) +O p n (1) 2 ; and since> 1=3 using (3.42), " 1 1 + n 2 2 1 1 + n # nT 0 nT =O p n 2 +O p n (1) =O p n (1) , (B.28) n 1 + n nT 0 0 =O(a n n ) +O p a n n (1) 2 =O p n (1+) 2 : (B.29) Consider now ;nT =n 1 n X i=1 i iT =n 1 n X i=1 i (1 + i ) i +n 1 n X i=1 i (1 + i ) iT : (B.30) By (B.3 ) in Lemma B.1,E [ i (1 + i ) i ] =O(a n ); and since i (1 + i ) i are distributed independently overi we have n 1 n X i=1 i (1 + i ) i =O p (a n ). 
(B.31) 194 Also since conditional on W i , i (1 + i ) iT are distributed overi with zero means, then following the same line of argument as in the proof of Lemma B.2, we haveE [ i (1 + i ) iT ] = 0 and Var " n 1 n X i=1 i (1 + i ) iT # = n 2 n X i=1 E h 2 i (1 + i ) 2 R 0 i H i R i i Cn 2 n X i=1 E h 2 i (1 + i ) 2 d 1 i (W 0 i W i ) i Further using (B.16) (1 + i ) 2 2 i = 1fd i >a n g +a 2 n d 2 i 1fd i a n g d i a n a n 2 1fd i a n g = a 2 n d 2 i d i a n a n 2 1fd i a n g; and Var " n 1 n X i=1 i (1 + i ) iT # Cn 2 n X i=1 E " a 2 n d i d i a n a n 2 1fd i a n g (W 0 i W i ) # : By Cauchy-Schwarz inequality E " a 2 n d i d i a n a n 2 1fd i a n g (W 0 i W i ) # a 2 n ( E " d 2 i d i a n a n 4 1fd i a n g #) 1=2 h E W 0 i W i 2 i 1=2 ; and since under Assumption 11,sup i E (W 0 i W i ) 2 <C; we have Var " n 1 n X i=1 i (1 + i ) iT # Cn 2 a 2 n n X i=1 ( E " d 2 i d i a n a n 4 1fd i a n g #) 1=2 : 195 Also using (B.1) of Lemma B.1 E " d 2 i d i a n a n 4 1fd i a n g # =a 4 n E d 6 i 3d 5 i a n + 3a 3 n d 3 i a 4 n d 2 i 1fd i a n g =O a 3 n ; which yileds Var " n 1 n X i=1 i (1 + i ) iT # =O n 1 a 2 n a 3=2 n =O n 1 a 1=2 n ; and by Markov inequality n 1 n X i=1 i (1 + i ) iT =O n 1=2 a 1=4 n =O n 1=2+=4 : (B.32) Using (B.31) and (B.32) in (B.30), we have ;nT = O p (n ) +O n 1=2+=4 , which if used with (B.28) and (B.29) in (B.27) now yields (for> 1=3) n 1 n X i=1 ~ i ^ TMG ~ i ^ TMG 0 =n 1 n X i=1 iT 0 iT +O p n (1) +O(n ) +O n 1=2+=4 ; and since iT are independently distributed overi, we have plim n!1 n 1 n X i=1 ~ i ^ TMG ~ i ^ TMG 0 =lim n!1 " n 1 n X i=1 E iT 0 iT # But using (B.26) and recalling that n =O(a n ) then lim n!1 nVar ^ TMG =lim n!1 nVar nT : 196 Also (recall thatE nT =O(a n )) nVar nT =E " n 1 n X i=1 [ iT E ( iT )] [ iT E ( iT )] 0 # =n 1 n X i=1 E iT 0 iT +O(a 2 n ): Hence lim n!1 nVar ^ TMG = lim n!1 n 1 n X i=1 E iT 0 iT = plim n!1 n 1 n X i=1 ~ i ^ TMG ~ i ^ TMG 0 ; andn 1 P n i=1 ~ i ^ TMG ~ i ^ TMG 0 is a consistent estimator ofnVar ^ TMG 
. B.3 ProofofTheorem5(AsymptoticdistributionofTMG-TEestimator) Proof. Initially, we consider the case whereTk. To derive the asymptotic distribution of ^ TMGTE we rst note that ^ TMGTE () = ^ TMG Q 0 n , and ^ TMGTE = ^ TMG Q 0 n ^ . Hence ^ TMGTE 0 ^ TMGTE () 0 = Q 0 n ^ : (B.33) Also stacking (3.60) overt and subtracting the results from (3.63) yields ^ =M T W ^ TMGTE 0 +M T ; (B.34) where = n 1 P n i=1 i , and i = ( i1 ; i2 ;:::; iT ) 0 with it = u it +x 0 it i . Using this result in (B.33) we have I k Q 0 n M T W ^ TMGTE 0 = ^ TMGTE () 0 Q 0 n M T : 197 For a known value of, the asymptotic distribution of ^ TMGTE () 0 is the same as ^ TMG with y i replaced byy i . Under the assumption thatI k Q 0 n M T W is invertible, we have ^ TMGTE 0 = I k Q 0 n M T W 1 ^ TMGTE () 0 I k Q 0 n M T W 1 Q 0 n M T : Hence using Lemma B.3, =O p n 1=2 , and we have n (1)=2 ^ TMGTE 0 = I k Q 0 n M T W 1 h n (1)=2 ^ TMGTE () 0 i +O p (n =2 ); where for a known we have already established in Theorem 3 that n (1)=2 ^ TMGTE () 0 ! d N (0;V ()); with V () = lim n!1 Var h n (1)=2 ^ TMGTE () i . Suppose further thatplim n!1 Q 0 n M T W =G w ;whereI k G w is non-singular. For > 1=3, we have n (1)=2 ^ TMGTE 0 ! d N (0;V ;TMGTE ), where V ;TMGTE = (I k G w ) 1 V () h (I k G w ) 1 i 0 : (B.35) A consistent estimator of the asymptotic variance of ^ TMGTE is given by \ Var( ^ TMGTE ) = 1 n 1 I k Q 0 n M T W 1 b V I k Q 0 n M T W 1 ; (B.36) where b V = 1 n(1 + n ) 2 n X i=1 ( ~ i Q 0 i ^ ^ TMGTE )( ~ i Q 0 i ^ ^ TMGTE ) 0 ; (B.37) 198 Consider now the asymptotic distribution of ^ . Using (B.34), and noting thatM T W ^ TMGTE 0 =M T X ^ TMGTE 0 , we have ^ 0 =M T X ^ TMGTE 0 +M T : Two cases can arise depending on whether the probability limit ofM T X tends to zero asn!1, or not. Under (a)plim n!1 M T X = 0; we haven 1=2 ^ 0 ! d N(0;M T M T );where is given by (B.22), namely ^ ! p 0 at the regular rate ofn 1=2 . 
Also, noting that it i = (u it u i ) + (x it x i ) 0 i =y it y i (x it x i ) 0 t ; can be consistently estimated by b =n 1 n X i=1 y i X i ^ TMGTE ^ y i X i ^ TMGTE ^ 0 : (B.38) Under case (b),plim n!1 M T X6= 0, and convergence of ^ to 0 cannot achieve the regular rate. To see this note that n (1)=2 ^ 0 =M T X h n (1)=2 ^ TMGTE 0 i +n =2 M T n 1=2 ; whereM T n 1=2 = O p (1) and and since > 0 the second term tends to zero, but rather slowly. In practice, where it is not known whetherM T X! 0 or not, one can consistently estimate the asymptotic variance of ^ by \ Var ^ =M T X \ Var ^ TMGTE X 0 + b M T ; (B.39) 199 where \ Var ^ TMGTE and b are given by (B.36) and (B.38), respectively. Note that \ Var ^ is singu- lar as \ Var ^ T = 0, but its diagonal elements can be used to test if ^ t fort = 1; 2;::;T are individually or jointly statistically signicant subject to 0 T = 0. 200 AppendixC AppendixtoChapter4 C.1 Introduction This online supplement is organized as follows: Section C.2 derives expressions for the analytical bias of the AB and BB estimators under heterogeneity of i whenT = 4. Section C.3 derives the autocovariances of rst dierences assuming the initial values are random drawn from the steady state distribution offy it g. Section C.4 provides additional Monte Carlo evidence. Section C.5 describes the sample (1976–1995) of the Panel Study of Income Dynamics (PSID) data used in the empirical application and provides estimation results for a number of sub-periods in addition to the ones reported in the main paper. C.2 NeglectedheterogeneitybiasinABandBBestimators The AB estimator proposed by Arellano and Bond (1991) is based on the following moment conditions: 1 E(y is u it ) = 0, fori = 1; 2;:::;n;s = 1; 2;:::;t 2, andt = 3; 4;:::;T; (C.1) which can also be written asE[y is (y it i y i;t1 )] = 0; with (T 1)(T 2)=2 moment conditions in total. WhenT = 4, the AB moment conditions, neglecting the heterogeneity, are given byE[y i1 (y i3 1 See equation (8) on p. 
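5 in Chudik and Pesaran (2021).

For $T = 4$, the three AB moment conditions that neglect the heterogeneity are $E[y_{i1}(\Delta y_{i3} - \phi\Delta y_{i2})] = 0$, $E[y_{i1}(\Delta y_{i4} - \phi\Delta y_{i3})] = 0$, and $E[y_{i2}(\Delta y_{i4} - \phi\Delta y_{i3})] = 0$. With a fixed weight matrix $W_{AB}$, the AB estimator can be written as
$$\hat{\phi}_{AB} = \left(\bar{z}_{na}'W_{AB}\bar{z}_{na}\right)^{-1}\bar{z}_{na}'W_{AB}\bar{z}_{nb}, \tag{C.2}$$
where
$$\bar{z}_{na} = n^{-1}\left(\sum_{i=1}^{n}y_{i1}\Delta y_{i2},\ \sum_{i=1}^{n}y_{i1}\Delta y_{i3},\ \sum_{i=1}^{n}y_{i2}\Delta y_{i3}\right)', \quad \bar{z}_{nb} = n^{-1}\left(\sum_{i=1}^{n}y_{i1}\Delta y_{i3},\ \sum_{i=1}^{n}y_{i1}\Delta y_{i4},\ \sum_{i=1}^{n}y_{i2}\Delta y_{i4}\right)'.$$
Under Assumptions 18 and 19,
$$y_{it} = (1 - \phi_i)^{-1}\alpha_i + \sum_{\ell=0}^{\infty}\phi_i^{\ell}u_{i,t-\ell}, \tag{C.3}$$
and assuming that $\alpha_i$ is distributed independently of $\{u_{it}\}$ (as assumed under AB), then using (4.3) and (C.3) we have
$$E\left(y_{i,t-h}\Delta y_{it} \mid \alpha_i, \phi_i, \sigma_i^2\right) = E\left[\left(\alpha_i\sum_{\ell=0}^{\infty}\phi_i^{\ell} + \sum_{\ell=0}^{\infty}\phi_i^{\ell}u_{i,t-h-\ell}\right)\left(u_{it} - (1 - \phi_i)\sum_{\ell=1}^{\infty}\phi_i^{\ell-1}u_{i,t-\ell}\right)\,\Big|\,\alpha_i, \phi_i, \sigma_i^2\right]$$
$$= -E\left[(1 - \phi_i)\sum_{\ell=0}^{\infty}\phi_i^{h-1+2\ell}u_{i,t-h-\ell}^{2}\,\Big|\,\phi_i, \sigma_i^2\right] = -\frac{\sigma_i^2(1 - \phi_i)\phi_i^{h-1}}{1 - \phi_i^2}.$$
Hence
$$E\left(y_{i,t-h}\Delta y_{it}\right) = -E\left(\frac{\sigma_i^2\phi_i^{h-1}}{1 + \phi_i}\right), \quad \text{for } h = 1, 2, \ldots. \tag{C.4}$$
Given (C.4), we have
$$z_a = \operatorname*{plim}_{n\to\infty}\bar{z}_{na} = -\left[E\left(\frac{\sigma_i^2}{1 + \phi_i}\right),\ E\left(\frac{\sigma_i^2\phi_i}{1 + \phi_i}\right),\ E\left(\frac{\sigma_i^2}{1 + \phi_i}\right)\right]',$$
and
$$z_b = \operatorname*{plim}_{n\to\infty}\bar{z}_{nb} = -\left[E\left(\frac{\sigma_i^2\phi_i}{1 + \phi_i}\right),\ E\left(\frac{\sigma_i^2\phi_i^2}{1 + \phi_i}\right),\ E\left(\frac{\sigma_i^2\phi_i}{1 + \phi_i}\right)\right]'.$$
Thus, when $\phi_i = \mu_\phi + v_i$ is distributed independently of $\sigma_i^2$ with $v_i \sim IID\,Uniform(-a, a)$, for $a > 0$,
$$\operatorname*{plim}_{n\to\infty}\left[\hat{\phi}_{AB} - E(\phi_i)\right] = \left(z_a'W_{AB}z_a\right)^{-1}z_a'W_{AB}z_b - \mu_\phi, \tag{C.5}$$
where $z_a = -\sigma^2(c_\phi,\ 1 - c_\phi,\ c_\phi)'$ and $z_b = -\sigma^2(1 - c_\phi,\ \mu_\phi - 1 + c_\phi,\ 1 - c_\phi)'$ with $\sigma^2 = E(\sigma_i^2)$ and
$$c_\phi = E\left(\frac{1}{1 + \phi_i}\right) = \frac{1}{2a}\ln\left(\frac{1 + \mu_\phi + a}{1 + \mu_\phi - a}\right).$$
In addition to (C.1), consider the following moment condition also used in the system GMM estimator proposed by Blundell and Bond (1998), given by$^{2}$
$$E\left[\Delta y_{i,t-1}(\alpha_i + u_{it})\right] = 0, \quad \text{for } i = 1, 2, \ldots, n, \text{ and } t = 3, 4, \ldots, T, \tag{C.6}$$
which can also be written as $E[\Delta y_{i,t-1}(y_{it} - \phi y_{i,t-1})] = 0$. For $T = 4$, with a fixed weight matrix $W_{BB}$, the BB estimator combining the moment conditions in (C.1) and (C.6) is given by
$$\hat{\phi}_{BB} = \left(\bar{z}_{nc}'W_{BB}\bar{z}_{nc}\right)^{-1}\bar{z}_{nc}'W_{BB}\bar{z}_{nd}, \tag{C.7}$$
where
$$\bar{z}_{nc} = n^{-1}\left(\sum_{i=1}^{n}y_{i1}\Delta y_{i2},\ \sum_{i=1}^{n}y_{i1}\Delta y_{i3},\ \sum_{i=1}^{n}y_{i2}\Delta y_{i3},\ \sum_{i=1}^{n}y_{i2}\Delta y_{i2},\ \sum_{i=1}^{n}y_{i3}\Delta y_{i3}\right)',$$
and
$$\bar{z}_{nd} = n^{-1}\left(\sum_{i=1}^{n}y_{i1}\Delta y_{i3},\ \sum_{i=1}^{n}y_{i1}\Delta y_{i4},\ \sum_{i=1}^{n}y_{i2}\Delta y_{i4},\ \sum_{i=1}^{n}y_{i3}\Delta y_{i2},\ \sum_{i=1}^{n}y_{i4}\Delta y_{i3}\right)'.$$

2 See equation (9) on p. 5 in Chudik and Pesaran (2021).

Using (4.3) and (C.3), similarly, we can derive the following equations
$$E\left(y_{it}\Delta y_{i,t-h}\right) = E\left(\frac{\sigma_i^2\phi_i^{h}}{1 + \phi_i}\right), \quad \text{for } h = 0, 1, 2, \ldots, \tag{C.8}$$
and it follows that
$$z_c = \operatorname*{plim}_{n\to\infty}\bar{z}_{nc} = \left[-E\left(\frac{\sigma_i^2}{1 + \phi_i}\right),\ -E\left(\frac{\sigma_i^2\phi_i}{1 + \phi_i}\right),\ -E\left(\frac{\sigma_i^2}{1 + \phi_i}\right),\ E\left(\frac{\sigma_i^2}{1 + \phi_i}\right),\ E\left(\frac{\sigma_i^2}{1 + \phi_i}\right)\right]',$$
$$z_d = \operatorname*{plim}_{n\to\infty}\bar{z}_{nd} = \left[-E\left(\frac{\sigma_i^2\phi_i}{1 + \phi_i}\right),\ -E\left(\frac{\sigma_i^2\phi_i^2}{1 + \phi_i}\right),\ -E\left(\frac{\sigma_i^2\phi_i}{1 + \phi_i}\right),\ E\left(\frac{\sigma_i^2\phi_i}{1 + \phi_i}\right),\ E\left(\frac{\sigma_i^2\phi_i}{1 + \phi_i}\right)\right]'.$$
Thus, when $\phi_i$ and $\sigma_i^2$ are distributed independently and $\phi_i = \mu_\phi + v_i$ where $v_i \sim IID\,Uniform(-a, a)$, for $a > 0$, we have
$$\operatorname*{plim}_{n\to\infty}\left[\hat{\phi}_{BB} - E(\phi_i)\right] = \left(z_c'W_{BB}z_c\right)^{-1}z_c'W_{BB}z_d - \mu_\phi, \tag{C.9}$$
where $z_c = \sigma^2(-c_\phi,\ -1 + c_\phi,\ -c_\phi,\ c_\phi,\ c_\phi)'$ and $z_d = \sigma^2(c_\phi - 1,\ 1 - \mu_\phi - c_\phi,\ -1 + c_\phi,\ 1 - c_\phi,\ 1 - c_\phi)'$ with $\sigma^2 = E(\sigma_i^2)$ and $c_\phi = E\left(\frac{1}{1 + \phi_i}\right) = \frac{1}{2a}\ln\left(\frac{1 + \mu_\phi + a}{1 + \mu_\phi - a}\right)$.

To obtain the asymptotic bias of the AB and BB estimators corresponding to our Monte Carlo experiment, we replace $W_{AB}$ and $W_{BB}$ by the simulated weight matrices$^{3}$ with $\mu_\phi = 0.4$, $a = 0.5$, $T = 4$, and $n = 5{,}000$, with Gaussian errors with GARCH effects. In this case, the biases of the AB and BB estimators are around -0.099 and -0.059, respectively, which are close to the simulated biases of these estimators reported in Table 4.3 in the main paper, namely -0.104 and -0.056, for $T = 4$ and $n = 5{,}000$.

3 The simulated weight matrices are calculated as the average of the weight matrices used in calculating the two-step AB and BB estimators across 2,000 replications.

C.3 Initialization of panel AR(1) processes

Under Assumption 18, $\sup_i|\phi_i| < 1$, and using (4.1) we have $E(y_{it}) = \mu_i = \alpha_i/(1 - \phi_i)$, and (4.1) can be written equivalently in the error correction form:
$$\Delta y_{it} = -(1 - \phi_i)(y_{i,t-1} - \mu_i) + u_{it}, \quad \text{for } t = 1, 2, \ldots, T.$$
Suppose now that $y_{i0}$ are generated from the steady state distribution of $\{y_{it}\}$; then
$$y_{i0} \sim IID\left(\mu_i,\ \frac{\sigma_i^2}{1 - \phi_i^2}\right).$$
All the moment conditions used in estimation of the proposed estimator can be derived from this initial distribution, if it is further assumed that $(y_{i0} - \mu_i)$ is distributed independently of $u_{it}$ for $t = 1, 2, \ldots, T$.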
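Note that
$$\Delta y_{i1} = -(1 - \phi_i)(y_{i0} - \mu_i) + u_{i1},$$
$$\Delta y_{i2} = -\phi_i(1 - \phi_i)(y_{i0} - \mu_i) + u_{i2} - (1 - \phi_i)u_{i1},$$
and more generally
$$\Delta y_{it} = -\phi_i^{t-1}(1 - \phi_i)(y_{i0} - \mu_i) + u_{it} - (1 - \phi_i)\sum_{s=1}^{t-1}\phi_i^{s-1}u_{i,t-s},$$
for $t = 1, 2, \ldots, T$. Then, it follows that for $t = 1, 2, \ldots, T$, $E(\Delta y_{it}) = 0$, and
$$Var\left(\Delta y_{it} \mid \phi_i, \sigma_i^2\right) = \phi_i^{2(t-1)}(1 - \phi_i)^{2}\,Var\left(y_{i0} \mid \phi_i, \sigma_i^2\right) + \sigma_i^2 + (1 - \phi_i)^{2}\left(\sum_{s=1}^{t-1}\phi_i^{2(s-1)}\right)\sigma_i^2$$
$$= \phi_i^{2(t-1)}(1 - \phi_i)^{2}\frac{\sigma_i^2}{1 - \phi_i^2} + \sigma_i^2 + (1 - \phi_i)^{2}\sigma_i^2\left(\frac{1 - \phi_i^{2(t-1)}}{1 - \phi_i^2}\right) = \frac{2\sigma_i^2}{1 + \phi_i}.$$
Thus,
$$E\left[(\Delta y_{it})^{2}\right] = Var(\Delta y_{it}) = 2E\left(\frac{\sigma_i^2}{1 + \phi_i}\right), \tag{C.10}$$
which is the same as (4.5), derived assuming the AR process has started from a distant past. Similarly,
$$\Delta y_{it}\Delta y_{i,t-1} = \left[-\phi_i^{t-1}(1 - \phi_i)(y_{i0} - \mu_i) + u_{it} - (1 - \phi_i)\sum_{s=1}^{t-1}\phi_i^{s-1}u_{i,t-s}\right]\left[-\phi_i^{t-2}(1 - \phi_i)(y_{i0} - \mu_i) + u_{i,t-1} - (1 - \phi_i)\sum_{s=1}^{t-2}\phi_i^{s-1}u_{i,t-1-s}\right],$$
and noting that $E\left[(y_{i0} - \mu_i)^{2} \mid \phi_i, \sigma_i^2\right] = \sigma_i^2/(1 - \phi_i^2)$ we have
$$E\left(\Delta y_{it}\Delta y_{i,t-1} \mid \phi_i, \sigma_i^2\right) = \phi_i^{2t-3}(1 - \phi_i)^{2}\frac{\sigma_i^2}{1 - \phi_i^2} + (1 - \phi_i)^{2}\left(\sum_{s=1}^{t-2}\phi_i^{2s-1}\right)\sigma_i^2 - (1 - \phi_i)\sigma_i^2$$
$$= \frac{\sigma_i^2\phi_i^{2t-3}(1 - \phi_i)^{2}}{1 - \phi_i^2} + \sigma_i^2(1 - \phi_i)^{2}\phi_i\left(\frac{1 - \phi_i^{2(t-2)}}{1 - \phi_i^2}\right) - \frac{\sigma_i^2(1 - \phi_i)(1 - \phi_i^2)}{1 - \phi_i^2} = -\sigma_i^2\left(\frac{1 - \phi_i}{1 + \phi_i}\right).$$
Hence,
$$E\left(\Delta y_{it}\Delta y_{i,t-1}\right) = -E\left[\frac{\sigma_i^2(1 - \phi_i)}{1 + \phi_i}\right]. \tag{C.11}$$
Similarly, we can show that $E(\Delta y_{it}\Delta y_{i,t-h}) = -E\left[\frac{\sigma_i^2(1 - \phi_i)\phi_i^{h-1}}{1 + \phi_i}\right]$, for $h = 1, 2, \ldots$, which is the same as (4.6) in the main paper.

C.4 Monte Carlo evidence

C.4.1 Comparison of FDAC and HetroGMM estimators for moments

Tables C.1 and C.2 report bias, RMSE, and size of the FDAC and HetroGMM estimators of $E(\phi_i)$ and $Var(\phi_i)$ for $n = 100, 200, 500, 1000, 5000$ and $T = 4, 5, 6, 8, 10$, and the empirical power functions are shown in Figures C.1–C.4.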
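As a numerical check on the Section C.3 results, the following minimal sketch simulates a panel AR(1) whose initial values are drawn from the steady state distribution and compares sample moments of the first differences with (C.10) and (C.11). For simplicity it assumes homogeneous $\phi_i = 0.4$, $\sigma_i^2 = 1$, $\mu_i = 0$ and Gaussian errors; these values, the function names, and the seed are illustrative assumptions rather than the design used in the Monte Carlo experiments.

```python
import math
import random

def simulate_first_differences(phi, sigma, mu, n, T, seed=0):
    """Simulate n units of an AR(1) started from its steady state.

    Returns a list of lists holding (dy_1, ..., dy_T) for each unit.
    """
    rng = random.Random(seed)
    sd0 = sigma / math.sqrt(1.0 - phi * phi)  # steady-state std. dev. of y_0
    panels = []
    for _ in range(n):
        y_prev = rng.gauss(mu, sd0)           # y_0 drawn from the steady state
        dy = []
        for _ in range(T):
            y = mu * (1.0 - phi) + phi * y_prev + rng.gauss(0.0, sigma)
            dy.append(y - y_prev)
            y_prev = y
        panels.append(dy)
    return panels

def sample_moment(panels, t, h):
    """Cross-section average of dy_t * dy_{t-h}; t is a 0-based index."""
    return sum(p[t] * p[t - h] for p in panels) / len(panels)

phi, sigma = 0.4, 1.0
panels = simulate_first_differences(phi, sigma, mu=0.0, n=100000, T=3, seed=1)

var_theory = 2.0 * sigma ** 2 / (1.0 + phi)                 # (C.10)
cov_theory = -sigma ** 2 * (1.0 - phi) / (1.0 + phi)        # (C.11)

var_first = sample_moment(panels, 0, 0)   # E[(dy_1)^2]
var_last = sample_moment(panels, 2, 0)    # E[(dy_3)^2]
cov_last = sample_moment(panels, 2, 1)    # E[dy_3 * dy_2]
print(var_theory, var_first, var_last)
print(cov_theory, cov_last)
```

The variance of the first difference does not depend on $t$ because the process starts in its steady state, which is why the first and last periods are both compared with the same theoretical value.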
Table C.1: Bias, RMSE, and size of FDAC and HetroGMM estimators of $E(\phi_i)$ in a heterogeneous panel AR(1) model with Gaussian errors and GARCH effects

             (a) Uniform: E(φ_i) = 0.4 with a = 0.5                      (b) Categorical: E(φ_i) = 0.62
             Bias               RMSE               Size (×100)          Bias               RMSE               Size (×100)
 T      n    FDAC   HetroGMM    FDAC   HetroGMM    FDAC   HetroGMM      FDAC   HetroGMM    FDAC   HetroGMM    FDAC   HetroGMM
 4    100   -0.010   -0.045    0.204    0.390       8.8     6.5        -0.006   -0.025    0.200    0.410       8.7     5.9
 4    200   -0.004   -0.021    0.152    0.240       7.8     6.9        -0.003   -0.019    0.141    0.237       6.3     5.9
 4    500    0.001   -0.007    0.098    0.146       6.3     5.3        -0.001   -0.005    0.092    0.132       6.0     6.1
 4  1,000    0.001   -0.004    0.069    0.101       5.3     5.5        -0.002   -0.001    0.066    0.091       4.5     5.5
 4  5,000    0.001    0.001    0.031    0.045       5.2     5.9         0.000   -0.001    0.030    0.040       4.0     5.3
 5    100   -0.005    0.008    0.157    0.183       7.6     8.3        -0.006    0.007    0.149    0.167       7.6     7.8
 5    200    0.001    0.012    0.116    0.132       6.8     7.5        -0.004    0.003    0.106    0.123       5.6     6.6
 5    500    0.001    0.005    0.074    0.088       5.7     6.5        -0.002    0.002    0.069    0.079       4.9     5.1
 5  1,000    0.001    0.002    0.052    0.062       5.3     5.9        -0.002    0.002    0.049    0.056       4.7     5.2
 5  5,000    0.000    0.001    0.023    0.028       4.9     4.8        -0.001    0.000    0.022    0.025       4.8     4.6
 6    100   -0.005    0.013    0.133    0.135       7.4     9.8         0.000    0.011    0.123    0.127       6.6     8.8
 6    200    0.001    0.013    0.098    0.099       7.3     7.2        -0.001    0.004    0.089    0.093       5.4     7.2
 6    500    0.001    0.006    0.062    0.065       6.0     7.3        -0.001    0.002    0.057    0.061       4.4     5.6
 6  1,000    0.000    0.002    0.044    0.047       5.6     5.3        -0.001    0.002    0.041    0.044       5.1     5.3
 6  5,000    0.000    0.000    0.019    0.021       4.4     5.3         0.000    0.000    0.018    0.020       4.9     4.8
 8    100   -0.003    0.013    0.108    0.101       7.5    11.1         0.001    0.009    0.097    0.092       5.3     9.4
 8    200    0.001    0.011    0.079    0.075       5.9     9.0        -0.002    0.004    0.072    0.069       5.2     7.0
 8    500    0.000    0.004    0.049    0.049       4.7     6.7        -0.001    0.002    0.046    0.046       4.8     5.3
 8  1,000   -0.001    0.002    0.036    0.035       5.0     5.9         0.000    0.001    0.033    0.033       3.5     4.7
 8  5,000    0.000    0.000    0.016    0.016       5.1     5.1         0.000    0.000    0.015    0.015       3.9     4.1
10    100   -0.002    0.013    0.094    0.088       7.4    13.7         0.000    0.008    0.084    0.078       5.9    10.2
10    200    0.001    0.010    0.071    0.065       6.0     9.2        -0.001    0.004    0.063    0.059       5.2     7.3
10    500    0.000    0.004    0.044    0.042       4.8     6.8        -0.001    0.002    0.040    0.039       3.7     5.3
10  1,000   -0.001    0.001    0.031    0.030       4.5     5.8        -0.001    0.001    0.028    0.027       3.5     4.9
10  5,000    0.000    0.001    0.014    0.014       4.4     4.8         0.000    0.000    0.013    0.013       3.6     4.0

Notes: The DGP is given by $y_{it} = \mu_i(1 - \phi_i) + \phi_i y_{i,t-1} + u_{it}$ for $i = 1, 2, \ldots, n$, and $t = -50, -49, \ldots, T$ with $\mu_i = \alpha_i/(1 - \phi_i)$, where errors, $u_{it} = h_{it}\varepsilon_{it}$, are generated to be Gaussian distributed and cross-sectionally heteroskedastic with GARCH effects: $\varepsilon_{it} \sim IIDN(0, 1)$, and $h_{it}^2 = \sigma_i^2(1 - \psi_0 - \psi_1) + \psi_0 h_{i,t-1}^2 + \psi_1 u_{i,t-1}^2$ with $\sigma_i^2 \sim IID(0.5 + 0.5z_i^2)$, $z_i \sim IIDN(0, 1)$, $\psi_0 = 0.6$, and $\psi_1 = 0.2$. The initial values are given by $y_{i,-51} = 0$, $\varepsilon_{i,-51} = 0$, and $h_{i,-51} = 0$. The heterogeneous AR(1) coefficients are generated as case (a): uniform distribution $\phi_i = \mu_\phi + v_i$ with $\mu_\phi = E(\phi_i) = 0.4$ and $v_i \sim IIDU(-0.5, 0.5)$, and case (b): categorical distribution $\Pr(\phi_i = \phi_L) = \pi$ and $\Pr(\phi_i = \phi_H) = 1 - \pi$ with $\pi = 0.3$, $\phi_L = 0.2$, and $\phi_H = 0.8$. For each experiment, $(\alpha_i, \phi_i, \sigma_i)'$ are generated differently across replications. The FDAC estimator is calculated based on (4.23), and its asymptotic variance is estimated by the Delta method. The HetroGMM estimator and its asymptotic variance are calculated by (4.29) and (4.32). The estimation is based on $\{y_{i1}, y_{i2}, \ldots, y_{iT}\}$ for $i = 1, 2, \ldots, n$. The nominal size of the tests is set to 5 per cent. The number of replications is 2,000.
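The data generating process described in the notes to Table C.1 can be sketched as follows for the uniform case (a). This is an illustrative reading of the notes, not the code used for the experiments: the names `psi0` and `psi1` for the GARCH parameters are labels assumed here, the individual means are set to zero as a placeholder because their distribution is not stated in this section, and the seed and sample size are arbitrary. The check at the end uses the fact that the GARCH recursion is scaled so that the unconditional error variance equals $\sigma_i^2$, whose mean is $0.5 + 0.5 = 1$.

```python
import random

def simulate_unit(phi, sigma2, mu, T, rng, psi0=0.6, psi1=0.2, burn=50):
    """Simulate one unit for t = -burn, ..., T and return (y_1..y_T, u_1..u_T).

    Initial values at t = -burn - 1 are y = 0, u = 0 and h = 0, as in the notes.
    """
    y_prev, h2_prev, u_prev = 0.0, 0.0, 0.0
    ys, us = [], []
    for t in range(-burn, T + 1):
        # GARCH(1,1) recursion for the conditional variance h_t^2.
        h2 = sigma2 * (1.0 - psi0 - psi1) + psi0 * h2_prev + psi1 * u_prev ** 2
        u = (h2 ** 0.5) * rng.gauss(0.0, 1.0)
        y = mu * (1.0 - phi) + phi * y_prev + u
        if t >= 1:                      # keep only the estimation sample
            ys.append(y)
            us.append(u)
        y_prev, h2_prev, u_prev = y, h2, u
    return ys, us

rng = random.Random(12345)
n, T = 10000, 4
sum_u2 = 0.0
for _ in range(n):
    phi_i = 0.4 + rng.uniform(-0.5, 0.5)            # case (a): uniform phi_i
    sigma2_i = 0.5 + 0.5 * rng.gauss(0.0, 1.0) ** 2
    ys, us = simulate_unit(phi_i, sigma2_i, 0.0, T, rng)  # mu_i = 0: placeholder
    sum_u2 += us[-1] ** 2

mean_u2 = sum_u2 / n   # should be close to E(sigma_i^2) = 1 after the burn-in
print(mean_u2)
```

With 50 burn-in periods and $\psi_0 + \psi_1 = 0.8$, the effect of the zero initial values on the conditional variance is of order $0.8^{50}$, so the retained observations are effectively drawn from the stationary process.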
Table C.2: Bias, RMSE, and size of FDAC and HetroGMM estimators of $Var(\phi_i)$ in a heterogeneous panel AR(1) model with Gaussian errors and GARCH effects

             (a) Uniform: Var(φ_i) = 0.083 with a = 0.5                  (b) Categorical: Var(φ_i) = 0.076
             Bias               RMSE               Size (×100)          Bias               RMSE               Size (×100)
 T      n    FDAC   HetroGMM    FDAC   HetroGMM    FDAC   HetroGMM      FDAC   HetroGMM    FDAC   HetroGMM    FDAC   HetroGMM
 5    100   -0.021   -0.062    0.192    0.389       7.8     4.9        -0.012   -0.065    0.186    0.390       8.6     4.8
 5    200   -0.015   -0.033    0.139    0.271       7.8     6.4        -0.001   -0.028    0.135    0.254       7.4     4.4
 5    500   -0.008   -0.019    0.086    0.165       5.8     4.9         0.001   -0.013    0.087    0.166       6.3     5.1
 5  1,000   -0.004   -0.010    0.062    0.117       4.7     4.7         0.001   -0.007    0.062    0.117       6.1     5.6
 5  5,000   -0.001   -0.004    0.028    0.055       4.6     6.0         0.000   -0.002    0.027    0.049       4.2     4.2
 6    100   -0.016   -0.018    0.149    0.204       8.5     7.2        -0.008   -0.010    0.143    0.205       8.2     7.7
 6    200   -0.011   -0.010    0.107    0.153       7.0     7.0        -0.002   -0.002    0.103    0.145       7.0     5.9
 6    500   -0.005   -0.006    0.068    0.095       5.8     5.4         0.001    0.000    0.065    0.096       5.2     6.1
 6  1,000   -0.003   -0.003    0.047    0.069       5.1     6.2         0.002    0.001    0.046    0.068       5.5     5.7
 6  5,000   -0.001   -0.002    0.022    0.032       5.0     5.9         0.000    0.000    0.020    0.031       4.5     4.9
 8    100   -0.010   -0.011    0.109    0.118       8.4     8.8        -0.007   -0.010    0.106    0.121       7.8     8.3
 8    200   -0.006   -0.009    0.079    0.089       6.7     7.8        -0.001   -0.004    0.074    0.087       6.3     6.5
 8    500   -0.003   -0.004    0.051    0.059       6.9     7.0         0.000   -0.001    0.049    0.059       5.8     6.8
 8  1,000   -0.001   -0.002    0.036    0.042       6.3     6.1         0.001    0.000    0.034    0.042       5.5     6.0
 8  5,000    0.000   -0.001    0.016    0.019       5.7     5.1         0.000    0.000    0.015    0.019       4.3     4.7
10    100   -0.009   -0.012    0.091    0.091       8.2     9.4        -0.005   -0.012    0.089    0.093       7.8     9.6
10    200   -0.005   -0.009    0.066    0.069       6.8     9.0        -0.001   -0.006    0.062    0.068       6.2     7.3
10    500   -0.002   -0.004    0.042    0.045       6.3     7.3        -0.001   -0.003    0.040    0.046       5.7     6.3
10  1,000   -0.001   -0.003    0.030    0.032       5.0     5.9         0.000   -0.001    0.029    0.033       5.4     5.9
10  5,000    0.000   -0.001    0.013    0.015       5.9     5.2         0.000   -0.001    0.013    0.015       4.2     5.4

Notes: The DGP is given by $y_{it} = \mu_i(1 - \phi_i) + \phi_i y_{i,t-1} + u_{it}$ for $i = 1, 2, \ldots, n$, and $t = -50, -49, \ldots, T$ with $\mu_i = \alpha_i/(1 - \phi_i)$, where errors, $u_{it} = h_{it}\varepsilon_{it}$, are Gaussian with GARCH effects. The heterogeneous AR(1) coefficients are generated as case (a): uniform distribution $\phi_i = \mu_\phi + v_i$ with $\mu_\phi = E(\phi_i) = 0.4$ and $v_i \sim IIDU(-0.5, 0.5)$, and case (b): categorical distribution $\Pr(\phi_i = \phi_L) = \pi$ and $\Pr(\phi_i = \phi_H) = 1 - \pi$ with $\pi = 0.3$, $\phi_L = 0.2$, and $\phi_H = 0.8$. For each experiment, $(\alpha_i, \phi_i, \sigma_i)'$ are generated differently across replications. The FDAC estimator is calculated by plugging (4.23) and (4.24) into (4.38). The HetroGMM estimator is calculated by plugging (4.29) and (4.34) into (4.38). Their asymptotic variances are estimated by the Delta method. The estimation is based on $\{y_{i1}, y_{i2}, \ldots, y_{iT}\}$ for $i = 1, 2, \ldots, n$. The nominal size of the tests is set to 5 per cent. The number of replications is 2,000. See also the notes to Table C.1.

Figure C.1: Empirical power functions for FDAC and HetroGMM estimators of $E(\phi_i) = 0.4$ in a heterogeneous AR(1) panel where $\phi_i$ is uniformly distributed, $\phi_i = \mu_\phi + v_i$, $\mu_\phi = E(\phi_i) = 0.4$, and $v_i \sim IIDU(-0.5, 0.5)$, with Gaussian errors and GARCH effects

Figure C.2: Empirical power functions for FDAC and HetroGMM estimators of $Var(\phi_i) = 0.083$ in a heterogeneous AR(1) panel where $\phi_i$ is uniformly distributed, $\phi_i = \mu_\phi + v_i$, $\mu_\phi = E(\phi_i) = 0.4$, and $v_i \sim IIDU(-0.5, 0.5)$, with Gaussian errors and GARCH effects

Figure C.3: Empirical power functions for FDAC and HetroGMM estimators of $E(\phi_i) = 0.62$ in a heterogeneous panel AR(1) model where $\phi_i$ follows a categorical distribution, with Gaussian errors and GARCH effects

Figure C.4: Empirical power functions for FDAC and HetroGMM estimators of $Var(\phi_i) = 0.076$ in a heterogeneous panel AR(1) model where $\phi_i$ follows a categorical distribution, with Gaussian errors and GARCH effects

C.4.2 Simulation results of FDAC, FDLS, AH, AAH, AB, and BB estimators with non-Gaussian errors

Tables C.3–C.5 report bias, RMSE, and size of the FDAC, FDLS, AH, AAH, AB, and BB estimators with $\phi_i = \mu_\phi + v_i$, $v_i \sim IID\,Uniform(-a, a)$, $\mu_\phi = 0.4$, and $a \in \{0, 0.3, 0.5\}$, under non-Gaussian errors and with GARCH effects.
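The non-Gaussian innovations used in these experiments are described in the notes to Table C.3 as a chi-squared variate with two degrees of freedom, centred and scaled so that it has zero mean and unit variance. A minimal sketch of such a draw is given below; the function name, seed, and number of draws are hypothetical, and the chi-squared variate is built from two independent standard normal draws as one convenient construction.

```python
import random

def standardized_chi2_error(rng):
    """Draw (e - 2) / 2 with e ~ chi-squared(2), built from two N(0,1) draws."""
    e = rng.gauss(0.0, 1.0) ** 2 + rng.gauss(0.0, 1.0) ** 2
    return (e - 2.0) / 2.0

rng = random.Random(2023)
draws = [standardized_chi2_error(rng) for _ in range(200000)]

mean = sum(draws) / len(draws)
var = sum((x - mean) ** 2 for x in draws) / len(draws)
skew = sum((x - mean) ** 3 for x in draws) / len(draws) / var ** 1.5
print(mean, var, skew)
```

The draws are bounded below by -1 and strongly right-skewed, so they depart substantially from the Gaussian benchmark used in Tables C.1 and C.2 while keeping the same first two moments.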
Figure C.5: Empirical power functions for FDAC and FDLS estimators of 0 = 0:4 in a homogeneous AR(1) panel with Gaussian errors and GARCH eects 213 Table C.3: Bias, RMSE, and size of FDAC, FDLS, AH, AAH, AB, and BB estimators of ( 0 = 0:4) in a homogeneous panel AR(1) model with non-Gaussian errors and GARCH eects Bias RMSE Size (100) T n FDAC FDLS AH AAH AB BB FDAC FDLS AH AAH AB BB FDAC FDLS AH AAH AB BB 4 100 -0.023 -0.006 1.447 0.042 -0.085 -0.002 0.281 0.240 62.477 0.305 0.348 0.174 12.7 10.2 10.3 16.1 17.2 24.4 4 1,000 -0.002 0.000 0.068 0.055 -0.016 -0.005 0.111 0.097 1.126 0.211 0.109 0.070 5.8 6.6 7.0 14.4 8.2 10.9 4 5,000 0.002 0.000 0.014 0.029 -0.005 -0.002 0.059 0.056 0.373 0.151 0.054 0.035 5.2 5.8 5.7 10.7 7.1 6.5 6 100 -0.006 -0.002 -0.086 0.008 -0.079 0.001 0.161 0.168 0.232 0.160 0.183 0.118 8.6 10.2 26.8 35.2 32.0 41.0 6 1,000 -0.001 -0.002 -0.012 0.001 -0.015 -0.004 0.064 0.068 0.084 0.063 0.057 0.039 5.3 5.8 10.2 12.8 10.8 12.7 6 5,000 0.000 -0.001 -0.004 -0.001 -0.006 -0.002 0.032 0.035 0.042 0.027 0.029 0.020 4.4 5.1 6.5 8.0 8.0 7.6 10 100 -0.006 -0.004 -0.060 0.000 -0.057 -0.009 0.109 0.121 0.116 0.108 0.107 0.080 8.6 9.8 57.1 70.7 61.9 69.7 10 1,000 -0.001 -0.002 -0.012 -0.005 -0.014 -0.007 0.041 0.047 0.036 0.030 0.032 0.024 5.3 6.1 16.6 24.0 19.6 23.2 10 5,000 -0.001 0.000 -0.005 -0.003 -0.007 -0.004 0.021 0.024 0.019 0.015 0.016 0.012 4.6 4.7 8.8 12.2 11.8 12.2 Notes: The DGP is given byyit =i(1i) +iyi;t1 +uit fori = 1; 2;:::;n, andt =50;49;:::;T withi =i=(1i), where errors,uit =hit"it, are generated to be non-Gaussian distributed and cross-sectionally heteroskedastic with GARCH eects:"it = (eit 2)=2, whereeitIID 2 2 , 2 2 is a chi-squared variate with two degrees of freedom, andh 2 it = 2 i (1 0 1)+ 0h 2 i;t1 + 1u 2 i;t1 with 2 i IID(0:5+0:5z 2 i ),ziIIDN(0; 1), 0 = 0:6, and 1 = 0:2. The initial values are given byyi;51 = 0, "i;51 = 0, andhi;51 = 0. The homogeneous AR(1) coecients are generated asi = fori = 1; 2;:::;n with0 = 0:4. 
For each experiment, (i;i) 0 are generated dierently across replications. The FDAC estimator is calculated by (4.23), and its asymptotic variance is estimated by the Delta method. “FDLS" denotes the rst-dierence least square estimator proposed by Han and Phillips (2010). “AH", “AAH", “AB", and “BB" denote the 2-step GMM estimators proposed by Anderson and Hsiao (1981, 1982), Chudik and Pesaran (2021), Arellano and Bond (1991), and Blundell and Bond (1998). The estimation is based onfyi1;yi2;:::;yiTg fori = 1; 2;:::;n. The nominal size of the tests is set to 5 per cent. The number of replications is 2; 000. 214 Table C.4: Bias, RMSE, and size of FDAC, FDLS, AH, AAH, AB, and BB estimators in a heterogeneous panel AR(1) model where i = +v i and v i IIDU(a;a) with = 0:4,a = 0:3, non-Gaussian errors, and GARCH eects Bias RMSE Size (100) T n FDAC FDLS AH AAH AB BB FDAC FDLS AH AAH AB BB FDAC FDLS AH AAH AB BB 4 100 -0.023 -0.024 -57.795 0.020 -0.127 -0.005 0.284 0.244 2590.301 0.305 0.420 0.190 12.9 11.1 12.0 18.1 19.4 26.7 4 1,000 -0.003 -0.022 -0.035 0.014 -0.042 -0.020 0.112 0.100 0.452 0.189 0.137 0.078 6.3 6.6 12.4 15.7 11.1 12.2 4 5,000 0.002 -0.022 -0.061 -0.015 -0.026 -0.017 0.060 0.060 0.295 0.112 0.065 0.041 5.4 8.8 14.5 14.5 11.2 11.3 6 100 -0.005 -0.022 -0.130 -0.006 -0.109 -0.002 0.168 0.176 0.261 0.159 0.210 0.133 9.2 10.3 33.1 34.2 37.0 44.7 6 1,000 -0.001 -0.024 -0.058 -0.011 -0.038 -0.013 0.066 0.074 0.104 0.055 0.074 0.044 5.1 8.1 20.0 12.4 17.6 14.2 6 5,000 0.000 -0.023 -0.051 -0.011 -0.026 -0.009 0.033 0.043 0.067 0.031 0.041 0.024 4.1 13.5 28.9 9.8 18.1 11.1 10 100 -0.005 -0.024 -0.083 -0.012 -0.075 -0.007 0.116 0.129 0.138 0.114 0.127 0.096 9.3 10.2 62.2 69.7 66.1 72.2 10 1,000 -0.002 -0.024 -0.034 -0.008 -0.027 -0.007 0.044 0.055 0.052 0.034 0.044 0.028 5.7 9.4 30.5 20.8 28.1 21.2 10 5,000 -0.001 -0.022 -0.026 -0.002 -0.017 0.000 0.022 0.034 0.034 0.016 0.025 0.014 4.3 20.2 37.4 9.0 25.4 10.0 Notes: The DGP is given byyit =i(1i) +iyi;t1 +uit fori = 
1; 2;:::;n, andt =50;49;:::;T withi =i=(1i) featuring non-Gaussian errors with GARCH eects. The heterogeneous AR(1) coecients are generated as case (a): i = +vi andvi IIDU(a;a) with = 0:4 anda = 0:3. For each experiment, (i;i;i) 0 are generated dierently across replications. The FDAC estimator is calculated by (4.23), and its asymptotic variance is estimated by the Delta method. “FDLS" denotes the rst- dierence least square estimator proposed by Han and Phillips (2010). “AH", “AAH", “AB", and “BB" denote the 2-step GMM estimators proposed by Anderson and Hsiao (1981, 1982), Chudik and Pesaran (2021), Arellano and Bond (1991), and Blundell and Bond (1998). The estimation is based onfyi1;yi2;:::;yiTg fori = 1; 2;:::;n. The nominal size of the tests is set to 5 per cent. The number of replications is 2; 000. See also the notes to Table C.3. 215 Table C.5: Bias, RMSE, and size of FDAC, FDLS, AH, AAH, AB, and BB estimators in a heterogeneous panel AR(1) model where i = +v i and v i IIDU(a;a) with = 0:4,a = 0:5, non-Gaussian errors, and GARCH eects Bias RMSE Size (100) T n FDAC FDLS AH AAH AB BB FDAC FDLS AH AAH AB BB FDAC FDLS AH AAH AB BB 4 100 -0.022 -0.059 3.092 -0.020 -0.214 0.012 0.289 0.255 123.889 0.299 0.628 0.242 12.2 12.3 18.0 21.1 24.0 35.0 4 1,000 -0.004 -0.062 -0.160 -0.053 -0.117 -0.052 0.115 0.119 0.623 0.159 0.202 0.103 6.8 12.9 24.8 22.5 22.5 19.4 4 5,000 0.002 -0.062 -0.182 -0.073 -0.097 -0.054 0.060 0.084 0.238 0.100 0.125 0.069 5.3 31.9 48.9 40.4 35.4 31.5 6 100 -0.005 -0.059 -0.211 -0.030 -0.179 0.015 0.178 0.193 0.314 0.165 0.273 0.174 9.5 12.7 44.0 35.2 47.8 54.5 6 1,000 -0.001 -0.064 -0.147 -0.034 -0.113 -0.036 0.071 0.097 0.172 0.064 0.138 0.063 5.6 20.8 53.2 17.8 47.4 25.0 6 5,000 0.000 -0.063 -0.141 -0.035 -0.101 -0.032 0.035 0.073 0.148 0.043 0.110 0.042 3.9 55.3 90.3 34.3 79.1 36.9 10 100 -0.005 -0.061 -0.139 -0.038 -0.121 0.019 0.128 0.150 0.190 0.131 0.173 0.136 9.7 14.2 73.8 69.5 75.9 80.7 10 1,000 -0.002 -0.064 -0.097 -0.031 -0.082 
-0.022 0.049 0.084 0.108 0.051 0.093 0.044 5.6 30.8 77.0 33.0 72.0 34.9 10 5,000 -0.001 -0.062 -0.089 -0.025 -0.073 -0.012 0.025 0.068 0.093 0.033 0.077 0.023 4.9 77.2 98.0 36.6 94.2 23.1 Notes: The DGP is given byyit =i(1i) +iyi;t1 +uit fori = 1; 2;:::;n, andt =50;49;:::;T withi =i=(1i) featuring non-Gaussian errors with GARCH eects. The heterogeneous AR(1) coecients are generated as case (a): i = +vi andvi IIDU(a;a) with = 0:4 anda = 0:5. For each experiment, (i;i;i) 0 are generated dierently across replications. The FDAC estimator is calculated by (4.23), and its asymptotic variance is estimated by the Delta method. “FDLS" denotes the rst- dierence least square estimator proposed by Han and Phillips (2010). “AH", “AAH", “AB", and “BB" denote the 2-step GMM estimators proposed by Anderson and Hsiao (1981, 1982), Chudik and Pesaran (2021), Arellano and Bond (1991), and Blundell and Bond (1998). The estimation is based onfyi1;yi2;:::;yiTg fori = 1; 2;:::;n. The nominal size of the tests is set to 5 per cent. The number of replications is 2; 000. See also the notes to Table C.3. 216 C.4.3 SimulationresultsofFDACandMSWestimators Table C.6 reports bias, RMSE, and size of the FDAC and MSW estimators in homogeneous panels with i = for alli. Table C.7 reports bias, RMSE, and size of the FDAC and MSW estimators in the experiments wherey i0 are not drawn from the steady state distribution offy it g. 
Table C.6: Bias, RMSE, and size of FDAC and MSW estimators of $\phi$ in homogeneous panels with Gaussian errors and GARCH effects

              Bias               RMSE               Size (×100)
T    n        FDAC     MSW       FDAC     MSW       FDAC     MSW
$\phi_i = \phi = 0.475$ for all $i$
6    100      0.004    -0.045    0.098    0.087     7.9      9.1
6    1,000    0.000    -0.037    0.030    0.045     4.1      35.8
10   100      0.003    -0.028    0.064    0.086     6.5      7.1
10   1,000    0.000    -0.024    0.020    0.036     4.2      17.0
$\phi_i = \phi = 0.620$ for all $i$
6    100      0.003    0.094     0.095    0.143     7.9      12.0
6    1,000    0.000    0.096     0.029    0.102     4.5      78.1
10   100      0.003    0.125     0.062    0.169     5.5      18.7
10   1,000    0.000    0.123     0.019    0.129     4.3      92.2

Notes: The DGP is given by $y_{it} = \mu_i(1 - \phi_i) + \phi_i y_{i,t-1} + u_{it}$ for $i = 1, 2, \ldots, n$, and $t = -50, -49, \ldots, T$ with $\mu_i = \alpha_i/(1 - \phi_i)$, featuring Gaussian errors with GARCH effects. The homogeneous AR(1) coefficients are generated as $\phi_i = \phi$ for all $i$ with $\phi_0 \in \{0.475, 0.62\}$. The FDAC estimator is calculated based on (4.23), and its asymptotic variance is estimated by the Delta method. “MSW” denotes the kernel-weighting likelihood estimator proposed by Mavroeidis et al. (2015) and calculated based on an assumption that $(\alpha_i, \phi_i) \mid y_{i1}$ follows a multivariate normal distribution $N(\mu, V)$ with initial values given by $\mu = (5, 0.5)$, $\sigma_\alpha = 2$, $\sigma_\phi = 0.4$, $corr(\alpha_i, \phi_i) = 0.5$ with $\sigma_u = 0.5$. The number of replications is 1,000. See also the notes to Table C.1.
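The three summary statistics reported in these tables can be illustrated with a short sketch. It assumes the conventional Monte Carlo definitions (bias as the mean estimation error, RMSE as the root mean squared error, and size as the rejection frequency of a two-sided t-test of the true value at the 5 per cent level, scaled by 100). The function name `monte_carlo_summary`, the normal critical value 1.96, and the toy inputs are illustrative assumptions and are not taken from the thesis.

```python
import math


def monte_carlo_summary(estimates, std_errors, true_value, critical_value=1.96):
    """Return (bias, rmse, size_x100) across Monte Carlo replications.

    estimates and std_errors hold one entry per replication. The critical
    value 1.96 is an assumption matching a 5 per cent two-sided normal test.
    """
    errors = [est - true_value for est in estimates]
    reps = len(errors)
    bias = sum(errors) / reps
    rmse = math.sqrt(sum(e * e for e in errors) / reps)
    # A replication rejects when the t-ratio for the true value is too large.
    rejections = sum(
        1 for e, se in zip(errors, std_errors) if abs(e / se) > critical_value
    )
    size_x100 = 100.0 * rejections / reps
    return bias, rmse, size_x100


# Toy inputs made up for illustration; these are not thesis results.
bias, rmse, size = monte_carlo_summary(
    estimates=[0.45, 0.35, 0.40, 0.70],
    std_errors=[0.05, 0.05, 0.05, 0.10],
    true_value=0.40,
)
print(bias, rmse, size)
```

With a correctly sized test the last number should be close to 5 once the number of replications is large, which is the benchmark against which the "Size (×100)" columns are read.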
Table C.7: Bias, RMSE, and size of FDAC and MSW estimators in a heterogeneous panel AR(1) model under non-stationary ($M = 1, 2$) and stationary ($M = \infty$) initialization with $\phi_i = \mu_\phi + v_i$, $\mu_\phi = 0.4$, and $v_i \sim IIDU(-0.3, 0.3)$

              FDAC                          MSW
T    n        M=1      M=2      M=∞        M=1      M=2      M=∞
Bias
4    100      0.072    0.026    -0.004     -0.054   -0.042   -0.002
4    1,000    0.078    0.035    0.002      -0.045   -0.036   0.002
6    100      0.045    0.018    0.000      -0.062   -0.040   0.008
6    1,000    0.045    0.018    0.001      -0.053   -0.029   0.021
10   100      0.022    0.007    -0.002     -0.067   -0.036   0.019
10   1,000    0.023    0.008    -0.001     -0.063   -0.032   0.022
RMSE
4    100      0.203    0.194    0.193      0.083    0.082    0.085
4    1,000    0.099    0.070    0.062      0.051    0.042    0.030
6    100      0.130    0.124    0.123      0.086    0.080    0.086
6    1,000    0.060    0.044    0.040      0.057    0.037    0.035
10   100      0.086    0.084    0.084      0.093    0.083    0.092
10   1,000    0.036    0.029    0.028      0.066    0.040    0.036
Size (×100)
4    100      9.9      9.2      7.7        21.3     14.6     8.0
4    1,000    26.9     7.0      6.2        73.2     44.5     10.8
6    100      11.8     8.3      7.1        21.4     10.7     6.5
6    1,000    24.4     8.9      5.2        80.0     29.3     14.5
10   100      7.2      5.5      5.5        20.4     9.2      6.2
10   1,000    14.5     6.6      5.8        84.8     30.7     13.6

Notes: The DGP is given by $y_{it} = \mu_i(1 - \phi_i) + \phi_i y_{i,t-1} + u_{it}$ for $i = 1, 2, \ldots, n$, and $t = -50, -49, \ldots, T$ with $\mu_i = \alpha_i/(1 - \phi_i)$, featuring Gaussian errors with GARCH effects. The heterogeneous AR(1) coefficients are generated as case (a): $\phi_i = \mu_\phi + v_i$ and $v_i \sim IIDU(-a, a)$ with $\mu_\phi = 0.4$ and $a = 0.3$. The initial values are generated as $y_{i1} \sim IIDN\left(\mu_i(1 - \phi_i^{M}),\ \sigma_i^2(1 - \phi_i^{2M})/(1 - \phi_i^2)\right)$ with $M = 1, 2$ for the non-stationary case and $M = 51$ for the stationary case (denoted by “$\infty$”). The FDAC estimator is calculated based on (4.23), and its asymptotic variance is estimated by the Delta method. “MSW” denotes the kernel-weighting likelihood estimator proposed by Mavroeidis et al. (2015). The number of replications is 1,000. See also the notes to Table C.6.
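To see why $M = 51$ is treated as the stationary case, the following sketch evaluates the mean and variance of the initial value given in the notes above for several values of $M$ and compares them with their limits as $M$ grows, namely $\mu_i$ and $\sigma_i^2/(1 - \phi_i^2)$. The function names and the parameter values ($\mu_i = 1$, $\phi_i = 0.7$, $\sigma_i^2 = 1$) are hypothetical choices for illustration only.

```python
def initial_value_moments(mu_i, phi_i, sigma2_i, M):
    """Mean and variance of the initial value for a given M (notes to Table C.7)."""
    mean = mu_i * (1.0 - phi_i ** M)
    var = sigma2_i * (1.0 - phi_i ** (2 * M)) / (1.0 - phi_i ** 2)
    return mean, var


def stationary_moments(mu_i, phi_i, sigma2_i):
    """Limits of the moments above as M grows without bound (requires |phi_i| < 1)."""
    return mu_i, sigma2_i / (1.0 - phi_i ** 2)


# Hypothetical unit-specific parameters, chosen only for illustration.
mu_i, phi_i, sigma2_i = 1.0, 0.7, 1.0
for M in (1, 2, 51):
    print(M, initial_value_moments(mu_i, phi_i, sigma2_i, M))
print("limit", stationary_moments(mu_i, phi_i, sigma2_i))
```

For small $M$ the mean is well below $\mu_i$ and the variance is below its limit, while for $M = 51$ the gap is negligible for any $\phi_i$ in the range used in the experiments.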
C.4.4 Robustness of FDAC estimator to different error processes

Tables C.8–C.15 report bias, RMSE, and size of the FDAC estimators of $E(\phi_i)$ and $Var(\phi_i)$, and Tables C.16–C.18 report bias and RMSE of the FDAC estimator of the parameters of the categorical distribution, namely $\phi_L$, $\phi_H$, and $\pi$.

Table C.8: Bias, RMSE, and size of FDAC estimator of $\mu_\phi = E(\phi_i)$ in a heterogeneous panel AR(1) model with Gaussian errors

       Bias                                    RMSE                                    Size (×100)
T/n    100     200     500     1,000   5,000   100     200     500     1,000   5,000   100     200     500     1,000   5,000
$\phi_i = \mu_\phi + v_i$ and $v_i \sim IIDU(-a, a)$ with $\mu_\phi = 0.4$ and $a = 0.3$
4      -0.004  0.001   0.001   0.001   0.000   0.170   0.119   0.074   0.054   0.024   7.7     5.7     4.2     4.8     4.0
5      0.002   0.001   0.000   0.000   0.000   0.120   0.088   0.054   0.039   0.018   6.0     5.2     4.4     4.9     4.6
6      0.000   0.000   0.000   0.000   0.000   0.102   0.074   0.046   0.033   0.015   6.6     6.2     4.8     5.2     5.4
8      0.001   0.002   0.001   0.000   0.000   0.081   0.058   0.036   0.026   0.012   6.3     5.7     5.1     5.0     5.1
10     0.001   0.001   0.001   0.001   0.000   0.068   0.050   0.031   0.022   0.010   5.0     5.2     4.0     4.7     5.3
$\phi_i = \mu_\phi + v_i$ and $v_i \sim IIDU(-a, a)$ with $\mu_\phi = 0.4$ and $a = 0.5$
4      -0.004  0.000   0.001   0.001   0.000   0.178   0.126   0.080   0.057   0.026   7.0     6.7     5.9     4.9     4.8
5      -0.006  -0.002  -0.001  0.000   0.000   0.133   0.097   0.061   0.043   0.019   6.0     6.7     5.2     5.4     4.9
6      -0.003  -0.001  -0.001  0.000   0.000   0.115   0.080   0.050   0.035   0.016   7.0     6.0     4.4     5.1     5.1
8      -0.004  0.000   -0.001  0.000   0.000   0.093   0.065   0.040   0.029   0.013   7.1     5.6     5.1     5.0     4.4
10     -0.002  0.000   -0.001  -0.001  0.000   0.080   0.056   0.035   0.025   0.011   6.7     5.4     4.3     4.6     4.4
$\Pr(\phi_i = 0.2) = 0.3$ and $\Pr(\phi_i = 0.8) = 0.7$ with $\mu_\phi = 0.62$
4      -0.004  -0.002  0.001   0.001   0.001   0.171   0.123   0.077   0.053   0.024   8.4     7.1     5.8     4.6     4.7
5      0.001   0.001   0.001   0.001   0.000   0.128   0.090   0.056   0.040   0.019   7.0     5.4     4.4     5.2     4.7
6      0.002   0.002   0.001   0.001   0.001   0.107   0.075   0.047   0.033   0.015   7.3     5.2     4.0     4.2     4.2
8      0.002   0.001   0.000   0.000   0.000   0.084   0.060   0.037   0.026   0.012   5.6     4.7     4.0     3.1     3.6
10     0.000   0.000   0.001   0.000   0.000   0.072   0.051   0.032   0.022   0.010   5.0     4.0     3.5     2.6     3.5

Notes: The DGP is given by $y_{it} = \mu_i(1 - \phi_i) + \phi_i y_{i,t-1} + u_{it}$ for $i = 1, 2, \ldots, n$, and $t = -50, -49, \ldots, T$ with $\mu_i = \alpha_i/(1 - \phi_i)$, featuring Gaussian errors. For each experiment, $(\alpha_i, \phi_i, \sigma_i)'$ are generated differently across replications. The FDAC estimator is calculated by (4.23), and its asymptotic variance is estimated by the Delta method. The estimation is based on $\{y_{i1}, y_{i2}, \ldots, y_{iT}\}$ for $i = 1, 2, \ldots, n$. The nominal size of the tests is set to 5 per cent. The number of replications is 2,000.

Table C.9: Bias, RMSE, and size of FDAC estimator of $Var(\phi_i)$ in a heterogeneous panel AR(1) model with Gaussian errors

       Bias                                    RMSE                                    Size (×100)
T/n    100     200     500     1,000   5,000   100     200     500     1,000   5,000   100     200     500     1,000   5,000
$Var(\phi_i) = 0.03$ where $\phi_i = \mu_\phi + v_i$ and $v_i \sim IIDU(-a, a)$ with $\mu_\phi = 0.4$ and $a = 0.3$
5      -0.009  -0.005  -0.005  -0.002  -0.001  0.160   0.112   0.073   0.051   0.023   7.0     6.2     6.3     5.3     4.9
6      -0.007  -0.004  -0.003  -0.001  0.000   0.122   0.084   0.053   0.038   0.017   7.5     5.8     5.2     5.3     5.6
8      -0.006  -0.003  -0.002  -0.001  0.000   0.088   0.061   0.039   0.028   0.013   6.8     6.0     5.3     5.6     5.1
10     -0.005  -0.003  -0.002  -0.001  0.000   0.073   0.050   0.031   0.023   0.010   6.8     5.7     4.6     5.1     4.6
$Var(\phi_i) = 0.083$ where $\phi_i = \mu_\phi + v_i$ and $v_i \sim IIDU(-a, a)$ with $\mu_\phi = 0.4$ and $a = 0.5$
5      -0.019  -0.009  -0.004  -0.002  -0.001  0.165   0.116   0.074   0.053   0.023   8.3     7.0     5.3     5.1     4.1
6      -0.013  -0.006  -0.002  -0.001  -0.001  0.125   0.090   0.056   0.041   0.018   6.7     6.7     5.7     5.9     5.3
8      -0.006  -0.002  0.000   0.000   0.000   0.091   0.065   0.041   0.029   0.013   6.5     5.3     5.2     5.2     4.9
10     -0.005  -0.002  0.000   0.000   0.000   0.075   0.054   0.034   0.025   0.011   6.7     5.8     5.7     5.9     5.3
$Var(\phi_i) = 0.076$ where $\Pr(\phi_i = 0.2) = 0.3$ and $\Pr(\phi_i = 0.8) = 0.7$
5      -0.013  -0.006  -0.004  -0.003  -0.001  0.154   0.112   0.072   0.049   0.023   6.9     6.3     5.4     3.8     4.7
6      -0.004  -0.003  -0.001  0.000   0.000   0.118   0.085   0.054   0.037   0.017   6.7     6.0     5.1     4.5     5.3
8      -0.004  -0.002  -0.001  0.000   0.000   0.087   0.061   0.039   0.027   0.013   6.3     5.6     5.5     5.1     5.2
10     -0.003  -0.002  0.000   0.000   0.000   0.073   0.050   0.032   0.022   0.010   7.4     5.4     4.6     4.3     4.6

Notes: The FDAC estimator is computed by plugging (4.23) and (4.24) into (4.38). See the notes to Table C.8.
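The true values quoted in the panel headings of Tables C.8 and C.9 follow from standard moment formulas, and the small check below reproduces them. It uses the variance of a uniform variate on $(-a, a)$, which is $a^2/3$, and the mean and variance of a two-point distribution. The function names are illustrative and not part of the thesis.

```python
def uniform_var(a):
    """Variance of v ~ U(-a, a): (2a)^2 / 12 = a^2 / 3."""
    return a * a / 3.0


def two_point_moments(phi_low, phi_high, prob_low):
    """Mean and variance when Pr(phi = phi_low) = prob_low, else phi = phi_high."""
    mean = prob_low * phi_low + (1.0 - prob_low) * phi_high
    var = prob_low * (1.0 - prob_low) * (phi_high - phi_low) ** 2
    return mean, var


print(round(uniform_var(0.3), 3))            # uniform case with a = 0.3
print(round(uniform_var(0.5), 3))            # uniform case with a = 0.5
cat_mean, cat_var = two_point_moments(0.2, 0.8, 0.3)
print(round(cat_mean, 2), round(cat_var, 3))  # categorical case
```

Rounded to three decimals, the variances are 0.03, 0.083, and 0.076, and the categorical mean is 0.62, in agreement with the panel headings.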
Table C.10: Bias, RMSE, and size of FDAC estimator of $E(\phi_i)$ in a heterogeneous panel AR(1) model with Gaussian errors and GARCH effects

       Bias                                    RMSE                                    Size (×100)
T/n    100     200     500     1,000   5,000   100     200     500     1,000   5,000   100     200     500     1,000   5,000
$\phi_i = \mu_\phi + v_i$ and $v_i \sim IIDU(-a, a)$ with $\mu_\phi = 0.4$ and $a = 0.3$
4      -0.008  -0.008  -0.004  -0.003  -0.001  0.192   0.141   0.091   0.066   0.030   7.6     7.6     6.0     5.9     6.2
5      -0.002  -0.003  -0.001  -0.002  0.000   0.145   0.106   0.067   0.049   0.023   7.0     6.8     5.3     6.0     6.4
6      -0.001  -0.003  -0.001  -0.002  0.000   0.123   0.090   0.057   0.040   0.019   7.1     6.0     4.8     4.6     6.2
8      -0.002  -0.002  -0.001  -0.001  0.001   0.100   0.072   0.046   0.033   0.015   7.6     6.6     5.8     5.4     5.3
10     -0.001  -0.002  0.000   -0.001  0.001   0.085   0.062   0.040   0.028   0.013   6.8     6.2     5.6     5.1     5.3

Notes: The DGP is given by $y_{it} = \mu_i(1 - \phi_i) + \phi_i y_{i,t-1} + u_{it}$ for $i = 1, 2, \ldots, n$, and $t = -50, -49, \ldots, T$ with $\mu_i = \alpha_i/(1 - \phi_i)$, featuring Gaussian errors with GARCH effects. For each experiment, $(\alpha_i, \phi_i, \sigma_i)'$ are generated differently across replications. The FDAC estimator is calculated by (4.23), and its asymptotic variance is estimated by the Delta method. The estimation is based on $\{y_{i1}, y_{i2}, \ldots, y_{iT}\}$ for $i = 1, 2, \ldots, n$. The nominal size of the tests is set to 5 per cent. The number of replications is 2,000.

Table C.11: Bias, RMSE, and size of FDAC estimator of $Var(\phi_i)$ in a heterogeneous panel AR(1) model with Gaussian errors and GARCH effects

       Bias                                    RMSE                                    Size (×100)
T/n    100     200     500     1,000   5,000   100     200     500     1,000   5,000   100     200     500     1,000   5,000
$Var(\phi_i) = 0.03$ where $\phi_i = \mu_\phi + v_i$ and $v_i \sim IIDU(-a, a)$ with $\mu_\phi = 0.4$ and $a = 0.3$
5      -0.018  -0.009  -0.004  0.001   0.000   0.183   0.131   0.088   0.062   0.028   8.6     5.8     6.1     5.9     5.6
6      -0.010  -0.006  -0.003  -0.001  -0.001  0.139   0.099   0.065   0.046   0.021   6.9     6.2     5.9     5.5     5.3
8      -0.010  -0.004  -0.002  -0.001  0.000   0.103   0.073   0.047   0.033   0.015   8.1     5.8     5.8     5.2     4.9
10     -0.007  -0.002  -0.002  -0.001  0.000   0.086   0.061   0.039   0.028   0.013   7.8     6.3     6.2     4.8     5.3

Notes: The FDAC estimator is computed by plugging (4.23) and (4.24) into (4.38). See also the notes to Table C.10.
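A minimal sketch of how a panel could be simulated from the DGP in the notes to Table C.10 is given below. Several ingredients are assumptions rather than facts from this section: the GARCH(1,1) recursion, its coefficients 0.6 and 0.2, the zero initial values, and the draw of the unit-specific variances are borrowed from the notes to Table C.3 and assumed to carry over to the Gaussian case; the names `psi0` and `psi1` are placeholders for the two GARCH coefficients; and the distribution of $\mu_i$ is not stated here, so a normal draw is used purely as a placeholder.

```python
import math
import random


def simulate_panel(n, T, mu_phi=0.4, a=0.3, psi0=0.6, psi1=0.2, burn_in=50, seed=0):
    """Simulate y_it = mu_i(1 - phi_i) + phi_i y_{i,t-1} + u_it, u_it = h_it * eps_it.

    Returns a list of n series, each holding y_i1, ..., y_iT.
    """
    rng = random.Random(seed)
    panel = []
    for _ in range(n):
        phi = mu_phi + rng.uniform(-a, a)               # phi_i = mu_phi + v_i
        mu = rng.gauss(1.0, 1.0)                        # hypothetical draw of mu_i
        sigma2 = 0.5 + 0.5 * rng.gauss(0.0, 1.0) ** 2   # unit-specific variance
        y_prev = u_prev = h2_prev = 0.0                 # zero start before burn-in
        series = []
        for t in range(-burn_in, T + 1):
            # GARCH(1,1) recursion for the conditional variance (assumed form).
            h2 = sigma2 * (1.0 - psi0 - psi1) + psi0 * h2_prev + psi1 * u_prev ** 2
            u = math.sqrt(h2) * rng.gauss(0.0, 1.0)     # Gaussian innovation
            y = mu * (1.0 - phi) + phi * y_prev + u
            if t >= 1:                                  # keep only t = 1, ..., T
                series.append(y)
            y_prev, u_prev, h2_prev = y, u, h2
        panel.append(series)
    return panel


panel = simulate_panel(n=5, T=4, seed=1)
print(len(panel), len(panel[0]))
```

Discarding the periods up to $t = 0$ mirrors the statement in the notes that estimation uses only $y_{i1}, \ldots, y_{iT}$, while the process itself starts at $t = -50$.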
220 Table C.12: Bias, RMSE, and size of FDAC estimator ofE( i ) in a heterogeneous panel AR(1) model with non-Gaussian errors Bias RMSE Size (100) T=n 100 200 500 1,000 5,000 100 200 500 1,000 5,000 100 200 500 1,000 5,000 i = +v i andv i IIDU(a;a) with = 0:4 anda = 0:3 4 -0.007 -0.005 -0.003 0.000 0.000 0.216 0.158 0.106 0.076 0.033 8.5 6.3 7.3 6.8 5.1 5 -0.003 -0.002 0.000 0.001 0.000 0.155 0.110 0.072 0.053 0.023 8.8 7.2 6.0 5.9 5.8 6 0.002 0.001 0.000 0.001 0.000 0.122 0.088 0.057 0.040 0.018 7.4 6.0 6.2 5.0 4.9 8 -0.001 0.000 -0.001 0.000 0.000 0.093 0.067 0.044 0.031 0.014 6.8 6.0 6.6 5.7 5.4 10 0.002 0.001 0.000 0.001 0.000 0.079 0.056 0.037 0.026 0.012 7.4 6.5 6.3 6.0 5.1 i = +v i andv i IIDU(a;a) with = 0:4 anda = 0:5 4 0.007 0.006 0.002 0.003 0.000 0.226 0.163 0.106 0.075 0.033 9.7 7.7 7.3 6.8 4.6 5 0.000 0.003 -0.001 -0.001 0.000 0.165 0.120 0.076 0.054 0.024 9.0 7.8 5.6 6.1 5.7 6 0.002 0.004 0.001 0.001 0.000 0.132 0.097 0.062 0.044 0.019 7.2 6.7 6.0 5.4 5.1 8 0.003 0.003 0.000 0.000 0.000 0.106 0.076 0.047 0.034 0.015 8.6 6.0 5.9 5.3 5.3 10 0.000 0.000 0.000 0.000 0.000 0.088 0.064 0.041 0.029 0.013 7.2 5.9 4.8 5.2 4.8 Pr( i = 0:2) = 0:3 andPr( i = 0:8) = 0:7 with = 0:62 4 0.001 -0.004 -0.001 -0.001 -0.001 0.195 0.141 0.090 0.065 0.028 9.0 7.1 5.9 5.5 4.9 5 -0.001 -0.001 0.000 0.000 0.000 0.136 0.099 0.064 0.046 0.021 6.8 5.8 5.6 4.6 4.7 6 0.000 -0.001 0.001 0.000 0.000 0.116 0.083 0.052 0.038 0.017 6.7 5.6 5.3 5.1 5.0 8 0.000 0.000 0.001 0.001 0.000 0.093 0.065 0.041 0.030 0.013 6.8 5.9 4.5 4.1 3.8 10 0.001 0.000 0.001 0.000 0.000 0.078 0.056 0.035 0.025 0.012 5.6 5.1 3.7 3.8 3.8 Notes: The DGP is given by yit = i(1 i) + iyi;t1 + uit for i = 1; 2;:::;n, and t = 50;49;:::;T with i = i=(1i) featuring non-Gaussian errors. For each experiment, (i;i;i) 0 are generated dierently across replications. The FDAC estimator is calculated by (4.23), and its asymptotic variance is estimated by the Delta method. 
The estimation is based on fyi1;yi2;:::;yiTg fori = 1; 2;:::;n. The nominal size of the tests is set to 5 per cent. The number of replications is 2; 000. Table C.13: Bias, RMSE, and size of FDAC estimator ofVar( i ) in a heterogeneous panel AR(1) model with non-Gaussian errors Bias RMSE Size (100) T=n 100 200 500 1,000 5,000 100 200 500 1,000 5,000 100 200 500 1,000 5,000 Var( i ) = 0:03 where i = +v i andv i IIDU(a;a) with = 0:4 anda = 0:3 5 -0.018 -0.010 -0.004 -0.002 -0.001 0.166 0.121 0.077 0.055 0.025 7.9 7.2 6.1 5.1 5.3 6 -0.014 -0.007 -0.003 -0.001 -0.001 0.124 0.091 0.057 0.040 0.018 7.2 7.3 5.3 5.0 4.7 8 -0.009 -0.004 -0.002 -0.001 0.000 0.090 0.065 0.042 0.029 0.013 7.3 6.9 6.2 5.3 5.9 10 -0.005 -0.002 -0.001 0.000 0.000 0.073 0.052 0.034 0.024 0.011 7.6 6.3 6.5 5.7 5.3 Var( i ) = 0:083 where i = +v i andv i IIDU(a;a) with = 0:4 anda = 0:5 5 -0.019 -0.008 -0.004 -0.003 -0.001 0.173 0.125 0.080 0.055 0.025 6.9 6.9 6.6 5.9 5.8 6 -0.011 -0.006 -0.002 -0.001 -0.001 0.130 0.094 0.060 0.041 0.019 8.1 6.9 5.7 4.8 5.3 8 -0.009 -0.005 -0.002 -0.001 0.000 0.093 0.066 0.043 0.030 0.014 7.3 6.6 6.3 5.0 5.5 10 -0.004 -0.004 -0.002 -0.001 0.000 0.076 0.055 0.036 0.025 0.012 6.6 6.3 6.7 5.3 5.4 Var( i ) = 0:076 wherePr( i = 0:2) = 0:3 andPr( i = 0:8) = 0:7 5 -0.015 -0.010 -0.002 -0.001 -0.001 0.162 0.114 0.072 0.051 0.023 8.5 6.3 5.8 4.7 5.2 6 -0.016 -0.007 -0.003 -0.002 -0.001 0.124 0.087 0.055 0.038 0.018 9.6 7.0 5.9 5.0 4.9 8 -0.005 -0.003 -0.002 -0.002 -0.001 0.089 0.063 0.040 0.028 0.013 7.4 6.6 5.9 5.4 5.1 10 -0.003 -0.002 -0.002 -0.002 -0.001 0.073 0.052 0.033 0.023 0.010 7.2 6.3 5.3 5.3 4.8 Notes: The FDAC estimator is computed by plugging (4.23)–(4.24) into (4.38). See the notes to Table C.12. 
221 Table C.14: Bias, RMSE, and size of FDAC estimator ofE( i ) in a heterogeneous panel AR(1) model with non-Gaussian errors and GARCH eects Bias RMSE Size (100) T=n 100 200 500 1,000 5,000 100 200 500 1,000 5,000 100 200 500 1,000 5,000 i = +v i andv i IIDU(a;a) with = 0:4 anda = 0:3 4 -0.009 -0.010 -0.011 -0.007 -0.001 0.279 0.214 0.148 0.112 0.057 12.3 10.0 7.5 6.8 5.6 5 -0.009 -0.007 -0.008 -0.004 -0.001 0.208 0.157 0.110 0.084 0.042 11.8 8.7 7.7 6.4 4.6 6 -0.009 -0.006 -0.005 -0.003 -0.001 0.172 0.136 0.092 0.067 0.033 10.8 8.5 7.1 5.4 4.8 8 -0.006 -0.006 -0.005 -0.003 -0.001 0.133 0.105 0.072 0.052 0.026 9.0 8.1 7.6 5.5 4.4 10 -0.005 -0.003 -0.003 -0.002 -0.001 0.116 0.089 0.059 0.043 0.022 9.1 7.7 7.2 5.2 4.2 i = +v i andv i IIDU(a;a) with = 0:4 anda = 0:5 4 -0.023 -0.008 -0.001 0.001 0.000 0.293 0.228 0.156 0.116 0.060 11.2 10.0 7.8 6.0 6.2 5 -0.012 -0.005 -0.001 -0.001 0.001 0.218 0.176 0.122 0.092 0.046 10.2 9.3 8.2 6.8 5.0 6 -0.011 -0.004 -0.002 0.000 0.001 0.188 0.149 0.102 0.077 0.037 10.4 9.3 7.8 7.6 4.3 8 -0.011 -0.005 -0.002 -0.001 0.000 0.148 0.118 0.080 0.060 0.029 9.4 8.2 6.9 6.7 4.7 10 -0.010 -0.004 -0.001 0.000 0.000 0.130 0.103 0.072 0.054 0.026 10.0 8.2 7.1 6.2 4.4 Pr( i = 0:2) = 0:3 andPr( i = 0:8) = 0:7 with = 0:62 4 -0.016 -0.012 -0.004 -0.006 0.002 0.279 0.216 0.158 0.120 0.063 12.7 9.8 8.2 6.9 5.8 5 -0.008 -0.004 -0.003 -0.004 0.001 0.204 0.158 0.113 0.086 0.042 11.2 9.2 7.1 6.0 4.9 6 -0.006 -0.002 -0.002 -0.002 0.000 0.174 0.136 0.096 0.073 0.034 10.5 8.1 6.5 6.4 5.0 8 -0.006 -0.001 0.000 -0.001 0.000 0.136 0.103 0.075 0.057 0.027 8.6 6.9 6.3 5.9 4.0 10 -0.002 0.000 0.000 0.000 0.000 0.126 0.096 0.067 0.052 0.024 9.1 6.3 5.8 5.1 4.2 Notes: The DGP is given by yit = i(1 i) + iyi;t1 + uit for i = 1; 2;:::;n, and t = 50;49;:::;T with i = i=(1i) featuring non-Gaussian errors with GARCH eects. For each experiment, (i;i;i) 0 are generated dierently across replications. 
The FDAC estimator is calculated by (4.23), and its asymptotic variance is estimated by the Delta method. The estimation is based onfyi1;yi2;:::;yiTg fori = 1; 2;:::;n. The nominal size of the tests is set to 5 per cent. The number of replications is 2; 000. Table C.15: Bias, RMSE, and size of FDAC estimator ofVar( i ) in a heterogeneous panel AR(1) model with non-Gaussian errors and GARCH eects Bias RMSE Size (100) T=n 100 200 500 1,000 5,000 100 200 500 1,000 5,000 100 200 500 1,000 5,000 Var( i ) = 0:03 where i = +v i andv i IIDU(a;a) with = 0:4 anda = 0:3 5 -0.022 -0.014 -0.010 -0.006 -0.004 0.237 0.181 0.124 0.094 0.046 12.2 8.7 7.4 6.5 5.0 6 -0.012 -0.010 -0.003 -0.002 -0.002 0.174 0.134 0.092 0.070 0.035 10.9 7.4 6.7 6.4 6.0 8 -0.007 -0.005 -0.003 -0.002 -0.001 0.130 0.101 0.069 0.052 0.026 10.1 8.8 7.1 5.9 5.1 10 -0.008 -0.005 -0.003 -0.002 -0.001 0.108 0.085 0.057 0.043 0.021 9.9 8.8 6.0 6.4 5.7 Var( i ) = 0:083 where i = +v i andv i IIDU(a;a) with = 0:4 anda = 0:5 5 -0.029 -0.024 -0.016 -0.010 0.000 0.232 0.186 0.131 0.097 0.046 11.1 9.4 7.4 5.6 4.3 6 -0.021 -0.015 -0.010 -0.007 0.000 0.180 0.142 0.099 0.075 0.036 10.0 8.8 6.4 6.3 5.0 8 -0.011 -0.008 -0.005 -0.004 -0.001 0.136 0.105 0.071 0.055 0.026 10.2 8.2 6.6 6.4 5.8 10 -0.007 -0.006 -0.004 -0.003 -0.001 0.113 0.088 0.060 0.045 0.021 10.3 8.7 7.1 5.8 5.1 Var( i ) = 0:076 wherePr( i = 0:2) = 0:3 andPr( i = 0:8) = 0:7 5 -0.025 -0.014 -0.010 -0.004 -0.001 0.229 0.175 0.125 0.098 0.051 11.2 8.9 7.4 5.8 5.1 6 -0.010 -0.008 -0.006 0.000 -0.001 0.170 0.132 0.096 0.074 0.037 11.2 8.7 6.9 5.4 5.4 8 -0.007 -0.004 -0.004 0.000 0.000 0.128 0.099 0.071 0.055 0.027 8.6 8.3 6.2 5.9 5.8 10 -0.007 -0.004 -0.002 -0.001 0.000 0.110 0.084 0.060 0.045 0.022 9.7 7.0 6.3 5.9 4.6 Notes: The FDAC estimator is computed by plugging (4.23)–(4.24) into (4.38). See the notes to Table C.14. 
222 Table C.16: Bias and RMSE of FDAC estimator of categorical distribution parameters ( L ; H ;) 0 with Gaussian errors i = 0:4+v i ,v i IIDU(0:3;0:3) i = 0:4+v i ,v i IIDU(0:5;0:5) Categorical distributed i L = 0:11 H = 0:69 = 0:5 L = 0:11 H = 0:69 = 0:5 L = 0:2 H = 0:8 = 0:3 T n Bias RMSE Bias RMSE Bias RMSE Bias RMSE Bias RMSE Bias RMSE Bias RMSE Bias RMSE Bias RMSE 6 2,000 -0.250 4.478 0.181 1.233 0.012 0.291 -0.216 3.167 0.222 1.093 0.003 0.299 -0.081 0.518 0.203 1.599 0.100 0.312 6 5,000 -0.037 0.167 0.039 0.182 -0.001 0.215 -0.043 0.267 0.061 0.234 0.010 0.237 -0.026 0.218 0.043 0.149 0.055 0.220 6 10,000 -0.020 0.116 0.018 0.117 -0.001 0.167 -0.020 0.124 0.024 0.131 0.003 0.177 -0.012 0.155 0.020 0.086 0.033 0.160 6 50,000 -0.003 0.050 0.004 0.047 0.002 0.081 -0.003 0.053 0.004 0.050 0.001 0.085 -0.002 0.070 0.004 0.029 0.008 0.070 8 2,000 -0.032 0.181 0.053 0.217 0.013 0.224 -0.040 0.235 0.062 0.243 0.010 0.235 -0.027 0.252 0.055 0.202 0.064 0.236 8 5,000 -0.013 0.103 0.017 0.105 0.004 0.156 -0.014 0.112 0.020 0.112 0.005 0.166 -0.007 0.143 0.018 0.073 0.032 0.153 8 10,000 -0.007 0.072 0.006 0.068 0.001 0.114 -0.009 0.080 0.008 0.075 0.001 0.124 -0.005 0.102 0.007 0.045 0.015 0.105 8 50,000 -0.001 0.032 0.002 0.029 0.001 0.053 -0.001 0.035 0.001 0.031 0.001 0.057 -0.001 0.046 0.002 0.018 0.004 0.045 10 2,000 -0.017 0.130 0.031 0.140 0.011 0.188 -0.021 0.149 0.039 0.168 0.011 0.203 -0.014 0.186 0.031 0.112 0.049 0.196 10 5,000 -0.006 0.079 0.011 0.077 0.005 0.124 -0.008 0.090 0.014 0.088 0.006 0.140 -0.005 0.116 0.011 0.052 0.022 0.120 10 10,000 -0.004 0.056 0.005 0.052 0.002 0.091 -0.005 0.064 0.006 0.057 0.002 0.101 -0.004 0.083 0.005 0.034 0.010 0.083 10 50,000 0.000 0.026 0.001 0.023 0.001 0.042 -0.001 0.028 0.001 0.024 0.000 0.046 -0.001 0.037 0.001 0.014 0.002 0.036 Notes: The DGP is given byyit =i(1i) +iyi;t1 +uit fori = 1; 2;:::;n, andt =50;49;:::;T withi =i=(1i) featuring Gaussian errors. 
The heterogeneous AR(1) coecients are generated as case (a): uniform distributioni = +vi andviIIDU(a;a) with = 0:4 anda2f0:3; 0:5g, and case (b): categorical distribution Pr(i =L) = and Pr(i =H ) = 1 with = 0:3,L = 0:2, andH = 0:8 such thatE(i) = 0:62. The FDAC estimator is calculated by (4.18) and (4.22). The number of replications is 2,000. 223 Table C.17: Bias and RMSE of FDAC estimator of categorical distribution parameters ( L ; H ;) 0 with non-Gaussian errors i = 0:4+v i ,v i IIDU(0:3;0:3) i = 0:4+v i ,v i IIDU(0:5;0:5) Categorical distributed i L = 0:11 H = 0:69 = 0:5 L = 0:11 H = 0:69 = 0:5 L = 0:2 H = 0:8 = 0:3 T n Bias RMSE Bias RMSE Bias RMSE Bias RMSE Bias RMSE Bias RMSE Bias RMSE Bias RMSE Bias RMSE 6 2,000 -0.044 2.382 0.130 0.571 0.017 0.289 -0.581 112.257 1.301 46.586 0.012 0.440 -0.119 0.699 0.329 8.457 0.085 0.305 6 5,000 -0.030 0.164 0.044 0.180 0.008 0.214 7.445 322.661 0.497 17.235 0.013 0.396 -0.033 0.229 0.044 0.155 0.054 0.225 6 10,000 -0.017 0.114 0.020 0.112 0.003 0.164 -0.228 2.265 0.207 1.701 0.005 0.370 -0.015 0.163 0.023 0.090 0.035 0.168 6 50,000 -0.003 0.050 0.005 0.047 0.002 0.082 -0.035 0.144 0.046 0.155 0.008 0.259 -0.003 0.071 0.003 0.030 0.007 0.071 8 2,000 -0.032 0.175 0.060 0.234 0.012 0.225 -0.385 8.894 0.065 9.112 0.009 0.401 -0.030 0.226 0.052 0.208 0.056 0.230 8 5,000 -0.011 0.103 0.020 0.105 0.008 0.155 -0.129 0.462 0.326 5.081 0.016 0.350 -0.011 0.140 0.016 0.075 0.027 0.148 8 10,000 -0.006 0.073 0.009 0.069 0.003 0.115 -0.067 0.235 0.123 1.090 0.008 0.307 -0.006 0.102 0.008 0.046 0.014 0.106 8 50,000 -0.001 0.032 0.002 0.029 0.001 0.052 -0.014 0.084 0.019 0.088 0.006 0.191 -0.001 0.046 0.001 0.018 0.003 0.045 10 2,000 -0.018 0.132 0.039 0.165 0.012 0.190 -0.916 30.862 0.420 16.282 0.013 0.374 -0.017 0.182 0.034 0.143 0.043 0.193 10 5,000 -0.006 0.081 0.014 0.078 0.007 0.127 -0.077 0.260 0.127 0.652 0.014 0.321 -0.009 0.116 0.010 0.055 0.017 0.120 10 10,000 -0.004 0.058 0.006 0.053 0.002 0.093 -0.042 0.161 0.078 0.663 
0.010 0.275 -0.004 0.083 0.005 0.034 0.010 0.083 10 50,000 0.000 0.026 0.002 0.023 0.002 0.042 -0.008 0.066 0.014 0.068 0.008 0.162 -0.001 0.038 0.001 0.014 0.002 0.037 Notes: The DGP is given byyit =i(1i)+iyi;t1 +uit fori = 1; 2;:::;n, andt =50;49;:::;T withi =i=(1i) featuring non-Gaussian errors. The heterogeneous AR(1) coecients are generated as case (a): uniform distributioni = +vi andviIIDU(a;a) with = 0:4 anda2f0:3; 0:5g, and case (b): categorical distribution Pr(i =L) = and Pr(i =H ) = 1 with = 0:3,L = 0:2, andH = 0:8 such thatE(i) = 0:62. The FDAC estimator is calculated by (4.18) and (4.22). The number of replications is 2,000. 224 Table C.18: Bias and RMSE of FDAC estimator of categorical distribution parameters ( L ; H ;) 0 with non-Gaussian errors with GARCH eects i = 0:4+v i ,v i IIDU(0:3;0:3) i = 0:4+v i ,v i IIDU(0:5;0:5) Categorical distributed i L = 0:11 H = 0:69 = 0:5 L = 0:11 H = 0:69 = 0:5 L = 0:2 H = 0:8 = 0:3 T n Bias RMSE Bias RMSE Bias RMSE Bias RMSE Bias RMSE Bias RMSE Bias RMSE Bias RMSE Bias RMSE 6 2,000 -0.097 2.825 -0.497 28.154 0.019 0.347 -0.321 7.038 -0.707 17.935 0.023 0.463 -0.834 10.246 0.774 8.182 0.109 0.375 6 5,000 -0.016 3.287 0.182 2.468 0.013 0.306 0.064 7.597 1.958 111.930 0.004 0.611 -0.209 1.508 0.295 3.222 0.093 0.321 6 10,000 -0.043 0.346 0.069 0.424 0.010 0.244 -0.369 4.600 0.987 27.901 0.012 0.404 -0.141 2.251 0.112 0.494 0.077 0.271 6 50,000 -0.015 0.249 0.025 0.207 0.006 0.148 -0.140 1.264 0.103 3.232 0.010 0.342 -0.009 0.144 0.024 0.156 0.027 0.149 8 2,000 -0.157 2.712 0.362 11.808 0.013 0.301 -0.355 7.342 0.575 26.846 -0.004 0.450 -0.527 5.822 0.572 8.689 0.085 0.321 8 5,000 -0.036 0.375 -0.040 2.775 0.013 0.234 -0.086 7.742 -0.034 8.012 0.014 0.398 -0.129 3.435 0.114 0.982 0.064 0.263 8 10,000 -0.028 0.344 0.039 0.334 0.009 0.188 -0.099 2.452 0.400 12.054 0.010 0.374 -0.035 0.339 0.066 1.116 0.042 0.196 8 50,000 -0.005 0.072 0.009 0.080 0.003 0.106 -0.060 0.358 0.268 7.459 0.007 0.286 0.001 0.095 0.009 0.057 
0.017 0.103 10 2,000 -0.052 0.393 0.245 14.341 0.022 0.266 -0.423 12.188 4.518 105.464 0.017 0.428 -0.122 1.218 0.247 4.237 0.088 0.296 10 5,000 -0.011 0.295 0.039 0.492 0.015 0.203 -0.160 2.766 0.221 11.469 0.019 0.382 -0.050 0.705 0.053 0.280 0.049 0.217 10 10,000 -0.013 0.127 0.025 0.253 0.009 0.159 -0.084 2.053 0.161 3.386 0.010 0.346 -0.058 1.372 0.033 0.281 0.035 0.169 10 50,000 -0.003 0.055 0.006 0.054 0.003 0.088 -0.036 0.186 0.056 0.301 0.015 0.257 0.003 0.079 0.007 0.045 0.014 0.087 Notes: The DGP is given byyit =i(1i) +iyi;t1 +uit fori = 1; 2;:::;n, andt =50;49;:::;T withi =i=(1i) featuring non-Gaussian errors with GARCH eects. The heterogeneous AR(1) coecients are generated as case (a): uniform distributioni = +vi andviIIDU(a;a) with = 0:4 anda2f0:3; 0:5g, and case (b): categorical distribution Pr(i =L) = and Pr(i =H ) = 1 with = 0:3,L = 0:2, andH = 0:8 such thatE(i) = 0:62. The FDAC estimator is calculated by (4.18) and (4.22). The number of replications is 2,000. 225 C.5 Empiricalapplicationresultsforothersub-periodsofthePSID Table C.19 shows the distribution of cross-sectional observation numbers by year based on the sample selection criterion in Meghir and Pistaferri (2004). For dierent sub-periods, Tables C.20 and C.21 report the estimates of mean persistence of log real earnings in a panel AR(1) model with a common linear trend, and Tables C.22–C.24 report the estimates of the variance of heterogeneous persistence. Table C.19: Distribution of individual observation numbers by year Year Number of observations 1976 1,600 1977 1,663 1978 1,706 1979 1,773 1980 1,800 1981 1,868 1982 1,884 1983 1,933 1984 1,972 1985 2,012 1986 2,053 1987 2,083 1988 2,091 1989 2,008 1990 1,907 1991 1,831 1992 1,711 1993 1,576 1994 1,471 1995 1,384 Total 36,325 Notes: The sample selection criteria of Meghir and Pistaferri (2004) are summarized as the following. (i) Individuals are from the “core" sample, i.e., the 1968 SRC cross-section sample and the 1968 Census sample. 
(ii) Individuals are continuously heads of their families.
(iii) Over the respective observed period, the range of individuals' ages is 25 to 55.
(iv) Individuals are males.
(v) Individuals have nine years or more observations of usable (non-zero and not top-coded) money income of labor, $earnings_{it}$.
(vi) Individuals have no missing records of education or race over their sample periods.
(vii) Observations with only self-employed status are dropped.
(viii) Observations of the outcome variable $y_{it} = \log(earnings_{it}/p_t)$ with outlying deviations $\Delta y_{it} > 5$ or $\Delta y_{it} < -1$ are dropped.

Table C.20: Estimates of mean persistence ($\mu_\phi = E(\phi_i)$) of log real earnings in a panel AR(1) model with a common linear trend using PSID data over the sub-periods 1976–1980, 1981–1985 and 1986–1990

1976–1980, T = 5
                      All         Category by education
                      categories  HSD       HSG       CLG
Homogeneous slopes
AAH                   0.527       0.545     0.489     0.560
                      (0.051)     (0.079)   (0.084)   (0.070)
AB                    0.326       0.346     0.076     0.623
                      (0.109)     (0.151)   (0.148)   (0.207)
BB                    0.905       0.916     0.898     0.916
                      (0.012)     (0.015)   (0.015)   (0.028)
Heterogeneous slopes
FDAC                  0.589       0.567     0.595     0.607
                      (0.037)     (0.062)   (0.056)   (0.079)
MSW                   0.419       0.388     0.434     0.452
                      (0.060)     (0.058)   (0.045)   (0.030)
Common linear trend   0.023       0.029     0.021     0.021
n                     1,312       363       641       308

1981–1985, T = 5
                      All         Category by education
                      categories  HSD       HSG       CLG
Homogeneous slopes
AAH                   0.481       0.426     0.465     0.598
                      (0.038)     (0.083)   (0.046)   (0.072)
AB                    0.219       0.286     0.178     -0.066
                      (0.071)     (0.092)   (0.092)   (0.214)
BB                    0.957       0.939     0.962     1.041
                      (0.005)     (0.009)   (0.006)   (0.014)
Heterogeneous slopes
FDAC                  0.602       0.428     0.596     0.844
                      (0.039)     (0.076)   (0.053)   (0.056)
MSW                   0.420       0.378     0.439     0.452
                      (0.058)     (0.055)   (0.031)   (0.031)
Common linear trend   0.025       0.036     0.019     0.032
n                     1,489       283       855       351

1986–1990, T = 5
                      All         Category by education
                      categories  HSD       HSG       CLG
Homogeneous slopes
AAH                   0.499       0.725     0.426     0.491
                      (0.035)     (0.093)   (0.041)   (0.065)
AB                    0.281       0.239     0.305     0.131
                      (0.089)     (0.303)   (0.100)   (0.171)
BB                    0.939       0.897     0.929     0.978
                      (0.011)     (0.027)   (0.012)   (0.014)
Heterogeneous slopes
FDAC                  0.675       0.760     0.604     0.805
                      (0.032)     (0.083)   (0.042)   (0.056)
MSW                   0.429       0.427     0.427     0.450
                      (0.056)     (0.048)   (0.056)   (0.046)
Common linear trend   0.018       0.009     0.021     0.014
n                     1,654       201       994       459

Notes: The estimates are based on the heterogeneous panel AR(1) model with a common linear trend, $y_{it} = \alpha_i + g(1-\phi_i)t + \phi_i y_{i,t-1} + u_{it}$, where $y_{it} = \log(earnings_{it}/p_t)$, using PSID data over the sub-periods 1976–1980, 1981–1985, and 1986–1990. "HSD" refers to high school dropouts with less than 12 years of education, "HSG" refers to high school graduates with at least 12 but less than 16 years of education, and "CLG" refers to college graduates with at least 16 years of education. The common trend, $g$, is estimated by $\hat{g}_{FD} = n^{-1}(T-1)^{-1}\sum_{i=1}^{n}\sum_{t=2}^{T}\Delta y_{it}$. The estimation of $\mu_\phi$ is then based on $\tilde{y}_{it} = y_{it} - \hat{g}_{FD}\,t$ for $t = 1, 2, \ldots, T$. "AAH", "AB", and "BB" denote different 2-step GMM estimators proposed by Chudik and Pesaran (2021), Arellano and Bond (1991), and Blundell and Bond (1998). The FDAC estimator is calculated by (4.23), and its asymptotic variance is estimated by the Delta method. "MSW" denotes the kernel-weighted estimator in Mavroeidis et al. (2015) and is calculated based on a parametric assumption that $(\alpha_i, \phi_i) \mid y_{i1}$ follows a multivariate normal distribution $N(\mu, V)$ with initial values given by $\mu = (5, 0.5)$, $\sigma_\alpha = 2$, $\sigma_\phi = 0.4$, $corr(\alpha_i, \phi_i) = 0.5$, with $\sigma_u = 0.5$.
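The detrending step described in the notes to Table C.20 can be sketched in a few lines. The following is a minimal illustration, not the author's code: it assumes a balanced panel stored as an n × T NumPy array, and the function names `estimate_common_trend` and `detrend` are hypothetical.

```python
import numpy as np

def estimate_common_trend(y):
    """Average first difference over all units and periods t = 2, ..., T.

    y is assumed to be a balanced (n, T) array of log real earnings.
    """
    y = np.asarray(y, dtype=float)
    n, T = y.shape
    dy = np.diff(y, axis=1)  # first differences, shape (n, T - 1)
    return dy.sum() / (n * (T - 1))

def detrend(y, g_hat):
    """Subtract g_hat * t from y_it, with t running from 1 to T."""
    y = np.asarray(y, dtype=float)
    t = np.arange(1, y.shape[1] + 1)
    return y - g_hat * t

# Illustrative data only: unit-specific levels plus an exact linear trend.
levels = np.array([[1.0], [2.0], [3.0]])
y_demo = levels + 0.02 * np.arange(1, 6)
g_demo = estimate_common_trend(y_demo)
y_tilde = detrend(y_demo, g_demo)
```

With unbalanced data, as in the PSID sample, the double sum would instead run over the first differences actually observed for each individual.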
Table C.21: Estimates of mean persistence ($\mu_\phi = E(\phi_i)$) of log real earnings in a panel AR(1) model with a common linear trend using PSID data over the sub-periods 1976–1985 and 1981–1990

1976–1985, T = 10
                      All         Category by education
                      categories  HSD       HSG       CLG
Homogeneous slopes
AAH                   0.615       0.532     0.587     0.632
                      (0.044)     (0.040)   (0.045)   (0.027)
AB                    0.471       0.402     0.391     0.348
                      (0.048)     (0.054)   (0.061)   (0.051)
BB                    0.960       0.922     0.962     1.001
                      (0.002)     (0.004)   (0.002)   (0.002)
Heterogeneous slopes
FDAC                  0.643       0.554     0.637     0.766
                      (0.028)     (0.052)   (0.041)   (0.054)
MSW                   0.443       0.397     0.443     0.474
                      (0.060)     (0.047)   (0.067)   (0.062)
Common linear trend   0.024       0.026     0.021     0.029
n                     885         201       458       226

1981–1990, T = 10
                      All         Category by education
                      categories  HSD       HSG       CLG
Homogeneous slopes
AAH                   0.579       0.545     0.529     0.654
                      (0.030)     (0.038)   (0.027)   (0.043)
AB                    0.265       0.261     0.273     0.388
                      (0.041)     (0.053)   (0.038)   (0.059)
BB                    0.958       0.956     0.961     0.978
                      (0.002)     (0.002)   (0.002)   (0.002)
Heterogeneous slopes
FDAC                  0.628       0.614     0.600     0.734
                      (0.025)     (0.057)   (0.033)   (0.042)
MSW                   0.458       0.453     0.446     0.541
                      (0.030)     (0.041)   (0.025)   (0.064)
Common linear trend   0.023       0.031     0.019     0.025
n                     1,046       170       620       256

Notes: The estimates are based on the heterogeneous panel AR(1) model with a common linear trend, $y_{it} = \alpha_i + g(1-\phi_i)t + \phi_i y_{i,t-1} + u_{it}$, where $y_{it} = \log(earnings_{it}/p_t)$, using PSID data over the sub-periods 1976–1985 and 1981–1990. "HSD" refers to high school dropouts with less than 12 years of education, "HSG" refers to high school graduates with at least 12 but less than 16 years of education, and "CLG" refers to college graduates with at least 16 years of education. The common trend, $g$, is estimated by $\hat{g}_{FD} = n^{-1}(T-1)^{-1}\sum_{i=1}^{n}\sum_{t=2}^{T}\Delta y_{it}$. The estimation of $\mu_\phi$ is then based on $\tilde{y}_{it} = y_{it} - \hat{g}_{FD}\,t$ for $t = 1, 2, \ldots, T$. "AAH", "AB", and "BB" denote different 2-step GMM estimators proposed by Chudik and Pesaran (2021), Arellano and Bond (1991), and Blundell and Bond (1998). The FDAC estimator is calculated by (4.23), and its asymptotic variance is estimated by the Delta method. "MSW" denotes the kernel-weighted estimator in Mavroeidis et al. (2015) and is calculated based on a parametric assumption that $(\alpha_i, \phi_i) \mid y_{i1}$ follows a multivariate normal distribution $N(\mu, V)$ with initial values given by $\mu = (5, 0.5)$, $\sigma_\alpha = 2$, $\sigma_\phi = 0.4$, $corr(\alpha_i, \phi_i) = 0.5$, with $\sigma_u = 0.5$.

Table C.22: Estimates of variance of heterogeneous persistence ($Var(\phi_i)$) of log real earnings in a panel AR(1) model with a common linear trend using PSID data over the sub-periods 1991–1995 and 1986–1995

1991–1995, T = 5
                      All         Category by education
                      categories  HSD       HSG       CLG
FDAC                  0.100       0.204     0.081     0.091
                      (0.042)     (0.100)   (0.054)   (0.090)
MSW                   0.012       0.011     0.011     0.010
                      (0.003)     (0.009)   (0.004)   (0.007)
n                     1,366       127       832       407

1986–1995, T = 10
                      All         Category by education
                      categories  HSD       HSG       CLG
FDAC                  0.129       0.122     0.120     0.141
                      (0.023)     (0.060)   (0.031)   (0.036)
MSW                   0.015       0.010     0.011     0.014
                      (0.005)     (0.011)   (0.005)   (0.011)
n                     1,139       109       689       341

Notes: The estimates are based on the heterogeneous panel AR(1) model with a common linear trend, $y_{it} = \alpha_i + g(1-\phi_i)t + \phi_i y_{i,t-1} + u_{it}$, where $y_{it} = \log(earnings_{it}/p_t)$, using PSID data over the sub-periods 1991–1995 and 1986–1995. "HSD" refers to high school dropouts with less than 12 years of education, "HSG" refers to high school graduates with at least 12 but less than 16 years of education, and "CLG" refers to college graduates with at least 16 years of education. The common trend, $g$, is estimated by $\hat{g}_{FD} = n^{-1}(T-1)^{-1}\sum_{i=1}^{n}\sum_{t=2}^{T}\Delta y_{it}$. The estimation of $Var(\phi_i)$ is then based on $\tilde{y}_{it} = y_{it} - \hat{g}_{FD}\,t$ for $t = 1, 2, \ldots, T$. The FDAC estimator of $Var(\phi_i)$ is calculated by (4.38), and its asymptotic variance is estimated by the Delta method. "MSW" denotes the kernel-weighted maximum likelihood estimator in Mavroeidis et al. (2015).
Table C.23: Estimates of variance of heterogeneous persistence ($Var(\phi_i)$) of log real earnings in a panel AR(1) model with a common linear trend using PSID data over the sub-periods 1976–1985 and 1981–1990

1976–1985, T = 10
                      All         Category by education
                      categories  HSD       HSG       CLG
FDAC                  0.095       0.139     0.100     0.001
                      (0.028)     (0.049)   (0.043)   (0.046)
MSW                   0.016       0.013     0.013     0.013
                      (0.007)     (0.010)   (0.010)   (0.013)
n                     885         201       458       226

1981–1990, T = 10
                      All         Category by education
                      categories  HSD       HSG       CLG
FDAC                  0.150       0.104     0.171     0.113
                      (0.022)     (0.058)   (0.026)   (0.046)
MSW                   0.003       0.008     0.003     0.012
                      (0.011)     (0.011)   (0.010)   (0.014)
n                     1,046       170       620       256

Notes: The estimates are based on the heterogeneous panel AR(1) model with a common linear trend, $y_{it} = \alpha_i + g(1-\phi_i)t + \phi_i y_{i,t-1} + u_{it}$, where $y_{it} = \log(earnings_{it}/p_t)$, using PSID data over the sub-periods 1976–1985 and 1981–1990. "HSD" refers to high school dropouts with less than 12 years of education, "HSG" refers to high school graduates with at least 12 but less than 16 years of education, and "CLG" refers to college graduates with at least 16 years of education. The common trend, $g$, is estimated by $\hat{g}_{FD} = n^{-1}(T-1)^{-1}\sum_{i=1}^{n}\sum_{t=2}^{T}\Delta y_{it}$. The estimation of $Var(\phi_i)$ is then based on $\tilde{y}_{it} = y_{it} - \hat{g}_{FD}\,t$ for $t = 1, 2, \ldots, T$. The FDAC estimator is calculated by (4.38), and its asymptotic variance is estimated by the Delta method. "MSW" denotes the estimator proposed by Mavroeidis et al. (2015). See also the notes to Table C.21.
Table C.24: Estimates of variance of heterogeneous persistence ($Var(\phi_i)$) of log real earnings in a panel AR(1) model with a common linear trend using PSID data over the sub-periods 1976–1980, 1981–1985, and 1986–1990

1976–1980, T = 5
                      All         Category by education
                      categories  HSD       HSG       CLG
FDAC                  0.038       0.072     0.025     0.013
                      (0.056)     (0.078)   (0.099)   (0.098)
MSW                   0.015       0.014     0.013     0.009
                      (0.004)     (0.008)   (0.005)   (0.006)
n                     1,312       363       641       308

1981–1985, T = 5
                      All         Category by education
                      categories  HSD       HSG       CLG
FDAC                  0.089       0.040     0.093     0.032
                      (0.037)     (0.068)   (0.052)   (0.072)
MSW                   0.015       0.014     0.010     0.009
                      (0.004)     (0.008)   (0.002)   (0.005)
n                     1,489       283       855       351

1986–1990, T = 5
                      All         Category by education
                      categories  HSD       HSG       CLG
FDAC                  0.126       0.095     0.111     0.151
                      (0.040)     (0.105)   (0.056)   (0.054)
MSW                   0.015       0.011     0.013     0.011
                      (0.004)     (0.008)   (0.004)   (0.007)
n                     1,654       201       994       459

Notes: The estimates are based on the heterogeneous panel AR(1) model with a common linear trend, $y_{it} = \alpha_i + g(1-\phi_i)t + \phi_i y_{i,t-1} + u_{it}$, where $y_{it} = \log(earnings_{it}/p_t)$, using PSID data over the sub-periods 1976–1980, 1981–1985, and 1986–1990. "HSD" refers to high school dropouts with less than 12 years of education, "HSG" refers to high school graduates with at least 12 but less than 16 years of education, and "CLG" refers to college graduates with at least 16 years of education. The common trend, $g$, is estimated by $\hat{g}_{FD} = n^{-1}(T-1)^{-1}\sum_{i=1}^{n}\sum_{t=2}^{T}\Delta y_{it}$. The estimation of $Var(\phi_i)$ is then based on $\tilde{y}_{it} = y_{it} - \hat{g}_{FD}\,t$ for $t = 1, 2, \ldots, T$. The FDAC estimator is calculated by (4.38), and its asymptotic variance is estimated by the Delta method. "MSW" denotes the estimator proposed by Mavroeidis et al. (2015). See also the notes to Table C.20.
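The notes to Tables C.22–C.24 say only that the asymptotic variance of the variance estimator is obtained by the Delta method. As a hedged illustration, and not a restatement of (4.38), suppose the estimator is built from estimates of the first two moments, $m_1 = E(\phi_i)$ and $m_2 = E(\phi_i^2)$, and that $\sqrt{n}(\hat{m} - m)$ is asymptotically normal with covariance matrix $\Omega = (\omega_{jk})$. The symbols $m_1$, $m_2$, $h$, and $\Omega$ are introduced here for illustration only. The first-order expansion then reads:

```latex
\begin{align*}
h(m_1, m_2) &= m_2 - m_1^2 = Var(\phi_i), \\
\nabla h(m) &= \begin{pmatrix} -2 m_1 \\ 1 \end{pmatrix}, \\
\widehat{Avar}\big(h(\hat{m})\big)
  &= \frac{1}{n}\, \nabla h(\hat{m})' \, \hat{\Omega} \, \nabla h(\hat{m})
   = \frac{1}{n}\left( 4 \hat{m}_1^2 \hat{\omega}_{11}
     - 4 \hat{m}_1 \hat{\omega}_{12} + \hat{\omega}_{22} \right).
\end{align*}
```

The reported standard errors would correspond to the square root of such an expression, under the assumptions stated above.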
Abstract
This dissertation develops new estimation methods for heterogeneous panel data models with a large number of cross-sectional units (n) but a limited number of periods (T). In the second chapter, the mean group (MG) estimator of average treatment effects proposed by Pesaran and Smith (1995) is extended to short T heterogeneous dynamic panels using the jackknife (JK). The MG-JK estimator is root-n consistent as n and T tend to infinity and n/T^4 converges to zero. For the validity of the jackknife in finite samples, a sufficient condition for the existence of the r-th moment of the MG estimator is derived. Using the MG-JK estimator, we find a close-to-zero effect of minimum wages on total employment but a negative impact on teenage employment in the U.S. The third chapter (with M. Hashem Pesaran) proposes a new trimmed mean group (TMG) estimator for average treatment effects in large n static panels with correlated heterogeneous coefficients, where T can be as small as the number of regressors. The TMG estimator is consistent and asymptotically normally distributed. A suitable trimming procedure is chosen based on a bias/efficiency trade-off. A Hausman-type test is proposed to diagnose the validity of fixed effects estimation. The fourth chapter (with M. Hashem Pesaran) considers a first-order autoregressive panel data model with individual-specific effects and a heterogeneous autoregressive coefficient. New estimators are proposed for the moments of the random autoregressive coefficients. The conditions under which the probability distribution of the autoregressive coefficients is identified, assuming a categorical distribution, are also investigated.
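For readers unfamiliar with mean group estimation, the basic idea behind the Pesaran and Smith (1995) estimator is to run a separate regression for each unit and average the resulting coefficients. The sketch below illustrates only that averaging step for a static regression with one regressor and an intercept; it is not the MG-JK or TMG estimator developed in the dissertation, it applies no jackknife or trimming, and the function name `mean_group` is hypothetical.

```python
import numpy as np

def mean_group(y, x):
    """Average of unit-specific OLS slopes, with a nonparametric standard error.

    y and x are assumed to be balanced (n, T) arrays; each unit's regression
    includes an intercept.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    n, T = y.shape
    slopes = np.empty(n)
    for i in range(n):
        X = np.column_stack([np.ones(T), x[i]])  # intercept and regressor
        coef, *_ = np.linalg.lstsq(X, y[i], rcond=None)
        slopes[i] = coef[1]
    mg = slopes.mean()
    # Standard error from the cross-sectional dispersion of the unit slopes.
    se = np.sqrt(((slopes - mg) ** 2).sum() / (n * (n - 1)))
    return mg, se, slopes

# Illustrative data only: three units with slopes 1, 2, and 3 and no noise.
x_demo = np.tile(np.arange(1.0, 6.0), (3, 1))
y_demo = 1.0 + np.array([[1.0], [2.0], [3.0]]) * x_demo
mg_demo, se_demo, slopes_demo = mean_group(y_demo, x_demo)
```

When T is short, the unit-specific estimates can be heavily biased or may lack finite moments, which is the problem the jackknife and trimming approaches in the abstract are designed to address.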
Asset Metadata
Creator
Yang, Liying (author)
Core Title
Essays on estimation and inference for heterogeneous panel data models with large n and short T
School
College of Letters, Arts and Sciences
Degree
Doctor of Philosophy
Degree Program
Economics
Degree Conferral Date
2023-05
Publication Date
11/11/2024
Defense Date
05/11/2023
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
average treatment effects,correlated heterogeneity,dynamic panels,mean group estimation,OAI-PMH Harvest,random and group heterogeneity,short T panels
Format
theses (aat)
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Pesaran, M. Hashem (committee chair), Hsiao, Cheng (committee member), Ridder, Geert (committee member)
Creator Email
yangliyi@usc.edu,yangliying92@gmail.com
Unique identifier
UC113121765
Identifier
etd-YangLiying-11834.pdf (filename)
Legacy Identifier
etd-YangLiying-11834
Document Type
Dissertation
Rights
Yang, Liying
Internet Media Type
application/pdf
Type
texts
Source
20230512-usctheses-batch-1043 (batch), University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright.
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email
cisadmin@lib.usc.edu