ESSAYS ON ECONOMETRICS ANALYSIS OF PANEL DATA MODELS

by

Yimeng Xie

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(Economics)

May 2021

Copyright 2021 Yimeng Xie

Dedication

To my parents and family.

Acknowledgements

First of all, I would like to express my deep gratitude to my parents. They have devoted so much to what I have achieved today. Throughout my long pursuit of higher education, they supported me selflessly, both financially and in spirit, and always encouraged me whenever I encountered difficulties in life or in my studies. I am truly blessed to be their son. Last July, my beloved father left me forever, and it is a deep regret that I cannot share the day of my graduation with him. I will be grateful to him and my mother for a lifetime. Their deep love for family and their enterprising spirit live on in me. In addition, I am very thankful to my sister for shouldering many of the family's burdens for me. Without her, it would have been impossible for me to leave home and study overseas for so many years. I am so lucky to have such a supportive sister.

I am particularly grateful to my advisors, Dr. Cheng Hsiao, Dr. M. Hashem Pesaran and Dr. Qiankun Zhou. My research is deeply indebted to their immense knowledge and insightful comments, and from them I learned the enthusiasm and attitude required to be a truly outstanding scholar in econometrics. They are the models of mentors and teachers I have always admired and always will. In addition, I am thankful to Dr. Scott Joslin for his kindness in serving as the outside member of my dissertation committee.

My third thanks goes to Dr. Shin-huei Wang, my first econometrics instructor as an undergraduate. She introduced the fantastic world of econometrics to me and encouraged me to come to USC for further training.
Without her recommendation and encouragement, I would not have taken the first step on this long journey to USC.

Finally, I would like to thank my colleagues and friends from USC. My research has also benefited from helpful discussions with them, and it has been a pleasure to share an unforgettable campus life with them.

Table of Contents

Dedication ii
Acknowledgements iii
List of Tables vi
List of Figures vii
Abstract viii

Chapter 1: Dimension heterogeneity of panel threshold model 1
1.1 Introduction 1
1.2 Model 6
1.2.1 Dimension Heterogeneity 6
1.2.2 Model and Assumption 7
1.3 Projection Estimator 9
1.3.1 Intuition behind the Estimator 9
1.3.2 A Simple Example 11
1.3.2.1 Asymptotic Theory 15
1.3.3 More Generalized Case 16
1.3.3.1 $d_j = 1$ for all $j = 1, \ldots, J$ 17
1.3.3.2 $d_j > 1$ for some $j$ 19
1.3.3.3 Unknown $d_j$ 22
1.3.4 Remarks 24
1.4 Monte Carlo Simulations 28
1.4.1 DGP 28
1.4.2 Simulation Results 30
1.4.2.1 Experiment A 31
1.4.2.2 Experiment B 31
1.5 Application 32
1.5.1 Modeling 32
1.5.2 Estimation 33
1.5.3 Robust Examination with Mundlak Specification 35
1.6 Concluding Remarks 36

Chapter 2: Factor Dimension Detection in Panel Interactive Effects Models 42
2.1 Introduction 42
2.2 The Bai and Ng (2002) Factor Model Dimension Determination Criterion and Their Finite Sample Modifications 43
2.3 The Model and the Recursive Iterating Procedure to Implement Bai and Ng (2002) Information Criterion 46
2.4 Orthogonal Projection Approach 48
2.5 Simulation 51
2.6 Empirical Application 54
2.7 Concluding Remarks 55

Chapter 3: Semiparametric Least Squares Estimation of Binary Choice Panel Data Models with Endogeneity 63
3.1 Introduction 63
3.2 The Model 65
3.3 Identification and Estimation 69
3.3.1 Identification 69
3.3.2 Semiparametric Least Squares (SLS) Estimation 71
3.4 Asymptotics 73
3.4.1 Consistency 73
3.4.2 Asymptotic Distribution of the SLS Estimator 75
3.5 Monte Carlo Simulation 80
3.6 Empirical Application 83
3.7 Concluding Remarks 85

References 93

Appendices 97
A Appendix to Chapter 1 98
B Appendix to Chapter 2 104
B.1 Recursively Iterating Procedure 105
B.2 Orthogonal Projection Method 108
C Appendix to Chapter 3 127
C.1 Proof for Theorem 3.3.1 128
C.2 Proof for Theorem 3.4.1 128
C.3 Proof for Theorem 3.4.2 133

List of Tables

1.1 Simulation results of coefficients for Experiment A 37
1.2 Simulation results of threshold parameters for Experiment A 38
1.3 Simulation results of threshold numbers for Experiment B 39
1.4 The number and values of thresholds for each dimension 40
1.5 Estimation results of investment spending model 40
1.6 The number and values of thresholds for each dimension 41
1.7 Estimation results of investment spending model 41
2.1 Average Number of Factors Selected during Replications for Case 1 with three factors 57
2.2 Percentage of correctly estimating the number of factors for Case 1 with three factors 58
2.3 Average Number of Factors Selected during Replications for Case 2 with three factors 59
2.4 Percentage of correctly estimating the number of factors for Case 2 with three factors 60
2.5 Average Number of Factors Selected during Replications for Case 3 with three factors 61
2.6 Percentage of correctly estimating the number of factors for Case 3 with three factors 62
2.7 Determination of $\dim(f_t)$ in model (2.45) 62
3.1 Small sample properties of the estimators of $\beta_2/\beta_1$ for Design I (Gaussian errors) 87
3.2 Small sample properties of the estimators of $\beta_2/\beta_1$ for Design II (Non-Gaussian errors) 88
3.3 Descriptive statistics 89
3.4 Estimation results of labor force participation for married women 90

List of Figures

1.1 Interactive sparsity problem 25
3.1 Average structural function for the first year of our panel 91
3.2 Average structural function for the second year of our panel 91
3.3 Average structural function for the third year of our panel 92
3.4 Average structural function averaged over all three years of our panel 92

Abstract

Panel data models have become increasingly popular thanks to their flexibility in describing economic phenomena and the growing availability of big data.
This dissertation contributes to the econometric analysis of panel data, mainly in nonlinear models and latent factor models.

The first chapter is concerned with unobserved heterogeneity across the regressors of a panel data threshold model. Previous studies assume, without justification, that the coefficients of all regressors change across the same thresholds, which may misspecify the model and break down inference and prediction. In this chapter we allow the coefficient of each regressor to change across its own thresholds: regressors sharing the same thresholds are defined as one dimension, while thresholds can differ across dimensions. We propose a new estimator for this generalized model that exploits the structure of the threshold specification. The estimator is shown to be asymptotically valid regardless of the number of dimensions and the number of thresholds. It is computationally efficient, especially under sparse interaction, i.e., when a threshold of one dimension is very close to a threshold of another dimension. Small sample properties of the estimator are investigated by Monte Carlo simulations and shown to be appealing under many different settings. Moreover, the estimator is applied to two empirical finance studies, which confirm the importance of accounting for dimension heterogeneity in threshold models.

In the second chapter (co-authored with Cheng Hsiao and Qiankun Zhou), we consider a computationally simple orthogonal projection method to implement the Bai and Ng (2002) information criterion for selecting the factor dimension in panel interactive effects models; it bypasses the issues arising from the joint estimation of the slope coefficients and the factor structure. Our simulations show that it performs well in the cases where the method can be implemented.
The third chapter (co-authored with Anastasia Semykina, Fan Yang and Qiankun Zhou) considers semiparametric least squares estimation of binary response panel data models with endogenous regressors. The estimator relies on the correlated random effects model and the control function approach to address the endogeneity arising from the presence of the unobserved time-invariant effect and from nonzero correlation of the idiosyncratic error with one or more explanatory variables. We derive the asymptotic properties of the proposed estimator and use Monte Carlo simulations to show that it performs well in finite samples. As an illustration, the method is used to estimate the effect of non-wife income on the labor force participation of married women.

Chapter 1
Dimension heterogeneity of panel threshold model

1.1 Introduction

The threshold model has grown increasingly popular since it was first proposed by Tong (1978). The model is usually constructed by presuming that an economic relationship changes once the value of an observed variable surpasses a predetermined but unknown threshold. It has been widely used in time series and cross-sectional applications to explore economic patterns across different economic regimes. One of the best-known time series applications is the threshold autoregressive model. For example, Beaudry and Koop (1993) apply a threshold autoregressive model to U.S. GNP and verify asymmetric effects of shocks over the business cycle. In addition, as Hansen (2011) discusses, threshold autoregressive modeling has also been used in macro and finance studies such as interest arbitrage, purchasing power parity, exchange rates, stock returns, and transaction cost effects. More recently, as the COVID-19 pandemic swept the global economy, many studies have turned to threshold autoregressive models again, either to study the pattern of virus transmission or the influence of the pandemic on economic development; see Kim et al. (2020) and Chudik et al. (2020) for instance.
The threshold model is also popular in cross-sectional applications. Among them, one of the best-known studies is Card et al. (2008), who use a threshold model to identify a tipping point of the minority share in a neighborhood. They find that the white population of a neighborhood drops once the minority share surpasses the tipping point. Another example is Yu and Phillips (2018), who study a threshold in income level and find that tax-deferred savings policies induce different impacts on savings for people at different income levels. Recently, research on threshold models has also been extended to panel data models, in which economic regimes can vary along both the time series and the cross-sectional dimension. A classic panel data application is the examination of the effect of financial constraints on investment decisions using short firm-level panel data, studied by Hansen (1999) and Seo and Shin (2016).

However, all of the above works make arbitrary implicit assumptions about the heterogeneity of regressors in the threshold model. For example, most of these papers allow the coefficients of all regressors to change when the threshold variable surpasses the same threshold, while others arbitrarily assume that the coefficients of some regressors change across the same threshold. These implicit assumptions are unnecessary and can adversely affect both specification and prediction. In this paper, we define a "dimension" as a set of regressors sharing the same threshold structure, including the number and values of the thresholds, and we allow the threshold structure to differ across dimensions. We then design an estimator that addresses this potential problem by taking advantage of the structure of the threshold model. The essence of our estimator rests on the indicator function and the linearity of the threshold model, which allow us to filter out regressors regardless of whether their coefficients are subject to a threshold.
Our estimation procedure is in fact applicable to all kinds of data structures, although in this paper we focus on short panel data. Moreover, we generalize the threshold setup so that each dimension can have more than one threshold. We determine the number of thresholds for each dimension by a revision of the information criterion suggested by Gonzalo and Pitarakis (2002).

The asymptotic theory of estimation and inference for threshold models with time series and cross-sectional data has been well developed by Chan et al. (1993) and Hansen (2000). These works are valid for models with exogenous regressors and threshold parameters. Later, many efforts were made to generalize the model by allowing endogeneity in either the regressors or the threshold variables, such as Caner and Hansen (2004), Yu (2013), Kourtellos et al. (2014), Yu and Phillips (2018), Liao et al. (2015) and Yu et al. (2019). For panel data, Hansen (1999) adapts traditional threshold estimation to a nondynamic model with fixed effects, which is then extended to dynamic models by Ramírez-Rondán (2020) and Seo and Shin (2016). In this paper we focus on panel data models in which the regressors and the threshold variable can be either weakly or strictly exogenous. While previous research generally excludes the fixed effect from the estimation of threshold models, our paper keeps the fixed effect and allows it to be subject to thresholds. In particular, we assume the threshold structure divides all observations into groups, and each group shares the same fixed effect. This setting is similar to the finite mixture model, which has a fixed effect with finite support; it has been studied by Sun (2005) and Kasahara and Shimotsu (2009), and Hahn and Moon (2010) provide its economic foundation. On the other hand, we can also keep a fixed effect with infinite support, and it is shown later that such a model can still be analyzed by our methods after a Mundlak transformation.
Misspecification of a threshold model with dimension heterogeneity will cause units to be misclassified into the wrong regime, resulting in biased estimates and weak prediction performance. These points are reflected in the two applications of this paper. On the question of whether the sensitivity of investment spending to cash flow changes when financial conditions worsen, González et al. (2004) and Hansen (1999) restrict only the coefficient of the cash flow variable to respond to financial conditions, while Seo and Shin (2016) and Gonzalez et al. (2017) assume all independent variables switch at the same threshold of financial condition. The former find that when a firm is financially constrained, the sensitivity of investment to cash flow is lower, which contradicts the prediction of corporate finance theory. The latter is consistent with corporate finance theory, but allows only one threshold. From the perspective of econometric modeling, we do not know which specification is more reasonable and thus cannot verify whether the prediction of corporate finance theory is right or not. As a matter of fact, in the application part of this paper, even after dimension heterogeneity is taken into account, we still find evidence inconsistent with the prediction of corporate finance theory. The other application of our estimator studies a possible threshold in the payment ratio, defined as repayment divided by account balance, as an indicator of credit card default. We find that, under different values of the payment ratio, the default probability is determined in different ways by the credit score and the unemployment rate. We incorporate this information into the prediction of default probability and find that our method dominates two benchmarks used in the industry. This sheds some light on why previous studies often find linear models better at prediction than threshold models.
For instance, Clements and Krolzig (1998) compare the time series forecasting ability of linear autoregressive and threshold autoregressive models, only to find that accounting for non-linearity decreases forecasting power. Our findings imply that the implicit restriction of dimensional homogeneity can be one reason for the predictive failure of threshold models.

One could consider dealing with dimension heterogeneity of the threshold model in other ways. An alternative is to estimate one threshold while fixing all the others. Despite its plausible feasibility, this method undoubtedly brings a heavy computational burden. Another way is to treat all regressors as one dimension, since by linear algebra a model with dimension heterogeneity can be rewritten as a one-dimension model. Often there are multiple thresholds in the data, and one can estimate the one-dimension model either by joint estimation or by sequential estimation as suggested by Gonzalo and Pitarakis (2002). The former is generally the most computationally costly, so practitioners rarely choose it. The latter is comparatively promising in computational cost but fails in models with sparse interaction. In such models thresholds of different dimensions can be very close, so the one-dimension threshold estimation suffers from a limited subsample size, which breaks down estimation efficiency. Worse, when the number of thresholds is unknown, the problem may be even more serious, since model selection may mistake two close thresholds for one. By contrast, this sparse-interaction problem does not appear in our method, because we estimate the coefficients and threshold parameters of each dimension separately.

To the best of our knowledge, no prior work has addressed dimension heterogeneity in threshold models. A related work is Cheng et al. (2019), who provide a multi-dimensional clustering approach for unobserved heterogeneity in panel data models.
In particular, they use the k-means technique to form groups of units along different dimensions. They show their estimator gains efficiency over the standard one-dimension clustering approach, especially when the model has sparse interaction. In their empirical application to production functions, they classify firms along two separate dimensions, the elasticity of variable inputs and the elasticity of capital. Yet there are clear modeling differences between Cheng et al. (2019) and our work: Cheng et al. (2019) restrict group membership to be time invariant, so one can only forecast in sample. By contrast, our approach does not need this restriction, since group membership is determined by the value of the observed threshold variable, which can be time varying. In that sense, we have an advantage over Cheng et al. (2019) in being able to generate forecasts both in sample and out of sample.

The paper is organized as follows. Section 1.2 begins with the motivation for this paper and presents a basic panel threshold model with dimension heterogeneity. Section 1.3 gives the estimator and its extensions to more general situations with multiple dimensions and multiple thresholds. Section 1.4 verifies the finite sample performance of our estimator and the information criterion proposed in Section 1.3 by Monte Carlo simulations. Section 1.5 applies our methods to two empirical studies: the influence of financial constraints on firms' investment decisions and the influence of the payment ratio on credit card default. Finally, we conclude and propose several future extensions in Section 1.6.

Throughout this paper, we use the following notation. Let $\mathrm{tr}(A)$ denote the trace of a matrix $A=(a_{ij})_{m\times m}$; the Frobenius norm of $A$ is defined as $\|A\|=\sqrt{\mathrm{tr}(A'A)}$, and $\|A\|_{\infty}=\max|a_{ij}|$. For arbitrary variables $a$ and $b$, $a\vee b=\max(a,b)$, $a\wedge b=\min(a,b)$, and $a\asymp b$ denotes that $a$ and $b$ are equivalent in order. $\to_{p}$ and $\to_{d}$ denote convergence in probability and in distribution, respectively.
For a variable $x_n$, $x_n=o_p(a_n)$ denotes that $x_n/a_n$ converges to zero in probability as $n$ approaches an appropriate limit, while $x_n=O_p(a_n)$ denotes that $x_n/a_n$ is stochastically bounded. For any parameter $c$, denote by $c_0$ its true value.

1.2 Model

1.2.1 Dimension Heterogeneity

To highlight dimension heterogeneity, suppose we have the following model in the scalar variables $(y,x_1,x_2,q)$:
$$
y = x_1\beta_1 + x_2\beta_2 + (x_1\delta_1 + x_2\delta_2)\,\mathbf{1}\{q\le\gamma\} + u
  = x_1\beta_1 + x_1\delta_1\mathbf{1}\{q\le\gamma\} + x_2\beta_2 + x_2\delta_2\mathbf{1}\{q\le\gamma\} + u, \tag{1.1}
$$
in which the coefficients of $x_1$ and $x_2$ obviously switch across the same threshold $\gamma$. A more general form should allow $x_1$ to have a different threshold from $x_2$, such as
$$
y = x_1\beta_1 + x_1\delta_1\mathbf{1}\{q\le\gamma_1\} + x_2\beta_2 + x_2\delta_2\mathbf{1}\{q\le\gamma_2\} + u. \tag{1.2}
$$
We call $x_1$ one dimension and $x_2$ the other dimension, and the example shows that one kind of dimension heterogeneity is defined by $\gamma_1\neq\gamma_2$. Intuitively, we cannot guarantee that a threshold model is free of dimension heterogeneity. On the other hand, it is not necessary to force all regressors to be subject to the threshold structure, and the model can also be
$$
y = x_1\beta_1 + x_1\delta_1\mathbf{1}\{q\le\gamma_1\} + x_2\beta_2 + u, \tag{1.3}
$$
which is another kind of dimension heterogeneity. As we explain later, dimension heterogeneity can be made more general by allowing multiple thresholds in different dimensions. Besides, the threshold variable need not be the same for the two dimensions. Suppose we have two threshold variables $(q_1,q_2)$; then (1.2) can be generalized as
$$
y = x_1\beta_1 + x_1\delta_1\mathbf{1}\{q_1\le\gamma_1\} + x_2\beta_2 + x_2\delta_2\mathbf{1}\{q_2\le\gamma_2\} + u, \tag{1.4}
$$
of which (1.2) is the special case with $q_1=q_2=q$. When dealing with threshold models, the traditional literature almost always avoids discussing dimension heterogeneity and instead fixes the threshold structure arbitrarily. For example, in the application to investment and financial constraints, Seo and Shin (2016) construct their model as in (1.1).
Compared to that, for the same application, Hansen (1999) constructs the model admitting the second kind of dimension heterogeneity. It is hard to say which specification is right without additional systematic analysis.

1.2.2 Model and Assumption

Throughout this paper, we focus on short panel data. Consider the following model,
$$
y_{it} = \beta' x_{it} + \delta' x_{it}\mathbf{1}\{q_{it}\le\gamma\} + u_{it} \tag{1.5}
$$
for $i=1,\ldots,N$ and $t=1,\ldots,T$, where $N\to\infty$ and $T$ is fixed. $x_{it}$ is a $k\times 1$ vector while $q_{it}$ is a scalar. $(y_{it},x_{it},q_{it})$ are observed data, $(\beta,\delta,\gamma)$ are unknown parameters, and $u_{it}$ denotes the error term. For the moment we do not consider fixed effects; later we show that under regularity conditions a fixed effects model can be transformed into (1.5). Such a model provides heterogeneity in both the time and cross-sectional directions as long as $q_{it}$ varies across $i$ and $t$. As discussed above, we can construct a more general threshold model such as
$$
y_{it} = \beta_1' x_{1it} + \beta_2' x_{2it} + \delta_1' x_{1it}\mathbf{1}\{q_{it}\le\gamma_1\} + \delta_2' x_{2it}\mathbf{1}\{q_{it}\le\gamma_2\} + u_{it} \tag{1.6}
$$
where $x_{1it}$ and $x_{2it}$ are $k_1\times 1$ and $k_2\times 1$ vectors with $k_1+k_2=k$. $(y_{it},x_{1it},x_{2it},q_{it})$ are observed data, and $(\beta_1,\beta_2,\delta_1,\delta_2,\gamma_1,\gamma_2)$ are unknown parameters, where $\gamma_1$ may or may not differ from $\gamma_2$. Indeed, in empirical data there is no evidence guaranteeing that the threshold structure is the same for all elements of the regressors. The concern is that we may misspecify model (1.6) as (1.5), in which case the estimation of the coefficients would be based on a wrong sample split.

Define $\theta=(\phi,\gamma_1,\gamma_2)=(\beta_1,\beta_2,\delta_1,\delta_2,\gamma_1,\gamma_2)$. Correspondingly, the parameter space is defined as $\Theta=(\Phi,\Gamma)$. We shall need the following assumptions for estimation.

Assumption A
(i) The parameter space $\Theta$ is convex and compact.
(ii) For each $t$, $(x_{1it},x_{2it},q_{it},u_{it})$ are i.i.d. across $i$. For each $i$, $(x_{1it},x_{2it},q_{it},u_{it})$ are strictly stationary, ergodic and $\rho$-mixing, with $\rho$-mixing coefficients satisfying $\sum_{m=1}^{\infty}\rho_m^{1/2}<\infty$.
(iii) $\sum_{i=1}^{N}\sum_{t=1}^{T} x_{1it}x_{2it}' = O\!\left(N^{1-a}\right)$ where $a\in(0,1/2]$.
(iv) $E(u_{it}x_{1i,t-\tau})=0$ and $E(u_{it}x_{2i,t-\tau})=0$ for $\tau\ge 0$. Also $E(u_{it}q_{i,t-\tau})=0$ for $\tau\ge 0$.
(v) $E|x_{it}|^4<\infty$ and $E|x_{it}u_{it}|^4<\infty$, where $x_{it}=(x_{1it},x_{2it})$.
(vi) $E\left(|x_{it}|^4 u_{it}^4 \mid q_{it}=\gamma\right)\le C$ and $E\left(|x_{it}|^4 \mid q_{it}=\gamma\right)\le C$ for some $C<\infty$.
(vii) The probability density function of the threshold variable $q_{it}$, $f_{it}(\gamma)$, satisfies $0<f_{it}(\gamma)<\bar f<\infty$ for all $\gamma\in\Gamma$.

Assumption A (i)-(ii) mainly specify the regularity conditions for repeated cross-sectional data models. In particular, for the error term we exclude cross-sectional dependence but allow serial correlation. (iii) specifies weak correlation between the two dimensions, in particular when the cross-sectional size $N$ is sufficiently large; it is a technical assumption required for the proof. For example, suppose $k_1=k_2=1$, and define
$$
q_{1,2} = \left(\sum_{i=1}^{N}\sum_{t=1}^{T} x_{2it}^2\right)^{-1}\left(\sum_{i=1}^{N}\sum_{t=1}^{T} x_{2it}x_{1it}\right);
$$
then by Assumption A (iii) the coefficient $q_{1,2}$ is uniformly bounded by $N^{-a}$. By symmetry, defining $q_{2,1}$ as the coefficient of the regression of $x_{2it}$ on $x_{1it}$, the same pattern holds. Intuitively, if two regressors are highly correlated, it is better to put them in one dimension. However, as discussed in the simulation results, this assumption can be relaxed and the properties of the estimator are still preserved. In practice, one can also transform the regressors into independent covariates by, for example, principal component analysis, since it generates independent principal components that are essentially the eigenvectors of a matrix. (iv) allows the data to be weakly exogenous, so that the model can contain a dynamic structure with lags of the dependent variable. (v) and (vi) specify boundedness of the unconditional and conditional moments needed to invoke a central limit theorem. (vii) states that the support of the distribution of $q_{it}$ is a subset of $\Gamma$, which allows us to estimate the value of $\gamma$ from the collection of observed $q_{it}$.
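To make the two-dimension model concrete, the following is a minimal simulation sketch of the data generating process in (1.6) with scalar regressors ($k_1=k_2=1$). All parameter values here are illustrative choices of ours, not taken from the chapter; drawing $x_{1it}$ and $x_{2it}$ independently makes Assumption A (iii) hold trivially.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 200, 5

# Illustrative (hypothetical) parameter values
beta1, beta2, delta1, delta2 = 1.0, 0.5, 0.8, -0.6
gamma1, gamma2 = -0.3, 0.4   # distinct thresholds: dimension heterogeneity

x1 = rng.normal(size=(N, T))          # dimension 1 regressor
x2 = rng.normal(size=(N, T))          # dimension 2 regressor (independent of x1)
q  = rng.normal(size=(N, T))          # threshold variable, varies over i and t
u  = 0.5 * rng.normal(size=(N, T))    # idiosyncratic error

# Model (1.6): each dimension's coefficient switches at its own threshold
y = (beta1 * x1 + beta2 * x2
     + delta1 * x1 * (q <= gamma1)
     + delta2 * x2 * (q <= gamma2)
     + u)
```

Forcing `gamma1 == gamma2` in this sketch collapses the design back to the homogeneous model (1.1), which is exactly the restriction the chapter argues should not be imposed a priori.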
These assumptions are common in classical panel data research.

1.3 Projection Estimator

1.3.1 Intuition behind the Estimator

Before our estimator is formally presented, we describe the special features of the model and the intuition on which the estimator relies. Note that in a classical threshold model with just one dimension, such as (1.5), it is usually impossible to locate the exact true threshold parameter. This is due to the limited information in $q_{it}$ and the fact that $q_{it}\le\gamma_0$ implies $q_{it}\le\gamma_0+\Delta$ for any small positive constant $\Delta$. Indeed, in the classical threshold model we can only identify $\gamma_0$ within an interval between two adjacent values of $q_{it}$. This is why the estimate of $\gamma$ is usually taken to be $q_{it}$ for some $i$ and $t$ (the left-hand estimator) or the midpoint of two adjacent values of $q_{it}$ (the middle-point estimator); see Hansen (1999) and Yu (2013) for instance. In this paper, we choose the left-hand estimator.

On the other hand, this identification limitation can help us filter out the term involving the indicator function. The essence is that the true threshold can only be identified from $\{q_{i1},\ldots,q_{iT}\}$ for $i=1,\ldots,N$ and $t=1,\ldots,T$. Suppose there is one threshold in the model with true value $\gamma_0$; then for the variables $x_{it}$ and $q_{it}$ we have
$$
x_{it}\mathbf{1}\{q_{it}\le\gamma_0\} = x_{it}\mathbf{1}\{q_{it}\le\check\gamma\}
$$
where $\check\gamma\in\{q_{11},\ldots,q_{1T},q_{21},\ldots,q_{NT}\}$, even though we know the exact value of neither $\gamma_0$ nor $\check\gamma$. So ideally we can construct a large $NT\times NT$ matrix $F=(F_1',F_2',\ldots,F_N')'$, where
$$
F_i=\begin{pmatrix}
x_{i1}\mathbf{1}\{q_{i1}\le q_{11}\} & \cdots & x_{i1}\mathbf{1}\{q_{i1}\le q_{1T}\} & x_{i1}\mathbf{1}\{q_{i1}\le q_{21}\} & \cdots & x_{i1}\mathbf{1}\{q_{i1}\le q_{NT}\}\\
x_{i2}\mathbf{1}\{q_{i2}\le q_{11}\} & \cdots & x_{i2}\mathbf{1}\{q_{i2}\le q_{1T}\} & x_{i2}\mathbf{1}\{q_{i2}\le q_{21}\} & \cdots & x_{i2}\mathbf{1}\{q_{i2}\le q_{NT}\}\\
\vdots & & \vdots & \vdots & & \vdots\\
x_{iT}\mathbf{1}\{q_{iT}\le q_{11}\} & \cdots & x_{iT}\mathbf{1}\{q_{iT}\le q_{1T}\} & x_{iT}\mathbf{1}\{q_{iT}\le q_{21}\} & \cdots & x_{iT}\mathbf{1}\{q_{iT}\le q_{NT}\}
\end{pmatrix}_{T\times NT}
$$
for $i=1,\ldots,N$, such that one column of $F_i$ is exactly equal to $(x_{i1}\mathbf{1}\{q_{i1}\le\gamma_0\},\ldots,x_{iT}\mathbf{1}\{q_{iT}\le\gamma_0\})$.
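This column-matching fact is easy to verify numerically. The sketch below is our own illustration (the unit index, sample sizes and value of $\gamma_0$ are arbitrary): it builds the block $F_i$ from all observed values of $q$ and checks that one of its columns reproduces $x_i(\gamma_0)$ exactly, namely the column indexed by the largest observed $q_{jt}$ not exceeding $\gamma_0$.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 10, 4                      # small panel for illustration
x = rng.normal(size=(N, T))
q = rng.normal(size=(N, T))
gamma0 = 0.2                      # true threshold (unknown in practice)

# Candidate thresholds: every observed q_jt, as in the matrix F
candidates = np.sort(q.ravel())

i = 0                             # unit i's block F_i, of size T x NT
F_i = x[i][:, None] * (q[i][:, None] <= candidates[None, :])

# x_i(gamma0) = (x_i1 1{q_i1 <= gamma0}, ..., x_iT 1{q_iT <= gamma0})
x_i_g0 = x[i] * (q[i] <= gamma0)

# Some column of F_i equals x_i(gamma0) exactly: any q_jt strictly between
# that column's candidate and gamma0 would itself be a larger candidate
match = np.any(np.all(F_i == x_i_g0[:, None], axis=0))
```

Since every $q_{it}$ is itself a candidate, no observation can fall between the matching candidate and $\gamma_0$, which is exactly why the indicator patterns coincide.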
However, $N$ can be large and many values of $q_{it}$ may be repeated or close to each other, so we need not represent the threshold by $q_{it}$ for all $i=1,\ldots,N$ and $t=1,\ldots,T$. Instead we can use a sufficient number of quantiles of the empirical distribution of $q_{it}$ to build an $NT\times S$ matrix $\tilde F=(\tilde F_1',\tilde F_2',\ldots,\tilde F_N')'$, where $S$ is the number of quantiles used and
$$
\tilde F_i=\begin{pmatrix}
x_{i1}\mathbf{1}\{q_{i1}\le q_{\{1\}}\} & x_{i1}\mathbf{1}\{q_{i1}\le q_{\{2\}}\} & \cdots & x_{i1}\mathbf{1}\{q_{i1}\le q_{\{S\}}\}\\
x_{i2}\mathbf{1}\{q_{i2}\le q_{\{1\}}\} & x_{i2}\mathbf{1}\{q_{i2}\le q_{\{2\}}\} & \cdots & x_{i2}\mathbf{1}\{q_{i2}\le q_{\{S\}}\}\\
\vdots & \vdots & & \vdots\\
x_{iT}\mathbf{1}\{q_{iT}\le q_{\{1\}}\} & x_{iT}\mathbf{1}\{q_{iT}\le q_{\{2\}}\} & \cdots & x_{iT}\mathbf{1}\{q_{iT}\le q_{\{S\}}\}
\end{pmatrix}_{T\times S}
$$
with $q_{\{s\}}$, $s=1,2,\ldots,S$, the quantiles of the empirical distribution of $q_{it}$. When $S$ is sufficiently small, we can construct the orthogonal projection matrix
$$
M_0 = I_{NT} - \tilde F(\tilde F'\tilde F)^{-1}\tilde F',
$$
such that for
$$
x(\gamma_0)=\left(x_{11}\mathbf{1}\{q_{11}\le\gamma_0\},\ldots,x_{1T}\mathbf{1}\{q_{1T}\le\gamma_0\},x_{21}\mathbf{1}\{q_{21}\le\gamma_0\},\ldots,x_{NT}\mathbf{1}\{q_{NT}\le\gamma_0\}\right)',
$$
we immediately get $M_0 x(\gamma_0)=0$, which suggests a way to filter out the regressors with thresholds.

1.3.2 A Simple Example

We consider a data generating process without fixed effects; in other words, we can treat the constant 1 as a dimension. We can write the above model in vector form as
$$
y_i = x_{1i}\beta_1 + x_{2i}\beta_2 + x_{1i}(\gamma_1)\delta_1 + x_{2i}(\gamma_2)\delta_2 + u_i \tag{1.7}
$$
for $i=1,\ldots,N$, where $x_{1i}=(x_{1i1},\ldots,x_{1iT})'$ and $x_{2i}=(x_{2i1},\ldots,x_{2iT})'$, $x_{1i}(\gamma_1)=(x_{1i1}\mathbf{1}\{q_{i1}\le\gamma_1\},\ldots,x_{1iT}\mathbf{1}\{q_{iT}\le\gamma_1\})'$ and $x_{2i}(\gamma_2)=(x_{2i1}\mathbf{1}\{q_{i1}\le\gamma_2\},\ldots,x_{2iT}\mathbf{1}\{q_{iT}\le\gamma_2\})'$.
Therefore the criterion function is constructed as
$$S_{NT}(\theta)=\sum_{i=1}^N (y_i - x_{1i}\beta_1 - x_{2i}\beta_2 - x_{1i}(\gamma_1)\delta_1 - x_{2i}(\gamma_2)\delta_2)'(y_i - x_{1i}\beta_1 - x_{2i}\beta_2 - x_{1i}(\gamma_1)\delta_1 - x_{2i}(\gamma_2)\delta_2), \qquad (1.8)$$
which leads to the estimator of the coefficients and threshold parameters
$$\hat\theta = \arg\min_{\theta\in\Theta} S_{NT}(\theta).$$
We can rewrite the model as
$$Y = X_1\beta_1 + X_2\beta_2 + X_1(\gamma_1)\delta_1 + X_2(\gamma_2)\delta_2 + U, \qquad (1.9)$$
where $Y=(y_1',\dots,y_N')'$, $X_j=(x_{j1}',\dots,x_{jN}')'$ for $j=1,2$, and $X_j(\gamma_j)=(x_{j1}(\gamma_j)',\dots,x_{jN}(\gamma_j)')'$ for $j=1,2$. Correspondingly, the criterion function can be written equivalently as
$$S_{NT}(\theta)=(Y - X_1\beta_1 - X_2\beta_2 - X_1(\gamma_1)\delta_1 - X_2(\gamma_2)\delta_2)'(Y - X_1\beta_1 - X_2\beta_2 - X_1(\gamma_1)\delta_1 - X_2(\gamma_2)\delta_2). \qquad (1.10)$$

Ideally the parameter space $\Gamma$ is represented by the values of $q_{it}$ for $i=1,2,\dots,N$ and $t=1,\dots,T$. When the sample size is large, a good approximation is a set of quantiles of the empirical distribution of $q_{it}$. Suppose we split the real line between the 1% and 99% percentiles into $m$ parts and define $v=98/m$; the quantiles we consider are then
$$q_{(l)}=\inf\{\gamma:\Pr(q_{it}<\gamma)\ge l\%\}$$
for $l=1, 1+v, 1+2v, \dots, 99-2v, 99-v, 99$. By increasing $m$ we provide more quantiles of the distribution of $q_{it}$, and the estimate of the threshold becomes more accurate. However, taking too many quantiles adds computation cost, so we choose $m$ as a tradeoff between efficiency and computation cost. We exclude the possibility that the true threshold is below the 1% quantile or above the 99% quantile because the threshold estimate may then be unreliable due to a small subsample size. We can then use the following estimation procedure.

• Step 1: For an arbitrary $\gamma\in\Gamma$, we filter $(Y, X_1, X_1(\gamma))$.
  – For each $i$, let
  $$P_{1,i}=[x_{2i}(q_{(1)}),\dots,x_{2i}(q_{(99)})],$$
  where $x_{2i}(q_{(l)})=(x_{2i1}1\{q_{i1}\le q_{(l)}\},\dots,x_{2iT}1\{q_{iT}\le q_{(l)}\})'$ for $l=1,1+v,\dots,99-v,99$, so that $P_{1,i}$ is of size $T\times(m+1)$.
  We can thereby construct the orthogonal projection matrix
  $$M_1 = I_{NT} - P_1(P_1'P_1)^{-1}P_1',$$
  where $P_1=(P_{1,1}',\dots,P_{1,N}')'$, of dimension $NT\times(m+1)$.
  – Now we filter $(Y, X_1, X_1(\gamma))$ such that
  $$\tilde Y_1 = M_1 Y,\quad \tilde X_1 = M_1 X_1,\quad \tilde X_1(\gamma) = M_1 X_1(\gamma).$$

• Step 2: Define
$$S_N^1(\gamma)=\tilde Y_1' M_1(\gamma)\tilde Y_1, \qquad (1.11)$$
where the orthogonal projection matrix $M_1(\gamma)$ is defined as
$$M_1(\gamma)=I_{NT}-P_1(\gamma)(P_1(\gamma)'P_1(\gamma))^{-1}P_1(\gamma)' \quad\text{with}\quad P_1(\gamma)=[\tilde X_1, \tilde X_1(\gamma)].$$
We estimate $\gamma_1$ by solving
$$\hat\gamma_1=\arg\min_{\gamma\in\Gamma}S_N^1(\gamma). \qquad (1.12)$$
We can then estimate the coefficients $(\hat\beta_1,\hat\delta_1)$ of this dimension by the least squares regression of $\tilde Y_1$ on $\tilde X_1$ and $\tilde X_1(\hat\gamma_1)$.

• Step 3: For an arbitrary $\gamma\in\Gamma$, we filter $(Y, X_2, X_2(\gamma))$.
  – For each $i$, let
  $$P_{2,i}=[x_{1i}(q_{(1)}),\dots,x_{1i}(q_{(99)})],$$
  where $x_{1i}(q_{(l)})=(x_{1i1}1\{q_{i1}\le q_{(l)}\},\dots,x_{1iT}1\{q_{iT}\le q_{(l)}\})'$ for $l=1,1+v,\dots,99-v,99$, so that $P_{2,i}$ is of size $T\times(m+1)$. We can thereby construct the orthogonal projection matrix
  $$M_2 = I_{NT} - P_2(P_2'P_2)^{-1}P_2', \quad\text{where}\quad P_2=(P_{2,1}',\dots,P_{2,N}')', \qquad (1.13)$$
  of dimension $NT\times(m+1)$.
  – Now we filter $(Y, X_2, X_2(\gamma))$ such that
  $$\tilde Y_2 = M_2 Y,\quad \tilde X_2 = M_2 X_2,\quad \tilde X_2(\gamma) = M_2 X_2(\gamma).$$

• Step 4: We estimate $\gamma_2$ by solving
$$\hat\gamma_2=\arg\min_{\gamma\in\Gamma}\tilde Y_2' M_2(\gamma)\tilde Y_2, \qquad (1.14)$$
where the orthogonal projection matrix $M_2(\gamma)$ is defined as
$$M_2(\gamma)=I_{NT}-P_2(\gamma)(P_2(\gamma)'P_2(\gamma))^{-1}P_2(\gamma)' \quad\text{with}\quad P_2(\gamma)=[\tilde X_2, \tilde X_2(\gamma)].$$
We can then estimate the coefficients $(\hat\beta_2,\hat\delta_2)$ of this dimension by the least squares regression of $\tilde Y_2$ on $\tilde X_2$ and $\tilde X_2(\hat\gamma_2)$.

1.3.2.1 Asymptotic Theory

Note that here we assume there is only one threshold for each dimension. With this prior information, we show the consistency of the estimators of both the threshold parameters and the coefficients. Analyzing one dimension suffices, as the model is symmetric and the same arguments apply to the other dimension.
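A minimal numerical sketch of Steps 1–4 for the first dimension (hypothetical names throughout; $\beta_1=\beta_2=0$ as in the Monte Carlo design of Section 1.4). One practical deviation from the text: the raw regressor $x_2$ is included in the filtering matrix alongside its threshold columns, so the linear term of the second dimension is projected out exactly rather than only up to the 99% quantile.

```python
import numpy as np

def icols(x, q, grid):
    # columns x_it * 1{q_it <= g}, one per candidate threshold g
    return x[:, None] * (q[:, None] <= grid[None, :])

def annihilate(F, *mats):
    # apply the residual-maker M = I - F(F'F)^{-1}F' to each argument;
    # pinv guards against near-collinear indicator columns
    B = np.linalg.pinv(F.T @ F) @ F.T
    return [m - F @ (B @ m) for m in mats]

def ssr(y, X):
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    return e @ e

rng = np.random.default_rng(1)
N, T = 200, 5
n = N * T
x1, x2, q = rng.normal(1.0, 1.0, (3, n))
gam1, gam2, d1, d2 = 1.0, 0.5, 0.3, 0.7
y = d1 * x1 * (q <= gam1) + d2 * x2 * (q <= gam2) + 0.3 * rng.normal(size=n)

grid = np.quantile(q, np.linspace(0.01, 0.99, 50))

# Step 1: project out the second dimension (x2 and all its threshold
# columns). Step 2: grid search for gamma_1 on the filtered data.
P1 = np.column_stack([x2[:, None], icols(x2, q, grid)])
yt, x1t, X1t = annihilate(P1, y, x1, icols(x1, q, grid))
s = [ssr(yt, np.column_stack([x1t, X1t[:, i]])) for i in range(len(grid))]
gam1_hat = grid[int(np.argmin(s))]
print(gam1_hat)
```

Steps 3 and 4 are symmetric: swap the roles of the two dimensions to recover $\hat\gamma_2$, then run least squares on the filtered regressors at the estimated thresholds for the coefficients.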
Define
$$A(\gamma_1,\gamma_2)=\begin{pmatrix} X_1'X_1 & X_1'X_2 & X_1'X_1(\gamma_1) & X_1'X_2(\gamma_2) \\ X_2'X_1 & X_2'X_2 & X_2'X_1(\gamma_1) & X_2'X_2(\gamma_2) \\ X_1(\gamma_1)'X_1 & X_1(\gamma_1)'X_2 & X_1(\gamma_1)'X_1(\gamma_1) & X_1(\gamma_1)'X_2(\gamma_2) \\ X_2(\gamma_2)'X_1 & X_2(\gamma_2)'X_2 & X_2(\gamma_2)'X_1(\gamma_1) & X_2(\gamma_2)'X_2(\gamma_2) \end{pmatrix}$$
and
$$V(\gamma_1,\gamma_2)=A(\gamma_1,\gamma_2)^{-1}E(X'\Omega X)A(\gamma_1,\gamma_2)^{-1},$$
where $X=(X_1, X_2, X_1(\gamma_1), X_2(\gamma_2))$ and $\Omega=E(UU')$. We then need further assumptions about the projection matrices.

Assumption B
(i) $P_j'P_j$ and $P_j(\gamma)'P_j(\gamma)$ have full rank for $\gamma\in\Gamma$ and $j=1,2$.
(ii) $A(\gamma_1,\gamma_2)$ has full rank for $\gamma_1,\gamma_2\in\Gamma$.

In Assumption B, (i) ensures that the projection matrix built from $P_j$ exists for $j=1,2$, and (ii) requires $A(\gamma_1,\gamma_2)$ to be positive definite and hence invertible. Now we can discuss the asymptotic properties.

Theorem 1.3.1. Under Assumptions A and B, as $N$ goes to infinity,
$$\hat\gamma_1\xrightarrow{p}\gamma_1^0 \text{ and } \hat\gamma_2\xrightarrow{p}\gamma_2^0, \qquad \hat\beta_1\xrightarrow{p}\beta_1^0 \text{ and } \hat\beta_2\xrightarrow{p}\beta_2^0, \qquad \hat\delta_1\xrightarrow{p}\delta_1^0 \text{ and } \hat\delta_2\xrightarrow{p}\delta_2^0.$$

Proposition 1.3.1. Under Assumptions A and B, as $N\to\infty$ with $T$ fixed, $\hat\gamma_1=\gamma_1^0+O_p(1/N)$ and $\hat\gamma_2=\gamma_2^0+O_p(1/N)$.

See the Appendix for a proof. Proposition 1.3.1 indicates that the estimators of the threshold parameters converge much faster than those of the coefficients. Therefore the distribution of the coefficient estimators can be derived by treating the estimated threshold parameters as the true values.

Theorem 1.3.2. Under Assumptions A and B, as $N\to\infty$ with $T$ fixed,
$$\sqrt N(\hat\theta-\theta^0)\xrightarrow{d}\mathcal N(0, V(\gamma_1^0,\gamma_2^0)).$$

1.3.3 More Generalized Case

Indeed, model (1.6) can be generalized further. Suppose the number of dimensions defined above is $J\ge 2$, and for each dimension $j=1,2,\dots,J$ we have $d_j$ thresholds, with the regressor $x_{jit}$ a $k_j\times 1$ vector. These parameters imply at most $\sum_{j=1}^J(d_j+1)$ regimes in total.¹ For example, (1.6) is obtained by letting $J=2$ and $d_1=d_2=1$, which therefore generates at most 4 regimes.
Finally, there also exist situations with $d_j=0$, which implies there is no threshold in the $j$-th dimension. It can be the case that only some of the regressors are subject to a threshold effect; for example, in Hansen (1999) only cash flow, among all the firm characteristics determining next period's investment spending, is subject to a threshold effect. The assumptions made above still hold when model (1.6) is generalized in this way.

¹ Different dimensions may share some thresholds, in which case there are fewer regimes. This issue is resolved by the model selection method shown below.

In this subsection we allow $J$ dimensions, where $J$ can be larger than 2. Note that it is trivial to handle the case where the threshold variables differ across dimensions once the case of a common threshold variable is understood, so we only investigate the latter case. We analyze the model in three cases: (i) $d_j=1$ for all $j=1,\dots,J$; (ii) $d_j=0$ for some $j$; and (iii) $d_j>1$ for some $j$.

1.3.3.1 $d_j=1$ for all $j=1,\dots,J$

Firstly we adjust Assumptions A and B accordingly.

Assumption A$'$
(i) The parameter space $\Theta$ is convex and compact.
(ii) For each $t$, $(x_{it}, q_{it}, u_{it})$ are i.i.d. across $i$. For each $i$, $(x_{it}, q_{it}, u_{it})$ are strictly stationary, ergodic and $\rho$-mixing, with $\rho$-mixing coefficients satisfying $\sum_{m=1}^\infty\rho_m^{1/2}<\infty$.
(iii) $\sum_{i=1}^N\sum_{t=1}^T x_{1it}x_{2it}'=O(N^{1-a})$, where $a\in(0,1/2]$.
(iv) $E(u_{it}x_{i,t-\tau})=0$ for $\tau\ge 0$; likewise $E(u_{it}q_{i,t-\tau})=0$ for $\tau\ge 0$.
(v) $E|x_{it}|^4<\infty$ and $E|x_{it}u_{it}|^4<\infty$.
(vi) $E(\|x_{it}u_{it}\|^4\mid q_{it}=\gamma)\le C$ and $E(\|x_{it}\|^4\mid q_{it}=\gamma)\le C$ for some $C<\infty$.
(vii) The probability density function of the threshold variable $q_{it}$, $f_{it}(\gamma)$, satisfies $0<f_{it}(\gamma)\le\bar f<\infty$ for all $\gamma\in\Gamma$.

Assumption A$'$ is the generalization of Assumption A. We also need a new version of Assumption B. Define $\gamma_j^l$ as the $l$-th threshold in the $j$-th dimension.
Assumption B$'$
(i) $X_j$ is weakly dependent on $X_{j'}$ for $j'\ne j$, $j'=1,2,\dots,J$.
(ii) $P_j'P_j$ has full rank for $\gamma\in\Gamma$; the minimum eigenvalues of
$$\frac{\tilde X_j(\gamma_j^l-\xi,\gamma_j^l)'\tilde X_j(\gamma_j^l-\xi,\gamma_j^l)}{N} \quad\text{and}\quad \frac{\tilde X_j(\gamma_j^l,\gamma_j^l+\xi)'\tilde X_j(\gamma_j^l,\gamma_j^l+\xi)}{N}$$
are bounded away from zero for all $l=1,\dots,m_j$ and $j=1,\dots,J$, where $\xi$ is a small positive constant.

Note that Assumption B$'$ is a generalization of Assumption A1 of Gonzalo and Pitarakis (2002). Now the size of the projection matrix expands. In particular, the projection matrix $P_1$ is constructed as
$$P_1=(P_{1,1}',\dots,P_{1,N}')', \qquad (1.15)$$
where
$$P_{1,i}=[x_{2i}(q_{(1)}),\dots,x_{2i}(q_{(99)}),\; x_{3i}(q_{(1)}),\dots,x_{3i}(q_{(99)}),\;\dots,\; x_{Ji}(q_{(1)}),\dots,x_{Ji}(q_{(99)})]$$
and $x_{ji}(q_{(l)})=(x_{ji1}(q_{(l)})',\dots,x_{jiT}(q_{(l)})')'$ with $x_{jit}(q_{(l)})=(x_{jit}^1 1\{q_{it}\le q_{(l)}\},\dots,x_{jit}^{k_j}1\{q_{it}\le q_{(l)}\})'$ for $j=1,\dots,J$. Therefore the size of $P_1$ is $NT\times(m+1)\sum_{j\ne 1}k_j$. To avoid rank deficiency of $P_1'P_1$, we need
$$NT\ge(m+1)\sum_{j\ne 1}k_j.$$
We obtain similar results for $P_j$, $j=2,\dots,J$. Constructing the orthogonal projection matrices in the same way, we can estimate all thresholds $\gamma_j$.
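The column-count bookkeeping above can be checked numerically. The following sketch (sizes and names are our own, purely illustrative) builds $P_1$ for $J=3$ dimensions with $k=(2,1,2)$ regressors each and verifies that its width is $(m+1)\sum_{j\ne 1}k_j$ and that it has full column rank when $NT$ comfortably exceeds that width.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, m = 100, 5, 24
n = N * T
J, k = 3, [2, 1, 2]             # dimensions and regressor counts k_j
X = [rng.normal(size=(n, kj)) for kj in k]
q = rng.normal(size=n)
grid = np.quantile(q, np.linspace(0.01, 0.99, m + 1))  # m + 1 quantiles

# P_1 collects x_jit * 1{q_it <= q_(l)} for every dimension j != 1 and
# every grid quantile; its width must be (m + 1) * sum_{j != 1} k_j.
blocks = [X[j][:, :, None] * (q[:, None, None] <= grid[None, None, :])
          for j in range(1, J)]
P1 = np.concatenate([b.reshape(n, -1) for b in blocks], axis=1)
print(P1.shape)   # (500, 75) here: 25 quantiles times k_2 + k_3 = 3 columns
```

With continuously distributed regressors the indicator columns are generically linearly independent, so the rank condition $NT\ge(m+1)\sum_{j\ne 1}k_j$ is the binding constraint in practice.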
1.3.3.2 $d_j>1$ for some $j$

When there is more than one threshold in a dimension, we need to test how many thresholds exist and where they are located after filtering the data by the orthogonal projection matrix. When one dimension has more than one threshold, our projection estimator is still able to filter out this dimension. To see this, suppose the first regressor has two thresholds, $\gamma_1^{(1)}$ and $\gamma_1^{(2)}$, while the second regressor has one threshold $\gamma_2$. Then (1.7) is replaced by
$$y_i = x_{1i}\beta_1 + x_{1i}(\gamma_1^{(1)})\delta_{11} + x_{1i}(\gamma_1^{(1)},\gamma_1^{(2)})\delta_{12} + x_{2i}\beta_2 + x_{2i}(\gamma_2)\delta_2 + u_i \qquad (1.16)$$
for $i=1,\dots,N$, where
$$x_{1i}(\gamma_1^{(1)},\gamma_1^{(2)})=\big(x_{1i1}(1\{q_{i1}\le\gamma_1^{(2)}\}-1\{q_{i1}\le\gamma_1^{(1)}\}),\dots,x_{1iT}(1\{q_{iT}\le\gamma_1^{(2)}\}-1\{q_{iT}\le\gamma_1^{(1)}\})\big)',$$
while the remaining parts are defined as in (1.7). Stacking the variables, we can form
$$Y = X_1\beta_1 + X_1(\gamma_1^{(1)})\delta_{11} + X_1(\gamma_1^{(1)},\gamma_1^{(2)})\delta_{12} + X_2\beta_2 + X_2(\gamma_2)\delta_2 + U.$$
Recalling the formula for $P_2$ in (1.13), it is obvious that $[X_1, X_1(\gamma_1^{(1)}), X_1(\gamma_1^{(1)},\gamma_1^{(2)})]$ lies in the column span of $P_2$ as long as $[X_1(\gamma_1^{(1)}), X_1(\gamma_1^{(2)})]$ is included in $P_2$, namely, for each $i$ there are two columns of $P_{2,i}$ such that
$$x_{1i}(\gamma_1^{(1)})=x_{1i}(q_{(l)}) \quad\text{and}\quad x_{1i}(\gamma_1^{(2)})=x_{1i}(q_{(l')}).$$
Therefore $M_2$ cancels out all terms formed from $x_{1i}$, regardless of how many thresholds exist for $x_{1i}$. Indeed, even with no prior information about the number of thresholds in a dimension, the orthogonal projection matrix still manages to filter out that dimension.

After applying the projection method to filter out the unnecessary dimensions, what remains is an estimation problem of multiple thresholds in one dimension. Supposing we know the number of thresholds for each dimension, the sequential estimation approach proposed by Bai (1997) and Hansen (1999) can be adopted as follows.

• After the transformation by the orthogonal projection matrix, find $\gamma$ minimizing the sum of squared residuals from the least squares regression of $\tilde Y_j$ on $\tilde X_j(\gamma)$ and $\tilde X_j^+(\gamma)$. Denote it $\hat\gamma_j^{(1)}$.
• Then, fixing $\hat\gamma_j^{(1)}$, find $\gamma$ minimizing the sum of squared residuals from the least squares regression of $\tilde Y_j$ on $\tilde X_j(\gamma_j^{[1]})$, $\tilde X_j(\gamma_j^{[1]},\gamma_j^{[2]})$ and $\tilde X_j^+(\gamma_j^{[2]})$, where the bracketed superscripts denote the ordering from the smallest value in $(\gamma,\hat\gamma_j^{(1)})$. Denote it $\hat\gamma_j^{(2)}$.
• Then find $\gamma$ minimizing the sum of squared residuals from the least squares regression of $\tilde Y_j$ on $\tilde X_j(\gamma_j^{[1]})$, $\tilde X_j(\gamma_j^{[1]},\gamma_j^{[2]})$, $\tilde X_j(\gamma_j^{[2]},\gamma_j^{[3]})$ and $\tilde X_j^+(\gamma_j^{[3]})$, where the bracketed superscripts denote the ordering from the smallest value in $(\gamma,\hat\gamma_j^{(1)},\hat\gamma_j^{(2)})$. Denote it $\hat\gamma_j^{(3)}$.
• Carry on the same procedure for each new threshold until the assumed number of thresholds is reached.

Though the above estimation is derived from a misspecified model, we can follow Gonzalo and Pitarakis (2002) to verify that the estimators still converge to the corresponding true thresholds. To show this, define $x_{it}=(x_{1it},x_{2it},\dots,x_{Jit})$. Each step of the above procedure estimates one threshold. In particular, the first step is the same as in the single-threshold case; therefore let
$$R_{j,\infty}(\gamma)=\lim_{N\to\infty}\big(S_N^j - S_N^j(\gamma)\big),$$
where $S_N^j$ is the sum of squared errors from the least squares estimation of $\tilde Y_j=\tilde X_j\beta+e$, while $S_N^j(\gamma)$ is defined as in the single-threshold case. Now suppose we have conducted $h-1$ steps and therefore detected $h-1$ thresholds for the $j$-th dimension, splitting this dimension into $h$ intervals. The additional threshold should then lie in one of those intervals. For each interval $l$ we can construct the model with one more threshold as
$$Q_{jl}\tilde Y_j = \tilde Z_{jl}(\gamma)\beta_{jl1} + \tilde Z_{jl}^+(\gamma)\beta_{jl2} + e,$$
where
$$Q_{jl}=I_{NT}-\sum_{s=1,\,s\ne l}^h \tilde Z_{js}(\tilde Z_{js}'\tilde Z_{js})^{-1}\tilde Z_{js}'$$
with $\tilde Z_{js}=\tilde X_j(\hat\gamma_j^{[s-1]},\hat\gamma_j^{[s]})$, $\tilde Z_{jl}(\gamma)=\tilde X_j(\hat\gamma_j^{[l-1]},\gamma)$ and $\tilde Z_{jl}^+(\gamma)=\tilde X_j(\gamma,\hat\gamma_j^{[l]})$, where the bracketed superscripts denote the ordering from the smallest value in $(\underline\Gamma_j,\hat\gamma_j^{(1)},\dots,\hat\gamma_j^{(h-1)},\bar\Gamma_j)$, such that $\hat\gamma_j^{[0]}=\underline\Gamma_j$ and $\hat\gamma_j^{[h]}=\bar\Gamma_j$. This design reflects the fact that the next threshold may lie between two adjacent thresholds already estimated. Define $S_N^{jl}(\gamma)$ as the sum of squared residuals from the corresponding least squares estimation. Meanwhile, we can construct the model with no additional threshold as
$$Q_{jl}\tilde Y_j=\tilde X_{jl}\beta_{jl}+e,$$
and define $S_N^{jl}$ as the sum of squared residuals from its least squares estimation.
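As an aside, the sequential search can be sketched for a single already-filtered dimension with two thresholds (illustrative names; the projection step is omitted, and the refinement pass is not shown):

```python
import numpy as np

def ssr(y, X):
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ b
    return r @ r

def regime_design(x, q, gammas):
    # columns x * 1{g_[l-1] < q <= g_[l]} for the sorted thresholds,
    # including the bottom and top regimes
    cuts = [-np.inf] + sorted(gammas) + [np.inf]
    return np.column_stack([x * ((q > lo) & (q <= hi))
                            for lo, hi in zip(cuts[:-1], cuts[1:])])

rng = np.random.default_rng(3)
n = 3000
x, q = rng.normal(1, 1, (2, n))
# slope 0.3 below 1.0, 0.7 on (1.0, 2.0], zero above 2.0
y = (0.3 * x * (q <= 1.0) + 0.7 * x * ((q > 1.0) & (q <= 2.0))
     + 0.3 * rng.normal(size=n))

grid = np.quantile(q, np.linspace(0.01, 0.99, 80))
found = []
for _ in range(2):            # number of thresholds assumed known
    cand = [g for g in grid if g not in found]
    s = [ssr(y, regime_design(x, q, found + [g])) for g in cand]
    found.append(cand[int(np.argmin(s))])
print(sorted(found))
```

Each pass holds the previously found thresholds fixed and grid-searches the remaining candidates, mirroring the Bai (1997) / Hansen (1999) logic described above.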
The criterion function for the $h$-th step is then defined as
$$R_j(\gamma\mid\hat\gamma_j^{[1]},\hat\gamma_j^{[2]},\dots,\hat\gamma_j^{[h-1]})=\sum_{l=1}^h R_{jl}(\gamma\mid\hat\gamma_j^{[1]},\hat\gamma_j^{[2]},\dots,\hat\gamma_j^{[h-1]})\,1\{\hat\gamma_j^{[l-1]}<\gamma<\hat\gamma_j^{[l]}\},$$
where
$$R_{jl}(\gamma\mid\hat\gamma_j^{[1]},\hat\gamma_j^{[2]},\dots,\hat\gamma_j^{[h-1]})=S_N^{jl}-S_N^{jl}(\gamma).$$
Now also define
$$R_{j,\infty}(\gamma\mid\gamma_j^1,\gamma_j^2,\dots,\gamma_j^{h-1})=\lim_{N\to\infty}R_j(\gamma\mid\gamma_j^1,\gamma_j^2,\dots,\gamma_j^{h-1}).$$
Before stating the asymptotic properties of $\hat\gamma_j^h$, we need some further assumptions.

Assumption C
(i) There exists a single threshold parameter $\gamma_j^{[1]}\in\{\gamma_j^1,\gamma_j^2,\dots,\gamma_j^m\}$ such that $R_{j,\infty}(\gamma_j^{[1]})>R_{j,\infty}(\gamma_j^k)$ for all $\gamma_j^k\ne\gamma_j^{[1]}$, $k=1,2,\dots,m$.
(ii) For $j=1,\dots,J$, there exists a configuration $(\gamma_j^1,\gamma_j^2,\dots,\gamma_j^m)$ of the $m$ true threshold parameters such that $R_{j,\infty}(\gamma_j^h\mid\gamma_j^1,\gamma_j^2,\dots,\gamma_j^{h-1})>R_{j,\infty}(\gamma\mid\gamma_j^1,\gamma_j^2,\dots,\gamma_j^{h-1})$ for all $\gamma\in\{\gamma_j^{h+1},\gamma_j^{h+2},\dots,\gamma_j^m\}$ and $h=1,\dots,m$.

Proposition 1.3.2. Under Assumptions A$'$, B$'$ and C, as $N$ goes to infinity with $T$ fixed, $\hat\gamma_j^h=\gamma_j^h+o_p(1)$ and $N|\hat\gamma_j^h-\gamma_j^h|=O_p(1)$ for $h=1,\dots,m$ and $j=1,\dots,J$.

Proposition 1.3.2 is straightforward to verify as an extension of Proposition 2.4 of Gonzalo and Pitarakis (2002). The main difference is that Assumption C here states properties for models from which the redundant dimensions have been filtered out, while Gonzalo and Pitarakis (2002) focus on a single dimension and make the corresponding assumption there.

Since the sequential estimation is conducted from a misspecified model, Bai (1997) proposes a refinement procedure. Suppose the $j$-th dimension has only two thresholds; one can follow our procedure above to obtain the corresponding estimators, then fix the second threshold and re-estimate the first threshold by grid search. In this way the estimate of the first threshold becomes more accurate. Further efficiency can be achieved by iteratively refining $\hat\gamma_j^{[1]}$ and $\hat\gamma_j^{[2]}$.

1.3.3.3 Unknown $d_j$

The number of thresholds in one dimension is generally unknown.
One special case is that there is no threshold at all in some dimension. To test whether a dimension has a threshold, the null hypothesis is usually set so that the coefficient does not change even when the regressors are multiplied by the indicator function of the threshold. The threshold effect therefore enters the model only under the alternative hypothesis, which causes an inference problem for the test statistic, well known as Davies' problem. When there is only one dimension, Hansen (1996) proposes a test for this case in which a bootstrap method is used to calculate the p-value. Alternatively, Gonzalo and Pitarakis (2002) treat it as a model selection problem, asking whether the term with the indicator function needs to be included. After we filter the data by the orthogonal projection matrix, only one dimension is left, so here we can simply adopt the method of Gonzalo and Pitarakis (2002). More specifically, this model selection approach also works for multiple thresholds. Namely, we construct the following information criterion:
$$IC_j(\gamma_j^1,\dots,\gamma_j^m)=\log S_N^j(\gamma_j^1,\dots,\gamma_j^m)+\frac{c_N}{N}\,[k_j(m+1)], \qquad (1.17)$$
where $S_N^j(\gamma_j^1,\dots,\gamma_j^m)$ is the sum of squared errors from the regression of $\tilde Y_j$ on $\tilde X_j(\gamma_j^1), \tilde X_j(\gamma_j^1,\gamma_j^2),\dots,\tilde X_j^+(\gamma_j^m)$, and $c_N$ is a deterministic function of the sample size such that $c_N\to\infty$ and $c_N/N\to 0$ as $N\to\infty$. Then define $IC_j(0)=\log S_N^j+(c_N/N)k_j$, where $S_N^j$ is the sum of squared errors from the regression of $\tilde Y_j$ on $\tilde X_j$. The model-selection-based estimator of the number of unknown threshold parameters can then be defined as
$$\hat m=\arg\max_{0\le m\le M_{\max}}Q_j(m), \qquad (1.18)$$
where $Q_j(m)=IC_j(0)-\min_{\gamma_j^1,\dots,\gamma_j^m}IC_j(\gamma_j^1,\dots,\gamma_j^m)$ for $m\ge 1$ and $Q_j(0)=0$. In particular, we can compare $Q_j(0)$ and $Q_j(1)$ to detect whether there is a threshold in the $j$-th dimension. Then we have the following proposition.

Proposition 1.3.3. Under Assumptions A$'$, B$'$ and C, as $N$ goes to infinity, if $c_N\to\infty$ and $c_N/N\to 0$, then $P(\hat m_j=m_j)\to 1$.

1.3.4 Remarks

Remark 1.3.1. Obviously, (1.7) could be estimated by an alternative approach: for an arbitrary $\gamma_1\in\Gamma$, first find the corresponding $\gamma_2$ that minimizes the criterion function (1.8), which makes $\hat\gamma_2$ a function of $\gamma_1$; then profile the criterion function as a function of $\gamma_1$ alone and estimate $\gamma_1$ as its minimizer. This method, however, adds computing cost, as in our case there can be $393\times 393=154{,}449$ combinations of $(\gamma_1,\gamma_2)$, and it is also expected to induce bias in the threshold estimates due to the two-step procedure. When the situation is more complicated, for example when a regressor has more than one threshold, the computation cost grows further. In contrast, our method does not incur unnecessary computation cost in the multiple-threshold case. This is an advantage of our estimator over previous approaches (e.g. regression trees), which typically must sacrifice sample size because each tipping point is found within a subsample generated by the previous layer.

Remark 1.3.2. When the threshold variable is the same for all dimensions, one can instead model a single dimension with multiple thresholds. For example, (1.16) can be rewritten as a model with two thresholds. Supposing $\gamma_1\le\gamma_2$,
$$Y=X(\gamma_1)\rho_1+X(\gamma_1,\gamma_2)\rho_2+X^+(\gamma_2)\rho_3+U, \qquad (1.19)$$
where $\rho_1=(\beta_1,\beta_2)$, $\rho_2=(\beta_1+\delta_1,\beta_2)$ and $\rho_3=(\beta_1+\delta_1,\beta_2+\delta_2)$. One can apply the sequential estimation of Gonzalo and Pitarakis (2002) to find the number of thresholds and then estimate the model. However, since each regime requires sufficiently many observations to be estimated, the threshold estimation may suffer in efficiency when there are too few observations in one regime.
Figure 1.1: Interactive sparsity problem. Note: The figure compares orthogonal projection estimation with classical threshold estimation for a model with two dimensions. There are two thresholds for the first dimension and one threshold for the second dimension; the lower threshold of the first dimension is very close to the threshold of the second dimension.

Even if we assume that the regimes in each dimension have enough observations, two thresholds from two different dimensions may still be so close that the associated regime is narrow and lacks observations. This is depicted in Figure 1.1, where the three horizontal lines represent all possible values of the threshold variables. The upper line corresponds to the first dimension of (1.19), the middle line to the second dimension, and the bottom line to both dimensions. From the figure we can infer that the first threshold of the first dimension is very close to the threshold of the second dimension. So if we estimate all thresholds within just one dimension, the subsample formed by values of the threshold variable between the two thresholds will be too small, deteriorating estimation efficiency. Indeed, with multiple thresholds in different dimensions it is not surprising that such a phenomenon occurs frequently; it is equivalent to the sparse interaction problem in the multi-dimensional clustering literature. Therefore the sequential estimation of Gonzalo and Pitarakis (2002) cannot assure efficiency: modeling with dimensional homogeneity is not sufficiently efficient. In contrast, our orthogonal projection method is not subject to the sparse interaction problem, as it analyzes each dimension separately. Moreover, when the threshold variable differs across dimensions, a single dimension cannot account for the model at all.

Remark 1.3.3. Indeed, model (1.6) can be generalized further.
Instead of restricting the transition variable to a single variable $q_i$ for all regressors, we can let $q_{i1}$ be the transition variable for $x_{i1}$ and $q_{i2}$ that for $x_{i2}$. The estimation procedure above clearly still works for this model. One example is the case $x_{i1}=q_{i1}$ and $x_{i2}=q_{i2}$.

Remark 1.3.4. When the regressors are endogenous, there is no need to adjust Steps 1 and 3; however, we need to modify Steps 2, 4 and 5. Given a valid instrumental variable $z_i$, the 2SLS approach proposed by Caner and Hansen (2004) can be used to estimate $\gamma_1$ and $\gamma_2$: we replace the regressors by their predicted values from regressions on the instruments and then apply them in Steps 2, 4 and 5.

Remark 1.3.5. Now consider the panel data model with both fixed effects and dimension heterogeneity of thresholds. Firstly, in the case where the regressors are strictly exogenous, the model is
$$y_{it}=\alpha_{i1}+\alpha_{i2}1\{q_{it}\le\gamma_a\}+\beta_1'x_{1it}+\delta_1'x_{1it}1\{q_{it}\le\gamma_1\}+\beta_2'x_{2it}+\delta_2'x_{2it}1\{q_{it}\le\gamma_2\}+u_{it}. \qquad (1.20)$$
We can model the fixed effects as functions of the regressors via the Mundlak specification, namely
$$\alpha_{i1}=E(\alpha_{i1}\mid x_{i1},\dots,x_{iT})+v_{i1}, \qquad \alpha_{i2}=E(\alpha_{i2}\mid x_{i1},\dots,x_{iT})+v_{i2}.$$
In particular, we can model the fixed effects as linear functions of the time series averages,
$$\alpha_{i1}=\rho_1'\bar x_i+v_{i1}, \qquad \alpha_{i2}=\rho_2'\bar x_i+v_{i2},$$
where $\bar x_i=T^{-1}\sum_{t=1}^T x_{it}$, and therefore (1.20) can be rewritten as
$$y_{it}=\rho_1'\bar x_i+\rho_2'\bar x_i\,1\{q_{it}\le\gamma_a\}+\beta_1'x_{1it}+\delta_1'x_{1it}1\{q_{it}\le\gamma_1\}+\beta_2'x_{2it}+\delta_2'x_{2it}1\{q_{it}\le\gamma_2\}+v_{i1}+v_{i2}1\{q_{it}\le\gamma_a\}+u_{it}.$$
This indeed has the structure of a threshold model with three dimensions, except that the error term is now serially correlated.
Alternatively, there is no harm in allowing dimension heterogeneity in the Mundlak specification, such as
$$y_{it}=\rho_{11}'\bar x_{1i}+\rho_{12}'\bar x_{1i}1\{q_{it}\le\gamma_{a1}\}+\rho_{21}'\bar x_{2i}+\rho_{22}'\bar x_{2i}1\{q_{it}\le\gamma_{a2}\}+\beta_1'x_{1it}+\delta_1'x_{1it}1\{q_{it}\le\gamma_1\}+\beta_2'x_{2it}+\delta_2'x_{2it}1\{q_{it}\le\gamma_2\}+v_i+u_{it},$$
where the threshold effect in the error of the Mundlak specification is ignored and $\bar x_{ji}=T^{-1}\sum_{t=1}^T x_{jit}$ for $j=1,2$; that is, we allow the fixed effect to form two additional dimensions. On the other hand, considering the correlation between $\bar x_{ji}$ and $x_{jit}$, it is more appropriate to classify $(\bar x_{ji},x_{jit})$ into one dimension, so the model to be estimated is
$$y_{it}=\rho_{11}'\bar x_{1i}+\rho_{12}'\bar x_{1i}1\{q_{it}\le\gamma_1\}+\rho_{21}'\bar x_{2i}+\rho_{22}'\bar x_{2i}1\{q_{it}\le\gamma_2\}+\beta_1'x_{1it}+\delta_1'x_{1it}1\{q_{it}\le\gamma_1\}+\beta_2'x_{2it}+\delta_2'x_{2it}1\{q_{it}\le\gamma_2\}+v_i+u_{it}. \qquad (1.21)$$
Note that even when the model has a dynamic structure, for instance when the lag of the dependent variable is included as a regressor, the above Mundlak specification still holds if we assume the initial observation $y_{i0}$ is uncorrelated with the fixed effect; see Bai (2013) and Hsiao and Zhou (2018). If such correlation is present, we may need to add conditions accounting for the initial observation $y_{i0}$. We leave that for future work.

1.4 Monte Carlo Simulations

In this section we examine the finite sample performance of the orthogonal projection estimator. In particular, we focus on a panel threshold model with two dimensions and design two sets of experiments: A, in which each dimension has only one threshold; and B, in which one dimension has two thresholds and the other has one. We examine the consistency and normality of the estimator in experiment A and model selection in experiment B.
1.4.1 DGP

Consider the following data generating processes for experiment A.

• DGP 1:
$$y_{it}=\beta_1 x_{1it}+\delta_1 x_{1it}1\{q_{it}\le\gamma_1\}+\beta_2 x_{2it}+\delta_2 x_{2it}1\{q_{it}\le\gamma_2\}+u_{it}, \qquad (1.22)$$
where $q_{it}\stackrel{iid}{\sim}N(1,1)$ and the error $u_{it}\stackrel{iid}{\sim}N(0,\sigma_{u,i}^2)$, with the $\sigma_{u,i}^2$ independent draws from $0.5(1+0.5\chi^2(2))$. The two regressors are generated as
$$x_{jit}=\frac{\eta_{jit}+v\,g_{it}}{\sqrt{1+v^2}},$$
where $g_{it}\stackrel{iid}{\sim}N(1,1)$ and $\eta_{jit}$ follows the AR(1) process
$$\eta_{jit}=\rho_{ji}\eta_{ji,t-1}+\varepsilon_{jit}, \qquad (1.23)$$
where the $\rho_{ji}$ are i.i.d. draws from $U(0,0.5)$ for $i=1,\dots,N$ and $\varepsilon_{jit}\stackrel{iid}{\sim}(0,\sigma_{\varepsilon,ji}^2)$, with $\sigma_{\varepsilon,ji}^2$ independent draws from $0.5(1+0.5\chi^2(2))$. The constant $v$ controls the pairwise correlation between the $x_{jit}$; we focus on $v=0.2, 0.5, 0.8$, corresponding to low, middle and high values of $\mathrm{corr}(x_{1it},x_{2it})$.

• DGP 2:
$$y_{it}=\beta_1 x_{1it}+\delta_1 x_{1it}1\{x_{1it}\le\gamma_1\}+\beta_2 x_{2it}+\delta_2 x_{2it}1\{x_{2it}\le\gamma_2\}+u_{it}, \qquad (1.24)$$
where all variables on the right-hand side are generated as in DGP 1.

• DGP 3: Same as DGP 1 except that $x_{1it}$ is replaced by the lag of the dependent variable $y_{i,t-1}$, namely
$$y_{it}=\beta_1 y_{i,t-1}+\delta_1 y_{i,t-1}1\{q_{it}\le\gamma_1\}+\beta_2 x_{2it}+\delta_2 x_{2it}1\{q_{it}\le\gamma_2\}+u_{it}. \qquad (1.25)$$

• DGP 4: Same as DGP 3 except that $q_{it}$ is replaced by the lag of the dependent variable, namely
$$y_{it}=\beta_1 y_{i,t-1}+\delta_1 y_{i,t-1}1\{y_{i,t-1}\le\gamma_1\}+\beta_2 x_{2it}+\delta_2 x_{2it}1\{y_{i,t-1}\le\gamma_2\}+u_{it}. \qquad (1.26)$$

DGP 1 is designed for strictly exogenous threshold variables, with the two dimensions sharing the same threshold variable. In DGP 2 the threshold variable of each dimension is that dimension's own regressor. DGP 3 is based on DGP 1 but includes the lag of the dependent variable $y_{i,t-1}$ while keeping the threshold variable strictly exogenous. DGP 4 furthermore allows the threshold variable to be weakly exogenous by using the lag of the dependent variable as the threshold variable. As our main goal is to compare the differences across dimensions, let $\beta_1=\beta_2=0$. Also let $\delta_1=0.3$, $\delta_2=0.7$, $\gamma_1=1$ and $\gamma_2=0.5$.
The error $u_{it}$ is generated with heterogeneity such that $u_{it}\sim N(0,\sigma_{u,i}^2)$, where $\sigma_{u,i}^2\sim 0.5(1+0.5\chi^2(2))$. In the appendix we also discuss simulations in which the errors are allowed to have either serial correlation or cross-sectional dependence; it is shown that the properties of the coefficient and threshold estimates hold for all three types of errors.

Secondly, we consider the following DGPs for experiment B, to show that the model selection of thresholds works after filtering out the other dimensions.

• DGP 5:
$$y_{it}=\rho_{11}x_{1it}1\{q_{it}\le\gamma_{11}\}+\rho_{12}x_{1it}1\{\gamma_{11}<q_{it}\le\gamma_{12}\}+\rho_2 x_{2it}1\{q_{it}\le\gamma_2\}+u_{it}, \qquad (1.27)$$
where all variables on the right-hand side are generated as in DGP 1.

• DGP 6:
$$y_{it}=\rho_{11}x_{1it}1\{x_{1it}\le\gamma_{11}\}+\rho_{12}x_{1it}1\{\gamma_{11}<x_{1it}\le\gamma_{12}\}+\rho_2 x_{2it}1\{x_{2it}\le\gamma_2\}+u_{it}, \qquad (1.28)$$
where all variables on the right-hand side are generated as in DGP 2.

• DGP 7:
$$y_{it}=\rho_{11}y_{i,t-1}1\{q_{it}\le\gamma_{11}\}+\rho_{12}y_{i,t-1}1\{\gamma_{11}<q_{it}\le\gamma_{12}\}+\rho_2 x_{2it}1\{q_{it}\le\gamma_2\}+u_{it}, \qquad (1.29)$$
where all variables on the right-hand side are generated as in DGP 3.

• DGP 8:
$$y_{it}=\rho_{11}y_{i,t-1}1\{y_{i,t-1}\le\gamma_{11}\}+\rho_{12}y_{i,t-1}1\{\gamma_{11}<y_{i,t-1}\le\gamma_{12}\}+\rho_2 x_{2it}1\{y_{i,t-1}\le\gamma_2\}+u_{it}, \qquad (1.30)$$
where all variables on the right-hand side are generated as in DGP 4.

DGPs 5–8 are constructed from DGPs 1–4 correspondingly; the difference is that we now allow two thresholds for the first dimension. Let $(\rho_{11},\rho_{12},\rho_2)=(0.3,0.7,0.5)$ and $(\gamma_{11},\gamma_{12},\gamma_2)=(1,2,1.5)$. As in experiment A, we suppress the coefficient of $x_{1it}$ to zero when the threshold variable exceeds $\gamma_{12}$ and the coefficient of $x_{2it}$ to zero when the threshold variable exceeds $\gamma_2$.

1.4.2 Simulation Results

For each DGP we run 1000 replications for samples with $N=100, 200, 500, 1000$ and $T=5$. In particular, we generate data with $T=1005$ periods and then drop the first 1000.
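A sketch of how DGP 1 can be simulated, including the burn-in of 1000 periods mentioned above (function and variable names are our own, not part of the text):

```python
import numpy as np

def simulate_dgp1(N=100, T=5, v=0.5, burn=1000, seed=0):
    """Simulate DGP 1 of experiment A; returns (y, x1, x2, q), each N x T."""
    rng = np.random.default_rng(seed)
    d1, d2, gam1, gam2 = 0.3, 0.7, 1.0, 0.5          # beta1 = beta2 = 0
    sig_u = np.sqrt(0.5 * (1 + 0.5 * rng.chisquare(2, N)))        # unit-level
    rho = rng.uniform(0, 0.5, (2, N))                             # AR(1) coefs
    sig_e = np.sqrt(0.5 * (1 + 0.5 * rng.chisquare(2, (2, N))))
    eta, X = np.zeros((2, N)), np.empty((2, N, T))
    for t in range(burn + T):                # AR(1) recursion; burn-in dropped
        eta = rho * eta + sig_e * rng.normal(size=(2, N))
        if t >= burn:
            g = rng.normal(1, 1, N)          # common factor sets corr(x1, x2)
            X[:, :, t - burn] = (eta + v * g) / np.sqrt(1 + v * v)
    x1, x2 = X
    q = rng.normal(1, 1, (N, T))
    u = sig_u[:, None] * rng.normal(size=(N, T))
    y = d1 * x1 * (q <= gam1) + d2 * x2 * (q <= gam2) + u
    return y, x1, x2, q

y, x1, x2, q = simulate_dgp1()
print(y.shape)
```

The same skeleton adapts to DGPs 2–8 by changing which variable enters the indicator functions and, for the dynamic designs, carrying $y_{i,t-1}$ through the recursion.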
The simulation results of experiment A are shown in Tables 1.1 and 1.2, and those of experiment B in Table 1.3.

1.4.2.1 Experiment A

For the coefficients $\delta_1$ and $\delta_2$, we report mean estimates, bias, root mean squared error, IQR (interquartile range, the 75%$-$25% percentile range) and the 95% coverage probability (equivalent to $1-\text{size}$). For the threshold parameters $\gamma_1$ and $\gamma_2$, we report the same statistics except the coverage probability: many previous studies have shown that the threshold estimator has a nonstandard distribution, so we do not discuss its coverage probability here.

The results in Tables 1.1 and 1.2 indicate that the estimators of both the coefficients and the threshold parameters are consistent for all DGPs. The coverage rate of the coefficient estimates is also around 95%. In particular, for DGP 1 and DGP 2, varying the correlation between the regressors does not affect the results significantly.

1.4.2.2 Experiment B

We consider five information criteria, namely BIC, AIC, HQ, BIC2 and BIC3, with penalties $\ell_N=\log N$, $\ell_N=2$, $\ell_N=2\log\log N$, $\ell_N=2\log N$ and $\ell_N=3\log N$ respectively. We select the number of thresholds from the pool $\{0,1,2,3\}$, so that we also test whether a threshold exists while assuming at most 3 thresholds. In Table 1.3 we report the decision frequencies for the true number of thresholds.

The results show that the five information criteria work well in large samples when the regressors share the same exogenous threshold variable, as seen from the decision frequencies for DGP 5, and in general the correlation between the regressors does not change the results significantly. However, when each regressor serves as its own threshold variable, the decision frequencies of the first dimension under AIC and BIC3 break down, as shown by the results for DGP 6.
Moreover, if the threshold variable is weakly exogenous, the decision frequencies of the first dimension under BIC, BIC2 and BIC3 all break down. In comparison, the HQ penalty function behaves more stably.

1.5 Application

The influence of financial constraints on the investment spending of firms has long been discussed. In the theory of corporate finance, when there is a funding need, a firm can either finance it from its own assets (internal financing) or borrow from outside by means of loans, bonds and equity (external financing). However, when a firm is financially constrained, external financing is difficult because the risk of the firm's default is also rising. New investment therefore has to rely more on internal funding, one possible source being cash at hand. That is, if a firm is under a financial constraint, its next-period investment will be more dependent on cash at hand. This hypothesis has been examined by González et al. (2004), González et al. (2017), Hansen (1999) and Seo and Shin (2016), of which the first two replace the abrupt indicator function by a smooth transition function. Both González et al. (2004) and Hansen (1999) assume that only the cash flow variable is subject to the financial constraint, and both find two thresholds for the debt level, such that from the lowest to the highest debt regime the cash flow sensitivity first rises and then drops. In general, both papers find evidence against the hypothesis above, since the highest regime in fact has the lowest sensitivity to cash flow. However, their estimates may well be biased because dimension heterogeneity is neglected. In contrast, Seo and Shin (2016) and González et al. (2017) do not allow dimension heterogeneity, simply assuming that all regressors can be classified into one dimension; both include the lag of investment spending in the model and estimate under the assumption of a single threshold.
Even though they find high sensitivity of investment to cash flow under a high level of financial constraint, their conclusions are not fully warranted, since their coefficient estimates can be biased if there are multiple thresholds.

1.5.1 Modeling

We build our model on Hansen (1999). The data set is a balanced panel of 565 US firms over the 15 years from 1973 to 1987. Following González et al. (2017), we exclude seven companies with extreme data and consider a final sample of 558 companies with 7,812 company-year observations. Let $i$ denote the firm and $t$ the year. To avoid the use of potentially persistent series, we normalize variables by the book value of assets: $I_{i,t}$ is measured by investment over the book value of assets, $CF_{i,t}$ by cash flow over the book value of assets, $Q_{i,t}$ by the market value over the book value of assets, and $D_{i,t}$ by long-term debt over the book value of assets. Hansen (1999) models the influence of financial constraints on investment spending as
$$
I_{it} = \mu_i + \theta_1 Q_{i,t-1} + \theta_2 Q_{i,t-1}^2 + \theta_3 Q_{i,t-1}^3 + \theta_4 D_{i,t-1} + \theta_5 Q_{i,t-1} D_{i,t-1} + \beta_1 CF_{i,t-1} 1\{D_{i,t-1} \le \gamma_1\} + \beta_2 CF_{i,t-1} 1\{\gamma_1 < D_{i,t-1} \le \gamma_2\} + \beta_3 CF_{i,t-1} 1\{\gamma_2 < D_{i,t-1}\} + \varepsilon_{i,t}, \qquad (1.31)
$$
where $\mu_i$ is the fixed effect and $D_{i,t-1}$ is the threshold variable, serving as an indicator of the financial condition of the firm. He subjectively specifies two dimensions: one consisting of $(\mu_i, Q_{i,t-1}, Q_{i,t-1}^2, Q_{i,t-1}^3, D_{i,t-1}, Q_{i,t-1} D_{i,t-1})$, which has no threshold, and the other consisting of $CF_{i,t-1}$, with thresholds at certain values of $D_{i,t-1}$. The first dimension not only contains two fundamental firm characteristics, $Q_{i,t-1}$ and $D_{i,t-1}$, but also introduces nonlinear terms in order to reduce the possibility of spurious correlations due to omitted-variable bias. Hansen (1999) adopts the two dimensions because his focus is investment's sensitivity to cash flow. Under this framework, he designs a likelihood ratio test and determines that the number of thresholds is two.
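Operationally, the regime-specific cash-flow terms in (1.31) are simply interactions of lagged cash flow with indicator functions of the lagged debt level. A minimal sketch follows; the threshold values and data arrays are illustrative, not estimates from the text.

```python
import numpy as np

def regime_regressors(cf_lag, d_lag, g1, g2):
    """Split lagged cash flow CF_{i,t-1} into three regime columns according
    to the lagged debt level D_{i,t-1}: D <= g1, g1 < D <= g2, and D > g2,
    as in the indicator terms of (1.31)."""
    low = cf_lag * (d_lag <= g1)
    mid = cf_lag * ((d_lag > g1) & (d_lag <= g2))
    high = cf_lag * (d_lag > g2)
    return np.column_stack([low, mid, high])

# illustrative thresholds and observations (one per debt regime)
cf = np.array([0.10, 0.20, 0.30])
d = np.array([0.05, 0.25, 0.90])
Z = regime_regressors(cf, d, g1=0.2, g2=0.5)
```

Each row of `Z` has exactly one nonzero entry, so the rows of `Z` sum back to the original cash-flow series; least squares on these columns then delivers regime-specific coefficients $\beta_1, \beta_2, \beta_3$.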
Therefore Hansen (1999) in effect admits dimension heterogeneity but chooses to ignore the effect of the other dimensions.

1.5.2 Estimation

We apply our orthogonal projection estimator to reexamine the model of Hansen (1999). There are two main differences from Hansen (1999). First, we would like to check whether the intercept term is subject to any threshold structure, so we replace the fixed effect with an intercept term. Second, we classify the regressors that contain $Q_{i,t}$ and $D_{i,t}$ into one dimension, denoted QD. This is because those regressors may be correlated with each other, and further examination is needed to ensure that such correlation does not affect our estimation of the coefficients on $CF_{i,t}$.

Table 1.4 displays the estimated numbers and values of the thresholds as determined by the HQ criterion. The results show that the three dimensions have completely different threshold structures. The intercept dimension and the CF dimension each have only one threshold, while the QD dimension has two. In terms of value, the thresholds of the intercept dimension and the QD dimension are small, indicating that there is considerable heterogeneity among financially unconstrained firms. In contrast, the threshold of the CF dimension is high, which can be regarded as evidence that financially constrained firms differ from financially unconstrained firms in the sensitivity of investment to cash flow. Note also that Hansen (1999) finds two thresholds by neglecting the effect of the intercept and QD dimensions, while once the two dimensions are accounted for there is only one threshold in the CF dimension.

Next we use the thresholds estimated above to generate the threshold regressors and estimate the model by least squares. The estimation results are shown in Table 1.5. Most of the regressors in the intercept and QD dimensions are significant. Despite that, the coefficient of the debt level $D_{i,t}$ is insignificant when $D_{i,t}$ is above the lower threshold.
This means that the influence of debt on investment spending works not through debt itself but rather through other firm characteristics. This is also confirmed by the insignificance of the coefficients on $Q_{i,t} D_{i,t}$. As for the cash-flow dimension, the coefficient decreases in value and becomes insignificant once the debt level is sufficiently high. This indicates that when a firm's debt level is very high, it will indeed not rely on its own cash flow, as highly indebted firms are generally not rich in cash. This finding for the high-debt regime is consistent with the results of previous work such as González et al. (2004) and Hansen (1999). Overall, we have shown that modeling investment spending should account for dimension heterogeneity, and our robust examination of this issue indicates that the sensitivity to cash flow does not rise when the debt level is high.

1.5.3 Robustness Examination with the Mundlak Specification

The above empirical analysis neglects the influence of possible correlations between the fixed effect and the regressors. To examine the robustness of the above analysis, we therefore consider the Mundlak specification in estimation. In particular, following the discussion of (1.21), we add the time averages of the variables for both the QD dimension and the CF dimension, so that the QD dimension now includes $(Q_{i,t-1}, Q_{i,t-1}^2, Q_{i,t-1}^3, D_{i,t-1}, Q_{i,t-1} D_{i,t-1}, \bar{Q}_i, \bar{Q}_{2,i}, \bar{Q}_{3,i}, \overline{DQ}_i)$, where
$$
\bar{Q}_i = \frac{1}{T}\sum_{t=1}^{T} Q_{i,t-1}, \quad \bar{Q}_{2,i} = \frac{1}{T}\sum_{t=1}^{T} Q_{i,t-1}^2, \quad \bar{Q}_{3,i} = \frac{1}{T}\sum_{t=1}^{T} Q_{i,t-1}^3, \quad \overline{DQ}_i = \frac{1}{T}\sum_{t=1}^{T} Q_{i,t-1} D_{i,t-1},
$$
while the CF dimension includes $(CF_{i,t-1}, \overline{CF}_i)$, where
$$
\overline{CF}_i = \frac{1}{T}\sum_{t=1}^{T} CF_{i,t-1}.
$$
We then apply the HQ information criterion to detect the number of thresholds for each dimension; the results are shown in Table 1.6. Compared to the model that ignores the fixed effect, the threshold for the intercept rises to a relatively high level, while one threshold at a low level is added to the QD dimension.
As for the CF dimension, two thresholds appear, at the two ends of the distribution of the debt level.

Based on the estimated numbers of thresholds, we then apply the projection estimation procedure; the results are shown in Table 1.7. Because the time averages are included only to control for the possible correlations, we omit the results for them. It is readily seen that the patterns of Table 1.7 resemble those of Table 1.5 to a large degree. In particular, the pattern still holds that debt has a positive effect on investment spending when the debt level is very low. Moreover, when the debt level is higher than 0.3645, too much debt causes a decrease in investment spending, which is not shown in Table 1.5. As for cash flow, a threshold at a low level appears, below which cash flow's effect on investment spending is lower than above it. Again, we observe that when debt is at a high level, i.e. higher than 0.9142, the coefficient of $CF_{i,t-1}$ becomes insignificant. This is consistent with our previous finding that a firm will not use cash at hand as a source of funding when its financial condition is severe.

1.6 Concluding Remarks

In this paper, we study dimension heterogeneity in threshold models in which the thresholds are heterogeneous across regressors. We provide an orthogonal projection estimator for this model and derive its asymptotic properties. Furthermore, we show that the estimator can also be applied to threshold models with multiple dimensions and multiple thresholds. In addition to decreasing the computational cost, our estimator is more efficient than the standard threshold regression estimator, especially when some thresholds from two different dimensions are very close. Applying our method, we find that the sensitivity of investment spending to cash flow is decreasing in the debt level, which can be taken as empirical evidence for revising the corresponding corporate finance theory.
In the future, it would be interesting to extend our method to nonlinear models such as the probit and logit models; it is expected that such models could be used in credit-card applications to improve the prediction of default probabilities. Another extension is to allow a more complicated structure for the error term. For instance, the recently popular research on interactive fixed effects could be incorporated so that the errors may be cross-sectionally correlated. Besides, our application carries over to any finance research concerned with the influence of debt on economic units, including banks, mutual funds and governments. Researchers may thus consider using our methods to obtain robust estimates of thresholds in these settings, which will improve the efficiency of policy making, especially when the estimated thresholds differ significantly from those currently referred to.

Table 1.1: Simulation results of coefficients for Experiment A

                            δ1                                       δ2
   v      N     mean    bias    rmse    IQR     CP      mean    bias    rmse    IQR     CP
DGP1
   0.2    100   0.334   0.034   0.149   0.120   0.984   0.707   0.007   0.101   0.135   0.954
          200   0.316   0.016   0.062   0.085   0.946   0.698  -0.002   0.069   0.090   0.944
          500   0.302   0.002   0.040   0.053   0.945   0.697  -0.003   0.042   0.054   0.948
          1000  0.301   0.001   0.028   0.038   0.951   0.698  -0.003   0.032   0.043   0.957
   0.5    100   0.328   0.028   0.155   0.126   0.986   0.703   0.003   0.103   0.138   0.949
          200   0.315   0.015   0.063   0.085   0.945   0.696  -0.004   0.071   0.096   0.942
          500   0.302   0.002   0.040   0.053   0.941   0.697  -0.003   0.042   0.053   0.941
          1000  0.301   0.001   0.029   0.039   0.948   0.698  -0.003   0.032   0.044   0.953
   0.8    100   0.322   0.022   0.181   0.134   0.979   0.698  -0.002   0.110   0.148   0.944
          200   0.314   0.014   0.067   0.087   0.952   0.693  -0.007   0.075   0.101   0.974
          500   0.301   0.001   0.041   0.054   0.949   0.697  -0.003   0.043   0.056   0.946
          1000  0.301   0.001   0.030   0.039   0.951   0.697  -0.003   0.033   0.046   0.956
DGP2
   0.2    100   0.348   0.048   0.134   0.137   0.935   0.697  -0.003   0.105   0.146   0.958
          200   0.318   0.018   0.070   0.091   0.934   0.699  -0.001   0.071   0.094   0.955
          500   0.303   0.003   0.043   0.057   0.944   0.702   0.002   0.044   0.057   0.948
          1000  0.302   0.002   0.030   0.042   0.941   0.699  -0.001   0.031   0.041   0.943
   0.5    100   0.344   0.044   0.145   0.144   0.945   0.697  -0.003   0.108   0.144   0.943
          200   0.317   0.017   0.072   0.092   0.948   0.699  -0.001   0.071   0.095   0.948
          500   0.304   0.004   0.044   0.058   0.955   0.701   0.001   0.045   0.060   0.949
          1000  0.303   0.003   0.030   0.040   0.947   0.699  -0.001   0.031   0.042   0.949
   0.8    100   0.340   0.040   0.158   0.144   0.945   0.697  -0.003   0.112   0.141   0.947
          200   0.317   0.017   0.073   0.094   0.945   0.698  -0.002   0.074   0.100   0.958
          500   0.304   0.004   0.045   0.061   0.957   0.700   0.000   0.047   0.063   0.948
          1000  0.303   0.003   0.030   0.042   0.953   0.699  -0.001   0.032   0.043   0.953
DGP3
          100   0.334   0.034   0.136   0.115   0.980   0.707   0.007   0.100   0.133   0.952
          200   0.314   0.014   0.066   0.083   0.945   0.698  -0.002   0.069   0.090   0.952
          500   0.304   0.004   0.040   0.053   0.947   0.697  -0.003   0.042   0.054   0.951
          1000  0.300   0.000   0.028   0.039   0.952   0.698  -0.002   0.032   0.044   0.956
DGP4
          100   0.319   0.019   0.088   0.110   0.943   0.704   0.004   0.097   0.121   0.946
          200   0.310   0.010   0.061   0.083   0.954   0.702   0.002   0.066   0.091   0.950
          500   0.303   0.003   0.038   0.052   0.947   0.698  -0.002   0.046   0.060   0.949
          1000  0.301   0.001   0.027   0.035   0.955   0.697  -0.003   0.033   0.045   0.948

Note: The simulation is based on DGPs 1-4 of experiment A. We report mean estimates, bias, root mean squared error (rmse), IQR (interquartile range, 75th minus 25th percentile) and the 95% coverage probability (CP, equivalent to 1 − size).
37 Table 1.2: Simulation results of threshold parameters for Experiment A g 1 g 2 v N mean bias rmse IQR mean bias rmse IQR DGP1 0.2 100 1.030 0.030 0.499 0.246 0.500 0.000 0.101 0.057 200 1.013 0.013 0.196 0.113 0.500 0.000 0.040 0.031 500 1.005 0.005 0.092 0.038 0.500 0.000 0.017 0.019 1000 1.000 0.000 0.031 0.023 0.500 0.000 0.012 0.016 0.5 100 1.036 0.036 0.506 0.271 0.502 0.002 0.099 0.057 200 1.018 0.018 0.192 0.116 0.501 0.001 0.045 0.031 500 1.004 0.004 0.092 0.040 0.500 0.000 0.017 0.019 1000 0.999 -0.001 0.031 0.024 0.500 0.000 0.012 0.016 0.8 100 1.025 0.025 0.555 0.315 0.506 0.006 0.115 0.062 200 1.014 0.014 0.240 0.136 0.502 0.002 0.048 0.033 500 1.004 0.004 0.098 0.046 0.500 0.000 0.019 0.020 1000 0.998 -0.002 0.034 0.026 0.500 0.000 0.012 0.016 DGP2 0.2 100 0.614 -0.386 1.106 0.669 0.375 -0.125 0.523 0.223 200 0.857 -0.143 0.600 0.210 0.440 -0.060 0.271 0.107 500 0.988 -0.012 0.164 0.064 0.485 -0.015 0.105 0.035 1000 0.995 -0.005 0.076 0.040 0.496 -0.005 0.040 0.022 0.5 100 0.605 -0.395 1.094 0.665 0.367 -0.133 0.522 0.250 200 0.887 -0.113 0.584 0.180 0.440 -0.069 0.256 0.100 500 0.977 -0.023 0.188 0.071 0.485 -0.015 0.103 0.035 1000 0.997 -0.003 0.066 0.040 0.497 -0.003 0.029 0.021 0.8 100 0.634 -0.366 1.060 0.608 0.370 -0.130 0.527 0.216 200 0.861 -0.139 0.626 0.224 0.446 -0.054 0.260 0.094 500 0.972 -0.028 0.219 0.074 0.480 -0.020 0.110 0.036 1000 0.999 -0.001 0.064 0.042 0.497 -0.003 0.036 0.021 DGP3 100 1.029 0.029 0.466 0.256 0.499 -0.001 0.089 0.057 200 1.002 0.002 0.194 0.095 0.499 -0.001 0.041 0.029 500 1.005 0.005 0.086 0.040 0.500 0.000 0.017 0.019 1000 0.998 -0.002 0.031 0.024 0.500 0.000 0.012 0.016 DGP4 100 0.737 -0.263 0.935 0.457 0.511 0.011 0.123 0.067 200 0.912 -0.088 0.490 0.192 0.501 0.001 0.043 0.034 500 0.985 -0.015 0.168 0.072 0.500 0.000 0.018 0.020 1000 0.998 -0.003 0.061 0.042 0.499 -0.001 0.013 0.018 Note: The simulation is based on DGP 1-4 of experiment A. 
We provide mean estimates, bias, root mean squared error and IQR (inter-quantile range, 75% 25% percentile). 38 Table 1.3: Simulation results of threshold numbers for Experiment B 1st dimension 2nd dimension v N BIC AIC HQ BIC2 BIC3 BIC AIC HQ BIC2 BIC3 DGP5 0.2 100 11% 60% 40% 0% 0% 88% 85% 95% 27% 2% 200 61% 85% 90% 3% 0% 100% 92% 100% 95% 57% 500 100% 90% 99% 78% 17% 100% 94% 100% 100% 100% 1000 100% 84% 99% 100% 97% 100% 93% 100% 100% 100% 0.5 100 10% 57% 34% 0% 0% 85% 84% 94% 21% 1% 200 53% 85% 87% 2% 0% 100% 91% 99% 90% 42% 500 100% 91% 100% 66% 10% 100% 94% 100% 100% 100% 1000 100% 86% 99% 100% 92% 100% 94% 100% 100% 100% 0.8 100 4% 50% 25% 0% 0% 76% 85% 92% 14% 1% 200 4% 50% 25% 0% 0% 100% 91% 100% 78% 22% 500 98% 91% 99% 45% 2% 100% 95% 100% 100% 100% 1000 100% 89% 100% 99% 68% 100% 94% 100% 100% 100% DGP6 0.2 100 2% 40% 14% 0% 0% 4% 96% 97% 19% 1% 200 21% 84% 63% 0% 0% 100% 96% 100% 93% 45% 500 92% 78% 96% 23% 1% 100% 92% 100% 100% 100% 1000 97% 63% 80% 89% 36% 100% 87% 99% 100% 100% 0.5 100 1% 40% 11% 0% 0% 77% 96% 94% 11% 0% 200 17% 81% 55% 0% 0% 100% 95% 100% 87% 31% 500 88% 80% 98% 16% 0% 100% 96% 100% 100% 100% 1000 98% 65% 82% 86% 24% 100% 94% 100% 100% 100% 0.8 100 1% 36% 10% 0% 0% 70% 94% 91% 8% 0% 200 13% 79% 51% 0% 0% 94% 96% 100% 77% 19% 500 82% 77% 95% 9% 0% 100% 96% 100% 100% 100% 1000 98% 65% 82% 77% 15% 100% 95% 100% 100% 100% DGP7 100 86% 76% 91% 29% 4% 88% 86% 95% 25% 3% 200 100% 82% 98% 93% 55% 100% 92% 99% 92% 45% 500 99% 71% 94% 100% 100% 100% 93% 100% 100% 100% 1000 95% 55% 81% 100% 100% 100% 92% 100% 100% 100% DGP8 100 0% 12% 2% 0% 0% 5% 36% 18% 0% 0% 200 0% 21% 2% 0% 0% 19% 70% 45% 1% 0% 500 2% 64% 22% 0% 0% 73% 96% 94% 14% 2% 1000 17% 78% 69% 0% 0% 97% 95% 100% 56% 13% Note: The simulation is based on DGP 5-8 of experiment B. 
Five information criteria are considered, namely BIC, AIC, HQ, BIC2 and BIC3, with penalties $\lambda_N = \log N$, $\lambda_N = 2$, $\lambda_N = 2\log\log N$, $\lambda_N = 2\log N$ and $\lambda_N = 3\log N$, respectively. The number of thresholds is selected from the pool $\{0, 1, 2, 3\}$, and the table reports the frequency with which the true number of thresholds is selected.

Table 1.4: The number and values of thresholds for each dimension

                       Intercept   QD       CF
Number of thresholds   1           2        1
1st threshold          0.0059      0.1842   0.7997
2nd threshold          NA          0.3125   NA
3rd threshold          NA          NA       NA

Note: The table reports the number of thresholds selected for each dimension of the investment spending model. The number of thresholds is selected from the pool $\{0, 1, 2, 3\}$ by the HQ information criterion.

Table 1.5: Estimation results of investment spending model

Dimension              Status                                Value    t-statistic   P-value
Intercept              D_{i,t-1} ≤ 0.0059                    0.038    12.247        0.000
                       D_{i,t-1} > 0.0059                    0.050    18.985        0.000
Q_{i,t-1}              D_{i,t-1} ≤ 0.1842                    0.020    8.131         0.000
                       0.1842 < D_{i,t-1} ≤ 0.3125           0.044    4.696         0.000
                       D_{i,t-1} > 0.3125                    0.047    9.248         0.000
Q²_{i,t-1}             D_{i,t-1} ≤ 0.1842                   -0.002   -4.570         0.000
                       0.1842 < D_{i,t-1} ≤ 0.3125          -0.012   -4.281         0.000
                       D_{i,t-1} > 0.3125                   -0.007   -3.522         0.000
Q³_{i,t-1}             D_{i,t-1} ≤ 0.1842                    0.000    3.456         0.001
                       0.1842 < D_{i,t-1} ≤ 0.3125           0.001    2.539         0.011
                       D_{i,t-1} > 0.3125                    0.000    1.886         0.059
D_{i,t-1}              D_{i,t-1} ≤ 0.1842                    0.049    2.303         0.021
                       0.1842 < D_{i,t-1} ≤ 0.3125           0.013    0.912         0.362
                       D_{i,t-1} > 0.3125                    0.002    0.377         0.706
Q_{i,t-1} D_{i,t-1}    D_{i,t-1} ≤ 0.1842                    0.016    1.396         0.163
                       0.1842 < D_{i,t-1} ≤ 0.3125          -0.015   -0.459         0.646
                       D_{i,t-1} > 0.3125                   -0.003   -0.651         0.515
CF_{i,t-1}             D_{i,t-1} ≤ 0.7997                    0.059    13.126        0.000
                       D_{i,t-1} > 0.7997                    0.015    0.694         0.488

Note: The table reports the estimation results for each dimension of the investment spending model given the number of thresholds, which is determined by the HQ information criterion.
Table 1.6: The number and values of thresholds for each dimension

                       Intercept   QD       CF
Number of thresholds   1           3        2
1st threshold          0.0584      0.0072   0.0521
2nd threshold          NA          0.1842   0.9142
3rd threshold          NA          0.3645   NA

Note: The table reports the number of thresholds selected for each dimension of the investment spending model where the fixed effect satisfies the Mundlak specification. The number of thresholds is selected from the pool $\{0, 1, 2, 3\}$ by the HQ information criterion.

Table 1.7: Estimation results of investment spending model

Dimension              Status                                Value    t-statistic   P-value
Intercept              D_{i,t-1} ≤ 0.0584                    0.043    10.216        0.000
                       D_{i,t-1} > 0.0584                    0.044    10.550        0.000
Q_{i,t-1}              D_{i,t-1} ≤ 0.0072                    0.030    4.616         0.000
                       0.0072 < D_{i,t-1} ≤ 0.1842           0.031    6.175         0.000
                       0.1842 < D_{i,t-1} ≤ 0.3645           0.055    5.363         0.000
                       D_{i,t-1} > 0.3645                    0.074    8.765         0.000
Q²_{i,t-1}             D_{i,t-1} ≤ 0.0072                   -0.003   -2.744         0.006
                       0.0072 < D_{i,t-1} ≤ 0.1842          -0.005   -4.832         0.000
                       0.1842 < D_{i,t-1} ≤ 0.3645          -0.016   -5.662         0.000
                       D_{i,t-1} > 0.3645                   -0.014   -4.841         0.000
Q³_{i,t-1}             D_{i,t-1} ≤ 0.0072                    0.000    1.836         0.066
                       0.0072 < D_{i,t-1} ≤ 0.1842           0.000    4.183         0.000
                       0.1842 < D_{i,t-1} ≤ 0.3645           0.001    3.936         0.000
                       D_{i,t-1} > 0.3645                    0.001    3.610         0.000
D_{i,t-1}              D_{i,t-1} ≤ 0.0072                    3.042    1.781         0.075
                       0.0072 < D_{i,t-1} ≤ 0.1842          -0.044   -1.277         0.202
                       0.1842 < D_{i,t-1} ≤ 0.3645          -0.018   -0.783         0.434
                       D_{i,t-1} > 0.3645                   -0.020   -1.906         0.057
Q_{i,t-1} D_{i,t-1}    D_{i,t-1} ≤ 0.0072                   -1.554   -2.579         0.010
                       0.0072 < D_{i,t-1} ≤ 0.1842          -0.003   -0.158         0.875
                       0.1842 < D_{i,t-1} ≤ 0.3645           0.017    0.637         0.524
                       D_{i,t-1} > 0.3645                   -0.005   -0.708         0.479
CF_{i,t-1}             D_{i,t-1} ≤ 0.0521                    0.034    3.033         0.002
                       0.0521 < D_{i,t-1} ≤ 0.9142           0.081    11.012        0.000
                       D_{i,t-1} > 0.9142                    0.010    0.328         0.743

Note: The table reports the estimation results for each dimension of the investment spending model where the fixed effect satisfies the Mundlak specification. The number of thresholds is selected by the HQ information criterion.
Chapter 2

Factor Dimension Detection in Panel Interactive Effects Models

2.1 Introduction

In this chapter¹ we study the panel interactive effects model
$$ y_{it} = x_{it}'\beta + v_{it}, \qquad (2.1) $$
$$ v_{it} = \gamma_i' f_t + u_{it}, \qquad (2.2) $$
which has become popular in empirical analysis because it provides a parsimonious way to control for the impact on the estimation of the slope coefficients $\beta$ of possible correlation between $x_{it}$ and the unobserved error $v_{it}$, across $i$ and over $t$ (e.g., Xu (2017), Hsiao and Zhou (2019)). Here $y_{it}$ denotes the scalar observation on the $i$th cross-sectional unit at time $t$, for $i = 1, \ldots, N$ and $t = 1, \ldots, T$; $x_{it}$ is a $K \times 1$ vector of observed individual-specific regressors on the $i$th cross-sectional unit at time $t$; and $\beta$ is a $K \times 1$ vector of unknown parameters. We assume the composite scalar error $v_{it}$ can be decomposed as the $p$-dimensional unobserved individual effects $\gamma_i$ times the $p$-dimensional time-varying effects $f_t$, plus the scalar idiosyncratic error $u_{it}$, which is assumed independent of $(x_{it}, \gamma_i, f_t)$. The dimension of the factors, $p$, is unknown to researchers.

Since the dimension $p$ of the factor structure is usually unknown, the estimation of (2.1)-(2.2) requires simultaneously estimating $(\beta, \gamma_i, f_t)$. However, simultaneous estimation of the panel interactive effects model is computationally complicated. A feasible estimation procedure suggested by Bai (2009) is to iteratively estimate $\beta$ conditional on an assumed dimension, say $p^{(j)}$, and then identify $p$ conditional on $\hat{\beta}^{(j)}$, recursively until convergence (e.g., Bai (2009), Moon and Weidner (2015, 2017, 2018)). When both $N$ and $T$ are large, as $(N, T) \to \infty$, Moon and Weidner (2015) have shown that as long as the specified dimension is greater than $p$, the Bai (2009) least squares estimator remains consistent and asymptotically normally distributed.

¹ This chapter is based on Cheng Hsiao, Yimeng Xie and Qiankun Zhou (2021).
However, as shown by Lu and Su (2016), the finite-sample properties of the Bai (2009) least squares estimator are greatly affected when one conditions on a factor structure of dimension greater than $p$. In this note, we propose an orthogonal projection approach that bypasses the issues arising from the joint estimation of $\beta$ and $p$.

The rest of the paper is organized as follows. Section 2.2 introduces the Bai and Ng (2002) model selection criterion for dimension determination and its finite-sample modifications. Section 2.3 presents the model and describes a recursively iterating method. Section 2.4 introduces the orthogonal projection approach. Section 2.5 provides simulation results to demonstrate the feasibility of using the orthogonal projection method to implement the BN model selection criterion. Concluding remarks follow, and the proof of consistency is relegated to the online appendix.

2.2 The Bai and Ng (2002) Factor Model Dimension Determination Criterion and Its Finite Sample Modifications

When $v_{it}$ is observed, we consider a factor model of the form
$$ v_{it} = \gamma_i' f_t + u_{it}, \quad i = 1, \ldots, N; \; t = 1, \ldots, T, \qquad (2.3) $$
where $\gamma_i$ and $f_t$ represent the $p$-dimensional factor loadings and unobserved factors, respectively, and $u_{it}$ is an idiosyncratic error term. Let $v_i = (v_{i1}, \ldots, v_{iT})'$ be a $T \times 1$ vector, $F = (f_1, \ldots, f_T)'$ be a $T \times p$ matrix of factors and $\Gamma = (\gamma_1, \ldots, \gamma_N)'$ be an $N \times p$ matrix of factor loadings. We assume both $F$ and $\Gamma$ are of rank $p$, and that model (2.3) satisfies the assumptions in Bai and Ng (2002).

When $(N, T) \to \infty$, Bai and Ng (2002) (BN) suggest selecting the dimension $p$ in model (2.3) using an information criterion (IC) of the form
$$ \min_k IC(k) = \log V(k) + k\, g(N, T), \qquad (2.4) $$
for $k = 1, \ldots, p_{\max}$ with an a priori chosen $p_{\max}$, where
$$ V(k) = \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T} \hat{u}_{it}^2(k), \qquad (2.5) $$
with $\hat{u}_{it}(k) = v_{it} - \hat{\gamma}_i^{(k)\prime} \hat{f}_t^{(k)}$, and $\hat{f}_t^{(k)}$ is a $k \times 1$ vector.
The $k \times T$ matrix $\hat{F}^{(k)\prime} = (\hat{f}_1^{(k)}, \ldots, \hat{f}_T^{(k)})$ is $\sqrt{T}$ times the $k$ eigenvectors corresponding to the $k$ largest eigenvalues of the $T \times T$ matrix
$$ \frac{1}{NT} \sum_{i=1}^{N} v_i v_i', \qquad (2.6) $$
where $v_i = (v_{i1}, \ldots, v_{iT})'$, and
$$ \hat{\Gamma}^{(k)\prime} = \frac{1}{T} \hat{F}^{(k)\prime} V, \qquad (2.7) $$
where $\hat{\Gamma}^{(k)\prime} = (\hat{\gamma}_1^{(k)}, \ldots, \hat{\gamma}_N^{(k)})$ and $V = (v_1, \ldots, v_N)$. BN suggest three different forms of the penalty function $g(N, T)$ in (2.4):
$$ IC_1(k) = \log V\big(k, \hat{F}^{(k)}\big) + k\, \frac{N+T}{NT} \log\!\left(\frac{NT}{N+T}\right), \qquad (2.8) $$
$$ IC_2(k) = \log V\big(k, \hat{F}^{(k)}\big) + k\, \frac{N+T}{NT} \log C_{NT}^2, \qquad (2.9) $$
$$ IC_3(k) = \log V\big(k, \hat{F}^{(k)}\big) + k\, \frac{\log C_{NT}^2}{C_{NT}^2}, \qquad (2.10) $$
where $C_{NT} = \min\{\sqrt{N}, \sqrt{T}\}$. For each information criterion, the number of factors is estimated as
$$ \hat{p}_l = \arg\min_{0 \le k \le p_{\max}} IC_l(k), \quad l = 1, 2, 3, \qquad (2.11) $$
where $p_{\max}$ is a pre-specified positive integer. When $(N, T) \to \infty$, Bai and Ng (2002) show that (2.11) consistently identifies the dimension of the factor structure, $p$.

Alessi et al. (2010) introduce a multiplicative tuning constant in the penalty of the Bai and Ng (2002) information criterion. They suggest two "improved/modified" information criteria (hereafter ABC1 and ABC2, respectively):
$$ IC_{1,c}(k) = \log\Big[V\big(k, \hat{F}^{(k)}\big)\Big] + c\, k\, \frac{N+T}{NT} \log\!\left(\frac{NT}{N+T}\right), \quad c > 0, \qquad (2.12) $$
$$ IC_{2,c}(k) = \log\Big[V\big(k, \hat{F}^{(k)}\big)\Big] + c\, k\, \frac{N+T}{NT} \log C_{NT}^2, \quad c > 0. \qquad (2.13) $$
The estimated number of factors is now also a function of $c$ and, depending on the chosen criterion, is given by $\hat{p}_{c,(N,T)} = \arg\min_{0 \le k \le p_{\max}} IC_{1,c}(k)$ or $\hat{p}_{c,(N,T)} = \arg\min_{0 \le k \le p_{\max}} IC_{2,c}(k)$. To implement the improved approach, Alessi et al. (2010) suggest considering subsamples of sizes $(N_j, T_j)$ with $N_0 = 0 < N_1 < \cdots < N_J = N$ and $T_0 = 0 < T_1 < \cdots < T_J = T$. For any $j$, we can compute $\hat{p}_{c,(T_j,N_j)}$, which is a non-increasing function of $c$. Due to the monotonicity of $\hat{p}_{c,(T_j,N_j)}$ as a function of $c$, between the "small" underpenalizing values of $c$ and the "large" overpenalizing values of $c$, Alessi et al.
(2010) argue that there must exist a range of "moderate" values of $c$ such that $\hat{p}_{c,(T_j,N_j)}$ is a stable function of the subsample size $(N_j, T_j)$. The stability with respect to sample size can be measured by the empirical variance of $\hat{p}_{c,(T_j,N_j)}$ as a function of $j$, i.e.,
$$ S_c = \frac{1}{J} \sum_{j=1}^{J} \left( \hat{p}_{c,(T_j,N_j)} - \frac{1}{J} \sum_{l=1}^{J} \hat{p}_{c,(T_l,N_l)} \right)^2. \qquad (2.14) $$
Consequently, the search over $c$ can be made automatic by considering the mapping $c \mapsto S_c$ and by choosing
$$ \hat{p}_{c,(N,T)} = \hat{p}_{\hat{c},(N,T)}, \qquad (2.15) $$
where $\hat{c}$ belongs to an interval of $c$ implying $S_c = 0$.

2.3 The Model and the Recursive Iterating Procedure to Implement the Bai and Ng (2002) Information Criterion

Let $y_i = (y_{i1}, \ldots, y_{iT})'$, $X_i = (x_{i1}, \ldots, x_{iT})'$, $F = (f_1, \ldots, f_T)'$ and $u_i = (u_{i1}, \ldots, u_{iT})'$ be $T \times 1$, $T \times K$, $T \times p$ and $T \times 1$ vectors or matrices, respectively, and let $y_t = (y_{1t}, \ldots, y_{Nt})'$, $X_t = (x_{1t}, \ldots, x_{Nt})'$, $\Gamma = (\gamma_1, \ldots, \gamma_N)'$ and $u_t = (u_{1t}, \ldots, u_{Nt})'$ be $N \times 1$, $N \times K$, $N \times p$ and $N \times 1$ vectors or matrices, respectively. Model (2.1)-(2.2) can be stacked as either
$$ y_i = X_i \beta + F \gamma_i + u_i, \quad i = 1, 2, \ldots, N, \qquad (2.16) $$
or
$$ y_t = X_t \beta + \Gamma f_t + u_t, \quad t = 1, \ldots, T. \qquad (2.17) $$
We assume the assumptions of Bai (2009) hold for model (2.16) or (2.17).

To determine the dimension of the unobservable factors $f_t$ in model (2.1)-(2.2), a commonly used procedure is to recursively iterate the following steps until convergence:²

² Pesaran (2006) has proposed a common correlated effects (CCE) estimator for $\beta$ that does not require knowledge of the factor structure. However, the implementation of the CCE requires the data generating process of $x_{it}$ to satisfy certain conditions, such as
$$ x_{it} = \Xi_i f_t + \epsilon_{it}, \qquad (2.18) $$
so that
$$ z_{it} = \binom{y_{it}}{x_{it}} = \binom{(\beta'\Xi_i + \gamma_i') f_t + u_{it} + \beta'\epsilon_{it}}{\Xi_i f_t + \epsilon_{it}} = C_i f_t + \binom{u_{it} + \beta'\epsilon_{it}}{\epsilon_{it}}, \qquad (2.19) $$

Step 1.
Conditional on $\hat{\beta}^{(j)}$ and for each $k \in \{1, 2, \ldots, p_{\max}\}$, find the $k$ ($T \times 1$) eigenvectors $\hat{F}_k^{(j+1)} = (\hat{f}_{1(k)}^{(j+1)}, \ldots, \hat{f}_{T(k)}^{(j+1)})'$ that correspond to the $k$ largest eigenvalues of the matrix
$$ N^{-1} \sum_{i=1}^{N} \big(y_i - X_i \hat{\beta}^{(j)}\big)\big(y_i - X_i \hat{\beta}^{(j)}\big)', \qquad (2.21) $$
and calculate
$$ \hat{\beta}_k^{(j+1)} = \left( \sum_{i=1}^{N} X_i' M_k^{(j+1)} X_i \right)^{-1} \left( \sum_{i=1}^{N} X_i' M_k^{(j+1)} y_i \right), \qquad (2.22) $$
where $M_k^{(j+1)} = I_T - \hat{F}_k^{(j+1)} \big(\hat{F}_k^{(j+1)\prime} \hat{F}_k^{(j+1)}\big)^{-1} \hat{F}_k^{(j+1)\prime}$, and
$$ \hat{v}_{i(k)}^{(j+1)} = y_i - X_i \hat{\beta}_k^{(j+1)}, \quad i = 1, \ldots, N, \qquad (2.23) $$
$$ \hat{\Gamma}_k^{(j+1)\prime} = \frac{1}{T} \hat{F}_k^{(j+1)\prime} \hat{V}_k^{(j+1)} = \big(\hat{\gamma}_{1(k)}^{(j+1)}, \ldots, \hat{\gamma}_{N(k)}^{(j+1)}\big), \qquad (2.24) $$
for each $k \in \{1, 2, \ldots, p_{\max}\}$, where $\hat{V}_k^{(j+1)} = \big(\hat{v}_{1(k)}^{(j+1)}, \ldots, \hat{v}_{N(k)}^{(j+1)}\big)$.

Step 2. Compute
$$ \hat{u}_{it(k)}^{(j+1)} = \hat{v}_{it(k)}^{(j+1)} - \hat{\gamma}_{i(k)}^{(j+1)\prime} \hat{f}_{t(k)}^{(j+1)}, \qquad (2.25) $$
and
$$ V^{(j+1)}(k) = \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T} \hat{u}_{it(k)}^{(j+1)2}, \qquad (2.26) $$
for each $k \in \{1, 2, \ldots, p_{\max}\}$.

Step 3. Substitute (2.26) into (2.8), (2.9) or (2.10) to find $\hat{p}^{(j+1)}$, and let $\hat{\beta}^{(j+1)} = \hat{\beta}_{\hat{p}^{(j+1)}}^{(j+1)}$.

Step 4. Repeat Steps 1-3 until $\hat{p}^{(j)} = \hat{p}^{(j+1)}$, or stop at a pre-specified maximum number of iterations.

(Footnote 2, continued) where $\mathrm{rank}(C_i) = p$ and
$$ \underset{N\to\infty}{\text{plim}}\ \frac{1}{N} \sum_{i=1}^{N} \epsilon_{it} = 0. \qquad (2.20) $$
Since not all data generating processes of $x_{it}$ satisfy these conditions — for instance, $x_{it}$ could be orthogonal to $f_t$ — in this paper we consider identifying the factor dimension without imposing the Pesaran (2006) conditions on $x_{it}$.

The recursively iterating procedure can lead to consistent selection of $p$ as $(N, T) \to \infty$. However, the computation is very intensive: not only do Steps 1-3 require the computation of (2.21)-(2.26) for each $k \in \{1, 2, \ldots, p_{\max}\}$, the steps must also be iterated.

2.4 Orthogonal Projection Approach

The consistency and asymptotic normality of the Bai (2009) type least squares estimator as $(N, T) \to \infty$ is based on the simultaneously estimated $(\beta, \Gamma, F)$. However, simultaneously estimating $(\beta, \Gamma, F)$ is computationally very complicated.
Nor can one be sure that the convergent solution is a global minimum, because the objective function is nonlinear in $(\beta, \Gamma, F)$. The recursive estimation procedure is proposed to simplify the computation. Although Moon and Weidner (2015) show that the recursive procedure yields a consistent estimator (2.22) as long as $\hat{p}^{(j)} \ge p$, apart from the finite-sample problems pointed out by Lu and Su (2016) there is also an issue with the initial estimator when $x_{it}$ and $v_{it}$ are correlated. Jiang et al. (2020) show that if the initial estimator satisfies $\hat{\beta}^{(0)} - \beta = O_p(1)$, the recursive estimation method may not converge at all, no matter how many iterations are conducted.

In this paper, we propose an orthogonal projection approach that bypasses the issues of recursively estimating $(\beta, \Gamma, F)$. Consider model (2.16); using the projection matrix
$$ M_i = I_T - X_i (X_i' X_i)^{-1} X_i', \qquad (2.27) $$
we can eliminate $X_i \beta$ from (2.16):
$$ M_i y_i = M_i F \gamma_i + M_i u_i. \qquad (2.28) $$
However, in the transformed model (2.28), different individuals $i$ have different $X_i$, hence a different orthogonal projection matrix $M_i$ and a different $M_i F$ for each $i$. We therefore need a single orthogonal transformation matrix $M$ that eliminates $X_i \beta$ from model (2.16) for every $i$ while transforming $F$ into $MF$, identically across $i$. We propose to use the projection matrix
$$ M_T = I_T - \tilde{X} (\tilde{X}'\tilde{X})^{-} \tilde{X}', \qquad (2.29) $$
where $\tilde{X} = (X_1, \ldots, X_N)$ and $A^{-}$ denotes the Moore-Penrose generalized inverse of a matrix $A$.³ The projection matrix (2.29) is orthogonal to every $X_i$, i.e.,
$$ M_T X_i = 0 \quad \text{for } i = 1, \ldots, N. \qquad (2.30) $$
Multiplying both sides of (2.16) by $M_T$ yields
$$ M_T y_i = M_T F \gamma_i + M_T u_i, \quad i = 1, \ldots, N, \qquad (2.31) $$
where $M_T F$ is identical for all $i = 1, \ldots, N$. For notational simplicity, let $y_i^* = M_T y_i$, $F^* = M_T F$ and $u_i^* = M_T u_i$; then model (2.31) can be written compactly as
$$ y_i^* = F^* \gamma_i + u_i^*, \quad i = 1, \ldots, N. \qquad (2.32) $$
The standard assumption for the panel interactive effects model (2.16) is
that x it andG 0 i f t make independent contributions to the outcomes, y it ; namely, for any K 1 vector b such that b 0 b= 1; b 0 plim (N;T)!¥ 1 NT T å t=1 X 0 t M G X t ! b6= 0; (2.33) and b 0 plim (N;T)!¥ 1 NT N å i=1 X 0 i M F X i ! b6= 0; (2.34) where M G = I N G(G 0 G) 1 G 0 and M F = I T F(F 0 F) 1 F 0 (e.g., Bai (2009), Moon and Weidner (2015, 2017)): 3 It should be noted that the properties of the projection matrix M T is invariant to the choice of ˜ X 0 ˜ X (e.g., page 374 of Borowiak (2001). 49 When 1 T F 0 F converges to a nonsingular matrix of rank p; one can then apply Bai and Ng (2002) information criterion to (2.32) to identify the rank of the factor structure. However, the requirement for rank 1 T F 0 F = p requires rank(M T ) p: If rank ˜ X ˜ X 0 ˜ X ˜ X 0 = T; then (M T ) is a null matrix and thus (2.32) does not exist. When M T is a null matrix, we may consider using the orthogonal projection matrix M N = I N ˇ X ˇ X 0 ˇ X ˇ X 0 ; (2.35) to eliminate X t b from (2.17), where ˇ X =(X 1 ;:::;X T ). Multiplying (2.35) to (2.17) for each t yields ˇ y t = ˇ G f t + ˇ u t ; t= 1;:::;T: (2.36) where ˇ y t = M N y t ; ˇ G= M N G and ˇ u t = M N u t . Provided 1 N ˇ G 0 ˇ G converges to an p p nonsingular matrix, Bai and Ng (2002) information criterion can again be applied to model (2.36) to select the dimension of factor structure. The orthogonal projection approach avoids the computationally intensive steps 1-3 in the re- cursively iterating procedure, nor does it need iteration. 
However, to implement the orthogonal projection approach, it is important to distinguish between the cases $T > N$ and $N > T$, because the stability of the eigenvalues and eigenvectors depends on the stability of the $N \times N$ or $T \times T$ estimated covariance matrix of $y_i^*$ or $\check{y}_t$. We know for sure that if $\frac{T}{N} > K$, $M_T$ will have rank greater than $p$, and that if $\frac{T}{N} < \frac{1}{K}$, $M_N$ will have rank greater than $p$, as $(N, T) \to \infty$. If $\frac{1}{K} \le \frac{T}{N} \le K$, we cannot be sure whether $\mathrm{rank}(M_T) > K$ or $\mathrm{rank}(M_N) > K$. A quick check is to compute whether
$$ \mathrm{trace}(M_T) > \bar{p} \quad \text{or} \quad \mathrm{rank}\big(\tilde{X}(\tilde{X}'\tilde{X})^{-}\tilde{X}'\big) \le T - \bar{p}, \qquad (2.37) $$
where $\bar{p} \ge p$ denotes the maximum rank assumed for the dimension of the factor structure in the model. If (2.37) is not satisfied, we can check whether
$$ \mathrm{trace}(M_N) > \bar{p} \quad \text{or} \quad \mathrm{rank}\big(\check{X}(\check{X}'\check{X})^{-}\check{X}'\big) \le N - \bar{p}, \qquad (2.38) $$
before applying the orthogonal projection approach to (2.16) or (2.17). When either (2.37) or (2.38) holds, the standard assumptions for panel interactive effects models (e.g., the assumptions made by Bai (2009)) ensure $\mathrm{rank}\big(\frac{1}{T}F^{*\prime}F^*\big) = p$ or $\mathrm{rank}\big(\frac{1}{N}\check{\Gamma}'\check{\Gamma}\big) = p$. If neither (2.37) nor (2.38) holds, one has to resort to the computationally cumbersome joint estimation of $(\beta, \Gamma, F)$, bearing in mind that the convergence of the recursive iteration between $\hat{\beta}^{(j)}$, $\hat{F}^{(j+1)}$ and $\hat{\Gamma}^{(j+1)}$ depends critically on the initial estimator $\hat{\beta}^{(0)}$ (Jiang et al. (2020)). However, most panel empirical applications use data sets that satisfy $\frac{N}{T} > K$, because typical econometric model specifications involve only a finite $K$, while $N$ is substantially larger than $T$. For instance, the well-known annual National Longitudinal Surveys of Young Women and Mature Women (NLSW), started in 1968, consisted of over 5,000 individuals and ended in 2003; the National Longitudinal Surveys of Young Men and Older Men (NLSM), started in 1966, consisted of over 5,000 individuals and ended in 1981; and the Panel Study of Income Dynamics (PSID) began in 1968 with over 18,000 individuals.
The Household, Income and Labour Dynamics in Australia (HILDA) Survey began in 2001 with over 17000 individuals.

2.5 Simulation

In this section, we consider the feasibility of the orthogonal projection method for implementing the BN information criterion to select the factor dimension in panel interactive effects models. For comparison, we also consider the recursively iterating method discussed above and the method suggested by Bai (2009),

$$ CP(k) = \hat{\sigma}^2(k) + \hat{\sigma}^2(p_{\max}) \left[ k(N+T) - k^2 \right] \frac{\log(NT)}{NT}, \qquad (2.39) $$

$$ IC(k) = \log\left(\hat{\sigma}^2(k)\right) + \left[ k(N+T) - k^2 \right] \frac{\log(NT)}{NT}, \qquad (2.40) $$

where $\hat{\sigma}^2(k) = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} \hat{v}_{it}^{(k)2}$ and $\hat{\sigma}^2(p_{\max}) = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} \hat{v}_{it}^{(p_{\max})2}$, and to select the factor dimension as

$$ \hat{p} = \arg\min_{0 \leq k \leq p_{\max}} CP(k), \quad \text{or} \quad \hat{p} = \arg\min_{0 \leq k \leq p_{\max}} IC(k). \qquad (2.41) $$

We consider the following data generating process (DGP) as the base model for investigation.

DGP: Model with three factors [4]

$$ y_{it} = x_{1,it}\beta_1 + x_{2,it}\beta_2 + \lambda_{1,i}f_{1,t} + \lambda_{2,i}f_{2,t} + \lambda_{3,i}f_{3,t} + u_{it}, \qquad (2.42) $$

and

$$ x_{k,it} = \alpha_{x,ki} + \gamma_{k,1}f_{1t} + \gamma_{k,2}f_{2t} + \eta_{k,it}, \quad k = 1, 2. $$

For this DGP, we assume that $\alpha_{x,1i}, \alpha_{x,2i} \sim iid\ N(0,1)$ and $\eta_{k,it} = \rho_{ki}\eta_{k,it-1} + v_{k,it}$ with $\rho_{ki} \sim iid\ U(0.1, 0.9)$ and $v_{k,it} \sim iid\ (0, \sigma^2_{vk,i})$ for $k = 1, 2$, where $\sigma^2_{v1,i}, \sigma^2_{v2,i}$ are independent draws from $0.5(1 + \chi^2(2))$. The common factors $f_{j,t}$ are i.i.d. draws from $N(0,1)$ for $j = 1, 2, 3$, and the factor loadings are set as

• $\lambda_{1,i} \sim iid\ N(1,1)$, $\lambda_{2,i} \sim iid\ N(1,0.5)$, $\lambda_{3,i} \sim iid\ N(1,2)$.
• $\gamma_{1,1} \sim iid\ U(0,1)$, $\gamma_{1,2} \sim iid\ U(0,2)$, and $\gamma_{1,3} \sim iid\ U(1,2)$.
• $\gamma_{2,1} \sim iid\ U(-1,1)$, $\gamma_{2,2} \sim iid\ U(1,2)$, and $\gamma_{2,3} \sim iid\ U(0,1)$.

For the errors in (2.42), we consider the following three specifications.

[4] Additional simulation results for a model with 5 factors are available upon request.
Case 1: Heteroscedastic errors. We assume the errors $u_{it} \sim iid\ N(0, \sigma^2_{u,i})$, where $\sigma^2_{u,i}$ are independent draws from $0.5(1 + \chi^2(2))$.

Case 2: Serially correlated errors. We assume the errors are generated as

$$ u_{it} = \rho_{ui}u_{it-1} + \varepsilon_{it}, \qquad (2.43) $$

where $\rho_{ui} \sim IID\ U(0.1, 0.9)$ and $\varepsilon_{it} \sim iid\ N(0, \sigma^2_{\varepsilon,i})$, with $\sigma^2_{\varepsilon,i}$ independent draws from $0.5(1 + \chi^2(2))$.

Case 3: Weakly dependent errors. We assume the errors are generated as

$$ u_{it} = (1 + b^2)v_{it} + b v_{i+1,t} + b v_{i-1,t}, \qquad (2.44) $$

where $b = 0.2$ and $v_{it} \sim iid\ N(0, \sigma^2_{v,i})$, where $\sigma^2_{v,i}$ are independent draws from $0.5(1 + \chi^2(2))$.

For the DGP described above, the true values of $\beta_1$ and $\beta_2$ are set at $\beta_1 = 1$ and $\beta_2 = 2$. The combinations of $N = 50, 100, 200, 500, 1000$ and $T = 40, 60, 80, 100, 150, 200$ are used for the simulations. The number of replications is set to 1000. For the implementation of the above approaches, we consider the projection model (2.32) when (2.37) is satisfied, and the projection model (2.36) when (2.38) is satisfied. [5] For the choice of the number of blocks in the ABC approach, following Alessi et al. (2010), we let $c_{\max} = 2$ and $J = [s_y N / 20]$, where $s^2_y$ denotes the sample variance of the dependent variable $y$ and $[\cdot]$ denotes the integer part. Moreover, following Bai and Ng (2002), we standardize the projected data for both the BN and ABC approaches. To implement the Bai (2009) iterative approach, for a given number of factors, we first use the pooled estimator as the initial estimator, and then iterate between least squares and principal component analysis to estimate the coefficients and the factors, respectively. We set the maximum number of iterations at 10. Based on these estimates, we use (2.39) and (2.40) to determine the dimension of the unobserved factors.

[5] When neither (2.37) nor (2.38) is satisfied, we only report the results based on the recursively iterating procedure.
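For concreteness, the Case 1 design can be generated as follows. This is a sketch with one illustrative $(N,T)$ pair; the $x$-equation loads on the first two factors only, as in the DGP above, and for brevity the loading ranges for both regressors are simplified to $U(0,1)$ and $U(0,2)$ (the bullets above use slightly different ranges for $x_2$):

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, p = 200, 40, 3
beta = np.array([1.0, 2.0])               # true beta_1 = 1, beta_2 = 2

F = rng.standard_normal((T, p))                        # f_{j,t} ~ N(0,1)
Lam = np.column_stack([rng.normal(1.0, np.sqrt(v), N)  # lambda_{j,i}
                       for v in (1.0, 0.5, 2.0)])

X = np.empty((2, N, T))
for k in range(2):
    gam = rng.uniform([0.0, 0.0], [1.0, 2.0])  # simplified loading ranges
    alpha = rng.standard_normal(N)             # alpha_{x,ki} ~ N(0,1)
    rho = rng.uniform(0.1, 0.9, N)             # AR(1) coefficients rho_{ki}
    sig_v = np.sqrt(0.5 * (1 + rng.chisquare(2, N)))
    eta = np.zeros((N, T))
    for t in range(1, T):
        eta[:, t] = rho * eta[:, t - 1] + rng.normal(0.0, sig_v)
    X[k] = alpha[:, None] + (F[:, :2] @ gam)[None, :] + eta

# Case 1: heteroscedastic errors with unit-specific variances.
sig_u = np.sqrt(0.5 * (1 + rng.chisquare(2, N)))
u = rng.normal(0.0, sig_u[:, None], size=(N, T))
y = beta[0] * X[0] + beta[1] * X[1] + Lam @ F.T + u
```

The resulting `y` is the $N \times T$ panel on which the projection and the information criteria are then applied.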
Finally, when implementing the IC via the recursively iterating approach, we set the maximum number of iterations to 5 to reduce the computation time. The estimated number of factors reported is the average of the numbers obtained across replications. We also report the percentage of replications in which the number of factors is correctly estimated. The results are provided in Tables 2.1-2.6.

Our limited simulation results appear to show the following: (1) The Bai (2009) CP and IC tend to select a factor dimension smaller than the true dimension. (2) Among the three information criteria suggested by Bai and Ng (2002), IC2 appears to dominate. (3) The Alessi et al. (2010) modified BN information criterion does improve the accuracy of selecting the correct number of factors in finite samples; in general, the recursively iterating procedure implementing the Alessi et al. (2010) modified IC performs very well. (4) The performance of the orthogonal projection method depends on the stability of the estimated $T \times T$ covariance matrix of $y_i$ or the $N \times N$ covariance matrix of $\check{y}_t$; it is important to distinguish the relative size of $N$ and $T$ through conditions (2.37) or (2.38) when implementing the orthogonal projection approach. (5) When $N$ and $T$ satisfy (2.37) or (2.38), the performance of the orthogonal projection method is comparable with the recursively iterating procedure of Alessi et al. (2010). However, the orthogonal projection approach is at least 7 times faster in computation time than the recursively iterating approach based on Alessi et al. (2010) for Case 1 when $N = 1000$ and $T = 40$, if we set the maximum number of iterations to 10. [6]
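For reference, the IC2 criterion of Bai and Ng (2002), which point (2) above singles out, can be computed directly from the singular values of the (projected) data matrix. The sketch below applies it to a pure three-factor panel; the penalty is the $IC_{p2}$ penalty of Bai and Ng (2002), while the toy DGP here is a simplified stand-in for, not a replication of, the designs behind Tables 2.1-2.6:

```python
import numpy as np

def bai_ng_ic2(Y, p_max):
    # IC2(k) = log(V(k)) + k * (N+T)/(NT) * log(min(N,T)),
    # where V(k) is the mean squared residual after extracting k principal
    # components, obtained here from the squared singular values of Y.
    N, T = Y.shape
    s2 = np.linalg.svd(Y, compute_uv=False) ** 2
    V = np.array([s2[k:].sum() / (N * T) for k in range(p_max + 1)])
    penalty = np.arange(p_max + 1) * (N + T) / (N * T) * np.log(min(N, T))
    return int(np.argmin(np.log(V) + penalty))

rng = np.random.default_rng(3)
N, T, p = 100, 60, 3
Y = rng.standard_normal((N, p)) @ rng.standard_normal((p, T)) \
    + 0.5 * rng.standard_normal((N, T))
print(bai_ng_ic2(Y, p_max=10))    # selects the true dimension, 3
```

In the projection approach, `Y` would be the standardized projected panel ($M_T$- or $M_N$-transformed data) rather than a raw factor panel.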
[6] For one replication of Case 1 with $N = 1000$ and $T = 40$, the CPU time for the ABC approach is 12.11 seconds for the orthogonal projection method and 83.55 seconds for the recursively iterating method with the maximum number of iterations equal to 10. The computer running these codes is a Dell Precision Tower 7910 with two Intel Xeon E5-2698v4 CPUs and 64 Gb RAM, based on Matlab R2019b.

2.6 Empirical Application

For illustration, here we provide a simple empirical application of how our orthogonal projection method can be applied in practice to determine the factor dimension in a panel interactive effects model. We consider the model of Bloom et al. (2013) and Burda and Harding (2013), assuming the errors follow an interactive structure,

$$ y_{it} = x_{it}'\beta + g_i' f_t + u_{it}, \qquad (2.45) $$

where $y_{it}$ denotes whether the firm had any patents in that year, and $x_{it} = $ (Lg Spillt, Lg Spills, Lg Malspt, Lg sale). [7] We consider the determination of the dimension of $f_t$. The original data is an unbalanced panel of 729 firms from 1981-2001 (i.e., $T_{\max} = 21$) and contains 12928 observations in total. [8] Here we extract a balanced subsample of 401 firms from 1981-2000, so that $N = 401$ and $T = 20$, for the determination of the dimension of $f_t$. Since $N > T > \dim(x_{it})$ and, with $p_{\max} = 10$, $\mathrm{trace}(M_N) = 321 > p_{\max}$, $M_N$ is not a null matrix; we therefore consider the orthogonal projection approach (2.36). Using the orthogonal projection approach discussed in Section 2.5, the estimated number of factors is reported in Table 2.7. Since our limited Monte Carlo simulations appear to indicate that ABC is more likely to correctly identify the dimension of the common factors, we conclude that the dimension of the unobserved factors in model (2.45) is 3.

2.7 Concluding Remarks

In this paper, we suggest implementing the Bai and Ng (2002) information criterion to determine the dimension of the factor structure in panel interactive effects models without simultaneously estimating the slope coefficients and the factor structure.
Monte Carlo simulations appear to show that our proposed orthogonal projection method performs well when the data satisfy the condition that the orthogonal projection matrix is not a null matrix, and it is also much faster in terms of CPU time than the recursively iterating approach.

[7] More specifically, Lg Spillt = lagged log of stock of tec weighted R&D, Lg Spills = lagged log of stock of sic weighted R&D, Lg Gmalpt = lagged log of stock of sic weighted R&D, Lg Sale = lagged log sales. See Bloom et al. (2013) and Burda and Harding (2013) for more details on these variables.
[8] The data can be downloaded from the data archive of the Journal of Applied Econometrics at http://qed.econ.queensu.ca/jae/2013-v28.6/.

Table 2.1: Average Number of Factors Selected during Replications for Case 1 with three factors

                  Projection                          Recursive
              BN                ABC        Bai          ABC
  N     T     IC1   IC2   IC3  ABC1  ABC2  CP   IC    ABC1  ABC2
  50    40    -     -     -    -     -     1    1     3     3
        60    -     -     -    -     -     1    1     3     3
  100   40    10    3     10   3     3     1    1     3     3
        80    -     -     -    -     -     1    1     3     3
        100   -     -     -    -     -     1    1     3     3
        150   -     -     -    -     -     1    1     3     3
  200   40    3     3     10   3     3     1    1     3     3
        60    3     3     10   3     3     1    1     3     3
        80    10    3     10   3     3     1    1     3     3
        100   -     -     -    -     -     1    1     3     3
        150   -     -     -    -     -     1    1     3     3
        200   -     -     -    -     -     1    1     3     3
  500   40    3     3     10   3     3     1    1     3     3
        60    3     3     10   3     3     1    1     3     3
        80    3     3     10   3     3     1    1     3     3
        100   3     3     10   3     3     1    1     3     3
        150   3     3     10   3     3     1    1     3     3
        200   3.2   3     10   3     3     1    1     3     3
  1000  40    3     3     10   3     3     1    1     3     3
        60    3     3     10   3     3     1    1     3     3
        80    3     3     10   3     3     1    1     3     3
        100   3     3     10   3     3     1    1     3     3
        150   3     3     10   3     3     1    1     3     3
        200   3     3     10   3     3     1    1     3     3

Notes: 1. "Projection" and "Recursive" refer to the orthogonal projection and recursive methods, respectively. 2. "BN", "Bai" and "ABC" refer to the information criteria of Bai and Ng (2002), Bai (2009), and Alessi et al. (2010), respectively. 3.
"-" indicates that the orthogonal projection approach does not work for that particular $(N,T)$.

Table 2.2: Percentage of correctly estimating the number of factors for Case 1 with three factors

                  Projection                             Recursive
              BN                    ABC          Bai         ABC
  N     T     IC1    IC2    IC3    ABC1   ABC2   CP   IC    ABC1   ABC2
  50    40    -      -      -      -      -      0%   0%    98%    99%
        60    -      -      -      -      -      0%   0%    99%    100%
  100   40    0%     98%    0%     98%    99%    0%   0%    100%   100%
        80    -      -      -      -      -      0%   0%    100%   100%
        100   -      -      -      -      -      0%   0%    100%   100%
        150   -      -      -      -      -      0%   0%    100%   100%
  200   40    100%   100%   0%     100%   100%   0%   0%    100%   100%
        60    100%   100%   0%     100%   100%   0%   0%    100%   100%
        80    0%     100%   0%     100%   100%   0%   0%    100%   100%
        100   -      -      -      -      -      0%   0%    100%   100%
        150   -      -      -      -      -      0%   0%    100%   100%
        200   -      -      -      -      -      0%   0%    100%   100%
  500   40    100%   100%   0%     100%   100%   0%   0%    100%   100%
        60    100%   100%   0%     100%   100%   0%   0%    100%   100%
        80    100%   100%   0%     100%   100%   0%   0%    100%   100%
        100   100%   100%   0%     100%   100%   0%   0%    100%   100%
        150   100%   100%   0%     100%   100%   0%   0%    100%   100%
        200   83%    100%   0%     100%   100%   0%   0%    100%   100%
  1000  40    100%   100%   0%     100%   100%   0%   0%    100%   100%
        60    100%   100%   0%     100%   100%   0%   0%    100%   100%
        80    100%   100%   0%     100%   100%   0%   0%    100%   100%
        100   100%   100%   0%     100%   100%   0%   0%    100%   100%
        150   100%   100%   0%     100%   100%   0%   0%    100%   100%
        200   100%   100%   0%     100%   100%   0%   0%    100%   100%

See notes of Table 2.1.
Table 2.3: Average Number of Factors Selected during Replications for Case 2 with three factors

                  Projection                          Recursive
              BN                ABC        Bai          ABC
  N     T     IC1   IC2   IC3  ABC1  ABC2  CP   IC    ABC1  ABC2
  50    40    -     -     -    -     -     1    1     3.3   3.2
        60    -     -     -    -     -     1    1     3.2   3.1
  100   40    10    3.6   10   3.3   3.2   1    1     3.3   3.2
        80    -     -     -    -     -     1    1     3.1   3.1
        100   -     -     -    -     -     1    1     3.1   3.1
        150   -     -     -    -     -     1    1     3     3
  200   40    8.6   2.9   10   3.1   3.1   1    1     3.1   3.1
        60    9.8   3     10   3.1   3.1   1    1     3.1   3
        80    10    3.1   10   3.1   3.1   1    1     3     3
        100   -     -     -    -     -     1    1     3     3
        150   -     -     -    -     -     1    1     3     3
        200   -     -     -    -     -     1    1     3     3
  500   40    8.1   3     10   3.1   3.1   1    1     3.2   3.2
        60    8.6   3     10   3     3     1    1     3.1   3.1
        80    9.2   3     10   3     3     1    1     3     3
        100   9.6   3     10   3     3     1    1     3     3
        150   10    3     10   3     3     1    1     3     3
        200   10    3     10   3     3     1    1     3     3
  1000  40    6.9   3     10   3.1   3.1   1    1     3.2   3.2
        60    7.2   3     10   3     3     1    1     3     3
        80    7.5   3     10   3     3     1    1     3     3
        100   7.8   3     10   3     3     1    1     3     3
        150   8.6   3     10   3     3     1    1     3     3
        200   9.3   3     10   3     3     1    1     3     3

See notes of Table 2.1.

Table 2.4: Percentage of correctly estimating the number of factors for Case 2 with three factors

                  Projection                             Recursive
              BN                    ABC          Bai         ABC
  N     T     IC1    IC2    IC3    ABC1   ABC2   CP   IC    ABC1   ABC2
  50    40    -      -      -      -      -      0%   0%    72%    80%
        60    -      -      -      -      -      0%   0%    79%    86%
  100   40    0%     46%    0%     53%    56%    0%   0%    75%    80%
        80    -      -      -      -      -      0%   0%    87%    91%
        100   -      -      -      -      -      0%   0%    91%    93%
        150   -      -      -      -      -      0%   0%    96%    97%
  200   40    0%     91%    0%     87%    90%    0%   0%    90%    91%
        60    0%     98%    0%     90%    92%    0%   0%    95%    97%
        80    0%     88%    0%     87%    90%    0%   0%    98%    98%
        100   -      -      -      -      -      0%   0%    99%    99%
        150   -      -      -      -      -      0%   0%    100%   100%
        200   -      -      -      -      -      0%   0%    100%   100%
  500   40    0%     99%    0%     89%    90%    0%   0%    82%    83%
        60    0%     100%   0%     96%    97%    0%   0%    94%    94%
        80    0%     100%   0%     98%    98%    0%   0%    98%    99%
        100   0%     100%   0%     99%    99%    0%   0%    99%    99%
        150   0%     100%   0%     99%    99%    0%   0%    100%   100%
        200   0%     98%    0%     98%    99%    0%   0%    100%   100%
  1000  40    0%     99%    0%     93%    93%    0%   0%    82%    82%
        60    0%     100%   0%     99%    99%    0%   0%    97%    98%
        80    0%     100%   0%     100%   100%   0%   0%    99%    99%
        100   0%     100%   0%     100%   100%   0%   0%    100%   100%
        150   0%     100%   0%     100%   100%   0%   0%    100%   100%
        200   0%     100%   0%     100%   100%   0%   0%    100%   100%

See notes of Table 2.1.
Table 2.5: Average Number of Factors Selected during Replications for Case 3 with three factors

                  Projection                          Recursive
              BN                ABC        Bai          ABC
  N     T     IC1   IC2   IC3  ABC1  ABC2  CP   IC    ABC1  ABC2
  50    40    -     -     -    -     -     1    1     3     3
        60    -     -     -    -     -     1    1     3.2   3.1
  100   40    10    2.9   10   3     3     1    1     3     3
        80    -     -     -    -     -     1    1     3     3
        100   -     -     -    -     -     1    1     3     3
        150   -     -     -    -     -     1    1     3     3
  200   40    3     2.9   10   3     3     1    1     3     3
        60    3     3     10   3     3     1    1     3     3
        80    10    3     10   3     3     1    1     3     3
        100   -     -     -    -     -     1    1     3     3
        150   -     -     -    -     -     1    1     3     3
        200   -     -     -    -     -     1    1     3     3
  500   40    3     3     10   3     3     1    1     3     3
        60    3     3     10   3     3     1    1     3     3
        80    3     3     10   3     3     1    1     3     3
        100   3     3     10   3     3     1    1     3     3
        150   3     3     10   3     3     1    1     3     3
        200   3.7   3     10   3     3     1    1     3     3
  1000  40    3     3     10   3     3     1    1     3     3
        60    3     3     10   3     3     1    1     3     3
        80    3     3     10   3     3     1    1     3     3
        100   3     3     10   3     3     1    1     3     3
        150   3     3     10   3     3     1    1     3     3
        200   3     3     10   3     3     1    1     3     3

See notes of Table 2.1.

Table 2.6: Percentage of correctly estimating the number of factors for Case 3 with three factors

                  Projection                             Recursive
              BN                    ABC          Bai         ABC
  N     T     IC1    IC2    IC3    ABC1   ABC2   CP   IC    ABC1   ABC2
  50    40    -      -      -      -      -      0%   0%    96%    98%
        60    -      -      -      -      -      0%   0%    96%    98%
  100   40    0%     94%    0%     97%    97%    0%   0%    100%   100%
        80    -      -      -      -      -      0%   0%    100%   100%
        100   -      -      -      -      -      0%   0%    100%   100%
        150   -      -      -      -      -      0%   0%    100%   100%
  200   40    100%   97%    0%     100%   100%   0%   0%    100%   100%
        60    99%    100%   0%     100%   100%   0%   0%    100%   100%
        80    0%     100%   0%     100%   100%   0%   0%    100%   100%
        100   -      -      -      -      -      0%   0%    100%   100%
        150   -      -      -      -      -      0%   0%    100%   100%
        200   -      -      -      -      -      0%   0%    100%   100%
  500   40    100%   100%   0%     100%   100%   0%   0%    100%   100%
        60    100%   100%   0%     100%   100%   0%   0%    100%   100%
        80    100%   100%   0%     100%   100%   0%   0%    100%   100%
        100   100%   100%   0%     100%   100%   0%   0%    100%   100%
        150   100%   100%   0%     100%   100%   0%   0%    100%   100%
        200   50%    100%   0%     100%   100%   0%   0%    100%   100%
  1000  40    100%   100%   0%     100%   100%   0%   0%    100%   100%
        60    100%   100%   0%     100%   100%   0%   0%    100%   100%
        80    100%   100%   0%     100%   100%   0%   0%    100%   100%
        100   100%   100%   0%     100%   100%   0%   0%    100%   100%
        150   100%   100%   0%     100%   100%   0%   0%    100%   100%
        200   100%   100%   0%     100%   100%   0%   0%    100%   100%

See notes of Table 2.1.
Table 2.7: Determination of $\dim(f_t)$ in model (2.45)

          Projection                    Recursive
      BN             ABC       Bai        ABC
  IC1  IC2  IC3  ABC1  ABC2  CP   IC   ABC1  ABC2
  3    1    10   3     3     2    2    3     3

Chapter 3
Semiparametric Least Squares Estimation of Binary Choice Panel Data Models with Endogeneity

3.1 Introduction

In this chapter [1] we study the binary choice panel data model, which is frequently used in empirical economics research and has the advantage of incorporating heterogeneity in individual responses. The model requires a standard conditional exogeneity assumption, which can be violated because of measurement error, simultaneity, or omitted variables. This paper proposes a nonlinear least squares estimator of such a model, where we further assume the distribution of the error term to be unknown. Our method relies on semiparametric estimation of the conditional expectation of the choice variable and accounts for both correlated unobserved heterogeneity and endogeneity due to correlation with an idiosyncratic error.

In this paper, the potential correlation between the unobserved time-invariant effect and exogenous covariates is addressed using a correlated random effects approach, which is free from the incidental parameters problem associated with fixed effects estimation. Specifically, we model the unobserved effect using the Mundlak (1978) and Chamberlain (1980) specification, which assumes that the unobserved effect is a function of the observed independent variables. Although this procedure induces serial correlation at the individual level, we show that in short panels standard asymptotic theory applies. Additionally, following Song (2017), we assume that there is at least one continuous independent variable. Under this condition, it is shown that, unlike Manski (1985), we can identify the parameters without necessarily requiring the unboundedness of the other independent variables.

[1] This chapter is based on Semykina, Xie, Yang and Zhou (2021).
Because variables in applied economics research are usually bounded, this assumption is more suitable for observational data.

Endogeneity due to correlation with time-varying unobservables (the idiosyncratic error) is addressed using a control function method. In a parametric setting, this method was initially proposed by Smith and Blundell (1986) and Rivers and Vuong (1988). Recently, Blundell and Powell (2004), Rothe (2009), and Song (2017) used the control function technique to correct for endogeneity in semiparametric binary response models for cross-section data. More generally, control function methods have been used for estimating a wide variety of linear and nonlinear models (e.g., Wooldridge (2015)).

Our estimator is developed from the least squares estimator of a single index model considered by Ichimura (1993) and can be applied to panel data. The asymptotic theory is developed based on Chen et al. (2003). This strategy is commonly adopted in the literature related to the present paper, such as Rothe (2009) and Song (2017). However, the latter two papers impose the IID assumption, similar to Chen et al. (2003) and Blundell and Powell (2004), and focus on cross-section data. In this paper, we consider panel data and assume independence across cross-section units, but permit serial correlation at the individual level. We formulate a model that accounts for the specificities of panel data, show that the proposed estimator is consistent, and derive its asymptotic distribution, which differs from the asymptotic distribution discussed in previous studies of cross-section models.

The rest of the paper is organized as follows. The model is presented in Section 3.2, followed by the discussion of the estimator and identification conditions in Section 3.3. Asymptotic theory is provided in Section 3.4. Monte Carlo simulations and an empirical application are discussed in Sections 3.5 and 3.6, respectively.
Section 3.7 provides concluding remarks.

Here we specify the notation used in this paper. $\|\cdot\|$ denotes the Frobenius norm of a vector, $\|\cdot\|_{\infty}$ denotes the uniform norm of a vector, and $\mathrm{vec}$ denotes the vectorization operator. For arbitrary variables $a$ and $b$, $a \vee b = \max(a,b)$; $\sim$ denotes "is distributed as," $\to_p$ denotes convergence in probability, and $\to_d$ denotes convergence in distribution. For a sequence of random variables $\{X_N : N = 1, 2, \ldots\}$, $X_N = o_p(a_N)$ means that $X_N / a_N$ converges in probability to zero as $N$ goes to infinity, while $X_N = O_p(a_N)$ means that $X_N / a_N$ is stochastically bounded. Moreover, for $m$ a $k$-vector of nonnegative integers, we define: (i) $|m| = \sum_{i=1}^{k} m_i$; (ii) for any function $f(x)$ on $\mathbb{R}^k$, $\partial_x^m f(x) = \partial^{|m|} f(x) / (\partial x_1^{m_1} \cdots \partial x_k^{m_k})$; and (iii) $x^m = \prod_{i=1}^{k} x_i^{m_i}$.

3.2 The Model

Consider a binary response panel data model of the form

$$ y_{it1}^{*} = y_{it2}'\alpha_0 + x_{it}'\beta_0 - c_{i1} - u_{it1}, \qquad (3.1) $$

$$ y_{it1} = 1[y_{it1}^{*} > 0], \quad i = 1, \ldots, N; \; t = 1, \ldots, T, \qquad (3.2) $$

where $y_{it1}^{*}$ is a latent outcome for unit $i$ in period $t$, $y_{it1}$ is the observed binary outcome, and $1[\cdot]$ is an indicator function, which equals one if the condition in brackets holds and is zero otherwise; $y_{it2}$ and $x_{it}$ are $k_e \times 1$ and $k_x \times 1$ vectors of time-varying covariates, respectively, which may be correlated with the unobserved individual effect, $c_{i1}$. Here, we assume that the variables in $y_{it2}$ are continuous and could be endogenous in the sense that they may be correlated with the idiosyncratic error, $u_{it1}$, because of simultaneity, measurement error, or an omitted time-varying covariate. The parameters of interest in the above model are $\alpha_0$ and $\beta_0$. We consider the panel structure with large $N$ and fixed $T$. Let $z_{it} = (x_{it}', z_{it1}')'$ be a $k_z \times 1$ vector of instruments, $k_z \geq k_x + k_e$.
Assume that the endogenous variables, $y_{it2}$, are determined by the following reduced form equations:

$$ y_{it2} = H_{it}\gamma_0 + c_{i2} + u_{it2}, \qquad (3.3) $$

where $H_{it}$ is a $k_e \times (k_e k_z)$ block diagonal matrix with the vectors $z_{it}'$ on its principal diagonal, $\gamma_0 = (\gamma_{01}', \ldots, \gamma_{0k_e}')'$, $\gamma_{0j}$ is $k_z \times 1$ for $j = 1, \ldots, k_e$, and $c_{i2}$ is a $k_e \times 1$ vector of unobserved effects that may be correlated with $H_{it}$. In (3.3), the instruments are assumed to be strictly exogenous conditional on the unobserved effect, so that for $Z_i = (z_{i1}', \ldots, z_{iT}')$, we have

$$ u_{it2} \mid (Z_i, c_{i2}) \sim u_{it2} \mid (z_{it}, c_{i2}) \sim u_{it2}. \qquad (3.4) $$

Note that the strict exogeneity assumption in (3.4) implies that $z_{it}$ is conditionally independent of $u_{ir2}$ for all $t$ and $r$. For example, it rules out the possibility of a "feedback effect" (a situation where a shock to the outcome in period $t$ affects the value of the instrument in $t+1$) and implies that $z_{it}$ cannot include the lagged dependent variable.

To account for the correlation between the covariates and the unobserved effects, we model $c_{i1}$ and $c_{i2}$ as

$$ c_{i1} = h_1(Z_i) + a_{i1}, \qquad (3.5) $$

$$ c_{i2} = h_2(Z_i) + a_{i2}, \qquad (3.6) $$

where $h_1(\cdot) = E(c_{i1} \mid Z_i)$, $a_{i1} = c_{i1} - E(c_{i1} \mid Z_i)$, $h_2(\cdot) = E(c_{i2} \mid Z_i)$, and $a_{i2} = c_{i2} - E(c_{i2} \mid Z_i)$. In practice, most studies use the Mundlak (1978) specification,

$$ h_1(Z_i) = \bar{z}_i'\xi_{01}, \qquad (3.7) $$

$$ h_2(Z_i) = \bar{H}_i \xi_{02}, \qquad (3.8) $$

where $\bar{z}_i = T^{-1}\sum_{t=1}^{T} z_{it}$, $\bar{H}_i$ is a $k_e \times (k_e k_z)$ block diagonal matrix with the vectors $\bar{z}_i'$ on its principal diagonal, $\xi_{02} = (\xi_{021}', \ldots, \xi_{02k_e}')'$, and $\xi_{02j}$ is $k_z \times 1$, $j = 1, \ldots, k_e$. Another possible choice is the Chamberlain (1980) specification,

$$ h_1(Z_i) = Z_i \zeta_{01}, \qquad (3.9) $$

$$ h_2(Z_i) = \tilde{H}_i \zeta_{02}, \qquad (3.10) $$

where $\tilde{H}_i$ is a $k_e \times (k_e k_z T)$ block diagonal matrix with the vectors $Z_i$ on its principal diagonal, and $\zeta_{02} = (\zeta_{021}', \ldots, \zeta_{02k_e}')'$, where $\zeta_{02j}$ is $(k_z T) \times 1$ for $j = 1, \ldots, k_e$. Mundlak's specification is a special case of Chamberlain's model, as it imposes restrictions on the coefficients.
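A sketch of the first-stage construction under the Mundlak specification (3.7)-(3.8), taking $k_e = 1$ so that $H_{it}\gamma_0$ reduces to $z_{it}'\gamma_0$ (all sizes and parameter values below are illustrative choices of ours):

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, k_z = 500, 6, 3
gamma0 = np.array([0.8, -0.5, 0.3])
xi0 = np.array([0.4, 0.2, -0.1])

Z = rng.standard_normal((N, T, k_z))          # instruments z_it
Zbar = Z.mean(axis=1)                         # Mundlak averages z_bar_i
c2 = Zbar @ xi0 + rng.standard_normal(N)      # c_i2 = h_2(Z_i) + a_i2
u2 = rng.standard_normal((N, T))
y2 = Z @ gamma0 + c2[:, None] + u2            # reduced form (3.3)

# Pooled OLS of y_it2 on (z_it, z_bar_i) recovers (gamma0, xi0);
# its residuals are the first-stage control-function residuals.
W = np.hstack([Z.reshape(N * T, k_z), np.repeat(Zbar, T, axis=0)])
coef, *_ = np.linalg.lstsq(W, y2.reshape(N * T), rcond=None)
v2_hat = y2.reshape(N * T) - W @ coef
```

For $k_e > 1$, $H_{it}$ is the block-diagonal matrix `np.kron(np.eye(k_e), z_it)`, which is equivalent to running the pooled OLS above equation by equation for each endogenous variable.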
See, for example, Hsiao and Zhou (2018) for a discussion of these two specifications. [2]

Remark 3.2.1. These specifications of the unobserved effects play an important role in our model. The functions $h_1(Z_i)$ and $h_2(Z_i)$ capture the correlation between the instrumental variables and the unobserved effects, while the remaining unobserved components, $a_{i1}$ and $a_{i2}$, are mean independent of the instruments. That is, these specifications allow us to decompose the unobserved effect into an observed and an unobserved component. Importantly, the distribution of $a_{i1}$ and $a_{i2}$ is unrestricted.

Remark 3.2.2. Because correlation with the unobserved effect in Mundlak's and Chamberlain's models is captured through a linear function of the instruments in all time periods, identification is possible only when the instrumental variables are time-varying. For variables that are constant over time, their causal effects cannot be distinguished from the impact of the unobserved effect, so the corresponding population parameters are not identified. However, time-constant explanatory variables can be used as controls.

[2] It would be optimal to allow arbitrary correlation between the unobserved effect and the explanatory variables. However, Chamberlain (2010) has shown that when the observed covariates have bounded support, identification is only possible when the error term has a logistic distribution.

Let $v_{it1} = a_{i1} + u_{it1}$ and $v_{it2} = a_{i2} + u_{it2}$. Then, model (3.1)-(3.2) can be rewritten as

$$ y_{it1} = 1\left[ y_{it2}'\alpha_0 + x_{it}'\beta_0 - h_1(Z_i) - v_{it1} \geq 0 \right], \qquad (3.11) $$

$$ y_{it2} = H_{it}\gamma_0 + h_2(Z_i) + v_{it2}, \quad i = 1, \ldots, N; \; t = 1, \ldots, T. \qquad (3.12) $$

Remark 3.2.3. Although we assume linearity of the reduced form equations, this specification is not essential for deriving our main theoretical results. As a matter of fact, one could also consider a more general version of the reduced form equations,

$$ y_{it2} = h_t(Z_{it}) + v_{it2}, \qquad (3.13) $$

where $h_t$ is the same for all units at time $t$ and can be unknown.
Since our estimation procedure is based on a control function method, all we need is to obtain $\hat{v}_{it2} = y_{it2} - \hat{h}_t(Z_{it})$. This can be achieved by employing any asymptotically normal estimator with bounded variance. The qualified choices include, but are not limited to, the estimators proposed by Ichimura (1993) and Klein and Spady (1993). For notational simplicity, in what follows we use (3.3) and (3.12).

In equations (3.11)-(3.12), following Blundell and Powell (2004) and Semykina and Wooldridge (2018), we assume

$$ v_{it1} \mid (Z_i, y_{it2}) \sim v_{it1} \mid (Z_i, v_{it2}) \sim v_{it1} \mid v_{it2}, \qquad (3.14) $$

$$ v_{it2} \mid Z_i \sim v_{it2}, \quad i = 1, \ldots, N; \; t = 1, \ldots, T. $$

Under assumption (3.14), the endogeneity problem can be addressed by including $v_{it2}$ (the control functions) as additional covariates in (3.11). If $(v_{it1}, v_{it2})$ had a bivariate normal distribution, the system could be estimated by MLE. In this paper, we do not make distributional assumptions, but instead consider a semiparametric approach proposed by Rothe (2009) and Song (2017) and extend their approach to panel data models with fixed $T$.
Because the unobserved effects in the main and reduced form equations are likely correlated, x it may be correlated with c i2 and the composite error c i2 + u it2 . Hence, ˆ v it2 = y it2 H it ˆ g 0 is not a valid control function, as it violates assumption (3.14). Therefore, we do not pursue this approach in this paper. 3.3 Identification and Estimation In this section, we consider the identification and semiparametric least squares estimation for model (3.11)-(3.12). 3.3.1 Identification For illustration purposes, we adapt the Mundlak (1978) specification in this paper forh 1 (Z i ) and h 2 (Z i ); i.e., h 1 (Z i )= ¯ z 0 i x 01 and h 2 (Z i )= ¯ z 0 i x 02 . Let w it =(y it2 ;x 0 it ; ¯ z 0 i ) 0 and p 0 =(a 0 0 ;b 0 0 ;x 0 01 ) 0 with the dimension dim(w it )= k w = k e + k x + k z . Then, model (3.11) can be rewritten as y it1 = 1[w 0 it p 0 v it1 0]; i= 1;:::;N;t= 1;:::;T: (3.15) 69 Furthermore, define F t (;v it2 ) as a function mappingR to[0;1]. Throughout this paper we use it to define the conditional cumulative distribution function (CDF) of v it1 given the value of v it2 , such that F t (;v it2 ) is the true conditional CDF of v it1 and ˆ F t (;v it2 ) is the corresponding estimator. Note that the conditional CDF is time-specific, so that the error distribution may change over time (e.g., the error variance may change over time). 
For identification of (3.15), it is required that the population parameter vectorp 0 2P is unique and satisfies the following equality: E(y it1 jw it ;v it2 )= E y it1 jw 0 it p 0 ;v it2 : (3.16) To that end, we make the following assumption: Assumption IC: (1) [Conditional CDF] For each t, function F t (;v it2 ) is differentiable and strictly increasing in its first argument on a setA with positive probability under the distribution of w it ; (2) [Continuity of w it ] Conditional on the control variable v it2 , the vector w it contains at least one continuously distributed component, w (1) it ; with nonzero coefficient; (3) [No collinearity] The span of the remaining components of w it ; w (1) it ; contains no proper linear subspace which has probability 1 under the distribution of w it . The above assumptions are the extensions of those in Rothe (2009). We note that Assumption IC(2) indicates the fact that having continuous endogenous variables in w it is not sufficient for identification as the continuity might be from v it2 . Therefore, it is necessary to have at least one continuous variable in either x it or z it to achieve identification. Theorem 3.3.1. If Assumption IC holds, then there exists a unique interior pointp 0 2P when the relationship E(y it1 jw it ;v it2 )= E(y it1 jw 0 it p 0 ;v it2 ) holds for w it 2A with positive probability. See the Appendix for a proof. 70 3.3.2 Semiparametric Least Squares (SLS) Estimation Once the model is identified, we can apply the semiparametric least squares (SLS) approach to estimate the parameters of interest p 0 in the model. The SLS is an M-estimator that utilizes the conditional mean of y it1 given(w it ;v it2 ) to obtain moment conditions. Suppose now the true value of error v it2 in the reduced form model (3.12) is known. 
Then, the SLS estimator can be obtained by minimizing the following function with coefficientsp, 1 N N å i=1 T å t=1 [y it1 E(y it1 jw it ;v it2 )] 2 : (3.17) Based on the fact that the conditional expectation of y it1 (3.17) is equal to the conditional CDF of v it1 , it can be rewritten as E[y it1 jw it ;v it2 ] = E(1[w 0 it p v it1 0]jv it2 ) = F t (w 0 it p;v it2 ): (3.18) If F t (;v it2 ) is known, then the objective function for model (3.17) has the form of ˜ S N (p)= 1 N N å i=1 T å t=1 [y it1 F t (w 0 it p;v it2 )] 2 : (3.19) Unfortunately, the conditional CDF F t (;v it2 ) is unknown in general, and thus ˜ S N (p) is infeasible in practice. Following the strategy in Blundell and Powell (2004) and Rothe (2009), we can replace the conditional CDF (3.18) by the Nadaraya-Watson estimator, ˆ F t (w 0 it p;v it2 )= ˆ p t (w 0 it p;v it2 ) ˆ q t (w 0 it p;v it2 ) ; (3.20) 71 where ˆ p t (w 0 it p;v it2 )= 1 N å j6=i k 1 w 0 it p w 0 jt p h 1 ! k 2 v it2 v jt2 h 2 y jt1 ; (3.21) ˆ q t (w 0 it p;v it2 )= 1 N å j6=i k 1 w 0 it p w 0 jt p h 1 ! k 2 v it2 v jt2 h 2 ; (3.22) withk 1 () andk 2 () denoting the kernel densities, and h 1 and h 2 denote the corresponding band- width. We note that either v it2 or v jt2 is known in (3.21) or (3.22), and we suggest replacing the control variable v it2 by ˆ v it2 from the initial estimation of the reduced form model (3.12). For instance, if we estimate (3.12) by pooled OLS to obtain estimators ˆ g and ˆ x 2 , then ˆ v it2 can be obtained by computing pooled OLS residuals, ˆ v it2 = y it2 H it ˆ g ¯ Z i ˆ x 2 : (3.23) Consequently, we can replace the objective function (3.19) with ˆ S N (p)= 1 N N å i=1 T å t=1 [y it1 ˆ F t (w 0 it p; ˆ v it2 )] 2 ; (3.24) and the resulting semiparametric least squares estimator (SLS) ofp 0 is given by ˆ p SLS = arg min p2R k w ˆ S N (p); (3.25) where k w = k e + k x + k z denotes the number of parameters in model (3.15). 
Although the focus of this paper is on estimating the parameter vector $(\alpha_0', \beta_0')'$, researchers may also be interested in estimating the average structural function (ASF), which was introduced by Blundell and Powell (2004). The ASF returns the probability that $y_{it1} = 1$ for given values of the explanatory variables. For the binary response panel data model formulated here, the ASF for period $t$ is obtained by averaging over the marginal distribution of the error $v_{it2}$,

$$ G_t(w_t'\pi_0) = \int F_t(w_t'\pi_0, v_{t2}) \, dF_{v_{t2}}, \qquad (3.26) $$

where $F_{v_{t2}}$ is the CDF of $v_{it2}$ at time $t$. The derivative of $G_t(w_t'\pi_0)$ with respect to a variable $x_k$ represents the marginal change in the response probability due to an exogenous change in $x_k$ and is analogous to the partial effect of $x_k$ in parametric binary response models. Note that when evaluating partial effects, the function $h_1(Z_i)$ should be held fixed, along with the other observed variables (other than $x_k$). Moreover, it is possible to average the ASF over $t$:

$$ \frac{1}{T} \sum_{t=1}^{T} \int F_t(w_t'\pi_0, v_{t2}) \, dF_{v_{t2}}. \qquad (3.27) $$

To obtain a consistent estimator of the ASF, Blundell and Powell (2004) suggest estimating $F_t(w_t'\pi_0, v_{t2})$ using the Nadaraya-Watson estimator, where $y_{it1}$ is nonparametrically regressed on $w_{it}'\hat{\pi}$ and $\hat{v}_{it2}$. Rothe (2009) estimates $F_t(w_t'\pi_0, v_{t2})$ by the local linear estimator. The ASF for period $t$ can then be estimated as

$$ \frac{1}{N} \sum_{i=1}^{N} \hat{F}_t(w_t'\hat{\pi}_{SLS}, \hat{v}_{it2}), \qquad (3.28) $$

for particular values of $w_t$, which can be sample means, medians, or any other values.

3.4 Asymptotics

In this section, we provide the asymptotic results for the above SLS estimator.
3.4.1 Consistency

To show that the SLS estimator in (3.25) is consistent, we assume

Assumption CON: (1) [Compact space] Both $\pi_{0}$ and $\hat{\pi}$ belong to a compact parameter space $\Pi$ and are interior points;
(2) [Kernel function] $k_{1}(\cdot):\mathbb{R}\to\mathbb{R}$ and $k_{2}(\cdot):\mathbb{R}^{k_{e}}\to\mathbb{R}$ satisfy: $\int k_{j}(u)du=1$; $\int u^{s}k_{j}(u)du=0$ for $s=1,\dots,r-1$ for some $r\in\mathbb{N}$; $\int u^{r}k_{j}(u)du<\infty$; and $k_{j}(u)$ is $r$ times continuously differentiable, for $j=1,2$.
(3) [Bandwidth] There exist constants $c_{1}$ and $\delta_{1}$ such that $c_{1}>0$ and $1/(2r+1)\leq\delta_{1}<1/(4k_{e})$. Also, there exist $c_{2j}>0$ and $1/(2r+1)\leq\delta_{2j}<1/(4k_{e})$ for $j=1,\dots,k_{e}$. Then the bandwidths $h_{1}$ and $h_{2}$ satisfy $h_{1}=c_{1}N^{-\delta_{1}}$ and $h_{2}=(h_{21},h_{22},\dots,h_{2k_{e}})$, where $h_{2j}=c_{2j}N^{-\delta_{2j}}$ for $j=1,\dots,k_{e}$.
(4) [Lipschitz continuity] For each $t$, the conditional CDF $F_{t}(\cdot,v_{it2})$ is $r$ times continuously differentiable for some $r\in\mathbb{N}$, and its $r$-th derivatives are Lipschitz continuous and bounded;
(5) [Boundedness] The estimator $\hat{v}_{it2}$ satisfies $\max_{i,t}\|\hat{v}_{it2}-v_{it2}\|=o_{p}(N^{-1/4})$. Also define $D_{it}=(H_{it},\bar{H}_{i})$ and $\hat{v}_{it2}-v_{it2}=N^{-1}\sum_{j=1}^{N}\sum_{s=1}^{T}g(D_{it},D_{js})\psi_{js}+r_{it}$, where $\psi_{js}$ is an influence function with $E(\psi_{js}\mid D_{js})=0$ and $\mathrm{Var}(\psi_{js}\mid D_{js})<\infty$; $g(D_{it},D_{js})$ denotes the weight function satisfying $E[g(D_{it},D_{js})^{2}]=o(N)$; and the remainder term $r_{it}$ satisfies $\max_{i,t}\|r_{it}\|=o_{p}(N^{-1/2})$.

Remark 3.4.1. Assumptions CON(2)-(3) define a standard bias-reducing kernel of order $r$, which is used to reduce the asymptotic bias in the estimator of $F_{t}(\cdot,v_{it2})$. The inequalities regarding $\delta_{1}$ and $\delta_{2j}$ imply $r>2k_{e}-1/2$; therefore, in practice the use of higher order kernels is suggested. CON(4) puts restrictions on the functions $F_{t}(\cdot,v_{it2})$ and is standard in kernel smoothing theory. It is required to obtain the uniform convergence rate of $F_{t}(\cdot,v_{it2})$ and its derivatives. Finally, CON(5) regulates the estimation bias in $v_{it2}$ from the reduced form equations and requires that it admits an asymptotic expansion in terms of the influence function.
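For intuition on CON(2), a bias-reducing kernel of order $r$ integrates to one while its first $r-1$ moments vanish. A standard example, used here purely as our illustration (the simulations in Section 3.5 use second-order Gaussian kernels), is the fourth-order Gaussian kernel $k(u)=\tfrac{1}{2}(3-u^{2})\phi(u)$, whose moment conditions can be checked numerically:

```python
import numpy as np

def k4(u):
    """Fourth-order Gaussian kernel: k(u) = (3 - u^2)/2 * phi(u).
    It integrates to one and its first three moments vanish, so it
    satisfies the moment conditions of CON(2) with r = 4."""
    phi = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    return 0.5 * (3.0 - u ** 2) * phi

# verify the moment conditions by Riemann-sum integration on a wide grid
u = np.linspace(-10.0, 10.0, 200001)
du = u[1] - u[0]
moments = [np.sum(u ** s * k4(u)) * du for s in range(5)]
# moments[0] is 1, moments[1..3] are 0, and the fourth moment is nonzero
# (analytically it equals (3*3 - 15)/2 = -3)
```

The nonzero $r$-th moment is what controls the size of the remaining smoothing bias.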
Essentially,
\[
\hat{v}_{it2}-v_{it2}=-H_{it}(\hat{\gamma}-\gamma_{0})-(\hat{h}_{2}(Z_{i})-h_{2}(Z_{i})),
\]
so that CON(5) will hold as long as the estimators $\hat{\gamma}$ and $\hat{h}_{2}(\cdot)$ can be approximated by sample averages and are consistent. This assumption is similar to Rothe (2009) and Song (2017), who consider cross section models. In the assumption above, the influence function accounts for the time dimension in addition to the cross section dimension. Due to the presence of the unobserved time-constant component and serial correlation in the idiosyncratic errors, the composite errors will be serially dependent. This does not affect consistency of the estimator, but it does affect the asymptotic variance, as discussed in the next section.

Remark 3.4.2. In Assumption CON(5), $\gamma$ and $h_{2}$ are assumed to be the same for all $t$. Generally, it is possible to allow the reduced-form parameters to be time-specific. In that case, the first-stage estimation would have to be done separately for each $t$, and Assumption CON(5) would have to be stated for a given $t$. The consistency and asymptotic normality arguments would be largely unchanged, but the asymptotic variance formula would become more complicated. For simplicity, we assume $\gamma$ and $h_{2}$ to be time-constant and use Assumption CON(5) above.

The consistency of the SLS estimator (3.25) is established in the following theorem.

Theorem 3.4.1. If Assumptions IC and CON hold, then as $N\to\infty$, we have
\[
\hat{\pi}_{SLS}\to_{p}\pi_{0}. \tag{3.29}
\]
See the Appendix for a proof.

3.4.2 Asymptotic Distribution of the SLS Estimator

In this section, we show the asymptotic normality of the estimator $\hat{\pi}_{SLS}$. Even though our main interest is to estimate $\pi_{0}$ in model (3.15) from the criterion function (3.24), the estimation procedure depends on the nonparametric estimation of the conditional CDFs, which in turn depends on the estimators of the reduced-form parameters and introduces challenges into the proof. Previous literature tackles similar issues in cross-section models. Here, we rely on Chen et al.
(2003) to develop additional assumptions and establish the asymptotic normality of our SLS estimator of the specified panel data model. To begin with, we define a class of functions to denote the nuisance parameters $(v_{it2},F_{t}(\cdot,v_{it2}))$ as follows,
\[
h=\{(v(y_{it2},z_{it}),F_{t}(\cdot,v(y_{it2},z_{it}))):i=1,\dots,N;\;t=1,\dots,T\}, \tag{3.30}
\]
where $v(y_{it2},z_{it})=v_{it2}=y_{it2}-H_{it}\gamma-\bar{H}_{i}\xi_{2}$. Let $\mathcal{H}$ be the vector space of $h$, and the norm $\|h\|_{\mathcal{H}}$ is defined as
\[
\|h\|_{\mathcal{H}}=\max_{i,t}\left(\|v_{it2}\|_{\infty}\vee\|F_{t}(\cdot,v_{it2})\|_{\infty}\right). \tag{3.31}
\]
We assume:

Assumption NR: (1) [Entropy condition] There exists a space $\mathcal{H}$ such that $\Pr(h\in\mathcal{H})\to 1$ and $\int_{0}^{\infty}\sqrt{\log N(\varepsilon,\mathcal{H},\|\cdot\|_{\mathcal{H}})}\,d\varepsilon<\infty$, where $N(\varepsilon,\mathcal{H},\|\cdot\|_{\mathcal{H}})$ is the covering number with respect to the $\|\cdot\|_{\mathcal{H}}$ norm of the class of functions $h$, i.e., the minimal number of balls with $\|\cdot\|_{\mathcal{H}}$-radius $\varepsilon$ needed to cover $\mathcal{H}$.
(2) [Rank condition] The matrix $\Omega=E\left[\sum_{t=1}^{T}\frac{\partial F_{t}(w_{it}'\pi_{0},v_{it2})}{\partial\pi_{0}}\frac{\partial F_{t}(w_{it}'\pi_{0},v_{it2})}{\partial\pi_{0}'}\right]$ is of full rank.

Remark 3.4.3. Assumption NR(1) regulates the complexity of the function space of $h$, which is a necessary condition for the proof of stochastic equicontinuity. A similar condition has also been imposed for most nonparametric estimators (e.g., Linton et al. (2008)), and it requires the estimators of the nuisance parameters to take values in some well-behaved function spaces with probability approaching 1. Assumption NR(2) requires the asymptotic variance-covariance matrix to be positive definite, which is needed to construct the variance matrix of $\hat{\pi}_{SLS}$ and is a standard assumption in the literature.

Now, we can establish the asymptotic normality of the SLS estimator in the following theorem.

Theorem 3.4.2. Under Assumptions IC, CON and NR, as $N\to\infty$, we have
\[
\sqrt{N}(\hat{\pi}_{SLS}-\pi_{0})\to_{d}N\left(0,\Omega^{-1}V\Omega^{-1}\right), \tag{3.32}
\]
where $\Omega=E\left[\sum_{t=1}^{T}\frac{\partial F_{t}(w_{it}'\pi_{0},v_{it2})}{\partial\pi_{0}}\frac{\partial F_{t}(w_{it}'\pi_{0},v_{it2})}{\partial\pi_{0}'}\right]$, and $V$ is defined as
\[
V=E\left[f_{i}(\pi_{0},h_{0})f_{i}(\pi_{0},h_{0})'\right], \tag{3.33}
\]
with
\[
f_{i}(\pi,h)=m_{i}(\pi,h)+\Psi_{i}(\pi,h), \tag{3.34}
\]
where
\[
m_{i}(\pi,h)=\sum_{t=1}^{T}\left(y_{it1}-F_{t}(w_{it}'\pi,v_{it2})\right)\frac{\partial F_{t}(w_{it}'\pi,v_{it2})}{\partial\pi}, \tag{3.35}
\]
and
\[
\Psi_{i}(\pi,h)=\sum_{t=1}^{T}E\left[\left.\sum_{s=1}^{T}\frac{\partial F_{t}(w_{js}'\pi,v_{js2})}{\partial\pi}\frac{\partial F_{t}(w_{js}'\pi,v_{js2})}{\partial v_{js2}'}g(D_{js},D_{it})\right|D_{it}\right]\psi_{it}, \tag{3.36}
\]
with the influence function $\psi_{it}$ and the loading $g(\cdot,\cdot)$ defined in Assumption CON(5). See the Appendix for a proof.

Remark 3.4.4. The proof is based on a Taylor expansion of the criterion function (3.24), namely
\[
\sqrt{N}(\hat{\pi}_{SLS}-\pi_{0})=-\left[\frac{\partial^{2}\hat{S}_{N}(\bar{\pi})}{\partial\bar{\pi}\,\partial\bar{\pi}'}\right]^{-1}\sqrt{N}\,\frac{\partial\hat{S}_{N}(\pi_{0})}{\partial\pi_{0}}\,(1+o_{p}(1)), \tag{3.37}
\]
where $\bar{\pi}$ lies between $\pi_{0}$ and $\hat{\pi}_{SLS}$. The uniform convergence of the derivatives of $\hat{S}_{N}(\pi_{0})$ (which is shown in the Appendix), together with the consistency of $\hat{\pi}_{SLS}$ and $\hat{v}_{it2}$, implies
\[
\frac{\partial^{2}\hat{S}_{N}(\bar{\pi})}{\partial\bar{\pi}\,\partial\bar{\pi}'}\to_{p}\Omega\quad\text{as }N\to\infty.
\]
It is also shown in the Appendix (Lemma C.8) that
\[
\sqrt{N}\,\frac{\partial\hat{S}_{N}(\pi_{0})}{\partial\pi_{0}}=\frac{1}{\sqrt{N}}\sum_{i=1}^{N}m_{i}(\pi_{0},h_{0})+\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\Psi_{i}(\pi_{0},h_{0})+o_{p}(1). \tag{3.38}
\]
It should be noted that the construction of (3.38) shows the cost of using semiparametric least squares estimation with an estimated control variable in the first step. For an infeasible estimator of the coefficients $\pi$ assuming $v_{it2}$ is known, the second term on the right-hand side of (3.38) would disappear regardless of whether the function $F_{t}(\cdot,\cdot)$ is known. Thus the second term reflects the cost of estimating the control variable, and it can increase the variance $V$ of the semiparametric least squares estimator (see (3.33)-(3.34)). A similar observation is found in Rothe (2009) for a cross-sectional model.

To perform inference, an estimate of the asymptotic variance-covariance (VC) matrix of the asymptotic distribution (3.32) is needed. Unfortunately, the VC matrix in (3.32) depends on a number of unknown and complicated functions.
For instance, the matrix $V$ depends on the influence function, which in turn depends on the choice of the estimator of $(\gamma_{0}',\xi_{02}')'$. In order to obtain a feasible estimator of the asymptotic VC matrix, one can either consider a nonparametric panel bootstrap procedure as in Chen et al. (2003) or a modified sample moment estimator as in Rothe (2009).

For the bootstrap approach, following Chen et al. (2003), we let $y_{i1}=(y_{i1,1},y_{i1,2},\dots,y_{i1,T})'$, $Y_{i2}=(y_{i2,1},y_{i2,2},\dots,y_{i2,T})'$, and $Z_{i}=(z_{i1},z_{i2},\dots,z_{iT})'$. Then we can draw $\{(y_{i1}^{*},Y_{i2}^{*},Z_{i}^{*})\}_{i=1}^{N}$ randomly with replacement from the original data $\{(y_{i1},Y_{i2},Z_{i})\}_{i=1}^{N}$. The SLS estimator using the bootstrap sample is given by
\[
\hat{\pi}^{*}=\arg\min_{\pi\in\mathbb{R}^{k_{w}}}\frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{T}\left[y_{it1}^{*}-\hat{F}_{t}(w_{it}^{*\prime}\pi,\hat{v}_{it2}^{*})\right]^{2}. \tag{3.39}
\]
Chen et al. (2003) show that $\sqrt{N}(\hat{\pi}^{*}-\hat{\pi}_{SLS})$ has the same asymptotic distribution as $\sqrt{N}(\hat{\pi}_{SLS}-\pi_{0})$. On the other hand, the disadvantage of the nonparametric bootstrap procedure is that it is extremely computationally expensive. Therefore, for practical purposes, here we follow Rothe (2009) and provide an estimator for the VC matrix in (3.32). Following the derivation in the Appendix, we are able to show a uniform convergence of $\hat{F}_{t}(\cdot,\hat{v}_{it2})$ to $F_{t}(\cdot,v_{it2})$, which also holds for the derivative of $F_{t}$. Then $\Omega$ can be estimated using a sample analogue,
\[
\hat{\Omega}=\frac{1}{N}\sum_{i=1}^{N}\left[\sum_{t=1}^{T}\frac{\partial\hat{F}_{t}(w_{it}'\hat{\pi}_{SLS},\hat{v}_{it2})}{\partial\pi}\frac{\partial\hat{F}_{t}(w_{it}'\hat{\pi}_{SLS},\hat{v}_{it2})}{\partial\pi'}\right]. \tag{3.40}
\]
For the matrix $V$ in (3.32), the exact form of the influence function depends on the particular estimator we adopt for equation (3.12). When the function $h_{2}(\cdot)$ in (3.12) is unknown, one can consider semi/non-parametric estimation, and the resulting influence function can be constructed based on the semi/non-parametric regression.³ For illustration purposes, here we consider a simple case of (3.12) (e.g., $h_{2}(\cdot)$ is specified in (3.5)) and use the pooled OLS estimator. The associated terms in the asymptotic expansion of $\hat{v}_{it2}-v_{it2}$ are defined as follows:
\[
g(D_{js},D_{it})=D_{js}, \tag{3.41}
\]
\[
\psi_{it}=\left(\frac{1}{N}\sum_{l=1}^{N}\sum_{p=1}^{T}D_{lp}'D_{lp}\right)^{-1}D_{it}'(-v_{it2}), \tag{3.42}
\]
where $D_{it}=(H_{it},\bar{H}_{i})$. Consequently, $\Psi_{i}(\pi,h)$ in (3.36) can be rewritten as
\[
\Psi_{i}(\pi,h)=\sum_{t=1}^{T}\sum_{s=1}^{T}E\left[\frac{\partial F_{t}(w_{js}'\pi,v_{js2})}{\partial\pi}\frac{\partial F_{t}(w_{js}'\pi,v_{js2})}{\partial v_{js2}'}D_{js}\left(\frac{1}{N}\sum_{l=1}^{N}\sum_{p=1}^{T}D_{lp}'D_{lp}\right)^{-1}\right]D_{it}'(-v_{it2}), \tag{3.43}
\]
and thus $V$ can be estimated as
\[
\hat{V}=\frac{1}{N}\sum_{i=1}^{N}\hat{f}_{i}(\hat{\pi}_{SLS},\hat{h})\hat{f}_{i}'(\hat{\pi}_{SLS},\hat{h}), \tag{3.44}
\]
where
\[
\hat{f}_{i}(\hat{\pi}_{SLS},\hat{h})=\left[\frac{1}{N}\sum_{j=1}^{N}\sum_{s=1}^{T}\frac{\partial\hat{F}_{t}(w_{js}'\hat{\pi}_{SLS},\hat{v}_{js2})}{\partial\pi}\frac{\partial\hat{F}_{t}(w_{js}'\hat{\pi}_{SLS},\hat{v}_{js2})}{\partial\hat{v}_{js2}'}D_{js}\right]\left(\frac{1}{N}\sum_{l=1}^{N}\sum_{p=1}^{T}D_{lp}'D_{lp}\right)^{-1}\sum_{t=1}^{T}D_{it}'(-\hat{v}_{it2})+\sum_{t=1}^{T}\left[y_{it1}-\hat{F}_{t}(w_{it}'\hat{\pi}_{SLS},\hat{v}_{it2})\right]\frac{\partial\hat{F}_{t}(w_{it}'\hat{\pi}_{SLS},\hat{v}_{it2})}{\partial\pi}. \tag{3.45}
\]
Combining (3.40)-(3.45) yields the estimator for the VC matrix in (3.32).

³ See page 55 of Rothe (2009) for an example of the influence function based on a kernel regression.

3.5 Monte Carlo Simulation

To gain insights into the performance of the proposed estimator in finite samples, we conduct limited Monte Carlo experiments. We consider the following data generating process (DGP) with a single endogenous variable:
\[
y_{it1}=1[y_{it2}\beta_{1}+z_{it1}\beta_{2}-c_{i1}>u_{it1}], \tag{3.46}
\]
for $i=1,\dots,N$ and $t=1,\dots,T$. The endogenous variable, $y_{it2}$, is generated by
\[
y_{it2}=z_{it1}\gamma_{1}+z_{it2}\gamma_{2}+z_{it3}\gamma_{3}+c_{i2}+u_{it2}, \tag{3.47}
\]
where $z_{it}=(z_{it1},z_{it2},z_{it3})'$ are exogenous variables. Let $\bar{z}_{i}=T^{-1}\sum_{t=1}^{T}z_{it}$. The unobserved individual effects are generated as
\[
c_{i1}=\bar{z}_{i}'\xi_{1}+a_{i1}, \tag{3.48}
\]
\[
c_{i2}=\eta_{2}+\bar{z}_{i}'\xi_{2}+a_{i2}. \tag{3.49}
\]
Note that inserting (3.48) into (3.46) yields $y_{it1}=1[w_{it}'\pi-v_{it1}\geq 0]$, where $w_{it}=(y_{it2},z_{it1},\bar{z}_{i}')'$, $\pi=(\beta_{1},\beta_{2},-\xi_{1}')'$ and $v_{it1}=a_{i1}+u_{it1}$. Substituting (3.49) into (3.47) gives $y_{it2}=\eta_{2}+z_{it}'\gamma+\bar{z}_{i}'\xi_{2}+v_{it2}$, where $v_{it2}=a_{i2}+u_{it2}$.
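The DGP (3.46)-(3.49) can be generated in a few lines. The sketch below uses Gaussian (Design I) errors and the parameter values reported in the text; the function name and structure are ours, not the authors' replication code.

```python
import numpy as np

def simulate_design1(N, T, rho=0.5, sig_a2=0.25, sig_u2=0.75, seed=0):
    """Draw one sample from the DGP (3.46)-(3.49) with Gaussian (Design I)
    errors.  beta = (1, 1), gamma = (2/3, 2/3, 1/3), eta2 = 1 and
    xi1 = xi2 = (1/3, 1/3, 1/3), matching the values used in the text."""
    rng = np.random.default_rng(seed)
    # z_itj = b_ij + e_itj: Var(b_ij) = 1/4 with pairwise correlation 1/4
    Sigma_b = 0.1875 * np.eye(3) + 0.0625          # diagonal 1/4, off-diagonal 1/16
    b = rng.multivariate_normal(np.zeros(3), Sigma_b, size=N)
    z = b[:, None, :] + rng.normal(0.0, np.sqrt(0.75), size=(N, T, 3))
    zbar = z.mean(axis=1)
    xi = np.full(3, 1.0 / 3.0)
    # correlated individual effects and idiosyncratic errors
    a2 = rng.normal(0.0, np.sqrt(sig_a2), size=N)
    a1 = rho * a2 + rng.normal(0.0, (1.0 - rho) * np.sqrt(sig_a2), size=N)
    u2 = rng.normal(0.0, np.sqrt(sig_u2), size=(N, T))
    u1 = rho * u2 + rng.normal(0.0, (1.0 - rho) * np.sqrt(sig_u2), size=(N, T))
    c1 = zbar @ xi + a1                            # (3.48)
    c2 = 1.0 + zbar @ xi + a2                      # (3.49), eta2 = 1
    gamma = np.array([2/3, 2/3, 1/3])
    y2 = z @ gamma + c2[:, None] + u2              # (3.47)
    y1 = (y2 + z[:, :, 0] - c1[:, None] > u1).astype(float)   # (3.46), beta1 = beta2 = 1
    return y1, y2, z
```

Repeating this draw $R$ times and applying each estimator yields the Monte Carlo distributions summarized in Tables 3.1 and 3.2.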
Following Semykina and Wooldridge (2018), the exogenous variables $z_{itj}$ are generated by $z_{itj}=b_{ij}+e_{itj}$, for $j=1,2,3$, where the $b_{ij}$ are independent across $i$ and distributed as $N(0,1/4)$ with $\mathrm{Corr}(b_{ij},b_{ij'})=1/4$ for $j'=1,2,3$ and $j'\neq j$; the $e_{itj}$ are independent across $i$ and $t$ and distributed as $N(0,3/4)$. The unobserved effects are generated as $a_{i1}=\rho a_{i2}+e_{i}$, where $a_{i2}\sim IIDN(0,\sigma_{a}^{2})$ and $e_{i}\sim IIDN(0,(1-\rho)^{2}\sigma_{a}^{2})$. The idiosyncratic errors are generated as $u_{it1}=\rho u_{it2}+e_{it}$. We consider the following designs for $u_{it2}$ and $e_{it}$:

• Design I (Gaussian errors): For all $i$ and $t$,
\[
u_{it2}\sim IIDN(0,\sigma_{u}^{2}),\qquad e_{it}\sim IIDN(0,(1-\rho)^{2}\sigma_{u}^{2}). \tag{3.50}
\]
• Design II (Non-Gaussian errors): For all $i$ and $t$,
\[
u_{it2}\sim IID\left\{\sigma_{u}\left(\chi^{2}(1)-1\right)/\sqrt{2}\right\},\qquad e_{it}\sim IID\left\{\sigma_{e}\left(\chi^{2}(1)-1\right)/\sqrt{2}\right\}, \tag{3.51}
\]
where $\chi^{2}(1)$ denotes a chi-square random variable with one degree of freedom.

The true parameter values are set to $\beta_{1}=1$, $\beta_{2}=1$, $\gamma=(\gamma_{1},\gamma_{2},\gamma_{3})'=(2/3,2/3,1/3)'$, $\eta_{2}=1$, $\xi_{1}=(1/3,1/3,1/3)'$, $\xi_{2}=(1/3,1/3,1/3)'$, $\rho=0.5$, $\sigma_{a}^{2}=1/4$, $\sigma_{u}^{2}=3/4$, and $\sigma_{e}^{2}=5$. We focus on the estimated relative effect, $\hat{\beta}_{2}/\hat{\beta}_{1}$. In computing the SLS estimator, we treat the bandwidths, $h_{1}$ and $h_{2}$, as additional parameters of the objective function as in Rothe (2009), and perform the following minimization with respect to $\pi$, $h_{1}$ and $h_{2}$ jointly:
\[
(\hat{\pi}_{SLS}',\hat{h}_{1},\hat{h}_{2})'=\arg\min_{(\pi',h_{1},h_{2})'}\frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{T}\left[y_{it1}-\hat{F}_{t}(w_{it}'\pi,\hat{v}_{it2})\right]^{2}, \tag{3.52}
\]
where
\[
\hat{F}_{t}(w_{it}'\pi,\hat{v}_{it2})=\frac{\sum_{j\neq i}k_{1}\!\left(\frac{w_{it}'\pi-w_{jt}'\pi}{h_{1}}\right)k_{2}\!\left(\frac{\hat{v}_{it2}-\hat{v}_{jt2}}{h_{2}}\right)y_{jt1}}{\sum_{j\neq i}k_{1}\!\left(\frac{w_{it}'\pi-w_{jt}'\pi}{h_{1}}\right)k_{2}\!\left(\frac{\hat{v}_{it2}-\hat{v}_{jt2}}{h_{2}}\right)},\qquad t=1,\dots,T,
\]
and $\hat{v}_{it2}=y_{it2}-\hat{\eta}_{2}-z_{it}'\hat{\gamma}-\bar{z}_{i}'\hat{\xi}_{2}$, for $i=1,\dots,N$, is the residual from the initial pooled OLS estimation of the reduced form model for $y_{it2}$. After obtaining $\hat{v}_{it2}$, but prior to solving the minimization problem, all regressors in $w_{it}$ and $\hat{v}_{it2}$ were orthogonalized by the Cholesky transformation.
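The Cholesky orthogonalization just mentioned (detailed in footnote 4) can be sketched as follows. This is our reading of the transformation, written as hypothetical helpers: with $\Sigma_{x}=LL'$, the transformed regressors $L^{-1}x_{it}$ have identity sample covariance, and estimates are mapped back through $(L')^{-1}$.

```python
import numpy as np

def whiten(X):
    """Orthogonalize the stacked regressors x_it = (w_it', v_hat_it2)' before
    the SLS minimization: with sample covariance Sigma_x = L L' (Cholesky),
    the transformed rows L^{-1} x_it have identity sample covariance, which
    makes the joint search over coefficients and bandwidths better conditioned."""
    L = np.linalg.cholesky(np.cov(X, rowvar=False))
    return np.linalg.solve(L, X.T).T, L            # rows are L^{-1} x_it

def unwhiten(pi_tilde, L):
    """Map estimates back to the original scale: the index satisfies
    x'pi = x_tilde'(L'pi), so the original coefficients are pi = (L')^{-1} pi_tilde."""
    return np.linalg.solve(L.T, pi_tilde)
```

Because the index $x_{it}'\pi$ is preserved under the transformation, the criterion value is unchanged; only the parameterization of the search space differs.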
The estimated parameters are then recovered by the reverse transformation after the minimization.⁴ Second-order Gaussian kernels were used in the computation.⁵

For comparison, we computed the Two-step Probit estimator and the Two-step 2SLS estimator for a linear probability model, where the endogeneity was corrected using a control function approach and the unobserved heterogeneity was modeled using the Mundlak device.

We consider combinations of $N=250$, $500$, $1{,}000$, and $T=3$, $5$, $10$. The number of replications is $R=1{,}000$ for each experiment. The estimation results for the relative effects, $\hat{\beta}_{2}/\hat{\beta}_{1}$, are presented in Tables 3.1 and 3.2. For each estimator of $\beta_{2}/\beta_{1}$, we report the bias, standard deviation (SD), root mean squared error (RMSE), median absolute deviation (MAD) and interquartile range (IQR). In addition, we compute the bootstrap standard errors averaged across the $1{,}000$ Monte Carlo samples. The number of bootstrap replications is $B=200$ for the Probit and 2SLS estimators, and is set to $B=100$ for the SLS estimator due to its high computational intensity. Analytical standard errors of the SLS estimator, averaged over all replications, are reported in the last column of Tables 3.1 and 3.2.

Several patterns emerge. When the errors have a Gaussian distribution (Table 3.1), the parametric estimators tend to have smaller bias and less dispersion. However, as the sample size grows, the advantages of the parametric methods diminish. For all $N$ and $T$, the Two-step Probit estimator has the smallest RMSE, thanks to both less bias and more precision. The latter finding is as expected, because under Design I the Probit model is correctly specified. On the other hand, the proposed SLS has quite comparable performance.

When the error distribution is not Gaussian (Table 3.2), the SLS estimator tends to have a larger bias when $N=250$. However, both the bias and the dispersion of the SLS estimator decrease with $N$. As a result, for each $T$, it has the smallest bias, RMSE, MAD, and IQR when $N=1{,}000$. Note that the bootstrap standard errors behave well in all experiments (both tables). The analytical standard errors are reasonably close to the standard deviations for $N=1{,}000$, but are rather inaccurate for smaller $N$. Hence, in empirical applications it is better to use bootstrap standard errors when performing the SLS estimation.

⁴ Specifically, let $x_{it}=(w_{it}',\hat{v}_{it2})'$ and $\Sigma_{x}$ be the sample covariance matrix of $x_{it}$. By the Cholesky decomposition we have $\Sigma_{x}=LL'$, where $L$ is a lower triangular matrix. Then we use $\tilde{x}_{it}=L^{-1}x_{it}$ in the optimization given by (3.52) and obtain a vector of estimated parameters $\tilde{\pi}$. Finally, the SLS estimates, $\hat{\pi}$, are recovered from all but the last of the parameters in $(L')^{-1}(\tilde{\pi}',0)'$.

⁵ We found no notable effect of trimming on the performance of the estimator, and hence, in accord with most of the related literature, no trimming was used in producing the following results.

3.6 Empirical Application

As an empirical application, we study the labor force participation decisions of married women using the Panel Study of Income Dynamics (PSID) data, years 1982-1984. The sample includes 742 white women, ages 20-57, who were married in all three years. Thus, it is a balanced three-year panel. We consider the following model:
\[
inlf_{it}=1[\beta_{1}nwfinc_{it}+\beta_{2}age_{it}+\beta_{3}educ_{it}+\beta_{4}kids_{it}+\alpha_{t}-c_{i1}>u_{it1}], \tag{3.53}
\]
for $i=1,\dots,N$ and $t=1,\dots,T$, where $inlf_{it}$ is an indicator equal to one if woman $i$ reported positive work hours in year $t$; $nwfinc$ is non-wife income, defined as the difference between total family income and the woman's earnings (measured in thousands of dollars); $age$ and $educ$ are the woman's age and years of schooling, respectively; $kids$ is the number of children less than six years old; $\alpha_{t}$ is a year-specific intercept; and $c_{i1}$ is the unobserved individual effect.
A significant portion of the non-wife income is the husband's earnings. As long as the spouses' labor market decisions are determined simultaneously, spousal earnings and, therefore, non-wife income are potentially endogenous. The reduced form equation is modeled as
\[
nwfinc_{it}=\gamma_{1}age_{it}+\gamma_{2}educ_{it}+\gamma_{3}kids_{it}+\gamma_{4}hage_{it}+\gamma_{5}heduc_{it}+\eta_{t}+c_{i2}+u_{it2}, \tag{3.54}
\]
where $hage$ and $heduc$ are the husband's age and years of education, respectively. The instruments ($hage$ and $heduc$) are expected to affect non-wife income through their impact on the husband's earnings, but should have no partial effect on the woman's labor force participation decision. We assume all explanatory variables except $nwfinc$ are strictly exogenous conditional on the unobserved effect. Table 3.3 reports the descriptive statistics of the employed variables.

To account for a non-zero correlation between the unobserved effect and the covariates, the time means of the exogenous variables were included in both (3.53) and (3.54). The time means of $educ_{it}$ and $heduc_{it}$ were not included because they have no variation over time. This means that the effect of education cannot be separated from the impact of the unobserved heterogeneity and should be interpreted with caution. However, it does not affect the reliability of the estimated effects of the other variables, because $educ_{it}$ and $heduc_{it}$ serve as sufficient controls: they capture both the direct effects and the impact of the unobserved heterogeneity. We estimate the model parameters using the SLS and the instrumental variables probit estimator (IV-Probit), which estimates both equations jointly by full MLE. For comparison, we also report results from a usual probit regression and a semiparametric least squares estimation that does not account for endogeneity (SLS-exog.). Because semiparametric methods can only estimate relative effects, we focus on discussing coefficient ratios.
Moreover, only bootstrap standard errors are computed for the SLS and SLS-exog., as they behaved better in the simulations. Table 3.4 summarizes the estimation results. One notable finding is that the estimates of $\hat{\beta}_{nwinc}/\hat{\beta}_{age}$ are very similar for the IV-Probit and SLS. In contrast, the estimators that do not account for endogeneity produce much smaller relative effects of the non-wife income. Similarly, the semiparametric estimates of the coefficient ratios for education and the number of children are slightly smaller when not correcting for endogeneity.

Finally, we are interested in how the propensity of working changes as non-wife income changes while fixing all the other explanatory variables at their sample means. To this end, we estimate the average structural function (ASF) derived from the SLS estimation following (3.28), where $F_{t}$ is estimated by the Nadaraya-Watson estimator with bandwidths selected by leave-one-out least-squares cross-validation. Figures 3.1-3.3 present the estimated ASF for each year of our panel, respectively, and Figure 3.4 displays the ASF averaged over all three years. In each figure, the ASF is plotted over the 5-95% range of the non-wife income distribution observed in our sample ($7,721-$68,200). To examine the effect of correcting for endogeneity on the estimated marginal probability, we also estimate the ASF derived from the SLS-exog. estimation. As can be seen from the graphs, the probability of working decreases monotonically and significantly with non-wife income for all years. When endogeneity is accounted for, the probability of working drops from approximately 0.85 to 0.4, with a slower (faster) decline at lower (higher) non-wife income levels. In contrast, if the endogeneity of non-wife income is neglected, we tend to underestimate (overestimate) the probability of working when non-wife income is low (high).
In all but the third year of our panel, ignoring endogeneity results in a much flatter and almost linear estimate of the ASF, with the estimated probability of working lying in a narrower range of 0.8 to 0.6. These results exemplify the bias in the estimated response probability due to the failure to control for endogeneity.

3.7 Concluding Remarks

In this paper, we considered a semiparametric least squares method for estimating the parameters of binary response panel data models with endogenous variables. The method permits a non-zero correlation between the covariates and the unobserved time-invariant effect, and it accounts for endogeneity caused by correlation with an idiosyncratic error. We show that the estimator is $\sqrt{N}$-consistent and asymptotically normal. The results of Monte Carlo experiments indicate that the estimator performs well in finite samples. When the error distribution is not normal, the SLS estimator has a smaller bias and less variance than a parametric IV-Probit estimator.

An important advantage of the proposed estimator is that it can be used for estimating panel data models in which time dependence is present. Moreover, the employed least squares method is less computationally intensive than other existing approaches. Future research could consider extending the method to models with cross sectional dependence.
Table 3.1: Small sample properties of the estimators of $\beta_2/\beta_1$ for Design I (Gaussian errors)

                      Bias      SD      RMSE     MAD     IQR   Bootstrap SE  Average SE
T = 3, N = 250
  SLS               0.0580  0.4360  0.4397  0.2330  0.4759  0.4235        0.2270
  Two-step Probit   0.0273  0.2688  0.2701  0.1673  0.3383  0.2499
  Two-step 2SLS     0.0366  0.3394  0.3412  0.1959  0.3967  0.3130
T = 3, N = 500
  SLS               0.0436  0.2480  0.2516  0.1569  0.3209  0.2328        0.1796
  Two-step Probit   0.0286  0.1887  0.1908  0.1274  0.2535  0.1745
  Two-step 2SLS     0.0365  0.2267  0.2295  0.1440  0.2879  0.2164
T = 3, N = 1,000
  SLS               0.0104  0.1506  0.1509  0.0995  0.2017  0.1438        0.1335
  Two-step Probit   0.0057  0.1199  0.1200  0.0796  0.1619  0.1194
  Two-step 2SLS     0.0060  0.1494  0.1495  0.0990  0.1986  0.1464
T = 5, N = 250
  SLS               0.0141  0.2266  0.2269  0.1420  0.2937  0.2307        0.1933
  Two-step Probit   0.0127  0.1699  0.1703  0.1163  0.2324  0.1688
  Two-step 2SLS     0.0175  0.2098  0.2105  0.1354  0.2813  0.2082
T = 5, N = 500
  SLS               0.0141  0.1622  0.1627  0.1077  0.2145  0.1480        0.1485
  Two-step Probit   0.0119  0.1242  0.1247  0.0748  0.1500  0.1191
  Two-step 2SLS     0.0177  0.1478  0.1488  0.0988  0.2009  0.1459
T = 5, N = 1,000
  SLS               0.0044  0.0993  0.0994  0.0637  0.1277  0.0991        0.1106
  Two-step Probit   0.0038  0.0821  0.0822  0.0514  0.1039  0.0829
  Two-step 2SLS     0.0026  0.1000  0.1000  0.0655  0.1305  0.1007
T = 10, N = 250
  SLS               0.0154  0.1448  0.1455  0.0908  0.1824  0.1419        0.1902
  Two-step Probit   0.0116  0.1145  0.1150  0.0738  0.1453  0.1105
  Two-step 2SLS     0.0153  0.1334  0.1342  0.0881  0.1762  0.1346
T = 10, N = 500
  SLS               0.0064  0.0986  0.0988  0.0673  0.1368  0.0952        0.1448
  Two-step Probit   0.0045  0.0795  0.0796  0.0526  0.1048  0.0776
  Two-step 2SLS     0.0050  0.0960  0.0961  0.0650  0.1303  0.0940
T = 10, N = 1,000
  SLS              -0.0029  0.0670  0.0671  0.0446  0.0875  0.0649        0.1086
  Two-step Probit  -0.0030  0.0533  0.0534  0.0343  0.0683  0.0544
  Two-step 2SLS    -0.0020  0.0653  0.0653  0.0427  0.0840  0.0658

Notes: The DGP is given by (3.46), where the errors follow Gaussian distributions in Design I. The true value of $\beta_2/\beta_1$ is 1.
The bootstrap standard errors of the Probit and 2SLS estimators are computed using B = 200 replications per sample and averaged across R = 1,000 Monte Carlo samples. The number of bootstrap replications for the SLS estimator is B = 100.

Table 3.2: Small sample properties of the estimators of $\beta_2/\beta_1$ for Design II (Non-Gaussian errors)

                      Bias      SD      RMSE     MAD     IQR   Bootstrap SE  Average SE
T = 3, N = 250
  SLS               0.1583  0.7957  0.8109  0.2970  0.6329  0.9491        0.3487
  Two-step Probit   0.1060  0.5177  0.5282  0.2853  0.5880  0.4994
  Two-step 2SLS     0.1086  0.5464  0.5568  0.2796  0.5840  0.5042
T = 3, N = 500
  SLS               0.0334  0.3339  0.3354  0.1977  0.3962  0.3666        0.2279
  Two-step Probit   0.0410  0.3394  0.3417  0.2056  0.4124  0.3203
  Two-step 2SLS     0.0405  0.3376  0.3398  0.1975  0.3989  0.3200
T = 3, N = 1,000
  SLS               0.0100  0.2104  0.2106  0.1308  0.2642  0.2129        0.1646
  Two-step Probit   0.0193  0.2215  0.2222  0.1475  0.2929  0.2172
  Two-step 2SLS     0.0166  0.2207  0.2212  0.1435  0.2881  0.2159
T = 5, N = 250
  SLS               0.0378  0.3322  0.3342  0.2017  0.4112  0.3945        0.2310
  Two-step Probit   0.0363  0.3191  0.3210  0.2159  0.4280  0.3119
  Two-step 2SLS     0.0339  0.3192  0.3208  0.2037  0.4144  0.3114
T = 5, N = 500
  SLS               0.0219  0.2173  0.2183  0.1471  0.2981  0.2289        0.1704
  Two-step Probit   0.0257  0.2240  0.2253  0.1436  0.2857  0.2166
  Two-step 2SLS     0.0263  0.2209  0.2224  0.1401  0.2870  0.2164
T = 5, N = 1,000
  SLS               0.0151  0.1477  0.1484  0.0951  0.1946  0.1470        0.1236
  Two-step Probit   0.0164  0.1538  0.1546  0.0970  0.1949  0.1502
  Two-step 2SLS     0.0159  0.1523  0.1530  0.0975  0.1964  0.1495
T = 10, N = 250
  SLS               0.0179  0.2088  0.2094  0.1275  0.2636  0.2305        0.1744
  Two-step Probit   0.0193  0.2032  0.2040  0.1329  0.2653  0.2004
  Two-step 2SLS     0.0196  0.2037  0.2046  0.1333  0.2694  0.1998
T = 10, N = 500
  SLS               0.0088  0.1359  0.1361  0.0922  0.1851  0.1448        0.1293
  Two-step Probit   0.0062  0.1405  0.1405  0.0918  0.1851  0.1397
  Two-step 2SLS     0.0067  0.1388  0.1389  0.0903  0.1810  0.1392
T = 10, N = 1,000
  SLS               0.0074  0.0949  0.0952  0.0648  0.1307  0.0946        0.0973
  Two-step Probit   0.0083  0.0978  0.0981  0.0673  0.1359  0.0986
  Two-step 2SLS     0.0087  0.0984  0.0987  0.0675  0.1335  0.0981

Notes: The DGP is given by
(3.46), where the errors follow chi-square distributions in Design II. The true value of $\beta_2/\beta_1$ is 1. The bootstrap standard errors of the Probit and 2SLS estimators are computed using B = 200 replications per sample and averaged across R = 1,000 Monte Carlo samples. The number of bootstrap replications for the SLS estimator is B = 100.

Table 3.3: Descriptive statistics

Variable                      Mean      Std. Dev.   Min        Max
Labor force participation     .6999     .4584       0          1
Nonwife income in $1,000      32.9073   25.0175     -14.9979   397.55
Age                           37.4061   9.4811      20         57
Educ                          12.7008   2.2018      2          17
Number of kids                .5076     .7382       0          3
Husband's age                 40.0189   10.3895     20         76
Husband's education           12.9811   2.7409      3          17

Notes: Sample size is N = 742 and T = 3. Labor force participation is an indicator equal to one if the woman worked (hours > 0) in a given year. Age and education are measured in years.

Table 3.4: Estimation results of labor force participation for married women

Independent Variable           Reduced form   Probit-exog.   IVProbit    SLS-exog.   SLS
                               for nwinc
                               (1)            (2)            (3)         (4)         (5)
Nonwife income                                -0.011***      -0.021***
                                              (0.003)        (0.008)
Age                            0.351          0.086          0.088
                               (0.669)        (0.055)        (0.054)
Education                      1.120**        0.147***       0.154***
                               (0.480)        (0.027)        (0.029)
Number of kids                 -0.410         -0.205***      -0.203***
                               (0.926)        (.078)         (.077)
Husband's age                  0.068
                               (0.812)
Husband's education            2.413***
                               (0.387)
$\hat\beta_{nwinc}/\hat\beta_{age}$           -0.128         -0.240      -0.082      -0.239*
                                              (0.085)        (0.168)
                                              [0.087]        [0.171]     [0.133]     [0.133]
$\hat\beta_{educ}/\hat\beta_{age}$            1.696          1.756       0.918       1.307*
                                              (1.100)        (1.114)
                                              [1.075]        [1.101]     [0.635]     [0.721]
$\hat\beta_{kids}/\hat\beta_{age}$            -2.370         -2.318      -1.960***   -2.502**
                                              (1.807)        (1.742)
                                              [1.771]        [1.719]     [0.655]     [1.188]
Number of observations         N = 742, T = 3

Notes: The main model and reduced form equation are specified in (3.53) and (3.54), respectively. All estimations include year indicators and time means of exogenous regressors. Time means of education variables were excluded due to no variation over time.
Analytical clustered standard errors are in parentheses, and panel bootstrapped standard errors based on 1,000 replications are in brackets. *, **, *** indicate significance at the 10, 5, and 1 percent levels, respectively.

Figure 3.1: Average structural function for the first year of our panel
[Figure: estimated probability of working plotted against non-wife income (in thousands of dollars), with one curve corrected for endogeneity and one not corrected for endogeneity]

Figure 3.2: Average structural function for the second year of our panel
[Figure: same axes and curves as Figure 3.1]

Figure 3.3: Average structural function for the third year of our panel
[Figure: same axes and curves as Figure 3.1]

Figure 3.4: Average structural function averaged over all three years of our panel
[Figure: same axes and curves as Figure 3.1]

References

Ahn, Seung C and Alex R Horenstein (2013) “Eigenvalue ratio test for the number of factors,” Econometrica, 81 (3), 1203–1227.

Alessi, Lucia, Matteo Barigozzi, and Marco Capasso (2010) “Improved penalization for determining the number of factors in approximate factor models,” Statistics & Probability Letters, 80 (23-24), 1806–1813.

Bai, Jushan (1997) “Estimation of a change point in multiple regression models,” Review of Economics and Statistics, 79 (4), 551–563.

Bai, Jushan (2009) “Panel data models with interactive fixed effects,” Econometrica, 77 (4), 1229–1279.

Bai, Jushan (2013) “Fixed-effects dynamic panel models, a factor analytical method,” Econometrica, 81 (1), 285–314.
Bai, Jushan and Serena Ng (2002) “Determining the number of factors in approximate factor models,” Econometrica, 70 (1), 191–221.

Beaudry, Paul and Gary Koop (1993) “Do recessions permanently change output?” Journal of Monetary Economics, 31 (2), 149–163.

Bloom, Nicholas, Mark Schankerman, and John Van Reenen (2013) “Identifying technology spillovers and product market rivalry,” Econometrica, 81 (4), 1347–1393.

Blundell, Richard W and James L Powell (2004) “Endogeneity in semiparametric binary response models,” The Review of Economic Studies, 71 (3), 655–679.

Borowiak, Dale (2001) “Linear models: least squares and alternatives.”

Burda, Martin and Matthew Harding (2013) “Panel probit with flexible correlated effects: quantifying technology spillovers in the presence of latent heterogeneity,” Journal of Applied Econometrics, 28 (6), 956–981.

Caner, Mehmet and Bruce E Hansen (2004) “Instrumental variable estimation of a threshold model,” Econometric Theory, 813–843.

Card, David, Alexandre Mas, and Jesse Rothstein (2008) “Tipping and the Dynamics of Segregation,” The Quarterly Journal of Economics, 123 (1), 177–218.

Chamberlain, Gary (1980) “Analysis of covariance with qualitative data,” The Review of Economic Studies, 47 (1), 225–238.

Chamberlain, Gary (2010) “Binary response models for panel data: Identification and information,” Econometrica, 78 (1), 159–168.

Chan, Kung-Sik et al. (1993) “Consistency and limiting distribution of the least squares estimator of a threshold autoregressive model,” The Annals of Statistics, 21 (1), 520–533.

Chen, Xiaohong, Oliver Linton, and Ingrid Van Keilegom (2003) “Estimation of semiparametric models when the criterion function is not smooth,” Econometrica, 71 (5), 1591–1608.
Cheng, Xu, Frank Schorfheide, and Peng Shao (2019) “Clustering for Multi-Dimensional Heterogeneity.”

Chudik, Alexander, Kamiar Mohaddes, M Hashem Pesaran, Mehdi Raissi, and Alessandro Rebucci (2020) “A counterfactual economic analysis of Covid-19 using a threshold augmented multi-country model,” Technical report, National Bureau of Economic Research.

Clements, Michael P and Hans-Martin Krolzig (1998) “A comparison of the forecast performance of Markov-switching and threshold autoregressive models of US GNP,” The Econometrics Journal, 1 (1), 47–75.

Gonzalez, A, Timo Teräsvirta, Dick Van Dijk, and Yukai Yang (2017) Panel smooth transition regression models.

González, Andrés, Timo Teräsvirta, and Dick van Dijk (2004) “Panel smooth transition regression model and an application to investment under credit constraints,” Unpublished manuscript, Stockholm School of Economics.

Gonzalo, Jesús and Jean-Yves Pitarakis (2002) “Estimation and model selection based inference in single and multiple threshold models,” Journal of Econometrics, 110 (2), 319–352.

Hahn, Jinyong and Hyungsik Roger Moon (2010) “Panel data models with finite number of multiple equilibria,” Econometric Theory, 863–881.

Hansen, Bruce E (1996) “Inference when a nuisance parameter is not identified under the null hypothesis,” Econometrica, 413–430.

Hansen, Bruce E (1999) “Threshold effects in non-dynamic panels: Estimation, testing, and inference,” Journal of Econometrics, 93 (2), 345–368.

Hansen, Bruce E (2000) “Sample splitting and threshold estimation,” Econometrica, 68 (3), 575–603.

Hansen, Bruce E (2008) “Uniform convergence rates for kernel estimation with dependent data,” Econometric Theory, 726–748.

Hansen, Bruce E (2011) “Threshold autoregression in economics,” Statistics and its Interface, 4 (2), 123–127.

Hsiao, Cheng and Qiankun Zhou (2018) “Incidental parameters, initial conditions and sample size in statistical inference for dynamic panel data models,” Journal of Econometrics, 207 (1), 114–128.
(2019) “Panel parametric, semiparametric, and nonparametric construction of counterfactuals,” Journal of Applied Econometrics, 34 (4), 463–481.
Ichimura, Hidehiko (1993) “Semiparametric least squares (SLS) and weighted SLS estimation of single-index models,” Journal of Econometrics, 58 (1-2), 71–120.
Jiang, Bin, Yanrong Yang, Jiti Gao, and Cheng Hsiao (2020) “Recursive estimation in large panel data models: Theory and practice,” Journal of Econometrics.
Kasahara, Hiroyuki and Katsumi Shimotsu (2009) “Nonparametric identification of finite mixture models of dynamic discrete choices,” Econometrica, 77 (1), 135–175.
Kim, Young-Joo, Myung Hwan Seo, and Hyun-E Yeom (2020) “Estimating a breakpoint in the pattern of spread of COVID-19 in South Korea,” International Journal of Infectious Diseases, 97, 360–364.
Klein, Roger W. and Richard H. Spady (1993) “An efficient semiparametric estimator for binary response models,” Econometrica, 387–421.
Kourtellos, Andros, Thanasis Stengos, and Chih Ming Tan (2014) “Structural threshold regression,” Econometric Theory, forthcoming.
Liao, Qin, Peter C. B. Phillips, and Ping Yu (2015) “Inferences and Specification Testing in Threshold Regression with Endogeneity,” Technical report, mimeo, HKU.
Linton, Oliver, Stefan Sperlich, and Ingrid Van Keilegom (2008) “Estimation of a semiparametric transformation model,” Annals of Statistics, 36 (2), 686–718.
Lu, Xun and Liangjun Su (2016) “Shrinkage estimation of dynamic panel data models with interactive fixed effects,” Journal of Econometrics, 190 (1), 148–175.
Manski, Charles F. (1985) “Semiparametric analysis of discrete response: Asymptotic properties of the maximum score estimator,” Journal of Econometrics, 27 (3), 313–333.
(1988) “Identification of binary response models,” Journal of the American Statistical Association, 83 (403), 729–738.
Moon, Hyungsik Roger and Martin Weidner (2015) “Linear regression for panel with unknown number of factors as interactive fixed effects,” Econometrica, 83 (4), 1543–1579.
(2017) “Dynamic linear panel regression models with interactive fixed effects,” Econometric Theory, 33 (1), 158–195.
(2018) “Nuclear norm regularized estimation of panel regression models,” arXiv preprint arXiv:1810.10987.
Mundlak, Yair (1978) “On the pooling of time series and cross section data,” Econometrica, 69–85.
Newey, K. W. (1994) “Large sample estimation and hypothesis.”
Newey, Whitney K. (1994) “Kernel estimation of partial means and a general variance estimator,” Econometric Theory, 233–253.
Pesaran, M. Hashem (2006) “Estimation and inference in large heterogeneous panels with a multifactor error structure,” Econometrica, 74 (4), 967–1012.
Ramírez-Rondán, N. R. (2020) “Maximum likelihood estimation of dynamic panel threshold models,” Econometric Reviews, 39 (3), 260–276.
Rivers, Douglas and Quang H. Vuong (1988) “Limited information estimators and exogeneity tests for simultaneous probit models,” Journal of Econometrics, 39 (3), 347–366.
Rothe, Christoph (2009) “Semiparametric estimation of binary response models with endogenous regressors,” Journal of Econometrics, 153 (1), 51–64.
Semykina, Anastasia and Jeffrey M. Wooldridge (2018) “Binary response panel data models with sample selection and self-selection,” Journal of Applied Econometrics, 33 (2), 179–197.
Seo, Myung Hwan and Yongcheol Shin (2016) “Dynamic panels with threshold effect and endogeneity,” Journal of Econometrics, 195 (2), 169–186.
Smith, Richard J. and Richard W. Blundell (1986) “An exogeneity test for a simultaneous equation Tobit model with an application to labor supply,” Econometrica, 679–685.
Song, Wei (2017) “Least square estimation of semiparametric binary response model with endogeneity.”
Sun, Yixiao (2005) “Estimation and inference in panel structure models,” Available at SSRN 794884.
Tong, Howell (1978) “On a threshold model.”
Vaart, A. W. van der and Jon A. Wellner (1997) “Weak convergence and empirical processes with applications to statistics,” Journal of the Royal Statistical Society, Series A, 160 (3), 596–608.
Wooldridge, Jeffrey M. (2015) “Control function methods in applied econometrics,” Journal of Human Resources, 50 (2), 420–445.
Xu, Yiqing (2017) “Generalized synthetic control method: Causal inference with interactive fixed effects models,” Political Analysis, 25 (1), 57–76.
Yanai, Haruo, Kei Takeuchi, and Yoshio Takane (2011) “Projection matrices,” in Projection Matrices, Generalized Inverse Matrices, and Singular Value Decomposition, 25–54: Springer.
Yu, Ping (2013) “Inconsistency of 2SLS estimators in threshold regression with endogeneity,” Economics Letters, 120 (3), 532–536.
Yu, Ping, Qin Liao, and Peter C. B. Phillips (2019) “New Control Function Approaches in Threshold Regression with Endogeneity,” Technical report, mimeo.
Yu, Ping and Peter C. B. Phillips (2018) “Threshold regression with endogeneity,” Journal of Econometrics, 203 (1), 50–68.

Appendices

A Appendix to Chapter 1

Proof of Proposition 1.3.1. Consider model (1.16). Multiplying both sides of the equation by $M_1$ gives
\[
M_1Y=M_1X_1\beta_1+M_1X_2\beta_2+M_1X_1(\gamma_1)\delta_1+M_1X_2(\gamma_2)\delta_2+M_1U.
\]
Since $M_1=I_{NT}-P_1(P_1'P_1)^{-1}P_1'$, where $P_1$ spans the linear space containing $X_2$ and $X_2(\gamma_2)$, this filters model (1.16) down to
\[
M_1Y=M_1X_1\beta_1+M_1X_1(\gamma_1)\delta_1+M_1U,
\]
which is
\[
\tilde Y_1=\tilde X_1\beta_1+\tilde X_1(\gamma_1)\delta_1+\tilde U_1,
\]
where $\tilde U_1=M_1U$.
It is obvious that minimizing $S_N^1(\gamma)$ is equivalent to minimizing
\[
S_N^1(\gamma)-\frac{\tilde U_1^{0\prime}\tilde U_1^0}{N}
=\frac{1}{N}\tilde Y_1'\left(I_{NT}-P_1(\gamma)\right)\tilde Y_1-\frac{1}{N}\tilde U_1^{0\prime}\tilde U_1^0
=\frac{1}{N}\delta_1^{0\prime}\tilde X_1(\gamma_1^0)'\left(I_{NT}-P_1(\gamma)\right)\tilde X_1(\gamma_1^0)\delta_1^0+\frac{2}{N}\delta_1^{0\prime}\tilde X_1(\gamma_1^0)'\tilde U_1^0
=\delta_1^{0\prime}R_1(\gamma)\delta_1^0+o_p(1),
\]
where $R_1(\gamma)=\frac{1}{N}\tilde X_1(\gamma_1^0)'\left(I_{NT}-P_1(\gamma)\right)\tilde X_1(\gamma_1^0)$. Consider the case $\gamma\in[\gamma_1^0,\bar\gamma]$; then
\[
R_1(\gamma)=\frac{1}{N}\tilde X_1(\gamma_1^0)'\tilde X_1(\gamma_1^0)-\frac{\tilde X_1(\gamma_1^0)'\tilde X_1(\gamma)}{N}\left(\frac{\tilde X_1(\gamma)'\tilde X_1(\gamma)}{N}\right)^{-1}\frac{\tilde X_1(\gamma)'\tilde X_1(\gamma_1^0)}{N}
\xrightarrow{p}D_1(\gamma_1^0)-D_1(\gamma_1^0)D_1(\gamma)^{-1}D_1(\gamma_1^0).
\]
Taking the derivative of $D_1(\gamma)^{-1}$ with respect to $\gamma$, we have
\[
\frac{dD_1(\gamma)^{-1}}{d\gamma}=-D_1(\gamma)^{-1}\frac{dD_1(\gamma)}{d\gamma}D_1(\gamma)^{-1},
\qquad\text{where}\qquad
\frac{dD_1(\gamma)}{d\gamma}=\sum_{i=1}^N\sum_{t=1}^TE\left(\tilde x_{it}^1\tilde x_{it}^{1\prime}\mid q_{it}=\gamma\right)f_{it}(\gamma),
\]
which is positive definite. Therefore, for $\gamma\in[\gamma_1^0,\bar\gamma]$ the limit of the function $R_1(\gamma)$ is increasing in $\gamma$. Similarly, the limit of $R_1(\gamma)$ is decreasing in $\gamma$ when $\gamma\in[\underline\gamma,\gamma_1^0)$. Therefore $\gamma_1^0$ minimizes the limit of the criterion function. Since $\hat\gamma_1$ is the minimizer of $S_N^1(\gamma)$, it follows from Newey (1994) that $\hat\gamma_1\xrightarrow{p}\gamma_1^0$. By the symmetry of our model, the convergence of $\hat\gamma_2$ follows in the same way. With the consistency of the threshold parameters established, the consistency of the coefficients follows directly, since the remaining part is just a linear model.

Proof of Proposition 1.3.1 (convergence rate). We give the proof for the convergence rate of $\hat\gamma_1$; the procedure for $\hat\gamma_2$ is the same. The convergence rate of the threshold estimator is proved by showing that
\[
S_N^1(\gamma)-S_N^1(\gamma_1^0)>0\quad\text{if }\gamma\in\left[\underline\gamma,\gamma^0-\tfrac{B}{N}\right]\cup\left[\gamma^0+\tfrac{B}{N},\bar\gamma\right],
\]
where $B$ can be any positive constant. This is equivalent to $\gamma\in(\gamma^0-B/N,\gamma^0+B/N)$ if $S_N^1(\gamma)-S_N^1(\gamma_1^0)<0$, which holds by construction of our estimator.
To show that this holds, suppose $\gamma<\gamma_1^0$. We can decompose $S_N^1(\gamma)-S_N^1(\gamma_1^0)$ as
\[
S_N^1(\gamma)-S_N^1(\gamma_1^0)=\left(S_N^1(\gamma)-S_N^1(\gamma,\gamma_1^0)\right)-\left(S_N^1(\gamma_1^0)-S_N^1(\gamma,\gamma_1^0)\right), \tag{A.1}
\]
where $S_N^1(\gamma,\gamma_1^0)$ is the concentrated sum of squared errors function from the specification
\[
\tilde Y_1=\tilde X_1(\gamma)\rho_1+\tilde X_1(\gamma,\gamma_1^0)\rho_2+\tilde X_1^+(\gamma_1^0)\rho_3+\tilde U_1, \tag{A.2}
\]
with $\tilde X_1(\gamma,\gamma_1^0)=\tilde X_1(\gamma_1^0)-\tilde X_1(\gamma)$ and $\tilde X_1^+(\gamma_1^0)=\tilde X_1-\tilde X_1(\gamma_1^0)$. Under this specification, $S_N^1(\gamma)$ can be regarded as the concentrated sum of squared errors function from (A.2) with the restriction $\rho_2=\rho_3$. Then, since $\tilde X_1(\gamma,\gamma_1^0)$ and $\tilde X_1^+(\gamma_1^0)$ are orthogonal, straightforward algebra shows that the first part of (A.1) can be represented as a function of $\hat\rho_2$ and $\hat\rho_3$, namely
\[
S_N^1(\gamma)-S_N^1(\gamma,\gamma_1^0)=(\hat\rho_2-\hat\rho_3)'\,\tilde X_1(\gamma,\gamma_1^0)'\tilde X_1(\gamma,\gamma_1^0)\left(Z_1'Z_1\right)^{-1}\tilde X_1^+(\gamma_1^0)'\tilde X_1^+(\gamma_1^0)\,(\hat\rho_2-\hat\rho_3), \tag{A.3}
\]
where $Z_1=\tilde X_1(\gamma,\gamma_1^0)+\tilde X_1^+(\gamma_1^0)$. Similarly, $S_N^1(\gamma_1^0)$ can be regarded as the concentrated sum of squared errors function from (A.2) with the restriction $\rho_1=\rho_2$. Then, by the orthogonality of $\tilde X_1(\gamma)$ and $\tilde X_1(\gamma,\gamma_1^0)$ and straightforward algebra, the second part of (A.1) can be represented as a function of $\hat\rho_1$ and $\hat\rho_2$, namely
\[
S_N^1(\gamma_1^0)-S_N^1(\gamma,\gamma_1^0)=(\hat\rho_1-\hat\rho_2)'\,\tilde X_1(\gamma)'\tilde X_1(\gamma)\left(Z_2'Z_2\right)^{-1}\tilde X_1(\gamma,\gamma_1^0)'\tilde X_1(\gamma,\gamma_1^0)\,(\hat\rho_1-\hat\rho_2), \tag{A.4}
\]
where $Z_2=\tilde X_1(\gamma)+\tilde X_1(\gamma,\gamma_1^0)$. Dividing both sides of (A.1) by $N(\gamma-\gamma_1^0)$ and decomposing gives
\[
\frac{S_N^1(\gamma)-S_N^1(\gamma_1^0)}{N(\gamma-\gamma_1^0)}
=(\hat\rho_2-\hat\rho_3)'\frac{\tilde X_1(\gamma,\gamma_1^0)'\tilde X_1(\gamma,\gamma_1^0)\left(Z_1'Z_1\right)^{-1}\tilde X_1^+(\gamma_1^0)'\tilde X_1^+(\gamma_1^0)}{N(\gamma-\gamma_1^0)}(\hat\rho_2-\hat\rho_3)
-(\hat\rho_1-\hat\rho_2)'\frac{\tilde X_1(\gamma)'\tilde X_1(\gamma)\left(Z_2'Z_2\right)^{-1}\tilde X_1(\gamma,\gamma_1^0)'\tilde X_1(\gamma,\gamma_1^0)}{N(\gamma-\gamma_1^0)}(\hat\rho_1-\hat\rho_2). \tag{A.5}
\]
Now, taking advantage of the orthogonality between $\tilde X_1(\gamma,\gamma_1^0)$ and $\tilde X_1^+(\gamma_1^0)$, we have $\tilde X_1(\gamma,\gamma_1^0)'Z_1=\tilde X_1(\gamma,\gamma_1^0)'\tilde X_1(\gamma,\gamma_1^0)$.
Then, since $\tilde X_1^+(\gamma_1^0)=Z_1-\tilde X_1(\gamma,\gamma_1^0)$, we have
\[
\tilde X_1(\gamma,\gamma_1^0)'\tilde X_1(\gamma,\gamma_1^0)\left(Z_1'Z_1\right)^{-1}\tilde X_1^+(\gamma_1^0)'\tilde X_1^+(\gamma_1^0)
=\tilde X_1(\gamma,\gamma_1^0)'\left(I-Z_1\left(Z_1'Z_1\right)^{-1}Z_1'\right)\tilde X_1(\gamma,\gamma_1^0),
\]
which allows us to decompose (A.5) as
\[
\frac{S_N^1(\gamma)-S_N^1(\gamma_1^0)}{N(\gamma-\gamma_1^0)}
=(\hat\rho_2-\hat\rho_3)'\frac{\tilde X_1(\gamma,\gamma_1^0)'\tilde X_1(\gamma,\gamma_1^0)}{N(\gamma-\gamma_1^0)}(\hat\rho_2-\hat\rho_3)
-(\hat\rho_2-\hat\rho_3)'\frac{\tilde X_1(\gamma,\gamma_1^0)'Z_1\left(Z_1'Z_1\right)^{-1}Z_1'\tilde X_1(\gamma,\gamma_1^0)}{N(\gamma-\gamma_1^0)}(\hat\rho_2-\hat\rho_3)
-(\hat\rho_1-\hat\rho_2)'\frac{\tilde X_1(\gamma)'\tilde X_1(\gamma)\left(Z_2'Z_2\right)^{-1}\tilde X_1(\gamma,\gamma_1^0)'\tilde X_1(\gamma,\gamma_1^0)}{N(\gamma-\gamma_1^0)}(\hat\rho_1-\hat\rho_2). \tag{A.6}
\]
Given the setup of variables and parameters assumed before, the true model is
\[
\tilde Y_1=\tilde X_1(\gamma_1^0)\phi_1^0+\tilde X_1^+(\gamma_1^0)\phi_2^0+\tilde U_1, \tag{A.7}
\]
where $\phi_1^0=\beta_1^0+\delta_1^0$ and $\phi_2^0=\beta_1^0$. We can deduce the closed-form values of $(\hat\rho_1,\hat\rho_2,\hat\rho_3)$ as
\[
\hat\rho_1=\phi_1^0+\left(\frac{\tilde X_1(\gamma_1^0)'\tilde X_1(\gamma_1^0)}{N}\right)^{-1}\frac{\tilde X_1(\gamma_1^0)'\tilde U_1}{N}, \tag{A.8}
\]
\[
\hat\rho_2=\phi_1^0+\left(\frac{\tilde X_1(\gamma,\gamma_1^0)'\tilde X_1(\gamma,\gamma_1^0)}{N}\right)^{-1}\frac{\tilde X_1(\gamma,\gamma_1^0)'\tilde U_1}{N}, \tag{A.9}
\]
\[
\hat\rho_3=\phi_2^0+\left(\frac{\tilde X_1^+(\gamma_1^0)'\tilde X_1^+(\gamma_1^0)}{N}\right)^{-1}\frac{\tilde X_1^+(\gamma_1^0)'\tilde U_1}{N}. \tag{A.10}
\]
Under our assumptions, $\hat\rho_1-\hat\rho_2=o_p(1)$ and $\hat\rho_2-\hat\rho_3=\delta_1^0+o_p(1)$. Also, by a simple Taylor expansion,
\[
\frac{\tilde X_1(\gamma,\gamma_1^0)'\tilde X_1(\gamma,\gamma_1^0)}{N(\gamma-\gamma_1^0)}=O_p(1),
\]
so the third term in (A.6) is arbitrarily small. As for the second term, we can show that
\[
\frac{\tilde X_1(\gamma,\gamma_1^0)'Z_1\left(Z_1'Z_1\right)^{-1}Z_1'\tilde X_1(\gamma,\gamma_1^0)}{N(\gamma-\gamma_1^0)}
=\frac{\tilde X_1(\gamma,\gamma_1^0)'Z_1}{N(\gamma-\gamma_1^0)}\left(\frac{Z_1'Z_1}{T}\right)^{-1}\frac{Z_1'\tilde X_1(\gamma,\gamma_1^0)}{N(\gamma-\gamma_1^0)}(\gamma-\gamma_1^0),
\]
so the second term is also arbitrarily small. Compared to these, the first term is bounded from below by some constant multiple of $\|\delta_1^0\|^2$; therefore $S_N^1(\gamma)-S_N^1(\gamma_1^0)>0$ when $\gamma\in(\underline\gamma,\gamma_1^0-B/N)$.
Similarly, we can show that the inequality also holds when $\gamma\in(\gamma_1^0+B/N,\bar\gamma)$. This completes the proof.

Proof of Theorem 1.3.2. The estimator $\hat\theta$ can be written as
\[
\hat\theta=\left(\bar X(\hat\gamma_1,\hat\gamma_2)'\bar X(\hat\gamma_1,\hat\gamma_2)\right)^{-1}\bar X(\hat\gamma_1,\hat\gamma_2)'Y, \tag{A.11}
\]
where $\bar X(\hat\gamma_1,\hat\gamma_2)=\left(X_1,X_2,X_1(\hat\gamma_1),X_2(\hat\gamma_2)\right)$. Therefore, the closed form of the estimator can be rewritten as
\[
\hat\theta=\left(\bar X(\hat\gamma_1,\hat\gamma_2)'\bar X(\hat\gamma_1,\hat\gamma_2)\right)^{-1}\bar X(\hat\gamma_1,\hat\gamma_2)'\left(\bar X(\gamma_1^0,\gamma_2^0)\theta^0+U\right)
=\theta^0+\left(\bar X(\hat\gamma_1,\hat\gamma_2)'\bar X(\hat\gamma_1,\hat\gamma_2)\right)^{-1}\bar X(\hat\gamma_1,\hat\gamma_2)'\left[\left(\bar X(\gamma_1^0,\gamma_2^0)-\bar X(\hat\gamma_1,\hat\gamma_2)\right)\theta^0+U\right].
\]
Since we have shown that the convergence rates of $\hat\gamma_1$ and $\hat\gamma_2$ are both $O_p(1/N)$, it is straightforward to infer that $\bar X(\gamma_1^0,\gamma_2^0)-\bar X(\hat\gamma_1,\hat\gamma_2)=O_p(1/N)$. Then
\[
\sqrt N(\hat\theta-\theta^0)
=\left(\frac{1}{N}\bar X(\hat\gamma_1,\hat\gamma_2)'\bar X(\hat\gamma_1,\hat\gamma_2)\right)^{-1}\frac{1}{\sqrt N}\bar X(\hat\gamma_1,\hat\gamma_2)'\left[\left(\bar X(\gamma_1^0,\gamma_2^0)-\bar X(\hat\gamma_1,\hat\gamma_2)\right)\theta^0+U\right]
=\left(\frac{1}{N}\bar X(\gamma_1^0,\gamma_2^0)'\bar X(\gamma_1^0,\gamma_2^0)+o_p(1)\right)^{-1}\frac{1}{\sqrt N}\bar X(\gamma_1^0,\gamma_2^0)'U+o_p(1),
\]
and the asymptotic distribution in Theorem 1.3.2 follows.

Lemma A.1.
\[
\frac{X_j'M_jX_j(\gamma)}{N}=\frac{X_j(\gamma)'M_jX_j(\gamma)}{N}+o_p(1).
\]
Proof. Writing $X_j=X_j(\gamma)+X_j^+(\gamma)$,
\[
X_j'M_jX_j(\gamma)
=X_j(\gamma)'M_jX_j(\gamma)+X_j^+(\gamma)'M_jX_j(\gamma)
=X_j(\gamma)'M_jX_j(\gamma)-X_j^+(\gamma)'P_j\left(P_j'P_j\right)^{-1}P_j'X_j(\gamma),
\]
where we take advantage of the fact that $X_j^+(\gamma)'X_j(\gamma)=0$. Then
\[
\frac{X_j'M_jX_j(\gamma)}{N}=\frac{X_j(\gamma)'M_jX_j(\gamma)}{N}-\frac{X_j^+(\gamma)'P_j}{N}\left(\frac{P_j'P_j}{N}\right)^{-1}\frac{P_j'X_j(\gamma)}{N}. \tag{A.12}
\]
If $X_j$ and $P_j$ are independent, then $X_j^+(\gamma)'P_j\left(P_j'P_j\right)^{-1}P_j'X_j(\gamma)/N=o_p(1)$. If $X_j$ and $P_j$ are weakly dependent, without loss of generality we can set
\[
X_j(\gamma)=P_j\rho+v_j;
\]
the least squares estimator is then
\[
\hat\rho=\left(\frac{P_j'P_j}{N}\right)^{-1}\frac{P_j'X_j(\gamma)}{N},
\]
which implies
\[
X_j(\gamma)-P_j\hat\rho=v_j-P_j\left(P_j'P_j\right)^{-1}P_j'v_j. \tag{A.13}
\]
The second term of (A.12) can therefore be decomposed into
\[
\frac{X_j^+(\gamma)'X_j(\gamma)}{N}-\frac{X_j^+(\gamma)'\left(v_j-P_j\left(P_j'P_j\right)^{-1}P_j'v_j\right)}{N}. \tag{A.14}
\]
The first term is obviously zero. If $v_j\perp P_j$, then together with $E(v_j)=0$ it follows that the second term is $o_p(1)$. If $v_j\not\perp P_j$, we can write $v_j=P_j\eta+h_j$ with $h_j\perp P_j$, as is standard in the instrumental variables literature. The second term of (A.14) is then
\[
\frac{X_j^+(\gamma)'\left(h_j-P_j\left(P_j'P_j\right)^{-1}P_j'h_j\right)}{N},
\]
which is $o_p(1)$.

B Appendix to Chapter 2

Let $\|A\|=\sqrt{\operatorname{tr}(A'A)}$. In what follows, we assume:

Assumption A1: $E\|f_t\|^4\le C<\infty$, and $T^{-1}F'F=T^{-1}\sum_{t=1}^Tf_tf_t'\to\Sigma_f$ as $T\to\infty$, where $\Sigma_f$ is a positive definite matrix of rank $p$.

Assumption A2: $\|\gamma_i\|\le C<\infty$, and $N^{-1}\Gamma'\Gamma=N^{-1}\sum_{i=1}^N\gamma_i\gamma_i'\to D$ as $N\to\infty$, where $D$ is a positive definite matrix of rank $p$.

Assumption A3: There exists a positive constant $C<\infty$ such that, for all $N$ and $T$:
(a) $E(u_{it})=0$ and $E|u_{it}|^8<\infty$.
(b) Let $\rho_i(s,t)=E(u_{it}u_{is})$ and $E(u_s'u_t)=E\left(N^{-1}\sum_{i=1}^Nu_{it}u_{is}\right)=\bar\rho_N(s,t)$, with $u_t=(u_{1t},\dots,u_{Nt})'$; then $|\rho_i(s,s)|\le C<\infty$ for all $i$ and $s$, and $(NT)^{-1}\sum_{s=1}^T\sum_{t=1}^T\sum_{i=1}^N|\rho_i(s,t)|\le C<\infty$.
(c) Let $\tau_{ij,t}=E(u_{it}u_{jt})$, with $|\tau_{ij,t}|\le|\tau_{ij}|<\infty$ for all $t$, and $N^{-1}\sum_{i=1}^N\sum_{j=1}^N|\tau_{ij}|\le C<\infty$.
(d) Let $\tau_{ij,ts}=E(u_{it}u_{js})$, and $(NT)^{-1}\sum_{i=1}^N\sum_{j=1}^N\sum_{s=1}^T\sum_{t=1}^T|\tau_{ij,ts}|\le C<\infty$.
(e) For every $(s,t)$, $E\left|N^{-1/2}\sum_{i=1}^N[u_{it}u_{is}-E(u_{it}u_{is})]\right|^4\le C<\infty$.

Assumption A4: The idiosyncratic errors $u_{it}$ are independent of $\tilde X$, $F$ and $\Gamma$.

Assumption A5: None of the vectors in $F$ can be rewritten as a linear combination of the vectors in $\tilde X$.

Assumption A6: $T>(Nk+r)$ or $N>(Tk+r)$.

Assumption A7: $E\|x_{it}\|^4<\infty$ for all $i,t$.

B.1 Recursively Iterating Procedure

The iterative procedure to determine the number of factors proposed in Section 2.4 provides a consistent estimator of the number of unobserved factors, as shown below.

Proposition B.1. Suppose Assumptions A1-A4 hold and let $\hat p_{iter}$ be the number of factors selected by the recursively iterating procedure. Then $\lim_{N,T\to\infty}\Pr(\hat p_{iter}=p)=1$ if (i) $g(N,T)\to0$ and (ii) $C_{NT}^2\,g(N,T)\to\infty$ as $N,T\to\infty$, where $C_{NT}=\min\{\sqrt N,\sqrt T\}$.

Proof of Proposition B.1.
Let $\hat v_{it}=y_{it}-x_{it}'\hat\beta_{itr}$ denote the residual based on the recursive estimation of $\beta$, with $\hat\beta_{itr}$ the recursive estimator, and let $\hat v_i=(\hat v_{i1},\dots,\hat v_{iT})'$; then
\[
\hat v_i=y_i-X_i'\hat\beta_{itr}. \tag{B.1}
\]
If $\hat\beta_{itr}$ is a consistent estimator of $\beta$, then
\[
\hat v_i=F^0\gamma_i^0+u_i+o_p(1), \tag{B.2}
\]
and thus $\hat v_i$ reduces to an approximate pure factor model. Consequently, the IC method of Bai and Ng (2002) provides a consistent estimator of $p$ based on (B.2).

For our proposed recursively iterating procedure, note that the estimated number of factors in the $j$-th iteration, denoted $\hat p$, results in only two cases: (a) $\hat p\ge p$ and (b) $\hat p<p$. To facilitate the analysis, define the true SSR (which is used in the information criterion) as
\[
V(p)=\frac{1}{NT}\sum_{i=1}^N\left(y_i-X_i'\beta-F^0\gamma_i^0\right)'\left(y_i-X_i'\beta-F^0\gamma_i^0\right)
\xrightarrow{p}\operatorname*{plim}_{(N,T)\to\infty}\frac{1}{NT}\sum_{i=1}^Nu_i'u_i. \tag{B.3}
\]
We also define the SSR using $\hat p$ factors as
\[
V(\hat p)=\frac{1}{NT}\sum_{i=1}^N\left(y_i-X_i'\hat\beta-\widehat{\bar F}\,\widehat{\bar\gamma}_i\right)'\left(y_i-X_i'\hat\beta-\widehat{\bar F}\,\widehat{\bar\gamma}_i\right), \tag{B.4}
\]
where $\hat\beta$, $\widehat{\bar F}$, $\widehat{\bar\gamma}_i$ are the least squares estimates of (1.1) and (1.2) under the assumption that $\bar F$ is of dimension $\hat p$.

For case (a), when $\hat p\ge p$, Moon and Weidner (2015) show that $\hat\beta\xrightarrow{p}\beta$ and $\widehat{\bar F}\xrightarrow{p}(F^0,\breve F)$, where $\breve F$ is orthogonal to $F^0$. As a result, $V(\hat p)\xrightarrow{p}V(p)$ as $(N,T)\to\infty$, and thus the penalty function will lead to a decrease in $\hat p$. Consequently, both the IC method of Bai and Ng (2002) and the ER method of Ahn and Horenstein (2013) provide consistent estimators of $p$.

For case (b), when $\hat p<p$, we note that
\[
V(\hat p)=\frac{1}{NT}\sum_{i=1}^N\left(y_i-X_i'\hat\beta_{itr}-\widehat{\bar F}\,\widehat{\bar\gamma}_i\right)'\left(y_i-X_i'\hat\beta_{itr}-\widehat{\bar F}\,\widehat{\bar\gamma}_i\right)
\ge\frac{1}{NT}\sum_{i=1}^N\left(y_i-X_i'\hat\beta_{itr}-\widehat{\bar F}\,\widehat{\bar\gamma}_i\right)'M_{\widehat{\bar F}}\left(y_i-X_i'\hat\beta_{itr}-\widehat{\bar F}\,\widehat{\bar\gamma}_i\right)
=\frac{1}{NT}\sum_{i=1}^N\left(X_i'(\beta-\hat\beta_{itr})+F^0\gamma_i^0+u_i\right)'M_{\widehat{\bar F}}\left(X_i'(\beta-\hat\beta_{itr})+F^0\gamma_i^0+u_i\right), \tag{B.5}
\]
where $M_{\widehat{\bar F}}=I_T-T^{-1}\widehat{\bar F}\,\widehat{\bar F}'$. As $(N,T)\to\infty$, (B.5) converges to
\[
\operatorname*{plim}_{(N,T)\to\infty}\frac{1}{NT}\sum_{i=1}^Nu_i'M_{\widehat{\bar F}}u_i
+\operatorname*{plim}_{(N,T)\to\infty}\frac{1}{NT}\sum_{i=1}^N\left(X_i'(\beta-\hat\beta_{itr})+F^0\gamma_i^0\right)'M_{\widehat{\bar F}}\left(X_i'(\beta-\hat\beta_{itr})+F^0\gamma_i^0\right)
\]
under Assumption A1. It is obvious that
\[
\frac{1}{NT}\sum_{i=1}^N\left(X_i'(\beta-\hat\beta_{itr})+F^0\gamma_i^0\right)'M_{\widehat{\bar F}}\left(X_i'(\beta-\hat\beta_{itr})+F^0\gamma_i^0\right)\ge0, \tag{B.6}
\]
and, for each $i$,
\[
\left(X_i'(\beta-\hat\beta_{itr})+F^0\gamma_i^0\right)'M_{\widehat{\bar F}}\left(X_i'(\beta-\hat\beta_{itr})+F^0\gamma_i^0\right)\ge0. \tag{B.7}
\]
We observe that (B.6) is greater than $0$ if (B.7) is greater than $0$ for each $i$. However, if $\hat p<p$, then $\hat\beta_{itr}$ is inconsistent, i.e., $\beta-\hat\beta_{itr}=c\ne0$. So, for (B.7) to be exactly $0$, we need
\[
X_i'c=-F^0\gamma_i^0. \tag{B.8}
\]
If $T>p$, there is no way (B.8) can hold, since $X_i$ is observed while $F^0$, although unobserved, is a fixed $T\times p$ matrix. Since
\[
\operatorname*{plim}_{(N,T)\to\infty}\frac{1}{NT}\sum_{i=1}^Nu_i'M_{\widehat{\bar F}}u_i
=\operatorname*{plim}_{(N,T)\to\infty}\frac{1}{NT}\sum_{i=1}^Nu_i'u_i
-\operatorname*{plim}_{(N,T)\to\infty}\frac{1}{NT^2}\sum_{i=1}^Nu_i'\widehat{\bar F}\,\widehat{\bar F}'u_i, \tag{B.9}
\]
and the second term of (B.9) is $O_p(T^{-1})=o_p(1)$ under Assumption A1, it follows that, when $\hat p<p$,
\[
V(\hat p)>V(p). \tag{B.10}
\]
Since the penalty function goes to zero as $(N,T)\to\infty$, the inequality (B.10) means that $\hat p$ has to increase based on the information selection criterion. The iteration stops either when $\hat p$ reaches $p$ or when a pre-specified maximum number of iterations is reached, and we can then follow either Bai and Ng (2002) or Ahn and Horenstein (2013) to establish the consistency of $\hat p_{itr}$ under Assumptions A1-A4, as required.
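To make the logic of the recursive procedure concrete, the following is a minimal numerical sketch in Python: alternate between (1) estimating the slope on defactored data, (2) extracting principal-component factors from the residuals, and (3) updating the factor dimension with an IC_p2-type criterion in the spirit of Bai and Ng (2002). The sketch makes simplifying assumptions not in the text (a single homogeneous slope, pooled OLS updates), and all function and variable names are illustrative, not the thesis's actual implementation.

```python
import numpy as np

def bai_ng_ic(resid, pmax):
    """Choose the number of factors in a T x N residual panel with an
    IC_p2-type criterion (log SSR plus a k*(N+T)/(NT)*log(min(N,T)) penalty)."""
    T, N = resid.shape
    # eigenvalues of the scaled T x T covariance, largest first; the sum of
    # the trailing eigenvalues is SSR/(NT) after removing k PC factors
    eigvals = np.linalg.eigvalsh(resid @ resid.T / (N * T))[::-1]
    best_k, best_ic = 0, np.inf
    for k in range(pmax + 1):
        ic = np.log(eigvals[k:].sum()) + k * (N + T) / (N * T) * np.log(min(N, T))
        if ic < best_ic:
            best_k, best_ic = k, ic
    return best_k

def recursive_factor_number(y, X, pmax=8, iters=20):
    """Recursively iterate slope estimation, residual PCA, and IC selection.
    y and X are T x N panels; a single homogeneous slope is assumed."""
    T, _ = y.shape
    k, F, beta = 0, np.zeros((T, 0)), 0.0
    for _ in range(iters):
        # annihilator of the current factor estimates (identity when k = 0)
        M = np.eye(T) - F @ np.linalg.pinv(F.T @ F) @ F.T if k else np.eye(T)
        ym, Xm = M @ y, M @ X
        beta = (Xm * ym).sum() / (Xm * Xm).sum()   # pooled OLS on defactored data
        resid = y - beta * X
        k_new = bai_ng_ic(resid, pmax)
        # principal-component factors from residuals, normalized so F'F/T = I
        _, vecs = np.linalg.eigh(resid @ resid.T / T)
        F = np.sqrt(T) * vecs[:, ::-1][:, :k_new]
        if k_new == k:
            break
        k = k_new
    return beta, k
```

In a simulated panel with two strong factors, the loop typically settles after a couple of passes: the first pass fits the slope ignoring the factor structure, the residual PCA then reveals the factors, and subsequent passes refit the slope on defactored data until the selected dimension stops changing.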
B.2 Orthogonal Projection Method

For the orthogonal projection method of estimating the number of factors, let $p_{\max}$ be an a priori chosen integer such that $p\le p_{\max}$, and let
\[
\hat p_{proj}=\operatorname*{argmin}_{0\le k\le p_{\max}}IC_P(k) \tag{B.11}
\]
be the estimated dimension of the factors, where $IC_P$ refers to the information criterion defined in (2.6)-(2.8), applied to the projected model (4.6) or (4.10).

Proposition B.2. Suppose Assumptions A1-A6 hold and let $\hat p_{proj}$ be defined in (B.11). Then, as $(N,T)\to\infty$,
\[
\lim_{N,T\to\infty}\Pr\left(\hat p_{proj}=p\right)=1,
\]
where $g(N,T)\to0$ and $C_{NT}^2\,g(N,T)\to\infty$ as $(N,T)\to\infty$, and $C_{NT}=\min\{\sqrt N,\sqrt T\}$.

Before proceeding with the proof of Proposition B.2, we first provide some lemmas.

Lemma B.1. Under Assumptions A1, A2$'$, A3-A4, for all $N$ and $T$ we have:

(i) Define $\underline\rho_N(s,t)=E\left(N^{-1}\sum_{i=1}^N\underline u_{it}\underline u_{is}\right)$, where $\underline u_{it}$ is the $t$-th element of $\underline u_i=M_Tu_i$; then
\[
T^{-1}\sum_{t=1}^T\sum_{s=1}^T|\underline\rho_N(s,t)|\le C<\infty
\quad\text{and}\quad
T^{-1}\sum_{t=1}^T\sum_{s=1}^T\underline\rho_N^2(s,t)\le C<\infty;
\]
(ii) $E\left(T^{-1}\sum_{t=1}^T\left\|N^{-1/2}\Gamma^{0\prime}\underline u_t\right\|^2\right)=E\left(T^{-1}\sum_{t=1}^T\left\|N^{-1/2}\sum_{i=1}^N\underline u_{it}\gamma_i^0\right\|^2\right)\le C<\infty$, where $\underline u_t=(\underline u_{1t},\dots,\underline u_{Nt})'$;
(iii) $E\left\|(NT)^{-1/2}\sum_{t=1}^T\sum_{i=1}^N\underline u_{it}\gamma_i^0\right\|\le C<\infty$;
(iv) $\frac{1}{NT^2}\sum_{i=1}^N\|\underline y_i\|^4=O_p(1)$;
(v) For every $(s,t)$, $E\left|N^{-1/2}\sum_{i=1}^N\left[\underline u_{it}\underline u_{is}-E(\underline u_{it}\underline u_{is})\right]\right|\le C<\infty$.

Proof. (i) Under Assumption A1(b) and from Bai and Ng (2002), we have $T^{-1}\sum_{s=1}^T\sum_{t=1}^T|\bar\rho_N(s,t)|\le C<\infty$ and $(NT)^{-1}\sum_{s=1}^T\sum_{t=1}^T\sum_{i=1}^N\rho_i^2(s,t)\le C<\infty$; we also have $T^{-1}\sum_{t=1}^T\sum_{s=1}^T\bar\rho_N^2(s,t)<\infty$. To show our results, note that $\underline u_{it}\underline u_{is}$ can be written as $\underline u_{it}\underline u_{is}=\underline u_i'\iota_{Tt}\iota_{Ts}'\underline u_i$, where $\iota_{Tt}$ is a $T\times1$ vector of zeros except that its $t$-th element is $1$, and $\iota_{Ts}$ is a $T\times1$ vector of zeros except that its $s$-th element is $1$. For notational simplicity, let $M_T=I_T-P_X$, with $P_X=\tilde X(\tilde X'\tilde X)^{-1}\tilde X'$ the orthogonal projector onto the space spanned by $\tilde X$. Then
\[
|\underline\rho_N(s,t)|
=\left|E\left(\frac{1}{N}\sum_{i=1}^N\underline u_i'\iota_{Tt}\iota_{Ts}'\underline u_i\right)\right|
=\left|E\left(\frac{1}{N}\sum_{i=1}^Nu_i'(I_T-P_X)\iota_{Tt}\iota_{Ts}'(I_T-P_X)u_i\right)\right|
\]
\[
=\left|\frac{1}{N}\sum_{i=1}^NE\left(u_i'\iota_{Tt}\iota_{Ts}'u_i-u_i'\iota_{Tt}\iota_{Ts}'P_Xu_i-u_i'P_X\iota_{Tt}\iota_{Ts}'u_i+u_i'P_X\iota_{Tt}\iota_{Ts}'P_Xu_i\right)\right|
\]
\[
\le\left|\frac{1}{N}\sum_{i=1}^NE\left(u_i'\iota_{Tt}\iota_{Ts}'u_i\right)\right|
+\left|\frac{1}{N}\sum_{i=1}^NE\left(u_i'\iota_{Tt}\iota_{Ts}'P_Xu_i\right)\right|
+\left|\frac{1}{N}\sum_{i=1}^NE\left(u_i'P_X\iota_{Tt}\iota_{Ts}'u_i\right)\right|
+\left|\frac{1}{N}\sum_{i=1}^NE\left(u_i'P_X\iota_{Tt}\iota_{Ts}'P_Xu_i\right)\right|, \tag{B.12}
\]
so it suffices to show that the last three terms are bounded over $s$ and $t$. For the first of them,
\[
\frac{1}{T}\sum_{s=1}^T\sum_{t=1}^T\left|\frac{1}{N}\sum_{i=1}^NE\left(u_i'P_X\iota_{Tt}\iota_{Ts}'u_i\right)\right|
\le\frac{1}{NT}\sum_{s=1}^T\sum_{t=1}^T\sum_{i=1}^N\left|E\left(u_i'P_X\iota_{Tt}\iota_{Ts}'u_i\right)\right|
\le\frac{1}{NT}\sum_{s=1}^T\sum_{t=1}^T\sum_{i=1}^N\left\|P_X\iota_{Tt}\iota_{Ts}'E\left(u_iu_i'\right)\right\|
\]
\[
\le\frac{1}{NT}\sum_{s=1}^T\sum_{t=1}^T\sum_{i=1}^N\left\|\iota_{Tt}\iota_{Ts}'E\left(u_iu_i'\right)\right\|
\le\frac{1}{NT}\sum_{s=1}^T\sum_{t=1}^T\sum_{i=1}^N|E(u_{it}u_{is})|=O(1),
\]
under Assumption A1, where the third inequality follows from page 50 of Yanai et al. (2011). Similarly, we can show that $\frac{1}{T}\sum_{s=1}^T\sum_{t=1}^T\left|\frac{1}{N}\sum_{i=1}^NE(u_i'\iota_{Tt}\iota_{Ts}'P_Xu_i)\right|=O(1)$. For the last term,
\[
\frac{1}{T}\sum_{s=1}^T\sum_{t=1}^T\left|\frac{1}{N}\sum_{i=1}^NE\left(u_i'P_X\iota_{Tt}\iota_{Ts}'P_Xu_i\right)\right|
\le\frac{1}{NT}\sum_{s=1}^T\sum_{t=1}^T\sum_{i=1}^N\left\|P_X\iota_{Tt}\iota_{Ts}'P_XE\left(u_iu_i'\right)\right\|
\le\frac{1}{NT}\sum_{s=1}^T\sum_{t=1}^T\sum_{i=1}^N\left\|\iota_{Tt}\iota_{Ts}'P_XE\left(u_iu_i'\right)\right\|
\]
\[
\le\frac{1}{NT}\sum_{s=1}^T\sum_{t=1}^T\sum_{i=1}^N\left\|P_XE\left(u_iu_i'\right)\iota_{Tt}\iota_{Ts}'\right\|
\le\frac{1}{NT}\sum_{s=1}^T\sum_{t=1}^T\sum_{i=1}^N|E(u_{it}u_{is})|=O(1).
\]
Consequently, we have shown that
\[
\frac{1}{T}\sum_{t=1}^T\sum_{s=1}^T|\underline\rho_N(s,t)|\le\frac{1}{T}\sum_{t=1}^T\sum_{s=1}^T|\bar\rho_N(s,t)|+O(1)=O(1),
\]
as required. Similarly, after taking squares on both sides of (B.12),
\[
\frac{1}{T}\sum_{t=1}^T\sum_{s=1}^T\underline\rho_N^2(s,t)\le C\left(\frac{1}{T}\sum_{t=1}^T\sum_{s=1}^T\bar\rho_N^2(s,t)+\frac{1}{NT}\sum_{s=1}^T\sum_{t=1}^T\sum_{i=1}^N\rho_i^2(s,t)\right)=O(1),
\]
under Assumption A1 and using the facts that $(NT)^{-1}\sum_{s=1}^T\sum_{t=1}^T\sum_{i=1}^N\rho_i^2(s,t)=O(1)$ and $T^{-1}\sum_{t=1}^T\sum_{s=1}^T\bar\rho_N^2(s,t)=O(1)$.

(ii) Consider
\[
E\left(\frac{1}{T}\sum_{t=1}^T\left\|\frac{1}{\sqrt N}\sum_{i=1}^N\underline u_{it}\gamma_i^0\right\|^2\right)
=\frac{1}{NT}\sum_{t=1}^TE\left\|\sum_{i=1}^N\underline u_{it}\gamma_i^0\right\|^2
=\frac{1}{NT}\sum_{t=1}^TE\left(\sum_{i,j=1}^N\underline u_{it}\underline u_{jt}\gamma_i^{0\prime}\gamma_j^0\right)
=\frac{1}{NT}\sum_{t=1}^TE\left(\sum_{i=1}^N\underline u_{it}^2\left\|\gamma_i^0\right\|^2\right)
\]
\[
=\frac{1}{NT}\sum_{i=1}^N\left\|\gamma_i^0\right\|^2E\left(\underline u_i'\underline u_i\right)
\le\frac{C}{N}\sum_{i=1}^N\left\|\gamma_i^0\right\|^2\cdot\frac{1}{T}E\left(\underline u_i'\underline u_i\right)=O(1),
\]
using the facts that $\frac{1}{N}\sum_{i=1}^N\|\gamma_i^0\|^2=O(1)$ and $\frac{1}{T}E(\underline u_i'\underline u_i)=O(1)$ under Assumptions A1 and A3. Consequently, $E\left(T^{-1}\sum_{t=1}^T\|N^{-1/2}\Gamma^{0\prime}\underline u_t\|^2\right)=O(1)$, as required.

(iii) We note that
\[
E\left\|\frac{1}{\sqrt{NT}}\sum_{t=1}^T\sum_{i=1}^N\underline u_{it}\gamma_i^0\right\|^2
=\frac{1}{NT}\sum_{i,j=1}^N\sum_{s,t=1}^TE\left(\underline u_{it}\underline u_{js}\right)\gamma_i^{0\prime}\gamma_j^0
=\frac{1}{NT}\sum_{i=1}^N\left\|\gamma_i^0\right\|^2\sum_{s,t=1}^TE\left(\underline u_{it}\underline u_{is}\right)
\le C\frac{1}{T}\sum_{s,t=1}^T\left|\frac{1}{N}\sum_{i=1}^NE\left(\underline u_{it}\underline u_{is}\right)\right|
\le C\frac{1}{T}\sum_{s,t=1}^T|\underline\rho_N(s,t)|=O(1),
\]
by the result of (i) and Assumption A3. Thus we have proved (iii), as required.

(iv) It suffices to show that $\|T^{-1/2}\underline y_i\|^2=O_p(1)$. To this end, note that $\underline y_i=\underline F\gamma_i^0+\underline u_i$, so
\[
\left\|T^{-1/2}\underline y_i\right\|^2=\frac{1}{T}\underline y_i'\underline y_i
=\frac{1}{T}\left(\underline F\gamma_i^0+\underline u_i\right)'\left(\underline F\gamma_i^0+\underline u_i\right)
=\gamma_i^{0\prime}\gamma_i^0+\frac{1}{T}\underline u_i'\underline u_i+o_p(1)
\le C\left\|\gamma_i^0\right\|^2+\frac{C}{T}\underline u_i'\underline u_i+o_p(1)=O_p(1),
\]
under Assumption A1, Assumption A2$'$ (so that $T^{-1}\underline F'\underline F=I_p$) and Assumption A3 (so that $\|\gamma_i^0\|\le C$).

(v) Using the notation of (i), we obtain
\[
\frac{1}{\sqrt N}\sum_{i=1}^N\left[\underline u_{it}\underline u_{is}-E\left(\underline u_{it}\underline u_{is}\right)\right]
=\frac{1}{\sqrt N}\sum_{i=1}^N\left\{u_i'(I_T-P_X)\iota_{Tt}\iota_{Ts}'(I_T-P_X)u_i-E\left[u_i'(I_T-P_X)\iota_{Tt}\iota_{Ts}'(I_T-P_X)u_i\right]\right\}
=A_1+A_2+A_3+A_4, \tag{B.13}
\]
where
\[
A_1=\frac{1}{\sqrt N}\sum_{i=1}^N\left[u_i'\iota_{Tt}\iota_{Ts}'u_i-E\left(u_i'\iota_{Tt}\iota_{Ts}'u_i\right)\right],\qquad
A_2=-\frac{1}{\sqrt N}\sum_{i=1}^N\left[u_i'P_X\iota_{Tt}\iota_{Ts}'u_i-E\left(u_i'P_X\iota_{Tt}\iota_{Ts}'u_i\right)\right],
\]
\[
A_3=-\frac{1}{\sqrt N}\sum_{i=1}^N\left[u_i'\iota_{Tt}\iota_{Ts}'P_Xu_i-E\left(u_i'\iota_{Tt}\iota_{Ts}'P_Xu_i\right)\right],\qquad
A_4=\frac{1}{\sqrt N}\sum_{i=1}^N\left[u_i'P_X\iota_{Tt}\iota_{Ts}'P_Xu_i-E\left(u_i'P_X\iota_{Tt}\iota_{Ts}'P_Xu_i\right)\right].
\]
To show the required result, we need $E|A_j|=O(1)$ for $j=1,2,3,4$. We first have $E(|A_1|)=O(1)$ under Assumption A1. Next, we show that $E|A_j|=O(1)$ for $j=2,3,4$. For $A_2$, write
\[
A_2=-\frac{1}{\sqrt N}\sum_{i=1}^N\xi_{2i},\qquad
\xi_{2i}=u_i'P_X\iota_{Tt}\iota_{Ts}'u_i-E\left(u_i'P_X\iota_{Tt}\iota_{Ts}'u_i\right).
\]
Then we need to show that E A j = O(1); for j= 2;3;4: To this end, we note that for A 2 ; we have A 2 = N 1=2 N å i=1 x 2i withx 2i = u 0 i P X t Tt t 0 T s u i E u 0 i P X t Tt t 0 T s u i : 113 It is obvious that E(x 2i )= 0; and E x 2 2i = E u 0 i P X t Tt t 0 T s u i E u 0 i P X t Tt t 0 T s u i 2 = E u 0 i P X t Tt t 0 T s u i u 0 i P X t Tt t 0 T s u i E u 0 i P X t Tt t 0 T s u i 2 = O(1); since u 0 i P X t Tt t 0 T s u i u 0 i P X t Tt t 0 T s u i = P X t Tt t 0 T s u i u 0 i P X t Tt t 0 T s u i u 0 i t Tt t 0 T s u i u 0 i P X t Tt t 0 T s u i u 0 i P X t Tt t 0 T s u i u 0 i t Tt t 0 T s u i u 0 i t Tt t 0 T s u i u 0 i t Tt t 0 T s u i u 0 i ; from page 50 of Yanai et al. (2011), thus E u 0 i P X t Tt t 0 T s u i u 0 i P X t Tt t 0 T s u i E t Tt t 0 T s u i u 0 i t Tt t 0 T s u i u 0 i E u 2 it u 2 is = O(1); under Assumption A1. Similarly, we have E(u 0 i P X t Tt t 0 T s u i )= O(1) using the same argument. Thus, we have E(A 2 )= 0 and E A 2 2 = O(1): By the Liapunov’s inequality, we obtain EjA 2 j E A 2 2 1=2 = O(1): Similarly, we can show EjA 3 j O(1); and EjA 4 j O(1): 114 Substituting the results into (B.13) yields E N 1=2 N å i=1 [u it u is E(u it u is )] 4 å j=1 E A j = O(1); as required. The following proposition provide the relation between the estimated ˆ F (k) and the true un- known composite factors F 0 = M T F 0 . Lemma B.2. Under Assumption A1, A2 0 ; A3-A4, for any fixed k 1; there exists a(p k) matrix H (k) with rank H (k) = min(p;k) and C NT = min p N; p T ; such that C 2 NT 1 T T å t=1 ˆ f (k) t H (k)0 f t;0 2 ! = O p (1): ProofofLemmaB.2. Since neither F 0 norg 0 i are uniquely determined, we use the mathematical identities ˆ F (k) = N 1 Y ˜ G (k) and ˜ G (k) = T 1 Y 0 ˜ F (k) by assuming there are k unknown factors in the model. 
We also have $\frac{1}{T}\sum_{t=1}^T\|\tilde f_t\|^2=O_p(1)$ from the normalization $\frac{1}{T}\tilde F^{(k)\prime}\tilde F^{(k)}=I_k$ assumed in the estimation. Then, for $H^{(k)\prime}=\left(\hat F^{(k)\prime}F_0/T\right)\left(\Gamma^{0\prime}\Gamma^0/N\right)$, we have
\[
\hat f_t^{(k)}-H^{(k)\prime}f_{t,0}
=\frac{1}{T}\sum_{s=1}^T\tilde f_s^{(k)}\bar\rho_N(s,t)
+\frac{1}{T}\sum_{s=1}^T\tilde f_s^{(k)}\zeta_{st}
+\frac{1}{T}\sum_{s=1}^T\tilde f_s^{(k)}\eta_{st}
+\frac{1}{T}\sum_{s=1}^T\tilde f_s^{(k)}\xi_{st},
\]
where
\[
\zeta_{st}=\frac{u_s'u_t}{N}-\bar\rho_N(s,t),\qquad
\eta_{st}=f_{s,0}'\frac{\Gamma^{0\prime}u_t}{N},\qquad
\xi_{st}=f_{t,0}'\frac{\Gamma^{0\prime}u_s}{N}=\eta_{ts}.
\]
By definition, $\|H^{(k)}\|\le\|\hat F^{(k)\prime}\hat F^{(k)}/T\|\,\|F_0'F_0/T\|\,\|\Gamma^{0\prime}\Gamma^0/N\|$, and each of these norms is stochastically bounded by Assumptions A2$'$ and A3 together with the normalization $\frac{1}{T}\tilde F^{(k)\prime}\tilde F^{(k)}=I_k$ assumed in the estimation; hence $\|H^{(k)}\|=O_p(1)$. Moreover, using the inequality $(x+y+z+u)^2\le4(x^2+y^2+z^2+u^2)$, we have
\[
\left\|\hat f_t^{(k)}-H^{(k)\prime}f_{t,0}\right\|^2\le4(a_{1t}+a_{2t}+a_{3t}+a_{4t}), \tag{B.14}
\]
where
\[
a_{1t}=T^{-2}\left\|\sum_{s=1}^T\tilde f_s^{(k)}\bar\rho_N(s,t)\right\|^2,\quad
a_{2t}=T^{-2}\left\|\sum_{s=1}^T\tilde f_s^{(k)}\zeta_{st}\right\|^2,\quad
a_{3t}=T^{-2}\left\|\sum_{s=1}^T\tilde f_s^{(k)}\eta_{st}\right\|^2,\quad
a_{4t}=T^{-2}\left\|\sum_{s=1}^T\tilde f_s^{(k)}\xi_{st}\right\|^2.
\]
It is straightforward that
\[
\frac{1}{T}\sum_{t=1}^T\left\|\hat f_t^{(k)}-H^{(k)\prime}f_{t,0}\right\|^2\le\frac{4}{T}\sum_{t=1}^T(a_{1t}+a_{2t}+a_{3t}+a_{4t}). \tag{B.15}
\]
For the first term, $a_{1t}$, the Cauchy-Schwarz inequality gives
\[
\frac{1}{T}\sum_{t=1}^Ta_{1t}=\frac{1}{T^3}\sum_{t=1}^T\left\|\sum_{s=1}^T\tilde f_s^{(k)}\bar\rho_N(s,t)\right\|^2
\le\frac{1}{T}\left(\frac{1}{T}\sum_{s=1}^T\left\|\tilde f_s^{(k)}\right\|^2\right)\left(\frac{1}{T}\sum_{t=1}^T\sum_{s=1}^T\bar\rho_N^2(s,t)\right)=O_p\!\left(\frac{1}{T}\right), \tag{B.16}
\]
using the facts that $\frac{1}{T}\sum_{t=1}^T\|\tilde f_t\|^2=O_p(1)$ from the normalization $\frac{1}{T}\tilde F^{(k)\prime}\tilde F^{(k)}=I_k$ and $\frac{1}{T}\sum_{t=1}^T\sum_{s=1}^T\bar\rho_N^2(s,t)=O(1)$ from Lemma B.1. Similarly,
\[
\frac{1}{T}\sum_{t=1}^Ta_{2t}
=\frac{1}{T^3}\sum_{t=1}^T\left\|\sum_{s=1}^T\tilde f_s^{(k)}\zeta_{st}\right\|^2
=\frac{1}{T^3}\sum_{t=1}^T\sum_{s=1}^T\sum_{s'=1}^T\tilde f_s^{(k)\prime}\tilde f_{s'}^{(k)}\zeta_{st}\zeta_{s't}
\]
\[
\le\frac{1}{T}\left(\frac{1}{T^2}\sum_{s=1}^T\sum_{s'=1}^T\left(\tilde f_s^{(k)\prime}\tilde f_{s'}^{(k)}\right)^2\right)^{1/2}
\left(\frac{1}{T^2}\sum_{s=1}^T\sum_{s'=1}^T\left(\sum_{t=1}^T\zeta_{st}\zeta_{s't}\right)^2\right)^{1/2}
\le\frac{1}{T}\left(\frac{1}{T}\sum_{s=1}^T\left\|\tilde f_s^{(k)}\right\|^2\right)
\left(\frac{1}{T^2}\sum_{s=1}^T\sum_{s'=1}^T\left(\sum_{t=1}^T\zeta_{st}\zeta_{s't}\right)^2\right)^{1/2}
=O_p\!\left(\frac{1}{N}\right), \tag{B.17}
\]
since $\frac{1}{T}\sum_{s=1}^T\|\tilde f_s^{(k)}\|^2=O_p(1)$ and $\frac{1}{T^2}\sum_{s=1}^T\sum_{s'=1}^T\left(\sum_{t=1}^T\zeta_{st}\zeta_{s't}\right)^2=O\!\left(\frac{T^2}{N^2}\right)$, due to the facts that
\[
E\left(\sum_{t=1}^T\zeta_{st}\zeta_{s't}\right)^2=E\left(\sum_{t=1}^T\sum_{t'=1}^T\zeta_{st}\zeta_{s't}\zeta_{st'}\zeta_{s't'}\right)\le T^2\max_{s,t}E|\zeta_{st}|^4
\]
and
\[
E|\zeta_{st}|^4=\frac{1}{N^2}E\left|\frac{1}{\sqrt N}\sum_{i=1}^N\left(u_{it}u_{is}-E(u_{it}u_{is})\right)\right|^4\le\frac{C}{N^2},
\]
using the results from Lemma B.1. Furthermore,
\[
\frac{1}{T}\sum_{t=1}^Ta_{3t}
=\frac{1}{T^3}\sum_{t=1}^T\left\|\sum_{s=1}^T\tilde f_s^{(k)}f_{s,0}'\frac{\Gamma^{0\prime}u_t}{N}\right\|^2
\le\frac{1}{T}\sum_{t=1}^T\left(\frac{1}{T}\sum_{s=1}^T\left\|\tilde f_s^{(k)}\right\|^2\right)\left(\frac{1}{T}\sum_{s=1}^T\left\|f_{s,0}'\frac{\Gamma^{0\prime}u_t}{N}\right\|^2\right)
\]
\[
\le\frac{1}{TN^2}\left(\frac{1}{T}\sum_{s=1}^T\left\|\tilde f_s^{(k)}\right\|^2\right)\left(\frac{1}{T}\sum_{s=1}^T\left\|f_{s,0}\right\|^2\right)\left(\sum_{t=1}^T\left\|\Gamma^{0\prime}u_t\right\|^2\right)
=\frac{1}{N}\,O_p(1)\left(\frac{1}{T}\sum_{t=1}^T\left\|N^{-1/2}\Gamma^{0\prime}u_t\right\|^2\right)=O_p\!\left(\frac{1}{N}\right), \tag{B.18}
\]
since $\frac{1}{T}\sum_{t=1}^T\|N^{-1/2}\Gamma^{0\prime}u_t\|^2=O_p(1)$ by the results of Lemma B.1. Similarly, $\frac{1}{T}\sum_{t=1}^Ta_{4t}=O_p\!\left(\frac{1}{N}\right)$. Substituting the above results (B.16)-(B.18) into (B.15) yields
\[
\frac{1}{T}\sum_{t=1}^T\left\|\hat f_t^{(k)}-H^{(k)\prime}f_{t,0}\right\|^2=O_p\!\left(\frac{1}{N}\right)+O_p\!\left(\frac{1}{T}\right),
\]
as required, which is also equivalent to
\[
C_{NT}^2\cdot\frac{1}{T}\left\|\hat F^{(k)}-F_0H^{(k)}\right\|^2=O_p(1).
\]

Lemma B.3. For all $k\in[1,p]$, with $H^{(k)}$ defined in Lemma B.2, we have
\[
V\left(k,\hat F^{(k)}\right)-V\left(k,F_0H^{(k)}\right)=O_p(C_{NT}^{-1}). \tag{B.19}
\]
Proof. Define $M_H^{(k)}=I_T-P_H^{(k)}$ as the idempotent matrix spanned by the null space of $F_0H^{(k)}$, and correspondingly let $\hat M^{(k)}=I_T-\hat F^{(k)}(\hat F^{(k)\prime}\hat F^{(k)})^{-1}\hat F^{(k)\prime}=I_T-P_{\hat F}^{(k)}$.
Then we can rewrite $V\left(k,\hat F^{(k)}\right)-V\left(k,F_0H^{(k)}\right)$ as
\[
V\left(k,\hat F^{(k)}\right)-V\left(k,F_0H^{(k)}\right)
=\frac{1}{NT}\sum_{i=1}^Ny_i'\hat M^{(k)}y_i-\frac{1}{NT}\sum_{i=1}^Ny_i'M_H^{(k)}y_i
=\frac{1}{NT}\sum_{i=1}^Ny_i'\left(P_H^{(k)}-P_{\hat F}^{(k)}\right)y_i. \tag{B.20}
\]
Let $D_k=\hat F^{(k)\prime}\hat F^{(k)}/T$ and $D_0=H^{(k)\prime}F_0'F_0H^{(k)}/T$; then
\[
P_{\hat F}^{(k)}-P_H^{(k)}
=T^{-1}\hat F^{(k)}\left(T^{-1}\hat F^{(k)\prime}\hat F^{(k)}\right)^{-1}T^{-1}\hat F^{(k)\prime}\cdot T
-T^{-1}F_0H^{(k)}\left(T^{-1}H^{(k)\prime}F_0'F_0H^{(k)}\right)^{-1}T^{-1}H^{(k)\prime}F_0'\cdot T
=T^{-1}\hat F^{(k)}D_k^{-1}\hat F^{(k)\prime}-T^{-1}F_0H^{(k)}D_0^{-1}H^{(k)\prime}F_0'
\]
\[
=T^{-1}\left(\hat F^{(k)}-F_0H^{(k)}+F_0H^{(k)}\right)D_k^{-1}\left(\hat F^{(k)}-F_0H^{(k)}+F_0H^{(k)}\right)'-T^{-1}F_0H^{(k)}D_0^{-1}H^{(k)\prime}F_0'
\]
\[
=T^{-1}\left(\hat F^{(k)}-F_0H^{(k)}\right)D_k^{-1}\left(\hat F^{(k)}-F_0H^{(k)}\right)'
+T^{-1}\left(\hat F^{(k)}-F_0H^{(k)}\right)D_k^{-1}H^{(k)\prime}F_0'
+T^{-1}F_0H^{(k)}D_k^{-1}\left(\hat F^{(k)}-F_0H^{(k)}\right)'
+T^{-1}F_0H^{(k)}\left(D_k^{-1}-D_0^{-1}\right)H^{(k)\prime}F_0'. \tag{B.21}
\]
Consequently, we can decompose (B.20) as
\[
V\left(k,\hat F^{(k)}\right)-V\left(k,F_0H^{(k)}\right)=b_1+b_2+b_3+b_4, \tag{B.22}
\]
where the $b_j$ ($j=1,2,3,4$) are defined according to the four terms of (B.21); we consider these terms one by one. For the first term,
\[
\|b_1\|=\left\|\frac{1}{NT^2}\sum_{i=1}^Ny_i'\left(\hat F^{(k)}-F_0H^{(k)}\right)D_k^{-1}\left(\hat F^{(k)}-F_0H^{(k)}\right)'y_i\right\|
\le\left(\frac{1}{NT^2}\sum_{i=1}^N\left\|\hat F^{(k)}-F_0H^{(k)}\right\|^4\left\|D_k^{-1}\right\|^2\right)^{1/2}\left(\frac{1}{NT^2}\sum_{i=1}^N\|y_i\|^4\right)^{1/2}
\]
\[
\le\frac{1}{T}\left\|\hat F^{(k)}-F_0H^{(k)}\right\|^2\left\|D_k^{-1}\right\|\left(\frac{1}{NT^2}\sum_{i=1}^N\|y_i\|^4\right)^{1/2}=O_p\left(C_{NT}^{-2}\right), \tag{B.23}
\]
by using the results of Lemma B.2 and Lemma B.1. For the second term,
\[
\|b_2\|=\left\|\frac{1}{NT^2}\sum_{i=1}^Ny_i'\left(\hat F^{(k)}-F_0H^{(k)}\right)D_k^{-1}H^{(k)\prime}F_0'y_i\right\|
\le\left(\frac{1}{T}\left\|\hat F^{(k)}-F_0H^{(k)}\right\|^2\cdot\frac{1}{T}\left\|F_0H^{(k)}\right\|^2\left\|D_k^{-1}\right\|^2\right)^{1/2}\left(\frac{1}{NT^2}\sum_{i=1}^N\|y_i\|^4\right)^{1/2}=O_p\left(C_{NT}^{-1}\right). \tag{B.24}
\]
Similarly, it can be verified that $\|b_3\|=O_p(C_{NT}^{-1})$. For the last term, we have
\[
\|b_4\|=\left\|\frac{1}{NT^2}\sum_{i=1}^Ny_i'F_0H^{(k)}\left(D_k^{-1}-D_0^{-1}\right)H^{(k)\prime}F_0'y_i\right\|
\le\left\|D_k^{-1}-D_0^{-1}\right\|\cdot\frac{1}{T}\left\|F_0H^{(k)}\right\|^2\cdot\frac{1}{NT}\sum_{i=1}^N\|y_i\|^2=O_p\left(C_{NT}^{-1}\right), \tag{B.25}
\]
using the result that $\|D_k^{-1}-D_0^{-1}\|=O_p(C_{NT}^{-1})$, which is proved below.
Substituting the above results (B.23)-(B.25) into (B.21) and (B.20), we obtain
\[
V\left(k,\hat F^{(k)}\right)-V\left(k,F_0H^{(k)}\right)=O_p(C_{NT}^{-1}),
\]
as required. To finish the proof, we need to show that $\|D_k^{-1}-D_0^{-1}\|=O_p(C_{NT}^{-1})$; this is equivalent to showing that $\|D_k-D_0\|=O_p(C_{NT}^{-1})$, due to the fact that
\[
D_k^{-1}-D_0^{-1}=D_k^{-1}(D_0-D_k)D_0^{-1},
\]
and both $\|D_0^{-1}\|=O_p(1)$ and $\|D_k^{-1}\|=O_p(1)$. For the desired result, note that
\[
D_k-D_0=\frac{1}{T}\left(\hat F^{(k)\prime}\hat F^{(k)}-H^{(k)\prime}F_0'F_0H^{(k)}\right)
=\frac{1}{T}\sum_{t=1}^T\left(\hat f_t^{(k)}\hat f_t^{(k)\prime}-H^{(k)\prime}f_{t0}f_{t0}'H^{(k)}\right)
\]
\[
=\frac{1}{T}\sum_{t=1}^T\left(\hat f_t^{(k)}-H^{(k)\prime}f_{t0}\right)\left(\hat f_t^{(k)}-H^{(k)\prime}f_{t0}\right)'
+\frac{1}{T}\sum_{t=1}^T\left(\hat f_t^{(k)}-H^{(k)\prime}f_{t0}\right)f_{t0}'H^{(k)}
+\frac{1}{T}\sum_{t=1}^TH^{(k)\prime}f_{t0}\left(\hat f_t^{(k)}-H^{(k)\prime}f_{t0}\right)',
\]
and thus
\[
\|D_k-D_0\|\le\frac{1}{T}\sum_{t=1}^T\left\|\hat f_t^{(k)}-H^{(k)\prime}f_{t0}\right\|^2
+2\left(\frac{1}{T}\sum_{t=1}^T\left\|\hat f_t^{(k)}-H^{(k)\prime}f_{t0}\right\|^2\right)^{1/2}\left(\frac{1}{T}\sum_{t=1}^T\left\|H^{(k)\prime}f_{t0}\right\|^2\right)^{1/2}
=O_p(C_{NT}^{-2})+O_p(C_{NT}^{-1})=O_p(C_{NT}^{-1}),
\]
by the results of Lemma B.2.

Lemma B.4. For the matrix $H^{(k)}$ defined in Lemma B.2, and for each $k$ with $k<p$, there exists a $\tau_k>0$ such that
\[
\operatorname*{plim\,inf}_{(N,T)\to\infty}\left[V\left(k,F_0H^{(k)}\right)-V(p,F_0)\right]=\tau_k. \tag{B.26}
\]
Proof. By the definition of $P_0=F_0(F_0'F_0)^{-1}F_0'$, we note that
\[
V\left(k,F_0H^{(k)}\right)-V(p,F_0)
=\frac{1}{NT}\sum_{i=1}^Ny_i'\left(P_0-P_H^{(k)}\right)y_i
=\frac{1}{NT}\sum_{i=1}^N\left(F_0\gamma_i^0+u_i\right)'\left(P_0-P_H^{(k)}\right)\left(F_0\gamma_i^0+u_i\right)
\]
\[
=\frac{1}{NT}\sum_{i=1}^N\gamma_i^{0\prime}F_0'\left(P_0-P_H^{(k)}\right)F_0\gamma_i^0
+\frac{2}{NT}\sum_{i=1}^Nu_i'\left(P_0-P_H^{(k)}\right)F_0\gamma_i^0
+\frac{1}{NT}\sum_{i=1}^Nu_i'\left(P_0-P_H^{(k)}\right)u_i
=c_1+c_2+c_3,\text{ say}. \tag{B.27}
\]
For $c_1$, we have
\[
c_1=\frac{1}{NT}\sum_{i=1}^N\gamma_i^{0\prime}F_0'\left(P_0-P_H^{(k)}\right)F_0\gamma_i^0
=\operatorname{tr}\left[\frac{1}{T}F_0'\left(P_0-P_H^{(k)}\right)F_0\cdot\frac{1}{N}\sum_{i=1}^N\gamma_i^0\gamma_i^{0\prime}\right]
\]
\[
=\operatorname{tr}\left[\left(\frac{F_0'F_0}{T}-\frac{F_0'F_0H^{(k)}}{T}\left(\frac{H^{(k)\prime}F_0'F_0H^{(k)}}{T}\right)^{-1}\frac{H^{(k)\prime}F_0'F_0}{T}\right)\frac{1}{N}\sum_{i=1}^N\gamma_i^0\gamma_i^{0\prime}\right]
\]
\[
\xrightarrow{p}\operatorname{tr}\left[\left(\Sigma_F-\Sigma_FH_0^{(k)}\left(H_0^{(k)\prime}\Sigma_FH_0^{(k)}\right)^{-1}H_0^{(k)\prime}\Sigma_F\right)D\right]=\operatorname{tr}(AD), \tag{B.28}
\]
where $A=\Sigma_F-\Sigma_FH_0^{(k)}\left(H_0^{(k)\prime}\Sigma_FH_0^{(k)}\right)^{-1}H_0^{(k)\prime}\Sigma_F$, with $\Sigma_F=\lim_{T\to\infty}F_0'F_0/T$ and $H_0^{(k)}=\lim_{(N,T)\to\infty}H^{(k)}$ (since $H^{(k)}$ depends on $N,T$ by definition). We note that $A\ne0$, since $\operatorname{rank}(\Sigma_F)=p$ (Assumption A2$'$).
Also, $A$ is positive semi-definite by construction and $D>0$ (Assumption 4). This implies that $\operatorname{tr}(AD)>0$. For $c_2$, we have
$$
c_2=\frac{2}{NT}\sum_{i=1}^{N}u_i'\big(P^0-P_H^{(k)}\big)F^0\gamma_i^0=\frac{2}{NT}\sum_{i=1}^{N}u_i'P^0F^0\gamma_i^0-\frac{2}{NT}\sum_{i=1}^{N}u_i'P_H^{(k)}F^0\gamma_i^0, \tag{B.29}
$$
where for the first term (using $P^0F^0=F^0$)
$$
\Big\|\frac{2}{NT}\sum_{i=1}^{N}u_i'P^0F^0\gamma_i^0\Big\|=\Big\|\frac{2}{NT}\sum_{i=1}^{N}u_i'F^0\gamma_i^0\Big\|\le 2\Big(\frac{1}{T}\|F^0\|^2\Big)^{1/2}\frac{1}{\sqrt N}\Bigg(\frac{1}{T}\sum_{t=1}^{T}\Big\|\frac{1}{\sqrt N}\sum_{i=1}^{N}u_{it}\gamma_i^{0\prime}\Big\|^2\Bigg)^{1/2}=O_p\Big(\frac{1}{\sqrt N}\Big),
$$
by using the results of Lemma A1. It can be easily verified that the second term is also $O_p(N^{-1/2})$. Consequently, we have $c_2=O_p(N^{-1/2})$. For $c_3$, by construction we have $c_3\ge 0$ since
$$
\begin{aligned}
P^0-P_H^{(k)}&=F^0(F^{0\prime}F^0)^{-1}F^{0\prime}-F^0H^{(k)}\big(H^{(k)\prime}F^{0\prime}F^0H^{(k)}\big)^{-1}H^{(k)\prime}F^{0\prime}\\
&=F^0(F^{0\prime}F^0)^{-1/2}\Big[I_p-(F^{0\prime}F^0)^{1/2}H^{(k)}\big(H^{(k)\prime}F^{0\prime}F^0H^{(k)}\big)^{-1}H^{(k)\prime}(F^{0\prime}F^0)^{1/2}\Big](F^{0\prime}F^0)^{-1/2}F^{0\prime}\ge 0,
\end{aligned}
$$
since $I_p-(F^{0\prime}F^0)^{1/2}H^{(k)}\big(H^{(k)\prime}F^{0\prime}F^0H^{(k)}\big)^{-1}H^{(k)\prime}(F^{0\prime}F^0)^{1/2}$ is an idempotent matrix with trace $p-k>0$. Combining the above results yields
$$\operatorname*{plim}_{(N,T)\to\infty}\inf\;\big[V(k,F^0H^{(k)})-V(p,F^0)\big]=\tau_k, \tag{B.30}$$
where $\tau_k>0$, as required.

Lemma B.5. For any fixed $k$ with $k\ge p$, $V(k,\hat F^{(k)})-V(p,\hat F^{(p)})=O_p(C_{NT}^{-2})$.

Proof.
We note that
$$
\big|V(k,\hat F^{(k)})-V(p,\hat F^{(p)})\big|\le\big|V(k,\hat F^{(k)})-V(p,F^0)\big|+\big|V(p,F^0)-V(p,\hat F^{(p)})\big|\le 2\max_{p\le k\le p_{\max}}\big|V(k,\hat F^{(k)})-V(p,F^0)\big|, \tag{B.31}
$$
thus it is sufficient to prove, for each $k$ with $k\ge p$, that
$$V(k,\hat F^{(k)})-V(p,F^0)=O_p(C_{NT}^{-2}). \tag{B.32}$$
To this end, let $H^{(k)}$ be defined in the same way as in Proposition B.2, but with rank $p$ since $k\ge p$. Let $H^{(k)+}$ be the generalized inverse of $H^{(k)}$ such that $H^{(k)}H^{(k)+}=I_p$. Consequently,
$$y_i=F^0\gamma_i^0+u_i=F^0H^{(k)}H^{(k)+}\gamma_i^0+u_i,$$
or
$$y_i=\hat F^{(k)}H^{(k)+}\gamma_i^0+u_i-\big(\hat F^{(k)}-F^0H^{(k)}\big)H^{(k)+}\gamma_i^0=\hat F^{(k)}H^{(k)+}\gamma_i^0+e_i,$$
where $e_i=u_i-(\hat F^{(k)}-F^0H^{(k)})H^{(k)+}\gamma_i^0$. By construction, we have
$$V(k,\hat F^{(k)})=\frac{1}{NT}\sum_{i=1}^{N}e_i'M_{\hat F}^{(k)}e_i\quad\text{and}\quad V(p,F^0)=\frac{1}{NT}\sum_{i=1}^{N}u_i'M_{F^0}u_i,$$
then
$$
\begin{aligned}
V(k,\hat F^{(k)})&=\frac{1}{NT}\sum_{i=1}^{N}\big[u_i-(\hat F^{(k)}-F^0H^{(k)})H^{(k)+}\gamma_i^0\big]'M_{\hat F}^{(k)}\big[u_i-(\hat F^{(k)}-F^0H^{(k)})H^{(k)+}\gamma_i^0\big]\\
&=\frac{1}{NT}\sum_{i=1}^{N}u_i'M_{\hat F}^{(k)}u_i+\frac{1}{NT}\sum_{i=1}^{N}\gamma_i^{0\prime}H^{(k)+\prime}\big(\hat F^{(k)}-F^0H^{(k)}\big)'M_{\hat F}^{(k)}\big(\hat F^{(k)}-F^0H^{(k)}\big)H^{(k)+}\gamma_i^0\\
&\quad-\frac{2}{NT}\sum_{i=1}^{N}\gamma_i^{0\prime}H^{(k)+\prime}\big(\hat F^{(k)}-F^0H^{(k)}\big)'M_{\hat F}^{(k)}u_i. \tag{B.33}
\end{aligned}
$$
It can be shown that
$$V(k,\hat F^{(k)})=\frac{1}{NT}\sum_{i=1}^{N}u_i'M_{\hat F}^{(k)}u_i+O_p(C_{NT}^{-2}), \tag{B.34}$$
since for the last term of (B.33),
$$
\begin{aligned}
\Big\|\frac{2}{NT}\sum_{i=1}^{N}\gamma_i^{0\prime}H^{(k)+\prime}\big(\hat F^{(k)}-F^0H^{(k)}\big)'M_{\hat F}^{(k)}u_i\Big\|
&=\Big\|\frac{2}{NT}\operatorname{tr}\Big(H^{(k)+\prime}\big(\hat F^{(k)}-F^0H^{(k)}\big)'M_{\hat F}^{(k)}\sum_{i=1}^{N}u_i\gamma_i^{0\prime}\Big)\Big\|\\
&\le 2p\,\frac{1}{NT}\Big\|H^{(k)+\prime}\big(\hat F^{(k)}-F^0H^{(k)}\big)'M_{\hat F}^{(k)}\sum_{i=1}^{N}u_i\gamma_i^{0\prime}\Big\|\\
&\le 2p\,\big\|H^{(k)+}\big\|\Big(\frac{1}{T}\sum_{t=1}^{T}\big\|\hat f_t^{(k)}-H^{(k)\prime}f_t^0\big\|^2\Big)^{1/2}\frac{1}{\sqrt N}\Bigg(\frac{1}{T}\sum_{t=1}^{T}\Big\|\frac{1}{\sqrt N}\sum_{i=1}^{N}u_{it}\gamma_i^{0\prime}\Big\|^2\Bigg)^{1/2}\\
&=O_p\big(C_{NT}^{-1}\big)\,O_p\big(N^{-1/2}\big),
\end{aligned}
$$
and the penultimate term of (B.33) is of order $O_p(C_{NT}^{-2})$ because
$$
\begin{aligned}
\frac{1}{NT}\sum_{i=1}^{N}\gamma_i^{0\prime}H^{(k)+\prime}\big(\hat F^{(k)}-F^0H^{(k)}\big)'M_{\hat F}^{(k)}\big(\hat F^{(k)}-F^0H^{(k)}\big)H^{(k)+}\gamma_i^0
&\le\frac{1}{NT}\sum_{i=1}^{N}\gamma_i^{0\prime}H^{(k)+\prime}\big(\hat F^{(k)}-F^0H^{(k)}\big)'\big(\hat F^{(k)}-F^0H^{(k)}\big)H^{(k)+}\gamma_i^0\\
&\le\Big(\frac{1}{T}\sum_{t=1}^{T}\big\|\hat f_t^{(k)}-H^{(k)\prime}f_t^0\big\|^2\Big)\big\|H^{(k)+}\big\|^2\Big(\frac{1}{N}\sum_{i=1}^{N}\big\|\gamma_i^0\big\|^2\Big)=O_p\big(C_{NT}^{-2}\big),
\end{aligned}
$$
following the results of Proposition B.2.
As a result, we obtain
$$V(k,\hat F^{(k)})-V(p,F^0)=\frac{1}{NT}\sum_{i=1}^{N}u_i'P_{F^0}u_i-\frac{1}{NT}\sum_{i=1}^{N}u_i'P_{\hat F}^{(k)}u_i+O_p(C_{NT}^{-2}), \tag{B.35}$$
where
$$\frac{1}{NT}\sum_{i=1}^{N}u_i'P_{F^0}u_i\le\big\|\big(F^{0\prime}F^0/T\big)^{-1}\big\|\,\frac{1}{NT^2}\sum_{i=1}^{N}u_i'F^0F^{0\prime}u_i=O_p(1)\,\frac{1}{NT}\sum_{i=1}^{N}\big\|T^{-1/2}F^{0\prime}u_i\big\|^2=O_p\Big(\frac{1}{T}\Big)=O_p\big(C_{NT}^{-2}\big).$$
Also, we note that the remaining term of (B.35) is bounded by the sum of the $k$ largest eigenvalues of the matrix $A_{NT}=\frac{1}{NT}\tilde U'\tilde U$; thus it is sufficient to show that the largest eigenvalue of $A_{NT}$ is of order $O_p(C_{NT}^{-2})$. To this end, we note that
$$A_{NT}=\frac{1}{NT}\tilde U'\tilde U=\frac{1}{NT}U'M_TU\le\frac{1}{NT}U'U,$$
while it is shown by Bai and Ng (2002) that the largest eigenvalue of $\frac{1}{NT}U'U$ is of order $O_p(C_{NT}^{-2})$ under Assumption A1. As a result, we obtain
$$\frac{1}{NT}\sum_{i=1}^{N}u_i'P_{\hat F}^{(k)}u_i=O_p\big(C_{NT}^{-2}\big).$$
Consequently, we have $V(k,\hat F^{(k)})-V(p,F^0)=O_p(C_{NT}^{-2})$ for any $k\ge p$, as required.

Proof of Proposition B.2. We need to show that $\lim_{(N,T)\to\infty}\Pr\big(IC(k)<IC(p)\big)=0$ for all $k\neq p$ and $k\le p_{\max}$. To show this, we note that
$$IC(k)-IC(p)=\ln\frac{V(k)}{V(p)}+(k-p)\,g(N,T). \tag{B.36}$$
From Lemma B.4, when $k<p$ we have $\frac{V(k)}{V(p)}>1+\varepsilon_0$ for some $\varepsilon_0>0$ with large probability for all large $N$ and $T$; thus $\ln\big[\frac{V(k)}{V(p)}\big]>\frac{\varepsilon_0}{2}$ with large probability for all large $N$ and $T$. Since $g(N,T)\to 0$, we have
$$IC(k)-IC(p)\ge\frac{\varepsilon_0}{2}-(p-k)\,g(N,T)\ge\frac{\varepsilon_0}{3}, \tag{B.37}$$
for all large $N$ and $T$. On the other hand, when $k>p$, Lemma B.5 implies that $\frac{V(k)}{V(p)}=1+O_p(C_{NT}^{-2})$, and thus $\ln\big[\frac{V(k)}{V(p)}\big]=O_p(C_{NT}^{-2})$ for all large $N$ and $T$. Moreover, when $k>p$ we have $(k-p)\,g(N,T)\ge g(N,T)$, and $g(N,T)$ converges to zero at a slower rate than $C_{NT}^{-2}$. It is obvious that
$$\Pr\big(IC(k)<IC(p)\big)\le\Pr\big(O_p(C_{NT}^{-2})+g(N,T)<0\big)\to 0, \tag{B.38}$$
for all large $N$ and $T$ when $k>p$. Combining the results of (B.37) and (B.38) yields $\lim_{(N,T)\to\infty}\Pr\big(IC(k)<IC(p)\big)=0$, as required, for all $k\neq p$ and $k\le p_{\max}$.

C Appendix to Chapter 3

This appendix contains mathematical derivations for the main results in the paper.

C.1 Proof for Theorem 3.3.1

Proof.
The proof is similar to Manski (1988) and Rothe (2009); here we only sketch the main steps. By the setup of the model, we have
$$E[y_{it1}\mid w_{it},v_{it2}]=E[y_{it1}\mid w_{it}'\pi_0,v_{it2}]. \tag{C.1}$$
Suppose now there exists a parameter vector $\check\pi\neq\pi_0$ which also satisfies
$$E[y_{it1}\mid w_{it},v_{it2}]=E[y_{it1}\mid w_{it}'\check\pi,v_{it2}]. \tag{C.2}$$
Equations (C.1) and (C.2) imply
$$E[y_{it1}\mid w_{it}'\pi_0,v_{it2}]=E[y_{it1}\mid w_{it}'\check\pi,v_{it2}],$$
which indicates that we can recover information on $(w_{it}'\pi_0,v_{it2})$ from $(w_{it}'\check\pi,v_{it2})$. Therefore we can find a monotonic function $m(\cdot,v_{it2})$, continuous for all $v_{it2}$, such that
$$w_{it}'\check\pi=m(w_{it}'\pi_0,v_{it2}). \tag{C.3}$$
Assumption IC(2) implies that there exists a continuous regressor $w_{it}^{(1)}$ whose coefficient can be normalized to 1. We can then differentiate (C.3) with respect to $w_{it}^{(1)}$ to obtain
$$1=\frac{\partial m(w_{it}'\pi_0,v_{it2})}{\partial w_{it}^{(1)}}, \tag{C.4}$$
which verifies that $m(\cdot,v_{it2})$ is an identity function, and hence $w_{it}'\check\pi=w_{it}'\pi_0$. By Assumption IC(3), this equation holds if and only if $\check\pi=\pi_0$.

C.2 Proof for Theorem 3.4.1

Before giving the proof of Theorem 3.4.1, we provide some preliminary lemmas.

Lemma C.1. Under Assumption CON, the following equations hold uniformly for $\pi\in\Pi$:
$$\hat p_t(w_{it}'\pi,\hat v_{it2})=\hat p_t(w_{it}'\pi,v_{it2})+o_p(N^{-1/4}), \tag{C.5}$$
$$\hat q_t(w_{it}'\pi,\hat v_{it2})=\hat q_t(w_{it}'\pi,v_{it2})+o_p(N^{-1/4}). \tag{C.6}$$

Proof. Here we just prove the first equation, since the rest can be shown similarly. Applying a Taylor expansion of $\hat p_t(w_{it}'\pi,\hat v_{it2})$ around $v_{it2}$, we obtain
$$\hat p_t(w_{it}'\pi,\hat v_{it2})=\hat p_t(w_{it}'\pi,v_{it2})+\frac{\partial\hat p_t(w_{it}'\pi,v_{it2})}{\partial v_{it2}}(\hat v_{it2}-v_{it2})+o_p(N^{-1/4}), \tag{C.7}$$
where the last term follows from Assumption CON(5), namely $\max_{i,t}\|\hat v_{it2}-v_{it2}\|=o_p(N^{-1/4})$.
Now for the second term, we can decompose it further as follows, where for convenience we drop the arguments of the kernel functions $\kappa_1$ and $\kappa_2$:
$$
\begin{aligned}
\frac{\partial\hat p_t(w_{it}'\pi,v_{it2})}{\partial v_{it2}}(\hat v_{it2}-v_{it2})
&=\frac{1}{Nh_2^{k_e}}\sum_{j\neq i}\kappa_1\frac{\partial\kappa_2}{\partial v_{it2}}y_{jt1}\,(\hat v_{it2}-v_{it2})\\
&=\frac{1}{Nh_2^{k_e}}\sum_{j\neq i}\kappa_1\frac{\partial\kappa_2}{\partial v_{it2}}y_{jt1}\Bigg(\frac{1}{N}\sum_{l=1}^{N}\sum_{s=1}^{T}g(D_{it},D_{ls})y_{ls}+o_p(N^{-1/2})\Bigg)\\
&=\frac{1}{N^2h_2^{k_e}}\sum_{l=1}^{N}\sum_{s=1}^{T}\sum_{j\neq i}\kappa_1\frac{\partial\kappa_2}{\partial v_{it2}}y_{jt1}\,g(D_{it},D_{ls})y_{ls}+o_p(N^{-1/2})\\
&=\frac{1}{Nh_2^{k_e}}\sum_{l=1}^{N}\sum_{s=1}^{T}E\Bigg[E\bigg(\kappa_1\frac{\partial\kappa_2}{\partial v_{it2}}y_{jt1}\,\Big|\,D_{it}\bigg)g(D_{it},D_{ls})\,\Big|\,D_{ls}\Bigg]y_{ls}+o_p(N^{-1/2})\\
&=O_p\big(N^{-1/2}h_2^{-k_e}\big), \tag{C.8}
\end{aligned}
$$
where the fourth equation follows by the projection for U-statistics. Under Assumption CON(3), for $j=1,\ldots,k_e$,
$$O_p\big(N^{-1/2}h_2^{-k_e}\big)=O_p\big(N^{k_ed_{2j}-1/2}\big), \tag{C.9}$$
which is of smaller order than $N^{-1/4}$. As a result, substituting (C.8) and (C.9) into (C.7) yields
$$\hat p_t(w_{it}'\pi,\hat v_{it2})=\hat p_t(w_{it}'\pi,v_{it2})+o_p(N^{-1/4}),$$
as required.

Lemma C.2. Under Assumption CON, and as $N\to\infty$,
$$\sup_{\pi\in\Pi}\big|\hat F_t(w_{it}'\pi,\hat v_{it2})-F_t(w_{it}'\pi,v_{it2})\big|=o_p(N^{-1/4}), \tag{C.10}$$
$$\sup_{\pi\in\Pi}\Big\|\frac{\partial\hat F_t(w_{it}'\pi,\hat v_{it2})}{\partial\pi}-\frac{\partial F_t(w_{it}'\pi,v_{it2})}{\partial\pi}\Big\|=o_p(N^{-1/4}), \tag{C.11}$$
$$\sup_{\pi\in\Pi}\Big\|\frac{\partial\hat F_t(w_{it}'\pi,\hat v_{it2})}{\partial\hat v_{it2}}-\frac{\partial F_t(w_{it}'\pi,v_{it2})}{\partial v_{it2}}\Big\|=o_p(N^{-1/4}). \tag{C.12}$$

Proof.
We first note that standard kernel smoothing theory (e.g., Hansen (2008), Newey (1994)) shows
$$\sup_{\pi\in\Pi}\big|\hat F_t(w_{it}'\pi,v_{it2})-F_t(w_{it}'\pi,v_{it2})\big|=o_p(N^{-1/4}), \tag{C.13}$$
$$\sup_{\pi\in\Pi}\Big\|\frac{\partial\hat F_t(w_{it}'\pi,v_{it2})}{\partial\pi}-\frac{\partial F_t(w_{it}'\pi,v_{it2})}{\partial\pi}\Big\|=o_p(N^{-1/4}), \tag{C.14}$$
$$\sup_{\pi\in\Pi}\Big\|\frac{\partial\hat F_t(w_{it}'\pi,v_{it2})}{\partial v_{it2}}-\frac{\partial F_t(w_{it}'\pi,v_{it2})}{\partial v_{it2}}\Big\|=o_p(N^{-1/4}), \tag{C.15}$$
and higher-order derivatives of $\hat F_t(w_{it}'\pi,v_{it2})$ converge to those of $F_t(w_{it}'\pi,v_{it2})$. Moreover, since the derivatives of $F_t(w_{it}'\pi,v_{it2})$ are bounded by Assumption CON(4) and $\max_{i,t}\|\hat v_{it2}-v_{it2}\|=o_p(N^{-1/4})$ by Assumption CON(5), we have
$$\hat F_t(w_{it}'\pi,\hat v_{it2})-\hat F_t(w_{it}'\pi,v_{it2})=\frac{\partial\hat F_t(w_{it}'\pi,v_{it2})}{\partial v_{it2}}(\hat v_{it2}-v_{it2})+O\big(\|\hat v_{it2}-v_{it2}\|^2\big)=o_p(N^{-1/4}), \tag{C.16}$$
$$\frac{\partial\hat F_t(w_{it}'\pi,\hat v_{it2})}{\partial\pi}-\frac{\partial\hat F_t(w_{it}'\pi,v_{it2})}{\partial\pi}=\frac{\partial^2\hat F_t(w_{it}'\pi,v_{it2})}{\partial\pi\,\partial v_{it2}}(\hat v_{it2}-v_{it2})+O\big(\|\hat v_{it2}-v_{it2}\|^2\big)=o_p(N^{-1/4}), \tag{C.17}$$
$$\frac{\partial\hat F_t(w_{it}'\pi,\hat v_{it2})}{\partial\hat v_{it2}}-\frac{\partial\hat F_t(w_{it}'\pi,v_{it2})}{\partial v_{it2}}=\frac{\partial^2\hat F_t(w_{it}'\pi,v_{it2})}{\partial v_{it2}\,\partial v_{it2}}(\hat v_{it2}-v_{it2})+O\big(\|\hat v_{it2}-v_{it2}\|^2\big)=o_p(N^{-1/4}). \tag{C.18}$$
Then, by the triangle inequality, we obtain
$$\sup_{\pi\in\Pi}\big|\hat F_t(w_{it}'\pi,\hat v_{it2})-F_t(w_{it}'\pi,v_{it2})\big|\le\sup_{\pi\in\Pi}\big|\hat F_t(w_{it}'\pi,\hat v_{it2})-\hat F_t(w_{it}'\pi,v_{it2})\big|+\sup_{\pi\in\Pi}\big|\hat F_t(w_{it}'\pi,v_{it2})-F_t(w_{it}'\pi,v_{it2})\big|=o_p(N^{-1/4}),$$
and analogously for the derivatives with respect to $\pi$ and $v_{it2}$, as required.

Now we are ready to prove Theorem 3.4.1.

Proof of Theorem 3.4.1. To establish the consistency of $\hat\pi_{SLS}$, we first show the uniform convergence of $\hat S_N(\pi)$ to $\tilde S_N(\pi)$.
Namely, for any $\pi\in\Pi$ we have
$$
\begin{aligned}
\big|\hat S_N(\pi)-\tilde S_N(\pi)\big|&=\Big|\frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{T}\Big[-2y_{it1}\big(\hat F_t(w_{it}'\pi,\hat v_{it2})-F_t(w_{it}'\pi,v_{it2})\big)+\hat F_t(w_{it}'\pi,\hat v_{it2})^2-F_t(w_{it}'\pi,v_{it2})^2\Big]\Big|\\
&\le\frac{2}{N}\sum_{i=1}^{N}\sum_{t=1}^{T}|y_{it1}|\,\big|\hat F_t(w_{it}'\pi,\hat v_{it2})-F_t(w_{it}'\pi,v_{it2})\big|+\frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{T}\big|\hat F_t(w_{it}'\pi,\hat v_{it2})^2-F_t(w_{it}'\pi,v_{it2})^2\big|. \tag{C.19}
\end{aligned}
$$
Following Lemma C.2, we have
$$\sup_{\pi\in\Pi}\big|\hat S_N(\pi)-\tilde S_N(\pi)\big|=o_p(1). \tag{C.20}$$
Moreover, define $S(\pi)$ as the probability limit of $\tilde S_N(\pi)$. Then, by the uniform law of large numbers,
$$\sup_{\pi\in\Pi}\big|\tilde S_N(\pi)-S(\pi)\big|=o_p(1), \tag{C.21}$$
and by the triangle inequality,
$$\sup_{\pi\in\Pi}\big|\hat S_N(\pi)-S(\pi)\big|=o_p(1). \tag{C.22}$$
On the other hand, based on the data generating process for $y_{it1}$ (and using $y_{it1}^2=y_{it1}$ for a binary outcome), we have
$$
\begin{aligned}
S(\pi)&=\sum_{t=1}^{T}E\big[\big(y_{it1}-F_t(w_{it}'\pi,v_{it2})\big)^2\big]=\sum_{t=1}^{T}E\big[y_{it1}^2-2y_{it1}F_t(w_{it}'\pi,v_{it2})+F_t(w_{it}'\pi,v_{it2})^2\big]\\
&=\sum_{t=1}^{T}E\big[F_t(w_{it}'\pi_0,v_{it2})-2F_t(w_{it}'\pi_0,v_{it2})F_t(w_{it}'\pi,v_{it2})+F_t(w_{it}'\pi,v_{it2})^2\big]. \tag{C.23}
\end{aligned}
$$
Minimizing (C.23) with respect to $F_t(w_{it}'\pi,v_{it2})$ pointwise, the first-order condition becomes
$$F_t(w_{it}'\pi_0,v_{it2})=F_t(w_{it}'\pi,v_{it2}), \tag{C.24}$$
which implies that $E(y_{it1}\mid w_{it},v_{it2})=E(y_{it1}\mid w_{it}'\pi,v_{it2})$. By the identification condition of Theorem 3.3.1, this holds only if $\pi=\pi_0$. Then, following Theorem 2.1 in Newey (1994), we obtain $\hat\pi_{SLS}=\pi_0+o_p(1)$, as required.

C.3 Proof for Theorem 3.4.2

The proof of Theorem 3.4.2 is accomplished by showing that the six conditions in Theorem 2 of Chen et al. (2003) are satisfied. In our context, the optimization of the criterion function (3.24) is equivalent to showing that the moment function $M_N(\pi,h)=N^{-1}\sum_{i=1}^{N}m_i(\pi,h)$ equals zero, where
$$m_i(\pi,h)=\sum_{t=1}^{T}\big(y_{it1}-F_t(w_{it}'\pi,v_{it2})\big)\frac{\partial F_t(w_{it}'\pi,v_{it2})}{\partial\pi}. \tag{C.25}$$
We also let
$$M(\pi,h)=E[m_i(\pi,h)], \tag{C.26}$$
and denote the ordinary derivative of $M(\pi,h)$ with respect to $\pi$ by
$$\Gamma_1(\pi,h)=\frac{\partial M(\pi,h)}{\partial\pi}, \tag{C.27}$$
for all $\pi\in\Pi$. We now show that the conditions of Chen et al. (2003) hold, in the following six lemmas.

Lemma C.3.
Under Assumptions CI, CON and NR, $M_N(\hat\pi_{SLS},\hat h)=\inf_{\pi\in\Pi}M_N(\pi,\hat h)+o_p(N^{-1/2})$.

Proof. This is obvious, since at $\hat\pi$ the moment function $M_N(\pi,\hat h)$ achieves the value of zero.

Lemma C.4. Under Assumptions CI, CON and NR, $\Gamma_1(\pi,h_0)$ exists for $\pi\in\Pi$ and is continuous at $\pi_0$.

Proof. The existence of $\Gamma_1(\pi,h_0)$ is obvious by the definition in (C.27), and the continuity can be verified by showing
$$
\begin{aligned}
\Gamma_1(\pi_0,h_0)&=E\Bigg[\sum_{t=1}^{T}\bigg(\big(y_{it1}-F_t(w_{it}'\pi_0,v_{it2})\big)\frac{\partial^2F_t(w_{it}'\pi_0,v_{it2})}{\partial\pi_0\,\partial\pi_0'}-\frac{\partial F_t(w_{it}'\pi_0,v_{it2})}{\partial\pi_0}\frac{\partial F_t(w_{it}'\pi_0,v_{it2})}{\partial\pi_0'}\bigg)\Bigg]\\
&=-E\Bigg[\sum_{t=1}^{T}\frac{\partial F_t(w_{it}'\pi_0,v_{it2})}{\partial\pi_0}\frac{\partial F_t(w_{it}'\pi_0,v_{it2})}{\partial\pi_0'}\Bigg]=-W, \tag{C.28}
\end{aligned}
$$
which, together with Assumption CON(4), establishes the continuity of $\Gamma_1(\pi,h_0)$ at $\pi=\pi_0$.

Lemma C.5. Under Assumptions CI, CON and NR, the pathwise derivative $\Gamma_2(\pi,h_0)[\bar h-h_0]$ exists in all directions $[\bar h-h_0]\in\mathcal H$, and for all $(\pi,\bar h)\in\Pi\times\mathcal H$ with a positive sequence $\delta_n=o(1)$:
(1) $\big\|M(\pi,\bar h)-M(\pi,h_0)-\Gamma_2(\pi,h_0)[\bar h-h_0]\big\|\le c\,\|\bar h-h_0\|_{\mathcal H}^2$ for a constant $c\ge 0$;
(2) $\big\|\Gamma_2(\pi,h_0)[\bar h-h_0]-\Gamma_2(\pi_0,h_0)[\bar h-h_0]\big\|\le o(1)\,\delta_n$.

Proof. We first note that for $\bar h$ close to $h_0$, the pathwise derivative can be calculated as follows:
$$
\begin{aligned}
\Gamma_2(\pi,h_0)[\bar h-h_0]=E\Bigg[\sum_{t=1}^{T}\bigg(&-\frac{\partial F_t(w_{it}'\pi,v_{it2})}{\partial\pi}\frac{\partial F_t(w_{it}'\pi,v_{it2})}{\partial v_{it2}'}(\bar v_{it2}-v_{it2})-\frac{\partial F_t(w_{it}'\pi,v_{it2})}{\partial\pi}\big(\bar F_t(w_{it}'\pi,v_{it2})-F_t(w_{it}'\pi,v_{it2})\big)\\
&+\big(y_{it1}-F_t(w_{it}'\pi,v_{it2})\big)\frac{\partial^2F_t(w_{it}'\pi,v_{it2})}{\partial\pi\,\partial v_{it2}'}(\bar v_{it2}-v_{it2})\bigg)\Bigg]. \tag{C.29}
\end{aligned}
$$
Also, by the law of iterated expectations (the term involving $y_{it1}-F_t$ vanishes at $\pi_0$), we obtain
$$
\Gamma_2(\pi_0,h_0)[\bar h-h_0]=-E\Bigg[\sum_{t=1}^{T}\bigg(\frac{\partial F_t(w_{it}'\pi_0,v_{it2})}{\partial\pi_0}\frac{\partial F_t(w_{it}'\pi_0,v_{it2})}{\partial v_{it2}'}(\bar v_{it2}-v_{it2})+\frac{\partial F_t(w_{it}'\pi_0,v_{it2})}{\partial\pi_0}\big(\bar F_t(w_{it}'\pi_0,v_{it2})-F_t(w_{it}'\pi_0,v_{it2})\big)\bigg)\Bigg]. \tag{C.30}
$$
The two inequalities can be easily verified using (C.29)-(C.30) under Assumption CON(4), which ensures that $F_t$ and its higher-order derivatives are Lipschitz continuous.

Lemma C.6.
Under Assumptions CI, CON and NR, $\hat h\in\mathcal H$ with probability tending to one, and $\|\hat h-h_0\|_{\mathcal H}=o_p(N^{-1/4})$.

Proof. We note that $\hat v_{it2}-v_{it2}=o_p(N^{-1/4})$ under Assumption CON(5). Following Lemma C.2, we obtain $\hat F_t(\cdot,\hat v_{it2})-F_t(\cdot,v_{it2})=o_p(N^{-1/4})$ for each $t$, and hence $\|\hat h-h_0\|_{\mathcal H}=o_p(N^{-1/4})$, as required.

Lemma C.7. Under Assumptions CI, CON and NR, for all sequences of positive numbers $\{\delta_N\}$ with $\delta_N=o(1)$,
$$\sup_{\|\pi-\pi_0\|\le\delta_N,\ \|h-h_0\|_{\mathcal H}\le\delta_N}\sqrt N\,\big\|M_N(\pi,h)-M(\pi,h)-M_N(\pi_0,h_0)\big\|=o_p(1).$$

Proof. This result can be verified by showing that the Donsker theorem holds for $M_N(\pi,h)-M(\pi,h)$. For notational simplicity, let
$$q_{it}(\pi,F_t,v_{it2})=\big(y_{it1}-F_t(w_{it}'\pi,v_{it2})\big)\frac{\partial F_t(w_{it}'\pi,v_{it2})}{\partial\pi}, \tag{C.31}$$
so that $m_i(\pi,h)=\sum_{t=1}^{T}q_{it}(\pi,F_t,v_{it2})$. Following Chen et al. (2003), we first establish the $L_P(2)$-continuity of $M_N(\pi,h)$ with respect to $\pi$ and $h$, that is,
$$E\bigg(\sup_{\|\pi^*-\pi\|\le\delta,\ \|h^*-h\|_{\mathcal H}\le\delta}\big\|m_i(\pi^*,h^*)-m_i(\pi,h)\big\|^2\bigg)\le C\delta^2, \tag{C.32}$$
where $C$ is a finite positive constant and $\delta=o(1)$ is a small positive number. The left-hand side of (C.32) can be bounded as
$$E\bigg(\sup_{\|\pi^*-\pi\|\le\delta,\ \|h^*-h\|_{\mathcal H}\le\delta}\Big\|\sum_{t=1}^{T}\big(q_{it}(\pi^*,F_t^*,v_{it2}^*)-q_{it}(\pi,F_t,v_{it2})\big)\Big\|^2\bigg)\le T\sum_{t=1}^{T}E\bigg(\sup_{\|\pi^*-\pi\|\le\delta,\ \|h^*-h\|_{\mathcal H}\le\delta}\big\|q_{it}(\pi^*,F_t^*,v_{it2}^*)-q_{it}(\pi,F_t,v_{it2})\big\|^2\bigg).$$
It can be shown that
$$
\big\|q_{it}(\pi^*,F_t^*,v_{it2}^*)-q_{it}(\pi,F_t,v_{it2})\big\|^2\le 3\Big[\big\|q_{it}(\pi^*,F_t,v_{it2})-q_{it}(\pi,F_t,v_{it2})\big\|^2+\big\|q_{it}(\pi,F_t^*,v_{it2})-q_{it}(\pi,F_t,v_{it2})\big\|^2+\big\|q_{it}(\pi,F_t,v_{it2}^*)-q_{it}(\pi,F_t,v_{it2})\big\|^2\Big]. \tag{C.33}
$$
For the first term of (C.33), the mean value theorem implies that there exists some $\tilde\pi$ between $\pi^*$ and $\pi$ such that
$$\sup_{\|\pi^*-\pi\|\le\delta}\big\|q_{it}(\pi^*,F_t,v_{it2})-q_{it}(\pi,F_t,v_{it2})\big\|^2=\sup_{\|\pi^*-\pi\|\le\delta}\Big\|\frac{\partial q_{it}(\tilde\pi,F_t,v_{it2})}{\partial\tilde\pi}(\pi^*-\pi)\Big\|^2\le C\delta^2\,\Big\|\frac{\partial q_{it}(\tilde\pi,F_t,v_{it2})}{\partial\tilde\pi}\Big\|^2, \tag{C.34}$$
where the derivative can be decomposed as
$$\frac{\partial q_{it}(\tilde\pi,F_t,v_{it2})}{\partial\tilde\pi}=-\frac{\partial F_t}{\partial\tilde\pi}\frac{\partial F_t}{\partial\tilde\pi'}+(y_{it1}-F_t)\frac{\partial^2F_t}{\partial\tilde\pi\,\partial\tilde\pi'}, \tag{C.35}$$
where we drop the arguments of $F_t$ on the right-hand side for convenience; the arguments are the same as those of $q_{it}$ on the left-hand side. Then (C.35) is bounded by the continuity and boundedness of the function $F_t$ and its derivatives, as stated in Assumption CON(4). Therefore,
$$E\bigg(\sup_{\|\pi^*-\pi\|\le\delta}\big\|q_{it}(\pi^*,F_t,v_{it2})-q_{it}(\pi,F_t,v_{it2})\big\|^2\bigg)\le C_1\delta^2, \tag{C.36}$$
where $C_1$ is a positive constant. For the second term of (C.33), we can replace $F_t$ and obtain
$$\sup_{\|h^*-h\|_{\mathcal H}\le\delta}\big\|q_{it}(\pi,F_t^*,v_{it2})-q_{it}(\pi,F_t,v_{it2})\big\|^2\le\sup_{\|h^*-h\|_{\mathcal H}\le\delta}\Big\|\big(y_{it1}-F_t+\delta/2\big)\frac{\partial F_t}{\partial\pi}-\big(y_{it1}-F_t\big)\frac{\partial F_t}{\partial\pi}\Big\|^2=\sup_{\|h^*-h\|_{\mathcal H}\le\delta}\frac{\delta^2}{4}\Big\|\frac{\partial F_t}{\partial\pi}\Big\|^2, \tag{C.37}$$
which is bounded almost everywhere. Therefore,
$$E\bigg(\sup_{\|h^*-h\|_{\mathcal H}\le\delta}\big\|q_{it}(\pi,F_t^*,v_{it2})-q_{it}(\pi,F_t,v_{it2})\big\|^2\bigg)\le C_2\delta^2, \tag{C.38}$$
for some $C_2>0$. Finally, the third term of (C.33) can be rewritten, by the mean value theorem, as
$$\big\|q_{it}(\pi,F_t,v_{it2}^*)-q_{it}(\pi,F_t,v_{it2})\big\|^2=\Big\|\frac{\partial q_{it}(\pi,F_t,v_{it2})}{\partial v_{it2}}(v_{it2}^*-v_{it2})\Big\|^2. \tag{C.39}$$
Since
$$\frac{\partial q_{it}}{\partial v_{it2}}=-\frac{\partial F_t}{\partial\pi}\frac{\partial F_t}{\partial v_{it2}'}+(y_{it1}-F_t)\frac{\partial^2F_t}{\partial\pi\,\partial v_{it2}'} \tag{C.40}$$
is bounded almost everywhere, we obtain
$$\sup_{\|h^*-h\|_{\mathcal H}\le\delta}\big\|q_{it}(\pi,F_t,v_{it2}^*)-q_{it}(\pi,F_t,v_{it2})\big\|^2\le C_3\delta^2, \tag{C.41}$$
for some $C_3>0$. Combining (C.33), (C.36), (C.38) and (C.41) yields (C.32).
Under Assumptions CON(4) and NR(1), it can be verified that the Donsker theorem holds by Corollary 2.7.4 in van der Vaart and Wellner (1997). Therefore, the stated stochastic equicontinuity immediately follows.

Lemma C.8. Under Assumptions CI, CON and NR, as $N\to\infty$ with $T$ fixed, we have
$$\sqrt N\,\big(M_N(\pi_0,h_0)+\Gamma_2(\pi_0,h_0)[\hat h-h_0]\big)\to_d N(0,V),$$
where $V=E(\phi_i\phi_i')$ with $\phi_i$ provided in (C.44).

Proof. Since it is already shown in Lemma C.2 that $\hat F_t(\cdot,\hat v_{it2})$ converges uniformly to $F_t(\cdot,v_{it2})$, we can simplify $\Gamma_2(\pi_0,h_0)[\hat h-h_0]$ as
$$\Gamma_2(\pi_0,h_0)[\hat h-h_0]=-E\Bigg[\sum_{t=1}^{T}\frac{\partial F_t(w_{it}'\pi_0,v_{it2})}{\partial\pi_0}\frac{\partial F_t(w_{it}'\pi_0,v_{it2})}{\partial v_{it2}'}(\hat v_{it2}-v_{it2})\Bigg]. \tag{C.42}$$
If the reduced-form parameters are estimated by pooled OLS, we can rewrite it using a specific influence function and obtain
$$
\begin{aligned}
\Gamma_2(\pi_0,h_0)[\hat h-h_0]&=-E\Bigg[\sum_{t=1}^{T}\frac{\partial F_t(w_{it}'\pi_0,v_{it2})}{\partial\pi_0}\frac{\partial F_t(w_{it}'\pi_0,v_{it2})}{\partial v_{it2}'}\frac{1}{N}\sum_{j=1}^{N}\sum_{s=1}^{T}g(D_{it},D_{js})y_{js}\Bigg]+o_p(N^{-1/2})\\
&=-\frac{1}{N}\sum_{j=1}^{N}\sum_{s=1}^{T}E\Bigg[\sum_{t=1}^{T}\frac{\partial F_t(w_{it}'\pi_0,v_{it2})}{\partial\pi_0}\frac{\partial F_t(w_{it}'\pi_0,v_{it2})}{\partial v_{it2}'}g(D_{it},D_{js})\,\Big|\,D_{js}\Bigg]y_{js}+o_p(N^{-1/2})\\
&=\frac{1}{N}\sum_{j=1}^{N}\Psi_j(\pi_0,h_0)+o_p(N^{-1/2}). \tag{C.43}
\end{aligned}
$$
By a change of variables, we obtain
$$
\sqrt N\,\big\{M_N(\pi_0,h_0)+\Gamma_2(\pi_0,h_0)[\hat h-h_0]\big\}=\sqrt N\Bigg[\frac{1}{N}\sum_{i=1}^{N}m_i(\pi_0,h_0)+\frac{1}{N}\sum_{i=1}^{N}\Psi_i(\pi_0,h_0)+o_p(N^{-1/2})\Bigg]=\frac{1}{\sqrt N}\sum_{i=1}^{N}\phi_i(\pi_0,h_0)+o_p(1), \tag{C.44}
$$
where $\phi_i(\pi_0,h_0)=m_i(\pi_0,h_0)+\Psi_i(\pi_0,h_0)$. Following Chen et al. (2003) and Rothe (2009), and applying a standard central limit theorem, we conclude that
$$\sqrt N\,\big\{M_N(\pi_0,h_0)+\Gamma_2(\pi_0,h_0)[\hat h-h_0]\big\}\to_d N(0,V), \tag{C.45}$$
as required, where $V=E\big(\phi_i(\pi_0,h_0)\phi_i(\pi_0,h_0)'\big)$.

Now we prove Theorem 3.4.2 as follows.

Proof of Theorem 3.4.2. Based on Chen et al. (2003), by applying the stochastic equicontinuity established in Lemma C.7 and a Taylor expansion of the moment function $M_N(\hat\pi_{SLS},\hat h)$ around $(\pi_0,h_0)$, we obtain
$$\sqrt N\,(\hat\pi_{SLS}-\pi_0)=-\Gamma_1^{-1}(\pi_0,h_0)\,\sqrt N\,\big(M_N(\pi_0,h_0)+\Gamma_2(\pi_0,h_0)[\hat h-h_0]\big)+o_p(1). \tag{C.46}$$
By Lemmas C.4 and C.8, we conclude that
$$\sqrt N\,(\hat\pi_{SLS}-\pi_0)\to_d N\big(0,W^{-1}VW^{-1}\big), \tag{C.47}$$
as desired.
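To make the mechanics of the estimator studied in this appendix concrete, here is a deliberately small numerical sketch (our own construction with hypothetical variable names, not the paper's implementation): the free single-index coefficient is chosen to minimize the semiparametric least squares objective $S_N(\pi)=\sum_i(y_i-\hat F(w_i'\pi,\hat v_i))^2$, where $\hat F$ is a leave-one-out Nadaraya-Watson estimate of $E[y\mid w'\pi,\hat v]$.

```python
import numpy as np

def sls_objective(pi2, w1, w2, vhat, y, h=0.3):
    """S_N(pi): squared loss against a leave-one-out kernel fit of E[y | w'pi, vhat]."""
    idx = w1 + pi2 * w2                        # single index; w1's coefficient normalized to 1
    d1 = (idx[:, None] - idx[None, :]) / h     # pairwise index distances
    d2 = (vhat[:, None] - vhat[None, :]) / h   # pairwise control-function distances
    K = np.exp(-0.5 * (d1 ** 2 + d2 ** 2))     # product Gaussian kernel
    np.fill_diagonal(K, 0.0)                   # leave-one-out
    fhat = (K @ y) / K.sum(axis=1)
    return np.mean((y - fhat) ** 2)

rng = np.random.default_rng(2)
n = 800
w1, w2 = rng.standard_normal(n), rng.standard_normal(n)
v = rng.standard_normal(n)
y = (w1 + 0.5 * w2 + v + rng.standard_normal(n) > 0).astype(float)  # true pi2 = 0.5
vhat = v   # simplification: pretend the first stage recovered v exactly

grid = np.linspace(-1.0, 2.0, 31)
pi_hat = grid[np.argmin([sls_objective(p, w1, w2, vhat, y) for p in grid])]
print(pi_hat)   # should be close to the true value 0.5
```

The grid search stands in for a proper numerical optimizer, and the control function is taken as known; in practice it would come from a first-stage pooled OLS regression, exactly the source of the influence-function correction in (C.43).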
Abstract
Panel data models have become increasingly popular owing to their flexibility in describing economic phenomena and to the growing availability of big data. This dissertation contributes to the econometric analysis of panel data, mainly in nonlinear models and latent factor models.

The first chapter is concerned with unobserved heterogeneity among the regressors of a panel data threshold model. Previous studies assume, unnecessarily, that the coefficients of all regressors change across the same thresholds, which may cause model misspecification and invalidate inference and prediction. In this chapter we allow the coefficient of each regressor to change across its own specific thresholds: regressors sharing the same thresholds are grouped into one dimension, while thresholds can differ across dimensions. We propose a new estimator for this generalized model by exploiting the structure of the threshold specification. The estimator is shown to be asymptotically valid regardless of the number of dimensions and the number of thresholds, and it is computationally efficient, especially when interaction is sparse, i.e., when a threshold of one dimension is close to some threshold of another dimension. Small-sample properties of the estimator are investigated by Monte Carlo simulations and shown to be appealing under many different settings. Moreover, the estimator is applied to two empirical finance studies, which confirms the importance of accounting for dimension heterogeneity in threshold models.

The second chapter (co-authored with Cheng Hsiao and Qiankun Zhou) considers a computationally simple orthogonal projection method for implementing the Bai and Ng (2002) information criterion to select the factor dimension in panel interactive effects models; the method bypasses issues arising from the joint estimation of the slope coefficients and the factor structure. Our simulations show that it performs well in the cases where the method can be implemented.
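The Bai and Ng (2002) information criterion underlying the second chapter can be sketched in a few lines. The following toy simulation (function name and tuning choices are ours, not the chapter's) estimates $V(k)$ by principal components and selects the factor dimension minimizing $IC(k)=\ln V(k)+k\,g(N,T)$, here with the IC2-style penalty $g(N,T)=\frac{N+T}{NT}\ln C_{NT}^2$.

```python
import numpy as np

def select_factor_number(Y, kmax):
    """Information-criterion factor selection: pick k minimizing ln V(k) + k*g(N,T)."""
    N, T = Y.shape
    # Principal components from the T x T matrix Y'Y / (NT)
    eigval, eigvec = np.linalg.eigh(Y.T @ Y / (N * T))
    order = np.argsort(eigval)[::-1]           # largest eigenvalues first
    C2 = min(N, T)                             # C_NT^2 = min(N, T)
    g = (N + T) / (N * T) * np.log(C2)         # penalty g(N, T)
    ics = []
    for k in range(1, kmax + 1):
        F = np.sqrt(T) * eigvec[:, order[:k]]  # T x k estimated factors, F'F/T = I_k
        resid = Y - (Y @ F / T) @ F.T          # project out the k factors
        Vk = np.mean(resid ** 2)               # V(k)
        ics.append(np.log(Vk) + k * g)
    return 1 + int(np.argmin(ics))

rng = np.random.default_rng(0)
N, T, p = 100, 100, 2                          # true factor number p = 2
F0 = rng.standard_normal((T, p))
L0 = rng.standard_normal((N, p))
Y = L0 @ F0.T + 0.5 * rng.standard_normal((N, T))
print(select_factor_number(Y, kmax=8))         # expect 2
```

With strong factors, the drop in $\ln V(k)$ from adding a true factor dominates the penalty, while for $k$ beyond the true dimension the gain is only $O_p(C_{NT}^{-2})$, so the penalized criterion turns up; this is exactly the two-sided argument used in the consistency proof in Appendix B.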
The third chapter (co-authored with Anastasia Semykina, Fan Yang and Qiankun Zhou) considers semiparametric least squares estimation of binary response panel data models with endogenous regressors. The estimator relies on a correlated random effects specification and a control function approach to address the endogeneity arising from the unobserved time-invariant effect and from nonzero correlation of the idiosyncratic error with one or more explanatory variables. We derive the asymptotic properties of the proposed estimator and use Monte Carlo simulations to show that it performs well in finite samples. As an illustration, the considered method is used to estimate the effect of non-wife income on the labor force participation of married women.
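The control function idea of the third chapter can be illustrated with a simple parametric analogue (a sketch under our own simulation design, not the chapter's semiparametric estimator): a first-stage pooled OLS of the endogenous regressor on the instrument yields residuals, and including those residuals as an extra regressor in a second-stage logit absorbs the endogeneity.

```python
import numpy as np

def logit_mle(X, y, iters=50):
    """Logit MLE by Newton-Raphson; X includes an intercept column."""
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ b))
        W = p * (1.0 - p)
        b += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - p))
    return b

rng = np.random.default_rng(1)
n = 20000
z = rng.standard_normal(n)                    # instrument
v = rng.standard_normal(n)                    # first-stage error
y2 = 0.8 * z + v                              # endogenous regressor
e = 0.7 * v + rng.logistic(size=n)            # outcome error correlated with v
y1 = (1.0 + 1.0 * y2 + e > 0).astype(float)   # binary outcome, true slope = 1

# First stage: OLS of y2 on (1, z); residuals are the control function
Z = np.column_stack([np.ones(n), z])
vhat = y2 - Z @ np.linalg.lstsq(Z, y2, rcond=None)[0]

naive = logit_mle(np.column_stack([np.ones(n), y2]), y1)
cf = logit_mle(np.column_stack([np.ones(n), y2, vhat]), y1)
print(naive[1], cf[1])   # naive slope is biased upward; control-function slope is close to 1
```

The naive logit that omits the residual is inconsistent because the outcome error is correlated with y2 through v; conditioning on the first-stage residual restores the correct index, which is the same logic the semiparametric estimator exploits with the link function left unspecified.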
Asset Metadata
Creator: Xie, Yimeng (author)
Core Title: Essays on econometrics analysis of panel data models
School: College of Letters, Arts and Sciences
Degree: Doctor of Philosophy
Degree Program: Economics
Publication Date: 04/07/2021
Defense Date: 03/18/2021
Publisher: University of Southern California (original); University of Southern California. Libraries (digital)
Tag: binary model; factor model; OAI-PMH Harvest; panel data; threshold
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Hsiao, Cheng (committee chair); Joslin, Scott (committee member); Pesaran, Hashem (committee member); Zhou, Qiankun (committee member)
Creator Email: yimengxi@usc.edu; yimengxie69@gmail.com
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c89-439144
Unique identifier: UC11669058
Identifier: etd-XieYimeng-9413.pdf (filename); usctheses-c89-439144 (legacy record id)
Legacy Identifier: etd-XieYimeng-9413.pdf
Dmrecord: 439144
Document Type: Dissertation
Rights: Xie, Yimeng
Type: texts
Source: University of Southern California (contributing entity); University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA