Essays on the econometric analysis of cross-sectional dependence
(USC Thesis Other)
ESSAYS ON THE ECONOMETRIC ANALYSIS OF CROSS-SECTIONAL DEPENDENCE

by

Fan Yang

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY
(ECONOMICS)

May 2018

Copyright 2018 Fan Yang

Acknowledgments

I would first like to express my deepest gratitude to my advisor, M. Hashem Pesaran, for his immense knowledge, invaluable guidance, and rigorous training. His strong intellectual curiosity, indefatigable pursuit of excellence, and enormous enthusiasm for econometric research are contagious. He has been and will always be my best role model for an outstanding scholar, mentor, and teacher. In addition, I am very grateful to Cheng Hsiao, Wenguang Sun, Hyungsik Roger Moon, Yu-Wei Hsieh and Gareth James for serving on my dissertation/qualifying committee. Their insightful comments and suggestions helped me improve my research from various perspectives. This dissertation has also benefited from helpful discussions with many fellow doctoral students at USC as well as conference and seminar participants. I am particularly thankful to Joe David, whom I had the great pleasure to work for and learn from as a research assistant at the beginning of my Ph.D. study. My thanks also go to Young Miller, Morgan Ponder, and Fatima Perez for providing wonderful administrative assistance. Financial support from the university is greatly appreciated. Last but not least, I am truly grateful to my family and friends for their constant understanding, support and encouragement.

Table of Contents

Acknowledgments
List of Tables
List of Figures
Abstract
1 Common Factors and Spatial Dependence: An Application to US House Prices
  1.1 Introduction
  1.2 The model and assumptions
  1.3 Identification
  1.4 Estimation
    1.4.1 2SLS estimation
    1.4.2 Best 2SLS estimation
    1.4.3 GMM estimation
  1.5 Monte Carlo experiments
    1.5.1 Identification experiments
    1.5.2 Estimation experiments
  1.6 An empirical application to US house prices
    1.6.1 The model
    1.6.2 Different spatial weights matrix specifications
  1.7 Concluding remarks
2 Econometric Analysis of Production Networks with Dominant Units
  2.1 Introduction
  2.2 Production network
  2.3 Price network
  2.4 Degrees of dominance of units in a network and network pervasiveness
  2.5 Price networks with one dominant unit and aggregate fluctuations
  2.6 $\delta_{\max}$ and $\beta$ measures of network pervasiveness
  2.7 Estimation and inference
    2.7.1 Power law estimators
    2.7.2 Extremum estimators
  2.8 Monte Carlo experiments
  2.9 Dominant units in US production networks
  2.10 Concluding remarks
3 Estimation and Inference in Spatial Models with Dominant Units
  3.1 Introduction
  3.2 The model and assumptions
  3.3 GMM estimation
  3.4 BMM estimation
  3.5 Monte Carlo experiments
  3.6 An empirical application to US price networks
  3.7 Concluding remarks
References
A Appendix to Chapter 1
  A.1 Lemmas
  A.2 Proofs of main theorems
  A.3 Data appendix
B Appendix to Chapter 2
  B.1 Lemmas
  B.2 Multiple dominant units
  B.3 Consistency of $\hat{\delta}_{\max}$
  B.4 Data appendix
C Appendix to Chapter 3
  C.1 Lemmas and supplementary theorem
  C.2 Proofs of main theorems

List of Tables

1.1 Small sample properties of the maximum likelihood estimator of the spatial autoregressive coefficient, $\rho$, for the identification experiments under different values of $\alpha$
1.2a Small sample properties of estimators for the spatial parameter $\rho$ ($\rho = 0.4$, i.i.d. errors)
1.2b Small sample properties of estimators for the slope parameter $\beta_1$ ($\beta_1 = 1$, i.i.d. errors)
1.3a Small sample properties of estimators for the spatial parameter $\rho$ ($\rho = 0.4$, independent and heteroskedastic errors)
1.3b Small sample properties of estimators for the slope parameter $\beta_1$ ($\beta_1 = 1$, independent and heteroskedastic errors)
1.4a Small sample properties of estimators for the spatial parameter $\rho$ ($\rho = 0.4$, serially correlated and heteroskedastic errors)
1.4b Small sample properties of estimators for the slope parameter $\beta_1$ ($\beta_1 = 1$, serially correlated and heteroskedastic errors)
1.5 Efficient GMM estimation results of model (1.60)
1.6 Average direct and indirect effects of population and income growth on house price changes
1.7 Efficient GMM estimation results of model (1.60) using different spatial weights matrices
2.1 Bias, RMSE, size and power of the extremum estimator for the dominant unit or units under Exponent DGP for Experiments A.1 to A.3
2.2 Frequencies with which the dominant unit or units are jointly selected, under Exponent DGP for Experiments A.1 to A.3
2.3 Estimates of the shape parameter, $\beta$, of the power law and inverse of the exponent, $\delta_{\max}$, under Pareto DGP for Experiment B.1 ($\beta = 1$)
2.4 Estimates of the shape parameter, $\beta$, of the power law and inverse of the exponent, $\delta_{\max}$, under Pareto DGP for Experiment B.2 ($\beta = 1.3$)
2.5 Estimates of the shape parameter, $\beta$, of the power law and inverse of the exponent, $\delta_{\max}$, under Exponent DGP for Experiment A.1 ($\beta = 1$)
2.6 Estimates of the shape parameter, $\beta$, of the power law and inverse of the exponent, $\delta_{\max}$, under Exponent DGP for Experiment A.1 ($\beta = 1.3$)
2.7 Yearly estimates of the degree of dominance, $\delta_{\max}$, and inverse of the shape parameter of power law, $\beta$, based on the first-order interconnections, using US input-output tables compiled by Acemoglu et al. (2012)
2.8 Yearly estimates of the degree of dominance, $\delta_{\max}$, and inverse of the shape parameter of power law, $\beta$, based on the second-order interconnections, using US input-output tables compiled by Acemoglu et al. (2012)
2.9 Yearly estimates of the degree of dominance, $\delta$, for the top five pervasive sectors, using US input-output tables (our data)
2.10 Identities of the top five pervasive sectors based on the yearly estimates of $\delta$
2.11 Pooled panel estimates of the degree of dominance, $\delta$, for the top five pervasive sectors, using US input-output tables for the two sub-periods 1972-1992 and 1997-2007
3.1 Estimates of the degree of dominance, $\delta$, for the top five pervasive sectors using US input-output tables
3.2 Estimation results of the cross-section model using the 2002 input-output table
3.3 Estimation results of the cross-section model using the 2007 input-output table
3.4 Small sample properties of the GMM and BMM estimators of $\rho$ for the experiments with Gaussian errors
3.5 Small sample properties of the GMM and BMM estimators of $\beta$ for the experiments with Gaussian errors
3.6 Small sample properties of the GMM and BMM estimators of $\rho$ for the experiments with non-Gaussian errors
3.7 Small sample properties of the GMM and BMM estimators of $\beta$ for the experiments with non-Gaussian errors

List of Figures

3.1 Empirical power functions of testing $\rho_0 = 0.5$ for different values of $\delta$, in the case of Gaussian errors and $n = 100$
3.2 Empirical power functions of testing $\rho_0 = 0.75$ for different values of $\delta$, in the case of Gaussian errors and $n = 100$
3.3 Empirical power functions of testing $\rho_0 = 0.5$ for different values of $\delta$, in the case of non-Gaussian errors and $n = 100$
3.4 Empirical power functions of testing $\rho_0 = 0.75$ for different values of $\delta$, in the case of non-Gaussian errors and $n = 100$
A.1 The eight BEA Regions of the United States
A.2 Histogram of inter-MSA migration distance, based on the migration flow matrix $W_m$
A.3 Intensity plots of the spatial weights matrices

Abstract

This dissertation contributes to the econometric analysis of cross-sectional dependence in the framework of factor, spatial and network analysis.

Chapter 1 considers panel data models with cross-sectional dependence arising from both spatial autocorrelation and unobserved common factors. It derives identification conditions and proposes estimation methods that employ cross-sectional averages as factor proxies, including the 2SLS, Best 2SLS, and GMM estimators. The proposed estimators are robust to unknown heteroskedasticity and serial correlation in the errors, do not require estimating the number of unknown factors, and are computationally tractable. I establish the asymptotic distributions of these estimators and compare their consistency and efficiency properties. An empirical application finds strong evidence of spatial dependence of real house price changes across 377 Metropolitan Statistical Areas in the US from 1975Q1 to 2014Q4. The results also reveal that population and income growth have significantly positive direct and spillover effects on house price changes.

Chapter 2 (co-authored with M. Hashem Pesaran) considers production and price networks with unobserved common factors, and derives an exact expression for the rate at which aggregate fluctuations vary with the dimension of the network.
It introduces the notions of strongly and weakly dominant and non-dominant units, and shows that at most a finite number of units in the network can be strongly dominant. The pervasiveness of a network is measured by the degree of dominance of its most pervasive unit, and is shown to be equivalent to the inverse of the shape parameter of the power law fitted to the network out-degrees. New cross-section and panel extremum estimators for the degree of dominance of individual units in the network are proposed and their asymptotic properties investigated. An application to US input-output tables spanning the period 1972 to 2007 suggests that no sector in the US economy is strongly dominant. The most dominant sector turns out to be wholesale trade, with an estimated degree of dominance ranging from 0.72 to 0.82.

Estimation and inference in the spatial econometrics literature are carried out assuming that the matrix of spatial or network connections has uniformly bounded absolute column sums in the number of cross-section units, $n$. In Chapter 3, M. Hashem Pesaran and I consider spatial models where this restriction is relaxed. We begin by extending the GMM estimator introduced by Lee (2007) and deriving its asymptotic properties. We then propose a Bias-Corrected Method of Moments (BMM) estimator that avoids the problem of weak instruments by self-instrumenting the spatially lagged dependent variable and correcting the bias of the moment conditions. Both estimators are consistent and asymptotically normal, under conditions that depend on the rate at which the maximum column sum of the weights matrix rises with $n$. The estimation methods are applied to examine the inflation spillovers across more than 300 industries in the US from 1997 to 2014.
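The link between the power law and the degree of dominance can be illustrated with a small sketch. Assuming the out-degrees follow a Pareto law, the code below estimates its shape parameter $\beta$ with the Hill estimator and reports $1/\beta$ as the implied degree of dominance. The Hill estimator, the tail size `k`, and the simulated out-degrees are assumptions made for this example only; Chapter 2 develops its own power law and extremum estimators, which this code does not reproduce.

```python
import numpy as np

def hill_shape(x, k):
    """Hill estimate of the Pareto shape parameter from the k largest values of x."""
    x = np.sort(np.asarray(x, dtype=float))[::-1]   # descending order
    if not 0 < k < x.size:
        raise ValueError("k must satisfy 0 < k < len(x)")
    log_excess = np.log(x[:k]) - np.log(x[k])       # log-excesses over the (k+1)-th largest value
    return 1.0 / log_excess.mean()

rng = np.random.default_rng(0)
beta_true = 1.3
# Hypothetical out-degrees: classical Pareto draws with scale 1 and shape beta_true
# (NumPy's pareto() returns Lomax draws, so we add 1).
outdegrees = 1.0 + rng.pareto(beta_true, size=5000)

beta_hat = hill_shape(outdegrees, k=500)
delta_hat = 1.0 / beta_hat                          # implied degree of dominance, 1 / beta
print(f"beta_hat = {beta_hat:.3f}, implied delta_max = {delta_hat:.3f}")
```

A shape estimate close to one would point to a strongly dominant unit ($\delta_{\max}$ near 1), whereas values of $\beta$ noticeably above one imply $\delta_{\max} < 1$.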
Chapter 1
Common Factors and Spatial Dependence: An Application to US House Prices

1.1 Introduction

The past decade has seen growing attention to panel data models with cross-sectional dependence, which refers to the interaction between cross-section units such as households, firms, regions, and countries. Researchers have become increasingly aware that ignoring cross-sectional dependence in panel data analysis could lead to inconsistent estimates and misleading inferences. The interdependence among individual units is prevalent in all kinds of economic activities. It could arise from common factors that influence a large number of economic agents, such as technological change and oil price fluctuations. It could also originate from explicit correlation structures formed by spatial arrangements, production networks, and social interactions. Accordingly, two main modeling approaches have been proposed to characterize this phenomenon: common factor models and spatial econometric models. In the former, cross-sectional dependence is captured by a number of observable or latent factors (or common shocks); in the latter, it is represented by spatial weights matrices typically based on physical, economic, or social distance. Although they describe the same phenomenon, these two strands of literature have developed separately, with different sets of assumptions and emphases. It is therefore worth investigating the connections and differences between the two modeling approaches.

This paper aims to bring together factor and spatial models for a unified characterization of cross-sectional dependence. The main contributions of the paper are twofold. First, it considers a joint modeling of the two sources of cross-sectional dependence in panel data models: common factors and spatial interactions. It establishes identification conditions and proposes estimation methods for the joint model.
Second, the paper provides a detailed empirical application to house price changes in the US and finds strong evidence of spatial effects. The empirical findings are robust and could carry important policy and business implications.

Specifically, our model specifications allow the common effects to be unobservable and the spatial dependence to be an inherent property of the dependent variable. We begin by deriving the identification conditions for the joint model. In particular, a simple necessary condition is provided, which is both verifiable and of practical relevance, especially for large sparse networks. We then propose a number of estimators for the model and establish their asymptotic distributions. We face two major challenges in devising an estimation strategy: one is related to the unobserved factors, and the other is associated with the endogenous spatial lags of the dependent variable. The estimators developed in this paper approximate the unobserved factors by cross-sectional averages of the dependent and independent variables, and then utilize instrumental variables and other moment conditions to resolve the endogeneity problem. These estimators do not require estimating the number of factors, which is well known to be a challenging task. Moreover, they are robust to both heteroskedasticity and serial correlation in the disturbances, and they are computationally attractive. We show that the proposed estimators, including the two-stage least squares (2SLS), Best 2SLS, and generalized method of moments (GMM) estimators, are consistent as long as the cross-section dimension ($N$) is large, irrespective of the size of the time series dimension ($T$). Furthermore, they are asymptotically normally distributed without nuisance parameters, provided that $T$ is relatively smaller than $N$, as both $N$ and $T$ tend jointly to infinity. The Monte Carlo simulation results support the identification conditions.
A series of detailed experiments also demonstrates the satisfactory finite-sample properties of the proposed estimators.

The proposed estimation methods are applied to analyze changes in real house prices in the US across 377 Metropolitan Statistical Areas (MSAs) from 1975Q1 to 2014Q4. The study demonstrates the importance of effectively removing common effects when evaluating the strength of spatial connections. It documents significant spatial dependence in house price changes. It also shows that population and income growth significantly increase house price growth through both direct and spillover effects. These findings are fairly robust to various specifications of the spatial weights, including weights based on distance, on migration flows, and on pairwise correlations of the de-factored observations.

Related Literature

The theoretical analysis in this paper belongs to a recent and growing literature on panel data models with cross-sectional dependence (CSD). Chudik et al. (2011) introduce the notions of weak and strong CSD. Applying these concepts, a spatial model can be shown to be a form of weak CSD, whereas the standard factor model represents a form of strong CSD (Pesaran and Tosetti, 2011; Bailey, Holly, and Pesaran, 2016a). Bailey, Kapetanios, and Pesaran (2016b) propose measuring the degree of CSD by an exponent of dependence, which captures how fast the variance of the cross-sectional average declines with the cross-section dimension, $N$. Using this exponent of cross-sectional dependence, Pesaran (2015a) further discusses testing for weak CSD in large panels.[1]

The literature on characterizing CSD falls into two strands. On the one hand, there is a large body of literature on common factor models. Recent contributions on large panel data models with common factors include Pesaran (2006), Bai (2009), Bai and Li (2012), and Moon and Weidner (2015), to name just a few.
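The distinction between weak and strong CSD, measured by how fast the variance of the cross-sectional average falls with $N$, can be seen in a small simulation. The sketch below compares a one-factor model (strong CSD) with a spatial autoregressive model built on a row-normalized circular weights matrix (weak CSD). Both data-generating processes and all parameter values are assumptions chosen for illustration; this is not the exponent estimator of Bailey, Kapetanios, and Pesaran (2016b).

```python
import numpy as np

def ring_weights(n):
    """Row-normalized circular weights: unit i's neighbours are units i-1 and i+1 (mod n)."""
    W = np.zeros((n, n))
    idx = np.arange(n)
    W[idx, (idx - 1) % n] = 0.5
    W[idx, (idx + 1) % n] = 0.5
    return W

def var_of_average(n, model, reps=2000, rho=0.5, seed=0):
    """Monte Carlo variance of the cross-sectional average of n units."""
    rng = np.random.default_rng(seed)
    if model == "factor":
        gamma = rng.normal(1.0, 0.5, size=n)          # loadings, fixed across replications
        f = rng.normal(size=reps)                     # one common factor per replication
        x = np.outer(f, gamma) + rng.normal(size=(reps, n))
    elif model == "spatial":
        S_inv = np.linalg.inv(np.eye(n) - rho * ring_weights(n))
        x = rng.normal(size=(reps, n)) @ S_inv.T      # each row is S^{-1} e
    else:
        raise ValueError("model must be 'factor' or 'spatial'")
    return x.mean(axis=1).var()

for n in (50, 200, 800):
    print(n, round(var_of_average(n, "factor"), 4), round(var_of_average(n, "spatial"), 5))
```

Under the factor model the variance of the average stays roughly constant as $N$ grows, while under the spatial model it shrinks roughly in proportion to $1/N$ (here about $1/((1-\rho)^2 N)$, since each column of $(I_N - \rho W)^{-1}$ sums to $1/(1-\rho)$ for this symmetric $W$).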
Our study is particularly related to an influential paper by Pesaran (2006), who develops Common Correlated Effects (CCE) estimators for panel data models with a multifactor error structure. The basic idea behind the CCE estimators is to filter out the unobserved factors with cross-sectional averages. In follow-up studies, Kapetanios et al. (2011) show that the CCE estimators are still applicable if the unobserved factors follow unit root processes; Chudik and Pesaran (2015a) extend the estimation approach to models with lagged dependent variables and weakly exogenous regressors. On the other hand, the present paper also draws on the spatial econometrics literature.[2] Two main classes of methods have been developed to estimate spatial models: maximum likelihood (ML) techniques (Anselin, 1988; Lee, 2004; Yu et al., 2008; Lee and Yu, 2010a; Aquaro et al., 2015), and instrumental variables (IV)/GMM approaches (Kelejian and Prucha, 1999, 2010; Lee, 2007; Lin and Lee, 2010; Lee and Yu, 2014). The estimation strategy in the current article is related to and builds on the GMM framework. Regarding the identification conditions of spatial models, a systematic discussion is provided in a recent study by Lee and Yu (2016) under the assumption that the sample size is finite. Aquaro et al. (2015) also conduct a detailed investigation of the identifiability of spatial models with heterogeneous coefficients. The present paper sheds new light on the identification of spatial models with factors, and it shows that the conditions in Lee and Yu (2016) cannot be applied when $N$ tends to infinity.

The current paper is most closely related to a number of more recent studies that consider both common factors and spatial effects. Pesaran and Tosetti (2011) consider models where the idiosyncratic errors are spatially correlated and subject to common shocks.
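The two ingredients discussed above, filtering factors with cross-sectional averages (the CCE idea) and instrumenting the spatial lag (the IV/GMM idea), can be combined in a short sketch. It assumes a single regressor, a single factor, no intercept, and a row-normalized circular weights matrix, and it uses the instruments $(x, Wx, W^2x)$ in the spirit of Kelejian and Prucha (1999). All names and settings are hypothetical; this is an illustrative simplification, not the 2SLS, Best 2SLS, or GMM estimator derived in Section 1.4.

```python
import numpy as np

def ring_weights(n):
    """Row-normalized circular weights: unit i's neighbours are units i-1 and i+1 (mod n)."""
    W = np.zeros((n, n))
    idx = np.arange(n)
    W[idx, (idx - 1) % n] = 0.5
    W[idx, (idx + 1) % n] = 0.5
    return W

def simulate(n, t, rho, beta, rng):
    """One-factor SAR panel with one regressor; arrays are N x T (units by periods)."""
    W = ring_weights(n)
    S_inv = np.linalg.inv(np.eye(n) - rho * W)
    f = rng.normal(size=t)                            # unobserved common factor
    gamma = rng.normal(1.0, 1.0, size=n)              # loadings in the y equation
    a = rng.normal(1.0, 1.0, size=n)                  # loadings in the x equation
    X = np.outer(a, f) + rng.normal(size=(n, t))      # x_it = a_i f_t + v_it
    Y = S_inv @ (beta * X + np.outer(gamma, f) + rng.normal(size=(n, t)))
    return W, Y, X

def defactored_2sls(W, Y, X):
    """Pooled 2SLS of y on (Wy, x) with instruments (x, Wx, W^2 x), after de-factoring."""
    t = Y.shape[1]
    Zbar = np.column_stack([Y.mean(axis=0), X.mean(axis=0)])       # T x 2 factor proxies
    M = np.eye(t) - Zbar @ np.linalg.pinv(Zbar.T @ Zbar) @ Zbar.T   # residual maker
    def dfac(A):
        return (A @ M).ravel()                                      # de-factor every unit's series
    y = dfac(Y)
    Z = np.column_stack([dfac(W @ Y), dfac(X)])                     # endogenous Wy and exogenous x
    H = np.column_stack([dfac(X), dfac(W @ X), dfac(W @ W @ X)])    # instrument matrix
    Z_hat = H @ np.linalg.lstsq(H, Z, rcond=None)[0]                # first-stage fitted values
    return np.linalg.lstsq(Z_hat, y, rcond=None)[0]                 # (rho_hat, beta_hat)

rng = np.random.default_rng(1)
W, Y, X = simulate(n=200, t=30, rho=0.4, beta=1.0, rng=rng)
rho_hat, beta_hat = defactored_2sls(W, Y, X)
print(f"rho_hat = {rho_hat:.3f}, beta_hat = {beta_hat:.3f}")
```

De-factoring is applied to every variable, including the spatial lag and the instruments; because the residual maker acts along the time dimension and $W$ along the cross-section, the order of the two operations does not matter.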
Bai and Li (2014) specify the spatial autocorrelation on the dependent variable while assuming the presence of unobserved common shocks. They advocate a pseudo-ML method that simultaneously estimates a large group of parameters, including the heterogeneous factor loadings and heterogeneous variances of the disturbances. A similar approach is considered by Bai and Li (2015) for dynamic models. Other studies within the ML framework include W. Shi and Lee (2017) and Lu (2017). However, besides their computational complexity, the ML methods are not robust to serial correlation in the errors, and they require knowing or estimating the number of latent factors.[3] Instead of estimating the two effects jointly, Bailey, Holly, and Pesaran (2016a) propose a two-stage approach that extracts the common factors in the first stage and then estimates the spatial connections in the second stage. Nonetheless, a formal distribution theory that takes into account the first-stage sampling errors is not yet available.

[1] For overviews of the literature on panel data models with error cross-sectional dependence, see Sarafidis and Wansbeek (2012) and Chudik and Pesaran (2015b).
[2] Comprehensive reviews of spatial econometrics can be found in books including Anselin (1988) and Elhorst (2014). Also see the survey article by Lee and Yu (2010b) for the latest developments in spatial panel data models.

The empirical investigation in the present paper is concerned with the spatial dependence in house prices. The tendency of house price variations to exhibit spatial correlation has received increasing attention from economists over the past two decades, although little consensus has been reached regarding the spatial transmission mechanism. Possible explanations include migration, equity transfer, spatial arbitrage, and spatial patterns in the determinants of house prices (Meen, 1999).
Researchers have obtained evidence on the spatial spillovers of house prices in the US at different levels of aggregation using various methods.[4] For example, Pollakowski and Ray (1997) examine nine US Census divisions as well as the New York metropolitan area using a vector autoregressive (VAR) model. Brady (2011) focuses on the diffusion of house prices across a panel of California counties by means of impulse response functions. Holly et al. (2010) analyze US house prices at the State level using a spatial error model, where the importance of spatial effects is evaluated by fitting a spatial model to the residuals from a CCE estimation procedure. Brady (2014) also considers State level house prices but uses spatial impulse response functions from a single-equation spatial autoregressive model. The current paper focuses on the extent to which house prices are interdependent among nearly 400 Metropolitan Statistical Areas (MSAs) in the US. Little research has investigated this issue at the MSA level. One exception is the study by Cohen et al. (2016), who incorporate geography into an autoregressive model via cross-lag effects and do not employ a spatial econometric approach.[5] Our empirical analysis is closely related to the inquiry by Bailey, Holly, and Pesaran (2016a), who examine MSA level house price changes with a two-stage procedure. In comparison, besides using more recent data on updated MSA delineations, the present paper adopts a different estimation approach that jointly considers common factors and spatial dependence. It also explores the direct and indirect effects of possible determinants of house price growth. Another contribution of this paper is the specification of a spatial weights matrix based on migration flows.

[3] Much has been written on estimating the number of unobservable factors. See, for example, Bai and Ng (2002, 2007), Kapetanios (2010), and Stock and Watson (2011).
[4] International evidence on the spatial interconnections of house prices is provided by Luo et al. (2007) for Australia, S. Shi et al. (2009) for New Zealand, and Holly et al. (2011) for the UK, to name just a few.
[5] Cohen et al. (2016) also use a house price index different from ours. Specifically, the authors adopt the consolidated house price index by the Office of Federal Housing Enterprise Oversight (OFHEO) that covers 363 MSAs over the period 1996-2013.

The rest of this chapter is organized as follows. Section 1.2 specifies the model and describes the idea of approximating the unobserved factors with cross-sectional averages. Section 1.3 investigates the identification conditions. Section 1.4 establishes the asymptotic distributions of the 2SLS, Best 2SLS, and GMM estimators. Section 1.5 reports the Monte Carlo experiments on identification and estimation. Section 1.6 presents an empirical application to US house prices, and finally, Section 1.7 concludes. The Appendix provides proofs of the main theorems and technical details. It also contains a full description of data sources, variable transformations, and summary statistics.

Notations

For an $N\times N$ real matrix $A = (a_{ij})$, $\|A\| = \sqrt{\operatorname{tr}(AA')}$, $\|A\|_\infty = \max_{1\le i\le N}\sum_{j=1}^N |a_{ij}|$ and $\|A\|_1 = \max_{1\le j\le N}\sum_{i=1}^N |a_{ij}|$ denote the Frobenius norm, the maximum row sum norm, and the maximum column sum norm of matrix $A$, respectively. We say that the row (column) sums of a (sequence of) matrix $A$ are uniformly bounded in absolute value, or $A$ has bounded row (column) norm for short, if there exists a constant $K$ such that $\|A\|_\infty < K < \infty$ ($\|A\|_1 < K < \infty$) for all $N$. $\operatorname{vec}(A)$ is the column vector obtained by stacking the columns of $A$.
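The three norms, and the difference between bounded row and bounded column norms, can be made concrete with a few lines of NumPy. The sketch uses a hypothetical row-normalized "star" network in which one hub unit is linked to every other unit: its maximum row sum stays at 1 for every $N$, while its maximum column sum equals $N - 1$ and is therefore unbounded. The star network is an illustrative assumption, not a weights matrix used in this chapter.

```python
import numpy as np

def frobenius(A):
    """||A|| = sqrt(tr(A A'))."""
    return np.sqrt(np.trace(A @ A.T))

def max_row_sum(A):
    """||A||_inf = max_i sum_j |a_ij|."""
    return np.abs(A).sum(axis=1).max()

def max_col_sum(A):
    """||A||_1 = max_j sum_i |a_ij|."""
    return np.abs(A).sum(axis=0).max()

def star_weights(n):
    """Hypothetical row-normalized star network: unit 0 links to all others,
    and every other unit links only to unit 0."""
    W = np.zeros((n, n))
    W[0, 1:] = 1.0 / (n - 1)
    W[1:, 0] = 1.0
    return W

for n in (10, 100, 1000):
    W = star_weights(n)
    # Row norm stays at 1, while the column norm grows one-for-one with n.
    print(n, round(max_row_sum(W), 6), round(max_col_sum(W), 6))
```

These helpers agree with `np.linalg.norm(A, 'fro')`, `np.linalg.norm(A, np.inf)` and `np.linalg.norm(A, 1)`, respectively.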
$\operatorname{Diag}(A) = \operatorname{Diag}(a_{11}, a_{22}, \dots, a_{NN})$ represents an $N\times N$ diagonal matrix formed with the diagonal entries of $A$, whereas $\operatorname{diag}(A) = (a_{11}, a_{22}, \dots, a_{NN})'$ denotes an $N\times 1$ vector. $\lambda_{\max}(A)$ and $\lambda_{\min}(A)$ are the largest and smallest eigenvalues of matrix $A$, respectively. $\operatorname{tr}(A)$ denotes the trace of matrix $A$, and $\det(A)$ denotes the determinant of $A$. $\odot$ stands for the Hadamard product, and $\otimes$ is the Kronecker product. $(N,T)\xrightarrow{j}\infty$ denotes joint convergence of $N$ and $T$. Let $\{x_N\}_{N=1}^\infty$ be any real sequence and $\{y_N\}_{N=1}^\infty$ be a sequence of positive real numbers; we adopt Landau's symbols and write $x_N = O(y_N)$ if there exists a positive finite constant $K$ such that $|x_N| \le K y_N$ for all $N$, and $x_N = o(y_N)$ if $x_N/y_N \to 0$ as $N\to\infty$. $O_p(\cdot)$ and $o_p(\cdot)$ are the equivalent stochastic orders in probability. $\lfloor x\rfloor$ denotes the integer part of a real number $x$. $K$ is used generically for a finite positive constant.

1.2 The model and assumptions

Consider the following spatial autoregressive (SAR) model with common factors,

$$y_{it} = \rho y^*_{it} + \beta' x_{it} + \gamma_i' f_t + e_{it}, \qquad x_{it} = A_i' f_t + v_{it}, \tag{1.1}$$

for $i = 1, 2, \dots, N$ and $t = 1, 2, \dots, T$, where $y_{it}$ is the dependent variable of unit $i$ at time $t$, and $y^*_{it} = \sum_{j=1}^N w_{ij} y_{jt}$ represents the endogenous interaction effects (or spatial lag effects) among the dependent variable. The matrix $W = (w_{ij})_{N\times N}$ is a specified spatial weights matrix of known constants. It characterizes neighborhood relations, which are typically based on a geographical arrangement or on socio-economic connections of the cross-section units. The parameter $\rho$ captures the strength of spatial dependence across observations on the dependent variable and is known as the spatial autoregressive coefficient. The $k\times 1$ vector $x_{it} = (x_{it,1}, x_{it,2}, \dots, x_{it,k})'$ contains individual-specific explanatory variables, and $\beta$ is the corresponding vector of coefficients, where $k$ is assumed to be a known fixed number.
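For a given period $t$, stacking (1.1) over $i$ gives $y_{.t} = \rho W y_{.t} + X_{.t}\beta + \Gamma f_t + e_{.t}$, where $X_{.t}$ has rows $x_{it}'$ and $\Gamma = (\gamma_1, \dots, \gamma_N)'$ is notation introduced only for this sketch. Hence $y_{.t} = (I_N - \rho W)^{-1}(X_{.t}\beta + \Gamma f_t + e_{.t})$ whenever $I_N - \rho W$ is invertible. The code below draws one period of data this way and checks that the draw satisfies (1.1). Gaussian errors and loadings, a circular weights matrix, $k = m = 2$ and the parameter values are all assumptions made for illustration.

```python
import numpy as np

def simulate_period(W, rho, beta, Gamma, A, f, rng):
    """Draw (y_.t, X_.t, e_.t) for one period of the SAR model with common factors.

    W: N x N weights; beta: k-vector; Gamma: N x m loadings (rows gamma_i');
    A: N x m x k loadings (A[i] = A_i); f: m-vector of factors for period t.
    """
    n, k = W.shape[0], beta.size
    X = np.einsum("imk,m->ik", A, f) + rng.normal(size=(n, k))   # x_it = A_i' f_t + v_it
    e = rng.normal(size=n)
    S = np.eye(n) - rho * W                                       # I_N - rho W
    y = np.linalg.solve(S, X @ beta + Gamma @ f + e)              # reduced form for y_.t
    return y, X, e

rng = np.random.default_rng(3)
n, m, k = 100, 2, 2
W = np.zeros((n, n))
idx = np.arange(n)
W[idx, (idx + 1) % n] = W[idx, (idx - 1) % n] = 0.5               # row-normalized circular neighbours
rho, beta = 0.4, np.array([1.0, 0.5])
Gamma = rng.normal(size=(n, m))
A = rng.normal(size=(n, m, k))
f = rng.normal(size=m)

y, X, e = simulate_period(W, rho, beta, Gamma, A, f, rng)
y_star = W @ y                                                    # spatial lag y*_it = sum_j w_ij y_jt
resid = y - (rho * y_star + X @ beta + Gamma @ f + e)             # zero up to rounding error
print(np.abs(resid).max())
```

Solving the linear system rather than forming the inverse explicitly is the usual numerically safer choice when only $y_{.t}$ is needed.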
The variables $e_{it}$ and $v_{it} = (v_{it,1}, v_{it,2}, \dots, v_{it,k})'$ are the idiosyncratic disturbances associated with the $y_{it}$ and $x_{it}$ processes, respectively. The $m\times 1$ vector $f_t = (f_{1t}, f_{2t}, \dots, f_{mt})'$ represents unobserved common factors, where $m$ is fixed but possibly unknown. The factor loadings $\gamma_i$ and $A_i$ capture heterogeneous impacts of the common effects on cross-section units.[6] Overall, the term $\rho y^*_{it}$ captures the spatial effect, while $\gamma_i' f_t$ captures the common factor effect. The latter is also referred to in the literature as an interactive effect, since it can be viewed as a generalization of the traditional additive fixed effect. The parameters of interest throughout this paper are $\delta = (\rho, \beta')'$.

In model (1.1), the explanatory variables are specified so that they can be influenced by the same factors that affect the dependent variable. Such a specification is reasonable in practice and has been considered in studies including Pesaran (2006) and Bai and Li (2014). Also note that this model can be readily extended without additional complication to include observable factors such as intercepts, seasonal dummies, and deterministic trends;[7] here we focus on unobservable factors to facilitate exposition.

[6] The heterogeneity in factor loadings may arise, for example, from differences in endowment, technical rigidities, or innate ability.
[7] See Remark 2 of Pesaran (2006).

To cope with the unknown factors in model (1.1), we replace them with cross-sectional averages of the dependent and individual-specific independent variables, following the idea pioneered by Pesaran (2006). To see why this approximation works for the SAR model, we begin by rewriting model (1.1) as follows:

$$\begin{pmatrix} y_{it} - \rho\sum_{j=1}^N w_{ij} y_{jt} - \beta' x_{it} \\ x_{it} \end{pmatrix} = \Phi_i' f_t + u_{it}, \tag{1.2}$$

where $\Phi_i = (\gamma_i, A_i)$ and $u_{it} = (e_{it}, v_{it}')'$.
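To see in what sense $\gamma_i' f_t$ generalizes the additive fixed effect, consider a minimal worked case; the particular choice of $m$, $f_t$ and $\gamma_i$ below is ours, for illustration only. Take $m = 2$, $f_t = (1, \lambda_t)'$ and $\gamma_i = (\alpha_i, 1)'$:

```latex
\gamma_i' f_t = \alpha_i \cdot 1 + 1 \cdot \lambda_t = \alpha_i + \lambda_t ,
\qquad\text{while}\qquad
\gamma_i = (\alpha_i, \gamma_{i2})' \;\Longrightarrow\; \gamma_i' f_t = \alpha_i + \gamma_{i2}\,\lambda_t .
```

The first case is the two-way additive specification with unit effect $\alpha_i$ and time effect $\lambda_t$. In the second, the same time effect reaches different units with different intensities $\gamma_{i2}$, which the additive model rules out.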
Then, stacking (1.2) by individual unit for each time period, the model can be expressed more compactly as
$$\Delta(\rho,\beta)\, z_{\cdot t} = \Phi f_t + u_{\cdot t}, \qquad t = 1, 2, \ldots, T, \tag{1.3}$$
where $z_{\cdot t} = (z_{1t}', z_{2t}', \ldots, z_{Nt}')'$ is an $N(k+1)$-dimensional vector of observations with $z_{it} = (y_{it}, x_{it}')'$, $\Phi = (\Phi_1, \Phi_2, \ldots, \Phi_N)'$, $u_{\cdot t} = (u_{1t}', u_{2t}', \ldots, u_{Nt}')'$, and $\Delta = \Delta(\rho,\beta)$ is a square matrix whose $(i,j)$th subblock of size $(k+1)\times(k+1)$, for $i,j = 1, 2, \ldots, N$, is given by
$$\Delta_{ii} = \begin{pmatrix} 1 & -\beta' \\ 0 & I_k \end{pmatrix} \text{ if } i = j; \qquad \Delta_{ij} = \begin{pmatrix} -\rho w_{ij} & 0' \\ 0 & 0 \end{pmatrix} \text{ if } i \neq j.$$
The way of stacking the equations in (1.2) follows Bai and Li (2014), who show that $\Delta^{-1} = \Delta^{-1}(\rho,\beta)$ exists and that its $(i,j)$th subblock is given by⁸
$$\Delta^{-1}_{ii} = \begin{pmatrix} \check s_{ii} & \check s_{ii}\beta' \\ 0 & I_k \end{pmatrix} \text{ if } i = j; \qquad \Delta^{-1}_{ij} = \begin{pmatrix} \check s_{ij} & \check s_{ij}\beta' \\ 0 & 0 \end{pmatrix} \text{ if } i \neq j, \tag{1.4}$$
where $\check s_{ij}$ denotes the $(i,j)$th element of $S^{-1}(\rho)$, and $S(\rho) = I_N - \rho W$. The inverse of $S(\rho)$ exists under certain regularity conditions, which will be discussed later. It then follows from (1.4) that (1.3) is equivalent to
$$z_{\cdot t} = \Delta^{-1}(\Phi f_t + u_{\cdot t}) = C' f_t + \varepsilon_{\cdot t}, \tag{1.5}$$
where $C' = \Delta^{-1}\Phi$ and $\varepsilon_{\cdot t} = \Delta^{-1}u_{\cdot t} = (\varepsilon_{1t}', \varepsilon_{2t}', \ldots, \varepsilon_{Nt}')'$ are the transformed error terms.

⁸ See Lemma A.1 of Bai and Li (2014).

Now let $\Theta_a = N^{-1}\tau_N' \otimes I_{k+1}$, where $\tau_N$ is an $N\times 1$ vector of ones. It is easily verified that $\bar z_{\cdot t} = \Theta_a z_{\cdot t} = (\bar y_{\cdot t}, \bar x_{\cdot t}')'$, where $\bar y_{\cdot t} = N^{-1}\sum_{i=1}^{N} y_{it}$ and $\bar x_{\cdot t} = N^{-1}\sum_{i=1}^{N} x_{it}$. Thus, $\Theta_a$ operates on any $N(k+1)$-dimensional vector stacked in the same order as $z_{\cdot t}$ and produces the $(k+1)\times 1$ vector of cross-sectional averages. Similarly, $\bar\varepsilon_{\cdot t} = \Theta_a\varepsilon_{\cdot t} = N^{-1}\sum_{i=1}^{N}\varepsilon_{it}$. Premultiplying both sides of (1.5) by $\Theta_a$ yields
$$\bar z_{\cdot t} = \bar C' f_t + \bar\varepsilon_{\cdot t}, \tag{1.6}$$
where
$$\bar C = (\Theta_a C')' = N^{-1}\left(\sum_{i=1}^{N}\sum_{j=1}^{N}\check s_{ij}(\gamma_j + A_j\beta),\ \sum_{j=1}^{N} A_j\right). \tag{1.7}$$
Assuming that $\bar C$ has full row rank, namely $\operatorname{Rank}(\bar C) = m \le k+1$ for all $N$ including $N\to\infty$, we obtain
$$f_t = (\bar C\bar C')^{-1}\bar C(\bar z_{\cdot t} - \bar\varepsilon_{\cdot t}). \tag{1.8}$$
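The block-inverse formula (1.4) is easy to check numerically. The sketch below is an illustration under stated assumptions, not code from the paper: it uses NumPy, a small random row-standardized $W$ with zero diagonal (the diagonal blocks $\Delta_{ii}$ in (1.3) implicitly take $w_{ii} = 0$), and the hypothetical helper names `build_delta` and `delta_inverse_formula`. It assembles $\Delta(\rho,\beta)$ block by block and compares its numerical inverse with the closed form built from $S^{-1}(\rho)$.

```python
import numpy as np


def build_delta(W, rho, beta):
    """Assemble Delta(rho, beta) from its (k+1)x(k+1) subblocks, as in (1.3)."""
    N, k = W.shape[0], beta.shape[0]
    b = k + 1
    D = np.zeros((N * b, N * b))
    for i in range(N):
        for j in range(N):
            blk = np.zeros((b, b))
            if i == j:
                blk[0, 0] = 1.0
                blk[0, 1:] = -beta
                blk[1:, 1:] = np.eye(k)
            else:
                blk[0, 0] = -rho * W[i, j]
            D[i * b:(i + 1) * b, j * b:(j + 1) * b] = blk
    return D


def delta_inverse_formula(W, rho, beta):
    """Closed form (1.4), built from s_ij = [S^{-1}(rho)]_ij with S(rho) = I_N - rho W."""
    N, k = W.shape[0], beta.shape[0]
    b = k + 1
    S_inv = np.linalg.inv(np.eye(N) - rho * W)
    D_inv = np.zeros((N * b, N * b))
    for i in range(N):
        for j in range(N):
            blk = np.zeros((b, b))
            blk[0, 0] = S_inv[i, j]
            blk[0, 1:] = S_inv[i, j] * beta
            if i == j:
                blk[1:, 1:] = np.eye(k)
            D_inv[i * b:(i + 1) * b, j * b:(j + 1) * b] = blk
    return D_inv


rng = np.random.default_rng(1)
N, rho, beta = 5, 0.3, np.array([1.0, -0.5])
W = rng.uniform(size=(N, N))
np.fill_diagonal(W, 0.0)                  # zero diagonal
W /= W.sum(axis=1, keepdims=True)         # row-standardized (an illustrative choice)
numeric = np.linalg.inv(build_delta(W, rho, beta))
formula = delta_inverse_formula(W, rho, beta)
gap = np.abs(numeric - formula).max()
print(f"max abs difference: {gap:.2e}")
```

Because only the first row of each block involves $\rho$ and $\beta$, the check reduces to $S(\rho)S^{-1}(\rho) = I_N$ applied row by row.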
The task now is to show that $\bar\varepsilon_{\cdot t}$ diminishes for sufficiently large $N$. We establish in Lemma A.2 that $\bar\varepsilon_{\cdot t}$ converges to zero in quadratic mean as $N\to\infty$, for any $t$. It follows from (1.8) that $f_t$ can be approximated by the cross-sectional averages $\bar z_{\cdot t}$ with an error of order $O_p(1/\sqrt{N})$. More formally, we have
$$f_t \xrightarrow{p} (C_0C_0')^{-1}C_0\bar z_{\cdot t}, \quad \text{as } N\to\infty, \tag{1.9}$$
where
$$C_0 = \lim_{N\to\infty}\bar C = [E(\gamma_i), E(A_i)]\begin{pmatrix}\bar{\check s} & 0' \\ \bar{\check s}\beta & I_k\end{pmatrix}, \qquad \bar{\check s} = N^{-1}\tau_N' S^{-1}(\rho)\tau_N = N^{-1}\sum_{i=1}^{N}\sum_{j=1}^{N}\check s_{ij}.$$
It is clear from (1.9) that $\bar z_{\cdot t}$ serve well as factor proxies as long as $N$ is large.⁹ Note that the use of equal weights in constructing the cross-sectional averages is nonessential to the asymptotic analysis, which can be readily carried through with other weighting schemes satisfying the granularity conditions.¹⁰ The current paper therefore focuses on simple cross-sectional averages for ease of exposition.

⁹ In practice, it may also be worth including $\bar y^*_{\cdot t}$ as a factor proxy if $\bar y^*_{\cdot t}$ is not highly correlated with $\bar y_{\cdot t}$, where $\bar y^*_{\cdot t} = N^{-1}\sum_{i=1}^{N} y^*_{it}$.
¹⁰ See Assumption 5 in Pesaran (2006).

To facilitate formal analysis, it is convenient to define the infeasible de-factoring matrices (or residual makers) as follows:
$$M_f = I_T - F(F'F)^{-}F', \qquad \mathbb{M}_f = M_f\otimes I_N, \tag{1.10}$$
where $F = (f_1, f_2, \ldots, f_T)'$ is a $T\times m$ matrix of unobserved common factors, and $(F'F)^{-}$ denotes the generalized inverse of $F'F$. The observable counterparts of (1.10) that utilize cross-sectional averages are given by
$$\bar M = I_T - \bar Z(\bar Z'\bar Z)^{-}\bar Z', \qquad \mathbb{M} = \bar M\otimes I_N, \tag{1.11}$$
where $\bar Z = (\bar z_{\cdot 1}, \bar z_{\cdot 2}, \ldots, \bar z_{\cdot T})'$. Note that $\mathbb{M}_f$ and $\mathbb{M}$ are $NT$-dimensional de-factoring matrices that operate on observations stacked as successive cross sections, namely $Y = (y_{\cdot 1}', y_{\cdot 2}', \ldots, y_{\cdot T}')'$ and $X = (X_{\cdot 1}', X_{\cdot 2}', \ldots, X_{\cdot T}')'$, where $y_{\cdot t} = (y_{1t}, y_{2t}, \ldots, y_{Nt})'$ and $X_{\cdot t} = (x_{1t}, x_{2t}, \ldots, x_{Nt})'$, for $t = 1, 2, \ldots, T$.
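To illustrate (1.11) and the factor-proxy argument behind (1.9), the following minimal NumPy sketch builds $\bar M$ from cross-sectional averages and applies $\mathbb{M} = \bar M\otimes I_N$ without forming the $NT\times NT$ matrix: for data stored as successive cross sections, $(\bar M\otimes I_N)$ times the stacked vector equals $\bar M$ times the $T\times N$ data array. The one-factor data-generating process and the names `defactoring_matrix` and `apply_Mb` are hypothetical, chosen only for illustration; with $N$ large, the share of the factor left after de-factoring should be small, of order $1/\sqrt{N}$.

```python
import numpy as np


def defactoring_matrix(Zbar):
    """M_bar = I_T - Zbar (Zbar'Zbar)^- Zbar' from (1.11); pinv supplies a generalized inverse."""
    T = Zbar.shape[0]
    return np.eye(T) - Zbar @ np.linalg.pinv(Zbar.T @ Zbar) @ Zbar.T


def apply_Mb(Mbar, A):
    """Apply M_bar ⊗ I_N to data held as a (T, N, ...) array of successive cross sections."""
    return np.tensordot(Mbar, A, axes=(1, 0))


# Hypothetical one-factor panel y_it = gamma_i f_t + e_it (no spatial lag, no regressors).
rng = np.random.default_rng(0)
T, N = 50, 2000
f = rng.normal(size=T)
gamma = 1.0 + rng.normal(size=N)
Y = np.outer(f, gamma) + rng.normal(size=(T, N))

Zbar = Y.mean(axis=1, keepdims=True)      # T x 1 matrix of cross-sectional averages
Mbar = defactoring_matrix(Zbar)
leftover = np.linalg.norm(Mbar @ f) / np.linalg.norm(f)
print(f"share of the factor left after de-factoring: {leftover:.3f}")
```

Since $\bar Z(\bar Z'\bar Z)^{-}\bar Z'$ is the orthogonal projector onto the columns of $\bar Z$ for any choice of generalized inverse, using the Moore–Penrose inverse here does not affect $\bar M$.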
Throughout this paper, $K$ is used generically to denote a finite positive constant. In order to formally analyze model (1.1), we make the following assumptions.

Assumption 1.1. The unobserved common factors $f_t$ are covariance stationary with absolutely summable autocovariances, and they are distributed independently of $e_{it'}$ and $v_{it'}$ for all $i, t, t'$.

Assumption 1.2. The idiosyncratic errors $u_{it} = (e_{it}, v_{it}')'$ are such that
(i) For each $i$, $e_{it}$ and $v_{it}$ follow linear stationary processes with absolutely summable autocovariances: $e_{it} = \sum_{l=0}^{\infty} a_{il}\zeta_{i,t-l}$ and $v_{it} = \sum_{l=0}^{\infty}\Xi_{il}\varsigma_{i,t-l}$, where $(\zeta_{it}, \varsigma_{it}')' \sim IID(0_{k+1}, I_{k+1})$ with finite fourth-order moments. The errors $e_{it}$ and $v_{jt'}$ are distributed independently of each other, for all $i, j, t, t'$. In addition, $\operatorname{Var}(e_{it}) = \sum_{l=0}^{\infty}a_{il}^2 = \sigma_i^2 < K$ and $\operatorname{Var}(v_{it}) = \sum_{l=0}^{\infty}\Xi_{il}\Xi_{il}' = \Sigma_{v,i}$ with $\|\Sigma_{v,i}\| < K$, where $\sigma_i^2 > 0$ and $\Sigma_{v,i}$ is positive definite.
(ii) The error term $e_{it}$ has absolutely summable cumulants up to the fourth order.

Assumption 1.3. The factor loadings $\gamma_i$ and $A_i$ are independently and identically distributed across $i$, and independent of $e_{jt}$, $v_{jt}$, and $f_t$, for all $i$, $j$, and $t$. Both $\gamma_i$ and $A_i$ have fixed means, given by $\gamma$ and $A$, respectively, and finite variances. In particular, for all $i$, $\gamma_i = \gamma + \eta_i$, $\eta_i \sim IID(0, \Omega_\eta)$, where $\Omega_\eta$ is a symmetric non-negative definite matrix, $\|\gamma\| < K$, $\|A\| < K$, and $\|\Omega_\eta\| < K$.

Assumption 1.4. The true parameter vector $\delta_0 = (\rho_0, \beta_0')'$ is in the interior of the parameter space, denoted by $\Delta_{sp}$, which is a compact subset of the $(k+1)$-dimensional Euclidean space $\mathbb{R}^{k+1}$.

Assumption 1.5. The matrix $\bar C$, given by (1.7), has full row rank for all $N$, including $N\to\infty$.

Assumption 1.6. The $N\times N$ nonstochastic spatial weights matrix $W = (w_{ij})$ has bounded row and column sum norms, namely $\|W\|_\infty < K$ and $\|W\|_1 < K$, respectively, and $|\rho| < \max\{1/\|W\|_1, 1/\|W\|_\infty\}$ for all values of $\rho$.
In addition, the diagonal entries of $W$ are zero, that is, $w_{ii} = 0$ for all $i = 1, 2, \ldots, N$.

Assumption 1.7. The $N\times q$ matrix of instrumental variables, $Q_{\cdot t}$, for $t = 1, 2, \ldots, T$, is composed of a subset of the columns of $(X_{\cdot t}, WX_{\cdot t}, W^2X_{\cdot t}, \ldots)$, and its column dimension $q$ is fixed for all $N$ and $t$. The matrix $Q = (Q_{\cdot 1}', Q_{\cdot 2}', \ldots, Q_{\cdot T}')'$ represents the IV matrix of dimension $NT\times q$.

Assumption 1.8. (i) There exist $N_0$ and $T_0$ such that for all $N > N_0$ and $T > T_0$, the matrices $(NT)^{-1}Q'\mathbb{M}Q$ and $(NT)^{-1}Q'\mathbb{M}_fQ$ exist and are nonsingular. (ii) The matrix $\operatorname{plim}_{N,T\to\infty}(NT)^{-1}Q'\mathbb{M}_fL_0$ has full column rank, where $L_0 = (\mathbb{G}_0X\beta_0, X)$, $\mathbb{G}_0 = I_T\otimes G_0$, and $G_0 = WS^{-1}(\rho_0)$. (iii) $E|x_{it,p}|^{2+\delta} < K$ for some $\delta > 0$, and for all $i = 1, 2, \ldots, N$, $t = 1, 2, \ldots, T$, and $p = 1, 2, \ldots, k$.

Remark 1.1. An attractive feature of the model is that it allows for both heteroskedasticity and serial correlation in the disturbance processes, as stated in Assumption 1.2.¹¹ The asymptotic analysis in the current paper is conducted under this fairly general configuration, and the theoretical findings are corroborated by Monte Carlo evidence. Note that Assumption 1.2(ii) is only needed for the limit theory of the GMM estimator. Under Assumption 1.2, we have $\operatorname{Var}(u_{\cdot t}) = \Sigma_u = \operatorname{Diag}(\Sigma_{u,1}, \Sigma_{u,2}, \ldots, \Sigma_{u,N})$ and $\operatorname{Var}(u_{it}) = \Sigma_{u,i} = \operatorname{Diag}(\sigma_i^2, \Sigma_{v,i})$ for $i = 1, 2, \ldots, N$; both $\Sigma_u$ and $\Sigma_{u,i}$ are block-diagonal matrices.

Remark 1.2. The assumptions on the factors and factor loadings (Assumptions 1.1 and 1.3) follow the specifications in Pesaran (2006). The compactness of the parameter space in Assumption 1.4 is imposed to facilitate the theoretical analysis of the GMM estimator; this condition is usually assumed when the objective function of an estimator is highly nonlinear. The rank condition in Assumption 1.5 is imposed for analytical convenience and can be relaxed following arguments similar to those in Pesaran (2006).¹²
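Assumption 1.6 and the Neumann-series bound on $\|S^{-1}(\rho)\|_1$ discussed in Remark 1.3 can be checked numerically for a given $W$. The sketch below assumes NumPy, a hypothetical circular neighbour matrix (each unit has its two adjacent neighbours, weight 1/2 each), and $\rho = 0.6$; the helper names are illustrative. Note that the bound $1/(1 - |\rho|\,\|W\|_1)$ is only meaningful when $|\rho|\,\|W\|_1 < 1$, which holds here.

```python
import numpy as np


def circular_W(N):
    """Hypothetical 'two nearest neighbours on a circle' weights, each with weight 1/2."""
    W = np.zeros((N, N))
    for i in range(N):
        W[i, (i - 1) % N] = 0.5
        W[i, (i + 1) % N] = 0.5
    return W


def check_assumption_1_6(W, rho):
    norm_1 = np.linalg.norm(W, 1)          # maximum absolute column sum, ||W||_1
    norm_inf = np.linalg.norm(W, np.inf)   # maximum absolute row sum, ||W||_inf
    S_inv = np.linalg.inv(np.eye(W.shape[0]) - rho * W)
    return {
        "rho_admissible": abs(rho) < max(1 / norm_1, 1 / norm_inf),
        "zero_diagonal": bool(np.all(np.diag(W) == 0)),
        "spectral_radius_rhoW": np.abs(np.linalg.eigvals(rho * W)).max(),
        "norm1_S_inv": np.linalg.norm(S_inv, 1),
        "neumann_bound": 1 / (1 - abs(rho) * norm_1),
    }


out = check_assumption_1_6(circular_W(50), rho=0.6)
print(out)
```

For this symmetric, row-standardized $W$, every column of $S^{-1}(\rho) = \sum_{s\ge 0}\rho^sW^s$ sums to $1/(1-\rho)$, so the Neumann bound holds with equality.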
¹¹ This model can be further extended to accommodate spatial correlations in the error processes.
¹² Also see Kapetanios et al. (2011) and Chudik and Pesaran (2015a) for discussions of the Common Correlated Effects (CCE) estimators in the rank-deficient case.

Remark 1.3. Assumption 1.6 ensures that $S(\rho) = I_N - \rho W$ is nonsingular for all admissible values of $\rho$. To see this, note that $S(\rho)$ is invertible if the spectral radius of $\rho W$ is less than one. Since the spectral radius is bounded by any induced matrix norm, it does not exceed $|\rho|\,\|W\|_1$ or $|\rho|\,\|W\|_\infty$; therefore $S(\rho)$ is invertible if $|\rho| < \max\{1/\|W\|_1, 1/\|W\|_\infty\}$. Assumption 1.6 also implies that $S^{-1}(\rho)$ is uniformly bounded in row and column sums in absolute value for all values of $\rho$, since
$$\|S^{-1}\|_1 = \|I_N + \rho W + \rho^2W^2 + \cdots\|_1 \le 1 + |\rho|\,\|W\|_1 + |\rho|^2\|W\|_1^2 + \cdots = \frac{1}{1 - |\rho|\,\|W\|_1} < K,$$
and similarly it can be shown that $\|S^{-1}\|_\infty < K$. The uniform boundedness assumption is standard in the spatial econometrics literature. It essentially imposes sparsity restrictions on $W$ so that the degree of cross-sectional correlation is manageable. As we shall see, this assumption plays an important role in the asymptotic analysis. Also note that $W$ need not be row-standardized so that each row sums to unity, although this is often done in practice for ease of interpretation. If all the elements of $W$ are non-negative, row-standardization implies that $y^*_{it}$ is a weighted average of neighboring values. Lastly, the zero-diagonal assumption for $W$ is innocuous and is made only for notational convenience in discussing the GMM estimation. Under this assumption no unit influences itself, which is clearly satisfied if $W$ represents geographical distance or social interactions.

Remark 1.4. The spatially lagged dependent variable $y^*_{it}$ is in general correlated with the error term. The selection of the instrumental variables in Assumption 1.7 originates from Kelejian and Prucha (1998) for cross-sectional SAR models.
This choice is motivated by the spatial power series expansion of the expectation of the spatial lag (see Kelejian and Prucha, 1998, p. 104).

Remark 1.5. Assumptions 1.8(i) and 1.8(ii) are the standard rank conditions ensuring that the 2SLS and GMM estimators analyzed below are well defined asymptotically. The existence of moments of order higher than two in Assumption 1.8(iii) is required for the GMM estimator, so that a central limit theorem (CLT) for linear-quadratic forms can be applied; this CLT is an extension of Theorem 1 in Kelejian and Prucha (2001). For the 2SLS estimators, the existence of second moments is sufficient.

1.3 Identification

Before discussing how to estimate the joint model (1.1), it is important to make sure that the parameters are identified. Since we are only interested in estimating $\delta = (\rho, \beta')'$, we derive the identification conditions for $\delta$ assuming the factors are known.¹³ It should be noted that whether the factors are observable does not affect the identification conditions: if there are unobserved factors, replacing them with proxies only affects the consistency and efficiency properties of an estimator. Furthermore, as seen from (1.9), the unknown factors can be well approximated by cross-sectional averages for all values of $\rho$ and $\beta$ under the given assumptions, with an approximation error of order $O_p(1/\sqrt{N})$. Hence, the following analysis of identification is undertaken conditional on observable factors. We begin by examining SAR models with factors but without the exogenous explanatory variables $x_{it}$, and return to models with $x_{it}$ afterwards. Consider the following model,
$$y_{it} = \rho y^*_{it} + \gamma_i' f_t + e_{it}, \qquad i = 1, 2, \ldots, N;\ t = 1, 2, \ldots, T, \tag{1.12}$$
where $f_t$ is an $m\times 1$ vector of observable factors, and the errors $e_{it}$ are assumed to be independently and normally distributed with zero means and constant variances for all $i$ and $t$, i.e., $e_{it} \sim IIDN(0, \sigma^2)$, where $0 < \sigma^2 < K$.
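Writing (1.12) in stacked form, we have $y_{\cdot t} = \rho y^*_{\cdot t} + \Gamma f_t + e_{\cdot t}$, $t = 1, 2, \ldots, T$, where $y^*_{\cdot t} = Wy_{\cdot t} = (y^*_{1t}, y^*_{2t}, \ldots, y^*_{Nt})'$, $\Gamma = (\gamma_1, \gamma_2, \ldots, \gamma_N)'$ is an $N\times m$ matrix of factor loadings, and $e_{\cdot t} = (e_{1t}, e_{2t}, \ldots, e_{Nt})'$. Define $\gamma = (\gamma_1', \gamma_2', \ldots, \gamma_N')'$, and let $\varphi_0 = (\rho_0, \gamma_0', \sigma_0^2)'$ denote the true value of $\varphi = (\rho, \gamma', \sigma^2)'$. We adopt the most general identification framework, based on the likelihood function, proposed by Rothenberg (1971). The (quasi) log-likelihood function of (1.12) is given by
$$l(\varphi) = -\frac{NT}{2}\ln(2\pi) - \frac{NT}{2}\ln\sigma^2 + T\ln|S(\rho)| - \frac{1}{2\sigma^2}\sum_{t=1}^{T}[S(\rho)y_{\cdot t} - \Gamma f_t]'[S(\rho)y_{\cdot t} - \Gamma f_t],$$
and it follows that
$$\frac{1}{NT}E_0\,l(\varphi) = -\frac{1}{2}\ln(2\pi) - \frac{1}{2}\ln\sigma^2 + \frac{1}{N}\ln|S(\rho)| - \frac{1}{2\sigma^2}\left\{(\rho-\rho_0, (\gamma-\gamma_0)')\,H_f(\rho_0,\gamma_0')\,(\rho-\rho_0, (\gamma-\gamma_0)')' + \frac{\sigma_0^2}{N}\operatorname{tr}\left[S_0^{-1}S(\rho)S'(\rho)S_0^{-1\prime}\right]\right\},$$
$$\frac{1}{NT}E_0\,l(\varphi_0) = -\frac{1}{2}[\ln(2\pi) + 1] - \frac{1}{2}\ln\sigma_0^2 + \frac{1}{N}\ln|S_0|,$$
where $S_0 = S(\rho_0)$,
$$H_f(\rho_0,\gamma_0') = (NT)^{-1}E_0\sum_{t=1}^{T}J_{0,t}'J_{0,t}, \qquad J_{0,t} = (G_0\Gamma_0f_t, F_t), \tag{1.13}$$
$G(\rho) = WS^{-1}(\rho)$, $G_0 = G(\rho_0) = WS_0^{-1}$, $F_t = I_N\otimes f_t'$, and for the discussion of identification we use $E_0$ to emphasize that the expectation is calculated using the true parameter values. Letting $Q_{NT}(\psi) = (NT)^{-1}E_0[l(\varphi_0) - l(\varphi)]$, where $\psi = (d, \zeta', \vartheta)'$, $d = \rho - \rho_0$, $\zeta = \gamma - \gamma_0$, and $\vartheta = (\sigma^2 - \sigma_0^2)/\sigma^2 < 1$, we obtain
$$Q_{NT}(\psi) = -\frac{1}{2}[\ln(1-\vartheta) + \vartheta] - \frac{1}{N}\ln|I_N - dG_0| - \frac{1}{N}(1-\vartheta)\,d\operatorname{tr}(G_0) + \frac{1}{2}(1-\vartheta)d^2\frac{\operatorname{tr}(G_0G_0')}{N} + \frac{1-\vartheta}{2\sigma_0^2}(d, \zeta')H_f(\rho_0,\gamma_0')(d, \zeta')'. \tag{1.14}$$
Then, by a mean value expansion and noting that $Q_{NT}(0) = 0$ and $\partial Q_{NT}(0)/\partial\psi = 0$, we have $Q_{NT}(\psi) = \frac{1}{2}\psi'\Lambda_{f,NT}(\bar\psi)\psi$, where $\Lambda_{f,NT}(\psi) = \partial^2Q_{NT}(\psi)/\partial\psi\partial\psi'$ and $\bar\psi = (\bar d, \bar\zeta', \bar\vartheta)' = (\bar\rho - \rho_0, (\bar\gamma - \gamma_0)', (\bar\sigma^2 - \sigma_0^2)/\bar\sigma^2)'$ lies between $0$ and $\psi$; that is, $\bar\rho$, $\bar\gamma$, and $\bar\sigma^2$ lie between $\rho$, $\gamma$, $\sigma^2$ and $\rho_0$, $\gamma_0$, $\sigma_0^2$, respectively.¹⁴

¹³ The factor loadings are identified up to a rotation if factors are unobserved.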
It follows immediately that, for all $N$ (including $N\to\infty$) and all $T$, the parameters $\psi_0$ are locally identified if and only if $\lambda_{\min}[\Lambda_{f,NT}(0)] > 0$. We formally state the result in the following proposition.

Proposition 1.1. Consider the model given by (1.12). For all $N$ (including $N\to\infty$) and all $T$, the true parameter values $\rho_0$, $\gamma_0$, and $\sigma_0^2$ are locally identified if and only if
$$h_g \equiv \frac{\operatorname{tr}(G_0^2 + G_0G_0')}{N} - \frac{2[\operatorname{tr}(G_0)]^2}{N^2} > 0, \tag{1.15}$$
and $T^{-1}E_0(f_tf_t')$ is positive definite.

¹⁴ A detailed expression of $\Lambda_{f,NT}(\psi)$ can be found in Appendix B of Yang (2017).

Notice that model (1.12) reduces to a pure SAR model if there are no common factors; the identification condition then becomes $h_g > 0$ for all $N$ (including $N\to\infty$). This condition is in line with the findings of a recent study by Aquaro et al. (2015), who investigate the identification of a spatial model with heterogeneous spatial coefficients without factors. Replacing the heterogeneous coefficients in their identification condition with a homogeneous $\rho$, one arrives at the same inequality (1.15).

To further our understanding of (1.15), we make the following four observations. First, a necessary condition for (1.15) is that there exists an $\varepsilon > 0$ such that
$$N^{-1}\operatorname{tr}(G_0G_0') > \varepsilon > 0, \quad \text{for all } N, \text{ including } N\to\infty. \tag{1.16}$$
To see this, using Schur's inequality, $\operatorname{tr}(G_0^2)/N \le \operatorname{tr}(G_0G_0')/N$, we have
$$\frac{\operatorname{tr}(G_0^2 + G_0G_0')}{N} - \frac{2[\operatorname{tr}(G_0)]^2}{N^2} = \left\{\frac{\operatorname{tr}(G_0G_0')}{N} - \frac{[\operatorname{tr}(G_0)]^2}{N^2}\right\} + \left\{\frac{\operatorname{tr}(G_0^2)}{N} - \frac{[\operatorname{tr}(G_0)]^2}{N^2}\right\} \le 2\left\{\frac{\operatorname{tr}(G_0G_0')}{N} - \frac{[\operatorname{tr}(G_0)]^2}{N^2}\right\}.$$
Therefore, for (1.15) to hold it is necessary that
$$\frac{\operatorname{tr}(G_0G_0')}{N} > \frac{[\operatorname{tr}(G_0)]^2}{N^2}. \tag{1.17}$$
However, by the Cauchy–Schwarz inequality, $\operatorname{tr}(G_0G_0')/N \ge [\operatorname{tr}(G_0)]^2/N^2$ always holds. To exclude equality, (1.16) is needed, because $\operatorname{tr}(G_0G_0')/N = 0$ implies $\operatorname{tr}(G_0)/N = 0$ for all $N$, including $N\to\infty$.
Also required for the strict inequality is that $G_0$ is not proportional to $I_N$, namely $G_0 \neq cI_N$ for all $c \neq 0$.

Second, under Assumption 1.6, a necessary and sufficient condition for (1.16) is that there exists an $\varepsilon > 0$ such that
$$N^{-1}\operatorname{tr}(W'W) > \varepsilon > 0, \quad \text{for all } N, \text{ including } N\to\infty. \tag{1.18}$$
To see why, note that $\lambda_{\min}[S'(\rho)S(\rho)] > 0$, which follows immediately from the non-singularity of $S(\rho)$, and that
$$\lambda_{\max}[S'(\rho)S(\rho)] \le \|S(\rho)\|_1\|S(\rho)\|_\infty \le (1 + |\rho|\,\|W\|_1)(1 + |\rho|\,\|W\|_\infty) < K < \infty.$$
Therefore, $\lambda_{\max}\{[S'(\rho)S(\rho)]^{-1}\} < K < \infty$ and $\lambda_{\min}\{[S'(\rho)S(\rho)]^{-1}\} > 0$. It then follows that¹⁵
$$\frac{\operatorname{tr}(G_0G_0')}{N} = \frac{\operatorname{tr}[(S_0'S_0)^{-1}W'W]}{N} \le \lambda_{\max}[(S_0'S_0)^{-1}]\frac{\operatorname{tr}(W'W)}{N} < K\frac{\operatorname{tr}(W'W)}{N},$$
which establishes necessity, and
$$\frac{\operatorname{tr}(G_0G_0')}{N} = \frac{\operatorname{tr}[(S_0'S_0)^{-1}W'W]}{N} \ge \lambda_{\min}[(S_0'S_0)^{-1}]\frac{\operatorname{tr}(W'W)}{N},$$
which establishes sufficiency. As a simple necessary condition for identification, (1.18) does not depend on any unknown parameters and can easily be used to check identifiability in practice.

Third, (1.16) is both a necessary and a sufficient identification condition if $\rho_0 = 0$. This can be seen by replacing $G_0$ with $W$ in (1.15) and using $\operatorname{tr}(G_0) = \operatorname{tr}(W) = 0$.

Finally, it should be noted that condition (1.18) requires $N^{-1}\operatorname{tr}(W'W)$ to be strictly positive as $N\to\infty$. This is an important consideration because the distinction between strong and weak cross-sectional dependence relies on $N$ approaching infinity (Chudik et al., 2011). Notice that model (1.12) can be seen as a special case of the spatial Durbin model if there are no common factors. Lee and Yu (2016) investigate the identification conditions of these models but restrict their attention to finite sample sizes. The authors conclude that the parameter $\rho_0$ is identifiable if $I_N$, $W + W'$, and $W'W$ are linearly independent. However, it is possible that this condition is met while (1.18) is violated as $N\to\infty$. In such a case, our findings suggest that $\rho_0$ cannot be identified.
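Condition (1.15) and the parameter-free check (1.18) are simple to compute for a candidate $W$. The NumPy sketch below assumes two hypothetical designs, which are illustrative and not necessarily the example used in Section 1.5.1: a circular neighbour matrix, for which $N^{-1}\operatorname{tr}(W'W)$ stays at $1/2$, and a matrix in which only one unit has a neighbour, for which $N^{-1}\operatorname{tr}(W'W) = 1/N$ vanishes as $N$ grows.

```python
import numpy as np


def identification_stats(W, rho0):
    """Return h_g from (1.15) and N^{-1} tr(W'W) from (1.18) for a given W and rho_0."""
    N = W.shape[0]
    G0 = W @ np.linalg.inv(np.eye(N) - rho0 * W)
    h_g = np.trace(G0 @ G0 + G0 @ G0.T) / N - 2.0 * np.trace(G0) ** 2 / N ** 2
    return h_g, np.trace(W.T @ W) / N


def circular_W(N):
    """Hypothetical: two nearest neighbours on a circle, weight 1/2 each."""
    W = np.zeros((N, N))
    idx = np.arange(N)
    W[idx, (idx - 1) % N] = 0.5
    W[idx, (idx + 1) % N] = 0.5
    return W


def single_link_W(N):
    """Hypothetical: only unit 0 has a neighbour (unit 1), so tr(W'W)/N = 1/N."""
    W = np.zeros((N, N))
    W[0, 1] = 1.0
    return W


for N in (10, 100, 1000):
    print(N, identification_stats(circular_W(N), 0.4), identification_stats(single_link_W(N), 0.4))
```

In the second design $W^2 = 0$, so $G_0 = W$ and $h_g = 1/N$ for any $\rho_0$: both quantities are positive for every finite $N$ yet vanish as $N\to\infty$.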
An example is provided in Section 1.5.1 to demonstrate the necessity of (1.18) for identification.

We now proceed to include exogenous regressors $x_{it}$ in (1.12) and consider the following model,
$$y_{it} = \rho y^*_{it} + \beta' x_{it} + \gamma_i' f_t + e_{it}. \tag{1.19}$$
In contrast with model (1.1), here we assume that $x_{it}$ are uncorrelated with $f_t$ for all $i$ and $t$, and that $e_{it} \sim IIDN(0, \sigma^2)$. With a slight abuse of notation, we use the same letter $\varphi$ to denote the parameters of this model, $\varphi = (\rho, \beta', \gamma', \sigma^2)'$, and their true values are denoted by $\varphi_0 = (\rho_0, \beta_0', \gamma_0', \sigma_0^2)'$. By similar reasoning, we obtain the following identification result.¹⁶

¹⁵ For a real symmetric matrix $A$ and a real positive semidefinite matrix $B$ of the same size, $\lambda_{\min}(A)\operatorname{tr}(B) \le \operatorname{tr}(AB) \le \lambda_{\max}(A)\operatorname{tr}(B)$.

Proposition 1.2. Consider the model given by (1.19), where $x_{it}$ are exogenous and uncorrelated with $f_t$ for all $i$ and $t$. For all $N$ (including $N\to\infty$) and all $T$, the true parameter values $\rho_0$ and $\sigma_0^2$ are locally identified if $h_g > 0$ or $H(\rho_0, \beta_0')$ is positive definite (or both), where $h_g$ is given by (1.15),
$$H(\rho_0, \beta_0') = (NT)^{-1}E_0(L_0'L_0), \tag{1.20}$$
$$L_0 = (\mathbb{G}_0X\beta_0, X), \quad \text{and} \quad \mathbb{G}_0 = I_T\otimes G_0. \tag{1.21}$$
Provided that $\rho_0$ is identifiable, the parameter vector $\beta_0$ is identified if $(NT)^{-1}E_0(X'X)$ is positive definite. The vector $\gamma_0$ is identified if $T^{-1}E_0(f_tf_t')$ is positive definite.

Remark 1.6. Note that if $H(\rho_0, \beta_0')$ is positive definite, both $\rho_0$ and $\beta_0$ are identified; if it is not, $\rho_0$ can still be identified through $h_g > 0$. Compared with the identification conditions for the pure SAR model, including individual-specific exogenous variables $x_{it}$ introduces an additional means of identifying $\rho_0$; however, including common factors does not help. This is not surprising, because common factors do not contain information regarding cross-sectional variation.

Remark 1.7. If there were no common factors, model (1.1) would reduce to a SAR model with exogenous regressors.
Proposition 1.2 provides the identification conditions for the parameters $\rho_0$, $\beta_0$, and $\sigma_0^2$ in that case. Note that these conditions are valid even if $N\to\infty$.

Finally, let us return to model (1.1). Writing it in stacked form for each time period, we obtain
$$y_{\cdot t} = \rho y^*_{\cdot t} + X_{\cdot t}\beta + \Gamma f_t + e_{\cdot t}, \qquad t = 1, 2, \ldots, T. \tag{1.22}$$
Since we are only interested in identifying $\rho_0$ and $\beta_0$, as is the case in the following analysis, we can remove the effects of $f_t$ by premultiplying (1.22), stacked over $t$, by $\mathbb{M}_f$. The identification conditions can then be established as a corollary to Proposition 1.2.

¹⁶ See Appendix B of Yang (2017) for a proof.

Corollary 1.1. Consider the model given by (1.1). For all $N$ (including $N\to\infty$) and all $T$, the true parameter value $\rho_0$ is locally identified if $h_g > 0$ or $\mathring{H}(\rho_0, \beta_0')$ is positive definite (or both), where $h_g$ is given by (1.15) and
$$\mathring{H}(\rho_0, \beta_0') = (NT)^{-1}E_0(L_0'\mathbb{M}_fL_0). \tag{1.23}$$
Provided that $\rho_0$ is identifiable, the parameter vector $\beta_0$ is identified if $(NT)^{-1}E_0(X'\mathbb{M}_fX)$ is positive definite, which is ensured if $\mathring{H}(\rho_0, \beta_0')$ is positive definite.

1.4 Estimation

Having established the identification conditions, we now turn to the estimation of model (1.1). We suggest three estimation methods: 2SLS, best 2SLS, and GMM. This section formally establishes the asymptotic distributions of these estimators.

1.4.1 2SLS estimation

The first estimation method we propose is 2SLS using the instrumental variables $Q$ specified in Assumption 1.7. As before, $\delta_0 = (\rho_0, \beta_0')'$ denotes the true parameter vector. The 2SLS estimator of $\delta_0$, denoted by $\hat\delta_{2sls}$, is defined as
$$\hat\delta_{2sls} = (L'P_QL)^{-1}L'P_QY, \tag{1.24}$$
where $P_Q = \mathbb{M}Q(Q'\mathbb{M}Q)^{-1}Q'\mathbb{M}$, $L = (Y^*, X)$, and $Y^* = (I_T\otimes W)Y$. There are two ways to interpret (1.24).
One way is to de-factor the data with cross-sectional averages, namely $\mathring{Y} = \mathbb{M}Y$ and $\mathring{L} = \mathbb{M}L$, and then apply the standard 2SLS procedure to the de-factored observations $\mathring{Y}$ and $\mathring{L}$. Alternatively, the matrix $\mathbb{M}Q$ can be directly regarded as the instruments.

We begin by showing that the 2SLS estimator $\hat\delta_{2sls}$ is consistent as $N\to\infty$, for $T$ fixed or $T\to\infty$. To see this, note that
$$\hat\delta_{2sls} - \delta_0 = (L'P_QL)^{-1}L'P_Q[(I_T\otimes\Gamma_0)f + e],$$
where $f = (f_1', f_2', \ldots, f_T')'$ and $e = (e_{\cdot 1}', e_{\cdot 2}', \ldots, e_{\cdot T}')'$, and then
$$\sqrt{NT}(\hat\delta_{2sls} - \delta_0) = \left[\frac{1}{NT}L'\mathbb{M}Q\left(\frac{1}{NT}Q'\mathbb{M}Q\right)^{-1}\frac{1}{NT}Q'\mathbb{M}L\right]^{-1}\times\left\{\frac{1}{NT}L'\mathbb{M}Q\left(\frac{1}{NT}Q'\mathbb{M}Q\right)^{-1}\frac{1}{\sqrt{NT}}Q'\mathbb{M}[(I_T\otimes\Gamma_0)f + e]\right\}.$$
Applying Lemma A.6, we have
$$\frac{1}{NT}Q'\mathbb{M}Q = \frac{1}{NT}Q'\mathbb{M}_fQ + O_p\left(\frac{1}{N}\right) + O_p\left(\frac{1}{\sqrt{NT}}\right), \qquad \frac{1}{NT}Q'\mathbb{M}L = \frac{1}{NT}Q'\mathbb{M}_fL_0 + O_p\left(\frac{1}{N}\right) + O_p\left(\frac{1}{\sqrt{NT}}\right),$$
where $L_0$ is given by (1.21), and it follows that
$$\frac{1}{NT}L'P_QL = \frac{1}{NT}L_0'P_{Q,f}L_0 + O_p\left(\frac{1}{N}\right) + O_p\left(\frac{1}{\sqrt{NT}}\right),$$
where $P_{Q,f} = \mathbb{M}_fQ(Q'\mathbb{M}_fQ)^{-1}Q'\mathbb{M}_f$. Under Assumption 1.8, $\operatorname{plim}_{N\to\infty}(NT)^{-1}L_0'P_{Q,f}L_0$ exists and is nonsingular. Furthermore, we show in Lemma A.6 that $\operatorname{plim}_{N\to\infty}(NT)^{-1}Q'\mathbb{M}_f[(I_T\otimes\Gamma_0)f + e] = 0$. As a result, $\hat\delta_{2sls}$ is consistent for $\delta_0$ as $N\to\infty$.

For the asymptotic distribution of $\hat\delta_{2sls}$, we show in Appendix A.2 that as $(N,T)\xrightarrow{j}\infty$ and $T/N\to 0$, the term $(NT)^{-1/2}Q'\mathbb{M}[(I_T\otimes\Gamma_0)f]$ converges in probability to zero, and $(NT)^{-1/2}Q'\mathbb{M}e$ converges in distribution to a normal random vector. The restriction on the relative rates of expansion of $T$ and $N$ is imposed to eliminate nuisance parameters from the limiting distribution. The following theorem summarizes the limiting distribution of the 2SLS estimator.

Theorem 1.1. Consider the panel data model given by (1.1) and suppose that Assumptions 1.1, 1.2(i), and 1.3–1.8 hold. The 2SLS estimator $\hat\delta_{2sls}$, defined by (1.24), is consistent for $\delta_0$ as $N\to\infty$, for $T$ fixed or $T\to\infty$.
Moreover, as $(N,T)\xrightarrow{j}\infty$ and $T/N\to 0$, we have
$$\sqrt{NT}(\hat\delta_{2sls} - \delta_0)\xrightarrow{d}N(0, \Sigma_{2sls}), \tag{1.25}$$
where
$$\Sigma_{2sls} = \Psi_{LPL}^{-1}\Omega_{LPe}\Psi_{LPL}^{-1}, \tag{1.26}$$
$$\Psi_{LPL} = \operatorname{plim}_{N,T\to\infty}(NT)^{-1}L_0'P_{Q,f}L_0, \qquad \Omega_{LPe} = \Psi_{QML}'\Psi_{QMQ}^{-1}\Omega_{QMe}\Psi_{QMQ}^{-1}\Psi_{QML}, \tag{1.27}$$
$$\Psi_{QMQ} = \operatorname{plim}_{N,T\to\infty}(NT)^{-1}Q'\mathbb{M}_fQ, \qquad \Psi_{QML} = \operatorname{plim}_{N,T\to\infty}(NT)^{-1}Q'\mathbb{M}_fL_0, \tag{1.28}$$
$$\Omega_{QMe} = \lim_{N\to\infty}\left(N^{-1}\sum_{i=1}^{N}\Omega_{iQMe}\right), \qquad \Omega_{iQMe} = \operatorname{plim}_{T\to\infty}T^{-1}Q_{i\cdot}'M_f\Omega_{e,i}M_fQ_{i\cdot}, \tag{1.29}$$
$\Omega_{e,i} = E(e_{i\cdot}e_{i\cdot}')$, and $Q_{i\cdot} = (Q_{i1}, Q_{i2}, \ldots, Q_{iT})'$. A consistent estimator of the asymptotic variance matrix $\Sigma_{2sls}$ is given by
$$\hat\Sigma_{2sls} = \left(\frac{1}{NT}L'P_QL\right)^{-1}\hat\Omega_{LPe}\left(\frac{1}{NT}L'P_QL\right)^{-1}, \tag{1.30}$$
where $\hat\Omega_{LPe}$ can be obtained by a Newey–West type robust estimator as follows:
$$\hat\Omega_{LPe} = N^{-1}\sum_{i=1}^{N}\hat\Omega_{iLPe}, \tag{1.31}$$
$$\hat\Omega_{iLPe} = \hat\Omega_{iLPe,0} + \sum_{h=1}^{M_l}\left(1 - \frac{h}{M_l + 1}\right)(\hat\Omega_{iLPe,h} + \hat\Omega_{iLPe,h}'), \qquad \hat\Omega_{iLPe,h} = T^{-1}\sum_{t=h+1}^{T}\hat e_{it}\hat e_{i,t-h}\hat l_{it}\hat l_{i,t-h}',$$
where $M_l$ is the window size (or bandwidth) of the Bartlett kernel, $\hat e = \mathbb{M}(Y - L\hat\delta_{2sls}) = (\hat e_{\cdot 1}', \hat e_{\cdot 2}', \ldots, \hat e_{\cdot T}')'$, $\hat e_{it}$ is the $i$th element of $\hat e_{\cdot t}$, $\hat L = P_QL = (\hat L_{\cdot 1}', \hat L_{\cdot 2}', \ldots, \hat L_{\cdot T}')'$, and $\hat l_{it}'$ is the $i$th row of $\hat L_{\cdot t}$.

Remark 1.8. Although our interest lies in the parameters $\delta$, we can gain insight into the variability of the factor loadings after obtaining estimates of $\delta$. This can be done by regressing $y_{it} - l_{it}'\hat\delta$ on $\bar z_{\cdot t}$ and an intercept for each cross-section unit $i$, where $l_{it} = (y^*_{it}, x_{it}')'$ and $\bar z_{\cdot t} = (\bar y_{\cdot t}, \bar x_{\cdot t}')'$.

1.4.2 Best 2SLS estimation

Having established the asymptotic distribution of the 2SLS estimator, the question naturally arises whether optimal instrumental variables are available for model (1.1). An instrument is considered optimal, or "best", if it produces an estimator with the smallest asymptotic variance among all IV estimators for the model.
For cross-sectional spatial models, Lee (2003) suggests a best generalized spatial 2SLS estimator and shows that it is asymptotically optimal under a set of regularity conditions. In this section, we generalize this estimation procedure to spatial models with common factors. Specifically, let $\hat\delta = (\hat\rho, \hat\beta')'$ denote some consistent initial estimate of $\delta_0$, possibly obtained by the 2SLS estimation described in the previous section. We investigate whether the IV estimator $\hat\delta_{b2sls}$ can achieve the smallest asymptotic variance for model (1.1), where
$$\hat\delta_{b2sls} = (\hat Q^{*\prime}L)^{-1}\hat Q^{*\prime}Y, \tag{1.32}$$
$$\hat Q^* = \mathbb{M}[(I_T\otimes\hat G)X\hat\beta, X], \tag{1.33}$$
and $\hat G = G(\hat\rho)$. We refer to $\hat\delta_{b2sls}$ as the best 2SLS (B2SLS) estimator and to $\hat Q^*$ as the (feasible) best IV.

The intuition behind the formulation of $\hat Q^*$ is to exploit the part of $Y^*$ that is uncorrelated with the errors. To see this, suppose for simplicity that there are no common factors. The structural equation (1.22) implies the reduced form $y_{\cdot t} = S_0^{-1}X_{\cdot t}\beta_0 + S_0^{-1}e_{\cdot t}$, which further leads to $y^*_{\cdot t} = G_0X_{\cdot t}\beta_0 + G_0e_{\cdot t}$. It is readily seen that the term $G_0X_{\cdot t}\beta_0$ is correlated with $y^*_{\cdot t}$ but uncorrelated with $e_{\cdot t}$, given that $X_{\cdot t}$ is exogenous. Since $G_0X_{\cdot t}\beta_0$ depends on the unknown parameters $\rho_0$ and $\beta_0$, a feasible IV for $y^*_{\cdot t}$ can be constructed as $\hat GX_{\cdot t}\hat\beta$. Accordingly, the B2SLS estimation can be implemented in two steps: first obtain preliminary consistent estimates of the parameters, and then conduct an IV estimation using the best IV based on the first-step estimates. A similar argument applies to model (1.1) with common factors. Equation (1.33) indicates that, in constructing the best IV, we need to filter out the common effects from the observations using the de-factoring matrix $\mathbb{M}$.
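The two-step procedure described above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the author's implementation: the data-generating process (one factor, a circular $W$, standard normal errors), the sample sizes, and all function names are hypothetical. Step 1 computes the 2SLS estimator (1.24) with instruments $(X_{\cdot t}, WX_{\cdot t}, W^2X_{\cdot t})$ after de-factoring with cross-sectional averages; step 2 builds the feasible best IV (1.33) from the first-step estimates and computes (1.32). The matrix $\mathbb{M} = \bar M\otimes I_N$ is applied to $(T, N, \ldots)$ arrays rather than formed explicitly.

```python
import numpy as np


def circular_W(N):
    """Hypothetical weights: two nearest neighbours on a circle, weight 1/2 each."""
    W = np.zeros((N, N))
    idx = np.arange(N)
    W[idx, (idx - 1) % N] = 0.5
    W[idx, (idx + 1) % N] = 0.5
    return W


def simulate(N, T, rho, beta, rng):
    """Hypothetical one-factor DGP for model (1.1); arrays are (T, N) or (T, N, k)."""
    W, k = circular_W(N), beta.shape[0]
    f = rng.normal(size=T)
    gamma = 1.0 + rng.normal(size=N)               # loadings in the y equation
    A = 0.5 + rng.normal(size=(N, k))              # loadings in the x equation
    X = f[:, None, None] * A[None] + rng.normal(size=(T, N, k))
    S_inv = np.linalg.inv(np.eye(N) - rho * W)
    Y = (X @ beta + np.outer(f, gamma) + rng.normal(size=(T, N))) @ S_inv.T
    return Y, X, W


def iv_fit(y, L, Z):
    """(L'P_Z L)^{-1} L'P_Z y, with P_Z the orthogonal projector onto the columns of Z."""
    L_hat = Z @ np.linalg.lstsq(Z, L, rcond=None)[0]
    return np.linalg.solve(L_hat.T @ L, L_hat.T @ y)


def two_step_b2sls(Y, X, W):
    T, N = Y.shape
    k = X.shape[2]
    Zbar = np.column_stack([Y.mean(axis=1), X.mean(axis=1)])   # T x (k+1) averages
    Mbar = np.eye(T) - Zbar @ np.linalg.pinv(Zbar)               # de-factoring, (1.11)

    def Mb(A):                                                   # applies M_bar ⊗ I_N
        return np.tensordot(Mbar, A, axes=(1, 0))

    WX = np.einsum("ij,tjk->tik", W, X)
    Ystar = Y @ W.T                                              # row t is (W y_t)'
    MY = Mb(Y).reshape(-1)
    ML = Mb(np.concatenate([Ystar[..., None], X], axis=2)).reshape(T * N, k + 1)
    # Step 1: 2SLS (1.24) with Q_t = (X_t, W X_t, W^2 X_t)
    Q = np.concatenate([X, WX, np.einsum("ij,tjk->tik", W, WX)], axis=2)
    d_2sls = iv_fit(MY, ML, Mb(Q).reshape(T * N, -1))
    # Step 2: feasible best IV (1.33) and the B2SLS estimator (1.32)
    rho_hat, beta_hat = d_2sls[0], d_2sls[1:]
    G_hat = W @ np.linalg.inv(np.eye(N) - rho_hat * W)
    GXb = (X @ beta_hat) @ G_hat.T                               # row t is (G_hat X_t beta_hat)'
    Q_star = Mb(np.concatenate([GXb[..., None], X], axis=2)).reshape(T * N, k + 1)
    d_b2sls = iv_fit(MY, ML, Q_star)
    return d_2sls, d_b2sls


rng = np.random.default_rng(0)
Y, X, W = simulate(N=100, T=30, rho=0.4, beta=np.array([1.0]), rng=rng)
d_2sls, d_b2sls = two_step_b2sls(Y, X, W)
print("2SLS :", d_2sls)
print("B2SLS:", d_b2sls)
```

Because $\mathbb{M}$ is symmetric and idempotent, $\hat Q^{*\prime}L = \hat Q^{*\prime}\mathbb{M}L$ and $\hat Q^{*\prime}Y = \hat Q^{*\prime}\mathbb{M}Y$, so working with the de-factored $L$ and $Y$ throughout reproduces (1.24) and (1.32). In the just-identified second step, `iv_fit` reduces to $(\hat Q^{*\prime}L)^{-1}\hat Q^{*\prime}Y$.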
The following theorem states the asymptotic properties of the B2SLS estimator and shows that it is the best IV estimator provided that the error terms are independently and identically distributed. The proof is given in Appendix A.2.

Theorem 1.2. Consider the panel data model given by (1.1). Suppose that Assumptions 1.1, 1.2(i), and 1.3–1.8 hold and that $\mathring{H}(\rho_0, \beta_0')$, given by (1.23), is positive definite. Then the best 2SLS (B2SLS) estimator $\hat\delta_{b2sls}$, defined by (1.32), is consistent for $\delta_0$ as $N\to\infty$, for $T$ fixed or $T\to\infty$; as $(N,T)\xrightarrow{j}\infty$ and $T/N\to 0$, it has the following limiting distribution:
$$\sqrt{NT}(\hat\delta_{b2sls} - \delta_0)\xrightarrow{d}N(0, \Sigma_{b2sls}), \tag{1.34}$$
where
$$\Sigma_{b2sls} = \Psi_{LML}^{-1}\Omega_{LMe}\Psi_{LML}^{-1}, \tag{1.35}$$
$$\Psi_{LML} = \operatorname{plim}_{N,T\to\infty}(NT)^{-1}L_0'\mathbb{M}_fL_0, \qquad \Omega_{LMe} = \lim_{N\to\infty}\left(N^{-1}\sum_{i=1}^{N}\Omega_{iLMe}\right), \qquad \Omega_{iLMe} = \operatorname{plim}_{T\to\infty}T^{-1}L_{0,i\cdot}'M_f\Omega_{e,i}M_fL_{0,i\cdot}, \tag{1.36}$$
$L_0$ is given by (1.21), $L_{0,i\cdot} = (l_{0,i1}, l_{0,i2}, \ldots, l_{0,iT})'$, and $l_{0,it}'$ is the $[N(t-1)+i]$th row of $L_0$. The B2SLS estimator is the best IV estimator if the disturbances $\{e_{it}\}$ are independently and identically distributed with mean zero and variance $\sigma_e^2$.

Note that under Assumption 1.6, $(I_N - \rho_0W)^{-1}X_{\cdot t}\beta_0 = \sum_{s=0}^{\infty}\rho_0^sW^sX_{\cdot t}\beta_0$. Hence, the term $G_0X_{\cdot t}\beta_0$ can be approximated by linear combinations of $X_{\cdot t}\beta_0, WX_{\cdot t}\beta_0, W^2X_{\cdot t}\beta_0, \ldots$. Clearly, the higher the powers of $W$ included, the better the approximation. However, the efficiency gain from including more instruments may not be significant; in practice, the 2SLS estimator with instruments $(X_{\cdot t}, WX_{\cdot t}, W^2X_{\cdot t})$ is often found to perform well enough. The finite sample properties of $\hat\delta_{b2sls}$ will be compared with those of $\hat\delta_{2sls}$ using Monte Carlo techniques in Section 1.5.

1.4.3 GMM estimation

The third estimator we propose is the GMM estimator, which utilizes quadratic moment conditions based on the properties of the idiosyncratic errors in addition to the 2SLS-type linear moments.
The use of quadratic moments for SAR models was proposed by Lee (2007) and later extended by Lin and Lee (2010) and Lee and Yu (2014). The advantages of adopting quadratic moments lie both in improving efficiency and in making it possible to estimate the spatial autoregressive coefficient when there are no exogenous regressors. In this section, we show that this idea can be extended to spatial models with common factors. Specifically, we consider the following sample moment conditions, which consist of $r$ quadratic moments and $q$ linear moments:¹⁷
$$g_{NT}(\delta) = \begin{pmatrix}\xi'(\delta)\mathbb{M}\mathbb{P}_1\mathbb{M}\xi(\delta)\\ \vdots\\ \xi'(\delta)\mathbb{M}\mathbb{P}_r\mathbb{M}\xi(\delta)\\ Q'\mathbb{M}\xi(\delta)\end{pmatrix}, \tag{1.37}$$
where $\mathbb{M}$ is the de-factoring matrix defined by (1.11), $\xi(\delta)$ is the vector of residuals given by
$$\xi(\delta) = [I_T\otimes S(\rho)]Y - X\beta, \tag{1.38}$$
and $\delta = (\rho, \beta')'$ represents the unknown parameters in the parameter space $\Delta_{sp}$. For $l = 1, 2, \ldots, r$, we define $\mathbb{P}_l = I_T\otimes P_l$, where $P_l = (p_{l,ij})$ is an $N$-dimensional square matrix with zero diagonal, namely $\operatorname{diag}(P_l) = (p_{l,11}, p_{l,22}, \ldots, p_{l,NN})' = 0$. Intuitively, the idea behind the quadratic moments is to choose $P_l$ so that the quadratic form involves only products of errors of distinct units, whose expectations are zero. To see this, consider the simpler scenario where there are no common factors: the $l$th population quadratic moment at $\delta_0$ reduces to
$$E(e'\mathbb{P}_le) = \sum_{i=1}^{N}\sum_{j=1}^{N}p_{l,ji}E(e_{i\cdot}'e_{j\cdot}) = \sum_{i=1}^{N}p_{l,ii}E(e_{i\cdot}'e_{i\cdot}) = 0,$$
where $p_{l,ji}$ is the $(j,i)$th element of $P_l$, the second equality uses the cross-sectional uncorrelatedness of the errors, and the last equality follows from $\operatorname{diag}(P_l) = 0$. It is worth noting that the moment conditions are built on the key assumption that $e_{it}$ and $e_{jt'}$ are uncorrelated for $i \neq j$ and all $t$ and $t'$. Also note that since we allow for unknown heteroskedasticity, all diagonal elements of $P_l$ must be zero in order to remove the variances of $e_{it}$ from the moments.
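The moment vector (1.37) and the role of the zero-diagonal restriction can be sketched as follows. The code assumes NumPy, stores data as $(T, N)$ and $(T, N, k)$ arrays, and uses hypothetical function names and toy data. On noise-free data without factors (so $\bar M = I_T$), $g_{NT}$ vanishes at the true $\delta$ but not at other values. The helper `expected_quadratic` evaluates $E(e'Pe) = \sum_i p_{ii}\sigma_i^2$ for independent heteroskedastic errors: it is zero for a zero-diagonal $P$, but generally not for a $P$ whose trace is zero while its diagonal entries are not.

```python
import numpy as np


def gmm_moments(delta, Y, X, W, Q, P_list, Mbar):
    """Sample moments g_NT(delta) of (1.37); Y is (T, N), X is (T, N, k), Q is (T, N, q)."""
    rho, beta = delta[0], np.asarray(delta[1:])
    S = np.eye(W.shape[0]) - rho * W
    xi = Y @ S.T - X @ beta                          # row t is (S(rho) y_t - X_t beta)', (1.38)
    Mxi = np.tensordot(Mbar, xi, axes=(1, 0))        # (M_bar ⊗ I_N) xi
    quad = [np.einsum("ti,ij,tj->", Mxi, P, Mxi) for P in P_list]   # xi' M P_l M xi
    lin = np.einsum("tnq,tn->q", Q, Mxi)             # Q' M xi
    return np.concatenate([np.array(quad), lin])


def expected_quadratic(P, sigma2):
    """E(e'Pe) = sum_i p_ii sigma_i^2 for independent, heteroskedastic errors e."""
    return float(np.sum(np.diag(P) * sigma2))


# Noise-free toy data (hypothetical): no factors, so M_bar = I_T and xi(delta_0) = 0.
rng = np.random.default_rng(0)
T, N = 5, 6
idx = np.arange(N)
W = np.zeros((N, N))
W[idx, (idx - 1) % N] = 0.5
W[idx, (idx + 1) % N] = 0.5
rho0, beta0 = 0.3, np.array([1.0, -2.0])
X = rng.normal(size=(T, N, 2))
Y = (X @ beta0) @ np.linalg.inv(np.eye(N) - rho0 * W).T
Q = np.concatenate([X, np.einsum("ij,tjk->tik", W, X)], axis=2)
P1 = W - np.diag(np.diag(W))                          # zero-diagonal P_1
g_true = gmm_moments(np.r_[rho0, beta0], Y, X, W, Q, [P1], np.eye(T))
g_off = gmm_moments(np.r_[0.5, beta0], Y, X, W, Q, [P1], np.eye(T))
print(np.abs(g_true).max(), np.abs(g_off).max())

sigma2 = np.arange(1.0, N + 1.0)                      # heteroskedastic variances
P_trace0 = np.diag([1.0, -1.0] + [0.0] * (N - 2))     # tr(P) = 0 but nonzero diagonal
print(expected_quadratic(P1, sigma2), expected_quadratic(P_trace0, sigma2))
```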
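By contrast, imposing $\mathrm{tr}(\mathbf{P}_l) = 0$ would be sufficient if the $e_{it}$ were homoskedastic (see, for example, Lee, 2007, and Lee and Yu, 2014). The GMM estimator, $\hat{\boldsymbol{\delta}}_{GMM}$, is then defined as

$$
\hat{\boldsymbol{\delta}}_{GMM} = \arg\min_{\boldsymbol{\delta}\in\Delta_{sp}} \mathbf{g}_{NT}'(\boldsymbol{\delta})\,\mathbf{A}_{NT}^{w\prime}\mathbf{A}_{NT}^{w}\,\mathbf{g}_{NT}(\boldsymbol{\delta}), \tag{1.39}
$$

where $\mathbf{g}_{NT}(\boldsymbol{\delta})$ is given by (1.37), $\mathbf{A}^w_{NT}$ is a constant $k_a \times (r+q)$ matrix of full row rank with $k_a \geq k+1$, and $\mathbf{A}^{w\prime}_{NT}\mathbf{A}^w_{NT}$ is assumed to converge to a positive definite matrix $\mathbf{A}^{w\prime}\mathbf{A}^w$. The following additional assumption is needed for the asymptotic analysis of the GMM estimator.

¹⁷ We use the moment conditions aggregated over time instead of a separate moment condition for each period, since the latter approach may induce the many-moment bias problem, which is beyond the scope of the current paper. See Lee and Yu (2014) for a discussion of this issue for spatial models.

Assumption 1.9. The matrices $\mathbf{P}_l$, for $l = 1, 2, \ldots, r$, used in the moment conditions given by (1.37), are nonstochastic and have bounded maximum row and column sum norms, namely, $\|\mathbf{P}_l\|_\infty < K$ and $\|\mathbf{P}_l\|_1 < K$.

Theorem 1.3. Consider the panel data model given by (1.1) and suppose that Assumptions 1.1–1.9 hold. The GMM estimator, $\hat{\boldsymbol{\delta}}_{GMM}$, defined by (1.39), is consistent for $\boldsymbol{\delta}_0$ as $N\to\infty$, for $T$ fixed or $T\to\infty$. Furthermore, as $(N,T)\xrightarrow{j}\infty$ and $T/N\to 0$, we have

$$
\sqrt{NT}\left(\hat{\boldsymbol{\delta}}_{GMM} - \boldsymbol{\delta}_0\right) \xrightarrow{d} N(\mathbf{0}, \boldsymbol{\Sigma}_{GMM}), \tag{1.40}
$$

where

$$
\boldsymbol{\Sigma}_{GMM} = \left(\mathbf{D}'\mathbf{A}^{w\prime}\mathbf{A}^w\mathbf{D}\right)^{-1}\mathbf{D}'\mathbf{A}^{w\prime}\mathbf{A}^w\boldsymbol{\Sigma}_g\mathbf{A}^{w\prime}\mathbf{A}^w\mathbf{D}\left(\mathbf{D}'\mathbf{A}^{w\prime}\mathbf{A}^w\mathbf{D}\right)^{-1}, \tag{1.41}
$$

$$
\mathbf{D} = \left(\mathbf{D}_p', \boldsymbol{\Psi}_{QML}'\right)', \qquad \mathbf{D}_p = \left(\mathbf{d}_p, \mathbf{0}_{r\times k}\right), \tag{1.42}
$$

$$
\mathbf{d}_p = \lim_{N\to\infty} N^{-1}\left(\sum_{i=1}^N \tilde g^s_{ii,1}\sigma_i^2,\ \sum_{i=1}^N \tilde g^s_{ii,2}\sigma_i^2,\ \ldots,\ \sum_{i=1}^N \tilde g^s_{ii,r}\sigma_i^2\right)', \tag{1.43}
$$

$$
\boldsymbol{\Sigma}_g = \begin{pmatrix} \boldsymbol{\Sigma}_p & \mathbf{0}_{r\times q} \\ \mathbf{0}_{q\times r} & \boldsymbol{\Omega}_{QMe} \end{pmatrix}, \tag{1.44}
$$

$$
\boldsymbol{\Sigma}_p = \lim_{N\to\infty} N^{-1}\begin{pmatrix}
\mathrm{tr}\left[(\mathbf{P}_1\odot\mathbf{P}_1^s)\boldsymbol{\Sigma}_e\right] & \cdots & \mathrm{tr}\left[(\mathbf{P}_1\odot\mathbf{P}_r^s)\boldsymbol{\Sigma}_e\right] \\
\vdots & & \vdots \\
\mathrm{tr}\left[(\mathbf{P}_r\odot\mathbf{P}_1^s)\boldsymbol{\Sigma}_e\right] & \cdots & \mathrm{tr}\left[(\mathbf{P}_r\odot\mathbf{P}_r^s)\boldsymbol{\Sigma}_e\right]
\end{pmatrix}, \tag{1.45}
$$

where $\tilde g^s_{ii,l}$ $(l = 1, 2, \ldots, r)$ is the $i$th diagonal element of the matrix $\widetilde{\mathbf{G}}_l(\rho_0) = \mathbf{P}_l^s\mathbf{G}_0$, $\mathbf{P}_l^s = \mathbf{P}_l + \mathbf{P}_l'$, $\boldsymbol{\Sigma}_e = (\varsigma_{e,ij})$ is an $N\times N$ matrix whose $(i,j)$th element is given by $\varsigma_{e,ij} = \lim_{T\to\infty}T^{-1}\mathrm{tr}(\boldsymbol{\Omega}_{e,i}\boldsymbol{\Omega}_{e,j})$, $\boldsymbol{\Psi}_{QML}$ and $\boldsymbol{\Omega}_{QMe}$ are given by (1.28) and (1.29), respectively, and $\odot$ denotes the Hadamard (entrywise) product.

The (infeasible) efficient GMM estimator can be obtained using the optimal weighting matrix, $\boldsymbol{\Sigma}_g^{-1}$, in the usual fashion, namely,

$$
\hat{\boldsymbol{\delta}}^*_{GMM} = \arg\min_{\boldsymbol{\delta}\in\Delta_{sp}} \mathbf{g}_{NT}'(\boldsymbol{\delta})\,\boldsymbol{\Sigma}_g^{-1}\,\mathbf{g}_{NT}(\boldsymbol{\delta}). \tag{1.46}
$$

The asymptotic distribution of $\hat{\boldsymbol{\delta}}^*_{GMM}$ is formally stated in the next theorem.

Theorem 1.4. Under the same assumptions as in Theorem 1.3, the efficient GMM estimator, $\hat{\boldsymbol{\delta}}^*_{GMM}$, defined by (1.46), has the following asymptotic distribution as $(N,T)\xrightarrow{j}\infty$ and $T/N\to 0$:

$$
\sqrt{NT}\left(\hat{\boldsymbol{\delta}}^*_{GMM} - \boldsymbol{\delta}_0\right) \xrightarrow{d} N(\mathbf{0}, \boldsymbol{\Sigma}^*_{GMM}), \tag{1.47}
$$

where

$$
\boldsymbol{\Sigma}^*_{GMM} = \left(\mathbf{D}'\boldsymbol{\Sigma}_g^{-1}\mathbf{D}\right)^{-1}. \tag{1.48}
$$

A consistent estimator of $\boldsymbol{\Sigma}_{GMM}$ can be obtained by replacing $\mathbf{D}$ and $\boldsymbol{\Sigma}_g$ in (1.41) with $\hat{\mathbf{D}}$ and $\hat{\boldsymbol{\Sigma}}_g$, respectively, where

$$
\hat{\mathbf{D}} = \left(\hat{\mathbf{D}}_p', \hat{\boldsymbol{\Psi}}_{QML}'\right)', \qquad \hat{\mathbf{D}}_p = \left(\hat{\mathbf{d}}_p, \mathbf{0}_{r\times k}\right), \qquad \hat{\boldsymbol{\Psi}}_{QML} = (NT)^{-1}\mathbf{Q}'\widehat{\mathbf{M}}\mathbf{L},
$$

$$
\hat{\mathbf{d}}_p = (NT)^{-1}\left(\sum_{i=1}^N \hat{\tilde g}^s_{ii,1}\,\hat{\mathbf{e}}_{i.}'\hat{\mathbf{e}}_{i.},\ \sum_{i=1}^N \hat{\tilde g}^s_{ii,2}\,\hat{\mathbf{e}}_{i.}'\hat{\mathbf{e}}_{i.},\ \ldots,\ \sum_{i=1}^N \hat{\tilde g}^s_{ii,r}\,\hat{\mathbf{e}}_{i.}'\hat{\mathbf{e}}_{i.}\right)',
$$

$$
\hat{\boldsymbol{\Sigma}}_g = \begin{pmatrix} \hat{\boldsymbol{\Sigma}}_p & \mathbf{0}_{r\times q} \\ \mathbf{0}_{q\times r} & \hat{\boldsymbol{\Omega}}_{QMe} \end{pmatrix}, \qquad \left(\hat{\boldsymbol{\Sigma}}_p\right)_{lm} = \frac{1}{NT}\sum_{i=1}^N\sum_{j=1}^N p_{l,ji}\left(p_{m,ij} + p_{m,ji}\right)\hat s_{e,ij}, \quad l, m = 1, \ldots, r,
$$

$\hat{\mathbf{e}} = \widehat{\mathbf{M}}(\mathbf{Y} - \mathbf{L}\hat{\boldsymbol{\delta}}_{GMM})$, $\hat{\tilde g}^s_{ii,l}$ is the $i$th diagonal element of $\widetilde{\mathbf{G}}_l(\hat\rho)$,

$$
\hat s_{e,ij} = T\hat\gamma_{e,i}(0)\hat\gamma_{e,j}(0) + 2\sum_{h=1}^{M_l}(T-h)\left(1 - \frac{h}{M_l+1}\right)\hat\gamma_{e,i}(h)\hat\gamma_{e,j}(h),
$$

$\hat\gamma_{e,i}(h) = T^{-1}\sum_{t=h+1}^T \hat e_{it}\hat e_{i,t-h}$, and $M_l$ is the maximum lag length (or window size). Similarly, we can estimate $\boldsymbol{\Sigma}^*_{GMM}$ by $\hat{\boldsymbol{\Sigma}}^*_{GMM} = \left(\hat{\mathbf{D}}^{*\prime}\hat{\boldsymbol{\Sigma}}_g^{*-1}\hat{\mathbf{D}}^*\right)^{-1}$, where $\hat{\mathbf{D}}^*$ and $\hat{\boldsymbol{\Sigma}}^*_g$ are computed using $\hat{\mathbf{e}}^* = \widehat{\mathbf{M}}(\mathbf{Y} - \mathbf{L}\hat{\boldsymbol{\delta}}^*_{GMM})$ instead of $\hat{\mathbf{e}}$.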
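To make the moment conditions and the variance estimator concrete, the sketch below computes the stacked sample moments (1.37), the GMM criterion (1.39), and the quadratic-moment block $\hat{\boldsymbol{\Sigma}}_p$ of $\hat{\boldsymbol{\Sigma}}_g$ with the Bartlett-weighted $\hat s_{e,ij}$. It is a minimal illustration under stated assumptions, not the author's implementation: data are stored period by period (row $t$ of a $T\times N$ array is $\mathbf{y}_{.t}'$), the de-factoring matrix $\widehat{\mathbf{M}}$ from (1.11) is supplied by the user as a symmetric $NT\times NT$ array (the identity in the toy example, i.e., no factors), and all function names are hypothetical.

```python
import numpy as np

def residuals(delta, Y, X, W):
    """xi(delta) = [I_T kron S(rho)] Y - X beta, stacked period by period.

    Y: (T, N), row t is y_{.t}'; X: (T, N, k); W: (N, N)."""
    rho, beta = delta[0], np.asarray(delta[1:])
    Xi = Y - rho * (Y @ W.T) - X @ beta        # row t: (S(rho) y_{.t} - X_{.t} beta)'
    return Xi.reshape(-1)                       # NT-vector, t-major ordering

def moments(delta, Y, X, W, M, P_list, Q):
    """Sample moments (1.37): r quadratic moments, then q linear moments.

    Assumes M is symmetric, so xi' M P_hat M xi = u' P_hat u with u = M xi."""
    T, N = Y.shape
    u = M @ residuals(delta, Y, X, W)
    U = u.reshape(T, N)
    quad = [np.sum(U * (U @ P.T)) for P in P_list]   # u'(I_T kron P)u
    return np.concatenate([quad, Q.T @ u])

def gmm_objective(delta, Y, X, W, M, P_list, Q, A):
    """Criterion g'(delta) A'A g(delta) of (1.39)."""
    Ag = A @ moments(delta, Y, X, W, M, P_list, Q)
    return float(Ag @ Ag)

def s_hat(E, max_lag):
    """N x N matrix of s_hat_{e,ij} from residuals E (T, N), Bartlett weights."""
    T = E.shape[0]

    def gamma(h):
        # gamma_hat_{e,i}(h) = T^{-1} sum_{t=h+1}^T e_it e_{i,t-h}, for every unit i
        return (E[h:] * E[:T - h]).sum(axis=0) / T

    g0 = gamma(0)
    S = T * np.outer(g0, g0)
    for h in range(1, max_lag + 1):
        gh = gamma(h)
        S += 2 * (T - h) * (1 - h / (max_lag + 1)) * np.outer(gh, gh)
    return S

def sigma_p_hat(E, P_list, max_lag):
    """(l, m) element: (NT)^{-1} sum_{i,j} p_{l,ji} (p_{m,ij} + p_{m,ji}) s_hat_{e,ij}."""
    T, N = E.shape
    S = s_hat(E, max_lag)
    return np.array([[np.sum(Pl.T * (Pm + Pm.T) * S) / (N * T)
                      for Pm in P_list] for Pl in P_list])

# Toy check without factors (M = identity) and without errors: moments vanish at delta_0.
rng = np.random.default_rng(1)
T, N, k = 3, 4, 2
W = np.array([[0, .5, 0, .5], [.5, 0, .5, 0], [0, .5, 0, .5], [.5, 0, .5, 0]])
X = rng.standard_normal((T, N, k))
delta0 = np.array([0.4, 1.0, 2.0])
S_inv = np.linalg.inv(np.eye(N) - delta0[0] * W)
Y = (X @ delta0[1:]) @ S_inv.T                  # y_{.t} = S^{-1} X_{.t} beta, e = 0
args = (Y, X, W, np.eye(N * T), [W], X.reshape(T * N, k), np.eye(1 + k))
print(gmm_objective(delta0, *args))             # approx. 0 (rounding error only)

# Sigma_p block for i.i.d. N(0, 1) errors, with the window floor(2 sqrt(T)).
E_sim = rng.standard_normal((5000, N))
print(sigma_p_hat(E_sim, [W], int(np.floor(2 * np.sqrt(5000)))))   # approx. [[1.0]]
```

For i.i.d. unit-variance errors $\boldsymbol{\Sigma}_e$ has all elements equal to one, so the population value here is $N^{-1}\sum_{i,j}p_{ij}(p_{ij}+p_{ji}) = 1$ for this symmetric $\mathbf{W}$.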
It is straightforward to see that the 2SLS estimator, $\hat{\boldsymbol{\delta}}_{2sls}$, is asymptotically no more efficient than the efficient GMM estimator, $\hat{\boldsymbol{\delta}}^*_{GMM}$, since the latter makes use of the quadratic moments in addition to the linear moments. Turning to the choice of $\mathbf{P}_l$ for the quadratic moments, note that the precision matrix of the efficient GMM estimator is given by

$$
\mathbf{D}'\boldsymbol{\Sigma}_g^{-1}\mathbf{D} = \begin{pmatrix} \mathbf{d}_p'\boldsymbol{\Sigma}_p^{-1}\mathbf{d}_p & \mathbf{0}_{1\times k} \\ \mathbf{0}_{k\times 1} & \mathbf{0}_{k\times k} \end{pmatrix} + \boldsymbol{\Psi}_{QML}'\boldsymbol{\Omega}_{QMe}^{-1}\boldsymbol{\Psi}_{QML}. \tag{1.49}
$$

It can be seen from (1.49) that, ideally, one should choose $\mathbf{P}_l$ $(l = 1, 2, \ldots, r)$ to maximize $\mathbf{d}_p'\boldsymbol{\Sigma}_p^{-1}\mathbf{d}_p$. However, this term depends on the unknown variance structure of the disturbances. If the disturbances are independent and identically distributed (i.i.d.), it is known in the spatial literature that the best $\mathbf{P}_l$ within the class of matrices with zero diagonal is given by $\mathbf{P}^* = \mathbf{G}_0 - \mathrm{Diag}(\mathbf{G}_0)$ (Lee, 2007; Lee and Yu, 2014). Using similar arguments, this result can be extended to our model with common factors. More precisely, provided that the disturbances are i.i.d., a best GMM (BGMM) estimator can be obtained by minimizing the optimally weighted moments (1.37), where $\mathbf{P}$ is set to $\hat{\mathbf{P}}^* = \mathbf{G}(\hat\rho) - \mathrm{Diag}(\mathbf{G}(\hat\rho))$ and $\mathbf{Q}$ is replaced by $\hat{\mathbf{Q}}^*$ given in (1.33). Nonetheless, in the presence of unknown heteroskedasticity and serial correlation, the BGMM estimator will in general not be the most efficient. This conclusion can be drawn by applying reasoning similar to that in the proof of Theorem 1.2 for the B2SLS estimator. We omit further discussion of the BGMM estimator in view of the strong assumptions required for it to have optimal properties.

1.5 Monte Carlo experiments

This section first provides Monte Carlo evidence in support of the identification conditions, and then documents the finite sample properties of the proposed estimators under various specifications of the disturbance process and under different intensities of spatial dependence.
It also compares the performance of the proposed estimators with that of alternative estimators.

1.5.1 Identification experiments

We now construct an example to show that the condition given by (1.18), namely,

$$
N^{-1}\mathrm{tr}\left(\mathbf{W}'\mathbf{W}\right) > \varepsilon > 0, \quad \text{for all } N, \text{ including } N\to\infty, \tag{1.50}
$$

is indeed necessary for identification. Consider the following data generating process (DGP),

$$
y_{it} = \rho y^*_{it} + e_{it}, \tag{1.51}
$$

for $i = 1, 2, \ldots, N$ and $t = 1, 2, \ldots, T$, where $y^*_{it} = \sum_{j=1}^N w_{ij}y_{jt}$ and $e_{it}\sim IIDN(0,\sigma^2)$. Suppose that $N_1 = \lfloor N^\alpha\rfloor$ rows of $\mathbf{W}$ are nonzero and that the other $N_2 = N - N_1$ rows are all zeros, where $\lfloor N^\alpha\rfloor$ denotes the integer part of $N^\alpha$, and $\alpha$ is a constant in the range $[0,1]$ that does not depend on $N$. In other words, we allow the number of nonzero rows of $\mathbf{W}$ to rise more slowly than the sample size, $N$, and the rate at which it rises with $N$ is measured by $\alpha$. Note that the identification condition (1.18) fails to hold if $\alpha < 1$. To see this, assume without loss of generality that the first $N_1$ rows of $\mathbf{W}$ are nonzero. Then

$$
\frac{\mathrm{tr}\left(\mathbf{W}'\mathbf{W}\right)}{N} = \frac{\sum_{i=1}^N\sum_{j=1}^N w_{ij}^2}{N} = \frac{\sum_{i=1}^{N_1}K_i + \sum_{i=N_1+1}^N 0}{N} \leq \frac{K\lfloor N^\alpha\rfloor}{N} \leq KN^{\alpha-1},
$$

where the second equality follows from $\sum_{j=1}^N w_{ij}^2 = K_i \leq K < \infty$ for all $i$. Hence, $N^{-1}\mathrm{tr}(\mathbf{W}'\mathbf{W})\to 0$ as $N\to\infty$ if $\alpha < 1$, and it approaches zero faster for smaller $\alpha$.

In the Monte Carlo experiments, we consider the $q$-ahead-and-$q$-behind circular neighbors spatial weights, which are commonly employed in the literature. A $q$-ahead-and-$q$-behind matrix is designed to capture spatial relations in which all units are located on a circle; the $q$ units immediately ahead of and behind a particular unit are considered "neighbors" and assigned equal weights. For example, for the 2-ahead-and-2-behind spatial matrix, the $i$th row of $\mathbf{W}$ has nonzero elements in positions $i-2, i-1, i+1, i+2$ (modulo $N$), each equal to 1/4 after row normalization.
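The weights matrix used in these identification experiments, and the two conditions discussed in this subsection, can be checked numerically. The following is a minimal sketch, assuming row-normalized circular weights with $N > 2q$; the helper names (`circular_weights`, `truncated_weights`, `trace_ratio`, `lee_yu_rank`) are hypothetical. It keeps the first $\lfloor N^\alpha\rfloor$ rows of the 5-ahead-and-5-behind matrix, computes $N^{-1}\mathrm{tr}(\mathbf{W}'\mathbf{W})$, and checks the linear independence of $\mathbf{I}_N$, $\mathbf{W}+\mathbf{W}'$, and $\mathbf{W}'\mathbf{W}$ through the rank of their vectorized forms.

```python
import numpy as np

def circular_weights(N, q):
    """Row-normalized q-ahead-and-q-behind circular neighbours matrix (assumes N > 2q)."""
    W = np.zeros((N, N))
    for i in range(N):
        for d in range(1, q + 1):
            W[i, (i + d) % N] = W[i, (i - d) % N] = 1.0
    return W / W.sum(axis=1, keepdims=True)

def truncated_weights(N, alpha, q=5):
    """Keep the first N1 = floor(N**alpha) rows of the circular matrix; zero the rest."""
    N1 = int(np.floor(N ** alpha + 1e-9))   # guard: 1000 ** (1/3) evaluates to 9.999...
    W = circular_weights(N, q)
    W[N1:] = 0.0
    return W

def trace_ratio(W):
    """N^{-1} tr(W'W) = N^{-1} sum_{i,j} w_ij^2."""
    return np.sum(W ** 2) / W.shape[0]

def lee_yu_rank(W):
    """Rank of [vec(I_N), vec(W + W'), vec(W'W)]; a rank of 3 means linear independence."""
    N = W.shape[0]
    B = np.column_stack([np.eye(N).ravel(), (W + W.T).ravel(), (W.T @ W).ravel()])
    return np.linalg.matrix_rank(B)

for alpha in (1.0, 1 / 2, 1 / 4):
    for N in (100, 1000):
        W = truncated_weights(N, alpha)
        print(f"alpha={alpha:.2f} N={N:5d} tr(W'W)/N={trace_ratio(W):.4f} rank={lee_yu_rank(W)}")
```

Each nonzero row of the 5-ahead-and-5-behind matrix has $\sum_j w_{ij}^2 = 10 \times 0.1^2 = 0.1$, so the ratio equals $0.1\lfloor N^\alpha\rfloor/N$: it stays at 0.1 when $\alpha = 1$ but shrinks with $N$ when $\alpha < 1$, while the rank remains 3 throughout.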
Specifically, we adopt the 5-ahead-and-5-behind spatial weights in the first $N_1$ rows of $\mathbf{W}$ and set the remaining rows to zeros. It is worth noting that the identification condition for model (1.51) proposed by Lee and Yu (2016), which states that the matrices $\mathbf{I}_N$, $\mathbf{W}+\mathbf{W}'$, and $\mathbf{W}'\mathbf{W}$ are linearly independent, is satisfied in this case. To see this, let $c_1$, $c_2$, and $c_3$ be constants such that

$$
c_1 + 2c_2w_{ii} + c_3\sum_{k=1}^N w_{ki}^2 = 0, \quad \text{for all } i = 1, 2, \ldots, N, \tag{1.52}
$$

$$
c_2(w_{ij} + w_{ji}) + c_3\sum_{k=1}^N w_{ki}w_{kj} = 0, \quad \text{for all } i, j = 1, 2, \ldots, N, \text{ and } i \neq j. \tag{1.53}
$$

Then $\mathbf{I}_N$, $\mathbf{W}+\mathbf{W}'$, and $\mathbf{W}'\mathbf{W}$ are linearly independent if and only if $c_1 = c_2 = c_3 = 0$. Suppose first that $c_3 = 0$. From (1.53) we must have $c_2 = 0$, since $w_{ij} + w_{ji} > 0$ for some $i$ and $j$. Then, using (1.52), we obtain $c_1 = 0$. If, on the other hand, $c_3 \neq 0$, then it can be easily verified that there are no constants $\tilde c_1 > 0$ and $\tilde c_2 \neq 0$ such that

$$
\sum_{k=1}^N w_{ki}^2 = \tilde c_1 \ \text{ for all } i, \qquad \sum_{k=1}^N w_{ki}w_{kj} = \tilde c_2(w_{ij} + w_{ji}) \ \text{ for all } i, j, \text{ and } i \neq j. \tag{1.54}
$$

This establishes the linear independence of $\mathbf{I}_N$, $\mathbf{W}+\mathbf{W}'$, and $\mathbf{W}'\mathbf{W}$. In sum, we have shown that the $\mathbf{W}$ matrix described above meets the independence condition of Lee and Yu (2016), but violates the necessary condition for identification given by (1.18) if $\alpha < 1$.

Using this spatial weights matrix, we generate data following (1.51) for all combinations of $N = 20, 50, 100, 500, 1{,}000$ and $T = 1, 20, 50, 100$, under $\alpha = 1, 1/2, 1/3, 1/4$. The true values of the parameters are set to $\rho = 0.2$ and $\sigma^2 = 1$. Each experiment is replicated 2,000 times. Model (1.51) can be estimated by the standard maximum likelihood approach for SAR models,¹⁸ and Table 1.1 reports the bias, root mean squared error (RMSE), size, and power of the MLE under different values of $\alpha$. We first observe that when $\alpha = 1$, the MLE performs well, with bias and RMSE declining as $N$ and/or $T$ increases, empirical sizes close to the nominal level, and good power.
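For the pure SAR model (1.51), the standard ML approach referred to above can be sketched as follows. This is a minimal illustration, assuming Gaussian homoskedastic errors; it maximizes the concentrated log-likelihood $-(NT/2)\log\hat\sigma^2(\rho) + T\log|\det(\mathbf{I}_N - \rho\mathbf{W})|$ by a grid search over $(-0.99, 0.99)$ and evaluates the log-determinant from the eigenvalues of $\mathbf{W}$. The grid search and the function names are hypothetical stand-ins, not necessarily the optimizer used in the thesis.

```python
import numpy as np

def sar_mle(Y, W, grid=None):
    """Grid-search MLE of rho in y_{.t} = rho W y_{.t} + e_{.t}, e_{.t} ~ N(0, s2 I_N).

    Y: (T, N) array, row t = y_{.t}'. Maximizes the concentrated log-likelihood
    -(NT/2) log s2(rho) + T log|det(I_N - rho W)| (constants dropped)."""
    if grid is None:
        grid = np.linspace(-0.99, 0.99, 1981)          # step 0.001
    T, N = Y.shape
    WY = Y @ W.T
    a, b, c = np.sum(Y * Y), np.sum(Y * WY), np.sum(WY * WY)
    s2 = (a - 2 * grid * b + grid ** 2 * c) / (N * T)  # ||S(rho) y||^2 / (NT)
    lam = np.linalg.eigvals(W)                         # |det(I - rho W)| = prod |1 - rho lam|
    logdet = np.log(np.abs(1 - np.outer(grid, lam))).sum(axis=1)
    loglik = -0.5 * N * T * np.log(s2) + T * logdet
    k = int(np.argmax(loglik))
    return grid[k], s2[k]

def circular_weights(N, q):
    """Row-normalized q-ahead-and-q-behind circular neighbours matrix (N > 2q)."""
    W = np.zeros((N, N))
    for i in range(N):
        for d in range(1, q + 1):
            W[i, (i + d) % N] = W[i, (i - d) % N] = 1.0
    return W / W.sum(axis=1, keepdims=True)

# Simulate (1.51) with alpha = 1, rho = 0.2, sigma^2 = 1, then re-estimate rho.
rng = np.random.default_rng(42)
N, T, rho0 = 100, 50, 0.2
W = circular_weights(N, 5)
E = rng.standard_normal((T, N))
Y = E @ np.linalg.inv(np.eye(N) - rho0 * W).T          # y_{.t} = S(rho0)^{-1} e_{.t}
rho_hat, s2_hat = sar_mle(Y, W)
print(rho_hat, s2_hat)
```

Zeroing rows of $\mathbf{W}$ (as in `truncated_weights` style designs with $\alpha < 1$) leaves the code unchanged; it only removes information about $\rho$ from the likelihood, which is what the experiments in Table 1.1 measure.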
Nonetheless, as expected, the bias and RMSE are substantial when $\alpha < 1$, and they are especially severe when $T$ is small. Even when both $N$ and $T$ are large, there is considerably greater variation in the estimates when $\alpha < 1$ than when $\alpha = 1$. For instance, for $N = 1{,}000$ and $T = 100$, the bias (×100) is −1.10 when $\alpha = 1/4$, whereas it is 0.00 when $\alpha = 1$; in addition, the RMSE (×100) is 11.19 when $\alpha = 1/4$, compared with only 0.68 when $\alpha = 1$. It is also evident that the smaller the value of $\alpha$, the greater the RMSE. Overall, these results corroborate our finding that $\mathrm{tr}(\mathbf{W}'\mathbf{W})/N > \varepsilon > 0$ for all $N$ (including $N\to\infty$) is essential for the identification of spatial autoregressive models.

¹⁸ See, for example, Anselin (1988), Chapter 6.

1.5.2 Estimation experiments

For the estimation experiments, we follow the Monte Carlo design of Pesaran (2006) and consider the following DGP:

$$
y_{it} = \rho y^*_{it} + \beta_1x_{it1} + \beta_2x_{it2} + \boldsymbol{\gamma}_{y,i}'\mathbf{f}_t + e_{it}, \tag{1.55}
$$

$$
x_{itp} = \boldsymbol{\gamma}_{x,ip}'\mathbf{f}_t + \upsilon_{itp}, \quad p = 1, 2,
$$

for $i = 1, 2, \ldots, N$ and $t = 1, 2, \ldots, T$. The unobserved factors are generated by

$$
f_{lt} = \rho_{fl}f_{l,t-1} + \varsigma_{flt}, \quad l = 1, 2, \ldots, m; \quad t = -49, -48, \ldots, 0, 1, \ldots, T,
$$

$$
\varsigma_{flt}\sim IIDN\left(0, 1-\rho_{fl}^2\right), \quad \rho_{fl} = 0.5, \quad f_{l,-50} = 0,
$$

where the first 50 observations are discarded. The factor loadings are assumed to be $\gamma_{y,i1}\sim IIDN(1, 0.2)$, $\gamma_{y,i2}\sim IIDN(1, 0.2)$, and

$$
\begin{pmatrix} \gamma_{x,i11} & \gamma_{x,i12} \\ \gamma_{x,i21} & \gamma_{x,i22} \end{pmatrix} \sim IID\begin{pmatrix} N(0.5, 0.5) & N(0, 0.5) \\ N(0, 0.5) & N(0.5, 0.5) \end{pmatrix}.
$$

The idiosyncratic errors of the $x_{itp}$ processes, $(\upsilon_{it1}, \upsilon_{it2})'$, are generated as

$$
\upsilon_{itp} = \rho_{\upsilon ip}\upsilon_{i,t-1,p} + \vartheta_{itp}, \quad t = -49, -48, \ldots, 0, 1, \ldots, T,
$$

$$
\vartheta_{itp}\sim N\left(0, 1-\rho_{\upsilon ip}^2\right), \quad \upsilon_{ip,-50} = 0, \quad \rho_{\upsilon ip}\sim IIDU(0.05, 0.95), \quad p = 1, 2,
$$

where the first 50 observations are discarded. We consider three different designs for the idiosyncratic errors of $y_{it}$:

- The errors $e_{it}$ are generated from $IIDN(0,1)$. The main goal of this baseline setup is to compare the efficiency properties of the competing estimators.
In particular, it is of interest to examine whether the B2SLS and GMM estimators are more efficient than the 2SLS estimator.

- The errors $e_{it}$ are independent over time and heteroskedastic. Specifically, we consider

  $$
  e_{it} = \sigma_i\zeta_{it}, \quad i = 1, 2, \ldots, N; \ t = 1, 2, \ldots, T, \qquad \zeta_{it}\sim IIDN(0,1), \quad \sigma_i^2\sim IIDU(0.5, 1.5). \tag{1.56}
  $$

- The errors $e_{it}$ are serially correlated and heteroskedastic. In particular, they are specified as AR(1) processes for the first half of the units and as MA(1) processes for the remaining half:

  $$
  e_{it} = \rho_{ie}e_{i,t-1} + \sigma_i\left(1-\rho_{ie}^2\right)^{1/2}\zeta_{it}, \quad i = 1, 2, \ldots, \lfloor N/2\rfloor, \tag{1.57}
  $$

  $$
  e_{it} = \sigma_i\left(1+\theta_{ie}^2\right)^{1/2}\left(\zeta_{it} + \theta_{ie}\zeta_{i,t-1}\right), \quad i = \lfloor N/2\rfloor + 1, \lfloor N/2\rfloor + 2, \ldots, N, \tag{1.58}
  $$

  $$
  \zeta_{it}\sim IIDN(0,1), \quad \sigma_i^2\sim IIDU(0.5, 1.5), \quad \rho_{ie}\sim IIDU(0.05, 0.95), \quad e_{i,-50} = 0.
  $$

The spatial weights matrix is specified as the $q$-ahead-and-$q$-behind circular neighbors weights matrix; without loss of generality, we set $q = 1$.¹⁹ In all experiments, the true number of factors is set to $m = 2$, and the true values of the coefficients are $\beta_1 = 1$, $\beta_2 = 2$, and $\rho = 0.4$.²⁰ The sample sizes are $N = 30, 50, 100, 500, 1{,}000$ and $T = 20, 30, 50, 100$. Each experiment is replicated 2,000 times.

The parameters of interest in model (1.55) are $(\rho, \beta_1, \beta_2)'$, which are estimated by the following methods:

- The naive 2SLS estimator, which ignores the latent factors and applies a standard 2SLS estimation procedure directly with instruments $\mathbf{Q}^{(2)}_{.t} = \left(\mathbf{X}_{.t}, \mathbf{W}\mathbf{X}_{.t}, \mathbf{W}^2\mathbf{X}_{.t}\right)$, for $t = 1, 2, \ldots, T$, where the superscript of $\mathbf{Q}^{(2)}_{.t}$ denotes the highest power of $\mathbf{W}$ used in constructing the instruments.
- The infeasible 2SLS estimator, which assumes the factors are known and uses instruments $\mathbf{Q}^{(2)}_{.t}$, for $t = 1, 2, \ldots, T$.
- The 2SLS estimator given by (1.24), with instruments $\mathbf{Q}^{(2)}_{.t}$, for $t = 1, 2, \ldots, T$.

¹⁹ Other specifications of $\mathbf{W}$ have also been considered as robustness checks, and the results are available upon request.

²⁰ We have also considered $\rho = 0.8$, and the results are presented in Yang (2017).
- The B2SLS estimator given by (1.32), which is implemented in two steps. In the first step, we compute a preliminary 2SLS estimate following (1.24) using instruments $\mathbf{Q}^{(2)}_{.t}$, for $t = 1, 2, \ldots, T$. In the second step, the B2SLS estimate is obtained by using the estimated best IV matrix $\hat{\mathbf{Q}}^*$ in (1.32), where

  $$
  \hat{\mathbf{Q}}^* = \widehat{\mathbf{M}}\left[\left(\mathbf{I}_T\otimes\hat{\mathbf{G}}\right)\mathbf{X}\hat{\boldsymbol{\beta}}_{2sls}, \mathbf{X}\right], \tag{1.59}
  $$

  and $\hat{\mathbf{G}} = \mathbf{G}(\hat\rho_{2sls})$.
- The efficient GMM estimator given by (1.46), which uses $\mathbf{P}_1 = \mathbf{W}$ and $\mathbf{P}_2 = \mathbf{W}^2 - \mathrm{Diag}(\mathbf{W}^2)$ in the quadratic moments and $\mathbf{Q}^{(2)}_{.t}$ as IVs in the linear moments. It is obtained by a two-step procedure. In the first step, we use the identity matrix as the moment weighting matrix and compute a preliminary GMM estimate. In the second step, the estimated inverse of the covariance matrix of the moments is used as the weighting matrix, and the model is re-estimated using the same $\mathbf{P}_1$, $\mathbf{P}_2$, and IVs.
- The MLE developed by Bai and Li (2014). This procedure assumes that the disturbances of the model are independently distributed with heteroskedastic variances and explicitly estimates all of the heteroskedastic variances and factor loadings. It is important to note that the asymptotic distribution of the MLE was derived under the assumption that $N, T\to\infty$ and $\sqrt{N}/T\to 0$. The incidental parameter problem in the time dimension is avoided by estimating the sample variance of the factors rather than the individual factors.²¹ We compute the MLE using the Expectation-Maximization (EM) algorithm suggested by Bai and Li (2014). The number of factors is assumed known in the experiments to reduce the computational burden.²² The size and power properties of the MLE are not reported in their paper.

For the robust variance estimation of the above methods (except the MLE), the Bartlett window width is set to $\lfloor 2\sqrt{T}\rfloor$.²³ Tables 1.2a to 1.4b collect the results of the estimation experiments. Each table reports the bias, root mean squared error (RMSE), size, and power of the aforementioned estimators.
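The matrices entering the B2SLS and GMM estimators listed above can be sketched as follows. The sketch assumes the usual convention $\mathbf{G}(\rho) = \mathbf{W}(\mathbf{I}_N - \rho\mathbf{W})^{-1}$ from Lee (2007), since $\mathbf{G}$ is defined earlier in the chapter; it skips de-factoring unless an $NT\times NT$ array standing in for $\widehat{\mathbf{M}}$ is passed; and the function names are hypothetical. It builds $\mathbf{P}_1 = \mathbf{W}$, $\mathbf{P}_2 = \mathbf{W}^2 - \mathrm{Diag}(\mathbf{W}^2)$, the i.i.d.-optimal $\mathbf{P}^* = \mathbf{G} - \mathrm{Diag}(\mathbf{G})$, and the instrument matrix in (1.59).

```python
import numpy as np

def G_matrix(W, rho):
    """G(rho) = W (I_N - rho W)^{-1} (assumed convention, as in Lee, 2007)."""
    N = W.shape[0]
    return W @ np.linalg.inv(np.eye(N) - rho * W)

def zero_diag(A):
    """A - Diag(A): drop the diagonal so A qualifies for a quadratic moment."""
    return A - np.diag(np.diag(A))

def b2sls_instruments(X, W, rho_hat, beta_hat, M=None):
    """[(I_T kron G_hat) X beta_hat, X] with t-major stacking, optionally premultiplied by M.

    X: (T, N, k) array; returns an (NT, 1 + k) instrument matrix."""
    T, N, k = X.shape
    G = G_matrix(W, rho_hat)
    gxb = (X @ beta_hat) @ G.T                 # row t: (G_hat X_{.t} beta_hat)'
    Q = np.column_stack([gxb.reshape(-1), X.reshape(T * N, k)])
    return Q if M is None else M @ Q

# 1-ahead-and-1-behind circular weights with N = 6, as in the estimation experiments.
N = 6
W = np.zeros((N, N))
for i in range(N):
    W[i, (i + 1) % N] = W[i, (i - 1) % N] = 0.5
P1 = W                                         # already has a zero diagonal
P2 = zero_diag(W @ W)                          # W^2 - Diag(W^2)
P_star = zero_diag(G_matrix(W, 0.4))           # best P under i.i.d. errors
rng = np.random.default_rng(0)
X = rng.standard_normal((20, N, 2))
Q_star = b2sls_instruments(X, W, 0.4, np.array([1.0, 2.0]))
print(Q_star.shape)                            # (120, 3)
```

In a two-step implementation, `rho_hat` and `beta_hat` would be the preliminary 2SLS estimates, so that the instruments are re-evaluated at $\hat\rho_{2sls}$ and $\hat{\boldsymbol\beta}_{2sls}$ as in (1.59).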
Sub-table a reports the results for the spatial coefficient, $\rho$, and sub-table b those for the slope coefficient, $\beta_1$.

²¹ Bai and Li (2014) point out that one could switch the roles of $N$ and $T$ if $T$ is much smaller than $N$. We do not report results under this interchange, since it involves different stringent assumptions on the disturbances and does not improve the performance of the MLE under our Monte Carlo designs.

²² Bai and Li (2014) propose using an information criterion to estimate the number of factors in their Monte Carlo experiments.

²³ We have also considered $T^{1/3}$ as the window size. The results are close, but using $2\sqrt{T}$ yields slightly better size properties.

We omit the results for $\beta_2$ to save space, as they are similar to those for $\beta_1$. The results for the naive estimator are only presented in the first two tables, since, as expected, ignoring the factors produces large biases and severe size distortions in all experiments. Specifically, Tables 1.2a–1.2b, 1.3a–1.3b, and 1.4a–1.4b present the results when the errors $e_{it}$ are generated as an i.i.d. process, an independent and heteroskedastic process, and a serially correlated and heteroskedastic process, respectively, under a relatively low level of spatial dependence ($\rho = 0.4$). We first observe that the 2SLS estimator exhibits very small biases and RMSEs that decline as $N$ and/or $T$ increase. A comparison between the 2SLS and the infeasible 2SLS estimators suggests that the efficiency loss from using cross-sectional averages to approximate the unobserved factors is quite small, and almost indiscernible when the sample size is large. The B2SLS estimator is only marginally more efficient than the 2SLS estimator for the spatial parameter $\rho$ when $N$ is small, and it provides little or no improvement for the slope parameter $\beta_1$. This suggests that the IV matrix $\mathbf{Q}^{(2)}_{.t} = \left(\mathbf{X}_{.t}, \mathbf{W}\mathbf{X}_{.t}, \mathbf{W}^2\mathbf{X}_{.t}\right)$ used in computing the 2SLS estimates approximates the best IV quite well in our experimental designs.
For $\rho$, the GMM estimator generally yields smaller RMSEs than the 2SLS and B2SLS estimators when $N$ is large relative to $T$, and for $N \geq 100$ its RMSEs are comparable to, and mostly smaller than, those of the infeasible 2SLS estimator. When $T$ is large relative to $N$ (for example, $N \leq 50$ and $T \geq 50$), however, the GMM estimator of $\rho$ has a downward bias that does not diminish as $T$ grows, and its RMSE exceeds that of the 2SLS estimators. Finally, in the designs with independent errors (Tables 1.2a and 1.3a), the MLE developed by Bai and Li (2014) produces the smallest RMSEs for $\rho$ in most cases with $N \leq 100$, where the improvement is especially notable, whereas for $N \geq 500$ the GMM estimator attains comparable or slightly smaller RMSEs. Moreover, the MLE is computationally demanding for large values of $N$ and $T$, and its performance could deteriorate if the number of factors is estimated, especially when the estimated number of factors is smaller than the true value.

Turning to size and power properties, as anticipated by the theoretical findings, the proposed estimators have good power and empirical sizes close to the 5% nominal level for large $N$ and small to moderate $T$, irrespective of whether the errors are heteroskedastic and serially correlated. In cases where $N$ is much smaller than $T$, the rejection frequencies under the null of the 2SLS and B2SLS estimators are slightly higher than 5%, and the GMM estimator is more oversized than the 2SLS estimators. The size distortion is also more pronounced for the spatial parameter than for the slope coefficients. In view of these results, the variance estimators should not be relied upon when $N$ is small relative to $T$, which is consistent with the condition $T/N\to 0$ in the asymptotic theory. In contrast, the MLE performs well when the errors are independent; it has higher power than the other estimators and sizes close to the 5% nominal level when $N$ is not too large relative to $T$. However, as its theory does not permit serial correlation in the errors, the MLE-based tests are significantly oversized in that case: for the combinations of $N$ and $T$ considered, the empirical sizes of the MLE range from 13% to 29%.
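The statistics reported in the tables that follow can be computed from the Monte Carlo replications as sketched below. This is a minimal sketch under stated assumptions: two-sided t-tests at the 5% level based on each replication's estimate and standard error, with power taken as the rejection frequency of $H_0: \theta = \theta_1$ when the data are generated under $\theta_0$ (our reading of "power under $H_1$"); the band for the empirical size uses the normal approximation to the binomial with $R$ replications. Function names are hypothetical, and the inputs in the example are simulated placeholders, not results from the thesis.

```python
import numpy as np

def mc_summary(est, se, theta0, theta1, crit=1.96):
    """Bias, RMSE, size and power (all x100) over R Monte Carlo replications."""
    est, se = np.asarray(est, dtype=float), np.asarray(se, dtype=float)
    bias = 100 * np.mean(est - theta0)
    rmse = 100 * np.sqrt(np.mean((est - theta0) ** 2))
    size = 100 * np.mean(np.abs(est - theta0) / se > crit)    # reject the true H0
    power = 100 * np.mean(np.abs(est - theta1) / se > crit)   # reject H0: theta = theta1
    return bias, rmse, size, power

def size_band(R, level=0.05, crit=1.96):
    """Approximate 95% band (in %) for the empirical size over R replications."""
    half = crit * np.sqrt(level * (1 - level) / R)
    return 100 * (level - half), 100 * (level + half)

print(size_band(2000))   # approx. (4.04, 5.96)
print(size_band(1000))   # approx. (3.65, 6.35)

# Placeholder replications: an unbiased estimator of rho = 0.4 with s.e. 0.01.
rng = np.random.default_rng(0)
est = 0.4 + 0.01 * rng.standard_normal(2000)
print(mc_summary(est, np.full(2000, 0.01), theta0=0.4, theta1=0.38))
```

The width of the band shrinks with $\sqrt{R}$, so the number of replications determines which empirical sizes can be distinguished from the nominal 5%.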
Table 1.1: Small sample properties of the maximum likelihood estimator of the spatial autoregressive coefficient, ρ, for the identification experiments under different values of α

| N | Bias (×100), T=1 | 20 | 50 | 100 | RMSE (×100), T=1 | 20 | 50 | 100 | Size (×100), T=1 | 20 | 50 | 100 | Power (×100), T=1 | 20 | 50 | 100 |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| α = 1 |
| 20 | -19.63 | -1.35 | -0.61 | -0.38 | 51.24 | 9.85 | 6.17 | 4.36 | 3.50 | 5.30 | 4.95 | 5.25 | 6.25 | 17.05 | 37.20 | 60.60 |
| 50 | -9.51 | -0.59 | -0.34 | -0.08 | 31.42 | 6.25 | 3.92 | 2.77 | 4.85 | 5.50 | 5.05 | 5.50 | 7.45 | 38.40 | 69.75 | 94.05 |
| 100 | -4.87 | -0.41 | -0.14 | 0.01 | 21.27 | 4.41 | 2.78 | 1.95 | 5.40 | 5.30 | 5.10 | 5.15 | 10.10 | 59.50 | 93.40 | 99.90 |
| 500 | -0.97 | 0.03 | 0.06 | 0.03 | 8.84 | 1.95 | 1.24 | 0.90 | 5.00 | 5.15 | 5.10 | 6.45 | 21.60 | 99.90 | 100.00 | 100.00 |
| 1,000 | -0.64 | 0.06 | 0.04 | 0.00 | 6.21 | 1.38 | 0.90 | 0.68 | 5.30 | 5.10 | 6.10 | 6.65 | 37.80 | 100.00 | 100.00 | 100.00 |
| α = 1/2 |
| 20 | -31.73 | -4.64 | -2.28 | -1.13 | 85.60 | 31.07 | 19.83 | 13.80 | 0.00 | 5.80 | 5.80 | 6.00 | 0.00 | 7.30 | 9.10 | 12.00 |
| 50 | -30.17 | -3.10 | -1.27 | -0.59 | 73.08 | 20.45 | 12.56 | 8.71 | 0.00 | 5.60 | 5.55 | 4.75 | 0.00 | 8.70 | 12.95 | 19.70 |
| 100 | -26.41 | -2.64 | -1.14 | -0.60 | 64.30 | 15.76 | 9.82 | 7.01 | 1.90 | 5.25 | 5.25 | 6.00 | 3.55 | 9.85 | 16.80 | 28.90 |
| 500 | -17.32 | -0.89 | -0.23 | -0.05 | 47.67 | 9.74 | 6.17 | 4.34 | 2.35 | 5.20 | 5.40 | 4.90 | 4.55 | 17.95 | 39.25 | 64.65 |
| 1,000 | -13.43 | -0.84 | -0.36 | -0.20 | 39.93 | 8.39 | 5.19 | 3.56 | 5.05 | 6.00 | 6.15 | 5.30 | 7.00 | 23.30 | 48.10 | 78.70 |
| α = 1/3 |
| 20 | -25.27 | -4.63 | -2.09 | -1.01 | 91.58 | 46.45 | 31.01 | 21.58 | 0.00 | 3.40 | 6.15 | 6.20 | 0.00 | 6.65 | 7.15 | 8.20 |
| 50 | -28.33 | -4.65 | -1.87 | -0.54 | 87.65 | 37.21 | 23.65 | 16.54 | 0.00 | 6.00 | 6.00 | 5.20 | 0.00 | 6.60 | 8.45 | 10.85 |
| 100 | -28.66 | -5.13 | -1.96 | -1.10 | 82.56 | 30.50 | 19.34 | 13.30 | 0.00 | 4.70 | 5.55 | 5.20 | 0.00 | 6.20 | 8.55 | 11.05 |
| 500 | -30.78 | -2.71 | -0.73 | -0.19 | 72.72 | 19.92 | 12.46 | 8.74 | 0.00 | 5.35 | 5.25 | 4.55 | 0.00 | 9.20 | 13.90 | 23.10 |
| 1,000 | -28.90 | -2.27 | -0.89 | -0.63 | 68.08 | 17.35 | 10.70 | 7.43 | 2.15 | 5.75 | 5.40 | 4.90 | 3.70 | 11.55 | 18.20 | 25.05 |
| α = 1/4 |
| 20 | -25.27 | -4.63 | -2.09 | -1.01 | 91.58 | 46.45 | 31.01 | 21.58 | 0.00 | 3.40 | 6.15 | 6.20 | 0.00 | 6.65 | 7.15 | 8.20 |
| 50 | -22.18 | -4.42 | -1.38 | -0.41 | 90.22 | 46.33 | 30.66 | 21.17 | 0.00 | 3.25 | 5.80 | 4.80 | 0.00 | 7.20 | 6.95 | 7.80 |
| 100 | -27.96 | -5.64 | -2.03 | -1.23 | 87.23 | 37.53 | 23.76 | 16.27 | 0.00 | 4.85 | 5.15 | 5.65 | 0.00 | 6.65 | 8.25 | 9.10 |
| 500 | -30.66 | -3.97 | -1.26 | -0.58 | 83.89 | 30.54 | 18.91 | 13.30 | 0.00 | 5.65 | 5.40 | 5.80 | 0.00 | 6.85 | 8.45 | 11.60 |
| 1,000 | -31.69 | -3.58 | -1.59 | -1.10 | 80.58 | 26.16 | 16.11 | 11.19 | 0.00 | 6.20 | 5.65 | 5.15 | 0.00 | 7.55 | 10.10 | 13.85 |

Notes: The DGP is given by (1.51). The true value of ρ is 0.2, and ρ is estimated by the maximum likelihood method. The spatial weights matrix W is constructed such that the first N₁ = ⌊N^α⌋ rows contain the 5-ahead-and-5-behind spatial weights, where α ∈ [0, 1], and the remaining N₂ = N − N₁ rows of W are all zeros. The number of replications is 2,000. The 95% confidence interval for a 5% nominal size is [4.0%, 6.0%]. The power is calculated under the alternative H₁: ρ = 0.1.

Table 1.2a: Small sample properties of estimators for the spatial parameter ρ (ρ = 0.4, i.i.d. errors)

| N | Bias (×100), T=20 | 30 | 50 | 100 | RMSE (×100), T=20 | 30 | 50 | 100 | Size (×100), T=20 | 30 | 50 | 100 | Power (×100), T=20 | 30 | 50 | 100 |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| Naive 2SLS estimator (excluding factors) |
| 30 | 16.06 | 16.21 | 16.34 | 16.41 | 16.58 | 16.61 | 16.65 | 16.66 | 99.40 | 99.65 | 99.95 | 99.95 | 99.65 | 99.85 | 99.95 | 100.00 |
| 50 | 16.04 | 16.23 | 16.34 | 16.38 | 16.44 | 16.52 | 16.55 | 16.55 | 99.90 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| 100 | 16.01 | 16.24 | 16.32 | 16.38 | 16.33 | 16.46 | 16.47 | 16.48 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| 500 | 15.95 | 16.14 | 16.27 | 16.33 | 16.21 | 16.30 | 16.37 | 16.39 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| 1,000 | 15.95 | 16.14 | 16.27 | 16.34 | 16.20 | 16.30 | 16.37 | 16.40 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| Infeasible 2SLS estimator (including factors) |
| 30 | -0.10 | -0.04 | 0.00 | 0.01 | 2.40 | 1.91 | 1.47 | 0.99 | 5.10 | 4.95 | 5.15 | 4.45 | 13.50 | 18.15 | 29.10 | 52.10 |
| 50 | 0.04 | 0.05 | 0.00 | 0.00 | 1.85 | 1.48 | 1.11 | 0.77 | 5.75 | 5.40 | 5.25 | 5.15 | 20.45 | 29.70 | 43.00 | 72.25 |
| 100 | 0.01 | 0.01 | 0.01 | 0.01 | 1.30 | 1.03 | 0.78 | 0.55 | 5.25 | 4.35 | 4.35 | 4.40 | 34.75 | 48.50 | 71.15 | 95.20 |
| 500 | -0.02 | -0.01 | 0.00 | 0.00 | 0.59 | 0.46 | 0.35 | 0.25 | 5.60 | 4.85 | 4.65 | 4.50 | 92.60 | 98.75 | 100.00 | 100.00 |
| 1,000 | 0.00 | 0.00 | -0.01 | 0.00 | 0.42 | 0.33 | 0.25 | 0.18 | 4.90 | 5.65 | 5.30 | 5.70 | 99.80 | 99.95 | 100.00 | 100.00 |
| 2SLS estimator |
| 30 | -0.08 | -0.01 | 0.00 | 0.01 | 2.75 | 2.16 | 1.64 | 1.15 | 6.10 | 6.45 | 7.30 | 8.10 | 12.40 | 18.35 | 28.30 | 48.30 |
| 50 | 0.02 | 0.05 | 0.01 | 0.00 | 1.99 | 1.58 | 1.20 | 0.83 | 5.35 | 5.95 | 5.30 | 5.70 | 18.05 | 26.15 | 41.55 | 71.00 |
| 100 | 0.01 | 0.00 | 0.01 | 0.01 | 1.38 | 1.08 | 0.81 | 0.56 | 4.50 | 5.10 | 4.45 | 5.20 | 30.25 | 45.30 | 69.40 | 94.10 |
| 500 | -0.02 | -0.01 | 0.00 | 0.00 | 0.62 | 0.48 | 0.36 | 0.25 | 5.00 | 4.55 | 4.95 | 4.75 | 89.05 | 97.95 | 100.00 | 100.00 |
| 1,000 | 0.00 | 0.00 | -0.01 | 0.00 | 0.44 | 0.34 | 0.26 | 0.18 | 4.70 | 4.90 | 5.20 | 5.40 | 99.45 | 99.95 | 100.00 | 100.00 |
| B2SLS estimator |
| 30 | -0.12 | -0.03 | -0.01 | 0.00 | 2.74 | 2.16 | 1.63 | 1.15 | 6.00 | 6.40 | 7.10 | 8.00 | 12.25 | 18.40 | 27.90 | 48.05 |
| 50 | 0.00 | 0.03 | 0.00 | 0.00 | 1.99 | 1.58 | 1.19 | 0.83 | 5.30 | 6.10 | 5.25 | 5.45 | 18.10 | 25.75 | 41.30 | 70.90 |
| 100 | -0.01 | -0.01 | 0.01 | 0.01 | 1.38 | 1.08 | 0.80 | 0.56 | 4.75 | 5.05 | 4.45 | 5.20 | 29.75 | 45.10 | 69.20 | 94.15 |
| 500 | -0.02 | -0.01 | 0.00 | 0.00 | 0.62 | 0.48 | 0.36 | 0.25 | 4.95 | 4.55 | 4.85 | 4.60 | 88.65 | 98.05 | 100.00 | 100.00 |
| 1,000 | 0.00 | 0.00 | -0.01 | 0.00 | 0.44 | 0.34 | 0.25 | 0.18 | 4.50 | 4.65 | 5.05 | 5.60 | 99.45 | 99.95 | 100.00 | 100.00 |
| GMM estimator |
| 30 | -1.25 | -1.11 | -1.07 | -1.02 | 2.60 | 2.11 | 1.76 | 1.41 | 10.30 | 11.45 | 16.00 | 24.20 | 8.75 | 10.85 | 14.10 | 23.15 |
| 50 | -0.69 | -0.64 | -0.64 | -0.60 | 1.86 | 1.52 | 1.22 | 0.94 | 8.50 | 9.80 | 12.15 | 16.00 | 15.80 | 21.95 | 32.10 | 54.25 |
| 100 | -0.33 | -0.32 | -0.31 | -0.29 | 1.24 | 0.98 | 0.75 | 0.57 | 6.90 | 6.85 | 7.05 | 9.95 | 33.60 | 47.30 | 69.85 | 94.85 |
| 500 | -0.08 | -0.07 | -0.07 | -0.06 | 0.52 | 0.41 | 0.31 | 0.22 | 6.00 | 5.20 | 6.00 | 6.65 | 96.15 | 99.80 | 100.00 | 100.00 |
| 1,000 | -0.03 | -0.03 | -0.04 | -0.03 | 0.36 | 0.29 | 0.22 | 0.15 | 5.25 | 5.60 | 5.65 | 6.25 | 100.00 | 100.00 | 100.00 | 100.00 |
| MLE |
| 30 | 0.30 | 0.23 | 0.18 | 0.16 | 2.32 | 1.79 | 1.36 | 0.92 | 11.80 | 10.20 | 8.65 | 7.85 | 30.65 | 34.95 | 46.80 | 71.20 |
| 50 | 0.35 | 0.23 | 0.14 | 0.11 | 1.79 | 1.39 | 1.02 | 0.70 | 12.45 | 10.00 | 8.30 | 7.30 | 41.45 | 49.05 | 64.00 | 88.70 |
| 100 | 0.29 | 0.17 | 0.11 | 0.09 | 1.26 | 0.95 | 0.70 | 0.49 | 11.55 | 9.25 | 7.00 | 6.95 | 59.75 | 71.85 | 89.05 | 99.25 |
| 500 | 0.22 | 0.11 | 0.05 | 0.04 | 0.59 | 0.43 | 0.31 | 0.22 | 13.40 | 9.30 | 7.20 | 7.70 | 99.00 | 100.00 | 100.00 | 100.00 |
| 1,000 | 0.20 | 0.11 | 0.06 | 0.04 | 0.42 | 0.32 | 0.23 | 0.16 | 14.40 | 11.10 | 8.40 | 7.00 | 100.00 | 100.00 | 100.00 | 100.00 |

Notes: The DGP is given by (1.55), where e_it ∼ IIDN(0, 1). The true parameter values are ρ = 0.4, β₁ = 1, and β₂ = 2. The true number of factors is 2.
The spatial weights matrix is the 1-ahead-and-1-behind circular neighbors matrix. The naive estimator ignores the latent factors, and the infeasible estimator treats the factors as known. The naive 2SLS, infeasible 2SLS, and 2SLS estimators are computed using instruments Q⁽²⁾.t = (X.t, WX.t, W²X.t), for t = 1, 2, ..., T. The best 2SLS (B2SLS) estimator is computed using Q̂* given by (1.59). The efficient two-step GMM estimator uses P₁ = W and P₂ = W² − Diag(W²) in the quadratic moments and Q⁽²⁾.t in the linear moments. The MLE is computed by the Expectation-Maximization (EM) algorithm described in Bai and Li (2014), assuming the number of factors is known. The number of replications is 2,000. The 95% confidence interval for a 5% nominal size is [4.0%, 6.0%], and the power is computed under H₁: ρ = 0.38.

Table 1.2b: Small sample properties of estimators for the slope parameter β₁ (β₁ = 1, i.i.d. errors)

| N | Bias (×100), T=20 | 30 | 50 | 100 | RMSE (×100), T=20 | 30 | 50 | 100 | Size (×100), T=20 | 30 | 50 | 100 | Power (×100), T=20 | 30 | 50 | 100 |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| Naive 2SLS estimator (excluding factors) |
| 30 | 8.82 | 9.09 | 9.11 | 9.24 | 11.71 | 11.56 | 11.23 | 11.10 | 53.55 | 63.40 | 72.35 | 83.00 | 76.15 | 83.80 | 90.45 | 95.65 |
| 50 | 8.77 | 8.88 | 9.05 | 9.25 | 10.91 | 10.60 | 10.45 | 10.40 | 65.15 | 74.30 | 84.40 | 91.25 | 87.45 | 92.60 | 96.50 | 99.05 |
| 100 | 9.03 | 9.13 | 9.22 | 9.42 | 10.43 | 10.28 | 10.11 | 10.14 | 79.90 | 86.45 | 93.35 | 97.80 | 96.70 | 98.60 | 99.70 | 99.90 |
| 500 | 9.15 | 9.27 | 9.34 | 9.53 | 9.93 | 9.85 | 9.73 | 9.78 | 97.00 | 98.90 | 99.80 | 100.00 | 99.85 | 100.00 | 100.00 | 100.00 |
| 1,000 | 9.17 | 9.30 | 9.36 | 9.55 | 9.87 | 9.80 | 9.69 | 9.74 | 98.15 | 99.70 | 99.95 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| Infeasible 2SLS estimator (including factors) |
| 30 | 0.05 | 0.02 | -0.01 | -0.01 | 4.50 | 3.57 | 2.73 | 1.88 | 5.50 | 5.20 | 5.40 | 5.10 | 21.80 | 30.45 | 47.85 | 75.95 |
| 50 | -0.19 | -0.17 | -0.11 | -0.09 | 3.45 | 2.66 | 2.01 | 1.40 | 5.45 | 4.40 | 4.65 | 4.50 | 28.55 | 42.75 | 66.25 | 92.75 |
| 100 | -0.13 | -0.04 | -0.05 | -0.05 | 2.46 | 1.92 | 1.48 | 1.01 | 5.65 | 5.65 | 5.35 | 5.05 | 52.45 | 73.75 | 91.90 | 99.85 |
| 500 | -0.06 | -0.04 | -0.02 | -0.01 | 1.07 | 0.84 | 0.66 | 0.47 | 4.85 | 4.35 | 5.20 | 5.80 | 99.80 | 100.00 | 100.00 | 100.00 |
| 1,000 | 0.01 | 0.01 | 0.01 | 0.00 | 0.79 | 0.62 | 0.47 | 0.33 | 6.10 | 6.00 | 5.30 | 5.30 | 100.00 | 100.00 | 100.00 | 100.00 |
| 2SLS estimator |
| 30 | 0.06 | 0.04 | 0.03 | 0.03 | 4.73 | 3.77 | 2.91 | 2.05 | 5.75 | 6.30 | 7.35 | 7.35 | 20.00 | 29.20 | 47.00 | 75.60 |
| 50 | -0.19 | -0.18 | -0.09 | -0.09 | 3.61 | 2.76 | 2.08 | 1.45 | 4.70 | 5.10 | 4.75 | 4.85 | 26.05 | 39.60 | 65.80 | 92.60 |
| 100 | -0.13 | -0.05 | -0.06 | -0.05 | 2.54 | 1.96 | 1.51 | 1.03 | 5.40 | 5.05 | 4.75 | 5.05 | 46.85 | 70.70 | 90.80 | 99.85 |
| 500 | -0.07 | -0.05 | -0.02 | -0.01 | 1.12 | 0.86 | 0.67 | 0.48 | 4.15 | 4.30 | 5.00 | 5.30 | 99.25 | 100.00 | 100.00 | 100.00 |
| 1,000 | 0.02 | 0.01 | 0.01 | 0.00 | 0.82 | 0.64 | 0.48 | 0.33 | 5.45 | 5.20 | 5.25 | 5.25 | 100.00 | 100.00 | 100.00 | 100.00 |
| B2SLS estimator |
| 30 | 0.07 | 0.04 | 0.04 | 0.03 | 4.73 | 3.77 | 2.91 | 2.05 | 5.75 | 6.25 | 7.40 | 7.35 | 19.95 | 29.25 | 47.20 | 75.45 |
| 50 | -0.19 | -0.18 | -0.09 | -0.09 | 3.61 | 2.76 | 2.08 | 1.45 | 4.70 | 5.10 | 4.70 | 4.85 | 26.10 | 39.75 | 65.75 | 92.70 |
| 100 | -0.13 | -0.05 | -0.06 | -0.04 | 2.54 | 1.96 | 1.51 | 1.03 | 5.35 | 5.10 | 4.75 | 5.10 | 46.90 | 70.75 | 90.75 | 99.85 |
| 500 | -0.07 | -0.05 | -0.02 | -0.01 | 1.12 | 0.86 | 0.67 | 0.48 | 4.15 | 4.25 | 5.00 | 5.30 | 99.25 | 100.00 | 100.00 | 100.00 |
| 1,000 | 0.02 | 0.01 | 0.01 | 0.00 | 0.82 | 0.64 | 0.48 | 0.33 | 5.45 | 5.30 | 5.25 | 5.25 | 100.00 | 100.00 | 100.00 | 100.00 |
| GMM estimator |
| 30 | 0.16 | 0.17 | 0.17 | 0.17 | 4.79 | 3.80 | 2.92 | 2.06 | 5.70 | 6.95 | 7.30 | 7.55 | 21.35 | 30.70 | 49.25 | 77.55 |
| 50 | -0.09 | -0.08 | 0.01 | 0.01 | 3.64 | 2.77 | 2.07 | 1.45 | 4.75 | 4.90 | 4.80 | 4.85 | 27.90 | 41.60 | 67.25 | 93.55 |
| 100 | -0.08 | 0.01 | 0.00 | 0.01 | 2.54 | 1.96 | 1.50 | 1.03 | 5.45 | 5.05 | 4.80 | 4.45 | 47.70 | 71.95 | 90.95 | 99.85 |
| 500 | -0.06 | -0.04 | -0.01 | 0.00 | 1.11 | 0.86 | 0.67 | 0.47 | 4.10 | 4.30 | 5.00 | 5.30 | 99.20 | 100.00 | 100.00 | 100.00 |
| 1,000 | 0.02 | 0.02 | 0.02 | 0.01 | 0.81 | 0.64 | 0.48 | 0.33 | 5.30 | 5.40 | 5.50 | 5.40 | 100.00 | 100.00 | 100.00 | 100.00 |
| MLE |
| 30 | -0.01 | -0.01 | -0.06 | -0.04 | 5.05 | 3.85 | 2.86 | 1.93 | 11.45 | 9.30 | 7.70 | 6.10 | 29.80 | 35.10 | 48.45 | 75.80 |
| 50 | -0.20 | -0.16 | -0.13 | -0.11 | 3.76 | 2.79 | 2.06 | 1.43 | 10.20 | 7.00 | 5.95 | 5.55 | 36.20 | 47.15 | 68.15 | 93.15 |
| 100 | -0.14 | -0.05 | -0.07 | -0.06 | 2.68 | 2.01 | 1.52 | 1.04 | 10.60 | 8.00 | 6.80 | 6.05 | 58.45 | 76.55 | 91.85 | 99.90 |
| 500 | -0.02 | -0.01 | -0.01 | -0.01 | 1.18 | 0.88 | 0.67 | 0.48 | 10.70 | 6.50 | 5.90 | 7.00 | 99.60 | 100.00 | 100.00 | 100.00 |
| 1,000 | 0.04 | 0.00 | 0.02 | -0.01 | 0.84 | 0.65 | 0.47 | 0.33 | 10.40 | 8.40 | 6.30 | 5.50 | 100.00 | 100.00 | 100.00 | 100.00 |

Notes: The DGP is given by (1.55), where e_it ∼ IIDN(0, 1).
The true parameter values are ρ = 0.4, β₁ = 1, and β₂ = 2. The true number of factors is 2. The power is computed under H₁: β₁ = 0.95. See the notes to Table 1.2a for other details.

Table 1.3a: Small sample properties of estimators for the spatial parameter ρ (ρ = 0.4, independent and heteroskedastic errors)

| N | Bias (×100), T=20 | 30 | 50 | 100 | RMSE (×100), T=20 | 30 | 50 | 100 | Size (×100), T=20 | 30 | 50 | 100 | Power (×100), T=20 | 30 | 50 | 100 |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| Infeasible 2SLS estimator (including factors) |
| 30 | -0.10 | -0.04 | -0.01 | 0.00 | 2.40 | 1.91 | 1.48 | 0.99 | 5.55 | 5.15 | 5.65 | 4.50 | 13.50 | 18.25 | 29.20 | 51.75 |
| 50 | 0.05 | 0.06 | 0.00 | 0.00 | 1.85 | 1.48 | 1.11 | 0.77 | 5.45 | 5.60 | 5.40 | 5.10 | 20.60 | 30.10 | 43.60 | 72.90 |
| 100 | 0.02 | 0.01 | 0.01 | 0.01 | 1.30 | 1.03 | 0.78 | 0.55 | 4.80 | 3.95 | 4.40 | 4.10 | 34.90 | 49.10 | 70.95 | 95.50 |
| 500 | -0.02 | -0.01 | 0.00 | 0.00 | 0.59 | 0.46 | 0.35 | 0.25 | 5.15 | 4.95 | 4.30 | 4.70 | 92.55 | 98.85 | 100.00 | 100.00 |
| 1,000 | 0.00 | 0.00 | -0.01 | 0.00 | 0.42 | 0.33 | 0.25 | 0.18 | 4.70 | 5.55 | 5.35 | 5.75 | 99.80 | 100.00 | 100.00 | 100.00 |
| 2SLS estimator |
| 30 | -0.09 | -0.01 | -0.01 | 0.01 | 2.74 | 2.15 | 1.64 | 1.15 | 5.90 | 6.05 | 7.10 | 7.90 | 12.70 | 18.25 | 28.35 | 48.40 |
| 50 | 0.03 | 0.06 | 0.02 | 0.01 | 1.99 | 1.57 | 1.19 | 0.82 | 5.15 | 5.90 | 5.45 | 5.05 | 18.05 | 27.00 | 42.35 | 71.00 |
| 100 | 0.01 | 0.01 | 0.02 | 0.01 | 1.38 | 1.08 | 0.81 | 0.57 | 4.85 | 5.55 | 4.55 | 5.30 | 30.45 | 45.25 | 69.20 | 94.30 |
| 500 | -0.02 | -0.01 | 0.00 | 0.00 | 0.62 | 0.48 | 0.36 | 0.25 | 4.70 | 4.50 | 4.95 | 4.40 | 88.80 | 98.15 | 100.00 | 100.00 |
| 1,000 | 0.00 | 0.00 | -0.01 | 0.00 | 0.44 | 0.34 | 0.25 | 0.18 | 4.20 | 4.30 | 5.40 | 5.45 | 99.50 | 99.95 | 100.00 | 100.00 |
| B2SLS estimator |
| 30 | -0.13 | -0.03 | -0.02 | 0.00 | 2.74 | 2.15 | 1.63 | 1.15 | 6.05 | 6.05 | 7.05 | 7.75 | 12.45 | 17.60 | 28.15 | 47.95 |
| 50 | 0.01 | 0.04 | 0.01 | 0.00 | 1.98 | 1.57 | 1.19 | 0.82 | 5.10 | 5.80 | 5.35 | 5.35 | 17.65 | 26.45 | 41.65 | 70.40 |
| 100 | 0.00 | 0.00 | 0.01 | 0.01 | 1.38 | 1.08 | 0.80 | 0.57 | 4.65 | 5.50 | 4.50 | 5.15 | 30.60 | 45.30 | 68.95 | 94.30 |
| 500 | -0.02 | -0.01 | 0.00 | 0.00 | 0.62 | 0.48 | 0.36 | 0.25 | 4.65 | 4.60 | 4.70 | 4.35 | 88.55 | 98.30 | 100.00 | 100.00 |
| 1,000 | 0.00 | 0.00 | -0.01 | 0.00 | 0.44 | 0.34 | 0.25 | 0.18 | 4.35 | 4.40 | 5.05 | 5.60 | 99.55 | 99.95 | 100.00 | 100.00 |
| GMM estimator |
| 30 | -1.27 | -1.12 | -1.08 | -1.03 | 2.59 | 2.11 | 1.77 | 1.42 | 10.15 | 11.30 | 16.75 | 24.75 | 9.20 | 10.85 | 14.45 | 22.75 |
| 50 | -0.69 | -0.63 | -0.64 | -0.60 | 1.86 | 1.51 | 1.22 | 0.93 | 8.90 | 9.75 | 12.15 | 15.35 | 15.55 | 20.75 | 31.35 | 55.30 |
| 100 | -0.32 | -0.32 | -0.30 | -0.29 | 1.24 | 0.97 | 0.75 | 0.57 | 6.80 | 6.15 | 7.10 | 10.05 | 33.50 | 46.45 | 69.65 | 94.70 |
| 500 | -0.08 | -0.07 | -0.07 | -0.06 | 0.52 | 0.41 | 0.31 | 0.22 | 5.50 | 4.75 | 5.60 | 6.20 | 96.45 | 99.80 | 100.00 | 100.00 |
| 1,000 | -0.03 | -0.03 | -0.04 | -0.03 | 0.36 | 0.29 | 0.22 | 0.15 | 5.80 | 5.25 | 5.75 | 5.85 | 99.95 | 100.00 | 100.00 | 100.00 |
| MLE |
| 30 | 0.29 | 0.23 | 0.18 | 0.15 | 2.24 | 1.72 | 1.31 | 0.89 | 12.05 | 9.95 | 9.30 | 7.65 | 32.00 | 36.30 | 49.15 | 74.65 |
| 50 | 0.32 | 0.21 | 0.14 | 0.11 | 1.72 | 1.34 | 0.98 | 0.67 | 12.35 | 9.90 | 8.40 | 7.30 | 43.60 | 51.70 | 66.55 | 90.55 |
| 100 | 0.26 | 0.16 | 0.11 | 0.09 | 1.21 | 0.91 | 0.68 | 0.47 | 11.75 | 9.35 | 7.25 | 6.85 | 61.80 | 73.95 | 91.15 | 99.70 |
| 500 | 0.20 | 0.10 | 0.05 | 0.04 | 0.56 | 0.41 | 0.30 | 0.21 | 13.00 | 9.20 | 7.40 | 7.90 | 99.50 | 100.00 | 100.00 | 100.00 |
| 1,000 | 0.18 | 0.09 | 0.04 | 0.02 | 0.41 | 0.30 | 0.21 | 0.15 | 14.00 | 9.30 | 7.60 | 6.20 | 100.00 | 100.00 | 100.00 | 100.00 |

Notes: The DGP is given by (1.55), where e_it are generated by (1.56). The true parameter values are ρ = 0.4, β₁ = 1, and β₂ = 2. The true number of factors is 2. The power is computed under H₁: ρ = 0.38. See also the notes to Table 1.2a.
Table 1.3b: Small sample properties of estimators for the slope parameter β₁ (β₁ = 1, independent and heteroskedastic errors)

| N | Bias (×100), T=20 | 30 | 50 | 100 | RMSE (×100), T=20 | 30 | 50 | 100 | Size (×100), T=20 | 30 | 50 | 100 | Power (×100), T=20 | 30 | 50 | 100 |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| Infeasible 2SLS estimator (including factors) |
| 30 | 0.01 | -0.01 | -0.02 | -0.03 | 4.44 | 3.56 | 2.72 | 1.88 | 5.25 | 5.05 | 5.45 | 5.05 | 21.20 | 30.10 | 46.95 | 75.90 |
| 50 | -0.20 | -0.18 | -0.12 | -0.10 | 3.45 | 2.68 | 2.03 | 1.40 | 5.60 | 4.70 | 4.85 | 4.25 | 29.70 | 42.55 | 66.60 | 92.85 |
| 100 | -0.14 | -0.05 | -0.05 | -0.04 | 2.45 | 1.93 | 1.48 | 1.02 | 6.05 | 6.00 | 5.45 | 4.75 | 52.35 | 73.80 | 91.80 | 99.85 |
| 500 | -0.05 | -0.03 | -0.02 | -0.01 | 1.07 | 0.84 | 0.66 | 0.47 | 4.50 | 4.10 | 5.15 | 5.60 | 99.90 | 100.00 | 100.00 | 100.00 |
| 1,000 | 0.01 | 0.01 | 0.02 | 0.00 | 0.79 | 0.63 | 0.47 | 0.33 | 6.15 | 5.90 | 4.95 | 4.70 | 100.00 | 100.00 | 100.00 | 100.00 |
| 2SLS estimator |
| 30 | 0.03 | 0.02 | 0.03 | 0.02 | 4.70 | 3.77 | 2.92 | 2.06 | 5.30 | 6.00 | 7.00 | 7.90 | 20.05 | 28.95 | 47.50 | 75.50 |
| 50 | -0.19 | -0.18 | -0.09 | -0.09 | 3.62 | 2.77 | 2.09 | 1.46 | 4.85 | 5.25 | 4.90 | 4.90 | 26.75 | 40.15 | 65.80 | 92.55 |
| 100 | -0.13 | -0.05 | -0.05 | -0.04 | 2.53 | 1.97 | 1.52 | 1.04 | 5.55 | 5.30 | 5.30 | 5.15 | 46.80 | 70.30 | 90.30 | 99.80 |
| 500 | -0.07 | -0.05 | -0.02 | -0.01 | 1.12 | 0.86 | 0.67 | 0.48 | 4.25 | 3.95 | 4.80 | 5.55 | 99.15 | 100.00 | 100.00 | 100.00 |
| 1,000 | 0.02 | 0.01 | 0.02 | 0.00 | 0.82 | 0.65 | 0.48 | 0.33 | 5.50 | 5.60 | 5.00 | 4.80 | 100.00 | 100.00 | 100.00 | 100.00 |
| B2SLS estimator |
| 30 | 0.04 | 0.02 | 0.03 | 0.02 | 4.70 | 3.77 | 2.92 | 2.06 | 5.30 | 5.95 | 7.00 | 7.95 | 20.05 | 28.90 | 47.45 | 75.65 |
| 50 | -0.19 | -0.18 | -0.09 | -0.09 | 3.62 | 2.77 | 2.09 | 1.46 | 4.85 | 5.20 | 4.95 | 4.85 | 26.80 | 40.15 | 65.80 | 92.40 |
| 100 | -0.13 | -0.05 | -0.05 | -0.04 | 2.53 | 1.97 | 1.52 | 1.04 | 5.55 | 5.35 | 5.40 | 5.05 | 46.75 | 70.25 | 90.30 | 99.80 |
| 500 | -0.07 | -0.05 | -0.02 | -0.01 | 1.12 | 0.86 | 0.67 | 0.48 | 4.20 | 3.95 | 4.85 | 5.45 | 99.15 | 100.00 | 100.00 | 100.00 |
| 1,000 | 0.02 | 0.01 | 0.02 | 0.00 | 0.82 | 0.65 | 0.48 | 0.33 | 5.50 | 5.60 | 5.00 | 4.80 | 100.00 | 100.00 | 100.00 | 100.00 |
| GMM estimator |
| 30 | 0.14 | 0.15 | 0.16 | 0.16 | 4.76 | 3.81 | 2.93 | 2.07 | 5.65 | 6.70 | 7.30 | 8.05 | 21.05 | 30.05 | 49.25 | 77.70 |
| 50 | -0.09 | -0.07 | 0.01 | 0.01 | 3.64 | 2.78 | 2.09 | 1.45 | 5.00 | 5.30 | 5.25 | 5.10 | 28.45 | 41.85 | 68.10 | 93.75 |
| 100 | -0.08 | 0.00 | 0.01 | 0.02 | 2.53 | 1.96 | 1.51 | 1.03 | 5.80 | 5.10 | 5.30 | 5.10 | 47.65 | 71.30 | 90.95 | 99.80 |
| 500 | -0.06 | -0.04 | -0.01 | 0.00 | 1.11 | 0.86 | 0.67 | 0.47 | 3.95 | 3.70 | 5.00 | 5.45 | 99.05 | 100.00 | 100.00 | 100.00 |
| 1,000 | 0.02 | 0.02 | 0.02 | 0.01 | 0.81 | 0.64 | 0.48 | 0.33 | 5.30 | 5.60 | 5.25 | 5.10 | 100.00 | 100.00 | 100.00 | 100.00 |
| MLE |
| 30 | 0.02 | 0.00 | -0.06 | -0.02 | 4.88 | 3.70 | 2.75 | 1.85 | 11.80 | 8.75 | 7.55 | 6.20 | 31.25 | 37.10 | 52.50 | 79.70 |
| 50 | -0.17 | -0.15 | -0.12 | -0.10 | 3.59 | 2.66 | 1.95 | 1.37 | 10.05 | 6.55 | 5.45 | 5.55 | 38.45 | 50.95 | 72.65 | 94.85 |
| 100 | -0.14 | -0.05 | -0.07 | -0.06 | 2.56 | 1.92 | 1.45 | 0.99 | 10.60 | 7.65 | 6.50 | 5.75 | 61.90 | 79.80 | 94.25 | 99.95 |
| 500 | -0.03 | -0.02 | -0.02 | -0.01 | 1.13 | 0.85 | 0.65 | 0.46 | 10.60 | 6.70 | 6.10 | 6.90 | 99.60 | 100.00 | 100.00 | 100.00 |
| 1,000 | 0.03 | 0.01 | 0.02 | -0.00 | 0.79 | 0.61 | 0.45 | 0.31 | 9.70 | 7.20 | 6.40 | 5.80 | 100.00 | 100.00 | 100.00 | 100.00 |

Notes: The DGP is given by (1.55), where e_it are generated by (1.56). The true parameter values are ρ = 0.4, β₁ = 1, and β₂ = 2. The true number of factors is 2. The power is computed under H₁: β₁ = 0.95. See also the notes to Table 1.2a.

Table 1.4a: Small sample properties of estimators for the spatial parameter ρ (ρ = 0.4, serially correlated and heteroskedastic errors)

| N | Bias (×100), T=20 | 30 | 50 | 100 | RMSE (×100), T=20 | 30 | 50 | 100 | Size (×100), T=20 | 30 | 50 | 100 | Power (×100), T=20 | 30 | 50 | 100 |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| Infeasible 2SLS estimator (including factors) |
| 30 | -0.16 | -0.08 | -0.01 | 0.00 | 2.85 | 2.34 | 1.81 | 1.26 | 5.85 | 7.00 | 6.65 | 5.85 | 13.60 | 16.95 | 23.20 | 37.55 |
| 50 | 0.07 | 0.05 | -0.01 | 0.01 | 2.18 | 1.77 | 1.39 | 0.97 | 6.75 | 6.00 | 5.70 | 5.90 | 19.30 | 25.05 | 34.70 | 56.65 |
| 100 | 0.05 | 0.03 | 0.03 | 0.02 | 1.58 | 1.28 | 0.98 | 0.70 | 6.70 | 5.90 | 5.45 | 5.85 | 30.50 | 40.05 | 57.35 | 83.00 |
| 500 | -0.04 | -0.03 | -0.02 | -0.02 | 0.70 | 0.56 | 0.44 | 0.31 | 6.15 | 5.90 | 5.45 | 5.05 | 81.30 | 94.70 | 99.65 | 100.00 |
| 1,000 | -0.01 | -0.01 | -0.01 | 0.00 | 0.50 | 0.40 | 0.31 | 0.22 | 6.05 | 5.95 | 6.40 | 6.75 | 98.45 | 99.95 | 100.00 | 100.00 |
| 2SLS estimator |
| 30 | -0.14 | -0.07 | -0.02 | 0.00 | 3.10 | 2.51 | 1.95 | 1.39 | 6.55 | 7.05 | 7.35 | 7.90 | 12.65 | 17.80 | 23.30 | 36.15 |
| 50 | 0.07 | 0.06 | 0.01 | 0.02 | 2.27 | 1.85 | 1.44 | 1.01 | 6.00 | 6.95 | 6.20 | 6.00 | 17.00 | 23.60 | 35.05 | 55.90 |
| 100 | 0.04 | 0.03 | 0.03 | 0.02 | 1.62 | 1.30 | 0.99 | 0.71 | 6.10 | 5.90 | 5.55 | 5.95 | 26.65 | 37.30 | 56.65 | 83.00 |
| 500 | -0.04 | -0.03 | -0.02 | -0.02 | 0.71 | 0.57 | 0.44 | 0.31 | 5.45 | 5.70 | 5.90 | 5.40 | 78.90 | 93.10 | 99.65 | 100.00 |
| 1,000 | -0.01 | -0.01 | -0.01 | 0.00
0.51 0.40 0.31 0.23 5.65 5.25 6.05 6.55 97.65 99.95 100.00 100.00 B2SLS estimator 30 -0.19 -0.10 -0.04 -0.01 3.10 2.52 1.94 1.39 6.85 7.30 7.45 8.05 12.30 17.10 22.90 35.85 50 0.03 0.04 0.00 0.01 2.27 1.84 1.44 1.01 5.90 6.70 6.20 5.90 16.90 23.10 34.95 55.50 100 0.02 0.02 0.03 0.02 1.61 1.30 0.99 0.71 6.35 5.90 5.45 5.65 26.85 37.00 56.35 82.85 500 -0.05 -0.03 -0.02 -0.02 0.71 0.57 0.45 0.31 5.50 6.05 6.25 5.40 78.60 93.15 99.70 100.00 1,000 -0.01 -0.01 -0.01 0.00 0.51 0.40 0.31 0.23 5.60 5.15 5.95 6.30 97.70 99.95 100.00 100.00 GMM estimator 30 -1.11 -1.01 -0.97 -0.97 2.85 2.38 1.96 1.54 9.70 11.00 13.45 17.40 9.20 11.95 14.55 19.10 50 -0.54 -0.52 -0.55 -0.55 2.04 1.69 1.37 1.03 8.00 8.30 8.90 11.00 14.25 20.25 27.25 42.50 100 -0.24 -0.25 -0.25 -0.26 1.41 1.14 0.88 0.66 6.85 6.70 6.95 8.35 27.40 38.30 57.70 82.70 500 -0.08 -0.08 -0.07 -0.07 0.61 0.49 0.38 0.27 5.85 5.15 5.95 5.60 89.30 98.25 100.00 100.00 1,000 -0.04 -0.03 -0.04 -0.03 0.43 0.34 0.27 0.19 4.90 5.05 5.80 6.25 99.50 99.95 100.00 100.00 MLE 30 0.38 0.22 0.19 0.15 2.63 2.08 1.59 1.09 21.20 19.20 17.50 16.15 39.90 42.15 50.30 71.25 50 0.45 0.25 0.14 0.13 2.02 1.57 1.19 0.84 22.10 18.05 15.50 15.70 50.95 55.55 65.05 86.25 100 0.39 0.22 0.14 0.10 1.46 1.13 0.85 0.59 22.55 18.80 15.80 15.35 66.60 74.95 87.60 98.50 500 0.27 0.12 0.05 0.03 0.69 0.50 0.36 0.26 23.20 17.00 14.60 13.10 99.40 99.80 100.00 100.00 1,000 0.26 0.20 0.04 0.03 0.50 0.46 0.26 0.19 28.50 24.50 14.90 14.80 100.00 100.00 100.00 100.00 Notes: The DGP is given by (1.55), whereeit are given by (1.57) and (1.58). The true parameter values areρ = 0.4,β1 = 1 andβ2 = 2. The true number of factors is 2. The power is computed underH1 :ρ = 0.38. The maximum lag of the robust variance estimator is set to be 2 √ T . See also the notes to Table 1.2a. 
Table 1.4b: Small sample properties of estimators for the slope parameter $\beta_1$ ($\beta_1 = 1$, serially correlated and heteroskedastic errors)

N\T   | Bias (×100)             | RMSE (×100)         | Size (×100)             | Power (×100)
      |    20    30    50   100 |   20   30   50  100 |    20    30    50   100 |     20     30     50    100
Infeasible 2SLS estimator (including factors)
30    |  0.10  0.04 -0.02 -0.02 | 5.35 4.43 3.41 2.38 |  7.40  7.55  7.30  6.80 |  21.00  25.75  36.75  59.35
50    | -0.23 -0.17 -0.09 -0.12 | 4.11 3.32 2.53 1.77 |  6.40  6.45  5.35  5.55 |  25.55  36.05  51.35  77.65
100   | -0.18 -0.07 -0.04 -0.04 | 2.95 2.37 1.86 1.30 |  7.05  6.75  7.15  5.70 |  42.25  60.20  79.90  97.55
500   | -0.03 -0.02 -0.01  0.00 | 1.26 1.04 0.82 0.59 |  6.00  5.55  6.15  5.90 |  97.70  99.80 100.00 100.00
1,000 |  0.02  0.01  0.02  0.00 | 0.94 0.76 0.60 0.42 |  7.10  6.70  6.55  6.05 | 100.00 100.00 100.00 100.00
2SLS estimator
30    |  0.12  0.07  0.05  0.04 | 5.45 4.49 3.51 2.52 |  6.65  7.10  8.25  8.45 |  19.70  26.15  37.15  60.20
50    | -0.21 -0.16 -0.06 -0.11 | 4.17 3.34 2.54 1.79 |  5.75  6.45  6.05  4.95 |  24.30  34.00  51.40  78.50
100   | -0.14 -0.06 -0.05 -0.04 | 2.93 2.37 1.86 1.30 |  5.70  6.85  6.70  5.85 |  39.85  57.05  78.80  97.50
500   | -0.05 -0.04 -0.02 -0.01 | 1.29 1.05 0.83 0.59 |  5.30  5.15  5.80  5.95 |  96.65  99.70 100.00 100.00
1,000 |  0.02  0.00  0.02  0.00 | 0.95 0.76 0.60 0.42 |  6.25  6.35  6.55  5.95 | 100.00 100.00 100.00 100.00
B2SLS estimator
30    |  0.13  0.07  0.05  0.04 | 5.45 4.49 3.51 2.51 |  6.65  7.10  8.30  8.45 |  19.65  26.20  37.25  60.10
50    | -0.20 -0.15 -0.06 -0.11 | 4.17 3.34 2.54 1.79 |  5.75  6.50  6.05  5.00 |  24.40  34.00  51.40  78.55
100   | -0.14 -0.06 -0.05 -0.03 | 2.93 2.37 1.86 1.30 |  5.75  6.75  6.55  5.90 |  39.85  57.10  78.85  97.50
500   | -0.05 -0.04 -0.02 -0.01 | 1.29 1.05 0.83 0.59 |  5.30  5.15  5.80  5.95 |  96.70  99.65 100.00 100.00
1,000 |  0.02  0.00  0.02  0.00 | 0.95 0.76 0.60 0.42 |  6.35  6.35  6.40  5.95 | 100.00 100.00 100.00 100.00
GMM estimator
30    |  0.21  0.19  0.17  0.18 | 5.55 4.55 3.55 2.55 |  7.80  8.25  8.95  8.75 |  22.00  28.40  39.35  63.40
50    | -0.15 -0.08  0.01 -0.02 | 4.21 3.34 2.53 1.79 |  6.40  6.75  6.20  5.40 |  26.00  35.65  53.40  80.95
100   | -0.11 -0.01  0.01  0.02 | 2.94 2.38 1.86 1.30 |  6.00  6.70  7.10  5.75 |  40.90  58.45  79.40  98.05
500   | -0.05 -0.03 -0.01  0.00 | 1.29 1.05 0.82 0.59 |  5.40  4.95  5.75  5.80 |  96.75  99.70 100.00 100.00
1,000 |  0.02  0.01  0.03  0.01 | 0.95 0.76 0.60 0.42 |  6.00  6.60  6.85  6.10 | 100.00 100.00 100.00 100.00
MLE
20    |  0.09  0.02  0.09  0.06 | 6.94 5.36 4.18 2.95 | 22.00 16.90 17.55 16.15 |  34.65  36.45  45.05  62.15
30    |  0.10 -0.01 -0.05 -0.03 | 5.61 4.47 3.41 2.35 | 20.80 18.25 17.10 14.60 |  38.15  42.60  53.20  74.20
50    | -0.16 -0.11 -0.11 -0.12 | 4.17 3.25 2.43 1.74 | 18.65 16.15 12.95 13.00 |  45.10  54.15  68.85  89.95
100   | -0.10 -0.06 -0.06 -0.06 | 2.99 2.32 1.82 1.26 | 19.85 17.15 16.00 14.10 |  64.50  77.25  89.60  99.40
500   |  0.03  0.02 -0.01 -0.00 | 1.32 1.04 0.79 0.56 | 19.60 15.40 15.40 14.50 |  99.50  99.90 100.00 100.00
1,000 |  0.07  0.01  0.04  0.01 | 0.96 0.73 0.57 0.40 | 20.50 17.00 16.60 13.50 | 100.00 100.00 100.00 100.00

Notes: The DGP is given by (1.55), where $e_{it}$ are given by (1.57) and (1.58). The true parameter values are $\rho = 0.4$, $\beta_1 = 1$, and $\beta_2 = 2$. The true number of factors is 2. The power is computed under $H_1: \beta_1 = 0.95$. The maximum lag of the robust variance estimator is set to $2\sqrt{T}$. See also the notes to Table 1.2a.

In summary, the proposed estimators are robust to unknown heteroskedasticity and serial correlation in the errors. They are also robust to different intensities of spatial dependence.²⁴

1.6 An empirical application to US house prices

In this section, we apply the proposed estimation methods to analyze the spatial dependence of real house price changes in the US at the level of Metropolitan Statistical Areas. Since neighboring regions are often influenced by the same aggregate supply and demand shocks, the purpose of this exercise is to assess the strength of spatial interconnections properly while netting out the effects of common factors. As we will see, the degree of spatial dependence is exaggerated if the unobserved common effects are not effectively removed.
In addition, we are interested in the effects of possible determinants of house price growth, including both direct and indirect (spillover) effects.²⁵

A Metropolitan Statistical Area (MSA) is defined by the United States Office of Management and Budget (OMB) as a core area with a relatively high population density (50,000 people or more), together with surrounding territory that has a high degree of economic and social integration with the core, as measured by commuting ties. We consider a total of 377 MSAs using the February 2013 delineations, excluding two MSAs in Alaska and two in Hawaii.²⁶ For house prices, we use the Freddie Mac House Price Index (FMHPI) at the MSA level covering 1975Q1–2014Q4. The FMHPI is constructed using a repeat-transactions methodology and is published quarterly by Freddie Mac. Nominal house prices are deflated by the Consumer Price Index (CPI) for each MSA, and the analysis below focuses on the quarterly rate of change in real house prices. As explanatory variables, we examine the impact of population growth and real per capita income growth on house price growth. See Appendix A.3 for a detailed description of the data sources and transformations.

As a preliminary examination of the data, we conduct the cross-sectional dependence (CD) test developed by Pesaran (2015a) on the rate of change in deseasonalized real house prices. The deseasonalization is performed by regressing the nominal house price changes on seasonal dummies and an intercept for each MSA. The CD statistic is 1364.110 (with an estimated average pairwise correlation coefficient of 0.406), which far exceeds the 5% critical value of 1.96 and strongly rejects the null hypothesis of weak cross-sectional dependence. Additionally, we compute the exponent of cross-sectional dependence proposed by Bailey, Kapetanios, and Pesaran (2016b) and obtain a value of 1.000 (standard error 0.024). An exponent in the range $[3/4, 1]$ indicates fairly strong cross-sectional dependence, whereas a value in $[1/2, 3/4)$ implies weak dependence of varying degrees. Accordingly, both the CD statistic and the estimated exponent clearly indicate strong cross-sectional dependence in real house price changes; hence, it is imperative to incorporate common factors into standard spatial models, which capture only weak cross-sectional dependence.

²⁴ See the results under $\rho = 0.8$ in Yang (2017).
²⁵ Cohen et al. (2016) and Bailey, Holly, and Pesaran (2016a) focus on the house price series itself and do not consider any explanatory variables.
²⁶ The Office of Management and Budget (OMB) periodically revises the MSA delineations to reflect changes in population counts and commuting patterns. There are 381 MSAs in the US as of February 2013. The terms "area" and "MSA" are used interchangeably in the following discussion.

1.6.1 The model

Let $y_{it}$ denote the rate of change in real house prices for area $i$ at time $t$, computed as $y_{it} = \log(P_{it}/\text{CPI}_{it}) - \log(P_{i,t-1}/\text{CPI}_{i,t-1})$, where $P_{it}$ is the house price index and $\text{CPI}_{it}$ is the Consumer Price Index. We consider the following model for house price changes, written in stacked form:

$$\mathbf{y}_{.t} = \rho\mathbf{W}\mathbf{y}_{.t} + (\beta_1\mathbf{I}_N + \theta_1\mathbf{W})\,\%\Delta\text{Population}_{.t} + (\beta_2\mathbf{I}_N + \theta_2\mathbf{W})\,\%\Delta\text{Income}_{.t} + \boldsymbol{\Upsilon}\mathbf{d}_t + \boldsymbol{\Gamma}\mathbf{f}_t + \mathbf{e}_{.t}, \tag{1.60}$$

for $t = 1, 2, \ldots, T$, where $\mathbf{y}_{.t} = (y_{1t}, y_{2t}, \ldots, y_{Nt})'$ is the vector of house price growth rates for all MSAs in period $t$; $\mathbf{d}_t$ is an $m_d \times 1$ vector of observed factors consisting of quarterly dummies and an intercept; $\mathbf{f}_t$ is an $m_f \times 1$ vector of unobserved factors; $\boldsymbol{\Upsilon} = (\boldsymbol{\upsilon}_1, \boldsymbol{\upsilon}_2, \ldots, \boldsymbol{\upsilon}_N)'$ and $\boldsymbol{\Gamma} = (\boldsymbol{\gamma}_1, \boldsymbol{\gamma}_2, \ldots, \boldsymbol{\gamma}_N)'$ are the corresponding matrices of unit-specific factor loadings; and $\mathbf{e}_{.t}$ is a vector of idiosyncratic errors.
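Returning to the preliminary check reported before the model, the sketch below deseasonalizes a balanced $T \times N$ panel by running, MSA by MSA, a regression on an intercept and quarterly dummies, and then computes Pesaran's statistic $CD = \sqrt{2T/(N(N-1))}\sum_{i<j}\hat{\rho}_{ij}$ together with the average pairwise correlation. It is a minimal illustration under stated assumptions: the function names, the time-by-unit array layout, the simulated data, and the use of the fourth quarter as the base category are hypothetical choices, not the code used in this chapter.

```python
import numpy as np


def deseasonalize(y, quarter):
    """Residuals from regressing each column of y (T x N) on an intercept and
    dummies for quarters 1-3 (quarter 4 is the base category; an assumption)."""
    T = y.shape[0]
    D = np.column_stack([np.ones(T)] + [(quarter == q).astype(float) for q in (1, 2, 3)])
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)  # one regression per MSA (column)
    return y - D @ coef


def cd_statistic(e):
    """Pesaran's CD statistic and the average pairwise correlation for a
    balanced T x N panel e (rows = time periods, columns = units)."""
    T, N = e.shape
    rho = np.corrcoef(e, rowvar=False)    # N x N matrix of pairwise correlations
    upper = rho[np.triu_indices(N, k=1)]  # the N(N-1)/2 pairs with i < j
    cd = np.sqrt(2.0 * T / (N * (N - 1))) * upper.sum()
    return cd, upper.mean()


# Usage on simulated data: a common factor produces a very large CD statistic.
rng = np.random.default_rng(0)
T, N = 100, 50
quarter = np.tile([1, 2, 3, 4], T // 4)
panel = rng.standard_normal((T, 1)) + rng.standard_normal((T, N))
cd, rho_bar = cd_statistic(deseasonalize(panel, quarter))
print(f"CD = {cd:.1f}, average correlation = {rho_bar:.3f}")
```

Under the null hypothesis of weak cross-sectional dependence the statistic is approximately standard normal, which is why it is compared with 1.96; equivalently, $CD = \sqrt{TN(N-1)/2}\,\bar{\hat{\rho}}$, so a large average correlation and a large $N$ together produce values such as the one reported above.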
It should be noted that this model accommodates individual fixed effects by including a constant term in $\mathbf{d}_t$ with heterogeneous loadings. $\%\Delta\text{Population}_{.t}$ is the $N \times 1$ vector of percentage changes in population at time $t$, and $\%\Delta\text{Income}_{.t}$ is the vector of percentage changes in real per capita income. Both variables are computed as first differences of natural logarithms. $\mathbf{W}$ is the spatial weights matrix. For generality, model (1.60) also allows for spatial lags of the explanatory variables, namely $\mathbf{W}\,\%\Delta\text{Population}_{.t}$ and $\mathbf{W}\,\%\Delta\text{Income}_{.t}$, which are often referred to as Durbin terms in the literature and capture the interaction effects of the exogenous variables.

When it comes to the specification of $\mathbf{W}$, it is common practice in studies of spatial dependence in housing markets to adopt distance- or contiguity-based weighting schemes. We follow this tradition first and explore other possibilities later. In particular, we assume that contiguity is determined by radial distance and define the "neighbors" of an MSA as the units located within a threshold distance of $d$ miles. Neighbors receive a weight of one and non-neighbors a weight of zero. $\mathbf{W}$ is then row-standardized so that the weights in each row sum to unity. The spatial weights matrix constructed in this way is denoted by $\mathbf{W}_d$. Our analysis takes $d = 100$ miles as a point of departure, which captures potential dependencies within commuting and transport distances of an MSA.

The parameters of interest are $\boldsymbol{\delta} = (\rho, \boldsymbol{\beta}', \boldsymbol{\theta}')'$, where $\boldsymbol{\beta} = (\beta_1, \beta_2)'$ and $\boldsymbol{\theta} = (\theta_1, \theta_2)'$. In what follows, we focus on the efficient GMM estimator of $\boldsymbol{\delta}$ defined by (1.46).²⁷ Specifically, the estimation uses $\mathbf{P}_1 = \mathbf{W}$ and $\mathbf{P}_2 = \mathbf{W}^2 - \text{Diag}(\mathbf{W}^2)$ in the quadratic moments and $\mathbf{Q}^{(2)}_{.t} = (\mathbf{X}_{.t}, \mathbf{W}\mathbf{X}_{.t}, \mathbf{W}^2\mathbf{X}_{.t})$ as instruments in the linear moments, where $\mathbf{X}_{.t} = (\%\Delta\text{Population}_{.t}, \%\Delta\text{Income}_{.t})$, for $t = 1, 2, \ldots, T$.
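The construction of $\mathbf{W}_d$ and of the matrices entering the GMM moments can be sketched as below. Two choices are our assumptions rather than the chapter's: "radial distance" is taken to be the great-circle (haversine) distance between MSA coordinates, and an MSA with no neighbor within $d$ miles keeps a row of zeros, since row-standardization is undefined for it. The helper names and the coordinates in the usage lines are hypothetical.

```python
import numpy as np

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius (assumed distance metric)


def haversine_miles(lat, lon):
    """Pairwise great-circle distances (miles) between points given in degrees."""
    lat, lon = np.radians(lat), np.radians(lon)
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = np.sin(dlat / 2) ** 2 + np.cos(lat[:, None]) * np.cos(lat[None, :]) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def radial_weights(lat, lon, d):
    """Row-standardized W_d: unit j (j != i) is a neighbor of i if dist(i, j) <= d."""
    dist = haversine_miles(lat, lon)
    W = ((dist <= d) & ~np.eye(len(lat), dtype=bool)).astype(float)
    rs = W.sum(axis=1, keepdims=True)
    # Units without neighbors keep a zero row (assumption: isolates are left as is).
    return np.divide(W, rs, out=np.zeros_like(W), where=rs > 0)


def gmm_moment_matrices(W, X):
    """P1 = W, P2 = W^2 - Diag(W^2), and instruments Q = (X, WX, W^2 X) for one period."""
    W2 = W @ W
    P2 = W2 - np.diag(np.diag(W2))
    Q = np.hstack([X, W @ X, W2 @ X])
    return W, P2, Q


# Usage with four hypothetical locations (two close together, two far away).
lat = np.array([34.05, 34.15, 36.17, 40.71])
lon = np.array([-118.24, -118.14, -115.14, -74.01])
W = radial_weights(lat, lon, d=100.0)
X = np.array([[0.5, 1.2], [0.3, 0.8], [1.1, 2.0], [0.2, 0.4]])  # hypothetical X_.t
P1, P2, Q = gmm_moment_matrices(W, X)
```

Subtracting the diagonal in $\mathbf{P}_2$ matters because $\mathbf{W}^2$ generally has nonzero diagonal elements even when $\mathbf{W}$ does not.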
Table 1.5 summarizes the estimation results of model (1.60) based on $\mathbf{W} = \mathbf{W}_{100}$. Findings using other specifications of $\mathbf{W}$ are discussed later. In column (1), the Durbin terms are excluded and the unobserved factors are proxied by cross-sectional averages of both the dependent and the individual-specific regressors across all MSAs.²⁸ The estimated spatial coefficient is positive and highly significant, with a value of 0.730 (standard error 0.004). As anticipated, higher population and income growth increase house price growth. We then include the Durbin terms and add to the list of factor proxies the cross-sectional averages of $\mathbf{X}^*_{.t}$ across all MSAs, where $\mathbf{X}^*_{.t} = \mathbf{W}\mathbf{X}_{.t}$. As columns (3) and (5) show, population growth displays a positive and significant spatial interaction effect, but real income growth does not. Overall, the estimates of $\rho$ and $\boldsymbol{\beta}$ are very close across columns (1), (3), and (5). The CD statistics of the residuals from these specifications range from −5.11 to −4.93, far smaller in absolute value than the statistic of 1364.110 for the house price growth series itself. The exponents of cross-sectional dependence of the residuals, however, are about 0.73–0.74, which suggests that a moderate degree of cross-sectional dependence may remain unaccounted for.

²⁷ The 2SLS estimates are omitted to save space, since they are very close to the GMM estimates but have larger standard errors, as expected.
²⁸ In the empirical analysis, $\bar{y}^*_t$ is also included as a factor proxy since it may improve the small sample properties of the estimator, where $\bar{y}^*_t = N^{-1}\sum_{i=1}^N y^*_{it}$ and $y^*_{it} = \sum_{j=1}^N w_{ij}y_{jt}$. However, $\bar{y}^*_t$ and $\bar{y}_t$ turn out to be highly correlated for most of the $\mathbf{W}$ matrices we considered; therefore, whether $\bar{y}^*_t$ is included makes little difference to the results.

Table 1.5: Efficient GMM estimation results of model (1.60)

Dependent variable: %ΔHouse price
                                              (1)      (2)      (3)      (4)      (5)      (6)
ρ [W × %ΔHouse price]                       0.730    0.643    0.732    0.648    0.731    0.648
                                          (0.004)  (0.005)  (0.004)  (0.005)  (0.004)  (0.005)
β1 [%ΔPopulation]                           0.380    0.366    0.383    0.432    0.369    0.417
                                          (0.035)  (0.040)  (0.037)  (0.048)  (0.036)  (0.048)
β2 [%ΔIncome per capita]                    0.099    0.093    0.106    0.096    0.111    0.094
                                          (0.007)  (0.007)  (0.007)  (0.008)  (0.008)  (0.008)
θ1 [W × %ΔPopulation]                                         0.078    0.063    0.063    0.069
                                                            (0.031)  (0.036)  (0.031)  (0.037)
θ2 [W × %ΔIncome per capita]                                                   -0.006    0.019
                                                                              (0.010)  (0.012)
Regional unobserved factors                    No      Yes       No      Yes       No      Yes
National unobserved factors                   Yes      Yes      Yes      Yes      Yes      Yes
MSA FE and seasonal dummies                   Yes      Yes      Yes      Yes      Yes      Yes
Residuals
  CD test statistic                        -4.946   -6.532   -4.927   -6.385   -5.111   -6.365
  Exponent of cross-section dependence      0.734    0.674    0.743    0.690    0.734    0.652
                                          (0.031)  (0.019)  (0.030)  (0.019)  (0.027)  (0.019)
R̄²                                          0.808    0.837    0.813    0.844    0.817    0.847
Observations: N = 377, T = 159

Notes: The dependent variable is the rate of change in real house prices, computed as the first difference of the log of real house prices. The explanatory variables are the population growth rate and the real per capita income growth rate, and possibly their spatial lags. MSAs are classified into eight Bureau of Economic Analysis (BEA) Regions. All estimations consider national unobserved factors and include MSA fixed effects (FE) and quarterly dummies. To save space, factor estimates are not reported. The spatial weights matrix is $\mathbf{W} = \mathbf{W}_{100}$. The efficient GMM estimates are obtained by (1.46), using $\mathbf{P}_1 = \mathbf{W}$ and $\mathbf{P}_2 = \mathbf{W}^2 - \text{Diag}(\mathbf{W}^2)$ in the quadratic moments and $\mathbf{Q}^{(2)}_{.t} = (\mathbf{X}_{.t}, \mathbf{W}\mathbf{X}_{.t}, \mathbf{W}^2\mathbf{X}_{.t})$ as IVs in the linear moments. Standard errors are in parentheses. The standard errors for the slope estimates are heteroskedasticity and autocorrelation consistent, with the maximum lag length set to $2\lfloor T^{1/2} \rfloor$. $\bar{R}^2$ is calculated by (1.61).
Therefore, we next consider local (regional) unobserved factors in addition to global (national) factors, and investigate whether strong dependence can be eliminated more effectively.²⁹ Suppose now that all MSAs are classified into $R$ regions. The model can still be represented by (1.60), but the observations are now ordered by region. Specifically, the $N \times 1$ vector of house price changes can be written as $\mathbf{y}_{.t} = (\mathbf{y}'_{.1t}, \mathbf{y}'_{.2t}, \ldots, \mathbf{y}'_{.Rt})'$, where $\mathbf{y}_{.rt} = (y_{1rt}, y_{2rt}, \ldots, y_{N_r rt})'$ is the $N_r \times 1$ vector of observations for the $r$th region, $r = 1, 2, \ldots, R$. $N_r$ is the number of MSAs in region $r$, so that $\sum_{r=1}^{R} N_r = N$. Observations on the independent variables and the spatial weights are sorted accordingly. The latent factors $\mathbf{f}_t$ are now assumed to have a hierarchical structure, namely $\mathbf{f}_t = (\mathbf{f}'_{g,t}, \mathbf{f}'_{l,t})'$, where $\mathbf{f}_{g,t}$ is an $m_g \times 1$ vector of global factors, $\mathbf{f}_{l,t}$ is an $m_l \times 1$ vector of local factors, and $m_g + m_l = m_f$. The associated factor loadings are partitioned as $\boldsymbol{\Gamma} = (\boldsymbol{\Gamma}_g, \boldsymbol{\Gamma}_l)$, where $\boldsymbol{\Gamma}_g = (\boldsymbol{\gamma}_{g,1}, \boldsymbol{\gamma}_{g,2}, \ldots, \boldsymbol{\gamma}_{g,N})'$ is the $N \times m_g$ matrix of loadings on the national factors, and $\boldsymbol{\Gamma}_l = (\boldsymbol{\Gamma}'_{l,1}, \boldsymbol{\Gamma}'_{l,2}, \ldots, \boldsymbol{\Gamma}'_{l,R})'$ is an $N \times m_l$ matrix, with $\boldsymbol{\Gamma}_{l,r} = (\boldsymbol{\gamma}_{l,1r}, \boldsymbol{\gamma}_{l,2r}, \ldots, \boldsymbol{\gamma}_{l,N_r r})'$ being the $N_r \times m_l$ matrix of factor loadings for the $r$th region, $r = 1, 2, \ldots, R$.

The proposed GMM estimation procedure can easily accommodate regional unobserved factors by replacing them with cross-sectional averages, computed over the MSAs in each region, of the observations on both the dependent and the individual-specific independent variables. Columns (2), (4), and (6) of Table 1.5 report the estimation results when both regional and national latent factors are taken into account.

²⁹ Bailey, Holly, and Pesaran (2016a) also consider regional effects, but they do not show the impact of eliminating regional factors on the estimated intensity of spatial dependence.
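A minimal sketch of how the national and regional factor proxies could be assembled, assuming the data are held as a $T \times N$ array for the dependent variable, a $T \times N \times k$ array for the regressors, and a region label for each MSA. The function names and simulated inputs are hypothetical, and the averages of spatially lagged variables mentioned in footnote 28 are omitted for brevity.

```python
import numpy as np


def factor_proxies(y, X, region):
    """National and regional cross-sectional averages of (y, X).

    y: T x N dependent variable; X: T x N x k regressors; region: length-N labels.
    Returns {i: T x 2(k+1) matrix} = [national averages | averages over i's region].
    """
    Z = np.concatenate([y[:, :, None], X], axis=2)  # T x N x (k+1)
    national = Z.mean(axis=1)                        # T x (k+1)
    regional = {r: Z[:, region == r, :].mean(axis=1) for r in np.unique(region)}
    return {i: np.hstack([national, regional[region[i]]]) for i in range(y.shape[1])}


def defactor(v, H):
    """Residual of the T-vector v after projection on the columns of H."""
    coef, *_ = np.linalg.lstsq(H, v, rcond=None)
    return v - H @ coef


# Usage: two hypothetical regions of three MSAs each.
rng = np.random.default_rng(1)
T, N, k = 20, 6, 2
y = rng.standard_normal((T, N))
X = rng.standard_normal((T, N, k))
region = np.array([0, 0, 0, 1, 1, 1])
H = factor_proxies(y, X, region)
resid_0 = defactor(y[:, 0], np.column_stack([np.ones(T), H[0]]))
```

Each MSA thus receives the same national proxies but region-specific local proxies, which mirrors the hierarchical structure $\mathbf{f}_t = (\mathbf{f}'_{g,t}, \mathbf{f}'_{l,t})'$.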
We group the 377 MSAs into $R = 8$ regions based on the geographical classification of the Bureau of Economic Analysis (BEA).³⁰ Compared with the earlier results that did not allow for regional factors, both the estimated spatial coefficients and the exponents of cross-sectional dependence of the residuals decline. This suggests that regional common shocks contribute to the strong cross-sectional dependence in US house price changes, and that the strength of spatial connections is overestimated if strong dependence is not effectively eliminated. In addition, after the inclusion of regional factors, the spatial interaction effect of population growth is no longer significant at the 5% level, and that of income growth remains insignificant. Moreover, the values of $\bar{R}^2$ indicate that the model's goodness of fit improves when regional effects are considered, where $\bar{R}^2$ is computed as

$$\bar{R}^2 = 1 - \hat{\sigma}^2_{res}/\hat{\sigma}^2_{tot}, \tag{1.61}$$

with

$$\hat{\sigma}^2_{tot} = [N(T-1)]^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}(y_{it} - \bar{y}_{i.})^2, \qquad \hat{\sigma}^2_{res} = [N(T - k_{cs} - k_d) - k_z]^{-1}\sum_{i=1}^{N}(\mathbf{y}_{i.} - \mathbf{Z}_{i.}\hat{\boldsymbol{\delta}})'\,\bar{\mathbf{M}}\,(\mathbf{y}_{i.} - \mathbf{Z}_{i.}\hat{\boldsymbol{\delta}}),$$

where $\bar{y}_{i.} = T^{-1}\sum_{t=1}^{T} y_{it}$, $\mathbf{y}_{i.} = (y_{i1}, y_{i2}, \ldots, y_{iT})'$, $\mathbf{Z}_{i.} = (\mathbf{y}^*_{i.}, \mathbf{X}_{i.}, \mathbf{X}^*_{i.})$ is a $T \times k_z$ matrix of regressors, $k_d$ is the number of observed factors, and $\bar{\mathbf{M}}$ is the de-factoring matrix built from the $T \times k_{cs}$ matrix of cross-sectional averages.³¹ Based on these comparisons, we conclude that column (2) provides the best estimation results among the specifications in Table 1.5; it points to a significant neighborhood effect in house price changes, with an estimated spatial coefficient of 0.643 (0.005).³²

³⁰ See Appendix A.3 for details about the Bureau of Economic Analysis (BEA) Regions.

Care must be taken when interpreting the estimates of $\boldsymbol{\beta}$ and $\boldsymbol{\theta}$ in model (1.60), as they do not directly measure the marginal effects of the independent variables on house price changes.
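The fit measure (1.61) can be computed as in the following sketch. As an illustrative assumption, the de-factoring matrix for each unit is taken to be the residual-maker $\mathbf{I}_T - \mathbf{H}(\mathbf{H}'\mathbf{H})^{+}\mathbf{H}'$ of a hypothetical proxy matrix $\mathbf{H}$ (observed factors plus cross-sectional averages), a separate matrix is allowed per unit so that region-specific proxies can be used, and $\mathbf{Z}_{i.}$ is stored as an $N \times T \times k_z$ array; the function names are ours.

```python
import numpy as np


def annihilator(H):
    """I_T - H (H'H)^+ H': removes from a T-vector its projection on the columns of H."""
    return np.eye(H.shape[0]) - H @ np.linalg.pinv(H)


def r2_bar(y, Z, delta_hat, M_list, k_cs, k_d):
    """R-bar^2 = 1 - s2_res / s2_tot, following (1.61).

    y: T x N outcomes; Z: N x T x k_z regressors Z_i.; M_list: one T x T matrix per unit.
    """
    T, N = y.shape
    k_z = Z.shape[2]
    s2_tot = ((y - y.mean(axis=0)) ** 2).sum() / (N * (T - 1))
    rss = 0.0
    for i in range(N):
        u = y[:, i] - Z[i] @ delta_hat  # y_i. - Z_i. delta_hat
        rss += u @ M_list[i] @ u
    s2_res = rss / (N * (T - k_cs - k_d) - k_z)
    return 1.0 - s2_res / s2_tot


# Usage: intercept-only proxies (k_d = 1, k_cs = 0) and delta_hat = 0.
rng = np.random.default_rng(2)
T, N = 40, 5
y = rng.standard_normal((T, N))
Z = rng.standard_normal((N, T, 1))
M = annihilator(np.ones((T, 1)))
print(r2_bar(y, Z, np.zeros(1), [M] * N, k_cs=0, k_d=1))
```

In this toy case the residual and total sums of squares coincide, so the printed value equals $-1/194$ up to rounding: only the degrees-of-freedom correction separates $\hat{\sigma}^2_{res}$ from $\hat{\sigma}^2_{tot}$.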
An important feature of SAR models is that a change in an explanatory variable for one unit affects not only the dependent variable of that unit but also the dependent variables of other cross-section units. The former is known as the direct effect and the latter as the indirect, or spillover, effect. Both effects generally vary across cross-section units. Therefore, to determine the marginal effects of population and income growth on house price changes, we calculate the summary measures of direct and indirect effects proposed by LeSage and Pace (2009). The average direct effect of the $k$th explanatory variable ($k = 1, 2$) is the average of the diagonal elements of $\boldsymbol{\Pi}_k$, where

$$\boldsymbol{\Pi}_k = (\mathbf{I}_N - \rho\mathbf{W})^{-1}(\beta_k\mathbf{I}_N + \theta_k\mathbf{W}), \tag{1.62}$$

and the average indirect effect is the average row sum of the off-diagonal elements of $\boldsymbol{\Pi}_k$. It can be seen from (1.62) that imposing $\boldsymbol{\theta} = \mathbf{0}$ implies that the ratio of direct to indirect effects is the same for every explanatory variable, which may be too restrictive; hence, model (1.60) includes the Durbin terms. To test whether the direct and spillover effects are significant, we compute their standard errors by simulation, because the effects are complicated functions of the parameters.³³

Table 1.6 shows the estimated average direct and spillover effects of population and income growth on house price growth, based on the estimates in Table 1.5. The average total effect is the sum of the average direct and indirect effects. When both national and regional unobserved factors are considered, the specification in column (2) of Table 1.5 outperforms its counterparts. When only national factors are taken into account, the preferred specification is given by column (3). We compute the effects of the explanatory variables based on these two sets of estimates, respectively.

³¹ Specifically, $k_d = 4$ because the observed factors consist of three quarterly dummies and an intercept. The values of $k_z$ and $k_{cs}$ vary with the model specification, that is, with whether Durbin terms are included and whether regional factors are considered. This measure of model fit in the presence of unobserved factors follows the suggestion of Holly et al. (2010, p. 164).
³² Standard error in parentheses.
³³ See LeSage and Pace (2009) and Section 2.7 of Elhorst (2014) for detailed discussions of the computation.

Table 1.6: Average direct and indirect effects of population and income growth on house price changes

                                  Direct  Indirect     Total
Considering both national and regional factors
%ΔPopulation                       0.431     0.571     1.002
                                 (0.047)   (0.063)   (0.110)
%ΔIncome per capita                0.110     0.146     0.256
                                 (0.009)   (0.012)   (0.020)
Considering national factors only
%ΔPopulation                       0.518     1.153     1.672
                                 (0.046)   (0.112)   (0.149)
%ΔIncome per capita                0.135     0.249     0.384
                                 (0.009)   (0.017)   (0.026)

Notes: The effects of the explanatory variables, taking both national and regional factors into account, are computed from the estimates in column (2) of Table 1.5. When regional factors are neglected, the effects are computed using the estimates in column (3) of the same table. Bootstrapped standard errors based on 1,000 iterations are in parentheses. See also the notes to Table 1.5.

It is not surprising that the estimated indirect effects in Table 1.6 are much larger when regional factors are neglected, since a relatively stronger degree of cross-sectional dependence in house prices is then left uncontrolled. Both population growth and per capita income growth have positive and significant direct and indirect effects on house price changes. Specifically, using the estimates obtained under a hierarchical factor structure, a 1% increase in population growth in an MSA is predicted, on average, to raise house price growth by 0.43% in the MSA itself and by 0.57% in its neighboring MSAs, holding other covariates fixed.
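The summary measures based on (1.62) can be sketched as follows. The function returns point estimates only; the simulation or bootstrap behind the standard errors in Table 1.6 is not reproduced, and the ring-shaped $\mathbf{W}$ in the usage lines is a hypothetical toy example rather than one of the matrices used in this chapter.

```python
import numpy as np


def average_effects(W, rho, beta_k, theta_k=0.0):
    """LeSage-Pace summary measures for regressor k, based on
    Pi_k = (I_N - rho W)^{-1} (beta_k I_N + theta_k W)."""
    N = W.shape[0]
    Pi = np.linalg.solve(np.eye(N) - rho * W, beta_k * np.eye(N) + theta_k * W)
    direct = np.trace(Pi) / N    # average diagonal element
    total = Pi.sum() / N         # average row sum
    indirect = total - direct    # average row sum of the off-diagonal elements
    return direct, indirect, total


# Toy example: four units on a ring, each with two equally weighted neighbors.
W = np.zeros((4, 4))
for i in range(4):
    W[i, (i - 1) % 4] = W[i, (i + 1) % 4] = 0.5
direct, indirect, total = average_effects(W, rho=0.4, beta_k=1.0)
```

When every row of $\mathbf{W}$ sums to one, the average total effect reduces to $(\beta_k + \theta_k)/(1 - \rho)$, which gives a quick check on the output; rows of zeros (units without neighbors) break this identity. The direct effect exceeds $\beta_k$ here because of feedback through neighbors.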
In comparison, a 1% increase in income growth has much smaller direct and indirect effects on house price growth, which are estimated to be 0.11% and 0.15%, respectively. The spillover effect of population growth to neighboring MSAs appears to be slightly higher than its direct effect, while both effects of income growth are of similar magnitude. 1.6.2 Different spatial weights matrix specifications We now turn to inspecting the robustness of our findings to various specifications of the spatial weights matrix. Three types of weights are considered, which are constructed based on distance, migration flows, and pairwise correlations, respectively. In all of the following analysis, we will control for unobserved factors 45 at both national and regional levels, as the earlier discussion reveals the importance of both effects on the cross-sectional dependence in house price changes. We start with comparing the estimation results of model (1.60) using different radial distance matrices, W d . In specific, we consider three threshold values,d = 75, 100, and 125 miles. The estimation results are presented in columns (1) to (3) of Table 1.7, respectively. Overall, the estimates are found to be very stable as the cutoff distance varies. The estimated strength of spatial dependence rises slightly from 0.573 (0.005) to 0.693 (0.005) as the neighborhood boundary expands from 75 to 125 miles. This change is reasonable because more units are considered as neighbors of an MSA and their influences are taken into account. The average number of neighbors per MSA is 3.31 when W = W 75 , as compared to 8.65 when W = W 125 . In addition, the estimated coefficients of population and income growth remain in relatively narrow ranges asd changes. Both variables are highly significant and of reasonable magnitude. 
34 Since many economic and demographic factors apart from geographical proximity may contribute to the cross-sectional dependence in house prices across MSAs, it is interesting to consider spatial weights based on other measures of closeness. In particular, the MSA-to-MSA migration flows are important indicators of the strength of interconnections. We construct a migration weights matrix, denoted by W m , of which the (i,j) th element represents the share of movers from areaj to areai of the total number of movers to areai. We do not consider non-movers or migration flows from/to non-MSAs. Notice that W m is an asymmetric matrix in which the immigration flow to each MSA is normalized to unity. The data on inter-MSA migration flows were introduced as part of the American Community Survey (ACS) dataset since 2009; therefore, W m is constructed using the migration data from the 2010–2014 ACS 5-year estimates. After dropping the estimates with high margin of errors, each MSA ends up having an average of 4.46 “neighbors.” 35 Since the most dominant migration ties are likely to be stable over a long period of time, the time invariability of W m does not give cause for concern. The estimation results of model (1.60) using W m are reported in column (4) of Table 1.7. Not surpris- ingly, we find strong evidence of spatial dependence based on migration relations. The estimated spatial parameter is significantly positive, slightly higher than the estimates using distance-based weights. The estimated coefficients of population and income growth are very close to the previous results, and both are 34 The spatially lagged population and income growth turn out to be insignificant in three cases and hence are excluded from the regressions. 35 Details on the construction of Wm and its characterization are given in Appendix A.3. 
46 Table 1.7: Efficient GMM estimation results of model (1.60) using different spatial weights matrices Spatial weights matrix Distance Migration Pairwise correlations W 75 W 100 W 125 W m ˆ W + , ˆ W − %ΔHouse price (1) (2) (3) (4) (5) ρ [W× %ΔHouse price] 0.573 0.643 0.693 0.772 (0.005) (0.005) (0.005) (0.005) ρ + [ ˆ W + × %ΔHouse price] 0.715 (0.005) ρ − [ ˆ W − × %ΔHouse price] -0.308 (0.005) β 1 [%ΔPopulation] 0.432 0.366 0.294 0.230 0.147 (0.052) (0.040) (0.036) (0.031) (0.023) β 2 [%ΔIncome per capita] 0.099 0.093 0.089 0.075 0.049 (0.008) (0.007) (0.007) (0.007) (0.005) Natl. & Rgnl. unobserved factors Yes Yes Yes Yes Yes MSA FE and seasonal dummies Yes Yes Yes Yes Yes Residuals CD test statistic -6.678 -6.532 -7.127 -3.114 -6.846 Exponent of cross-section dependence 0.668 0.674 0.624 0.728 0.631 (0.023) (0.019) (0.017) (0.021) (0.014) Avg. no. neighbors 3.31 5.73 8.65 4.46 11.01 [ ˆ W + ], 8.02 [ ˆ W − ] ¯ R 2 0.833 0.837 0.833 0.840 0.908 Observations N = 377, T = 159 Notes: All estimations consider both national and regional (Natl. & Rgnl.) unobserved factors and also include MSA fixed effects (FE) and quarterly dummies. To save space, factor estimates are not reported. W d denotes radial distance weights matrix with threshold distanced, whered = 75, 100, and 125 miles. Wm denotes weights matrix constructed from MSA-to-MSA migration flows. ˆ W + and ˆ W − denote weights matrices constructed from significant positive and negative pairwise correlations of de-factored house price changes, respectively. See also the notes to Table 1.5. significantly different from zero. Both residual diagnostics and the value of ¯ R 2 indicate that the model is a good fit. The similarities between the results using distance and migration weights are quite striking, given that around 65% of the migration flows occur between MSAs located 100 miles apart. 
36 The third type of spatial weights matrix we consider is created by a data-driven approach that detects significant bilateral relations using house price series itself. Essentially, this approach equates significant pairwise correlations with significant connections. Bailey, Holly, and Pesaran (2016a) suggest filtering out strong cross-section dependence from a series first, then applying a regularization or thresholding method to 36 See Figure A.2 in Appendix A.3 for the distribution of distance between the area of origin and the area of destination. 47 create sparse weights matrices. We follow this idea and construct weights matrices based on significantly positive and negative pair-wise correlations of de-factored house price changes, which are denoted by ˆ W + and ˆ W − , respectively. Specifically, the de-factoring process is conducted by regressing the house price growth rate for each MSA on an intercept, quarterly dummies, and cross-sectional averages of the dependent and explanatory variables at both national and regional levels. Then, significant connections are identified by applying the multiple testing procedure developed by Bailey et al. (2014) to the sample correlation matrix of the first-step residuals at the 5% significance level. If the corresponding correlation coefficient is positively significant, the element of ˆ W + is set to one, otherwise to zero. ˆ W − is created similarly but based on significantly negative correlations. ˆ W + and ˆ W − are then row-standardized so that each row sums to one. 37 With the correlation-based weights matrices, we are able to distinguish between the intensity of positive and negative spatial connections. Let us now consider the following model, y .t =ρ + ˆ W + y .t +ρ − ˆ W − y .t +β 1 %ΔPopulation .t +β 2 %ΔIncome .t + Υd t + Γf t + e .t , (1.63) where, as before, d t includes an intercept and quarterly dummies, and f t contains national and regional unobserved factors. 
Table 1.7, column (5) presents the estimation results of model (1.63). The estimated $\rho^{+}$ and $\rho^{-}$ have the correct signs, and both are highly significant. The magnitude of the positive spatial effect is notably greater than that of the negative effect, with $\hat{\rho}^{+} = 0.715$ (0.005) and $\hat{\rho}^{-} = -0.308$ (0.005). The coefficients of population and income growth are again found to be positive and significant, with slightly smaller magnitudes than those obtained using the distance and migration weights matrices. The CD statistic is low, and the cross-section exponent is close to the borderline case of 0.5, suggesting that only weak dependence is left in the residuals. The model fits the data very well, as implied by the value of $\bar{R}^2$.

[37] See Appendix A.3 for a more detailed characterization and comparison of different spatial weights matrices.
[38] We have also considered the Durbin terms, but they are found to be insignificant.

1.7 Concluding remarks

This paper considers panel data models in the presence of two sources of cross-sectional dependence: endogenous spatial interactions and common effects. It derives identification conditions and proposes a number of estimators for the joint model. The estimation approach replaces the unobserved common factors with cross-sectional averages and utilizes instrumental variables and quadratic moment conditions to cope with the endogenous spatial effects. The proposed estimators are shown to be consistent as long as $N$ is large, irrespective of the size of $T$. The asymptotic distributions of these estimators are free of nuisance parameters, provided that $T$ is of a smaller order of magnitude than $N$, as $(N, T) \to \infty$ jointly. Compared with the maximum likelihood approach, the number of latent factors need not be estimated, and more general forms of serial correlation in the disturbances are permitted.
A wide range of Monte Carlo exercises lends further support to the theoretical results regarding identification and estimation. A detailed empirical application to real house price changes reveals that significant spatial dependence exists across MSAs in the US, and demonstrates the importance of adequately removing common effects when analyzing the strength of spatial interconnections. The study also identifies significant effects of population and income growth on house price growth. Besides geographical proximity, we also consider spatial weights based on migration flows and on pairwise correlations of de-factored house price changes. The main findings remain valid under these different measures of connections. These empirical results highlight the need to consider spatial spillover effects in housing markets when making policy and business decisions.

An important next step for future research is to incorporate rich spatio-temporal dynamics into the model specifications. Such extensions would provide a full characterization of how an economic phenomenon transmits across space and over time, and would enable us to distinguish between short-term and long-term spillover effects. Another possible extension of the model is to include slope heterogeneity, which is especially relevant for studies covering different countries, regions, and industries. The present paper is also related to the recent study by Pesaran and Yang (2016), who consider networks with dominant units and common factors. The identification and estimation of these models, in which the spatial weights matrix may have unbounded column sums, are of practical importance and worth further investigation.

Chapter 2
Econometric Analysis of Production Networks with Dominant Units[1]
[1] This chapter was co-authored with M. Hashem Pesaran.

2.1 Introduction

Over the past decade, there has been renewed interest in production networks and the role that individual production units (firms/sectors) can play in the propagation of shocks across the economy. This literature builds on the multisectoral model of real business cycles pioneered by Long and Plosser (1983), and draws from a variety of studies on social and economic networks, including network games, cascades, and micro foundations of macro volatility. Notable theoretical contributions in this area include Acemoglu, Carvalho, Ozdaglar, and Tahbaz-Salehi (2012), Horvath (1998, 2000), Gabaix (2011), Acemoglu, Ozdaglar, and Tahbaz-Salehi (2016), and Siavash (2016). Empirical evidence for such propagation mechanisms is presented in Foerster, Sarte, and Watson (2011), Acemoglu, Akcigit, and Kerr (2016), and Carvalho, Nirei, Saito, and Tahbaz-Salehi (2016). One important issue in this literature relates to the conditions under which sector-specific shocks are likely to have lasting aggregate (macro) effects. Similar issues arise in financial networks, where it is of interest to ascertain whether an individual bank can be considered "too big to fail". Recent reviews are provided by Carvalho (2014) and Acemoglu, Ozdaglar, and Tahbaz-Salehi (2016).

In this paper we consider a production network with unobserved common technological factors, and derive an associated price network, dual to the production network, which we use to obtain an exact characterization of the effect of sector-specific shocks on aggregate output. We show that sector-specific shocks have aggregate effects if there are "dominant" sectors, in the sense that their outdegrees are not bounded in the number of production units, $N$, in the economy. The outdegree of a sector is defined as the share of that sector's output used as intermediate inputs by all other sectors in the economy. The degree of dominance (or pervasiveness) of a sector is measured by the exponent $\delta$ that controls the rate at which the outdegree of the sector in question rises with $N$.
This measure turns out to be the same as the exponent of cross-sectional dependence introduced in Bailey et al. (2016b) for the analysis of cross-section dependence in panel data models with large cross-section and time dimensions.

Our approach differs from Acemoglu et al. (2012) in three important respects. First, we provide a more general setting that allows for unobserved common factors, and derive a spatial model in sectoral prices that can be taken directly to the data. We establish a one-to-one relationship between the pervasiveness of price shocks and aggregate output shocks. Second, Acemoglu et al. (2012) express aggregate output as a reduced-form function of the sector-specific shocks, from which they are only able to derive a lower bound to the decay rate of sector-specific shocks on aggregate outcomes. They consider the first- and second-order effects, and acknowledge that ignoring higher-order interconnections might bias the results. In contrast, the present paper provides an exact expression for the effects of sector-specific shocks on aggregate fluctuations. It shows that the rate of decay of these effects depends only on the extent to which the dominant unit (sector) is pervasive, namely the one with the largest $\delta$, denoted by $\delta_{\max}$. We derive upper as well as lower bounds for the rate of convergence of the variability of aggregate output in terms of $N$. We show that these bounds converge at the same rate, and thus establish an exact rate of convergence for aggregate output variability. Finally, Acemoglu et al. (2012) do not identify the dominant unit(s). Instead, they approximate the tail distribution of the outdegrees, for some given cut-off value, by a power law distribution and provide estimates of the shape parameters. By contrast, we propose a nonparametric approach. It is applicable irrespective of whether the outdegrees are Pareto distributed, and it does not require knowing the cut-off value above which the Pareto tail behavior begins.
The inverse of the proposed estimator of $\delta_{\max}$ is an extremum estimator of the shape parameter of the Pareto distribution, $\beta$. It is simple to compute and is given by the average log of the largest outdegree relative to all other outdegrees, scaled by the size of the network, $N$. In addition, our approach allows us to identify the most dominant units and their degrees of dominance, $\delta_{(i)}$, where $\delta_{(1)} = \delta_{\max} \geq \delta_{(2)} \geq \cdots$, for all $\delta_{(i)} > 1/2$. Small sample properties of the extremum estimator are investigated by Monte Carlo techniques and are shown to be satisfactory. We compare estimates of the shape parameter $\beta$ based on the Pareto distribution with estimates based on the inverse of the extremum estimator of $\delta_{\max}$. The latter perform much better, particularly when $N$ is large (300+). Furthermore, the extremum estimator is shown to perform well even under a Pareto tail distribution. By contrast, the commonly used estimators of the shape parameter, $\beta$, display substantial biases if the true underlying distribution is non-Pareto.

Application of our estimation procedure to US input-output tables over the period 1972-2002 yields yearly estimates of $\delta_{\max}$ that lie between 0.72 and 0.82. These estimates are by and large close to the inverse of the estimates of the shape parameter $\beta$ considered in Acemoglu et al. (2012) when a 20% cut-off value is used. However, the log-log regression estimates of $\beta$ tend to be highly sensitive to the choice of the cut-off values and to the different orders of interconnections considered. To provide more reliable estimates, we also conduct panel estimation. We find that the largest estimate of $\delta_{\max}$ is about 0.76 for the sub-sample covering 1972-1992 and 0.72 for the sub-sample covering 1997-2007.
Quite remarkably, we find that the estimates of $\delta_{\max}$ and the identity of the dominant sector are rather stable throughout the period from 1972 to 2007. The wholesale trade sector is identified as the most dominant sector in all years except 2002, when it is estimated to be the second most dominant sector. Our estimates also suggest that no sector in the US economy is strongly dominant. Strong dominance would require the value of $\delta_{\max}$ to be close to unity, whilst the largest estimate we obtain is around 0.8. Overall, our analyses support the view that sector-specific shocks have some macro effects, but we do not find such effects to be sufficiently strong.

The rest of this chapter is organized as follows. Section 2.2 presents the production network. Section 2.3 derives the associated price network. Section 2.4 introduces the concepts of strongly dominant, weakly dominant, and non-dominant units, and of network pervasiveness. Section 2.5 derives exact conditions under which micro (sectoral) shocks can lead to aggregate fluctuations. Section 2.6 shows the relation between the degree of network pervasiveness and the shape parameter of the power law distribution. Section 2.7 introduces the extremum estimator, derives its asymptotic distribution, and shows its robustness to the choice of the underlying distribution. Section 2.8 provides evidence on the small sample properties of the alternative estimators of $\delta_{\max}$ using a number of Monte Carlo experiments. Section 2.9 presents the empirical application, and Section 2.10 concludes. Some of the mathematical details and a brief discussion of data sources are provided in the Appendix. An extension of the extremum estimator to large $N$ and $T$ panels, and additional Monte Carlo results, are provided in an Online Supplement.

Notations: The total number of cross-section units (sectors) in the economy is denoted by $N$, which is then decomposed into $m$ dominant units and $n$ non-dominant units.
The number of dominant units is also decomposed into strongly dominant units and weakly dominant units (see Definition 2.1). If $\{f_N\}_{N=1}^{\infty}$ is any real sequence and $\{g_N\}_{N=1}^{\infty}$ is a sequence of positive real numbers, then $f_N = O(g_N)$ if there exists a positive finite constant $K$ such that $|f_N|/g_N \leq K$ for all $N$, and $f_N = o(g_N)$ if $f_N/g_N \to 0$ as $N \to \infty$. If $\{f_N\}_{N=1}^{\infty}$ and $\{g_N\}_{N=1}^{\infty}$ are both positive sequences of real numbers, then $f_N = \Theta(g_N)$ if there exist $N_0 \geq 1$ and positive finite constants $K_0$ and $K_1$ such that $\inf_{N \geq N_0}(f_N/g_N) \geq K_0$ and $\sup_{N \geq N_0}(f_N/g_N) \leq K_1$. $\varrho(A)$ is the spectral radius of the $N \times N$ matrix $A = (a_{ij})$, defined as $\varrho(A) = \max\{|\lambda_i|, i = 1, 2, \ldots, N\}$, where $\lambda_i$ is an eigenvalue of $A$ and $|\lambda_1(A)| \geq |\lambda_2(A)| \geq \cdots \geq |\lambda_N(A)|$. $\|A\|_\infty = \max_{1 \leq i \leq N}\sum_{j=1}^{N}|a_{ij}|$ and $\|A\|_1 = \max_{1 \leq j \leq N}\sum_{i=1}^{N}|a_{ij}|$ are the maximum row sum norm and the maximum column sum norm of matrix $A$, respectively. $K$ is used for a generic finite positive constant not depending on $N$. $\delta_i$ denotes the degree of dominance (or pervasiveness) of unit $i$ in a network, where $i = 1, 2, \ldots, N$, and $0 \leq \delta_i \leq 1$.

2.2 Production network

To show how the two strands of literature on production networks and cross-sectional dependence are related, we begin with a panel version of the input-output model developed in Acemoglu et al. (2012). Our goal is to provide an exact characterization of the effect of unit-specific shocks on aggregate output. We assume that the production of sector $i$ at time $t$, $q_{it}$, is determined by the following Cobb-Douglas production function subject to constant returns to scale,

$$ q_{it} = e^{\alpha u_{it}}\, l_{it}^{\alpha} \prod_{j=1}^{N} q_{ij,t}^{\rho w_{ij}}, \quad \text{for } i = 1, 2, \ldots, N;\; t = 1, 2, \ldots, T, \tag{2.1} $$

where the productivity shock, $u_{it}$, is given by $u_{it} = \varepsilon_{it} + \gamma_i f_t$, and is composed of a sector-specific shock, $\varepsilon_{it}$, and a common technological factor, $f_t$. The factor loading, $\gamma_i$, measures the importance of technological change for sector $i$. Following Bailey et al.
(2016b), we denote the cross-section exponent of the factor loadings by $\delta_\gamma$, defined as the value of $\delta_\gamma$ that ensures

$$ \lim_{N \to \infty} N^{-\delta_\gamma}\sum_{i=1}^{N}|\gamma_i| = c_\gamma > 0, \tag{2.2} $$

where $c_\gamma$ is a finite constant, bounded in $N$. The standard factor model sets $\delta_\gamma = 1$, and treats the common factor as 'strong' or 'pervasive', in the sense that changes in $f_t$ affect all sectors of the economy. But in what follows we shall also consider cases where $\delta_\gamma < 1$. In the case where the factor loadings are random, we shall assume that $E(\gamma_i) = \gamma \neq 0$ and $\mathrm{Var}(\gamma_i) = \sigma_\gamma^2 > 0$. The analysis can be easily extended to allow for multiple factors without additional complexity.

In line with Acemoglu et al. (2012), we shall assume that the sector-specific shocks are cross-sectionally independent with zero means and finite variances, $\mathrm{Var}(\varepsilon_{it}) = \sigma_i^2$, such that $0 < \underline{\sigma}^2 < \sigma_i^2 < \bar{\sigma}^2 < K < \infty$. The independence assumption is not necessary and can be relaxed by assuming that the errors are cross-sectionally weakly dependent. We also assume that the $\varepsilon_{it}$ are serially uncorrelated, although this is not essential for our main theoretical results. Without loss of generality, we assume that common and sector-specific shocks are uncorrelated.

Returning to the other factors in the production function, $l_{it}$ denotes the labor input, $q_{ij,t}$ is the amount of output of sector $j$ used in the production of sector $i$, $\alpha$ denotes the share of labor, $\rho = 1 - \alpha$, and $\rho w_{ij}$ is the share of the $j$th output in the $i$th sector. The amounts of final goods, $c_{it}$, are defined by

$$ c_{it} = q_{it} - \sum_{j=1}^{N} q_{ji,t}, \quad i = 1, 2, \ldots, N, \tag{2.3} $$

which are consumed by a representative household with the Cobb-Douglas preferences

$$ u(c_{1t}, c_{2t}, \ldots, c_{Nt}) = A\prod_{i=1}^{N} c_{it}^{1/N}, \quad A > 0. \tag{2.4} $$

We further assume that the aggregate labor supply, $l_t$, is given exogenously and that labor markets clear, $l_t = \sum_{i=1}^{N} l_{it}$. Let $P_{1t}, P_{2t}, \ldots, P_{Nt}$ be the sectoral equilibrium prices and $\text{Wage}_t$ the equilibrium wage rate, and denote their logarithms by $p_{it} = \log(P_{it})$ and $\omega_t = \log(\text{Wage}_t)$.
Then it can be shown that in the competitive equilibrium the logarithm of the real wage, which is taken as a measure of GDP in the literature, is given by

$$ \omega_t - \bar{p}_t = \mu + (\upsilon_N'\gamma) f_t + \upsilon_N'\varepsilon_t, \tag{2.5} $$

where $\bar{p}_t$ is the aggregate log price index defined by

$$ \bar{p}_t = N^{-1}\sum_{i=1}^{N} p_{it} = N^{-1}\tau_N' p_t, \tag{2.6} $$

$p_t = (p_{1t}, p_{2t}, \ldots, p_{Nt})'$, $\gamma = (\gamma_1, \gamma_2, \ldots, \gamma_N)'$, $\varepsilon_t = (\varepsilon_{1t}, \varepsilon_{2t}, \ldots, \varepsilon_{Nt})'$, and

$$ \upsilon_N = (\upsilon_1, \upsilon_2, \ldots, \upsilon_N)' = \frac{\alpha}{N}\left(I_N - \rho W'\right)^{-1}\tau_N, \tag{2.7} $$

where $W$ is the $N \times N$ matrix $W = (w_{ij})$, and $\tau_N$ is an $N \times 1$ vector of ones. $\mu$ is a constant independent of $f_t$ and $\varepsilon_t$, which is given by

$$ \mu = \alpha^{-1}\left[\alpha\log(\alpha) + \rho\log(\rho) + \rho\sum_{i=1}^{N}\sum_{j=1}^{N}\upsilon_i w_{ij}\log(w_{ij})\right]. $$

The (log) real-wage equation, (2.5), generalizes equation (3) in Acemoglu et al. (2012) by allowing for time variations in prices and technologies. By normalizing $\bar{p}_t$ such that $\bar{p}_t = -\mu$ and ignoring the common factor, Acemoglu et al. (2012) concentrate on $\omega_t = \upsilon_N'\varepsilon_t$ as a measure of aggregate output. They refer to $\upsilon_N$ as the 'influence vector' (their equation (4)), and show that $\upsilon_i \geq 0$ and $\tau_N'\upsilon_N = 1$.[2] They measure aggregate volatility by the standard deviation of aggregate output, namely $[\mathrm{Var}(\upsilon_N'\varepsilon_t)]^{1/2}$, and focus on the asymptotic properties of $\upsilon_N'\varepsilon_t$ as $N \to \infty$. Since $\mathrm{Var}(\upsilon_N'\varepsilon_t) = \upsilon_N'\mathrm{Var}(\varepsilon_t)\upsilon_N$, it follows that

$$ \bar{\sigma}^2\upsilon_N'\upsilon_N \geq \mathrm{Var}(\upsilon_N'\varepsilon_t) \geq \underline{\sigma}^2\upsilon_N'\upsilon_N, $$

and hence the asymptotic properties of $\mathrm{Var}(\upsilon_N'\varepsilon_t)$ are governed by those of $\upsilon_N'\upsilon_N$. Acemoglu et al. (2012, p. 2009) derive a lower bound for $\upsilon_N'\upsilon_N$ and show that[3]

$$ \upsilon_N'\upsilon_N \geq k_{\alpha,0}N^{-1} + k_{\alpha,1}N^{-2}\sum_{j=1}^{N}d_j^2, \tag{2.8} $$

where $k_{\alpha,0}$ and $k_{\alpha,1}$ are finite constants that depend on $\alpha$, and $d_j = \sum_{i=1}^{N}w_{ij}$ is the $j$th column sum of $W$. Therefore, to analyze the limiting behavior of $\upsilon_N'\upsilon_N$, it is sufficient to consider the limiting behavior of $N^{-2}\sum_{j=1}^{N}d_j^2$. This analysis is carried out in some detail by Acemoglu et al. (2012).
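As a numerical illustration of (2.7), the following Python sketch computes the influence vector for a hypothetical network and checks that $\upsilon_i \geq 0$ and $\tau_N'\upsilon_N = 1$. The helper `star_network` is an assumption made for illustration only. In it, one unit receives the weight $\kappa N^{\delta}/(N-1)$ from every other unit, so that its column sum is $\kappa N^{\delta}$, while the remaining units form a ring. The sketch then compares $N\upsilon_N'\upsilon_N$ as $N$ grows. With $\delta = 0$ this quantity stays roughly constant, whereas with a dominant unit it rises with $N$, in line with the role of $N^{-2}\sum_j d_j^2$ in (2.8). The value $\alpha = 0.3$ is arbitrary.

```python
import numpy as np


def influence_vector(W, alpha):
    """v_N = (alpha/N) (I_N - rho W')^{-1} tau_N with rho = 1 - alpha, as in (2.7)."""
    N = W.shape[0]
    rho = 1.0 - alpha
    return (alpha / N) * np.linalg.solve(np.eye(N) - rho * W.T, np.ones(N))


def star_network(N, delta, kappa=1.0):
    """Hypothetical row-standardized W whose unit 0 has outdegree kappa*N**delta."""
    a = kappa * N**delta / (N - 1)       # weight each other unit places on unit 0
    if not 0 < a < 1:
        raise ValueError("choose N, delta, kappa so that 0 < a < 1")
    W = np.zeros((N, N))
    W[1:, 0] = a
    idx = np.arange(1, N)
    W[idx, np.roll(idx, -1)] = 1.0 - a   # ring among the non-dominant units
    W[0, 1:] = 1.0 / (N - 1)             # unit 0 buys evenly from the others
    return W


alpha = 0.3
for delta in (0.0, 0.8):
    for N in (100, 200, 400, 800):
        W = star_network(N, delta)
        v = influence_vector(W, alpha)
        d = W.sum(axis=0)                # outdegrees d_j (column sums)
        # N * v'v and N * (N^{-2} sum_j d_j^2) side by side
        print(delta, N, round(N * v @ v, 3), round((d @ d) / N, 3))
```

Because $\tau_N'\upsilon_N = 1$, the Cauchy-Schwarz inequality already gives $\upsilon_N'\upsilon_N \geq N^{-1}$. The network structure matters only through how far $\upsilon_N'\upsilon_N$ exceeds this floor.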
But as we shall see, it is also important to consider the limiting behavior of individual column sums of $W$, and in particular to identify the ones that rise with $N$, as distinguished from those that are bounded in $N$.

[2] See Appendix A of Acemoglu et al. (2012).
[3] These authors also consider higher-order interconnection terms, which they include on the right-hand side of the lower bound (2.8) for $\upsilon_N'\upsilon_N$, but these terms are dominated by $N^{-2}\sum_{j=1}^{N}d_j^2$.

2.3 Price network

Instead of analyzing aggregate output directly in terms of the sector-specific shocks, we derive a price network which is dual to the production network discussed in Section 2.2. By doing so, we are able to obtain an exact expression for the decay rate of aggregate volatility, rather than just a lower bound. Given sector prices, $P_{1t}, P_{2t}, \ldots, P_{Nt}$, and the wage rate, $\text{Wage}_t$, solving sector $i$'s problem leads to

$$ q_{ij,t} = \frac{\rho w_{ij}P_{it}q_{it}}{P_{jt}}, \tag{2.9} $$

and

$$ l_{it} = \frac{\alpha P_{it}q_{it}}{\text{Wage}_t}. \tag{2.10} $$

Substituting the above results in (2.1) and simplifying yields

$$ p_{it} = \rho\sum_{j=1}^{N}w_{ij}p_{jt} + \alpha\omega_t - b_i - \alpha(\gamma_i f_t + \varepsilon_{it}), \tag{2.11} $$

where the price-specific intercepts, $b_i$, depend only on $\alpha$, $\rho$ and the weight matrix $W$,

$$ b_i = \alpha\log(\alpha) + \rho\log(\rho) + \rho\sum_{j=1}^{N}w_{ij}\log(w_{ij}), \tag{2.12} $$

for $i = 1, 2, \ldots, N$. In cases where $w_{ij} = 0$, we set $w_{ij}\log(w_{ij}) = 0$ as well. In matrix notation, the 'price network' (2.11) can be written as

$$ p_t = \rho Wp_t + \alpha\omega_t\tau_N - (b + \alpha\gamma f_t + \alpha\varepsilon_t), \tag{2.13} $$

where $b = (b_1, b_2, \ldots, b_N)'$. A dual to the price equation in (2.11) can also be obtained by using (2.9) in (2.3), which gives

$$ S_{it} = \rho\sum_{j=1}^{N}w_{ji}S_{jt} + C_{it}, \tag{2.14} $$

where $C_{it} = P_{it}c_{it}$, and $S_{it} = P_{it}q_{it}$ is the sales of sector $i$. The sales equation, (2.14), can also be written as

$$ S_t = \rho W'S_t + C_t, \tag{2.15} $$

where $S_t = (S_{1t}, S_{2t}, \ldots, S_{Nt})'$ and $C_t = (C_{1t}, C_{2t}, \ldots, C_{Nt})'$. Note that $W$ enters as its transpose, $W'$, in (2.15), as compared to the price equations in (2.13).
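The price and sales networks can be checked numerically. The Python sketch below is a minimal illustration with hypothetical values of $W$, $\alpha$, $\gamma$, $f_t$, $\varepsilon_t$, $\omega_t$ and $C_t$. It solves (2.13) for $p_t$ using intercepts from (2.12) and solves the SAR form of the same network for the log price-wage ratios. It then confirms the aggregate identity $\omega_t - \bar{p}_t = \upsilon_N'\xi_t$, with $\xi_t = \alpha^{-1}b + \gamma f_t + \varepsilon_t$, that the text goes on to derive from these equations. Finally, it solves the sales equation (2.15), where $W$ enters transposed.

```python
import numpy as np

rng = np.random.default_rng(1)
N, alpha = 50, 0.4
rho = 1.0 - alpha
I, tau = np.eye(N), np.ones(N)

# Hypothetical row-standardized input-output weights with a zero diagonal.
W = rng.random((N, N))
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)

# Intercepts b_i from (2.12), using the convention w_ij log(w_ij) = 0 when w_ij = 0.
with np.errstate(divide="ignore", invalid="ignore"):
    wlogw = np.where(W > 0, W * np.log(W), 0.0)
b = alpha * np.log(alpha) + rho * np.log(rho) + rho * wlogw.sum(axis=1)

# Hypothetical loadings, common factor, sectoral shocks and log wage.
gamma = rng.normal(1.0, 0.2, N)
f, omega = rng.normal(), 0.3
eps = rng.normal(0.0, 0.1, N)

# Price network (2.13): (I - rho W) p = alpha*omega*tau - (b + alpha*gamma*f + alpha*eps)
p = np.linalg.solve(I - rho * W, alpha * omega * tau - (b + alpha * gamma * f + alpha * eps))

# Log price-wage ratios x = p - omega*tau from the SAR(1) form of the same network
x = np.linalg.solve(I - rho * W, -b - alpha * (gamma * f + eps))

# Influence vector (2.7) and xi_t = b/alpha + gamma*f + eps
v = (alpha / N) * np.linalg.solve(I - rho * W.T, tau)
xi = b / alpha + gamma * f + eps

# Sales network (2.15): W enters transposed, S = (I - rho W')^{-1} C
C = rng.random(N)
S = np.linalg.solve(I - rho * W.T, C)
wage_bill = alpha * S.sum()                     # l_t * Wage_t = alpha * tau' S_t

print(omega - p.mean(), v @ xi)                 # the two sides of the aggregate identity
```

The identity holds exactly because $W\tau_N = \tau_N$ implies $(I_N - \rho W)^{-1}\tau_N = \tau_N/\alpha$. The printed numbers should therefore agree up to rounding error.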
Aggregating (2.10) over $i$, we have

$$ \text{Wage}_t\sum_{i=1}^{N}l_{it} = \alpha\sum_{i=1}^{N}P_{it}q_{it}, \quad \text{or} \quad l_t\,\text{Wage}_t = \alpha\sum_{i=1}^{N}S_{it} = \alpha\tau_N'S_t. \tag{2.16} $$

Also, using (2.15),

$$ S_t = \left(I_N - \rho W'\right)^{-1}C_t, \tag{2.17} $$

where $(I_N - \rho W')^{-1}$ is known as the Leontief inverse.[4] Using (2.17) in (2.16) now yields the following expression for the total wage bill,

$$ l_t\,\text{Wage}_t = \alpha\tau_N'\left(I_N - \rho W'\right)^{-1}C_t. \tag{2.18} $$

[4] A proof that the Leontief matrix is invertible even in the presence of dominant units is provided in Lemma B.1 of Appendix B.1.

Similarly, solving (2.13) for the log-price vector, $p_t$, and applying Lemma B.1 in Appendix B.1, we have

$$ p_t = \alpha\omega_t(I_N - \rho W)^{-1}\tau_N - \alpha(I_N - \rho W)^{-1}\xi_t, \tag{2.19} $$

where $\xi_t = \alpha^{-1}b + \gamma f_t + \varepsilon_t$. Then the aggregate log price index, $\bar{p}_t$, defined in (2.6), is given by

$$ \bar{p}_t = \frac{\alpha}{N}\tau_N'(I_N - \rho W)^{-1}\tau_N\,\omega_t - \frac{\alpha}{N}\tau_N'(I_N - \rho W)^{-1}\xi_t. \tag{2.20} $$

But since $w_{ij} \geq 0$, $W\tau_N = \tau_N$, and $0 < \alpha < 1$, then $(I_N - \rho W)^{-1}\tau_N = \tau_N/\alpha$, and hence (2.20) can also be written as

$$ \omega_t - \bar{p}_t = \upsilon_N'\xi_t, \tag{2.21} $$

where $\upsilon_N$ is the influence vector given by (2.7). Now let

$$ x_t = p_t - \omega_t\tau_N, \tag{2.22} $$

and rewrite (2.13) in terms of log price-wage ratios, $x_t$, as

$$ x_t = \rho Wx_t - b - \alpha(\gamma f_t + \varepsilon_t). \tag{2.23} $$

Equation (2.23) represents a first-order spatial autoregressive (SAR(1)) model with an unobserved common factor. Consider now the following simple average of the units, $x_{it}$, for $i = 1, 2, \ldots, N$, in the above network:

$$ \bar{x}_{N,t} = \frac{1}{N}\tau_N'x_t = -(\omega_t - \bar{p}_t), $$

which is the negative of the aggregate output measure defined by (2.5). Also, using (2.21) we have

$$ \omega_t - \bar{p}_t = -\bar{x}_{N,t} = \alpha^{-1}\upsilon_N'b + (\upsilon_N'\gamma)f_t + \upsilon_N'\varepsilon_t, \tag{2.24} $$

which fully specifies the dependence of aggregate output on both common and sector-specific shocks. Note that equations (2.18) and (2.24) are dual to each other. (2.18) gives the total wage bill in terms of a weighted sum of consumption expenditures, with the weights given by $\alpha(I_N - \rho W)^{-1}\tau_N$.
(2.24), on the other hand, gives the log of the real wage rate in terms of the common and aggregate sectoral shocks, namely $f_t$ and $\upsilon_N'\varepsilon_t$. Recall that common and sectoral shocks are assumed to be uncorrelated. The key issue is how much of the cyclical fluctuations in (log) real wages, $\mathrm{Var}(\bar{x}_{N,t})$, is due to common shocks, $(\upsilon_N'\gamma)^2\mathrm{Var}(f_t)$, and how much is due to sectoral shocks, $\mathrm{Var}(\upsilon_N'\varepsilon_t)$.

There are two advantages in focusing directly on the price network model, (2.23). First, it allows us to relate the production network to the literature on spatial econometrics, which should facilitate the econometric analysis of production networks. It also allows us to address more easily the issues of identification and estimation of the structural parameters $\alpha$, $\gamma$ and $\sigma_i^2$, for $i = 1, 2, \ldots, N$.[5] Second, the direct use of the SAR model, (2.23), enables us to provide exact bounds on $\mathrm{Var}(\bar{x}_{N,t}) = \mathrm{Var}(\omega_t - \bar{p}_t)$, rather than the lower bounds obtained by Acemoglu et al. (2012). Moreover, by considering the price network explicitly, we are able to show that at most a few sectors can have significant aggregate effects, and that these are the sectors with outdegrees that rise with $N$. The rate at which the outdegrees rise with $N$ could very well differ across sectors. It is important that such sectors are identified and that their empirical contribution to aggregate fluctuations is evaluated.

[5] For example, see the recent contributions of Bai and Li (2014) and Yang (2017) on the estimation of SAR models with unobserved common factors.

2.4 Degrees of dominance of units in a network and network pervasiveness

Consider a network represented by the $N \times N$ adjacency matrix $W = (w_{ij})$, where $w_{ij} \geq 0$ for all $i$ and $j$, and $W$ is row-normalized such that $\sum_{j=1}^{N}w_{ij} = 1$ for all $i$. Denote the $j$th column of $W$ by $w_{\cdot j}$ and the associated column sum by $d_j = \tau_N'w_{\cdot j}$, the outdegree of unit $j$. The outdegree is one of many network centrality measures considered in the literature.
The most widely used centrality measure is degree centrality, which refers to the number of ties a node has. In a directed network, it can be classified into indegree and outdegree. The indegree counts the number of ties a node receives, and the outdegree counts the number of ties a node directs to others. In this paper, we focus on how the weighted outdegree varies with $N$, and normalize the weighted indegree (the row sums of $W$) to one, because we are interested in studying the influence of a unit on other downstream units. Other centrality measures, including closeness, betweenness, and eigenvector centralities, are not relevant for our purpose. Our aim is to characterize the effects of idiosyncratic shocks to a unit on some aggregate measure of the network, rather than the pattern of interdependencies of the network. To this end, we introduce the notions of strongly and weakly dominant units in the following definition. We consider units with nonzero outdegrees and assume throughout that $d_j > 0$ for all $j$.

Definition 2.1. ($\delta$-dominance) We shall refer to unit $j$ of the row-standardized network $W = (w_{ij} \geq 0)$ as $\delta_j$-dominant if its (weighted) outdegree, $d_j = \sum_{i=1}^{N}w_{ij} > 0$, is of order $N^{\delta_j}$, where $\delta_j$ is a fixed constant in the range $0 \leq \delta_j \leq 1$. More specifically,

$$ d_j = \kappa_j N^{\delta_j}, \quad \text{for } j = 1, 2, \ldots, N, \tag{2.25} $$

where $\kappa_j$ is a fixed positive constant which does not depend on $N$. Unit $j$ is said to be strongly dominant if $\delta_j = 1$, weakly dominant if $0 < \delta_j < 1$, and non-dominant if $\delta_j = 0$. We refer to $\delta_j$ as the degree of dominance of unit $j$ in the network.

Remark 2.1. Since $d_j > 0$ for all $j$, we must have $\kappa_j > 0$ and $\kappa_{\min} = \min(\kappa_1, \kappa_2, \ldots, \kappa_N) > 0$. It is also worth noting that $\delta_j$ is identified by requiring that $\kappa_j$ and $\delta_j$ do not depend on $N$. In the standard case where the column sums of $W$ are bounded in $N$, we must have $\delta_j = 0$ for all $j$, that is, all units are non-dominant. $W$ will have an unbounded column sum if $\delta_j > 0$ for at least one $j$.
But due to the bounded nature of the rows of $W$, not all columns of $W$ can be $\delta$-dominant with $\delta_j > 0$ for all $j$. To see this, let $d = (d_1, d_2, \ldots, d_N)' = W'\tau_N$, and note that

$$ \tau_N'd = \tau_N'W'\tau_N = N. \tag{2.26} $$

Hence, there must exist $0 < \kappa_j < K < \infty$, for $j = 1, 2, \ldots, N$, such that $\sum_{j=1}^{N}\kappa_jN^{\delta_j} = N$ for a fixed $N$ and as $N \to \infty$. Let $\delta_{\min} = \min(\delta_1, \delta_2, \ldots, \delta_N)$, and note that $N = \sum_{j=1}^{N}\kappa_jN^{\delta_j} \geq N\kappa_{\min}N^{\delta_{\min}}$, which in turn implies

$$ \kappa_{\min}N^{\delta_{\min}} \leq 1. \tag{2.27} $$

Since by assumption $\kappa_{\min} > 0$ and $\delta_{\min} \geq 0$, it is clear that (2.27) cannot be satisfied for all values of $N$ unless $\delta_{\min} = 0$, which establishes that not all units in a given network can be dominant. This result is summarized in the following proposition.

Proposition 1. Consider the network represented by $W = (w_{ij} \geq 0)$, and assume that $W$ is row-standardized. Suppose that the outdegrees of the network, $d_j = \sum_{i=1}^{N}w_{ij}$, are non-zero ($d_j > 0$) and follow the power function (2.25), with $\delta_j$ being the degree of dominance of unit $j$ in the network. Then not all units of the network can be $\delta$-dominant with $\delta_j > 0$ for all $j$.

Let $S_N = N^{-1}\sum_{j=1}^{N}\kappa_jN^{\delta_j}$, and note that since $\kappa_j > 0$ for all $j$, $\kappa_{\min} = \min_j(\kappa_j) > 0$, and hence

$$ S_N = N^{-1}\sum_{j=1}^{N}\kappa_je^{\delta_j\ln N} \geq \kappa_{\min}N^{-1}\sum_{j=1}^{N}e^{\delta_j\ln N}. \tag{2.28} $$

Now using a Taylor series expansion of $e^{\delta_j\ln N}$, we obtain

$$ \sum_{j=1}^{N}e^{\delta_j\ln N} = \sum_{j=1}^{N}\left[1 + \sum_{s=1}^{\infty}\frac{\delta_j^s(\ln N)^s}{s!}\right] = N + \sum_{s=1}^{\infty}\frac{(\ln N)^s}{s!}\sum_{j=1}^{N}\delta_j^s, \tag{2.29} $$

which, if substituted in (2.28), yields

$$ S_N \geq \kappa_{\min}\left[1 + \sum_{s=1}^{\infty}\frac{\sum_{j=1}^{N}\delta_j^s(\ln N)^s}{s!\,N}\right]. \tag{2.30} $$

Since $S_N = 1$, and all the summands over $s$ in (2.30) are nonnegative as $\delta_j \geq 0$ and $\ln N > 0$, it is necessary that

$$ \frac{\sum_{j=1}^{N}\delta_j^s(\ln N)^s}{s!\,N} \to 0, \quad \text{as } N \to \infty, \text{ for all } s = 1, 2, 3, \ldots. \tag{2.31} $$

Also note that for any finite $s$, $(\ln N)^s/(s!\,N) \to 0$ as $N \to \infty$, and since

$$ \sum_{s=1}^{\infty}\frac{(\ln N)^s}{s!\,N} = \frac{N-1}{N} \to 1, \quad \text{as } N \to \infty, $$

it must be that $(\ln N)^s/(s!\,N) \to 0$ as $N \to \infty$, for all $s$, including $s \to \infty$.
Furthermore, since $0 \leq \delta_j \leq 1$,

$$ \sum_{j=1}^{N}\delta_j^s \leq \sum_{j=1}^{N}\delta_j, \quad \text{for } s \geq 1, \tag{2.32} $$

and

$$ \frac{\sum_{j=1}^{N}\delta_j^s(\ln N)^s}{s!\,N} \leq \sum_{j=1}^{N}\delta_j\,\frac{(\ln N)^s}{s!\,N}. $$

Hence, for the conditions in (2.31) to be met it is sufficient that $\{\delta_j\}$ is summable, namely

$$ \sum_{j=1}^{N}\delta_j < K < \infty. \tag{2.33} $$

As we shall see, this condition plays an important role in the proof of consistency of the extremum estimator proposed in Sub-section 2.7.2 below.

Suppose now that $m$ units are strongly dominant with $\delta_j = 1$, and the rest are non-dominant with $\delta_j = 0$. Then using (2.30) we have

$$ S_N \geq \kappa_{\min}\left[1 + m\sum_{s=1}^{\infty}\frac{(\ln N)^s}{s!\,N}\right] = \kappa_{\min}\left(1 + m\,\frac{N-1}{N}\right), $$

and since $S_N = 1$, it follows that $m$ cannot rise with $N$, and must be a fixed integer.

In the case where $m$ units are dominant with $\delta_j > 0$, $m$ must be finite if the summability condition (2.33) is to hold. For example, suppose that only $m$ units are dominant, and let $\delta_{\min}$ now denote the smallest degree of dominance among these $m$ units. Then $\sum_{j=1}^{N}\delta_j \geq m\delta_{\min} > 0$, and from the summability condition (2.33) we have $K > \sum_{j=1}^{N}\delta_j \geq m\delta_{\min}$. It follows that $m \leq K/\delta_{\min}$, which is bounded in $N$. These findings are summarized in the next proposition.

Proposition 2. Consider the network represented by $W = (w_{ij} \geq 0)$, and assume that $W$ is row-standardized and that the outdegrees of the network, $d_j = \sum_{i=1}^{N}w_{ij}$, are non-zero ($d_j > 0$). Then the number of strongly dominant units must be fixed and cannot rise with $N$. Moreover, if $\{\delta_j\}$ are summable, $\sum_{j=1}^{N}\delta_j < K < \infty$, then the number of dominant units with $\delta_j \neq 0$ must be finite, where $\delta_j$ is the degree of dominance of unit $j$ in the network.

Remark 2.2. Analogous results have also been found in Chudik, Pesaran, and Tosetti (2011) regarding the possible number of strong factors, and in Chudik and Pesaran (2013) on the number of dominant units in large dimensional vector autoregressions.
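Definition 2.1 and the adding-up constraint (2.26) can be illustrated with a hypothetical network in which only one unit is dominant. In the Python sketch below, the function `one_dominant_network` is an assumption for illustration and is not part of the text. Python index 0 plays the role of unit 1. Every other unit places weight $\kappa N^{\delta}/(N-1)$ on unit 0, so that $d_1 = \kappa N^{\delta}$ exactly, while all other outdegrees stay below 2 as $N$ grows. The slope of $\log d_1$ on $\log N$ then returns $\delta$. This is only a check of the construction, not the extremum estimator of Section 2.7.

```python
import numpy as np


def one_dominant_network(N, delta, kappa=1.0):
    """Hypothetical row-standardized W in which unit 0 has outdegree kappa*N**delta."""
    a = kappa * N**delta / (N - 1)          # weight every other unit places on unit 0
    if not 0 < a < 1:
        raise ValueError("choose N, delta, kappa so that 0 < a < 1")
    W = np.zeros((N, N))
    W[1:, 0] = a
    idx = np.arange(1, N)
    W[idx, np.roll(idx, -1)] = 1.0 - a      # ring among the non-dominant units
    W[0, 1:] = 1.0 / (N - 1)                # unit 0 spreads its inputs evenly
    return W


delta = 0.7
Ns = np.array([100, 200, 400, 800, 1600])
d1, d_rest_max = [], []
for N in Ns:
    d = one_dominant_network(N, delta).sum(axis=0)   # outdegrees d_j (column sums)
    assert np.isclose(d.sum(), N)                     # adding-up constraint (2.26)
    d1.append(d[0])
    d_rest_max.append(d[1:].max())

# Log-log slope of d_1 on N; exact here because d_1 = kappa * N**delta by construction.
slope = np.polyfit(np.log(Ns), np.log(d1), 1)[0]
print(round(slope, 6), max(d_rest_max))
```

Because the column sums must add up to $N$, the extra weight on unit 0 is taken away from the ring, which is why the remaining outdegrees stay bounded.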
Using the concept of $\delta$-dominance of units in a given network, we now introduce the idea of network pervasiveness. It is relevant for characterizing the degree to which shocks to an individual unit diffuse across the other units in the network.

Definition 2.2. (Network pervasiveness) The degree of pervasiveness of a given row-standardized network, $W = (w_{ij} \geq 0,\ \sum_{j=1}^{N}w_{ij} = 1)$, is defined by $\delta_{\max} = \max(\delta_1, \delta_2, \ldots, \delta_N)$, where $\delta_j$ is the degree of dominance of its $j$th unit.

Using the above concepts, we now consider the rate at which $\mathrm{Var}(\bar{x}_{N,t})$ varies with $N$. We show that it is governed by the pervasiveness of the network, measured by $\delta_{\max}$, and by the exponent of the common factor, $\delta_\gamma$, defined by (2.2). For unit-specific shocks to dominate the macro or common factor shocks we need $\delta_{\max} > \delta_\gamma > 1/2$. We also show how our measure of network pervasiveness, $\delta_{\max}$, is related to $\beta$, the shape parameter of the Pareto distribution fitted to the ordered outdegrees, $d_{(i)}$, for $i = 1, 2, \ldots, N$.

2.5 Price networks with one dominant unit and aggregate fluctuations

Consider the price network (2.23), and assume that it contains one dominant unit and $n = N - 1$ non-dominant units. The analysis can be readily extended to networks with $m$ dominant units ($m$ fixed). To simplify the exposition, however, we confine our analysis here to networks with one dominant unit. (The derivations for the general case are provided in Appendix B.2.) Without loss of generality, suppose the first element of $x_t$, namely $x_{1t}$, is the dominant unit, and write (2.23) in the partitioned form as (setting $w_{11} = 0$)

$$ \begin{pmatrix} x_{1t} \\ \mathbf{x}_{2t} \end{pmatrix} = \begin{pmatrix} 0 & \rho\mathbf{w}_{12}' \\ \rho\mathbf{w}_{21} & \rho W_{22} \end{pmatrix}\begin{pmatrix} x_{1t} \\ \mathbf{x}_{2t} \end{pmatrix} + \begin{pmatrix} g_{1t} \\ \mathbf{g}_{2t} \end{pmatrix}, \tag{2.34} $$

where $\mathbf{x}_{2t} = (x_{2t}, x_{3t}, \ldots, x_{Nt})'$, $\mathbf{w}_{21} = (w_{21}, w_{31}, \ldots, w_{N1})'$, $\mathbf{w}_{12} = (w_{12}, w_{13}, \ldots, w_{1N})'$, $\mathbf{g}_{2t} = (g_{2t}, g_{3t}, \ldots, g_{Nt})'$, and $g_{it} = -b_i - \alpha(\gamma_i f_t + \varepsilon_{it})$, for $i = 1, 2, \ldots, N$. $W_{22}$ is the $n \times n$ weight matrix associated with the $n$ non-dominant units and is assumed to satisfy the condition $|\rho|\,\|W_{22}\|_1 < 1$.
Furthermore, note that since

$$ W\tau_N = \begin{pmatrix} 0 & \mathbf{w}_{12}' \\ \mathbf{w}_{21} & W_{22} \end{pmatrix}\begin{pmatrix} 1 \\ \tau_n \end{pmatrix} = \begin{pmatrix} 1 \\ \tau_n \end{pmatrix}, $$

then $\mathbf{w}_{12}'\tau_n = 1$ and $\tau_n - \mathbf{w}_{21} = W_{22}\tau_n$. The latter result states that the $i$th row sum of $W_{22}$ is given by $1 - w_{i1} \leq 1$. Considering that $0 \leq w_{i1} < 1$, we must have $\|W_{22}\|_\infty \leq 1$, which also establishes that $\varrho(W_{22}) \leq 1$, where $\varrho(A)$ denotes the spectral radius of $A$. Under the assumption that $|\rho| < 1$, by Lemma B.1 in Appendix B.1 the system of equations (2.34) has a unique solution given by

$$ \begin{pmatrix} x_{1t} \\ \mathbf{x}_{2t} \end{pmatrix} = \begin{pmatrix} 1 & -\rho\mathbf{w}_{12}' \\ -\rho\mathbf{w}_{21} & S_{22} \end{pmatrix}^{-1}\begin{pmatrix} g_{1t} \\ \mathbf{g}_{2t} \end{pmatrix} = S^{-1}(\rho)\,g_t, \tag{2.35} $$

where $S_{22} = I_n - \rho W_{22}$. In addition, since $|\rho|\,\|W_{22}\|_1 < 1$, it follows from Lemma B.2 in Appendix B.1 that $S_{22}^{-1}$ has bounded row and column norms. For future reference, also note that the $(1,1)$th element of $S^{-1}(\rho)$ is given by $\zeta_1^{-1}$, where

$$ \zeta_1 = 1 - \rho^2\mathbf{w}_{12}'S_{22}^{-1}\mathbf{w}_{21} \neq 0. \tag{2.36} $$

Finally, to allow unit 1 to be $\delta$-dominant we consider the following exponent formulation

$$ d_1 = \sum_{i=2}^{N}w_{i1} = \kappa_1N^{\delta_1}, \tag{2.37} $$

where $d_1$ is allowed to rise with $N$, with $\kappa_1 > 0$ and $0 < \delta_1 \leq 1$. Recall that $\kappa_1$ and $\delta_1$ cannot vary with $N$.[6] The system of equations (2.34) can now be solved for $\mathbf{x}_{2t}$ in terms of $x_{1t}$, namely (recall that by assumption $|\rho|\,\|W_{22}\|_1 < 1$)

$$ \mathbf{x}_{2t} = x_{1t}\,\rho S_{22}^{-1}\mathbf{w}_{21} + S_{22}^{-1}\mathbf{g}_{2t}, \tag{2.38} $$

and[7]

$$ x_{1t} = \zeta_1^{-1}\left(g_{1t} + \rho\mathbf{w}_{12}'S_{22}^{-1}\mathbf{g}_{2t}\right). \tag{2.39} $$

Using the above in (2.38), we now have

$$ \mathbf{x}_{2t} = (\rho/\zeta_1)\left(g_{1t} + \rho\mathbf{w}_{12}'S_{22}^{-1}\mathbf{g}_{2t}\right)S_{22}^{-1}\mathbf{w}_{21} + S_{22}^{-1}\mathbf{g}_{2t}. \tag{2.40} $$

The first term of $\mathbf{x}_{2t}$ refers to the direct and indirect effects of the dominant unit, and the second term relates to the network dependence of the non-dominant units. Our primary focus is the extent to which shocks to individual units affect aggregate measures over the network. A standard aggregate measure is the cross-section average of $x_{it}$ over $i = 1, 2, \ldots, N$.
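Here we consider the simple average

$$ \bar{x}_{N,t} = \frac{x_{1t} + \sum_{i=2}^{N} x_{it}}{N} = \frac{x_{1t} + \boldsymbol{\tau}_n'\mathbf{x}_{2t}}{N}, $$

but our analysis equally applies to weighted averages, $\bar{x}^{*}_{N,t} = \sum_{i=1}^{N}\varpi_i x_{it}$, so long as the weights $\varpi_i$ are granular in the sense that $\varpi_i = O(N^{-1})$. Using (2.38) and (2.39) we have

$$ \bar{x}_{N,t} = \frac{x_{1t} + \boldsymbol{\tau}_n'\mathbf{S}_{22}^{-1}\left(\rho\mathbf{w}_{21}x_{1t} - \mathbf{b}_2 - \alpha\boldsymbol{\varepsilon}_{2t} - \alpha\boldsymbol{\gamma}_2 f_t\right)}{N}, $$

where $\mathbf{b}_2 = (b_2, b_3, \ldots, b_N)'$ and $\boldsymbol{\gamma}_2 = (\gamma_2, \gamma_3, \ldots, \gamma_N)'$. Hence

$$ \bar{x}_{N,t} = N^{-1}\left(-a_n + \theta_n x_{1t} - \alpha\psi_n f_t - \alpha\boldsymbol{\phi}_n'\boldsymbol{\varepsilon}_{2t}\right), \tag{2.41} $$

where $a_n = \boldsymbol{\tau}_n'\mathbf{S}_{22}^{-1}\mathbf{b}_2$, $\boldsymbol{\phi}_n' = \boldsymbol{\tau}_n'\mathbf{S}_{22}^{-1}$, and

$$ \theta_n = 1 + \rho\,\boldsymbol{\phi}_n'\mathbf{w}_{21}, \tag{2.42} $$

$$ \psi_n = \boldsymbol{\phi}_n'\boldsymbol{\gamma}_2. \tag{2.43} $$

The first term of (2.41), $N^{-1}a_n$, is bounded in $N$, since $\|\mathbf{W}_{22}\|_\infty \le 1$ and $\rho\|\mathbf{W}_{22}\|_1 < 1$, and as a result $\mathbf{S}_{22}^{-1}$ has bounded row and column norms by Lemma B.2 in Appendix B.1. The second term captures the effect of the dominant unit. The third term is due to the common factor, $f_t$, and the final term represents the average effect of the micro productivity shocks. $N^{-1}\boldsymbol{\phi}_n$ is the influence vector associated with the non-dominant units; it is analogous to the influence vector defined by (2.7), which applies to all units.

Starting with the final term of (2.41), we first note that

$$ \underline{\sigma}^2 N^{-2}\boldsymbol{\phi}_n'\boldsymbol{\phi}_n \le \mathrm{Var}\left(N^{-1}\boldsymbol{\phi}_n'\boldsymbol{\varepsilon}_{2t}\right) \le \bar{\sigma}^2 N^{-2}\boldsymbol{\phi}_n'\boldsymbol{\phi}_n, \tag{2.44} $$

where $\boldsymbol{\phi}_n = (\phi_2, \phi_3, \ldots, \phi_N)'$ is an $n \times 1$ vector of column sums of $\mathbf{S}_{22}^{-1}$ and has bounded elements. Furthermore, since $\boldsymbol{\phi}_n' = \boldsymbol{\tau}_n' + \rho\boldsymbol{\tau}_n'\mathbf{W}_{22} + \rho^2\boldsymbol{\tau}_n'\mathbf{W}_{22}^2 + \cdots$, with $\rho > 0$ and $w_{ij} \ge 0$, we have $\phi_{\min} = \min(\phi_2, \phi_3, \ldots, \phi_N) \ge 1$ and $\phi_{\max} = \max(\phi_2, \phi_3, \ldots, \phi_N) < K < \infty$. Hence $1 \le \phi_{\min}^2 \le N^{-1}\boldsymbol{\phi}_n'\boldsymbol{\phi}_n \le \phi_{\max}^2 < K < \infty$, and $N^{-1}\boldsymbol{\phi}_n'\boldsymbol{\phi}_n$ is bounded from below and above by finite non-zero constants.

[6] The exponent formulation of column sums in (2.37) will be compared and contrasted with the power law specification favored in the literature in Section 2.7 below.

[7] In deriving (2.39), it is required that $\zeta_1 \neq 0$. This condition is met since the $N \times N$ matrix on the right-hand side of (2.35) is non-singular.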
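The decomposition (2.41) to (2.43) can also be checked numerically. The sketch below uses a hypothetical random network and arbitrary illustrative values for $\rho$, $\alpha$, $\mathbf{b}$, $\boldsymbol{\gamma}$, $f_t$ and $\boldsymbol{\varepsilon}_t$ (none of them come from the text). It computes $a_n$, $\theta_n$, $\psi_n$ and $\boldsymbol{\phi}_n$, verifies that they reproduce the simple average $\bar{x}_{N,t}$ obtained from the direct solution, and confirms that $\phi_{\min} \ge 1$.

```python
import numpy as np

rng = np.random.default_rng(1)
N, rho, alpha = 10, 0.5, 0.7                 # illustrative values, not taken from the text
W = rng.uniform(size=(N, N))
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)            # row-standardized weights with w_11 = 0

b = rng.normal(size=N)
gamma = rng.uniform(0.5, 1.5, size=N)
f, eps = rng.normal(), rng.normal(size=N)
g = -b - alpha * (gamma * f + eps)           # g_it = -b_i - alpha (gamma_i f_t + eps_it)

# Direct solution of x_t = rho W x_t + g_t and its simple cross-section average.
x = np.linalg.solve(np.eye(N) - rho * W, g)
xbar_direct = x.mean()

# Ingredients of (2.41)-(2.43).
w21, W22 = W[1:, 0], W[1:, 1:]
S22 = np.eye(N - 1) - rho * W22
phi = np.linalg.solve(S22.T, np.ones(N - 1))  # phi_n' = tau_n' S22^{-1}: column sums of S22^{-1}
a_n = phi @ b[1:]
theta_n = 1.0 + rho * (phi @ w21)
psi_n = phi @ gamma[1:]
xbar_decomp = (-a_n + theta_n * x[0] - alpha * psi_n * f - alpha * (phi @ eps[1:])) / N

print(xbar_direct, xbar_decomp, phi.min())   # the two averages agree and phi_min >= 1
```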
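Using this result in (2.44) we also have $\underline{\sigma}^2 \le N\,\mathrm{Var}\left(N^{-1}\boldsymbol{\phi}_n'\boldsymbol{\varepsilon}_{2t}\right) \le \bar{\sigma}^2\phi_{\max}^2 < \infty$, which establishes that

$$ \mathrm{Var}\left(N^{-1}\boldsymbol{\phi}_n'\boldsymbol{\varepsilon}_{2t}\right) = \Theta(N^{-1}), \tag{2.45} $$

where $\Theta(N^{-1})$ denotes the convergence rate of $\mathrm{Var}(N^{-1}\boldsymbol{\phi}_n'\boldsymbol{\varepsilon}_{2t})$ in terms of $N$, and should be distinguished from the $O(N^{-1})$ notation, which provides only an upper bound on $\mathrm{Var}(N^{-1}\boldsymbol{\phi}_n'\boldsymbol{\varepsilon}_{2t})$.

Next, using (2.39) we have

$$ \mathrm{Cov}\left(x_{1t}, N^{-1}\boldsymbol{\phi}_n'\boldsymbol{\varepsilon}_{2t}\right) = -\alpha\rho\zeta_1^{-1}N^{-1}\mathbf{w}_{12}'\mathbf{H}_{22}\boldsymbol{\tau}_n, \tag{2.46} $$

and

$$ \mathrm{Cov}(x_{1t}, f_t) = -\alpha\left(\zeta_1^{-1}\gamma_1 + \rho\zeta_1^{-1}h_2\right)\mathrm{Var}(f_t), \tag{2.47} $$

where $\mathbf{H}_{22} = \mathbf{S}_{22}^{-1}\mathbf{V}_{22,\varepsilon}\mathbf{S}_{22}'^{-1}$, $\mathbf{V}_{22,\varepsilon} = \mathrm{diag}(\sigma_2^2, \sigma_3^2, \ldots, \sigma_N^2)$, and $h_2 = \mathbf{w}_{12}'\mathbf{S}_{22}^{-1}\boldsymbol{\gamma}_2$. It then follows that overall (recalling that $f_t$ and $\varepsilon_{it}$ are independently distributed) we have

$$ \mathrm{Var}(\bar{x}_{N,t}) = N^{-2}\theta_n^2\,\mathrm{Var}(x_{1t}) - 2\alpha N^{-2}\theta_n\,\mathrm{Cov}\left(x_{1t}, \boldsymbol{\phi}_n'\boldsymbol{\varepsilon}_{2t}\right) + \alpha^2 N^{-2}\,\mathrm{Var}\left(\boldsymbol{\phi}_n'\boldsymbol{\varepsilon}_{2t}\right) + \alpha^2 N^{-2}\chi_n\,\mathrm{Var}(f_t), \tag{2.48} $$

where $\chi_n = \psi_n^2 + 2\psi_n\theta_n\zeta_1^{-1}\gamma_1 + 2\rho\psi_n\theta_n\zeta_1^{-1}h_2$. Also, using (2.39) we have

$$ \mathrm{Var}(x_{1t}) = \zeta_1^{-2}\alpha^2\left[(\gamma_1 + \rho h_2)^2\,\mathrm{Var}(f_t) + \sigma_1^2\right] + \zeta_1^{-2}\rho^2\alpha^2\,\mathbf{w}_{12}'\mathbf{H}_{22}\mathbf{w}_{12}, \tag{2.49} $$

which is easily seen to be bounded in $N$.

A number of results can now be obtained from (2.48). First, without a common factor and a dominant unit, $\mathrm{Var}(\bar{x}_{N,t}) = \Theta(N^{-1})$, and the effects of idiosyncratic shocks on $\bar{x}_{N,t}$ vanish at the rate of $N^{-1/2}$ as $N \to \infty$. This rate matches the decay rate of shocks in models without a network structure, namely even if we set $\mathbf{W} = \mathbf{0}$. Therefore, for micro shocks to have macroeconomic implications there must be at least one dominant unit in the network. To see this, consider now the case where there is no common factor but the network includes a dominant unit. Then using (2.45) and (2.48) we have

$$ \mathrm{Var}(\bar{x}_{N,t}) = N^{-2}\theta_n^2\,\mathrm{Var}(x_{1t}) - 2\alpha N^{-2}\theta_n\,\mathrm{Cov}\left(x_{1t}, \boldsymbol{\phi}_n'\boldsymbol{\varepsilon}_{2t}\right) + O(N^{-1}). \tag{2.50} $$

Recall that $\mathrm{Var}(x_{1t})$ is bounded in $N$, and consider the limiting properties of $N^{-1}\theta_n$ defined by (2.42).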
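Since

$$ N^{-1} + \phi_{\min}\rho N^{-1}d_1 \le N^{-1}\theta_n \le N^{-1} + \phi_{\max}\rho N^{-1}d_1, \tag{2.51} $$

where $1 \le \phi_{\min} \le \phi_{\max} < K$, the asymptotic behavior of $N^{-1}\theta_n$ depends on the way the outdegree of the dominant unit, namely $d_1$, varies with $N$. Using the exponent specification given by (2.25), $d_1 = \kappa_1 N^{\delta_1}$, it follows that

$$ N^{-1} + \phi_{\min}\rho\kappa_1 N^{\delta_1 - 1} \le N^{-1}\theta_n \le N^{-1} + \phi_{\max}\rho\kappa_1 N^{\delta_1 - 1}, \tag{2.52} $$

which leads to

$$ N^{-1}\theta_n = \Theta(N^{\delta_1 - 1}), \quad 0 < \delta_1 \le 1. \tag{2.53} $$

Consider now the second term of (2.50), and note from (2.46) that

$$ \left|\mathrm{Cov}\left(x_{1t}, N^{-1}\boldsymbol{\phi}_n'\boldsymbol{\varepsilon}_{2t}\right)\right| \le \frac{(1-\rho)\rho}{|\zeta_1|}N^{-1}\left\|\mathbf{w}_{12}'\right\|_\infty\left\|\mathbf{S}_{22}^{-1}\right\|_\infty\left\|\mathbf{V}_{22,\varepsilon}\right\|_\infty\left\|\boldsymbol{\phi}_n\right\|_\infty = O(N^{-1}), $$

since $\|\mathbf{w}_{12}'\|_\infty = \|\mathbf{w}_{12}\|_1 = \sum_{i=2}^{N} w_{1i} = 1$, $\|\mathbf{S}_{22}^{-1}\|_\infty < K$, $\|\mathbf{V}_{22,\varepsilon}\|_\infty = \bar{\sigma}^2 < K$, and $\|\boldsymbol{\phi}_n\|_\infty = \phi_{\max} < K$. Using the above results in (2.50) we have $\mathrm{Var}(\bar{x}_{N,t}) = \Theta(N^{2\delta_1 - 2}) + O(N^{\delta_1 - 2}) + O(N^{-1})$, which simplifies to (since $\delta_1 \le 1$)

$$ \mathrm{Var}(\bar{x}_{N,t}) = \Theta(N^{2\delta_1 - 2}) + O(N^{-1}), \tag{2.54} $$

and hence

$$ \mathrm{Var}(\bar{x}_{N,t}) = \Theta(N^{2\delta_1 - 2}), \quad \text{if } \delta_1 > 1/2. \tag{2.55} $$

This is the main result for the analysis of the macroeconomic implications of micro shocks, and it is more general than the one established by Acemoglu et al. (2012), who only provide a lower bound on the rate at which aggregate volatility changes with $N$.

It is also instructive to relate $N^{-1}\theta_n$ to the first- and higher-order network connections discussed in Acemoglu et al. (2012). Expanding the terms of the inverse $\mathbf{S}_{22}^{-1}$, $N^{-1}\theta_n$ can also be written as

$$ N^{-1}\theta_n = N^{-1}\left(1 + \rho\boldsymbol{\tau}_n'\mathbf{w}_{21} + \rho^2\boldsymbol{\tau}_n'\mathbf{W}_{22}\mathbf{w}_{21} + \rho^3\boldsymbol{\tau}_n'\mathbf{W}_{22}^2\mathbf{w}_{21} + \cdots\right), $$

where $N^{-1}\rho\boldsymbol{\tau}_n'\mathbf{w}_{21} = \rho N^{-1}d_1$ represents the effects of the first-order network connections on $\theta_n$, $N^{-1}\rho^2\boldsymbol{\tau}_n'\mathbf{W}_{22}\mathbf{w}_{21}$ the effects of the second-order network connections, and so on. But in view of (2.51) and (2.53), all these higher-order interconnections (individually and together) at most behave as $\Theta(N^{\delta_1 - 1})$. Therefore, the rate at which micro shocks influence the macro economy depends on $\delta_1$, which measures the strength of the dominant unit.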
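To illustrate (2.51) to (2.53), the sketch below builds a hypothetical family of networks in which unit 1 has outdegree $d_1 = \kappa_1 N^{\delta_1}$ spread evenly over the other units ($w_{i1} = d_1/(N-1)$), while each non-dominant unit places its remaining weight $1 - w_{i1}$ on one neighbour in a ring. This construction (which leaves the first row $\mathbf{w}_{12}'$ unspecified, since $\theta_n$ does not depend on it) and the values $\kappa_1 = 1$, $\delta_1 = 0.8$ and $\rho = 0.5$ are assumptions made only for illustration. The script computes $N^{-1}\theta_n$ exactly, checks the bounds in (2.51), and prints $N^{1-\delta_1}\cdot N^{-1}\theta_n$, which should remain bounded away from zero and infinity as $N$ grows.

```python
import numpy as np


def theta_over_N(N, kappa1=1.0, delta1=0.8, rho=0.5):
    """N^{-1} theta_n and the bounds in (2.51) for a hypothetical one-dominant-unit network."""
    n = N - 1
    d1 = kappa1 * N**delta1                       # outdegree of unit 1, as in (2.37)
    w21 = np.full(n, d1 / n)                      # w_i1 spread evenly over the other units
    if w21.max() >= 1.0:
        raise ValueError("need d1 / (N - 1) < 1 for a valid row-standardized W")
    W22 = np.zeros((n, n))
    W22[np.arange(n), (np.arange(n) + 1) % n] = 1.0 - w21   # ring: remaining weight on one neighbour
    S22 = np.eye(n) - rho * W22
    phi = np.linalg.solve(S22.T, np.ones(n))      # column sums of S22^{-1}
    theta_n = 1.0 + rho * (phi @ w21)             # (2.42)
    lower = 1.0 / N + phi.min() * rho * d1 / N    # (2.51), lower bound
    upper = 1.0 / N + phi.max() * rho * d1 / N    # (2.51), upper bound
    return theta_n / N, lower, upper


for N in (50, 200, 800, 1600):
    t, lo, hi = theta_over_N(N)
    # Per (2.53), N^{1 - delta1} * (N^{-1} theta_n) should stay within fixed positive bounds.
    print(N, round(float(t), 4), round(float(t * N ** (1 - 0.8)), 4))
```

In this particular design all $\phi_j$ are equal, so the two bounds in (2.51) coincide; with a less symmetric $\mathbf{W}_{22}$ they would bracket $N^{-1}\theta_n$ strictly.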
But it should be noted from (2.54) that, to ensure a non-vanishing variance, $\mathrm{Var}(\bar{x}_{N,t}) > 0$ as $N \to \infty$, we need $\delta_1 = 1$. When $1/2 < \delta_1 < 1$, the network accentuates the diffusion of the idiosyncratic shocks across the network but does not lead to lasting impacts. No network effects of unit-specific shocks can be identified when $\delta_1 \le 1/2$. Hence, for the dominant unit to have any impact on $\bar{x}_{N,t}$ over and above the standard rates of diversification of micro shocks, we need $\delta_1 > 1/2$.

Remark 2.3. The finding that $\delta_1$ cannot be distinguished from zero if $\delta_1 < 1/2$ is also related to the study by Bailey et al. (2016b), who show that the exponent of cross-sectional dependence, $\alpha$, can only be identified and consistently estimated for values of $\alpha > 1/2$.

Suppose now that the network is subject to common shocks without a dominant unit. In this case we have $\mathrm{Var}(\bar{x}_{N,t}) = \alpha^2 N^{-2}\psi_n^2\,\mathrm{Var}(f_t) + \Theta(N^{-1})$, and the rate of convergence of $\bar{x}_{N,t}$ is determined by the strength of the factor as given by $N^{-2}\psi_n^2$. Using (2.43) we have (recall that $\varrho(\mathbf{W}_{22}) \le 1$)

$$ N^{-1}\psi_n = N^{-1}\boldsymbol{\tau}_n'\mathbf{S}_{22}^{-1}\boldsymbol{\gamma}_2 = N^{-1}\left(\boldsymbol{\tau}_n'\boldsymbol{\gamma}_2 + \rho\boldsymbol{\tau}_n'\mathbf{W}_{22}\boldsymbol{\gamma}_2 + \rho^2\boldsymbol{\tau}_n'\mathbf{W}_{22}^2\boldsymbol{\gamma}_2 + \cdots\right). $$

By a similar line of reasoning as before, it is then easily seen that $N^{-1}\psi_n = \Theta(N^{\delta_\gamma - 1})$, where $\delta_\gamma$ (already defined by (2.2)) is the cross-section exponent of the factor loadings, $\gamma_i$, and measures the degree to which the common factor is pervasive in its effects on sector-specific productivity.

Finally, suppose that the production network is subject to a common factor and also contains a dominant unit. Then for $\delta_1 > 1/2$ and $\delta_\gamma > 1/2$ we have[8]

$$ \mathrm{Var}(\bar{x}_{N,t}) = \Theta(N^{2\delta_1 - 2}) + \Theta(N^{2\delta_\gamma - 2}) + \Theta(N^{-1}). \tag{2.56} $$

It is clear that the relative importance of the dominant unit and the common factor depends on the relative magnitudes of $\delta_1$ and $\delta_\gamma$. Estimates of these exponents are needed for a fuller understanding of the relative importance of macro and micro shocks in business cycle analysis.
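Equation (2.56) implies that the leading rate of $\mathrm{Var}(\bar{x}_{N,t})$ is $N$ raised to the largest of $2\delta_1 - 2$, $2\delta_\gamma - 2$ and $-1$. The small helper below is a hypothetical bookkeeping device, not part of the original analysis: it returns that exponent and labels which source of shocks dominates, which makes the comparison of $\delta_1$ and $\delta_\gamma$ explicit.

```python
def leading_variance_exponent(delta_1: float, delta_gamma: float) -> tuple[float, str]:
    """Exponent e with Var(xbar_N) = Theta(N^e) according to (2.56), and the dominant source."""
    candidates = {
        "dominant unit": 2 * delta_1 - 2,
        "common factor": 2 * delta_gamma - 2,
        "idiosyncratic": -1.0,
    }
    source = max(candidates, key=candidates.get)
    return candidates[source], source


print(leading_variance_exponent(0.9, 0.7))   # dominant unit leads, exponent about -0.2
print(leading_variance_exponent(0.6, 0.8))   # common factor leads, exponent about -0.4
print(leading_variance_exponent(0.4, 0.3))   # both exponents below 1/2: idiosyncratic term, -1
```

The last case mirrors footnote 8: when both exponents are below 1/2, the $\Theta(N^{-1})$ term dominates.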
Allowing for multiple factors and multiple dominant units does not alter the main results, and the general expression in (2.56) continues to apply, with the difference that $\delta_\gamma$ and $\delta_1$ in such a general setting refer, respectively, to the maximum of the exponents of the factors and to $\delta_{\max} = \max(\delta_1, \delta_2, \ldots, \delta_N)$.

2.6 $\delta_{\max}$ and $\beta$ measures of network pervasiveness

The degree of network pervasiveness, $\delta_{\max}$, defined in Definition 2.2, is related to $\beta$, the shape parameter of the power law assumed by Acemoglu et al. (2012, Definition 2) for the outdegree sequence $\{d_1, d_2, \ldots, d_N\}$. To see this, we first use the specification of the outdegrees given by (2.25) in (2.8) to obtain

$$ \boldsymbol{\upsilon}_N'\boldsymbol{\upsilon}_N \ge k_{\alpha,0}N^{-1} + k_{\alpha,1}N^{-2}\sum_{j=1}^{m}\kappa_j^2 N^{2\delta_j} + k_{\alpha,1}\frac{N-m}{N^2}\left[\frac{\sum_{j=m+1}^{N}\kappa_j^2}{N-m}\right], $$

where $(N-m)^{-1}\sum_{j=m+1}^{N}\kappa_j^2 = O(1)$. Also, recall that $m$ must be finite if $\{\delta_j\}$ is to be summable (Proposition 2), and therefore

$$ N^{-2}\sum_{j=1}^{m}\kappa_j^2 N^{2\delta_j} \le m\,\kappa_{\max}^2 N^{2(\delta_{\max} - 1)}, $$

where, as before, $\kappa_{\max} = \max(\kappa_1, \kappa_2, \ldots, \kappa_N)$. The limiting behavior of $\boldsymbol{\upsilon}_N'\boldsymbol{\upsilon}_N$ is then determined by $N^{2(\delta_{\max} - 1)}$, namely by the cross-section exponent of the strongest of the dominant units, $\delta_{\max}$. For the production network to affect macroeconomic fluctuations we need $\delta_{\max} > 1/2$. But to make sure that $\boldsymbol{\upsilon}_N'\boldsymbol{\upsilon}_N$ does not converge to zero as $N \to \infty$, as noted earlier, a much stronger condition, namely $\delta_{\max} = 1$, is required. It is clear that the same result follows if $\mathbf{W}$ has one strongly dominant unit with $\delta = 1$. The remaining dominant units either behave similarly or are dominated by the leading dominant unit, with exponent $\delta_{\max}$.

Consider now Corollary 1 of Acemoglu et al. (2012), where it is established that aggregate volatility behaves asymptotically as $N^{-2(\beta - 1)/\beta - 2\epsilon_\beta}$, for some small $\epsilon_\beta > 0$ and $\beta \in (1, 2)$.

[8] Note that for values of $\delta_1$ and $\delta_\gamma$ below 1/2, the third term in (2.56) dominates the network and the common factor effects.
Matching this rate of expansion with $N^{2(\delta_{\max} - 1)}$, we have $2(\delta_{\max} - 1) \ge -2(\beta - 1)/\beta - 2\epsilon_\beta$, or $\delta_{\max} \ge 1/\beta - \epsilon_\beta$. Therefore, $\delta_{\max}$ can be viewed as measuring the inverse of $\beta$, a result that we formally establish below. It is also easily seen that $\delta_{\max} \ge 1/\varsigma - \epsilon_\varsigma$, where $\epsilon_\varsigma > 0$ and $\varsigma$ is the shape parameter of the power law assumed by Acemoglu et al. (2012, Corollary 2) for the second-order degree sequences (based on the second-order interactions in the network). Other shape parameters are also considered by Acemoglu et al. (2012) for higher-order degree sequences. But these shape parameters are closely related, and are not needed since, as shown above, it is possible to derive an exact rate for the asymptotic behavior of $\boldsymbol{\upsilon}_N'\boldsymbol{\upsilon}_N$.

2.7 Estimation and inference

In this section we consider the problem of estimating the degree of dominance of units in a given network. We consider the power law approach employed in the literature as well as a new method that we propose when the outdegrees, $\{d_1, d_2, \ldots, d_N\}$, follow the exponent specification defined by (2.37). It is unclear whether a power law specification for the outdegrees (above a given cut-off value) is necessarily to be preferred to a specification that relates the outdegrees directly to the size of the network, $N$, without the need to specify a cut-off value. The exponent specification of outdegrees has the added advantage that it also allows identification of more than one dominant unit in the network.

2.7.1 Power law estimators

Suppose that we have observations on the outdegrees, $d_{it}$, for $i = 1, 2, \ldots, N$ and $t = 1, 2, \ldots, T$. The power law estimate of $\delta_{\max}$ is given by $1/\hat{\beta}$, where $\hat{\beta}$ is an estimator of the shape parameter of the power law distribution fitted to the outdegrees that lie above a given minimum cut-off value, $d_{\min}$ (see Section 2.6).
A random variable $D$ is said to follow a power law distribution if its complementary cumulative distribution function (CCDF) satisfies

$$ \Pr(D \ge d) \propto d^{-\beta}, \tag{2.57} $$

where $\beta > 0$ is a constant known as the shape parameter of the power law, and $\propto$ denotes asymptotic equivalence.[9] As the name suggests, the tail of the power law distribution decays asymptotically at the power of $\beta$. It is readily seen that the probability density function of $D$ follows $f_D(d) \propto d^{-(\beta+1)}$. A popular special case of power laws is the Pareto distribution. Its CCDF is given by

$$ \Pr(D \ge d) = (d/d_{\min})^{-\beta}, \quad d \ge d_{\min}, \tag{2.58} $$

for some shape parameter $\beta > 0$ and lower bound $d_{\min} > 0$. The Pareto distribution has been widely used to study heavy-tailed phenomena in many fields, including economics, finance, geology and physics, to name just a few. Since our focus is on the estimation of the shape parameter $\beta$, in what follows we briefly describe three approaches that are frequently used in the literature.[10]

The first is to run the following log-log regression (also known as the Zipf regression)

$$ \ln i = a - \beta\ln d_{(i)}, \quad i = 1, 2, \ldots, N_{\min}, \tag{2.59} $$

where $a$ is a constant, $i$ is the rank of unit $i$ in the sequence $\{d_{(i)}\}$, and $d_{\max} = d_{(1)} \ge d_{(2)} \ge \cdots \ge d_{(N_{\min})}$ are the largest ordered outdegrees such that $d_{(N_{\min})} \ge d_{\min}$, with $N_{\min}$ the number of cut-off observations used in the regression. A bias-corrected version of the log-log estimator of $\beta$ is proposed by Gabaix and Ibragimov (2011), who suggest shifting the rank $i$ by 1/2 and estimating $\beta$ by Ordinary Least Squares (OLS) using the regression

$$ \ln(i - 1/2) = a - \beta\ln d_{(i)}, \quad i = 1, 2, \ldots, N_{\min}. \tag{2.60} $$

In what follows we consider this log-log estimator, refer to it as the Gabaix-Ibragimov (GI) estimator, and denote it by $\hat{\beta}_{GI}$. The standard error of $\hat{\beta}_{GI}$ is estimated by $\hat{\sigma}(\hat{\beta}_{GI}) = \sqrt{2/N_{\min}}\,\hat{\beta}_{GI}$.
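The GI regression (2.60), together with the Hill/ML estimator (2.61) and a simplified version of the CSN choice of $d_{\min}$ described next, can be sketched in NumPy as follows. This is an illustrative reimplementation under our own simplifying assumptions (candidate cut-offs are restricted to observed values that leave at least 50 tail observations), not the CSN code cited in footnote 12; the function names and the simulated data (a uniform body below $d_{\min} = 5$ and an exact Pareto tail with $\beta = 1.3$ above it) are hypothetical.

```python
import numpy as np


def upper_tail(d, d_min):
    """Outdegrees d >= d_min in descending order: d_(1) >= ... >= d_(N_min)."""
    d = np.asarray(d, dtype=float)
    return np.sort(d[d >= d_min])[::-1]


def gabaix_ibragimov(d, d_min):
    """GI estimator (2.60): OLS of ln(i - 1/2) on ln d_(i); returns (beta_hat, standard error)."""
    tail = upper_tail(d, d_min)
    n_min = tail.size
    X = np.column_stack([np.ones(n_min), np.log(tail)])
    coef, *_ = np.linalg.lstsq(X, np.log(np.arange(1, n_min + 1) - 0.5), rcond=None)
    beta_hat = -coef[1]
    return beta_hat, np.sqrt(2.0 / n_min) * beta_hat


def hill_mle(d, d_min):
    """Hill/ML estimator (2.61); returns (beta_hat, standard error)."""
    tail = upper_tail(d, d_min)
    n_min = tail.size
    beta_hat = n_min / (np.log(tail).sum() - n_min * np.log(tail[-1]))
    return beta_hat, beta_hat / np.sqrt(n_min)


def ks_distance(d, d_min, beta_hat):
    """T_KS = max |S(d) - F(d)| over the tail, F being the Pareto CDF fitted above d_(N_min)."""
    x = upper_tail(d, d_min)[::-1]                        # ascending order
    n = x.size
    F = 1.0 - (x / x[0]) ** (-beta_hat)
    return max(np.max(np.abs(np.arange(1, n + 1) / n - F)),
               np.max(np.abs(np.arange(n) / n - F)))


def csn_style(d, min_tail=50):
    """Simplified CSN: choose d_min among observed values by minimizing T_KS, then apply (2.61)."""
    best = None
    for c in np.unique(d)[:-min_tail]:                    # keep at least min_tail points in the tail
        beta_hat, _ = hill_mle(d, c)
        dist = ks_distance(d, c, beta_hat)
        if best is None or dist < best[0]:
            best = (dist, c, beta_hat)
    return best[1], best[2]                               # (estimated d_min, beta_CSN)


# Hypothetical data: uniform body below 5, exact Pareto tail (beta = 1.3) above it.
rng = np.random.default_rng(7)
beta_true, d_min_true = 1.3, 5.0
d = np.concatenate([rng.uniform(0.5, d_min_true, size=1500),
                    d_min_true * (1.0 - rng.uniform(size=1500)) ** (-1.0 / beta_true)])
print(gabaix_ibragimov(d, d_min_true))
print(hill_mle(d, d_min_true))
print(csn_style(d))
```

Scanning every observed value is quadratic in the sample size; it is adequate for a few thousand observations but would need a coarser candidate grid for very large $N$.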
[9] More generally, power law distributions take the form $\Pr(D \ge d) \propto L(d)\,d^{-\beta}$, where $L(d)$ is some slowly varying function, which satisfies $\lim_{d \to \infty} L(rd)/L(d) = 1$ for any $r > 0$.

[10] More sophisticated techniques are reviewed in, among others, Beirlant et al. (2006), and are beyond the scope of this paper.

Another often-used estimator of $\beta$ is the maximum likelihood estimator (MLE), denoted by $\hat{\beta}_{MLE}$, which is also the well-known Hill estimator (Hill, 1975). It can be easily verified that[11]

$$ \hat{\beta}_{MLE} = \frac{N_{\min}}{\sum_{i=1}^{N_{\min}}\ln d_{(i)} - N_{\min}\ln d_{(N_{\min})}}, \tag{2.61} $$

and its standard error is given by $\hat{\sigma}(\hat{\beta}_{MLE}) = \hat{\beta}_{MLE}/\sqrt{N_{\min}}$. The ML estimator is most efficient if $d_{\min}$ is known and the underlying distribution above the cut-off point is Pareto. Finally, some researchers, notably Clauset, Shalizi and Newman (2009, CSN), propose joint estimation of $\beta$ and $d_{\min}$, and recommend estimating $d_{\min}$ by minimizing the Kolmogorov-Smirnov (KS) statistic, which is the maximum distance between the empirical cumulative distribution function (CDF) of the sample, $S(d)$, and the CDF of the reference distribution, $F(d)$, namely $T_{KS} = \max_{d \ge d_{\min}}|S(d) - F(d)|$. Here $F(d)$ is the CDF of the Pareto distribution that best fits the data for $d \ge d_{\min}$. The MLE in (2.61) is then computed using the estimated value of $d_{\min}$. Hereafter, we refer to this estimator as the feasible maximum likelihood estimator and denote it by $\hat{\beta}_{CSN}$.[12]

In the subsequent analysis, we examine how the inverse of $\beta$, estimated by the three procedures discussed above, behaves as an estimator of $\delta_{\max}$, and how these estimators compare with the extremum estimators that we now consider.

2.7.2 Extremum estimators

We now propose an extremum estimator of $\delta_{\max}$, which can also be used to estimate $\beta$ and will be shown to be more generally applicable than the power law estimators of $\beta$. Our estimator is motivated by the exponent specification of outdegrees given by (2.25).
[11] See, for example, Appendix B of Newman (2005).

[12] The code implementing this method can be downloaded from http://tuvalu.santafe.edu/~aaronc/powerlaws/.

Pure cross sections

In line with the literature on the estimation of $\beta$, we begin with the case where only a single set of observations on the outdegrees, $\{d_i\}$, is available, but instead of the power law specification we assume that $d_i$, $i = 1, 2, \ldots, N$, are generated according to the following exponent specification:

$$ d_i = \kappa N^{\delta_i}\exp(\upsilon_i), \quad i = 1, 2, \ldots, N, \tag{2.62} $$

where $\kappa > 0$, $\upsilon_i \sim IID(0, \sigma_\upsilon^2)$, and $\upsilon_i$ is distributed independently of $\delta_i$. Furthermore, given the constraint (2.26), we must have

$$ \kappa\sum_{i=1}^{N} N^{\delta_i}\exp(\upsilon_i) = N. \tag{2.63} $$

Taking the log transformation of (2.62) we have $\ln d_i = \ln\kappa + \delta_i\ln N + \upsilon_i$, for $i = 1, 2, \ldots, N$. Averaging across $i$ now yields

$$ N^{-1}\sum_{i=1}^{N}\ln d_i = \ln\kappa + \left(N^{-1}\sum_{i=1}^{N}\delta_i\right)\ln N + N^{-1}\sum_{i=1}^{N}\upsilon_i. \tag{2.64} $$

But from the summability condition (2.33), it follows that

$$ \left(N^{-1}\sum_{i=1}^{N}\delta_i\right)\ln N \le \frac{K\ln N}{N}, $$

and hence

$$ \left(N^{-1}\sum_{i=1}^{N}\delta_i\right)\ln N \to 0, \quad \text{as } N \to \infty. \tag{2.65} $$

Since by assumption the $\upsilon_i$ are IID over $i$, the last term of (2.64) also tends to zero, and $\ln\kappa$ can be estimated by

$$ \widehat{\ln\kappa} = N^{-1}\sum_{i=1}^{N}\ln d_i. \tag{2.66} $$

An extremum estimator of $\delta_{\max}$ now emerges as

$$ \hat{\delta}_{\max} = \frac{\ln d_{\max} - N^{-1}\sum_{i=1}^{N}\ln d_i}{\ln N} = \frac{N^{-1}\sum_{i=1}^{N}\ln(d_{\max}/d_i)}{\ln N}, \tag{2.67} $$

where $d_{\max}$ is the largest value of $d_i > 0$. As we shall see, $\hat{\delta}_{\max}$ is a consistent estimator of the degree of pervasiveness of the most dominant unit in the network under fairly general specifications of the outdegrees, including the exponent specification given by (2.62). The exponent specification has the advantage that it is closely related to (2.25), in that $\kappa_i = \kappa\exp(\upsilon_i) > 0$, and it is in line with the production network model derived from a set of underlying economic relations. Nonetheless, in practice it is difficult to know whether the true data generating process follows the exponent or a power law specification.
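A direct implementation of (2.66) and (2.67) needs only the logged outdegrees. The sketch below simulates a hypothetical cross section from the exponent specification (2.62) with one strongly dominant unit ($\delta_1 = 1$, all other $\delta_i = 0$), Gaussian $\upsilon_i$ with $\sigma_\upsilon = 0.5$, and $\kappa$ chosen so that the adding-up constraint (2.63) holds exactly; all of these design choices are illustrative assumptions.

```python
import numpy as np


def extremum_delta_max(d):
    """Cross-section extremum estimator (2.67): (ln d_max - N^{-1} sum_i ln d_i) / ln N."""
    log_d = np.log(np.asarray(d, dtype=float))
    return (log_d.max() - log_d.mean()) / np.log(log_d.size)


rng = np.random.default_rng(3)
N = 200_000
delta = np.zeros(N)
delta[0] = 1.0                                            # one strongly dominant unit
d_raw = N**delta * np.exp(rng.normal(0.0, 0.5, size=N))   # sigma_upsilon = 0.5 (assumed)
d = N * d_raw / d_raw.sum()                               # kappa chosen so that (2.63) holds
ln_kappa = np.log(N / d_raw.sum())                        # true ln(kappa) in this simulation
ln_kappa_hat = np.log(d).mean()                           # (2.66)
print(ln_kappa, ln_kappa_hat, extremum_delta_max(d))      # last value should be close to 1
```

Because $\kappa$ cancels in (2.67), the estimate of $\delta_{\max}$ does not depend on how $\kappa$ is normalized; the error is of order $\sigma_\upsilon/\ln N$, which is why convergence is slow.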
But it turns out that $1/\hat{\delta}_{\max}$ is a consistent estimator of $\beta$, the shape parameter of the Pareto distribution, even when the outdegrees are Pareto distributed. To see this, suppose that the observations on the outdegrees, $d_i$, for $i = 1, 2, \ldots, N$, are independent draws from the following mixed-Pareto distribution

$$ f(d_i) \propto \begin{cases} d_i^{-1-\beta}, & \text{if } d_i \ge d_{\min}, \\ \psi(d_i), & \text{if } d_i < d_{\min}, \end{cases} \tag{2.68} $$

where $d_i > 0$ follows a Pareto distribution with shape parameter $\beta$ for values of $d_i$ above $d_{\min}$, and an arbitrary non-Pareto distribution, $\psi(d_i)$, for values of $d_i$ below $d_{\min}$. The constants of proportionality for both branches of the distribution are set to ensure that $\int_0^\infty f(x)\,dx = 1$, and that a given non-zero proportion of the observations falls above $d_{\min}$. Using (2.67), the extremum estimator, $\hat{\delta}_{\max}$, can be rewritten as

$$ \hat{\delta}_{\max} = \frac{z_{\max} - N^{-1}\sum_{i=1}^{N} z_i}{\ln N}, \tag{2.69} $$

where $z_i = \ln(d_i/d_{\min})$ for all $i$, and $z_{(i)} = \ln(d_{(i)}/d_{\min})$, with $d_{(i)}$ being the $i$th largest value of $d_i$ as before. Since $d_{\min}$ is a given constant and by assumption the $d_i$ are independently distributed, it follows that for $z_i \ge 0$ the $z_i$ are independent draws from an exponential distribution with parameter $\beta$, namely

$$ f_Z(z) = \beta e^{-\beta z}, \quad z \ge 0, $$

with $E(z \mid z \ge 0) = 1/\beta$ and $\mathrm{Var}(z \mid z \ge 0) = 1/\beta^2$, for $\beta > 0$.[13] We also assume that $E(z_i \mid z_i < 0)$ exists, which is a mild moment condition imposed on $\psi(d_i)$ for $\ln(d_i/d_{\min}) < 0$. The following proposition summarizes the consistency property of $\hat{\delta}_{\max}$ as an estimator of $1/\beta$.

Proposition 3. Suppose that $d_i$, for $i = 1, 2, \ldots, N$, are independent draws from the Pareto tail distribution given by (2.68) with shape parameter $\beta > 0$, and assume that $z_i = \ln(d_i/d_{\min})$ has finite second-order moments for all values of $z_i < 0$. It then follows that

$$ \lim_{N \to \infty} E(\hat{\delta}_{\max}) = 1/\beta, \quad \text{and} \quad \mathrm{Var}(\hat{\delta}_{\max}) = O\left[\frac{1}{(\ln N)^2}\right], \tag{2.70} $$

where $\hat{\delta}_{\max}$ is defined by (2.69). A proof is provided in Appendix B.3.

Remark 2.4. The convergence of $\hat{\delta}_{\max}$ to $\delta = 1/\beta$ is at the rate of $1/\ln N$, which is rather slow.
But it is obtained without making any assumptions about $d_{\min}$ and/or the shape of $\psi(d)$, the non-Pareto part of the distribution.

Short T panels

Suppose now that observations on outdegrees, $d_{it}$, are available across $t = 1, 2, \ldots, T$ and are generated as before, namely

$$ d_{it} = \kappa N^{\delta_i}\exp(\upsilon_{it}), \quad i = 1, 2, \ldots, N;\ t = 1, 2, \ldots, T, \tag{2.71} $$

where $T$ is finite ($T > 1$) and $N$ is large, $\kappa > 0$ and $\delta_i \ge 0$ are fixed constants, and $\upsilon_{it} \sim IID(0, \sigma_\upsilon^2)$ over $i$ and $t$.[14] Noting that (2.63) must hold and using a similar line of reasoning as in the pure cross-section case, we obtain the following "panel extremum estimator"

$$ \hat{\delta}_{\max} = \frac{T^{-1}\sum_{t=1}^{T}\ln d_{\max,t} - (TN)^{-1}\sum_{t=1}^{T}\sum_{j=1}^{N}\ln d_{jt}}{\ln N}, \tag{2.72} $$

where $d_{\max,t}$ is the largest value of $d_{it}$ in period $t$. Similarly, for the other units we have

$$ \hat{\delta}_i = \frac{T^{-1}\sum_{t=1}^{T}\ln d_{it} - (TN)^{-1}\sum_{t=1}^{T}\sum_{j=1}^{N}\ln d_{jt}}{\ln N}, \tag{2.73} $$

where $\hat{\delta}_i$ is the estimator of $\delta_i$. Recall that only estimates $\hat{\delta}_i > 1/2$ should be considered, since any unit with an estimate of $\delta_i$ below 1/2 will not have any network effects (see Remark 2.3). We denote the degree of dominance of unit $i$ by $\delta_i$, and the associated ordered values by $\delta_{(i)}$, where $\delta_{\max} = \delta_{(1)} \ge \delta_{(2)} \ge \cdots \ge \delta_{(N)}$. Then the second largest $\hat{\delta}_i$, denoted by $\hat{\delta}_{(2)}$, is a consistent estimator of $\delta_{(2)}$; the third largest, denoted by $\hat{\delta}_{(3)}$, is a consistent estimator of $\delta_{(3)}$; and so on. We refer to $\hat{\delta}_{(i)}$, for $i = 1, 2, \ldots, m$, as the extrema estimators, with $\hat{\delta}_{(m)} > 1/2$.

[13] It is worth noting that $z$ has moments even if $\beta \le 1$, although the Pareto distribution has a finite mean only for $\beta > 1$.

[14] This assumption can be relaxed to allow for both heteroskedasticity and serial correlation if $T$ is sufficiently large. This extension is considered in an Online Supplement.
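The panel estimators (2.72) and (2.73) amount to averaging logged outdegrees over time for each unit and subtracting the grand mean. A minimal sketch, assuming a balanced $T \times N$ NumPy array of positive outdegrees (rows are periods, columns are units); the helper names and the simulated design (one strongly and one weakly dominant unit at hypothetical positions 10 and 20) are illustrative only.

```python
import numpy as np


def panel_delta_hats(d):
    """delta_hat_i of (2.73) for every unit, from a balanced T x N array of outdegrees."""
    log_d = np.log(np.asarray(d, dtype=float))
    N = log_d.shape[1]
    unit_means = log_d.mean(axis=0)          # T^{-1} sum_t ln d_it
    grand_mean = log_d.mean()                # (TN)^{-1} sum_t sum_j ln d_jt
    return (unit_means - grand_mean) / np.log(N)


def panel_delta_max(d):
    """Panel extremum estimator (2.72), based on the period-by-period maxima d_{max,t}."""
    log_d = np.log(np.asarray(d, dtype=float))
    N = log_d.shape[1]
    return (log_d.max(axis=1).mean() - log_d.mean()) / np.log(N)


def extrema(d, k=5):
    """Unit indices and values of the k largest estimates, delta_hat_(1) >= ... >= delta_hat_(k)."""
    dh = panel_delta_hats(d)
    order = np.argsort(dh)[::-1][:k]
    return order, dh[order]


rng = np.random.default_rng(5)
T, N = 6, 1000
delta = np.zeros(N)
delta[[10, 20]] = [1.0, 0.75]                # hypothetical strongly and weakly dominant units
d = np.exp(np.log(N) * delta + rng.standard_normal(size=(T, N)))   # kappa cancels in (2.72)-(2.73)
units, estimates = extrema(d, k=3)
print(units, np.round(estimates, 3), panel_delta_max(d))
```

Since the average of period-by-period maxima is never smaller than the maximum of the time averages, (2.72) is always at least as large as the largest $\hat{\delta}_i$ from (2.73); the two coincide when the same unit is the largest in every period.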
To investigate the asymptotic properties of the extrema estimators, we first note that under (2.71), $\widehat{\ln\kappa}$ and $\hat{\delta}_i$ can be written as

$$ \widehat{\ln\kappa} - \ln\kappa = \bar{\delta}\ln N + \bar{\upsilon}, \tag{2.74} $$

$$ \hat{\delta}_i - \delta_i = -\bar{\delta} + \frac{\bar{\upsilon}_i - \bar{\upsilon}}{\ln N}, \tag{2.75} $$

where $\widehat{\ln\kappa} = (TN)^{-1}\sum_{t=1}^{T}\sum_{i=1}^{N}\ln d_{it}$, $\bar{\delta} = N^{-1}\sum_{i=1}^{N}\delta_i$, $\bar{\upsilon}_i = T^{-1}\sum_{t=1}^{T}\upsilon_{it}$, and $\bar{\upsilon} = N^{-1}\sum_{i=1}^{N}\bar{\upsilon}_i$. It is now easily seen that

$$ \mathrm{Cov}(\hat{\delta}_i, \hat{\delta}_j) = -\frac{\sigma_\upsilon^2}{(\ln N)^2 N T}, \ \text{for all } i \neq j, \qquad \mathrm{Var}(\hat{\delta}_i) = \frac{\sigma_\upsilon^2}{(\ln N)^2 T}\left(1 - \frac{1}{N}\right). \tag{2.76} $$

Clearly, for any given $i$, $\mathrm{Var}(\hat{\delta}_i) \to 0$ for a fixed $T$ as $N \to \infty$, and the rate of convergence of $\mathrm{Var}(\hat{\delta}_i)$ is given by $(\ln N)^{-2}$. Also, it is already established that $\bar{\delta} \to 0$ as $N \to \infty$, and as a result $\hat{\delta}_i \to_p \delta_i$, which establishes that for any finite $T \ge 1$, $\hat{\delta}_i$ is a $\ln N$-consistent estimator of $\delta_i$. The estimators of $\delta$ for two different units $(i, j)$ are asymptotically independent, and their covariance converges to zero at the faster rate of $(\ln N)^{-2}N^{-1}$. Then, for any given $i$ we have (applying a standard central limit theorem to $\bar{\upsilon}_i - \bar{\upsilon}$)

$$ \frac{\hat{\delta}_i - \delta_i + \bar{\delta}}{\left[\frac{\sigma_\upsilon^2}{(\ln N)^2 T}\left(1 - \frac{1}{N}\right)\right]^{1/2}} \to_d N(0, 1), \quad \text{as } N \to \infty. $$

Ignoring lower-order terms in $N$, we now obtain

$$ \frac{(\ln N)\sqrt{T}\left(\hat{\delta}_i - \delta_i + \bar{\delta}\right)}{\sigma_\upsilon} \to_d N(0, 1), \quad \text{as } N \to \infty. $$

To ensure that the above statistic does not depend on the nuisance parameter $\bar{\delta}$, we need

$$ \bar{\delta}(\ln N)\sqrt{T} = \left(\sum_{i=1}^{N}\delta_i\right)\frac{(\ln N)\sqrt{T}}{N} \to 0, \quad \text{as } N \to \infty. \tag{2.77} $$

But given the summability condition (2.33), it is clear that this condition is met if $T$ is fixed as $N \to \infty$. To obtain a consistent estimator of $\sigma_\upsilon^2$, note that

$$ \hat{\upsilon}_{it} = \ln d_{it} - \widehat{\ln\kappa} - \hat{\delta}_i\ln N = -\left(\widehat{\ln\kappa} - \ln\kappa\right) - \left(\hat{\delta}_i - \delta_i\right)\ln N + \upsilon_{it}. \tag{2.78} $$

Now, using (2.74) and (2.75), $\hat{\upsilon}_{it} = \upsilon_{it} - \bar{\upsilon}_i$, which does not depend on $\bar{\delta}$. In view of this result, $\sigma_\upsilon^2$ can be consistently estimated by (for $T > 1$)

$$ \hat{\sigma}_\upsilon^2 = \frac{\sum_{i=1}^{N}\sum_{t=1}^{T}\hat{\upsilon}_{it}^2}{N(T-1)}. \tag{2.79} $$

A test of the null hypothesis that $\delta_{\max} = \delta_{\max}^0$, where $\delta_{\max}^0 > 1/2$, can be based on the statistic

$$ D_{\max} = \frac{(\ln N)\left(\hat{\delta}_{\max} - \delta_{\max}^0\right)}{\hat{\sigma}_\upsilon\left(\frac{1}{T} - \frac{1}{TN}\right)^{1/2}}. \tag{2.80} $$

It then follows that $D_{\max} \to_d N(0, 1)$ as $N \to \infty$, for a fixed $T > 1$.

Unbalanced panels

In empirical applications, production networks observed at different points in time might not have the same units in common. As a result, we are often faced with unbalanced panel data sets. One approach would be to employ a sufficiently high level of aggregation so that we end up with a balanced panel. But this procedure is likely to be inefficient, as we end up with a smaller number of units in the network. Here we consider estimating $\delta_i$ with the unbalanced panel, without any aggregation. We suppose that for each unit $i$ we have observations on its outdegrees for at least two time periods. We denote the unbalanced panel of observations on the outdegrees by $d_{it}$, for $t = T_i^0, T_i^0 + 1, \ldots, T_i^1$ ($T_i^1 \ge T_i^0$), and $i = 1, 2, \ldots, N$. Then, using a similar line of reasoning as above, we have

$$ \hat{\delta}_i = \frac{T_i^{-1}\sum_{t=T_i^0}^{T_i^1}\ln d_{it} - N^{-1}\sum_{j=1}^{N}T_j^{-1}\sum_{t=T_j^0}^{T_j^1}\ln d_{jt}}{\ln N}, \tag{2.81} $$

where $T_i = T_i^1 - T_i^0 + 1$, and

$$ \mathrm{Var}(\hat{\delta}_i) = \frac{\sigma_\upsilon^2}{(\ln N)^2}\left(\frac{1}{T_i} - \frac{1}{N T_i}\right). \tag{2.82} $$

The estimator of the degree of dominance of the most dominant unit is given by $\hat{\delta}_{\max}$, and, as in the balanced panel case, its asymptotic distribution is characterized by the statistic

$$ D_{\max} = \frac{(\ln N)\left(\hat{\delta}_{\max} - \delta_{\max}^0\right)}{\hat{\sigma}_\upsilon\left(\frac{1}{T_{\max}} - \frac{1}{N T_{\max}}\right)^{1/2}}, \tag{2.83} $$

where $T_{\max}$ refers to the sample size of the most dominant unit, and

$$ \hat{\sigma}_\upsilon^2 = \frac{\sum_{i=1}^{N}(T_i - 1)^{-1}\sum_{t=T_i^0}^{T_i^1}\hat{\upsilon}_{it}^2}{N}. \tag{2.84} $$

The statistic $D_{\max}$ is well defined if $T_{\max} > 1$.

Comparison of power law and extremum estimators

Compared with the power law estimators, the extremum estimator has several advantages. First, it does not require knowing the true value of $d_{\min}$, whereas the estimates of the shape parameter may be highly sensitive to the choice of the cut-off value. Second, although procedures such as the feasible MLE proposed by Clauset et al.
(2009) estimate $d_{\min}$ jointly with $\beta$, such estimates assume that the true distributions below and above $d_{\min}$ are known, whereas the extremum estimator is robust to any distributional assumptions below $d_{\min}$, so long as $\ln(d_i/d_{\min})$ has second-order moments. Granted, it may not be as efficient as the MLE if the true distribution is indeed Pareto, but one does not need to make such strong assumptions on the entire distribution. Third, the extremum type estimators can identify the dominant units besides the most dominant one, and estimate the degree of pervasiveness of each of the dominant units separately.

2.8 Monte Carlo experiments

In this section we investigate the small sample properties of the proposed extremum estimator for balanced panels using Monte Carlo techniques, and compare its performance with that of the power law method.[15]

We consider two types of data generating processes (DGPs) for the outdegrees ($d_{it}$): an exponent specification and a power law specification. The DGP for the exponent specification is given by

$$ \ln d_{it} = \ln\kappa + \delta_i\ln N + \upsilon_{it}, \quad i = 1, 2, \ldots, N;\ t = 1, 2, \ldots, T, \tag{2.85} $$

where $\upsilon_{it} \sim IIDN(0, 1)$, with $\kappa$ set as

$$ \kappa = \frac{\exp(-1/2)}{N^{-1}\sum_{i=1}^{N}N^{\delta_i}} > 0, \tag{2.86} $$

to ensure that the $d_{it}$ add up to $N$ across $i$ for each $t$ (in expectation, since $E[\exp(\upsilon_{it})] = \exp(1/2)$).

For the power law model we closely follow Clauset et al. (2009), and initially generate $y_{it}$ as random draws from the following mixture distribution, which obeys an exact Pareto distribution above $y_{\min,t}$ and an exponential distribution below $y_{\min,t}$:

$$ f(y_{it}) = \begin{cases} C_t\,(y_{it}/y_{\min,t})^{-(\beta+1)}, & \text{for } y_{it} \ge y_{\min,t}, \\ C_t\,e^{-(\beta+1)(y_{it}/y_{\min,t} - 1)}, & \text{for } y_{it} < y_{\min,t}, \end{cases} \tag{2.87} $$

for $i = 1, 2, \ldots, N$ and $t = 1, 2, \ldots, T$. To ensure that $f(y_{it})$ integrates to 1 over its full support, $y_{it} > 0$, we set

$$ C_t = \left[y_{\min,t}\,\frac{e^{\beta+1} - 1}{\beta + 1} + \frac{y_{\min,t}}{\beta}\right]^{-1}. \tag{2.88} $$

We then set $d_{it} = y_{it}/\bar{y}_t$ and $d_{\min,t} = y_{\min,t}/\bar{y}_t$, where $\bar{y}_t = N^{-1}\sum_{i=1}^{N}y_{it}$, which ensures that the outdegrees add up to $N$. It is worth noting that under this DGP

$$ \Pr(d_{it} \ge d_{\min,t}) = \Pr(y_{it} \ge y_{\min,t}) = \frac{1}{\beta}\left(\frac{e^{\beta+1} - 1}{\beta + 1} + \frac{1}{\beta}\right)^{-1}, \tag{2.89} $$

which is time-invariant and depends only on the value of $\beta$.[16] The inverse transform sampling method is used to generate $y_{it}$ such that its distribution satisfies (2.87). To this end, we first generate $u_{it}$ as $IIDU[0, 1]$, $i = 1, 2, \ldots, N$; $t = 1, 2, \ldots, T$, and set

$$ u_{\min,t} = \frac{C_t\,y_{\min,t}}{\beta + 1}\left(e^{\beta+1} - 1\right), \tag{2.90} $$

and then generate $y_{it}$ as

$$ y_{it} = \begin{cases} -\dfrac{y_{\min,t}}{\beta + 1}\ln\left[1 - \dfrac{(\beta + 1)\,u_{it}}{C_t\,e^{\beta+1}\,y_{\min,t}}\right], & \text{if } u_{it} < u_{\min,t}, \\[2ex] \left[\dfrac{\beta(u_{\min,t} - u_{it}) + C_t\,y_{\min,t}}{C_t\,y_{\min,t}^{\beta+1}}\right]^{-1/\beta}, & \text{if } u_{it} \ge u_{\min,t}. \end{cases} \tag{2.91} $$

We carry out two sets of experiments based on the above two DGPs:

Exponent DGP: Observations on $d_{it}$ are generated according to the exponent specification (2.85), with a finite number of dominant units and a large number of non-dominant units. Specifically:

A.1. One strongly dominant unit: $\delta_{\max} = \delta_{(1)} = 1$, with $\delta_{(i)} = 0$ for $i = 2, 3, \ldots, N$.
A.2. Two strongly dominant units: $\delta_{\max} = \delta_{(1)} = \delta_{(2)} = 1$, with $\delta_{(i)} = 0$ for $i = 3, 4, \ldots, N$.
A.3. One strongly dominant unit and one weakly dominant unit: $\delta_{\max} = \delta_{(1)} = 1$ and $\delta_{(2)} = 0.75$, with $\delta_{(i)} = 0$ for $i = 3, 4, \ldots, N$.

We consider all combinations of $N$ = 100, 300, 500 and 1,000, and $T$ = 1, 2, 6, 10 and 20, and also provide simulation results for a very large data set with $N$ = 450,000, which can arise when using inter-firm level sales data.[17] We focus on the 5 largest estimates of $\delta$, which we denote by $\hat{\delta}_{\max} = \hat{\delta}_{(1)} \ge \hat{\delta}_{(2)} \ge \cdots \ge \hat{\delta}_{(5)}$, computed according to (2.73). When $T > 1$, the variance of $\hat{\delta}_{(i)}$ is computed by (2.76), with $\sigma_\upsilon^2$ estimated by (2.79).

[15] Small sample properties of the extremum estimator for unbalanced panels are investigated in the Online Supplement.

[16] When $T > 1$, we construct the panel data assuming that all units maintain their relative dominance over time, and therefore for each $t$ we sort $d_{it}$ in descending order.
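The Exponent DGP (2.85) and (2.86) is straightforward to simulate. The sketch below generates one replication of the Experiment A.3 design ($\delta_{(1)} = 1$, $\delta_{(2)} = 0.75$, the rest zero); placing the dominant units in the first two positions and the function name are illustrative assumptions. Because $E[\exp(\upsilon_{it})] = e^{1/2}$ when $\upsilon_{it} \sim N(0,1)$, the choice of $\kappa$ in (2.86) makes $N^{-1}\sum_i d_{it}$ equal to one on average rather than exactly in every period, and the second part of the sketch checks this by averaging over many periods.

```python
import numpy as np


def exponent_dgp(delta, T, rng):
    """T x N panel of outdegrees from (2.85), with kappa set as in (2.86)."""
    delta = np.asarray(delta, dtype=float)
    N = delta.size
    kappa = np.exp(-0.5) / np.mean(float(N) ** delta)      # (2.86)
    upsilon = rng.standard_normal(size=(T, N))            # upsilon_it ~ IIDN(0, 1)
    return np.exp(np.log(kappa) + delta * np.log(N) + upsilon)


rng = np.random.default_rng(2018)
delta = np.zeros(1000)
delta[:2] = [1.0, 0.75]                                   # Experiment A.3 design
d = exponent_dgp(delta, T=6, rng=rng)                     # one replication with N = 1,000, T = 6
print(d.shape)                                            # (6, 1000)

# Adding-up holds in expectation: average N^{-1} sum_i d_it over many simulated periods.
delta_small = np.zeros(200)
delta_small[:2] = [1.0, 0.75]
big = exponent_dgp(delta_small, T=20_000, rng=rng)
print((big.sum(axis=1) / 200).mean())                     # close to 1
```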
Pareto DGP: Observations on $d_{it}$ are generated according to the mixture Pareto distribution (2.87) described above, and we consider Experiments B.1: $\beta = 1.0$, and B.2: $\beta = 1.3$. The values of $y_{\min,t}$ are set as $y_{\min,t} = y_{\min} = 15$. The sample sizes are combinations of $N$ = 100, 300, 500, 1,000 and 450,000, and $T$ = 1 and 2. We assess the performance of the Gabaix-Ibragimov estimator ($\hat{\beta}_{GI}$) given by (2.60) for different given cut-off values, $d_{\min,t}$, the maximum likelihood estimator ($\hat{\beta}_{MLE}$) given by (2.61) for different $d_{\min,t}$, and the CSN estimator ($\hat{\beta}_{CSN}$), which estimates $\beta$ jointly with the cut-off value.

As shown in Proposition 3, the inverse of the extremum estimator, $1/\hat{\delta}_{\max}$, is a consistent estimator of $\beta$, and analogously one can consider the inverse of $\hat{\beta}$ as an estimator of $\delta_{\max}$.[18] It is therefore of interest to see how the extremum estimator performs under the Pareto DGP, and conversely how the power law estimators perform under the Exponent DGP. To investigate the robustness of the alternative estimators of $\beta$ to the choice of the underlying distribution, we conduct two sets of misspecification experiments in which we compare the small sample properties of the four estimators of $\beta$, namely $\hat{\beta}_{GI}$, $\hat{\beta}_{MLE}$, $\hat{\beta}_{CSN}$ and $\hat{\beta}_{\max} = 1/\hat{\delta}_{\max}$, when the underlying DGP is Pareto, and conversely when the Exponent DGP is used. We consider the values $\beta$ = 1.0 and 1.3 under the Pareto DGP, and $\delta_{\max}$ = 1 and 1/1.3 under the Exponent DGP. We focus on small values of $T$ = 1 and 2, for all combinations of $N$ = 100, 300, 500, 1,000 and 450,000. All experiments are carried out with 2,000 replications.[19]

[17] For example, Carvalho et al. (2016) use a subset of data compiled by Tokyo Shoko Research Ltd that contains information on inter-firm transactions of around one million firms across Japan. This data set is proprietary and has not been made available to us.

[18] See also the discussion in Section 2.6 on the relationship between $\delta_{\max}$ and $\beta$.
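The inverse transform step (2.88) to (2.91) of the Pareto DGP can be sketched as below, assuming $y_{\min} = 15$ and $\beta = 1.3$ as in Experiment B.2, with $T = 1$ and $N = 100{,}000$ chosen here only for illustration. The helper names are hypothetical. The sketch compares the simulated share of draws above $y_{\min}$ with the closed form (2.89), rescales to outdegrees $d_i = y_i/\bar{y}$, and computes $\hat{\beta}_{\max} = 1/\hat{\delta}_{\max}$ from (2.67), which by Proposition 3 approaches $\beta$, although only at rate $1/\ln N$.

```python
import numpy as np


def pareto_mixture_draws(beta, y_min, size, rng):
    """Inverse-transform draws from the mixture density (2.87), using (2.88), (2.90) and (2.91)."""
    C = 1.0 / (y_min * (np.exp(beta + 1) - 1) / (beta + 1) + y_min / beta)    # (2.88)
    u_min = C * y_min / (beta + 1) * (np.exp(beta + 1) - 1)                     # (2.90)
    u = rng.uniform(size=size)
    y = np.empty_like(u)
    low = u < u_min
    # Branch below y_min (exponential-type part of (2.87)).
    y[low] = -y_min / (beta + 1) * np.log(1 - (beta + 1) * u[low] / (C * np.exp(beta + 1) * y_min))
    # Branch above y_min (exact Pareto part of (2.87)).
    y[~low] = ((beta * (u_min - u[~low]) + C * y_min) / (C * y_min ** (beta + 1))) ** (-1.0 / beta)
    return y


def share_above(beta):
    """Closed form (2.89) for Pr(y_it >= y_min,t)."""
    return (1.0 / beta) / ((np.exp(beta + 1) - 1) / (beta + 1) + 1.0 / beta)


rng = np.random.default_rng(15)
beta, y_min, N = 1.3, 15.0, 100_000               # Experiment B.2 settings with T = 1
y = pareto_mixture_draws(beta, y_min, N, rng)
d = y / y.mean()                                  # outdegrees d_i = y_i / ybar sum to N
print((y >= y_min).mean(), share_above(beta))     # simulated vs closed-form share above the cut-off

log_d = np.log(d)
delta_hat_max = (log_d.max() - log_d.mean()) / np.log(N)   # extremum estimator (2.67)
print(1.0 / delta_hat_max, beta)                  # beta_hat_max converges to beta, but only slowly
```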
[19] We also investigate the small sample properties of the extremum estimator of $\delta_{\max}$ for three other sets of experiments: (a) exponentially decaying $\delta_i$'s, (b) unbalanced panels, and (c) heteroskedastic and serially correlated errors in the case of large $N$ and large $T$ panels.

MC results

The estimation results under the Exponent DGP for Experiments A.1 to A.3 are summarized in Table 2.1, and focus on the extremum estimators of $\delta_{\max} = \delta_{(1)}$ and, when applicable, $\delta_{(2)}$. For each experiment we report bias (×100), root mean squared error (RMSE ×100), as well as size (×100) and power (×100) for the estimators under consideration. We first note that the bias and RMSE of the extremum estimators decline as $N$ and/or $T$ rises. The reduction in bias and RMSE is particularly pronounced as $T$ is increased. This is in line with the theoretical derivations, which establish that along the cross-sectional dimension the rate of convergence is of order $1/\ln N$, as compared with $T^{-1/2}$ along the time dimension. We also note that the empirical sizes of the tests based on $\hat{\delta}_{\max}$ and $\hat{\delta}_{(2)}$ are close to the 5% nominal size in most cases. There is some over-rejection in cases where $T$ is much larger than $N$, and when there is more than one dominant unit. In practice this is unlikely to be a real concern, since $N$ is typically much larger than $T$. Seen from this perspective, it is particularly satisfying to note that the extremum estimator performs well even when $N$ approaches 450,000. The slow rate of convergence along the cross-section dimension is, however, important for the power of the test. For example, in the case of Experiment A.1, the power of detecting the strongly dominant unit (against the alternative that $\delta_{\max} = 0.90$) is around 9% for $N = 100$ and $T = 2$, and rises only slowly as $N$ is increased. However, we see a significant rise in power if $T$ is increased to 6. For $T = 6$ the power rises from 17% for $N = 100$ to 89% for $N$ = 450,000, twice the values obtained for $T = 2$.
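A miniature version of the size calculation can be sketched as follows. The function computes $\hat{\sigma}_\upsilon$ and $D_{\max}$ as in (2.79) and (2.80), and also covers the unbalanced case (2.81) to (2.84) when missing periods are stored as NaN in a $T \times N$ array (an assumed storage convention); with no missing values the two sets of formulas coincide. Here $\hat{\delta}_{\max}$ is taken as the largest $\hat{\delta}_i$, as in the experiments, and the two-sided 5% rejection rule, $N = 300$, $T = 6$ and 200 replications are our own illustrative choices, far fewer than the 2,000 replications used for Table 2.1.

```python
import math
import numpy as np


def dominance_test(d, delta0_max):
    """Largest delta_hat_i, sigma_hat and the D_max statistic of (2.80)/(2.83).

    d is a T x N array of outdegrees; np.nan marks periods in which a unit is not observed
    (assumed convention). Every unit needs at least two observed periods.
    """
    log_d = np.log(np.asarray(d, dtype=float))
    N = log_d.shape[1]
    ln_N = np.log(N)
    T_i = np.sum(~np.isnan(log_d), axis=0)                  # observed periods per unit
    unit_means = np.nanmean(log_d, axis=0)                  # T_i^{-1} sum_t ln d_it
    ln_kappa_hat = unit_means.mean()                        # N^{-1} sum_j T_j^{-1} sum_t ln d_jt
    delta_hat = (unit_means - ln_kappa_hat) / ln_N          # (2.81); reduces to (2.73) if balanced

    resid = log_d - ln_kappa_hat - delta_hat * ln_N         # upsilon_hat_it as in (2.78)
    sigma2_hat = np.mean(np.nansum(resid**2, axis=0) / (T_i - 1))   # (2.84); equals (2.79) if balanced

    i_max = int(np.argmax(delta_hat))
    T_max = T_i[i_max]
    scale = math.sqrt(sigma2_hat * (1.0 / T_max - 1.0 / (N * T_max)))
    D_max = ln_N * (delta_hat[i_max] - delta0_max) / scale  # (2.80)/(2.83)
    return float(delta_hat[i_max]), math.sqrt(sigma2_hat), float(D_max)


rng = np.random.default_rng(8)
T, N, reps = 6, 300, 200
delta = np.zeros(N)
delta[0] = 1.0                                              # Experiment A.1 design, null is true
rejections = 0
for _ in range(reps):
    # kappa is omitted because it cancels in delta_hat_i and in the residuals.
    d = np.exp(np.log(N) * delta + rng.standard_normal(size=(T, N)))
    _, _, D = dominance_test(d, delta0_max=1.0)
    rejections += int(abs(D) > 1.96)                        # two-sided 5% test (our assumption)
print(rejections / reps)                                    # empirical size, expected near 0.05

d_unbal = d.copy()
d_unbal[:2, 100:200] = np.nan                               # hypothetical gaps in the first two periods
print(dominance_test(d_unbal, delta0_max=1.0))
```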
The power also rises as the number of strongly dominant units is raised from one to two.

We also consider the frequency with which the dominant units are jointly selected under the Exponent DGP, Experiments A.1 to A.3. The results, summarized in Table 2.2, show that units with δ equal to unity are almost always correctly selected, especially when T > 2. The selection frequency can be low in the case of Experiment A.3, where the network has two dominant units, one of which is weak with δ = 0.75. In such cases we need N to be quite large if T is less than 3.

Tables 2.3 and 2.4 summarize the results for the first set of misspecification experiments, where the data are generated from the Pareto tail distribution given by (2.68). For both values of β, the extremum estimator is robust to this misspecification, although it converges to the true value much more slowly than the Pareto shape estimators. This finding is in line with our theoretical results in Section 2.7.2. The extremum estimator, β̂_max = 1/δ̂_max, performs well particularly when β = 1, even when N is relatively small.
For example, under the Pareto DGP with β = 1 (Experiment B.1), for N = 300 and T = 2, β̂_max = 1.01 (0.05), while β̂_GI = 1.04 (0.19) and β̂_MLE = 1.05 (0.14), assuming a 10% cut-off value.²⁰ It is also worth noting that the Gabaix-Ibragimov estimator (β̂_GI) and the ML estimator (β̂_MLE) are quite sensitive to the choice of the cut-off value.²¹ The feasible MLE, β̂_CSN, performs better, but it is important to note that the validity of the feasible MLE procedure depends critically on how close the assumed distribution of d_it below d_min,t is to the true underlying distribution.

20 Figures in parentheses are standard errors.
21 Similar Monte Carlo evidence illustrating the truncation sensitivity problem is reported in Tables 1-4 of Gabaix and Ibragimov (2011). An interesting theoretical discussion can be found in Eeckhout (2004).

Table 2.1: Bias, RMSE, size and power of the extremum estimator for the dominant unit or units under Exponent DGP for Experiments A.1 to A.3
(Within each statistic, the five entries correspond to N = 100, 300, 500, 1,000, 450,000.)

T  | Bias (×100) | RMSE (×100) | Size (×100) | Power (×100)

Experiment A.1: One strongly dominant unit, δ_max = 1
1  | -0.90 -0.46 -0.31 -0.21 -0.06 | 20.87 17.28 15.89 14.34 7.63 | N/A N/A N/A N/A N/A | N/A N/A N/A N/A N/A
2  | -1.29 -0.55 -0.40 -0.29 -0.10 | 15.31 12.40 11.37 10.24 5.44 | 5.75 5.25 4.90 5.05 4.95 | 8.90 11.55 12.75 14.85 44.45
6  | -1.19 -0.49 -0.35 -0.23 -0.07 | 8.83 7.09 6.51 5.85 3.10 | 4.75 4.60 4.60 4.60 4.65 | 16.70 25.45 30.40 37.40 89.10
10 | -1.26 -0.55 -0.39 -0.27 -0.09 | 6.98 5.61 5.14 4.62 2.45 | 5.35 5.35 5.15 5.20 5.35 | 24.45 39.10 46.55 56.35 98.75
20 | -1.14 -0.44 -0.29 -0.19 -0.04 | 4.97 3.94 3.61 3.24 1.72 | 5.25 4.80 4.90 4.95 4.90 | 45.65 67.90 76.85 86.10 100.00

Experiment A.2: Two strongly dominant units, δ_(1) = δ_(2) = 1
δ_(1) = 1
1  | 9.89 8.91 8.39 7.72 4.20 | 20.38 16.99 15.73 14.25 7.63 | N/A N/A N/A N/A N/A | N/A N/A N/A N/A N/A
2  | 6.15 5.93 5.65 5.24 2.89 | 13.99 11.78 10.92 9.91 5.32 | 4.00 4.45 4.25 4.45 4.75 | 14.95 20.25 22.70 26.75 67.40
6  | 2.58 3.02 2.99 2.85 1.62 | 7.75 6.66 6.22 5.68 3.07 | 2.95 3.85 4.15 4.40 5.05 | 25.05 41.50 49.00 59.15 98.25
10 | 1.76 2.37 2.39 2.31 1.33 | 6.01 5.24 4.91 4.50 2.44 | 3.20 4.45 4.80 5.45 5.85 | 36.45 62.00 71.20 80.85 100.00
20 | 0.68 1.50 1.60 1.59 0.95 | 3.99 3.52 3.34 3.08 1.70 | 2.30 3.40 3.95 4.45 5.40 | 60.45 89.10 95.00 98.55 100.00
δ_(2) = 1
1  | -13.87 -10.66 -9.58 -8.48 -4.39 | 22.05 18.06 16.53 14.82 7.82 | N/A N/A N/A N/A N/A | N/A N/A N/A N/A N/A
2  | -10.66 -7.64 -6.81 -5.97 -3.06 | 16.22 12.54 11.40 10.17 5.34 | 6.45 5.15 4.80 5.15 4.60 | 1.95 1.55 1.60 2.00 20.40
6  | -7.09 -4.79 -4.18 -3.60 -1.80 | 10.24 7.67 6.92 6.14 3.19 | 8.55 6.10 5.75 5.60 5.25 | 3.10 7.05 9.20 14.15 78.40
10 | -6.02 -3.92 -3.38 -2.88 -1.42 | 8.16 5.95 5.32 4.70 2.43 | 9.40 6.05 5.15 4.70 4.15 | 3.85 14.45 21.15 32.00 97.50
20 | -4.79 -2.91 -2.45 -2.05 -0.98 | 6.19 4.32 3.82 3.34 1.71 | 11.35 6.85 6.05 5.00 4.70 | 13.90 44.05 58.05 73.80 100.00

Experiment A.3: One strongly dominant unit and one weakly dominant unit, δ_(1) = 1, δ_(2) = 0.75
δ_(1) = 1
1  | 1.68 1.30 1.11 0.83 -0.02 | 18.75 15.68 14.54 13.27 7.56 | N/A N/A N/A N/A N/A | N/A N/A N/A N/A N/A
2  | -0.73 -0.21 -0.16 -0.15 -0.10 | 14.11 11.77 10.93 9.99 5.44 | 3.60 4.00 4.05 4.30 4.95 | 8.30 11.20 12.40 14.65 44.45
6  | -1.82 -0.72 -0.49 -0.30 -0.07 | 8.76 7.07 6.50 5.85 3.10 | 4.60 4.30 4.40 4.45 4.65 | 14.50 24.30 29.45 36.80 89.10
10 | -2.00 -0.80 -0.54 -0.35 -0.09 | 7.14 5.63 5.15 4.63 2.45 | 5.75 5.25 5.20 5.20 5.35 | 20.40 37.60 44.90 55.85 98.75
20 | -1.89 -0.69 -0.44 -0.26 -0.05 | 5.19 3.97 3.63 3.25 1.72 | 6.15 5.05 4.80 4.85 4.90 | 38.80 66.05 75.60 85.90 100.00
δ_(2) = 0.75
1  | -2.94 -1.97 -1.66 -1.32 -0.17 | 16.42 14.77 14.13 13.29 7.74 | N/A N/A N/A N/A N/A | N/A N/A N/A N/A N/A
2  | -3.19 -1.33 -0.89 -0.53 -0.07 | 13.63 11.30 10.48 9.54 5.22 | 3.05 3.40 3.10 3.80 4.40 | 45.10 57.60 63.90 71.95 99.40
6  | -2.19 -0.88 -0.61 -0.40 -0.12 | 8.94 7.19 6.61 5.96 3.16 | 5.30 5.20 5.35 5.65 5.65 | 87.15 95.20 97.15 99.00 100.00
10 | -1.76 -0.59 -0.35 -0.17 0.00 | 7.01 5.55 5.08 4.57 2.42 | 5.80 4.95 4.70 4.90 4.65 | 97.30 99.55 99.90 100.00 100.00
20 | -1.72 -0.55 -0.31 -0.14 0.02 | 5.03 3.87 3.54 3.18 1.69 | 6.30 5.10 4.90 4.80 5.20 | 100.00 100.00 100.00 100.00 100.00

Notes: The Data Generating Process (DGP) is given by the exponent specification (2.85). For Experiment A.1, there is one strongly dominant unit and the rest of the units are non-dominant: δ_max = 1, with δ_(i) = 0 for i = 2, 3, ..., N. For Experiment A.2, there are two strongly dominant units and the rest are non-dominant: δ_(1) = δ_(2) = 1, with δ_(i) = 0 for i = 3, 4, ..., N. For Experiment A.3, there is one strongly dominant unit and one weakly dominant unit, and the rest are non-dominant: δ_(1) = 1 and δ_(2) = 0.75, with δ_(i) = 0 for i = 3, 4, ..., N. δ_(i) denotes the ith largest δ, i.e., δ_max = δ_(1) ≥ δ_(2) ≥ δ_(3) ≥ ..., which are estimated by (2.73). The standard error of δ̂_(i) is computed using (2.76), for T ≥ 2. The nominal size of the test is 5%, and power is computed at 0.9 if the true value is 1, and at 1 if the true value is 0.75. The number of replications is 2,000.

Table 2.2: Frequencies with which the dominant unit or units are jointly selected, under Exponent DGP for Experiments A.1 to A.3
Empirical frequency (percent), for N = 100, 300, 500, 1,000, 450,000

T  | Experiment A.1: One strongly dominant unit, δ_max = 1
1  | 97.25 99.55 99.80 99.90 100.0
2  | 100.0 100.0 100.0 100.0 100.0
6  | 100.0 100.0 100.0 100.0 100.0
10 | 100.0 100.0 100.0 100.0 100.0
20 | 100.0 100.0 100.0 100.0 100.0

T  | Experiment A.2: Two strongly dominant units, δ_(1) = δ_(2) = 1
1  | 94.20 99.10 99.65 99.85 100.0
2  | 100.0 100.0 100.0 100.0 100.0
6  | 100.0 100.0 100.0 100.0 100.0
10 | 100.0 100.0 100.0 100.0 100.0
20 | 100.0 100.0 100.0 100.0 100.0

T  | Experiment A.3: One strongly dominant unit and one weakly dominant unit, δ_(1) = 1, δ_(2) = 0.75
1  | 61.55 75.90 79.70 85.45 98.75
2  | 86.30 91.90 93.65 95.65 100.0
6  | 97.75 99.10 99.50 99.85 100.0
10 | 99.65 99.95 100.0 100.0 100.0
20 | 99.95 100.0 100.0 100.0 100.0

Notes: This table complements Table 2.1 and reports the frequencies with which the dominant units are jointly selected across 2,000 replications. See also the notes to Table 2.1.

Consider now the case where the outdegrees are generated according to the exponent specification, (2.85). In this misspecified case the Pareto estimators, β̂_GI, β̂_MLE, and β̂_CSN, all show significant biases (see Tables 2.5 and 2.6). For instance, when β = 1, N = 300 and T = 2, and the outdegrees are generated according to the Exponent DGP, the bias of β̂_GI ranges from 0.10 to 0.35 for cut-off values of 10% to 30%. Moreover, the bias of β̂_GI increases rapidly with N. The ML-type estimators exhibit similar biases.

Finally, the extremum estimator continues to perform well in the case of unbalanced panels, and in large N and T panels with heteroskedastic and serially correlated errors.

Table 2.3: Estimates of the shape parameter, β, of the power law and inverse of the exponent, δ_max, under Pareto DGP for Experiment B.1 (β = 1)
(Entries: N = 100, 300, 500, 1,000, 450,000 for T = 1 | the same for T = 2. Standard errors in parentheses on the line below each row of estimates.)

Log-log regression, β̂_GI
Assumed cut-off value 10%: 1.11 1.02 1.01 1.00 1.00 | 1.11 1.04 1.02 1.01 1.00
   (0.50) (0.26) (0.20) (0.14) (0.01) | (0.35) (0.19) (0.14) (0.10) (0.00)
Assumed cut-off value 20%: 1.04 1.01 1.00 1.00 1.00 | 1.06 1.02 1.01 1.00 1.00
   (0.33) (0.18) (0.14) (0.10) (0.00) | (0.24) (0.13) (0.10) (0.07) (0.00)
Assumed cut-off value 30%: 1.02 1.00 1.00 1.00 1.00 | 1.04 1.01 1.00 1.00 1.00
   (0.26) (0.15) (0.12) (0.08) (0.00) | (0.19) (0.11) (0.08) (0.06) (0.00)
Infeasible, using true d_min,t (cut-off value 24%): 1.03 1.00 1.00 1.00 1.00 | 1.05 1.02 1.01 1.00 1.00
   (0.30) (0.17) (0.13) (0.09) (0.00) | (0.22) (0.12) (0.09) (0.06) (0.00)

Maximum likelihood estimation, β̂_MLE
Assumed cut-off value 10%: 1.24 1.07 1.04 1.02 1.00 | 1.15 1.05 1.03 1.01 1.00
   (0.39) (0.20) (0.15) (0.10) (0.00) | (0.26) (0.14) (0.10) (0.07) (0.00)
Assumed cut-off value 20%: 1.11 1.03 1.02 1.01 1.00 | 1.07 1.02 1.01 1.00 1.00
   (0.25) (0.13) (0.10) (0.07) (0.00) | (0.17) (0.09) (0.07) (0.05) (0.00)
Assumed cut-off value 30%: 1.06 1.01 1.01 1.00 1.00 | 1.02 0.99 0.99 0.98 0.99
   (0.19) (0.11) (0.08) (0.06) (0.00) | (0.13) (0.07) (0.06) (0.04) (0.00)
Infeasible, using true d_min,t (cut-off value 24%): 1.09 1.03 1.01 1.01 1.00 | 1.04 1.01 1.00 1.00 1.00
   (0.23) (0.12) (0.09) (0.07) (0.00) | (0.15) (0.08) (0.07) (0.05) (0.00)

Feasible MLE, β̂_CSN
Estimated cut-off value: 44% 38% 37% 35% 24% | 37% 33% 31% 29% 22%
Estimate: 1.02 1.00 1.00 1.00 1.00 | 1.02 1.00 1.00 1.00 1.00
   (0.17) (0.10) (0.08) (0.06) (0.00) | (0.13) (0.08) (0.06) (0.04) (0.00)

β̂_max = 1/δ̂_max: 1.04 1.03 1.02 1.02 1.00 | 1.01 1.01 1.00 1.00 1.00
   (N/A) (N/A) (N/A) (N/A) (N/A) | (0.08) (0.05) (0.04) (0.04) (0.01)

Notes: The DGP follows the Pareto tail distribution given by (2.68) with β = 1. d_min,t denotes the assumed lower bound for the Pareto distribution. The cut-off value refers to the percentage of the largest observations (sorted in descending order) that are assumed to follow the Pareto distribution. The infeasible cut-off value is computed by (2.89) assuming the true value of d_min,t is known. All estimates are averaged across 2,000 replications. Standard errors are in parentheses. β̂_GI is the Gabaix-Ibragimov estimator obtained by running the log-log regression, (2.60). β̂_MLE is computed by (2.61). β̂_CSN is calculated by applying the joint MLE procedure described in Clauset et al. (2009). δ̂_max is computed according to (2.73), and its standard error by (2.76). The standard error for the inverse of δ̂_max is computed by the delta method. (N/A) indicates that the standard error of δ̂_max cannot be computed when T = 1.
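The notes to Table 2.3 state that the standard error of β̂_max = 1/δ̂_max is obtained by the delta method. For g(δ) = 1/δ, g'(δ) = -1/δ², so se(1/δ̂) ≈ se(δ̂)/δ̂². A minimal sketch follows, with a Monte Carlo sanity check that assumes a normal sampling distribution for δ̂; the input values 0.96 and 0.05 are illustrative and are not taken from the tables, and the function names are hypothetical.

```python
import random
import statistics


def inverse_with_delta_se(estimate, se):
    """Return (1/estimate, delta-method standard error of 1/estimate).

    For g(x) = 1/x, g'(x) = -1/x**2, so se(1/x) is approximately se(x) / x**2.
    """
    if estimate == 0:
        raise ValueError("1/x and its delta-method standard error are undefined at x = 0")
    return 1.0 / estimate, se / estimate ** 2


def simulated_se_of_inverse(estimate, se, draws=100_000, seed=0):
    """Monte Carlo check: standard deviation of 1/X when X ~ N(estimate, se**2)."""
    rng = random.Random(seed)
    return statistics.stdev(1.0 / rng.gauss(estimate, se) for _ in range(draws))


if __name__ == "__main__":
    delta_hat, se_delta = 0.96, 0.05  # illustrative values only
    beta_hat, se_beta = inverse_with_delta_se(delta_hat, se_delta)
    print(f"beta_max = {beta_hat:.3f}, delta-method se = {se_beta:.4f}")
    print(f"simulated se = {simulated_se_of_inverse(delta_hat, se_delta):.4f}")
```

With δ̂ = 0.96 and se(δ̂) = 0.05 this gives 1/δ̂ ≈ 1.042 with a delta-method standard error of about 0.054; the simulated value should be close, since se(δ̂)/δ̂ is small here. The approximation deteriorates as δ̂ approaches zero relative to its standard error.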
The extremum estimator is also reasonably robust to alternative network structures under different specifications of the distribution of outdegrees, such as exponentially decaying δ_i's. The relevant summary tables are available in the Online Supplement.

Table 2.4: Estimates of the shape parameter, β, of the power law and inverse of the exponent, δ_max, under Pareto DGP for Experiment B.2 (β = 1.3)
(Entries: N = 100, 300, 500, 1,000, 450,000 for T = 1 | the same for T = 2. Standard errors in parentheses on the line below each row of estimates.)

Log-log regression, β̂_GI
Assumed cut-off value 10%: 1.44 1.33 1.31 1.30 1.30 | 1.42 1.34 1.32 1.30 1.30
   (0.65) (0.34) (0.26) (0.18) (0.01) | (0.45) (0.24) (0.19) (0.13) (0.01)
Assumed cut-off value 20%: 1.35 1.31 1.30 1.29 1.30 | 1.36 1.32 1.31 1.30 1.30
   (0.43) (0.24) (0.18) (0.13) (0.01) | (0.30) (0.17) (0.13) (0.09) (0.00)
Assumed cut-off value 30%: 1.31 1.29 1.29 1.29 1.29 | 1.32 1.30 1.29 1.29 1.29
   (0.34) (0.19) (0.15) (0.11) (0.00) | (0.24) (0.14) (0.11) (0.07) (0.00)
Infeasible, using true d_min,t (cut-off value 16%): 1.37 1.31 1.30 1.30 1.30 | 1.37 1.32 1.31 1.30 1.30
   (0.49) (0.27) (0.20) (0.14) (0.01) | (0.34) (0.19) (0.14) (0.10) (0.00)

Maximum likelihood estimation, β̂_MLE
Assumed cut-off value 10%: 1.61 1.39 1.35 1.32 1.30 | 1.48 1.35 1.33 1.31 1.30
   (0.51) (0.25) (0.19) (0.13) (0.01) | (0.33) (0.17) (0.13) (0.09) (0.00)
Assumed cut-off value 20%: 1.44 1.34 1.32 1.31 1.30 | 1.37 1.32 1.31 1.30 1.30
   (0.32) (0.17) (0.13) (0.09) (0.00) | (0.22) (0.12) (0.09) (0.06) (0.00)
Assumed cut-off value 30%: 1.34 1.28 1.26 1.26 1.25 | 1.28 1.25 1.25 1.24 1.25
   (0.24) (0.13) (0.10) (0.07) (0.00) | (0.17) (0.09) (0.07) (0.05) (0.00)
Infeasible, using true d_min,t (cut-off value 16%): 1.49 1.35 1.33 1.31 1.30 | 1.39 1.33 1.31 1.31 1.30
   (0.37) (0.19) (0.15) (0.10) (0.00) | (0.24) (0.13) (0.10) (0.07) (0.00)

Feasible MLE, β̂_CSN
Estimated cut-off value: 39% 32% 30% 28% 17% | 33% 28% 26% 24% 17%
Estimate: 1.31 1.30 1.30 1.30 1.30 | 1.31 1.30 1.30 1.30 1.30
   (0.23) (0.14) (0.11) (0.08) (0.00) | (0.18) (0.11) (0.08) (0.06) (0.00)

β̂_max = 1/δ̂_max: 1.27 1.27 1.27 1.27 1.27 | 1.24 1.25 1.25 1.25 1.27
   (N/A) (N/A) (N/A) (N/A) (N/A) | (0.08) (0.05) (0.04) (0.03) (0.00)

Notes: The DGP follows the Pareto tail distribution given by (2.68) with β = 1.3. See also the notes to Table 2.3.

Table 2.5: Estimates of the shape parameter, β, of the power law and inverse of the exponent, δ_max, under Exponent DGP for Experiment A.1 (β = 1)
(Entries: N = 100, 300, 500, 1,000, 450,000 for T = 1 | the same for T = 2. Standard errors in parentheses on the line below each row of estimates.)

Log-log regression, β̂_GI
Assumed cut-off value 10%: 0.98 1.10 1.20 1.36 2.39 | 0.97 1.10 1.20 1.37 2.39
   (0.44) (0.29) (0.24) (0.19) (0.02) | (0.31) (0.20) (0.17) (0.14) (0.01)
Assumed cut-off value 20%: 1.11 1.28 1.39 1.54 2.11 | 1.11 1.29 1.39 1.55 2.11
   (0.35) (0.23) (0.20) (0.15) (0.01) | (0.25) (0.17) (0.14) (0.11) (0.01)
Assumed cut-off value 30%: 1.17 1.34 1.44 1.56 1.91 | 1.18 1.35 1.45 1.57 1.91
   (0.30) (0.20) (0.17) (0.13) (0.01) | (0.22) (0.14) (0.12) (0.09) (0.01)

Maximum likelihood estimation, β̂_MLE
Assumed cut-off value 10%: 1.53 1.74 1.84 1.95 2.11 | 1.44 1.71 1.82 1.93 2.11
   (0.48) (0.32) (0.26) (0.19) (0.01) | (0.32) (0.22) (0.18) (0.14) (0.01)
Assumed cut-off value 20%: 1.52 1.64 1.68 1.73 1.79 | 1.46 1.62 1.67 1.72 1.79
   (0.34) (0.21) (0.17) (0.12) (0.01) | (0.23) (0.15) (0.12) (0.09) (0.00)
Assumed cut-off value 30%: 1.42 1.49 1.51 1.54 1.58 | 1.38 1.48 1.51 1.54 1.58
   (0.26) (0.16) (0.12) (0.09) (0.00) | (0.18) (0.11) (0.09) (0.06) (0.00)

Feasible MLE, β̂_CSN
Estimated cut-off value: 39% 29% 24% 18% 2% | 36% 26% 21% 16% 1%
Estimate: 1.37 1.58 1.69 1.85 2.83 | 1.36 1.59 1.71 1.87 2.87
   (0.24) (0.19) (0.17) (0.15) (0.04) | (0.17) (0.13) (0.12) (0.11) (0.03)

β̂_max = 1/δ̂_max: 1.06 1.04 1.03 1.02 1.01 | 1.04 1.02 1.02 1.01 1.00
   (N/A) (N/A) (N/A) (N/A) (N/A) | (0.16) (0.13) (0.12) (0.10) (0.05)

Notes: The DGP is given by the exponent specification, (2.85). There is one strongly dominant unit and the rest are non-dominant: δ_max = δ_(1) = 1, with δ_(i) = 0 for i = 2, 3, ..., N, where δ_(i) denotes the ith largest δ. The true value of β is β = 1. See also the notes to Table 2.3 for other details.

2.9 Dominant units in US production networks

In this section we apply the proposed estimation strategy to identify the five most pervasive (dominant) sectors in the US economy. We also compare our results with the estimates of β (the inverse of δ_max) obtained by Acemoglu et al. (2012) for the most dominant sector. We provide estimates based on the US input-output tables for single years, as well as when two or more input-output tables are pooled in an unbalanced panel. Acemoglu et al. (2012) only consider estimates of β based on single-year input-output tables.

We begin with a re-examination of the data set used by Acemoglu et al. (2012), so that we can directly compare the estimates of β (or its inverse) based on the shape of the power law with the extremum estimate, δ̂_max, computed using (2.72). The Acemoglu et al. (2012) data sets are based on the US input-output accounts data over the period 1972-2002, compiled by the Bureau of Economic Analysis (BEA) every five years. We first confirmed that we can replicate their estimates of β, which we denote by β̂_GI, assuming a 20% cut-off value (the share of the largest outdegrees that are assumed to follow the Pareto distribution).
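The three Pareto-based estimators of β compared in Tables 2.3 to 2.8 can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact formulas (2.60) and (2.61) or the Clauset et al. (2009) software: the log-log regression uses the standard Gabaix-Ibragimov rank - 1/2 shift, the MLE takes the textbook Hill form with the smallest tail observation as d_min, and the CSN-style search keeps the threshold whose fitted Pareto tail is closest to the data in Kolmogorov-Smirnov distance. The simulated data are pure Pareto draws with lower bound 15, not the mixture (2.87). All function names are hypothetical.

```python
import math
import random


def simulate_pareto(n, beta, d_min=15.0, seed=0):
    """Pure Pareto draws with tail index beta and lower bound d_min (illustrative DGP)."""
    rng = random.Random(seed)
    return [d_min * rng.paretovariate(beta) for _ in range(n)]


def upper_tail(d, cutoff):
    """Largest `cutoff` share of the observations, sorted in descending order."""
    k = max(3, round(cutoff * len(d)))
    return sorted(d, reverse=True)[:k]


def beta_gi(d, cutoff=0.20):
    """OLS of ln(rank - 1/2) on ln(d) over the tail; beta-hat is minus the slope."""
    top = upper_tail(d, cutoff)
    x = [math.log(v) for v in top]
    y = [math.log(rank - 0.5) for rank in range(1, len(top) + 1)]
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    return -s_xy / s_xx


def beta_hill(d, cutoff=0.20):
    """Hill-type MLE, using the smallest tail observation as the threshold d_min."""
    top = upper_tail(d, cutoff)
    d_min = top[-1]
    return (len(top) - 1) / sum(math.log(v / d_min) for v in top[:-1])


def beta_csn(d, min_tail=50):
    """For each candidate threshold, fit beta by MLE and keep the threshold whose
    fitted Pareto CDF has the smallest KS distance to the empirical tail CDF.
    Returns (beta-hat, estimated cut-off share)."""
    ordered = sorted(d, reverse=True)
    best_ks, best_beta, best_share = math.inf, None, None
    for k in range(min_tail, len(ordered) + 1):
        top = ordered[:k]                        # k largest observations
        d_min = top[-1]                          # candidate threshold
        log_sum = sum(math.log(v / d_min) for v in top)
        if log_sum <= 0:                         # all tail values tied at d_min
            continue
        beta = k / log_sum                       # MLE of beta given d_min
        ks = 0.0
        for i, v in enumerate(reversed(top)):    # tail in ascending order
            fitted = 1.0 - (d_min / v) ** beta
            ks = max(ks, abs((i + 1) / k - fitted), abs(i / k - fitted))
        if ks < best_ks:
            best_ks, best_beta, best_share = ks, beta, k / len(ordered)
    return best_beta, best_share


if __name__ == "__main__":
    degrees = simulate_pareto(1_000, beta=1.0, seed=1)
    for share in (0.10, 0.20, 0.30):
        print(f"cut-off {share:.0%}: GI {beta_gi(degrees, share):.2f}, "
              f"Hill {beta_hill(degrees, share):.2f}")
    beta, share = beta_csn(degrees)
    print(f"CSN: beta {beta:.2f} with estimated cut-off {share:.0%}")
```

Running the three estimators side by side at several cut-offs mirrors the layout of Tables 2.7 and 2.8. On data that really are Pareto in the tail the estimates should agree up to sampling error; on outdegrees that are not, they can diverge as the cut-off changes.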
The estimates δ̂_max and the inverse of β̂ for the years 1972, 1977, 1982, 1987, 1992, 1997 and 2002 are given in Tables 2.7 and 2.8. For the inverse of β̂, Tables 2.7 and 2.8 report estimates based on the first-order and second-order interconnections, respectively.²² We estimate β by the three approaches discussed above, namely the Gabaix-Ibragimov estimator (β̂_GI) given by (2.60), the MLE (β̂_MLE) given by (2.61), and the feasible MLE (β̂_CSN). For the Gabaix-Ibragimov regression and the MLE, we give estimates for cut-off values of 10%, 20%, and 30%. For the feasible MLE, we present both the estimates of β and the estimated cut-off values.²³

22 The first-order degree of sector j is just its outdegree, d_j, defined as before, while the second-order degree of sector j is defined by d_j,2 = d′w_·j, where d = (d_1, d_2, ..., d_N)′ is the vector of first-order degrees and w_·j is the jth column of W.
23 Acemoglu et al. (2012) estimated the shape parameter of the power law by the log-log regression and by non-parametric Nadaraya-Watson regression, taking the tail to correspond to the top 20% of the sample for each year, and did not try other cut-off values. They also estimated the shape parameter by the feasible maximum likelihood method proposed by Clauset et al. (2009), but did not report the estimates for each year or the estimated cut-off values.

Table 2.6: Estimates of the shape parameter, β, of the power law and inverse of the exponent, δ_max, under Exponent DGP for Experiment A.1 (β = 1.3)
(Entries: N = 100, 300, 500, 1,000, 450,000 for T = 1 | the same for T = 2. Standard errors in parentheses on the line below each row of estimates.)

Log-log regression, β̂_GI
Assumed cut-off value 10%: 1.43 1.53 1.61 1.76 2.40 | 1.38 1.51 1.61 1.76 2.40
   (0.64) (0.39) (0.32) (0.25) (0.02) | (0.44) (0.28) (0.23) (0.18) (0.01)
Assumed cut-off value 20%: 1.45 1.59 1.67 1.79 2.11 | 1.44 1.60 1.68 1.79 2.11
   (0.46) (0.29) (0.24) (0.18) (0.01) | (0.32) (0.21) (0.17) (0.13) (0.01)
Assumed cut-off value 30%: 1.44 1.57 1.64 1.72 1.92 | 1.44 1.58 1.65 1.73 1.92
   (0.37) (0.23) (0.19) (0.14) (0.01) | (0.26) (0.17) (0.13) (0.10) (0.01)

Maximum likelihood estimation, β̂_MLE
Assumed cut-off value 10%: 1.84 1.89 1.95 2.01 2.11 | 1.70 1.85 1.92 1.99 2.11
   (0.58) (0.35) (0.28) (0.20) (0.01) | (0.38) (0.24) (0.19) (0.14) (0.01)
Assumed cut-off value 20%: 1.65 1.70 1.72 1.75 1.79 | 1.58 1.67 1.71 1.74 1.79
   (0.37) (0.22) (0.17) (0.12) (0.01) | (0.25) (0.15) (0.12) (0.09) (0.00)
Assumed cut-off value 30%: 1.50 1.52 1.54 1.55 1.58 | 1.45 1.51 1.53 1.55 1.58
   (0.27) (0.16) (0.13) (0.09) (0.00) | (0.19) (0.11) (0.09) (0.06) (0.00)

Feasible MLE, β̂_CSN
Estimated cut-off value: 38% 26% 22% 16% 2% | 32% 22% 18% 13% 1%
Estimate: 1.50 1.70 1.80 1.95 2.83 | 1.51 1.72 1.83 1.99 2.87
   (0.28) (0.22) (0.19) (0.17) (0.04) | (0.21) (0.17) (0.15) (0.13) (0.03)

β̂_max = 1/δ̂_max: 1.35 1.36 1.35 1.35 1.31 | 1.38 1.34 1.34 1.33 1.31
   (N/A) (N/A) (N/A) (N/A) (N/A) | (0.21) (0.17) (0.15) (0.14) (0.07)

Notes: The DGP is given by the exponent specification, (2.85). There is one strongly dominant unit and the rest of the units are non-dominant: δ_max = 1/1.3 = 0.77, with δ_(i) = 0 for i = 2, 3, ..., N, where δ_(i) denotes the ith largest δ. The true value of β is β = 1.3. See also the notes to Table 2.5.

The results in Tables 2.7 and 2.8 show that the yearly estimates of δ_max are clustered within the narrow range of 0.77 to 0.82 over a relatively long period of 30 years. We cannot provide standard errors for such yearly estimates, but given the small variation of these estimates over time we can confidently conclude that there is a high degree of sectoral pervasiveness in the US economy, although these estimates do not support the presence of a strongly dominant unit, which would require δ̂_max to be close to unity. In contrast, the estimates of δ_max based on the inverse of β̂ differ considerably depending on the estimation method, the choice of the cut-off value, and whether the first- or second-order interconnections are considered.
For example, for 1972, the estimates based on the power law (the inverse of β̂_GI) range from 0.694, when the cut-off value is 10% and the first-order interconnections are used, to 1.035, when the second-order interconnections are used with a 30% cut-off value. The estimates of δ based on the inverses of β̂_GI and β̂_MLE rise with the cut-off value and with the order of interconnections, whilst our estimator does not require making such choices. Recall that δ_max provides an exact measure of the rate at which the variance of aggregate output responds to sectoral shocks, whilst β characterizes a lower bound if the first-order interconnections are used.

A 20% cut-off value, which is assumed by Acemoglu et al. (2012), seems reasonable, considering the closeness between δ̂_max and the inverse of β̂_GI, and given its similarity to the cut-off values estimated by the feasible MLE. Nevertheless, the estimated cut-off value based on the first-order interconnections for the year 1992 is only 9.5%, markedly lower than for the other years. Similar issues arise when the second-order interconnections are used. The differences between δ̂_max and the inverse of β̂_GI also vary across the years. For example, using the second-order interconnections and a cut-off value of 20%, δ̂_max and the inverse of β̂_GI are reasonably close for the years 1992, 1997 and 2002, but diverge for the earlier years 1972, 1977 and 1982.

The data sets provided by Acemoglu et al. (2012) do not give the identities of the sectors, which is fine if one is only interested in β or δ_max. But, as noted earlier, our estimation approach also allows us to identify the sectors with the highest degrees of pervasiveness in the production network. With this in mind, we compiled our own W matrices from the input-output tables downloaded from the BEA website.²⁴ The five largest estimates of δ, denoted by δ̂_max = δ̂_(1) ≥ δ̂_(2) ≥ ... ≥ δ̂_(5), for each of the years 1972 to 2007 are given in Table 2.9, and the identities of the associated sectors in Table 2.10.

24 The W matrices for different years were computed from commodity-by-commodity direct requirements tables at the detailed level that cover around 400-500 US industries. The (i,j)th entry of such a table shows the expenditure on commodity j per dollar of production of commodity i. As in Acemoglu et al. (2012), the terms sector and commodity are used interchangeably. These direct requirements tables can be derived from the total requirements tables at the detailed level, which are compiled by the BEA every five years. Further details on the data description and transformations can be found in Appendix B.4.

Table 2.7: Yearly estimates of the degree of dominance, δ_max, and inverse of the shape parameter of the power law, β, based on the first-order interconnections, using US input-output tables compiled by Acemoglu et al. (2012)
(Columns: Year | N | δ̂_max | inverse of β̂_GI at assumed cut-off values 10%, 20%, 30% | inverse of β̂_MLE at assumed cut-off values 10%, 20%, 30% | inverse of β̂_CSN | estimated cut-off value. Standard errors of the seven inverse-of-β̂ estimates are in parentheses on the line below.)

1972     | 483 | 0.767 | 0.694 0.727 0.832 | 0.736 0.829 1.135 | 0.728 | 16.8%
                       | (0.142) (0.104) (0.098) | (0.106) (0.095) (0.145) | (0.081)
1977     | 524 | 0.778 | 0.677 0.725 0.804 | 0.715 0.852 1.009 | 0.726 | 13.6%
                       | (0.133) (0.100) (0.091) | (0.099) (0.099) (0.114) | (0.086)
1982     | 529 | 0.788 | 0.717 0.739 0.818 | 0.719 0.786 1.039 | 0.741 | 15.3%
                       | (0.139) (0.101) (0.092) | (0.099) (0.084) (0.119) | (0.082)
1987     | 510 | 0.804 | 0.667 0.731 0.814 | 0.724 0.849 1.028 | 0.742 | 13.3%
                       | (0.132) (0.102) (0.093) | (0.101) (0.099) (0.118) | (0.090)
1992     | 476 | 0.824 | 0.672 0.758 0.842 | 0.738 0.891 1.002 | 0.706 | 9.5%
                       | (0.137) (0.110) (0.100) | (0.107) (0.110) (0.114) | (0.105)
1997 (a) | 474 | 0.778 | 0.625 0.698 0.791 | 0.617 0.909 0.982 | 0.670 | 13.1%
                       | (0.129) (0.101) (0.094) | (0.090) (0.137) (0.131) | (0.085)
2002     | 417 | 0.765 | 0.639 0.687 0.759 | 0.685 0.756 0.930 | 0.730 | 19.4%
                       | (0.139) (0.107) (0.096) | (0.106) (0.092) (0.113) | (0.081)

Notes: Estimates are obtained using the data sets provided by Acemoglu et al. (2012), which are based on the US input-output accounts data compiled by the Bureau of Economic Analysis (BEA). N is the total number of sectors in a given year, and standard errors are in parentheses. δ̂_max is the largest estimate of δ, computed using (2.67). The first-order degree sequence is used in the estimation of the shape parameter of the power law, β. β̂_GI is obtained by the log-log regression with the Gabaix and Ibragimov (2011) correction, using the OLS regression defined by (2.60). β̂_MLE is the maximum likelihood estimate (MLE) of β computed by (2.61). A 10% cut-off value, for example, means that the Pareto tail is taken to be the top 10% of all sectors in terms of outdegrees in each year. Acemoglu et al. (2012) report β̂_GI estimates only for a 20% cut-off value. β̂_CSN is the feasible MLE proposed by Clauset et al. (2009), and its estimated cut-off values are reported in the last column of the table.
(a) From 1997 onwards, the BEA input-output tables are based on the North American Industry Classification System (NAICS), while for the earlier years they are based on the Standard Industrial Classification (SIC) system.

Table 2.8: Yearly estimates of the degree of dominance, δ_max, and inverse of the shape parameter of the power law, β, based on the second-order interconnections, using US input-output tables compiled by Acemoglu et al. (2012)
(Columns as in Table 2.7.)

1972     | 483 | 0.767 | 0.719 0.880 1.035 | 0.873 1.126 1.353 | 0.973 | 15.7%
                       | (0.147) (0.126) (0.122) | (0.126) (0.147) (0.174) | (0.112)
1977     | 524 | 0.778 | 0.718 0.870 1.008 | 0.821 1.058 1.351 | 0.750 | 9.4%
                       | (0.141) (0.120) (0.114) | (0.114) (0.133) (0.177) | (0.107)
1982     | 529 | 0.788 | 0.773 0.913 1.013 | 0.885 1.028 1.329 | 1.088 | 23.6%
                       | (0.150) (0.125) (0.114) | (0.122) (0.116) (0.158) | (0.097)
1987     | 510 | 0.804 | 0.686 0.879 1.031 | 0.883 1.070 1.325 | 1.110 | 22.9%
                       | (0.136) (0.123) (0.118) | (0.124) (0.128) (0.161) | (0.103)
1992     | 476 | 0.824 | 0.661 0.869 1.012 | 0.750 1.014 1.277 | 0.818 | 12.2%
                       | (0.135) (0.126) (0.120) | (0.108) (0.141) (0.182) | (0.107)
1997 (a) | 474 | 0.778 | 0.632 0.790 0.955 | 0.648 1.100 1.202 | 0.666 | 12.0%
                       | (0.130) (0.115) (0.113) | (0.095) (0.192) (0.187) | (0.088)
2002     | 417 | 0.765 | 0.620 0.768 0.954 | 0.721 0.998 1.245 | 0.772 | 13.4%
                       | (0.135) (0.119) (0.121) | (0.111) (0.151) (0.192) | (0.103)

Notes: This table differs from Table 2.7 in that the second-order degree sequence is used to produce the estimates of β. See also the notes to Table 2.7 for further details.

Table 2.9: Yearly estimates of the degree of dominance, δ, for the top five pervasive sectors, using US input-output tables (our data)

Year     | N   | δ̂_(1) | δ̂_(2) | δ̂_(3) | δ̂_(4) | δ̂_(5)
1972     | 446 | 0.764 | 0.740 | 0.701 | 0.638 | 0.608
1977     | 468 | 0.774 | 0.704 | 0.628 | 0.608 | 0.590
1982     | 468 | 0.786 | 0.669 | 0.655 | 0.635 | 0.619
1987     | 457 | 0.802 | 0.669 | 0.657 | 0.633 | 0.629
1992     | 451 | 0.823 | 0.678 | 0.677 | 0.646 | 0.631
1997 (a) | 452 | 0.775 | 0.725 | 0.635 | 0.622 | 0.597
2002     | 408 | 0.758 | 0.743 | 0.639 | 0.563 | 0.560
2007     | 365 | 0.722 | 0.649 | 0.606 | 0.591 | 0.550

Notes: Estimates are obtained using the input-output accounts data downloaded from the Bureau of Economic Analysis (BEA) website. See Appendix B.4 for details of the data sources and their transformations. The table reports the five largest yearly estimates of δ, computed using (2.73), denoted by δ̂_(1) = δ̂_max, δ̂_(2), ..., δ̂_(5). N is the number of sectors with non-zero outdegrees.
(a) From 1997 onwards, the BEA input-output tables are based on the North American Industry Classification System (NAICS), while for the previous years they are based on the Standard Industrial Classification (SIC) system.

Table 2.10: Identities of the top five pervasive sectors based on the yearly estimates of δ

1972: Wholesale trade; Blast furnaces and steel mills; Real estate; Miscellaneous business services; Motor freight transportation & warehousing
1977: Wholesale trade; Blast furnaces and steel mills; Real estate; Petroleum refining; Industrial inorganic & organic chemicals
1982: Wholesale trade; Blast furnaces and steel mills; Petroleum refining; Private electric services (utilities); Advertising
1987: Wholesale trade; Blast furnaces and steel mills; Advertising; Motor freight transportation and warehousing; Electric services (utilities)
1992: Wholesale trade; Real estate agents, managers, operators, and lessors; Blast furnaces and steel mills; Trucking and courier services, except air; Advertising
1997 (a): Wholesale trade; Management of companies and enterprises; Real estate; Iron and steel mills; Truck transportation
2002: Management of companies and enterprises; Wholesale trade; Real estate; Electric power generation, transmission, and distribution; Iron and steel mills and ferroalloy manufacturing
2007: Wholesale trade; Management of companies and enterprises; Other real estate; Iron and steel mills and ferroalloy manufacturing; Petroleum refineries

Notes: This table complements Table 2.9 and reports the identities of the sectors corresponding to the five largest estimates of δ (in descending order) for each year.
(a) From 1997 onwards, the BEA input-output tables are based on the North American Industry Classification System (NAICS), while for the previous years they are based on the Standard Industrial Classification (SIC) system.

We note that both the degrees of dominance and the identities of the pervasive sectors in the US economy are relatively stable over the years. Consistent with the results in Table 2.7, no sector is strongly dominant. The highest estimate of δ_max is 0.82, for the year 1992, with an average of around 0.78 over the sample. The wholesale trade sector turns out to be the most dominant sector in every year except 2002, when the management of companies and enterprises is the most dominant sector, with wholesale trade second. But it is generally difficult to distinguish between the top two or three sectors, as their δ estimates are quite close to one another, and we are not able to apply formal statistical tests to their differences, since standard errors cannot be computed using outdegrees for a single year.²⁵

25 Acemoglu et al. (2012) are able to compute standard errors for their estimates of β because they impose a Pareto distribution on the ordered outdegrees beyond a cut-off point, which they assume.

Accordingly, to provide more reliable estimates of δ_(1), δ_(2), ..., δ_(5) and the associated sectoral identities, we also consider pooled estimates. However, there have been major changes in the BEA industry classifications over the years: the input-output tables for the period 1972-1992 are based on the Standard Industrial Classification (SIC) system, while from 1997 onwards they are based on the North American Industry Classification System (NAICS). We therefore computed panel estimates of δ for the two sub-samples separately. The results are summarized in Table 2.11, which also gives standard errors in parentheses, computed using (2.82). It is interesting that, despite the changes to the sectoral classifications, the wholesale trade sector is identified as the most dominant sector in both sub-samples, with δ̂_max = 0.762 (0.036) for the first sub-sample (1972-1992) and δ̂_max = 0.716 (0.045) for the second (1997-2007). The two panel estimates are quite close. Turning to the estimates of δ_(2), δ_(3), ..., δ_(5), we find that these are also remarkably similar across the two sub-samples, ranging from 0.667 to 0.605 in the first sub-sample and from 0.683 to 0.595 in the second. What has changed is the identity of the sectors across the two sub-samples.
For example, the second most dominant sector has been blast furnaces and steel mills over the first sub-sample (1972-1992), whilst it is management of companies and enterprises over the second sub-sample (1997-2007).

2.10 Concluding remarks

This paper extends the production network considered by Acemoglu et al. (2012) and derives a dual price network, which allows us to obtain exact conditions under which sectoral shocks can have aggregate effects. The paper presents a simple nonparametric estimator of the degree of pervasiveness of sectoral shocks that compares favorably with the parametric estimators based on a Pareto distribution fitted to the outdegrees. The proposed extremum estimator is simple to implement and is applicable not only to pure cross-section models, where the Pareto shape parameter is estimated, but also extends readily to short-T as well as large-T panels. The paper also develops a simple test of the degree of pervasiveness of the most dominant units in the network, which is shown to have satisfactory size and power properties when N is large, even if T is quite small. The production and price networks considered in this paper are static, but the proposed statistical framework can be extended to allow for dynamics, along similar lines to Pesaran and Chudik (2014), who consider aggregation of large dynamic panels.
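The size and power figures reported in Table 2.1 are rejection frequencies across Monte Carlo replications. The hypothetical helper below shows how bias, RMSE, size and power (all ×100) can be computed from a set of replicated estimates and standard errors. It assumes a two-sided t-test with standard normal critical values and, following the notes to Table 2.1, treats power as the rejection frequency of H0: δ = 0.9 when the true value is 1; the exact form of the test used in the chapter may differ.

```python
import math
from statistics import NormalDist


def mc_summary(estimates, std_errors, true_value, alt_value, level=0.05):
    """Bias, RMSE, size and power (all x100) across Monte Carlo replications.

    Size:  rejection rate of a two-sided t-test of H0: delta = true_value.
    Power: rejection rate of the same test of H0: delta = alt_value.
    """
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("need equally many (non-zero) estimates and standard errors")
    reps = len(estimates)
    crit = NormalDist().inv_cdf(1 - level / 2)  # about 1.96 at the 5% level
    errors = [e - true_value for e in estimates]
    bias = sum(errors) / reps
    rmse = math.sqrt(sum(u * u for u in errors) / reps)
    size = sum(abs(e - true_value) / s > crit for e, s in zip(estimates, std_errors)) / reps
    power = sum(abs(e - alt_value) / s > crit for e, s in zip(estimates, std_errors)) / reps
    return {"bias": 100 * bias, "rmse": 100 * rmse, "size": 100 * size, "power": 100 * power}


if __name__ == "__main__":
    # Four made-up replications, true delta_max = 1, alternative 0.9.
    print(mc_summary([0.98, 1.02, 1.00, 0.70], [0.05] * 4, true_value=1.0, alt_value=0.9))
```

In the made-up example only the outlying replication (0.70) rejects the true value, so the size is 25, while three of the four replications reject δ = 0.9, giving a power of 75. In an actual experiment the replications would come from repeatedly simulating the DGP and applying (2.73) and (2.76).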
Table 2.11: Pooled panel estimates of the degree of dominance, $\delta$, for the top five pervasive sectors, using US input-output tables for the two sub-periods 1972-1992 and 1997-2007

| | Estimate (s.e.), 1972-1992 | Sector, 1972-1992 | Estimate (s.e.), 1997-2007 | Sector, 1997-2007 |
|---|---|---|---|---|
| $\hat\delta_{(1)}$ | 0.762 (0.036) | Wholesale trade | 0.716 (0.045) | Wholesale trade |
| $\hat\delta_{(2)}$ | 0.667 (0.036) | Blast furnaces and steel mills | 0.683 (0.045) | Management of companies and enterprises |
| $\hat\delta_{(3)}$ | 0.642 (0.036) | Real estate | 0.609 (0.045) | Real estate^a |
| $\hat\delta_{(4)}$ | 0.605 (0.036) | Trucking and courier services, except air | 0.598 (0.045) | Iron and steel mills |
| $\hat\delta_{(5)}$ | 0.605 (0.036) | Miscellaneous business services | 0.595 (0.045) | Other real estate^a |
| $N$ | 548 | | 619 | |
| $T$ | 5 | | 3 | |

Notes: The pooled estimates for the years 1972, 1977, 1982, 1987 and 1992 are based on US input-output data using the Bureau of Economic Analysis (BEA) industry codes, which are in turn based on the Standard Industrial Classification (SIC). For the years 1997, 2002 and 2007, the sectoral classifications are based on the BEA industry codes, which are based on the North American Industry Classification System (NAICS). The table gives the five largest panel estimates of $\delta$ together with the identities of the associated sectors. The estimates are denoted by $\hat\delta_{(1)} = \hat\delta_{\max}, \hat\delta_{(2)}, \ldots, \hat\delta_{(5)}$ and computed using (2.81). The standard errors are given in parentheses and computed using (2.82). $N$ is the total number of sectors with non-zero outdegrees, and $T$ is the number of time periods in the panel.
a In the BEA industry classifications, the real estate sector was subdivided into housing and other real estate sectors starting from 2007. Since the pooled estimates are based on unbalanced panels constructed according to BEA codes, real estate and other real estate are treated as two sectors.
Our empirical application to US input-output tables suggests some evidence of sector-specific shock propagation, but such effects do not seem sufficiently strong and long-lasting, and are likely to be dominated by common technological effects. Similar empirical evidence is also provided by Foerster, Sarte, and Watson (2011), who incorporate sectoral linkages into multisector growth models that produce an approximate factor model. Their factor analytic approach, however, cannot distinguish dominant unit(s) from common factors and may therefore underestimate the influence of input-output linkages.[26] The relative importance of internal network interactions and external common shocks for macroeconomic fluctuations remains an open empirical question.

[26] The factor analysis also requires large $N$ and $T$ panels and is not applicable when $T$ is small.

Chapter 3 Estimation and Inference in Spatial Models with Dominant Units[1]

3.1 Introduction

In spatial econometrics, the interdependence among cross-sectional units is captured via a spatial weights matrix, $W = (w_{ij})$, which is usually constructed from some measure of geographical, economic or social distance. A critical assumption adopted in the existing literature is that the maximum absolute row and column sum norms of $W$ are uniformly bounded as the number of cross-sectional units, $n$, tends to infinity. This assumption, which originates from Kelejian and Prucha (1998, 1999), imposes a strong restriction on the degree of cross-sectional dependence. For example, it is satisfied if $W$ is sparse, in the sense that each unit has only a finite number of "neighbors", or if the interconnection between units decays sufficiently fast as they become "further" away from each other. The boundedness (or absolute summability) assumption on $W$, however, rules out the possibility that some units affect a large number of other units and exhibit pervasive influence in the system.
Therefore, standard econometric models fail to characterize the commonly observed phenomenon that a few units (firms, industries, cities, etc.) play dominant roles in many economic and social activities (such as financial market, production, trade and migration networks). In this paper, we consider estimation and inference in spatial models in which the boundedness assumption is relaxed. Specifically, we row-normalize $W$, following standard practice, to facilitate interpretation, and allow the maximum absolute column sum of $W$ to grow with $n$ at the rate $n^{\delta}$, where $\delta$ is a parameter that lies in $[0, 1]$. The exponent $\delta$ is referred to as the "network pervasiveness" in a related study by Pesaran and Yang (2016) (see Definition 2). Pesaran and Yang (2016) also introduce the notion of the "degree of dominance" of a unit (or node) $j$ in a network, for $j = 1, 2, \ldots, n$. The degree of dominance is measured by an exponent $\delta_j$ that controls the rate at which the absolute sum of the $j$th column of $W$ increases with $n$, where $0 \le \delta_j \le 1$ (see Definition 1). In particular, unit $j$ is said to be strongly dominant if $\delta_j = 1$, weakly dominant if $0 < \delta_j < 1$, and non-dominant if $\delta_j = 0$. By definition, the network pervasiveness is the highest degree of dominance in the network, namely $\delta = \max(\delta_1, \delta_2, \ldots, \delta_n)$. From this perspective, the assumption that $W$ has a bounded column sum norm imposes the constraint that all units in the network are non-dominant ($\delta = 0$). The present paper relaxes this assumption and develops new estimation and inference theory allowing for the existence of dominant units ($\delta \ge 0$).

[1] This chapter is based on Pesaran and Yang (2018).

Two main estimation approaches have been developed in the literature for spatial models with bounded $W$. The first is maximum likelihood (ML) estimation; see, for example, Anselin (1988), Lee (2004), Lee and Yu (2010a), and Aquaro et al. (2015).
The second approach is instrumental variables (IV) and the generalized method of moments (GMM); see, among others, Kelejian and Prucha (1998, 1999), Lee (2007), Kapoor et al. (2007), Lin and Lee (2010), and Lee and Yu (2014). In the current paper, we focus on the spatial autoregressive (SAR) model with unbounded $W$ and consider two method-of-moments approaches to estimating it. We first extend the GMM method developed by Lee (2007), which uses both linear moment conditions based on IVs and quadratic moments based on the properties of the disturbances. We then propose a new bias-corrected method of moments (BMM) to estimate the SAR model. The BMM approach was first introduced by Chudik and Pesaran (2017) for the estimation of dynamic panel data models with a short time dimension. The key idea is to correct the "bias" of the moment conditions directly before estimation, rather than to correct the bias of the estimator. In the SAR model, the spatially lagged dependent variable is endogenous. Instead of looking externally for valid IVs that are orthogonal to the errors, we use the endogenous variable as an "instrument" for itself and correct the bias due to the correlation between the spatial lag and the error term. By construction, this method avoids the weak instruments problem. We derive the asymptotic properties of the GMM and BMM estimators and show that both are consistent if $0 \le \delta < 1$ and asymptotically normally distributed if $0 \le \delta < 1/2$. A wide range of Monte Carlo experiments lends support to the theoretical results and documents that both estimators have comparably good small sample properties, with the BMM estimator outperforming the GMM estimator in some cases. The estimation techniques are shown to be robust to different intensities of spatial dependence, various specifications of the spatial weights matrix, and non-Gaussian errors.
The empirical part of the paper applies these estimation techniques to study inflation spillovers across more than 300 industries in the US from 1997 to 2014. It extends the multisectoral model in Pesaran and Yang (2016) and shows that the price network can be represented by a SAR model. Applying the extremum estimator developed by Pesaran and Yang (2016) yields estimates of network pervasiveness between 0.71 and 0.85, which indicates the presence of dominant sectors. We then estimate the SAR model by the GMM and BMM approaches and document significant inflation spillovers through input-output linkages. We also identify a significant effect of the volatility of real output growth on changes in sectoral prices.

The remainder of this chapter is organized as follows. Section 3.2 describes the model and assumptions. Section 3.3 considers the GMM estimation method. Section 3.4 presents the BMM estimation method. Section 3.5 investigates the finite sample properties of the GMM and BMM estimators using Monte Carlo techniques. Section 3.6 contains an empirical application to US price networks, and Section 3.7 concludes. Further details, including lemmas and proofs, are given in the Appendix.

3.2 The model and assumptions

This paper focuses on cross-sectional models for ease of exposition; the analysis could readily be extended to panel data models. Specifically, consider the following SAR model:
$$y_i = \rho y_i^* + \beta' x_i + \varepsilon_i, \quad i = 1, 2, \ldots, n, \tag{3.1}$$
where $y_i$ is the outcome variable for unit (region) $i$, $\rho$ is a fixed spatial coefficient, $x_i$ is a $k \times 1$ vector of regressors for unit $i$ with the associated vector of fixed coefficients $\beta$, $\varepsilon_i$ is a random error, and $y_i^*$ is the spatially lagged dependent variable, namely
$$y_i^* = w_i' y = \sum_{j=1}^{n} w_{ij} y_j, \tag{3.2}$$
with $y = (y_1, y_2, \ldots, y_n)'$ and $w_i = (w_{i1}, w_{i2}, \ldots, w_{in})'$. The parameters of interest are $\psi = (\rho, \beta')'$, and their true values are denoted by $\psi_0 = (\rho_0, \beta_0')'$.
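Because the spatial lag $y_i^*$ in (3.1)-(3.2) depends on every other outcome, the $n$ equations have to be solved jointly. A minimal Python sketch (standard library only) of one way to do this: when $W$ is non-negative and row-standardized (as required later by Assumption 3.4) and $|\rho| < 1$, the map $y \mapsto \rho W y + X\beta + \varepsilon$ is a contraction in the max-norm, so plain fixed-point iteration converges to the reduced-form solution. The ring-shaped $W$, the function names and the parameter values are hypothetical choices for illustration, not part of the chapter.

```python
import random

def ring_weights(n, k=1):
    """Hypothetical row-standardised W: k neighbours ahead and k behind each
    unit, with wrap-around at the ends."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for h in range(1, k + 1):
            W[i][(i + h) % n] += 1.0 / (2 * k)
            W[i][(i - h) % n] += 1.0 / (2 * k)
    return W

def matvec(W, v):
    """Dense matrix-vector product W v (W given as a list of rows)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def simulate_sar(W, X, beta, rho, eps, tol=1e-12, max_iter=10_000):
    """Solve y_i = rho * sum_j w_ij y_j + x_i' beta + eps_i for all i jointly.

    With W non-negative and row-standardised and |rho| < 1, the map
    y -> rho W y + c is a contraction in the max-norm, so the iteration
    converges to y = (I - rho W)^{-1} c.
    """
    c = [sum(x * b for x, b in zip(xi, beta)) + e for xi, e in zip(X, eps)]
    y = c[:]
    for _ in range(max_iter):
        y_new = [rho * wy + ci for wy, ci in zip(matvec(W, y), c)]
        if max(abs(a - b) for a, b in zip(y_new, y)) < tol:
            return y_new
        y = y_new
    raise RuntimeError("fixed-point iteration did not converge")

# Example: n = 50 units, intercept plus one regressor, rho = 0.5.
rng = random.Random(0)
n, rho, beta = 50, 0.5, [1.0, 1.0]
W = ring_weights(n, k=4)
X = [[1.0, rng.gauss(0.0, 1.0)] for _ in range(n)]
eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = simulate_sar(W, X, beta, rho, eps)
y_star = matvec(W, y)  # spatial lag y* = W y, as in (3.2)
max_resid = max(abs(y[i] - rho * y_star[i] - X[i][0] * beta[0] - X[i][1] * beta[1] - eps[i])
                for i in range(n))
print(max_resid < 1e-10)
```

The same solution could be obtained with a sparse linear solver; the fixed-point route is used here only because it needs nothing beyond the standard library and mirrors the Neumann-series argument used later in Section 3.3.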
Let $y^* = (y_1^*, y_2^*, \ldots, y_n^*)'$. Model (3.1) can be written more compactly as
$$y = \rho y^* + X\beta + \varepsilon = \rho W y + X\beta + \varepsilon, \tag{3.3}$$
where $W = (w_1, w_2, \ldots, w_n)' = (w_{ij})$ is the $n \times n$ known matrix of spatial weights, $X = (x_1, x_2, \ldots, x_n)'$ is the $n \times k$ matrix of exogenous regressors, and $\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)'$. The reduced form of (3.3) is given by
$$y = S^{-1}(\rho)(X\beta + \varepsilon), \tag{3.4}$$
where $S(\rho) = I_n - \rho W$. The existence of $S^{-1}(\rho)$ is ensured under the assumptions discussed below. It follows immediately from (3.4) that
$$y^* = Wy = W S^{-1}(\rho)(X\beta + \varepsilon) = G(\rho)(X\beta + \varepsilon), \tag{3.5}$$
where
$$G(\rho) = W S^{-1}(\rho). \tag{3.6}$$
Clearly, $y^*$ is in general correlated with the error term. The following assumptions are postulated to carry out the asymptotic analysis.

Assumption 3.1. The idiosyncratic errors $\varepsilon_i$, for $i = 1, 2, \ldots, n$, are independently and identically distributed with zero mean and variance $\sigma^2$, and $\sup_i E|\varepsilon_i|^{4+\eta} < K$ for some $\eta > 0$.

Assumption 3.2. The vector of parameters $\theta_0 = (\rho_0, \beta_0', \sigma_0^2)'$ is in the interior of the parameter space $\Theta$, which is a compact subset of $(-1, 1) \times \mathbb{R}^k \times (0, \infty)$.

Assumption 3.3. The $k \times 1$ vectors of regressors, $x_i = (x_{i1}, x_{i2}, \ldots, x_{ik})'$, for $i = 1, 2, \ldots, n$, are distributed independently of the errors $\varepsilon_j$ for all $i$ and $j$, and $\sup_{i,s} E|x_{is}|^{2+\epsilon} < K$ for some $\epsilon > 0$. In addition, $n^{-1}\sum_{i=1}^n x_i x_i' \to_p \Sigma_{xx}$, where $\Sigma_{xx}$ is a $k \times k$ positive definite matrix; $n^{-1}X'G_0X \to_p \Sigma_{xgx}$; and $n^{-1}X'G_0'G_0X \to_p \Sigma_{xggx}$, where $G_0 = WS_0^{-1}$ and $S_0 = S(\rho_0) = I_n - \rho_0 W$.

Assumption 3.4. The spatial weights matrix $W = (w_{ij})$ is non-negative, namely $w_{ij} \ge 0$ for all $i$ and $j$, and row-standardized such that $W\tau_n = \tau_n$, where $\tau_n$ is an $n$-dimensional vector of ones.

Assumption 3.5.
The column sums of $W$, represented by $d = (d_1, d_2, \ldots, d_n)' = W'\tau_n$, are non-zero and could be unbounded in the sense that $d_j = \kappa_j n^{\delta_j}$, for $j = 1, 2, \ldots, n$, where $0 \le \delta_j \le 1$, $\kappa_j$ is a fixed positive constant that does not depend on $n$, and $\max_j d_j = \|W\|_1 = O(n^{\delta})$ with $\delta = \max_j(\delta_j)$. In addition, the number of unbounded columns of $W$ (those with $\delta_j > 0$) is assumed to be finite. Furthermore, $|\rho| \, \|W_{22}\|_1 < 1$, where $W_{22}$ is the weights matrix associated with the non-dominant units.

Assumption 3.6. There exists an $n_0$ such that for all $n \ge n_0$ (including $n \to \infty$), either (a) $n^{-1}Q_0'Q_0$ is positive definite, where $Q_0 = (G_0X\beta_0, X)$, and/or (b) $h_{g,n} > \epsilon > 0$, where $\epsilon$ is a small positive number and
$$h_{g,n} = n^{-1}\mathrm{Tr}\left(G_0'G_0 + G_0^2\right) - 2n^{-2}\left[\mathrm{Tr}(G_0)\right]^2. \tag{3.7}$$

Remark 3.1. Under Assumptions 3.2 and 3.4, the matrix $S(\rho)$ is invertible for all $\rho$ satisfying $|\rho| < 1$, irrespective of whether the column sums of $W$ are bounded (see Lemma 1 in Appendix A of Pesaran and Yang, 2016).

Remark 3.2. Assumption 3.5 states that the column sums of $W$ could rise with $n$ at the rates $n^{\delta_1}, n^{\delta_2}, \ldots, n^{\delta_n}$, respectively. Moreover, it can be shown that the number of unbounded columns of $W$ must be fixed and cannot rise with $n$ if the summability condition $\sum_{j=1}^n \delta_j < K$ is satisfied (see Proposition 2 of Pesaran and Yang, 2016).

Remark 3.3. The assumption that $w_{ij} \ge 0$ is imposed only for ease of exposition and is not restrictive. When it fails to hold, we can always split $W$ into positive and negative weights matrices, namely $W = W^+ + W^-$, and rewrite model (3.1) as $y = \rho_1 W^+ y + \rho_2 W^- y + X\beta + \varepsilon$. See Bailey et al. (2016a) for an empirical application employing this strategy.

Remark 3.4. Assumption 3.6 provides the identification conditions for $\psi_0$, which are explained in detail in the subsequent sections.
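Assumption 3.5 can be made concrete by computing the column sums $d = W'\tau_n$ directly. The sketch below uses a hypothetical $W$ in which every unit $i \ge 1$ places half of its weight on unit 0, so that $d_0 = (n-1)/2$ grows linearly in $n$ (a strongly dominant unit with $\delta_0 = 1$ and $\kappa_0 \approx 1/2$), while all other column sums stay bounded. The helper `crude_exponents` simply solves $d_j = \kappa_j n^{\delta_j}$ for $\delta_j$ after setting $\kappa_j = 1$; it illustrates the scaling only and is not the extremum estimator of Pesaran and Yang (2016). With $n = 400$, the omitted $\ln\kappa_j/\ln n$ term pulls the value for unit 0 down to about 0.88 and makes the values for the non-dominant units negative.

```python
import math

def star_ring_weights(n):
    """Hypothetical W: unit 0 puts all its weight on unit 1; every unit i >= 1
    puts 1/2 on unit 0 and 1/2 on its successor in a ring over units 1..n-1.
    Every row sums to one (Assumption 3.4)."""
    W = [[0.0] * n for _ in range(n)]
    W[0][1] = 1.0
    for i in range(1, n):
        W[i][0] = 0.5
        W[i][i + 1 if i < n - 1 else 1] = 0.5
    return W

def column_sums(W):
    """d = W' tau_n: the column sums (outdegrees) of W."""
    return [sum(col) for col in zip(*W)]

def crude_exponents(d, n):
    """ln(d_j) / ln(n): solves d_j = kappa_j n^{delta_j} for delta_j under the
    simplifying assumption kappa_j = 1, so it is off by ln(kappa_j) / ln(n)."""
    return [math.log(dj) / math.log(n) for dj in d]

n = 400
W = star_ring_weights(n)
d = column_sums(W)
exps = crude_exponents(d, n)
row_norm = max(sum(row) for row in W)   # ||W||_inf, maximum row sum
col_norm = max(d)                       # ||W||_1, maximum column sum
print(row_norm, col_norm, round(exps[0], 3), round(exps[2], 3))
```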
3.3 GMM estimation

We begin by extending the GMM method proposed by Lee (2007) for standard SAR models to the case where the column sums of the spatial weights matrix are not necessarily bounded in $n$. Lee (2007) suggests using both linear moment conditions formed with instruments and additional quadratic moments based on the properties of the idiosyncratic errors. Specifically, consider model (3.1) and suppose that $Z$ is an $n \times r$ matrix of instruments for the regressors $\tilde{X} = (y^*, X)$, where $r \ge k + 1$. The linear moment conditions are based on the instrument validity assumption and are given by
$$E\left[Z'\varepsilon(\psi)\right] = 0, \tag{3.8}$$
where
$$\varepsilon(\psi) = y - \rho y^* - X\beta. \tag{3.9}$$
Given that $X$ is exogenous, a possible candidate for $Z$ consists of the linearly independent columns of $(X, WX, W^2X, \ldots)$. This choice of instruments was first proposed by Kelejian and Prucha (1998). To see why $Z$ could take this form, note from (3.5) that $E(y^* \mid X) = G(\rho)X\beta$. This term is clearly correlated with $y^*$ but uncorrelated with $\varepsilon$. Since $|\rho| \, \|W\|_\infty < 1$ under Assumptions 3.2 and 3.4, $G(\rho)$ can be expanded as $G(\rho) = W(I_n - \rho W)^{-1} = W + \rho W^2 + \rho^2 W^3 + \cdots$, so that $G(\rho)X\beta = \sum_{j=1}^{\infty} \rho^{j-1} W^j X\beta$. This implies that the instruments for $y^*$ can be chosen from the (linearly independent) columns of $WX, W^2X, \ldots$. Furthermore, Lee (2003) has shown that the asymptotically best IV matrix within the 2SLS framework is given by $Z^* = Q_0 = (G_0X\beta_0, X)$. Since $Z^*$ depends on the unknown parameters $\rho_0$ and $\beta_0$, a feasible best IV can be constructed using initial consistent estimates of the parameters.

Turning to the quadratic moment condition, recall that the idiosyncratic errors are assumed to be cross-sectionally uncorrelated and homoskedastic. Using these properties, we have the following moment condition:
$$E\left[\varepsilon'(\psi) B \varepsilon(\psi)\right] = 0, \tag{3.10}$$
where $\varepsilon(\psi)$ is given by (3.9) and $B$ is a matrix that satisfies the following assumption.

Assumption 3.7.
The matrix $B = (b_{ij})$ is an $n \times n$ constant matrix that satisfies $\mathrm{Tr}(B) = 0$, $\|B\|_\infty < K$, and $\|B\|_1 = \kappa n^{\delta}$, where $\kappa$ is a fixed positive constant that does not depend on $n$ and $0 \le \delta \le 1$.

Equation (3.10) is a valid moment condition because at the true value $\psi_0$,
$$E\left[\varepsilon'(\psi_0) B \varepsilon(\psi_0)\right] = E(\varepsilon' B \varepsilon) = \sigma_0^2 \mathrm{Tr}(B) = 0.$$
Here we consider a single quadratic moment to simplify the exposition. In practice, one could use multiple quadratic moments as long as the matrices $B_1, B_2, \ldots$ satisfy Assumption 3.7. Lee (2007) assumes that $B$ is uniformly bounded in both row and column sums (in absolute value) and suggests using $B_1 = W - n^{-1}\mathrm{Tr}(W) I_n$, $B_2 = W^2 - n^{-1}\mathrm{Tr}(W^2) I_n$, and so on in the quadratic moments. In contrast, given that $W$ satisfies Assumptions 3.4 and 3.5, we assume that the row sums of $B$ are bounded but its column sums may rise with $n$ at the rate $n^{\delta}$.

Remark 3.5. If the idiosyncratic errors are heteroskedastic, the condition $\mathrm{Tr}(B) = 0$ in Assumption 3.7 needs to be replaced by the stronger condition $\mathrm{diag}(B) = 0$ for (3.10) to constitute a valid moment. A practical choice of $B$ in this case could be $B_1 = W - \mathrm{Diag}(W)$, $B_2 = W^2 - \mathrm{Diag}(W^2)$, and higher powers of $W$ with their diagonal entries set to zero.

Remark 3.6. The use of quadratic moments enables us to estimate the SAR model even if there are no exogenous regressors; if the model does contain exogenous regressors, efficiency can be improved by using both quadratic and linear moments.

We are now in a position to define the GMM estimator of $\psi_0$, denoted by $\tilde\psi = (\tilde\rho, \tilde\beta')'$, using both quadratic and linear moments:
$$\tilde\psi = \arg\min_{\psi \in \Psi} g'(\psi)\, \Xi\, g(\psi), \tag{3.11}$$
where
$$g(\psi) = \begin{pmatrix} n^{-1}\varepsilon'(\psi) B \varepsilon(\psi) \\ n^{-1} Z'\varepsilon(\psi) \end{pmatrix}, \tag{3.12}$$
$\Xi$ is a symmetric positive definite moments weighting matrix that converges in probability to a positive definite matrix $\Xi_0$, and $\Psi$ denotes the parameter space of $\psi$ satisfying Assumption 3.2. The following theorem presents the asymptotic distribution of the GMM estimator. Its proof is given in Appendix C.2.
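The ingredients of the linear and quadratic moments can be sketched in a few lines of plain Python. Below, `kp_instruments` keeps the linearly independent columns of $(X, WX, W^2X)$ using a Gram-Schmidt residual test; with an intercept in $X$ and a row-standardized $W$, $W\tau_n = \tau_n$, so those columns are redundant and are dropped automatically. `b_zero_trace` and `b_zero_diag` build $B_2 = W^2 - n^{-1}\mathrm{Tr}(W^2)I_n$ and its heteroskedasticity-robust counterpart $W^2 - \mathrm{Diag}(W^2)$ from Remark 3.5. The ring $W$, the tolerance and all function names are hypothetical; this is an illustrative sketch, not the authors' code.

```python
import math
import random

def ring_weights(n, k=4):
    """Hypothetical row-standardised W: k neighbours ahead and k behind (wrap-around)."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for h in range(1, k + 1):
            W[i][(i + h) % n] += 1.0 / (2 * k)
            W[i][(i - h) % n] += 1.0 / (2 * k)
    return W

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def kp_instruments(W, X_cols, max_power=2, tol=1e-8):
    """Columns of (X, WX, ..., W^p X) that are linearly independent of the
    columns already kept, tested by Gram-Schmidt residual norms."""
    candidates, current = [], [list(c) for c in X_cols]
    for _ in range(max_power + 1):
        candidates.extend(current)
        current = [matvec(W, c) for c in current]
    kept, basis = [], []
    for col in candidates:
        r = list(col)
        for q in basis:                      # remove projections on kept columns
            proj = sum(a * b for a, b in zip(r, q))
            r = [a - proj * b for a, b in zip(r, q)]
        norm = math.sqrt(sum(a * a for a in r))
        if norm > tol * math.sqrt(sum(a * a for a in col)):
            basis.append([a / norm for a in r])
            kept.append(col)
    return kept

def b_zero_trace(A):
    """A - n^{-1} Tr(A) I_n: zero trace, valid under homoskedastic errors."""
    n, t = len(A), trace(A) / len(A)
    return [[A[i][j] - (t if i == j else 0.0) for j in range(n)] for i in range(n)]

def b_zero_diag(A):
    """A - Diag(A): zero diagonal, valid under heteroskedastic errors (Remark 3.5)."""
    n = len(A)
    return [[0.0 if i == j else A[i][j] for j in range(n)] for i in range(n)]

rng = random.Random(1)
n = 40
W = ring_weights(n)
tau = [1.0] * n
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
Z = kp_instruments(W, [tau, x])   # W tau = tau is dropped, leaving (tau, x, Wx, W^2 x)
W2 = matmul(W, W)
B2 = b_zero_trace(W2)
B2_het = b_zero_diag(W2)
print(len(Z), trace(W2) / n, trace(B2))
```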
Theorem 3.1. Consider the SAR model given by (3.1) and suppose that Assumptions 3.1–3.7 hold. Then the true parameter values, $\psi_0 = (\rho_0, \beta_0')'$, are locally identified. The GMM estimator of $\psi_0$, denoted by $\tilde\psi$ and defined in (3.11), is consistent if $0 \le \delta < 1$. Furthermore, if $0 \le \delta < 1/2$, $\tilde\psi$ has the following asymptotic distribution as $n \to \infty$:
$$\sqrt{n}\left(\tilde\psi - \psi_0\right) \to_d N\left(0, \Omega_g\right),$$
where
$$\Omega_g = \left(D'\Xi_0 D\right)^{-1} D'\Xi_0 V_g \Xi_0 D \left(D'\Xi_0 D\right)^{-1}, \tag{3.13}$$
$$D = \left(d_q, \Sigma_{Qz}\right)', \quad V_g = \begin{pmatrix} v_1 & \mu_{3\varepsilon}\nu' \\ \mu_{3\varepsilon}\nu & \sigma_0^2 \Sigma_{zz} \end{pmatrix}, \tag{3.14}$$
$$d_q = \left(d_{q1}, 0_{1\times k}\right)', \quad d_{q1} = \lim_{n\to\infty} n^{-1}\sigma_0^2 \mathrm{Tr}\left[(B + B')G_0\right], \tag{3.15}$$
$$v_1 = \lim_{n\to\infty} n^{-1}\left\{\gamma_{2\varepsilon} b'b + \sigma_0^4 \mathrm{Tr}\left[B(B + B')\right]\right\}, \quad \nu = \operatorname{plim}_{n\to\infty} n^{-1} Z'b, \tag{3.16}$$
$$\Sigma_{Qz} = \operatorname{plim}_{n\to\infty} n^{-1} Q_0' Z, \quad \Sigma_{zz} = \operatorname{plim}_{n\to\infty} n^{-1} Z'Z, \tag{3.17}$$
$\mu_{3\varepsilon} = E(\varepsilon_i^3)$, $\gamma_{2\varepsilon} = E(\varepsilon_i^4) - 3\sigma_0^4$, and $b = \mathrm{diag}(B)$.

The optimal moments weighting matrix is $\Xi^* = V_g^{-1}$. A feasible optimum GMM (OGMM) estimator of $\psi_0$, denoted by $\tilde\psi_o$, can be obtained by using a consistent estimator of $V_g^{-1}$ as the moments weighting matrix.

Theorem 3.2. Under the same assumptions as in Theorem 3.1, the feasible OGMM estimator, defined as
$$\tilde\psi_o = \arg\min_{\psi\in\Psi} g'(\psi)\, \tilde V_g^{-1} g(\psi), \tag{3.18}$$
is consistent if $0 \le \delta < 1$, where $g(\psi)$ is given by (3.12) and $\tilde V_g$ is a consistent estimator of $V_g$. Furthermore, if $0 \le \delta < 1/2$, $\tilde\psi_o$ has the following asymptotic distribution as $n \to \infty$:
$$\sqrt{n}\left(\tilde\psi_o - \psi_0\right) \to_d N\left(0, \left(D' V_g^{-1} D\right)^{-1}\right).$$

The best choice of $B$ exists under certain conditions. Lee (2007) shows that if the idiosyncratic errors are normally distributed, the OGMM estimator using $B^* = G_0 - n^{-1}\mathrm{Tr}(G_0) I_n$ in the quadratic moment and $Z^*$ in the linear moments has the smallest asymptotic variance among the GMM estimators derived with the class of matrices having zero trace. This estimator is referred to as the best GMM estimator, and $B^*$ is referred to as the best quadratic matrix.[2]
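The sandwich form of (3.13) is straightforward to evaluate once $D$, $\Xi_0$ and $V_g$ are available. The pure-Python sketch below (small dense matrices only, hypothetical numerical inputs with $k = 1$ and $r = 2$) also checks the step from Theorem 3.1 to Theorem 3.2: with $\Xi = V_g^{-1}$ the sandwich collapses to $(D'V_g^{-1}D)^{-1}$, and another weighting matrix, such as the identity, gives variances that are at least as large.

```python
def transpose(A):
    return [list(r) for r in zip(*A)]

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def inverse(A):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(A)
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-14:
            raise ValueError("matrix is numerically singular")
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0.0:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]

def gmm_sandwich(D, Xi, V):
    """Omega = (D'Xi D)^{-1} D'Xi V Xi D (D'Xi D)^{-1}, the form of (3.13)."""
    DtXi = matmul(transpose(D), Xi)
    bread = inverse(matmul(DtXi, D))
    meat = matmul(matmul(DtXi, V), transpose(DtXi))   # D'Xi V Xi D, Xi symmetric
    return matmul(matmul(bread, meat), bread)

D = [[1.0, 0.0], [0.5, 1.0], [0.2, 0.3]]                  # hypothetical (1 + r) x (k + 1)
V = [[2.0, 0.3, 0.1], [0.3, 1.0, 0.2], [0.1, 0.2, 1.5]]   # hypothetical V_g
I3 = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
omega_opt = gmm_sandwich(D, inverse(V), V)                # optimal weighting
omega_direct = inverse(matmul(matmul(transpose(D), inverse(V)), D))
omega_id = gmm_sandwich(D, I3, V)                         # identity weighting
print(omega_opt, omega_id)
```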
By a similar argument, and applying Lemma C.4, it is straightforward to see that the asymptotic properties of the best GMM estimator extend to the case where the column sums of $W$ rise with $n$, under the same conditions on $\delta$ as in Theorem 3.1. Since both $B^*$ and $Z^*$ depend on unknown parameters, a feasible best GMM estimator can be implemented in two steps. In the first step, we obtain a preliminary consistent estimate of $\psi_0$; in the second step, we perform the optimal GMM estimation using the best IV and best quadratic matrix evaluated at the first-stage estimate.

[2] Among the GMM estimators derived with the class of matrices having zero diagonal, the OGMM estimator using $B^* = G_0 - \mathrm{Diag}(G_0)$ and $Z^*$ in the moments has the smallest asymptotic variance. This result does not require the idiosyncratic errors to be normally distributed. See Lee (2007), Proposition 3, for details.

3.4 BMM estimation

In this section, we propose applying the bias-corrected method of moments (BMM) to SAR models. BMM was recently introduced by Chudik and Pesaran (2017) for the estimation of dynamic panels with short $T$, where it is shown to perform well compared with GMM. The BMM procedure uses least squares but corrects the bias in the moment conditions that is due to endogeneity, in contrast to the bias-corrected estimation methods in the literature, which correct the bias of the estimator. The application of BMM to the SAR model given by (3.3) is straightforward. Using $y^*$ and $X$ as instruments, the bias-corrected population moments are given by
$$E\left[y^{*\prime}(y - \rho y^* - X\beta)\right] = E\left(y^{*\prime}\varepsilon\right), \tag{3.19}$$
$$E\left[X'(y - \rho y^* - X\beta)\right] = 0, \tag{3.20}$$
$$E\left[(y - \rho y^* - X\beta)'(y - \rho y^* - X\beta)\right] = n\sigma^2. \tag{3.21}$$
Using (3.5), we have $E(y^{*\prime}\varepsilon) = E\left[(\beta'X' + \varepsilon')G'(\rho)\varepsilon\right]$, and under Assumptions 3.1 and 3.3 we obtain $E(y^{*\prime}\varepsilon) = \sigma^2 \mathrm{Tr}\left[G(\rho)\right]$.
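The bias term $\sigma^2\mathrm{Tr}[G(\rho)]$ in (3.19) can be evaluated from the Neumann series $G(\rho) = \sum_{j\ge1}\rho^{j-1}W^j$ used in Section 3.3: compute $\mathrm{Tr}(W^j)$ once, after which $\mathrm{Tr}[G(\rho)]$ is cheap to evaluate at any $\rho$, which is what a numerical BMM solver needs. The minimal sketch below checks the truncated series against a closed form for the equicorrelated matrix $W = (\tau_n\tau_n' - I_n)/(n-1)$ (the one-group network of Remark 3.9), whose eigenvalues are $1$ (once) and $-1/(n-1)$ ($n-1$ times), so that $\mathrm{Tr}[G(\rho)] = 1/(1-\rho) - (n-1)/(n-1+\rho)$. The truncation length `J`, $n$, $\rho$ and $\sigma^2$ are arbitrary choices for illustration.

```python
def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def equicorrelated_W(n):
    """W = (tau tau' - I)/(n - 1): one group, everyone linked, no self-links."""
    return [[0.0 if i == j else 1.0 / (n - 1) for j in range(n)] for i in range(n)]

def trace_powers(W, J):
    """[Tr(W), Tr(W^2), ..., Tr(W^J)] by repeated multiplication."""
    traces, P = [], [row[:] for row in W]
    for j in range(J):
        traces.append(sum(P[i][i] for i in range(len(P))))
        if j < J - 1:
            P = matmul(P, W)
    return traces

def trace_G(traces, rho):
    """Tr G(rho) = sum_{j>=1} rho^{j-1} Tr(W^j), truncated after len(traces) terms."""
    return sum(rho ** (j - 1) * t for j, t in enumerate(traces, start=1))

n, rho, sigma2 = 20, 0.5, 2.0
tr = trace_powers(equicorrelated_W(n), J=80)
tr_G = trace_G(tr, rho)
exact = 1.0 / (1.0 - rho) - (n - 1) / (n - 1 + rho)   # closed form via eigenvalues
bias = sigma2 * tr_G      # E(y*' eps) = sigma^2 Tr G(rho), right-hand side of (3.19)
print(tr_G, exact, tr[1] / n)   # tr[1] / n = n^{-1} Tr(W^2) = 1/(n-1)
```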
The sample versions of the moment conditions (3.19)–(3.21) can now be written as
$$n^{-1} y^{*\prime}\left(y - \hat\rho y^* - X\hat\beta\right) = \hat\sigma^2 \mathrm{Tr}\left[n^{-1} G(\hat\rho)\right], \tag{3.22}$$
$$n^{-1} X'\left(y - \hat\rho y^* - X\hat\beta\right) = 0, \tag{3.23}$$
$$n^{-1}\left(y - \hat\rho y^* - X\hat\beta\right)'\left(y - \hat\rho y^* - X\hat\beta\right) = \hat\sigma^2 = n^{-1}\left(y - \hat\rho y^*\right)' M_x \left(y - \hat\rho y^*\right), \tag{3.24}$$
where $M_x = I_n - X(X'X)^{-1}X'$, the vector $\hat\theta = (\hat\rho, \hat\beta', \hat\sigma^2)'$ denotes the BMM estimator of $\theta_0 = (\rho_0, \beta_0', \sigma_0^2)'$, and $\theta_0$ is the true value of $\theta = (\rho, \beta', \sigma^2)'$. The system of equations (3.22)–(3.24) can now be used to solve for $\hat\theta$ as follows:
$$\hat\theta = \arg\min_{\theta\in\Theta} m'(\theta)\, m(\theta), \tag{3.25}$$
where $m(\theta) = n^{-1}\left(m_1(\theta), m_2'(\theta), m_3(\theta)\right)'$,
$$m_1(\theta) = n^{-1} y^{*\prime}\varepsilon(\psi) - \sigma^2 \mathrm{Tr}\left[n^{-1}G(\rho)\right], \quad m_2(\theta) = n^{-1} X'\varepsilon(\psi), \quad m_3(\theta) = n^{-1}\varepsilon'(\psi)\varepsilon(\psi) - \sigma^2,$$
and $\varepsilon(\psi)$ is given by (3.9). Unlike least squares, the BMM procedure is non-linear in $\hat\rho$, and its asymptotic properties depend critically on the assumed rate at which the column sums of $W$ rise with $n$. As we shall see, the BMM estimators are consistent and do not suffer from the weak instrument problem, since $y^*$ is instrumented with its own values. However, in small samples it might be beneficial to augment the system of estimating equations (3.22)–(3.24) with additional moment conditions; see, for example, Lee (2001, 2007). The following theorem summarizes the asymptotic distribution of the BMM estimator. Its proof is given in Appendix C.2.

Theorem 3.3. Consider the SAR model given by (3.1) and suppose that Assumptions 3.1–3.6 hold. The bias-corrected method of moments (BMM) estimator of $\psi_0$, denoted by $\hat\psi = (\hat\rho, \hat\beta')'$ and defined by (3.25), is consistent if $0 \le \delta < 1$.
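Furthermore, if $0 \le \delta < 1/2$, it has the following asymptotic distribution as $n \to \infty$:
$$\sqrt{n}\left(\hat\psi - \psi_0\right) \to_d N\left(0, \Omega_b\right), \quad \Omega_b = H^{-1} V H^{-1}, \tag{3.26}$$
where
$$H = \begin{pmatrix} h_{\rho\rho} & \beta_0'\Sigma_{xgx}' \\ \Sigma_{xgx}\beta_0 & \Sigma_{xx} \end{pmatrix}, \quad V = \begin{pmatrix} \omega^2 & \sigma_0^2 \beta_0'\Sigma_{xgx}' \\ \sigma_0^2 \Sigma_{xgx}\beta_0 & \sigma_0^2 \Sigma_{xx} \end{pmatrix}, \tag{3.27}$$
$$\omega^2 = \sigma_0^2 \beta_0'\Sigma_{xggx}\beta_0 + \gamma_{2\varepsilon} \operatorname{plim}_{n\to\infty} n^{-1}\sum_{i=1}^n \pi_{ii}^2 + \sigma_0^4 \operatorname{plim}_{n\to\infty}\left[\mathrm{Tr}\left(n^{-1}\Pi'\Pi\right) + \mathrm{Tr}\left(n^{-1}\Pi^2\right)\right], \tag{3.28}$$
$$h_{\rho\rho} = \beta_0'\Sigma_{xggx}\beta_0 + \sigma_0^2 h_g, \quad \Pi = G_0 - M_x \mathrm{Tr}\left(n^{-1}G_0\right) = (\pi_{ij}), \tag{3.29}$$
$$h_g = \operatorname{plim}_{n\to\infty}\left[n^{-1}\mathrm{Tr}\left(G_0'G_0 + G_0^2\right) - 2n^{-2}\mathrm{Tr}(G_0M_x)\mathrm{Tr}(G_0)\right], \tag{3.30}$$
and $\Sigma_{xgx}$ and $\Sigma_{xggx}$ are defined in Assumption 3.3.

Remark 3.7. It can be seen from (3.26) that $\psi_0$ is identified if $H$ is positive definite. Note that $H = H_1 + H_2$, where $H_1 = \operatorname{plim}_{n\to\infty} n^{-1}Q_0'Q_0$ and
$$H_2 = \begin{pmatrix} \sigma_0^2 h_g & 0_{1\times k} \\ 0_{k\times 1} & 0_{k\times k}\end{pmatrix}.$$
Since $H_1$ is positive semi-definite and $h_g \ge 0$, it follows that $H$ is positive definite if $h_g > 0$ and/or $H_1$ is positive definite. Further, note that
$$n^{-1}\mathrm{Tr}(G_0M_x) = n^{-1}\mathrm{Tr}(G_0) - n^{-1}\mathrm{Tr}\left[G_0X(X'X)^{-1}X'\right] = n^{-1}\mathrm{Tr}(G_0) - n^{-1}\mathrm{Tr}\left[\left(n^{-1}X'X\right)^{-1}\left(n^{-1}X'G_0X\right)\right].$$
Since $n^{-1}X'X \to_p \Sigma_{xx}$ and $n^{-1}X'G_0X \to_p \Sigma_{xgx}$, the trace in the second term is that of a fixed $k \times k$ matrix and is $O_p(1)$, so the second term is $O_p(n^{-1})$. Hence $\operatorname{plim}_{n\to\infty} n^{-1}\mathrm{Tr}(G_0M_x) = \lim_{n\to\infty} n^{-1}\mathrm{Tr}(G_0)$, and
$$h_g = \lim_{n\to\infty}\left\{n^{-1}\mathrm{Tr}\left(G_0'G_0 + G_0^2\right) - 2n^{-2}\left[\mathrm{Tr}(G_0)\right]^2\right\}. \tag{3.31}$$
Under Assumption 3.6(a), $H_1$ is positive definite; under Assumption 3.6(b), $h_g > 0$. Therefore, Assumption 3.6 ensures the identification of $\psi_0$.

Remark 3.8. If there are no exogenous regressors, the identification conditions in Assumption 3.6 reduce to (b), $h_{g,n} > \epsilon > 0$ for all $n$, including $n \to \infty$. This result is consistent with the conditions derived in Aquaro et al. (2015) and Yang (2017), and can be obtained as a special case under homogeneity of spatial coefficients and in the absence of common factors. Including exogenous regressors in the model helps with the identification of $\rho$, as long as not all slope coefficients are zero.

Remark 3.9.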
It is interesting to relate the identification condition in Assumption 3.6(b) to the literature on social interactions. Consider first a simple example with only one social group, in which everyone is connected with everyone else and self-influence is excluded. In this case, the network matrix is
$$W = \frac{1}{n-1}\left(\tau_n\tau_n' - I_n\right). \tag{3.32}$$
Section 1.3 has shown that a necessary condition for Assumption 3.6(b) is $n^{-1}\mathrm{Tr}(W'W) > \epsilon > 0$ for all $n$ (including $n \to \infty$), where $\epsilon$ is some positive number. Given (3.32), it is easily verified that $\lim_{n\to\infty} n^{-1}\mathrm{Tr}(W'W) = 0$, and hence the identification condition is violated. To see this, note that $W$ is symmetric, so that $W'W = W^2$, with
$$W^2 = \frac{1}{(n-1)^2}\left(n\tau_n\tau_n' - 2\tau_n\tau_n' + I_n\right),$$
and then
$$n^{-1}\mathrm{Tr}(W^2) = \frac{1}{n(n-1)^2}\left(n^2 - 2n + n\right) = \frac{1}{n-1},$$
which tends to zero as $n \to \infty$. Therefore, the endogenous social effect is unidentifiable without exogenous regressors.

Now suppose that there are $R$ groups and $n_r$ units in the $r$th group, for $r = 1, 2, \ldots, R$, so that $\sum_{r=1}^R n_r = n$. The standard linear-in-means social interaction model assumes that individuals within a group have the same pairwise dependence, whereas individuals in different groups are not dependent. See Case (1991, 1992) for examples of empirical studies employing such a network structure. The matrix of group interactions, $W$, can then be represented by the block diagonal matrix
$$W = \mathrm{Diag}(W_1, W_2, \ldots, W_R), \quad W_r = \frac{1}{n_r - 1}\left(\tau_{n_r}\tau_{n_r}' - I_{n_r}\right), \quad r = 1, 2, \ldots, R.$$
Since the single-group calculation above gives $\mathrm{Tr}(W_r^2) = n_r/(n_r - 1)$, it follows that
$$n^{-1}\mathrm{Tr}(W^2) = \frac{\sum_{r=1}^R \mathrm{Tr}(W_r^2)}{n} = \sum_{r=1}^R \frac{\pi_r}{n_r - 1},$$
where $\pi_r = n_r/n$ is the fraction of the population in the $r$th group. Suppose that $n_r$ rises with $n$ such that $\pi_r \ge 0$ as $n \to \infty$. If $R$ is fixed, then $\lim_{n\to\infty} n^{-1}\mathrm{Tr}(W^2) = 0$, and the group interaction effect is unidentified in the absence of exogenous explanatory variables.
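The BMM estimator of Section 3.4 can be sketched in the simplest setting of Remark 3.8, with no exogenous regressors, so that (3.23) drops out and $M_x = I_n$. For each candidate $\rho$, $\hat\sigma^2(\rho)$ is taken from (3.24), and $\hat\rho$ minimizes the squared residual of (3.22) over a grid, which is (3.25) with $m_3$ solved exactly. Assumptions made only for this illustration: a ring $W$ with weight $1/2$ on each of two neighbors (for which $h_g > 0$, unlike the fixed-$R$ group structure above), whose eigenvalues $\lambda_m = \cos(2\pi m/n)$ give $n^{-1}\mathrm{Tr}[G(\rho)] = n^{-1}\sum_m \lambda_m/(1-\rho\lambda_m)$ in closed form; Gaussian errors; and a grid step of 0.001. A general $W$ would require the trace to be computed numerically, and a careful implementation would replace the grid with a root finder.

```python
import math
import random

def spatial_lag(y):
    """y* = W y for a hypothetical ring W with weight 1/2 on each of two neighbours."""
    n = len(y)
    return [0.5 * (y[i - 1] + y[(i + 1) % n]) for i in range(n)]

def simulate_pure_sar(n, rho, rng, tol=1e-12, max_iter=10_000):
    """Draw eps ~ N(0, 1) and solve y = rho W y + eps by fixed-point iteration."""
    eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = eps[:]
    for _ in range(max_iter):
        y_new = [rho * a + e for a, e in zip(spatial_lag(y), eps)]
        if max(abs(a - b) for a, b in zip(y_new, y)) < tol:
            return y_new
        y = y_new
    raise RuntimeError("fixed-point iteration did not converge")

def bmm_pure_sar(y, step=1e-3):
    """Grid-search sketch of BMM for y = rho W y + eps (no regressors)."""
    n = len(y)
    lam = [math.cos(2.0 * math.pi * m / n) for m in range(n)]   # eigenvalues of ring W
    ys = spatial_lag(y)
    s_yy = sum(a * a for a in y) / n
    s_sy = sum(a * b for a, b in zip(ys, y)) / n
    s_ss = sum(a * a for a in ys) / n
    best = (float("inf"), None, None)
    for s in range(int(round(1.98 / step)) + 1):
        rho = -0.99 + s * step
        sigma2 = s_yy - 2.0 * rho * s_sy + rho * rho * s_ss     # (3.24) with M_x = I_n
        mean_tr_G = sum(l / (1.0 - rho * l) for l in lam) / n    # n^{-1} Tr G(rho)
        m1 = (s_sy - rho * s_ss) - sigma2 * mean_tr_G            # residual of (3.22)
        if m1 * m1 < best[0]:
            best = (m1 * m1, rho, sigma2)
    return best[1], best[2]

rng = random.Random(42)
y = simulate_pure_sar(500, 0.5, rng)
rho_hat, sigma2_hat = bmm_pure_sar(y)
print(round(rho_hat, 3), round(sigma2_hat, 3))
```

Note that $y^*$ enters as its own instrument here: no external instruments are built, and the endogeneity is handled entirely by subtracting $\hat\sigma^2 n^{-1}\mathrm{Tr}[G(\rho)]$ in the first moment.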
3.5 Monte Carlo experiments

In this section, we examine the small sample properties of the GMM and BMM estimators for SAR models with dominant units using Monte Carlo techniques. The data generating process (DGP) is specified as follows:
$$y_i = \rho y_i^* + \beta x_i + \alpha + \sigma_\varepsilon \varepsilon_i, \quad i = 1, 2, \ldots, n, \tag{3.33}$$
where $y_i^* = w_{i.,y}' y$, $y = (y_1, y_2, \ldots, y_n)'$, and $w_{i.,y}'$ is the $i$th row of $W_y$. The exogenous regressor, $x_i$, could also be spatially autocorrelated and is generated as
$$x_i = \lambda x_i^* + \sigma_\eta \eta_i, \quad i = 1, 2, \ldots, n, \tag{3.34}$$
where $x_i^* = w_{i.,x}' x$, $x = (x_1, x_2, \ldots, x_n)'$, and $w_{i.,x}'$ is the $i$th row of $W_x$. Note that the spatial coefficients and weights matrices may differ between the $y$ and $x$ processes. In matrix form, (3.33) can be rewritten as $y = S_y^{-1}(\rho)(\beta x + \alpha\tau_n) + u$, where $S_y(\rho) = I_n - \rho W_y$, $u = \sigma_\varepsilon S_y^{-1}(\rho)\varepsilon$, $\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)'$, and $u = (u_1, u_2, \ldots, u_n)'$. Similarly, (3.34) can be rewritten as $x = \sigma_\eta S_x^{-1}(\lambda)\eta$, where $S_x(\lambda) = I_n - \lambda W_x$ and $\eta = (\eta_1, \eta_2, \ldots, \eta_n)'$. For the idiosyncratic errors, we consider both Gaussian and non-Gaussian processes:

Gaussian errors: $\varepsilon_i \sim IIDN(0, 1)$ and $\eta_i \sim IIDN(0, 1)$.
Non-Gaussian errors: $\varepsilon_i \sim IID\,[\chi^2(2) - 2]/2$ and $\eta_i \sim IID\,[\chi^2(2) - 2]/2$, where $\chi^2(2)$ denotes a chi-square random variable with two degrees of freedom.

The parameter values are set to $\alpha = 1$, $\beta = 1$, $\lambda = 0.75$ and $\sigma_\varepsilon = 1$, and $\sigma_\eta$ takes the value such that the following relation holds:
$$R^2 = 1 - \frac{n^{-1}\mathrm{Tr}[\mathrm{Var}(u)]}{n^{-1}\mathrm{Tr}[\mathrm{Var}(y)]} = 1 - \frac{a}{(\sigma_\eta^2/\sigma_\varepsilon^2)\beta^2 b + a}, \tag{3.35}$$
where
$$a = n^{-1}\mathrm{Tr}\left[\Delta_y^{-1}(\rho)\right], \quad b = n^{-1}\mathrm{Tr}\left[\Delta_x^{-1}(\lambda)\Delta_y^{\prime -1}(\rho)\right], \quad \Delta_x(\lambda) = S_x'(\lambda)S_x(\lambda), \quad \Delta_y(\rho) = S_y'(\rho)S_y(\rho).$$
That is, for a given $R^2$, we compute $\sigma_\eta$ from
$$\sigma_\eta^2 = \frac{aR^2}{b\beta^2(1 - R^2)}\sigma_\varepsilon^2. \tag{3.36}$$
Turning to the specification of the spatial weights matrices, we consider the case $W_x = W_y = W$ in the main text and relegate the results for $W_x \neq W_y$ to the online supplement. The spatial weights matrix
$$W = (w_{ij})_{n\times n} = \begin{pmatrix} 0 & w_{12}' \\ w_{21} & W_{22} \end{pmatrix}$$
is generated as follows. We assume, without loss of generality, that the first unit of the network is $\delta$-dominant and the rest are non-dominant. Specifically, the first $\lfloor n^{\delta} \rfloor$ elements of $w_{21}$ are drawn from $IIDU(0, 1)$ and the rest are zero, where $\lfloor\cdot\rfloor$ denotes the integer part of a number. In this way, the sum of the first column of $W$ expands with $n$ at the rate $n^{\delta}$, i.e., $\sum_{i=1}^n w_{i1} = O(n^{\delta})$. The first 8 elements of $w_{12}'$ take the value one and the remaining elements are zero. $W_{22}$ is a standard $(n-1)\times(n-1)$ spatial matrix with 8 connections (4-ahead-and-4-behind with equal weights), namely $w_{ij} = 0.125$ for $j = i-4, \ldots, i-1, i+1, \ldots, i+4$, and $w_{ij} = 0$ otherwise. By construction, $W_{22}$ is uniformly bounded in both row and column sums. Finally, $W$ is standardized so that each row sums to one.

We consider combinations of $\delta = 0, 0.25, 0.50, 0.75, 0.95, 1$; $\rho = 0.2, 0.5, 0.75, 0.95$; and $R^2 = 0.75$. The sample sizes are $n = 100, 300, 500$ and $1{,}000$, and the number of replications is 2,000. We perform both GMM and BMM estimation for each experiment. The BMM estimator is computed by (3.25). The GMM estimator is obtained by using the best IV and best quadratic matrix in the moments.[3] Specifically, the GMM estimator is calculated in two steps. In the first step, we conduct equally weighted GMM estimation following (3.11), using $B_1 = W$, $B_2 = W^2 - n^{-1}\mathrm{Tr}(W^2) I_n$ and $Z = (\tau_n, x, Wx, W^2x)$ in the moment conditions. In the second step, we re-estimate the model by the optimum GMM procedure using the best IV and best quadratic matrix evaluated at the first-step estimate, namely $\tilde Z^* = \left(\tilde G(\tilde\alpha\tau_n + \tilde\beta x), \tau_n, x\right)$ and $\tilde B^* = \tilde G - n^{-1}\mathrm{Tr}(\tilde G) I_n$, where $\tilde\psi = (\tilde\rho, \tilde\alpha, \tilde\beta)'$ denotes the first-step GMM estimate and $\tilde G = G(\tilde\rho)$.

Tables 3.4 and 3.5 present the small sample properties of the GMM and BMM estimators for the experiments with Gaussian errors; Tables 3.6 and 3.7 present the results for non-Gaussian errors.
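The weights matrix of the Monte Carlo design and the calibration (3.36) can be sketched as follows. Two assumptions not stated in the text are flagged in the code: the 4-ahead/4-behind neighbors of $W_{22}$ wrap around at the ends, and the values of $a$ and $b$ used in the $R^2$ round trip are hypothetical numbers rather than traces computed from $W$, $\rho$ and $\lambda$. The check confirms that only the first column sum grows with $\lfloor n^{\delta}\rfloor$, while the remaining column sums stay bounded.

```python
import math
import random

def dominant_unit_W(n, delta, rng):
    """Sketch of the Monte Carlo weights matrix with unit 0 delta-dominant (n >= 10).
    ASSUMPTION (not in the text): the W_22 neighbours wrap around at the ends."""
    W = [[0.0] * n for _ in range(n)]
    for j in range(1, 9):                       # first 8 elements of w_12' are one
        W[0][j] = 1.0
    m = n - 1                                   # W_22 is (n-1) x (n-1)
    for a in range(m):
        for h in range(1, 5):                   # 4 ahead and 4 behind, weight 0.125
            W[1 + a][1 + (a + h) % m] = 0.125
            W[1 + a][1 + (a - h) % m] = 0.125
    for i in range(1, min(int(n ** delta), m) + 1):   # first floor(n^delta) of w_21
        W[i][0] = rng.uniform(0.0, 1.0)
    for row in W:                               # row-standardise
        s = sum(row)
        row[:] = [w / s for w in row]
    return W

def sigma_eta_sq(R2, a, b, beta, sigma_eps_sq=1.0):
    """sigma_eta^2 from (3.36) for a target R^2 in (0, 1)."""
    return a * R2 * sigma_eps_sq / (b * beta ** 2 * (1.0 - R2))

def implied_R2(s_eta_sq, a, b, beta, sigma_eps_sq=1.0):
    """Right-hand side of (3.35) for a given sigma_eta^2."""
    return 1.0 - a / ((s_eta_sq / sigma_eps_sq) * beta ** 2 * b + a)

rng = random.Random(7)
n = 400
col_sums = {}
for delta in (0.0, 0.75):
    W = dominant_unit_W(n, delta, rng)
    d = [sum(col) for col in zip(*W)]
    col_sums[delta] = (d[0], max(d[1:]))        # (first column, largest other column)
row_sums_ok = all(abs(sum(row) - 1.0) < 1e-12 for row in W)   # last W (delta = 0.75)
s2 = sigma_eta_sq(0.75, a=1.3, b=2.1, beta=1.0)                 # hypothetical a and b
print(col_sums, row_sums_ok, implied_R2(s2, a=1.3, b=2.1, beta=1.0))
```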
For each experiment, we report the bias, root mean square error (RMSE), size and power of both estimators of $\rho$ and $\beta$. The estimates of the intercept are omitted to save space. In addition, Figures 3.1–3.4 plot the empirical power functions for testing $\rho_0 = 0.5$ and $\rho_0 = 0.75$ when $n = 100$ and $\delta = 0, 0.25, 0.75$ and $1$. A complete collection of power function graphs is provided in the online supplement.

Let us begin by examining the bias and RMSE results. Both the GMM and BMM estimators display declining bias and RMSE as the sample size increases. On the whole, the bias and RMSE are very small even when $n = 100$, irrespective of the magnitude of the spatial autoregressive parameter. However, as $\delta$ approaches 1, we see a substantial increase in RMSE. This finding is in line with the theoretical result that the estimators are consistent if $\delta < 1$. Moreover, the RMSEs of the GMM and BMM estimators are very close when $\delta < 1$. Specifically, the BMM estimator of $\beta$ has a smaller RMSE than the GMM estimator when $n = 100$, but such differences become negligible when $n \ge 300$; both estimators of $\rho$ yield extremely similar RMSEs for all sample sizes under consideration. These results support our finding that the GMM and BMM estimators have the same limiting distribution under Gaussian errors and $\delta < 1/2$. Last but not least, a comparison of the results obtained under different specifications of the disturbances reveals that the estimators are very robust to non-Gaussian errors.

We now turn to the size and power properties. As can be seen from Table 3.4, the tests based on both estimators of $\rho$ have empirical sizes close to the nominal size of 5% when $\delta$ is no higher than 0.75. When the sample size is small ($n = 100$), the GMM estimator slightly over-rejects the null if the degree of spatial autocorrelation is high ($\rho_0 = 0.75$), and the size distortion becomes more severe as $\rho_0$ rises towards unity.
In comparison, the BMM estimator has proper empirical sizes even when the sample size is small and $\rho_0$ is high. As the sample size becomes larger ($n \geq 300$), both estimators exhibit correct size and high power for all values of $\rho_0$ if $\delta \leq 0.75$. Turning to the estimation results for $\beta$ (Table 3.5), it is evident that both estimators yield better size and power properties than they do for $\rho$. There are no notable changes in these findings when the errors are non-Gaussian.

³ We also consider the GMM estimators using other IV and quadratic matrices. The results are presented in the online supplement.

3.6 An empirical application to US price networks

In this section, we apply the GMM and BMM estimation techniques to examine inflation spillovers through intersectoral production linkages in the US economy. This application continues the investigation by Pesaran and Yang (2016), who find that the highest degree of dominance in the US production network lies between 0.72 and 0.82 over the period 1972–2007. Accordingly, the standard assumption in the spatial econometrics literature that all units are non-dominant is violated.

We extend the closed economy multi-sectoral model in Pesaran and Yang (2016) to a small open economy in which production also requires imported intermediate inputs (raw materials). For simplicity, we assume that there is only one type of imported intermediate good, whose quantity demanded for production by sector $i$ at time $t$ is denoted by $m_{it}$. Each sector $i$ produces output, $q_{it}$, using the following Cobb-Douglas production technology:

$$q_{it} = e^{\alpha u_{it}}\, l_{it}^{\alpha}\, m_{it}^{\vartheta} \prod_{j=1}^{n} q_{ij,t}^{(1-\alpha-\vartheta) w_{ij}}, \quad i = 1, 2, \ldots, n;\ t = 1, 2, \ldots, T, \tag{3.37}$$

where $l_{it}$ is the labor input, $q_{ij,t}$ is the amount of output of sector $j$ used by sector $i$, and $u_{it}$ is the productivity shock, which consists of two components: $u_{it} = \varepsilon_{it} + \gamma_i f_t$, where $\varepsilon_{it}$ is a sector-specific shock and $f_t$ is a common factor with heterogeneous factor loading $\gamma_i$.
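The parameter $\alpha$ represents the share of labor, $\vartheta$ represents the share of the imported intermediate good, and $w_{ij}$ is the share of sector $j$'s output in the total domestic intermediate input use of sector $i$. The representative household has the following Cobb-Douglas preferences over $n$ goods:

$$u(c_{1t}, c_{2t}, \ldots, c_{nt}) = A \prod_{i=1}^{n} c_{it}^{1/n}, \quad A > 0,$$

where $c_{it}$ is the quantity consumed of good $i$. Furthermore, the household is endowed with $l_t$ units of labor, supplied inelastically at wage rate $\mathrm{Wage}_t$.

In equilibrium, the commodity markets clear,

$$c_{it} = q_{it} - \sum_{j=1}^{n} q_{ji,t} - q^{X}_{it}, \quad i = 1, 2, \ldots, n, \tag{3.38}$$

where $q^{X}_{it}$ is the quantity exported of good $i$; the labor market clears, $l_t = \sum_{i=1}^{n} l_{it}$; and trade is balanced,

$$P_{t,m} \sum_{i=1}^{n} m_{it} = \sum_{i=1}^{n} P_{it}\, q^{X}_{it}, \tag{3.39}$$

where $P_{it}$ denotes the price of good $i$, and $P_{t,m}$ denotes the exogenous world price of the imported intermediate good. Given prices $\{P_{1t}, P_{2t}, \ldots, P_{nt}, P_{t,m}, \mathrm{Wage}_t\}$, the profit-maximization problem of sector $i$, for $i = 1, 2, \ldots, n$, is given by

$$\max_{q_{ij,t},\, l_{it},\, m_{it}}\ P_{it}\, e^{\alpha u_{it}}\, l_{it}^{\alpha}\, m_{it}^{\vartheta} \prod_{j=1}^{n} q_{ij,t}^{(1-\alpha-\vartheta) w_{ij}} - \mathrm{Wage}_t\, l_{it} - P_{t,m}\, m_{it} - \sum_{j=1}^{n} P_{jt}\, q_{ij,t}.$$

The first-order conditions with respect to $q_{ij,t}$, $l_{it}$, and $m_{it}$ imply that

$$q_{ij,t} = \frac{(1-\alpha-\vartheta)\, w_{ij}\, P_{it}\, q_{it}}{P_{jt}}, \tag{3.40}$$

$$l_{it} = \frac{\alpha P_{it}\, q_{it}}{\mathrm{Wage}_t}, \tag{3.41}$$

and

$$m_{it} = \frac{\vartheta P_{it}\, q_{it}}{P_{t,m}}. \tag{3.42}$$

Substituting (3.40)–(3.42) into (3.37) and simplifying, using $\sum_{j=1}^{n} w_{ij} = 1$, gives

$$p_{it} = (1-\alpha-\vartheta) \sum_{j=1}^{n} w_{ij}\, p_{jt} + \alpha \omega_t + \vartheta p_{t,m} - b_i - \alpha(\gamma_i f_t + \varepsilon_{it}), \tag{3.43}$$

where $p_{it} = \log(P_{it})$, $\omega_t = \log(\mathrm{Wage}_t)$, $p_{t,m} = \log(P_{t,m})$, and

$$b_i = \alpha \log(\alpha) + \vartheta \log(\vartheta) + (1-\alpha-\vartheta)\log(1-\alpha-\vartheta) + (1-\alpha-\vartheta) \sum_{j=1}^{n} w_{ij} \log(w_{ij}).$$

We set $w_{ij}\log(w_{ij}) = 0$ if $w_{ij} = 0$. The price system (3.43) can be rewritten in matrix form as

$$\mathbf{p}_t = (1-\alpha-\vartheta)\mathbf{W}\mathbf{p}_t + \alpha \omega_t \boldsymbol{\tau}_n + \vartheta p_{t,m} \boldsymbol{\tau}_n - (\mathbf{b} + \alpha \boldsymbol{\gamma} f_t + \alpha \boldsymbol{\varepsilon}_t), \tag{3.44}$$

where $\mathbf{p}_t = (p_{1t}, p_{2t}, \ldots, p_{nt})'$, $\mathbf{W} = (w_{ij})_{n \times n}$, $\boldsymbol{\tau}_n$ is an $n \times 1$ vector of ones, $\boldsymbol{\gamma} = (\gamma_1, \gamma_2, \ldots, \gamma_n)'$, $\mathbf{b} = (b_1, b_2, \ldots, b_n)'$, and $\boldsymbol{\varepsilon}_t = (\varepsilon_{1t}, \varepsilon_{2t}, \ldots, \varepsilon_{nt})'$.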
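As a numerical check on the step from (3.37) and (3.40)–(3.42) to (3.43)–(3.44), the sketch below solves (3.44) for $\mathbf{p}_t$ and then verifies that, when inputs are chosen by the first-order conditions with $q_{it} = 1$, the production function (3.37) delivers exactly one unit of output (log output zero) in every sector. It also shows the pass-through implied by $\mathbf{W}\boldsymbol{\tau}_n = \boldsymbol{\tau}_n$: a unit rise in $\omega_t$ raises every log price by $\alpha/(1-\rho) = \alpha/(\alpha+\vartheta)$. The function names and parameter values are illustrative assumptions, and the check requires a row-standardized $\mathbf{W}$, since the shares $w_{ij}$ sum to one across $j$.

```python
import numpy as np


def solve_log_prices(W, alpha, vartheta, omega, p_m, gamma, f, eps):
    """Solve the price system (3.44) for p_t, given a row-standardized W and the shocks."""
    n = W.shape[0]
    rho = 1.0 - alpha - vartheta
    # b_i, using the convention w_ij * log(w_ij) = 0 when w_ij = 0
    wlogw = np.where(W > 0, W * np.log(np.where(W > 0, W, 1.0)), 0.0)
    b = (alpha * np.log(alpha) + vartheta * np.log(vartheta)
         + rho * np.log(rho) + rho * wlogw.sum(axis=1))
    tau = np.ones(n)
    rhs = (alpha * omega + vartheta * p_m) * tau - (b + alpha * gamma * f + alpha * eps)
    return np.linalg.solve(np.eye(n) - rho * W, rhs)


def log_output_at_foc_inputs(p, W, alpha, vartheta, omega, p_m, gamma, f, eps):
    """Log of (3.37) when inputs follow (3.40)-(3.42) with q_it = 1, so P_it q_it = P_it."""
    rho = 1.0 - alpha - vartheta
    u = gamma * f + eps                        # u_it = gamma_i f_t + eps_it
    log_l = np.log(alpha) + p - omega          # (3.41)
    log_m = np.log(vartheta) + p - p_m         # (3.42)
    safe_W = np.where(W > 0, W, 1.0)
    # (3.40): log q_ij = log(rho * w_ij) + p_i - p_j, used only where w_ij > 0
    log_qij = np.log(rho * safe_W) + p[:, None] - p[None, :]
    inter = np.where(W > 0, rho * W * log_qij, 0.0).sum(axis=1)
    return alpha * u + alpha * log_l + vartheta * log_m + inter
```

For instance, with $\alpha = 0.3$ and $\vartheta = 0.1$, raising $\omega_t$ by one unit shifts every log price up by $0.3/0.4 = 0.75$, whatever the network $\mathbf{W}$.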
Note that in the absence of the imported intermediate good, (3.44) reduces to (11) in Pesaran and Yang (2016). It follows from (3.44) that the average change in log prices from period $t-s$ to period $t$ ($1 \leq t-s < t \leq T$) is given by

$$\Delta_s \mathbf{p}_t = \rho \mathbf{W} \Delta_s \mathbf{p}_t + (\alpha \Delta_s \omega_t + \vartheta \Delta_s p_{t,m})\boldsymbol{\tau}_n - \alpha \boldsymbol{\gamma} \Delta_s f_t - \alpha \Delta_s \boldsymbol{\varepsilon}_t, \tag{3.45}$$

where $\Delta_s = (1/s)(1 - L^s)$, $L$ is the lag operator, and $\rho = 1 - \alpha - \vartheta$; the time-invariant vector $\mathbf{b}$ drops out after differencing. The unobserved individual effects can be modeled as

$$-\alpha \Delta_s f_t\, \gamma_i = \boldsymbol{\beta}_t' \mathbf{x}_i + \zeta_i,$$

where $\mathbf{x}_i = (x_{i1}, x_{i2}, \ldots, x_{ik})'$ is a vector of observed explanatory variables, and $\zeta_i$ is assumed to be independent of $\mathbf{x}_i$ and of all the other random variables in the model. Then (3.45) can be rewritten as

$$\Delta_s \mathbf{p}_t = \rho \mathbf{W} \Delta_s \mathbf{p}_t + \eta_t \boldsymbol{\tau}_n + \mathbf{X}\boldsymbol{\beta}_t + \mathbf{u}_t, \tag{3.46}$$

where $\eta_t = \alpha \Delta_s \omega_t + \vartheta \Delta_s p_{t,m}$, $\mathbf{X} = (\mathbf{x}_1', \mathbf{x}_2', \ldots, \mathbf{x}_n')'$, $\mathbf{u}_t = -\alpha \Delta_s \boldsymbol{\varepsilon}_t + \boldsymbol{\zeta}$, and $\boldsymbol{\zeta} = (\zeta_1, \zeta_2, \ldots, \zeta_n)'$. The new composite error, $u_{it}$, is assumed to be independently distributed across $i$ with zero mean and variance $\sigma^2$, $0 < \sigma^2 < K$.

For a given $t$, we estimate the reduced-form cross-section model (3.46) using the GMM and BMM methods. The parameters of interest are $(\rho, \eta_t, \boldsymbol{\beta}_t')'$, where $\rho$ is the spatial coefficient, which by (3.40) equals the share of domestic intermediate inputs in the value of sectoral output, $\eta_t$ is an intercept, and $\boldsymbol{\beta}_t$ is the vector of slope parameters associated with the observed explanatory variables. Note that for a given $t$, we can test the significance of $\boldsymbol{\beta}_t$ but not its sign, because the sign of $\boldsymbol{\beta}_t$ varies with that of the unobserved term $\Delta_s f_t$.

We now briefly describe the data. The spatial weights matrix is constructed from the input-output tables at the most disaggregated level. These tables are compiled by the Bureau of Economic Analysis (BEA) every five years.
Specifically, $\mathbf{W}$ is a commodity-by-commodity direct requirements matrix, whose $(i,j)$th entry represents the expense on commodity $j$ per dollar of production of commodity $i$.⁴ We construct the robust weights matrix, denoted by $\tilde{\mathbf{W}}$, by setting each element of $\mathbf{W}$ to one if it is greater than or equal to a given threshold value $\varepsilon_w$ ($0 < \varepsilon_w < 1$), and to zero otherwise. Then $\tilde{\mathbf{W}}$ is row-standardized so that each row sums to one. In particular, we use $\varepsilon_w = 0.1$ as the cut-off value in our analysis. This means that for any given sector, only important suppliers that contribute at least 10% of the total input purchases are taken into account. The sectoral price data are also obtained from the BEA and matched with the input-output tables based on the BEA industry codes.

Table 3.1: Estimates of the degree of dominance, δ, for the top five pervasive sectors using US input-output tables

Source | 2002 input-output table: W2002, W̃2002 | 2007 input-output table: W2007, W̃2007 | Balanced panel: W, W̃
δ̂(1) | 0.778 0.851 | 0.724 0.705 | 0.753 0.806
δ̂(2) | 0.759 0.796 | 0.651 0.703 | 0.720 0.693
δ̂(3) | 0.597 0.642 | 0.608 0.695 | 0.598 0.672
δ̂(4) | 0.550 0.422 | 0.592 0.565 | 0.541 0.472
δ̂(5) | 0.546 0.402 | 0.553 0.491 | 0.520 0.438
S.E. | – – | – – | (0.056) (0.083)
n | 313 [301] 286 [114] | 384 [364] 350 [140] | 308 [300] 271 [144]

Notes: $\tilde{\mathbf{W}}$ denotes the robust $\mathbf{W}$ matrix constructed using a threshold value of $\varepsilon_w = 0.1$. $\hat{\delta}_{(1)} \geq \hat{\delta}_{(2)} \geq \ldots \geq \hat{\delta}_{(5)}$ are the five largest estimates of the degree of dominance. The balanced panel was constructed by merging the input-output tables for the years 2002 and 2007 based on the BEA industry codes. Standard errors (S.E.) are in parentheses; they cannot be computed when $T = 1$. $n$ is the total number of sectors with non-zero total demands (indegrees). The numbers in square brackets are the numbers of sectors with non-zero outdegrees. Note that a few sectors were dropped when constructing $\tilde{\mathbf{W}}$ from $\mathbf{W}$, since their total demands become zero.
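The data-construction steps described in this section can be sketched as follows. The sketch is illustrative only: the function names are hypothetical, prices and outputs are assumed to be stored as sector-by-year arrays, and we assume that a sector whose thresholded row becomes empty is removed as both a row and a column (repeating until no empty rows remain), which is one possible way to implement the dropping of sectors mentioned in the notes to Table 3.1. The block also computes $\Delta_s p_{it} = (p_{it} - p_{i,t-s})/s$ and the output-growth volatility $\sigma_y$ defined in the next paragraph (a standard deviation with divisor $T-1$).

```python
import numpy as np


def robust_weights(W, eps_w=0.1):
    """Threshold W at eps_w, drop sectors with no remaining suppliers, row-standardize."""
    A = (np.asarray(W, dtype=float) >= eps_w).astype(float)  # 1 if w_ij >= eps_w, else 0
    keep = np.arange(A.shape[0])
    # Assumption: a dropped sector is removed as both a row and a column, and the
    # check is repeated until every remaining row has a positive sum.
    while True:
        ok = A[np.ix_(keep, keep)].sum(axis=1) > 0
        if ok.all():
            break
        keep = keep[ok]
    A = A[np.ix_(keep, keep)]
    return A / A.sum(axis=1, keepdims=True), keep


def avg_log_change(P, s):
    """Delta_s p_t = (p_t - p_{t-s}) / s for each sector (rows) and year (columns)."""
    p = np.log(np.asarray(P, dtype=float))
    return (p[:, s:] - p[:, :-s]) / s


def output_growth_volatility(Y):
    """sigma_y per sector: standard deviation (divisor T - 1) of log real output growth."""
    dy = np.diff(np.log(np.asarray(Y, dtype=float)), axis=1)
    return dy.std(axis=1, ddof=1)
```

Because the comparison is `>=`, an entry exactly equal to `eps_w` is kept, matching the "greater than or equal to" rule in the text.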
The price data are available at annual frequency from 1997 to 2014. Given the time range of the price data, we consider $\tilde{\mathbf{W}}$ constructed from the input-output tables for the years 2002 and 2007. For the explanatory variable, we are interested in the effect of the volatility of real output growth on price changes. Let $Y_{it}$ denote the real output of sector $i$ in year $t$, for $i = 1, 2, \ldots, n$ and $t = 1, 2, \ldots, T$. Real output growth is computed as $\Delta y_{it} = y_{it} - y_{i,t-1}$, where $y_{it} = \log(Y_{it})$. The volatility of real output growth for sector $i$, denoted by $\sigma_y$, is then computed as the standard deviation of $\Delta y_{it}$ over the entire sample period, namely,

$$\sigma_y = \left[\frac{\sum_{t=1}^{T} \left(\Delta y_{it} - \overline{\Delta y}_i\right)^2}{T-1}\right]^{1/2}, \quad \text{where } \overline{\Delta y}_i = T^{-1}\sum_{t=1}^{T} \Delta y_{it}.$$

⁴ The terms commodity and sector are used interchangeably throughout this paper.

Table 3.2: Estimation results of the cross-section model using the 2002 input-output table

Year | 1998-2000: (1) (2) | 2001-2003: (1) (2) | 2004-2006: (1) (2) | 2007-2009: (1) (2) | 2010-2012: (1) (2) | 2013-2014: (1) (2)
GMM estimates
ρ | 0.29 0.25 | 0.42 0.51 | 0.49 0.51 | 0.37 0.33 | 0.35 0.33 | 0.34 0.36
 | (0.14) (0.12) | (0.09) (0.08) | (0.08) (0.07) | (0.08) (0.08) | (0.07) (0.07) | (0.08) (0.08)
β | – -0.20 | – -0.12 | – -0.04 | – -0.10 | – -0.06 | – -0.09
 | – (0.04) | – (0.03) | – (0.05) | – (0.04) | – (0.03) | – (0.03)
Con. | 0.21 2.15 | 0.70 1.81 | 0.95 1.24 | 1.72 2.70 | 1.46 2.14 | 0.87 1.75
 | (0.23) (0.46) | (0.19) (0.34) | (0.30) (0.53) | (0.29) (0.46) | (0.23) (0.37) | (0.19) (0.33)
BMM estimates
ρ | 0.29 0.25 | 0.42 0.35 | 0.49 0.50 | 0.38 0.37 | 0.35 0.34 | 0.34 0.33
 | (0.14) (0.13) | (0.09) (0.09) | (0.08) (0.08) | (0.08) (0.07) | (0.07) (0.07) | (0.08) (0.08)
β | – -0.20 | – -0.14 | – -0.04 | – -0.10 | – -0.06 | – -0.09
 | – (0.04) | – (0.03) | – (0.05) | – (0.04) | – (0.03) | – (0.03)
Con. | 0.21 2.13 | 0.70 2.02 | 0.95 1.31 | 1.70 2.64 | 1.46 2.06 | 0.87 1.76
 | (0.23) (0.46) | (0.19) (0.34) | (0.30) (0.54) | (0.29) (0.46) | (0.23) (0.37) | (0.19) (0.33)

Notes: The model is given by (3.46). The spatial weights matrix is $\tilde{\mathbf{W}}_{2002}$, which is constructed using a threshold value of $\varepsilon_w = 0.1$.
The parameter ρ is the spatial coefficient, β is the coefficient of the volatility of real output growth, and Con. refers to the constant term. The GMM estimates are computed following (3.11) using the best IV and best quadratic matrix. The BMM estimates are computed by (3.25). Standard errors are in parentheses.

We begin by examining the degrees of dominance of the production networks, applying the extremum estimators developed by Pesaran and Yang (2016). Table 3.1 reports the estimated degrees of dominance of the top five pervasive sectors for the year 2002, for the year 2007, and for the balanced panel constructed by merging the data sets for these two years. The results show that the highest degree of dominance, $\hat{\delta}_{(1)}$, lies between 0.71 and 0.85.⁵ Tables 3.2 and 3.3 present the estimation results of the cross-section SAR model (3.46) using the US input-output tables for the years 2002 and 2007, respectively. The rolling window size is three years, and the results are reported at an interval of two years.

⁵ Although the estimated highest degree of dominance is greater than 0.5, our extensive Monte Carlo experiments suggest that the variance formula works well when $\delta_{(1)}$ is around 0.75.

Table 3.3: Estimation results of the cross-section model using the 2007 input-output table

Year | 1998-2000: (1) (2) | 2001-2003: (1) (2) | 2004-2006: (1) (2) | 2007-2009: (1) (2) | 2010-2012: (1) (2) | 2013-2014: (1) (2)
GMM estimates
ρ | 0.42 0.33 | 0.48 0.49 | 0.36 0.35 | 0.22 0.22 | 0.26 0.26 | 0.28 0.28
 | (0.10) (0.10) | (0.08) (0.07) | (0.08) (0.08) | (0.08) (0.07) | (0.07) (0.07) | (0.09) (0.09)
β | – -0.16 | – -0.11 | – -0.03 | – -0.10 | – -0.06 | – -0.07
 | – (0.03) | – (0.02) | – (0.04) | – (0.03) | – (0.03) | – (0.02)
Con. | 0.47 2.00 | 0.74 1.79 | 1.59 1.90 | 2.20 3.11 | 1.57 2.16 | 1.01 1.64
 | (0.20) (0.37) | (0.17) (0.27) | (0.31) (0.49) | (0.29) (0.42) | (0.20) (0.32) | (0.18) (0.30)
BMM estimates
ρ | 0.43 0.39 | 0.48 0.42 | 0.36 0.36 | 0.25 0.25 | 0.26 0.26 | 0.29 0.28
 | (0.10) (0.09) | (0.08) (0.08) | (0.08) (0.08) | (0.07) (0.07) | (0.07) (0.07) | (0.09) (0.09)
β | – -0.15 | – -0.12 | – -0.03 | – -0.10 | – -0.06 | – -0.07
 | – (0.03) | – (0.02) | – (0.04) | – (0.03) | – (0.03) | – (0.02)
Con. | 0.46 1.85 | 0.74 1.88 | 1.58 1.86 | 2.18 3.08 | 1.56 2.16 | 1.01 1.64
 | (0.20) (0.36) | (0.17) (0.28) | (0.31) (0.48) | (0.28) (0.42) | (0.20) (0.32) | (0.18) (0.30)

Notes: The estimates are based on the robust spatial weights matrix $\tilde{\mathbf{W}}_{2007}$, which is constructed using a threshold value of $\varepsilon_w = 0.1$. See also the notes to Table 3.2.

3.7 Concluding remarks

A crucial assumption in the spatial econometrics literature requires that the interaction matrix be uniformly bounded in both row and column sums in absolute value. This assumption excludes the existence of dominant units in the network and is too restrictive for many applications. The current paper relaxes this assumption and allows the column sums to increase with the number of cross-section units, $n$. It develops asymptotic theory for two estimation approaches: it first extends the GMM procedure of Lee (2007) and then proposes a new BMM estimator, which self-instruments the endogenous spatially lagged dependent variable and corrects the bias of the moment conditions. Both estimators are shown to be consistent and asymptotically normally distributed if the maximum absolute column sum of the interaction matrix does not increase too fast as $n$ grows. The theoretical findings are supported by extensive Monte Carlo evidence. This paper also provides an empirical application to the price network covering more than 300 industries in the US. It documents significant inflation spillovers between industries and also identifies a significant effect of real output growth volatility on sectoral inflation.
Table 3.4: Small sample properties of the GMM and BMM estimators of ρ for the experiments with Gaussian errors

δ \ n | GMM Bias(×100): 100 300 500 1,000 | GMM RMSE(×100): 100 300 500 1,000 | BMM Bias(×100): 100 300 500 1,000 | BMM RMSE(×100): 100 300 500 1,000
ρ = 0.2
0.00 | -2.05 -0.43 -0.31 -0.16 | 10.68 5.50 4.16 2.87 | -2.78 -0.65 -0.42 -0.22 | 10.68 5.49 4.16 2.87
0.25 | -2.03 -0.42 -0.31 -0.16 | 10.63 5.50 4.16 2.87 | -2.75 -0.64 -0.42 -0.22 | 10.63 5.49 4.16 2.87
0.50 | -1.98 -0.41 -0.30 -0.16 | 10.46 5.40 4.12 2.85 | -2.75 -0.64 -0.42 -0.22 | 10.43 5.40 4.12 2.85
0.75 | -1.67 -0.37 -0.27 -0.15 | 10.76 5.53 4.27 2.92 | -2.87 -0.67 -0.42 -0.22 | 10.68 5.50 4.25 2.92
0.95 | -1.63 -0.41 -0.30 -0.16 | 15.74 7.66 5.93 4.11 | -4.65 -1.21 -0.77 -0.40 | 14.85 7.40 5.76 4.04
1.00 | -2.34 -0.63 -0.64 -0.34 | 21.61 11.63 9.35 6.75 | -6.87 -2.16 -1.56 -0.81 | 19.58 10.64 8.59 6.21
ρ = 0.5
0.00 | -2.12 -0.49 -0.33 -0.17 | 9.22 4.53 3.41 2.34 | -3.06 -0.74 -0.46 -0.24 | 9.30 4.54 3.42 2.35
0.25 | -2.11 -0.49 -0.33 -0.17 | 9.19 4.52 3.41 2.34 | -3.04 -0.74 -0.46 -0.24 | 9.27 4.54 3.42 2.35
0.50 | -2.05 -0.48 -0.33 -0.18 | 9.00 4.45 3.39 2.33 | -3.04 -0.74 -0.46 -0.24 | 9.07 4.46 3.39 2.33
0.75 | -1.91 -0.49 -0.34 -0.19 | 9.72 4.85 3.75 2.60 | -3.45 -0.89 -0.54 -0.29 | 9.72 4.84 3.75 2.61
0.95 | -2.52 -0.82 -0.67 -0.43 | 17.68 8.61 6.76 4.65 | -7.20 -2.28 -1.49 -0.83 | 16.11 8.15 6.37 4.52
1.00 | -4.65 -1.35 -1.05 -0.54 | 26.56 14.32 10.99 7.31 | -11.28 -3.95 -2.60 -1.31 | 22.55 12.04 9.42 6.51
ρ = 0.75
0.00 | -1.95 -0.50 -0.32 -0.16 | 6.93 3.16 2.35 1.61 | -2.97 -0.74 -0.44 -0.22 | 7.12 3.19 2.36 1.61
0.25 | -1.96 -0.50 -0.32 -0.17 | 6.92 3.15 2.35 1.61 | -2.96 -0.74 -0.44 -0.22 | 7.11 3.18 2.36 1.61
0.50 | -1.88 -0.49 -0.32 -0.17 | 6.78 3.10 2.34 1.60 | -2.95 -0.74 -0.45 -0.23 | 6.95 3.13 2.35 1.61
0.75 | -1.91 -0.58 -0.38 -0.23 | 7.94 3.69 2.80 1.94 | -3.68 -1.01 -0.60 -0.33 | 8.00 3.69 2.80 1.94
0.95 | -3.36 -0.71 -0.57 -0.46 | 18.25 8.12 5.80 3.60 | -9.42 -2.89 -1.74 -0.92 | 16.18 7.04 5.12 3.41
1.00 | -8.68 -1.66 -0.41 -0.12 | 30.58 15.18 11.00 7.08 | -15.04 -5.28 -3.27 -1.62 | 23.06 11.09 8.23 5.50
ρ = 0.95
0.00 | -1.18 -0.32 -0.19 -0.09 | 3.52 1.13 0.79 0.52 | -1.83 -0.42 -0.24 -0.11 | 3.31 1.15 0.80 0.53
0.25 | -1.16 -0.31 -0.19 -0.09 | 3.38 1.13 0.79 0.52 | -1.83 -0.41 -0.24 -0.11 | 3.32 1.14 0.80 0.52
0.50 | -1.15 -0.30 -0.18 -0.09 | 4.13 1.11 0.79 0.52 | -1.79 -0.41 -0.24 -0.12 | 3.27 1.12 0.79 0.52
0.75 | -1.09 -0.32 -0.20 -0.13 | 4.19 1.49 1.04 0.66 | -2.42 -0.59 -0.34 -0.18 | 4.35 1.44 1.01 0.67
0.95 | -4.85 -0.70 -0.10 -0.06 | 16.23 6.47 2.86 1.60 | -13.35 -2.41 -1.21 -0.55 | 20.15 4.47 2.45 1.33
1.00 | -16.49 -10.32 -8.20 -5.43 | 39.27 30.87 28.57 23.65 | -34.96 -23.58 -20.44 -16.22 | 40.31 29.30 26.27 21.11

Notes: The data generating process (DGP) is given by (3.33) and (3.34), for $i = 1, 2, \ldots, n$, where the errors are generated as $IIDN(0,1)$. True parameter values are set to $\alpha = 1$, $\beta = 1$, $\lambda = 0.75$, and $\sigma_\varepsilon = 1$; $\sigma_\eta$ is computed by (3.36), and $R^2 = 0.75$. The first column of $\mathbf{W}_y$ is $\delta$-dominant, and the rest of the columns are non-dominant. $\mathbf{W}_x = \mathbf{W}_y$. The number of replications is 2,000. The BMM estimator is given by (3.25). The GMM estimator is obtained by using $\tilde{\mathbf{Z}}^{*} = (\tilde{\mathbf{G}}\mathbf{x}\tilde{\alpha}, \tilde{\mathbf{G}}\mathbf{x}\tilde{\beta}, \boldsymbol{\tau}_n, \mathbf{x})$ and $\tilde{\mathbf{B}}^{*} = \tilde{\mathbf{G}} - n^{-1}\mathrm{Tr}(\tilde{\mathbf{G}})\mathbf{I}_n$ in (3.12), where $\tilde{\mathbf{G}} = \mathbf{W}(\mathbf{I}_n - \tilde{\rho}\mathbf{W})^{-1}$, and $\tilde{\boldsymbol{\psi}} = (\tilde{\rho}, \tilde{\alpha}, \tilde{\boldsymbol{\beta}}')'$ denotes the initial GMM estimates defined in (3.11) and computed with $\mathbf{Z} = (\boldsymbol{\tau}_n, \mathbf{x}, \mathbf{W}\mathbf{x}, \mathbf{W}^2\mathbf{x})$, $\mathbf{B}_1 = \mathbf{W}$, $\mathbf{B}_2 = \mathbf{W}^2 - n^{-1}\mathrm{Tr}(\mathbf{W}^2)\mathbf{I}_n$, and $\boldsymbol{\Xi} = \mathbf{I}_n$. The 95% confidence interval for size 5% is [4.0%, 6.0%]. The power is calculated at $\rho - 0.1$, where $\rho$ denotes the true value.
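The size and power entries in Tables 3.4–3.7 are rejection frequencies across Monte Carlo replications. The sketch below is a minimal illustration, assuming a two-sided t-test at the 5% level built from each replication's estimate and standard error (the exact test statistic is not restated here, and all names are hypothetical). Size is the rejection rate when testing at the true value, and power is the rejection rate when testing at $\rho - 0.1$ (or at 0.8 for $\beta$). The helper `size_band` gives the usual binomial band $0.05 \pm 1.96\sqrt{0.05 \times 0.95 / R}$; with $R = 2{,}000$ replications it is about [4.0%, 6.0%], whereas [3.6%, 6.4%] corresponds to $R = 1{,}000$.

```python
import numpy as np


def size_band(R, level=0.05, z=1.96):
    """Approximate 95% band for an empirical rejection rate when the true size is `level`."""
    half = z * np.sqrt(level * (1.0 - level) / R)
    return level - half, level + half


def rejection_rate(estimates, std_errors, null_value, z=1.96):
    """Share of replications in which the two-sided t-test of H0: theta = null_value rejects."""
    t = (np.asarray(estimates) - null_value) / np.asarray(std_errors)
    return np.mean(np.abs(t) > z)
```

For example, with hypothetical arrays `rho_hat` and `se` holding the 2,000 replication estimates and standard errors, `rejection_rate(rho_hat, se, rho_true)` would give one size entry and `rejection_rate(rho_hat, se, rho_true - 0.1)` the corresponding power entry.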
Table 3.4: (Continued)

δ \ n | GMM Size(×100): 100 300 500 1,000 | GMM Power(×100): 100 300 500 1,000 | BMM Size(×100): 100 300 500 1,000 | BMM Power(×100): 100 300 500 1,000
ρ = 0.2
0.00 | 6.00 5.25 4.90 4.30 | 16.60 43.30 63.50 91.10 | 6.10 5.35 4.70 4.40 | 15.00 42.00 62.65 90.75
0.25 | 5.90 5.35 4.95 4.25 | 16.95 43.45 63.40 90.90 | 6.05 5.45 4.60 4.25 | 14.80 42.10 62.40 90.60
0.50 | 5.70 5.05 4.60 4.05 | 16.80 44.05 64.00 91.40 | 5.65 5.15 4.60 4.10 | 14.95 42.40 63.10 90.95
0.75 | 6.45 5.15 5.70 4.30 | 17.80 43.85 64.45 90.85 | 5.85 4.90 5.25 4.70 | 14.35 41.80 63.55 90.55
0.95 | 10.65 9.05 8.05 6.80 | 20.20 33.60 48.25 72.20 | 7.80 7.75 7.15 5.90 | 12.00 29.45 44.55 70.00
1.00 | 18.05 14.55 15.35 14.75 | 22.20 28.20 33.95 49.25 | 11.00 10.35 10.95 10.85 | 12.90 21.10 28.35 46.55
ρ = 0.5
0.00 | 6.35 5.65 4.75 4.55 | 23.60 58.45 79.80 98.00 | 6.35 5.55 4.80 4.55 | 20.25 56.55 79.35 97.95
0.25 | 6.45 5.55 4.75 4.55 | 23.75 58.70 79.50 98.00 | 6.25 5.65 4.75 4.55 | 20.05 56.70 78.95 97.85
0.50 | 6.25 5.20 4.75 4.35 | 22.90 59.10 80.30 98.05 | 5.65 5.10 4.65 4.55 | 19.40 56.95 79.05 97.90
0.75 | 7.45 5.95 5.60 5.10 | 23.70 54.80 76.00 95.40 | 5.80 5.75 5.55 5.05 | 18.20 51.85 74.70 95.00
0.95 | 17.85 13.15 13.20 10.35 | 26.75 36.60 46.75 67.05 | 9.95 9.80 10.30 9.60 | 13.85 30.00 43.05 64.85
1.00 | 31.05 27.15 26.00 23.80 | 32.55 35.55 39.45 55.80 | 16.50 17.90 17.75 18.25 | 16.95 26.70 34.45 53.45
ρ = 0.75
0.00 | 7.70 5.65 4.90 5.15 | 38.55 84.50 97.05 99.95 | 6.65 5.55 4.85 4.85 | 32.20 83.55 96.75 99.95
0.25 | 7.30 5.80 4.70 5.15 | 39.05 84.50 97.10 99.95 | 6.35 5.75 4.90 4.85 | 32.65 83.85 96.85 99.95
0.50 | 7.65 5.45 4.95 5.15 | 39.05 84.20 97.35 100.00 | 5.90 5.20 5.05 4.95 | 32.90 83.15 96.70 100.00
0.75 | 11.10 7.10 6.15 5.65 | 38.00 77.25 91.45 99.70 | 6.75 6.10 5.75 5.20 | 28.55 74.00 90.75 99.70
0.95 | 34.05 22.40 19.35 15.55 | 40.35 54.40 67.60 87.30 | 15.35 14.00 13.30 12.75 | 18.60 42.80 62.80 86.50
1.00 | 54.80 48.55 46.20 41.50 | 48.95 47.75 56.15 74.05 | 26.65 31.30 32.05 32.45 | 22.35 37.30 51.50 72.10
ρ = 0.95
0.00 | 15.35 6.60 5.50 4.75 | 87.70 100.00 100.00 100.00 | 8.15 6.35 4.85 4.85 | 84.70 100.00 100.00 100.00
0.25 | 15.00 6.60 5.45 4.75 | 88.25 100.00 100.00 100.00 | 7.90 6.40 4.90 4.85 | 84.85 99.95 100.00 100.00
0.50 | 15.40 6.25 5.65 4.95 | 88.70 99.95 100.00 100.00 | 7.55 5.50 5.05 4.90 | 87.10 99.95 100.00 100.00
0.75 | 24.75 11.85 8.70 7.20 | 83.95 99.80 100.00 100.00 | 10.25 8.15 6.85 6.60 | 78.65 99.85 100.00 100.00
0.95 | 67.55 48.25 41.00 32.40 | 68.40 90.95 97.35 99.95 | 45.00 29.10 23.45 21.10 | 41.40 86.85 96.75 99.90
1.00 | 91.90 93.90 94.75 94.85 | 76.05 77.45 78.55 85.90 | 89.30 92.85 95.25 96.20 | 59.05 66.00 70.45 78.15

Table 3.5: Small sample properties of the GMM and BMM estimators of β for the experiments with Gaussian errors

δ \ n | GMM Bias(×100): 100 300 500 1,000 | GMM RMSE(×100): 100 300 500 1,000 | BMM Bias(×100): 100 300 500 1,000 | BMM RMSE(×100): 100 300 500 1,000
ρ = 0.2
0.00 | -0.08 -0.08 -0.03 0.02 | 9.00 5.07 3.83 2.68 | 0.95 0.27 0.16 0.12 | 8.84 5.05 3.82 2.68
0.25 | -0.07 -0.08 -0.03 0.02 | 8.99 5.07 3.83 2.68 | 0.95 0.27 0.16 0.12 | 8.84 5.05 3.82 2.68
0.50 | -0.07 -0.08 -0.03 0.02 | 9.03 5.08 3.84 2.68 | 0.97 0.27 0.16 0.12 | 8.88 5.05 3.83 2.68
0.75 | -0.10 -0.10 -0.06 0.01 | 9.12 5.14 3.90 2.74 | 0.96 0.26 0.15 0.12 | 8.98 5.10 3.89 2.74
0.95 | -0.33 -0.16 -0.10 0.00 | 9.66 5.77 4.52 3.43 | 0.77 0.27 0.17 0.16 | 9.45 5.69 4.48 3.42
1.00 | -0.51 -0.25 -0.18 -0.05 | 10.31 6.37 5.23 4.25 | 0.60 0.19 0.11 0.13 | 10.07 6.25 5.17 4.22
ρ = 0.5
0.00 | 0.08 -0.03 0.00 0.05 | 11.50 6.44 4.87 3.40 | 1.70 0.51 0.30 0.20 | 11.29 6.40 4.86 3.40
0.25 | 0.10 -0.03 0.00 0.05 | 11.49 6.44 4.87 3.40 | 1.70 0.51 0.30 0.20 | 11.28 6.40 4.86 3.40
0.50 | 0.09 -0.03 0.00 0.05 | 11.59 6.47 4.90 3.42 | 1.73 0.51 0.31 0.21 | 11.37 6.43 4.88 3.41
0.75 | 0.00 -0.08 -0.05 0.05 | 12.21 7.02 5.42 3.91 | 1.84 0.58 0.33 0.24 | 12.00 6.96 5.40 3.90
0.95 | -0.52 -0.32 -0.23 0.06 | 15.64 10.94 9.16 7.75 | 1.90 0.88 0.55 0.50 | 15.23 10.68 9.00 7.69
1.00 | -1.03 -0.68 -0.60 -0.18 | 18.43 13.97 12.64 11.50 | 1.57 0.60 0.26 0.32 | 17.94 13.65 12.50 11.47
ρ = 0.75
0.00 | 0.39 0.10 0.06 0.11 | 15.25 8.42 6.38 4.43 | 3.01 0.94 0.55 0.35 | 15.02 8.36 6.36 4.43
0.25 | 0.42 0.10 0.07 0.11 | 15.23 8.41 6.38 4.43 | 3.02 0.95 0.55 0.35 | 15.00 8.36 6.36 4.43
0.50 | 0.37 0.09 0.07 0.11 | 15.42 8.52 6.45 4.49 | 3.06 0.96 0.56 0.36 | 15.19 8.47 6.43 4.49
0.75 | 0.16 0.02 -0.01 0.14 | 17.46 10.33 8.08 5.97 | 3.50 1.24 0.71 0.49 | 17.24 10.23 8.04 5.98
0.95 | -0.37 -0.47 -0.57 0.09 | 27.18 20.66 17.87 15.80 | 4.06 2.02 1.10 0.96 | 26.93 20.23 17.62 15.68
1.00 | -0.99 -1.43 -1.61 -0.60 | 33.80 28.51 27.01 25.45 | 3.16 1.19 0.39 0.60 | 34.15 28.35 26.89 25.49
ρ = 0.95
0.00 | 1.35 0.51 0.26 0.22 | 20.55 10.84 8.19 5.64 | 5.43 1.75 1.00 0.58 | 20.27 10.77 8.16 5.64
0.25 | 1.31 0.52 0.26 0.22 | 20.36 10.84 8.20 5.63 | 5.44 1.75 1.00 0.58 | 20.23 10.77 8.17 5.64
0.50 | 1.27 0.46 0.24 0.22 | 21.13 11.13 8.37 5.78 | 5.42 1.76 1.02 0.60 | 20.48 11.03 8.34 5.78
0.75 | 0.65 0.20 -0.01 0.25 | 24.78 15.35 12.16 9.23 | 6.35 2.39 1.34 0.90 | 25.12 15.15 12.10 9.27
0.95 | 1.56 -0.27 -0.96 -0.22 | 42.02 33.30 29.97 27.12 | 9.47 4.22 2.11 1.70 | 44.70 33.44 29.91 27.07
1.00 | 1.66 0.93 0.16 0.90 | 50.91 45.59 44.13 42.30 | 11.61 10.23 9.02 9.55 | 57.42 50.44 48.63 47.06

Notes: The true parameter value is $\beta = 1$ and the power is calculated at 0.8. See also the notes to Table 3.4.
Table 3.5: (Continued)

δ \ n | GMM Size(×100): 100 300 500 1,000 | GMM Power(×100): 100 300 500 1,000 | BMM Size(×100): 100 300 500 1,000 | BMM Power(×100): 100 300 500 1,000
ρ = 0.2
0.00 | 6.40 5.40 4.15 4.65 | 64.60 97.60 99.90 100.00 | 5.70 5.30 4.25 4.70 | 69.10 97.95 99.90 100.00
0.25 | 6.35 5.55 4.30 4.80 | 64.50 97.65 99.90 100.00 | 5.90 5.40 4.30 4.70 | 68.90 98.00 99.90 100.00
0.50 | 6.20 5.45 4.55 4.95 | 64.35 97.75 99.90 100.00 | 5.90 5.35 4.55 4.90 | 69.10 98.00 99.95 100.00
0.75 | 6.65 5.50 4.85 4.85 | 64.05 97.20 99.90 100.00 | 6.20 5.15 4.55 5.00 | 68.60 97.90 99.90 100.00
0.95 | 7.15 6.20 5.25 5.10 | 59.80 93.65 99.20 100.00 | 6.60 5.55 4.90 4.65 | 65.00 94.70 99.25 100.00
1.00 | 7.45 6.25 5.20 5.35 | 54.35 89.00 97.35 99.70 | 7.15 5.60 4.70 4.85 | 59.60 90.80 97.85 99.75
ρ = 0.5
0.00 | 6.90 5.65 4.65 4.85 | 47.05 88.10 98.75 100.00 | 6.15 5.60 4.60 4.80 | 52.25 90.00 98.85 100.00
0.25 | 6.85 5.60 4.65 4.80 | 46.95 88.10 98.75 100.00 | 6.10 5.55 4.55 4.75 | 52.40 89.90 98.85 100.00
0.50 | 6.55 5.55 4.85 4.90 | 45.80 87.15 98.55 100.00 | 6.05 5.60 4.70 4.95 | 51.50 89.50 98.85 100.00
0.75 | 6.65 5.35 4.80 4.90 | 42.65 82.60 96.60 100.00 | 6.10 5.35 4.90 4.70 | 48.80 85.15 97.00 100.00
0.95 | 7.80 6.50 5.45 5.35 | 29.45 50.15 60.55 74.40 | 6.90 5.70 5.50 4.85 | 34.20 53.70 64.15 76.40
1.00 | 7.85 5.80 5.20 4.90 | 23.50 33.15 35.25 40.30 | 7.15 5.60 4.90 4.50 | 26.95 36.30 37.90 41.90
ρ = 0.75
0.00 | 7.30 5.75 4.85 4.75 | 30.70 69.30 88.85 99.40 | 6.50 5.70 4.65 4.60 | 36.10 73.20 90.40 99.50
0.25 | 7.30 5.90 4.95 4.70 | 30.85 69.20 88.75 99.45 | 6.30 5.70 4.75 4.55 | 36.45 73.35 90.30 99.50
0.50 | 7.35 5.60 5.10 4.70 | 30.05 68.25 87.70 99.40 | 6.25 6.05 4.70 4.55 | 35.90 71.80 89.10 99.50
0.75 | 7.65 5.35 5.30 5.05 | 25.15 53.15 71.45 92.90 | 6.50 5.60 5.45 5.10 | 30.80 57.80 74.80 93.55
0.95 | 8.30 6.90 5.50 5.10 | 15.75 19.95 21.30 25.30 | 7.25 5.70 4.95 4.60 | 18.55 21.70 23.30 27.15
1.00 | 6.95 5.30 4.85 4.35 | 11.00 11.20 11.55 11.35 | 7.15 5.45 5.05 4.15 | 13.45 13.05 12.35 12.25
ρ = 0.95
0.00 | 8.60 5.90 5.45 4.30 | 23.20 51.75 70.70 95.65 | 7.20 6.05 5.15 4.50 | 28.05 56.15 73.60 96.30
0.25 | 8.50 5.90 5.35 4.40 | 23.35 51.90 71.00 95.65 | 7.35 5.90 5.05 4.55 | 28.25 56.20 74.00 96.35
0.50 | 8.65 6.05 5.25 4.55 | 22.65 49.90 68.80 94.55 | 7.15 5.85 5.20 4.55 | 27.60 54.25 72.40 95.40
0.75 | 8.65 6.55 5.75 4.65 | 16.95 30.15 39.75 57.55 | 7.55 5.70 5.25 4.90 | 22.65 34.55 43.50 60.05
0.95 | 7.30 6.20 5.55 5.00 | 10.40 10.90 11.55 11.45 | 9.10 6.80 5.55 4.80 | 16.25 14.00 13.50 12.60
1.00 | 6.80 5.15 5.10 4.70 | 8.25 8.35 8.45 7.30 | 9.45 8.10 7.70 6.90 | 14.05 13.00 12.50 12.10

Table 3.6: Small sample properties of the GMM and BMM estimators of ρ for the experiments with non-Gaussian errors

δ \ n | GMM Bias(×100): 100 300 500 1,000 | GMM RMSE(×100): 100 300 500 1,000 | BMM Bias(×100): 100 300 500 1,000 | BMM RMSE(×100): 100 300 500 1,000
ρ = 0.2
0.00 | -1.99 -0.64 -0.37 -0.15 | 10.41 5.49 4.13 2.91 | -2.68 -0.83 -0.49 -0.21 | 10.48 5.50 4.13 2.91
0.25 | -1.96 -0.64 -0.37 -0.15 | 10.38 5.48 4.12 2.91 | -2.65 -0.83 -0.48 -0.21 | 10.43 5.49 4.13 2.91
0.50 | -2.01 -0.65 -0.38 -0.16 | 10.34 5.49 4.13 2.91 | -2.73 -0.85 -0.50 -0.22 | 10.39 5.50 4.13 2.91
0.75 | -2.00 -0.59 -0.37 -0.14 | 10.93 5.72 4.29 3.00 | -2.99 -0.86 -0.53 -0.21 | 10.96 5.70 4.28 2.99
0.95 | -2.35 -0.73 -0.43 -0.21 | 15.99 7.81 5.94 4.23 | -4.85 -1.38 -0.83 -0.42 | 15.49 7.56 5.75 4.16
1.00 | -2.74 -1.06 -0.57 -0.35 | 21.39 11.88 9.36 6.85 | -6.47 -2.15 -1.32 -0.75 | 19.47 10.77 8.49 6.26
ρ = 0.5
0.00 | -2.05 -0.66 -0.38 -0.16 | 8.91 4.52 3.37 2.36 | -2.90 -0.88 -0.51 -0.23 | 9.05 4.54 3.38 2.37
0.25 | -2.03 -0.66 -0.38 -0.16 | 8.87 4.51 3.37 2.36 | -2.88 -0.88 -0.51 -0.23 | 9.00 4.53 3.38 2.37
0.50 | -2.06 -0.67 -0.40 -0.17 | 8.86 4.53 3.38 2.37 | -2.95 -0.89 -0.52 -0.24 | 8.99 4.55 3.39 2.38
0.75 | -2.13 -0.68 -0.44 -0.19 | 9.84 5.00 3.76 2.67 | -3.41 -1.01 -0.64 -0.28 | 9.92 5.01 3.78 2.68
0.95 | -3.43 -1.17 -0.77 -0.44 | 17.90 8.76 6.53 4.68 | -7.19 -2.32 -1.48 -0.79 | 16.80 8.25 6.29 4.59
1.00 | -5.08 -1.69 -0.93 -0.50 | 26.23 14.26 10.64 7.19 | -10.50 -3.62 -2.26 -1.17 | 22.22 11.84 9.17 6.47
ρ = 0.75
0.00 | -1.92 -0.62 -0.36 -0.16 | 6.78 3.15 2.31 1.60 | -2.81 -0.82 -0.47 -0.22 | 6.96 3.19 2.33 1.61
0.25 | -1.92 -0.62 -0.36 -0.16 | 6.73 3.15 2.31 1.61 | -2.80 -0.82 -0.47 -0.22 | 6.91 3.18 2.33 1.61
0.50 | -1.90 -0.62 -0.37 -0.17 | 6.68 3.17 2.33 1.62 | -2.83 -0.83 -0.48 -0.23 | 6.86 3.20 2.34 1.63
0.75 | -2.03 -0.72 -0.48 -0.22 | 8.06 3.77 2.82 1.98 | -3.53 -1.07 -0.69 -0.32 | 8.11 3.81 2.86 2.00
0.95 | -3.73 -1.05 -0.71 -0.38 | 19.11 7.97 5.48 3.57 | -9.21 -2.85 -1.77 -0.88 | 16.54 7.10 5.14 3.49
1.00 | -9.16 -2.08 -0.48 -0.09 | 30.93 15.77 10.59 7.00 | -14.29 -4.83 -2.97 -1.49 | 22.86 10.63 7.97 5.44
ρ = 0.95
0.00 | -1.14 -0.37 -0.21 -0.09 | 3.37 1.15 0.78 0.52 | -1.76 -0.45 -0.25 -0.12 | 3.33 1.16 0.79 0.52
0.25 | -1.16 -0.37 -0.21 -0.09 | 3.36 1.15 0.78 0.52 | -1.75 -0.45 -0.25 -0.12 | 3.27 1.16 0.80 0.52
0.50 | -1.17 -0.36 -0.21 -0.10 | 3.65 1.15 0.79 0.53 | -1.72 -0.44 -0.26 -0.12 | 3.21 1.16 0.80 0.54
0.75 | -1.28 -0.41 -0.28 -0.13 | 5.15 1.48 1.05 0.67 | -2.31 -0.62 -0.39 -0.18 | 4.15 1.49 1.07 0.69
0.95 | -5.25 -0.53 -0.17 -0.02 | 17.18 5.63 2.94 1.67 | -12.90 -2.37 -1.30 -0.58 | 19.71 4.43 2.60 1.44
1.00 | -17.32 -11.16 -8.82 -4.98 | 39.46 33.08 28.98 21.15 | -33.70 -22.71 -19.49 -15.47 | 39.71 28.78 25.17 20.39

Notes: The DGP is given by (3.33) and (3.34), for $i = 1, 2, \ldots, n$, where the errors are generated as $IID\,(\chi^2(2) - 2)/2$. The power is calculated at $\rho - 0.1$, where $\rho$ denotes the true value. See also the notes to Table 3.4.
Table 3.6: (Continued)

δ \ n | GMM Size(×100): 100 300 500 1,000 | GMM Power(×100): 100 300 500 1,000 | BMM Size(×100): 100 300 500 1,000 | BMM Power(×100): 100 300 500 1,000
ρ = 0.2
0.00 | 5.80 4.50 4.80 5.30 | 16.35 42.45 64.60 91.50 | 5.65 4.45 4.70 5.05 | 14.90 41.35 63.65 91.30
0.25 | 5.60 4.45 4.80 5.40 | 15.95 42.40 64.55 91.70 | 5.55 4.30 4.75 5.25 | 14.70 41.65 63.80 91.25
0.50 | 5.90 5.00 4.75 5.20 | 16.20 42.55 63.95 91.75 | 5.60 4.90 4.65 4.85 | 14.45 41.15 62.70 91.55
0.75 | 6.65 6.05 5.30 5.30 | 18.30 43.15 63.25 90.55 | 6.20 6.00 5.15 5.20 | 14.85 41.60 62.25 90.10
0.95 | 11.20 8.75 8.50 6.95 | 19.75 34.05 47.25 70.70 | 8.55 6.95 7.15 6.15 | 13.80 29.85 43.80 68.80
1.00 | 18.15 14.50 15.10 14.60 | 21.10 27.85 33.95 49.10 | 11.50 9.55 10.65 11.00 | 13.35 22.45 28.75 46.15
ρ = 0.5
0.00 | 5.80 4.55 5.10 5.15 | 22.35 57.75 80.20 97.65 | 5.95 4.50 5.05 5.05 | 19.45 55.95 79.40 97.50
0.25 | 5.80 4.70 5.05 5.10 | 22.35 57.70 80.20 97.50 | 5.65 4.55 5.00 5.15 | 19.85 55.90 79.40 97.35
0.50 | 6.25 5.30 5.10 5.20 | 22.65 56.95 79.95 97.30 | 6.25 5.05 5.25 5.10 | 19.45 54.60 79.05 97.25
0.75 | 7.65 6.05 5.65 5.45 | 23.20 53.55 73.45 94.70 | 6.85 5.85 5.45 5.50 | 18.80 51.25 72.25 94.40
0.95 | 18.70 13.60 11.75 10.95 | 26.65 36.80 46.35 67.25 | 11.35 9.30 9.55 9.65 | 15.65 30.35 41.25 65.40
1.00 | 31.15 26.25 25.05 22.60 | 31.75 34.70 40.00 55.25 | 17.20 15.75 18.30 17.80 | 17.90 26.60 34.25 53.55
ρ = 0.75
0.00 | 7.65 5.05 4.80 5.15 | 39.25 84.85 97.00 100.00 | 6.60 4.95 4.55 4.95 | 33.70 83.55 96.90 100.00
0.25 | 7.45 4.95 4.90 5.40 | 39.40 84.95 96.95 100.00 | 6.60 4.95 4.70 5.15 | 33.95 83.85 96.85 100.00
0.50 | 7.45 5.30 5.00 5.40 | 40.40 84.55 97.25 100.00 | 6.45 5.35 4.95 5.20 | 34.50 83.45 97.05 100.00
0.75 | 10.45 6.65 6.50 6.35 | 39.25 76.15 91.75 99.35 | 7.30 6.00 6.15 6.10 | 31.40 73.50 90.45 99.20
0.95 | 34.90 24.20 18.65 15.50 | 42.35 53.40 66.95 88.25 | 17.35 14.10 14.00 13.40 | 20.60 43.95 60.65 86.15
1.00 | 55.10 46.85 43.25 41.00 | 49.00 46.35 54.35 73.65 | 28.20 30.05 31.40 32.40 | 23.25 37.55 49.85 72.30
ρ = 0.95
0.00 | 14.90 6.40 5.35 5.80 | 88.55 100.00 100.00 100.00 | 8.05 5.95 5.35 5.45 | 86.65 100.00 100.00 100.00
0.25 | 14.30 6.40 5.25 5.75 | 88.35 100.00 100.00 100.00 | 7.85 5.85 5.30 5.55 | 86.85 100.00 100.00 100.00
0.50 | 15.10 7.25 5.70 5.70 | 89.65 99.95 100.00 100.00 | 8.15 6.50 5.35 5.95 | 88.10 100.00 100.00 100.00
0.75 | 23.65 10.70 9.20 6.90 | 84.45 99.75 100.00 100.00 | 11.20 7.95 8.00 6.75 | 79.55 99.75 100.00 100.00
0.95 | 70.45 49.50 41.85 33.50 | 68.55 91.05 97.25 100.00 | 43.20 28.95 26.15 22.70 | 41.60 87.10 96.95 99.90
1.00 | 91.80 94.50 93.90 93.80 | 75.70 77.85 79.25 84.65 | 86.95 92.50 94.20 96.05 | 55.60 62.65 68.20 75.35

Table 3.7: Small sample properties of the GMM and BMM estimators of β for the experiments with non-Gaussian errors

δ \ n | GMM Bias(×100): 100 300 500 1,000 | GMM RMSE(×100): 100 300 500 1,000 | BMM Bias(×100): 100 300 500 1,000 | BMM RMSE(×100): 100 300 500 1,000
ρ = 0.2
0.00 | 0.36 0.14 0.17 0.04 | 9.27 5.06 3.82 2.76 | 1.35 0.46 0.35 0.13 | 9.23 5.05 3.82 2.76
0.25 | 0.36 0.14 0.16 0.04 | 9.25 5.06 3.82 2.76 | 1.34 0.45 0.35 0.13 | 9.22 5.04 3.82 2.75
0.50 | 0.39 0.14 0.17 0.04 | 9.29 5.07 3.83 2.76 | 1.39 0.46 0.36 0.13 | 9.26 5.06 3.83 2.76
0.75 | 0.34 0.14 0.17 0.04 | 9.42 5.16 3.91 2.84 | 1.34 0.46 0.37 0.13 | 9.39 5.14 3.90 2.83
0.95 | 0.25 0.09 0.16 0.03 | 9.95 5.80 4.52 3.58 | 1.24 0.46 0.40 0.17 | 9.87 5.75 4.49 3.55
1.00 | 0.11 0.03 0.12 -0.01 | 10.67 6.40 5.25 4.43 | 1.10 0.41 0.39 0.16 | 10.52 6.33 5.20 4.38
ρ = 0.5
0.00 | 0.64 0.25 0.25 0.06 | 11.79 6.41 4.83 3.48 | 2.17 0.73 0.54 0.21 | 11.79 6.40 4.83 3.48
0.25 | 0.64 0.25 0.25 0.07 | 11.77 6.41 4.83 3.48 | 2.16 0.73 0.54 0.21 | 11.77 6.39 4.83 3.48
0.50 | 0.67 0.24 0.26 0.07 | 11.87 6.45 4.86 3.50 | 2.23 0.74 0.54 0.22 | 11.86 6.44 4.87 3.50
0.75 | 0.59 0.25 0.29 0.08 | 12.58 7.05 5.40 4.04 | 2.27 0.82 0.64 0.26 | 12.58 7.04 5.41 4.03
0.95 | 0.42 0.13 0.31 0.07 | 15.99 11.01 9.15 8.05 | 2.56 1.14 1.02 0.49 | 15.86 10.93 9.13 8.00
1.00 | 0.13 -0.11 0.19 -0.06 | 18.95 14.07 12.69 12.02 | 2.41 0.99 0.99 0.40 | 18.64 13.91 12.66 11.98
ρ = 0.75
0.00 | 1.16 0.45 0.40 0.12 | 15.60 8.38 6.29 4.52 | 3.59 1.21 0.85 0.36 | 15.70 8.39 6.31 4.52
0.25 | 1.17 0.45 0.40 0.13 | 15.57 8.37 6.29 4.52 | 3.59 1.21 0.86 0.36 | 15.67 8.38 6.31 4.52
0.50 | 1.18 0.44 0.42 0.13 | 15.76 8.49 6.37 4.59 | 3.67 1.22 0.87 0.37 | 15.85 8.49 6.40 4.59
0.75 | 1.07 0.48 0.54 0.18 | 17.98 10.33 8.06 6.18 | 4.03 1.52 1.18 0.53 | 18.09 10.36 8.12 6.19
0.95 | 1.06 0.07 0.58 0.12 | 28.49 20.96 18.03 16.38 | 5.11 2.39 2.11 1.01 | 27.94 20.98 18.10 16.40
1.00 | 0.96 -0.29 0.25 -0.25 | 35.11 28.99 27.12 26.69 | 4.72 2.02 2.07 0.82 | 35.28 29.04 27.34 26.72
ρ = 0.95
0.00 | 2.38 0.96 0.75 0.28 | 21.13 10.88 8.13 5.82 | 6.22 2.06 1.43 0.63 | 21.22 10.91 8.19 5.83
0.25 | 2.43 0.96 0.76 0.29 | 21.04 10.87 8.13 5.81 | 6.22 2.07 1.43 0.64 | 21.16 10.90 8.18 5.82
0.50 | 2.57 0.93 0.79 0.29 | 22.11 11.13 8.33 5.99 | 6.26 2.08 1.48 0.65 | 21.36 11.16 8.39 5.99
0.75 | 2.24 0.95 0.97 0.37 | 26.15 15.38 12.26 9.68 | 7.16 2.75 2.13 1.01 | 26.27 15.40 12.42 9.72
0.95 | 3.72 0.54 1.02 0.18 | 43.72 34.69 30.56 28.54 | 11.15 4.68 4.02 2.04 | 45.94 34.78 30.85 28.50
1.00 | 4.77 2.76 3.33 1.19 | 53.26 47.19 45.05 45.46 | 13.48 10.92 11.39 9.57 | 58.27 51.98 50.25 49.70

Notes: The true parameter value is $\beta = 1$ and the power is calculated at 0.8. See also the notes to Table 3.6.
Table 3.7: (Continued)
Columns: δ | GMM Size(×100) | GMM Power(×100) | BMM Size(×100) | BMM Power(×100); within each group, n = 100, 300, 500, 1,000.
ρ = 0.2
0.00 | 6.15 5.85 5.30 5.50 | 63.55 97.90 100.00 100.00 | 6.10 5.55 5.25 5.25 | 67.35 98.15 100.00 100.00
0.25 | 6.00 5.80 5.20 5.45 | 63.50 97.90 100.00 100.00 | 6.10 5.70 5.25 5.25 | 67.75 98.20 100.00 100.00
0.50 | 5.80 5.90 5.15 5.35 | 62.95 97.70 100.00 100.00 | 6.10 5.75 5.25 5.25 | 67.60 98.05 100.00 100.00
0.75 | 6.05 6.25 5.20 5.55 | 62.20 97.65 100.00 100.00 | 6.60 6.10 5.15 5.50 | 66.45 98.05 100.00 100.00
0.95 | 6.65 6.80 5.60 6.25 | 58.65 94.55 99.65 100.00 | 6.40 6.35 5.90 6.10 | 62.75 95.20 99.70 100.00
1.00 | 7.00 7.05 5.70 6.20 | 54.10 89.95 97.10 99.75 | 6.50 6.30 5.80 6.05 | 57.50 91.25 97.50 99.80
ρ = 0.5
0.00 | 6.35 6.10 5.10 5.50 | 47.05 89.90 98.65 100.00 | 6.40 5.60 5.50 5.65 | 50.95 91.30 98.90 100.00
0.25 | 6.20 6.00 5.15 5.50 | 47.05 89.95 98.65 100.00 | 6.35 5.55 5.45 5.65 | 50.95 91.40 98.95 100.00
0.50 | 6.15 6.20 5.15 5.35 | 46.40 89.70 98.65 100.00 | 6.00 5.70 5.30 5.40 | 50.50 91.05 98.90 100.00
0.75 | 6.35 6.25 5.35 5.60 | 42.75 84.05 96.75 100.00 | 6.70 5.90 5.60 5.60 | 47.30 86.30 97.15 100.00
0.95 | 7.20 7.20 6.20 6.10 | 30.85 49.70 62.35 73.40 | 6.65 6.65 5.90 6.25 | 34.85 53.25 64.90 75.35
1.00 | 7.35 7.10 6.10 5.85 | 24.30 32.90 37.05 39.70 | 6.50 6.45 5.80 5.95 | 27.40 35.35 39.00 41.20
ρ = 0.75
0.00 | 6.65 6.10 5.65 5.70 | 31.10 70.20 90.30 99.70 | 7.00 5.95 5.55 5.95 | 36.10 73.55 91.95 99.85
0.25 | 6.60 6.20 5.70 5.70 | 30.95 70.05 90.40 99.70 | 6.70 6.00 5.50 5.80 | 36.00 73.40 91.95 99.90
0.50 | 6.45 6.05 5.35 5.45 | 30.30 68.90 89.55 99.60 | 6.65 5.80 5.30 5.75 | 35.50 72.05 91.15 99.65
0.75 | 7.35 6.30 5.85 5.75 | 26.20 53.35 72.75 91.50 | 7.15 5.95 5.95 5.70 | 30.55 57.75 75.65 92.60
0.95 | 7.60 6.80 6.60 6.15 | 15.70 19.30 23.05 25.15 | 7.20 6.45 6.10 5.90 | 18.55 21.55 24.45 27.45
1.00 | 6.50 6.15 5.30 5.65 | 12.15 11.95 11.65 12.45 | 6.65 6.25 5.50 5.85 | 14.25 13.55 13.55 13.20
ρ = 0.95
0.00 | 7.80 5.70 5.60 6.00 | 24.05 52.50 73.15 94.60 | 7.65 5.75 5.25 5.90 | 28.25 55.90 76.00 95.35
0.25 | 7.65 5.75 5.55 5.90 | 23.95 52.20 72.65 94.75 | 7.65 5.65 5.30 5.80 | 28.35 55.75 76.00 95.35
0.50 | 7.65 6.20 5.75 5.95 | 23.90 50.30 70.60 93.25 | 7.55 5.80 5.25 5.90 | 28.10 53.95 73.70 94.60
0.75 | 8.55 6.10 5.70 6.20 | 19.25 31.05 42.40 58.90 | 8.20 5.50 6.00 6.05 | 23.10 35.10 45.70 61.25
0.95 | 7.60 7.25 6.25 6.95 | 11.65 12.40 12.65 13.30 | 8.85 6.40 6.90 6.50 | 15.90 14.20 14.55 13.80
1.00 | 6.45 6.00 5.85 5.80 | 9.45 8.70 8.80 8.75 | 9.10 8.45 8.25 8.05 | 14.15 13.70 13.95 13.25

(Plot panels: $\delta = 0$, $0.5$, $0.75$, $1$; curves: BMM, GMM.)
Figure 3.1: Empirical power functions of testing $\rho_0 = 0.5$ for different values of $\delta$, in the case of Gaussian errors and $n = 100$

(Plot panels: $\delta = 0$, $0.5$, $0.75$, $1$; curves: BMM, GMM.)
Figure 3.2: Empirical power functions of testing $\rho_0 = 0.75$ for different values of $\delta$, in the case of Gaussian errors and $n = 100$

(Plot panels: $\delta = 0$, $0.5$, $0.75$, $1$; curves: BMM, GMM.)
Figure 3.3: Empirical power functions of testing $\rho_0 = 0.5$ for different values of $\delta$, in the case of non-Gaussian errors and $n = 100$

(Plot panels: $\delta = 0$, $0.5$, $0.75$, $1$; curves: BMM, GMM.)
Figure 3.4: Empirical power functions of testing $\rho_0 = 0.75$ for different values of $\delta$, in the case of non-Gaussian errors and $n = 100$

References
Acemoglu, D., U. Akcigit, and W. Kerr (2016a). “Networks and the macroeconomy: An empirical exploration”. In: NBER Macroeconomics Annual 2015. Ed. by Eichenbaum, M. and Parker, J. Vol. 30. University of Chicago Press. Chap. 4, pp. 276–335.
Acemoglu, D., D. Autor, D. Dorn, G. H. Hanson, and B. Price (2016b). “Import competition and the great US employment sag of the 2000s”. In: Journal of Labor Economics 34, S141–S198.
Acemoglu, D., V. M. Carvalho, A. Ozdaglar, and A. Tahbaz-Salehi (2012). “The network origins of aggregate fluctuations”. In: Econometrica 80, pp. 1977–2016.
Acemoglu, D., A. Ozdaglar, and A. Tahbaz-Salehi (2016c). “Networks, shocks, and systemic risk”. In: Oxford Handbook of the Economics of Networks. Ed. by Bramoulle, Y., Galeotti, A., and Rogers, B. Oxford University Press. Chap. 21, pp. 569–609.
Anselin, L. (1988). Spatial econometrics: Methods and models. Vol. 4. Springer Science & Business Media.
Aquaro, M., N. Bailey, and M. H. Pesaran (2015). “Quasi maximum likelihood estimation of spatial models with heterogeneous coefficients”. USC-INET Research Paper, No. 15-17.
Arnold, B. C., N. Balakrishnan, and H. N. Nagaraja (1992). A first course in order statistics. Vol. 54. SIAM-Society for Industrial and Applied Mathematics.
Bai, J. (2009). “Panel data models with interactive fixed effects”. In: Econometrica 77, pp. 1229–1279.
Bai, J. and K. Li (2012). “Statistical analysis of factor models of high dimension”. In: Annals of Statistics 40, pp. 436–465.
— (2014). “Spatial panel data models with common shocks”. MPRA Paper 52786.
— (2015). “Dynamic spatial panel data models with common shocks”. Manuscript. Columbia University.
Bai, J. and S. Ng (2002). “Determining the number of factors in approximate factor models”. In: Econometrica 70.1, pp. 191–221.
— (2007). “Determining the number of primitive shocks in factor models”.
In: Journal of Business & Economic Statistics 25, pp. 52–60.
Bailey, N., S. Holly, and M. H. Pesaran (2016a). “A two-stage approach to spatio-temporal analysis with strong and weak cross-sectional dependence”. In: Journal of Applied Econometrics 31.1, pp. 249–280.
Bailey, N., G. Kapetanios, and M. H. Pesaran (2016b). “Exponent of cross-sectional dependence: Estimation and inference”. In: Journal of Applied Econometrics 31, pp. 929–960.
Bailey, N., M. H. Pesaran, and L. Smith (2014). “A multiple testing approach to the regularisation of sample correlation matrices”. CESifo Working Paper No. 4834.
Beirlant, J., Y. Goegebeur, J. Segers, and J. Teugels (2006). Statistics of extremes: Theory and applications. John Wiley & Sons.
Bonar, D. D., M. J. Khoury Jr, and M. Khoury (2006). Real infinite series. Mathematical Association of America.
Brady, R. R. (2011). “Measuring the diffusion of housing prices across space and over time”. In: Journal of Applied Econometrics 26.2, pp. 213–231.
— (2014). “The spatial diffusion of regional housing prices across US states”. In: Regional Science and Urban Economics 46, pp. 150–166.
Carvalho, V. M. (2014). “From micro to macro via production networks”. In: Journal of Economic Perspectives 28, pp. 23–47.
Carvalho, V. M., M. Nirei, Y. Saito, and A. Tahbaz-Salehi (2016). “Supply chain disruptions: Evidence from the great east Japan earthquake”. Available at http://www.columbia.edu/ at2761/JapanEQ.pdf.
Case, A. (1991). “Spatial patterns in household demand”. In: Econometrica 59, pp. 953–965.
— (1992). “Neighborhood influence and technological change”. In: Regional Science and Urban Economics 22.3, pp. 491–508.
Chudik, A. and M. H. Pesaran (2013). “Econometric analysis of high dimensional VARs featuring a dominant unit”. In: Econometric Reviews 32, pp. 592–649.
— (2015a). “Common correlated effects estimation of heterogeneous dynamic panel data models with weakly exogenous regressors”. In: Journal of Econometrics 188.2, pp. 393–420.
— (2015b). “Large panel data models with cross-sectional dependence: A survey”. In: Oxford Handbook of Panel Data. Ed. by Baltagi, B. H. Oxford University Press. Chap. 1. ISBN: 978-0-19-994004-2.
— (2017). “A bias-corrected method of moments approach for estimation of dynamic panels”. CESifo Working Paper No. 6688.
Chudik, A., M. H. Pesaran, and E. Tosetti (2011). “Weak and strong cross-section dependence and estimation of large panels”. In: The Econometrics Journal 14.1, pp. C45–C90.
Clauset, A., C. R. Shalizi, and M. E. Newman (2009). “Power-law distributions in empirical data”. In: SIAM Review 51, pp. 661–703.
Cohen, J. P., Y. M. Ioannides, and W. W. Thanapisitikul (2016). “Spatial effects and house price dynamics in the USA”. In: Journal of Housing Economics 31, pp. 1–13.
Eeckhout, J. (2004). “Gibrat’s law for (all) cities”. In: American Economic Review 94, pp. 1429–1451.
Elhorst, J. P. (2014). Spatial econometrics: From cross-sectional data to spatial panels. Springer.
Foerster, A. T., P.-D. G. Sarte, and M. W. Watson (2011). “Sectoral versus aggregate shocks: A structural factor analysis of industrial production”. In: Journal of Political Economy 119, pp. 1–38.
Gabaix, X. (2011). “The granular origins of aggregate fluctuations”. In: Econometrica 79, pp. 733–772.
Gabaix, X. and R. Ibragimov (2011). “Rank − 1/2: A simple way to improve the OLS estimation of tail exponents”. In: Journal of Business & Economic Statistics 29, pp. 24–39.
Hill, B. M. et al. (1975). “A simple general approach to inference about the tail of a distribution”. In: Annals of Statistics 3, pp. 1163–1174.
Holly, S., M. H. Pesaran, and T. Yamagata (2010). “A spatio-temporal model of house prices in the USA”. In: Journal of Econometrics 158.1, pp. 160–173.
— (2011). “The spatial and temporal diffusion of house prices in the UK”. In: Journal of Urban Economics 69.1, pp. 2–23.
Horvath, M. (1998). “Cyclicality and sectoral linkages: Aggregate fluctuations from independent sectoral shocks”. In: Review of Economic Dynamics 1, pp. 781–808.
— (2000). “Sectoral shocks and aggregate fluctuations”. In: Journal of Monetary Economics 45, pp. 69–106.
Kapetanios, G. (2010). “A testing procedure for determining the number of factors in approximate factor models with large datasets”. In: Journal of Business & Economic Statistics 28, pp. 397–409.
Kapetanios, G., M. H. Pesaran, and T. Yamagata (2011). “Panels with non-stationary multifactor error structures”. In: Journal of Econometrics 160.2, pp. 326–348.
Kapoor, M., H. H. Kelejian, and I. R. Prucha (2007). “Panel data models with spatially correlated error components”. In: Journal of Econometrics 140, pp. 97–130.
Kelejian, H. H. and I. R. Prucha (1998). “A generalized spatial two-stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances”. In: Journal of Real Estate Finance and Economics 17.1, pp. 99–121.
— (1999). “A generalized moments estimator for the autoregressive parameter in a spatial model”. In: International Economic Review 40.2, pp. 509–533.
— (2001). “On the asymptotic distribution of the Moran I test statistic with applications”. In: Journal of Econometrics 104.2, pp. 219–257.
— (2010). “Specification and estimation of spatial autoregressive models with autoregressive and heteroskedastic disturbances”. In: Journal of Econometrics 157.1, pp. 53–67.
Lee, L.-f. (2001). “Generalized method of moments estimation of spatial autoregressive processes”. Manuscript. Department of Economics, The Ohio State University.
— (2003). “Best spatial two-stage least squares estimators for a spatial autoregressive model with autoregressive disturbances”. In: Econometric Reviews 22.4, pp. 307–335.
Lee, L.-f. (2004). “Asymptotic distributions of quasi-maximum likelihood estimators for spatial autoregressive models”. In: Econometrica 72.6, pp. 1899–1925.
— (2007). “GMM and 2SLS estimation of mixed regressive, spatial autoregressive models”. In: Journal of Econometrics 137.2, pp. 489–514.
Lee, L.-f. and J. Yu (2010a). “Estimation of spatial autoregressive panel data models with fixed effects”. In: Journal of Econometrics 154.2, pp. 165–185.
— (2010b). “Some recent developments in spatial panel data models”. In: Regional Science and Urban Economics 40, pp. 255–271.
— (2014). “Efficient GMM estimation of spatial dynamic panel data models with fixed effects”. In: Journal of Econometrics 180.2, pp. 174–197.
— (2016). “Identification of spatial Durbin panel models”. In: Journal of Applied Econometrics 31.1, pp. 133–162.
LeSage, J. P. and R. K. Pace (2009). Introduction to spatial econometrics. CRC Press, Taylor & Francis Group, Boca Raton.
Lin, X. and L.-f. Lee (2010). “GMM estimation of spatial autoregressive models with unknown heteroskedasticity”. In: Journal of Econometrics 157.1, pp. 34–52.
Long, J. B. and C. I. Plosser (1983). “Real business cycles”. In: Journal of Political Economy 91, pp. 39–69.
Lu, L. (2017). “Simultaneous spatial panel data models with common shocks”. Federal Reserve Bank of Boston Working Paper RPA 17-03.
Luo, Z. Q., C. Liu, and D. Picken (2007). “Housing price diffusion pattern of Australia’s state capital cities”. In: International Journal of Strategic Property Management 11.4, pp. 227–242.
Meen, G. (1999). “Regional house prices and the ripple effect: A new interpretation”. In: Housing Studies 14.6, pp. 733–753.
Moon, H. R. and M. Weidner (2015). “Dynamic linear panel regression models with interactive fixed effects”. In: Econometric Theory, pp. 1–38.
Newman, M. E. (2005). “Power laws, Pareto distributions and Zipf’s law”. In: Contemporary Physics 46, pp. 323–351.
Pesaran, M. H. (2006). “Estimation and inference in large heterogeneous panels with a multifactor error structure”. In: Econometrica 74.4, pp. 967–1012.
— (2015a). “Testing weak cross-sectional dependence in large panels”.
In: Econometric Reviews 34.6-10, pp. 1089–1117.
— (2015b). Time series and panel data econometrics. Oxford University Press.
Pesaran, M. H. and A. Chudik (2014). “Aggregation in large dynamic panels”. In: Journal of Econometrics 178, pp. 273–285.
Pesaran, M. H. and E. Tosetti (2011). “Large panels with common factors and spatial correlation”. In: Journal of Econometrics 161.2, pp. 182–202.
Pesaran, M. H. and C. F. Yang (2016). “Econometric analysis of production networks with dominant units”. USC-INET Research Paper No. 16-25.
— (2018). “Estimation and inference in spatial models with dominant units”. Manuscript. Department of Economics & USC Dornsife INET, University of Southern California.
Pollakowski, H. O. and T. S. Ray (1997). “Housing price diffusion patterns at different aggregation levels: An examination of housing market efficiency”. In: Journal of Housing Research, pp. 107–124.
Rothenberg, T. J. (1971). “Identification in parametric models”. In: Econometrica, pp. 577–591.
Sarafidis, V. and T. Wansbeek (2012). “Cross-sectional dependence in panel data analysis”. In: Econometric Reviews 31.5, pp. 483–531.
Shi, S., M. Young, and B. Hargreaves (2009). “The ripple effect of local house price movements in New Zealand”. In: Journal of Property Research 26.1, pp. 1–24.
Shi, W. and L.-f. Lee (2017). “Spatial dynamic panel data models with interactive fixed effects”. In: Journal of Econometrics 197.2, pp. 323–347.
Siavash, S. S. (2016). “Dominant sectors in the US: A factor model analysis of sectoral industrial production”. Manuscript. Goethe University Frankfurt.
Stewart, W. J. (2009). Probability, Markov chains, queues, and simulation: The mathematical basis of performance modeling. Princeton University Press.
Stock, J. H. and M. W. Watson (2011). “Dynamic factor models”. In: Oxford Handbook of Economic Forecasting 1, pp. 35–59.
Yang, C. F. (2017). “Common factors and spatial dependence: An application to US house prices”. Manuscript. Department of Economics, University of Southern California.
Yu, J., R. De Jong, and L.-f. Lee (2008). “Quasi-maximum likelihood estimators for spatial dynamic panel data with fixed effects when both n and T are large”. In: Journal of Econometrics 146.1, pp. 118–134.

Appendix A
Appendix to Chapter 1

A.1 Lemmas

The proofs of the following lemmas can be found in Appendix A of Yang (2017).

Lemma A.1. Under Assumptions 1.4 and 1.6, the matrix $\Delta^{-1}$ has bounded row and column norms, where the $(i,j)$th subblock of $\Delta^{-1}$, for $i,j = 1,2,\dots,N$, is given by (1.4).

Lemma A.2. Under Assumptions 1.2, 1.4 and 1.6, for all $t$,
(a) $E(\bar\varepsilon_{.t}) = 0$, $\mathrm{Var}(\bar\varepsilon_{.t}) = O(N^{-1})$, and hence $\bar\varepsilon_{.t}\xrightarrow{q.m.}0$ as $N\to\infty$,
(b) $E\|\bar\varepsilon_{.t}\|^2 = O(N^{-1})$ and $E\|\bar\varepsilon_{.t}\| = O(N^{-1/2})$,
where $\bar\varepsilon_{.t} = \Theta_a\varepsilon_{.t}$, $\Theta_a = N^{-1}\tau_N'\otimes I_{k+1}$, and $\varepsilon_{.t} = \Delta^{-1}u_{.t}$.

Lemma A.3. Under Assumptions 1.1, 1.2, 1.3, 1.4 and 1.6, for all $i$,
(a) $\bar\varepsilon'\bar\varepsilon/T = O_p(1/N)$,
(b) $F'\bar\varepsilon/T = O_p(1/\sqrt{NT})$,
(c) $V_{i.}'F/T = O_p(1/\sqrt{T})$,
(d) $e_{i.}'\bar\varepsilon/T = O_p(1/N) + O_p(1/\sqrt{NT})$ and $V_{i.}'\bar\varepsilon/T = O_p(1/N) + O_p(1/\sqrt{NT})$,
(e) $X_{i.}'\bar\varepsilon/T = O_p(1/N) + O_p(1/\sqrt{NT})$,
where $\bar\varepsilon = (\bar\varepsilon_{.1},\bar\varepsilon_{.2},\dots,\bar\varepsilon_{.T})'$ is of dimension $T\times(k+1)$, with $\bar\varepsilon_{.t} = \Theta_a\varepsilon_{.t}$, $F = (f_1,f_2,\dots,f_T)'$, $V_{i.} = (v_{i1},v_{i2},\dots,v_{iT})'$, $e_{i.} = (e_{i1},e_{i2},\dots,e_{iT})'$, and $X_{i.} = (x_{i1},x_{i2},\dots,x_{iT})'$.

Lemma A.4. Let $\Pi = F\bar C$. Under Assumptions 1.1, 1.2, 1.3, 1.4 and 1.6,
(a) $\Pi'\Pi/T = O_p(1)$, (b) $\Pi'\bar\varepsilon/T = O_p(1/\sqrt{NT})$, (c) $\bar Z'\bar Z/T = O_p(1)$, (d) $\bar Z'F/T = O_p(1)$, (e) $\bar Z'V_{i.}/T = O_p(1/N) + O_p(1/\sqrt{T})$, (f) $\bar Z'X_{i.}/T = O_p(1)$, (g) $\Pi'X_{i.}/T = O_p(1)$, (h) $\bar Z'e_{i.}/T = O_p(1/N) + O_p(1/\sqrt{T})$, (i) $\bar Z'\bar\varepsilon/T = O_p(1/N) + O_p(1/\sqrt{NT})$.

Lemma A.5. Under Assumptions 1.1–1.6, for any $i$ and $j$,
(a) $X_{i.}'\bar MF/T = O_p(1/N) + O_p(1/\sqrt{NT})$,
(b) $X_{i.}'\bar MX_{j.}/T = X_{i.}'M_fX_{j.}/T + O_p(1/N) + O_p(1/\sqrt{NT})$,
(c) $X_{i.}'\bar Me_{j.}/T = X_{i.}'M_fe_{j.}/T + O_p(1/N) + O_p(1/\sqrt{NT})$,
(d) $e_{i.}'\bar Me_{j.}/T = e_{i.}'M_fe_{j.}/T + O_p(1/(N\sqrt{T})) + O_p(1/\sqrt{NT}) + O_p(1/N^2)$,
(e) $e_{i.}'\bar MF/T = O_p(1/N) + O_p(1/\sqrt{NT})$,
(f) $F'\bar MF/T = O_p(1/N)$,
(g) $X_{i.}'\bar M\bar\varepsilon/T = X_{i.}'M_f\bar\varepsilon/T + O_p(1/N)$.

Lemma A.6. Under Assumptions 1.1–1.7,
(a) $\frac{1}{NT}Q'M^b(I_T\otimes\Gamma)f = O_p(1/N) + O_p(1/\sqrt{NT})$,
(b) $\frac{1}{NT}Q'M^b(I_T\otimes B)e = O_p(1/N) + O_p(1/\sqrt{NT})$,
(c) $\frac{1}{NT}Q'M^b(I_T\otimes B)X = \frac{1}{NT}Q'M^b_f(I_T\otimes B)X + O_p(1/N) + O_p(1/\sqrt{NT})$,
where $B = (b_{ij})$ is any $N\times N$ nonstochastic matrix with bounded row and column norms.

Lemma A.7. Under Assumptions 1.1–1.6, for any $N\times N$ nonstochastic matrix $B = (b_{ij})$ with bounded row and column norms,
(a) $\frac{1}{NT}e'(\bar M\otimes B)e - \frac{1}{N}\sum_{i=1}^N b_{ii}\sigma_i^2 = o_p(1)$,
(b) $\frac{1}{NT}f'(\bar M\otimes\Gamma'B)e = O_p(1/N) + O_p(1/\sqrt{NT})$,
(c) $\frac{1}{NT}f'(\bar M\otimes\Gamma'B\Gamma)f = O_p(1/N)$.

Lemma A.8. Under Assumption 1.2, for any two $N\times N$ nonstochastic matrices $B$ and $D$ with bounded row and column norms and satisfying $\mathrm{diag}(B) = \mathrm{diag}(D) = 0$,
(a) $E[e'(I_T\otimes B)e] = 0$,
(b) $E\{[e'(I_T\otimes B)e]^2\} = T\,\mathrm{tr}[(B\odot B^s)\Sigma_{eT}] = T\sum_{i=1}^N\sum_{j=1}^N b_{ji}(b_{ij}+b_{ji})\varsigma_{eT,ij}$,
(c) $E[e'(I_T\otimes B)e\,e'(I_T\otimes D)e] = T\,\mathrm{tr}[(B\odot D^s)\Sigma_{eT}] = T\sum_{i=1}^N\sum_{j=1}^N b_{ji}(d_{ij}+d_{ji})\varsigma_{eT,ij}$,
where $B^s = B + B'$, $D^s$ is defined similarly, and $\Sigma_{eT} = (\varsigma_{eT,ij})$ is an $N\times N$ matrix whose $(i,j)$th element is $\varsigma_{eT,ij} = T^{-1}\mathrm{tr}(\Omega_{e,i}\Omega_{e,j})$.

Lemma A.9. Consider the linear-quadratic form $h = e'(I_T\otimes B)e + c'e$, where $e$ is an $NT\times1$ vector of disturbances following the data generating process specified in Assumption 1.2, $B$ is an $N\times N$ nonstochastic matrix with bounded row and column norms satisfying $\mathrm{diag}(B) = 0$, and $c$ is an $NT\times1$ nonstochastic vector such that $\sup_{N,T}(NT)^{-1}\sum_{i=1}^N\sum_{t=1}^T|c_{it}|^{2+\delta} < \infty$ for some $\delta > 0$. Then the variance of $h$ is given by
$$\sigma_h^2 = \sum_{i=1}^N\sum_{j=1}^N b_{ji}(b_{ij}+b_{ji})\,\mathrm{tr}(\Omega_{e,i}\Omega_{e,j}) + \sum_{i=1}^N c_{i.}'\Omega_{e,i}c_{i.}.$$
If $(NT)^{-1}\sigma_h^2$ is bounded away from zero, then $h/\sigma_h\xrightarrow{d}N(0,1)$ as $N\to\infty$ and $T/N\to0$.

Lemma A.10. Let $A = (a_{ij})$ be an $N\times N$ matrix. Then
$$\frac{\mathrm{tr}(A^2 + AA')}{N} - \frac{2[\mathrm{tr}(A)]^2}{N^2}\ge 0, \tag{A.1}$$
for all $N$, including $N\to\infty$.
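Lemmas A.8 and A.10 are exact finite-sample statements, so they can be checked numerically. The Python sketch below assumes NumPy, Gaussian errors, and the special case $\Omega_{e,i} = \sigma_i^2 I_T$ (so that $\varsigma_{eT,ij} = \sigma_i^2\sigma_j^2$); these are illustrative choices rather than the chapter's assumptions, and the function names are hypothetical. It evaluates the left-hand side of (A.1) and compares the second-moment formula in Lemma A.8(b) with a Monte Carlo estimate, stacking $e$ period by period as implied by $I_T\otimes B$.

```python
import numpy as np


def lemma_a10_lhs(A):
    """Left-hand side of (A.1): tr(A^2 + AA')/N - 2 [tr(A)]^2 / N^2."""
    N = A.shape[0]
    return np.trace(A @ A + A @ A.T) / N - 2.0 * np.trace(A) ** 2 / N**2


def lemma_a8_second_moment(B, sigma2, T):
    """T * sum_i sum_j b_ji (b_ij + b_ji) varsigma_ij, with varsigma_ij = sigma_i^2 sigma_j^2
    (the special case Omega_{e,i} = sigma_i^2 I_T, assumed here for illustration)."""
    Bs = B + B.T
    # Elementwise: (B')_ij * (B^s)_ij * sigma_i^2 * sigma_j^2 = b_ji (b_ij + b_ji) sigma_i^2 sigma_j^2
    return T * np.sum(B.T * Bs * np.outer(sigma2, sigma2))


def simulated_second_moment(B, sigma2, T, reps, rng):
    """Monte Carlo estimate of E{[e'(I_T kron B) e]^2} with independent Gaussian e_it."""
    N = B.shape[0]
    e = rng.standard_normal((reps, T, N)) * np.sqrt(sigma2)  # e[r, t, i] has variance sigma_i^2
    q = np.sum((e @ B) * e, axis=(1, 2))  # sum_t e_t' B e_t, one value per replication
    return np.mean(q**2)


rng = np.random.default_rng(0)
N, T = 6, 3
B = rng.uniform(-1.0, 1.0, (N, N))
np.fill_diagonal(B, 0.0)  # Lemma A.8 requires diag(B) = 0
sigma2 = rng.uniform(0.5, 2.0, N)

print("LHS of (A.1), random A:", lemma_a10_lhs(rng.standard_normal((N, N))))
print("Lemma A.8(b) formula:", lemma_a8_second_moment(B, sigma2, T))
print("Monte Carlo estimate:", simulated_second_moment(B, sigma2, T, 100_000, rng))
```

For $A = I_N$ the left-hand side of (A.1) equals $2 - 2 = 0$, so the bound in Lemma A.10 can hold with equality.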
A.2 Proofs of main theorems

Proof of Theorem 1.1
For ease of notation, in this proof we omit the subscript “0” and use $\gamma_i$, $\Gamma$, etc., to denote the true parameters. The key to the proof is to establish the distribution of $(NT)^{-1/2}Q'M^b[(I_T\otimes\Gamma)f + e]$. By Lemma A.6, we only need to derive the distribution of $(NT)^{-1/2}\sum_{i=1}^N X_{i.}'\bar M(F\gamma_i + e_{i.})$; the distribution of $(NT)^{-1/2}\sum_{i=1}^N\sum_{l=1}^N w^s_{il}X_{i.}'\bar M(F\gamma_i + e_{i.})$, for $s = 1,2,\dots$, then follows readily.

Let us first consider $(NT)^{-1/2}\sum_{i=1}^N X_{i.}'\bar MF\gamma_i$. Under Assumption 1.3, $\gamma_i = \gamma + \eta_i$. Noting that $N^{-1}\sum_{i=1}^N X_{i.}'\bar MF\gamma = \bar X'\bar MF\gamma = 0$, we have
$$\frac{1}{\sqrt{NT}}\sum_{i=1}^N X_{i.}'\bar MF\gamma_i = \frac{1}{\sqrt{NT}}\sum_{i=1}^N X_{i.}'\bar MF\eta_i. \tag{A.2}$$
It is shown in the proof of Lemma A.5 that
$$X_{i.}'\bar MF = -A_i'(\bar C\bar C')^{-1}\bar C\,\bar\varepsilon'\bar M\bar\varepsilon\,\bar C'(\bar C\bar C')^{-1} - V_{i.}'\bar M\bar\varepsilon\,\bar C'(\bar C\bar C')^{-1}. \tag{A.3}$$
Substituting (A.3) into (A.2) yields
$$\frac{1}{\sqrt{NT}}\sum_{i=1}^N X_{i.}'\bar MF\gamma_i = \frac{1}{\sqrt{NT}}\sum_{i=1}^N\left[-A_i'(\bar C\bar C')^{-1}\bar C\,\bar\varepsilon'\bar M\bar\varepsilon\,\bar C'(\bar C\bar C')^{-1} - V_{i.}'\bar M\bar\varepsilon\,\bar C'(\bar C\bar C')^{-1}\right]\eta_i.$$
Since
$$\frac{\bar\varepsilon'\bar M\bar\varepsilon}{T} = \frac{\bar\varepsilon'\bar\varepsilon}{T} - \left(\frac{\bar\varepsilon'\bar Z}{T}\right)\left(\frac{\bar Z'\bar Z}{T}\right)^{-1}\left(\frac{\bar Z'\bar\varepsilon}{T}\right) = O_p\left(\frac{1}{N}\right), \tag{A.4}$$
and
$$\frac{V_{i.}'\bar M\bar\varepsilon}{T} = \frac{V_{i.}'\bar\varepsilon}{T} - \left(\frac{V_{i.}'\bar Z}{T}\right)\left(\frac{\bar Z'\bar Z}{T}\right)^{-1}\left(\frac{\bar Z'\bar\varepsilon}{T}\right) = O_p\left(\frac{1}{N}\right) + O_p\left(\frac{1}{\sqrt{NT}}\right), \tag{A.5}$$
and the norms of $\bar C'(\bar C\bar C')^{-1}$ are bounded, we get
$$\frac{1}{\sqrt{NT}}\sum_{i=1}^N X_{i.}'\bar MF\eta_i = -\frac{1}{\sqrt{NT}}\sum_{i=1}^N V_{i.}'\bar M\bar\varepsilon\,\bar C'(\bar C\bar C')^{-1}\eta_i + O_p\left(\sqrt{\frac{T}{N}}\right).$$
Further using (A.5), and noting that by Lemma A.4 its probability order is dominated by the first term on the right-hand side, we obtain
$$\frac{1}{\sqrt{NT}}\sum_{i=1}^N X_{i.}'\bar MF\eta_i = -\frac{1}{N}\sum_{i=1}^N\frac{\sqrt{N}V_{i.}'\bar\varepsilon}{\sqrt{T}}\,\bar C'(\bar C\bar C')^{-1}\eta_i + O_p\left(\sqrt{\frac{T}{N}}\right).$$
It is readily seen that $\bar C'(\bar C\bar C')^{-1} - \bar C_{-i}'(\bar C_{-i}\bar C_{-i}')^{-1} = O_p(N^{-1})$, where $\bar C_{-i}$ is constructed in the same way as $\bar C$ but excluding $\Phi_i$. By a weak law of large numbers for martingale difference triangular arrays we can then establish that
$$\frac{1}{N}\sum_{i=1}^N\frac{\sqrt{N}V_{i.}'\bar\varepsilon}{\sqrt{T}}\,\bar C_{-i}'(\bar C_{-i}\bar C_{-i}')^{-1}\eta_i\xrightarrow{p}0,$$
as $N\to\infty$ and $T/N\to0$, since the $\eta_i$ are i.i.d. with zero mean and are independent of all the other stochastic quantities in the model, and $E\|\sqrt{N}V_{i.}'\bar\varepsilon/\sqrt{T}\,\bar C_{-i}'(\bar C_{-i}\bar C_{-i}')^{-1}\eta_i\|^2 < \infty$. Hence
$$\frac{1}{\sqrt{NT}}\sum_{i=1}^N X_{i.}'\bar MF\gamma_i\xrightarrow{p}0, \quad\text{as } N\to\infty \text{ and } T/N\to0.$$

We next turn to the distribution of $(NT)^{-1/2}\sum_{i=1}^N X_{i.}'\bar Me_{i.}$. Let $\Pi = F\bar C$. It can be shown that
$$\frac{1}{\sqrt{NT}}\sum_{i=1}^N X_{i.}'\bar Me_{i.} = \frac{1}{\sqrt{NT}}\sum_{i=1}^N X_{i.}'M_fe_{i.} + \frac{1}{N}\sum_{i=1}^N\frac{X_{i.}'\Pi}{T}\left(\frac{\Pi'\Pi}{T}\right)^{-1}\left(\frac{\sqrt{N}\bar\varepsilon'e_{i.}}{\sqrt{T}}\right) + O_p\left(\sqrt{\frac{T}{N}}\right). \tag{A.6}$$
The first term on the right-hand side of (A.6) satisfies
$$\frac{1}{\sqrt{N}}\sum_{i=1}^N\frac{X_{i.}'M_fe_{i.}}{\sqrt{T}}\xrightarrow{d}N(0,\Omega_{XMe}),$$
where $\Omega_{XMe} = \lim_{N\to\infty}N^{-1}\sum_{i=1}^N S_{iXMe}$ and $S_{iXMe} = \mathrm{plim}_{T\to\infty}T^{-1}X_{i.}'M_fE(e_{i.}e_{i.}')M_fX_{i.}$, because
$$\frac{X_{i.}'M_fe_{i.}}{\sqrt{T}} = \frac{V_{i.}'e_{i.}}{\sqrt{T}} - \frac{1}{\sqrt{T}}\,\frac{V_{i.}'F}{\sqrt{T}}\left(\frac{F'F}{T}\right)^{-1}\frac{F'e_{i.}}{\sqrt{T}} = \frac{V_{i.}'e_{i.}}{\sqrt{T}} + O_p\left(\frac{1}{\sqrt{T}}\right),$$
and $T^{-1/2}V_{i.}'e_{i.} = O_p(1)$ under Assumption 1.2. For the second term on the right-hand side of (A.6), we have
$$\frac{1}{N}\sum_{i=1}^N\frac{X_{i.}'\Pi}{T}\left(\frac{\Pi'\Pi}{T}\right)^{-1}\frac{\sqrt{N}\bar\varepsilon'e_{i.}}{\sqrt{T}} = \frac{1}{N}\sum_{i=1}^N\frac{X_{i.}'\Pi}{T}\left(\frac{\Pi'\Pi}{T}\right)^{-1}\frac{\sqrt{N}\bar\varepsilon_{-i}'e_{i.}}{\sqrt{T}} + O_p\left(\sqrt{\frac{T}{N}}\right),$$
where we used $T^{-1}\bar\varepsilon'e_{i.} - T^{-1}\bar\varepsilon_{-i}'e_{i.} = O_p(N^{-1})$. Applying a weak law of large numbers for a martingale difference triangular array with finite second moments leads to
$$\frac{1}{N}\sum_{i=1}^N\frac{X_{i.}'\Pi}{T}\left(\frac{\Pi'\Pi}{T}\right)^{-1}\left(\frac{\sqrt{N}\bar\varepsilon_{-i}'e_{i.}}{\sqrt{T}}\right)\xrightarrow{p}0, \quad\text{as } N\to\infty.$$
Thus, as $(N,T)\xrightarrow{j}\infty$ and $T/N\to0$,
$$\frac{1}{N}\sum_{i=1}^N\frac{X_{i.}'\Pi}{T}\left(\frac{\Pi'\Pi}{T}\right)^{-1}\left(\frac{\sqrt{N}\bar\varepsilon'e_{i.}}{\sqrt{T}}\right)\xrightarrow{p}0,$$
and it follows that
$$\frac{1}{\sqrt{NT}}\sum_{i=1}^N X_{i.}'\bar Me_{i.}\xrightarrow{d}N(0,\Omega_{XMe}).$$
As a result, as $(N,T)\xrightarrow{j}\infty$ and $T/N\to0$, we have $\sqrt{NT}(\hat\delta_{2sls} - \delta_0)\xrightarrow{d}N(0,\Sigma_{2sls})$, where $\Sigma_{2sls}$ is given by (1.26).

Proof of Theorem 1.2
Since
$$\sqrt{NT}(\hat\delta_{b2sls} - \delta_0) = \left(\frac{1}{NT}\hat Q^{*\prime}L\right)^{-1}\frac{1}{\sqrt{NT}}\hat Q^{*\prime}[(I_T\otimes\Gamma_0)f + e],$$
to establish the asymptotic distribution it suffices to show that
$$\mathrm{plim}_{N,T\to\infty}\frac{1}{NT}\hat Q^{*\prime}L = \mathrm{plim}_{N,T\to\infty}\frac{1}{NT}L_0'M^b_fL_0, \tag{A.7}$$
and
$$\frac{1}{\sqrt{NT}}\hat Q^{*\prime}[(I_T\otimes\Gamma_0)f + e]\xrightarrow{d}N(0,\Omega_{LMe}). \tag{A.8}$$
Substituting $Y = (I_T\otimes S_0^{-1})[X\beta_0 + (I_T\otimes\Gamma_0)f + e]$ into the definition of $L$ yields $L = L_0 + [(I_T\otimes\Gamma_0)f + e, 0]$, and it follows that
$$\frac{1}{NT}\hat Q^{*\prime}L = \frac{1}{NT}\left[(I_T\otimes G(\hat\rho))X\hat\beta, X\right]'M^bL_0 + \frac{1}{NT}\left[(I_T\otimes G(\hat\rho))X\hat\beta, X\right]'M^b[(I_T\otimes\Gamma_0)f + e, 0].$$
Using the first-order Taylor expansion of $G(\hat\rho)$, we have $W(I_N - \hat\rho W)^{-1} = G_0 + W(I_N - \hat\rho W)^{-1}G_0(\hat\rho - \rho_0)$. Applying Lemma A.6, and using $\hat\rho = \rho_0 + o_p(1)$ and $\hat\beta = \beta_0 + o_p(1)$, we obtain
$$\frac{1}{NT}\left[(I_T\otimes G(\hat\rho))X\hat\beta, X\right]'M^bL_0 = \frac{1}{NT}L_0'M^b_fL_0 + o_p(1),$$
$$\frac{1}{NT}\left[(I_T\otimes G(\hat\rho))X\hat\beta, X\right]'M^b[(I_T\otimes\Gamma_0)f + e, 0] = o_p(1).$$
Thus, the result in (A.7) is proved. The claim in (A.8) can be established using an argument similar to the one in the proof of Proposition 1.1.

To examine whether $\hat Q^*$ is the best IV, we need to compare the asymptotic variances $\Sigma_{b2sls}$ and $\Sigma_{2sls}$. Notice that
$$L_0'P_{Q,f}L_0 = L_0'M^b_fQ(Q'M^b_fQ)^{-1}Q'M^b_fL_0\le L_0'M^b_fL_0,$$
and hence $\Psi_{LPL}\le\Psi_{LML}$. If the disturbances $\{e_{it}\}$ are independently and identically distributed with mean zero and variance $\sigma_e^2$, then $\Sigma_{b2sls} = \sigma_e^2\Psi_{LML}^{-1}\le\sigma_e^2\Psi_{LPL}^{-1} = \Sigma_{2sls}$. In general, however, we cannot conclude that $\hat Q^*$ is optimal, because $\Omega_{e,i.}$ is unknown and $\Omega_{LPe}$ could be greater than $\Omega_{LMe}$.

Proof of Theorem 1.3
Consistency. Under the identification conditions for this model, it suffices to show that $(NT)^{-1}A^w_{NT}g_{NT}(\delta)$ converges to its mean uniformly in $\delta\in\Delta_{sp}$ and that the limit equals zero at $\delta_0$. Notice that
$$\xi(\delta) = [I_T\otimes S(\rho)](I_T\otimes S_0^{-1})[X\beta_0 + (I_T\otimes\Gamma_0)f + e] - X\beta.$$
Since $S(\rho)S_0^{-1} = [S_0 + (\rho_0 - \rho)W]S_0^{-1} = I_N + (\rho_0 - \rho)G_0$, where $G_0 = WS_0^{-1}$, we obtain
$$\begin{aligned}\xi(\delta) &= [I_{NT} + (\rho_0 - \rho)(I_T\otimes G_0)][X\beta_0 + (I_T\otimes\Gamma_0)f] - X\beta + [I_T\otimes S(\rho)S_0^{-1}]e\\ &= X(\beta_0 - \beta) + (\rho_0 - \rho)(I_T\otimes G_0)[X\beta_0 + (I_T\otimes\Gamma_0)f] + (I_T\otimes\Gamma_0)f + [I_T\otimes S(\rho)S_0^{-1}]e\\ &= d(\delta) + r_\xi(\delta),\end{aligned} \tag{A.9}$$
where
$$d(\delta) = X(\beta_0 - \beta) + J(\rho_0 - \rho), \tag{A.10}$$
$$J = (I_T\otimes G_0)[X\beta_0 + (I_T\otimes\Gamma_0)f], \tag{A.11}$$
$$r_\xi(\delta) = (I_T\otimes\Gamma_0)f + [I_T\otimes S(\rho)S_0^{-1}]e. \tag{A.12}$$
Let $A^w_{NT} = (a^{(1)}, a^{(2)},\dots,a^{(r)}, A^{(Q)})$, where $a^{(l)}$, for $l = 1,2,\dots,r$, are $k_a\times1$ vectors and $A^{(Q)}$ is a $k_a\times q$ matrix. By definition,
$$\frac{1}{NT}A^w_{NT}g_{NT}(\delta) = \frac{1}{NT}\sum_{l=1}^r a^{(l)}\left[\xi'(\delta)M^bP^b_lM^b\xi(\delta)\right] + \frac{1}{NT}A^{(Q)}Q'M^b\xi(\delta). \tag{A.13}$$
Expanding the first term of (A.13) gives
$$\frac{1}{NT}\sum_{l=1}^r a^{(l)}\left[\xi'(\delta)M^bP^b_lM^b\xi(\delta)\right] = \varpi_1 + 2\varpi_2 + \varpi_3,$$
where
$$\varpi_1 = \frac{1}{NT}\sum_{l=1}^r a^{(l)}\left[d'(\delta)M^bP^b_lM^bd(\delta)\right],\quad \varpi_2 = \frac{1}{NT}\sum_{l=1}^r a^{(l)}\left[d'(\delta)M^bP^b_lM^br_\xi(\delta)\right],\quad \varpi_3 = \frac{1}{NT}\sum_{l=1}^r a^{(l)}\left[r_\xi'(\delta)M^bP^b_lM^br_\xi(\delta)\right].$$
Note that $S(\rho)S_0^{-1}$ has bounded row and column norms, and so do the products of $S(\rho)S_0^{-1}$, $P_l$, and $[S(\rho)S_0^{-1}]'$. Also notice that $M^bP^b_lM^b = \bar M\otimes P_l$. Applying Lemmas A.6 and A.7, we obtain that $\varpi_1$, $\varpi_2$ and $\varpi_3$ converge uniformly to their respective means. In addition, the second term in (A.13) converges uniformly to zero. Hence, $(NT)^{-1}A^w_{NT}g_{NT}(\delta)$ converges uniformly. Furthermore, its limit equals zero at the true value $\delta_0$. This can be verified by noting that $\xi(\delta_0) = (I_T\otimes\Gamma_0)f + e$ and $E[e'M^b_fP^b_lM^b_fe] = \mathrm{tr}[E(M_f\otimes P_l)E(ee')] = 0$.

Asymptotic distribution. We omit the subscript and let $\hat\delta$ denote the GMM estimator in this proof. By a mean value expansion of
$$\frac{\partial g_{NT}'(\hat\delta)}{\partial\delta}A^{w\prime}_{NT}A^w_{NT}g_{NT}(\hat\delta) = 0$$
around the true value $\delta_0$, we obtain
$$\sqrt{NT}(\hat\delta - \delta_0) = -\left[\frac{1}{NT}\frac{\partial g_{NT}'(\hat\delta)}{\partial\delta}A^{w\prime}_{NT}A^w_{NT}\frac{1}{NT}\frac{\partial g_{NT}(\ddot\delta)}{\partial\delta'}\right]^{-1}\frac{1}{NT}\frac{\partial g_{NT}'(\hat\delta)}{\partial\delta}A^{w\prime}_{NT}\frac{1}{\sqrt{NT}}A^w_{NT}g_{NT}(\delta_0),$$
where $\ddot\delta$ is a point between $\hat\delta$ and $\delta_0$. For any $\delta$ in the parameter space $\Delta_{sp}$, we have $\partial\xi(\delta)/\partial\delta' = -[(I_T\otimes W)Y, X]$, and it follows that
$$\frac{\partial g(\delta)}{\partial\delta'} = -\left[M^bP^{sb}_1M^b\xi(\delta),\dots,M^bP^{sb}_rM^b\xi(\delta), M^bQ\right]'[(I_T\otimes W)Y, X],$$
where $P^{sb}_l = I_T\otimes P^s_l$ and $P^s_l = P_l + P_l'$, for $l = 1,2,\dots,r$.
Since
$$\frac{1}{NT}\xi'(\delta)M^bP^{sb}_lM^b(I_T\otimes W)Y = \frac{1}{NT}\xi'(\delta)(\bar M\otimes P^s_lG_0)X\beta_0 + \frac{1}{NT}\xi'(\delta)\left[(\bar M\otimes P^s_lG_0)e + (\bar M\otimes P^s_lG_0\Gamma_0)f\right], \tag{A.14}$$
by Lemmas A.6 and A.7, at the true value $\delta_0$ equation (A.14) can be rewritten as
$$\frac{1}{NT}[(I_T\otimes\Gamma_0)f + e]'M^bP^{sb}_lM^b(I_T\otimes W)Y = \frac{1}{N}\sum_{i=1}^N\tilde g^s_{ii,l}\sigma_i^2 + o_p(1),$$
where $\tilde g^s_{ii,l}$ is the $i$th diagonal element of the matrix $\tilde G_l(\rho_0) = P^s_lG_0$, and $(NT)^{-1}e'M^bP^{sb}_lM^bX = o_p(1)$. In addition, we have
$$\frac{1}{NT}Q'M^b(I_T\otimes W)Y = \frac{1}{NT}Q'(M_f\otimes G_0)X\beta_0 + o_p(1).$$
It then follows that $(NT)^{-1}\partial g_{NT}'(\delta)/\partial\delta = -D + o_p(1)$, where $D$ is given by (1.42). Finally, applying the central limit theorem for linear-quadratic forms given by Lemma A.9 establishes
$$\frac{1}{\sqrt{NT}}A^w_{NT}g_{NT}(\delta_0) = \frac{1}{\sqrt{NT}}\left[r_\xi'(\delta_0)\left(\sum_{l=1}^r a^{(l)}M^bP^b_lM^b\right)r_\xi(\delta_0) + A^{(Q)}Q'M^br_\xi(\delta_0)\right]\xrightarrow{d}N(0, A^{w\prime}\Sigma_gA^w), \tag{A.15}$$
where $\Sigma_g$ is given by (1.44), and this completes the proof.

A.3 Data appendix
The house price indices for Metropolitan Statistical Areas (MSAs) at monthly frequency are obtained from the website of Freddie Mac: http://www.freddiemac.com/finance/fmhpi/archive.html. The quarterly values are computed by taking three-month arithmetic averages. The annual Consumer Price Index (CPI) series for all urban areas is sourced from the website of the Bureau of Labor Statistics: http://data.bls.gov/pdq/querytool.jsp?survey=cu. The CPI for each MSA is constructed from the corresponding state CPI, and the missing observations for a few area-year combinations are replaced by the US averages. The data on annual income per capita and population at the MSA level are obtained from the website of the Bureau of Economic Analysis (BEA): http://bea.gov/regional/downloadzip.cfm.
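The three-month averaging used to obtain quarterly house price indices can be sketched as follows. This is a minimal illustration that assumes the monthly series starts in the first month of a quarter and has no gaps; the function name and the input values are hypothetical, and it does not reproduce the GVAR Toolbox interpolation applied to the annual series.

```python
def monthly_to_quarterly(monthly):
    """Three-month arithmetic averages; assumes the series starts in the first month of a quarter."""
    if len(monthly) % 3 != 0:
        raise ValueError("series length must be a multiple of 3 (whole quarters)")
    return [sum(monthly[k:k + 3]) / 3.0 for k in range(0, len(monthly), 3)]


# Hypothetical monthly index values covering two quarters.
print(monthly_to_quarterly([100.0, 101.0, 102.0, 103.0, 104.0, 105.0]))  # [101.0, 104.0]
```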
The quarterly values of CPI, income, and population are computed from the annual series, following the interpolation method given in Appendix B.3 of the Global Vector AutoRegressive (GVAR) Toolbox User Guide, which is available at the GVAR modelling website: https://sites.google.com/site/gvarmodelling/gvar-toolbox/download.

When regional factors are taken into consideration, all MSAs are grouped into eight Regions based on the BEA specification (see Figure A.1).¹

Figure A.1: The eight BEA Regions of the United States

The geodesic distance between MSAs is calculated by the Haversine formula, using the latitude and longitude of the zip codes of the corresponding MSAs. The data on MSA-to-MSA migration flows are sourced from the 2010–2014 American Community Survey (ACS) 5-year estimates by the United States Census Bureau. Flow estimates with coefficients of variation higher than 20% are dropped from the sample. Figure A.2 shows the histogram of the distance between the area of origin and the area of destination, based on the migration weights matrix.

¹ The eight BEA Regions are New England, Mideast, Great Lakes, Plains, Southeast, Southwest, Rocky Mountain, and Far West. See the BEA web page, https://www.bea.gov/regional/docs/regions.cfm, for details.

(Plot: x-axis “Distance between MSAs (miles)”; y-axis “Number of flows”.)
Figure A.2: Histogram of inter-MSA migration distance, based on the migration flow matrix $W_m$
Notes: This figure shows the distribution of the distance between two MSAs that have a migration flow between them, as indicated by a nonzero entry in the migration weights matrix $W_m$. The inflows and outflows between two MSAs are counted as two flows.

Table A.1 reports the summary measures of the different spatial weights matrices used in the analysis, and Figure A.3 presents the respective intensity plots.
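The radial distance matrices $W_d$ and the link measures reported in Table A.1 can be sketched as follows. Several details are assumptions not stated in the text: the Earth radius (3,958.8 miles), the rule that two MSAs at most $d$ miles apart are linked, the treatment of an MSA as isolated when it has neither outgoing nor incoming links, and the coordinates, which are hypothetical. Density divides the number of nonzero elements by $N(N-1)$, which is consistent with Table A.1: for $W_{75}$, 1,246/377 ≈ 3.31 links per MSA and 1,246/(377 × 376) ≈ 0.88%.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius; the source does not state the value used


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance (miles) between two latitude/longitude points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def radial_links(coords, d):
    """0/1 matrix: units i != j are linked if they are at most d miles apart (assumed rule)."""
    n = len(coords)
    return [[1 if i != j and haversine_miles(*coords[i], *coords[j]) <= d else 0
             for j in range(n)] for i in range(n)]


def link_summary(adj):
    """Summary measures in the spirit of Table A.1 for a square (0/1 or weighted) matrix."""
    n = len(adj)
    out_links = [sum(1 for x in row if x != 0) for row in adj]
    in_links = [sum(1 for i in range(n) if adj[i][j] != 0) for j in range(n)]
    total = sum(out_links)  # number of nonzero elements
    return {
        "mean": total / n,
        "max": max(out_links),
        "total": total,
        "density": total / (n * (n - 1)),  # possible links exclude the diagonal
        "isolated": sum(1 for i in range(n) if out_links[i] == 0 and in_links[i] == 0),
    }


# Hypothetical coordinates (latitude, longitude) on the equator, for illustration only.
coords = [(0.0, 0.0), (0.0, 1.0), (0.0, 10.0)]
print(link_summary(radial_links(coords, d=100)))
```

Here the first two points are about 69 miles apart and become linked, while the third is more than 600 miles from both and is counted as isolated.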
Table A.1: Summary of the spatial weights matrices
| | $W_{75}$ | $W_{100}$ | $W_{125}$ | $W_m$ | $\hat W^+$ | $\hat W^-$ |
| Links: Mean | 3.31 | 5.73 | 8.65 | 4.46 | 10.41 | 7.53 |
| Links: Max | 12 | 20 | 27 | 59 | 35 | 35 |
| Links: Total | 1,246 | 2,162 | 3,260 | 1,681 | 3,926 | 2,838 |
| Network density | 0.88% | 1.53% | 2.30% | 1.19% | 2.77% | 2.00% |
| Isolated MSAs | 39 | 15 | 10 | 53 | 3 | 45 |
| Dimensions | 377 × 377 (all matrices) |
Notes: $W_d$ denotes the radial distance weights matrix with threshold distance $d$ (miles). $W_m$ represents the weights matrix based on MSA-to-MSA migration flows. $\hat W^+$ ($\hat W^-$) is constructed from significantly positive (negative) pairwise correlations of de-factored house price changes. The total number of links equals the number of nonzero elements in the weights matrix. Network density is computed by dividing the number of existing links by the number of all possible links.

(Intensity-plot panels: (a) $W_{75}$, (b) $W_{100}$, (c) $W_{125}$, (d) $W_m$, (e) $\hat W^+$, (f) $\hat W^-$.)
Figure A.3: Intensity plots of the spatial weights matrices
Notes: The 377 MSAs are sorted first by Region and then by State. The Regions and States are ordered from the East Coast to the West Coast. Zero elements of the weights matrix are plotted in white. Higher values are represented by darker colors. See the notes to Table A.1.

Appendix B
Appendix to Chapter 2

B.1 Lemmas

Lemma B.1. Let $A$ be an $N\times N$ matrix whose entries are non-negative and whose rows each sum to 1. Then $\lambda_1(A) = 1$, where $\lambda_1(A)$ is the largest eigenvalue of $A$ in modulus, and $I_N - \rho A$ is invertible given that $|\rho| < 1$.

Proof. Matrix $A$ is a right stochastic matrix, and $\lambda_1(A) = 1$ follows; see, for example, Property 10.1.2 in Stewart (2009). Given that $|\rho| < 1$ and $\lambda_1(A) = 1$, every eigenvalue of $I_N - \rho A$ takes the form $1 - \rho\lambda$ with $|\lambda|\le1$, and is therefore nonzero. Hence $I_N - \rho A$ is invertible.

Remark B.1. This lemma holds irrespective of whether $A$ has a bounded column sum matrix norm. Also note that $\lambda_1(A') = 1$ and $I_N - \rho A'$ is invertible, since a matrix and its transpose have the same set of eigenvalues.

Lemma B.2. Let $A$ be an $N\times N$ matrix and $B = I_N - \rho A$.
Suppose that $|\rho| < \max(1/\|A\|_\infty, 1/\|A\|_1)$. Then $B^{-1}$ has bounded row and column sum matrix norms.

Proof. See Pesaran (2015b, p. 756).

B.2 Multiple dominant units

This appendix extends the analysis of Section 2.5 to the case where there is more than one dominant unit in the network. Specifically, we assume that the first $m$ units are dominant with degrees of dominance $\{\delta_1,\delta_2,\dots,\delta_m\}$, and the remaining $n$ units are non-dominant, with $\delta_i = 0$ for $i = m+1, m+2,\dots,m+n$, and let $N = m + n$. Consider now the following partitioned version of model (2.23):
$$\begin{pmatrix}\mathbf{x}_{1t}\\ \mathbf{x}_{2t}\end{pmatrix} = \begin{pmatrix}\rho W_{11} & \rho W_{12}\\ \rho W_{21} & \rho W_{22}\end{pmatrix}\begin{pmatrix}\mathbf{x}_{1t}\\ \mathbf{x}_{2t}\end{pmatrix} + \begin{pmatrix}\mathbf{g}_{1t}\\ \mathbf{g}_{2t}\end{pmatrix},$$
where $\mathbf{x}_{1t} = (x_{1t}, x_{2t},\dots,x_{mt})'$, $\mathbf{x}_{2t} = (x_{m+1,t}, x_{m+2,t},\dots,x_{Nt})'$, $W_{11}$ is the $m\times m$ weight matrix associated with the dominant units, $W_{22}$ is the $n\times n$ weight matrix associated with the non-dominant units and is assumed to satisfy $|\rho|\,\|W_{22}\|_1 < 1$, $\mathbf{g}_{1t} = (g_{1t}, g_{2t},\dots,g_{mt})'$ and $\mathbf{g}_{2t} = (g_{m+1,t}, g_{m+2,t},\dots,g_{Nt})'$, with $g_{it} = -b_i - \alpha(\gamma_if_t + \varepsilon_{it})$ for $i = 1,2,\dots,N$. As $\varrho(W_{22})\le1$ and $|\rho| < 1$, we have
$$\mathbf{x}_{2t} = S_{22}^{-1}(\rho W_{21}\mathbf{x}_{1t} + \mathbf{g}_{2t}), \tag{B.1}$$
where $S_{22} = I_n - \rho W_{22}$. Substituting (B.1) into $\mathbf{x}_{1t} = \rho W_{11}\mathbf{x}_{1t} + \rho W_{12}\mathbf{x}_{2t} + \mathbf{g}_{1t}$ and rearranging yields
$$\mathbf{x}_{1t} = Z_1^{-1}\mathbf{g}_{1t} + \rho Z_1^{-1}W_{12}S_{22}^{-1}\mathbf{g}_{2t}, \tag{B.2}$$
where
$$Z_1 = I_m - \rho W_{11} - \rho^2W_{12}S_{22}^{-1}W_{21}, \tag{B.3}$$
and $Z_1$ is invertible since $I_N - \rho W$ is nonsingular by Lemma B.1 in Appendix B.1.

Now consider the cross-section average of $x_{it}$, $i = 1,2,\dots,N$: $\bar x_{Nt} = N^{-1}(\tau_m'\mathbf{x}_{1t} + \tau_n'\mathbf{x}_{2t})$. Using (B.1) in this equation gives
$$\bar x_{Nt} = N^{-1}\left[(\tau_m' + \rho\tau_n'S_{22}^{-1}W_{21})\mathbf{x}_{1t} + \tau_n'S_{22}^{-1}\mathbf{g}_{2t}\right],$$
and by the definition of $\mathbf{g}_{2t}$ we obtain
$$\bar x_{Nt} = N^{-1}\left(-a_n + \theta_n'\mathbf{x}_{1t} - \alpha\psi_nf_t - \alpha\phi_n'\boldsymbol{\varepsilon}_{2t}\right),$$
where $a_n = \tau_n'S_{22}^{-1}\mathbf{b}_2$, $\theta_n' = \tau_m' + \rho\phi_n'W_{21}$, $\psi_n = \phi_n'\boldsymbol{\gamma}_2$ and $\phi_n' = \tau_n'S_{22}^{-1}$, with $\mathbf{b}_2 = (b_{m+1}, b_{m+2},\dots,b_N)'$ and $\boldsymbol{\gamma}_2 = (\gamma_{m+1},\gamma_{m+2},\dots,\gamma_N)'$. We now derive $\mathrm{Var}(\bar x_{Nt})$ and inspect its asymptotic order of magnitude as $N\to\infty$, following steps similar to those in Section 2.5.
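The partitioned solution (B.1) to (B.3) and the expression for the cross-section average can be checked numerically against a direct solve of $(I_N - \rho W)\mathbf{x}_t = \mathbf{g}_t$. The Python sketch below (assuming NumPy) uses a hypothetical row-standardised weights matrix in which the non-dominant units place large weights on the first $m$ units; it also checks Lemma B.1, namely that the spectral radius of a row-stochastic matrix equals one. The sizes, $\rho$, the weights and $\mathbf{g}_t$ are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 2, 8                      # m dominant and n non-dominant units (hypothetical sizes)
N = m + n
rho = 0.6

# Hypothetical non-negative weights: non-dominant units load heavily on the m dominant units.
W = rng.uniform(0.0, 1.0, (N, N))
W[m:, :m] += 5.0
np.fill_diagonal(W, 0.0)
W = W / W.sum(axis=1, keepdims=True)   # row-standardise so that each row sums to one

# Lemma B.1: the spectral radius of a row-stochastic matrix equals one.
spectral_radius = np.max(np.abs(np.linalg.eigvals(W)))

W11, W12 = W[:m, :m], W[:m, m:]
W21, W22 = W[m:, :m], W[m:, m:]
assert abs(rho) * np.linalg.norm(W22, 1) < 1   # condition |rho| ||W22||_1 < 1 used in B.2

g = rng.standard_normal(N)
g1, g2 = g[:m], g[m:]

# Direct solution of x = rho W x + g.
x = np.linalg.solve(np.eye(N) - rho * W, g)

# Partitioned solution.
S22 = np.eye(n) - rho * W22
Z1 = np.eye(m) - rho * W11 - rho**2 * W12 @ np.linalg.solve(S22, W21)   # (B.3)
x1 = np.linalg.solve(Z1, g1 + rho * W12 @ np.linalg.solve(S22, g2))     # (B.2)
x2 = np.linalg.solve(S22, rho * W21 @ x1 + g2)                          # (B.1)

# Cross-section average N^{-1}(theta_n' x_1t + phi_n' g_2t).
phi = np.linalg.solve(S22.T, np.ones(n))    # phi_n = (S22^{-1})' tau_n
theta = np.ones(m) + rho * W21.T @ phi      # theta_n = tau_m + rho W21' phi_n
xbar = (theta @ x1 + phi @ g2) / N

print(spectral_radius, np.allclose(x, np.concatenate([x1, x2])), np.isclose(xbar, x.mean()))
```

Solving with `np.linalg.solve` rather than forming explicit inverses keeps the check numerically stable; the same code can be rerun with larger $n$ to see how the weights on the dominant units enter $\theta_n$.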
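First, as with the case of the SAR(1) model with one dominant unit, we have $1<\phi_{\min}\le\phi_{\max}<K<\infty$, where $\phi_n'=(\phi_{m+1},\phi_{m+2},\ldots,\phi_N)$, $\phi_{\min}=\min(\phi_{m+1},\ldots,\phi_N)$ and $\phi_{\max}=\max(\phi_{m+1},\ldots,\phi_N)$. Also, it readily follows that $a_n=O(1)$ and $\mathrm{Var}\left(N^{-1}\phi_n'\boldsymbol{\varepsilon}_{2t}\right)=\ominus(N^{-1})$. Considering now the terms due to the dominant units, and using (B.2), note that

$$\begin{aligned}\mathrm{Cov}\left(\mathbf{x}_{1t},N^{-1}\phi_n'\boldsymbol{\varepsilon}_{2t}\right)&=\mathrm{Cov}\left(Z_1^{-1}\mathbf{g}_{1t}+\rho Z_1^{-1}W_{12}S_{22}^{-1}\mathbf{g}_{2t},\,N^{-1}\phi_n'\boldsymbol{\varepsilon}_{2t}\right)\\&=-N^{-1}\rho\alpha Z_1^{-1}W_{12}S_{22}^{-1}V_{22,\varepsilon}S_{22}^{-1\prime}\tau_n,\end{aligned}\tag{B.4}$$

$$\mathrm{Cov}(\mathbf{x}_{1t},f_t)=-\alpha\mathrm{Var}(f_t)\left(Z_1^{-1}\boldsymbol{\gamma}_1+\rho Z_1^{-1}W_{12}S_{22}^{-1}\boldsymbol{\gamma}_2\right),\tag{B.5}$$

$$\begin{aligned}\mathrm{Var}(\mathbf{x}_{1t})={}&\alpha^2Z_1^{-1}V_{11,\varepsilon}Z_1^{-1\prime}+\alpha^2\rho^2Z_1^{-1}W_{12}S_{22}^{-1}V_{22,\varepsilon}S_{22}^{-1\prime}W_{12}'Z_1^{-1\prime}\\&+\alpha^2\mathrm{Var}(f_t)\left(Z_1^{-1}\boldsymbol{\gamma}_1\boldsymbol{\gamma}_1'Z_1^{-1\prime}+\rho^2Z_1^{-1}W_{12}S_{22}^{-1}\boldsymbol{\gamma}_2\boldsymbol{\gamma}_2'S_{22}^{-1\prime}W_{12}'Z_1^{-1\prime}\right),\end{aligned}\tag{B.6}$$

where $V_{11,\varepsilon}=\mathrm{diag}(\sigma_1^2,\sigma_2^2,\ldots,\sigma_m^2)$, $V_{22,\varepsilon}=\mathrm{diag}(\sigma_{m+1}^2,\sigma_{m+2}^2,\ldots,\sigma_N^2)$ and $\boldsymbol{\gamma}_1=(\gamma_1,\gamma_2,\ldots,\gamma_m)'$. Consider now the individual terms of $\mathrm{Var}(\bar{x}_{Nt})$, which is given by

$$\begin{aligned}\mathrm{Var}(\bar{x}_{Nt})={}&N^{-2}\theta_n'\mathrm{Var}(\mathbf{x}_{1t})\theta_n-2\alpha N^{-2}\theta_n'\mathrm{Cov}\left(\mathbf{x}_{1t},\phi_n'\boldsymbol{\varepsilon}_{2t}\right)+\alpha^2N^{-2}\mathrm{Var}\left(\phi_n'\boldsymbol{\varepsilon}_{2t}\right)\\&+\alpha^2N^{-2}\mathrm{Var}(f_t)\left(\psi_n^2+2\psi_n\theta_n'Z_1^{-1}\boldsymbol{\gamma}_1+2\rho\psi_n\theta_n'Z_1^{-1}W_{12}S_{22}^{-1}\boldsymbol{\gamma}_2\right).\end{aligned}$$

In the case where the network contains $m$ dominant units but is not subject to common shocks,

$$\mathrm{Var}(\bar{x}_{Nt})=N^{-2}\theta_n'\mathrm{Var}(\mathbf{x}_{1t})\theta_n-2\alpha N^{-2}\theta_n'\mathrm{Cov}\left(\mathbf{x}_{1t},\phi_n'\boldsymbol{\varepsilon}_{2t}\right)+\ominus(N^{-1}).\tag{B.7}$$

Consider the $i$th element of $N^{-1}\theta_n$, denoted by $N^{-1}\theta_{i,n}$, for $i=1,2,\ldots,m$, and note that $m$ is fixed and does not rise with $N$. By definition, $N^{-1}\theta_{i,n}=N^{-1}\left(1+\rho\phi_n'\mathbf{w}_{\cdot i,21}\right)$, where $\mathbf{w}_{\cdot i,21}$ is the $i$th column of $W_{21}$. Hence

$$\phi_{\min}N^{-1}\sum_{j=1}^{n}w_{ji,21}\le N^{-1}\phi_n'\mathbf{w}_{\cdot i,21}\le\phi_{\max}N^{-1}\sum_{j=1}^{n}w_{ji,21},$$

and

$$N^{-1}+\phi_{\min}N^{-1}\sum_{j=1}^{n}w_{ji,21}\le N^{-1}\theta_{i,n}\le N^{-1}+\phi_{\max}N^{-1}\sum_{j=1}^{n}w_{ji,21}.\tag{B.8}$$

Also note that $\mathbf{w}_{\cdot i,21}'\tau_n=\ominus(N^{\delta_i})$, which follows immediately from $\mathbf{w}_{\cdot i,21}'\tau_n+\mathbf{w}_{\cdot i,11}'\tau_m=d_i=\kappa_iN^{\delta_i}$, with $m$ fixed.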
Therefore, by (B.8), $N^{-1}\theta_{i,n}=\ominus(N^{\delta_i-1})$ for $i=1,2,\ldots,m$, and then $N^{-2}\theta_n'\theta_n=\ominus(N^{2\delta_{\max}-2})$, where $0<\delta_{\max}=\max(\delta_1,\delta_2,\ldots,\delta_m)\le1$. Further notice that

$$N^{-2}\theta_n'\theta_n\,\lambda_m[\mathrm{Var}(\mathbf{x}_{1t})]\le N^{-2}\theta_n'\mathrm{Var}(\mathbf{x}_{1t})\theta_n\le N^{-2}\theta_n'\theta_n\,\lambda_1[\mathrm{Var}(\mathbf{x}_{1t})],$$

where $\lambda_1[\mathrm{Var}(\mathbf{x}_{1t})]$ and $\lambda_m[\mathrm{Var}(\mathbf{x}_{1t})]$ denote the largest and smallest eigenvalues of $\mathrm{Var}(\mathbf{x}_{1t})$, respectively, and $0<\lambda_m[\mathrm{Var}(\mathbf{x}_{1t})]\le\lambda_1[\mathrm{Var}(\mathbf{x}_{1t})]<K<\infty$. Hence $N^{-2}\theta_n'\mathrm{Var}(\mathbf{x}_{1t})\theta_n=\ominus(N^{2\delta_{\max}-2})$.

Turning now to the $j$th element of the covariance term, for $j=1,2,\ldots,m$, we have

$$\left|\mathrm{Cov}\left(x_{jt},N^{-1}\phi_n'\boldsymbol{\varepsilon}_{2t}\right)\right|\le N^{-1}|\rho\alpha|\left\lVert\mathbf{z}_{j\cdot,1}^{-1}\right\rVert_\infty\lVert W_{12}\rVert_\infty\left\lVert S_{22}^{-1}\right\rVert_\infty\lVert V_{22,\varepsilon}\rVert_\infty\lVert\phi_n\rVert_\infty,$$

where $\mathbf{z}_{j\cdot,1}^{-1}$ denotes the $j$th row of $Z_1^{-1}$. Using a similar line of reasoning as in the main text, it is easily verified that $\mathrm{Cov}\left(x_{jt},N^{-1}\phi_n'\boldsymbol{\varepsilon}_{2t}\right)=O(N^{-1})$, and therefore $N^{-2}\theta_n'\mathrm{Cov}\left(\mathbf{x}_{1t},\phi_n'\boldsymbol{\varepsilon}_{2t}\right)=O(N^{\delta_{\max}-2})$. Consequently, in the absence of common shocks we have shown that

$$\mathrm{Var}(\bar{x}_{Nt})=\ominus(N^{2\delta_{\max}-2})+\ominus(N^{-1}),$$

which shows that the rate of convergence of $\bar{x}_{Nt}$ depends on the strongest dominant unit in the network.

Finally, if the network is subject to both dominant units and a common factor, it follows immediately from $N^{-1}\psi_n=\ominus(N^{\delta_\gamma-1})$ and similar arguments as before that

$$\mathrm{Var}(\bar{x}_{Nt})=\ominus(N^{2\delta_{\max}-2})+\ominus(N^{2\delta_\gamma-2})+\ominus(N^{-1}),$$

which is a direct extension of (2.56) to networks with multiple dominant units, and shows that the relative magnitudes of $\delta_{\max}$ and $\delta_\gamma$ determine the limiting properties of the aggregate effects.

B.3 Consistency of $\hat\delta_{\max}$

First we note that since the $z_i$ are distributed independently with finite means and variances,

$$E(\bar{z}_N)=N^{-1}\sum_{i=1}^{N}E(z_i)=\beta^{-1}\Pr(z\ge0)+E(z\mid z<0)\left[1-\Pr(z\ge0)\right],$$

which is finite. Further, using standard results for the moments of ordered random variables (see, for example, Section 4.6 of Arnold et al. (1992)), we have

$$E\left(z_{(i)}\right)=(1/\beta)\sum_{j=1}^{N-i+1}\frac{1}{j},\qquad\mathrm{Var}\left(z_{(i)}\right)=(1/\beta)^2\sum_{j=1}^{N-i+1}\frac{1}{j^2},\qquad i=1,2,\ldots,N.\tag{B.9}$$

Taking the expectation and variance of $\hat\delta_{\max}$ given by (2.69), and making use of the above results, we now have

$$E\left(\hat\delta_{\max}\right)=\frac{E(z_{\max})-E(\bar z_N)}{\ln N}=\frac{(1/\beta)\sum_{j=1}^{N}j^{-1}-E(\bar z_N)}{\ln N},\tag{B.10}$$

$$\begin{aligned}\mathrm{Var}\left(\hat\delta_{\max}\right)&=\frac{\mathrm{Var}(z_{\max})+N^{-2}\sum_{i=1}^{N}\mathrm{Var}(z_i)-2N^{-1}\sum_{i=1}^{N}\mathrm{Cov}(z_{\max},z_{(i)})}{(\ln N)^2}\\&=\frac{\mathrm{Var}(z_{\max})+N^{-2}\sum_{i=1}^{N}\mathrm{Var}(z_i)-2N^{-1}\sum_{i=1}^{N}\mathrm{Var}(z_{(i)})}{(\ln N)^2}.\end{aligned}\tag{B.11}$$

Also, using well-known bounds for the harmonic series (see, for example, Sections 3.1 and 3.2 of Bonar et al. (2006)), we have $\ln(N+1)<\sum_{j=1}^{N}1/j\le1+\ln N$, and hence

$$\lim_{N\to\infty}\frac{\sum_{j=1}^{N}j^{-1}}{\ln N}=1.\tag{B.12}$$

Using (B.9) and (B.12) in (B.10), we now have $\lim_{N\to\infty}E(\hat\delta_{\max})=1/\beta$. Turning to the variance of $\hat\delta_{\max}$, we note that

$$\begin{aligned}\mathrm{Var}\left(\hat\delta_{\max}\right)&=\frac{\mathrm{Var}(z_{\max})+\frac{1-2N}{N^2}\sum_{i=1}^{N}\mathrm{Var}(z_{(i)})}{(\ln N)^2},\\(\ln N)^{-2}\mathrm{Var}(z_{\max})\le\mathrm{Var}\left(\hat\delta_{\max}\right)&\le(\ln N)^{-2}\left[\mathrm{Var}(z_{\max})+\frac{2N-1}{N^2}\,N\,\mathrm{Var}(z_{\max})\right],\\(\ln N)^{-2}\delta^2\sum_{j=1}^{N}\frac{1}{j^2}\le\mathrm{Var}\left(\hat\delta_{\max}\right)&\le(\ln N)^{-2}\,\frac{3N-1}{N}\,\delta^2\sum_{j=1}^{N}\frac{1}{j^2}.\end{aligned}$$

But $\sum_{j=1}^{N}j^{-2}\le\pi^2/6$, and hence $\mathrm{Var}(\hat\delta_{\max})=O\left[(\ln N)^{-2}\right]$.

B.4 Data appendix

The input-output accounts data are obtained from the Bureau of Economic Analysis (BEA) website at http://www.bea.gov/industry/io_annual.htm. The input-output tables at the finest level of disaggregation are compiled every five years, and the latest available data are for 2007. We derive the Commodity-by-Commodity Direct Requirements (DR) table by applying the formula

$$DR=(TR-I)\,(TR)^{-1},$$

where $I$ is an identity matrix and $TR$ denotes the Commodity-by-Commodity Total Requirements table available from the BEA. The input-output matrix $W$ is set to the transpose of $DR$ and row-standardized so that the intermediate input shares sum to one for each sector. Sectors without any direct requirements and those with zero outdegrees are excluded from $W$.

Appendix C: Appendix to Chapter 3

C.1 Lemmas and supplementary theorem

The proofs of the following lemmas and supplementary theorem are provided in the Appendix of Pesaran and Yang (2018).
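Two ingredients of the consistency argument in Appendix B.3, the harmonic-series bounds behind (B.12) and the $i=1$ case of (B.9), $E(z_{\max})=(1/\beta)\sum_{j=1}^{N}j^{-1}$, can be illustrated numerically. The sketch assumes, for the simulation only, that the $z_i$ are IID exponential with mean $1/\beta$; the values of $\beta$, $N$ and the number of replications are hypothetical. The printed ratios $H_N/\ln N$ show how slowly the limit in (B.12) is approached.

```python
import math

import numpy as np


def harmonic(N):
    """H_N = sum_{j=1}^{N} 1/j."""
    return sum(1.0 / j for j in range(1, N + 1))


# Bounds used for (B.12): ln(N + 1) < H_N <= 1 + ln N; the ratio H_N / ln N -> 1 slowly.
for N in (10, 1_000, 100_000):
    H = harmonic(N)
    assert math.log(N + 1) < H <= 1 + math.log(N)
    print(N, round(H / math.log(N), 4))

# i = 1 case of (B.9) under an IID exponential assumption with mean 1/beta:
# E(z_max) = H_N / beta. beta, N and reps are hypothetical.
rng = np.random.default_rng(1)
beta, N, reps = 1.5, 200, 20_000
z = rng.exponential(scale=1 / beta, size=(reps, N))
mc_mean = z.max(axis=1).mean()
print(mc_mean, harmonic(N) / beta)  # Monte Carlo mean vs. H_N / beta
```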
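The construction in Appendix B.4 amounts to a few matrix operations. If $TR=(I-A)^{-1}$ for a matrix of direct requirements $A$, then $(TR-I)TR^{-1}=I-TR^{-1}=A$, so the formula for $DR$ inverts the Leontief relation exactly. The sketch below assumes a single pass for the exclusion rule (drop sectors whose row or column of $W=DR'$ sums to zero, then row-standardize); the function names and the 3-sector coefficient matrix are hypothetical, and reading the BEA files is not shown.

```python
import numpy as np


def direct_requirements(TR):
    """DR = (TR - I) TR^{-1}, computed by solving DR @ TR = TR - I."""
    I = np.eye(TR.shape[0])
    return np.linalg.solve(TR.T, (TR - I).T).T


def input_output_weights(TR):
    """W = DR' with link-less sectors dropped (single pass, an assumption), then row-standardized."""
    W = np.clip(direct_requirements(TR).T, 0.0, None)  # clip round-off negatives
    keep = (W.sum(axis=1) > 0) & (W.sum(axis=0) > 0)
    W = W[np.ix_(keep, keep)]
    rows = W.sum(axis=1, keepdims=True)
    return np.divide(W, rows, out=np.zeros_like(W), where=rows > 0), keep


# Hypothetical 3-sector direct requirements matrix A and its Leontief inverse TR.
A = np.array([[0.1, 0.2, 0.0],
              [0.3, 0.0, 0.1],
              [0.0, 0.4, 0.2]])
TR = np.linalg.inv(np.eye(3) - A)
W, keep = input_output_weights(TR)
print(np.allclose(direct_requirements(TR), A))  # True: the formula inverts TR = (I - A)^{-1}
print(W.sum(axis=1))                            # each retained sector's input shares sum to one
```

Solving a linear system instead of forming $TR^{-1}$ explicitly is the usual numerically safer choice for the same quantity.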
Lemma C.1. Suppose that the elements of an $n\times n$ matrix $A$ are uniformly bounded in absolute value, and $B$ is an $n\times n$ matrix such that
(i) $\lVert B\rVert_1<K$; then the elements of $AB$ are uniformly bounded in absolute value and $\mathrm{Tr}(AB)=O(n)$;
(ii) $\lVert B\rVert_\infty<K$; then the elements of $BA$ are uniformly bounded in absolute value and $\mathrm{Tr}(BA)=O(n)$.

Lemma C.2. Suppose that $A$ and $B$ are $n\times n$ matrices satisfying $\lVert A\rVert_\infty<K$ and $\lVert B\rVert_\infty<K$. Then $\lVert AB\rVert_\infty<K$.

Lemma C.3. Let $A$ be an $n\times n$ matrix and $b$ an $n\times1$ vector.
(i) If $\lVert A\rVert_1<K$ and $\lVert b\rVert_1=O(n^\delta)$, $0\le\delta\le1$, then $\lVert Ab\rVert_1=O(n^\delta)$.
(ii) If $\lVert A\rVert_1=O(n^\delta)$, $0\le\delta\le1$, and $\lVert b\rVert_1<K$, then $\lVert Ab\rVert_1=O(n^\delta)$.

Lemma C.4. Suppose that $W$ is an $n\times n$ matrix that satisfies Assumptions 3.4 and 3.5. Then
(i) $\lVert S^{-1}\rVert_\infty<K$ and $\lVert S^{-1}\rVert_1=O(n^\delta)$, where $S=I_n-\rho W$ and $|\rho|<1$;
(ii) $\lVert G\rVert_\infty<K$ and $\lVert G\rVert_1=O(n^\delta)$, where $G=WS^{-1}$.

Lemma C.5. Let $A=(a_{ij})$ and $B=(b_{ij})$ be $n\times n$ matrices such that $\lVert A\rVert_\infty<K<\infty$, $\lVert B\rVert_\infty<K<\infty$ and $\lVert B\rVert_1=O(n^\delta)$, where $0\le\delta\le1$. Let $C=(c_{ij})$ be an $n\times n$ matrix with uniformly bounded elements. Then
(i) $\mathrm{Tr}(A'BB'A)=O(n^{\delta+1})$;
(ii) $\mathrm{Tr}[(A'B)^2]=O(n^{\delta+1})$;
(iii) $\mathrm{Tr}(AB'C)=O(n^{\delta+1})$.

Lemma C.6. Consider $G(\rho)=WS^{-1}(\rho)=W(I_n-\rho W)^{-1}$ and suppose that Assumptions 3.2, 3.4 and 3.5 hold. Then

$$\tau_n'G'(\rho)\tau_n=\tau_n'G(\rho)\tau_n=\frac{n}{1-\rho},\tag{C.1}$$
$$\tau_n'G'(\rho)G(\rho)\tau_n=\frac{n}{(1-\rho)^2},\tag{C.2}$$
$$\mathrm{Tr}\left[n^{-1}G^s(\rho)\right]\le K,\quad s=1,2,\ldots,\tag{C.3}$$
$$\mathrm{Tr}\left(n^{-1}G'G\right)\le K.\tag{C.4}$$

Lemma C.7. Let $\xi=(\xi_1,\xi_2,\ldots,\xi_n)'=X\beta$ and suppose that $|E(\xi_i\xi_j)|=|\sigma_\xi(i,j)|<K$ for all $i$ and $j$. Then

$$n^{-1}\beta'X'G'(\rho)M_xG(\rho)X\beta\le K,\tag{C.5}$$
$$E\left[n^{-1}\varepsilon'G'(\rho)M_x\varepsilon\right]=n^{-1}\sigma^2\mathrm{Tr}\left[G'(\rho)M_x\right]=O(1),\tag{C.6}$$
$$\mathrm{Var}\left[n^{-1}\varepsilon'G'(\rho)M_xG(\rho)X\beta\right]=O(n^{-1}),\tag{C.7}$$
$$\mathrm{Var}\left[n^{-1}\beta'X'G'(\rho)M_x\varepsilon\right]=O(n^{-1}),\tag{C.8}$$
$$\mathrm{Var}\left[n^{-1}\varepsilon'G'(\rho)M_x\varepsilon\right]=O(n^{-1}).\tag{C.9}$$

Lemma C.8. Consider the random vector $\varepsilon=(\varepsilon_1,\varepsilon_2,\ldots,\varepsilon_n)'$, where $\varepsilon_i\sim IID(0,\sigma^2)$ for all $i$, and suppose that the fourth moment $\mu_4=E(\varepsilon_i^4)$ exists. Then for any $n\times n$ matrix $A=(a_{ij})$,
(i) $E(\varepsilon'A\varepsilon)=\sigma^2\mathrm{Tr}(A)$;
(ii) $E(\varepsilon'A\varepsilon)^2=(\mu_4-3\sigma^4)\sum_{i=1}^{n}a_{ii}^2+\sigma^4\left[\mathrm{Tr}^2(A)+\mathrm{Tr}(AA')+\mathrm{Tr}(A^2)\right]$;
(iii) $\mathrm{Var}(\varepsilon'A\varepsilon)=(\mu_4-3\sigma^4)\sum_{i=1}^{n}a_{ii}^2+\sigma^4\left[\mathrm{Tr}(AA')+\mathrm{Tr}(A^2)\right]$.

Theorem C.1. Let $\{\varepsilon_{in},\,i=1,2,\ldots,n\}$ denote a stochastic array with $IID(0,\sigma_n^2)$ elements, and let $\varepsilon_n=(\varepsilon_{1n},\varepsilon_{2n},\ldots,\varepsilon_{nn})'$. Suppose that $\sup_iE|\varepsilon_{in}|^{4+\epsilon}<K$ for some $\epsilon>0$, and let $\mu_{3,n}=E(\varepsilon_{in}^3)$ and $\mu_{4,n}=E(\varepsilon_{in}^4)$. Let $P_n=(p_{ij,n})$ be an array of $n\times n$ constant matrices that satisfy the following conditions:

$$\lVert P_n\rVert_\infty=\sup_i\sum_{j=1}^{n}|p_{ij,n}|<K,\tag{C.10}$$
$$\lVert P_n\rVert_1=\sup_j\sum_{i=1}^{n}|p_{ij,n}|=O(n^\delta),\quad\delta\ge0,\tag{C.11}$$

and $P_n$ has a finite number of unbounded columns. Let $A_n=(a_{ij,n})=(P_n+P_n')/2$. Let $z_n=(z_{1n},z_{2n},\ldots,z_{nn})'$ be a vector of random variables, where $z_{in}$ is distributed independently of the errors $\varepsilon_{jn}$ for all $i$ and $j$, $E(z_{in})=\mu_{z,in}$, $\mathrm{Var}(z_{in})=\sigma_{z,in}^2>0$, and $\sup_iE|z_{in}|^{2+\varsigma}<K$ for some $\varsigma>0$. Let

$$\varpi_n^2=(\mu_{4,n}-3\sigma_n^4)\,n^{-1}\sum_{i=1}^{n}a_{ii,n}^2+2\sigma_n^4\mathrm{Tr}\left(n^{-1}A_n^2\right)+\sigma_n^2\,n^{-1}\sum_{i=1}^{n}\sigma_{z,in}^2+2\mu_{3,n}\,n^{-1}\sum_{i=1}^{n}a_{ii,n}\mu_{z,in},\tag{C.12}$$

and assume that $\varpi_n^2>c>0$ for all $n$, including $n\to\infty$. Then

$$\frac{\varepsilon_n'A_n\varepsilon_n+\varepsilon_n'z_n-\sigma_n^2\mathrm{Tr}(A_n)}{\varpi_n\sqrt{n}}\to_dN(0,1)\tag{C.13}$$

as $n\to\infty$, if $\delta<1/2$.

C.2 Proofs of main theorems

Proof of Theorem 3.1

Consistency. We can rewrite $\varepsilon(\psi)$ as

$$\varepsilon(\psi)=\zeta(\psi)+\varepsilon+(\rho_0-\rho)G_0\varepsilon,\tag{C.14}$$

where $\zeta(\psi)=(\rho_0-\rho)G_0X\beta_0+X(\beta_0-\beta)$. Let $C=(c_{ij})=(B+B')/2$. The quadratic moment can be expressed as the sum of three terms,

$$n^{-1}\varepsilon'(\psi)B\varepsilon(\psi)=2h(\psi)+r(\rho)+u(\psi),$$

where

$$\begin{aligned}h(\psi)&=n^{-1}\zeta'(\psi)C\left[\varepsilon+(\rho_0-\rho)G_0\varepsilon\right],\\r(\rho)&=n^{-1}\left[\varepsilon+(\rho_0-\rho)G_0\varepsilon\right]'B\left[\varepsilon+(\rho_0-\rho)G_0\varepsilon\right],\\u(\psi)&=n^{-1}\zeta'(\psi)B\zeta(\psi).\end{aligned}$$

We will show in turn that $h(\psi)$, $r(\rho)$ and $u(\psi)$ converge uniformly in $\psi\in\Psi$.
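The quadratic-form moments of Lemma C.8, which drive variance bounds such as (C.16) in this proof, can be checked by Monte Carlo simulation. The sketch draws errors from a uniform distribution on $(-\sqrt3,\sqrt3)$, an assumption of the illustration chosen because it has $\sigma^2=1$ and $\mu_4=9/5\ne3\sigma^4$, so the $(\mu_4-3\sigma^4)\sum_ia_{ii}^2$ term is not zero. The matrix $A$, the seed and the number of replications are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 4, 400_000
A = rng.normal(size=(n, n))  # hypothetical, deliberately non-symmetric

# Uniform(-sqrt(3), sqrt(3)) errors: sigma^2 = 1 and mu_4 = 9/5.
half_width = np.sqrt(3.0)
sigma2, mu4 = 1.0, 9.0 / 5.0
eps = rng.uniform(-half_width, half_width, size=(reps, n))

q = np.einsum("ri,ij,rj->r", eps, A, eps)  # eps' A eps for each replication

mean_theory = sigma2 * np.trace(A)                                   # Lemma C.8 (i)
var_theory = ((mu4 - 3 * sigma2**2) * np.sum(np.diag(A) ** 2)
              + sigma2**2 * (np.trace(A @ A.T) + np.trace(A @ A)))   # Lemma C.8 (iii)

print(q.mean(), mean_theory)
print(q.var(), var_theory)
```

With Gaussian errors the first term of (iii) vanishes, so a non-Gaussian error distribution is what makes this check informative about the kurtosis correction.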
By expansion, $h(\psi)=h_1(\psi)+h_2(\psi)+h_3(\psi)+h_4(\psi)$, where

$$\begin{aligned}h_1(\psi)&=n^{-1}(\beta_0-\beta)'X'C\varepsilon,\\h_2(\psi)&=n^{-1}(\rho_0-\rho)\beta_0'X'G_0'C\varepsilon,\\h_3(\psi)&=n^{-1}(\rho_0-\rho)(\beta_0-\beta)'X'CG_0\varepsilon,\\h_4(\psi)&=n^{-1}(\rho_0-\rho)^2\beta_0'X'G_0'CG_0\varepsilon.\end{aligned}$$

Clearly, each $h_j(\psi)$, $j=1,2,3,4$, has zero mean owing to the independence between $X$ and $\varepsilon$, so it suffices to show that $\mathrm{Var}[h_j(\psi)]$ converges to zero; $h_j(\psi)=o_p(1)$ then follows from Chebyshev's inequality. The convergence is uniform since $h(\psi)$ is equicontinuous on a compact interval. Let $E(X\beta_0\beta_0'X')=\Sigma_\xi$ and $E[X(\beta_0-\beta)(\beta_0-\beta)'X']=\Sigma_\upsilon$, and note that the elements of $\Sigma_\xi$ and $\Sigma_\upsilon$ are uniformly bounded. First,

$$\mathrm{Var}[h_1(\psi)]=n^{-2}\sigma_0^2\mathrm{Tr}\left(C^2\Sigma_\upsilon\right)=\tfrac12n^{-2}\sigma_0^2\left[\mathrm{Tr}\left(B^2\Sigma_\upsilon\right)+\mathrm{Tr}\left(BB'\Sigma_\upsilon\right)\right].$$

Given that $\lVert B\rVert_\infty<K$, we have $\lVert B^2\rVert_\infty<K$ by Lemma C.2, and $\mathrm{Tr}(B^2\Sigma_\upsilon)=O(n)$ by Lemma C.1. Also, since $\lVert B\rVert_1=O(n^\delta)$, Lemma C.5 gives $\mathrm{Tr}(BB'\Sigma_\upsilon)=O(n^{\delta+1})$. Therefore $\mathrm{Var}[h_1(\psi)]=O(n^{\delta-1})$, and $h_1(\psi)$ converges to zero if $\delta<1$. Second,

$$\mathrm{Var}[h_2(\psi)]=n^{-2}(\rho_0-\rho)^2\sigma_0^2\mathrm{Tr}\left(G_0'C^2G_0\Sigma_\xi\right),$$

where $\mathrm{Tr}(G_0'C^2G_0\Sigma_\xi)=\mathrm{Tr}[C^2(G_0\Sigma_\xi G_0')]$. Since $\lVert G_0\rVert_\infty<K$ by Lemma C.4, the elements of $G_0\Sigma_\xi G_0'$ are uniformly bounded by Lemma C.2. The arguments used to establish the order of $\mathrm{Tr}(C^2\Sigma_\upsilon)$ then apply, giving $\mathrm{Tr}[C^2(G_0\Sigma_\xi G_0')]=O(n^{\delta+1})$. Thus $\mathrm{Var}[h_2(\psi)]=O(n^{\delta-1})$. Third,

$$\begin{aligned}\mathrm{Var}[h_3(\psi)]&=n^{-2}(\rho_0-\rho)^2\sigma_0^2\mathrm{Tr}\left(CG_0G_0'C\Sigma_\upsilon\right)\\&\le Kn^{-2}\left[\mathrm{Tr}(BG_0G_0'B\Sigma_\upsilon)+\mathrm{Tr}(B'G_0G_0'B\Sigma_\upsilon)+\mathrm{Tr}(BG_0G_0'B'\Sigma_\upsilon)+\mathrm{Tr}(B'G_0G_0'B'\Sigma_\upsilon)\right].\end{aligned}$$

Note that $\lVert BG_0\rVert_\infty<K$ by Lemma C.2, and the elements of $B\Sigma_\upsilon$ are bounded by Lemma C.1; hence applying Lemma C.5 leads to $\mathrm{Tr}(BG_0G_0'B\Sigma_\upsilon)=O(n^{\delta+1})$. Similarly, since the elements of $B\Sigma_\upsilon B'$ and $\Sigma_\upsilon B'$ are bounded, the remaining three traces are also of order $O(n^{\delta+1})$ by Lemma C.5. Thus $\mathrm{Var}[h_3(\psi)]=O(n^{\delta-1})$.
Finally,

$$\mathrm{Var}[h_4(\psi)]=n^{-2}(\rho_0-\rho)^4\sigma_0^2\mathrm{Tr}\left[CG_0G_0'C\left(G_0\Sigma_\xi G_0'\right)\right],$$

where the elements of $G_0\Sigma_\xi G_0'$ are bounded, and hence $\mathrm{Var}[h_4(\psi)]=O(n^{\delta-1})$ by arguments similar to those for $\mathrm{Var}[h_3(\psi)]$. This completes the proof that $h(\psi)$ converges to zero uniformly as long as $\delta<1$.

Turning now to $r(\rho)$, which can be expanded as

$$r(\rho)=n^{-1}\varepsilon'B\varepsilon+2n^{-1}(\rho_0-\rho)\varepsilon'G_0'C\varepsilon+n^{-1}(\rho_0-\rho)^2\varepsilon'G_0'BG_0\varepsilon,\tag{C.15}$$

to establish the uniform convergence of $r(\rho)$ it is sufficient to show that the following three terms converge uniformly to their respective means:

$$r_1(\rho)=n^{-1}\varepsilon'C\varepsilon,\qquad r_2(\rho)=n^{-1}\varepsilon'\left(G_0'C+CG_0\right)\varepsilon,\qquad r_3(\rho)=n^{-1}\varepsilon'G_0'CG_0\varepsilon.$$

For the first term, $E[r_1(\rho)]=n^{-1}\sigma_0^2\mathrm{Tr}(C)=0$, and by Lemma C.8,

$$\mathrm{Var}[r_1(\rho)]=n^{-2}\gamma_{2\varepsilon}\sum_{i=1}^{n}c_{ii}^2+2n^{-2}\sigma_0^4\mathrm{Tr}\left(C^2\right)\le Kn^{-2}\mathrm{Tr}\left(C^2\right),\tag{C.16}$$

where $\gamma_{2\varepsilon}=E(\varepsilon_i^4)-3\sigma_0^4$. Since $\mathrm{Tr}(C^2)=\frac12\left[\mathrm{Tr}(B^2)+\mathrm{Tr}(B'B)\right]$, where $\mathrm{Tr}(B'B)=O(n)$ and $\mathrm{Tr}(B^2)=O(n)$ by Lemma C.1, we have $\mathrm{Tr}(C^2)=O(n)$ and then $\mathrm{Var}[r_1(\rho)]=O(n^{-1})$. Therefore, by Chebyshev's inequality, $r_1(\rho)$ converges to zero for all values of $\delta$.

Next, $E[r_2(\rho)]=2n^{-1}\sigma_0^2\mathrm{Tr}(CG_0')$. Note that $G_0'$ has absolute column sums bounded by Lemma C.4, and the elements of $C$ are uniformly bounded in absolute value. Hence applying Lemma C.1 again leads to $E[r_2(\rho)]=O(1)$. For the variance of $r_2(\rho)$, similarly to (C.16),

$$\mathrm{Var}[r_2(\rho)]\le Kn^{-2}\mathrm{Tr}\left[\left(G_0'C+CG_0\right)^2\right]=2Kn^{-2}\left[\mathrm{Tr}\left(G_0'C^2G_0\right)+\mathrm{Tr}\left(CG_0CG_0\right)\right].$$

Expanding $C$ and rearranging gives

$$2\mathrm{Tr}\left(G_0'C^2G_0\right)=\tfrac12\mathrm{Tr}\left(BG_0BG_0\right)+\mathrm{Tr}\left(G_0BG_0B'\right)+\tfrac12\mathrm{Tr}\left[\left(B'G_0\right)^2\right],$$

and

$$2\mathrm{Tr}\left(CG_0CG_0\right)=\mathrm{Tr}\left(B^2G_0G_0'\right)+\mathrm{Tr}\left(G_0'BB'G_0\right).$$

Since the row norms of $B$ and $G_0$ are bounded, so are the row norms of $BG_0B$, $G_0BG_0$ and $B^2G_0$ by Lemma C.2. Applying Lemma C.1 then yields $\mathrm{Tr}(BG_0BG_0)=O(n)$, $\mathrm{Tr}(G_0BG_0B')=O(n)$ and $\mathrm{Tr}(B^2G_0G_0')=O(n)$. For the remaining two terms, Lemma C.5 gives $\mathrm{Tr}(G_0'BB'G_0)=O(n^{\delta+1})$ and $\mathrm{Tr}[(B'G_0)^2]=O(n^{\delta+1})$.
Thus $\mathrm{Var}[r_2(\rho)]=O(n^{\delta-1})$, and by Chebyshev's inequality $r_2(\rho)$ converges to its mean if $\delta<1$.

Moving on to $r_3(\rho)$, we have $E[r_3(\rho)]=n^{-1}\sigma_0^2\mathrm{Tr}(G_0'CG_0)=n^{-1}\sigma_0^2\mathrm{Tr}(G_0'BG_0)$, and

$$\mathrm{Var}[r_3(\rho)]\le Kn^{-2}\mathrm{Tr}\left[\left(G_0'CG_0\right)^2\right]=\tfrac12Kn^{-2}\left\{\mathrm{Tr}\left[\left(G_0'BG_0\right)^2\right]+\mathrm{Tr}\left(G_0'BG_0G_0'B'G_0\right)\right\}.$$

Since $BG_0$ has a bounded row norm, Lemma C.1 gives $\mathrm{Tr}(G_0'BG_0)=O(n)$, and Lemma C.5 gives $\mathrm{Tr}[(G_0'BG_0)^2]=O(n^{\delta+1})$ and $\mathrm{Tr}(G_0'BG_0G_0'B'G_0)=O(n^{\delta+1})$. Therefore $E[r_3(\rho)]=O(1)$ and $\mathrm{Var}[r_3(\rho)]=O(n^{\delta-1})$, and consequently $r_3(\rho)$ converges to its mean if $\delta<1$. In sum, we have established that $r(\rho)=o_p(1)$ provided that $\delta<1$, and the uniform convergence follows as $r(\rho)$ is equicontinuous on a compact set.

Similarly, it can be shown that $u(\psi)$ converges uniformly to its mean if $\delta<1$. The proof is analogous to that for $h(\psi)$ and is omitted to save space. For the linear moment,

$$n^{-1}\varepsilon'(\psi)Z=n^{-1}\zeta'(\psi)Z+n^{-1}\varepsilon'\left[I_n+(\rho_0-\rho)G_0'\right]Z,$$

where the second term is $o_p(1)$, since its mean is zero and its variance is given by

$$n^{-2}\sigma_0^2\mathrm{Tr}\left\{\left[I_n+(\rho_0-\rho)G_0\right]\left[I_n+(\rho_0-\rho)G_0'\right]\Sigma_{zz}\right\}=O(n^{\delta-1}),$$

where $\Sigma_{zz}=E(ZZ')$, and the order is obtained by applying Lemma C.5. Overall, we have established the uniform convergence of $g(\psi)$, which, together with the identification conditions, leads to the consistency of the GMM estimator $\tilde\psi$.

Asymptotic normality. By a mean-value expansion of $\frac{\partial g'(\tilde\psi)}{\partial\psi}\Xi\,g(\tilde\psi)=0$ around $\psi_0$, we obtain

$$\sqrt{n}\left(\tilde\psi-\psi_0\right)=-\left(\frac{\partial g'(\tilde\psi)}{\partial\psi}\,\Xi\,\frac{\partial g(\bar\psi)}{\partial\psi'}\right)^{-1}\frac{\partial g'(\tilde\psi)}{\partial\psi}\,\Xi\,\sqrt{n}\,g(\psi_0),$$

where $\bar\psi$ lies element by element between $\psi_0$ and $\tilde\psi$. Note that

$$\frac{\partial g(\psi)}{\partial\psi'}=-n^{-1}\left[2C\varepsilon(\psi),\,Z\right]'\left(y^*,X\right),$$

and since $y^*=G_0X\beta_0+G_0\varepsilon$, we have

$$n^{-1}\varepsilon'(\psi)Cy^*=n^{-1}\varepsilon'(\psi)CG_0X\beta_0+n^{-1}\varepsilon'(\psi)CG_0\varepsilon.$$
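Using the results established above, we have

$$\begin{aligned}n^{-1}\varepsilon'(\psi)CG_0X\beta_0&=n^{-1}\zeta'(\psi)CG_0X\beta_0+n^{-1}\varepsilon'CG_0X\beta_0+n^{-1}(\rho_0-\rho)\varepsilon'G_0'CG_0X\beta_0\\&=n^{-1}\zeta'(\psi)CG_0X\beta_0+o_p(1),\\n^{-1}\varepsilon'(\psi)CG_0\varepsilon&=n^{-1}\zeta'(\psi)CG_0\varepsilon+n^{-1}\varepsilon'CG_0\varepsilon+n^{-1}(\rho_0-\rho)\varepsilon'G_0'CG_0\varepsilon\\&=n^{-1}\sigma_0^2\mathrm{Tr}(CG_0)+n^{-1}\sigma_0^2(\rho_0-\rho)\mathrm{Tr}\left(G_0'CG_0\right)+o_p(1),\end{aligned}$$

as long as $\delta<1$, and consequently

$$n^{-1}\varepsilon'(\psi)Cy^*=n^{-1}\zeta'(\psi)CG_0X\beta_0+n^{-1}\sigma_0^2\mathrm{Tr}(CG_0)+n^{-1}\sigma_0^2(\rho_0-\rho)\mathrm{Tr}\left(G_0'CG_0\right)+o_p(1),$$

uniformly in $\psi\in\Psi$. At $\psi_0$ we have $\varepsilon(\psi_0)=\varepsilon$ and $\zeta(\psi_0)=0$, and it follows that $n^{-1}\varepsilon'Cy^*=n^{-1}\sigma_0^2\mathrm{Tr}(CG_0)+o_p(1)$ and

$$n^{-1}Z'y^*=n^{-1}Z'G_0X\beta_0+n^{-1}Z'G_0\varepsilon=n^{-1}Z'G_0X\beta_0+o_p(1),$$

if $\delta<1$. Thus $\partial g(\bar\psi)/\partial\psi'=-D_g+o_p(1)$, where $D_g$ is given by (3.14). Moreover, by Theorem C.1, $V_g^{-1/2}\sqrt{n}\,g(\psi_0)\to_dN(0,I_{k+1})$ if $\delta<1/2$, where $V_g$ is given by (3.14). Hence the asymptotic distribution of $\sqrt{n}(\tilde\psi-\psi_0)$ is as stated in the theorem.

Proof of Theorem 3.3

To establish the consistency and asymptotic distribution of the BMM estimators, we first note that under the spatial model (3.3) with $\theta=\theta_0$, and using (3.5), we have

$$y-\hat\rho y^*=-(\hat\rho-\rho_0)y^*+X\beta_0+\varepsilon,$$

and hence $M_x(y-\hat\rho y^*)=-(\hat\rho-\rho_0)M_xy^*+M_x\varepsilon$. Using these results, the estimating equations (3.22)–(3.24) can now be written as

$$\frac{y^{*\prime}X}{n}\left(\hat\beta-\beta_0\right)+\frac{y^{*\prime}y^*}{n}\left(\hat\rho-\rho_0\right)=\frac{y^{*\prime}\varepsilon}{n}-\hat\sigma^2\mathrm{Tr}\left[n^{-1}G(\hat\rho)\right],\tag{C.17}$$

$$\frac{X'X}{n}\left(\hat\beta-\beta_0\right)+\frac{X'y^*}{n}\left(\hat\rho-\rho_0\right)=\frac{X'\varepsilon}{n},\tag{C.18}$$

and

$$\hat\sigma^2-\sigma_0^2=\left(\frac{\varepsilon'M_x\varepsilon}{n}-\sigma_0^2\right)-2\left(\hat\rho-\rho_0\right)\frac{y^{*\prime}M_x\varepsilon}{n}+\left(\hat\rho-\rho_0\right)^2\frac{y^{*\prime}M_xy^*}{n}.\tag{C.19}$$

Also noting that $y^*=G_0X\beta_0+G_0\varepsilon$, where for simplicity we use $G_0$ in place of $G(\rho_0)$, we have

$$\begin{aligned}\frac{y^{*\prime}X}{n}&=\frac{\beta_0'X'G_0'X}{n}+\frac{\varepsilon'G_0'X}{n},\\\frac{y^{*\prime}y^*}{n}&=\frac{\beta_0'X'G_0'G_0X\beta_0}{n}+\frac{\varepsilon'G_0'G_0\varepsilon}{n}+2\frac{\varepsilon'G_0'G_0X\beta_0}{n},\\\frac{y^{*\prime}\varepsilon}{n}&=\frac{\beta_0'X'G_0'\varepsilon}{n}+\frac{\varepsilon'G_0'\varepsilon}{n},\\\frac{y^{*\prime}M_xy^*}{n}&=\frac{\beta_0'X'G_0'M_xG_0X\beta_0}{n}+\frac{\varepsilon'G_0'M_xG_0\varepsilon}{n}+2\frac{\varepsilon'G_0'M_xG_0X\beta_0}{n},\\\frac{y^{*\prime}M_x\varepsilon}{n}&=\frac{\beta_0'X'G_0'M_x\varepsilon}{n}+\frac{\varepsilon'G_0'M_x\varepsilon}{n}.\end{aligned}$$

Also, denoting $G(\hat\rho)$ by $\hat G$,

$$\begin{aligned}\hat\sigma^2\mathrm{Tr}\left(n^{-1}\hat G\right)-\sigma_0^2\mathrm{Tr}\left(n^{-1}G_0\right)={}&\left(\hat\sigma^2-\sigma_0^2\right)\mathrm{Tr}\left(n^{-1}G_0\right)+\sigma_0^2\left[\mathrm{Tr}\left(n^{-1}\hat G\right)-\mathrm{Tr}\left(n^{-1}G_0\right)\right]\\&+\left(\hat\sigma^2-\sigma_0^2\right)\left[\mathrm{Tr}\left(n^{-1}\hat G\right)-\mathrm{Tr}\left(n^{-1}G_0\right)\right].\end{aligned}\tag{C.20}$$

But

$$\begin{aligned}\hat G-G_0&=W(I_n-\hat\rho W)^{-1}-W(I_n-\rho_0W)^{-1}\\&=W(I_n-\hat\rho W)^{-1}\left[(I_n-\rho_0W)-(I_n-\hat\rho W)\right](I_n-\rho_0W)^{-1}\\&=(\hat\rho-\rho_0)W(I_n-\hat\rho W)^{-1}W(I_n-\rho_0W)^{-1}\\&=(\hat\rho-\rho_0)\hat GG_0.\end{aligned}\tag{C.21}$$

Hence $\hat G=G_0+(\hat\rho-\rho_0)\hat GG_0$, and using this result back in (C.21) yields

$$\hat G-G_0=(\hat\rho-\rho_0)\left[G_0+(\hat\rho-\rho_0)\hat GG_0\right]G_0=(\hat\rho-\rho_0)G_0^2+R_n(\hat\rho,\rho_0),$$

where $R_n(\hat\rho,\rho_0)=(\hat\rho-\rho_0)^2G(\hat\rho)G^2(\rho_0)$. But by Lemma C.4, $\lVert G(\rho)\rVert_\infty<K$, and considering only estimates of $\rho$ that satisfy $|\hat\rho|<1$, we have $\lVert R_n(\hat\rho,\rho_0)\rVert_\infty\le K|\hat\rho-\rho_0|^2$, and hence $E\left|n^{-1}\mathrm{Tr}[R_n(\hat\rho,\rho_0)]\right|\le KE|\hat\rho-\rho_0|^2$, which establishes that

$$n^{-1}\mathrm{Tr}\left(\hat G-G_0\right)=(\hat\rho-\rho_0)\mathrm{Tr}\left(n^{-1}G_0^2\right)+O_p\left[(\hat\rho-\rho_0)^2\right].\tag{C.22}$$

Using the results in Lemmas C.6 and C.7, it is now readily established that

$$\begin{aligned}&\frac{\varepsilon'G_0'X}{n}=O_p(n^{-1/2}),\quad\frac{\varepsilon'G_0'G_0X\beta_0}{n}=O_p(n^{-1/2}),\quad\frac{\varepsilon'G_0'M_xG_0X\beta_0}{n}=O_p(n^{-1/2}),\\&\frac{\varepsilon'G_0'G_0\varepsilon}{n}=\sigma_0^2\mathrm{Tr}\left(\frac{G_0'G_0}{n}\right)+O_p(n^{-1/2}),\quad\frac{\varepsilon'G_0'M_xG_0\varepsilon}{n}=\sigma_0^2\mathrm{Tr}\left(\frac{G_0'M_xG_0}{n}\right)+O_p(n^{-1/2}),\\&\frac{\varepsilon'G_0'\varepsilon}{n}=\sigma_0^2\mathrm{Tr}\left(\frac{G_0}{n}\right)+O_p(n^{-1/2}),\quad\frac{\varepsilon'G_0'M_x\varepsilon}{n}=\sigma_0^2\mathrm{Tr}\left(\frac{G_0M_x}{n}\right)+O_p(n^{-1/2}),\end{aligned}$$

and hence

$$\begin{aligned}&\frac{\varepsilon'M_x\varepsilon}{n}=\sigma_0^2+O_p(n^{-1/2}),\quad\frac{y^{*\prime}\varepsilon}{n}=\sigma_0^2\mathrm{Tr}\left(\frac{G_0}{n}\right)+O_p(n^{-1/2}),\quad\frac{y^{*\prime}M_x\varepsilon}{n}=\sigma_0^2\mathrm{Tr}\left(\frac{G_0M_x}{n}\right)+O_p(n^{-1/2}),\\&\frac{y^{*\prime}M_xy^*}{n}=\frac{\beta_0'X'G_0'M_xG_0X\beta_0}{n}+\sigma_0^2\mathrm{Tr}\left(\frac{G_0'M_xG_0}{n}\right)+O_p(n^{-1/2}).\end{aligned}$$

Using these results in (C.19) now yields

$$\hat\sigma^2-\sigma_0^2=\left[n^{-1}\varepsilon'M_x\varepsilon-\sigma_0^2\right]-2\sigma_0^2(\hat\rho-\rho_0)\mathrm{Tr}\left(n^{-1}G_0M_x\right)+O_p\left[(\hat\rho-\rho_0)n^{-1/2}\right]+O_p\left[(\hat\rho-\rho_0)^2\right].\tag{C.23}$$

Substituting this result and (C.22) in (C.20), and noting that $\mathrm{Tr}(n^{-1}G_0)<K$, we have

$$\begin{aligned}\hat\sigma^2\mathrm{Tr}\left(n^{-1}\hat G\right)-\sigma_0^2\mathrm{Tr}\left(n^{-1}G_0\right)={}&\mathrm{Tr}\left(n^{-1}G_0\right)\left[n^{-1}\varepsilon'M_x\varepsilon-\sigma_0^2\right]-2\sigma_0^2(\hat\rho-\rho_0)\mathrm{Tr}\left(n^{-1}G_0M_x\right)\mathrm{Tr}\left(n^{-1}G_0\right)\\&+\sigma_0^2(\hat\rho-\rho_0)\mathrm{Tr}\left(n^{-1}G_0^2\right)+O_p\left[(\hat\rho-\rho_0)^2\right]+O_p\left[(\hat\rho-\rho_0)n^{-1/2}\right].\end{aligned}\tag{C.24}$$

Using this result in (C.17) and rearranging, we have

$$n^{-1}y^{*\prime}X\left(\hat\beta-\beta_0\right)+h_{n,\rho\rho}(\hat\rho-\rho_0)=h_{n,\rho\varepsilon}+O_p\left[(\hat\rho-\rho_0)^2\right]+O_p\left[(\hat\rho-\rho_0)n^{-1/2}\right],\tag{C.25}$$

where

$$\begin{aligned}h_{n,\rho\varepsilon}&=n^{-1}\beta_0'X'G_0'\varepsilon+n^{-1}\varepsilon'\left[G_0'-M_x\mathrm{Tr}\left(n^{-1}G_0\right)\right]\varepsilon,\\h_{n,\rho\rho}&=n^{-1}y^{*\prime}y^*+\sigma_0^2\mathrm{Tr}\left(n^{-1}G_0^2\right)-2\sigma_0^2\mathrm{Tr}\left(n^{-1}G_0M_x\right)\mathrm{Tr}\left(n^{-1}G_0\right).\end{aligned}$$

Combining (C.25) and (C.18), we have

$$\begin{pmatrix}h_{n,\rho\rho} & \frac{y^{*\prime}X}{n}\\ \frac{X'y^*}{n} & \frac{X'X}{n}\end{pmatrix}\begin{pmatrix}\hat\rho-\rho_0\\ \hat\beta-\beta_0\end{pmatrix}=\begin{pmatrix}h_{n,\rho\varepsilon}\\ \frac{X'\varepsilon}{n}\end{pmatrix}+\begin{pmatrix}O_p\left[(\hat\rho-\rho_0)^2\right]+O_p\left[(\hat\rho-\rho_0)n^{-1/2}\right]\\ 0\end{pmatrix}.$$

It is also easily seen that

$$h_{n,\rho\rho}=n^{-1}\beta_0'X'G_0'G_0X\beta_0+n^{-1}\varepsilon'G_0'G_0\varepsilon+2n^{-1}\varepsilon'G_0'G_0X\beta_0+\sigma_0^2\mathrm{Tr}\left(n^{-1}G_0^2\right)-2\sigma_0^2\mathrm{Tr}\left(n^{-1}G_0M_x\right)\mathrm{Tr}\left(n^{-1}G_0\right),$$

or

$$h_{n,\rho\rho}=n^{-1}\beta_0'X'G_0'G_0X\beta_0+\sigma_0^2\mathrm{Tr}\left(n^{-1}G_0'G_0\right)+\sigma_0^2\mathrm{Tr}\left(n^{-1}G_0^2\right)-2\sigma_0^2\mathrm{Tr}\left(n^{-1}G_0M_x\right)\mathrm{Tr}\left(n^{-1}G_0\right)+O_p(n^{-1/2}).$$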
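The resolvent identity (C.21), its second-order expansion with remainder $R_n(\hat\rho,\rho_0)$, and the closed forms (C.1)–(C.2) of Lemma C.6 used in this proof can be checked numerically. (C.21) holds exactly for any $W$ for which both inverses exist; (C.1)–(C.2) hold exactly when $W$ is row-standardized, since then $W\tau_n=\tau_n$ and hence $G(\rho)\tau_n=\tau_n/(1-\rho)$. The random row-standardized $W$ and the values of $\rho_0$ and $\hat\rho$ below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n, rho0, rho_hat = 50, 0.4, 0.55  # hypothetical values

W = rng.uniform(size=(n, n))
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)  # row-standardized, so W @ tau = tau
I, tau = np.eye(n), np.ones(n)


def G(rho):
    """G(rho) = W (I_n - rho W)^{-1}."""
    return W @ np.linalg.inv(I - rho * W)


G0, G_hat = G(rho0), G(rho_hat)
diff = G_hat - G0
R_n = (rho_hat - rho0) ** 2 * G_hat @ G0 @ G0

print(np.allclose(diff, (rho_hat - rho0) * G_hat @ G0))         # (C.21)
print(np.allclose(diff, (rho_hat - rho0) * G0 @ G0 + R_n))      # expansion with R_n
print(np.isclose(tau @ G0 @ tau, n / (1 - rho0)))               # (C.1)
print(np.isclose(tau @ G0.T @ G0 @ tau, n / (1 - rho0) ** 2))   # (C.2)
```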
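Furthermore, using the results in Lemmas C.6 and C.7, we have

$$\begin{aligned}&\operatorname*{plim}_{n\to\infty}h_{n,\rho\rho}=h_{\rho\rho},\qquad\operatorname*{plim}_{n\to\infty}h_{n,\rho\varepsilon}=0,\qquad\operatorname*{plim}_{n\to\infty}\frac{X'\varepsilon}{n}=0,\\&\operatorname*{plim}_{n\to\infty}\frac{y^{*\prime}X}{n}=\beta_0'\operatorname*{plim}_{n\to\infty}\frac{X'G_0'X}{n}=\beta_0'\Sigma_{xgx},\qquad\operatorname*{plim}_{n\to\infty}\frac{X'X}{n}=\Sigma_{xx},\end{aligned}$$

where $h_{\rho\rho}$ is given by (3.29). Therefore, the BMM estimators are consistent if $H$, defined in (3.27), is a nonsingular matrix. In particular, under this condition $\hat\rho-\rho_0=O_p(n^{-1/2})$. To derive the asymptotic distribution of the BMM estimators, we first note that

$$\begin{pmatrix}h_{n,\rho\rho} & \frac{y^{*\prime}X}{n}\\ \frac{X'y^*}{n} & \frac{X'X}{n}\end{pmatrix}\begin{pmatrix}\sqrt{n}(\hat\rho-\rho_0)\\ \sqrt{n}\left(\hat\beta-\beta_0\right)\end{pmatrix}=\begin{pmatrix}\sqrt{n}\,h_{n,\rho\varepsilon}\\ \frac{X'\varepsilon}{\sqrt{n}}\end{pmatrix}+\begin{pmatrix}O_p\left[\sqrt{n}(\hat\rho-\rho_0)^2\right]+O_p\left(\hat\rho-\rho_0\right)\\ 0\end{pmatrix},\tag{C.26}$$

and

$$H\sqrt{n}\left(\hat\psi-\psi_0\right)=\begin{pmatrix}\sqrt{n}\,h_{n,\rho\varepsilon}+O_p(n^{-1/2})\\ \frac{X'\varepsilon}{\sqrt{n}}\end{pmatrix}.$$

Consider now

$$\sqrt{n}\,h_{n,\rho\varepsilon}=\frac{\beta_0'X'G_0'\varepsilon}{\sqrt{n}}+\frac{\varepsilon'\Pi\varepsilon}{\sqrt{n}},\tag{C.27}$$

where $\Pi=G_0-M_x\mathrm{Tr}(n^{-1}G_0)$. By Lemma C.4(ii), $G_0$ satisfies the conditions in (C.10) and (C.11). Therefore, Theorem C.1 is applicable and we have

$$\frac{\beta_0'X'G_0'\varepsilon}{\sqrt{n}}+\frac{\varepsilon'\Pi\varepsilon}{\sqrt{n}}\to_dN(0,\omega^2),\tag{C.28}$$

where $\omega^2$ is given by (3.28). Also, $X'\varepsilon/\sqrt{n}\to_dN(0,\sigma_0^2\Sigma_{xx})$. Hence the asymptotic distribution of $\hat\psi$ stated in Theorem 3.3 is established, which completes the proof.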
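A BMM estimate can be sketched from the estimating equations in the form used in (C.17)–(C.19). Equations (3.22)–(3.24) are not reproduced in this appendix, so the sketch assumes they amount to $n^{-1}X'(y-\rho y^*-X\beta)=0$, $\hat\sigma^2(\rho)=n^{-1}(y-\rho y^*)'M_x(y-\rho y^*)$ and $n^{-1}y^{*\prime}(y-\rho y^*-X\beta)=\hat\sigma^2(\rho)\mathrm{Tr}[n^{-1}G(\rho)]$ with $y^*=Wy$; these forms reproduce (C.17)–(C.19) after substituting $y=\rho_0y^*+X\beta_0+\varepsilon$. Concentrating out $\beta$ leaves the single equation $n^{-1}y^{*\prime}M_x(y-\rho y^*)=\hat\sigma^2(\rho)\mathrm{Tr}[n^{-1}G(\rho)]$, solved here by scanning a grid on $(-0.95,0.95)$ for sign changes and refining each with `scipy.optimize.brentq`. The trace is computed from the eigenvalues of $W$, which is valid only because the hypothetical ring network used here is symmetric; the design, parameter values and helper names are illustrative, not the author's implementation.

```python
import numpy as np
from scipy.optimize import brentq


def ring_weights(n, k=2):
    """Hypothetical symmetric ring network: k neighbours on each side, row-standardized."""
    W = np.zeros((n, n))
    for i in range(n):
        for s in range(1, k + 1):
            W[i, (i + s) % n] = 1.0
            W[i, (i - s) % n] = 1.0
    return W / W.sum(axis=1, keepdims=True)


def bmm_rho_roots(y, X, W, grid=None):
    """Roots in rho of the concentrated bias-corrected moment (sketch, see lead-in)."""
    n = len(y)
    if grid is None:
        grid = np.linspace(-0.95, 0.95, 191)
    y_star = W @ y
    lam = np.linalg.eigvalsh(W)  # real eigenvalues; requires a symmetric W

    def resid(v):
        """M_x v: residual from regressing v on X."""
        return v - X @ np.linalg.lstsq(X, v, rcond=None)[0]

    def moment(rho):
        u = resid(y - rho * y_star)                   # M_x (y - rho y*)
        sigma2 = u @ u / n                            # sigma_hat^2(rho)
        tr_G = np.sum(lam / (1.0 - rho * lam)) / n    # Tr[n^{-1} G(rho)]
        return y_star @ u / n - sigma2 * tr_G

    vals = [moment(r) for r in grid]
    roots = []
    for a, b, fa, fb in zip(grid[:-1], grid[1:], vals[:-1], vals[1:]):
        if fa == 0.0:
            roots.append(float(a))
        elif fa * fb < 0:
            roots.append(brentq(moment, a, b))
    return roots


# Simulated example with hypothetical parameter values.
rng = np.random.default_rng(4)
n, rho0, beta0 = 2000, 0.4, np.array([1.0, 2.0])
W = ring_weights(n)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = np.linalg.solve(np.eye(n) - rho0 * W, X @ beta0 + rng.normal(size=n))

roots = bmm_rho_roots(y, X, W)
rho_hat = roots[0]  # a single root is expected in this design
beta_hat = np.linalg.lstsq(X, y - rho_hat * (W @ y), rcond=None)[0]
print(roots, beta_hat)
```

All sign changes are returned because the concentrated equation is not guaranteed to have a unique solution in general; if more than one root appears, the choice among them needs further justification. For a non-symmetric $W$, the trace would have to be computed differently, for example from the general eigenvalues or directly from $G(\rho)$.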
Abstract
This dissertation contributes to the econometric analysis of cross-sectional dependence in the framework of factor, spatial and network analysis. Chapter 1 considers panel data models with cross-sectional dependence arising from both spatial autocorrelation and unobserved common factors. It derives identification conditions and proposes estimation methods that employ cross-sectional averages as factor proxies, including the 2SLS, Best 2SLS, and GMM estimators. The proposed estimators are robust to unknown heteroskedasticity and serial correlation in the errors, do not require the number of unknown factors to be estimated, and are computationally tractable. I establish the asymptotic distributions of these estimators and compare their consistency and efficiency properties. An empirical application finds strong evidence of spatial dependence of real house price changes across 377 Metropolitan Statistical Areas in the US from 1975Q1 to 2014Q4. The results also reveal that population and income growth have significantly positive direct and spillover effects on house price changes. Chapter 2 (co-authored with M. Hashem Pesaran) considers production and price networks with unobserved common factors, and derives an exact expression for the rate at which aggregate fluctuations vary with the dimension of the network. It introduces the notions of strongly and weakly dominant and non-dominant units, and shows that at most a finite number of units in the network can be strongly dominant. The pervasiveness of a network is measured by the degree of dominance of its most pervasive unit, which is shown to be equivalent to the inverse of the shape parameter of the power law fitted to the network outdegrees. New cross-section and panel extremum estimators for the degree of dominance of individual units in the network are proposed and their asymptotic properties investigated.
An application to US input-output tables spanning the period 1972 to 2007 suggests that no sector in the US economy is strongly dominant. The most dominant sector turns out to be wholesale trade, with an estimated degree of dominance ranging from 0.72 to 0.82. Estimation and inference in the spatial econometrics literature are carried out assuming that the matrix of spatial or network connections has uniformly bounded absolute column sums in the number of cross-section units, n. In Chapter 3, M. Hashem Pesaran and I consider spatial models where this restriction is relaxed. We begin by extending the GMM estimator introduced by Lee (2007) and deriving its asymptotic properties. We then propose a Bias-Corrected Method of Moments (BMM) estimator that avoids the problem of weak instruments by self-instrumenting the spatially lagged dependent variable and correcting the bias of the moment conditions. Both estimators are consistent and asymptotically normal under conditions on the rate at which the maximum column sum of the weights matrix rises with n. The estimation methods are applied to examine inflation spillovers across more than 300 industries in the US from 1997 to 2014.
Asset Metadata
Creator: Yang, Fan (author)
Core Title: Essays on the econometric analysis of cross-sectional dependence
School: College of Letters, Arts and Sciences
Degree: Doctor of Philosophy
Degree Program: Economics
Publication Date: 04/09/2019
Defense Date: 03/05/2018
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: aggregate fluctuations, common factors, cross-sectional dependence, degree of pervasiveness, GMM, house prices, input-output tables, OAI-PMH Harvest, outdegrees, power law, spatial models, strongly and weakly dominant units
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Pesaran, M. Hashem (committee chair), Hsiao, Cheng (committee member), Sun, Wenguang (committee member)
Creator Email: cynthia.dnb@gmail.com, yang488@usc.edu
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c89-6367
Unique identifier: UC11671395
Identifier: etd-YangFan-6187.pdf (filename), usctheses-c89-6367 (legacy record id)
Legacy Identifier: etd-YangFan-6187.pdf
Dmrecord: 6367
Document Type: Dissertation
Rights: Yang, Fan
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA