Three Essays on Linear and Non-linear Econometric Dependencies

by

Hayun Song

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(ECONOMICS)

May 2024

Copyright 2024 Hayun Song

Acknowledgments

I would like to express my deep gratitude to my main advisor, Hashem Pesaran, for his invaluable guidance, steadfast support, and faith in me. I am profoundly thankful for the opportunity to have learned from him and am eternally indebted for his mentorship. His dedication to nurturing my academic growth through countless hours spent assisting with my paper and further developing my research ideas has been instrumental. His vigilant oversight of my progress and gentle encouragement to excel have been fundamental to my development.

I am equally grateful to my dissertation committee members, Cheng Hsiao, Timothy Armstrong, and Gourab Mukherjee, for their encouragement and guidance. Their insightful feedback and suggestions have been crucial in refining my work and advancing my journey as a researcher. Special thanks to Simon Quach and Gareth James for their contributions to my Qualifying Exam. I extend a heartfelt appreciation to Youngmin Ju and Zhan Gao for their generous commitment to discussing my ideas and aiding in the development of my dissertation.

The journey through a PhD program transcends academic achievement. I am immensely thankful for the support from the staff at the Department of Economics, particularly Young Miller, Morgan Ponder, Alex Karnazes, and Annie Le; the staff at the Office of International Services; the Graduate School; the medical professionals at the Engemann Student Health Center; and the faculty at the American Language Institute.

The presence of wonderful friends has enriched the past six years. I have been fortunate to form life-changing friendships with Youngmin Ju, Junghyuk Lee, Zhan Gao, Bada Han, Jeehyun Ko, Eunjee Kwon, Bora Kim, Andrew Yimeng Xie, Mike Yinqi Zhang, Sheryl Weiran Deng, Qin Jiang, Ray Yiwei Qian, Lidan Tan, Yinan Liu, Kanika Aggarwal, Jaehong Kim, Dongwook Kim, Jeongwhan Yun, Chris Jeong Yoo, Jason Choi, Seungwoo Chin, Eunhae Shin, Hay Yeun Park, Minsoo Cho, Ida Johnson, Jisu Cao, Mahrad Sharifvaghefi, Brian Finley, Rachel Lee, Grigori Frangouridi, Rashad Ahmed, Jake Schneider, Chris Zhen Chen, Tal Roitberg, Richard Yejia Xu, Usman Ghaus, Dario Laudati, Weizhao Huang, Rihyun Park, Sangyoon Nam, Chang Li, Jin Seok Park, Minji Kwak, Jing Kong, Woo-Jin Kim, Yugen Chen, Paul Delatte, and many others. The bond formed with my cohort through the shared challenges of the first year and the core exams is unparalleled.

Living away from my home country for the first time, I was fortunate to find a sense of belonging within the Korean community in Los Angeles, with friends who share my language, thoughts, and tastes. Friends from outside the PhD program have been essential in maintaining a balanced life outside the academic bubble. I am particularly thankful to Dongwook Chae and others who made my stay in Los Angeles feel closer to home. I am also deeply appreciative of my friends from undergraduate days in Korea, Injae Yoo, Haeil Joe, Jeokin Mun, Sungee Lee, and many more, for their unwavering support and reminders of my roots and identity during critical moments.

Above all, my loving family has been my cornerstone.
I am eternally grateful for the unwavering support of my parents, Seonim Oh and Kyungjin Song, who encouraged me to pursue my dreams, despite the distance. My heartfelt thanks go to my incredible sister, Ahlyon Song, whose support and care for our parents have allowed me to focus wholeheartedly on my studies.

Table of Contents

Acknowledgments
List of Tables
List of Figures
Abstract

Chapter 1: Individual Heterogeneity in the Returns to Schooling: Instrumental Variable Quantile Regression Approach
  1.1 Introduction
    1.1.1 Previous work
  1.2 Theoretical Framework
    1.2.1 Model of returns to education
  1.3 Data Analysis
    1.3.1 Ability bias in twins
  1.4 Econometric Framework
    1.4.1 Average returns to education
    1.4.2 Quantile regression
    1.4.3 Instrument variable quantile regression
  1.5 Discussion and Interpretation
  1.6 Empirical Results
    1.6.1 Results of the homogeneous approach
    1.6.2 Heterogeneous approach
      1.6.2.1 Inferences
      1.6.2.2 The levels model
      1.6.2.3 The proxy model
  1.7 Other Results
    1.7.1 Ability bias and measurement error
    1.7.2 Estimation results for other covariates
  1.8 Conclusion

Chapter 2: Bayesian Dynamic Factor Augmented Structure Learning: Cross-sectional Dependence for Residuals
  2.1 Introduction
  2.2 Model
    2.2.1 Factor Identification
    2.2.2 Graphical VAR
  2.3 Bayesian Estimation
    2.3.1 Factor Estimation
    2.3.2 Objective Bayesian Inference
      2.3.2.1 Fractional Bayes Factor
    2.3.3 Graphical Estimation
    2.3.4 MCMC Algorithm
  2.4 Monte Carlo Simulation
    2.4.1 Simulation Design
    2.4.2 Simulation Result
  2.5 Empirical Applications
    2.5.1 U.S. Housing Prices
  2.6 Conclusion

Chapter 3: High-dimensional Bayesian Nonparanormal Dynamic Conditional Model With Multivariate Volatility Applications
  3.1 Introduction
  3.2 Bayesian Nonparanormal Dynamic Conditional Model
    3.2.1 Dynamic conditional framework
    3.2.2 Bayesian rank transformation and likelihood
    3.2.3 Using Gibbs sampling to obtain the unconditional precision matrix
  3.3 Bayesian nonparanormal dynamic conditional partial correlation – GARCH model estimation
    3.3.1 Bayesian GARCH(1,1) estimation
    3.3.2 Posterior computation
  3.4 Monte Carlo Experiments
    3.4.1 Simulation Results
      3.4.1.1 Monte Carlo design A
      3.4.1.2 Monte Carlo design B
  3.5 Empirical Applications
    3.5.1 Foreign Stock Price Indexes
    3.5.2 Daily Returns on Securities Selected from S&P 500
  3.6 Conclusions

Bibliography

Chapter A: Appendix to Chapter 1
  A.1 Kolmogorov-Smirnov Test Results
  A.2 Residual Plots for Checking Linearity
  A.3 Mean Estimates of the Levels and Proxy Models

Chapter B: Appendix to Chapter 2
  B.1 Discussion of Rescaled Spikes and Slab
    B.1.1 Rescaled Spike and Slab
    B.1.2 Asymptotic Properties of the Rescaled Spikes and Slab
    B.1.3 Consistency of Estimators
    B.1.4 Local Asymptotics of Estimators
  B.2 Proofs
    B.2.1 Proof of Theorem 1
    B.2.2 Proof of Theorem 2
    B.2.3 Proof of Theorem 3
    B.2.4 Derivation of Equation (2.3.5)
    B.2.5 Derivation of Equation (2.3.6)
    B.2.6 Derivation of Equation (2.3.7)
  B.3 Pervasiveness in the Cross-sectional Dependence
  B.4 Degree Distributions of the U.S. Housing Prices

Chapter C: Appendix to Chapter 3
  C.1 Appendix
    C.1.1 Algorithm 2: Sampling the transformed standardized residuals
    C.1.2 Algorithm 3: Sampling the sparse unconditional precision matrix
  C.2 Additional Empirical Results
    C.2.1 Foreign Stock Price Indexes
    C.2.2 Daily Returns on Securities Selected from S&P 500
  C.3 Extra Simulations
  C.4 List of Companies

List of Tables

1.1 Correlations Between Ability Bias, Education, and Other Variables
1.2 Mean Estimates of Returns to Education
1.3 Coefficients of Education from Quantile Regression (QR) and Instrumental Variable Quantile Regression (IVQR) in the Levels Model
1.4 Coefficients of Education from Quantile Regression (QR) and Instrumental Variable Quantile Regression (IVQR) in the Proxy Model
1.5 Location Shift Test Results of Other Covariates
2.1 Structural Hamming Distance (SHD)
3.1 Spectral and Frobenius norm losses for the different conditional precision and inverse correlation matrices estimators – Monte Carlo design A
3.2 Ratio of spectral and Frobenius norm loss averages for the different conditional precision matrix estimators and inverse correlation matrices estimators (DCC–NL and DCC–L models) – Monte Carlo design A
3.3 Ratio of spectral and Frobenius norm loss averages for the different conditional precision matrix estimators, P_t, and inverse correlation matrix estimators, S_t (Gaussian and t-Copula models) – Monte Carlo design A
3.4 Spectral and Frobenius norm losses for the different conditional precision and inverse correlation matrices estimators – Monte Carlo design B
3.5 Ratio of spectral and Frobenius norm loss averages for the different conditional precision matrix estimators and inverse correlation matrices estimators (DCC–NL and DCC–L models) – Monte Carlo design B
3.6 Ratio of spectral and Frobenius norm loss averages for the different conditional precision matrix estimators and inverse correlation matrices estimators (Gaussian and t-Copula models) – Monte Carlo design B
3.7 Descriptive Statistics of R̂_t and Ψ̂_t: Full Sample Periods (01.04.1991 – 08.31.2023)
3.8 Descriptive Statistics of the conditional correlations and conditional partial correlations: Six Market Disruption Periods
3.9 Rejection Frequencies of the Ĵ_α, GOS, and SW Tests
A.1 Kolmogorov-Smirnov Test Results
A.2 Mean Estimates of Returns to Education: Levels Model
A.3 Mean Estimates of Returns to Education: Proxy Model
B.1 Degree Distributions of the U.S. Housing Prices
C.1 Size of the SW(P) and other tests, GARCH(1,1) errors
C.2 Power of the SW(P) and other tests, GARCH(1,1) errors
C.3 List of Companies used in Section 3.5.2

List of Figures

1.1 Quantile Regression (QR) and Instrumental Variable Quantile Regression (IVQR) Results of the Levels Model: Schooling Coefficients
1.2 Quantile Regression (QR) and Instrumental Variable Quantile Regression (IVQR) Results of the Proxy Model: Schooling Coefficients
1.3 Comparison of Coefficients Across Models: Levels and Proxy Models with and without Instrumental Variable (IV)
1.4 Instrumental Variable Quantile Regression (IVQR) Results of the Proxy Model: Returns to Other Covariates
2.1 Posterior Inclusion Matrix and Corresponding Graph with Two Factors Case
2.2 Results with One Factor Case
2.3 Posterior Inclusion Matrix and Corresponding Graph with No Factor Case
2.4 Estimated Factors and Factor Loadings with No Factor Case
2.5 Estimated Factors × Factor Loadings
2.6 Posterior Inclusion Matrix and Corresponding Graph
3.1 Conditional (Partial) Correlations in Market Disruption Periods: 1997 – 2009
3.2 Conditional (Partial) Correlations in Market Disruption Periods: 2010 – 2023
A.1 Residual Plots of the Levels Model
C.1 Dynamic Conditional (Partial) Correlations: Foreign Stock Indexes 1
C.2 Dynamic Conditional (Partial) Correlations: Foreign Stock Indexes 2
C.3 Dynamic Conditional (Partial) Correlations: Blue Chips Stocks from S&P 500 A
C.4 Dynamic Conditional (Partial) Correlations: Blue Chips Stocks from S&P 500 B
C.5 Dynamic Conditional (Partial) Correlations: Blue Chips Stocks from S&P 500 C
C.6 Dynamic Conditional (Partial) Correlations: Blue Chips Stocks from S&P 500 D
C.7 Dynamic Conditional (Partial) Correlations: Blue Chips Stocks from S&P 500 E

Abstract

This dissertation comprises three scholarly articles that explore econometric dependencies in the real world, from both theoretical and empirical standpoints. These papers advance and broaden current methodologies for examining both linear and non-linear relationships among variables, emphasizing the underlying structure connecting them.

The first paper explores the impact of unobserved ability levels on individuals' earnings relative to their years of schooling. The paper's contribution to the literature is the use of instrumental variable quantile regression to analyze a sample of twins, distinguishing itself by accounting for both ability and measurement error biases. The findings reveal a variation in returns to education ranging from 9 to 15 percent, despite challenges related to weak identification. The analysis confirms significant heterogeneity in individual earnings outcomes, employing a general Wald-type location shift test to demonstrate the complementary effect of ability and schooling on earnings.
Additionally, the paper examines the influence of positive ability bias and negative measurement error, assesses the linear relationship of education, and analyzes the heterogeneity in returns associated with other factors, including age, race, gender, union membership, and tenure.

In the second paper, we propose a Bayesian approach to estimate dynamic factor-augmented Vector Autoregressive (VAR) models, allowing for the depiction of contemporaneous connections as a graphical representation of cross-sectional dependencies. Our approach starts with the estimation of unobserved factors through principal component analysis based on a predetermined number of factors, followed by the extraction of these factors using the Gibbs sampling method, particularly via the forward-filtering backward-sampling algorithm. After estimating the factors, we apply Bayesian graphical model selection to the residuals, ensuring that the estimated factors are accounted for within the graphical VAR model context. This process is facilitated by the use of the fractional Bayes factor, emphasizing graphical VAR models. We validate the effectiveness of our methodology through Monte Carlo simulations and apply it empirically to analyze the cross-sectional dependencies in housing prices across 384 Metropolitan Statistical Areas in the U.S.

The third paper proposes a Bayesian approach for the estimation of large conditional precision matrices, instead of inverting conditional covariance matrices estimated using, for example, the dynamic conditional correlations (DCC) approach. By adopting a Wishart distribution and horseshoe priors within a DCC–GARCH(1,1) model, our method imposes sparsity and circumvents the inversion of conditional covariance matrices. We also employ a nonparanormal method with rank transformation to allow for conditional dependence without estimating transformation functions to achieve Gaussianity. Monte Carlo simulations show that our approach is effective at estimating the conditional precision matrix, particularly when the number of variables (N) exceeds the number of observations (T). We investigate the utility of our proposed approach with two real-world applications: first, to study conditional partial correlations among international stock price indices; second, to test for α in the context of the CAPM and the Fama-French five-factor model, using a Wald-type test based on the conditional precision matrix. The results indicate stable conditional partial correlations through market disruptions. During market disruptions, daily returns on blue-chip stocks selected from the S&P 500 provide statistically significant evidence against the CAPM and the Fama-French five-factor model.

Chapter 1

Individual Heterogeneity in the Returns to Schooling: Instrumental Variable Quantile Regression Approach¹

¹ I am especially grateful to my adviser Hashem Pesaran, Cheng Hsiao, and Christopher Taber for their continuous advice and support. All mistakes are my own.

1.1 Introduction

Understanding the causal relationship between education and earnings represents a crucial inquiry within empirical labor economics. Numerous studies have demonstrated that individuals with higher levels of education tend to earn more and face lower unemployment rates. Nonetheless, accurately identifying this positive causal effect within empirical models has presented challenges. These include the complexity of earnings generation, measurement errors, and biases resulting from factors not observed in the models. Since the initial recognition of these challenges in the 1950s, there have been numerous efforts to overcome these obstacles and accurately quantify the relationship (Card, 1999).
This paper focuses on exploring the impact of unobserved abilities on the earnings of individuals, examining how variations in educational performance influence their income. We delve into the role of individual differences in shaping earnings and assess the feasibility of observing and testing such heterogeneity. To this end, we employ quantile and instrumental variable quantile regression techniques, as proposed by Chernozhukov and Hansen (2005, 2006), to capture the heterogeneity effectively.

In our study, we begin by adopting the average returns to education framework as suggested by Ashenfelter and Krueger (1994); Ashenfelter and Rouse (1998); Card (1994, 1999), and Bonjour et al. (2003), utilizing data from identical twins. Given that identical twins generally share the same genetic characteristics, it is expected that using fixed effects in twin regressions would control for ability bias, resulting in a lower coefficient if no variation exists within twins. Conversely, employing the first difference of cross-reporting on education as an instrument to adjust for measurement errors is anticipated to increase the coefficient, indicating a negative bias. In practice, we indeed observe a positive ability bias and a negative measurement error. Nevertheless, this approach does not account for individual heterogeneity, as it presupposes the absence of variation within twins. Consequently, to accurately capture individual heterogeneity, we utilize both quantile and instrumental variable quantile regression techniques.

The main advantage of adopting a quantile-based methodology² in our research is that it can reveal the variation in outcomes across different individuals. Therefore, we employ instrumental variable quantile regression. Because quantile regression cannot accommodate fixed effects, we incorporate a proxy variable to mitigate ability bias. Specifically, we select the father's years of schooling as this proxy, owing to its strong correlation with the twins' average educational attainment. Furthermore, we adjust for measurement errors by instrumenting the years of schooling in a manner consistent with our previous model. This strategy offers compelling evidence that challenges the notion of uniform returns to education, demonstrating that the impacts are significant and vary across different quantiles. Our analysis concentrates on six primary aspects throughout the paper.

² We refer to a quantile-based approach as the heterogeneous approach in this paper.

First, in analyzing the returns to education through quantile regression, it is essential to recognize that the coefficients represent the effect of education on the distribution of log earnings across different quantiles rather than individual-level changes. These coefficients illuminate how the returns to education vary across the earnings distribution, indicating that an individual's position within this distribution may shift with changes in education. Unlike in Ordinary Least Squares (OLS) regression, where coefficients denote average marginal effects, quantile regression coefficients reflect educational return disparities at specific points of the earnings distribution.
This distinction arises because the conditional quantile function, unlike the conditional expectation function, does not yield a straightforward marginal effect, making the coefficients indicative of the distribution-wide impact of education on earnings at various quantiles. Thus, interpreting these coefficients centers on understanding the heterogeneity in the economic returns to education, revealing the differential effect of educational attainment on earnings across the distribution.

Second, we utilize two distinct models under the heterogeneous approach to contrast the methodologies previously outlined. The first model, known as the "levels model," mirrors the pooled regression common in studying average returns to education. The second model, referred to as the "proxy model," aligns with the fixed-effects approach. Each model is subjected to three types of regression analyses to evaluate the dynamics of the coefficients: one focusing solely on education, another adding basic demographic factors (age, race, gender), and a comprehensive regression that includes a broad range of covariates (age, race, gender, marital status, union membership, and tenure). The proxy model uniquely adjusts for ability bias by incorporating the father's education level as a control. To demonstrate the heterogeneity of returns to education among individuals, we employ a general Wald-type location shift test and the Kolmogorov-Smirnov test for each model, which consistently indicate significant heterogeneity in the results. The table of the Kolmogorov-Smirnov tests is presented in Appendix A.

Third, our focus shifts to examining ability bias. We illustrate this by plotting the differences in coefficients between the levels model and the proxy model. These comparisons reveal the presence of a positive ability bias associated with education, particularly evident when the father's education is employed as a proxy for ability. A majority of the coefficients decrease in the proxy model, especially when the regression includes all covariates. The bias increases sharply, peaking at the 0.4 quantile of the distribution, although its magnitude differs across quantiles. Assuming the accuracy of this bias, these findings suggest that the estimated coefficients might represent the upper limit of the true values within the quantile framework.

Fourth, our analysis extends to the investigation of measurement errors. Both models underscore the presence of negative measurement errors in regressions that encompass all specifications. These errors are addressed through the use of an instrumental variable (IV). In the levels model, we observe a moderate negative bias in the coefficients. Conversely, when ability is controlled for via the proxy variable in the proxy model, the negative errors become more pronounced. Nevertheless, the proxy model displays a positive bias at higher quantiles, attributed to the less reliable estimates at the upper tails of the distribution. Although the magnitude of these errors varies across different quantiles, the consistency of this outcome aligns with findings from the homogeneous approach.

Fifth, we examine the assumption of linearity in schooling within the quantile regression framework. The models employed do not ensure that the relationship between schooling and earnings is linear.
This is particularly evident as additional covariates are included and as both measurement error and ability bias are accounted for, indicating that linearity does not persist in our model. Through residual analysis, we identify nonlinear relationships across various quantiles, with graphical representations provided in Appendix B.

Finally, to further investigate potential sources of heterogeneity that might influence our findings, we examine the quantile coefficients of the other covariates. Employing the same Wald-type location shift test for these additional covariates, we find that most exhibit heterogeneity, with the exceptions of age and marital status. Our analysis allows us to reject the null hypothesis for the majority of covariates included in both models. Specifically, for females, only the lower quantiles display mild heterogeneity, and the coefficients at other quantiles are statistically insignificant.

The structure of this paper is organized as follows: Section 2 and Section 3 delve into the theoretical and empirical frameworks utilized in this study, respectively, with a specific focus in Section 2 on examining the interplay between schooling, ability, and earnings. Section 4 describes the dataset, highlighting variations both between and within families, which underscores the rationale for investigating individual heterogeneity. Section 5 presents the empirical findings from both the levels and proxy models and discusses the implications of these results; this section also includes a comprehensive discussion of how to interpret these findings. The paper concludes with Section 6, which summarizes the study's key points and findings. Additionally, the appendix contains residual plots that are used to assess the assumption of linearity, the table of the Kolmogorov-Smirnov tests, and the complete OLS and IV regression results.

1.1.1 Previous work

The study of returns to education encounters numerous challenges, particularly when employing the OLS method for causal analysis. The fundamental assumption of mean independence in education returns is compromised in causal schooling models due to unobserved variables, notably "ability," which influences schooling, earnings, and other potential covariates. Additionally, the presence of measurement errors in schooling data can introduce bias into OLS estimates. To address these complexities, IV analysis is employed to estimate causal effects by using observable instruments for schooling. For an IV to be effective, it must not be correlated with the unobserved ability factors, yet it should have a direct impact on schooling decisions.

One innovative approach to instrumental variables involved using family background factors, such as the education levels of the mother and father. The rationale for this method is the strong correlation between these factors and educational attainment. However, family background factors cannot serve as legitimate instruments unless they account for all unobserved variables, which is practically unfeasible. As a result, they tend to introduce an upward bias in the estimation.

An alternative strategy explored by researchers is the use of sibling and twin models. The primary advantage of this method is its potential to mitigate the impact of unobservable variables present in cross-sectional analyses, particularly within the context of a family or between twins.
This approach yields a consistent estimator if the variation between families significantly outweighs the intra-family variation. If this condition is not met, the within-family estimator may fail to deliver accurate results, potentially resulting in greater bias than that observed with traditional OLS methods.

These challenges in identification highlight several key considerations. Firstly, the intricate relationship between ability and education, as analyzed by Card (1994) and Arias et al. (2013), warrants careful examination. The unobservable nature of ability impacts human capital formation in complex and undetermined ways, indicating that straightforward analyses of average returns to education fail to encompass the full spectrum of its causal effects on education and earnings. Secondly, the field faces methodological and empirical limitations, including the influence of unobserved ability, the non-random allocation of educational opportunities, and errors in measuring educational attainment (Ashenfelter and Krueger, 1994). These challenges have led to the adoption of instrumental quantile regression methods. This semi-parametric approach minimally constrains the crucial relationship of interest, proving instrumental in addressing the primary biases associated with estimating the returns to education.

In a related study, Buchinsky (2001) explores shifts within the wage structure for women, applying quantile regression methods to investigate variations in educational returns among females. This analysis incorporates several methodological innovations. It addresses sample selection bias through a non-parametric correction method, employing a two-stage process initially proposed by Heckman (1979) and subsequently refined by Newey (2009) for mean regression analysis. This approach provides a nuanced understanding of how the returns to education for women have evolved over time.

The model presented in this paper integrates elements from family background and sibling models, while also building upon the analysis by Arias et al. (2013), employing refined instrumental quantile regressions and broadened interpretations. This model takes into account several key considerations as identified by Ashenfelter and Rouse (1998): the influence of family background on the deterministic aspects of educational achievement, and the relationship between log earnings and schooling, as originally posited by Mincer (1974). However, Heckman et al. (2006) introduced modifications to the strict assumptions underlying Mincer's model, specifically the linearity between log earnings and schooling, and the uniformity of earnings growth with experience across different levels of education. They included factors such as school tuition fees, non-pecuniary costs of education, income taxes, and length of work experience to better estimate decision models before and after completing education. Their findings suggest that the characteristics central to Mincer's equation may not apply to more recent datasets, advocating for a relaxation of the linearity assumption and the inclusion of heterogeneity within the model. Consequently, our model aligns with the works of Ashenfelter and Rouse (1998); Card (1994, 1999), and Arias et al. (2013), incorporating insights from Heckman et al. (2006) and employing the heterogeneity-focused instrumental quantile regression approach as developed by Chernozhukov and Hansen (2005, 2006).
1.2 Theoretical Framework

1.2.1 Model of returns to education

Consider a utility maximization problem with respect to the level of schooling S, conditional on (1) observable individual- and family-specific characteristics X that affect the level of education, (2) other observed determinants of schooling or work that serve as instrumental variables Z, and (3) an unobserved idiosyncratic component related to A_S and containing other unobservable factors ξ that contribute to each individual's decision regarding education level. The variable A_S is an unobservable variable that is responsible for the heterogeneity of earnings across individuals and families having the same observable characteristics and education level. With the discrete levels of schooling described as 𝒮 = {0, 1, ..., S̄}, individuals maximize their utility as follows:

\[
S = \arg\max_{S \in \mathcal{S}} \mathbb{E}\left[\, U\{Y_S, S, X\} \mid X, Z, \xi \,\right]
  = \arg\max_{S \in \mathcal{S}} \mathbb{E}\left[\, U\{q(S, x, A_S(\xi, \theta)), S, X\} \mid X, Z, \xi \,\right],
\]

where U{Y_S, S, X} = ln Y_S − C(S, r(θ)), Y_S = q(S, x, A_S(ξ, θ)), and S = δ(Z, X, ξ). The term U{Y_S, S, X} is an unobservable Bernoulli utility function depending on potential earnings Y_S under the different levels of education S and other characteristics X. This term contains the log of potential earnings as well as C(S, r(θ)), which measures the direct and indirect costs of education determined by S and the individual's family-specific variable r(θ), which may include the family's financial situation or the family's education. The variable θ is an unobservable family factor that generates differences in earnings between individuals belonging to different families.³

³ The variable θ can contain human capital factors such as genetic differences, educational environment, quality of education, and labor market accessibility coming from the family (Arias et al., 2013).

Potential earnings can be expressed by the function q(S, x, A_S), which represents the earnings function generating earnings in the labor market, one for each individual having X = x and a given level of ability. Hence, it determines an individual's position in the distribution of potential earnings. In the context of quantile regression, it can be assumed that A_S(ξ, θ) ∼ Unif(0, 1), so that Y_S can also be written as q(S, x, τ), where τ is a quantile of the distribution. If S is endogenous, it can be captured by some function δ of the observables Z and X and the unobservable ξ. This determines the choice of S across similar individuals.

Assuming differentiability, the optimal choice of schooling can be found from the first-order condition (FOC), with the marginal benefit (MB) and marginal cost (MC) given X = x defined as:

\[
\mathrm{MB} = \frac{\partial}{\partial S}\ln Y_S = \{q(S, x, A_S)\}^{-1}\,\frac{\partial}{\partial S}\,q(S, x, A_S),
\qquad
\mathrm{MC} = \frac{\partial}{\partial S}\,C(S, r(\theta)).
\]

Then the optimal level of S satisfies MB = MC and is a unique maximum if the utility function is globally concave in S. As noted above, MB captures the effect of schooling on the log of potential earnings through the earnings function q. It also implicitly contains A_S, giving the individual-specific effect on potential earnings given S. Similarly, MC identifies the pecuniary and non-pecuniary cost of an extra year of schooling, since schooling is usually reported on a yearly basis. Based on these conditions, as suggested by Card (1999) and Arias et al. (2013), I identify the composition of variations in the effect of schooling between individuals. It provides the characteristics of heterogeneity described in Section 1.6.
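To make the FOC concrete, consider a minimal parametric illustration; the log-linear earnings function and quadratic cost below are assumed functional forms chosen for tractability, not the specification used in this paper:

```latex
% A minimal parametric illustration (assumed functional forms, not the paper's model).
% Suppose earnings are log-linear in schooling with an ability-dependent slope, and
% costs are quadratic:
%   q(S, x, A_S) = \exp\{\beta_0(x) + (\beta_1 + \beta_2 A_S)\, S\},
%   C(S, r(\theta)) = \tfrac{c(\theta)}{2}\, S^2 .
\begin{align*}
  \mathrm{MB} &= \frac{\partial}{\partial S}\ln Y_S = \beta_1 + \beta_2 A_S, &
  \mathrm{MC} &= \frac{\partial}{\partial S} C(S, r(\theta)) = c(\theta)\, S, \\
  \mathrm{MB} &= \mathrm{MC} \;\Longrightarrow\; S^{*} = \frac{\beta_1 + \beta_2 A_S}{c(\theta)}, &
  \frac{\partial \mathrm{MB}}{\partial A_S} &= \beta_2 .
\end{align*}
```

In this sketch, β₂ = 0, β₂ > 0, and β₂ < 0 correspond, respectively, to the perfect-substitutes, complements, and substitutes cases discussed next, while a lower schooling cost c(θ) raises the optimal schooling level S*.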
First, since individuals do not all have the same ability level, it is reasonable to attribute part of the differences to heterogeneity in A_S. If the function is differentiable at a given level of S, then, given S and X and assuming q(S, x, A_S) ≠ 0, the change in MB with respect to A_S is

\[
\frac{\partial^2 \ln Y_S}{\partial S\,\partial A_S}
= \frac{\partial \mathrm{MB}}{\partial A_S}
= \frac{\partial^2 q(S, x, A_S)}{\partial S\,\partial A_S}\,\{q(S, x, A_S)\}^{-1}
- \frac{\partial q(S, x, A_S)}{\partial S}\,\frac{\partial q(S, x, A_S)}{\partial A_S}\,\{q(S, x, A_S)\}^{-2}.
\]

This equation describes the impact of variations in A_S on the influence of S on earnings. It further clarifies the aspects of heterogeneity, as outlined in Implications 1, 2, and 3.

Implication 1. (Perfect Substitutes) If ∂MB/∂A_S = 0, variations in A_S, the unobserved idiosyncratic component, have no impact on the marginal benefit of schooling with respect to earnings. In this context, S and A_S are perfect substitutes, implying the absence of heterogeneity across individuals in the effect of education on their earnings.⁴

⁴ In the context of a simple linear model with perfect substitution, represented by ax + by = c for a constant c, the first term reduces to zero, and the second term is absent.

Implication 2. (Complements) If ∂MB/∂A_S > 0, an increase in A_S leads to higher marginal benefits from acquiring an additional unit of education. In this context, individuals with a higher A_S, reflecting greater ability, can achieve enhanced earnings by pursuing further education. Specifically, a person with a high A_S is capable of learning material more efficiently than others. Consequently, this individual can assimilate more information in the same amount of time, potentially yielding a positive impact on future earnings through the increased acquisition of knowledge and skills.

Implication 3. (Substitutes) If ∂MB/∂A_S < 0, individuals with a lower A_S derive greater marginal benefits from an additional unit of schooling than those with a higher A_S. Additionally, the education system may impose limitations by only offering education up to a certain level or restricting access beyond that level. Such a limitation might hinder highly capable individuals from accessing better information.

These cases illustrate the existence of heterogeneity in the effects of education.

Second, family background constitutes another factor contributing to individual differences. By examining the variable θ, it is possible to identify variations in returns attributable to family-specific characteristics. This relationship is expressed as follows:

\[
\frac{\partial^2 U}{\partial S\,\partial \theta}
= \frac{\partial^2 q(S, x, A_S)}{\partial S\,\partial A_S}\,\frac{\partial A_S}{\partial \theta}\,\{q(S, x, A_S)\}^{-1}
- \frac{\partial q(S, x, A_S)}{\partial S}\,\frac{\partial q(S, x, A_S)}{\partial A_S}\,\frac{\partial A_S}{\partial \theta}\,\{q(S, x, A_S)\}^{-2}
- \frac{\partial^2 C(S, r(\theta))}{\partial S\,\partial r}\,\frac{\partial r(\theta)}{\partial \theta}.
\]

The term, in its entirety, quantifies the variation in the marginal utility of education across different families. The first two terms represent the influence of θ on the MB of education via the ability A_S, capturing the component of the effect on ∂MB/∂A_S attributable solely to θ. The last term denotes the variation in MC as influenced by the unobserved family characteristic θ, mediated through the family-specific factor r. It is generally plausible to assume ∂²C(S, r(θ))/∂S∂r < 0, particularly because families of higher wealth or education levels are likely to face lower opportunity costs of education.
Despite the ambiguous direct relationship between r(θ) and θ, positing a positive correlation is not unrealistic: should θ reflect genetic factors or the quality of education, it may positively affect r. Under these premises, and considering that S and A_S are not perfect substitutes, similar inferences can be applied to the relationship between S and θ. Implications 4 and 5 detail the consequences of this relationship.

Implication 4. (Negative Bias) If ∂²U/∂S∂θ < 0, the ability bias described in Implication 2 is pushed in the negative direction. Consequently, education disproportionately benefits individuals with less favorable characteristics by providing them with relatively higher compensation.

Implication 5. (Positive Bias) If ∂²U/∂S∂θ > 0, the ability bias identified in Implication 3 is pushed in the positive direction. Under these circumstances, individuals possessing superior ability would benefit from enhanced returns on their schooling.

The overall direction of the bias is determined by the magnitude of each contributing term.

The third aspect involves variation due to an unobservable individual-specific term, ξ. In this context, ξ represents a deterministic factor influencing both S, through the function δ, and Y_S, via A_S. For instance, ξ accounts for differences in educational level among twins within the same family, highlighting that the allocation of education levels among twins is not random. To ensure the estimator's consistency, it is presumed that this variation does not alter the optimal S when considering the average returns to education. The second methodology, in contrast, does not adhere to this assumption.

1.3 Data Analysis

The data used in this paper were collected at the Annual Twins Festival in Twinsburg, Ohio. In particular, this is the same data set used by Ashenfelter and Rouse (1998), and it contains 680 identical twins. Judging by the representativeness checks in Ashenfelter and Krueger (1994), this sample of twins is more educated and more highly paid than the general population; it is also younger and includes more Caucasians and females.

1.3.1 Ability bias in twins

To see the ability bias in twins, we investigate the correlation between average family education and average family characteristics. We also check the correlation of within-twin differences in education with within-twin differences in characteristics (Ashenfelter and Rouse, 1998; Bonjour et al., 2003). The SR column of Table 1.1 shows the correlation between the average of self-reported education (SR) and family characteristics. These family characteristics include marital status (Married), self-employment (Employed), union membership (Union), years of job tenure (Tenure), father's education (Father's), and mother's education (Mother's). Each correlation between self-reported education and these characteristics is statistically significant except for self-employment.

Table 1.1: Correlations Between Ability Bias, Education, and Other Variables

Twin's average:
| Variable  | SR        | CR        |
| Married   | -0.292*** | -0.293*** |
| Employed  |  0.036    |  0.019    |
| Union     | -0.065*   | -0.083**  |
| Tenure    | -0.137*** | -0.136*** |
| Father's  |  0.380*** |  0.386*** |
| Mother's  |  0.291*** |  0.295*** |

Twin's within-difference:
| Variable  | ∆SR    | ∆CR       |
| ∆Married  | -0.057 | -0.111*** |
| ∆Union    | -0.053 | -0.052    |
| ∆Tenure   | -0.027 | -0.037    |

Notes: *** indicates statistical significance at the 1% level, ** at the 5% level, and * at the 10% level. The correlation between the average self-reported education (SR) and the average cross-reported education (CR) is 0.967. The correlation between the within-twin differences in self-reported education (∆SR) and cross-reported education (∆CR) is 0.639.
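Correlations of this kind are straightforward to reproduce; a minimal sketch is given below, assuming a wide-format twins data set with one row per pair (the file name and the column names `educ_sr_1`, `married_1`, and so on are hypothetical, not the variable names in the actual survey):

```python
import pandas as pd

# Hypothetical wide-format data: one row per twin pair j, with self-reported (sr)
# education and characteristics for twins 1 and 2.
df = pd.read_csv("twinsburg.csv")  # assumed file

# Twin's average: correlate average self-reported education with average characteristics.
avg_educ_sr = (df["educ_sr_1"] + df["educ_sr_2"]) / 2
avg_married = (df["married_1"] + df["married_2"]) / 2
print(avg_educ_sr.corr(avg_married))   # analogue of the Married row, SR column

# Twin's within-difference: correlate within-pair differences.
d_educ_sr = df["educ_sr_1"] - df["educ_sr_2"]
d_married = df["married_1"] - df["married_2"]
print(d_educ_sr.corr(d_married))       # analogue of the ∆Married row, ∆SR column
```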
The ∆SR column of Table 1.1 shows the insignificant correlations between within-pair differences in education and within-pair differences in characteristics. From this comparison, ability bias has more of an effect on inter-family differences in education than on within-twin differences in education. Although these characteristics are not perfect measures of ability, the results suggest that what is identified could be considered the "average" return to schooling.

However, correlations between cross-reported education and the other characteristics give slightly different results. As can be seen in the CR column of Table 1.1, union membership is no longer significant. Moreover, the ∆CR correlation for ∆Married is statistically significant, unlike its ∆SR counterpart. Considering that both self-reports and cross-reports are measured with reporting error, and that the two are highly correlated, this significant result is not an artifact of simple measurement error. Even though the correlation is small, and inter-family differences are more affected by ability bias, this may indicate the existence of within-twin variation. This finding provides motivation for taking heterogeneity into account.

1.4 Econometric Framework

In this section, the estimation procedures used to implement the model of the previous section are described. The procedure is as follows: first, the usual average returns to education framework suggested by Ashenfelter and Krueger (1994); Ashenfelter and Rouse (1998); Card (1994, 1999), and Bonjour et al. (2003) is applied to the twins data set. Then, in order to capture the heterogeneity in the data, quantile regression (QR) and instrumental variable quantile regression (IVQR) are performed.

1.4.1 Average returns to education

Ashenfelter and Krueger (1994) identified a classical measurement error in their model, formulated as

\[
S_{ij}^{k} = S_{ij}^{*} + \eta_{ij}^{k}, \qquad i \in \{1, 2\},\; k \in \{1, 2\},
\]

where S^k_ij represents the reported schooling of the i-th twin in the j-th family as provided by the k-th twin, S*_ij denotes the actual schooling of the i-th twin in the j-th family, and η^k_ij signifies the corresponding i.i.d. measurement error. In addition, Ashenfelter and Rouse (1998); Arias et al. (2013), and Bonjour et al. (2003) assumed that within-twin variation does not affect the optimal level of schooling by defining S^opt_ij = S^opt_j + u_ij, where u_ij is an i.i.d. optimization or measurement error with mean zero, independent of S^opt_j and ξ_ij. We assume that A_j is the same for twins from the j-th family.⁵

⁵ The notation A_j corresponds to A_S, and Y_ij corresponds to Y_S, with the exception that A_S remains unchanged by ξ, signifying that S is invariant under the influence of ξ.

Therefore, without assuming heterogeneity (α_j = α), we consider the following structural model for the i-th twin in the j-th family:

\[
\ln(Y_{ij}) = \alpha_0 + S_{ij}^{*}\alpha_1 + Z_{ij}'\gamma + X_j' d + A_j\pi + \xi_{ij}, \tag{1.4.1}
\]

where (α₀, α₁, γ, d, π) are the corresponding coefficients and Z_ij the observed independent variables. Then we have

\[
\begin{aligned}
\ln(Y_{1j}) - \ln(Y_{2j})
&= \left(S_{1j}^{*} - S_{2j}^{*}\right)\alpha_1 + (Z_{1j} - Z_{2j})'\gamma + (\xi_{1j} - \xi_{2j}) \\
&= \left(S_{1j}^{1} - S_{2j}^{2}\right)\alpha_1 + (Z_{1j} - Z_{2j})'\gamma + (\xi_{1j} - \xi_{2j}) - \left(\eta_{1j}^{1} - \eta_{2j}^{2}\right)\alpha_1 .
\end{aligned}
\]

To address the measurement error in (1.4.1), it is necessary to employ an IV. Using (S²₁ⱼ − S¹₂ⱼ) as an IV requires the strong assumptions E[η²₁ⱼη¹₁ⱼ] = E[η¹₂ⱼη²₂ⱼ] = E[η²₁ⱼη²₂ⱼ] = E[η¹₁ⱼη²₂ⱼ] = 0 to obtain a consistent estimator. The first two assumptions, however, are implausible.⁶

⁶ Note that Cov(S²₁ⱼ, S¹₁ⱼ) = Cov(S*₁ⱼ, S*₁ⱼ) + Cov(η²₁ⱼ, η¹₁ⱼ) = Var(S*₁ⱼ) + Cov(η²₁ⱼ, η¹₁ⱼ) > 0, which parallels Cov(S¹₂ⱼ, S²₂ⱼ).

To address this issue, the regression must be conducted using the schooling difference as reported by twin 1, instrumented by the difference S²₁ⱼ − S²₂ⱼ reported by twin 2. This analysis is formulated as

\[
\ln(Y_{1j}) - \ln(Y_{2j}) = \left(S_{1j}^{1} - S_{2j}^{1}\right)\alpha_1 + (Z_{1j} - Z_{2j})'\gamma + (\xi_{1j} - \xi_{2j}) - \left(\eta_{1j}^{1} - \eta_{2j}^{1}\right)\alpha_1 . \tag{1.4.2}
\]

A regression as described in Equation (1.4.2) provides a consistent estimator under the weaker assumptions E[η²₁ⱼη²₂ⱼ] = E[η¹₁ⱼη²₂ⱼ] = 0.

Upon following this analysis, it becomes pertinent to consider relaxing the no-heterogeneity assumption. Several papers have endeavored to account for individual heterogeneity. Card (1994, 1999) pioneered the investigation into proving the existence of individual heterogeneity. Concurrently, Ashenfelter and Rouse (1998) endeavored to quantify the effect by modifying α to α_j = α₀ + α₁A_j, wherein α₁ is adjusted to account for an unobserved ability bias. This adjustment was aimed at capturing variations in family background that could influence ability. Both approaches were integrated into the framework of fixed-effects equations. However, this paper proposes an alternative methodology to capture the heterogeneity.
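As a concrete illustration of the within-pair IV regression in (1.4.2), the sketch below implements textbook two-stage least squares with numpy; the arrays `dy` (within-pair log earnings differences), `dS1` (twin 1's reported schooling difference), and `dS2` (twin 2's reported difference, used as the instrument) are hypothetical stand-ins for the twins data, and the simulated example only demonstrates that the IV removes attenuation bias:

```python
import numpy as np

def tsls(y, x_endog, z_instr):
    """Two-stage least squares for a single endogenous regressor (with constant)."""
    n = len(y)
    const = np.ones(n)
    W = np.column_stack([const, z_instr])  # instrument set
    X = np.column_stack([const, x_endog])  # regressors
    # First stage: project the regressors on the instruments.
    X_hat = W @ np.linalg.lstsq(W, X, rcond=None)[0]
    # Second stage: regress y on the fitted values.
    beta = np.linalg.lstsq(X_hat, y, rcond=None)[0]
    return beta  # beta[1] is the IV estimate of alpha_1

# Hypothetical example with simulated data in place of the twins sample.
rng = np.random.default_rng(0)
n, alpha1 = 300, 0.09
dS_true = rng.normal(0, 2, n)                 # true within-pair schooling difference
dy = alpha1 * dS_true + rng.normal(0, 0.3, n)
dS1 = dS_true + rng.normal(0, 1, n)           # twin 1's reports, with error
dS2 = dS_true + rng.normal(0, 1, n)           # twin 2's reports, independent error
print(tsls(dy, dS1, dS2))                     # slope should be close to 0.09
```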
1.4.2 Quantile regression

The usefulness of quantile regression lies in its ability to capture heterogeneous effects and to explain unobserved heterogeneity. The standard quantile regression can be described in this context as follows. Recall from the theoretical framework that q(S, x, A_S) ≡ q(S, x, τ), where τ ∼ Unif(0, 1). Estimation then becomes a minimization problem for the conditional quantile function (CQF):

\[
Q_{\ln(Y_S)\mid X}(\tau) \in \arg\min_{q(S,X,\tau) \in \mathcal{F}} \mathbb{E}\left[\rho_\tau\left(Y - q(S, X, \tau)\right) \mid X\right],
\qquad \tau = \Pr\left(Y < q(S, X, \tau) \mid X\right),
\]

where ρ_τ(u) = (τ − 1(u ≤ 0))·u is the asymmetric check function, equal to τu for u > 0 and (τ − 1)u for u ≤ 0, and 𝓕 is a class of measurable functions. In practice, a linear conditional quantile model is estimated:

\[
Q_{\ln(Y)\mid X}(\tau) = S'\alpha(\tau) + X'\beta(\tau) + F'\gamma(\tau), \tag{1.4.3}
\]
\[
\tau = \Pr\left(Y < S'\alpha(\tau) + X'\beta(\tau) + F'\gamma(\tau) \mid X\right),
\]

where F is a proxy variable for unobservable family ability among the regressors (Heckman and Robb Jr, 1985; Uusitalo et al., 1998; Arias et al., 2013), α(τ) is defined as the quantile treatment effect (QTE), and β(τ) and γ(τ) are the corresponding coefficients. I discuss the interpretation of these coefficients in Section 1.5. As noted, this model is not adequate to correct the selection bias arising from ability, which can be correlated with both earnings and schooling. The use of proxy variables captures, at best, only part of the individual variation, and, in fact, the proxy for the family effect is not strong enough. This prompts the use of IVQR.
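For reference, the linear quantile model in (1.4.3) can be estimated with standard software; a minimal sketch using statsmodels' QuantReg is shown below. The use of father's education as the ability proxy F follows the paper's proxy model, but the data file and column names are hypothetical:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.regression.quantile_regression import QuantReg

# Hypothetical data frame with earnings, schooling, covariates, and the
# father's-education proxy for family ability.
df = pd.read_csv("twins.csv")  # assumed file
y = np.log(df["earnings"])
X = sm.add_constant(df[["schooling", "age", "female", "white", "father_educ"]])

# Minimize the check-function objective at several quantiles.
for tau in [0.1, 0.25, 0.5, 0.75, 0.9]:
    res = QuantReg(y, X).fit(q=tau)
    print(f"tau={tau:.2f}  schooling coef={res.params['schooling']:.4f}")
```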
1.4.3 Instrument variable quantile regression

The IVQR method, proposed by Chernozhukov and Hansen (2005, 2006), offers a solution to the limitations of traditional QR. While IVQR shares the structural framework of QR, it introduces a distinct estimation approach. Chernozhukov and Hansen establish a theorem which, under five main conditions,⁷ ensures that

\[
\Pr\left(Y < q(S, X, \tau) \mid X, Z\right) = \tau \quad \text{a.s., for all } \tau \in (0, 1).
\]

⁷ The core assumptions are (1) potential outcomes, (2) independence, (3) selection, (4) rank similarity, and (5) observed variables. See Chernozhukov and Hansen (2005) for a detailed exposition of these assumptions.

Rewriting for clarity, for each τ,

\[
\Pr\left(Y - q(S, x, \tau) < 0 \mid X, Z\right) = \tau \quad \text{a.s.},
\qquad
Q_{Y - q(S,x,\tau)}(\tau \mid X, Z) = 0 \quad \text{a.s.},
\]

indicating that, conditional on X and Z, 0 is the τ-th quantile of the random variable Y − q(S, x, τ). Consequently, the QR minimization problem transforms into finding 0 within the following:

\[
0 \in \arg\min_{f \in \mathcal{F}} \mathbb{E}\left[\rho_\tau\left(Y - q(S, X, \tau) - f(X, Z)\right)\right].
\]

The coefficient estimation unfolds in two steps, using a sample analogue of the minimization problem with the linear conditional quantile model in (1.4.3):

\[
Q_n(\tau, \alpha, \beta, \gamma, \pi) \equiv \frac{1}{n}\sum_{i=1}^{n} \rho_\tau\!\left(\ln(Y_i) - S_i'\alpha(\tau) - X_i'\beta(\tau) - \hat{\Phi}_i'(\tau)\pi\right), \tag{1.4.4}
\]

where Φ̂_i(τ) ≡ Φ̂(τ, X_i, Z_i) is the least-squares projection of S_i on Z_i and X_i, of dimension [dim(α) × 1], and X_i contains the observable variables, including F_i. The model is then solved in the following two steps.

Step 1: For each given probability index τ, define a grid {α_k, k = 1, ..., K̄} and run the standard τ-QR of {ln(Y_i) − S_i'α_k(τ)} on X_i and Φ̂_i(τ) = W(W'W)⁻¹W'S,
This assumption neglects individual differences in response, as noted by Heckman and Vytlacil (2005). Interpreting the LATE becomes straightforward if both the endogenous variable and the instrument are categorical. However, difficulties arise when dealing with continuous variables like education levels, which are typically integers, and instruments that may also be discrete. Understanding what LATE precisely signifies can be challenging. Following the logic of Heckman et al. (2006), it could be considered an unidentified weighted average of the individual effect (causal effect) on earnings growth. In this study, we do not rely on LATE for interpretation purposes because we use the IV approach primarily to correct for measurement errors. Needless to say, the challenges encountered in interpretation are even more pronounced with QR and IVQR methods. The coefficient in QR is often interpreted as Structural Quantile Effects (SQE) or Quantile Treatment Effects (QTE), represented as ∂ ∂dq (d, τ ) or q (d1, τ ) − q (d0, τ ), where q(d, τ ) refers to a Conditional Quantile Function (CQF). Despite appearing similar to the traditional Conditional Expectation Function (CEF), these functions cannot be interpreted in the same manner due to specific complexities. Firstly, in contrast to the CEF, the CQF impacts distributions rather than individual outcomes. For example, if an individual is initially at a certain τ -th quantile level and then experiences an increase in a variable of interest, this does not guarantee the individual will remain within the same τ -th conditional quantile. Therefore, if a coefficient indicates a positive effect on the lower decile of the log earnings distribution, it does not simply imply that an individual with low earnings in that decile, who has not received any intervention (e.g., an additional year of schooling), is now 19 less impoverished. Rather, it suggests that those in the lower income bracket, under the intervention, are comparatively less impoverished than they would have been without it. This complexity also challenges the direct comparison between the first differences in QR and IVQR estimates. While QR’s fixed effect estimates the impact of additional schooling on the quantiles of the conditional distribution of earnings differences between twins, it does not directly relate to the quantiles of the conditional earnings distribution itself. Another critical aspect is the transition from conditional quantiles to marginal quintiles, which establishes a link enabling the identification of how changes in quantile regression coefficients affect the overall outcome. The requirement to align all conditional quantiles to determine a specific marginal quintile complicates the task of discerning the aggregate effect that arises from adjusting the conditional quantile coefficient. This distinction becomes clear when comparing the CEF with the CQF, described as E (Yi |Xi) = X ′ iβ ≡ E (Yi) = E (X ′ i ) β Qτ (Yi |Xi) = X ′ iβ ̸= Qτ (Yi) = Qτ (X ′ i ) βτ , where E(Yi |Xi) denotes a CEF. Although extracting marginal quantiles is theoretically feasible if the CQF exhibits linearity, this approach does not apply to the current model analyzing the returns to education, indicating a significant methodological constraint. 
The interpretation of the QR and IVQR coefficients for the returns to education can be summarized as follows. Both sets of coefficients represent impacts on the distribution in question, contingent on whether an individual's position in the log earnings distribution remains unchanged after the intervention. Given that changes in educational attainment do not typically preserve an individual's rank, an observed positive effect at a specific quantile means that the group at that quantile benefits, without identifying specific individuals. It is therefore more accurate to read the coefficient as indicating disparities in log earnings outcomes associated with differences in educational attainment.

As with simple IV analyses, interpreting IVQR becomes complex when the instrument is used to address omitted variable bias. When the instrument serves solely to correct measurement error, however, the interpretative challenge diminishes: the aim is a consistent estimator, which does not substantially alter the fundamental interpretation. In this paper, despite the ongoing debates surrounding interpretation, we adopt a simplified approach by treating the average coefficient as an indicative causal effect in the sections that follow, and we interpret the heterogeneous regressions in light of the discussion above.

1.6 Empirical Results

1.6.1 Results of the homogeneous approach

Table 1.2 presents selected results on the average returns to education.[8] Column (1) provides a statistical summary of the dataset. Column (2) reports an OLS pooled regression on education level, age, the square of age, gender, and race, indicating an 11 percent return to education. Column (3) extends the pooled regression with marital status, union membership, and years of tenure, yielding a return to education of 12.2 percent. Column (4) adds the father's education level as a surrogate for family background effects, which slightly lowers the return to education from 12.2 to 11.9 percent. Column (5) uses the same variables as column (2) but instruments the education variable with the first twin's education level as reported by the other twin, to adjust for measurement error; this raises the education coefficient from 11 percent to 11.6 percent relative to column (2). Similarly, column (6) follows the variable set of column (3) and applies the same instrumental variable as in column (5) to address the measurement error, which corrects the negative bias observed in column (3) and results in a higher return to education of 12.8 percent.

[8] For the comprehensive results on the average returns to education, see Appendix C.
Table 1.2: Mean Estimates of Returns to Education

                 Means                      Pooled                                   Within pair
                 (1)       (2) LS    (3) LS    (4) LS    (5) IV    (6) IV    (7) LS    (8) IV    (9) LS    (10) IV
Education        14.029    0.110***  0.122***  0.119***  0.116***  0.128***  0.070***  0.088***  0.078***  0.100***
                           (0.010)   (0.010)   (0.010)   (0.010)   (0.010)   (0.019)   (0.025)   (0.018)   (0.023)
Age              38.075    0.104***  0.090***  0.092***  0.104***  0.089***
                           (0.011)   (0.011)   (0.011)   (0.011)   (0.011)
Age^2                      -0.001*** -0.001*** -0.001*** -0.001*** -0.001***
                           (0.0001)  (0.0001)  (0.0001)  (0.0001)  (0.0001)
Female           0.595     -0.318*** -0.249*** -0.254*** -0.316*** -0.247***
                           (0.040)   (0.040)   (0.041)   (0.040)   (0.040)
White            0.919     -0.100    -0.101    -0.126    -0.098    -0.100
                           (0.068)   (0.069)   (0.066)   (0.072)   (0.070)
Married          0.639               0.104*    0.101*              0.110*                        0.085     0.087
                                     (0.050)   (0.049)             (0.050)                       (0.055)   (0.056)
Union            0.226               0.111*    0.114*              0.113*                        0.044     0.052
                                     (0.047)   (0.044)             (0.047)                       (0.073)   (0.073)
Tenure           8.338               0.021***  0.020***            0.021***                      0.024***  0.024***
                                     (0.003)   (0.003)             (0.003)                       (0.003)   (0.003)
Father's         12.096                        0.010
                                               (0.007)
Adjusted R^2               0.334     0.403     0.403     0.334     0.403     0.039               0.177
N                680       674       663       646       674       663       340       340       333       333

Notes: Figures in brackets are standard errors. *** indicates statistical significance at the 1% level, ** at the 5% level, and * at the 10% level. Columns (2) through (4), (7), and (9) incorporate a constant term. The education difference is calculated as the difference between the education level reported by the first twin and the education level reported for the first twin by the second twin. For columns (5) and (6), the instrument is the education level of the first twin as reported by the second twin. In columns (8) and (10), the instrument is the difference between the second twin's report of the first twin's education level and the second twin's report of their own education level.

Column (7) computes the within-pair regression based on the model outlined in column (2). This approach nets out the variation that is common within pairs, namely family background and the innate ability shared by the twins, thereby avoiding the biases that arise between families, which are generally more significant than those within twin pairs. As a result, the within-pair estimates might be lower once this ability bias is removed. Specifically, the return to education is found to be 7 percent, a figure that may also have been depressed by the substantial measurement error discussed in Section 1.3. To address this concern, column (8) applies an instrumental variable technique to the difference in reported education levels between the first twin and their sibling. This adjustment raises the return to education to 8.8 percent, a statistically significant increase. Columns (9) and (10), analogous to columns (7) and (8) respectively, incorporate the additional control variables. A comparison between columns (3) and (9) demonstrates consistency in the findings, with columns (9) and (10) reflecting a similar pattern.

From this analysis, several key insights emerge. First, ability bias appears predominantly to inflate the estimates, as evidenced by comparing columns (5) with (9) and (6) with (10); this inflationary effect is further corroborated by the findings in column (4). Second, measurement error tends to lower the estimated returns, particularly in the within-pair analyses, as shown in columns (7) and (8). Lastly, the patterns observed here closely align with those reported by Ashenfelter and Rouse (1998) and Bonjour et al. (2003), suggesting that the impact of such biases is consistent across studies. These findings are next compared with those from the heterogeneous approaches.

1.6.2 Heterogeneous approach

1.6.2.1 Inferences

This section outlines the inferential methods used to assess heterogeneity across individuals and the robustness of our identification strategy. A key tool is a general location-shift Wald test, designed to evaluate whether the coefficients remain constant across quantiles. The test detects departures from a pure location shift, such as heteroskedasticity, through the general linear null hypothesis

$$H_0: R\hat{A}(\tau) = 0,$$

where $\hat{A}(\tau)$ stacks the estimated quantile regression coefficients for quantiles from the 10th to the 90th percentile, $\{\hat\alpha(\tau_{0.10})', \hat\alpha(\tau_{0.11})', \ldots, \hat\alpha(\tau_{0.89})', \hat\alpha(\tau_{0.90})'\}$. The test statistic is

$$T_n = n\left(R\hat{A}(\tau)\right)'\left(R\hat{V}R'\right)^{-1}\left(R\hat{A}(\tau)\right),$$

where $\hat{V}$ is the estimated asymptotic variance-covariance matrix of $\hat{A}(\tau)$. However, the applicability and reliability of this statistic and its associated p-value are open to question given the small sample size of the dataset under analysis. In particular, the small sample hampers the asymptotic normal approximation, potentially leading to weak identification or an inclination to reject the null hypothesis erroneously. This is especially true for estimates at the extreme quantiles, which may be poorly identified under the asymptotic approximation, as highlighted by Chernozhukov (2005).
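A minimal sketch of this statistic follows; the contrast matrix R below, which differences each quantile's coefficient against the first, is one illustrative choice among several, and A_hat and V_hat are assumed to have been estimated already.

```python
# Location-shift Wald statistic T_n = n (R A)' (R V R')^{-1} (R A),
# assuming A_hat (K,) stacks the schooling coefficients across quantiles and
# V_hat (K, K) is their estimated asymptotic covariance matrix.
import numpy as np

def location_shift_wald(A_hat, V_hat, n):
    K = len(A_hat)
    # Contrast each coefficient with the first: H0 holds iff all K coincide.
    R = np.hstack([-np.ones((K - 1, 1)), np.eye(K - 1)])
    RA = R @ A_hat
    return n * RA @ np.linalg.solve(R @ V_hat @ R.T, RA)
```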
Given these concerns, we adopt a second approach to address these challenges: the finite sample inference method proposed by Chernozhukov and Hansen (2006), which offers a reliable framework for inference under minimal prerequisites. The method requires only weak independence of the sampling process and continuity of the quantile functions in the probability index, and it involves computing finite sample confidence intervals (CIs) for each coefficient under study. The process begins by determining the critical value $c_n(\tau)$ through simulation of the distribution of $L_n(\alpha_0)$, the Generalized Method of Moments (GMM) objective function for the estimation of $\alpha_0$; $c_n(\tau)$ is the $\tau$-quantile of this simulated distribution. With the critical value and the finite sample distribution of $L_n(\alpha_0)$ in hand, the null hypothesis $H_0: \alpha = \alpha_0$ is rejected if $L_n(\alpha_0) > c_n(\tau)$, and inverting this test yields a confidence interval for each $\tau$. Chernozhukov and Hansen outline three ways of implementing this inference: Markov Chain Monte Carlo (MCMC), a simple grid search, and a marginal approach.[9] Should the finite sample CIs prove significantly broader than, or divergent from, those derived through asymptotic methods, this would point to problems with the coefficient's identification; the reliability of inferential testing diminishes when asymptotic approximations are inapplicable. Our analysis therefore focuses on the grid search and marginal approaches for the subsequent inferences.

[9] Details of all the algorithms are available in Chernozhukov and Hansen (2006).
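The grid-search variant of the test inversion can be sketched as follows; Ln (the objective profiled at a candidate alpha) and the simulated critical value cn are placeholders for quantities a full implementation would compute from the GMM function and its simulated finite-sample distribution.

```python
# Schematic finite-sample CI by test inversion: keep every grid value of
# alpha at which H0: alpha = a is not rejected, i.e. Ln(a) <= cn.
import numpy as np

def finite_sample_ci(alpha_grid, Ln, cn):
    accepted = [a for a in alpha_grid if Ln(a) <= cn]
    return (min(accepted), max(accepted)) if accepted else None

# Example with a toy quadratic objective and critical value (illustrative):
ci = finite_sample_ci(np.linspace(0.0, 0.3, 301),
                      lambda a: 400 * (a - 0.11) ** 2, cn=1.0)
```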
In addition to examining heterogeneity through the methods described, we also assess the assumed linearity of the relationship between schooling and outcomes via residual analysis. The residual plots themselves are included in the appendix for reference; here we discuss the findings in relation to the reported $R^2$ values. It is worth noting that $R^2$ may not always reliably indicate linearity, owing to its susceptibility to outliers and to substantial variability around the regression lines. Our analysis does not encounter such issues, however, so we interpret the magnitude of $R^2$ as indicative of the linearity of the schooling variable's effect.

1.6.2.2 The levels model

Figure 1.1 presents QR and IVQR estimates of the returns to education. The levels model comprises three distinct regressions: the first is an education-only regression using the first twin's reported education as the sole regressor; the second, a base-only regression, adds age, race, and gender as predictors; and the final, all-inclusive regression further adds marital status, union membership, and tenure to the set of regressors. Estimates of $\alpha(\tau)$ for quantiles from the 10th to the 90th, in increments of 0.01, are plotted. The left-hand panels display the QR estimates and the right-hand panels the IVQR estimates, the latter using the second twin's report of the first twin's education as the instrument. Each estimate is accompanied by its 90 percent CI. Because the average returns to education are homogeneous across quantiles, the corresponding average coefficient is depicted as a horizontal line.

[Figure 1.1: Quantile Regression (QR) and Instrumental Variable Quantile Regression (IVQR) Results of the Levels Model: Schooling Coefficients. Notes: In the IVQR case, the instrumental variable (IV) is the education level of the first twin as reported by the second twin. The education coefficient from Table 1.2 is depicted by a horizontal red line, corresponding to the coefficients from the OLS and IV regressions with the same control variables and instruments. The gray area illustrates the 90% confidence interval for the coefficient across quantiles.]

Figure 1.1 shows that incorporating additional covariates sharpens the coefficient patterns, particularly in panels (E) and (F), enabling a more focused analysis. Initial scrutiny of the figure indicates heterogeneity within the regression model: some CIs in the lower quantiles do not contain the average regression coefficient reported in Table 1.2. There also appears to be a downward bias due to measurement error, though its impact does not exceed that observed in the average regression. The coefficient increases towards the higher quantiles, with a notable rise from the lower quantiles to the median of the wage distribution. Beyond the median the CIs widen and the coefficients stabilize, pointing to potential individual heterogeneity. We therefore also explore the hypothesis that the relationship between schooling and the earnings distribution becomes less linear once more covariates are included and measurement error is corrected; a sketch of how such a coefficient path is traced out follows.
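The sketch below shows how a quantile-by-quantile coefficient path of this kind can be produced, here on simulated data with illustrative parameter values and using statsmodels' QuantReg, which is an assumed tooling choice rather than the dissertation's own code.

```python
# Trace alpha(tau) over a grid of quantiles, as in Figure 1.1, on a
# simulated heteroskedastic wage equation (all parameter values illustrative).
import numpy as np
from statsmodels.regression.quantile_regression import QuantReg

rng = np.random.default_rng(1)
n = 600
educ = rng.integers(10, 19, n).astype(float)
age = rng.uniform(25, 55, n)
log_wage = 0.5 + 0.10 * educ + 0.02 * age \
    + (0.2 + 0.03 * (educ - 10.0)) * rng.normal(size=n)

X = np.column_stack([np.ones(n), educ, age])
taus = np.arange(0.10, 0.91, 0.01)
alpha_tau = np.array([QuantReg(log_wage, X).fit(q=t).params[1] for t in taus])
```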
These hypotheses have been empirically tested, and the findings are shown in Table 1.3, which presents the outcomes from QR and IVQR across quantiles ranging from 0.1 to 0.9 in increments of 0.1. Columns (1), (3), and (5) display the education coefficient estimates from the education-only, base-only, and all-inclusive regression models, respectively, for each quantile; columns (2), (4), and (6) show the IVQR coefficients for the same respective regression models.

Table 1.3: Coefficients of Education from Quantile Regression (QR) and Instrumental Variable Quantile Regression (IVQR) in the Levels Model

Quantile          (1) QR    (2) IVQR       (3) QR    (4) IVQR       (5) QR    (6) IVQR
0.1               0.071     0.071          0.086     0.094          0.091     0.095
  s.e.            (0.019)   (0.020)        (0.017)   (0.020)        (0.017)   (0.019)
  asymptotic                (0.03, 0.11)             (0.05, 0.13)             (0.06, 0.13)
  grid                      (0.00, 0.11)             (0.00, 0.11)             (0.00, 0.11)
  marginal                  (-0.01, 0.12)            (-0.01, 0.12)            (-0.01, 0.12)
0.2               0.066     0.062          0.088     0.078          0.082     0.090
  s.e.            (0.014)   (0.018)        (0.015)   (0.018)        (0.014)   (0.016)
  asymptotic                (0.03, 0.10)             (0.04, 0.11)             (0.06, 0.12)
  grid                      (0.02, 0.12)             (0.02, 0.12)             (0.02, 0.12)
  marginal                  (0.01, 0.13)             (0.01, 0.13)             (0.01, 0.13)
0.3               0.080     0.067          0.100     0.101          0.093     0.108
  s.e.            (0.018)   (0.019)        (0.015)   (0.016)        (0.013)   (0.016)
  asymptotic                (0.03, 0.10)             (0.07, 0.13)             (0.08, 0.14)
  grid                      (0.04, 0.12)             (0.04, 0.12)             (0.04, 0.12)
  marginal                  (0.03, 0.13)             (0.03, 0.13)             (0.03, 0.13)
0.4               0.101     0.101          0.106     0.112          0.114     0.123
  s.e.            (0.015)   (0.018)        (0.013)   (0.014)        (0.013)   (0.015)
  asymptotic                (0.07, 0.13)             (0.08, 0.14)             (0.09, 0.15)
  grid                      (0.06, 0.13)             (0.06, 0.13)             (0.06, 0.14)
  marginal                  (0.05, 0.14)             (0.05, 0.14)             (0.05, 0.15)
0.5               0.108     0.108          0.107     0.122          0.132     0.137
  s.e.            (0.016)   (0.016)        (0.011)   (0.013)        (0.012)   (0.013)
  asymptotic                (0.08, 0.14)             (0.10, 0.15)             (0.11, 0.16)
  grid                      (0.06, 0.14)             (0.06, 0.14)             (0.06, 0.14)
  marginal                  (0.05, 0.15)             (0.04, 0.15)             (0.05, 0.15)
0.6               0.102     0.110          0.120     0.124          0.136     0.143
  s.e.            (0.014)   (0.015)        (0.011)   (0.013)        (0.010)   (0.012)
  asymptotic                (0.08, 0.14)             (0.10, 0.15)             (0.12, 0.17)
  grid                      (0.09, 0.14)             (0.08, 0.14)             (0.08, 0.14)
  marginal                  (0.08, 0.15)             (0.07, 0.15)             (0.07, 0.15)
0.7               0.112     0.110          0.120     0.123          0.130     0.139
  s.e.            (0.012)   (0.015)        (0.010)   (0.012)        (0.009)   (0.011)
  asymptotic                (0.08, 0.14)             (0.10, 0.15)             (0.12, 0.16)
  grid                      (0.08, 0.14)             (0.08, 0.14)             (0.08, 0.14)
  marginal                  (0.07, 0.15)             (0.07, 0.15)             (0.07, 0.15)
0.8               0.120     0.120          0.133     0.131          0.122     0.125
  s.e.            (0.016)   (0.016)        (0.014)   (0.014)        (0.011)   (0.012)
  asymptotic                (0.09, 0.15)             (0.10, 0.16)             (0.10, 0.15)
  grid                      (0.08, 0.16)             (0.08, 0.16)             (0.08, 0.16)
  marginal                  (0.07, 0.17)             (0.07, 0.17)             (0.07, 0.17)
0.9               0.150     0.150          0.135     0.135          0.154     0.152
  s.e.            (0.019)   (0.020)        (0.023)   (0.021)        (0.023)   (0.021)
  asymptotic                (0.11, 0.19)             (0.09, 0.18)             (0.11, 0.19)
  grid                      (0.10, 0.21)             (0.10, 0.21)             (0.10, 0.21)
  marginal                  (0.09, 0.22)             (0.09, 0.22)             (0.09, 0.22)
F-statistic       15.39     14.85          10.91     9.25           18.30     13.98
p-value           <.0001    <.0001         <.0001    <.0001         <.0001    <.0001
R^2               0.835     0.816          0.934     0.867          0.729     0.612
N                 656       656            656       656            656       656

Notes: Standard errors are presented in the first set of parentheses. The second set of parentheses contains the asymptotic 90 percent confidence intervals, while the third and fourth sets display the finite sample 90 percent confidence intervals obtained through the grid and marginal approaches, respectively; the confidence intervals are laid out beneath the IVQR estimates. p-values are derived from an F(80, 575) distribution.
The estimated coefficients span from 6.6 percent to 15.0 percent, with a median of 10.8 percent, in column (1); from 6.2 to 15.0 percent, with a median of 10.8 percent, in column (2); from 8.6 to 13.5 percent, with a median of 10.7 percent, in column (3); from 7.8 to 13.5 percent, with a median of 12.2 percent, in column (4); from 8.2 to 15.4 percent, with a median of 13.2 percent, in column (5); and from 9.0 to 15.2 percent, with a median of 13.7 percent, in column (6), indicating a consistent increase in the coefficients across the quantiles of the earnings distribution. This trend aligns with the observations from Figure 1.1, and comparing the QR and IVQR coefficients shows an increase in the estimates once measurement error is adjusted for.

Table 1.3 also reports the general Wald test for location shift, including F-statistics and p-values. The p-values uniformly suggest a strong rejection of the null hypothesis for both the QR and IVQR models, indicating heterogeneity in the individual returns to education. The variation in test statistics across the different regressions signals a potential need for deeper analysis. Notably, Wald tests comparing results between the 0.20-0.30 and 0.50-0.70 quantiles reject the null hypothesis at the 1 or 5 percent significance levels, whereas tests for the other quantiles do not show this pattern.

The observed increase in coefficients across quantiles, coupled with the evidence of heterogeneity, hints at a discernible link between individual ability and the impact of education on earnings. Taking a specific example from Table 1.3, in column (6) the coefficient is 0.090 at the 0.2 quantile and 0.143 at the 0.6 quantile. The significant difference between these two quantiles suggests heterogeneity in the impact of education on earnings, implying a complementary relationship between individual ability and education in generating earnings; in technical terms, this is Implication 2, $\partial MB / \partial A > 0$. The deduction arises from a detailed analysis: distinct coefficients at the 0.2 and 0.6 quantiles indicate that the effect of education varies across different parts of the earnings distribution. Building on the discussion in Section 1.5, each positive coefficient reflects the disparity in log earnings at the respective quantile between individuals with average educational attainment and those one standard deviation above the mean level of education.[10] This outcome therefore suggests that the association between changes in education and log earnings is weaker at the 0.2 quantile than at the 0.6 quantile. Accepting this observation, the stronger effect at the 0.6 quantile can be interpreted as the result of individuals with higher abilities pursuing more education, which in turn leads to higher earnings, in line with the general understanding that higher education levels correlate with increased earnings. This indicates that individuals' abilities enhance the benefits of education on earnings: those with greater abilities tend to seek more education because they stand to gain more from it. As previously mentioned, the $R^2$ value plays a role in assessing linearity in this context.

[10] For simplicity, consider one standard deviation of schooling as a dummy, say the decision to go to college after graduating from high school.
The $R^2$ values, alongside the residual analysis presented in the appendix, confirm that the linearity assumption for education holds in columns (1), (2), (3), and (4). A decrease in $R^2$ is observed in columns (5) and (6), however, indicating, as anticipated, that the relationship between education and log earnings exhibits strong polynomial characteristics, and hence that education's impact is non-linear.

The heterogeneity test, while insightful, may not be infallible. Table 1.3 reports CIs for each coefficient: standard errors are listed first, followed by the asymptotic CIs (used in Figure 1.1), and finally the finite sample CIs generated through the grid and marginal methods. A noticeable discrepancy emerges between the asymptotic and finite sample CIs, particularly when education is instrumented. In general, the finite sample intervals, both grid and marginal, are broader than their asymptotic counterparts, with the exception of the 0.4, 0.6, and 0.7 quantiles in column (2). For instance, the finite sample CIs at the 0.1 and 0.9 quantiles in column (6), at (-0.01, 0.12) and (0.09, 0.22), are considerably wider. This discrepancy raises concerns about the reliability of the asymptotic approximations and casts doubt on the Wald test outcomes. Given the earlier concerns regarding the small sample size and the anticipated non-linearity of the education variable, such findings are not entirely unexpected.

In summary, the analysis within the levels model yields several key insights. First, there appears to be a complementary relationship between ability and education in influencing earnings, particularly evident between the lower and middle quantiles. Second, measurement error predominantly introduces a downward bias across most estimates, with exceptions at the extreme quantiles, as detailed in Table 1.3. Third, the assumption of linearity in the education variable's effect weakens with the inclusion of additional covariates. Lastly, the discrepancies between the asymptotic and finite sample CIs point to weak identification in the model, casting doubt on the reliability of the test outcomes.

1.6.2.3 The proxy model

[Figure 1.2: Quantile Regression (QR) and Instrumental Variable Quantile Regression (IVQR) Results of the Proxy Model: Schooling Coefficients. Notes: In the IVQR case, the instrumental variable (IV) is the education level of the first twin as reported by the second twin. The education coefficient from Table 1.2 is depicted by a horizontal red line, corresponding to the coefficients from the OLS and IV regressions with the same control variables and instruments. The gray area illustrates the 90% confidence interval for the coefficient across quantiles.]

In this section, the preceding analysis is revisited, this time attempting to adjust for ability bias by using the father's education as a proxy variable. In particular, the IVQR estimates within the proxy models aim to mitigate both measurement error and ability bias. As noted in the earlier discussion of interpretation, neither the fixed effect model in QR nor in IVQR produces the anticipated outcomes.
Consequently, QR employing the father's education as a proxy is used as a surrogate for a fixed effect model, while acknowledging that the father's education is an imperfect measure of ability.

Figure 1.2 shows the QR and IVQR regressions from the levels model, now including the father's education to account for family-related ability bias. At first glance, the graph still reveals heterogeneity. The patterns across quantiles are similar to those in the levels model, but the estimates are generally lower than those in Figure 1.1. Notably, the horizontal lines intersect the higher quantiles more frequently, and the CIs are broader than in the earlier model, suggesting that the proxy model carries higher standard errors overall. This could lead to fewer rejections in tests for location shifts.

Table 1.4 mirrors the information in Table 1.3, with the critical distinction that this model adjusts for ability bias. Comparing it to the levels model, several noteworthy observations emerge. Primarily, the range of estimated coefficients is narrower. In column (6), for instance, the coefficients range from 9.5 percent to 13.6 percent, an elevation of the lower limit and a substantial reduction of the upper limit relative to the earlier model, a sizeable effect. This pattern is consistent across all columns, suggesting that ability bias is predominantly positive, except in the lower quantiles. This raises the possibility that measurement errors associated with the father's education amplify the errors already present, especially after controlling for ability bias in the lower quantiles.

Table 1.4: Coefficients of Education from Quantile Regression (QR) and Instrumental Variable Quantile Regression (IVQR) in the Proxy Model

Quantile          (1) QR    (2) IVQR       (3) QR    (4) IVQR       (5) QR    (6) IVQR
0.1               0.075     0.070          0.087     0.089          0.091     0.095
  s.e.            (0.016)   (0.019)        (0.016)   (0.019)        (0.017)   (0.019)
  asymptotic                (0.03, 0.11)             (0.05, 0.12)             (0.06, 0.13)
  grid                      (0.00, 0.11)             (0.00, 0.11)             (0.00, 0.11)
  marginal                  (-0.11, 0.12)            (-0.01, 0.12)            (-0.01, 0.12)
0.2               0.070     0.070          0.081     0.081          0.083     0.093
  s.e.            (0.014)   (0.017)        (0.016)   (0.018)        (0.013)   (0.016)
  asymptotic                (0.04, 0.10)             (0.05, 0.12)             (0.06, 0.12)
  grid                      (0.02, 0.12)             (0.02, 0.12)             (0.02, 0.12)
  marginal                  (0.01, 0.13)             (0.01, 0.13)             (0.01, 0.13)
0.3               0.092     0.084          0.095     0.097          0.096     0.106
  s.e.            (0.017)   (0.019)        (0.014)   (0.016)        (0.013)   (0.016)
  asymptotic                (0.05, 0.12)             (0.07, 0.13)             (0.07, 0.14)
  grid                      (0.04, 0.12)             (0.04, 0.12)             (0.04, 0.12)
  marginal                  (0.03, 0.13)             (0.03, 0.13)             (0.03, 0.13)
0.4               0.108     0.108          0.103     0.106          0.112     0.117
  s.e.            (0.016)   (0.019)        (0.013)   (0.014)        (0.015)   (0.016)
  asymptotic                (0.07, 0.15)             (0.08, 0.13)             (0.09, 0.15)
  grid                      (0.06, 0.14)             (0.06, 0.14)             (0.06, 0.14)
  marginal                  (0.05, 0.15)             (0.05, 0.15)             (0.05, 0.15)
0.5               0.110     0.112          0.101     0.108          0.122     0.130
  s.e.            (0.017)   (0.018)        (0.011)   (0.014)        (0.013)   (0.015)
  asymptotic                (0.08, 0.15)             (0.08, 0.13)             (0.10, 0.16)
  grid                      (0.06, 0.14)             (0.06, 0.14)             (0.06, 0.14)
  marginal                  (0.05, 0.15)             (0.05, 0.15)             (0.05, 0.15)
0.6               0.102     0.111          0.101     0.108          0.130     0.141
  s.e.            (0.015)   (0.017)        (0.010)   (0.013)        (0.012)   (0.013)
  asymptotic                (0.08, 0.14)             (0.08, 0.13)             (0.12, 0.17)
  grid                      (0.08, 0.14)             (0.08, 0.14)             (0.08, 0.14)
  marginal                  (0.07, 0.15)             (0.07, 0.15)             (0.07, 0.15)
0.7               0.107     0.106          0.112     0.117          0.124     0.129
  s.e.            (0.013)   (0.016)        (0.011)   (0.013)        (0.011)   (0.013)
  asymptotic                (0.07, 0.14)             (0.09, 0.14)             (0.10, 0.15)
  grid                      (0.08, 0.14)             (0.08, 0.14)             (0.08, 0.14)
  marginal                  (0.07, 0.15)             (0.07, 0.15)             (0.07, 0.15)
0.8               0.111     0.113          0.111     0.111          0.120     0.121
  s.e.            (0.016)   (0.016)        (0.013)   (0.014)        (0.010)   (0.012)
  asymptotic                (0.08, 0.14)             (0.08, 0.14)             (0.10, 0.14)
  grid                      (0.08, 0.16)             (0.08, 0.16)             (0.08, 0.16)
  marginal                  (0.07, 0.17)             (0.07, 0.17)             (0.07, 0.17)
0.9               0.135     0.135          0.110     0.111          0.136     0.136
  s.e.            (0.022)   (0.021)        (0.022)   (0.022)        (0.023)   (0.020)
  asymptotic                (0.09, 0.18)             (0.07, 0.15)             (0.10, 0.18)
  grid                      (0.10, 0.20)             (0.10, 0.20)             (0.10, 0.21)
  marginal                  (0.09, 0.21)             (0.09, 0.21)             (0.09, 0.22)
F-statistic       7.06      6.01           5.12      3.96           12.83     9.28
p-value           <.0001    <.0001         <.0001    <.0001         <.0001    <.0001
R^2               0.643     0.642          0.807     0.778          0.680     0.483
N                 656       656            656       656            656       656

Notes: Standard errors are presented in the first set of parentheses. The second set of parentheses contains the asymptotic 90 percent confidence intervals, while the third and fourth sets display the finite sample 90 percent confidence intervals obtained through the grid and marginal approaches, respectively; the confidence intervals are laid out beneath the IVQR estimates. p-values are derived from an F(80, 575) distribution.
The Wald-type location shift tests within the proxy model again lead to a notable rejection of the null hypotheses, albeit with test statistics that are smaller in magnitude than those in the levels model. This outcome reflects the larger standard errors characteristic of the proxy model. Following the earlier line of reasoning, we would anticipate a higher likelihood of accepting the null hypotheses in two-sample Wald tests conducted between quantiles. Indeed, most p-values do not reach statistical significance, except for the comparison between the (0.1-0.2) and 0.60 quantiles, a finding that aligns with the levels model. In this model, therefore, the relationship between ability and education continues to exhibit a complementary nature in influencing earnings, but the magnitude of this effect appears diminished relative to the earlier findings.

The examination of linearity within the proxy model yields findings analogous to those in Table 1.3, yet reveals markedly more pronounced non-linearity than the previous model. The $R^2$ values reported here are consistently lower than their counterparts in the levels model, indicating a significant deviation from linearity, with the exception of the base-only QR and IVQR regressions. This relative linearity in the base-only models could stem from the less substantial variation observed in these regressions compared to the others, an inference supported by the lowest F-statistics among the regression models: the minor location shifts observed indicate reduced variation.

In the proxy model, the finite sample CIs exhibit a broader range than the asymptotic CIs, as anticipated. At the extreme quantiles in particular, a significant disparity between the CIs points to a weak identification issue in those ranges. This discrepancy offers a plausible rationale for the opposite direction of the test outcomes and biases at these quantiles compared to the others, and the estimation noise attributable to weak identification further complicates the analysis within this model framework.

In summary, the proxy model, which accounts for family-related ability bias, produces outcomes akin to those of the levels model.
However, both the magnitude of the coefficients and the test statistics are reduced, reflecting the influence of positive ability bias and the increased standard errors. The following section examines ability bias and measurement error in greater depth.

1.7 Other Results

1.7.1 Ability bias and measurement error

The two empirical models, the levels model and the proxy model, allow us to examine how ability bias and measurement error influence the quantile regression results. In both models, the education level reported by the first twin is instrumented with the second twin's report of the first twin's education, to correct measurement error, and ability bias (omitted variable bias) is addressed using the father's education as a proxy variable. The analysis focuses on the all-inclusive model, which incorporates all covariates relevant to earnings generation. To gauge the impact of these biases on the coefficients, we measure the difference between the coefficients in the levels model with IV and those in the proxy model with IV, which reveals the ability bias; similarly, the difference between the coefficients in the proxy model with and without IV captures the effect of measurement error. The findings from this comparative analysis are presented in Figure 1.3.

The left-hand side of Figure 1.3 reveals an increasing pattern: from the 0.4 quantile onward, the coefficients from the levels model with IV exceed those from the proxy model with IV. This suggests that, in the model that fails to adjust for the omitted variable bias related to the choice of schooling, ability exerts a positive bias beyond the 0.4 quantile, a finding consistent with Implication 5 derived in Section 1.2.1. The right-hand side of Figure 1.3 displays negative values, indicating that, in the lower and middle quantiles, the coefficients of the proxy model with IV are larger than those without it, a pattern that reverses in the higher quantiles. Specifically, the shift to positive differences beyond the 0.8 quantile likely signifies noise in the upper distribution, as evidenced by the analysis of column (6) in Table 1.4. Although there is a noticeable discrepancy between the finite sample and asymptotic CIs across most of the spectrum, it is particularly pronounced at the 0.8 and 0.9 quantiles, where the finite sample CIs are broader yet do not extend to the upper limits of the asymptotic CIs. Furthermore, the graph highlights a significant negative divergence at the median quantiles, illustrating how measurement error is exacerbated when omitted variable bias is accounted for through a proxy variable, here the father's education. This suggests that measurement error in the father's education may introduce additional error into the analysis.

[Figure 1.3: Comparison of Coefficients Across Models: Levels and Proxy Models with and without Instrumental Variable (IV). Notes: The left figure illustrates the variation in coefficient differences between the levels and proxy models using IV across quantiles. Similarly, the right figure compares the coefficient differences between the proxy model without IV and with IV. All coefficients displayed in these graphs are derived from models that incorporate the full range of covariates.]
1.7.2 Estimation results for other covariates

Figure 1.4 presents the coefficients for the additional covariates in the proxy model with instrumental variables (IV), covering age, race, gender, marital status, union membership, and tenure, each accompanied by its 90 percent confidence interval (CI).

[Figure 1.4: Instrumental Variable Quantile Regression (IVQR) Results of the Proxy Model: Returns to Other Covariates. Notes: The figure presents the variation in IVQR coefficients for covariates other than education, aligning with the IVQR results in column (6) of Table 1.4. The instrument is the second twin's report of the first twin's level of schooling. A horizontal line indicates where $\hat\alpha(\tau) = 0$, signifying no effect. The father's education level serves as a proxy for family influences. The shaded gray area represents the 90% confidence interval for the coefficients at each quantile.]

A notable degree of heterogeneity across quantiles is observed for most covariates, with age and marital status the notable exceptions, as detailed in Table 1.5. The analysis shows that females typically earn less than males at the lower quantiles, and this earnings disparity widens at the higher quantiles. While the impact of union membership appears to decline across quantiles, its coefficient does not reach statistical significance at the highest quantiles. Conversely, the effects of tenure are statistically significant across all quantiles, yet do not deviate significantly from the overall average coefficient. These patterns largely align with the findings of Arias et al. (2013), with the exception of those for Caucasians and tenure; the disparities may be attributable to noise in the upper quantile distributions for these variables, as indicated by the broader CIs and extreme coefficient values in the upper tails shown in Figure 1.4. Apart from these, minimal heterogeneity is observed in the variables mentioned.

Table 1.5: Location Shift Test Results of Other Covariates

              Age      White     Female    Married   Tenure    Union
F-statistic   1.22     2.00      7.07      0.70      4.79      9.32
p-value       0.11     <.0001    <.0001    <.0001    <.0001    <.0001
N             656      656       656       656       656       656

Notes: The table presents the outcomes of the location shift Wald test applied to the additional covariates featured in Figure 1.4, reporting F-statistics and corresponding p-values within the context of the proxy model with IVQR. The p-values are derived from an F-distribution with degrees of freedom (80, 575).

1.8 Conclusion

In this study, I have estimated the returns to education using both homogeneous and heterogeneous approaches, with data on twins. This involved exploring the interplay between education and ability in influencing earnings, while allowing for the possibility that the impact of education is non-linear. The primary emphasis has been on distinguishing between variations in earnings that arise from family-wide factors and those that result from individual differences in ability.

For the homogeneous analysis, this paper employs the conventional method of calculating average returns to education, a technique widely utilized in prior studies.
Under the assumption that individual differences are minimal, this approach puts the returns to education at approximately 11 to 12 percent in the pooled regressions after adjusting for all covariates, and at about 10 percent in the fixed effect framework. To address measurement error, the education level of the first twin as reported by the second twin is used as an instrument. These findings point to positive ability bias and negative measurement error bias in educational attainment. A critical limitation of this approach, however, is that it disregards individual variation, that is, heterogeneity in the impact of education. The oversight is particularly relevant because the assignment of schooling within twin pairs is not random, and it necessitates the adoption of a heterogeneous approach.

Rather than employing the traditional heterogeneous methods proposed by Ashenfelter and Rouse (1998) and Card (1994, 1999), this study uses Quantile Regression (QR) and Instrumental Variable Quantile Regression (IVQR), as advocated by Chernozhukov and Hansen (2005), to uncover the heterogeneity in the returns to education. Through the application of two distinct models, the levels model and the proxy model, the analysis yields six key implications regarding the variability of educational returns.

First, while the quantile regression approach is instrumental in detecting individual heterogeneity, interpreting the resulting coefficients poses challenges. This complexity arises because individuals' ranks within the distribution are not static and may shift following changes in covariates. Furthermore, unlike the conditional expectation function, the conditional quantile function does not readily translate into a marginal function. Consequently, the coefficients do not provide insights into specific individuals but rather inform the distribution of the dependent variable as a whole. At most, these coefficients indicate disparities in the performance of the dependent variable, reflecting the impact of varying covariates across quantiles.

Second, both the levels model and the proxy model point to heterogeneity in educational returns across individuals, a finding corroborated by the general location-shift Wald test, which reveals statistically significant differences among the coefficients. These differences allow the identification of a complementary relationship between ability and schooling: individuals with higher abilities tend to reap greater benefits from education. Specifically, significant disparities in educational returns are observed between the lower and middle quantiles, while the higher quantiles do not exhibit such differences.

Third, after the adjustment for measurement error, a positive ability bias is observed in the proxy model, in line with the homogeneous approach. Using the father's education as a surrogate for ability, a comparison between the levels model and the proxy model shows a general decrease in the coefficients, most noticeable when the regression includes all covariates. This ability bias increases markedly from the 0.4 quantile upward, though its magnitude differs across quantiles.
Assuming this bias is accurately adjusted for, the resulting estimates may be interpreted as upper bounds on the respective coefficients within the quantile analysis.

Fourth, both models confirm the presence of negative measurement error in the regressions, particularly when all specifications are accounted for; these errors are addressed through the instrumental variable. In the levels model the coefficients exhibit a moderate negative bias, which intensifies in the proxy model once ability is controlled for using a proxy variable. The higher quantiles in the proxy model nonetheless display a positive bias, attributable to the noisy estimates in the upper tails of the distribution. Although the magnitude of these errors varies across quantiles, the findings remain consistent with the homogeneous approach.

Fifth, the models indicate that the assumption of linearity in schooling does not hold. The introduction of additional covariates and the correction for measurement error and ability bias further underscore the absence of linearity, as substantiated by the $R^2$ values in Tables 1.3 and 1.4 and by the residual plots in the appendix. There is compelling evidence, reinforcing the view of Heckman et al. (2006), that linearity in the relationship between schooling and earnings cannot be taken as given.

Sixth, substantial heterogeneity is observed in the majority of covariates, with age and marital status the notable exceptions. In both models it has been possible to reject the null hypotheses for most covariates, indicating diverse impacts. For females, only the lower quantiles show slight heterogeneity, and there the coefficient is statistically insignificant relative to the other quantiles.

Several challenges raise concerns about the credibility of these findings. A critical issue is the weak identification stemming from the limited sample size, which is evident in the comparison between the asymptotic CIs and the finite sample CIs computed via the grid and marginal methods: the finite sample CIs often present considerably broader ranges, suggesting that reliance on asymptotic approximations may be misplaced and may lead to excessive rejection of null hypotheses in the Wald tests. Another complication arises from using the father's education as a proxy variable. Despite its strong association with the twins' education levels, the father's education does not perfectly capture all unobservable factors influencing earnings, and potential measurement error in this variable could compound the errors in the regression analysis. Furthermore, the very use of twins data introduces inherent selection biases, complicating the interpretation and generalization of the results.

Future research building on this paper could take several directions. A straightforward first step would be to increase the sample size, which may mitigate the weak identification observed in the regressions. Another avenue is the dynamics of schooling decisions over time.
Given that individuals accumulate income throughout their lives, decisions about education are made on the basis of the information and income expectations available at a given time, and that information evolves and becomes more comprehensive over time. Despite the data limitations and the inherent challenges of applying QR and IVQR methods to panel data, pursuing these aspects could yield valuable insights and contribute significantly to our understanding of education's role in earnings generation.

Chapter 2

Bayesian Dynamic Factor Augmented Structure Learning: Cross-sectional Dependence for Residuals[1]

2.1 Introduction

In response to the burgeoning volume of economic and financial data, econometricians have been compelled either to devise new methods or to modify existing ones so as to summarize the information in these extensive databases efficiently. One approach is to select a subset of variables, specifically the N variables of interest, to address the problem at hand. In addition, the development of techniques for analyzing cross-sectional dependencies has enabled the effective use of large datasets. A key consideration in cross-sectional analysis is the modeling and construction of cross-sectional dependence, including the distinction between strong and weak cross-sectional dependence and factors, as discussed in Chudik and Pesaran (2011); Chudik et al. (2011); Bailey et al. (2016, 2019a), and the selection of such factors, as in Kapetanios et al. (2021) and Freyaldenhoven (2022).

[1] I am especially grateful to my adviser Hashem Pesaran, and to Cheng Hsiao, for their continuous advice and support. All mistakes are my own.

The Vector Autoregressive (VAR) model, introduced by Mann and Wald (1943) and further developed by Sims (1980), serves as a foundational framework for analyzing cross-sectional dependencies. To avoid the complexities of high-dimensional data, VAR models typically include a limited number of variables, and researchers have innovated within the VAR framework in several ways to address this limitation. One strategy integrates a small set of estimated factors to capture the bulk of the information, as in the high-dimensional factor models of Stock and Watson (1999, 2002); Forni et al. (2001); Bai and Ng (2008, 2013), and the dynamic factor model (DFM) of Forni et al. (2000); Bai and Ng (2002); Bai (2003), and Hallin and Liška (2007). Another employs Bayesian estimation to handle a larger number of variables through imposed restrictions, as in De Mol et al. (2008) and Bai and Wang (2015). Merging these approaches, the factor-augmented VAR (FAVAR) model of Bernanke et al. (2005) and Beyeler and Kaufmann (2018) supplements standard VARs with estimated factors that summarize extensive economic information. The unobserved factors are identified using the principal components (PC) method, and the estimation proceeds in two stages to ensure that the unobserved factors exclude the information content of the observed ones: factors are first extracted from all variables, including the observed factors; then, in a regression-based step, these preliminary estimates are purged of the observed factors' information content, iterating until convergence, as described by Boivin and Giannoni (2008).
To prevent the estimated unobserved factors from being mere linear combinations of the observed ones, Bernanke et al. (2005) apply constraints to the leading square block of the factor loading matrix, estimating the unobserved factors via Bayesian Markov Chain Monte Carlo (MCMC) methods. However, factors estimated parametrically are harder to interpret, as they generally lack the clear meanings of those estimated by PC. Bai and Ng (2013) suggest that factor interpretation can be enhanced through rotation of the loadings and rearrangement of the series, whereas Bernanke et al. (2005) recommend assigning certain variables to specific factors under identification restrictions to facilitate economic interpretation. Moreover, correlation between the estimated factors and the lagged regressors hampers the accurate estimation of the factors by PC analysis. Miao et al. (2023) tackle this problem by applying a normalization that eliminates linear combinations of factors and lagged regressors, favoring regularized estimation to resolve these interpretive and methodological challenges.

Regularized estimation represents an alternative strategy for addressing challenges in the VAR model. It is grounded in the foundational work of Tibshirani (1996); Zhao and Yu (2006); Candes and Tao (2007), and Huang et al. (2008), whose methodologies have spurred an expanding body of research on high-dimensional autoregressive models and have significantly influenced the literature on Bayesian predictor selection. A key reason for employing predictor selection or regularization is to estimate coefficients in a way that alleviates collinearity among predictors, which, if not addressed, can produce inaccurate regression coefficients and effect sizes or signs that deviate from expected values (Winship and Western, 2016). For instance, the discrete mixture approach uses binary selection indicators $\gamma_i$ for variables $i = 1, 2, \ldots, N$, allowing the prior variances of the regression coefficients to be defined consistently with their effective inclusion in or exclusion from the model, as illustrated by George and McCulloch (1993); this selection mechanism is applied to all predictors except the intercept. The spike and slab prior, developed by Kuo and Mallick (1998), falls under Stochastic Search Variable Selection (SSVS) and illustrates the application of these ideas to variable selection and regularization in the VAR framework.

The concept of a shrinkage prior, by contrast, aims to shrink the regression coefficients without a formal mechanism for eliminating superfluous predictors, which offers potential advantages in MCMC sampling (Bhattacharya et al., 2015). An example is the Lasso prior, which places a heavy-tailed double exponential (Laplace) prior density on the regression coefficients; this methodology has been explored in the context of VAR models by Beyeler and Kaufmann (2018, 2021); Kaufmann and Schumacher (2019); Paci and Consonni (2020). Sparse factor modeling, which finds application in economics and in gene expression analysis, is represented by Boivin and Ng (2006) and by Carvalho et al. (2008); Bhattacharya and Dunson (2011), respectively.
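As a concrete illustration of the discrete-mixture idea, the sketch below draws coefficients from a stylized spike-and-slab prior; the inclusion probability and the spike and slab scales are arbitrary illustrative values, not choices made in this chapter.

```python
# Stylized spike-and-slab prior draw, in the spirit of George and McCulloch
# (1993) and Kuo and Mallick (1998); all hyperparameter values illustrative.
import numpy as np

rng = np.random.default_rng(2)
N = 50                                    # number of candidate predictors
p_incl, tau_spike, tau_slab = 0.2, 0.01, 1.0
gamma = rng.binomial(1, p_incl, size=N)   # binary selection indicators
beta = np.where(gamma == 1,
                rng.normal(0.0, tau_slab, size=N),   # slab: diffuse prior
                rng.normal(0.0, tau_spike, size=N))  # spike: near zero
```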
Our approach to estimating the dynamic factor-augmented VAR model employs a Bayesian estimation strategy which, in contrast to step-wise estimation methods, estimates the model components jointly through the Markov Chain Monte Carlo (MCMC) algorithm. Each iteration unfolds in two phases: the first estimates the factors and then the VAR residuals given those factors; the second performs graphical model selection on the estimated residuals, using fractional priors.

First, with the number of factors predetermined, we obtain initial estimates of the unobserved factors by principal component analysis. Starting from these initial factors and their loadings, we draw the factors with the forward-filtering backward-sampling algorithm within the Gibbs sampler. As its name suggests, this algorithm applies the Kalman filter forward to obtain the distribution of the final observation and then samples backward, from the end of the sequence to the beginning. From the sampled factors, we draw the coefficients of the factor dynamics within the conventional VAR framework, using the Normal-Inverse Wishart distribution with initial estimates derived from the VAR model. For the choice of hyperparameters we follow the methodologies outlined by Koop et al. (2010), Uhlig (2005), and Kilian and Lutkepohl (2017).

After the factors are estimated, we implement Bayesian graphical model selection on the residuals, once the estimated factors have been accounted for, within the framework of graphical VAR models. Using decomposable graphs that are Markov with respect to the estimated covariance matrix obtained through the preceding Bayesian steps, we show that the likelihood of a graphical VAR model can be decomposed into components corresponding to cliques and separators, as detailed by Paci and Consonni (2020). To calculate the marginal likelihood we employ the fractional Bayes factor, an approach introduced by O'Hagan (1995). Through the MCMC output we can then determine the inclusion probabilities of edges in the estimated graph, giving a detailed picture of the graphical model's structure.

The remainder of this paper is structured as follows. Section 2 outlines the model, its identification conditions, and their theoretical justification. Section 3 details the estimation methodology and explores its theoretical properties and derivations. Section 4 reports a Monte Carlo simulation assessing the performance of the estimation approach. Section 5 presents an empirical analysis of U.S. housing prices. Section 6 concludes. Additional proofs, derivations, and discussion are provided in the Appendix.

2.2 Model

We begin with a general framework for factor-augmented VAR analysis. Let $x_t$ be an $N \times 1$ vector of observable economic variables following a dynamic structure, and let $f_t$ be an $M \times 1$ vector of factors. The factors exhibit a dynamic structure distinct from that of $x_t$, specifically a VAR(p) process. The model is, for each time period $t = 1, 2, \ldots, T$,

$$x_t = \sum_{i=1}^{k} B_i x_{t-i} + \Lambda' f_t + \epsilon_t, \tag{2.2.1}$$

where $\epsilon_t \mid \Sigma \sim N_N(0, \Sigma)$ is a white noise process, independently distributed over time, and

$$f_t = \Phi_1 f_{t-1} + \cdots + \Phi_p f_{t-p} + \epsilon_t^f, \tag{2.2.2}$$

with $\epsilon_t^f \overset{i.i.d.}{\sim} N_M(0, \Sigma_f)$, where $\Lambda$ denotes an $M \times N$ matrix of factor loadings, the $B_i$ are $N \times N$ matrices of lag coefficients governing the dynamics of $x_t$, and the $\Phi_i$ are $M \times M$ matrices of VAR(p) coefficients. The dynamics of $x_t$ are driven by its own lagged values, establishing a dynamic relationship, whereas the connection between $x_t$ and $f_t$ is static, despite the dynamic nature of $f_t$. Our model presupposes known values of $k$, $p$, and $M$, allowing us to concentrate on modeling the cross-sectional dependencies. Although the model could be extended to include intercepts or other exogenous variables, such extensions are not explored in this paper. The analysis specifically addresses the case in which the factors are not directly observable.
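To fix ideas, the sketch below simulates a small version of (2.2.1)-(2.2.2) with k = 1 and p = 1; all dimensions and parameter values are illustrative, and the loading matrix is built with its upper M x M block equal to the identity, anticipating the identification restriction introduced in Section 2.2.1.

```python
# Simulate the model (2.2.1)-(2.2.2) with k = 1, p = 1; `Lam` plays the role
# of Lambda' (N x M). All parameter values are illustrative.
import numpy as np

rng = np.random.default_rng(3)
T, N, M = 300, 8, 2
B1 = 0.3 * np.eye(N)                                   # own-lag dynamics of x_t
Phi1 = np.array([[0.5, 0.1],
                 [0.0, 0.4]])                          # VAR(1) factor dynamics
Lam = np.vstack([np.eye(M),
                 rng.normal(0.0, 0.5, (N - M, M))])    # upper block = I_M

x = np.zeros((T, N))
f = np.zeros((T, M))
for t in range(1, T):
    f[t] = Phi1 @ f[t - 1] + rng.normal(0.0, 0.1, M)             # eq. (2.2.2)
    x[t] = B1 @ x[t - 1] + Lam @ f[t] + rng.normal(0.0, 0.1, N)  # eq. (2.2.1)
```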
Equation (2.2.1) can be rewritten in a more concise form. We can express $x_t$ as

$$x_t = B' z_t + \Lambda' f_t + \epsilon_t, \tag{2.2.3}$$

where $z_t = (x_{t-1}', \ldots, x_{t-k}')'$ is the $Nk \times 1$ vector of lagged observations at time $t$, and $B' = (B_1, \ldots, B_k)$ is the combined coefficient matrix of size $N \times Nk$. Expression (2.2.3) admits the (conditional) likelihood form

$$g(x_1, \ldots, x_T \mid \Lambda, B, \Sigma) = \prod_{t=1}^{T} g(x_t \mid z_t, \Lambda, B, \Sigma),$$

where $g(x_t \mid z_t, \Lambda, B, \Sigma)$ is a multivariate normal density:

$$x_t \mid z_t, \Lambda, B, \Sigma \sim N_N(B' z_t + \Lambda' f_t, \Sigma).$$

On this basis, we can reformulate (2.2.1) and (2.2.2) in matrix notation as

$$X = ZB + F\Lambda + E, \tag{2.2.4}$$

with $E \mid \Sigma \sim N_{T,N}(0, I_T, \Sigma)$,[2] and the factor dynamics modeled by

$$F_t = \boldsymbol{\Phi} F_{t-1} + v_t^f, \tag{2.2.5}$$

where $F_t$ stacks the factors from $t$ down to $t - p + 1$, $\boldsymbol{\Phi}$ is the companion block matrix, and $v_t^f$ is built from $\epsilon_t^f \overset{i.i.d.}{\sim} N_M(0, \Sigma_f)$ and zero blocks, as shown:

$$F_t = \begin{pmatrix} f_t \\ f_{t-1} \\ f_{t-2} \\ \vdots \\ f_{t-p+1} \end{pmatrix}, \qquad
\boldsymbol{\Phi} = \begin{pmatrix} \Phi_1 & \Phi_2 & \cdots & \Phi_{p-1} & \Phi_p \\ I_M & 0 & \cdots & 0 & 0 \\ 0 & I_M & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & I_M & 0 \end{pmatrix}, \qquad
v_t^f = \begin{pmatrix} \epsilon_t^f \\ 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}.$$

This transformation reduces the VAR(p) structure to a more manageable VAR(1) model.[3]

[2] The notation $X \mid B \sim N_{a,b}(0, A, B)$ is equivalent to $\mathrm{vec}(X) \mid B \sim N_{ab}(0, B \otimes A)$; in other words, $A$ is an $(a \times a)$ row-wise covariance matrix, and $B$ is a $(b \times b)$ column-wise covariance matrix.

[3] This form is needed for the Kalman filter, since it renders the state vector $f_t$ Markov.
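The companion-form construction in (2.2.5) amounts to the following; Phi_list is assumed to hold the M x M matrices Phi_1, ..., Phi_p.

```python
# Build the VAR(p) -> VAR(1) companion matrix of (2.2.5).
import numpy as np

def companion_matrix(Phi_list):
    M, p = Phi_list[0].shape[0], len(Phi_list)
    top = np.hstack(Phi_list)                        # [Phi_1 ... Phi_p], M x Mp
    if p == 1:
        return top
    shift = np.hstack([np.eye(M * (p - 1)),
                       np.zeros((M * (p - 1), M))])  # identity shift block
    return np.vstack([top, shift])
```

For example, companion_matrix([Phi1, Phi2]) returns the 2M x 2M block matrix above for p = 2.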
When dealing with unobservable factors within our model, we encounter challenges related to identification and estimation. To address these concerns, especially in the context of Equations (2.2.1)-(2.2.2), we need to explore methods for incorporating the impact of these unobservable factors. One possible approach is to model the error term so that it absorbs the factor specification. Specifically, when setting $k = 1$, the model simplifies to
$$x_t = B' z_t + \epsilon_t, \qquad (2.2.6)$$
with the error term decomposed as
$$\epsilon_t = \Lambda' f_t + u_t, \quad t = 1, 2, \ldots, T,$$
where $\epsilon_t$ is the $N \times 1$ error term of the model and $u_t$ is an $N \times 1$ vector of innovations with $u_t \mid \Sigma \sim N_N(0, \Sigma)$. For clarity and focus, we adhere to the expression in (2.2.6). We examine the identification problems here and address the estimation problem in Section 3.

2.2.1 Factor Identification

Bai and Wang (2015) introduced a minimal set of identifying restrictions specifically designed to distinguish dynamic factors, rather than static factors or the corresponding factor space. Their model is
$$X_t = \Lambda_0 f_t + \Lambda_1 f_{t-1} + \cdots + \Lambda_s f_{t-s} + \epsilon_t,$$
where the factor dynamics are $f_t = \Phi_1 f_{t-1} + \cdots + \Phi_h f_{t-h} + \epsilon_t^f$, and both $\epsilon_t$ and $\epsilon_t^f$ follow normal distributions (the subscript 0 in their notation does not denote "true" values). Despite the differences between their model and ours, adapting their identification conditions to our framework remains feasible. They established two sets of equivalent identification conditions within their settings. We adopt their second type of conditions, similar to those utilized by Bernanke et al. (2005). The primary distinction lies in the nature of the relationship between $x_t$ and $f_t$: it is dynamic in Bai and Wang (2015), whereas Bernanke et al. (2005) describe a static relationship. In both studies, however, $f_t$ itself is treated as dynamic. Our model aligns more closely with the static relationship perspective. The identification restriction is detailed below.

Assumption 1. The upper $M \times M$ block of $\Lambda'$ in (2.2.3) is an identity matrix; that is, for $M < N$,
$$\Lambda' = \begin{pmatrix} I_M \\ \Lambda'_{M+1:N} \end{pmatrix}.$$

Assumption 1 restricts the factor loadings only, leaving the factors completely unrestricted. These $M^2$ restrictions imply that the first $M$ factors solely affect the first $M$ of the $N$ variables. Under Assumption 1, addressing the rotation indeterminacy (also known as fundamental indeterminacy) necessitates restrictions on the factors and their corresponding coefficients. To illustrate, consider a full-rank $M \times M$ rotation matrix $A$. Premultiplying the dynamic factors $f_t$ by $A$ and postmultiplying the loading matrix $\Lambda'$ by $A^{-1}$ yields new rotated factors $\tilde f_t = A f_t$. Recalling Equation (2.2.2), this transformation gives
$$\tilde f_t = A\Phi_1 f_{t-1} + \cdots + A\Phi_p f_{t-p} + A\epsilon_t^f = A\Phi_1 A^{-1}\tilde f_{t-1} + \cdots + A\Phi_p A^{-1}\tilde f_{t-p} + A\epsilon_t^f.$$
Substituting $f_t$ into Equation (2.2.3), we obtain
$$x_t = \Lambda' f_t + B' z_t + \epsilon_t = \Lambda' A^{-1} \tilde f_t + B' z_t + \epsilon_t.$$
This demonstrates that, merely by observing $x_t$, we cannot differentiate between the two sets of factors. To enforce the condition $f_t = \tilde f_t$, implying $A = I_M$, it is necessary to implement restrictions such as setting the upper $M \times M$ block of $\Lambda'$ to an identity matrix. Since $f_t$ and $\tilde f_t$ span the same space, this normalization preserves the information content of the estimated factors and loadings, leading to the following proposition.

Proposition 1. Consider the model specifications in Equations (2.2.1) and (2.2.2). Under Assumption 1, the factors $f_t$ and factor loadings $\Lambda$ are uniquely identified.
2.2.2 Graphical VAR

We establish a link between the factor-augmented dynamic panel data model and the graphical framework, adhering closely to Bayesian graphical model selection. In our setup, which involves a double array of random variables $x_{it}$, we construct a graph $G = (V_{NT}, E)$, where $V_{NT} = V \times \mathbb{Z}$ is a finite set of vertices and $E$ denotes the set of edges, i.e., the ordered pairs of distinct vertices within $V_{NT} \times V_{NT}$. In line with equations (2.2.2)-(2.2.5), our model focuses on the structure of contemporaneous dependence, allowing for $N$ potential vertices (nodes) and $N(N-1)/2$ possible active edges. To reflect the dynamic nature of $x_{it}$, we assume translation invariance of the edge set $E$ up to $k$ lags.

This framework allows us to integrate into our model the restrictions proposed by Eichler (2007, 2012) for the general (non-linear) graphical VAR model, which stipulate:

(a) An edge $(m, t) - (a, t)$ exists in $E$ if and only if the element $(\Lambda)_{ma}$ is non-zero, for all $t = 1, 2, \ldots, T$;
(b) A directed edge $(a, t-i) \to (b, t)$ is present in $E$ if and only if the element $(B_i)_{ab}$ is non-zero, for all $i = 1, 2, \ldots, k$;
(c) An edge $(a, t) - (b, t)$ exists in $E$ if and only if the element $(\Omega)_{ab}$ is non-zero, for all $t = 1, 2, \ldots, T$;

where $(\Lambda)_{ma}$, $(B_i)_{ab}$, and $(\Omega)_{ab}$ refer to elements of the matrices $\Lambda$, $B_i$, and $\Omega = \Sigma^{-1}$, respectively, as outlined in (2.2.1). The constraint on $B_i$ implies that non-zero entries indicate directed edges within the graph, representing the dynamic structure of the time series through the lag coefficient matrices. Conversely, the conditions on $\Lambda$ and $\Omega$ indicate that the absence of an edge between two nodes corresponds to a zero value in the matrix. The precision matrix $\Omega$, denoting conditional dependence, and the factor loading matrix $\Lambda$, capturing the widespread influence of common factors across nodes, play pivotal roles in this graphical interpretation. (By combining restrictions (a) and (c), one can also construct factor graphs featuring two distinct types of nodes, variable nodes and factor nodes, with edges permitted only between the two types; excluding the factor nodes then yields an induced Markov network, though this application is not the focus of our discussion.)

In our exploration of the graphical VAR model, denoted VAR(k, $G^u$), we define $G^u = (V, E^u)$, an undirected graph reflecting only restriction (c), thereby centering our analysis on contemporaneous dependencies among residuals. Our focus narrows specifically to decomposable graphs. Given that this model serves as the graphical interaction framework for the multivariate normal distribution, we can align it with the Gaussian Graphical Model, or Covariance Selection Model, for $X$, provided $X$ follows a multivariate normal distribution and satisfies the undirected pairwise Markov property relative to $G^u$, as detailed by Lauritzen (1996). In this context, the precision matrix $\Omega$ is also Markov relative to $G^u$. By the separation theorem (see Koller and Friedman, 2009, for a complete proof), the distribution then satisfies the global Markov property and, provided the density is positive and continuous, it factorizes as per Whittaker (2009).

We proceed under the assumption that the graph $G^u$ is triangulated, meaning it contains no chordless cycles of more than three vertices. Triangulation is inherited under graph decomposition, so every component resulting from a decomposition remains triangulated. Consequently, the random vector $X$ is decomposable according to the Triangulation theorem (see Whittaker, 2009, for a complete proof).
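The mapping from restriction (c) to the undirected graph $G^u$, together with the triangulation check, can be sketched as follows; the precision matrix below is a hypothetical example that is Markov with respect to a path graph.

```python
# A sketch of restriction (c): build G^u from the nonzero pattern of a
# hypothetical precision matrix Omega and verify it is decomposable
# (chordal, i.e., no chordless cycle of more than three vertices).
import numpy as np
import networkx as nx

Omega = np.array([[2.0, 0.5, 0.0, 0.0],
                  [0.5, 2.0, 0.4, 0.0],
                  [0.0, 0.4, 2.0, 0.3],
                  [0.0, 0.0, 0.3, 2.0]])    # illustrative values only

N = Omega.shape[0]
G = nx.Graph()
G.add_nodes_from(range(N))
for i in range(N):
    for j in range(i + 1, N):
        if Omega[i, j] != 0:                # edge (i, j) iff (Omega)_ij != 0
            G.add_edge(i, j)

print(nx.is_chordal(G))                     # True for this path-graph pattern
```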
2.3 Bayesian Estimation

This section examines an estimation method for the model under the assumption that the VAR orders $k$ and $p$ are predetermined. While these parameters can in practice be estimated using existing data-driven methods, our primary focus here is on the estimation of the model itself rather than on determining these parameters.

2.3.1 Factor Estimation

We consider the Bayesian estimation of our model, adhering to the identification condition previously discussed. The process resembles the estimation approach used in typical dynamic factor models, starting with the estimation of the principal components of the unobservable factors from $x_t$. With the number of factors $M$ given, we extract $M$ principal components and their corresponding eigenvectors as a foundational step. Following the normalization and restrictions outlined in Section 2, we estimate the initial factors $\tilde F$ and initial residuals $\tilde E$. Let $F$ and $L_f$ be the extracted principal components and eigenvectors, with dimensions $T \times M$ and $N \times M$, respectively. To impose the identification condition in Assumption 1, we apply a QR decomposition to $L_f'$, giving $L_f' = Q_L R_L$, where $Q_L$ is an $M \times M$ orthonormal matrix and $R_L$ is an $M \times N$ upper-triangular matrix. This decomposition yields
$$\tilde F = F Q_L R_{1:M,L}, \qquad \tilde L_f = \left( R_{1:M,L},\ R_{(M+1):N,L} \right)' R_{1:M,L}^{-1\prime} = \left( I_M,\ R_{1:M,L}^{-1} R_{(M+1):N,L} \right)',$$
and the initial residuals as
$$\tilde E = X - Z\hat B - \tilde F \tilde L_f' = X - Z\hat B - F Q_L R_{1:M,L}\left( I_M,\ R_{1:M,L}^{-1} R_{(M+1):N,L} \right) = X - Z\hat B - F Q_L R_L = X - Z\hat B - F L_f',$$
where $Z$ is defined in Equation (2.2.4) and $\hat B = (Z'Z)^{-1}Z'(X - \tilde F \tilde L_f')$. This shows that the normalization, even with the identification restriction, does not compromise the estimation of $\tilde E$. Once the identification condition is applied, $\tilde L_f$ is redefined as $\hat\Lambda$, which is then used in the estimation of $\tilde E$.

With $\tilde F$ and the number of factor lags $p$, we proceed to estimate the model specified in Equation (2.2.5), obtaining initial values for $\hat\Phi$ and $\hat\Sigma_f$; this is a straightforward VAR(p) model in the factors. Using these initial values, we employ the Gibbs sampler in conjunction with the forward-filtering backward-sampling algorithm, as outlined by Carter and Kohn (1994) and Frühwirth-Schnatter (1994), to draw the factors. This allows estimation of the parameter $\Phi$ based on VAR priors, conditional on the drawn factors. Specifically, our objective is to generate $F_t$ from conditional Gaussian densities. The factorization
$$p(F_{1:T} \mid X) = p(F_T \mid X)\prod_{t=1}^{T-1} p(F_t \mid x_{1:t}, F_{t+1})$$
leverages the multivariate normality of both $\epsilon_t$ and $\epsilon_t^f$. The process thus begins by drawing $F_T$ from $p(F_T \mid X)$, followed by sequentially drawing $F_t$ from $p(F_t \mid x_{1:t}, F_{t+1})$ for $t = T-1, T-2, \ldots, 1$. Two passes are employed to obtain these draws. The first, referred to as the "up" pass, estimates the filtered conditional expectations $E(F_t \mid x_{1:t})$ and variances $\mathrm{Var}(F_t \mid x_{1:t})$ for $t = 1, 2, \ldots, T$, which are required for generating $F_T$ from the corresponding multivariate normal distribution. With $F_T$ established, the "down" pass estimates $E(F_t \mid x_{1:t}, F_{t+1})$ and $\mathrm{Var}(F_t \mid x_{1:t}, F_{t+1})$ for $t = T-1, T-2, \ldots, 1$, yielding the multivariate normal distributions from which each $F_t$ is drawn. This estimation treats the equation $F_{t+1} = \Phi F_t + v_{t+1}^f$ as $Mp$ additional observations on the state vector $F_t$, applying the Kalman filter accordingly. For an in-depth exploration of the algorithm, see Appendix 1 of Carter and Kohn (1994).
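The two passes can be sketched for a generic linear-Gaussian state space $y_t = H s_t + e_t$, $s_t = \Phi s_{t-1} + v_t$, as below; the initialization and the assumption of a nonsingular state innovation covariance $Q$ (as when $p = 1$) are simplifications relative to the companion-form treatment described in the text.

```python
# A minimal forward-filtering backward-sampling (FFBS) sketch in the
# spirit of Carter and Kohn (1994); H, Phi, R, Q are assumed known and
# Q is assumed nonsingular.
import numpy as np

def ffbs(y, H, Phi, R, Q, rng):
    T = y.shape[0]
    M = Phi.shape[0]
    m = np.zeros((T, M))                 # filtered means E(s_t | y_{1:t})
    P = np.zeros((T, M, M))              # filtered covariances
    m_pred, P_pred = np.zeros(M), Q.copy()   # simple initialization (assumption)
    for t in range(T):                   # forward Kalman filter ("up" pass)
        if t > 0:
            m_pred = Phi @ m[t - 1]
            P_pred = Phi @ P[t - 1] @ Phi.T + Q
        S = H @ P_pred @ H.T + R
        K = P_pred @ H.T @ np.linalg.inv(S)
        m[t] = m_pred + K @ (y[t] - H @ m_pred)
        P[t] = P_pred - K @ H @ P_pred
    s = np.zeros((T, M))                 # backward sampling ("down" pass)
    s[T - 1] = rng.multivariate_normal(m[T - 1], P[T - 1])
    for t in range(T - 2, -1, -1):
        J = P[t] @ Phi.T @ np.linalg.inv(Phi @ P[t] @ Phi.T + Q)
        mu = m[t] + J @ (s[t + 1] - Phi @ m[t])
        V = P[t] - J @ Phi @ P[t]
        s[t] = rng.multivariate_normal(mu, V)
    return s
```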
To draw $\Phi$ from the established factor draws, we first address the standard VAR(p) problem to determine priors for the hyperparameters, utilizing the Bayesian VAR methodology in conjunction with the Normal-Inverse Wishart (NIW) distribution. Rewriting the VAR(p) model with normal innovations in Equation (2.2.5) in matrix form,
$$F = F_{-1}\Phi + v^f, \qquad v^f = (v_1^f, v_2^f, \ldots, v_T^f)' \sim N_{T,M}(0, I_T, \Sigma_f),$$
where $F_{-1}$ is a $T \times Mp$ matrix comprising the lagged factors. The priors are then established as
$$\Phi \mid \Sigma_f \sim N_{Mp,M}(0, \Sigma_\Phi, \Sigma_f), \qquad \Sigma_f \sim IW_M(S, M+2),$$
where $S$ denotes an $M \times M$ scale matrix and $M+2$ specifies the degrees of freedom of the Inverse-Wishart (IW) distribution. Employing these priors, the posterior is a matrix normal-inverse Wishart (MNIW) distribution, $MNIW(\bar\Phi, \bar\Sigma_\Phi, \bar\tau, \bar S)$, whose parameters are defined as (the analysis of this prior is tantamount to examining the natural conjugate Gaussian-Inverse Wishart prior):
$$\bar\Phi = \left(\Sigma_\Phi^{-1} + F_{-1}'F_{-1}\right)^{-1} F_{-1}'F_{-1}\hat\Phi, \qquad \hat\Phi = \left(F_{-1}'F_{-1}\right)^{-1}F_{-1}'F,$$
$$\bar\Sigma_\Phi = \left(\Sigma_\Phi^{-1} + F_{-1}'F_{-1}\right)^{-1} \otimes \Sigma_f, \qquad \bar\tau = T + M + 2,$$
and $\bar S$ and $\hat\Sigma_f$ are calculated from the derived parameters and matrices as
$$\bar S = \hat\Phi' F_{-1}'F_{-1}\hat\Phi + S^{-1} + (T-p)\hat\Sigma_f - \bar\Phi'\left(\Sigma_\Phi^{-1} + F_{-1}'F_{-1}\right)^{-1}\bar\Phi, \qquad \hat\Sigma_f = \left(F - F_{-1}\hat\Phi\right)'\left(F - F_{-1}\hat\Phi\right).$$
This Bayesian VAR approach, incorporating Normal-Inverse Wishart priors, allows for a nuanced estimation of $\Phi$ and $\Sigma_f$. For a comprehensive discussion of the derivation and selection of these parameters, see Uhlig (2005), Koop et al. (2010), and Kilian and Lütkepohl (2017). A later section details the integration of the estimated factors and loadings to compute fractional Bayes factors for the residuals.
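One conjugate posterior draw of $(\Phi, \Sigma_f)$ can be sketched as follows; the prior hyperparameters (`Sigma_Phi_inv`, `S_prior`, `nu0`) are user-chosen assumptions, and the prior mean of $\Phi$ is taken to be zero as in the text.

```python
# A sketch of one Normal-Inverse-Wishart posterior draw for the factor
# VAR written as F = F_lag @ Phi + v^f, with zero prior mean for Phi.
import numpy as np
from scipy.stats import invwishart

def draw_phi_sigma(F, F_lag, Sigma_Phi_inv, S_prior, nu0, rng):
    T = F.shape[0]
    XtX = F_lag.T @ F_lag
    Phi_hat = np.linalg.solve(XtX, F_lag.T @ F)     # OLS estimate
    V_post = np.linalg.inv(Sigma_Phi_inv + XtX)     # posterior row covariance
    Phi_bar = V_post @ XtX @ Phi_hat                # posterior mean
    resid = F - F_lag @ Phi_bar
    S_post = S_prior + resid.T @ resid + Phi_bar.T @ Sigma_Phi_inv @ Phi_bar
    Sigma_f = invwishart.rvs(df=nu0 + T, scale=S_post, random_state=rng)
    # matrix-normal draw: Phi = Phi_bar + chol(V_post) Z chol(Sigma_f)'
    Z = rng.standard_normal(Phi_bar.shape)
    Phi = Phi_bar + np.linalg.cholesky(V_post) @ Z @ np.linalg.cholesky(Sigma_f).T
    return Phi, Sigma_f
```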
2.3.2 Objective Bayesian Inference

Objective Bayesian inference advocates the use of non-informative (or uninformative) priors to reduce the influence of subjective prior distributions in likelihood-based analyses; the term "uninformative" is somewhat misleading, as such priors are not entirely devoid of information. This approach includes the use of uniform, or flat, prior distributions for parameters of interest. A more refined method, proposed by Jeffreys, is the Jeffreys prior, derived from the square root of the Fisher information matrix. This prior is particularly valued because it closely approximates the variance of the maximum likelihood estimate, making it a common choice in Bayesian research (Efron and Hastie, 2021). Various types of non-informative priors, such as intrinsic priors (Berger and Pericchi, 1996), fractional Bayes factors (O'Hagan, 1995), and reference priors (Bernardo, 1979), have been documented extensively, for instance in Kass and Wasserman (1995). The objective Bayesian approach, facilitated by advances in computational techniques, has emerged as a leading methodology in theoretical research domains. A significant application is data-based model selection. Here, we compare models related to parameters, denoted $M_i(\theta_i)$ for $i = a, b$.

Given the data $X$, the posterior odds in favor of model $M_a$ over model $M_b$ are
$$\frac{p(M_a \mid X)}{p(M_b \mid X)} = \frac{p(M_a)}{p(M_b)}\,\frac{q_a(X)}{q_b(X)} = \frac{p(M_a)}{p(M_b)}\,B(X), \qquad (2.3.1)$$
where $B(X)$ represents the Bayes factor and the posterior odds are the ratio of posterior probabilities. The marginal density of $X$ under each model $i$ is $q_i(X) = \int \pi_i^D(\theta_i) f_i(X \mid \theta_i)\,d\theta_i$ for $i = a, b$. Essentially, the posterior odds ratio is the product of the prior odds ratio and the Bayes factor, making the Bayes factor the essential updating term in the comparison. This method of comparison is important for defining relationships within the constructed graph. Importantly, the Bayes factor is not influenced by the prior probabilities of the models but rather by the prior densities $\pi_i(\theta_i)$, which must be proper (Consonni et al., 2018). To ensure propriety, we integrate fractional Bayes factors into our posterior distribution and graphical model comparisons.

2.3.2.1 Fractional Bayes Factor

The Fractional Bayes Factor (FBF) is a refinement of the Partial Bayes Factor (PBF) used in objective Bayesian model selection. Both address the issue of indeterminate Bayes factors arising from improper priors by dividing the dataset into a training sample and a validation sample: the training sample is used to construct posterior distributions for the parameters, while the validation sample is used to calculate Bayes factors. Unlike the PBF, which lacks a definitive methodology for selecting and averaging the training sample, the FBF offers a more systematic approach: O'Hagan (1995) suggests approximating the training likelihood by raising the full likelihood to the power $\delta = T_0/T$, where $T_0$ is the minimal training sample size and $T$ the total sample size.

Consider the model comparison with aggregate data $X = (X_1, X_2)$, where $X_1$ comprises $T_0$ observations and $X_2$ the remaining $T - T_0$ observations. The PBF formula, for models $i = a, b$, is $B(X) = B(X_1)B(X_2 \mid X_1)$, where
$$B(X_2 \mid X_1) = \frac{q_a(X_2 \mid X_1)}{q_b(X_2 \mid X_1)} = \frac{\int \pi_a^D(\theta_a \mid X_1)\, f_a(X_2 \mid \theta_a, X_1)\,d\theta_a}{\int \pi_b^D(\theta_b \mid X_1)\, f_b(X_2 \mid \theta_b, X_1)\,d\theta_b} \qquad (2.3.2)$$
is calculated as the ratio of the likelihoods conditioned on $X_1$ for models $a$ and $b$, and
$$q_i(X_2 \mid X_1) = \frac{\int \pi_i^D(\theta_i)\, f_i(X \mid \theta_i)\,d\theta_i}{\int \pi_i^D(\theta_i)\, f_i(X_1 \mid \theta_i)\,d\theta_i} = \frac{q_i(X)}{q_i(X_1)}.$$
The posterior distribution of $\theta_i$ given $X_1$,
$$\pi_i(\theta_i \mid X_1) = \frac{\pi_i^D(\theta_i)\, f_i(X_1 \mid \theta_i)}{q_i(X_1)},$$
adjusts the prior based on the observed training sample, where $\pi_i^D(\theta_i)$ is an uninformative prior density for $i = a, b$.

For a fractional parameter $\delta = T_0/T$ within the interval $(0, 1)$, the Bayes factor between models $a$ and $b$, $B_{ab}(X)$, can be represented as the ratio of the fractional marginal likelihoods of the two models adjusted by the fractional prior:
$$B_{ab}(X) = \frac{q_a(\delta, X)}{q_b(\delta, X)} = \frac{\int \pi_a^D(\theta_a) f_a(X\mid\theta_a)\,d\theta_a \big/ \int \pi_a^D(\theta_a) f_a^\delta(X\mid\theta_a)\,d\theta_a}{\int \pi_b^D(\theta_b) f_b(X\mid\theta_b)\,d\theta_b \big/ \int \pi_b^D(\theta_b) f_b^\delta(X\mid\theta_b)\,d\theta_b} = \frac{\int \pi_a^F(\theta_a) f_a^{1-\delta}(X\mid\theta_a)\,d\theta_a}{\int \pi_b^F(\theta_b) f_b^{1-\delta}(X\mid\theta_b)\,d\theta_b} = \frac{f_a^F(X)}{f_b^F(X)}, \qquad (2.3.3)$$
where $f_i^F(X)$ is the fractional marginal likelihood of model $M_i$ and $\pi_i^F(\theta_i) \propto f_i^\delta(X\mid\theta_i)\,\pi_i^D(\theta_i)$ is the induced fractional prior for $\theta_i$. We employ the FBF as described in (2.3.3) to rationalize the application of uninformative priors for the model delineated in (2.2.4)-(2.2.6). In developing the FBF for our model and its posterior distribution, we align with the methodology of Paci and Consonni (2020).
Our starting point is the uninformative prior
$$\pi^D(\Lambda, B, \Sigma) \propto |\Sigma|^{-\frac{a_D + N + 1}{2}}, \qquad (2.3.4)$$
which coincides with the Jeffreys prior when $a_D = 0$. Using Equation (2.3.4), we derive the fractional prior for the model specified in (2.2.4), adopting the Matrix Normal-Inverse Wishart (MNIW) distribution due to its conjugacy with the VAR model. This is denoted $MNIW(\Psi, \Xi, \zeta, R)$, with $\Lambda, B \mid \Sigma \sim N_{M+kN,N}(\Psi, \Xi, \Sigma)$ and $\Sigma \sim IW_N(\zeta, R)$. The probability density function of the inverse Wishart distribution is
$$p(\Sigma) \propto |\Sigma|^{-\frac{\zeta+N+1}{2}} \exp\left\{-\tfrac{1}{2}\mathrm{tr}(\Sigma^{-1}R)\right\},$$
where
$$\Psi = \begin{pmatrix}\hat\Lambda \\ \hat B\end{pmatrix}, \quad \Xi = \frac{T}{T_0}\begin{pmatrix}\tilde F'\tilde F & \tilde F'Z \\ Z'\tilde F & Z'Z\end{pmatrix}^{-1}, \quad \zeta = a_D - (M + kN) + T_0, \quad R = \frac{T_0}{T}\tilde E'\tilde E.$$
Building on (2.2.5) and (2.3.4), we then formulate the following fractional density (for a detailed derivation, refer to Appendix B):
$$\pi^F(\Lambda, B, \Sigma) \propto C(\Xi, R, \zeta)\,|\Sigma|^{-\frac{a_D + T_0 - (M+kN) + N + 1}{2}} \exp\left\{-\frac{T_0}{2T}\mathrm{tr}\left[\Sigma^{-1}\tilde\Theta(\hat\Lambda, \hat B, \tilde E)\right]\right\}, \qquad (2.3.5)$$
where $C(\Xi, R, \zeta) = (2\pi)^{-\frac{N(M+kN)}{2}}\,|\Xi|^{-\frac{N}{2}}\,|R/2|^{\frac{\zeta}{2}}\,\Gamma_N(\zeta/2)^{-1}$, and
$$\tilde\Theta(\hat\Lambda, \hat B, \tilde E) = \begin{pmatrix}\tilde\Lambda - \hat\Lambda \\ B - \hat B\end{pmatrix}'\begin{pmatrix}\tilde F'\tilde F & \tilde F'Z \\ Z'\tilde F & Z'Z\end{pmatrix}\begin{pmatrix}\tilde\Lambda - \hat\Lambda \\ B - \hat B\end{pmatrix} + \tilde E'\tilde E,$$
with $\Gamma_N(\cdot)$ being the $N$-dimensional multivariate gamma function. The objective of the FBF is to transform an uninformative (improper) prior into a proper one, thus ensuring that (2.3.5) is proper. This necessitates additional assumptions, summarized below.

Assumption 2. The fractional prior density in (2.3.5) integrates to 1 (is proper) if it satisfies: (i) $a_D + T_0 - (M + Nk) > N - 1$; (ii) $T - (M + Nk) > N - 1$.

Condition (i) of Assumption 2 ensures that the (inverse) Wishart prior for $\Sigma$ is not improper, based on the requirement that the degrees of freedom $\zeta$ exceed $N-1$, i.e., $\zeta > N-1$. For instance, when employing the Jeffreys prior ($a_D = 0$), the necessary condition is $T_0 > N + M + Nk - 1$. To achieve the minimal $\delta$, it is optimal to set $T_0 = M + Nk + 1$ and $a_D = N - 1$, resulting in degrees of freedom $\zeta = N$. Condition (ii) guarantees that $\tilde E'\tilde E$ is positive definite: the sample covariance matrix $\tilde E'\tilde E$ adheres to the Wishart distribution with $N-1$ degrees of freedom, requiring a sample size exceeding this degree of freedom, as discussed by Mardia et al. (1979).

From Equation (2.3.5), we can deduce the corresponding posterior distribution, which is MNIW with updated hyperparameters, $MNIW(\bar\Psi, \bar\Xi, \bar\zeta, \bar R)$, where $\bar\Psi = \Psi$, $\bar\Xi = (T_0/T)\Xi$, $\bar\zeta = \zeta - T_0 + T$, and $\bar R = \tilde E'\tilde E$. Following the methodologies of Villani (2001) and Paci and Consonni (2020), we can derive the closed form of the fractional marginal likelihood of the model, valid up to a multiplicative factor (for an exhaustive derivation, refer to Appendix B). Importantly, the ratio of the prior and posterior normalizing constants equals the marginal likelihood, attributable to the conjugacy between the multivariate normal and inverse Wishart distributions. Specifically,
$$q^F\!\left(\frac{T_0}{T}, X\right) = \pi^{-\frac{N(T-T_0)}{2}}\left(\frac{T_0}{T}\right)^{\frac{N(a_D+T_0)}{2}}\left|\tilde E'\tilde E\right|^{-\frac{T-T_0}{2}}\frac{\Gamma_N\!\left((a_D-(M+Nk)+T)/2\right)}{\Gamma_N\!\left((a_D-(M+Nk)+T_0)/2\right)}. \qquad (2.3.6)$$

To adapt our results for graphical estimation, it is necessary to focus on submatrices of the matrices we have found, as each column encapsulates information on one of the $N$ variables. For example, let $X_S$ be a $T \times |S|$ submatrix of $X$, where $|S|$ denotes the cardinality of the set $S$ of variables selected from the $N$ variables. Equation (2.3.6) can then be modified to calculate the fractional marginal likelihood of $X_S$ by replacing $N$, $a_D$, and $\tilde E$ with $|S|$, $a_D - |S^c|$, and $\tilde E_S = X_S - \tilde F\hat\Lambda_S - Z\hat B_S$, respectively. Here $|S^c|$ is the cardinality of the complementary set $S^c = N \setminus S$. Note that $\hat B_S$ is an $Nk \times |S|$ submatrix of $\hat B$ and $\hat\Lambda_S$ is an $M \times |S|$ submatrix selected from $\hat\Lambda = (I_M, \hat\Lambda_{M+1:N})$, the $M \times N$ estimates from the multivariate regression conforming to the identification condition in Assumption 1. As shown in condition (ii) of Assumption 2, for $\tilde E_S'\tilde E_S$ to be positive definite, $|S|$ must be less than $T - Nk + 1$. In our simulations and empirical applications, we set $a_D = N - 1$ and $T_0 = Nk + 1$.
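A sketch of the log of (2.3.6) for a variable subset $S$ is given below; it follows my reconstruction of the garbled formula above, so the exponent on $T_0/T$ should be checked against the original derivation, and the caller is assumed to pass the already-adjusted $a_D$ (i.e., $a_D - |S^c|$ when working with a subset).

```python
# A sketch of the log fractional marginal likelihood in (2.3.6),
# applied to a T x |S| residual submatrix E_S; a_D here is the
# (already subset-adjusted) prior hyperparameter.
import numpy as np
from scipy.special import multigammaln

def log_fml(E_S, a_D, M, N, k, T0):
    T, s = E_S.shape                      # s = |S|
    d = a_D - (M + N * k)                 # shifted degrees of freedom
    _, logdet = np.linalg.slogdet(E_S.T @ E_S)
    return (-0.5 * s * (T - T0) * np.log(np.pi)
            + 0.5 * s * (a_D + T0) * np.log(T0 / T)
            - 0.5 * (T - T0) * logdet
            + multigammaln(0.5 * (d + T), s)
            - multigammaln(0.5 * (d + T0), s))
```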
2.3.3 Graphical Estimation

In this section, we expand upon the findings of the previous sections by applying them to graphical analysis, specifically the graphical VAR in the context of decomposable graphs, which are characterized by factorable probability densities. Although our investigation primarily targets decomposable graphs, adapting our model to fit actual data is straightforward and effective, as noted by Fitch et al. (2014). A key component of our analysis involves the graph's maximal cliques (the terminology we adopt) and separators, used to examine the graph's decomposability; recall that decomposability rules out chordless cycles of more than three vertices.

Let $\mathcal{C}$ represent the set of cliques and $\mathcal{S}$ the set of separators within the undirected graph $G^u$. Consequently, we can define the (conditional) likelihood, now conditioned on the graph $G^u$:
$$g(x_1, \ldots, x_T \mid \Lambda, B, \Sigma, G^u) = \frac{\prod_{c \in \mathcal{C}} g(X_c \mid \Lambda_c, B_c, \Sigma_{cc})}{\prod_{s \in \mathcal{S}} g(X_s \mid \Lambda_s, B_s, \Sigma_{ss})}, \qquad (2.3.7)$$
where $\Sigma_{\alpha\alpha}$ denotes the submatrix of $\Sigma$ corresponding to the variables in subset $\alpha$, for each clique and separator. This equation serves as the foundation for factorization in the context of the graphical VAR(k, $G^u$) (see Appendix B for a derivation).

Building on the principles outlined in Section 2.2, we can derive the marginal likelihood for the graph $G^u$ when applying a fractional prior within the context of VAR(k, $G^u$). The process begins by assuming an uninformative prior for the unconstrained parameters $B$ and $\Sigma$, where $\Sigma^{-1}$ is designed to be Markov with respect to $G^u$. The justification for employing an uninformative prior is rooted in the need to align with the conjugate prior framework used in the Bayesian analysis of decomposable Gaussian graphical models. This alignment is facilitated through the adoption of a hyper Markov law, as detailed by Dawid and Lauritzen (1993); such laws encompass various distributions, including the hyper multinomial, hyper Dirichlet, and both hyper Wishart and inverse Wishart laws. Our focus narrows to the hyper inverse Wishart (HIW) distribution due to its conjugate relationship with the multivariate normal distribution, making it particularly relevant for our analysis. As demonstrated by Carvalho and West (2007), the joint density of the HIW distribution can be effectively segmented into cliques and separators. This segmentation is depicted in equation (2.3.7), illustrating the conjugate relationship between the full HIW distribution and the conditional likelihood of sample data derived from a multivariate normal distribution, with its variance structured according to $G$.
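The clique/separator decomposition underlying (2.3.7) can be obtained from a junction tree; the sketch below builds one via a maximum-weight spanning tree on the clique intersection graph, assuming a connected chordal input graph.

```python
# A sketch of the clique/separator decomposition used in (2.3.7) for a
# decomposable (chordal) graph, via the junction tree construction.
import networkx as nx

def cliques_and_separators(G):
    cliques = [frozenset(c) for c in nx.find_cliques(G)]   # maximal cliques
    H = nx.Graph()
    H.add_nodes_from(range(len(cliques)))
    for i in range(len(cliques)):
        for j in range(i + 1, len(cliques)):
            w = len(cliques[i] & cliques[j])
            if w > 0:
                H.add_edge(i, j, weight=w)
    T = nx.maximum_spanning_tree(H)                        # junction tree
    seps = [cliques[i] & cliques[j] for i, j in T.edges()]
    return cliques, seps

G = nx.path_graph(4)               # chordal example: chain 0-1-2-3
C, S = cliques_and_separators(G)   # cliques {0,1},{1,2},{2,3}; separators {1},{2}
```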
To illustrate, for each set $J \in \{\mathcal{C}, \mathcal{S}\}$, the covariance matrix $\Sigma_J$ follows an inverse Wishart distribution, $IW(\nu, D_J)$, with density
$$p(\Sigma_J \mid \nu, D_J) \propto |\Sigma_J|^{-(\nu+2|J|)/2}\exp\left\{-\tfrac{1}{2}\mathrm{tr}\left(\Sigma_J^{-1}D_J\right)\right\},$$
where $\nu$ is the degrees of freedom and $D_J$ is the symmetric positive-definite block-diagonal submatrix of $\Sigma$ corresponding to $\Sigma_J$. Consequently, the overall density can be expressed as
$$p(\Sigma \mid \nu, D) = \frac{\prod_{C \in \mathcal{C}} p(\Sigma_C \mid \nu, D_C)}{\prod_{S \in \mathcal{S}} p(\Sigma_S \mid \nu, D_S)}.$$
This formulation allows us to apply the results to our model and define an uninformative prior as
$$\pi^D(\Lambda, B, \Sigma \mid G^u) \propto \frac{\prod_{C \in \mathcal{C}} |\Sigma_{CC}|^{-|C|}}{\prod_{S \in \mathcal{S}} |\Sigma_{SS}|^{-|S|}}. \qquad (2.3.8)$$
This yields a prior that, while uninformative, is structured and informed by the graph $G^u$ through its cliques and separators.

To derive the fractional prior as in Equation (2.3.5), we multiply Equation (2.3.8) by a fraction, $\delta = T_0/T$, of the conditional likelihood in Equation (2.3.7). The resulting fractional prior for our model is
$$q^F(\Lambda, B, \Sigma \mid G^u) \propto \pi^D(\Lambda, B, \Sigma \mid G^u)\cdot g^\delta(X \mid \Lambda, B, \Sigma, G^u) \propto \frac{\prod_{C\in\mathcal{C}} |\Sigma_{CC}|^{-(|C|+T_0/2)}\exp\left\{-\frac{T_0}{2T}\mathrm{tr}\left[\Sigma_{CC}^{-1}\tilde\Theta(\hat\Lambda_C, \hat B_C, \tilde E_C)\right]\right\}}{\prod_{S\in\mathcal{S}} |\Sigma_{SS}|^{-(|S|+T_0/2)}\exp\left\{-\frac{T_0}{2T}\mathrm{tr}\left[\Sigma_{SS}^{-1}\tilde\Theta(\hat\Lambda_S, \hat B_S, \tilde E_S)\right]\right\}}, \qquad (2.3.9)$$
where, for $J \in \{\mathcal{C}, \mathcal{S}\}$,
$$\tilde\Theta(\hat\Lambda_J, \hat B_J, \tilde E_J) = \begin{pmatrix}\tilde\Lambda_J - \hat\Lambda_J \\ B_J - \hat B_J\end{pmatrix}'\begin{pmatrix}\tilde F'\tilde F & \tilde F'Z \\ Z'\tilde F & Z'Z\end{pmatrix}\begin{pmatrix}\tilde\Lambda_J - \hat\Lambda_J \\ B_J - \hat B_J\end{pmatrix} + \tilde E_J'\tilde E_J.$$
This leads to the construction of a normalizing constant by aggregating the constant terms of the inverse Wishart (IW) and matrix normal (MN) distributions; using $\sum_{C\in\mathcal{C}}|C| - \sum_{S\in\mathcal{S}}|S| = N$ and $d = T_0 - (M + Nk)$,
$$K(\Xi, R, \zeta) = (2\pi)^{-\frac{N(M+Nk)}{2}}\,|\Xi|^{-N/2}\,\frac{\prod_{C\in\mathcal{C}} |R_{CC}/2|^{(d+|C|-1)/2}\,\Gamma_{|C|}\!\left((d+|C|-1)/2\right)^{-1}}{\prod_{S\in\mathcal{S}} |R_{SS}/2|^{(d+|S|-1)/2}\,\Gamma_{|S|}\!\left((d+|S|-1)/2\right)^{-1}}.$$
To sum up, given the distributions underlying the fractional prior in (2.3.9),
$$\Lambda, B \mid \Sigma, G^u \sim N_{M+Nk,N}(\Psi, \Xi, \Sigma), \qquad \Sigma \mid G^u \sim HIW_N(d, R),$$
the fractional prior for the VAR(k, $G^u$) is a Matrix Normal, Hyper-Inverse Wishart (MNHIW) distribution, written $MNHIW(\Psi, \Xi, d, R)$. It enables the factorization of the fractional prior,
$$q^F(\Lambda, B, \Sigma \mid G^u) = \frac{\prod_{C\in\mathcal{C}} N_{M+Nk,|C|}(\Psi_{CC}, \Xi, \Sigma_{CC})\,IW_{|C|}(d+|C|-1, R_{CC})}{\prod_{S\in\mathcal{S}} N_{M+Nk,|S|}(\Psi_{SS}, \Xi, \Sigma_{SS})\,IW_{|S|}(d+|S|-1, R_{SS})}, \qquad (2.3.10)$$
demonstrating how the conjugacy between the distributions allows the derivation of the marginal likelihood for the graph $G^u$:
$$q^F(X \mid G^u) = \frac{\prod_{C\in\mathcal{C}} q^F(T_0/T, X_C)}{\prod_{S\in\mathcal{S}} q^F(T_0/T, X_S)}, \qquad (2.3.11)$$
where $q^F(T_0/T, X_J)$ is calculated for each submatrix $J$ within cliques or separators, as defined above. This adheres to Assumption 2, requiring $T - (M + Nk) > |J| - 1$ for each subset $J$.

To establish a prior for the undirected graph $G^u$ in (2.3.11), we consider the prior distribution
$$G^u_m \mid \omega \overset{i.i.d.}{\sim} \mathrm{Bernoulli}(\omega), \quad m = 1, 2, \ldots, N(N-1)/2, \qquad \omega \sim \mathrm{Beta}(a_G, b_G),$$
where $G^u_m$ is the $m$-th element of $\mathrm{vec}(G^u)$ and $a_G$, $b_G$ are the shape parameters of the Beta distribution.
The range of $m$ reflects the symmetric nature of $G^u$: considering only its lower (or upper) triangular portion inherently limits the maximum number of edges to $N(N-1)/2$. This setup allows us to find the marginal (unconditional) prior on $G^u$ simply by integrating $\omega$ out of the joint distribution. Then, given $\mathcal{G}$, the set of all decomposable graphs,
$$\pi(G^u) \propto \binom{N(N-1)/2}{|G^u|}\,\Gamma(a_G + |G^u|)\,\Gamma\!\left(b_G + N(N-1)/2 - |G^u|\right)\mathbb{I}\{G^u \in \mathcal{G}\}. \qquad (2.3.12)$$
This formulation implies that, for a decomposable graph $G^u$, the selection of $|G^u|$, the actual number of edges, from the maximum potential edges occurs with the probability outlined in Equation (2.3.12). A sparsity-inducing prior is adopted with parameters $a_G = 1$ and $b_G = 2(N-1)/3 - 1$, keeping the expected value of $\omega$, $E(\omega) = a_G/(a_G + b_G)$, at or below 0.5 for any $N \geq 4$.

2.3.4 MCMC Algorithm

In our Bayesian framework, the primary task is to select the most appropriate model via the MCMC method, particularly the Metropolis-Hastings algorithm. This facilitates model comparison by evaluating changes between the current decomposable graph $G^u$ and a proposed new graph $G^{u*}$, specifically focusing on the addition or deletion of edges with certain probabilities. It is critical that $G^{u*}$ remain decomposable, as our analysis is confined to decomposable graphs. The core of the method is the acceptance probability for transitioning from $G^u$ to $G^{u*}$, defined as the minimum of 1 and the ratio of the fractional marginal likelihoods, graph priors, and proposal probabilities:
$$r(G^u, G^{u*}) = \min\left\{1,\ \frac{q^F(X \mid G^{u*})\,\pi(G^{u*})\,\kappa(G^u \mid G^{u*})}{q^F(X \mid G^u)\,\pi(G^u)\,\kappa(G^{u*} \mid G^u)}\right\}.$$
Alternatively, this can be expressed in logarithmic differences for clarity and computational efficiency:
$$\min\left\{0,\ \ln q^F(X\mid G^{u*}) - \ln q^F(X\mid G^u) + \ln\pi(G^{u*}) - \ln\pi(G^u) + \ln\kappa(G^u\mid G^{u*}) - \ln\kappa(G^{u*}\mid G^u)\right\}.$$
The term $\kappa(G^u\mid G^{u*})/\kappa(G^{u*}\mid G^u)$ is the proposal ratio, managing the edge addition and deletion process. Notably, the proposal probabilities $\kappa(G^{u*}\mid G^u)$ and $\kappa(G^u\mid G^{u*})$ are not inherently symmetric, a divergence from the assumption of proposal density symmetry (Chib and Greenberg, 1994). When the proposal density is symmetric, i.e., $\kappa(G^{u*}\mid G^u) = \kappa(G^u\mid G^{u*})$, the Metropolis-Hastings algorithm simplifies to the Metropolis algorithm. For our purposes, we assume a symmetric proposal with $\kappa(G^{u*}\mid G^u) = 0.5 = \kappa(G^u\mid G^{u*})$. Acceptance decisions are then made through random sampling from a uniform distribution $U(0,1)$, with the new candidate $G^{u*}$ accepted if $U(0,1) \leq r(G^u, G^{u*})$.

Utilizing the outcomes of the MCMC analysis, we can estimate the posterior probabilities of each unique graph $G^u_\ell$, $\ell = 1, 2, \ldots, L$. The posterior probability of a given graph $G^u_\ell$, conditional on $X$, is
$$p(G^u_\ell \mid X) = \frac{q^F(X\mid G^u_\ell)\,\pi(G^u_\ell)}{\sum_{\ell'=1}^{L} q^F(X\mid G^u_{\ell'})\,\pi(G^u_{\ell'})} = \left[\sum_{\ell'=1}^{L}\frac{q^F(X\mid G^u_{\ell'})\,\pi(G^u_{\ell'})}{q^F(X\mid G^u_\ell)\,\pi(G^u_\ell)}\right]^{-1} = \left[1 + \sum_{\ell'\neq\ell}^{L}\frac{\pi(G^u_{\ell'})}{\pi(G^u_\ell)}\,\mathrm{FBF}_{\ell\ell'}(X\mid G^u)\right]^{-1}, \qquad (2.3.13)$$
where $\mathrm{FBF}_{\ell\ell'}(X\mid G^u)$ is derived from Equation (2.3.3). Equation (2.3.13) then facilitates the approximation of the posterior inclusion probability of an edge $(i,j)$ as
$$\hat p(i,j) = \sum_{\ell=1}^{L}\mathbb{I}\{(i,j)\in E^u_\ell\}\,\hat p(G^u_\ell \mid X),$$
where $E^u_\ell$ indicates the edge set of the decomposable graph $G^u_\ell$.
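One Metropolis-Hastings move over decomposable graphs can be sketched as follows; `log_qF` and `log_prior`, standing for the log of $q^F(X \mid G^u)$ in (2.3.11) and of $\pi(G^u)$ in (2.3.12), are assumed callables supplied by the user.

```python
# A sketch of one MH move: propose adding or deleting a single edge,
# retain the proposal only if it stays chordal, accept with probability
# r(G, G*); the symmetric-proposal assumption cancels the kappa terms.
import numpy as np
import networkx as nx

def mh_step(G, log_qF, log_prior, rng, N):
    G_new = G.copy()
    i, j = rng.choice(N, size=2, replace=False)
    if G_new.has_edge(i, j):
        G_new.remove_edge(i, j)              # deletion move
    else:
        G_new.add_edge(i, j)                 # addition move
    if not nx.is_chordal(G_new):             # restrict to decomposable graphs
        return G
    log_r = (log_qF(G_new) + log_prior(G_new)
             - log_qF(G) - log_prior(G))
    if np.log(rng.uniform()) <= min(0.0, log_r):
        return G_new
    return G
```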
The subsequent section presents a simulation study assessing our model's performance. This evaluation leverages a Bayesian formulation of the posterior expected false discovery rate (FDR), as introduced by Müller et al. (2007), defined as
$$\mathrm{FDR} = \frac{\sum_{i<j}(1-\hat p_{ij})\,\mathbb{I}\{\hat p_{ij} \geq r\}}{\sum_{i<j}\mathbb{I}\{\hat p_{ij} \geq r\}},$$
where $r$ is selected to ensure the posterior expected FDR remains below a specified threshold, such as 0.05 or 0.10.

2.4 Monte Carlo Simulation

In this section, we evaluate the performance of our proposed methodology by conducting a simulation study similar to those presented in Bailey et al. (2016) and Bailey et al. (2019a).

2.4.1 Simulation Design

Consider the following dynamic panel data model without exogenous regressors:
$$y_{it} = a_i + \theta_i y_{i,t-1} + \varepsilon_{it}, \quad i = 1, 2, \ldots, N;\ t = 2, 3, \ldots, T, \qquad (2.4.1)$$
where $\theta_i \overset{i.i.d.}{\sim} U(0, 0.95)$ and $\varepsilon_{it}$ follows the two-factor specification
$$\varepsilon_{it} = \lambda_{i1}f_{1t} + \lambda_{i2}f_{2t} + u_{it}, \quad i = 1, 2, \ldots, N;\ t = 2, 3, \ldots, T,$$
and the factors are generated as
$$f_{jt} = \rho_j f_{j,t-1} + \sqrt{1-\rho_j^2}\,\zeta_{jt}, \quad j = 1, 2;\ t = -49, -48, \ldots, -1, 0, 1, \ldots, T, \qquad (2.4.2)$$
where $f_{j,-50} = 0$ for $j = 1, 2$ and $\zeta_{jt}\overset{i.i.d.}{\sim} N(0,1)$. The shocks $u_{it}$ follow an AR(1) process,
$$u_{it} = \phi_i u_{i,t-1} + \sqrt{1-\phi_i^2}\,\nu_{it}, \quad i = 1, 2, \ldots, N;\ t = -49, -48, \ldots, -1, 0, 1, \ldots, T,$$
with $u_{i,-50} = 0$, $\phi_i \overset{i.i.d.}{\sim} U(0,1)$, and $\nu_{it}\overset{i.i.d.}{\sim}\left[\chi^2(2)-2\right]/4$ for $i = 1, 2, \ldots, N$. Factor loadings follow
$$\lambda_{i1} = v_{i1} \text{ for } i = 1, \ldots, \lfloor N^{\alpha_1}\rfloor, \qquad \lambda_{i1} = 0 \text{ for } i = \lfloor N^{\alpha_1}\rfloor+1, \ldots, N,$$
$$\lambda_{i2} = v_{i2} \text{ for } i = 1, \ldots, \lfloor N^{\alpha_2}\rfloor, \qquad \lambda_{i2} = 0 \text{ for } i = \lfloor N^{\alpha_2}\rfloor+1, \ldots, N,$$
where the non-zero loadings $\lambda_{i2}$ are then randomized across $i$ to achieve independence from $\lambda_{i1}$. We consider the case $\alpha_1 > \alpha_2 = \tfrac{2}{3}\alpha_1$, and $v_{ij} \sim U(\mu_{vj}-0.2, \mu_{vj}+0.2)$ for $j = 1, 2$. Note that $\mu_{v1}$ and $\mu_{v2}$ are chosen to satisfy $\mu_{vj}\neq 0$, $j = 1, 2$; namely, $\mu_{v2} = 0.71$ and $\mu_{v1} = \sqrt{\mu_v^2 - N^{2(\alpha_2-\alpha_1)}\mu_{v2}^2}$ such that $\mu_{v1}^2 + \mu_{v2}^2 = \mu_v^2 = 0.75$.
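A compact sketch of this design (with the Gaussian-error variant used in the results below, $a_i = 0$ and $\rho_j = 0.5$) is given here; the seed and the 50-period burn-in mirroring the $t = -49, \ldots, 0$ initialization are implementation choices.

```python
# A sketch of the Monte Carlo design in (2.4.1)-(2.4.2), Gaussian-error
# variant with a_i = 0 and rho_j = 0.5.
import numpy as np

rng = np.random.default_rng(1)
N, T, burn = 100, 200, 50
alpha1, alpha2 = 1.0, 2.0 / 3.0
rho = np.array([0.5, 0.5])

theta = rng.uniform(0, 0.95, N)
f = np.zeros((burn + T, 2))
for t in range(1, burn + T):                       # factors, eq. (2.4.2)
    f[t] = rho * f[t - 1] + np.sqrt(1 - rho**2) * rng.standard_normal(2)

mu_v2 = 0.71
mu_v1 = np.sqrt(0.75 - N**(2 * (alpha2 - alpha1)) * mu_v2**2)
lam = np.zeros((N, 2))
n1, n2 = int(N**alpha1), int(N**alpha2)
lam[:n1, 0] = rng.uniform(mu_v1 - 0.2, mu_v1 + 0.2, n1)
lam[:n2, 1] = rng.uniform(mu_v2 - 0.2, mu_v2 + 0.2, n2)
lam[:, 1] = rng.permutation(lam[:, 1])             # randomize lambda_i2 across i

u = rng.standard_normal((burn + T, N))             # Gaussian errors
eps = f @ lam.T + u
y = np.zeros((burn + T, N))
for t in range(1, burn + T):                       # panel, eq. (2.4.1)
    y[t] = theta * y[t - 1] + eps[t]
y = y[burn:]
```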
2.4.2 Simulation Result

In our simulation design, we incorporate Gaussian errors $u_{it}\overset{i.i.d.}{\sim}N(0,1)$, no constant term ($a_i = 0$), and $\rho_j = 0.5$. Each simulation is conducted over 2,000 iterations, including a 200-iteration burn-in period. This setup is designed to examine cases where the model includes two strong factors within the errors, focusing on deriving the cross-sectional dependence graph directly rather than extracting it from the residuals.

Figure 2.1 presents the estimated precision matrix alongside the corresponding network graph for a case with $T = 200$ and $N = 100$. By maintaining the FDR below 0.05, we were able to achieve these results. The left panel of Figure 2.1 illustrates the estimated posterior inclusion probability matrix, determined with an FDR cutoff value ($r$) of 0.9998, which results in an FDR of less than 0.05. The corresponding graph structure is depicted in the right panel. Given the symmetry of the matrix, 144 edges are identified out of a total of 4,950 possible edges ($(100 \times 99)/2$), a connectedness rate of 2.91%. However, this level of connectedness does not align with the definitions of strong and weak cross-sectional dependence in the literature. Cross-sectional dependence refers to the rate at which the largest eigenvalue of the covariance matrix of the cross-sectional units increases with the number of cross-sectional units, as outlined by Chudik et al. (2011) and Bailey et al. (2016). The estimated (not bias-adjusted) $\alpha$ value of the posterior inclusion probability matrix, approximately 0.3866, is lower than the theoretical value of $\alpha = 1$. This discrepancy may arise because the estimated matrix does not replicate the simple covariance or correlation matrices typically used in the literature.

[Figure 2.1: Posterior Inclusion Matrix and Corresponding Graph with Two Factors Case. Notes: Parameters of the dynamic panel data model (2.4.1) are generated as $a_i = 0$, $\rho_j = 0.5$ for $j = 1, 2$, and $\theta_i \overset{i.i.d.}{\sim} U(0, 0.95)$ for $i = 1, \ldots, N$. Factors are generated as in (2.4.2) with $\zeta_{jt}\overset{i.i.d.}{\sim}N(0,1)$; Gaussian errors $u_{it}\overset{i.i.d.}{\sim}N(0,1)$. The design assumes a two-factor model with $\lfloor N^{\alpha_1}\rfloor$ and $\lfloor N^{\alpha_2}\rfloor$ non-zero loadings for the first and second factor, respectively, where $\alpha_2 = 2\alpha_1/3$. The number of replications is 2,000 with a burn-in sample of 200. The precision matrix and corresponding network graph are generated by controlling FDR < 0.05 with $r = 0.9998$.]

To further assess this result, one approach is to estimate the factors individually and examine whether the factor loadings or the estimated structure reflect the simulation's design by extracting the factors from the residual. Initially, we estimate only one factor and remove it from the residual within our model; Figure 2.2 shows the matrix and graph for this single-factor case. In the left panel of sub-figure (a) of Figure 2.2, the number of active edges decreases to 101, representing 2.04% of the total edges, and the corresponding $\alpha$ estimate diminishes to 0.2885. The sparseness observed in the right panel of sub-figure (a) of Figure 2.2 exceeds that in Figure 2.1, suggesting that removing a factor from the residuals minimizes the variation captured in the estimated residuals, thereby enhancing the sparsity of the network structure. The left panel of sub-figure (b) displays the estimated factor over time, with its associated factor loadings depicted in the right panel. The significance of each is assessed through a 95% quantile interval derived from the simulation, identifying any instance where the lower bound falls below 0 and the upper bound rises above 0 as potentially zero, and thus a sparse point. This analysis reveals that the estimated factor is statistically insignificant for 53 of the 200 time periods, indicating sparsity approximately 26.5% of the time. Meanwhile, the estimated factor loadings demonstrate no sparsity, suggesting the presence of strong cross-sectional dependence ($\alpha = 1$), as intended in the simulation design. This inference is supported by the principle that the first principal component typically accounts for the most significant variation in the dataset. Consequently, these results affirm the model's accuracy in identifying the simulation's intended sparsity level, validating the estimated cross-sectional structure.

[Figure 2.2: Results with One Factor Case. (a) Posterior Inclusion Matrix and Corresponding Graph; (b) Estimated Factor and Factor Loading. Notes: Design as in Figure 2.1; the precision matrix and network graph are generated by controlling FDR < 0.05 with $r = 0.9998$; the estimated factor and loadings are evaluated with a 95% quantile interval.]
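The FDR cutoff $r$ used for Figures 2.1-2.2 can be selected with the rule defined earlier; the sketch below scans candidate cutoffs from the largest attained inclusion probability downward, and the target of 0.05 is the threshold quoted in the text.

```python
# A sketch of choosing the cutoff r so the Bayesian posterior expected
# FDR of Muller et al. (2007) stays below a target; returns None if even
# the most conservative cutoff fails.
import numpy as np

def fdr_cutoff(p_incl, target=0.05):
    """p_incl: posterior inclusion probabilities for all pairs i < j."""
    best = None
    for r in np.sort(np.unique(p_incl))[::-1]:
        keep = p_incl >= r
        fdr = np.sum(1 - p_incl[keep]) / keep.sum()
        if fdr < target:
            best = r          # keep lowering r while the FDR stays controlled
        else:
            break
    return best
```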
Following this analysis, we proceed to extract two factors from the residuals for further examination. Figure 2.3 displays the probability matrix and the network graph resulting from the model that extracts two factors. Consistent with expectations, the network graph lacks any significant cross-dependent structure, since the majority of the variation attributable to the two factors has been removed from the residuals. In this case, there is merely one active edge, indicating a state of cross-sectional independence. This finding aligns with the observations made in the one-factor model. Following this approach, we can similarly evaluate the sparsity of factors and factor loadings across different periods and variables, respectively, to assess their impact and significance further.

[Figure 2.3: Posterior Inclusion Matrix and Corresponding Graph with No Factor Case. Notes: Design as in Figure 2.1; the precision matrix and network graph are generated by controlling FDR < 0.05 with $r = 0.9998$.]

[Figure 2.4: Estimated Factors and Factor Loadings with No Factor Case. Notes: Design as in Figure 2.1; estimated factors and loadings are evaluated with a 95% quantile interval.]

Figure 2.4 reveals that the first factor retains the same level of sparsity as observed previously, yet the sparsity within the first factor's loadings has increased to 12 sparse points. Conversely, the second factor and its factor loadings exhibit 159 and 54 sparse points, respectively, out of 200 time periods and 100 variables. Given the predefined values $\alpha_1 = 1$ and $\alpha_2 = 2/3$, we would anticipate 79 sparse points in the second factor's loadings, based on the calculation $N^{\alpha_2} = 100^{2/3} = 21.54$, which rounds down to 21 non-zero loadings. However, this expected outcome is not achieved in our simulation, even when combining the sparsities from both factors. This inconsistency arises from two primary issues: the identification condition inherent to the dynamic factor model, and the model's dynamic structure itself. Specifically, we designate the identity matrix $I_M$ to the loadings of the first $M$ variables, thereby imposing sparsity across all $M$ variables. Furthermore, the dynamic nature of the factors, which are autoregressive in this model, introduces additional variation, resulting in less precise estimates of the loadings.
These issues prevent the estimated factors and loadings from fully capturing the variation, in contrast to the outcomes observed in the single-factor case. Nonetheless, accurate estimation of the residuals is achievable through the product of the factors and factor loadings, which necessitates further evaluation of this product to confirm the model's performance.

Figure 2.5 examines the product of factors and loadings (denoted $FL$). Since $FL$ is a $T \times N$ matrix, we evaluate the level of sparsity for each period individually; a value of 100 signifies complete sparsity within that period, indicating $\alpha_t = 0$ for $t = 1, 2, \ldots, T$. Utilizing the established criteria for identifying sparse points, the computed average sparsity across all periods is roughly 35.65. This translates to about 71 of the factor loadings being zero out of a total of 200 loadings (accounting for the presence of two factors), resulting in $\alpha_2 = 0.7312$ in the first panel of Figure 2.5. The second and third panels further break down the average sparsity across lagged and current periods, respectively. Considering the autoregressive (AR(1)) nature of the factors, it is inferred that the inherent sparsity predominantly affects the lagged periods rather than the current ones.

[Figure 2.5: Estimated Factors × Factor Loadings. Notes: Design as in Figure 2.1; estimated factors and loadings are evaluated with a 95% quantile interval.]

Notably, the average sparsity for lagged periods is 39.54, implying that approximately 79 (79.08, to be precise) of the factor loadings are sparse, equating to $\alpha_2 = 0.6611$, closely aligning with the intended 2/3. Consequently, this analysis supports the notion that the model successfully approximates the designated degree of sparsity, suggesting that the estimated cross-sectional dependence structure might represent an effective solution for modeling cross-sectional dependence.

Table 2.1: Structural Hamming Distance (SHD)

  # Estimated Factors    SHD    % Correction
  2                       10          99.90
  1                      107          98.93
  0                      154          98.46

Note: An element of the precision matrix, our "true" graph, is set to 0 if it is less than 0.3 (considered negligible) and to 1 otherwise.

To assess how well our method recovers the graphical structure, we utilize the Structural Hamming Distance (SHD), a metric that counts the number of edge modifications (insertions, deletions, or flips) required to convert one graph into another. In our context of an undirected graph, the SHD specifically measures the discrepancies in edge presence between two graphs. A lower SHD value is preferred, indicating a closer match between the evaluated graph and the reference, or "true," graph. The "true" graph, in this case, is derived from the precision matrix of the simulated dataset, serving as the benchmark for comparison.
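For undirected graphs given as symmetric 0/1 adjacency matrices, the SHD reduces to counting edge-presence mismatches; a minimal sketch follows (the correction percentage below is computed relative to the number of variable pairs, which may differ from the edge-count denominator used in Table 2.1).

```python
# A sketch of the Structural Hamming Distance for two undirected graphs
# represented by symmetric 0/1 adjacency matrices.
import numpy as np

def shd(A_est, A_true):
    iu = np.triu_indices_from(A_est, k=1)         # upper triangle only
    return int(np.sum(A_est[iu] != A_true[iu]))

def pct_correct(A_est, A_true):
    n_pairs = A_est.shape[0] * (A_est.shape[0] - 1) // 2
    return 100.0 * (1 - shd(A_est, A_true) / n_pairs)
```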
For clarity, and to focus on significant correlations, elements within the precision matrix are set to 0 if they represent a negligible correlation (less than 0.3, the median) and to 1 otherwise. The performance of our graph recovery process is summarized in Table 2.1, where we calculate the SHD and the corresponding correction percentage relative to the total number of edges for each scenario. The results indicate a noticeable improvement in performance when the estimated factors align accurately with the actual factors present in the data, underscoring our method's effectiveness in capturing the impact of common factors and revealing weaker cross-sectional dependencies. Conversely, a mismatch between estimated and actual factors leads to a decline in performance, highlighting the importance of accurate factor identification for the successful recovery of the graph's structure.

2.5 Empirical Applications

This section demonstrates the application of our approach to data on U.S. housing prices as utilized in the study by Aquaro et al. (2021).

2.5.1 U.S. Housing Prices

As of March 2020, the United States Office of Management and Budget (OMB) recognizes 384 Metropolitan Statistical Areas (MSAs). Our analysis includes house prices from 382 of these MSAs and additionally incorporates five more areas: (i) Anchorage, AK Metropolitan Statistical Area; (ii) Enid, OK Micropolitan Statistical Area; (iii) Fairbanks, AK Metropolitan Statistical Area; (iv) Kahului-Wailuku, HI Micropolitan Statistical Area; and (v) Honolulu, HI Metropolitan Statistical Area. This dataset allows us to examine cross-sectional dependency over the period March 1975 to June 2021 ($T = 553$, monthly). The monthly house price data are sourced from the Freddie Mac House Price Index (FMHPI), available at http://www.freddiemac.com/research/. This selection helps prevent a potential issue where the number of MSAs ($N$) exceeds the number of time periods ($T$), which would necessitate further screening. House prices are adjusted to their real values using Consumer Price Index (CPI) data obtained monthly from the Bureau of Labor Statistics (https://www.bls.gov/cpi/). These CPI data are matched to the corresponding MSAs, using state-level CPI when MSA-specific data are missing and the U.S. average CPI when state-level figures are unavailable; small gaps in the monthly CPI data are filled by interpolation. Unlike the approach of Aquaro et al. (2021), we do not adjust the house prices using deflators, as estimating these deflators forms part of our estimation strategy. We set $M = 2$ based on a scree plot analysis. Furthermore, our model does not include the other exogenous variables they consider, such as population and income change rates, to simplify the model and focus more closely on the monthly variation in house prices.

Figure 2.6 presents the inclusion probability matrix, the associated dependency graph, and the degree distribution histogram, all of which illustrate the connections between MSAs based on the two extracted factors. The visualizations reveal significant fragmentation among the MSAs: 211 MSAs are isolated, 107 have a single connection (degree one), 54 have two connections, and only 8 and 2 MSAs are interconnected with three and four degrees, respectively. The histogram highlights the minimal cross-sectional dependencies among MSAs after the extraction of two factors.
When comparing our findings to those of Aquaro et al. (2021), we observe that the total dependency of our contemporaneous graph is about 44.76% (0.4476), markedly lower than their reported contemporaneous spatial coefficients of 74.01% (0.7401), indicating a difference in the level of interconnectedness (their figure is calculated as (260 positive + 19 negative coefficients)/(338 in the reduced sample + 39 completely isolated)). However, our results align more closely with their average spatial effect (both contemporaneous and lagged) of approximately 43.77% (0.4377), calculated as (147 positive + 18 negative coefficients)/(338 + 39), suggesting that our dynamic factor-augmented VAR model captures the dynamics of cross-sectional dependencies through lagged values and factors, consistent with our assumptions on cross-sectional dependence. It is important to note that, despite the similarities in dependency proportions, the actual connections among MSAs might differ, since the spatial coefficients rely on a distance-based weighting matrix, as discussed in Yang (2021).

[Figure 2.6: Posterior Inclusion Matrix and Corresponding Graph. Notes: MSA names are replaced with numbers ranked alphabetically for visualization. For the choice parameters, we set $k = 1$, $p = 1$, and $M = 2$. For details of the MSAs corresponding to each degree, see Appendix D.]

2.6 Conclusion

In this paper, we have introduced a Bayesian estimation approach for the dynamic factor-augmented VAR model, allowing us to capture contemporaneous connectedness through a graphical representation of cross-sectional dependencies. This representation provides a distinct structure of cross-sectional dependence, which could have significant implications for understanding the interconnectedness within the data.

Our approach entails the estimation of latent factors through principal component analysis, selecting a specific number of factors a priori. These factors are then extracted using the Gibbs sampler, specifically via the forward-filtering backward-sampling algorithm. We employ the fractional Bayes factor within a Bayesian graphical model selection framework, focusing on the graphical VAR model. The robustness of our estimation methodology is validated by Monte Carlo simulations that highlight our approach's capacity to identify weak cross-sectional dependencies, and by the application to U.S. housing market data.

This paper not only demonstrates the effectiveness of our proposed techniques but also sets the stage for further research. Future studies could delve into the theoretical aspects of the Gibbs sampler's convergence, investigating the specific conditions under which convergence is guaranteed, given its general propensity to provide posterior estimates that closely approximate the true underlying factors. Additionally, an empirical Bayesian exploration of methodologies for accurately determining factor strengths, utilizing the local false discovery rate, presents an intriguing avenue for advancing the field.

Chapter 3

High-dimensional Bayesian Nonparanormal Dynamic Conditional Model with Multivariate Volatility Applications

(I am especially grateful to my adviser Hashem Pesaran, Cheng Hsiao, and Timothy Armstrong for their continuous advice and support. All mistakes are my own.)

3.1 Introduction

Volatility modeling plays an important role in financial econometrics. Early research focused on univariate volatility models, such as autoregressive conditional heteroskedasticity (ARCH; Engle, 1982) and generalized ARCH (GARCH), proposed by Bollerslev (1986).
More recently, increasing interconnection in financial markets and the advent of high-frequency, high-dimensional data have necessitated a transition toward multivariate volatility models. These models capture the dynamic correlations and covariances among multiple assets, providing a more suitable framework for portfolio selection and optimization (Ledoit and Wolf, 2003, 2017), testing capital asset pricing models (Sentana, 2009), and risk management strategies (Fan et al., 2012). However, the estimation of covariance matrices in this multivariate context is subject to the curse of dimensionality, where the number of parameters can exceed the number of available time series observations.

To address the challenges associated with estimating covariance matrices in high-dimensional contexts, a large body of literature has emerged that focuses on improving the estimation of large covariance matrices through shrinkage and regularization methods (Ledoit and Wolf, 2004a,b). These techniques, which began with linear shrinkage and have since incorporated nonlinear variants (Ledoit and Wolf, 2012, 2017, 2020, 2022; Engle et al., 2019; Nard et al., 2022), have proved important for error minimization and portfolio optimization. Researchers have also extended these methods to dynamic models, such as the DCC with linear (DCC-L) and nonlinear (DCC-NL) shrinkage frameworks (Engle et al., 2019; Pakel et al., 2021), and have embraced sparsity-promoting approaches like banding and thresholding (Bickel and Levina, 2008; Rothman et al., 2009; Cai and Liu, 2011; Bailey et al., 2019b). Fan et al. (2013) further establish the effectiveness of regularization for precision matrix estimation under certain factor structures, with consistent convergence rates. Additionally, dynamic covariance models (DCMs) leveraging kernel smoothing have been introduced (Chen and Leng, 2016), alongside semiparametric extensions for high-dimensional settings (Chen et al., 2019). Poignard and Asai (2023) explored high-dimensional variance-covariance modeling within the multivariate stochastic volatility (MSV) framework using a penalized OLS approach, without relying on Markov chain Monte Carlo (MCMC), by introducing a vector autoregressive moving-average (VARMA) representation for MSV.

This paper studies the estimation of conditional precision matrices in high-dimensional settings within the DCC framework, essential for financial applications that require the inverse of the covariance matrix. (To clarify, we differentiate between two types of matrices, static and dynamic: for simplicity, we call the dynamic ones "conditional matrices," whose elements change over time and carry a subscript $t$, and the static ones "unconditional matrices," without subscript.) In carrying out this research, we face two difficulties. First, we cannot employ the conventional DCC framework by merely inverting its components, because doing so results in a matrix that is, at best, positive semi-definite; furthermore, inverting the unconditional component is ill-conditioned when the number of variables ($N$) exceeds the number of time series observations ($T$). Second, suppose we can extract conditional precision matrices from the DCC framework.
In that case, the resulting conditional partial correlations derived from the conditional precision matrices admit a conditional independence interpretation only under the restriction of a multivariate Gaussian distribution, an assumption rarely met in finance and macroeconomic analyses. Moreover, even if we use a rank transformation to bring the dataset toward a Gaussian distribution, such as the copula, quantile, or standard nonparanormal model, the transformation does not retain the scale of the conditional precision matrices. This scale distortion leads to an identification problem, where multiple precision matrices may correspond to the same inverse correlation matrix. Accordingly, our focus is on exploring the advantages of directly estimating precision matrices, particularly in addressing the identification challenges, as opposed to relying on the inversion of estimated conditional covariance matrices.

We make two main contributions to the literature on high-dimensional multivariate volatility modeling, namely the estimation of conditional precision matrices exploring the possibility of conditional dependence. The first is the development of a Bayesian method for the estimation of the conditional precision matrix within the high-dimensional DCC-multivariate GARCH (MGARCH) framework, instead of inverting the estimated conditional covariance matrix. We use a Bayesian approach that samples from the Wishart distribution to bypass the challenges of inverting positive semi-definite matrices; the estimation is carried out using Metropolis-Hastings within the Gibbs sampling algorithm. While DCC-MGARCH models perform well for a moderate number of assets (typically fewer than 25), they struggle with larger datasets due to the computational demands of estimating the unconditional precision matrix, $\Omega$. For a dataset with $T$ time periods and $N$ assets, using a sample covariance matrix necessitates estimating $N(N-1)/2$ parameters, which is prone to considerable error unless $T \gg N$. To address this problem, we estimate $\Omega$ using the Cholesky decomposition $\Omega = LL'$, where $L$ is a lower triangular matrix. In the DCC framework, we apply a horseshoe prior to introduce sparsity in $\Omega$, as outlined in the approach by Neville et al. (2014). Additionally, we incorporate block updates in our proposal distributions to balance computational efficiency and accuracy.

Our second contribution is to provide estimates of both the conditional precision matrices, $P_t = (p_{ij,t})$, and the conditional partial correlation matrices, $\Psi_t = (\psi_{ij,t})$, aimed at interpreting volatility interconnectedness. We achieve this utilizing a Bayesian nonparanormal framework that applies a rank transformation, converting non-Gaussian distributions to approximately Gaussian ones. The standard nonparanormal estimation process approximates an unknown data distribution with a Gaussian distribution by transforming the original variables using smooth, monotonic functions. In contrast, the rank transformation approach we employ simplifies this process, bypassing the intensive computation required to estimate the transformation functions. Moreover, given the DCC-MGARCH structure, we can identify $P_t$ from the inverse correlation matrix $S_t = (s_{ij,t})$ given the conditional variances specified by the univariate GARCH procedures.
Under Gaussian distributions, the relationship between $P_t$ and $S_t$ can be established using the diagonal elements of the conditional covariance matrices: $p_{ij,t} = s_{ij,t}\sqrt{(1/\sigma_{ii,t})(1/\sigma_{jj,t})}$, where $\sigma_{ii,t}$ is the conditional variance modeled by a univariate GARCH process for security $i$ (Rue and Held, 2005, p. 26). Thus, with precise estimation of univariate GARCH processes for each security and a Gaussian approximation, we obtain conditional precision matrices suitable for the DCC–MGARCH model. This method circumvents the inversion of the entire conditional covariance matrices, a step often required under conventional DCC approaches.

Turning to the nonparanormal model, nonparanormal estimation was introduced as a semiparametric extension of Gaussian graphical models with the capability to capture non-Gaussian marginal distributions through smooth, monotonic transformations (Liu et al., 2009). Subsequent developments in this line of inquiry have addressed high-dimensional settings in the Bayesian nonparanormal graphical model (Mulgrave and Ghosal, 2020, 2022, 2023). Our paper therefore adds to the literature on high-dimensional multivariate volatility by introducing a Bayesian nonparanormal approach that approximates unknown distributions to normality and develops a rank likelihood for constructing sparse precision matrices. This approach synergizes with the DCC framework, which comprises a univariate GARCH process for volatility prediction and a correlation estimation based on standardized residuals.

We conduct a number of Monte Carlo (MC) simulations to evaluate the performance of the Bayesian nonparanormal conditional estimator and compare it with existing methods for estimating conditional precision matrices from the literature. Our study involves two simulation designs: the first generates conditional precision matrices with a narrow range of eigenvalues, suggesting numerical stability, while the second produces matrices with widely varying eigenvalues, typically pointing to numerical instability. In the first simulation design, our Bayesian estimator outperforms the DCC–L, DCC–NL, Gaussian Copula, and t-Copula models (Patton, 2009) in estimating conditional precision matrices for 20 different sample size combinations, $T \in \{50, 100, 150, 200, 250\}$ and $N \in \{25, 50, 100, 125\}$, in terms of spectral and Frobenius norms. Although the extent of this outperformance, measured by the ratio of norm loss averages (RNLA), diminishes as the sample size, $T$, and the number of variables, $N$, increase, our method shows improved performance over DCC–L and DCC–NL in estimating conditional correlation matrices when the sample size and the number of variables increase, but not against the Gaussian and t-Copula models. In the second MC design, all considered estimators, including our own, struggle with estimating conditional precision matrices due to instability in the eigenvalue distribution. However, for inverse correlation matrices, our method slightly outperforms DCC–L and DCC–NL as $N$ and $T$ increase, and outperforms both the Gaussian and t-Copula methods; as in the first design, this advantage diminishes with larger $T$ relative to the Gaussian and t-Copula for all sample size combinations.

We employ our proposed method in two empirical applications: daily foreign stock price indices, and returns on blue-chip stocks selected by market capitalization from the Standard and Poor's (S&P) 500 in the U.S. equity market.
In analyzing foreign stock indices over the period January 4, 1991, to August 31, 2023, we focus on deciphering their complex interdependencies through conditional partial correlations. We evaluate conditional correlations to understand how pairs of variables are linked under financial market disruptions. However, conditional partial correlations are crucial for deeper insights into international market dynamics: they isolate specific relationships by excluding global and region-specific influences, clarifying interactions between specific pairs of individual stocks or sectors. This is also important in conditional covariance analysis of these indices, as it separates direct variable relationships from the myriad of influencing local and global factors, thereby offering a different viewpoint on the variables' interactions. We find that the range of the conditional partial correlations is narrower than that of the associated conditional correlations. Furthermore, the overall average of the pairwise relations is smaller than for the conditional correlations. Under financial market disruptions, the relationships among the indices exhibit weakened mean values and reduced variances, denoting more stable behavior even under stressful market conditions.

When examining securities selected by market capitalization from the S&P 500 over the period September 2, 2016, to July 31, 2023, we test asset pricing theory, according to which, in an ideal, frictionless market, a financial asset's excess return is determined by the product of its factor loadings and the excess returns of the corresponding risk factors, plus a random component. Testing the theory involves estimating precision matrices used in Wald-type statistical tests for evaluating asset pricing models. We assess the robustness of our method by applying it to various test statistics in the context of the capital asset pricing model (CAPM) and the Fama–French five-factor model over different time frames, including periods of market turmoil, and by comparing it with the $\hat{J}_\alpha$ test of Pesaran and Yamagata (2023), which does not involve the precision matrix. Using our proposed conditional precision matrices, we find that the Wald statistic rejects $H_0: \alpha = 0$ mainly during periods of major market disruption, namely COVID-19 and the Fed's inflation-containment rate hikes. All test statistics show similar results for the market disruption periods except for COVID-19.

Our paper also adds to the literature on the Copula model and nonparanormal estimation. In the context of the Copula model, its application has become recognized as a versatile framework for capturing a diverse range of dependency structures, encompassing both linear and tail dependencies (Patton, 2009; Aas et al., 2009; Anatolyev and Pyrlik, 2022). Subsequent advancements in the domain have yielded specific models such as pair copula constructions (PCC) (Müller and Czado, 2019a), vine copulas (Müller and Czado, 2019b), and Gaussian copula graphical models (GCGM) (Pitt et al., 2006; Dobra and Lenkoski, 2011; Liu et al., 2012; Mohammadi et al., 2017). These specialized models facilitate the scrutiny of ultra-high-dimensional data with complex interdependencies. Furthermore, the utility of copula methods has been extended to accommodate dynamic dependencies through the incorporation of DCC frameworks (Kim and Jung, 2016; Oh and Patton, 2016, 2017, 2023).
However, despite their versatility, copula models often presuppose rigid functional forms for dependencies, which may misalign with empirical phenomena.

The rest of the paper is organized as follows. Section 3.2 presents the econometric framework: in Section 3.2.1 we set out the dynamic conditional framework, in Section 3.2.2 we discuss the rank transformation and the rank likelihood in a Bayesian framework, and Section 3.2.3 describes how we implement Gibbs sampling to estimate a sparse unconditional precision matrix. Section 3.3 describes the estimation of the Bayesian nonparanormal dynamic conditional partial correlation alongside a GARCH(1,1) model: Section 3.3.1 addresses the Bayesian estimation procedure for the GARCH(1,1) model, while Section 3.3.2 outlines a specific algorithm for computing the posterior distribution. Section 3.4 details the MC simulation designs and provides a summary of the main findings for the proposed Bayesian estimators. In Section 3.5, we apply our estimation framework to two sets of empirical data: daily foreign stock price indices (Section 3.5.1) and daily returns on securities selected from the S&P 500 (Section 3.5.2). Section 3.6 summarizes the research and its implications. Supplementary material, including technical specifics and additional empirical visualizations, is presented in the appendix.

3.2 Bayesian Nonparanormal Dynamic Conditional Model

3.2.1 Dynamic conditional framework

Let $y_{it}$ be the return of financial security $i$ at time $t$, comprised of the rate of price change plus dividends if applicable. Define $y_t = (y_{1t}, y_{2t}, \dots, y_{Nt})'$ as the vector of returns for $N$ securities at time $t$, and $\mathcal{F}_t$ as the information set available up to time $t$, for $t = 1, 2, \dots, T$. The return process is modeled as
\[
y_t = \mu_t + \Sigma_t^{1/2} \epsilon_t, \quad \text{for } t = 1, \dots, T, \tag{3.2.1}
\]
where $\mu_t = E(y_t \mid \mathcal{F}_{t-1}) = (\mu_{1t}, \mu_{2t}, \dots, \mu_{Nt})'$ is the conditional mean vector, and $\Sigma_t = \{\sigma_{ij,t}\}_{i,j=1}^{N} = \mathrm{Cov}(y_t \mid \mathcal{F}_{t-1})$ represents the $N \times N$ positive-definite conditional covariance matrix. The error vectors $\epsilon_t$ are assumed to be i.i.d. with $E(\epsilon_t \mid \mathcal{F}_{t-1}) = 0$ and $E(\epsilon_t \epsilon_t' \mid \mathcal{F}_{t-1}) = I_N$, where $I_N$ is the identity matrix of order $N$. Following the literature, $\Sigma_t$ is expressed as
\[
\Sigma_t = D_t^{1/2} R_t D_t^{1/2}, \tag{3.2.2}
\]
where $D_t^{1/2} = \mathrm{diag}\{\Sigma_t^{1/2}\} = \mathrm{diag}\{\sigma_{11,t}^{1/2}, \dots, \sigma_{NN,t}^{1/2}\}$ contains the conditional standard deviations, $R_t = S_t^{-1}$, and $S_t$ is the conditional inverse correlation matrix. The conditional variances, $\sigma_{ii,t}$, are assumed to follow GARCH(1,1) processes:
\[
\sigma_{ii,t} = a_i + \theta_{0i} r_{i,t-1}^2 + \theta_{1i} \sigma_{ii,t-1}, \tag{3.2.3}
\]
with parameters $(a_i, \theta_{0i}, \theta_{1i})$, where $a_i > 0$, $\theta_{0i} \geq 0$, $\theta_{1i} \geq 0$, and $\theta_{0i} + \theta_{1i} < 1$ for $i = 1, \dots, N$, and $r_{it} = y_{it} - \mu_{it}$.[3]

[3] The restriction $a_i > 0$ is required for our analysis due to the use of Bayesian GARCH(1,1) estimation, where $a_i$ is sampled from a normal distribution in our estimation process. This contrasts with the standard GARCH(1,1) process, where the restriction is unnecessary owing to the steady-state condition $a_i = \sigma_{i0}(1 - \theta_{0i} - \theta_{1i})$, where $\sigma_{i0}$ is the long-term variance.

Remark 1. The GARCH(1,1) specification in (3.2.3) requires positive parameters to ensure positive conditional variances, and it incorporates short memory and symmetric volatility reactions. We address the model's tendency towards skewness and heavy tails in the error distribution through a Bayesian estimation approach, utilizing a multivariate t-distribution with $\nu$ degrees of freedom as suggested by Fioruci et al. (2014).
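To make the recursion in (3.2.3) concrete, the following minimal Python sketch propagates a single security's conditional variance. The parameter values and function names are illustrative assumptions, not the paper's estimates.

    import numpy as np

    def garch11_variance(r, a_i, theta0, theta1, sigma2_init):
        """Propagate sigma_{ii,t} = a_i + theta0 * r_{t-1}^2 + theta1 * sigma_{ii,t-1}."""
        T = len(r)
        sigma2 = np.empty(T)
        sigma2[0] = sigma2_init
        for t in range(1, T):
            sigma2[t] = a_i + theta0 * r[t - 1] ** 2 + theta1 * sigma2[t - 1]
        return sigma2

    # Illustrative use; stationarity requires theta0 + theta1 < 1 and a_i > 0.
    rng = np.random.default_rng(0)
    r = rng.standard_normal(250)
    sig2 = garch11_variance(r, a_i=0.05, theta0=0.10, theta1=0.85, sigma2_init=r.var())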
While we are aware of alternative GARCH variants, such as exponential GARCH (EGARCH; Nelson, 1991), quadratic GARCH (QGARCH; Sentana, 1995), threshold GARCH (Chen and So, 2006; Chen et al., 2008), and fractionally integrated GARCH (FIGARCH; Baillie et al., 1996), that might mitigate the GARCH(1,1) model's limitations, our methodological choice remains justified by our primary analytical focus. That focus is on estimating the conditional precision matrices, $P_t$, and the conditional inverse correlation matrices, $S_t$, instead of deriving the conditional covariance and conditional correlation, $R_t = S_t^{-1}$, matrices. The conditional partial correlation matrix, $\Psi_t$, is also of special interest, as it provides an appropriate measure of pairwise conditional dependence; its off-diagonal elements are derived from $S_t$ as $\psi_{ij,t} = -s_{ij,t}/\sqrt{s_{ii,t}s_{jj,t}}$, $i \neq j$ (see (3.2.8) below). Contrary to the conventional application of the DCC procedure to the conditional covariance matrix, applying the procedure to the conditional precision matrix has the added advantage of bypassing the inversion of the estimated conditional covariance matrix, thereby facilitating portfolio optimization and the validation of asset pricing theory.

The standard DCC–MGARCH model (Engle, 2002), as specified in Equations (3.2.1)–(3.2.3) with $S_t^{-1} = R_t$, is defined as $R_t = \mathrm{diag}\{Q_t\}^{-1/2}\, Q_t\, \mathrm{diag}\{Q_t\}^{-1/2}$, where $Q_t$ is an $N \times N$ symmetric positive-definite matrix defined by
\[
Q_t = (1 - a - b)\Sigma + a\, u_{t-1} u_{t-1}' + b\, Q_{t-1}, \tag{3.2.4}
\]
where $u_t = D_t^{-1/2}(y_t - \mu_t)$, $\Sigma$ is the unconditional covariance matrix of $u_t$, $a > 0$, $b > 0$, and $a + b < 1$. However, applying this framework to the precision matrix leads to complications, as the matrix $u_{t-1}u_{t-1}'$ is at most positive semi-definite and thus non-invertible. The varying correlation MGARCH (VC–MGARCH) model (Tse and Tsui, 2002) circumvents this issue by estimating conditional correlations, substituting $u_{t-1}u_{t-1}'$ with the sample correlation matrix over $(u_{t-1}, \dots, u_{t-M})$, where $M \geq N$. Similarly, the dynamic correlation MSV (DC–MSV) model (Asai and McAleer, 2009) employs a Wishart process as an alternative to $u_{t-1}u_{t-1}'$ while maintaining the standard model's conditional covariance matrices. Billio et al. (2003) propose a block-diagonal structure that restricts the dynamics to be equal only among groups of variables. As an extension, the clustered correlation MGARCH (CC–MGARCH) model proposed by So and Yip (2012) integrates these group-specific effects into the direct estimation of the conditional correlation matrix, where the clusters are chosen by Bayesian model selection. Although this adjustment increases the model's flexibility to capture various dependencies, it also significantly increases the number of unknown parameters when combined with the conditional variance structure outlined in Equation (3.2.3). Such complexity presents new challenges for estimation methodologies and complicates interpretation, particularly for large-dimensional datasets. In these large-dimensional cases, Engle et al. (2019) and De Nard et al. (2021) follow the same framework as presented in (3.2.4), incorporating regularization techniques only for the estimation of the unconditional covariance matrix $\Sigma$ using shrinkage methods.

In our approach, we consider directly the conditional inverse correlation matrix, $S_t = \{s_{ij,t}\}$, in (3.2.2), and set
\[
S_t = \mathrm{diag}\{P_t\}^{-1/2}\, P_t\, \mathrm{diag}\{P_t\}^{-1/2}, \tag{3.2.5}
\]
where $P_t$ represents the $N \times N$ symmetric positive-definite conditional precision matrix given by
\[
P_t = (1 - a - b)\,\Omega + a\,\Xi_{t-1} + b\,P_{t-1}, \tag{3.2.6}
\]
with the corresponding parameters $a > 0$, $b > 0$, and $a + b < 1$.
As noted above, we estimate the conditional precision matrix by using the precision matrix of the lagged residuals, $\Xi_{t-1}$, to avoid the issue of non-invertibility. Furthermore, we incorporate a shrinkage prior in the sampling process of the unconditional precision matrix, $\Omega$. To obtain $\Omega$, we begin with a transformation of the raw devolatilized residuals, $u_t = D_t^{-1/2}(y_t - \mu_t)$, aiming to approximate a multivariate Gaussian distribution. This transformation, facilitated by the nonparanormal rank transformation detailed in Section 3.2.2 below, converts the raw devolatilized residual matrix $U = (u_1, u_2, \dots, u_T)'$ into a rank-transformed devolatilized residual matrix $Z = (z_1, z_2, \dots, z_T)'$ following a multivariate normal distribution with mean $0$ and correlation matrix $C$, whose inverse correlation matrix is $C^{-1} = \mathrm{diag}\{\Omega\}^{-1/2}\, \Omega\, \mathrm{diag}\{\Omega\}^{-1/2}$. The objective of this transformation is to achieve approximately Gaussian standardized residuals, thereby imbuing the resulting partial correlation matrices with meaningful interpretability concerning conditional dependence, one of the focal points of our exploration of volatility dependence in this paper. The unconditional precision matrix, $\Omega$, is then sampled from the posterior distribution regularized with horseshoe priors using Gibbs sampling, as elaborated in Section 3.2.3 below.

We obtain the lagged residual precision matrix $\Xi_{t-1}$ in (3.2.6) by drawing samples from a conjugate posterior distribution following a Wishart distribution:
\[
\Xi_{t-1} \sim W\!\left(T + 3,\ (I_N + z_{t-1} z_{t-1}')^{-1}\right), \tag{3.2.7}
\]
where $T + 3$ is the degrees of freedom and $(I_N + z_{t-1}z_{t-1}')^{-1}$ is the scale matrix. This distribution is particularly suited for our purposes, as it ensures that the sampled precision matrices are symmetric and positive-definite. The conjugacy with the approximated normal distribution of $z_t$, made possible by the nonparanormal transformation, permits closed-form expressions for the posterior. The Wishart distribution, a probability distribution over symmetric positive-definite matrices, inherently captures the structure and dependencies in the data. We therefore sample the conditional precision matrix for the conditional covariance structure of the rank-transformed residuals, $z_{t-1}$; this matrix reflects the dynamic volatility and interrelationships among the $z_{t-1}$ variables. The inclusion of $I_N$ in the scale matrix in (3.2.7) ensures positive definiteness and invertibility, serving as regularization that prevents overfitting by limiting excessive adaptation to recent observations. It also establishes a baseline assumption of independent variables with unit variance.

Building upon this framework, we can define the conditional partial correlation matrix $\Psi_t = \{\psi_{ij,t}\}_{i,j=1}^{N}$, which is closely linked to $S_t$, by setting $\psi_{ij,t}$ as
\[
\psi_{ij,t} =
\begin{cases}
-s_{ij,t}/\sqrt{s_{ii,t}\, s_{jj,t}}, & \text{for } i \neq j, \\
1, & \text{for } i = j.
\end{cases} \tag{3.2.8}
\]
Through the application of the nonparanormal rank transformation, one can interpret the elements of $\Psi_t$ as indicators of conditional dependence. The transformation ensures that the standard Gaussian-based interpretation of partial correlations carries over to our framework, hence allowing the inference of conditional dependence from the observed data.
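A compact sketch of one step of (3.2.5)–(3.2.8) follows, assuming a pre-estimated $\Omega$ and illustrative values of $(a, b)$; the Gibbs and Metropolis–Hastings layers of the full estimator are omitted here.

    import numpy as np
    from scipy.stats import wishart

    def precision_dcc_step(P_prev, z_prev, Omega, a, b, T, rng):
        """One update of (3.2.6): P_t = (1-a-b)*Omega + a*Xi_{t-1} + b*P_{t-1},
        with Xi_{t-1} ~ W(T+3, (I_N + z z')^{-1}) as in (3.2.7)."""
        N = len(z_prev)
        scale = np.linalg.inv(np.eye(N) + np.outer(z_prev, z_prev))
        Xi = wishart(df=T + 3, scale=scale).rvs(random_state=rng)
        P_t = (1 - a - b) * Omega + a * Xi + b * P_prev
        d = 1.0 / np.sqrt(np.diag(P_t))
        S_t = P_t * np.outer(d, d)                  # (3.2.5): inverse correlation matrix
        Psi_t = -S_t / np.sqrt(np.outer(np.diag(S_t), np.diag(S_t)))
        np.fill_diagonal(Psi_t, 1.0)                # (3.2.8): partial correlations
        return P_t, S_t, Psi_t

    rng = np.random.default_rng(1)
    N, T, a, b = 5, 50, 0.05, 0.90                  # illustrative settings
    Omega = np.eye(N); P = Omega.copy()
    z = rng.standard_normal(N)
    P, S, Psi = precision_dcc_step(P, z, Omega, a, b, T, rng)

Because (3.2.5) rescales $P_t$ to have a unit diagonal, the partial correlations in this sketch reduce to $-s_{ij,t}$ off the diagonal, consistent with (3.2.8).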
Remark 2. If the devolatilized residuals do not follow a multivariate normal distribution, the partial correlations cannot be straightforwardly interpreted as measures of conditional dependence, owing to potential nonlinearity and skewness in the relationships between variables. The Gaussian assumption ensures that the partial correlations describe linear relationships and that a zero partial correlation corresponds to conditional independence. In the absence of multivariate normality, reliance on partial correlations for such interpretations necessitates prudence. Alternative analytical strategies, such as the employment of copula models or the adoption of nonparametric measures like distance correlation and mutual information, may be necessary to attain robust inference.

3.2.2 Bayesian rank transformation and likelihood

To infer the unconditional precision matrix $\Omega$, we transform the raw devolatilized residuals $U = (u_1, u_2, \dots, u_T)' = (U_1, U_2, \dots, U_N)$ into the rank-transformed devolatilized residuals $Z = (z_1, z_2, \dots, z_T)' = (Z_1, Z_2, \dots, Z_N) \sim N(0, C)$, where $C = S^{-1}$. Given the monotone, increasing transformation functions $g_i$, for $i = 1, 2, \dots, N$ and $t = 1, 2, \dots, T$, we delineate the set
\[
B = \left\{ Z \in \mathbb{R}^{T \times N} : z_{i,t_{r-1}} < z_{i,t_r} < z_{i,t_{r+1}} \ \text{whenever} \ u_{i,t_{r-1}} < u_{i,t_r} < u_{i,t_{r+1}} \right\},
\]
where $r = 2, 3, \dots, T-1$ indexes the ranks of the observations. The rank-transformed residuals $Z$ are restricted to reside within this set. Utilizing Gibbs sampling as described in Algorithm 2, Appendix A, we can obtain $Z$. The rank likelihood $L^{RL}(Z)$ is then given by
\[
L^{RL}(Z) = \Pr(Z \in B \mid C, g_1, g_2, \dots, g_N) = \int_B p(Z \mid C)\, dZ = \Pr(Z \in B \mid C), \tag{3.2.9}
\]
where $g_1, g_2, \dots, g_N$ are the transformation functions. This likelihood is exclusively contingent on $C$ and is devoid of dependency on the specific transformation functions (Hoff, 2007).

Example 1. (Hoff, 2007) Suppose we are interested in estimating the parameter $\theta$, but there is also a nuisance parameter $g$ that we are not interested in. We find a statistic $z = t(u)$, a function of our observed data $u$, whose distribution depends only on $\theta$ and is independent of the nuisance parameter $g$. Then we have the relationship
\[
p(u \mid \theta, g) = p(t(u), u \mid \theta, g) = p(t(u) \mid \theta) \cdot p(u \mid t(u), \theta, g).
\]
This expression implies that the probability distribution of the observed data $u$, given both $\theta$ and $g$, can be decomposed into the distribution of the statistic $t(u)$ and the distribution of $u$ given $t(u)$, $\theta$, and $g$. Since $t(u)$ is independent of $g$, we can focus on $p(t(u) \mid \theta)$ for estimating $\theta$, ignoring the nuisance parameter $g$ in the estimation process.

To achieve (3.2.9), we can rewrite $Z$ as $(Z_1, Z_2, \dots, Z_N) = (g_1(U_1), g_2(U_2), \dots, g_N(U_N)) \sim N(0, C)$. Then, since $Z \in B$ occurs whenever $U$ is observed, the likelihood of the raw devolatilized residuals $U$ is
\[
p(U \mid C, g_1, \dots, g_N) = p(Z \in B, U \mid C, g_1, \dots, g_N) = \Pr(Z \in B \mid C) \cdot p(U \mid Z \in B, C, g_1, \dots, g_N).
\]
Therefore, $\Pr(Z \in B \mid C)$ alone can be used to estimate $C$, as it depends only on the parameter of interest, $C$, rather than on the transformation functions $g_1, g_2, \dots, g_N$. Then, using the reparameterization in terms of the non-identifiable $\Omega$, but focusing on the identifiable unconditional inverse correlation matrix, $S$, we can derive the posterior distribution
\[
\Pr(S \mid Z \in B) \propto p(S)\, p(Z \in B \mid S). \tag{3.2.10}
\]
This rank-based nonparanormal approach deviates from both the Copula and the standard nonparanormal frameworks by simplifying the overall modeling process.
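The paper samples $Z$ within the order-preserving set $B$ by Gibbs sampling (Algorithm 2). As a simplified stand-in for intuition only, the normal-scores transform below maps each column of $U$ to Gaussian margins while exactly preserving the rankings that define $B$; it is an approximation we introduce for illustration, not the paper's Algorithm 2.

    import numpy as np
    from scipy.stats import norm, rankdata

    def normal_scores(U):
        """Rank-based Gaussianization: map each column of U (T x N) to
        Phi^{-1}(rank / (T + 1)), preserving the orderings that define B."""
        T = U.shape[0]
        ranks = rankdata(U, axis=0)
        return norm.ppf(ranks / (T + 1.0))

    rng = np.random.default_rng(2)
    U = rng.standard_exponential((200, 4))   # skewed, non-Gaussian margins
    Z = normal_scores(U)                     # approximately Gaussian margins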
The Copula model mandates a two-step transformation. Initially, the $U_i$, for $i = 1, 2, \dots, N$, are transformed into uniform margins $\tilde{U}_i$ via their respective empirical cumulative distribution functions (CDFs), $\tilde{U}_i = F_i(U_i) \sim U(0, 1)$. Subsequently, these uniform variables are converted to standard normal margins by employing the inverse standard normal CDF, $\Phi^{-1}$, denoted as $Z_i = \Phi^{-1}(\tilde{U}_i)$. The joint likelihood $L^{CP}(Z)$ in the Gaussian copula model is then constructed as the product of the individual marginal likelihoods and the copula density,
\[
L^{CP}(Z) = \prod_{i=1}^{N} f(Z_i) \cdot c(Z; C), \tag{3.2.11}
\]
where $f(Z_i)$ is the marginal density of $Z_i$, $c(Z; C)$ denotes the copula density, and $C$ is the correlation matrix capturing the dependencies among the $Z_i$'s. The likelihood function in this approach decomposes into marginal and copula components. The standard nonparanormal model, on the other hand, extends the Copula model's approach to formulating the likelihood by introducing smooth, invertible functions that transform the original variables before the copula transformation, leading to a likelihood formulation that also involves these transformation functions $g_i$ and the transformed observations $g_i(u_{it})$. The core premise of the standard nonparanormal framework is that these transformed variables approximate a multivariate Gaussian distribution, obviating the need for additional transformations to uniform or standard normal margins. Under this Gaussian assumption, the joint likelihood $L^{NP}(Z)$ for the transformed dataset $Z$, where $z_{it} = g_i(u_{it})$, is formulated as
\[
L^{NP}(Z) = \prod_{i=1}^{N} \prod_{t=1}^{T} f(z_{it}) \cdot \left| J(g_i(u_{it})) \right|, \tag{3.2.12}
\]
where $f(z_{it})$ denotes the Gaussian density function for the transformed variable $z_{it}$ and $|J(g_i(u_{it}))|$ is the Jacobian determinant of the transformation $g_i$ at $u_{it}$. This likelihood construction directly models the dependencies among variables and across time, capitalizing on the Gaussian approximation.

The rank-based nonparanormal model serves as an intermediary between the Copula and traditional nonparanormal frameworks. It avoids the Copula model's requisite partitioning into marginal and copula components, thereby streamlining the likelihood formulation. At the same time, it circumvents the standard nonparanormal model's need to estimate the smooth transformation functions $g_i$, thereby reducing model complexity. In short, this approach amalgamates the respective merits of both models while attenuating their individual complexities and assumptions.

We consider a new approach by integrating the DCC framework with the rank-based nonparanormal model, as proposed by Mulgrave and Ghosal (2023). We tackle the issue of non-identification in the rank-likelihood approach to derive the identifiable conditional precision matrix $P_t = (p_{ij,t})$. This involves reparameterizing the model to use the non-identifiable $\Omega$ and computing the non-identified conditional precision matrix $\tilde{P}_t$ using Equation (3.2.6). We then focus on posterior inference for the identifiable conditional inverse correlation matrix $S_t = (s_{ij,t})$, defined as $S_t = \mathrm{diag}\{\tilde{P}_t\}^{-1/2}\, \tilde{P}_t\, \mathrm{diag}\{\tilde{P}_t\}^{-1/2}$. Since the rank likelihood is unaffected by scale transformations, both the non-identifiable and identifiable models produce the same posterior distributions, as indicated in Equation (3.2.10). This allows us to sample from the posterior distribution of $S_t$ without the need to estimate transformation functions. Given $S_t$, we can obtain the conditional variance $\sigma_{ii,t}$ by employing a GARCH(1,1) process for each security. Therefore, we can find the identifiable conditional precision matrix $P_t$, where
\[
p_{ij,t} = s_{ij,t} \sqrt{(1/\sigma_{ii,t})(1/\sigma_{jj,t})} \tag{3.2.13}
\]
under the Gaussian distribution (Rue and Held, 2005, p. 26).
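The identification step in (3.2.13) is a simple rescaling. A minimal sketch, assuming hypothetical values of $S_t$ and of the GARCH variances, confirms that the rescaled matrix coincides with the inverse of $\Sigma_t = D_t^{1/2} S_t^{-1} D_t^{1/2}$ from (3.2.2):

    import numpy as np

    def precision_from_inverse_corr(S_t, sigma2_t):
        """(3.2.13): p_{ij,t} = s_{ij,t} * sqrt((1/sigma_ii,t)(1/sigma_jj,t))."""
        inv_sd = 1.0 / np.sqrt(np.asarray(sigma2_t))
        return S_t * np.outer(inv_sd, inv_sd)

    S_t = np.array([[1.0, -0.3], [-0.3, 1.0]])      # hypothetical inverse correlation
    sigma2_t = np.array([0.04, 0.09])               # hypothetical GARCH variances
    P_t = precision_from_inverse_corr(S_t, sigma2_t)

    # Consistency check against (3.2.2): P_t should equal Sigma_t^{-1}.
    D_half = np.diag(np.sqrt(sigma2_t))
    Sigma_t = D_half @ np.linalg.inv(S_t) @ D_half
    assert np.allclose(np.linalg.inv(Sigma_t), P_t)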
3.2.3 Using Gibbs sampling to obtain the unconditional precision matrix

We obtain the unconditional precision matrix $\Omega$ by sampling the Cholesky decomposition $\Omega = LL'$, where $L$ is a lower triangular matrix. The lower triangular elements of $\Omega$ are given by $\Omega_{ij} = \sum_{k=1}^{j} L_{ik} L_{jk}$ for $j = 1, 2, \dots, N$ and $i = j+1, j+2, \dots, N$. The elements of $L$ are determined such that
\[
L_{ij} =
\begin{cases}
\sqrt{\Omega_{ii} - \sum_{k=1}^{i-1} L_{ik}^2}, & \text{if } i = j, \\[4pt]
\dfrac{1}{L_{jj}}\left(\Omega_{ij} - \sum_{k=1}^{j-1} L_{ik} L_{jk}\right), & \text{if } i > j, \\[4pt]
0, & \text{if } i < j.
\end{cases}
\]
Then the density of $Z \sim N(0, \Sigma)$ is $p(Z) = (2\pi)^{-N/2} |\Omega|^{1/2} \exp\left(-\tfrac{1}{2} Z \Omega Z'\right)$, and the conditional distribution of $Z_j$ given $Z_{i>j}$ is Gaussian, with mean and variance derived from the elements of $L$: for $j = 1, 2, \dots, N$,
\[
Z_j \mid Z_{i>j} \sim N\!\left(\sum_{i>j} -\frac{L_{ij}}{L_{jj}} Z_i,\ \frac{1}{L_{jj}^2}\right),
\]
where $-L_{ij}/L_{jj}$ represents the regression coefficient of $Z_j$ on $Z_i$, and $1/L_{jj}^2$ is the conditional variance. Therefore, we can represent the Cholesky decomposition with the regression coefficients,
\[
\Omega_{ij} = \sum_{k=1}^{j} L_{ik} L_{jk} = \sum_{k=1}^{j} \beta_{ik} \beta_{jk}\, \omega_k, \tag{3.2.14}
\]
where $\beta_{ij} = -L_{ij}/L_{jj}$ represents the regression coefficients and $\omega_j = 1/\sigma_j^2 = L_{jj}^2$ denotes the precision of the multivariate Gaussian distribution. Using (3.2.14), we can formulate the regression problem (Rue and Held, 2005, p. 35). By employing the rank-transformed devolatilized residuals $Z \sim N(0, \Sigma)$, where $\Sigma$ is the unconditional covariance matrix, we have
\[
Z_j = \sum_{i>j} \beta_{ij} Z_i + \eta_j, \quad \eta_j \sim N\!\left(0, \omega_j^{-1}\right), \tag{3.2.15}
\]
for $j = 1, 2, \dots, N$ and $i = j+1, j+2, \dots, N$. This formulation ensures the properties of symmetry and positive definiteness in the precision matrix. Based on Equation (3.2.15), the likelihood function takes the form
\[
Z_j \mid Z_{i>j}, \beta_{i>j}, \sigma_j^2 \sim N\!\left(Z_{i>j}\, \beta_{i>j},\ \sigma_j^2 I\right), \tag{3.2.16}
\]
where $Z_{i>j}$ refers to the matrix constructed from the columns of $Z$ with index greater than $j$, and $\beta_{i>j} = (\beta_{j+1,j}, \beta_{j+2,j}, \dots, \beta_{N,j})'$.

In this regression model, there is an intrinsic sequence to the variables. To accommodate an ordering between variables, we impose a sparsity constraint on the rows of the lower triangular matrix, following the approach outlined by Mulgrave and Ghosal (2022). This method ensures that the likelihood of non-zero elements remains uniform across rows, dictated by the ratio $c/(N\sqrt{i})$, where $c$ is a tuning parameter.[4] In this paper, we set $c = 0.1$ for all analyses to adapt to the high-dimensional settings in the multivariate volatility model, thereby reducing the number of nonzero elements in each row of the unconditional precision matrix.

[4] The Cholesky factor $L$ of the unconditional precision matrix $\Omega$ depends on the row index because $\Pr(\Omega_{ij} \neq 0) = \Pr\left(\sum_{k=1}^{N} l_{ik} l_{jk} \neq 0\right) = 1 - (1 - \rho_i \rho_j)^{\min(i,j)}$, where $\rho_i$ is the probability of a non-zero entry in the $i$th row of $L$. Mulgrave and Ghosal (2022) set $\rho_i = c/(N\sqrt{i})$ to tune the sparsity constraint.
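The regression parametrization in (3.2.14)–(3.2.15) is exact. A minimal sketch, using a hypothetical positive-definite matrix, verifies the round trip between $(L)$ and $(\beta, \omega)$:

    import numpy as np

    def cholesky_regression(Omega):
        """Decompose Omega = L L' and return the parametrization of (3.2.14):
        beta_ij = -L_ij / L_jj for i > j, and omega_j = L_jj^2."""
        L = np.linalg.cholesky(Omega)            # lower triangular, positive diagonal
        d = np.diag(L)
        beta = -(L / d[np.newaxis, :])           # column-wise scaling by L_jj
        np.fill_diagonal(beta, 0.0)
        omega = d ** 2
        return L, beta, omega

    def rebuild_from_regression(beta, omega):
        """Invert the map: L_ij = -beta_ij * L_jj, then Omega = L L'."""
        L = -beta * np.sqrt(omega)[np.newaxis, :]
        np.fill_diagonal(L, np.sqrt(omega))
        return L @ L.T

    A = np.array([[2.0, 0.5, 0.2], [0.5, 1.5, 0.3], [0.2, 0.3, 1.0]])
    L, beta, omega = cholesky_regression(A)
    assert np.allclose(rebuild_from_regression(beta, omega), A)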
For the regression coefficients $\beta_{ij}$, we employ a horseshoe prior, as delineated in Neville et al. (2014), with the global scale parameter $\tilde{\lambda}_j$ approximating the probability of a nonzero element. Characterized by its concentration around zero and tails resembling a Cauchy distribution, the horseshoe prior offers robust variable selection and the ability to capture extreme values. This choice stands in contrast to other commonly used priors, such as the Gaussian and Laplace priors, which are limited by their lighter tails and less effective variable selection capabilities. The spike-and-slab prior, while designed to induce sparsity (Li and McCormick, 2019; Mulgrave and Ghosal, 2020), can be computationally demanding and less apt at modeling heavy-tailed features. The G-Wishart prior (Mohammadi and Wit, 2015; Mohammadi et al., 2017), on the other hand, is tailored to capture structural sparsity in graph-based models but may not be optimal for handling heavy-tailed features. Hence, the horseshoe prior provides a balanced and effective choice for modeling the precision matrix's sparsity and the tail behavior of attributes frequently observed in financial data. The combined application of sparsity constraints and carefully chosen priors induces a structured prior on $\Omega$, which in turn influences the prior on $S$. For a comprehensive exposition of the sparsity mechanism, the reader is referred to Mulgrave and Ghosal (2022). The specific algorithm employed for the implementation is given by Algorithm 3, in Appendix A.

3.3 Bayesian nonparanormal dynamic conditional partial correlation – GARCH model estimation

Section 3.3.1 details the Bayesian GARCH(1,1) model's application within the DCC framework, emphasizing conditional variance calculation and the use of skewed distributions for modeling asymmetry and tail characteristics. Section 3.3.2 discusses strategies for posterior computation, including optimization for initial parameters and an adaptive Metropolis–Hastings MCMC approach to address the computational challenges of large-dimensional settings.

3.3.1 Bayesian GARCH(1,1) estimation

The conditional likelihood function, corresponding to Equation (3.2.1), is given by
\[
l(\Theta \mid Y) = \prod_{t=1}^{T} |\Sigma_t|^{-1/2}\, p_\epsilon\!\left(\Sigma_t^{-1/2} y_t\right) = \prod_{t=1}^{T} \left[\prod_{i=1}^{N} \sigma_{ii,t}^{-1/2}\right] \left|S_t^{-1}\right|^{-1/2} p_\epsilon\!\left(\left(D_t^{1/2} S_t^{-1} D_t^{1/2}\right)^{-1/2} y_t\right),
\]
where $p_\epsilon$ represents the joint density function of $\epsilon_t$, parameterized by $\{a_1, \theta_{10}, \theta_{11}, \dots, a_N, \theta_{N0}, \theta_{N1}, a, b\}$ in Equations (3.2.3) and (3.2.6). We adopt the multivariate skewed distribution characterized by shape parameters $\gamma_i > 0$, which quantify the degree of asymmetry, as proposed by Bauwens and Laurent (2005):
\[
p_\epsilon(\epsilon_t \mid \gamma) = 2^N \left(\prod_{i=1}^{N} \frac{\gamma_i\, \sigma_{\gamma_i}}{1 + \gamma_i^2}\right) \frac{\Gamma\!\left((\nu + N)/2\right)}{\Gamma(\nu/2)\, \left[\pi(\nu - 2)\right]^{N/2}} \left(1 + \frac{\epsilon_t^{*\prime} \epsilon_t^{*}}{\nu - 2}\right)^{-\frac{\nu + N}{2}}, \tag{3.3.1}
\]
with
\[
\epsilon_{it}^{*} =
\begin{cases}
(\epsilon_{it}\, \sigma_{\gamma_i} + \mu_{\gamma_i})/\gamma_i, & \text{if } \epsilon_{it} \geq -\mu_{\gamma_i}/\sigma_{\gamma_i}, \\
(\epsilon_{it}\, \sigma_{\gamma_i} + \mu_{\gamma_i})\, \gamma_i, & \text{if } \epsilon_{it} < -\mu_{\gamma_i}/\sigma_{\gamma_i},
\end{cases}
\]
where $\Gamma(\cdot)$ denotes the Gamma function, and $\mu_{\gamma_i}$ and $\sigma_{\gamma_i}^2$ are defined by
\[
\mu_{\gamma_i} = \frac{\Gamma\!\left((\nu - 1)/2\right) \sqrt{\nu - 2}\, \left(\gamma_i - 1/\gamma_i\right)}{\sqrt{\pi}\, \Gamma(\nu/2)}, \qquad \sigma_{\gamma_i}^2 = \left(\gamma_i^2 + 1/\gamma_i^2\right) - \mu_{\gamma_i}^2 - 1,
\]
and $\nu$ is the degrees-of-freedom (tail) parameter. This methodology decouples the influence of skewness and tail characteristics while anchoring the mode at zero. The shape parameter, $\gamma_i$, governs the distribution of mass on either side of the mode, whereas the tail parameter, $\nu$, governs the thickness of the tails. $\gamma_i = 1$ yields a symmetric distribution, while $\gamma_i > 1$ and $\gamma_i < 1$ give right and left skewness, respectively. As $\nu \to \infty$, the distribution converges to a standard multivariate normal distribution, as demonstrated by Fernández and Steel (1998).

For the GARCH(1,1) coefficients in Equation (3.2.3), we follow the prior distributions proposed by Ardia (2008): for $i = 1, 2, \dots, N$,
\[
a_i \sim N\!\left(\mu_{\omega_i}, \sigma_{\omega_i}^2\right) I(a_i > 0), \quad \theta_{0i} \sim N\!\left(\mu_{\theta_{0i}}, \sigma_{\theta_{0i}}^2\right) I(0 < \theta_{0i} < 1), \quad \theta_{1i} \sim N\!\left(\mu_{\theta_{1i}}, \sigma_{\theta_{1i}}^2\right) I(0 < \theta_{1i} < 1),
\]
where $I(A)$ denotes an indicator function with $I(A) = 1$ if $A$ holds and zero otherwise. Analogously, the priors for the parameters $a$ and $b$ in Equation (3.2.6) are
\[
a \sim N\!\left(\mu_a, \sigma_a^2\right) I(0 < a < 1), \quad b \sim N\!\left(\mu_b, \sigma_b^2\right) I(0 < b < 1).
\]
For the skewness and tail parameters $\gamma_i$ and $\nu$, the priors are specified as
\[
\gamma_i \sim N\!\left(\mu_{\gamma_i}, \sigma_{\gamma_i}^2\right) I(\gamma_i > 0), \quad \nu \sim N\!\left(\mu_\nu, \sigma_\nu^2\right) I(\nu > 2).
\]
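To illustrate the error density in (3.3.1), the sketch below implements its univariate ($N = 1$) case and checks numerically that it integrates to one; the chosen values of $\gamma$ and $\nu$ are illustrative assumptions.

    import numpy as np
    from scipy.special import gammaln

    def skewt_pdf(eps, gamma, nu):
        """Univariate (N = 1) version of the skewed-t density in (3.3.1)
        (Fernandez-Steel skewing as used by Bauwens and Laurent, 2005)."""
        mu_g = (np.exp(gammaln((nu - 1) / 2) - gammaln(nu / 2))
                * np.sqrt(nu - 2) * (gamma - 1 / gamma) / np.sqrt(np.pi))
        sig_g = np.sqrt(gamma ** 2 + 1 / gamma ** 2 - mu_g ** 2 - 1)
        z = eps * sig_g + mu_g
        z_star = np.where(eps >= -mu_g / sig_g, z / gamma, z * gamma)
        log_c = (gammaln((nu + 1) / 2) - gammaln(nu / 2)
                 - 0.5 * np.log(np.pi * (nu - 2)))
        kernel = -(nu + 1) / 2 * np.log1p(z_star ** 2 / (nu - 2))
        return 2 / (gamma + 1 / gamma) * sig_g * np.exp(log_c + kernel)

    # Numerical check: the density integrates to ~1 and is right-skewed for gamma > 1.
    x = np.linspace(-15, 15, 60001)
    dx = x[1] - x[0]
    print((skewt_pdf(x, gamma=1.5, nu=6.0) * dx).sum())  # ~1.0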
3.3.2 Posterior computation

We generate random samples from $\Pr(S_t \mid Z \in B)$ by employing the following steps. To efficiently explore the parameter space, an optimization problem targeting the log-posterior distribution is initially solved to acquire starting parameters. If the resulting Hessian matrix is not positive-definite, an adaptive Metropolis–Hastings Markov chain Monte Carlo (MCMC) strategy is employed to construct the proposal distribution. The proposal distribution is the key component that suggests the next point in the parameter space to explore. We first run the sampler using an initial proposal distribution, a simple Gaussian with a predetermined covariance matrix. The results of this 'pilot run' are then used to update the proposal distribution in two ways. First, we recalculate the covariance matrix of the proposal distribution to align with the empirical covariance of the sampled points, thereby better reflecting the shape and orientation of the target distribution; in this step, we aim to bring the acceptance rate to between 15% and 50%. Second, we adjust the step size (scale) of the proposal distribution by multiplying the empirical covariance by a parameter based on the acceptance rate; this adjustment aims to keep the rate at which new proposals are accepted within a range of 20% to 50%. We implement an initial pilot run for sample generation, followed by an adaptation phase during which the proposal distribution is refined based on the empirical covariance matrix of the observed samples. These adaptations, made iteratively or after a set number of iterations, are designed to improve the exploration of the parameter space, ensuring more efficient and effective convergence to the target distribution. The adapted proposal distribution and step size are then utilized in the main MCMC run, as detailed in Algorithm 1.

In large-dimensional settings where the number of variables exceeds 25, the computational burden of estimating more than 100 parameters becomes prohibitive. To ameliorate this, we adopt an approach akin to that delineated in Pakel et al. (2021). Specifically, initial parameters are derived from optimizing univariate GARCH models for each variable rather than through a joint fit under the log-posterior distribution. Subsequent analysis focuses solely on the parameters pertinent to the conditional partial correlation matrices and the associated error distributions. This strategy effectively minimizes the number of jointly estimated parameters and the requisite pilot simulations for establishing a viable proposal distribution. Nonetheless, the adaptive phase introduces complexities stemming from the approximations inherent in the initial parameters. An alternative strategy for approximating the full posterior distribution involves Bayesian variational inference; this avenue is not explored in the present study, given the flexibility of the MCMC approach in handling the model's complexity, and we leave it for future research.

Algorithm 1 Bayesian nonparanormal dynamic conditional partial correlation–GARCH(1,1)
1: for s = 1 : #Simulations do
2:   Choose initial parameter values $\Theta_0 = (a_1, \theta_{10}, \theta_{11}, \dots, a_N, \theta_{N0}, \theta_{N1}, a, b)$ for the model.
3:   Compute the log-rank likelihood:
4:     Compute the raw devolatilized residuals $U$.
5:     Compute the rank-transformed devolatilized residuals $Z$ from Algorithm 2.
6:     Sample the regularized unconditional precision matrix $\Omega$ from Algorithm 3.
7:     Given $\Sigma_1$ and $Q_1$ based on $\Omega$ at $t = 1$,
8:     for t > 1 do
9:       Sample $\Xi_t \sim W\!\left(T + 3, (I_N + z_{t-1} z_{t-1}')^{-1}\right)$.
10:      Update $P_t$: $P_t = (1 - a - b)\Omega + a\,\Xi_t + b\,P_{t-1}$.
11:      Compute $S_t = P_t / \sqrt{\mathrm{diag}\{P_t\}\, \mathrm{diag}\{P_t\}'}$.
12:      Update the diagonal elements of $\Sigma_t$ as specified in (3.2.3).
13:      Save the current $S_t$ for later use.
14:      Update the log-rank likelihood based on the error distribution specified in (3.3.1).
15:    end for
16:   Compute the log-posterior for $S_t$ from the obtained log-rank likelihood and the log-priors for $\Theta_t$ as specified in Section 3.3.1.
17:   Compute the log-posterior for $\Theta_t$ from the log-posterior for $S_t$ and the log-Jacobian for $\Theta_t$.
18:   Given the log-posterior, generate a new parameter set by perturbing the current parameter set, and decide whether to accept the new parameter set based on the Metropolis–Hastings criterion.
19: end for
20: Compute the conditional partial correlation matrix $\Psi_t$ as in Equation (3.2.8) and the conditional precision matrix $P_t$ from the acquired MH-MCMC samples of $S_t$.
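The adaptation step described above can be summarized in a few lines. The sketch below is a minimal illustration of the covariance-and-scale update; the multiplicative tuning constants and the starting scale $2.38^2/d$ (a common default in adaptive MCMC) are our assumptions, not the paper's settings.

    import numpy as np

    def adapt_proposal(samples, accept_rate, scale, target=(0.20, 0.50)):
        """Reset the proposal covariance to the empirical covariance of the pilot
        draws and nudge the step size so the acceptance rate drifts into the
        target band. Tuning constants are illustrative."""
        cov = np.cov(samples, rowvar=False)
        cov += 1e-8 * np.eye(cov.shape[0])      # keep the proposal covariance PD
        if accept_rate < target[0]:
            scale *= 0.8                        # too many rejections: smaller steps
        elif accept_rate > target[1]:
            scale *= 1.2                        # accepting too easily: larger steps
        return scale * cov, scale

    # Usage after a pilot run of, say, 500 draws stored row-wise in `pilot`.
    rng = np.random.default_rng(3)
    pilot = rng.standard_normal((500, 4))
    proposal_cov, scale = adapt_proposal(pilot, accept_rate=0.12, scale=2.38 ** 2 / 4)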
3.4 Monte Carlo Experiments

In this section, we investigate the small-sample performance of our proposed Bayesian nonparanormal conditional estimator through Monte Carlo simulations. More specifically, our objective is to verify empirically whether the estimation of the conditional precision and inverse correlation matrices surpasses the approach of inverting estimated conditional covariance and correlation matrices. Our approach is benchmarked against several shrinkage and Copula-based estimators commonly integrated within the DCC model. In this regard, we consider the DCC–L and DCC–NL estimators of Ledoit and Wolf (2004b), Pakel et al. (2021), and Engle et al. (2019), as well as Patton (2009)'s Gaussian Copula and t-Copula estimators. For each estimator, including our proposed approach, we fit univariate GARCH(1,1) models to normalize the return series for each $i = 1, \dots, N$.

We begin by generating $y_{it}^{(r)}$, for $r = 1, 2, \dots, R$ replications, as
\[
y_{it}^{(r)} = \mu_t^{(r)} + u_{it}^{(r)}, \quad \text{for } i = 1, 2, \dots, N; \ t = 1, 2, \dots, T,
\]
where $\mu_t^{(r)} \overset{i.i.d.}{\sim} U(0.5, 1.5)$. The errors $u_{it}^{(r)}$ are generated as
\[
u_{it}^{(r)} = \sqrt{h_{it}^{(r)}}\, \varepsilon_{it}^{(r)}, \quad \text{for } i = 1, 2, \dots, N; \ t = -50, -49, \dots, -1, 0, 1, \dots, T,
\]
where $u_{i,-50}^{(r)} = \sqrt{h_{i,-50}^{(r)}}\, \varepsilon_{i,-50}^{(r)}$ and $h_{i,-50}^{(r)} = \sigma_i^{2,(r)} \overset{i.i.d.}{\sim} \left(\tfrac{1}{2} + \tfrac{\chi^2(2)}{4}\right)$. We consider two different distributions for $\varepsilon_{it}^{(r)}$:
\[
\varepsilon_{it}^{(r)} \overset{i.i.d.}{\sim} N(0, 1), \quad \text{and} \quad \varepsilon_{it}^{(r)} \overset{i.i.d.}{\sim} \mathrm{scale} \cdot t(\nu = 3),
\]
where $\mathrm{scale} = \sqrt{(\nu - 2)/\nu}$ and $\nu$ is the degrees of freedom. Define $h_t^{(r)} = (h_{1t}^{(r)}, h_{2t}^{(r)}, \dots, h_{Nt}^{(r)})'$ and $r_t^{(r)} = (r_{1t}^{(r)}, r_{2t}^{(r)}, \dots, r_{Nt}^{(r)})' = y_t^{(r)} - \mu_t^{(r)}$. The conditional variances $h_t^{(r)}$ are generated as
\[
h_t^{(r)} = W^{(r)} + \Theta_0^{(r)}\, r_{t-1}^{2,(r)} + \Theta_1^{(r)}\, h_{t-1}^{(r)} > 0, \tag{3.4.1}
\]
where $W^{(r)} = (w_1^{(r)}, w_2^{(r)}, \dots, w_N^{(r)})'$, $w_i^{(r)} = (1 - \theta_{i0}^{(r)} - \theta_{i1}^{(r)})\, \sigma_i^{2,(r)}$, $\Theta_0^{(r)} = \mathrm{diag}\{\theta_{10}^{(r)}, \theta_{20}^{(r)}, \dots, \theta_{N0}^{(r)}\}$, and $\Theta_1^{(r)} = \mathrm{diag}\{\theta_{11}^{(r)}, \theta_{21}^{(r)}, \dots, \theta_{N1}^{(r)}\}$. The parameters are sampled as $\theta_{i0}^{(r)} \overset{i.i.d.}{\sim} U(0.1, 0.2)$, $\theta_{i1}^{(r)} \overset{i.i.d.}{\sim} U(0.5, 0.75)$, and $\sigma_i^{2,(r)} \overset{i.i.d.}{\sim} \tfrac{1}{2} + \tfrac{\chi^2(2)}{4}$, respectively. The returns $y_t^{(r)}$ are generated as
\[
y_t^{(r)} = \mu_t^{(r)} + \left\{D_t^{(r)}\right\}^{1/2} L_{S,t}^{-1}\, \varepsilon_t^{(r)},
\]
where $\{D_t^{(r)}\}^{1/2} = \mathrm{diag}\{h_{11,t}^{1/2}, \dots, h_{NN,t}^{1/2}\}$ and $L_{S,t}$ is a Cholesky factor of the conditional inverse correlation matrix $S_t^{(r)} = \mathrm{diag}\{P_t^{(r)}\}^{-1/2}\, P_t^{(r)}\, \mathrm{diag}\{P_t^{(r)}\}^{-1/2}$, with $P_t^{(r)}$ denoting a conditional precision matrix generated by simulation designs A and B below. We focus on the problem of estimating both the conditional precision matrix, $P_t$, and the conditional inverse correlation matrix, $S_t$, for two particular cases of dense conditional precision matrices.
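A minimal sketch of the univariate layer of this data-generating process follows, with the stated parameter draws; for brevity it omits the cross-sectional mixing through $L_{S,t}^{-1}$, so it is an illustration of (3.4.1) rather than the full design.

    import numpy as np

    def simulate_garch_panel(N, T, burn=50, dist="gaussian", seed=0):
        """GARCH(1,1) variances as in (3.4.1): theta_i0 ~ U(0.1, 0.2),
        theta_i1 ~ U(0.5, 0.75), sigma_i^2 ~ 1/2 + chi2(2)/4, mu_t ~ U(0.5, 1.5)."""
        rng = np.random.default_rng(seed)
        th0 = rng.uniform(0.10, 0.20, N)
        th1 = rng.uniform(0.50, 0.75, N)
        sig2_0 = 0.5 + rng.chisquare(2, N) / 4.0
        w = (1.0 - th0 - th1) * sig2_0
        total = burn + T
        if dist == "gaussian":
            eps = rng.standard_normal((total, N))
        else:                                    # scaled t(3) with unit variance
            nu = 3.0
            eps = rng.standard_t(nu, (total, N)) * np.sqrt((nu - 2) / nu)
        h = np.empty((total, N)); u = np.empty((total, N))
        h[0] = sig2_0
        u[0] = np.sqrt(h[0]) * eps[0]
        for t in range(1, total):
            h[t] = w + th0 * u[t - 1] ** 2 + th1 * h[t - 1]
            u[t] = np.sqrt(h[t]) * eps[t]
        mu = rng.uniform(0.5, 1.5, (total, 1))
        y = mu + u
        return y[burn:], h[burn:]

    y, h = simulate_garch_panel(N=25, T=250)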
Monte Carlo design A: We consider a conditional precision matrix $P_t^{(r)}$ structured such that 20% of its eigenvalues are 1, 40% are 1/3, and the remaining 40% are 1/10. This composition of the precision matrix is reflected in its associated covariance matrix $\Sigma_t^{(r)}$, aligning with the eigenvalue distribution specified by Ledoit and Wolf (2012), where 20% of the eigenvalues are 1, 40% are 3, and the remaining 40% are 10. For each time point $t = -50, \dots, -1, 0, 1, \dots, T$, we proceed by creating a dense lower triangular Cholesky factor $L_{P,t}^{(r)}$ of $P_t^{(r)}$, with off-diagonal elements $L_{ij,t}^{(r)} \overset{i.i.d.}{\sim} N(0, 1)$ for $i > j$ and diagonal elements $L_{ii,t}^{(r)} \overset{i.i.d.}{\sim} N(0, 0.1)$ for $i = 1, \dots, N$. Subsequently, we generate a diagonal matrix of eigenvalues $\Lambda_t^{(r)} = \mathrm{diag}\{\lambda_{1t}\tau_1, \lambda_{2t}\tau_2, \lambda_{3t}\tau_3\}$, where $\tau_1 = \mathbf{1}_{\lceil 0.2N \rceil}$ is a vector of ones with a length comprising 20% of $N$, rounded up; $\tau_2 = \mathbf{1}_{\lceil 0.4N \rceil}$ is a vector of ones representing 40% of $N$, also rounded up; and $\tau_3 = \mathbf{1}_{N - \lceil 0.2N \rceil - \lceil 0.4N \rceil}$ is a vector of ones for the remaining proportion of $N$. The eigenvalues are set at $\lambda_{1t} = 1$, $\lambda_{2t} = 1/3$, and $\lambda_{3t} = 1/10$. Finally, the conditional precision matrix $P_t^{(r)}$ is constructed as $P_t^{(r)} = L_{P,t}^{(r)}\, \Lambda_t^{(r)}\, L_{P,t}^{(r)\prime}$.

Monte Carlo design B: We generate a conditional precision matrix $P_t^{(r)}$ with an underlying sparsity pattern for each time point $t = -50, \dots, -1, 0, 1, \dots, T$. We first obtain a dense Cholesky factor $L_t^{dense,(r)} = (l_{ij,t}^{dense,(r)})$ following Monte Carlo design A. Then we generate a binary selection matrix $M_t^{lower,(r)} = (m_{ij,t}^{lower,(r)})$ with $m_{ij,t}^{lower,(r)} \overset{i.i.d.}{\sim} \mathrm{Binomial}(1, 1 - \kappa)$ for $i > j$, with $\kappa$ determining the sparsity level. Subsequently, we construct a sparse Cholesky factor $L_t^{sparse,(r)} = (l_{ij,t}^{sparse,(r)})$ through element-wise multiplication with the binary selection matrix: $l_{ij,t}^{sparse,(r)} = l_{ij,t}^{dense,(r)} \cdot m_{ij,t}^{lower,(r)}$. The precision matrix $P_t^{(r)}$ is then formed by
\[
P_t^{(r)} = L_{sparse,t}^{(r)}\, L_{sparse,t}^{(r)\prime} + \delta\, \Lambda_t^{(r)},
\]
where $\Lambda_t^{(r)}$ is drawn from a Wishart distribution with $T + 3$ degrees of freedom and scale matrix $I_N$: $\Lambda_t^{(r)} \overset{i.i.d.}{\sim} W(T + 3, I_N)$. The Wishart distribution, a multivariate extension of the chi-squared distribution, is conventionally employed for modeling the precision matrices of multivariate normal distributions. Opting for $T + 3$ degrees of freedom results in reduced variability around the scale matrix $I_N$, thereby conferring greater stability, albeit with potential amplification, on the diagonal elements of the precision matrix $P_t^{(r)}$ due to the additive term $\delta \Lambda_t^{(r)}$. The selection of $I_N$ as the scale matrix normalizes the expected conditional precision structure to $(T + 3)I_N$ and establishes a quasi-orthogonal structure in $P_t^{(r)}$. This choice preserves the structure of the sparse component, $L_{sparse,t}^{(r)} L_{sparse,t}^{(r)\prime}$, even when the sampled $\Lambda_t$ predominates over it. The term $\delta \Lambda_t$ introduces a dense component into $P_t^{(r)}$, resulting in a heterogeneous matrix with both sparse and dense parts. The parameter $\delta = 0.1$ regulates the magnitude of the dense component, ensuring that the noise it generates does not significantly alter the structure.
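The two constructions can be sketched as follows. We read $N(0, 0.1)$ for the diagonal draws as a variance, and since the text does not fix $\kappa$ numerically, the value below is an illustrative assumption.

    import numpy as np
    from scipy.stats import wishart

    def precision_design_A(N, rng):
        """Design A: P_t = L Lambda L' with a dense lower-triangular factor and
        eigenvalue weights 1, 1/3, 1/10 on 20%/40%/40% of the coordinates."""
        L = np.tril(rng.standard_normal((N, N)), k=-1)
        np.fill_diagonal(L, rng.normal(0.0, np.sqrt(0.1), N))  # N(0, 0.1) as variance
        n1, n2 = int(np.ceil(0.2 * N)), int(np.ceil(0.4 * N))
        lam = np.concatenate([np.full(n1, 1.0), np.full(n2, 1 / 3),
                              np.full(N - n1 - n2, 1 / 10)])
        return L @ np.diag(lam) @ L.T

    def precision_design_B(N, T, rng, kappa=0.9, delta=0.1):
        """Design B: sparsified Cholesky factor plus a dense Wishart perturbation,
        P_t = L_sparse L_sparse' + delta * Lambda_t, Lambda_t ~ W(T+3, I_N)."""
        L = np.tril(rng.standard_normal((N, N)), k=-1)
        np.fill_diagonal(L, rng.normal(0.0, np.sqrt(0.1), N))
        mask = np.tril(rng.binomial(1, 1.0 - kappa, (N, N)), k=-1)
        L_sparse = np.where(np.eye(N, dtype=bool), L, L * mask)
        Lam = wishart(df=T + 3, scale=np.eye(N)).rvs(random_state=rng)
        return L_sparse @ L_sparse.T + delta * Lam

    rng = np.random.default_rng(4)
    P_A = precision_design_A(25, rng)
    P_B = precision_design_B(25, 100, rng)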
Remark 3. The construction of $P_t^{(r)}$ in Monte Carlo design B requires further clarification, as it is not a sparse precision matrix. Instead, sparsity is imposed solely on its first term; the second term introduces denseness into the matrix, regulated by the parameter $\delta$. This hybrid structure is motivated by extant literature that questions the empirical validity of sparsity assumptions, particularly in economics and finance. For instance, Giannone et al. (2021) scrutinized multiple economic datasets and concluded that sparsity is generally not an inherent feature. Echoing this, sparsity is advocated only when there is compelling evidence in advance supporting predictive models with a restricted set of explanatory variables (Barigozzi and Brownlees, 2019). Consequently, our simulation design incorporates both sparse and dense elements in $P_t^{(r)}$ to mimic more closely the characteristics of real-world datasets. Nevertheless, the incorporation of sparsity remains methodologically advantageous for computational tractability and interpretability. Additionally, sparse representations can offer a parsimonious yet effective approximation to complex, high-dimensional data structures.

Remark 4. In the first simulation design, the covariance matrix is highly unstable, yet the precision matrix exhibits a stable eigenvalue distribution. This situation often arises in high-dimensional settings where the number of variables significantly exceeds the sample size. While the covariance matrix becomes ill-conditioned or even singular due to this dimensionality issue, the corresponding precision matrix maintains stability in its eigenvalues. This phenomenon is relevant to the covariance matrix estimation literature, where the primary focus is stabilizing the covariance matrix in high-dimensional data; examples of such datasets include financial asset returns. The second simulation design, on the other hand, considers the situation where, despite applying techniques to stabilize the covariance matrix, the precision matrix can still have an unstable eigenvalue distribution. This instability can arise from factors intrinsic to the data, such as nonlinear relationships between variables, high noise levels, or outliers. These factors can distort the precision matrix, making it difficult to achieve stability through standard regularization or shrinkage methods. By examining these two designs, we aim to assess the resilience and accuracy of the various estimation methods under different stability regimes.

An additional simulation approach, denoted Simulation Design C, which is predicated on the structural framework of the factor model and associated test statistics, is detailed in Appendix C. This simulation design is relevant to the empirical applications discussed in Section 3.5.2.

3.4.1 Simulation Results

Our simulations present results for different dimensions, $N \in \{25, 50, 100, 125\}$, and time periods, $T \in \{50, 100, 150, 200, 250\}$. We conducted $R = 100$ replications across these varying dimensions and periods. For our proposed Bayesian nonparanormal conditional estimation method, we perform 4,000 iterations, with the first 2,000 serving as the burn-in period. When applying the Gibbs sampling algorithm, we fix the parameter $c$ at 0.1 to induce substantial sparsity within the model. Concerning the Bayesian GARCH parameters, we set all prior means to zero ($\mu_{\omega_i} = \mu_{\theta_{0i}} = \mu_{\theta_{1i}} = \mu_a = \mu_b = \mu_{\gamma_i} = \mu_\nu = 0$). We assign a value of 100 to all prior variances except $\sigma_{\gamma_i}^2$, i.e., $\sigma_{\omega_i}^2 = \sigma_{\theta_{0i}}^2 = \sigma_{\theta_{1i}}^2 = \sigma_a^2 = \sigma_b^2 = \sigma_\nu^2 = 100$. For $\sigma_{\gamma_i}^2$, we selected $0.64^{-1}$ to obtain a variance for $\gamma_i$ of approximately 0.57 and a probability of $\gamma_i$ lying between 0 and 1 of roughly 0.58, aligning with Fioruci et al. (2014).
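These two summary figures can be reproduced numerically, assuming they refer to the prior after the truncation at zero implied by $I(\gamma_i > 0)$ (our reading):

    import numpy as np
    from scipy.stats import norm

    # gamma_i ~ N(0, 0.64^{-1}) truncated to gamma_i > 0 (half-normal).
    sigma = np.sqrt(1 / 0.64)                    # prior standard deviation, 1.25
    var_trunc = sigma ** 2 * (1 - 2 / np.pi)     # variance of a half-normal
    p_01 = (norm.cdf(1 / sigma) - 0.5) / 0.5     # mass on (0, 1) after truncation
    print(round(var_trunc, 2), round(p_01, 2))   # 0.57 0.58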
We obtain Bayes estimates of the conditional precision matrix, $\hat{P}_t = E[P_t \mid Z]$, and of the conditional inverse correlation matrix, $\hat{S}_t = E[S_t \mid Z]$, as posterior means. In each of the Monte Carlo designs, we evaluate the accuracy of the estimated conditional precision and inverse correlation matrices by computing the spectral and Frobenius norms of their deviations from the true matrices, $P_t^0$ and $S_t^0$, respectively. We measure performance using the ratio of norm loss averages (RNLA) for the conditional precision matrix, defined as
\[
\mathrm{RNLA}_P(J) \equiv \frac{\sum_{r=1}^{R} \sum_{t=1}^{T} \left\| \tilde{P}_{t,j}^{(r)} - P_t^0 \right\|}{\sum_{r=1}^{R} \sum_{t=1}^{T} \left\| \hat{P}_t^{(r)} - P_t^0 \right\|} \tag{3.4.2}
\]
for each of the spectral and Frobenius norms, and similarly, as
\[
\mathrm{RNLA}_S(J) \equiv \frac{\sum_{r=1}^{R} \sum_{t=1}^{T} \left\| \tilde{S}_{t,j}^{(r)} - S_t^0 \right\|}{\sum_{r=1}^{R} \sum_{t=1}^{T} \left\| \hat{S}_t^{(r)} - S_t^0 \right\|}, \tag{3.4.3}
\]
for the conditional inverse correlation matrix, where $J = (j)$ includes the DCC–NL (Engle et al., 2019), DCC–L (Ledoit and Wolf, 2004b), Gaussian Copula, and t-Copula (Patton, 2009) estimators. Here $\tilde{P}_{t,j}^{(r)} \equiv \hat{\Sigma}_{t,j}^{-1,(r)}$ and $\tilde{S}_{t,j}^{(r)} = \hat{R}_{t,j}^{-1,(r)}$ are estimators of $P_t^0$ and $S_t^0$ obtained by inverting the estimated conditional covariance and correlation matrices using the estimation methods in $J$. An RNLA($J$) above 1 indicates that, on average, the proposed method outperforms method $J$, whereas an RNLA below 1 indicates inferior performance of our proposed method relative to method $J$.
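A minimal sketch of the RNLA computation in (3.4.2)–(3.4.3), with array shapes and toy inputs of our choosing:

    import numpy as np

    def rnla(M_tilde, M_hat, M_true, ord="fro"):
        """Ratio of norm loss averages: summed losses of a competitor (M_tilde)
        over summed losses of the proposed estimator (M_hat). Inputs have shape
        (R, T, N, N); ord='fro' for Frobenius, ord=2 for the spectral norm."""
        R, T = M_true.shape[0], M_true.shape[1]
        num = sum(np.linalg.norm(M_tilde[r, t] - M_true[r, t], ord=ord)
                  for r in range(R) for t in range(T))
        den = sum(np.linalg.norm(M_hat[r, t] - M_true[r, t], ord=ord)
                  for r in range(R) for t in range(T))
        return num / den   # > 1 favours the proposed estimator

    # Toy usage with R = 2 replications, T = 3 periods, N = 4 variables.
    rng = np.random.default_rng(5)
    P0 = np.broadcast_to(np.eye(4), (2, 3, 4, 4)).copy()
    ratio = rnla(P0 + 0.2 * rng.standard_normal(P0.shape),
                 P0 + 0.1 * rng.standard_normal(P0.shape), P0)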
3.4.1.1 Monte Carlo design A

Table 3.1 summarizes the results for Monte Carlo design A, providing spectral and Frobenius norm comparisons for the Bayesian nonparanormal conditional estimator and the DCC–NL estimator. First, we note that the Bayesian method dominates the DCC–NL estimator for both the conditional precision and inverse correlation matrices. For both methods, the norm losses decrease as the sample size increases, and this observation holds even when the error term is non-Gaussian. Based on the values of these norm losses, Tables 3.2 and 3.3 report RNLAs for the different conditional precision and inverse correlation matrix estimators.

Table 3.1: Spectral and Frobenius norm losses for the different conditional precision and inverse correlation matrix estimators – Monte Carlo design A

(Column blocks, left to right: Bayesian nonparanormal conditional estimator, spectral norm; Bayesian nonparanormal conditional estimator, Frobenius norm; DCC–NL, spectral norm; DCC–NL, Frobenius norm. Within each block, N = 25, 50, 100, 125.)

Error distribution: εit ∼ Gaussian; conditional precision matrix
T =  50:  1.276  1.320  1.982  2.332 |  2.829  3.864  5.970   7.062 |  2.529  2.780  3.178  3.188 |  4.949   6.931   9.716  10.375
T = 100:  1.180  1.142  1.549  1.862 |  2.667  3.695  5.769   6.904 |  1.972  2.268  2.541  2.499 |  4.273   6.158   8.763   9.189
T = 150:  1.156  1.099  1.551  1.692 |  2.613  3.604  5.706   6.625 |  1.848  2.138  2.379  2.265 |  4.044   5.956   8.274   8.791
T = 200:  1.140  1.055  1.425  1.638 |  2.580  3.496  5.411   6.386 |  1.803  2.048  2.282  2.227 |  4.031   5.830   8.245   8.785
T = 250:  1.137  1.043  1.332  1.541 |  2.549  3.434  5.235   6.187 |  1.768  1.975  2.160  2.296 |  3.965   5.705   7.991   9.168

Error distribution: εit ∼ Gaussian; conditional inverse correlation matrix
T =  50:  1.938  2.054  2.126  2.159 |  4.059  5.711  7.913   8.883 |  2.025  2.199  2.344  2.380 |  4.242   6.071   8.650   9.670
T = 100:  1.941  2.038  2.098  2.117 |  4.062  5.649  7.770   8.628 |  2.023  2.197  2.343  2.381 |  4.231   6.059   8.630   9.676
T = 150:  1.940  2.040  2.100  2.116 |  4.054  5.622  7.772   8.626 |  2.022  2.198  2.347  2.382 |  4.229   6.061   8.648   9.687
T = 200:  1.936  2.037  2.095  2.113 |  4.048  5.620  7.779   8.618 |  2.022  2.198  2.341  2.383 |  4.229   6.062   8.624   9.696
T = 250:  1.934  2.038  2.099  2.116 |  4.039  5.616  7.758   8.602 |  2.021  2.198  2.346  2.380 |  4.225   6.059   8.645   9.662

Error distribution: εit ∼ t-distributed with 3 degrees of freedom; conditional precision matrix
T =  50:  1.521  2.476  4.119  4.601 |  3.210  5.198  8.939  10.659 |  4.465  4.657  5.853  5.716 |  8.704  11.070  17.515  18.194
T = 100:  1.337  1.872  2.971  3.480 |  2.966  4.630  7.917   9.509 |  3.701  3.462  4.650  4.696 |  7.435   9.012  14.825  16.026
T = 150:  1.254  1.761  2.719  2.934 |  2.862  4.453  7.548   8.970 |  3.250  3.122  4.244  3.957 |  6.595   8.338  13.554  14.192
T = 200:  1.200  1.490  2.333  2.878 |  2.742  4.133  7.086   8.446 |  2.893  2.886  3.939  3.895 |  6.275   7.930  13.050  13.699
T = 250:  1.176  1.433  2.298  2.489 |  2.688  4.022  6.912   7.934 |  2.908  2.823  3.677  3.740 |  6.168   7.699  12.283  13.781

Error distribution: εit ∼ t-distributed with 3 degrees of freedom; conditional inverse correlation matrix
T =  50:  1.970  2.097  2.148  2.162 |  4.191  6.008  8.145   8.963 |  2.034  2.205  2.349  2.385 |  4.263   6.090   8.674   9.722
T = 100:  1.947  2.064  2.119  2.123 |  4.094  5.807  7.889   8.693 |  2.031  2.203  2.347  2.397 |  4.254   6.076   8.648   9.800
T = 150:  1.946  2.073  2.112  2.140 |  4.078  5.827  7.915   8.904 |  2.028  2.204  2.351  2.389 |  4.246   6.080   8.699   9.740
T = 200:  1.940  2.054  2.115  2.128 |  4.062  5.721  7.970   8.791 |  2.029  2.202  2.346  2.392 |  4.248   6.071   8.662   9.745
T = 250:  1.938  2.056  2.124  2.134 |  4.049  5.720  8.015   8.779 |  2.028  2.201  2.349  2.384 |  4.244   6.073   8.670   9.675

Notes: The average norm losses, computed over 100 replications ($R = 100$), are given for both the spectral and Frobenius norms. For the conditional precision matrix ($P_t^0$) and the conditional inverse correlation matrix ($S_t^0$), they are $\frac{1}{RT}\sum_{r=1}^{R}\sum_{t=1}^{T} \|\mathring{P}_t^{(r)} - P_t^0\|$ and $\frac{1}{RT}\sum_{r=1}^{R}\sum_{t=1}^{T} \|\mathring{S}_t^{(r)} - S_t^0\|$, where $\mathring{P}_t^{(r)} \in \{\hat{P}_t^{(r)}, \hat{\Sigma}_t^{-1,(r)}\}$ and $\mathring{S}_t^{(r)} \in \{\hat{S}_t^{(r)}, \hat{R}_t^{-1,(r)}\}$. $\hat{P}_t^{(r)}$ and $\hat{S}_t^{(r)}$ are the Bayesian nonparanormal conditional estimators; $\hat{\Sigma}_t^{(r)}$ and $\hat{R}_t^{(r)}$ are the DCC–NL estimators of the conditional covariance and correlation matrices of Engle et al. (2019).

Table 3.2 provides an evaluation of the Bayesian nonparanormal conditional precision and inverse correlation matrix estimator's performance through the RNLAs defined in Equations (3.4.2) and (3.4.3); an RNLA value greater than 1 implies outperformance by our proposed estimator. For both the spectral and Frobenius norms, our proposed estimator outperforms both the DCC–L and DCC–NL methods irrespective of the distribution of the error terms or the sample size combination. First, it can be observed that the RNLA value decreases as the number of time periods, $T$, increases for the spectral and Frobenius norms of the conditional precision matrices; this is more pronounced for the RNLA relative to the DCC–NL estimator.
The DCC–L estimator shows results similar to those of the DCC–NL estimator when $N = 25$, but if $T$ is not large relative to the number of variables, $N$, DCC–L shows an increased RNLA. As $N$ increases, most of the RNLA values decrease, though not uniformly across $T$. This implies that with an increase in $N$, the results from both DCC–L and DCC–NL are likely to align more closely with those obtained from our proposed estimation method. When estimating conditional inverse correlation matrices, an increase in both $N$ and $T$ leads to an increase in the RNLA, and this is even more pronounced under the non-Gaussian distribution. When DCC–NL and DCC–L are compared, DCC–NL shows better results than DCC–L regardless of the number of variables and the sample size; this difference is more visible in the non-Gaussian case.

Table 3.2: Ratio of spectral and Frobenius norm loss averages for the different conditional precision and inverse correlation matrix estimators (DCC–NL and DCC–L models) – Monte Carlo design A

(Column blocks, left to right: DCC–NL, spectral norm; DCC–NL, Frobenius norm; DCC–L, spectral norm; DCC–L, Frobenius norm. Within each block, N = 25, 50, 100, 125.)

Error distribution: εit ∼ Gaussian; relative norms of conditional precision matrix
T =  50:  1.983  2.106  1.603  1.367 |  1.750  1.794  1.627  1.469 |  1.946  2.165  1.632  1.373 |  1.738  1.860  1.671  1.483
T = 100:  1.670  1.986  1.640  1.343 |  1.602  1.666  1.519  1.331 |  1.678  2.080  1.830  1.340 |  1.601  1.739  1.705  1.330
T = 150:  1.598  1.945  1.535  1.338 |  1.547  1.653  1.450  1.327 |  1.608  2.058  1.570  1.333 |  1.550  1.738  1.492  1.322
T = 200:  1.581  1.942  1.602  1.360 |  1.563  1.667  1.524  1.376 |  1.605  2.079  1.882  1.350 |  1.574  1.760  1.785  1.370
T = 250:  1.555  1.894  1.622  1.489 |  1.555  1.661  1.527  1.482 |  1.580  2.049  1.695  1.842 |  1.567  1.764  1.596  1.820

Error distribution: εit ∼ Gaussian; relative norms of conditional inverse correlation matrix
T =  50:  1.045  1.070  1.103  1.102 |  1.045  1.063  1.093  1.089 |  1.047  1.082  1.111  1.105 |  1.053  1.093  1.115  1.094
T = 100:  1.042  1.078  1.117  1.125 |  1.041  1.073  1.111  1.121 |  1.045  1.090  1.140  1.125 |  1.052  1.105  1.171  1.124
T = 150:  1.042  1.078  1.117  1.126 |  1.043  1.078  1.113  1.123 |  1.046  1.091  1.126  1.126 |  1.054  1.112  1.140  1.127
T = 200:  1.044  1.079  1.117  1.128 |  1.045  1.079  1.109  1.125 |  1.048  1.093  1.146  1.129 |  1.057  1.114  1.186  1.130
T = 250:  1.045  1.078  1.118  1.125 |  1.046  1.079  1.114  1.123 |  1.049  1.093  1.129  1.159 |  1.058  1.116  1.147  1.219

Error distribution: εit ∼ t-distributed with 3 degrees of freedom; relative norms of conditional precision matrix
T =  50:  2.936  1.881  1.421  1.242 |  2.712  2.130  1.959  1.707 |  3.514  4.435  3.138  2.422 |  3.087  3.575  3.099  2.588
T = 100:  2.769  1.849  1.565  1.349 |  2.506  1.946  1.872  1.685 |  3.104  3.830  3.567  2.300 |  2.766  2.982  3.116  2.151
T = 150:  2.591  1.773  1.561  1.349 |  2.304  1.872  1.796  1.582 |  2.761  3.527  2.683  2.195 |  2.486  2.800  2.408  2.071
T = 200:  2.412  1.936  1.688  1.353 |  2.288  1.919  1.842  1.622 |  2.534  3.418  3.216  2.197 |  2.414  2.730  2.805  2.060
T = 250:  2.473  1.969  1.600  1.503 |  2.295  1.914  1.777  1.737 |  2.525  3.375  2.833  3.064 |  2.390  2.677  2.417  2.810

Error distribution: εit ∼ t-distributed with 3 degrees of freedom; relative norms of conditional inverse correlation matrix
T =  50:  1.032  1.052  1.094  1.103 |  1.017  1.014  1.065  1.085 |  1.039  1.073  1.121  1.108 |  1.039  1.065  1.137  1.103
T = 100:  1.043  1.068  1.107  1.129 |  1.039  1.046  1.096  1.127 |  1.048  1.087  1.158  1.128 |  1.055  1.096  1.214  1.130
T = 150:  1.043  1.063  1.113  1.117 |  1.041  1.043  1.099  1.094 |  1.045  1.080  1.131  1.128 |  1.053  1.087  1.154  1.131
T = 200:  1.046  1.072  1.109  1.124 |  1.046  1.061  1.087  1.109 |  1.050  1.088  1.150  1.131 |  1.058  1.103  1.199  1.134
T = 250:  1.047  1.070  1.106  1.117 |  1.048  1.062  1.082  1.102 |  1.049  1.084  1.130  1.166 |  1.059  1.099  1.153  1.239

Notes: The ratios of norm loss averages (RNLA) for both the spectral and Frobenius norms are computed from 100 replications ($R = 100$): $\mathrm{RNLA}_P(J) = \sum_{r=1}^{R}\sum_{t=1}^{T} \|\tilde{P}_{t,j}^{(r)} - P_t^0\| / \sum_{r=1}^{R}\sum_{t=1}^{T} \|\hat{P}_t^{(r)} - P_t^0\|$ and $\mathrm{RNLA}_S(J) = \sum_{r=1}^{R}\sum_{t=1}^{T} \|\tilde{S}_{t,j}^{(r)} - S_t^0\| / \sum_{r=1}^{R}\sum_{t=1}^{T} \|\hat{S}_t^{(r)} - S_t^0\|$, with $J$ = DCC–L and DCC–NL of Ledoit and Wolf (2004b) and Engle et al. (2019), where $\hat{P}_t$ and $\hat{S}_t$ denote the posterior means of the conditional precision and inverse correlation matrices derived from our proposed Bayesian approach, while $P_t^0$ and $S_t^0$ refer to the known true conditional precision and inverse correlation matrices.
Table 3.3 presents the RNLA values for both the conditional precision and inverse correlation matrix estimators for the Gaussian Copula and t-Copula. Our proposed estimator produces better estimation results than the Gaussian Copula and t-Copula as the number of variables increases. However, if $T$ is significantly large relative to $N$, the RNLA decreases regardless of the error distribution.

Table 3.3: Ratio of spectral and Frobenius norm loss averages for the different conditional precision matrix estimators, $P_t$, and inverse correlation matrix estimators, $S_t$ (Gaussian and t-Copula models) – Monte Carlo design A

(Column blocks, left to right: Gaussian Copula, spectral norm; Gaussian Copula, Frobenius norm; t-Copula, spectral norm; t-Copula, Frobenius norm. Within each block, N = 25, 50, 100, 125.)

Error distribution: εit ∼ Gaussian; relative norms of conditional precision matrix
T =  50:  9.040    n/a     n/a     n/a |  5.956    n/a     n/a     n/a | 15.769    n/a     n/a     n/a | 10.205    n/a    n/a     n/a
T = 100:  3.536  9.790     n/a     n/a |  2.818  5.648     n/a     n/a |  6.362  9.833     n/a     n/a |  4.831  5.660    n/a     n/a
T = 150:  2.732  5.310  18.296  70.618 |  2.305  3.542   9.721  28.530 |  4.530  5.302  18.129  69.953 |  3.723  3.548  9.629  28.250
T = 200:  2.461  4.129   7.927  13.163 |  2.149  2.930   5.208   7.832 |  3.860  4.163   7.850  13.020 |  3.385  2.969  5.153   7.741
T = 250:  2.232  3.497   5.668   7.531 |  2.029  2.621   3.972   5.070 |  3.739  3.542   5.617   7.449 |  3.269  2.668  3.927   5.008

Error distribution: εit ∼ Gaussian; relative norms of conditional inverse correlation matrix
T =  50:  1.657    n/a     n/a     n/a |  1.567    n/a     n/a     n/a |  1.700    n/a     n/a     n/a |  1.572    n/a    n/a     n/a
T = 100:  1.124  1.847     n/a     n/a |  1.239  1.616     n/a     n/a |  1.122  1.853     n/a     n/a |  1.240  1.618    n/a     n/a
T = 150:  1.091  1.245   3.396   6.959 |  1.162  1.383   2.111   3.061 |  1.089  1.241   3.393   6.959 |  1.159  1.382  2.110   3.061
T = 200:  1.080  1.151   2.006   2.953 |  1.130  1.289   1.682   1.990 |  1.081  1.152   2.002   2.950 |  1.133  1.292  1.680   1.988
T = 250:  1.071  1.134   1.514   2.020 |  1.110  1.238   1.522   1.704 |  1.073  1.136   1.511   2.016 |  1.113  1.241  1.520   1.701

Error distribution: εit ∼ t-distributed with 3 degrees of freedom; relative norms of conditional precision matrix
T =  50:  7.471    n/a     n/a     n/a |  5.166    n/a     n/a     n/a | 12.095    n/a     n/a     n/a |  8.174    n/a    n/a     n/a
T = 100:  3.128  8.732     n/a     n/a |  2.538  6.210     n/a     n/a |  6.177  9.573     n/a     n/a |  4.609  6.932    n/a     n/a
T = 150:  2.542  4.794  15.852   51.00 |  2.133  3.887  11.195  27.652 |  4.871  5.763  13.775  46.750 |  3.717  4.688  9.710  25.294
T = 200:  2.387  4.079   8.375   12.27 |  2.071  3.351   6.622   9.498 |  4.355  5.252   7.411  10.444 |  3.518  4.115  5.865   8.062
T = 250:  2.213  3.523   6.105    7.96 |  1.988  2.906   5.329   6.637 |  4.117  4.633   5.138   6.610 |  3.358  3.735  4.479   5.485

Error distribution: εit ∼ t-distributed with 3 degrees of freedom; relative norms of conditional inverse correlation matrix
T =  50:  1.632    n/a     n/a     n/a |  1.519    n/a     n/a     n/a |  1.647    n/a     n/a     n/a |  1.521    n/a    n/a     n/a
T = 100:  1.122  1.832     n/a     n/a |  1.233  1.572     n/a     n/a |  1.130  1.825     n/a     n/a |  1.246  1.580    n/a     n/a
T = 150:  1.091  1.237   3.312   6.449 |  1.161  1.338   2.078   2.905 |  1.103  1.221   9.710   6.456 |  1.176  1.339  2.071   2.904
T = 200:  1.083  1.141   1.994   2.977 |  1.134  1.270   1.655   1.965 |  1.095  1.148   5.865   2.963 |  1.149  1.276  1.648   1.957
T = 250:  1.076  1.118   1.505   2.035 |  1.117  1.210   1.486   1.685 |  1.089  1.132   4.479   2.017 |  1.131  1.226  1.474   1.673

Notes: The ratios of norm loss averages (RNLA) for both the spectral and Frobenius norms are computed from 100 replications ($R = 100$) as in Table 3.2, with $J$ = Gaussian Copula and t-Copula of Patton (2009), where $\hat{P}_t$ and $\hat{S}_t$ denote the posterior means of the conditional precision and inverse correlation matrices derived from our proposed Bayesian approach, while $P_t^0$ and $S_t^0$ refer to the known true conditional precision and inverse correlation matrices. The Gaussian and t-Copula models lack regularization or shrinkage methods within their estimation procedures and fail to yield estimates in the singular case ($T < N$); such instances are denoted 'not applicable' (n/a).
1.505 2.035 1.117 1.210 1.486 1.685 1.089 1.132 4.479 2.017 1.131 1.226 1.474 1.673 Notes: The Ratio of Spectral and Frobenius Norm Loss Averages (RNLA) for both Spectral and Frobenius norms are computed from 100 replications (R = 100): RNLAP(J) = { PR r=1 PT t=1 ||P˜ (r) t,j − P0 t ||/ PR r=1 PT t=1 ||Pˆt (r) − P0 t ||} and RNLAS(J) = { PR r=1 PT t=1 ||S˜ (r) t,j − S 0 t ||/ PR r=1 PT t=1 ||Sˆt (r) − S 0 t ||}, and J = Gaussian Copula and t-Copula of Patton (2009), where Pˆ t and Sˆt denote the posterior mean of the conditional precision and inverse correlation matrices derived from the Bayesian approach we suggest, while P0 t and S 0 t refers to the known true conditional precision and inverse correlation matrices. Gaussian and t-Copula models lack regularization or shrinkage methods within their estimation procedures. These models fail to yield estimates in the singular case (T < N). Such instances are denoted as ‘not applicable’ (n/a). they do not use shrinkage or regularization methodologies in estimating the unconditional correlation matrix. Therefore, when N > T, it is denoted as n/a. When compared with DCC-L and DCC-NL based on the relative size of RNLA, for example, with the exception of Gaussian Copula in the non-Gaussian distribution, DCC-L and DCC-NL provide better estimates in most cases. When comparing Gaussian Copula and t-Coupla, it can be seen that Gaussian Copula shows better performance regardless of the distribution of error terms, or N and T combinations in estimating the conditional precision matrix. On the other hand, in the case of conditional inverse correlation, a slight difference in performance can be observed, but the difference disappears as T increases. 112 Results based on the Monte Carlo design A illustrate the effectiveness of estimating the conditional precision and inverse correlation matrices under relatively stable eigenvalue distributions instead of inverting the estimated conditional covariance and conditional correlation matrices. 
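To make the RNLA computation concrete, the following minimal Python sketch shows how the ratio of summed norm losses can be computed; the array shapes and variable names are illustrative assumptions, not code from this chapter.

```python
import numpy as np

def norm_loss_sum(estimates, truths, ord):
    # Sum of matrix-norm losses over replications r and periods t:
    # sum_r sum_t ||A_t^(r) - A_t^0||, with ord=2 for the spectral norm
    # and ord='fro' for the Frobenius norm.
    return sum(
        np.linalg.norm(est - tru, ord=ord)
        for est_rep, tru_rep in zip(estimates, truths)
        for est, tru in zip(est_rep, tru_rep)
    )

def rnla(competing, proposed, truths, ord=2):
    # RNLA(J) > 1 means competing estimator J incurs larger losses than the
    # proposed Bayesian posterior-mean estimator; dividing either sum by
    # R*T instead yields the average losses reported for design B.
    return norm_loss_sum(competing, truths, ord) / norm_loss_sum(proposed, truths, ord)

# Illustrative dimensions: R replications, T periods, N x N matrices.
R, T, N = 100, 50, 25
rng = np.random.default_rng(0)
truths = rng.standard_normal((R, T, N, N))
proposed = truths + 0.1 * rng.standard_normal((R, T, N, N))
competing = truths + 0.2 * rng.standard_normal((R, T, N, N))
print(rnla(competing, proposed, truths, ord=2))      # spectral norm
print(rnla(competing, proposed, truths, ord='fro'))  # Frobenius norm
```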
3.4.1.2 Monte Carlo design B

Table 3.4 presents the norm losses for the Bayesian nonparanormal conditional estimators and the DCC-NL estimators of the conditional precision and inverse correlation matrices under Monte Carlo design B. Our findings show that both the Bayesian nonparanormal conditional estimator and the DCC-NL estimator fail to estimate the conditional precision matrix accurately, owing to the unstable eigenvalue distribution in this design.

Table 3.4: Spectral and Frobenius norm losses for the different conditional precision and inverse correlation matrix estimators – Monte Carlo design B

            Bayesian nonparanormal conditional estimator      DCC-NL
Norms:      Spectral                  Frobenius               Spectral                  Frobenius
T\N          25    50   100   125 |   25    50   100   125 |   25    50   100   125 |   25    50   100   125

Error distribution: eps_it ~ Gaussian
Conditional precision matrix
 50       23.74 39.55 68.31 81.99 | 50.81 102.41 228.00 299.43 | 22.99 38.92 67.88 81.58 | 47.68 98.96 224.63 296.08
100       29.78 45.64 74.87 89.31 | 73.80 134.41 273.08 352.90 | 29.27 45.17 74.49 89.05 | 71.45 131.52 269.95 350.22
150       35.99 52.00 81.83 95.71 | 97.84 167.68 320.91 403.65 | 35.59 51.61 81.50 95.43 | 95.96 165.23 318.17 401.05
200       42.04 58.11 88.21 102.17 | 122.10 201.39 367.76 456.01 | 41.71 57.78 87.90 101.90 | 120.51 199.22 365.11 453.60
250       48.06 64.57 94.79 109.30 | 146.56 235.98 415.60 510.59 | 47.78 64.28 94.53 109.04 | 145.17 234.04 413.33 508.05
Conditional inverse correlation matrix
 50        1.30  1.91  2.64  2.89 |  2.93  5.32  9.05 10.67 |  1.31  1.91  2.64  2.89 |  2.93  5.26  9.06 10.66
100        1.05  1.46  2.08  2.31 |  2.34  4.24  7.59  9.08 |  1.03  1.48  2.09  2.32 |  2.32  4.27  7.63  9.13
150        0.94  1.24  1.76  1.97 |  2.02  3.69  6.69  8.07 |  0.91  1.27  1.79  1.98 |  2.00  3.70  6.77  8.15
200        0.85  1.10  1.55  1.74 |  1.80  3.31  6.06  7.35 |  0.85  1.15  1.58  1.76 |  1.80  3.32  6.11  7.45
250        0.79  1.02  1.40  1.58 |  1.65  3.03  5.58  6.82 |  0.81  1.07  1.44  1.60 |  1.65  3.04  5.67  6.85

Error distribution: eps_it ~ t-distributed with 3 degrees of freedom
Conditional precision matrix
 50       23.68 39.34 68.00 82.43 | 50.58 101.85 227.03 300.92 | 22.40 38.44 67.18 81.84 | 45.38 96.84 221.06 295.76
100       29.83 45.58 74.78 89.53 | 73.72 134.25 272.90 353.38 | 28.80 44.79 73.99 88.87 | 68.99 129.53 266.60 347.48
150       35.86 51.99 81.64 95.23 | 97.48 167.51 320.24 401.70 | 35.045 51.35 80.94 94.72 | 93.62 163.55 314.42 396.48
200       41.99 58.14 88.27 102.57 | 121.69 201.01 367.50 456.61 | 41.29 57.60 87.67 101.95 | 118.27 197.64 362.10 451.29
250       47.96 64.47 94.72 109.29 | 146.01 235.19 415.00 510.16 | 47.35 64.09 94.19 108.77 | 143.00 232.47 410.34 505.06
Conditional inverse correlation matrix
 50        1.36  1.95  2.63  2.90 |  3.14  5.61  9.08 10.74 |  1.33  1.91  2.64  2.90 |  2.95  5.30  9.08 10.70
100        1.05  1.49  2.09  2.32 |  2.37  4.47  7.73  9.19 |  1.06  1.49  2.09  2.33 |  2.36  4.30  7.65  9.24
150        0.93  1.29  1.79  1.99 |  2.05  4.01  6.98  8.31 |  0.95  1.28  1.79  1.99 |  2.04  3.73  6.83  8.21
200        0.87  1.15  1.61  1.77 |  1.83  3.60  6.57  7.70 |  0.90  1.16  1.59  1.77 |  1.85  3.34  6.17  7.52
250        0.83  1.08  1.46  1.60 |  1.68  3.28  6.10  7.07 |  0.85  1.09  1.45  1.60 |  1.70  3.07  5.71  6.86

Notes: The average norm losses, computed over 100 replications (R = 100), are given for both spectral and Frobenius norms. For the conditional precision matrix ($P_t^0$) and the conditional inverse correlation matrix ($S_t^0$), they are $\frac{1}{R}\frac{1}{T}\sum_{r=1}^{R}\sum_{t=1}^{T}\lVert \mathring{P}_t^{(r)} - P_t^0 \rVert$ and $\frac{1}{R}\frac{1}{T}\sum_{r=1}^{R}\sum_{t=1}^{T}\lVert \mathring{S}_t^{(r)} - S_t^0 \rVert$, where $\mathring{P}_t^{(r)} \in \{\hat{P}_t^{(r)}, \hat{\Sigma}_t^{-1,(r)}\}$ and $\mathring{S}_t^{(r)} \in \{\hat{S}_t^{(r)}, \hat{R}_t^{-1,(r)}\}$. $\hat{P}_t^{(r)}$ and $\hat{S}_t^{(r)}$ are the Bayesian nonparanormal conditional estimators. $\hat{\Sigma}_t^{(r)}$ and $\hat{R}_t^{(r)}$ are the DCC-NL estimates of the conditional covariance and correlation matrices of Engle et al. (2019), whose inverses serve as the competing estimators of the conditional precision and inverse correlation matrices.
Conversely, even when the eigenvalues are widely distributed, we can still estimate the conditional inverse correlation matrix by stabilizing the eigenvalue distribution through rescaling of the estimated precision matrix. Thus, our proposed estimator yields norm losses comparable to those of DCC-NL and outperforms scaled estimates such as the conditional partial correlations, even under these conditions. Based on these norm losses, Tables 3.5 and 3.6 report the RNLAs for the different conditional precision and inverse correlation matrix estimators.

Table 3.5: Ratio of spectral and Frobenius norm loss averages for the different conditional precision matrix and inverse correlation matrix estimators (DCC–NL and DCC–L models) – Monte Carlo design B

J:          DCC-NL                                            DCC-L
Norms:      Spectral                  Frobenius               Spectral                  Frobenius
T\N          25    50   100   125 |   25    50   100   125 |   25    50   100   125 |   25    50   100   125

Error distribution: eps_it ~ Gaussian
Relative norms of conditional precision matrix
 50       0.969 0.984 0.994 0.995 | 0.938 0.966 0.985 0.989 | 0.969 0.984 0.993 0.996 | 0.939 0.965 0.985 0.989
100       0.983 0.990 0.995 0.997 | 0.968 0.979 0.989 0.992 | 0.983 0.989 0.995 0.996 | 0.968 0.977 0.987 0.992
150       0.989 0.993 0.996 0.997 | 0.981 0.985 0.991 0.994 | 0.989 0.992 0.996 0.997 | 0.981 0.984 0.991 0.994
200       0.992 0.994 0.997 0.997 | 0.987 0.989 0.993 0.995 | 0.992 0.993 0.995 0.998 | 0.986 0.987 0.990 0.995
250       0.994 0.995 0.997 0.998 | 0.991 0.992 0.995 0.995 | 0.994 0.995 0.997 0.996 | 0.990 0.990 0.994 0.992
Relative norms of conditional inverse correlation matrix
 50       1.006 1.001 1.002 1.000 | 1.000 0.990 1.002 1.000 | 1.006 1.015 1.006 1.001 | 1.023 1.027 1.015 1.002
100       0.982 1.015 1.006 1.003 | 0.993 1.007 1.006 1.005 | 0.973 1.041 1.031 1.003 | 1.047 1.084 1.073 1.008
150       0.967 1.024 1.012 1.006 | 0.992 1.001 1.011 1.010 | 0.952 1.061 1.027 1.007 | 1.075 1.127 1.059 1.016
200       0.995 1.039 1.018 1.011 | 0.998 1.005 1.008 1.014 | 0.968 1.073 1.079 1.012 | 1.109 1.163 1.172 1.025
250       1.022 1.055 1.033 1.014 | 1.002 1.005 1.016 1.003 | 0.982 1.080 1.060 1.086 | 1.122 1.187 1.112 1.200

Error distribution: eps_it ~ t-distributed with 3 degrees of freedom
Relative norms of conditional precision matrix
 50       0.946 0.977 0.988 0.993 | 0.897 0.951 0.974 0.983 | 0.943 0.966 0.982 0.997 | 0.893 0.934 0.968 0.987
100       0.966 0.983 0.989 0.993 | 0.936 0.965 0.977 0.983 | 0.967 0.976 0.985 0.995 | 0.935 0.952 0.971 0.986
150       0.977 0.988 0.992 0.995 | 0.960 0.976 0.982 0.987 | 0.974 0.983 0.988 0.990 | 0.957 0.966 0.979 0.983
200       0.983 0.991 0.993 0.994 | 0.972 0.983 0.985 0.988 | 0.982 0.987 0.991 0.998 | 0.968 0.974 0.980 0.990
250       0.987 0.994 0.994 0.995 | 0.979 0.988 0.989 0.990 | 0.985 0.989 0.993 0.993 | 0.976 0.979 0.986 0.985
Relative norms of conditional inverse correlation matrix
 50       0.976 0.984 1.002 1.000 | 0.940 0.945 0.999 0.997 | 1.034 1.029 1.009 1.004 | 1.068 1.064 1.029 1.008
100       1.016 0.998 1.001 1.004 | 0.997 0.960 0.990 1.005 | 1.009 1.066 1.048 1.003 | 1.091 1.137 1.125 1.013
150       1.012 0.993 1.002 1.001 | 0.996 0.930 0.978 0.987 | 0.984 1.079 1.035 1.008 | 1.101 1.159 1.079 1.021
200       1.037 1.008 0.989 1.000 | 1.013 0.927 0.938 0.976 | 1.010 1.089 1.087 1.017 | 1.138 1.184 1.195 1.032
250       1.028 1.009 0.997 1.003 | 1.009 0.936 0.937 0.971 | 1.022 1.089 1.067 1.096 | 1.143 1.191 1.123 1.233

Notes: The Ratio of Spectral and Frobenius Norm Loss Averages (RNLA) is computed from 100 replications (R = 100) for both norms: $\mathrm{RNLA}_P(J) = \sum_{r=1}^{R}\sum_{t=1}^{T}\lVert \tilde{P}_{t,J}^{(r)} - P_t^0 \rVert \big/ \sum_{r=1}^{R}\sum_{t=1}^{T}\lVert \hat{P}_t^{(r)} - P_t^0 \rVert$ and $\mathrm{RNLA}_S(J) = \sum_{r=1}^{R}\sum_{t=1}^{T}\lVert \tilde{S}_{t,J}^{(r)} - S_t^0 \rVert \big/ \sum_{r=1}^{R}\sum_{t=1}^{T}\lVert \hat{S}_t^{(r)} - S_t^0 \rVert$, where J = DCC–L and DCC–NL of Ledoit and Wolf (2004b) and Engle et al. (2019), $\hat{P}_t$ and $\hat{S}_t$ denote the posterior means of the conditional precision and inverse correlation matrices derived from the Bayesian approach we suggest, and $P_t^0$ and $S_t^0$ refer to the known true conditional precision and inverse correlation matrices.

Tables 3.5 and 3.6 show the RNLA values for each estimator. First, as Table 3.4 indicates, under Monte Carlo design B no estimator we consider yields an adequate estimate of the conditional precision matrix. We therefore focus on the results for the conditional inverse correlation matrix. As Table 3.5 shows, irrespective of the error distribution, our proposed estimation method gives results comparable to DCC–NL and slightly better performance than DCC–L in estimating the conditional inverse correlation matrix in most cases. However, no clear ordering emerges when we compare DCC–NL and DCC–L.

Table 3.6: Ratio of spectral and Frobenius norm loss averages for the different conditional precision matrix and inverse correlation matrix estimators (Gaussian and t-Copula models) – Monte Carlo design B

J:          Gaussian Copula estimator                         t-Copula estimator
Norms:      Spectral                  Frobenius               Spectral                  Frobenius
T\N          25    50   100   125 |   25    50   100   125 |   25    50   100   125 |   25    50   100   125

Error distribution: eps_it ~ Gaussian
Relative norms of conditional precision matrix
 50       0.936   n/a   n/a   n/a | 0.897   n/a   n/a   n/a | 0.969   n/a   n/a   n/a | 0.920   n/a   n/a   n/a
100       0.971 0.970   n/a   n/a | 0.944 0.939   n/a   n/a | 0.951 0.970   n/a   n/a | 0.901 0.939   n/a   n/a
150       0.981 0.982 0.975 1.141 | 0.967 0.964 0.951 0.995 | 0.961 0.982 0.976 1.135 | 0.934 0.964 0.952 0.994
200       0.987 0.987 0.985 0.983 | 0.977 0.976 0.969 0.964 | 0.976 0.988 0.986 0.983 | 0.954 0.976 0.970 0.964
250       0.990 0.990 0.990 0.988 | 0.984 0.982 0.979 0.976 | 0.976 0.991 0.990 0.989 | 0.961 0.982 0.979 0.976
Relative norms of conditional inverse correlation matrix
 50       2.426   n/a   n/a   n/a | 1.908   n/a   n/a   n/a | 2.465   n/a   n/a   n/a | 1.910   n/a   n/a   n/a
100       1.540 2.552   n/a   n/a | 1.527 1.903   n/a   n/a | 1.545 2.550   n/a   n/a | 1.526 1.902   n/a   n/a
150       1.296 1.913 4.030 7.406 | 1.424 1.652 2.316 3.195 | 1.289 1.912 4.027 7.404 | 1.412 1.651 2.314 3.195
200       1.197 1.656 2.669 3.550 | 1.381 1.565 1.909 2.175 | 1.224 1.654 2.663 3.546 | 1.387 1.563 1.906 2.173
250       1.118 1.496 2.218 2.679 | 1.343 1.505 1.765 1.900 | 1.133 1.493 2.213 2.674 | 1.366 1.501 1.760 1.896

Error distribution: eps_it ~ t-distributed with 3 degrees of freedom
Relative norms of conditional precision matrix
 50       0.939   n/a   n/a   n/a | 0.902   n/a   n/a   n/a | 0.968   n/a   n/a   n/a | 0.930   n/a   n/a   n/a
100       0.969 0.954   n/a   n/a | 0.945 0.915   n/a   n/a | 0.950 0.958   n/a   n/a | 0.905 0.920   n/a   n/a
150       0.985 0.968 0.966 1.407 | 0.970 0.936 0.939 1.038 | 0.967 0.970 0.970 1.296 | 0.939 0.940 0.944 1.021
200       0.987 0.975 0.976 0.973 | 0.980 0.951 0.949 0.946 | 0.974 0.977 0.979 0.978 | 0.954 0.956 0.956 0.954
250       0.991 0.980 0.980 0.980 | 0.986 0.962 0.960 0.958 | 0.980 0.982 0.984 0.984 | 0.966 0.967 0.968 0.967
Relative norms of conditional inverse correlation matrix
 50       2.311   n/a   n/a   n/a | 1.777   n/a   n/a   n/a | 2.329   n/a   n/a   n/a | 1.780   n/a   n/a   n/a
100       1.562 2.531   n/a   n/a | 1.513 1.829   n/a   n/a | 1.578 2.526   n/a   n/a | 1.544 1.825   n/a   n/a
150       1.315 1.811 3.898 7.031 | 1.423 1.538 2.228 3.055 | 1.357 1.805 3.886 7.040 | 1.472 1.532 2.220 3.054
200       1.202 1.606 2.582 3.543 | 1.394 1.475 1.779 2.093 | 1.277 1.588 2.568 3.526 | 1.453 1.458 1.769 2.083
250       1.105 1.458 2.126 2.679 | 1.362 1.446 1.633 1.857 | 1.234 1.431 2.102 2.655 | 1.430 1.416 1.614 1.841

Notes: The Ratio of Spectral and Frobenius Norm Loss Averages (RNLA) is computed from 100 replications (R = 100) for both norms: $\mathrm{RNLA}_P(J) = \sum_{r=1}^{R}\sum_{t=1}^{T}\lVert \tilde{P}_{t,J}^{(r)} - P_t^0 \rVert \big/ \sum_{r=1}^{R}\sum_{t=1}^{T}\lVert \hat{P}_t^{(r)} - P_t^0 \rVert$ and $\mathrm{RNLA}_S(J) = \sum_{r=1}^{R}\sum_{t=1}^{T}\lVert \tilde{S}_{t,J}^{(r)} - S_t^0 \rVert \big/ \sum_{r=1}^{R}\sum_{t=1}^{T}\lVert \hat{S}_t^{(r)} - S_t^0 \rVert$, where J = Gaussian Copula and t-Copula of Patton (2009), $\hat{P}_t$ and $\hat{S}_t$ denote the posterior means of the conditional precision and inverse correlation matrices derived from the Bayesian approach we suggest, and $P_t^0$ and $S_t^0$ refer to the known true conditional precision and inverse correlation matrices. The Gaussian and t-Copula models lack regularization or shrinkage methods within their estimation procedures, so they fail to yield estimates in the singular case (T < N). Such instances are denoted as 'not applicable' (n/a).
Similarly, in Table 3.6, our proposed estimator shows better performance than the Gaussian copula and the t-copula in estimating the conditional inverse correlation matrix. Compared with the other estimators, both copula models produce higher RNLA values for all norms than DCC-L and DCC-NL when the sample size is insufficient for the number of variables.

In the context of Monte Carlo design B, the simulation results demonstrate that estimating the conditional inverse correlation matrix is not only feasible but also yields competitive performance relative to the other estimation approaches. By contrast, the accuracy of the conditional precision matrix estimates does not reach justifiable levels under any of the evaluated methods.

3.5 Empirical Applications

We consider the application of the proposed Bayesian nonparanormal conditional estimation to both foreign stock price indices and blue chip stocks selected, by market capitalization, from within S&P 500 index.(5) The first application, concerning foreign stock price indices, aims to estimate their interdependencies by examining conditional partial correlations. While conditional correlations are commonly employed for this purpose, they may conflate the effects of widespread global phenomena with those of specific, bilateral interactions. Conditional partial correlations are applied here with the aim of mitigating the effect of common factors. In the second application, we test $H_0: \alpha_i = 0$ for $i = 1, 2, \ldots, N$ to validate whether, in a frictionless market, the excess return of a financial asset equals its factor loadings times the excess returns of the related risk factors, plus a random component. Validating this hypothesis requires the estimation of precision matrices, which are integral to Wald-type test statistics in the context of asset pricing models. We consider the $\hat{J}_\alpha$ test developed by Pesaran and Yamagata (2023), the GOS test of Gagliardini et al. (2016), and Standardized Wald tests based on the conditional precision matrices estimated by our proposed method and by DCC-NL. In applying our proposed method, we run 4,000 iterations, including a 2,000-iteration burn-in period. The daily return for security i on day t, denoted $r_{it}$, is computed as $r_{it} = 100 \log(p_t/p_{t-1})$, where $p_t$ represents the closing price of the financial security on the given day.

Footnote 5: The market capitalization is calculated by multiplying the share price of the company's stock by the total number of its (outstanding) shares daily. We then take an average across the sample period to rank the stocks by average market capitalization. From September 2, 2016, to July 31, 2023, most of the stocks listed in Appendix D have been part of S&P 500 index.
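As a small illustration of the return definition and burn-in scheme described above, the Python sketch below computes daily log returns and a posterior mean after discarding burn-in draws; the function names are hypothetical and the draws array is a placeholder for the MCMC output.

```python
import numpy as np

def daily_returns(prices):
    # r_t = 100 * log(p_t / p_{t-1}), from a series of daily closing prices.
    prices = np.asarray(prices, dtype=float)
    return 100.0 * np.diff(np.log(prices))

def posterior_mean(draws, burn_in=2000):
    # Average the retained MCMC draws after dropping the burn-in period
    # (4,000 iterations with the first 2,000 discarded, as in the text).
    return np.mean(draws[burn_in:], axis=0)

# Example: returns from five closing prices, and the posterior mean of
# simulated draws of a 3 x 3 matrix parameter.
print(daily_returns([100.0, 101.0, 100.5, 102.0, 103.0]))
draws = np.random.default_rng(1).standard_normal((4000, 3, 3))
print(posterior_mean(draws).shape)  # (3, 3)
```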
3.5.1 Foreign Stock Price Indexes In the application to the foreign stock price indexes, we analyze a combined time series from a chosen dataset of stock price indices, including the Dow Jones and NASDAQ from the United States, the DAX from Germany, the CAC40 from France, and the NIKKEI from Japan (N = 5). Our analysis covers 7,600 trading days (T = 7, 600), stretching from January 4, 1991, to August 31, 2023. The data, which was obtained from Google Finance, ensures uniformity in the sample size, counteracting the variations in trading days across the indices. These variations result from national holidays, time zone differences, and market-specific practices.6 We apply our proposed Bayesian estimation approach to calculate the posterior mean of conditional partial correlations, Ψˆ t , as defined in Equation (3.2.8), using residuals from return regression without considering common factors as in (3.2.1). In contrast, for the conditional correlations, Rˆ t , we employ the method proposed by Engle et al. (2019), estimated using residuals from return regressions that include K common factors, estimated by principal component analysis (PCA). The number of factors is set at K = 1, by following Hallin and Liˇska (2007). In this application, the conditional correlations and conditional partial correlations assess distinct aspects of the interdependence of the variables. Specifically, conditional correlations, derived from the residuals of a PCA regression, reflect the relationships between variables by adjusting for the variance captured by broad, statistically derived (unknown) common factors. 6 In our study, we focus on only 5 foreign stock indices. While this number may seem limited in highdimensional settings, our main objective is to examine pairwise partial correlations. Given that there are 10 (= N(N − 1)/2) such pairwise correlations when N = 5, a smaller set of variables is preferable for demonstrating the overall dependence across different time periods. This small number of variables allows for a clearer and more focused examination of the interrelationships among the selected stock indices. 117 This method shows the co-movement of variables in response to these overarching factors. However, it might not capture the direct connections between individual variables when all variables are considered. In contrast, conditional partial correlations with our proposed method offer a different perspective. They calculate the exclusive, bilateral relationships between variables, excluding the influence of any other variable in the dataset. This method distinctly determines the direct relationship between any two variables, irrespective of their categorization as common factors. Thus, while conditional correlations provide valuable context on how variables interact in light of PCA-based common factors, conditional partial correlations reveal the inherent associations between individual variable pairs, irrespective of the influence of other variables. Consequently, examining both conditional partial correlations and conditional correlations is beneficial, even when common factors are considered in the conditional correlations. Table 3.7: Descriptive Statistics of Rˆ t and Ψˆ t : Full Sample Periods (01. 04. 1991 – 08. 31. 
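The mapping from a precision matrix to pairwise partial correlations follows the standard identity $\psi(i, j) = -p_{ij}/\sqrt{p_{ii}\,p_{jj}}$; the minimal numpy sketch below applies it to a (posterior-mean) conditional precision matrix, with illustrative variable names and an arbitrary example matrix.

```python
import numpy as np

def partial_correlations(P):
    # Standard identity: psi(i, j) = -p_ij / sqrt(p_ii * p_jj),
    # applied to a conditional precision matrix P_t.
    d = 1.0 / np.sqrt(np.diag(P))
    Psi = -np.outer(d, d) * P
    np.fill_diagonal(Psi, 1.0)
    return Psi

# Applied period by period, this yields one value per trading day for each
# pair; with N = 5 indices there are N(N-1)/2 = 10 distinct pairwise series.
P_t = np.array([[2.0, -0.5, 0.1], [-0.5, 1.5, -0.2], [0.1, -0.2, 1.8]])
print(partial_correlations(P_t))
```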
Table 3.7: Descriptive Statistics of $\hat{R}_t$ and $\hat{\Psi}_t$: Full Sample Period (01.04.1991 – 08.31.2023)

                                          $\hat{R}_t$          $\hat{\Psi}_t$
Full sample period (01.04.91–08.31.23)    Mean    Std          Mean    Std
DJI & NASDAQ                              0.47    0.22         0.20    0.08
DJI & DAX                                –0.46    0.16         0.06    0.07
DJI & CAC40                              –0.44    0.16         0.07    0.07
DJI & NIKKEI                             –0.31    0.14         0.01    0.08
NASDAQ & DAX                             –0.60    0.12         0.06    0.07
NASDAQ & CAC40                           –0.64    0.12         0.04    0.07
NASDAQ & NIKKEI                          –0.31    0.13         0.02    0.08
DAX & CAC40                               0.42    0.22         0.24    0.07
DAX & NIKKEI                             –0.25    0.16         0.04    0.08
CAC40 & NIKKEI                           –0.20    0.15         0.04    0.08

Notes: The table presents the averages and standard deviations of the estimated conditional correlations ($\hat{R}_t$) and conditional partial correlations ($\hat{\Psi}_t$). We derive $\hat{R}_t$ using the approach proposed by Engle et al. (2019) and obtain $\hat{\Psi}_t$ through the Bayesian nonparanormal conditional estimation method we suggest. We calculate $\hat{R}_t$ after adjusting for a single factor identified via principal component analysis; the method of Hallin and Liška (2007) determines the number of factors. The abbreviations for the stock indexes are as follows: DJI denotes the Dow Jones Industrial Average, NASDAQ the National Association of Securities Dealers Automated Quotations, DAX the Deutscher Aktienindex, CAC40 the Cotation Assistée en Continu 40, and NIKKEI the Nikkei 225 Stock Average.

Table 3.7 provides summary statistics of $\hat{R}_t = (\hat{r}_t(i, j))$ and $\hat{\Psi}_t = (\hat{\psi}_t(i, j))$ across the different stock index pairs for the period from January 4, 1991, to August 31, 2023. $\hat{r}_t(\text{DJI, NASDAQ})$ is positive with a mean of 0.47 and a standard deviation of 0.22, indicating moderate variability around a generally strong relationship. Similarly, $\hat{r}_t(\text{DAX, CAC40})$ shows a mean correlation of 0.42 and a mean conditional partial correlation of 0.24. These figures are among the highest in the dataset, indicating a robust linkage. The relatively stronger relations in both correlations and partial correlations for these index pairs could stem from several factors. The DJI and NASDAQ, both US stock indices, are likely influenced by similar economic and market forces, resulting in a stronger correlation. The DAX and CAC40, representing major European economies, exhibit a strong relationship due to their geographical proximity and economic ties within the European Union, which could lead to synchronized economic cycles and business environments. The sizable partial correlations suggest that even when controlling for other influences, the direct relationship between these European indices remains robust, likely owing to shared economic policies, trade relationships, and financial regulations within the European market.

Table 3.8 presents the average values and standard deviations of the estimated conditional correlations and conditional partial correlations across six distinct episodes of market turmoil: (1) the Asian Financial Crisis (July 1997 – December 1999), (2) the Dot-com Bubble Burst (March 2000 – October 2002), (3) the Great Recession (December 2007 – June 2009), (4) the European Sovereign Debt Crisis (October 2010 – August 2012), (5) the COVID-19 Pandemic (January 2022 – August 2022), and (6) the FED's Inflation-Containment Rate Hikes (March 2022 – August 2023). The estimates show that the mean of the conditional correlations for select index pairs diminishes or becomes increasingly negative amid market distress.
For instance, the average conditional correlations involving the DJI and NASDAQ indices range from 0.47 over the entire sample to between –0.54 and 0.69 across the various crises, signaling sizable conditional correlations during market instability compared with more stable periods. Furthermore, the standard deviations of the conditional correlations tend to be lower during crises than over the sample period as a whole, suggesting a more uniform relationship between the indices in times of market stress, possibly due to a unified response to global financial shocks.

Table 3.8: Descriptive statistics of the conditional correlations and conditional partial correlations: Six market disruption periods

                                                       Conditional          Conditional
                                                       correlations         partial correlations
Market disruption periods:                             Mean     Std         Mean     Std
(1) Asian Financial Crisis (07.97 – 12.99)
    DJI & NIKKEI                                      –0.23     0.15        0.02     0.08
    DAX & NIKKEI                                      –0.26     0.13        0.03     0.08
(2) The Dot-com Bubble Burst (03.00 – 10.02)
    DJI & NASDAQ                                       0.37     0.22        0.18     0.08
    NASDAQ & DAX                                      –0.54     0.13        0.05     0.07
(3) The Great Recession (12.07 – 06.09)
    DJI & NASDAQ                                       0.69     0.09        0.23     0.07
    DJI & NIKKEI                                      –0.42     0.09        0.01     0.08
(4) European Sovereign Debt Crisis (10.10 – 08.12)
    DJI & CAC40                                       –0.42     0.14        0.10     0.06
    DAX & CAC40                                        0.39     0.11        0.25     0.07
(5) COVID–19 (01.22 – 08.22)
    Negative pairwise indices                         –0.41     0.15        0.05     0.07
    Positive pairwise indices                          0.46     0.19        0.22     0.08
(6) FED's Inf.-Containment Rate Hikes (03.22 – 08.23)
    DJI & NASDAQ                                       0.45     0.13        0.22     0.07
    DJI & NIKKEI                                      –0.29     0.11        0.00     0.08

Notes: The table presents the averages and standard deviations of the estimated conditional correlations ($\hat{R}_t$) and conditional partial correlations ($\hat{\Psi}_t$). We derive $\hat{R}_t$ using the approach proposed by Engle et al. (2019) and obtain $\hat{\Psi}_t$ through the Bayesian nonparanormal conditional estimation method we suggest. We calculate $\hat{R}_t$ after adjusting for a single factor identified via principal component analysis; the method of Hallin and Liška (2007) determines the number of factors. The COVID-19 case shows the collected results for the negative and positive pairwise indices, where 'Negative pairwise indices' = {DJI & DAX, DJI & CAC40, DJI & NIKKEI, NASDAQ & DAX, NASDAQ & CAC40, NASDAQ & NIKKEI, DAX & NIKKEI, CAC40 & NIKKEI}, and 'Positive pairwise indices' = {DJI & NASDAQ, DAX & CAC40}. The abbreviations for the stock indexes are as follows: DJI denotes the Dow Jones Industrial Average, NASDAQ the National Association of Securities Dealers Automated Quotations, DAX the Deutscher Aktienindex, CAC40 the Cotation Assistée en Continu 40, and NIKKEI the Nikkei 225 Stock Average.

In examining the conditional partial correlations, their average values tend to decrease during market disruptions. The averages typically range from 0.01 to 0.25, but they frequently fall towards the lower end of this range or even to 0.00, notably for the DJI and NIKKEI pair during the Federal Reserve's rate hikes. Such decreases suggest a reduction in the direct linkage between indices once other market influences are taken into account. During these market disruption periods, the variances of the partial correlations are generally similar to, or slightly lower than, those over the whole sample period. For instance, the standard deviation of the conditional partial correlation between DJI and CAC40 drops to 0.06 during the European Sovereign Debt Crisis from a full-sample value of 0.07, potentially indicating an equally stable direct relationship despite the market fluctuations.
Our analysis indicates that in times of market upheaval, the relationships between indices, as captured by both conditional correlations and partial correlations, are characterized by lower average values and similar or slightly lower variances. This indicates a more uniform response to stress in the markets.

Figure 3.1 displays the estimated conditional correlations and conditional partial correlations for pairs of stock indices during three major market crises: the Asian Financial Crisis (1997–1999), the Dot-com Bubble Burst (2000–2002), and the Great Recession (2007–2009). The estimated means and variances shown in this figure correspond to those listed in Table 3.8.

Figure 3.1: Conditional (Partial) Correlations in Market Disruption Periods: 1997 – 2009
Notes: The figure depicts the estimated conditional correlations ($\hat{R}_t$) and conditional partial correlations ($\hat{\Psi}_t$) during periods of market disruption, specifically throughout the Asian Financial Crisis, the Dot-com Bubble Burst, and the Great Recession. We derive $\hat{R}_t$ using the approach proposed by Engle et al. (2019) and obtain $\hat{\Psi}_t$ through the Bayesian nonparanormal conditional estimation method we suggest. We calculate $\hat{R}_t$ after adjusting for a single factor identified via principal component analysis; the method of Hallin and Liška (2007) determines the number of factors. The abbreviations for the stock indexes are as follows: DJI denotes the Dow Jones Industrial Average, NASDAQ the National Association of Securities Dealers Automated Quotations, DAX the Deutscher Aktienindex, CAC40 the Cotation Assistée en Continu 40, and NIKKEI the Nikkei 225 Stock Average.

During the Asian Financial Crisis, the plot reveals fluctuations in the conditional correlations between the DJI and the NIKKEI and between the DAX and the NIKKEI, alongside their conditional partial correlations. The conditional correlations and partial correlations for these two pairs show different magnitudes. Specifically, the overall mean conditional correlation between the DJI and NIKKEI is –0.31 with a standard deviation of 0.14, as compared to –0.23 and 0.15, respectively, during the crisis. Conversely, the DAX and NIKKEI maintain a stable mean conditional correlation, moving from –0.25 overall to –0.26 during the crisis, with the standard deviation falling from 0.16 to 0.13. This indicates that the relationship between the DJI and NIKKEI exhibits a slight change during the crisis, whereas the DAX and NIKKEI relationship remained largely unchanged. The conditional partial correlations capture this nuanced difference: the DJI and NIKKEI pair shows a minor increase in mean from 0.01 to 0.02, and the DAX and NIKKEI pair sees a decrease from 0.04 to 0.03, with both pairs maintaining a standard deviation of 0.08.

As shown in the second subgraph, during the Dot-com Bubble Burst, the divergence between the $\hat{r}_t$ and $\hat{\psi}_t$ of DJI and NASDAQ and of DAX and NASDAQ suggests differentiated market dynamics. The shift in investor preference from technology stocks to traditional "brick and mortar" stocks influenced the conditional correlation. Therefore, the negative swing in $\hat{r}_t(\text{DJI, NASDAQ})$ before mid-2001, which contrasts with the general positivity in $\hat{\Psi}_t$ throughout the period, might indicate that while the tech and traditional sectors moved divergently during the market disruption, the inherent relationship between them, when isolated from the influence of the DAX and other indices (as the conditional partial correlations suggest), was uniformly positive.
The smaller magnitude of the conditional partial correlation could be due to the unique characteristics of the NASDAQ, which is heavily weighted towards technology stocks and experienced a distinct reaction to the economic environment compared with the more diversified DJI and the European-focused DAX.

The Great Recession period also depicts the divergence between the conditional correlations and conditional partial correlations, particularly between DJI and NASDAQ and between DJI and NIKKEI. This divergence suggests increased disconnections during the recession, reflecting varying degrees of integration and response to economic stress among different markets. The observed divergence between $\hat{r}_t(\text{DJI, NIKKEI})$ and $\hat{\psi}_t(\text{DJI, NIKKEI})$ during the Great Recession reflects the distinct economic environments and market responses in the United States and Japan. Geographical separation may contribute to market differentiation, with each country's unique fiscal and monetary policies, investor sentiment, and economic conditions influencing the markets, especially during the market disruption period. These factors lead to the distinct behavior of the indices, which is captured by partial correlations close to zero, revealing a disconnection not apparent in the conditional correlation. By contrast, the consistently positive relationship between DJI and NASDAQ, even after isolating other effects, underscores the significant interdependence within the U.S. markets, which the conditional correlation alone cannot capture. Therefore, the conditional partial correlation offers a granular perspective, capturing underlying market relationships that the conditional correlation might not fully disclose and revealing the unique market dynamics during the Great Recession.

Figure 3.2 presents the time-series behavior of the estimated conditional correlations and conditional partial correlations for stock index pairs during three significant market disruptions: (4) the European Sovereign Debt Crisis (October 2010 – August 2012), (5) the COVID-19 Pandemic (January 2022 – August 2022), and (6) the FED's Inflation-Containment Rate Hikes (March 2022 – August 2023). These estimates of means and variances correspond to the values tabulated in Table 3.8.

Figure 3.2: Conditional (Partial) Correlations in Market Disruption Periods: 2010 – 2023
Notes: The COVID-19 case shows the collected results for the negative and positive pairwise indices, where 'Negative pairwise indices' = {DJI & DAX, DJI & CAC40, DJI & NIKKEI, NASDAQ & DAX, NASDAQ & CAC40, NASDAQ & NIKKEI, DAX & NIKKEI, CAC40 & NIKKEI}, and 'Positive pairwise indices' = {DJI & NASDAQ, DAX & CAC40}. The abbreviations for the stock indexes are as follows: DJI denotes the Dow Jones Industrial Average, NASDAQ the National Association of Securities Dealers Automated Quotations, DAX the Deutscher Aktienindex, CAC40 the Cotation Assistée en Continu 40, and NIKKEI the Nikkei 225 Stock Average.

During the European Sovereign Debt Crisis, $\hat{R}_t$ between the DAX and CAC40 indices declined, indicating market segmentation within the Eurozone, even though France and Germany avoided the worst of the crisis. This segmentation is discernible through reduced conditional correlations, yet the underlying connection between these two key Eurozone players remains inherently positive, as suggested by $\hat{\Psi}_t$. Conversely, the relationship between the DJI and CAC40 reflects higher volatility and a tendency towards a negative correlation in terms of $\hat{R}_t$, which may stem from the disparate nature of the U.S. and French markets during market turbulence. This implies that the interactions between the DJI and CAC40 are more significantly influenced by extraneous indices than by a direct financial linkage, underlining the divergent market behaviors of different economies under stress conditions.

During the COVID-19 pandemic, we analyze distinct groups of stock index pairs, categorized by positive and negative conditional correlations.
Positive conditional correlations are consistently found between indices from the same country, like DJI and NASDAQ, or those belonging to the same economic area, such as DAX and CAC40 in the European Union. This indicates that indices within national or regional boundaries tend to move together, likely influenced by common economic conditions or collective investor sentiment. In contrast, negative conditional correlations between international index pairs suggest a disconnection of these markets during the pandemic. Additionally, the conditional partial correlations for these negatively correlated pairs are close to zero, indicating that their negative correlations are mostly driven by variables other than the direct relationship between the indices. When these variables are accounted for, the conditional partial correlation between these international indices is minimal, pointing to a fundamental disconnection during the pandemic. Thus, while local and regional markets showed some cohesion during the pandemic, the interconnectedness across global markets was disrupted.

During the Federal Reserve interest rate hikes aimed at containing inflation, the DJI and NASDAQ, both US indices, exhibited a similar positive conditional correlation, reflecting their representativeness of the US market despite the differences in sector composition within each index. This similarity in movement contrasts with the period of the Dot-com Bubble, where sector rotation was a prominent feature. By contrast, $\hat{\psi}_t(\text{DJI, NIKKEI})$ is not statistically significant at the 1% level, suggesting a lack of meaningful linkage in their movements during this period. This lack of correlation could be attributed to the divergence in monetary policy approaches between the US and Japan. Specifically, Japan has not mirrored the US's interest rate increases, possibly signaling independence in their monetary policy stances. The conditional correlation, however, only presents a negative association rather than capturing the nuances of independent monetary policy decisions. This difference underlines the importance of distinguishing between correlation and causation: while the indices may move in opposite directions, this does not necessarily reflect a direct linkage to monetary policy.

Appendix B contains the ten plots detailing both the conditional correlations and conditional partial correlations for the selected five foreign stock indices and their corresponding volatilities.
3.5.2 Daily Returns on Securities Selected from S&P 500

In this application, we consider a panel of 98 US blue chip stocks, well-established, financially sound companies recognized for their stability and reliability, across different industry sectors, selected from S&P 500 based on market capitalization.(7) The analysis covers the period from September 2, 2016, to July 31, 2023, encompassing 1,723 trading days. The data for this period are collected from Google and Yahoo Finance, and all companies included are constituents of S&P 500 index throughout this time frame.(8) We obtained the time series data for the safe return rate and the five Fama-French factors from the data library of Ken French. For the risk-free rate ($r_{ft}$), we selected the one-month U.S. treasury bill rate. The market return ($r_{mt}$) is represented by the return on S&P 500 index, which includes the blue chip stocks in our study.

Our analysis encompasses two model specifications assessed using three test statistics. The first is the capital asset pricing model (CAPM), and the second is the Fama and French (2015) five-factor model (FF5). The FF5 model is given by the equation

$r_{it} - r_{ft} = \hat{\alpha}_i + \hat{\beta}_{1,i}\,\text{MktRF}_t + \hat{\beta}_{2,i}\,\text{SMB}_t + \hat{\beta}_{3,i}\,\text{HML}_t + \hat{\beta}_{4,i}\,\text{RMW}_t + \hat{\beta}_{5,i}\,\text{CMA}_t + \hat{u}_{it}$,   (3.5.1)

for $t = 1, 2, \ldots, 1723$ and $i = 1, 2, \ldots, 98$. We estimate equation (3.5.1) using a rolling window of h = 20 trading days. In this model, $\text{MktRF}_t$ is the market excess return defined as $r_{mt} - r_{ft}$, $\text{SMB}_t$ represents the size premium measured by the difference in returns between small and large capitalization stock portfolios, $\text{HML}_t$ captures the value premium through the difference in returns between high book-to-market and low book-to-market portfolios, $\text{RMW}_t$ differentiates between the returns of firms with robust and weak profitability, and $\text{CMA}_t$ contrasts the returns of firms with conservative and aggressive investment profiles.

Our application examines three test statistics: the $\hat{J}_\alpha$ test developed by Pesaran and Yamagata (2023), the GOS test of Gagliardini et al. (2016), and the Standardized Wald test (SW). Pesaran and Yamagata (2023) demonstrate that the GOS and SW tests tend to falsely reject true null hypotheses, especially when the time dimension T is smaller than the cross-sectional dimension N, compromising their effectiveness in both size and power.

Footnote 7: This study focuses on blue chip stocks for the analysis. It is important to recognize that such a selection may introduce a bias due to the inherent stability and reliability of these securities, potentially not reflecting the broader market dynamics. Furthermore, this approach might entail a survival bias, as it considers only well-performing stocks, excluding those that did not sustain or were more volatile. This limitation is noteworthy because investors, in practice, do not have the foresight to distinguish future 'blue chip' stocks from the entire market spectrum. Hence, the findings should be interpreted with caution, bearing in mind that they may not fully capture the investment decisions made under real-world uncertainty and the diverse nature of the market as represented by a broader index like S&P 500. See Appendix D for the full list of the companies.

Footnote 8: The analyses of both conditional correlations and conditional partial correlations, as discussed in Section 3.5.1, are also applicable to the current study. For a visual representation of these coefficients for firms from each GICS sector, refer to Appendix B, where detailed graphs are provided.
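A minimal sketch of the rolling-window estimation of (3.5.1) is given below, assuming a T-vector of excess returns and a T x 5 factor matrix ordered as [MktRF, SMB, HML, RMW, CMA]; the function name and array shapes are illustrative.

```python
import numpy as np

def rolling_ff5_alphas(excess_ret, factors, h=20):
    # OLS of excess returns on the five Fama-French factors over a rolling
    # window of h = 20 trading days; collects the intercept (alpha-hat)
    # estimated in each window.
    T = len(excess_ret)
    X = np.column_stack([np.ones(T), factors])   # [1, MktRF, SMB, HML, RMW, CMA]
    alphas = []
    for start in range(T - h + 1):
        Xw, yw = X[start:start + h], excess_ret[start:start + h]
        coef, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
        alphas.append(coef[0])                   # intercept = alpha-hat
    return np.array(alphas)

# Illustrative data: 1,723 days, 5 factors, one security.
rng = np.random.default_rng(3)
factors = rng.standard_normal((1723, 5))
excess_ret = factors @ rng.standard_normal(5) + 0.1 * rng.standard_normal(1723)
print(rolling_ff5_alphas(excess_ret, factors).shape)  # (1704,)
```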
For panel regressions based on time series data, we specify the model as

$y_{i\cdot} = \alpha_i \tau_T + F \beta_i + u_{i\cdot}$,   (3.5.2)

where $y_{i\cdot} = (y_{i1}, y_{i2}, \ldots, y_{iT})'$, $F' = (f_1, f_2, \ldots, f_T)$, and $u_{i\cdot} = (u_{i1}, u_{i2}, \ldots, u_{iT})'$. We test the null hypothesis $H_0: \alpha = 0$ by employing the SW test statistic, written as

$SW = \dfrac{(\tau_T' M_F \tau_T)\, \hat{\alpha}' \Sigma_u^{-1} \hat{\alpha} - N}{\sqrt{2N}}$,   (3.5.3)

where $M_F = I_T - F(F'F)^{-1}F'$ and $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_N)'$. The test procedure requires the estimation of $\Sigma_u^{-1}$, which is where our proposed estimators can be applied. However, as noted above, the SW test suffers from size distortions when T < N and is not designed for time-varying contexts, which rules out the use of rolling windows. To address this, we modify the SW test by substituting $\Sigma_u^{-1}$ with our proposed estimator of the conditional precision matrix, $\hat{P}_{t,u}$, in Equation (3.2.6). This is done by first conducting an ordinary least squares regression of (3.5.2) for each cross-sectional unit to obtain the residuals, which are then combined across all units to form $\hat{U} = (\hat{u}_{1\cdot}, \hat{u}_{2\cdot}, \ldots, \hat{u}_{N\cdot}) = (\hat{u}_1, \hat{u}_2, \ldots, \hat{u}_T)'$, where $\hat{u}_t = (\hat{u}_{1t}, \hat{u}_{2t}, \ldots, \hat{u}_{Nt})'$. Applying our Bayesian estimation method to these residuals, we conduct the following time-dependent SW test, written as

$SW(\hat{P}_{t,u}) = \dfrac{(\tau_T' M_F \tau_T)\, \hat{\alpha}' \hat{P}_{t,u} \hat{\alpha} - N}{\sqrt{2N}}$,   (3.5.4)

which allows for the calculation of test statistics when T > N. In addition to our Bayesian nonparanormal conditional estimator, denoted by SW($\hat{P}_{t,u}$), we also consider an alternative method for estimating the conditional precision matrix. This second approach, based on the work of Engle et al. (2019), utilizes nonlinear shrinkage estimation of the conditional correlation matrices to obtain the estimated conditional covariance matrix, $\hat{\Sigma}_{t,u}$, and is represented as SW($\hat{\Sigma}_{t,u}^{-1}$).

The test statistics $\hat{J}_\alpha$ and GOS are calculated using 20-day rolling windows for daily estimates, covering both the entire sample period and distinct periods of market disruption.(9) For the SW tests, conditional precision matrices are determined for each $t = 1, 2, \ldots, T$, utilizing all T observations. Unlike the rolling window approach, the SW test statistic is calculated for each period t. Rejection frequencies for these test statistics are then computed at both the 5% and 1% significance levels.

Footnote 9: In the literature, the $\hat{J}_\alpha$ and GOS test statistics are calculated using monthly observations with 60-month windows. In contrast, our study calculates the test statistics of $\hat{J}_\alpha$ and GOS using daily observations with a 20-day rolling window.
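To illustrate how (3.5.4) and the rejection frequencies can be computed, the following sketch implements the standardized Wald statistic and the share of rejections against the upper tail of its standard normal limit; the names, shapes, and the one-sided rejection rule are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def sw_statistic(alpha_hat, P_hat, F):
    # Equation (3.5.4): SW = ((tau' M_F tau) * alpha' P alpha - N) / sqrt(2N),
    # with M_F = I_T - F (F'F)^{-1} F' and tau a T-vector of ones.
    T, N = F.shape[0], alpha_hat.shape[0]
    tau = np.ones(T)
    mf_tau = tau - F @ np.linalg.solve(F.T @ F, F.T @ tau)
    scale = tau @ mf_tau                         # tau' M_F tau
    return (scale * (alpha_hat @ P_hat @ alpha_hat) - N) / np.sqrt(2.0 * N)

def rejection_frequency(stats, level=0.05):
    # Share of periods in which H0: alpha = 0 is rejected, comparing each
    # statistic with the upper-tail critical value of N(0, 1).
    return float(np.mean(np.asarray(stats) > norm.ppf(1.0 - level)))

# Illustrative dimensions matching the application: T = 1723, N = 98.
rng = np.random.default_rng(4)
T, N = 1723, 98
F = rng.standard_normal((T, 5))
stats = [sw_statistic(0.01 * rng.standard_normal(N), np.eye(N), F) for _ in range(50)]
print(rejection_frequency(stats, 0.05), rejection_frequency(stats, 0.01))
```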
Table 3.9 summarizes the rejection frequencies of the $\hat{J}_\alpha$, GOS, and SW tests based on the CAPM and FF5 models, using the blue chip stocks from S&P 500 index at two nominal sizes (5% and 1%), during both the full sample period (09.02.16–07.31.23) and specific market disruption periods: (1) the Post-Crisis Interest Rate Normalization (12.02.16–06.28.19), (2) COVID-19 (01.20.20–08.31.22), and (3) the FED's Inflation-Containment Rate Hikes (03.16.22–07.31.23).

Table 3.9: Rejection Frequencies of the $\hat{J}_\alpha$, GOS, and SW tests

Tests                                                           $\hat{J}_\alpha$   GOS    SW($\hat{P}_{t,u}$)   SW($\hat{\Sigma}_{t,u}^{-1}$)

CAPM model
Significance level of 0.05
Full sample period (09.02.16–07.31.23)                          0.49   0.96   0.60   0.63
Market disruption periods:
(1) Post-Crisis Int. Rate Normalization (12.02.16–06.28.19)     0.75   0.99   0.34   1.00
(2) COVID-19 (01.20.20–08.31.22)                                0.10   0.89   0.81   0.27
(3) FED's Inf.-Containment Rate Hikes (03.16.22–07.31.23)       0.70   0.99   0.89   0.24
Significance level of 0.01
Full sample period (09.02.16–07.31.23)                          0.43   0.94   0.49   0.58
Market disruption periods:
(1) Post-Crisis Int. Rate Normalization (12.02.16–06.28.19)     0.67   0.98   0.24   1.00
(2) COVID-19 (01.20.20–08.31.22)                                0.08   0.85   0.69   0.21
(3) FED's Inf.-Containment Rate Hikes (03.16.22–07.31.23)       0.66   0.98   0.79   0.10

FF5 model
Significance level of 0.05
Full sample period (09.02.16–07.31.23)                          0.71   1.00   0.76   0.78
Market disruption periods:
(1) Post-Crisis Int. Rate Normalization (12.02.16–06.28.19)     1.00   1.00   0.75   1.00
(2) COVID-19 (01.20.20–08.31.22)                                0.23   0.99   0.80   0.68
(3) FED's Inf.-Containment Rate Hikes (03.16.22–07.31.23)       0.96   1.00   0.86   0.31
Significance level of 0.01
Full sample period (09.02.16–07.31.23)                          0.69   1.00   0.69   0.67
Market disruption periods:
(1) Post-Crisis Int. Rate Normalization (12.02.16–06.28.19)     1.00   1.00   0.69   1.00
(2) COVID-19 (01.20.20–08.31.22)                                0.19   0.99   0.72   0.51
(3) FED's Inf.-Containment Rate Hikes (03.16.22–07.31.23)       0.95   1.00   0.81   0.09

Notes: This table presents the rejection frequencies of the null hypotheses $H_0: \alpha_i = 0$ for the test statistics $\hat{J}_\alpha$, GOS, SW($\hat{P}_{t,u}$), and SW($\hat{\Sigma}_{t,u}^{-1}$) at significance levels of 0.05 and 0.01. The tests are carried out for the capital asset pricing model (CAPM) and the Fama-French five-factor (FF5) model using the securities within S&P 500 index. The test statistics $\hat{J}_\alpha$ and GOS are computed using rolling windows of 20 days for daily estimates, spanning both the entire sample period and specific periods of market disruption. SW($\hat{P}_{t,u}$) denotes the standardized Wald test statistic computed with $\hat{P}_{t,u}$, obtained through Bayesian nonparanormal conditional estimation. SW($\hat{\Sigma}_{t,u}^{-1}$) denotes the standardized Wald test statistic calculated using the inverse of $\hat{\Sigma}_{t,u}$, as estimated by the DCC–NL model of Engle et al. (2019).

The Post-Crisis Interest Rate Normalization period refers to the phase during which the Federal Reserve increased benchmark interest rates, moving away from the near-zero rates that had been in place to support economic recovery after the 2008 financial crisis. This shift impacted asset valuations and introduced volatility as markets adjusted to a new interest rate regime. The onset of the COVID-19 global pandemic led to unprecedented economic turmoil, supply chain disruptions, and extreme market volatility, prompting a flight to safety among investors and significant intervention by central banks. Lastly, the Federal Reserve's Inflation-Containment Rate Hikes reflect a period during which the central bank implemented a series of interest rate increases to manage rising inflation, thereby increasing the cost of borrowing and affecting investor sentiment and market liquidity, thus contributing to further market instability.

In the case of the CAPM model, the $\hat{J}_\alpha$ test statistic shows rejection frequencies that range from 0.10 to 0.75 at the 0.05 significance level. During market disruption periods, the rejection frequencies are generally higher (with the notable exception of the COVID-19 period), suggesting that the CAPM model is less reliable during these periods. At the 0.01 significance level, the rejection frequencies are slightly lower but exhibit a similar pattern. The GOS test exhibits high rejection frequencies across all periods and significance levels, often reaching 0.99 or 1.00, which suggests a strong rejection of the null hypothesis and is consistent with the over-rejection documented by Pesaran and Yamagata (2023).
For the FF5 model, the rejection rates are also high across all tests, especially during the post-crisis interest rate normalization period. This indicates that even a more comprehensive model like the FF5 model may struggle to explain asset returns during volatile periods. The observed deviation in the rejection frequencies of $\hat{J}_\alpha$ during the COVID-19 pandemic may be attributed to a combination of factors, including the global scope and unprecedented nature of the market disruptions during this time. The stabilizing actions by governments and interventions by central banks during this period might have contributed to a more predictable relationship between returns and the identified risk factors, despite this relationship being nonlinear. Additionally, the use of blue chip stocks from S&P 500 in our analysis could introduce selection bias, potentially influencing these results. Furthermore, the difference in methodologies, where the $\hat{J}_\alpha$ test employs rolling windows while our proposed SW test calculates conditional precision matrices using the full sample for all $t = 1, 2, \ldots, T$, may also contribute to the observed discrepancies.

The comparative evaluation of the SW test and the $\hat{J}_\alpha$ test reveals that the SW($\hat{P}_{t,u}$) test yields rejection frequencies aligned with those of the $\hat{J}_\alpha$ test over the full sample period. However, during periods of market disruption, especially those coinciding with the COVID-19 pandemic, the two tests exhibit distinct variations in rejection frequencies, indicating that they may respond differently to market distress.

3.6 Conclusions

This paper makes two main contributions to the literature on high-dimensional multivariate volatility modeling, focusing on the estimation of conditional precision matrices and the exploration of conditional dependence.

Our first contribution is the development of a Bayesian method for estimating conditional precision matrices within a high-dimensional DCC-MGARCH framework. This method circumvents the need to invert conditional covariance matrices, which is problematic when those matrices are only positive semi-definite. Our Bayesian approach, leveraging the Wishart distribution, simplifies this process. The estimation utilizes the Metropolis-Hastings algorithm within Gibbs sampling. To estimate the unconditional precision matrix, we employ a horseshoe prior, introducing sparsity in high-dimensional contexts.

The second contribution is providing estimates of conditional precision and partial correlation matrices, which are crucial for understanding volatility interconnections. This is achieved through a Bayesian nonparanormal framework, which utilizes a rank transformation to convert non-Gaussian distributions into approximately Gaussian ones. Additionally, by implementing a univariate GARCH model for each security, we derive the conditional precision matrix from the conditional inverse correlation matrix, an approach that outperforms standard nonparanormal rank-transformation methods in identifying the precision matrix.

We validate our approach through Monte Carlo simulations, comparing it with existing methods, and find our Bayesian estimator to be more effective in the simulation designs, particularly in estimating conditional precision and correlation matrices. Applying our method to empirical data, we analyze daily foreign stock price indices and returns on blue chip stocks from S&P 500.
Future research could extend the current framework by incorporating variational inference in place of the MCMC approach to improve computational efficiency, and by developing measures of conditional tail dependence to complement partial correlation. Such extensions would build upon the foundational work presented here and offer new avenues for capturing more complex dependencies and behaviors in financial markets.

Bibliography

Aas, K., Czado, C., Frigessi, A. and Bakken, H. (2009). Pair-copula constructions of multiple dependence. Insurance: Mathematics and Economics, 44 (2), 182–198.
Anatolyev, S. and Pyrlik, V. (2022). Copula shrinkage and portfolio allocation in ultra-high dimensions. Journal of Economic Dynamics and Control, 143, 104508.
Aquaro, M., Bailey, N. and Pesaran, M. H. (2021). Estimation and inference for spatial models with heterogeneous coefficients: an application to US house prices. Journal of Applied Econometrics, 36 (1), 18–44.
Ardia, D. (2008). Bayesian estimation of the GARCH(1,1) model with normal innovations. Financial Risk Management with Bayesian Estimation of GARCH Models: Theory and Applications, pp. 17–37.
Arias, O., Hallock, K. F. and Sosa-Escudero, W. (2013). Individual heterogeneity in the returns to schooling: instrumental variables quantile regression using twins data. Economic Applications of Quantile Regression, p. 7.
Asai, M. and McAleer, M. (2009). The structure of dynamic correlations in multivariate stochastic volatility models. Journal of Econometrics, 150 (2), 182–192.
Ashenfelter, O. and Krueger, A. (1994). Estimates of the economic return to schooling from a new sample of twins. The American Economic Review, pp. 1157–1173.
— and Rouse, C. (1998). Income, schooling, and ability: Evidence from a new sample of identical twins. The Quarterly Journal of Economics, 113 (1), 253–284.
Bai, J. (2003). Inferential theory for factor models of large dimensions. Econometrica, 71 (1), 135–171.
— and Ng, S. (2002). Determining the number of factors in approximate factor models. Econometrica, 70 (1), 191–221.
— and — (2008). Forecasting economic time series using targeted predictors. Journal of Econometrics, 146 (2), 304–317.
— and — (2013). Principal components estimation and identification of static factors. Journal of Econometrics, 176 (1), 18–29.
— and Wang, P. (2015). Identification and Bayesian estimation of dynamic factor models. Journal of Business & Economic Statistics, 33 (2), 221–240.
Bai, Z. and Saranadasa, H. (1996). Effect of high dimension: by an example of a two sample problem. Statistica Sinica, pp. 311–329.
Bailey, N., Kapetanios, G. and Pesaran, M. H. (2016). Exponent of cross-sectional dependence: Estimation and inference. Journal of Applied Econometrics, 31 (6), 929–960.
—, — and — (2019a). Exponent of cross-sectional dependence for residuals. Sankhya B, 81 (Suppl 1), 46–102.
—, — and — (2021). Measurement of factor strength: Theory and practice. Journal of Applied Econometrics, 36 (5), 587–613.
—, Pesaran, M. H. and Smith, L. V. (2019b). A multiple testing approach to the regularisation of large sample correlation matrices. Journal of Econometrics, 208 (2), 507–534.
Baillie, R. T., Bollerslev, T. and Mikkelsen, H. O. (1996). Fractionally integrated generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 74 (1), 3–30.
Barigozzi, M. and Brownlees, C. (2019). NETS: Network estimation for time series. Journal of Applied Econometrics, 34 (3), 347–364.
Bauwens, L. and Laurent, S. (2005). A new class of multivariate skew densities, with application to generalized autoregressive conditional heteroscedasticity models. Journal of Business & Economic Statistics, 23 (3), 346–354.
Berger, J. O. and Pericchi, L. R. (1996). The intrinsic Bayes factor for model selection and prediction. Journal of the American Statistical Association, 91 (433), 109–122.
Bernanke, B. S., Boivin, J. and Eliasz, P. (2005). Measuring the effects of monetary policy: a factor-augmented vector autoregressive (FAVAR) approach. The Quarterly Journal of Economics, 120 (1), 387–422.
Bernardo, J. M. (1979). Reference posterior distributions for Bayesian inference. Journal of the Royal Statistical Society Series B: Statistical Methodology, 41 (2), 113–128.
Beyeler, S. and Kaufmann, S. (2018). Factor augmented VAR revisited: A sparse dynamic factor model approach. Tech. rep., Working paper.
— and — (2021). Reduced-form factor augmented VAR: exploiting sparsity to include meaningful factors. Journal of Applied Econometrics, 36 (7), 989–1012.
Bhattacharya, A., Chakraborty, A. and Mallick, B. K. (2016). Fast sampling with Gaussian scale mixture priors in high-dimensional regression. Biometrika, p. asw042.
— and Dunson, D. B. (2011). Sparse Bayesian infinite factor models. Biometrika, 98 (2), 291–306.
—, Pati, D., Pillai, N. S. and Dunson, D. B. (2015). Dirichlet–Laplace priors for optimal shrinkage. Journal of the American Statistical Association, 110 (512), 1479–1490.
Bickel, P. J. and Levina, E. (2008). Covariance regularization by thresholding. The Annals of Statistics, 36 (6), 2577–2604.
Billio, M., Caporin, M. and Gobbo, M. (2003). Block dynamic conditional correlation multivariate GARCH models. Università di Venezia.
Boivin, J. and Giannoni, M. (2008). Global forces and monetary policy effectiveness. Tech. rep., National Bureau of Economic Research.
— and Ng, S. (2006). Are more data always better for factor analysis? Journal of Econometrics, 132 (1), 169–194.
Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31 (3), 307–327.
Bonjour, D., Cherkas, L. F., Haskel, J. E., Hawkes, D. D. and Spector, T. D. (2003). Returns to education: Evidence from UK twins. American Economic Review, 93 (5), 1799–1812.
Buchinsky, M. (2001). Quantile regression with sample selection: Estimating women's return to education in the US. Empirical Economics, 26, 87–113.
Cai, T. and Liu, W. (2011). Adaptive thresholding for sparse covariance matrix estimation. Journal of the American Statistical Association, 106 (494), 672–684.
Candes, E. and Tao, T. (2007). The Dantzig selector: Statistical estimation when p is much larger than n. The Annals of Statistics, 35 (6), 2313–2351.
Card, D. (1994). Earnings, schooling, and ability revisited.
— (1999). The causal effect of education on earnings. Handbook of Labor Economics, 3, 1801–1863.
Carter, C. K. and Kohn, R. (1994). On Gibbs sampling for state space models. Biometrika, 81 (3), 541–553.
Carvalho, C. M., Chang, J., Lucas, J. E., Nevins, J. R., Wang, Q. and West, M. (2008). High-dimensional sparse factor modeling: applications in gene expression genomics. Journal of the American Statistical Association, 103 (484), 1438–1456.
— and West, M. (2007). Dynamic matrix-variate graphical models. Bayesian Analysis, 2 (1), 69–97.
Chen, C. W., Liu, F.-C. and So, M. K. (2008). Heavy-tailed-distributed threshold stochastic volatility models in financial time series. Australian & New Zealand Journal of Statistics, 50 (1), 29–51.
— and So, M. K. (2006). On a threshold heteroscedastic model. International Journal of Forecasting, 22 (1), 73–89.
Chen, J., Li, D. and Linton, O. (2019). A new semiparametric estimation approach for large dynamic covariance matrices with multiple conditioning variables. Journal of Econometrics, 212 (1), 155–176.
Chen, Z. and Leng, C. (2016). Dynamic covariance models. Journal of the American Statistical Association, 111 (515), 1196–1207.
Chernozhukov, V. (2005). Extremal quantile regression. The Annals of Statistics, 33 (2), 806–839.
— and Hansen, C. (2005). An IV model of quantile treatment effects. Econometrica, 73 (1), 245–261.
— and — (2006). Instrumental quantile regression inference for structural and treatment effect models. Journal of Econometrics, 132 (2), 491–525.
Chib, S. and Greenberg, E. (1994). Bayes inference in regression models with ARMA(p, q) errors. Journal of Econometrics, 64 (1-2), 183–206.
Chudik, A. and Pesaran, M. H. (2011). Infinite-dimensional VARs and factor models. Journal of Econometrics, 163 (1), 4–22.
—, — and Tosetti, E. (2011). Weak and strong cross-section dependence and estimation of large panels. The Econometrics Journal, 14 (1), C45–C90.
Consonni, G., Fouskakis, D., Liseo, B. and Ntzoufras, I. (2018). Prior distributions for objective Bayesian analysis. Bayesian Analysis, 13 (2), 627–679.
Dawid, A. P. and Lauritzen, S. L. (1993). Hyper Markov laws in the statistical analysis of decomposable graphical models. The Annals of Statistics, pp. 1272–1317.
De Mol, C., Giannone, D. and Reichlin, L. (2008). Forecasting using a large number of predictors: Is Bayesian shrinkage a valid alternative to principal components? Journal of Econometrics, 146 (2), 318–328.
De Nard, G., Ledoit, O. and Wolf, M. (2021). Factor models for portfolio selection in large dimensions: The good, the better and the ugly. Journal of Financial Econometrics, 19 (2), 236–257.
Dobra, A. and Lenkoski, A. (2011). Copula Gaussian graphical models and their application to modeling functional disability data. The Annals of Applied Statistics, 5 (2A), 969–993.
Efron, B. and Hastie, T. (2021). Computer Age Statistical Inference, Student Edition: Algorithms, Evidence, and Data Science, vol. 6. Cambridge University Press.
Eichler, M. (2007). Granger causality and path diagrams for multivariate time series. Journal of Econometrics, 137 (2), 334–353.
— (2012). Graphical modelling of multivariate time series. Probability Theory and Related Fields, 153, 233–268.
Engle, R. F. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica, pp. 987–1007.
— (2002). Dynamic conditional correlation: A simple class of multivariate generalized autoregressive conditional heteroskedasticity models. Journal of Business & Economic Statistics, 20 (3), 339–350.
—, Ledoit, O. and Wolf, M. (2019). Large dynamic covariance matrices. Journal of Business & Economic Statistics, 37 (2), 363–375.
Fan, J., Liao, Y. and Mincheva, M. (2013). Large covariance estimation by thresholding principal orthogonal complements. Journal of the Royal Statistical Society Series B: Statistical Methodology, 75 (4), 603–680.
—, Zhang, J. and Yu, K. (2012). Vast portfolio selection with gross-exposure constraints. Journal of the American Statistical Association, 107 (498), 592–606.
Fernández, C. and Steel, M. F. (1998). On Bayesian modeling of fat tails and skewness. Journal of the American Statistical Association, 93 (441), 359–371.
Fioruci, J. A., Ehlers, R. S. and Andrade Filho, M. G. (2014). Bayesian multivariate GARCH models with dynamic correlations and asymmetric error distributions. Journal of Applied Statistics, 41 (2), 320–331.
Fitch, A. M., Jones, M. B. and Massam, H. (2014). The performance of covariance selection methods that consider decomposable models only. Bayesian Analysis, 9 (3), 659–684.
Forni, M., Hallin, M., Lippi, M. and Reichlin, L. (2000). The generalized dynamic-factor model: Identification and estimation. Review of Economics and Statistics, 82 (4), 540–554.
—, —, — and — (2001). Coincident and leading indicators for the euro area. The Economic Journal, 111 (471), 62–85.
Freyaldenhoven, S. (2022). Factor models with local factors: determining the number of relevant factors. Journal of Econometrics, 229 (1), 80–102.
Frühwirth-Schnatter, S. (1994). Data augmentation and dynamic linear models. Journal of Time Series Analysis, 15 (2), 183–202.
Fu, W. and Knight, K. (2000). Asymptotics for lasso-type estimators. The Annals of Statistics, 28 (5), 1356–1378.
Gagliardini, P., Ossola, E. and Scaillet, O. (2016). Time-varying risk premium in large cross-sectional equity data sets. Econometrica, 84 (3), 985–1046.
George, E. I. and McCulloch, R. E. (1993). Variable selection via Gibbs sampling. Journal of the American Statistical Association, 88 (423), 881–889.
Giannone, D., Lenza, M. and Primiceri, G. E. (2021). Economic predictions with big data: The illusion of sparsity. Econometrica, 89 (5), 2409–2437.
Hallin, M. and Liška, R. (2007). Determining the number of factors in the general dynamic factor model. Journal of the American Statistical Association, 102 (478), 603–617.
Heckman, J. J. (1979). Sample selection bias as a specification error. Econometrica, pp. 153–161.
—, Lochner, L. J. and Todd, P. E. (2006). Earnings functions, rates of return and treatment effects: The Mincer equation and beyond. Handbook of the Economics of Education, 1, 307–458.
— and Robb Jr, R. (1985). Alternative methods for evaluating the impact of interventions: An overview. Journal of Econometrics, 30 (1-2), 239–267.
— and Vytlacil, E. (2005). Structural equations, treatment effects, and econometric policy evaluation. Econometrica, 73 (3), 669–738.
Hoff, P. D. (2007). Extending the rank likelihood for semiparametric copula estimation. The Annals of Applied Statistics, 1 (1), 265–283.
Huang, J., Ma, S. and Zhang, C.-H. (2008). Adaptive lasso for sparse high-dimensional regression models. Statistica Sinica, pp. 1603–1618.
Imbens, G. W. and Angrist, J. D. (1994). Identification and estimation of local average treatment effects. Econometrica, 62 (2), 467–475.
Ishwaran, H. and Rao, J. S. (2003). Detecting differentially expressed genes in microarrays using Bayesian model selection. Journal of the American Statistical Association, 98 (462), 438–455.
— and — (2005). Spike and slab variable selection: Frequentist and Bayesian strategies. The Annals of Statistics, 33 (2), 730–773.
Kass, R. and Wasserman, L. (1995). A reference Bayesian test for nested hypotheses with large samples. Journal of the American Statistical Association, 90 (431), 928–934.
Kapetanios, G., Pesaran, M. H. and Reese, S. (2021). Detection of units with pervasive effects in large panel data models. Journal of Econometrics, 221 (2), 510–541.
Kaufmann, S. and Schumacher, C. (2019). Bayesian estimation of sparse dynamic factor models with order-independent and ex-post mode identification. Journal of Econometrics, 210 (1), 116–134.
Kilian, L. and Lütkepohl, H. (2017). Structural Vector Autoregressive Analysis. Themes in Modern Econometrics, Cambridge University Press.
Kim, J.-M. and Jung, H. (2016). Linear time-varying regression with copula–DCC–GARCH models for volatility. Economics Letters, 145, 262–265.
Koller, D. and Friedman, N. (2009). Probabilistic Graphical Models: Principles and Techniques. MIT Press.
Koop, G., Korobilis, D. et al. (2010). Bayesian multivariate time series methods for empirical macroeconomics. Foundations and Trends in Econometrics, 3 (4), 267–358.
Kuo, L. and Mallick, B. (1998). Variable selection for regression models. Sankhyā: The Indian Journal of Statistics, Series B, pp. 65–81.
Lauritzen, S. L. (1996). Graphical Models, vol. 17. Clarendon Press.
Le Cam, L. M. and Yang, G. L. (2000). Asymptotics in Statistics: Some Basic Concepts. Springer Science & Business Media.
Ledoit, O. and Wolf, M. (2003). Improved estimation of the covariance matrix of stock returns with an application to portfolio selection. Journal of Empirical Finance, 10 (5), 603–621.
— and — (2004a). Honey, I shrunk the sample covariance matrix. Journal of Portfolio Management, 30 (4), 110–119.
— and — (2004b). A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis, 88 (2), 365–411.
— and — (2012). Nonlinear shrinkage estimation of large-dimensional covariance matrices. The Annals of Statistics, 40 (2), 1024–1060.
— and — (2017). Nonlinear shrinkage of the covariance matrix for portfolio selection: Markowitz meets Goldilocks. The Review of Financial Studies, 30 (12), 4349–4388.
— and — (2020). Analytical nonlinear shrinkage of large-dimensional covariance matrices. The Annals of Statistics, 48 (5), 3043–3065.
— and — (2022). Quadratic shrinkage for large covariance matrices. Bernoulli, 28 (3), 1519–1547.
Li, Z. R. and McCormick, T. H. (2019). An expectation conditional maximization approach for Gaussian graphical models. Journal of Computational and Graphical Statistics, 28 (4), 767–777.
Liu, H., Han, F., Yuan, M., Lafferty, J. and Wasserman, L. (2012). High-dimensional semiparametric Gaussian copula graphical models. The Annals of Statistics, 40 (4), 2293–2326.
—, Lafferty, J. and Wasserman, L. (2009). The nonparanormal: Semiparametric estimation of high dimensional undirected graphs. Journal of Machine Learning Research, 10 (10).
Mann, H. B. and Wald, A. (1943). On the statistical treatment of linear stochastic difference equations. Econometrica, pp. 173–220.
Mardia, K., Kent, J. and Bibby, J. (1979). Multivariate Analysis. Probability and Mathematical Statistics: a series of monographs and textbooks, Academic Press.
Miao, K., Phillips, P. C. and Su, L. (2023). High-dimensional VARs with common factors. Journal of Econometrics, 233 (1), 155–183.
Mincer, J. A. (1974). The human capital earnings function. In Schooling, Experience, and Earnings, NBER, pp. 83–96.
Mitchell, T. J. and Beauchamp, J. J. (1988). Bayesian variable selection in linear regression. Journal of the American Statistical Association, 83 (404), 1023–1032.
Mohammadi, A., Abegaz, F., Heuvel, E. and Wit, E. C. (2017). Bayesian modelling of Dupuytren disease by using Gaussian copula graphical models. Journal of the Royal Statistical Society Series C: Applied Statistics, 66 (3), 629–645.
— and Wit, E. C. (2015). Bayesian structure learning in sparse Gaussian graphical models. Bayesian Analysis, 10 (1), 109–138.
Mulgrave, J. J. and Ghosal, S. (2020). Bayesian inference in nonparanormal graphical models. Bayesian Analysis, 15 (2), 449–475.
— and — (2022). Regression-based Bayesian estimation and structure learning for nonparanormal graphical models. Statistical Analysis and Data Mining: The ASA Data Science Journal, 15 (5), 611–629.
— and — (2023). Bayesian analysis of nonparanormal graphical models using rank-likelihood. Journal of Statistical Planning and Inference, 222, 195–208.
Müller, D. and Czado, C. (2019a). Dependence modelling in ultra high dimensions with vine copulas and the graphical lasso. Computational Statistics & Data Analysis, 137, 211–232.
— and — (2019b). Selection of sparse vine copulas in high dimensions with the lasso. Statistics and Computing, 29 (2), 269–287.
Müller, P., Parmigiani, G. and Rice, K. (2007). FDR and Bayesian multiple comparisons rules. In Bayesian Statistics 8: Proceedings of the Eighth Valencia International Meeting June 2–6, 2006, Oxford University Press.
Nard, G. D., Engle, R. F., Ledoit, O. and Wolf, M. (2022). Large dynamic covariance matrices: Enhancements based on intraday data. Journal of Banking & Finance, 138, 106426.
Nelson, D. B. (1991). Conditional heteroskedasticity in asset returns: A new approach. Econometrica, pp. 347–370.
Neville, S. E., Ormerod, J. T. and Wand, M. P. (2014). Mean field variational Bayes for continuous sparse signal shrinkage: Pitfalls and remedies. Electronic Journal of Statistics, 8 (1), 1113–1151.
Newey, W. K. (2009). Two-step series estimation of sample selection models. The Econometrics Journal, 12 (suppl 1), S217–S229.
Oh, D. H. and Patton, A. J. (2016). High-dimensional copula-based distributions with mixed frequency data. Journal of Econometrics, 193 (2), 349–366.
— and — (2017). Modeling dependence in high dimensions with factor copulas. Journal of Business & Economic Statistics, 35 (1), 139–154.
— and — (2023). Dynamic factor copula models with estimated cluster assignments. Journal of Econometrics.
O'Hagan, A. (1995). Fractional Bayes factors for model comparison. Journal of the Royal Statistical Society: Series B (Methodological), 57 (1), 99–118.
Paci, L. and Consonni, G. (2020). Structural learning of contemporaneous dependencies in graphical VAR models. Computational Statistics & Data Analysis, 144, 106880.
Pakel, C., Shephard, N., Sheppard, K. and Engle, R. F. (2021). Fitting vast dimensional time-varying covariance models. Journal of Business & Economic Statistics, 39 (3), 652–668.
Patton, A. J. (2009). Copula-based models for financial time series. In Handbook of Financial Time Series, Springer, pp. 767–785.
Pesaran, M. H. and Yamagata, T. (2023). Testing for alpha in linear factor pricing models with a large number of securities. Journal of Financial Econometrics.
Pitt, M., Chan, D. and Kohn, R. (2006). Efficient Bayesian inference for Gaussian copula regression models. Biometrika, 93 (3), 537–554.
Poignard, B. and Asai, M. (2023). High-dimensional sparse multivariate stochastic volatility models. Journal of Time Series Analysis, 44 (1), 4–22.
Rothman, A. J., Levina, E. and Zhu, J. (2009). Generalized thresholding of large covariance matrices. Journal of the American Statistical Association, 104 (485), 177–186.
Rue, H. and Held, L. (2005). Gaussian Markov Random Fields: Theory and Applications. CRC Press.
Sentana, E. (1995). Quadratic ARCH models. The Review of Economic Studies, 62 (4), 639–661.
— (2009). The econometrics of mean-variance efficiency tests: a survey. The Econometrics Journal, 12 (3), C65–C101.
Sims, C. A. (1980). Macroeconomics and reality. Econometrica, pp. 1–48.
So, M. K. and Yip, I. W. (2012). Multivariate GARCH models with correlation clustering. Journal of Forecasting, 31 (5), 443–468.
Srivastava, M. S. and Du, M. (2008). A test for the mean vector with fewer observations than the dimension. Journal of Multivariate Analysis, 99 (3), 386–402.
Stock, J. H. and Watson, M. W. (1999). Forecasting inflation. Journal of Monetary Economics, 44 (2), 293–335.
— and — (2002). Macroeconomic forecasting using diffusion indexes. Journal of Business & Economic Statistics, 20 (2), 147–162.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B: Statistical Methodology, 58 (1), 267–288.
Tse, Y. K. and Tsui, A. K. C. (2002). A multivariate generalized autoregressive conditional heteroscedasticity model with time-varying correlations. Journal of Business & Economic Statistics, 20 (3), 351–362.
Uhlig, H. (2005). What are the effects of monetary policy on output? Results from an agnostic identification procedure. Journal of Monetary Economics, 52 (2), 381–419.
Uusitalo, R., Conneely, K. et al. (1998). Estimating Heterogeneous Treatment Effects in the Becker Schooling Model. Tech. rep., Department of Economics.
Van der Vaart, A. W. (2000). Asymptotic Statistics, vol. 3. Cambridge University Press.
Villani, M. (2001). Fractional Bayesian lag length inference in multivariate autoregressive processes. Journal of Time Series Analysis, 22 (1), 67–86.
Whittaker, J. (2009). Graphical Models in Applied Multivariate Statistics. Wiley Publishing.
Winship, C. and Western, B. (2016). Multicollinearity and model misspecification. Sociological Science, 3 (27), 627–649.
Yang, C. F. (2021). Common factors and spatial dependence: An application to US house prices. Econometric Reviews, 40 (1), 14–50.
Zhao, P. and Yu, B. (2006). On model selection consistency of lasso. The Journal of Machine Learning Research, 7, 2541–2563.

Appendix A

Appendix to Chapter 1

A.1 Kolmogorov-Smirnov Test Results

Table A.1: Kolmogorov-Smirnov Test Results

                                              Critical Values
Hypothesis        Test Statistic      10%        5%         1%
Panel A: IVQR Levels Model with All Covariates
No Effect             13.4708       2.4678     2.7320     3.2467
Location Shift         4.1822       2.9172     3.1575     3.8388
Dominance              0            2.5525     2.8284     3.3369
Exogeneity             2.2948       3.0729     3.3390     3.8828
Panel B: IVQR Proxy Model with All Covariates
No Effect             11.5599       2.5605     2.7606     3.2280
Location Shift         3.2427       2.7954     3.1367     3.7127
Dominance              0            2.4995     2.8123     3.2862
Exogeneity             2.6846       3.0606     3.3321     3.8346

Notes: "No Effect" is the hypothesis of no treatment effect, with treatment effect function $\alpha(\tau) = 0$ for all $\tau \in (0, 1)$. "Location Shift" refers to a constant treatment effect across all quantiles, $\alpha(\tau) = \alpha$, signifying a uniform shift in the outcome due to the treatment. "Dominance" hypothesizes a non-negative treatment effect, $\alpha(\tau) \geq 0$, implying that the treatment does not result in worse outcomes at any quantile relative to the control. "Exogeneity" refers to the hypothesis that the treatment variable of interest is exogenous, that is, not endogenously determined.
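To make the construction of Table A.1 concrete, the following sketch computes a sup-type (Kolmogorov-Smirnov) statistic from a grid of quantile-specific estimates. It is a minimal illustration under assumed inputs: the function name, the quantile grid, and the simulated estimates and standard errors are ours, not the chapter's estimation output, and in practice the critical values would come from resampling rather than being computed here.

```python
import numpy as np

def ks_statistic(alpha_hat, se_hat, alpha_null=0.0):
    """Sup-Wald (Kolmogorov-Smirnov type) statistic over a quantile grid.

    alpha_hat : array of IVQR treatment-effect estimates alpha(tau_j)
    se_hat    : array of pointwise standard errors for alpha(tau_j)
    alpha_null: hypothesized process, e.g. 0 for "No Effect", or the mean
                of alpha_hat as a crude recentring for "Location Shift"
    """
    t_process = np.abs(alpha_hat - alpha_null) / se_hat
    return t_process.max()

# hypothetical inputs on a grid tau in (0.1, ..., 0.9)
taus = np.linspace(0.1, 0.9, 17)
rng = np.random.default_rng(0)
alpha_hat = 0.10 + 0.02 * (taus - 0.5) + rng.normal(0, 0.01, taus.size)
se_hat = np.full(taus.size, 0.012)

print(f"KS (No Effect):      {ks_statistic(alpha_hat, se_hat, 0.0):.4f}")
print(f"KS (Location Shift): {ks_statistic(alpha_hat, se_hat, alpha_hat.mean()):.4f}")
```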
A.2 Residual Plots for Checking Linearity

Figure A.1: Residual Plots of the Levels Model

A.3 Mean Estimates of the Levels and Proxy Models

Table A.2: Mean Estimates of Returns to Education: Levels Model

                           LS                                       IV
               Educ only   Base only    All          Educ only   Base only    All
Education      0.1022***   0.1100***    0.1220***    0.1054***   0.1160***    0.1276***
               (0.0108)    (0.0096)     (0.0095)     (0.0117)    (0.0104)     (0.0103)
Age                        0.1039***    0.0897***                0.1040***    0.0892***
                           (0.0105)     (0.0109)                 (0.0105)     (0.0109)
Age^2                      -0.0011***   -0.0010***               -0.0011***   -0.0010***
                           (0.0001)     (0.0001)                 (0.0001)     (0.0001)
Female                     -0.3180***   -0.2487***               -0.3156***   -0.2465***
                           (0.0400)     (0.0397)                 (0.0401)     (0.0397)
White                      -0.1001      -0.1009                  -0.0980***   -0.0996
                           (0.0722)     (0.0689)                 (0.0722)     (0.0690)
Married                                 0.1035*                               0.1102*
                                        (0.0500)                              (0.0503)
Union                                   0.1114*                               0.1130*
                                        (0.0472)                              (0.0472)
Tenure                                  0.0210***                             0.0211***
                                        (0.0028)                              (0.0028)
Intercept      1.0077***   -1.0949      -1.0669      0.9629***   -1.1879      -1.1416
               (0.1533)    (0.2612)     (0.2523)     (0.1656)    (0.2686)     (0.2581)
F-statistic    69.06       69.06        57.6         81.19       67.58        56.03
Adj. R^2 (%)   11.5        33.4         40.3         11.5        33.4         40.3
N              678         674          663          678         674          663

Notes: Figures in brackets are standard errors. *** indicates statistical significance at the 1% level, ** at the 5% level, and * at the 10% level. The instrument employed is the education level of the first twin as reported by the second twin.

Table A.3: Mean Estimates of Returns to Education: Proxy Model

                           LS                                       IV
               Educ only   Base only    All          Educ only   Base only    All
Education      0.1079***   0.1050***    0.1185***    0.1130***   0.1123***    0.1250***
               (0.0118)    (0.0103)     (0.0102)     (0.0130)    (0.0113)     (0.0113)
Age                        0.1062***    0.0919***                0.1062***    0.0912***
                           (0.0107)     (0.0111)                 (0.0107)     (0.0112)
Age^2                      -0.0011***   -0.0011***               -0.0011***   -0.0010***
                           (0.0001)     (0.0001)                 (0.0001)     (0.0001)
Female                     -0.3198***   -0.2535***               -0.3178***   -0.2515***
                           (0.0405)     (0.0404)                 (0.0405)     (0.0404)
White                      -0.1228      -0.1256                  -0.1179      -0.1220
                           (0.0751)     (0.0720)                 (0.0752)     (0.0721)
Married                                 0.1010*                               0.1075*
                                        (0.0508)                              (0.0510)
Union                                   0.1138*                               0.1146*
                                        (0.0485)                              (0.0485)
Tenure                                  0.0204***                             0.0206***
                                        (0.0029)                              (0.0029)
Father's Edu.  -0.0084     0.0141       0.0101       -0.0096***  0.0124       0.0086
               (0.0080)    (0.0073)     (0.0071)     (0.0081)    (0.0074)     (0.0072)
Intercept      1.0255***   -1.2393      -1.1708      0.9681      -1.3273      -1.2377
               (0.1620)    (0.2700)     (0.2615)     (0.1726)    (0.2759)     (0.2659)
F-statistic    44.29       57.77        50.12        40.71       56.87        48.91
Adj. R^2 (%)   11.6        33.9         40.3         11.5        33.9         40.3
N              661         657          646          661         657          646

Notes: Figures in brackets are standard errors. *** indicates statistical significance at the 1% level, ** at the 5% level, and * at the 10% level. The instrument employed is the education level of the first twin as reported by the second twin.

Appendix B

Appendix to Chapter 2

B.1 Discussion of Rescaled Spikes and Slab

This section addresses the collinearity, or potential rank deficiency, that arises in high-dimensional datasets. We adopt a rescaled version of the spike and slab priors for the coefficients of the lagged variables within the transition matrix; notably, the spike and slab prior shares the properties of ridge regression in this respect. We evaluate the estimators derived from the rescaled spike and slab prior by translating the Bayesian estimation problem into a Frequentist ridge estimation problem, and we examine how the posterior mean asymptotically maximizes the posterior distribution through a sensitivity analysis of the choice of coefficient priors.
Specifically, by demonstrating that the log-ratio of the posterior distribution achieves local asymptotic normality, we establish that the posterior mean is the maximizer of the limiting distribution. To induce sparsity in the actual Bayesian estimation, we use the "Zcut" method proposed by Ishwaran and Rao (2005).

B.1.1 Rescaled Spike and Slab

We consider a selective shrinkage estimation method to obtain an updated estimate of the high-dimensional transition matrices $B$. To address the collinearity (or possible rank deficiency) of the matrices, we employ the spike and slab priors pioneered by Mitchell and Beauchamp (1988) and subsequently developed in the literature. In particular, to keep the prior's effect on the posterior from vanishing, we use the rescaling technique suggested by Ishwaran and Rao (2005). The rescaled spike and slab model is a type of weighted generalized ridge regression (WGRR, hereafter) estimator, which can also be viewed as a Bayesian estimator. We assume covariate restrictions similar to those used in Ishwaran and Rao (2005).

Assumption 3. Let $Z$ be the $T \times N$ matrix including (estimated) factors from the model (2.2.4). We assume the following conditions:
(i) $\sum_{t=1}^{T} z_{ti} = 0$ and $\sum_{t=1}^{T} z_{ti}^{2} = T$ for each $i = 1, \dots, N$;
(ii) $\max_{1 \le t \le T} \|z_t\| / \sqrt{T} \to 0$, where $\|\cdot\|$ is the $\ell_2$-norm;
(iii) $Z'Z$ is positive definite;
(iv) $\psi_T = Z'Z/T \to \psi_0$, where $\psi_0$ is positive definite.

Condition (i) says the covariates are centered and rescaled. Condition (ii) prevents any single covariate observation from dominating. Conditions (iii) and (iv) rule out a non-invertible design matrix.

Now, for simplicity, under the model with $k = 1$, $B_1 = B$, and $Z = X_{t-1}$, the rescaled spike and slab model with continuous bimodal priors is

$$x_t^* \mid x_{t-1}, B, \Lambda, \sigma^2 \overset{ind}{\sim} N_N\big(\Lambda' f_t + B' z_{t-1},\; \sigma^2 \lambda_T I_N\big), \quad t = 1, \dots, T, \tag{B.1.1}$$
$$B_i \mid I_i, \tau_i^2 \overset{ind}{\sim} N(0, I_i \tau_i^2), \quad i = 1, \dots, N,$$
$$I_i \mid v_0, \eta \overset{iid}{\sim} (1 - \eta)\, h_{v_0}(\cdot) + \eta\, h_1(\cdot),$$
$$\tau_i^{-2} \mid a_\tau, b_\tau \overset{iid}{\sim} \mathrm{Gamma}(a_\tau, b_\tau), \qquad \eta \sim \mathrm{Uniform}[0, 1], \qquad \sigma^{-2} \sim \mathrm{Gamma}(a_\sigma, b_\sigma),$$

where $x_t^*$ is the rescaled vector-valued variable $x_t$, $\lambda_T$ is a penalty term controlling the variance, $I_i \tau_i^2 = \gamma_i$ is a hypervariance having a continuous bimodal distribution with a spike at $v_0$ and a right-continuous tail,¹ and $h_c(\cdot)$ is a discrete measure centered at the value $c$. The parameter $\eta$ controls the probability that $I_i$ equals 1 rather than $v_0 = 0.005$. We draw $\tau_i^2$ and $\sigma^2$ from inverse-gamma distributions with $(a_\tau, b_\tau) = (5, 50)$ and $(a_\sigma, b_\sigma) = (0.0001, 0.0001)$, respectively.

¹ This part corresponds to $(B \mid \gamma) \sim N(0, \Gamma)$ in the spike and slab hierarchical Bayesian model.

The rescaled spike and slab priors effectively correct for collinearity by drawing on the properties of WGRR estimation, which allows us to induce sparsity (or stochastic variable selection) simultaneously while estimating the model through MCMC (Beyeler and Kaufmann, 2021).
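To fix ideas, the following minimal sketch draws the hypervariances $\gamma_i = I_i \tau_i^2$ and coefficients $B_i$ from the prior hierarchy in (B.1.1), using the stated values $v_0 = 0.005$ and $(a_\tau, b_\tau) = (5, 50)$. The rate parameterization of the Gamma draw and all variable names are our assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
N = 200                         # coefficients in one column of B
v0 = 0.005                      # spike value
a_tau, b_tau = 5.0, 50.0        # hyperparameters for tau^{-2}

# eta ~ Uniform[0, 1] controls the prior slab probability
eta = rng.uniform()

# I_i in {v0, 1}: slab (1) with probability eta, spike (v0) otherwise
I = np.where(rng.uniform(size=N) < eta, 1.0, v0)

# tau_i^{-2} ~ Gamma(a_tau, rate b_tau), so tau_i^2 is inverse-gamma
tau2 = 1.0 / rng.gamma(shape=a_tau, scale=1.0 / b_tau, size=N)

# hypervariance gamma_i = I_i tau_i^2; then B_i | gamma_i ~ N(0, gamma_i)
gamma = I * tau2
B = rng.normal(0.0, np.sqrt(gamma))

print(f"slab share: {np.mean(I == 1.0):.2f}; mean prior sd: {np.sqrt(gamma).mean():.2f}")
```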
B.1.2 Asymptotic Properties of the Rescaled Spikes and Slab

First, we connect the posterior mean from the rescaled spike and slab model with the WGRR in order to analyze the asymptotic properties of the estimates. Recall that $\lambda_T$ in (B.1.1) is a penalty term. Estimation is then equivalent to solving the optimization problem

$$\hat{\theta}_T^*(\gamma, \sigma^2) = \begin{pmatrix} \hat{\sigma}_T \hat{\Lambda}_T^*(\gamma, \sigma^2)/\sqrt{T} \\ \hat{\sigma}_T \hat{B}_T^*(\gamma, \sigma^2)/\sqrt{T} \end{pmatrix} = \arg\min_{\Lambda, B}\left\{ (X - F\Lambda - ZB)' W (X - F\Lambda - ZB) + \begin{pmatrix} \Lambda - \Lambda_0 \\ B - B_0 \end{pmatrix}' \Delta \begin{pmatrix} \Lambda - \Lambda_0 \\ B - B_0 \end{pmatrix} \right\}, \tag{B.1.2}$$

where $W = I_T$ is a $T \times T$ diagonal matrix and $\Delta$ is an $(M+N) \times (M+N)$ symmetric positive definite matrix. Let $\Gamma$ be the $(M+N) \times (M+N)$ diagonal matrix $\mathrm{diag}\{\gamma_1, \dots, \gamma_M, \gamma_{M+1}, \dots, \gamma_{M+N}\} = \mathrm{diag}\{0, \dots, 0, \gamma_{M+1}, \dots, \gamma_{M+N}\}$. Thus, when $\Delta = \sigma^2 \lambda_T \Gamma^{-1}$, the problem is equivalent to the model in (B.1.1) and gives the ridge solution

$$\begin{pmatrix} \hat{\Lambda}_T^*(\gamma, \sigma^2) \\ \hat{B}_T^*(\gamma, \sigma^2) \end{pmatrix} = \left[ \begin{pmatrix} F'F & F'Z \\ Z'F & Z'Z \end{pmatrix} + \Delta \right]^{-1} \begin{pmatrix} F'X^* \\ Z'X^* \end{pmatrix} = \hat{\sigma}_T^{-1} T^{1/2} \left[ \begin{pmatrix} F'F & F'Z \\ Z'F & Z'Z \end{pmatrix} + \sigma^2 \lambda_T \Gamma^{-1} \right]^{-1} \begin{pmatrix} F'X \\ Z'X \end{pmatrix}, \tag{B.1.3}$$

where $\hat{\Lambda}_T^*(\gamma, \sigma^2) = \hat{\Lambda}_{OLS}$ and $\hat{B}_T^*(\gamma, \sigma^2) = E(B_0 \mid \gamma, \sigma^2, X^*)$ is the conditional posterior mean. Its (unconditional) posterior mean is

$$E\left[\begin{pmatrix}\Lambda \\ B\end{pmatrix}\,\Big|\, X^*\right] = \hat{\sigma}_T^{-1} T^{1/2} \int \left[ \begin{pmatrix} F'F & F'Z \\ Z'F & Z'Z \end{pmatrix} + \sigma^2 \lambda_T \Gamma^{-1} \right]^{-1} \begin{pmatrix} F'X \\ Z'X \end{pmatrix} (\phi \times \xi)(d\gamma, d\sigma^2 \mid X^*),$$

which is a weighted (model) average of ridge (shrinkage) estimates, where the posteriors $\gamma \sim \phi(d\gamma)$ and $\sigma^2 \sim \xi(d\sigma^2)$ adaptively control the weights. Therefore, the WGRR estimator can be viewed as the Bayesian posterior mean estimator of $B$ under a multivariate Gaussian prior on the linear regression parameters.

Remark 5. Note that the penalization applies only to the coefficients on $Z$, i.e., $B$. The coefficients on the estimated factors $\tilde{F}$ coincide with the OLS estimator in the simultaneous estimation. In the analysis that follows, we consider the consistency and optimality of $\hat{B}$ through local asymptotics.

B.1.3 Consistency of Estimators

Recall that no penalty applies to the estimated factors. The objective function (B.1.2) is then the least-squares objective in $\Lambda$ given the estimated $\tilde{F}$. Thus, given $\hat{B}_i$, we have $\hat{\Lambda}_i = (\tilde{F}'\tilde{F})^{-1}\tilde{F}'(X - Z\hat{B}_i)$. After concentrating out $\Lambda_i$, the optimization problem becomes

$$\hat{\sigma}_T \hat{B}_T^*(\gamma, \sigma^2)/\sqrt{T} = \arg\min_{B}\left\{ \big(M_{\tilde{F}}(X - ZB)\big)' W \big(M_{\tilde{F}}(X - ZB)\big) + (B - B_0)'\Delta(B - B_0) \right\},$$

where $M_{\tilde{F}} = I_T - \tilde{F}(\tilde{F}'\tilde{F})^{-1}\tilde{F}'$. We then have the following result.

Theorem 1. Suppose $X$ is concentrated out by estimated factors and loadings. Assume Assumption 2 holds, where the $\epsilon_{it}$ are independent with $E(\epsilon_{it}) = 0$ and $E(\epsilon_{it}^2) = \sigma_0^2$. Further assume that there exist some $c_1, c_2 > 0$ such that $\gamma_i \ge c_1$ and $\sigma^2 \le c_2$, which bounds both $\phi$ and $\xi$. Then the rescaled conditional posterior mean satisfies

$$\hat{\theta}_T^{i*} = \hat{\sigma}_T \hat{B}_T^{i*}/\sqrt{T} = \hat{B}_T^{i} + O_p(\lambda_T^*) \overset{p}{\to} B^i, \quad i = 1, \dots, N,$$

provided $\lambda_T^* = \lambda_T/T \to 0$.

From the result above, we can estimate the transition matrix $\hat{B}_T = (\hat{B}_T^1, \dots, \hat{B}_T^N)$. Theorem 1 implies that the posterior mean is asymptotically consistent for $B^i$, $i = 1, \dots, N$, provided the penalization effect $\lambda_T/T$ goes to 0. Conversely, this suggests taking $\lambda_T = T$ if we want an estimate whose shrinkage effect persists asymptotically, which is desirable for model selection (Ishwaran and Rao, 2005). Notice that Theorem 1 is the same consistency argument as for the bridge estimator (Theorem 1 of Fu and Knight, 2000) and Theorem 2 of Ishwaran and Rao (2005). Their model specifications differ from ours, but similar results obtain using the concentration argument discussed above.
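The ridge solution (B.1.3) is a single penalized least-squares solve in which only the $B$-block is penalized, consistent with Remark 5. The sketch below illustrates this on simulated placeholder data; it computes the conditional posterior mean up to the $\hat{\sigma}_T \sqrt{T}$ rescaling, and all dimensions and inputs are ours.

```python
import numpy as np

rng = np.random.default_rng(1)
T, M, N = 300, 2, 20           # sample size, factors, variables
F = rng.normal(size=(T, M))    # (estimated) factors
Z = rng.normal(size=(T, N))    # lagged regressors
X = rng.normal(size=(T, N))    # responses (placeholder data)

sigma2, lam_T = 1.0, float(T)          # penalty lambda_T = T
gamma = rng.uniform(0.5, 2.0, size=N)  # hypervariances for the B-block

# stack regressors; penalize only the B-block: Delta = sigma^2 lambda_T Gamma^{-1}
G = np.hstack([F, Z])                  # T x (M + N)
Delta = np.zeros(M + N)
Delta[M:] = sigma2 * lam_T / gamma     # zero penalty on the Lambda-block
coef = np.linalg.solve(G.T @ G + np.diag(Delta), G.T @ X)

Lambda_hat, B_hat = coef[:M], coef[M:]
print("Lambda block:", Lambda_hat.shape, " B block:", B_hat.shape)
```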
B.1.4 Local Asymptotics of Estimators

We consider local asymptotics of the posterior to quantify its behavior and its sensitivity to the choice of prior for $B$. Assume $\lambda_T = T$, so that the shrinkage effect on the posterior mean does not disappear asymptotically. Then, from (B.1.1), we can rewrite the rescaled spike and slab model, with $X$ concentrated out, as

$$(Y_T^* \mid Z, B^*) \overset{ind}{\sim} N(ZB^*, T), \quad t = 1, \dots, T, \tag{B.1.4}$$
$$(B \mid \gamma) \sim P(dB \mid \gamma), \qquad \gamma \sim \phi(d\gamma),$$

where $Y_T^* = \sqrt{T}\, M_{\tilde{F}} X_T^*$, $Z = M_{\tilde{F}} X_{t-1}$, $B^* = B/\sqrt{T}$, and $P(dB \mid \gamma)$ is the prior measure for $B$ given $\gamma$. Consider the posterior measure for $B$ given $Y_T^* = (Y_{T1}^*, \dots, Y_{TT}^*)'$; we write $\Pi_P(s \mid Y_T^*)$. Without loss of generality, assume $\sigma_0^2 = 1$. We develop the argument column by column, i.e., $Y_T^{i*} = (Y_{T1}^{i*}, \dots, Y_{TT}^{i*})'$, $B^i = (B_1^i, \dots, B_N^i)'$, and $E_T^i = (\epsilon_{T1}, \dots, \epsilon_{TT})'$.

Remark 6. After rescaling the parameter, we say a sequence of statistical models satisfies local asymptotic normality if its likelihood-ratio processes converge to those of a Gaussian model; in other words, the local statistical experiments converge to a normal experiment. One important example is a smooth parametric model obtained from repeated sampling (Van der Vaart, 2000). See also Le Cam and Yang (2000).

Remark 7. Recall that the constraint set in ridge regression can be represented as a sphere centered at the origin, namely $0$. In $\mathbb{R}^2$, the contour sets of the sum of squared errors are ellipses: the least-squares estimator is the center of the ellipses, while the ridge estimator is the point on the circle where a contour is tangent, shrinking the least-squares coefficients toward the zero vector. The set of ridge estimators can thus be represented as points lying on the sphere centered at the origin with radius $\sqrt{\tau}$, the constraint.² We write this as $S(0, \sqrt{\tau})$.

² Under a simple linear model with ridge parameter $\lambda$, the constraint can be written as $Y'X(X'X + \lambda I_p)^{-1}(X'X + \lambda I_p)^{-1}X'Y = \tau$, where $p$ is the number of covariates in this case.

Theorem 2. Suppose $X$ is concentrated out by estimated factors and loadings. Assume that $P$ has a density $f$ that is continuous and positive everywhere. Then, for the true regression model $Y_T^{i*} = ZB^i + E_T^i$ for each column $i = 1, \dots, N$, with $E(\epsilon_{Tt}) = 0$, $E(\epsilon_{Tt}^2) = \sigma^2 = 1$, and $E(\epsilon_{Tt}^4) < M$ for some $M < \infty$, if Assumption 2 holds, then for each $B_1^i \in \mathbb{R}^N$ and each $C > 0$,

$$\log\frac{\Pi_P\big(S(B_1^i, C/\sqrt{T}) \mid Y_T^{i*}\big)}{\Pi_P\big(S(B^i, C/\sqrt{T}) \mid Y_T^{i*}\big)} \overset{d}{\to} \log\frac{f(B_1^i)}{f(B^i)} - \frac{1}{2}(B_1^i - B^i)'\psi_0(B_1^i - B^i) + (B_1^i - B^i)' \mathcal{N}, \tag{B.1.5}$$

where $\mathcal{N} \sim N(0, \psi_0)$.

Theorem 2 shows that the asymptotic behavior of the log posterior-probability ratio achieves local asymptotic normality, which identifies the sensitivity to the prior on $B$. For example, suppose $B$ has a normal prior with mean $0$ and variance $\Gamma_0$, as in our case. Then, from (B.1.5), the limiting distribution, as a function of $B_1^i$, is

$$\frac{1}{2}B^{i\prime}\Gamma_0^{-1}B^i - \frac{1}{2}B_1^{i\prime}\Gamma_0^{-1}B_1^i - \frac{1}{2}(B_1^i - B^i)'\psi_0(B_1^i - B^i) + (B_1^i - B^i)'\mathcal{N},$$

which achieves its maximum at $B_1^i = (\psi_0 + \Gamma_0^{-1})^{-1}(\psi_0 B^i + \mathcal{N})$, distributed as $N(\Omega B^i, \Omega\psi_0^{-1}\Omega')$, where $\Omega = (I + \psi_0^{-1}\Gamma_0^{-1})^{-1}$.³ Denote this limiting normal distribution by $P(s \mid \gamma)$; it characterizes the limiting ridge estimator.

³ Note $(\psi + \Gamma^{-1})^{-1} = (\psi(I + \psi^{-1}\Gamma^{-1}))^{-1} = (I + \psi^{-1}\Gamma^{-1})^{-1}\psi^{-1} = \Omega\psi^{-1}$.
Theorem 3. Suppose $X$ is concentrated out by estimated factors and loadings and the transition matrix follows $B^i \sim N(0, \Gamma)$ for some fixed $\Gamma$. Then, under the same conditions as Theorem 2, the posterior mean of (B.1.4) satisfies $\hat{B}_T^{i*}(\gamma) = E(B^i \mid \gamma, Y_T^*) \overset{d}{\to} P(s \mid \gamma)$ for $i = 1, \dots, N$.

Theorem 3 implies that the posterior mean asymptotically maximizes the posterior distribution under a normal prior for $B^i$ with fixed hypervariance $\Gamma_0$. Furthermore, the hypervariance induces the sparsity of the transition matrix, which motivates a rule for controlling the hypervariance. We employ a thresholding rule called "Zcut," suggested by Ishwaran and Rao (2003, 2005). The Zcut method selects element $s$ of the posterior mean $\hat{B}_T^{i*}(\gamma) = (\hat{B}_{1,T}^{i*}, \dots, \hat{B}_{N,T}^{i*})'$ if $|\hat{B}_{s,T}^{i*}| \ge z_{\alpha/2}$, where $z_{\alpha/2}$ is the $100(1 - \alpha/2)$ percentile of a standard normal distribution; for example, $\alpha = 0.05$ gives $z_{\alpha/2} = 1.96$. For the oracle performance of the Zcut method and its consistency in selecting the true variables, see Ishwaran and Rao (2005). Here, instead, we describe how $\gamma$ is actually updated to produce $B$ and selected by the Zcut method within Gibbs sampling.

Recall our rescaled spike and slab model in (B.1.1). The prior for $B_s^i$ is normal with mean $0$ and variance $I_i\tau_i^2 = \gamma_i$, a hypervariance. To update the hypervariance, we first simulate $I_i$ from its conditional distribution given $B^i$, $\tau$, and $\eta$:

$$I_i \mid B^i, \tau, \eta \overset{ind}{\sim} \frac{\eta_{1,i}}{\eta_{1,i} + \eta_{2,i}}\, h_{v_0}(\cdot) + \frac{\eta_{2,i}}{\eta_{1,i} + \eta_{2,i}}\, h_1(\cdot),$$

where $\eta_{1,i} = (1 - \eta)\, v_0^{-1/2} \exp\!\big(-\tfrac{(B_s^i)^2}{2 v_0 \tau_i^2}\big)$ and $\eta_{2,i} = \eta \exp\!\big(-\tfrac{(B_s^i)^2}{2\tau_i^2}\big)$ for $s = 1, \dots, N$. Thus $I_i$ takes the value 1 or $v_0$. Second, we draw $\tau_i^{-2}$ from its conditional distribution,

$$\tau_i^{-2} \mid B^i, I \overset{ind}{\sim} \mathrm{Gamma}\left(a_\tau + \frac{1}{2},\; b_\tau + \frac{(B_s^i)^2}{2 I_i}\right), \quad i = 1, \dots, N.$$

Third, we sample $\eta$ from its conditional distribution,

$$\eta \mid \gamma \sim \mathrm{Beta}\big(1 + \#\{s : I_s = 1\},\; 1 + \#\{s : I_s = v_0\}\big), \quad s = 1, \dots, N.$$

Fourth, we draw $\sigma^{-2}$ from its conditional distribution,

$$\sigma^{-2} \mid B^i, Y^{i*} \sim \mathrm{Gamma}\left(a_\sigma + \frac{T}{2},\; b_\sigma + \frac{1}{2T}\big\|Y^{i*} - ZB^i\big\|^2\right).$$

Finally, we update $\gamma$ by setting $\gamma_i = I_i\tau_i^2$ for $i = 1, \dots, N$. In this way we induce sparsity in the transition matrix and obtain the model residuals, which we use in the Bayesian graphical model selection method discussed next.
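A minimal sketch of one pass of the hypervariance updates above, followed by the Zcut rule. It conditions on a single column $B^i$, assumes rate parameterizations for the Gamma draws, and for simplicity applies the cut directly to the coefficients rather than to the rescaled posterior means of the full sampler.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)
N, T = 50, 200
v0, a_tau, b_tau = 0.005, 5.0, 50.0
a_sig, b_sig = 1e-4, 1e-4

Z = rng.normal(size=(T, N))
B = np.where(rng.uniform(size=N) < 0.2, rng.normal(0, 1, N), 0.0)  # sparse truth
Y = Z @ B + rng.normal(size=T)
tau2, eta = np.ones(N), 0.5

# (1) I_s | B, tau, eta : mixture over {v0, 1}
w_spike = (1 - eta) * v0 ** -0.5 * np.exp(-B**2 / (2 * v0 * tau2))
w_slab = eta * np.exp(-B**2 / (2 * tau2))
I = np.where(rng.uniform(size=N) < w_slab / (w_spike + w_slab), 1.0, v0)

# (2) tau_s^{-2} | B, I ~ Gamma(a_tau + 1/2, rate b_tau + B_s^2 / (2 I_s))
tau2 = 1.0 / rng.gamma(a_tau + 0.5, 1.0 / (b_tau + B**2 / (2 * I)))

# (3) eta | gamma ~ Beta(1 + #{I_s = 1}, 1 + #{I_s = v0})
eta = rng.beta(1 + np.sum(I == 1.0), 1 + np.sum(I == v0))

# (4) sigma^{-2} | B, Y ~ Gamma(a_sig + T/2, rate b_sig + ||Y - Z B||^2 / (2T))
resid = Y - Z @ B
sigma2 = 1.0 / rng.gamma(a_sig + T / 2, 1.0 / (b_sig + resid @ resid / (2 * T)))

# update hypervariance and apply the Zcut rule at alpha = 0.05
gamma = I * tau2
z_cut = norm.ppf(1 - 0.05 / 2)          # 1.96
selected = np.abs(B) >= z_cut           # keep coefficients passing the cut
print(f"eta = {eta:.2f}, sigma2 = {sigma2:.2f}, selected = {selected.sum()}")
```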
B.2 Proofs

B.2.1 Proof of Theorem 1

This proof follows the same arguments as Fu and Knight (2000) and Ishwaran and Rao (2005), applied to our model and theoretical results. We start by proving that $\hat{B}_{0,T}$ is a consistent estimator. From (iv) in Assumption 2, by concentrating out the estimated factors and loadings, we have $\hat{\Lambda}_i = (\tilde{F}'\tilde{F})^{-1}\tilde{F}'(X - Z\hat{B}_0^i)$. After concentrating out $\Lambda_i$, we have $Y = M_{\tilde{F}}X$ and $Z = M_{\tilde{F}}X_{t-1}$, where $M_{\tilde{F}} = I_T - \tilde{F}(\tilde{F}'\tilde{F})^{-1}\tilde{F}'$. Then, for each column $Y^i$ of $Y$, $i = 1, \dots, N$,

$$\hat{B}_{0,T}^i = (Z'Z)^{-1}Z'Y^i = B_0^i + (Z'Z)^{-1}Z'E^i = B_0^i + \Delta_T \overset{p}{\to} B_0^i,$$

where $\hat{B}_{0,T} = (\hat{B}_{0,T}^1, \dots, \hat{B}_{0,T}^N)$, $E = (\varepsilon_1, \dots, \varepsilon_T)'$, $E(\Delta_T) = 0$, and $\mathrm{Var}(\Delta_T) = \sigma_0^2 \psi_T^{-1}/T$. Then, from (B.1.3) and the Woodbury matrix identity,

$$\hat{\theta}_T^{i*}(\gamma, \sigma^2) = \hat{\sigma}_T \hat{B}_T^{i*}(\gamma, \sigma^2)/\sqrt{T} = \big(Z'Z + \sigma^2\lambda_T\Gamma^{-1}\big)^{-1}Z'Y^i = \Big(I_N - \big(\sigma^{-2}\lambda_T^{-1}Z'Z + \Gamma^{-1}\big)^{-1}\Gamma^{-1}\Big)\hat{B}_{0,T}^i.$$

Therefore,

$$\hat{\theta}_T^{i*} = \hat{B}_{0,T}^i - \int \big(\sigma^{-2}\lambda_T^{-1}Z'Z + \Gamma^{-1}\big)^{-1}\Gamma^{-1}\hat{B}_{0,T}^i \,(\phi \times \xi)\big(d\gamma, d\sigma^2 \mid Y^{i*}\big) = \hat{B}_{0,T}^i - \lambda_T^* \int \sigma^2 S_T^{-1}\Gamma^{-1}\hat{B}_{0,T}^i \,(\phi \times \xi)\big(d\gamma, d\sigma^2 \mid Y^{i*}\big), \tag{B.2.1}$$

where $\lambda_T^* = \lambda_T/T$ and $S_T = \psi_T + \sigma^2\lambda_T^*\Gamma^{-1}$.

Now, the Jordan matrix decomposition allows us to rewrite $S_T$ as

$$S_T = PJP^{-1} = \sum_{i=1}^{N} e_{i,T}\, s_{i,T} s_{i,T}',$$

where the columns of $P$ are eigenvectors, $J = \mathrm{diag}\{J_1, \dots, J_r\}$, and $r$ is the number of distinct eigenvalues of $S_T$. Each Jordan block $J_i$ is $m_i \times m_i$, where $m_i$ is the multiplicity of $e_{i,T}$. Hence $\{s_{i,T}\}$ is a set of orthonormal eigenvectors with corresponding eigenvalues $e_{i,T}$. Reorder the eigenvalues in increasing order, $e_{1,T} \le \dots \le e_{N,T}$. From (iv) in Assumption 2, the minimum eigenvalue of $\psi_T$ exceeds some $e_0 > 0$ if $T$ is large enough. Therefore, for large $T$,

$$e_{1,T} \ge e_0 + \sigma^2\lambda_T^* \min_i \gamma_i^{-1} \ge e_0 > 0.$$

Now, from (B.2.1), observe that

$$\big\| S_T^{-1}\Gamma^{-1}\hat{B}_{0,T}^i \big\|^2 = \sum_{i=1}^{N} e_{i,T}^{-2}\big(s_{i,T}'\Gamma^{-1}\hat{B}_{0,T}^i\big)^2 \le e_0^{-2}\big\|\hat{B}_{0,T}^i\big\|^2 \sum_{i=1}^{N}\gamma_i^{-2}.$$

Therefore, if there exist some $c_1 > 0$ such that $\gamma_i \ge c_1$ for each $i = 1, \dots, N$ and $0 < c_2 < \infty$ such that $\sigma^2 < c_2$ over the supports of $\phi$ and $\xi$ respectively, the second term in (B.2.1) is stochastically bounded:

$$\left\| \int \sigma^2 S_T^{-1}\Gamma^{-1}\hat{B}_{0,T}^i \,(\phi \times \xi)\big(d\gamma, d\sigma^2 \mid Y^{i*}\big) \right\| \le e_0^{-1}\big\|\hat{B}_{0,T}^i\big\|\left( \sum_{i=1}^{N}\int \sigma^4\gamma_i^{-2}\,(\phi \times \xi)\big(d\gamma, d\sigma^2 \mid Y^{i*}\big) \right)^{1/2} \le \frac{c_2 C^{1/2}}{c_1 e_0}\big\|\hat{B}_{0,T}^i\big\|$$

for some $C > 0$. Therefore, for each $i$,

$$\hat{\theta}_T^{i*} = \hat{B}_{0,T}^i - \lambda_T^*\int \sigma^2 S_T^{-1}\Gamma^{-1}\hat{B}_{0,T}^i\,(\phi \times \xi)\big(d\gamma, d\sigma^2 \mid Y^{i*}\big) = \hat{B}_{0,T}^i + O_p(\lambda_T^*) \overset{p}{\to} B_0^i,$$

as desired. ∎

Lemma 1 (Lemma A.1 of Ishwaran and Rao, 2005). Assume that for each $T$, $\epsilon_{T1}, \dots, \epsilon_{TT}$ are independent random variables such that $E(\epsilon_{Tt}) = 0$, $E(\epsilon_{Tt}^2) = \sigma_0^2$, and $E(\epsilon_{Tt}^4) \le M$ for some finite $M$. If Assumption 2 holds, then

$$\frac{1}{\sqrt{T}}Z'E_T^i = \frac{1}{\sqrt{T}}\sum_{t=1}^{T}\epsilon_{Tt} z_t \overset{d}{\to} N\big(0, \sigma_0^2\psi_0\big),$$

where $E_T^i = (\epsilon_{T1}, \dots, \epsilon_{TT})'$.

Proof of Lemma 1. See Lemma A.1 in Ishwaran and Rao (2005). ∎
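The shrinkage representation used in the proof, $\hat{B}_T^{i*} = \big(I_N - (\sigma^{-2}\lambda_T^{-1}Z'Z + \Gamma^{-1})^{-1}\Gamma^{-1}\big)\hat{B}_{0,T}^i$, follows from the Woodbury identity and can be checked numerically. The data below are simulated placeholders.

```python
import numpy as np

rng = np.random.default_rng(3)
T, N = 200, 10
Z = rng.normal(size=(T, N))
Y = rng.normal(size=T)
sigma2, lam_T = 1.3, float(T)
Gamma_inv = np.diag(1.0 / rng.uniform(0.5, 2.0, size=N))

# direct ridge form: (Z'Z + sigma^2 lambda_T Gamma^{-1})^{-1} Z'Y
lhs = np.linalg.solve(Z.T @ Z + sigma2 * lam_T * Gamma_inv, Z.T @ Y)

# shrinkage form applied to OLS:
# (I - (sigma^{-2} lambda_T^{-1} Z'Z + Gamma^{-1})^{-1} Gamma^{-1}) B_ols
B_ols = np.linalg.solve(Z.T @ Z, Z.T @ Y)
shrink = np.linalg.solve(Z.T @ Z / (sigma2 * lam_T) + Gamma_inv, Gamma_inv)
rhs = (np.eye(N) - shrink) @ B_ols

print("max abs difference:", np.max(np.abs(lhs - rhs)))  # ~1e-14
```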
B.2.2 Proof of Theorem 2

We closely follow the proof procedure in Ishwaran and Rao (2005), with slight modifications for our model. Let $S(B_0^i, C/\sqrt{T})$ be a sphere centered at $B_0^i$ with radius $C/\sqrt{T}$. Then, assuming $\sigma_0^2 = 1$, the posterior measure for $B^i$ given $Y_T^{i*} = (Y_{T1}^{i*}, \dots, Y_{TT}^{i*})' = ZB_0^i + E_T^i$ for each column $i = 1, \dots, N$ is

$$\Pi_P\big(S(B_0^i, C/\sqrt{T}) \mid Y_T^{i*}\big) \propto \int I\{B^i \in S(B_0^i, C/\sqrt{T})\}\exp\left\{-\frac{1}{2}(Y_T^{i*} - ZB^i)'(Y_T^{i*} - ZB^i)\right\} P(dB^i)$$
$$\propto \int I\{B^i \in S(B_0^i, C/\sqrt{T})\}\exp\left\{-\frac{1}{2}(B^i - B_0^i)'\psi_T(B^i - B_0^i) + \frac{1}{\sqrt{T}}\sum_{t=1}^{T}\epsilon_{Tt} z_t'(B^i - B_0^i)\right\} P(dB^i), \tag{B.2.2}$$

where $E_T^i = (\epsilon_{T1}, \dots, \epsilon_{TT})'$. Dividing both the numerator and the denominator by the likelihood of a normal density with mean $z_t'B_0^i$ and variance $T$, i.e., $T^{-N/2}\prod_{t=1}^{T}\phi(Y_{Tt}^{i*} \mid z_t'B_0^i, T)$, we derive from (B.2.2) the ratio

$$\frac{\Pi_P\big(S(B_1^i, C/\sqrt{T}) \mid Y_T^{i*}\big)}{\Pi_P\big(S(B_0^i, C/\sqrt{T}) \mid Y_T^{i*}\big)} = \frac{\int I\{B^i \in S(B_1^i, C/\sqrt{T})\}\, D(B^i)\, P(dB^i)}{\int I\{B^i \in S(B_0^i, C/\sqrt{T})\}\, D(B^i)\, P(dB^i)}, \tag{B.2.3}$$

where

$$D(B^i) = \exp\left\{-\frac{1}{2}(B^i - B_0^i)'\psi_T(B^i - B_0^i) + \frac{1}{\sqrt{T}}\sum_{t=1}^{T}\epsilon_{Tt} z_t'(B^i - B_0^i)\right\}.$$

Step 1 (Denominator). Substituting $\nu = \sqrt{T}(B^i - B_0^i)$, we can rewrite the denominator in (B.2.3) as

$$\Pi_P\big(S(B_0^i, C/\sqrt{T}) \mid Y_T^{i*}\big) = \int I\{\nu \in S(0, C)\}\exp\left\{-\frac{1}{2T}\nu'\psi_T\nu + \frac{1}{T}\sum_{t=1}^{T}\epsilon_{Tt} z_t'\nu\right\} f\big(B_0^i + \nu/\sqrt{T}\big)\, d\nu, \tag{B.2.4}$$

where $-\frac{1}{2T}\nu'\psi_T\nu = O(1/T)$ uniformly over $\nu \in S(0, C)$. From Chebyshev's inequality, for each $\delta > 0$,

$$P\left(\left|\frac{1}{T}\sum_{t=1}^{T}\epsilon_{Tt} z_t'\nu\right| > \delta\right) \le \frac{1}{\delta^2 T^2}\sum_{t=1}^{T}E\big|\epsilon_{Tt} z_t'\nu\big|^2 \le \frac{C^2}{\delta^2 T^2}\sum_{t=1}^{T}\|z_t\|^2 = o(1),$$

where $\max_t \|z_t\|/\sqrt{T} = o(1)$ by assumption. In other words, $\frac{1}{T}\sum_{t=1}^{T}\epsilon_{Tt} z_t'\nu \overset{p}{\to} 0$ uniformly over the region $\nu \in S(0, C)$. From the continuity and boundedness of $f$ over $\nu \in S(0, C)$, the log of the denominator (B.2.4) satisfies

$$\log (\text{B.2.4}) \overset{p}{\to} \log\int I\{\nu \in S(0, C)\}\, d\nu + \log f(B_0^i) \tag{B.2.5}$$

as $T \to \infty$.

Step 2 (Numerator). We apply a similar procedure to the numerator in (B.2.3). Substituting $\nu = \sqrt{T}(B^i - B_1^i)$,

$$\Pi_P\big(S(B_1^i, C/\sqrt{T}) \mid Y_T^{i*}\big) = \int I\{\nu \in S(0, C)\}\, L_{T1}(\nu)\, f\big(B_1^i + \nu/\sqrt{T}\big)\, d\nu, \tag{B.2.6}$$

where

$$\log L_{T1}(\nu) = -\frac{1}{2}(B_1^i - B_0^i)'\psi_T(B_1^i - B_0^i) + \frac{1}{\sqrt{T}}\sum_{t=1}^{T}\epsilon_{Tt} z_t'(B_1^i - B_0^i) + \frac{1}{T}\sum_{t=1}^{T}\epsilon_{Tt} z_t'\nu + o(1)$$

uniformly over $\nu \in S(0, C)$. From Lemma 1 above, it follows that

$$\frac{1}{\sqrt{T}}\sum_{t=1}^{T}\epsilon_{Tt} z_t'(B_1^i - B_0^i) \overset{d}{\to} N\big(0, \mathcal{B}'\psi_0\mathcal{B}\big), \qquad \mathcal{B} = B_1^i - B_0^i.$$

Analogously to the denominator, the log of (B.2.6) converges in distribution to

$$-\frac{1}{2}(B_1^i - B_0^i)'\psi_0(B_1^i - B_0^i) + (B_1^i - B_0^i)'\mathcal{N} + \log f(B_1^i) + \log\int I\{\nu \in S(0, C)\}\, d\nu, \tag{B.2.7}$$

where $\frac{1}{T}\sum_{t=1}^{T}\epsilon_{Tt} z_t'\nu \overset{p}{\to} 0$ uniformly over $\nu \in S(0, C)$.

Step 3 (Take the difference). Subtracting (B.2.5) from (B.2.7),

$$(\text{B.2.7}) - (\text{B.2.5}) = \log\frac{f(B_1^i)}{f(B_0^i)} - \frac{1}{2}(B_1^i - B_0^i)'\psi_0(B_1^i - B_0^i) + (B_1^i - B_0^i)'\mathcal{N},$$

where $\mathcal{N} \sim N(0, \psi_0)$. Therefore,

$$\log\frac{\Pi_P\big(S(B_1^i, C/\sqrt{T}) \mid Y_T^{i*}\big)}{\Pi_P\big(S(B_0^i, C/\sqrt{T}) \mid Y_T^{i*}\big)} \overset{d}{\to} \log\frac{f(B_1^i)}{f(B_0^i)} - \frac{1}{2}(B_1^i - B_0^i)'\psi_0(B_1^i - B_0^i) + (B_1^i - B_0^i)'\mathcal{N},$$

as desired. ∎

B.2.3 Proof of Theorem 3

Recall the concentrated-out model $Y_T^{i*} = ZB_0^i + E_T^i$, where $B_0^i = \sqrt{T}B^{i*}$. Then the rescaled spike and slab estimator (equivalently, the ridge estimator) with penalty $\lambda_T = T$ is

$$\hat{B}_T^{i*}(\gamma_0) = \big(Z'Z + T\Gamma_0^{-1}\big)^{-1}Z'Y_T^{i*} = \big(Z'Z + T\Gamma_0^{-1}\big)^{-1}Z'ZB_0^i + \big(Z'Z + T\Gamma_0^{-1}\big)^{-1}Z'E_T^i$$
$$= \big(\psi_0 + \Gamma_0^{-1}\big)^{-1}\psi_0 B_0^i + \big(\psi_0 + \Gamma_0^{-1}\big)^{-1}\frac{1}{\sqrt{T}}Z'E_T^i + o_p(1),$$

where the $o_p(1)$ comes from $\psi_T \to \psi_0$ in Assumption 2. Now, from Lemma 1,

$$\big(\psi_0 + \Gamma_0^{-1}\big)^{-1}\frac{1}{\sqrt{T}}Z'E_T^i \overset{d}{\to} \big(\psi_0 + \Gamma_0^{-1}\big)^{-1}\mathcal{N}, \qquad \mathcal{N} \sim N(0, \psi_0).$$

Hence $\hat{B}_T^{i*}(\gamma_0) \overset{d}{\to} P(s \mid \gamma_0)$ for each $i$, as desired. ∎

B.2.4 Derivation of Equation (2.3.5)

We obtain the fractional prior (2.3.5) by multiplying a fraction $\delta = T_0/T$ of the likelihood (B.3.1) by the uninformative prior (2.3.4). To see this,

$$\pi^F(\Lambda, B, \Sigma) \propto g^{\delta}(X \mid \Lambda, B, \Sigma)\cdot\pi^D(\Lambda, B, \Sigma) \propto |\Sigma|^{-\frac{T_0}{2}}\exp\left\{-\frac{T_0}{2T}\mathrm{tr}\Big[\Sigma^{-1}\tilde{\Theta}(\hat{\Lambda}, \hat{B}, \tilde{E})\Big]\right\}\cdot|\Sigma|^{-\frac{a_D + N + 1}{2}}$$
$$\propto |\Sigma|^{-\frac{M + Nk}{2}}\,|\Sigma|^{-\frac{a_D - M - Nk + T_0 + N + 1}{2}}\exp\left\{-\frac{T_0}{2T}\mathrm{tr}\Big[\Sigma^{-1}\tilde{\Theta}(\hat{\Lambda}, \hat{B}, \tilde{E})\Big]\right\},$$

where

$$\tilde{\Theta}(\hat{\Lambda}, \hat{B}, \tilde{E}) = \begin{pmatrix}\Lambda - \hat{\Lambda} \\ B - \hat{B}\end{pmatrix}'\begin{pmatrix}\tilde{F}'\tilde{F} & \tilde{F}'Z \\ Z'\tilde{F} & Z'Z\end{pmatrix}\begin{pmatrix}\Lambda - \hat{\Lambda} \\ B - \hat{B}\end{pmatrix} + \tilde{E}'\tilde{E},$$

which is the kernel of the matrix Normal-Inverse Wishart distribution $\mathcal{W}_{NIW}(\Psi, \Xi, \zeta, R)$.

B.2.5 Derivation of Equation (2.3.6)

The fractional marginal likelihood (2.3.6) is obtained up to a multiplicative factor $(2\pi)^{\frac{(T - T_0)N}{2}}$. To see this, take the ratio between the prior and posterior normalizing constants and multiply by the factor:

$$q^F(T_0/T, X) = \frac{1}{(2\pi)^{\frac{(T - T_0)N}{2}}}\frac{C(\Xi, R, \zeta)}{C(\bar{\Xi}, \bar{R}, \bar{\zeta})} = \frac{1}{(2\pi)^{\frac{(T - T_0)N}{2}}}\frac{C\big(\Xi,\,(T_0/T)\tilde{E}'\tilde{E},\; a_D - (M + Nk) + T_0\big)}{C\big((T_0/T)\Xi,\;\tilde{E}'\tilde{E},\; a_D - (M + Nk) + T\big)}$$
$$= \pi^{-\frac{(T - T_0)N}{2}}\left(\frac{T_0}{T}\right)^{\frac{N(a_D + T_0)}{2}}\left|\tilde{E}'\tilde{E}\right|^{-\frac{T - T_0}{2}}\frac{\Gamma_N\big((a_D - (M + Nk) + T)/2\big)}{\Gamma_N\big((a_D - (M + Nk) + T_0)/2\big)}.$$
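The closed form for $q^F(T_0/T, X)$ is easiest to evaluate in logs, using the multivariate gamma function. The sketch below mirrors the final expression above; the inputs $(T, T_0, N, M, k, a_D)$ and the residual cross-product matrix are illustrative placeholders, not values from the chapter.

```python
import numpy as np
from scipy.special import multigammaln

def log_qF(EtE, T, T0, N, M, k, aD):
    """Log fractional marginal likelihood, following the closed form above."""
    sign, logdet = np.linalg.slogdet(EtE)   # log |E'E|
    assert sign > 0, "E'E must be positive definite"
    df1 = (aD - (M + N * k) + T) / 2.0
    df0 = (aD - (M + N * k) + T0) / 2.0
    return (-(T - T0) * N / 2.0 * np.log(np.pi)
            + N * (aD + T0) / 2.0 * np.log(T0 / T)
            - (T - T0) / 2.0 * logdet
            + multigammaln(df1, N) - multigammaln(df0, N))

# illustrative inputs
rng = np.random.default_rng(5)
N, T, T0, M, k, aD = 5, 200, 30, 2, 1, 10
E = rng.normal(size=(T, N))
print(log_qF(E.T @ E, T, T0, N, M, k, aD))
```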
B.2.6 Derivation of Equation (2.3.7)

As described in Section 3.2, let $x_{t,J}$ be the response variables at time $t$ corresponding to the set $J \in \{C, S\}$, where $\mathcal{C}$ and $\mathcal{S}$ are the sets of cliques and separators of the undirected graph $G^u$. Then, when $\Sigma$ is Markov with respect to $G^u$,

$$g(x_1, \dots, x_T \mid \Lambda, B, \Sigma, G^u) = \prod_{t=1}^{T} g(x_t \mid z_t, \lambda, B, \Sigma, G^u) = \prod_{t=1}^{T}\frac{\prod_{C \in \mathcal{C}} g(x_{t,C} \mid \lambda_C, B_C, \Sigma_{CC})}{\prod_{S \in \mathcal{S}} g(x_{t,S} \mid \lambda_S, B_S, \Sigma_{SS})} = \frac{\prod_{C \in \mathcal{C}} g(X_C \mid \Lambda_C, B_C, \Sigma_{CC})}{\prod_{S \in \mathcal{S}} g(X_S \mid \Lambda_S, B_S, \Sigma_{SS})},$$

where, in matrix form, for $J \in \{C, S\}$,

$$g(X_J \mid \Lambda_J, B_J, \Sigma_{JJ}) = (2\pi)^{-\frac{|J|T}{2}}\,|\Sigma_{JJ}|^{-\frac{T}{2}}\exp\left\{-\frac{1}{2}\mathrm{tr}\Big[\Sigma_{JJ}^{-1}(X_J - F\Lambda_J - ZB_J)'(X_J - F\Lambda_J - ZB_J)\Big]\right\}.$$

Here $B_J$ is the $Nk \times |J|$ matrix whose columns contain the coefficients associated with the lagged variables for the selected responses $X_J$, while $\Sigma_{JJ}$ is the submatrix of $\Sigma$ corresponding to the variables in $J \subset \{1, \dots, N\}$. The second equality follows because $\Sigma$ is decomposable Markov with respect to $G^u$, and the remaining equalities are derived using the conditional likelihood of the multivariate normal distribution in matrix form. For complete proofs of the decomposability of a graph into cliques and separators, see Dawid and Lauritzen (1993).

B.3 Pervasiveness in the Cross-sectional Dependence

Given estimated factors $\tilde{F}$ and factor loadings $\tilde{\Lambda}$, consider the unobservable-factors version of the conditional likelihood. In matrix form,

$$g(X \mid \Lambda, B, \Sigma) = (2\pi)^{-\frac{TN}{2}}|\Sigma|^{-\frac{T}{2}}\exp\left\{-\frac{1}{2}\mathrm{tr}\Big[\Sigma^{-1}\tilde{\Theta}(\hat{\Lambda}, \hat{B}, \hat{E})\Big]\right\}, \tag{B.3.1}$$
$$\tilde{\Theta}(\hat{\Lambda}, \hat{B}, \hat{E}) = (\Lambda - \hat{\Lambda})'\tilde{F}'\tilde{F}(\Lambda - \hat{\Lambda}) + (B - \hat{B})'Z'Z(B - \hat{B}) + (\Lambda - \hat{\Lambda})'\tilde{F}'Z(B - \hat{B}) + (B - \hat{B})'Z'\tilde{F}(\Lambda - \hat{\Lambda}) + \hat{E}'\hat{E},$$

where $\hat{E} = X - \tilde{F}\tilde{\Lambda} - Z\hat{B}$ and $\hat{B}$ is an updated transition matrix of $B$.

Under the model and its identification, we need to clarify the cross-sectional dependence as defined by Bailey et al. (2019a) and Bailey et al. (2021). Since one of our crucial steps is to estimate the covariance matrix, it is necessary to consider how it captures the cross-sectional dependence. Suppose we have a multi-factor process as described in (2.2.6). Then

$$E(\epsilon_t\epsilon_t') = E\big[(\Lambda' f_t + u_t)(\Lambda' f_t + u_t)'\big] = \Lambda'\mathcal{F}\Lambda + U,$$

where $\mathcal{F} = E(f_t f_t')$ and $U = E(u_t u_t')$. Note that $\mathcal{F}$ is not necessarily equal to $I_M$ under our identification condition. To be specific,

$$E(\epsilon_t\epsilon_t') = \begin{pmatrix}\mathcal{F}^* & \mathcal{F}^*\Lambda_{M+1:N}' \\ \Lambda_{M+1:N}\mathcal{F}^* & \Lambda_{M+1:N}\mathcal{F}^*\Lambda_{M+1:N}'\end{pmatrix} + U,$$

where $\mathcal{F}^* = E(f_t f_t') = E\big(\Phi_1 f_{t-1}f_{t-1}'\Phi_1'\big) + E\big(\epsilon_t^f \epsilon_t^{f\prime}\big) = \Phi_1\mathcal{F}_{-1}^*\Phi_1' + U^f$ if $p = 1$. Therefore,

$$E(\epsilon_t\epsilon_t') = \begin{pmatrix}\Phi_1\Phi_1' + U^f & \big(\Phi_1\Phi_1' + U^f\big)\Lambda_{0,M+1:N}' \\ \Lambda_{0,M+1:N}\big(\Phi_1\Phi_1' + U^f\big) & \Lambda_{0,M+1:N}\big(\Phi_1\Phi_1' + U^f\big)\Lambda_{0,M+1:N}'\end{pmatrix} + U.$$

Furthermore, since our factors have a dynamic structure,

$$E(\epsilon_t\epsilon_t') = \Lambda'\mathcal{F}\Lambda + U = \Lambda'\Phi_1\mathcal{F}_{-1}^*\Phi_1'\Lambda + \Lambda'U^f\Lambda + U = \Lambda'\Phi_1\Phi_1'\Lambda + \Lambda'U^f\Lambda + U, \quad \text{if } \mathcal{F}_{-1}^* = I_M.$$

Thus, the analogous conditions identifying the pervasiveness are

$$\lambda_{\max}(U) = O(1), \qquad \lambda_{\max}(\Lambda'U^f\Lambda) = O(1),$$

where $\lambda_{\max}(\cdot)$ indicates the maximum eigenvalue, and, for $\alpha_\ell$ the exponent of the factor loadings,

$$(\Phi_1'\Lambda)'(\Phi_1'\Lambda) = \ominus(N^{\alpha_\ell}),$$

which leads to $\|\Phi_1'\Lambda\|_\infty < K$, $\|\Lambda'U^f\Lambda\|_1 < K$, and $\|U\|_1 < K$, so that $\mathrm{Var}(\epsilon_{it})$ is bounded. Here $f_n = \ominus(g_n)$ means there exist $n_0 > 1$ and positive constants $C_1$ and $C_2$ such that $\inf_{n \ge n_0}(f_n/g_n) \ge C_1$ and $\sup_{n \ge n_0}(f_n/g_n) \le C_2$ for positive real sequences $\{f_n\}_{n=1}^{\infty}$ and $\{g_n\}_{n=1}^{\infty}$.
This implies that the degree of pervasiveness we capture comes from the contemporaneous and the lagged factors. We recognize that the model's consistency and asymptotic properties are derived from a sampling approach, and their relevance from a Bayesian perspective remains unclear. Specifically, proving the consistency of an extremum estimator requires uniform convergence and additional assumptions that are not requisite from a Bayesian standpoint; this discrepancy warrants further discussion. Despite our reliance on the Bayesian approach for our estimation strategy, we aim for our results to exhibit a comparable degree of pervasiveness. To this end, we attempt to bridge our findings with the exponent of the factors in our simulation exercise as a method to explore this connection.

B.4 Degree distributions of the U.S. Housing Prices

Table B.1: Degree distributions of the U.S. Housing Prices

Degree 4: Buffalo-Cheektowaga-Niagara Falls, NY; Lubbock, TX

Degree 3: Albany, OR; Asheville, NC; Baton Rouge, LA; Battle Creek, MI; Bay City, MI; Columbus, GA-AL; Lincoln, NE; Tucson, AZ

Degree 2: Abilene, TX; Albuquerque, NM; Altoona, PA; Athens-Clarke County, GA; Atlantic City-Hammonton, NJ; Bangor, ME; Beckley, WV; Billings, MT; Boise City, ID; Boulder, CO; Brunswick, GA; Cape Coral-Fort Myers, FL; Cedar Rapids, IA; Charleston-North Charleston, SC; Cleveland-Elyria, OH; Columbia, SC; Danville, IL; Daphne-Fairhope-Foley, AL; Dayton, OH; East Stroudsburg, PA; El Paso, TX; Eugene, OR; Fayetteville, NC; Florence-Muscle Shoals, AL; Fond du Lac, WI; Grand Island, NE; Grand Rapids-Wyoming, MI; Green Bay, WI; Hammond, LA; Harrisburg-Carlisle, PA; Hattiesburg, MS; Hilton Head Island-Bluffton-Beaufort, SC; Jackson, MI; Jackson, TN; Johnson City, TN; Jonesboro, AR; Joplin, MO; Kennewick-Richland, WA; Lakeland-Winter Haven, FL; Lawton, OK; Lynchburg, VA; Mansfield, OH; Memphis, TN-MS-AR; Merced, CA; Muskegon, MI; Napa, CA; Ocala, FL; Palm Bay-Melbourne-Titusville, FL; Panama City, FL; Phoenix-Mesa-Scottsdale, AZ; San Jose-Sunnyvale-Santa Clara, CA; San Luis Obispo-Paso Robles-Arroyo Grande, CA; Sebastian-Vero Beach, FL; Sierra Vista-Douglas, AZ

Degree 1: Akron, OH; Albany, GA; Albany-Schenectady-Troy, NY; Alexandria, LA; Allentown-Bethlehem-Easton, PA-NJ; Ames, IA; Anchorage, AK; Anniston-Oxford-Jacksonville; Appleton, WI; Atlanta-Sandy Springs-Roswell, GA; Bakersfield, CA; Barnstable Town, MA; Bellingham, WA; Bend-Redmond, OR; Birmingham-Hoover, AL; Bismarck, ND; Bloomington, IL; Bowling Green, KY; California-Lexington Park, MD; Carbondale-Marion, IL; Carson City, NV; Casper, WY; Champaign-Urbana, IL; Charleston, WV; Clarksville, TN-KY; Colorado Springs, CO; Crestview-Fort Walton Beach-Destin, FL; Cumberland, MD-WV; Davenport-Moline-Rock Island, IA-IL; Deltona-Daytona Beach-Ormond Beach, FL; Des Moines-West Des Moines, IA; Dover, DE; Duluth, MN-WI; Eau Claire, WI; Elizabethtown-Fort Knox, KY; Elkhart-Goshen, IN; Enid, OK; Erie, PA; Evansville, IN-KY; Fargo, ND-MN; Fayetteville-Springdale-Rogers, AR-MO; Flagstaff, AZ; Fort Collins, CO; Fort Smith, AR-OK; Fresno, CA; Gainesville, FL; Gettysburg, PA; Glens Falls, NY; Goldsboro, NC; Grand Junction, CO; Grants Pass, OR; Greensboro-High Point, NC; Gulfport-Biloxi-Pascagoula, MS; Harrisonburg, VA; Hartford-West Hartford-East Hartford, CT; Hinesville, GA; Houma-Thibodaux, LA; Idaho Falls, ID; Indianapolis-Carmel-Anderson, IN; Jackson, MS; Jacksonville, FL; Jacksonville, NC; Jefferson City, MO; Johnstown, PA; Kahului-Wailuku-Lahaina, HI; Kalamazoo-Portage, MI; Kansas City, MO-KS; Knoxville, TN; Kokomo, IN; Lafayette, LA; Lake Havasu City-Kingman, AZ; Lancaster, PA; Las Cruces, NM; Las Vegas-Henderson-Paradise, NV; Lawrence, KS; Lewiston-Auburn, ME; Logan, UT-ID; Longview, WA; Madison, WI; Mankato-North Mankato, MN; Miami-Fort Lauderdale-West Palm Beach, FL; Midland, TX; Missoula, MT; Modesto, CA; Naples-Immokalee-Marco Island, FL; Nashville-Davidson-Murfreesboro-Franklin, TN; New Haven-Milford, CT; New York-Newark-Jersey City, NY-NJ-PA; North Port-Sarasota-Bradenton, FL; Oklahoma City, OK; Olympia-Tumwater, WA; Pensacola-Ferry Pass-Brent, FL; Pittsfield, MA; Portland-Vancouver-Hillsboro, OR-WA; Providence-Warwick, RI-MA; Rapid City, SD; Rochester, MN; Salem, OR; Santa Maria-Santa Barbara, CA; Sherman-Denison, TX; Springfield, MA; Toledo, OH; Topeka, KS; Trenton, NJ; Virginia Beach-Norfolk-Newport News, VA-NC; Visalia-Porterville, CA; Weirton-Steubenville, WV-OH
Appendix C

Appendix to Chapter 3

C.1 Appendix

C.1.1 Algorithm 2: Sampling the transformed standardized residuals

Algorithm 2: Sample $Z \sim N(0, S^{-1})$
1:  for i = 1 : n do
2:    for r = 1 : T do
3:      if r = 1 then
4:        $z_{i,t_0} = -\infty$
5:      else
6:        $z_{i,t_{r-1}} = u_{i,t_{r-1}}$
7:      end if
8:      if r = T then
9:        $z_{i,t_{T+1}} = \infty$
10:     else
11:       $z_{i,t_{r+1}} = u_{i,t_{r+1}}$
12:     end if
13:     Compute $\mu_{t_r,i} = -\psi_{i,i}^{-1}\Psi_{n\setminus i,\,i}\, u_{n\setminus i,\,t_r}$
14:     Compute $\sigma_i^2 = \psi_{i,i}^{-1}$
15:     Sample $z_{i,t_r} \sim TN\big(\mu_{t_r,i}, \sigma_i^2;\; z_{i,t_{r-1}} < z_{i,t_r} < z_{i,t_{r+1}}\big)$
16:   end for
17: end for
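A sketch of the inner draw in Algorithm 2: each latent $z_{i,t_r}$ is sampled from the normal conditional implied by the precision matrix, truncated between its neighbours. For simplicity the toy setup below sorts each row of $u$ so that the rank bounds are well ordered, and the precision matrix is a synthetic placeholder.

```python
import numpy as np
from scipy.stats import truncnorm

def draw_truncated(mu, sigma, lo, hi, rng):
    """Draw from N(mu, sigma^2) truncated to (lo, hi)."""
    a, b = (lo - mu) / sigma, (hi - mu) / sigma
    return truncnorm.rvs(a, b, loc=mu, scale=sigma, random_state=rng)

rng = np.random.default_rng(11)
n, T = 4, 50
A = rng.normal(size=(n, n))
Psi = A @ A.T + n * np.eye(n)                  # synthetic precision matrix S
U = np.sort(rng.normal(size=(n, T)), axis=1)   # current draws, rows ordered
Z = U.copy()

for i in range(n):
    others = [j for j in range(n) if j != i]
    mu_i = -(Psi[i, others] @ U[others, :]) / Psi[i, i]  # conditional means
    sd_i = np.sqrt(1.0 / Psi[i, i])                      # conditional sd
    for r in range(T):
        lo = -np.inf if r == 0 else Z[i, r - 1]          # lower rank bound
        hi = np.inf if r == T - 1 else U[i, r + 1]       # upper rank bound
        Z[i, r] = draw_truncated(mu_i[r], sd_i, lo, hi, rng)

print(Z.shape, bool(np.all(np.diff(Z, axis=1) > 0)))     # ordering preserved
```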
C.1.2 Algorithm 3: Sampling the sparse unconditional precision matrix

Expanding upon the variation of the horseshoe prior specification articulated as Model III in Neville et al. (2014), Mulgrave and Ghosal (2022, 2023) set forth a series of prior distributions for $\beta$ in Equation (3.2.16). For $j = 1, 2, \dots, N$ and $i = j+1, j+2, \dots, N$, the priors are formulated as follows:

$$Z_j \mid Z_{i>j}, \beta_{i>j}, \sigma_j^2 \sim N\big(Z_{i>j}\beta_{i>j},\; \sigma_j^2 I\big), \qquad \beta_{i>j} \mid \lambda_j^2, b_{i>j}, \sigma_j^2 \sim N\!\left(0,\; \frac{\sigma_j^2\, b_{i>j}\, c^2\lambda_j^2}{N_i^2}\right), \tag{C.1.1}$$
$$\lambda_j^2 \mid a_j \sim IG\!\left(\frac{1}{2}, \frac{1}{a_j}\right), \quad a_j \sim IG\!\left(\frac{1}{2}, 1\right), \quad b_{i>j} \mid h_{i>j} \sim IG\!\left(\frac{1}{2}, \frac{1}{h_{i>j}}\right), \quad h_{i>j} \sim IG\!\left(\frac{1}{2}, 1\right), \quad \sigma_j^2 \sim IG(0.01, 0.01),$$

where $IG$ stands for the inverse gamma distribution, and the prior on $\sigma_j^2$ is chosen to be diffuse. Given these priors, the posterior distribution for $\beta_{i>j}$ can be derived. The probability density in Equation (3.2.16) is

$$L\big(Z_j \mid Z_{i>j}\beta_{i>j}, \sigma_j^2\big) \propto \exp\left\{-\frac{1}{2\sigma_j^2}(Z_j - Z_{i>j}\beta_{i>j})'(Z_j - Z_{i>j}\beta_{i>j})\right\}. \tag{C.1.2}$$

Coupling this with the prior distribution for $\beta_{i>j}$ in Equation (C.1.1), defined as

$$p\big(\beta_{i>j} \mid \lambda_j^2, b_{i>j}, \sigma_j^2\big) \propto \exp\left\{-\frac{N_i^2}{2\sigma_j^2\, b_{i>j}\, c^2\lambda_j^2}\,\beta_{i>j}'\beta_{i>j}\right\}, \tag{C.1.3}$$

the resulting posterior distribution from Equations (C.1.2) and (C.1.3) is

$$p\big(\beta_{i>j} \mid Z_j, Z_{i>j}, \lambda_j^2, b_{i>j}, \sigma_j^2\big) \propto \exp\left\{-\frac{1}{2\sigma_j^2}\,\beta_{i>j}'\left(Z_{i>j}'Z_{i>j} + \mathrm{diag}\left\{\frac{N_i^2}{b_{i>j}\, c^2\lambda_j^2}\right\}\right)\beta_{i>j} + \frac{1}{\sigma_j^2}\,\beta_{i>j}'Z_{i>j}'Z_j\right\}.$$

The distribution is Gaussian with mean $A^{-1}Z_{i>j}'Z_j$ and covariance $\sigma_j^2 A^{-1}$, where $A = Z_{i>j}'Z_{i>j} + \mathrm{diag}\big\{N_i^2/(b_{i>j} c^2\lambda_j^2)\big\}$, and we can write it as

$$\beta_{i>j} \mid \lambda_j^2, b_{i>j}, \sigma_j^2 \sim N\big(A^{-1}Z_{i>j}'Z_j,\; \sigma_j^2 A^{-1}\big). \tag{C.1.4}$$

To ameliorate the computational burden, particularly for large $N$, we also adopt an exact sampling algorithm tailored for Gaussian priors, incorporating the data augmentation outlined in Bhattacharya et al. (2016). Algorithm 3 elaborates this methodology.

Algorithm 3: Sample sparse unconditional precision matrix $\Omega$
1:  Given initial hyperparameters for $\lambda_j^2$, $b_{i>j}$, $\sigma_j^2$, and $c$,
2:  for j = 1 : N-1 do
3:    Partition $Z_{i>j}$ and $Z_j$.
4:    Compute $D = \mathrm{diag}\big\{\sigma_j^2\lambda_j^2 b_{i>j} c^2/N_i^2\big\}$ for $i > j$, $\Phi = Z_{i>j}/\sqrt{\sigma_j^2}$, and $a = Z_j/\sqrt{\sigma_j^2}$.
5:    Sample $u \sim N(0, D)$, set $v = \Phi u + N(0, I)$, and solve for $w$ in $(\Phi D\Phi' + I)w = a - v$.
6:    Given $u$, $D$, $\Phi$, and $w$, set $\beta_{i>j} = u + D\Phi'w$.
7:    Sample $\lambda_j^2 \sim IG\big(\frac{|i>j|}{2} + \frac{1}{2}, K_1\big)$, where $K_1 = \frac{1}{2}\beta_{i>j}'\,\mathrm{diag}\big\{N_i^2/(\sigma_j^2 b_{i>j} c^2)\big\}\beta_{i>j} + \frac{1}{a_j}$ for $i > j$.
8:    Sample $a_j \sim IG\big(1, 1 + \lambda_j^{-2}\big)$.
9:    Sample $b_{i>j} \sim IG(1, K_2)$, where $K_2 = \frac{N_i^2\beta_{i>j}^2}{2\sigma_j^2\lambda_j^2 c^2} + \frac{1}{h_{i>j}}$ for $i > j$.
10:   Sample $h_{i>j} \sim IG\big(1, 1 + \frac{1}{b_{i>j}}\big)$.
11:   Sample $\sigma_j^2 \sim IG\big(\frac{T + |i>j|}{2} + 0.01, K_3\big)$, where $K_3 = \frac{1}{2}\|Z_j - Z_{i>j}\beta_{i>j}\|^2 + \frac{1}{2}\beta_{i>j}'\,\mathrm{diag}\big\{N_i^2/(\lambda_j^2 b_{i>j} c^2)\big\}\beta_{i>j} + 0.01$ for $i > j$.
12:   Update $L_{jj} = \sqrt{\sigma_j^2}$ and $L_{ij} = -\beta_{i>j}/L_{jj}$ for $i > j$.
13: end for
14: Sample $\sigma_N^2 \sim IG\big(\frac{N}{2} + 0.01, K_4\big)$, where $K_4 = 0.01 + \frac{1}{2}\|Z_N\|^2$.
15: Update $L_{NN} = \sqrt{\sigma_N^2}$.
16: Compute $\Omega = LL'$.
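Steps 4-6 of Algorithm 3 are the exact Gaussian sampler of Bhattacharya et al. (2016), which draws $\beta \sim N(A^{-1}\Phi'a, A^{-1})$ with $A = \Phi'\Phi + D^{-1}$ by solving only a $T \times T$ system; this is what makes the $p \gg T$ regime feasible. The sketch below implements that general step with placeholder inputs.

```python
import numpy as np

def exact_gaussian_sample(Phi, a, d, rng):
    """Draw beta ~ N(A^{-1} Phi' a, A^{-1}) with A = Phi'Phi + diag(d)^{-1}.

    Algorithm of Bhattacharya, Chakraborty and Mallick (2016):
    only a T x T linear system is solved, never a p x p one.
    """
    T, p = Phi.shape
    u = rng.normal(0.0, np.sqrt(d))               # u ~ N(0, D)
    v = Phi @ u + rng.normal(size=T)              # v = Phi u + N(0, I_T)
    M = Phi @ (d[:, None] * Phi.T) + np.eye(T)    # Phi D Phi' + I
    w = np.linalg.solve(M, a - v)                 # (Phi D Phi' + I) w = a - v
    return u + d * (Phi.T @ w)                    # beta = u + D Phi' w

rng = np.random.default_rng(21)
T, p = 60, 500                                    # p >> T regime
Phi = rng.normal(size=(T, p))
a = rng.normal(size=T)
d = rng.uniform(0.1, 1.0, size=p)                 # prior variances (diag of D)
beta = exact_gaussian_sample(Phi, a, d, rng)
print(beta.shape)
```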
C.2 Additional Empirical Results

C.2.1 Foreign Stock Price Indexes

The figures below delineate the volatility characteristics of the foreign stock indexes examined in the empirical application, as articulated in Section 5.1.

Figure C.1: Dynamic Conditional (Partial) Correlations: Foreign Stock Indexes 1

Notes: The dataset has been sourced from Google Finance, spanning the period from January 4, 1991, to August 31, 2023. Volatility is calculated using the formula $\text{Volatility}_t = 100\log(p_t/p_{t-1})$, where $p_t$ represents the closing price of the respective stock index. The abbreviations for the stock indexes are as follows: DJI denotes the Dow Jones Industrial Average, NASDAQ signifies the National Association of Securities Dealers Automated Quotations, DAX stands for Deutscher Aktienindex, CAC40 is an acronym for Cotation Assistée en Continu 40, and NIKKEI represents the Nikkei 225 Stock Average.

Figure C.2: Dynamic Conditional (Partial) Correlations: Foreign Stock Indexes 2

Notes: See the notes to Figure C.1.

C.2.2 Daily Returns on Securities Selected from S&P 500

The following figures delineate the conditional and partial correlation coefficients associated with the selected firms within each respective GICS sector.

Figure C.3: Dynamic Conditional (Partial) Correlations: Blue Chips Stocks from S&P 500 A

Notes: The figure describes the dynamic conditional correlations and dynamic conditional partial correlations among stock price volatilities for sectors characterized by a high degree of unconditional dependence. Stock ticker symbols corresponding to each sector are delineated as follows: Materials (FCX), Health Care (ISRG), Industrials (MMM), Information Technology (MSFT), Utilities (NEE), Communication Services (NFLX), Consumer Discretionary (NKE), Financials (SPG), Consumer Staples (WMT), and Energy (XOM). A comprehensive list of associated stock tickers can be found in Appendix D. To enhance visual representation, the dynamic conditional partial correlations have been scaled by a factor of 10.

Figure C.4: Dynamic Conditional (Partial) Correlations: Blue Chips Stocks from S&P 500 B

Notes: See the notes to Figure C.3.

Figure C.5: Dynamic Conditional (Partial) Correlations: Blue Chips Stocks from S&P 500 C

Notes: See the notes to Figure C.3.

Figure C.6: Dynamic Conditional (Partial) Correlations: Blue Chips Stocks from S&P 500 D

Notes: See the notes to Figure C.3.

Figure C.7: Dynamic Conditional (Partial) Correlations: Blue Chips Stocks from S&P 500 E

Notes: See the notes to Figure C.3.

C.3 Extra Simulations

We consider the following data generating process (DGP), one of the simulation designs employed in Pesaran and Yamagata (2023):

$$r_{it} = \alpha_i + \sum_{\ell=1}^{3}\beta_{\ell i} f_{\ell t} + \kappa u_{it}, \quad i = 1, 2, \dots, N;\; t = 1, 2, \dots, T,$$
, T, where fℓt for ℓ = 1, 2, 3 are the observed factors, and uit = γitvt + ηit in which vt is the missing factor, and ηit is the idiosyncratic component of the return process. 1 The factors under observation are adjusted to align closely with the factors identified by Fama and French in their 1993 study (namely, the market factor, HML, and SMB). These factors are constructed as fℓt = ρℓfℓ,t−1 + eℓt, where the autoregressive coefficients (ρf1, ρf2, ρf3) are set to (−0.1, 0.2, −0.2). The error term eℓt is modeled as eℓt = √ hℓtξℓt, with ξℓt being independently and identically distributed following a normal distribution with mean 0 and variance 1, ξℓt ∼ N(0, 1), and hℓt is defined according to a GARCH(1,1) model: hℓt = ωℓ(1 − ϱℓ − φℓ) + ϱℓhℓ,t−1 + φℓe 2 ℓ,t−1 , where the parameters (ω1, ω2, ω3), (ϱ1, ϱ2, ϱ3), and (φ1, φ2, φ3) are specified as (20.25, 6.33, 5.98), (0.61, 0.70, −0.31), and (0.31, 0.21, 0.10), respectively. 1The scalar κ is employed to adjust the overall model fit to equal the mean fit of individual return regressions, represented as R2 NT = 1 N PN i=1 R2 iT , with R2 iT denoting the R-squared value from the regression of rit for a specific dataset. Pesaran and Yamagata (2023) set κ at 6.5 for a model with N = 500 and T = 120 to achieve an R2 NT of 0.30, in a scenario excluding omitted common factors and spatial discrepancies. Refer to Pesaran and Yamagata (2023) for more comprehensive information. 179 The observed factor loadings are generated by Market Factor:β1i i.i.d. ∼ U(0.3, 1.8), HML:β2i i.i.d. ∼ U(−1.0, 1.0), SMB:β3i i.i.d. ∼ U(−0.6, 0.9). The latent factor are generated as vt i.i.d. ∼ N(0, 1) and its loadings γi follows γi i.i.d. ∼ U (0.7, 0.9), for i = 1, 2, . . . , j N δγ k γi = 0, for j N δγ k + 1, j N δγ k + 2, . . . , N, and to avoid systematic errors we then randomly reshuffle γi over i before assigning them to the individual returns, rit. We consider the values of δγ = {0, 1/4, 1/2}. With latent factors, we also allow the spatial type cross sectional error dependence following ηit = ψ X N j=1 wijηjt + σηiεη,it, for i = 1, 2, . . . , N, which can be solved for ηt = (η1t , η2t , . . . , ηN t) ′ as ηt = (IN − ψW) −1 Dηεη,t, where εη,t = (εη,1t , εη,2t , . . . , εη,N t) ′ , ψ = 1/4, Dη = diag{ση1, ση2, . . . , σηN }, and W = (wij ) = 0.5 if i = 1, 2, . . . , N − 2 and j = i + 1, or j = 3, 4, . . . .N and i = j − 1 1 if (i, j) = (1, 2) or (N − 1, N) 0 if i = j or otherwise and standardized such that PN j=1 wij = 1 for all i. We allow for error cross-sectional heteroskedasticity by generating σ 2 ηi i.i.d. ∼ (1 + χ 2 2,i)/3 and εη,it = √ ωitζit, ζit i.i.d. ∼ N(0, 1), where ωit = σ 2 ηi(1 − ϱ − 180 φ) + ϱωi,t−1 + φε2 η,it−1 . We set ϱ = 0.2 and φ = 0.6. First 50 time-series observations of εη,it are discarded. To investigate power, we consider alternatives αi = ϖi i.i.d. ∼ N (0, 1) for i = 1, 2, . . . , Nα with Nα = Nδα ; αi = 0 for i = Nα + 1, Nα + 2, . . . , N and δα = 0.7. In addition to employing the Standardized Wald (SW) test statistics derived from our suggested precision matrix (Pˆ t), our analysis extends to include a comparison of both the size and power of the test across various established statistical methods. These methods comprise the J-test as proposed by Pesaran and Yamagata (2023), the Gagliardini et al. (GOS, 2016) test, the Principal Orthogonal complEment Thresholding (POET) approach by Fan et al. (2013), the methodology developed by Bai and Saranadasa (1996), referred to as BS, and the SD test formulated by Srivastava and Du (2008). 
The results are summarized in Tables C.1 and C.2. All combinations of T = 60, 120, 240 and N = 50, 100, 200 are considered. All tests are conducted at the 5% significance level, and all experiments are based on R = 100 replications.

Table C.1: Size of the SW(P) and other tests, GARCH(1,1) errors
Size: $\alpha_i = 0$ for all $i$. Columns report N = 50, 100, 200 within each value of $\delta_\gamma$.

Test    T      δγ = 0               δγ = 1/4             δγ = 1/2
               50    100   200      50    100   200      50    100   200
SW(P)   60     0.13  0.11  0.12     0.10  0.11  0.13     0.15  0.13  0.14
        120    0.08  0.12  0.07     0.07  0.12  0.11     0.04  0.17  0.12
        240    0.04  0.06  0.07     0.02  0.07  0.08     0.05  0.07  0.07
Ĵα      60     0.04  0.02  0.08     0.04  0.02  0.08     0.09  0.04  0.11
        120    0.06  0.08  0.07     0.06  0.09  0.08     0.04  0.10  0.08
        240    0.05  0.08  0.05     0.04  0.09  0.06     0.04  0.07  0.06
GOS     60     0.20  0.29  0.56     0.19  0.27  0.57     0.19  0.26  0.50
        120    0.12  0.12  0.20     0.12  0.15  0.18     0.10  0.19  0.19
        240    0.11  0.16  0.11     0.12  0.14  0.09     0.13  0.12  0.09
POET    60     0.13  0.20  0.32     0.15  0.19  0.34     0.19  0.20  0.35
        120    0.12  0.13  0.16     0.12  0.15  0.15     0.18  0.22  0.14
        240    0.08  0.15  0.08     0.09  0.13  0.08     0.07  0.19  0.12
BS      60     0.01  0.05  0.06     0.01  0.04  0.06     0.00  0.04  0.08
        120    0.01  0.05  0.03     0.01  0.06  0.03     0.00  0.05  0.02
        240    0.02  0.02  0.03     0.04  0.02  0.02     0.03  0.00  0.04
SD      60     0.07  0.07  0.16     0.08  0.05  0.14     0.10  0.07  0.15
        120    0.06  0.09  0.07     0.06  0.11  0.08     0.04  0.12  0.09
        240    0.05  0.07  0.06     0.03  0.09  0.06     0.04  0.07  0.06

Note: For additional information on the simulation designs, see Appendix C.3.

Table C.2: Power of the SW(P) and other tests, GARCH(1,1) errors
Power: $\alpha_i \sim IIDN(0,1)$ for $i = 1, \ldots, \lfloor N^{0.7} \rfloor$ and $\alpha_i = 0$ otherwise. Columns report N = 50, 100, 200 within each value of $\delta_\gamma$.

Test    T      δγ = 0               δγ = 1/4             δγ = 1/2
               50    100   200      50    100   200      50    100   200
SW(P)   60     0.79  0.98  1.00     0.76  0.97  1.00     0.73  0.97  1.00
        120    0.95  0.93  1.00     0.90  0.93  0.99     0.85  0.90  1.00
        240    0.99  0.99  1.00     0.99  0.98  1.00     0.98  0.99  1.00
Ĵα      60     0.71  0.88  0.93     0.70  0.86  0.92     0.57  0.78  0.86
        120    0.96  0.95  1.00     0.94  0.92  1.00     0.83  0.91  1.00
        240    1.00  1.00  1.00     1.00  1.00  1.00     0.99  1.00  1.00
GOS     60     0.86  0.93  1.00     0.84  0.93  0.99     0.74  0.90  0.97
        120    0.97  0.99  1.00     0.95  0.98  1.00     0.92  0.94  1.00
        240    1.00  1.00  1.00     1.00  1.00  1.00     1.00  1.00  1.00
POET    60     0.84  0.94  0.99     0.86  0.94  0.98     0.80  0.92  0.97
        120    0.97  0.99  1.00     0.97  0.99  1.00     0.96  0.98  1.00
        240    1.00  1.00  1.00     1.00  1.00  1.00     1.00  1.00  1.00
BS      60     0.28  0.24  0.28     0.28  0.27  0.28     0.24  0.22  0.19
        120    0.53  0.70  0.71     0.46  0.71  0.72     0.40  0.66  0.74
        240    0.83  0.97  0.96     0.85  0.97  0.95     0.82  0.94  0.95
SD      60     0.75  0.91  0.95     0.71  0.89  0.95     0.62  0.80  0.89
        120    0.96  0.97  1.00     0.94  0.95  1.00     0.82  0.91  1.00
        240    1.00  1.00  1.00     1.00  1.00  1.00     0.99  1.00  1.00

Note: For additional information on the simulation designs, see Appendix C.3.

C.4 List of Companies

Table C.3 lists the companies included in the empirical analysis of S&P 500 firms.

Table C.3: List of Companies used in Section 3.5.2

Ticker  Company                             GICS (Sector)
AAPL    Apple Inc.                          Information Technology
ABBV    AbbVie                              Health Care
ABT     Abbott Laboratories                 Health Care
ACN     Accenture plc                       Information Technology
ADBE    Adobe Inc.                          Information Technology
AEP     American Electric Power             Utilities
AIG     American International Group        Financials
ALL     Allstate Corp                       Financials
AMGN    Amgen Inc.                          Health Care
AMZN    Amazon.com Inc.                     Consumer Discretionary
APA     Apache Corporation                  Energy
AXP     American Express Co.                Financials
BA      Boeing Company                      Industrials
BAC     Bank of America Corp                Financials
BAX     Baxter International Inc.           Health Care
BK      The Bank of New York Mellon Corp.   Financials
BMY     Bristol-Myers Squibb                Health Care
C       Citigroup Inc.                      Financials
CAT     Caterpillar Inc.                    Industrials
CL      Colgate-Palmolive                   Consumer Staples
CMCSA   Comcast Corp.                       Consumer Discretionary
CME     CME Group Inc.                      Financials
COF     Capital One Financial               Financials
COP     ConocoPhillips                      Energy
COST    Costco Co.                          Consumer Staples
CRM     Salesforce Inc.                     Information Technology
CSCO    Cisco Systems                       Information Technology
CVS     CVS Caremark Corp.                  Consumer Staples
CVX     Chevron Corp.                       Energy
DD      Du Pont (E.I.)                      Materials
DELL    Dell Technologies Inc.              Information Technology
DIS     The Walt Disney Company             Consumer Discretionary
DVN     Devon Energy Corp.                  Energy
EBAY    eBay Inc.                           Information Technology
EMR     Emerson Electric                    Industrials
EXC     Exelon Corp.                        Utilities
F       Ford Motor                          Consumer Discretionary
FCX     Freeport-McMoran Cp & Gld           Materials
FDX     FedEx Corporation                   Industrials
GD      General Dynamics                    Industrials
GE      General Electric                    Industrials
GILD    Gilead Sciences                     Health Care
GM      General Motors                      Consumer Discretionary
GOOG    Google Inc.                         Information Technology
GS      Goldman Sachs Group                 Financials
HAL     Halliburton Co.                     Energy
HD      Home Depot                          Consumer Discretionary
HON     Honeywell Intl Inc.                 Industrials
HPQ     Hewlett-Packard                     Information Technology
IBM     International Bus. Machines         Information Technology
INTC    Intel Corp.                         Information Technology
INTU    Intuit Inc.                         Information Technology
ISRG    Intuitive Surgical, Inc.            Health Care
JNJ     Johnson & Johnson                   Health Care
JPM     JPMorgan Chase & Co.                Financials
KO      The Coca Cola Company               Consumer Staples
LLY     Lilly (Eli) & Co.                   Health Care
LMT     Lockheed Martin Corp.               Industrials
LOW     Lowes Cos.                          Consumer Discretionary
MCD     McDonalds Corp.                     Consumer Discretionary
MDLZ    Mondelez International              Consumer Staples
MDT     Medtronic Inc.                      Health Care
MET     MetLife Inc.                        Financials
META    Meta Platforms Inc.                 Information Technology
MMM     3M Company                          Industrials
MO      Altria Group Inc.                   Materials
MRK     Merck & Co.                         Health Care
MS      Morgan Stanley                      Financials
MSFT    Microsoft Corp.                     Information Technology
NEE     NextEra Energy Inc.                 Utilities
NFLX    Netflix Inc.                        Communication Services
NKE     NIKE Inc.                           Consumer Discretionary
NOV     National Oilwell Varco Inc.         Energy
NSC     Norfolk Southern Corp.              Industrials
ORCL    Oracle Corp.                        Information Technology
OXY     Occidental Petroleum                Energy
PEP     PepsiCo Inc.                        Consumer Staples
PFE     Pfizer Inc.                         Health Care
PG      Procter & Gamble                    Consumer Staples
PM      Philip Morris International         Consumer Staples
QCOM    QUALCOMM Inc.                       Information Technology
RTN     Raytheon Co.                        Industrials
SBUX    Starbucks Corp.                     Consumer Discretionary
SLB     Schlumberger Ltd.                   Energy
SO      Southern Co.                        Utilities
SPG     Simon Property Group Inc            Financials
T       AT&T Inc.                           Communication Services
TGT     Target Corp.                        Consumer Discretionary
TXN     Texas Instruments                   Information Technology
UNH     United Health Group Inc.            Health Care
UNP     Union Pacific                       Industrials
UPS     United Parcel Service               Industrials
USB     U.S. Bancorp                        Financials
V       Visa Inc.                           Information Technology
VZ      Verizon Communications              Information Technology
WFC     Wells Fargo                         Financials
WMT     Wal-Mart Stores                     Consumer Staples
XOM     Exxon Mobil Corp.                   Energy
Abstract
This dissertation comprises three scholarly articles that explore real-world econometric dependencies from both theoretical and empirical standpoints. The papers advance and broaden current methodologies for examining linear and non-linear relationships among variables, emphasizing the underlying structure connecting them.
The first paper explores the impact of unobserved ability on individuals' earnings relative to their years of schooling. Its contribution to the literature is to apply instrumental variable quantile regression to a sample of twins, distinguishing itself by accounting for both ability and measurement-error biases. The findings reveal returns to education ranging from 9 to 15 percent, despite challenges related to weak identification. The analysis confirms significant heterogeneity in individual earnings outcomes, employing a general Wald-type location-shift test to demonstrate the complementary effect of ability and schooling on earnings. Additionally, the paper examines the influence of positive ability bias and negative measurement-error bias, assesses the linearity of the returns to education, and analyzes the heterogeneity in returns associated with other factors, including age, race, gender, union membership, and tenure.
In the second paper, we propose a Bayesian approach to estimating dynamic factor-augmented vector autoregressive (VAR) models in which contemporaneous connections are represented as a graph of cross-sectional dependencies. Our approach first estimates the unobserved factors by principal component analysis for a predetermined number of factors, and then draws the factors by Gibbs sampling via the forward-filtering backward-sampling algorithm. Conditional on the estimated factors, we apply Bayesian graphical model selection to the residuals, so that the estimated factors are accounted for within the graphical VAR model; model comparison relies on the fractional Bayes factor, with an emphasis on graphical VAR models. We validate the effectiveness of our methodology through Monte Carlo simulations and apply it empirically to the cross-sectional dependencies in housing prices across 384 Metropolitan Statistical Areas in the U.S.
The third paper proposes a Bayesian approach for estimating large conditional precision matrices directly, rather than inverting conditional covariance matrices estimated by, for example, the dynamic conditional correlations (DCC) approach. By adopting a Wishart distribution and horseshoe priors within a DCC-GARCH(1,1) model, our method imposes sparsity and circumvents the inversion of conditional covariance matrices. We also employ a nonparanormal method with a rank transformation, allowing for conditional dependence without estimating the transformation functions needed to achieve Gaussianity. Monte Carlo simulations show that our approach is effective at estimating the conditional precision matrix, particularly when the number of variables N exceeds the number of observations T. We illustrate the utility of the proposed approach with two real-world applications: first, a study of conditional partial correlations among international stock price indexes; second, tests for alpha in the context of the CAPM and the Fama-French five-factor model, using a Wald-type test based on the conditional precision matrix. The results indicate that conditional partial correlations remain stable through market disruptions, while during such disruptions daily returns on blue chip stocks selected from the S&P 500 provide statistically significant evidence against the CAPM and the Fama-French five-factor model.
Asset Metadata
Creator
Song, Hayun (author)
Core Title
Three essays on linear and non-linear econometric dependencies
School
College of Letters, Arts and Sciences
Degree
Doctor of Philosophy
Degree Program
Economics
Degree Conferral Date
2024-05
Publication Date
04/01/2024
Defense Date
03/25/2024
Publisher
Los Angeles, California (original); University of Southern California (original); University of Southern California. Libraries (digital)
Tag
Bayesian estimation,Bayesian model selection,conditional partial correlations,factor augmented VAR,fixed effects,high-dimensional multivariate volatility,instrumental variable quantile regression,nonparanormal model,OAI-PMH Harvest,precision matrix,quantile regression,returns to schooling,semiparametric copula model
Format
theses (aat)
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Pesaran, Hashem (committee chair), Armstrong, Timothy (committee member), Hsiao, Cheng (committee member), Mukherjee, Gourab (committee member)
Creator Email
hayunson@usc.edu,salahiddin1016@gmail.com
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-oUC113862190
Unique identifier
UC113862190
Identifier
etd-SongHayun-12740.pdf (filename)
Legacy Identifier
etd-SongHayun-12740
Document Type
Dissertation
Rights
Song, Hayun
Internet Media Type
application/pdf
Type
texts
Source
20240401-usctheses-batch-1133 (batch); University of Southern California (contributing entity); University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright.
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email
cisadmin@lib.usc.edu