ESSAYS ON THE ECONOMETRICS OF PROGRAM EVALUATION

by

Shui-Ki Wan

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(ECONOMICS)

August 2010

Copyright 2010 Shui-Ki Wan

Acknowledgements

I would like to express my sincere gratitude to my advisor, Professor Cheng Hsiao, for his continuous support of my Ph.D. study and research, and for his patience, motivation, enthusiasm, and immense knowledge. His guidance has helped me tremendously throughout the five years. Without his kindest support, it would have been impossible for me to finish this dissertation.

I also wish to express my warm and sincere thanks to the rest of my thesis committee, Professor Robert Dekle and Professor Lan Luo, for their insightful comments and hard questions.

My sincere thanks also go to Professor Harrison Cheng, Professor John Strauss, and Professor Vivian Wu for offering me constructive comments at the early stage of my research.

Finally, I would like to express my sincerest gratitude to my mother for her understanding and endless love throughout the duration of my studies.
Table of Contents

Acknowledgements
List of Tables
List of Figures
Abstract
Chapter 1: Review of Literature
  1.1 Introduction
  1.2 Counterfactual Framework
  1.3 Average Treatment Effect
  1.4 Randomized Experiments versus Observational Data
    1.4.1 Estimation Methods under Unconfoundedness
    1.4.2 Regression Methods
    1.4.3 Methods based on Propensity Score
    1.4.4 Difference-in-Difference Estimator
  1.5 Selection on Unobservables
    1.5.1 Instrumental Variable Method
  1.6 Concluding Remarks
Chapter 2: Measuring Policy Effect using Panel Data
  2.1 Introduction
  2.2 The Basic Model
  2.3 A Panel Approach to Construct Counterfactual
  2.4 Inference of Treatment Effect
  2.5 Choice of Cross-Sectional Units
    2.5.1 Modeling Strategy
    2.5.2 Monte Carlo Studies
  2.6 Concluding Remark
Chapter 3: Empirical Applications to Policy Evaluation
  3.1 Background
  3.2 Data
  3.3 Empirical Analysis
  3.4 Concluding Remarks
Chapter 4: Forecast Combination
  4.1 Introduction
  4.2 Basic Framework
  4.3 Literature Review
    4.3.1 Bates and Granger
    4.3.2 Granger and Ramanathan
    4.3.3 Variance-Covariance Approach
  4.4 A Model Selection Approach
  4.5 Mean and Mean-and-Scale Corrected Simple Averaging
  4.6 Monte Carlo Studies
  4.7 Concluding Remark
Chapter 5: Empirical Applications to Forecast Combination
  5.1 Predicting Real Output Growth
  5.2 Predicting Excess Equity Premium
  5.3 Concluding Remark
Chapter 6: Conclusion
References
Appendices
  Appendix A
  Appendix B

List of Tables

Table 1: Optimal Choice of m and the Average MSPE, 2 Stationary Factors
Table 2: Frequency Distribution of Optimal Number of m, 2 Stationary Factors
Table 3: Optimal Choice of m and the Average MSPE, 3 Stationary Factors
Table 4: Frequency Distribution of Optimal Number of m, 3 Stationary Factors
Table 5: Optimal Choice of m and the Average MSPE, i.i.d. Factor
Table 6: Frequency Distribution of Optimal Number of m, i.i.d. Factors
Table 7: Optimal Choice of m and the Average MSPE, Nearly Non-Stationary Factor
Table 8: Frequency Distribution of Optimal Number of m, Nearly Non-Stationary Factor
Table 9: Prediction comparison of model (1)
Table 10: Prediction comparison of model (2)
Table 11: Prediction comparison of model (3)
Table 12: Prediction comparison of model (4)
Table 13: Prediction comparison of model (5)
Table 14: Prediction comparison of model (6)
Table 15: Prediction comparison of model (7)
Table 16: Prediction comparison of model (8)
Table 17: Prediction comparison of model (9)
Table 18: Prediction comparison of model (10)
Table 19: Prediction comparison of model (11)
Table 20: AICC - Weights of Control Groups for the Period 1993Q1-1997Q2
Table 21: AICC - Treatment Effect of Political Integration, 1997Q3-2003Q4
Table 22: AIC - Weights of Control Groups for the Period 1993Q1-1997Q2
Table 23: AIC - Treatment Effect of Political Integration, 1997Q3-2003Q4
Table 24: AICC - Weights of Control Groups for the Period 1993Q1-2003Q4
Table 25: AICC - Treatment Effect of Political Integration, 2004Q1-2008Q1
Table 26: AIC - Weights of Control Groups for the Period 1993Q1-2003Q4
Table 27: AIC - Treatment Effect of Political Integration, 2004Q1-2008Q1
Table 28: Sample Variance-Covariance Matrix of the Prediction Errors of 15 Models
Table 29: MSPE with Factor Loading(s) N(0,1) and Factor(s) i.i.d. N(0,1)
Table 30: MSPE with Factor Loading(s) U(-2,2) and Factor(s) i.i.d. N(0,1)
Table 31: MSPE with Factor Loading(s) N(1,1) and Factor(s) i.i.d. N(0,1)
Table 32: MSPE with Factor Loading(s) Normal (0,1)
Table 33: MSPE of US Real GDP Prediction
Table 34: MSPE of Predicting Excess Equity Premium on S&P500

List of Figures

Figure 1: AICC - Actual and Predicted Real GDP from 1993Q1 to 1997Q2
Figure 2: AICC - Actual and Counterfactual Real GDP from 1997Q3 to 2003Q4
Figure 3: AICC - Autocorrelation of Treatment Effect from 1997Q3 to 2003Q4
Figure 4: AIC - Actual and Predicted Real GDP from 1993Q1 to 1997Q2
Figure 5: AIC - Actual and Counterfactual Real GDP from 1997Q3 to 2003Q4
Figure 6: AIC - Autocorrelation of Treatment Effect from 1997Q3 to 2003Q4
Figure 7: AICC - Actual and Predicted Real GDP from 1993Q1 to 2003Q4
Figure 8: AICC - Actual and Counterfactual Real GDP from 2004Q1 to 2008Q1
Figure 9: AICC - Autocorrelation of Treatment Effect from 2004Q1 to 2008Q1
Figure 10: AIC - Actual and Predicted Real GDP from 1993Q1 to 2003Q4
Figure 11: AIC - Actual and Counterfactual Real GDP from 2004Q1 to 2008Q1
Figure 12: AIC - Autocorrelation of Treatment Effect from 2004Q1 to 2008Q1
Figure 13: Actual and Predicted US Real GDP under fixed forecasting framework
Figure 14: Actual and Predicted US Real GDP under continuously updating forecasting framework
Figure 15: Actual and Predicted US Real GDP under rolling forecasting framework
Figure 16: Actual and Predicted Excess Equity Premium on S&P500 under fixed forecasting framework
Figure 17: Actual and Predicted Excess Equity Premium on S&P500 under recursive framework
Figure 18: Actual and Predicted Excess Equity Premium on S&P500 under rolling framework
Figure 19: Predicted Counterfactual for Political Integration from 1993Q1 to 1997Q2
Figure 20: Predicted Counterfactual for Political Integration from 1997Q3 to 2003Q4
Figure 21: Counterfactual for Economic Integration using Approximate Factor Model
Figure 22: Counterfactual for Economic Integration using Approximate Factor Model

Abstract

Much of the empirical literature
in economics, the social sciences, and medicine studies the causal effects of programs, policies, or drugs. In the economic context, the major focus of the program evaluation literature is to measure the impact of a particular treatment on a set of individuals, regions, or countries exposed to that treatment. Such measurement is of particular importance to policy makers, medical practitioners, and others. In order to evaluate the effect of a treatment, Rubin (1974) proposed interpreting causal statements as comparisons of so-called potential outcomes, defined as the pair of outcomes associated with a particular individual under different levels of exposure to the treatment, with only one of the outcomes observed by researchers. Models are developed for this pair of potential outcomes: one for the treated state, another for the control state. In this thesis, a panel data methodology is proposed under the potential outcomes framework to measure the average treatment effect. This methodology is applied to measure the impact of two major Hong Kong policies on the Hong Kong economy. Because of the model uncertainty involved in forming the counterfactual, forecast combination methods are proposed as a way to reduce this uncertainty. Various existing methods, together with some newly proposed methods, are evaluated in small-scale Monte Carlo studies and two applications.

Chapter 1: Review of Literature

1.1 Introduction

Much of the empirical literature in economics, the social sciences, and medicine studies the causal effects of programs, policies, or drugs. In the economic context, the major focus of the program evaluation literature is to measure the impact of a particular treatment on a set of individuals, regions, or countries exposed to that treatment. Such measurement is of particular importance to policy makers, medical practitioners, and others. For instance, LaLonde (1986) used experimental data to evaluate the effectiveness of a job training program on labor earnings.
Our interest is to compare the outcomes of the units in the treated and non-treated states. However, the outcomes of an individual cannot be simultaneously observed under both states, a situation referred to by Holland (1986) as the fundamental problem of causal inference. In order to evaluate the effect of the treatment, Rubin (1974) proposed interpreting causal statements as comparisons of so-called potential outcomes, defined as the pair of outcomes associated with a particular individual under different levels of exposure to the treatment, with only one of the outcomes observed by researchers. Models are developed for this pair of potential outcomes: one for the treated state, another for the control state.

1.2 Counterfactual Framework

Since the seminal proposal of the potential outcomes approach by Rubin (1974), also called the counterfactual framework, much research has been carried out under this framework in both the statistics and econometrics literature. This review section draws heavily on Wooldridge (2002) and Imbens and Wooldridge (2009). The major elements of Rubin's counterfactual framework are the potential outcomes. Let d_i be the treatment dummy for each individual i. If individual i receives treatment, then d_i = 1; the realized outcome y_i1 is observed and y_i0 becomes the potential outcome. When d_i = 0, y_i0 is realized and y_i1 is the potential outcome, where i = 1,...,N. Therefore, the observed outcome can be written as

    y_i = (1 - d_i) y_i0 + d_i y_i1.    (1)

One of the merits of the counterfactual framework is that it allows us to define the treatment effect at the unit level, either as the ratio of the outcomes, y_i1 / y_i0, or as the difference between the outcomes with and without treatment, y_i1 - y_i0. The latter is more common in the literature and will be our focus. Since y_i1 - y_i0 is a random variable, we have to specify which properties of its distribution we are interested in.
Under the potential outcomes framework, we can define our treatment effect of interest before specifying the mechanism of treatment assignment and before making functional-form or distributional assumptions. In contrast, it is more difficult to define the treatment effect in terms of the dummy regression approach, which takes the form y_i = α + τ d_i + ε_i: it is unclear whether the treatment effect in the latter approach is constant or not, and what the properties of the unobserved component are. Under the counterfactual framework, however, the modeling of the potential outcomes and of the treatment assignment mechanism are separated. It also allows the researcher to first define the treatment effect of interest without considering the probabilistic properties of the outcomes or of the assignment.

1.3 Average Treatment Effect

The traditional dummy regression approach implicitly assumes that the treatment effect is constant across individuals. The potential outcomes framework, however, allows general heterogeneity in treatment effects. Several estimands for the treatment effect have been proposed; how to choose among these measures depends on the question we want to answer. The most common ones in the literature are the average treatment effect and the average treatment effect on the treated, the difference between which lies in the definition of the underlying population.

One of the common estimands is the average treatment effect (ATE). It measures the expected treatment effect for a person drawn randomly from the population:

    ATE ≡ E(y_i1 - y_i0).    (2)

When we are interested in the effect on the treated group, for example when we want to study the effect of a drug on the patients who took it, we can measure the average treatment effect on the treated (ATET), which is defined as

    ATET ≡ E(y_i1 - y_i0 | d_i = 1).    (3)

1.4 Randomized Experiments versus Observational Data

Whether the ATE or ATET can be identified, and which estimators are consistent and asymptotically normal, depends on the type of data and the assumptions in the context we want to study.
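To make the distinction between the two estimands concrete, here is a minimal simulation sketch (the data-generating process and all numbers are hypothetical, not from the thesis): when take-up is positively correlated with individual gains, the ATET exceeds the ATE.

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical population: individual gains y1 - y0 vary across units, and
# units with larger gains are more likely to be treated, so ATE != ATET.
n = 100_000
gain = rng.normal(1.0, 1.0, n)              # heterogeneous treatment effects
y0 = rng.normal(0.0, 1.0, n)
y1 = y0 + gain
d = rng.random(n) < 1.0 / (1.0 + np.exp(-gain))   # take-up increasing in the gain

ate = (y1 - y0).mean()                      # estimand (2): population average gain
atet = (y1 - y0)[d].mean()                  # estimand (3): average gain among treated
```

With these (invented) parameters the ATE is close to 1.0 while the ATET is noticeably larger, because the treated subpopulation over-represents high-gain units. Both quantities are computable only in a simulation, since real data never reveal y1 - y0 at the unit level.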
In the analysis of randomized experiments, the simple averages

    ÂTE = (1/N) Σ_{i=1}^N (y_i1 - y_i0),
    ÂTET = (Σ_{i=1}^N d_i)^{-1} Σ_{i=1}^N d_i (y_i1 - y_i0)

would yield consistent estimates of the ATE and ATET. However, only one of the two outcomes is observable, so estimation calls for some assumptions. For instance, if treatment d_i is randomly assigned to individual i, that is, if d_i is statistically independent of the potential outcomes y_i0 and y_i1, which can be expressed mathematically as

    d ⊥ (y_0, y_1),    (4)

then the ATE and ATET can be identified and are equal to E(y | d = 1) - E(y | d = 0) = E(y_1) - E(y_0). This result also holds under the weaker assumption of mean independence:

    E(y_1 | d) = E(y_1),    (5)
    E(y_0 | d) = E(y_0).

1.4.1 Estimation Methods under Unconfoundedness

Randomized experiments are seldom available in economics. When the treatment assignment mechanism is not truly randomized and is related to some observed covariates x_i, the simple averaging estimator is inconsistent, since it is subject to sample selection bias on the observables. To identify the ATE, Rosenbaum and Rubin (1983) assume ignorability in the form of conditional mean independence:

    E(y_0 | x, d) = E(y_0 | x),    (6)
    E(y_1 | x, d) = E(y_1 | x).    (7)

For the ATET, only (6) is needed for identification. These conditions mean that even though treatment d is not statistically mean independent of (y_0, y_1), they are uncorrelated after partialling out x. If x is controlled for, there is no omitted-variable bias and thus no confounding; the condition is therefore also called the unconfoundedness assumption. Based on the observational data {y_i, x_i, d_i; i = 1,...,N}, E(y | x_i, d_i = 1) and E(y | x_i, d_i = 0) can be estimated nonparametrically under (6) and (7) by

    ÂTE = (1/N) Σ_{i=1}^N [ĝ_1(x_i) - ĝ_0(x_i)],    (8)
    ÂTET = (Σ_{i=1}^N d_i)^{-1} Σ_{i=1}^N d_i [ĝ_1(x_i) - ĝ_0(x_i)],    (9)

where ĝ_0(x_i) and ĝ_1(x_i) are consistent estimators of E(y_0 | x_i) and E(y_1 | x_i) respectively.

1.4.2 Regression Methods

We can also relate the estimator (8) of the ATE under the potential outcomes framework to the standard dummy regression methods.
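As a concrete illustration of why randomization licenses a simple comparison of group means, the following sketch simulates hypothetical potential outcomes with a known treatment effect of 2.0 (all numbers are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical potential outcomes with a constant treatment effect of 2.0.
n = 10_000
y0 = rng.normal(0.0, 1.0, n)
y1 = y0 + 2.0
d = rng.integers(0, 2, n)            # randomized assignment: d independent of (y0, y1)
y = np.where(d == 1, y1, y0)         # only one potential outcome is ever observed

# Under independence assumption (4), the difference in observed group means
# identifies the ATE, even though no unit reveals both y0 and y1.
ate_hat = y[d == 1].mean() - y[d == 0].mean()
```

Because d carries no information about (y0, y1), the treated and control groups are comparable by construction, and ate_hat converges to the true value 2.0 as n grows.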
The potential outcomes (y_0, y_1) are modelled as

    y_0 = μ_0 + v_0,    (10)
    y_1 = μ_1 + v_1,    (11)

where μ_j = E(y_j), j = 0, 1. If the conditional mean independence assumptions (6) and (7) hold, and in addition

    E(v_1 | x) = E(v_0 | x)    (12)

holds, then both the ATE and ATET are identified and equal to each other. Moreover, if the conditional mean of the error v_0 is assumed to be some function of x, then the coefficient on the treatment dummy obtained by simple OLS regression equals the ATE. This can be seen by writing the observed outcome as

    y = μ_0 + (μ_1 - μ_0) d + v_0 + d (v_1 - v_0).    (13)

Under the conditional mean independence assumptions,

    E(y | d, x) = E(y | x) = μ_0 + ATE·d + E(v_0 | x) + [E(v_1 | x) - E(v_0 | x)] d,    (14)

and the last interaction term disappears under assumption (12). If E(v_0 | x) = η_0 + h_0(x)'γ_0 for some vector control function h_0(x), then

    E(y | d, x) = α_0 + τ d + h_0(x)'γ_0,    (15)

where α_0 = μ_0 + η_0. This suggests that an OLS regression of the observed outcome y on a constant, the treatment dummy, and a control function of x yields a consistent estimator of the ATE, namely τ. The same coefficient also equals the ATET when assumption (12) holds.

When (12) does not hold, i.e. when there is some individual-specific gain from the treatment, the ATE can still be obtained under the standard dummy regression approach if E(v_1 | x) = η_1 + h_1(x)'γ_1 is assumed, where h_1(x) is another vector control function. Recall that (14) is written as

    E(y | d, x) = E(y | x) = μ_0 + τ d + E(v_0 | x) + [E(v_1 | x) - E(v_0 | x)] d.

It suggests that the ATE, the coefficient on d, can be consistently estimated by an OLS regression of y on {1, d_i, x_i, d_i(x_i - x̄)} if h_j is linear in x, j = 0, 1.

1.4.3 Methods based on Propensity Score

Rosenbaum and Rubin (1983) show that the independence assumption between the potential outcomes and the treatment dummy still holds after conditioning solely on the propensity score p(x), instead of on the whole vector x, where p(x_i) = P(d_i = 1 | x_i). The propensity score measures the probability that individual i receives treatment given his own observed covariates.
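To see why this single scalar summary is useful, here is a minimal simulation sketch (the data-generating process is hypothetical): reweighting observations by the true p(x) removes the selection bias that contaminates a naive comparison of means, anticipating the weighting estimators discussed below.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical observational data: units with larger x are more likely to be
# treated, so the naive difference in means is biased upward. True ATE = 2.0.
n = 50_000
x = rng.normal(0.0, 1.0, n)
p = 1.0 / (1.0 + np.exp(-x))                  # true propensity score p(x)
d = (rng.random(n) < p).astype(float)
y = 1.0 + x + 2.0 * d + rng.normal(0.0, 1.0, n)

naive = y[d == 1].mean() - y[d == 0].mean()   # contaminated by selection on x

# Weighting by the propensity score removes the bias. The true p(x) is used
# here for simplicity; in practice it would be estimated, e.g. by a logit.
ate_ipw = np.mean((d - p) * y / (p * (1.0 - p)))
```

The weighted term equals d*y/p(x) - (1-d)*y/(1-p(x)), so over-represented groups are down-weighted and under-represented groups up-weighted, restoring comparability.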
If the balancing, or overlap, condition holds,

    0 < p(x) < 1 for all x,

then the ATE and ATET can be estimated nonparametrically by

    ÂTE = (1/N) Σ_{i=1}^N [d_i - p̂(x_i)] y_i / { p̂(x_i) [1 - p̂(x_i)] },    (16)
    ÂTET = [(1/N) Σ_{i=1}^N d_i]^{-1} (1/N) Σ_{i=1}^N [d_i - p̂(x_i)] y_i / [1 - p̂(x_i)].    (17)

Other estimators utilizing p(x) in alternative ways have been proposed. For example, the matching estimator proposed by Rosenbaum and Rubin (1983) suggests that the average treatment effect can be derived by comparing treated and control units with similar propensity scores.

1.4.4 Difference-in-Difference Estimator

Since the seminal work by Ashenfelter (1978), the method of Difference-in-Differences (DID) has frequently been used in comparative studies. In this setup, outcomes in two time periods are observed for the treated and control groups. The before- and after-treatment outcomes are denoted y_ib and y_ia respectively. In the first period, neither group receives treatment, i.e. d_ib = 0 for all i. In the second period, treatment is assigned so that d_ia = 1 for the treated group and d_ia = 0 for the controls. Then the average change in the outcomes of the control group, E(y_ia - y_ib | d_ia = 0), is subtracted from the average change in the outcomes of the treated group, E(y_ia - y_ib | d_ia = 1):

    DID = E(y_ia - y_ib | d_ia = 1) - E(y_ia - y_ib | d_ia = 0).    (18)

However, there is an underlying assumption that the time trends of the two groups in the absence of the intervention are the same.

1.5 Selection on Unobservables

The above estimation methods are based on the assumption that, after controlling for x_i, there are no unobservables affecting both the treatment dummy and the potential outcomes. If this assumption does not hold, these methods do not yield consistent estimators of the average treatment effect.

1.5.1 Instrumental Variable Method

Imbens and Angrist (1994) proposed another estimand called the local average treatment effect (LATE). Let z_i be a scalar binary instrument that determines d_i. For example, z_i may be eligibility to join a program.
If z_i = 1, individual i is eligible to join; z_i = 0 otherwise. Even when an individual is eligible, he may choose not to join the program, in which case the potential treatment d_i1 equals 0; if he is eligible and joins, then d_i1 = 1. The potential treatment dummy d_i0 is defined similarly. Thus, the observed treatment dummy is

    d_i = (1 - z_i) d_i0 + z_i d_i1.

For identification, the monotonicity assumption has to hold:

    d_i1 ≥ d_i0 for all i.

The estimand is defined as LATE = E(y_i1 - y_i0 | d_i1 - d_i0 = 1). It can be interpreted as the average treatment effect for those who would be induced to join by changing z from 0 to 1. There are two major drawbacks associated with this estimator. First, the subpopulation is unidentified, since we do not know which individuals i satisfy the condition d_i1 - d_i0 = 1. Second, its interpretation depends on the instrument.

1.6 Concluding Remarks

The aforementioned theoretical and empirical studies are based on cross-sectional data, except DID, which considers two periods. Although the counterfactual framework allows for heterogeneity across individuals, its average treatment effect is assumed to be constant over time. Moreover, consistency of the estimators hinges on the conditional mean independence assumption, which is not testable. If there is selection bias on unobservables, the only solution is to find instruments for the endogenous treatment dummy. But then it is not clear which subpopulation the identified ATE is associated with, and the definition of the treatment effect depends on the particular instrumental variable used. In Chapter 2, we propose a panel data methodology that constructs the counterfactual by utilizing the correlation structure hidden in the panel data to identify the average treatment effect. We also allow the treatment effect to vary over time, under a weaker assumption than the conditional mean assumption.
Chapter 3 uses this methodology to measure the impact of the political and economic integration between Hong Kong and China on the economy of Hong Kong. A combination of forecasts aimed at reducing the mean squared prediction error of the counterfactual is proposed in Chapter 4. Predictions of US real GDP and the equity premium on the S&P 500 using various combination methods are studied in Chapter 5. Chapter 6 concludes.

Chapter 2: Measuring Policy Effect using Panel Data

2.1 Introduction

This chapter proposes a panel data methodology under the counterfactual framework to measure the impact of the political and economic integration of Hong Kong with China on Hong Kong's economy. As mentioned in Chapter 1, the difficulty of measuring the economic impact of a policy intervention using observational data is that one cannot simultaneously observe the outcomes of each individual both under and not under the intervention (e.g. Heckman and Hotz (1989), Rosenbaum and Rubin (1983)). Panel data, with observations on a number of individuals over time, often contain information on some individuals that are subject to a policy intervention and some that are not. If the reactions of individuals to policy changes are similar (e.g. Hsiao (2003), Hsiao and Tahmiscioglu (1997)), or even if their responses are different, as long as they are driven by some common factors (e.g. Gregory and Head (1999), Sargent and Sims (1977)), information on the individuals not subject to the policy intervention can help to construct the counterfactuals of those who are.

2.2 The Basic Model

In the panel data setup, denote by y^1_it and y^0_it the pair of potential outcomes of the i-th unit at time t with and without treatment respectively.
However, we do not observe y^0_it and y^1_it simultaneously, but only the observed data

    y_it = d_it y^1_it + (1 - d_it) y^0_it,    (19)

where the time-varying treatment dummy is

    d_it = 1 if the i-th unit is under treatment at time t, and d_it = 0 otherwise.    (20)

Therefore, we need to model the counterfactual in order to estimate the treatment effect for the i-th unit at time t,

    Δ_it = y^1_it - y^0_it.    (21)

Suppose that we have a sample {y_it; i = 1,...,N, t = 1,...,T}. Without loss of generality, let the first unit be the only individual that receives treatment, from time T_1 + 1 onwards:

    y_1t = y^0_1t for t = 1,...,T_1;  y_1t = y^1_1t for t = T_1 + 1,...,T.    (22)

The observed outcomes of the other units i = 2,...,N, which are not subject to the treatment, are

    y_it = y^0_it for i = 2,...,N and t = 1,...,T.    (23)

We assume the correlations among cross-sectional units are due to some common factors that drive all cross-sectional units, although their impacts on each cross-sectional unit may differ. The potential outcomes y^0_it in the absence of treatment are assumed, for all units, to be generated by a factor model of the form used in Forni and Reichlin (1998), Gregory and Head (1999), etc.:

    y^0_it = α_i + b_i' f_t + ε_it,  i = 1,...,N, t = 1,...,T,    (24)

where α_i denotes the fixed individual-specific effect, f_t denotes a K×1 vector of unobserved common factors that vary over time, b_i denotes a K×1 vector of constants that varies across i, and ε_it denotes the i-th unit's random idiosyncratic component, with E(ε_it) = 0. Stacking y^0_it over units at time t into an N×1 vector yields

    y^0_t = α + B f_t + ε_t,    (25)

where y^0_t = (y^0_1t,...,y^0_Nt)', α = (α_1,...,α_N)', ε_t = (ε_1t,...,ε_Nt)', and B = (b_1,...,b_N)' is an N×K factor loading matrix. We make the following assumptions:

(a) ‖b_i‖ = c_i < ∞ for all i.
(b) ε_t is I(0) with E(ε_t) = 0, and E(ε_t ε_t') = V is a diagonal constant matrix.
(c) E(ε_t f_t') = 0.
(d) rank(B) = K.
(e) E(ε_js | d_it) = 0 for j ≠ i.
Remark 2.1: The stacked model (25) assumes that an individual outcome is the sum of two components: a function of some common time-varying factors f_t that drive all cross-sectional units, and an idiosyncratic component consisting of the individual-specific effect α_i and a random component ε_it. We assume the idiosyncratic components are uncorrelated across individuals; the correlation across individuals is captured by the common factors f_t. However, the impact of the common factors f_t on individuals can be heterogeneous, since we allow b_i ≠ b_j for i ≠ j.

Remark 2.2: We make no assumption on the time series properties of f_t. It can be non-stationary, or it can be stationary with lim_{T→∞} (1/T) Σ_{t=1}^T ‖f_t‖² = constant.

Remark 2.3: Assumption (d) implies that the number of observable cross-sectional units, N, is greater than the number of common time-varying factors, K. This assumption is not restrictive in applications, because the number of common factors driving many macroeconomic time series is usually not large, as demonstrated by Sargent and Sims (1977), Stock (2002, 2005) and Stock and Watson (1989).

Remark 2.4: Assumption (e) makes no claim about the relationship between d_it and ε_it. If they are correlated, then the observed data are subject to selection on unobservables (e.g. Heckman and Vytlacil (2001)). If they are mean independent, then the observed data satisfy the conditional mean independence assumption of Rosenbaum and Rubin (1983). All our approach needs is that the j-th unit's idiosyncratic component is mean independent of d_it for j ≠ i.

2.3 A Panel Approach to Construct Counterfactual

To estimate the time-varying treatment effect, all we need is to form the counterfactual by predicting y^0_1t. If we can identify α_1, b_1' and f_t under assumptions (a) to (e), then ŷ^0_1t = α_1 + b_1' f_t for t = T_1 + 1,...,T.
If both N and T are large, we may use the procedure of Bai and Ng (2002) to identify the number of common factors K and estimate f_t by the maximum likelihood procedure. Often, however, neither N nor T is large. In this situation, we suggest using the other (control) units x_t = (y_2t,...,y_Nt)' in lieu of f_t to predict y^0_1t. Let η be an N×1 vector lying in the null space of B', N(B'). We normalize the first element of η to be 1 and write η' = (1, -γ'). If η ∈ N(B'), then η'B = 0', and (25) becomes

    y^0_1t = δ + γ' x_t + ε_1t - γ' ε̃_1t,    (26)

where δ = η'α and ε̃_1t = (ε_2t,...,ε_Nt)'. Equation (26) suggests that we can use x_t instead of f_t to predict y^0_1t. Then, for any η ∈ N(B'),

    y^0_1t = E(y^0_1t | x_t) + u_1t.    (27)

Taking the expectation of (26) conditional on x_t yields

    E(y^0_1t | x_t) = δ + γ' x_t + E(ε_1t | x_t) - γ' E(ε̃_1t | x_t)
                    = a + c' x_t,    (28)

where

    c' = γ' [ I_{N-1} - Cov(ε̃_1t, x_t) Var(x_t)^{-1} ],    (29)
    u_1t = η' ε_t + γ' Cov(ε̃_1t, x_t) Var(x_t)^{-1} x_t.    (30)

The Var and Cov functions denote the long-run variance of x_t and the covariance between the control units' outcomes x_t and their errors ε̃_1t. The variance of y^0_1t given x_t, for η ∈ N(B'), is

    Var(y^0_1t | x_t) = Var(ε_1t) + γ' [ Var(ε̃_1t) - Cov(ε̃_1t, x_t) Var(x_t)^{-1} Cov(x_t, ε̃_1t) ] γ.    (31)

We note that (a, c') depends on η, for any η ∈ N(B'). Since the minimum variance predictor depends on the choice of η and on the covariance structure of ε̃_1t, we propose to choose (a, c') to minimize

    (1/T_1) Σ_{s=1}^{T_1} Σ_{t=1}^{T_1} a_st (y^0_1s - a - c' x_s)(y^0_1t - a - c' x_t),    (32)

where {a_st; s,t = 1,...,T_1} are the elements of a T_1×T_1 positive definite weighting matrix A. For identification, we further assume:

(f) For fixed K and N, there exists an η ∈ N(B') such that, in the neighborhood of (a, c'),

    E[ (1/T_1) Σ_{s=1}^{T_1} Σ_{t=1}^{T_1} a_st (y^0_1s - a - c' x_s)(y^0_1t - a - c' x_t) ]    (33)

has a unique minimum.

Lemma 1: Under assumptions (a) to (f), the solution (â, ĉ') of (32) converges to an (a, c') that corresponds to an η ∈ N(B') as T_1 → ∞.

Proof: From (27) and (28), we have y^0_1t = a + c' x_t + u_1t and E(u_1t | x_t) = 0.
Therefore, the minimum distance regression of y^0_1t on a constant and x_t yields consistent estimators of (a, c') as T_1 → ∞ (e.g. Amemiya (1985)).

Remark 2.5: The null vectors in N(B') are not unique. However, for a given A in the objective function (32), the solution is unique. When A = I, our objective is to obtain the minimum variance predictor of y^0_1t given x_t.

Remark 2.6: Although δ = η'α depends on the choice of η, it is just an unknown finite constant under assumption (a) in the regression model (32); therefore it can be treated as an unknown in (32).

Lemma 1 suggests that we can form the counterfactual for y^0_1t by

    ŷ^0_1t = â + ĉ' x_t.    (34)

Therefore, we may predict Δ_1t by

    Δ̂_1t = y_1t - ŷ^0_1t for t = T_1 + 1,...,T.    (35)

Lemma 2: Under assumptions (a) to (f), the time-varying average treatment effect satisfies

    E(Δ̂_1t | x_t) = Δ_1t,  t = T_1 + 1,...,T,    (36)

and its variance is

    Var(Δ̂_1t) = Var(u_1t) + (1, x_t') Cov((â, ĉ')') (1, x_t')'.    (37)

Proof: Under assumptions (a) to (f),

    E((â, ĉ')' | x_t) = (a, c')'.    (38)

Lemma 2 then follows from (28) and (31).

Remark 2.7: Although the original setup for the counterfactual is y^0_1t = α_1 + b_1' f_t + ε_1t, t = T_1 + 1,...,T, our counterfactual predictor (34) does not depend on the individual-specific effect α_1, the common factors f_t, the individual-specific response b_1' to the time-varying common factors, the idiosyncratic component ε_1t, or the dimension of f_t. The information provided by f_t is embedded in x_t. It follows that the predictor Δ̂_1t in (35), which uses x_t in lieu of f_t, allows the evaluation of policy interventions without the need to identify f_t or B, which may be difficult in finite samples.

Remark 2.8: As stated in Remark 2.4, (e) makes no assumption about ε_is and d_it. All we need is that the policy intervention on the i-th unit has no bearing on ε_jt for j ≠ i.
Hence, if the process (25) satisfies assumptions (a) to (e), our proposed approach allows us to bypass the selection issue that has been a central concern in the program evaluation literature (e.g. Heckman and Hotz (1989), Heckman and Vytlacil (2001)).

Remark 2.9: When (T - T_1) is large, given (e), one can reverse the procedure and predict y^1_1t by E(y^1_1t | x_t) for t = 1,...,T_1, where E(y^1_1t | x_t) may be approximated by

    ŷ^1_1t = â + ĉ' x_t,  t = 1,...,T_1,    (39)

and construct the treatment effect had the policy intervention been in place before T_1, Δ̂_1t = ŷ^1_1t - y_1t, t = 1,...,T_1, where â and ĉ are estimated using data from t = T_1 + 1,...,T.

Remark 2.10: The synthetic control method for comparative case studies also uses information on other individuals to construct the counterfactuals of treated individuals (e.g. Abadie and Gardeazabal (2003), Card and Krueger (1994)). However, the focus and the approach are different. The synthetic approach assumes (e.g. Abadie, Diamond and Hainmueller (2007)) that

    y^0_it = δ_t + z_i' θ_t + μ_i' λ_t + ε_it, for i = 2,...,N, t = 1,...,T_1, T_1 + 1,...,T,    (40)

for the control units, while for the first (treated) unit they assume that y_1t follows (40) for t = 1,...,T_1, and that for t = T_1 + 1,...,T, y_1t equals

    y^1_1t = Δ_1t + δ_t + z_1' θ_t + μ_1' λ_t + ε_1t, for t = T_1 + 1,...,T,    (41)

where δ_t is an unknown common factor with constant factor loadings across units, z_i is an r×1 vector of observed covariates (not affected by the intervention), θ_t is an r×1 vector of unknown parameters, λ_t is a K×1 vector of unobserved common factors, μ_i is a K×1 vector of unknown factor loadings, and Δ_1t is the treatment effect for the first unit.

If we let α_i = 0, b_i' = (1, z_i', μ_i') and f_t = (δ_t, θ_t', λ_t')' in (24), model (40) can be put in the form of (24). However, for (26) to hold, we need η'B = 0', which imposes the restrictions Σ_{i=1}^N η_i = 0, Σ_{i=1}^N η_i z_i' = 0' and Σ_{i=1}^N η_i μ_i' = 0'. Suppose that a sample of y_it and of an r×1 vector z_it, for i = 1,...,N and t = 1,...,T, is available.
The synthetic control method uses a weighted average of the other control units as a predictor for $y^0_{1t}$, i.e., $\hat y^0_{1t}=\sum_{i=2}^N w_i y_{it}$. Since there are $K$ unobserved factors in their setup, their objective function involves augmenting the sample by constructing some $M\geq K$ variables. For instance, $\bar y^m_i=\sum_{s=0}^{T_1}k^m_s y_{is}$, $m=1,\ldots,M$, can be constructed. (If $k^m_s=\frac{1}{T_1}$ for $s=1,\ldots,T_1$, then $\bar y^m_j$ is just a simple pre-intervention time average.) In their example of evaluating the effects of a large-scale tobacco control program in California that was implemented in 1988, they augmented the model with 3 years of lagged variables of $y_{it}$, $\{y_{i,75},y_{i,80},y_{i,88}\}$, that is, $M=3$, by setting the weights $k^1_{75}=1$, $k^2_{80}=1$, $k^3_{88}=1$ with all other $k^m_s=0$. The weights $\mathbf{w}'=(w_2,\ldots,w_N)$ can be obtained by minimizing
$$(\mathbf{q}_1-Q_1\mathbf{w})'V(\mathbf{q}_1-Q_1\mathbf{w}) \qquad (42)$$
using the pre-intervention time series observations. The variable for the first (treated) unit, $\mathbf{q}_1=(\mathbf{z}_1',y_{1,75},y_{1,80},y_{1,88})'$, is an $(r+M)\times 1$ vector; the similarly defined variables for the $N-1$ control units are grouped in an $(r+M)\times(N-1)$ matrix $Q_1$; and $V$ is a positive definite matrix.

However, their weights are subject to the constraints $w_i\geq 0$ for $i=2,\ldots,N$ and $\sum_{i=2}^N w_i=1$. To ensure unbiasedness of their estimator, prior knowledge about the dimension $K$ of the unknown common factors is required. Therefore the cross-sectional unit weights $w_i$ will be sensitive to the prior choice of $\mathbf{z}_i$, $M$, and $k^m_s$, and hence so will the predicted $\hat y^0_{1t}$, or $\hat\Delta_{1t}$. Nor is the probability distribution of $\hat y^0_{1t}$, or $\hat\Delta_{1t}$, easily derivable. On the other hand, we suggest using a regression method to choose weights that mimic the behavior of the treated individual before the intervention as closely as possible, say, by minimizing (32). As long as $N$ is fixed, our procedure yields a unique weight vector and a unique $\hat y^0_{1t}$, hence a unique $\hat\Delta_{1t}$ with a known probability distribution. Neither do we need to impose the constraint $w_j\geq 0$, nor $\sum_{j=2}^N w_j=1$.
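The difference between the constrained weights in (42) and the unconstrained regression weights advocated here can be illustrated as follows. This is a toy sketch with made-up data; `scipy`'s SLSQP solver merely stands in for a generic constrained optimizer, and all names are assumptions:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
M, n_controls = 4, 5                      # r + M characteristics, N - 1 control units
q1 = rng.standard_normal(M)               # treated unit's characteristics
Q1 = rng.standard_normal((M, n_controls)) # control units' characteristics
V = np.eye(M)                             # positive definite weighting matrix

def loss(w):
    r = q1 - Q1 @ w
    return r @ V @ r                      # objective (42)

# Synthetic control: minimize (42) subject to w_i >= 0 and sum(w) = 1
res = minimize(loss, np.full(n_controls, 1 / n_controls),
               bounds=[(0, None)] * n_controls,
               constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1}])
w_sc = res.x

# Unconstrained least-squares weights (no sign or adding-up restrictions)
w_ols, *_ = np.linalg.lstsq(Q1, q1, rcond=None)
```

The unconstrained fit is never worse in-sample, which is the sense in which dropping the non-negativity and adding-up constraints lets the weights mimic the treated unit more closely.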
Our approach can also easily be adapted to accommodate the case where some exogenous variables $\mathbf{z}_i$ are available, by conditioning (25) on $\mathbf{z}_i$. Moreover, instead of carrying out their placebo test, we can put the inferential procedure for the time-varying treatment effect in a statistical framework, which will be provided in the next section.

Remark 2.11: If $\mathbf{f}_t$ and $\mathbf{b}_1'$ are known, then $\mathrm{Var}(y^0_{1t}\mid\mathbf{f}_t)$ is smaller than (31). If $N$ and $T$ are large, one can use the Bai (2003) and Bai and Ng (2002) procedures to identify the number of unknown factors $K$ and estimate $\mathbf{b}_1'$ and $\mathbf{f}_t$. However, when $T$ is small, there could be sampling errors in identifying and estimating $\mathbf{b}_1'$ and $\mathbf{f}_t$. It may be better to use $\mathbf{x}_t$ in lieu of $\mathbf{b}_1'$ and $\mathbf{f}_t$.

2.4 Inference of Treatment Effect

The predictor (35) for the effectiveness of social policy allows the treatment effect to vary over time. We can model the estimated $\hat\Delta_{1t}$ using time series techniques to evaluate its properties.

Assumption (h): $\{\varepsilon_{it}\}$ is weakly dependent (mixing) for all $i$.

Suppose the treatment effects $\Delta_{1t}$ follow an autoregressive moving average (ARMA) model of the form
$$\phi(L)\Delta_{1t}=c+\theta(L)v_t, \qquad (43)$$
where $L$ is the lag operator, $v_t$ is an i.i.d. process with zero mean and constant variance, and the roots of $\theta(L)=0$ lie outside the unit circle. If the roots of $\phi(L)=0$ all lie outside the unit circle, the treatment effect is stationary, and the long-term treatment effect of $\Delta_{1t}$ is
$$\phi(L)^{-1}c. \qquad (44)$$
If one of the roots of $\phi(L)=0$ lies on the unit circle, the intervention effects are integrated of order 1, I(1).

From the estimated $\hat\Delta_{1t}$, we can use the Box-Jenkins (1970) procedure to construct a time series model,
$$\hat\phi(L)\hat\Delta_{1t}=\hat c+\hat\theta(L)\hat v_t, \qquad (45)$$
where $\hat v_t$ is i.i.d. with mean zero and variance $\hat\sigma^2_v$.
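A minimal numerical sketch of (43)-(45): simulate stationary AR(1) treatment effects, fit the AR coefficients by OLS (a simple stand-in for full Box-Jenkins identification), and read off the long-run effect $\phi(L)^{-1}c$ evaluated at $L=1$. All numbers are illustrative assumptions, not the dissertation's data:

```python
import numpy as np

rng = np.random.default_rng(2)
# Treatment effects following (43) with phi(L) = 1 - 0.5L and c = 1:
# the long-run effect is c / (1 - 0.5) = 2.
n = 5000
delta = np.zeros(n)
for t in range(1, n):
    delta[t] = 1.0 + 0.5 * delta[t - 1] + 0.2 * rng.standard_normal()

# Fit an AR(1) by OLS regression of delta_t on (1, delta_{t-1})
Z = np.column_stack([np.ones(n - 1), delta[:-1]])
c_hat, phi_hat = np.linalg.lstsq(Z, delta[1:], rcond=None)[0]

long_run = c_hat / (1 - phi_hat)   # phi(1)^{-1} c, the long-run effect in (44)
```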
Lemma 3: Suppose the roots of $\phi(L)=0$ lie outside the unit circle. Under assumptions (a) to (h), when both $T_1$ and $(T-T_1)$ go to infinity,
$$\operatorname{plim}\hat\phi(L)^{-1}\hat c=\operatorname{plim}\hat\Delta=\Delta=\phi(L)^{-1}c \qquad (46)$$
and
$$\sqrt{T-T_1}\,(\hat\Delta-\Delta)\sim N\!\left(0,\sigma^2_\Delta\right), \qquad (47)$$
where
$$\sigma^2_\Delta=\frac{\partial\Delta}{\partial\boldsymbol\omega'}\,\mathrm{Var}\!\left(\sqrt{T-T_1}\,\hat{\boldsymbol\omega}\right)\frac{\partial\Delta}{\partial\boldsymbol\omega} \qquad (48)$$
and $\hat{\boldsymbol\omega}=(\hat c,\hat\phi_1,\ldots,\hat\phi_p)'$, assuming $\hat\phi(L)$ is of $p$-th order.

Proof: If $y_t$ is stationary, the estimators $(\hat\alpha,\hat{\boldsymbol\beta}')$ are $\sqrt{T_1}$-consistent. If $\mathbf{x}_t\sim I(1)$ and not cointegrated, the estimator $\hat\alpha$ remains $\sqrt{T_1}$-consistent, but $\hat{\boldsymbol\beta}$ is $T_1$-consistent (e.g., Phillips and Durlauf (1986)). Either way,
$$y^0_{1t}-\hat y^0_{1t}=u_{1t}+O_p\!\left(T_1^{-1/2}\right). \qquad (49)$$
Adding and subtracting yields
$$\hat\Delta_{1t}=y_{1t}-\hat y^0_{1t}=\Delta_{1t}+u_{1t}+o_p(1). \qquad (50)$$
Substituting (50) into (43) yields
$$\phi(L)\hat\Delta_{1t}=c+\theta(L)v_t+\phi(L)u_{1t}+o_p(1). \qquad (51)$$
Since $u_{1t}$ is a mean-zero I(0) process, we obtain (44) by approximating $\theta(L)v_t+\phi(L)u_{1t}$ by a $q$-th order moving average process, $\varphi(L)\eta_t$. If the roots of $\varphi(L)$ all lie outside the unit circle, $\hat\Delta_{1t}$ can also be approximated by an AR process; multiplying both sides of (51) by $\varphi(L)^{-1}$,
$$\varphi(L)^{-1}\phi(L)\hat\Delta_{1t}=\varphi(L)^{-1}c+\eta_t. \qquad (52)$$
Under fairly general conditions, the maximum likelihood estimators (MLE) of $\phi(L)$, $\varphi(L)$ and $c$ are consistent and asymptotically normally distributed. The asymptotic variance, $\sigma^2_\Delta$, can then be derived by using the delta method (e.g., Rao (1973, ch. 2)).

If the treatment effect is a stationary process, then the long-term impact of the intervention can also be estimated by taking the simple average of the treatment effects.

Lemma 4: Suppose all the roots of $\phi(L)=0$ lie outside the unit circle. Under (a) to (h), when both $T_1$ and $(T-T_1)$ go to infinity,
$$\operatorname*{plim}_{(T-T_1)\to\infty}\frac{1}{T-T_1}\sum_{t=T_1+1}^{T}\hat\Delta_{1t}=\Delta. \qquad (53)$$
The variance of (53) can be approximated by the heteroskedasticity-and-autocorrelation-consistent (HAC) estimator of Newey and West (1987).

Proof: Given (36) and (37), the law of large numbers holds.
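Lemma 4 estimates the long-run effect by the simple post-intervention average of $\hat\Delta_{1t}$, with a Newey-West (1987) HAC variance. Below is a self-contained sketch of the Bartlett-kernel HAC standard error of a sample mean; the simulated data and the lag choice of 8 are arbitrary assumptions for illustration:

```python
import numpy as np

def newey_west_se_of_mean(x, lags):
    """Newey-West (Bartlett-kernel) standard error of the sample mean of x."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    e = x - x.mean()
    s = e @ e / n                               # lag-0 autocovariance
    for k in range(1, lags + 1):
        w = 1 - k / (lags + 1)                  # Bartlett kernel weight
        s += 2 * w * (e[k:] @ e[:-k]) / n       # weighted autocovariances
    return np.sqrt(s / n)

rng = np.random.default_rng(3)
d = np.full(400, 2.0)                           # AR(1) "treatment effects", mean 2
for t in range(1, 400):
    d[t] = 2 + 0.6 * (d[t - 1] - 2) + rng.standard_normal()

se = newey_west_se_of_mean(d, lags=8)
tstat = d.mean() / se
```

For positively autocorrelated effects the HAC standard error exceeds the naive i.i.d. one, so ignoring the serial correlation in $\hat\Delta_{1t}$ would overstate statistical significance.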
2.5 Choice of Cross-Sectional Units

2.5.1 Modeling Strategy

Often there is a large number of cross-sectional units that can be used to predict $y^0_{1t}$ (or that are generated according to (24) or (25)). Intuitively, it would appear favorable to use as many available cross-sectional units as possible, as long as $T>N$. This will be the case when the number of common factors $K$ is fixed, $T_1$ goes to infinity, and $\frac{N}{T_1}\to 0$. However, if $T_1$ or $\frac{N}{T_1}$ is finite, it may be advantageous to use only a subset of the available cross-sectional units to predict the counterfactual, in particular if the data generating processes for the cross-sectional units satisfy the condition of Lemma 5(ii) below.

Let there be $m$ cross-sectional units that optimally predict $y^0_{1t}$ and $(N-m-1)$ remaining cross-sectional units that could also be included to predict $y^0_{1t}$. Let $X_1$ and $X_2$ be the $T_1\times m$ and $T_1\times(N-m-1)$ time series observations for these $m$ cross-sectional units and the $(N-m-1)$ cross-sectional units, respectively; then
$$X_1=FB_1'+U_1 \qquad (54)$$
and
$$X_2=FB_2'+U_2, \qquad (55)$$
where the $t$-th row of $F$ takes the form $(1,\mathbf{f}_t')$, the $i$-th column of $B_1'$ and $B_2'$ takes the form $(\alpha_i,\mathbf{b}_i')'$, and $U_1$ and $U_2$ denote the $T_1\times m$ and $T_1\times(N-m-1)$ idiosyncratic components of $X_1$ and $X_2$, respectively.

Lemma 5: Under assumptions (a) to (e),
(i) The optimal number of cross-sectional units for constructing $y^0_{1t}$ is $K\leq m\leq N-1$.
(ii) If
$$B_2'\left(B_1\Sigma_1^{-1}B_1'+I_K\right)^{-1}\begin{pmatrix}\alpha_1\\ \mathbf{b}_1\end{pmatrix}=\mathbf{0},$$
then $X_2$ yields no predictive power for $E(\mathbf{y}_1\mid X_1)$, where $\Sigma_1=E(U_1U_1')$.

For the proof, see Appendix A.

When $N$ is fixed and $T_1\to\infty$, the least squares estimator $\hat{\mathbf{a}}_2$ of the objective function
$$\frac{1}{T_1}\left(\mathbf{y}_1-X_1\mathbf{a}_1-X_2\mathbf{a}_2\right)'\left(\mathbf{y}_1-X_1\mathbf{a}_1-X_2\mathbf{a}_2\right)$$
will converge to $\mathbf{0}$ under the condition of Lemma 5(ii). In other words, one can use all $(N-1)$ available cross-sectional units to predict $y^0_{1t}$. However, on many occasions $T_1$ is finite. As more cross-sectional units are used, the variance of $(\hat\alpha,\hat{\boldsymbol\beta}')'$ will increase.
To balance the within-sample fit against the post-sample prediction error, we suggest the following model selection strategy:

Step 1: Use a model selection criterion to select the best predictor for $y^0_{1t}$ using $j$ cross-sectional units out of the $(N-1)$ available cross-sectional units, denoted by $M(j)^*$, for $j=1,\ldots,N-1$.

Step 2: From $M(1)^*,M(2)^*,\ldots,M(N-1)^*$, choose $M(m)^*$ in terms of some model selection criterion, for instance AIC or AICC.

2.5.2 Monte Carlo Studies

Under the assumption that $\mathbf{y}_t$ is generated by a factor model of the form (25), in this sub-section we compare the predictive performance of our approach versus the Bai and Ng (2002) approach of first determining the number of factors $K$ and then estimating $\mathbf{f}_t$, $\alpha_1$ and $\mathbf{b}_1'$ to generate the counterfactuals when $N$ and $T$ are small.

First, we wish to see whether there is a need to use all cross-sectional units under our approach. There are a number of model selection criteria one can use to select the best approximating model. The performance of the Akaike information criterion (AIC) (Akaike (1973, 1974)) and of AICC (Hurvich and Tsai (1989)) will be examined in a small-scale Monte Carlo study by comparing the post-intervention mean square prediction error,
$$\mathrm{MSPE}(p)=\frac{1}{T-T_1}\sum_{t=T_1+1}^{T}\left(y^0_{1t}-\hat y^0_{1t}(p)\right)^2, \qquad (56)$$
where $\hat y^0_{1t}(p)=\hat{\boldsymbol\beta}_p'\mathbf{x}_t$, $t=T_1+1,\ldots,T$, is generated using the data $x_{it}$ of $p$ cross-sectional units, and $\hat{\boldsymbol\beta}_p$ is the OLS estimate of $\boldsymbol\beta_p$ using the data $\{y_{1t},x_{it},\ t=1,\ldots,T_1\}$.

To see which model selection criterion works better, we generate model (24) with $N=21$ countries $(y_{1t},\mathbf{x}_t')$. We use $T_1=25,40,$ and $60$ observations as the number of pre-intervention periods to approximate the path of $y_1$ before the intervention. The OLS estimators are then used to predict $y^0_{1t}$ for the post-intervention period, which has $T-T_1=10$ periods. Four different factor structures are used.
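Steps 1-2 amount to an exhaustive best-subset search scored by an information criterion. The sketch below uses the AIC/AICC definitions of (60)-(61); the simulated data and all names are illustrative assumptions, not the dissertation's implementation:

```python
import numpy as np
from itertools import combinations

def aic(rss, T1, p):
    return T1 * np.log(rss / T1) + 2 * (p + 2)                           # eq. (60)

def aicc(rss, T1, p):
    return aic(rss, T1, p) + 2 * (p + 2) * (p + 3) / (T1 - (p + 1) - 2)  # eq. (61)

def best_subset(y, X, criterion):
    """Step 1: best j-unit model for each j; Step 2: best model overall."""
    T1, n_units = X.shape
    best_value, best_cols = np.inf, ()
    for j in range(1, n_units + 1):
        for cols in combinations(range(n_units), j):
            Z = np.column_stack([np.ones(T1), X[:, cols]])
            fit, *_ = np.linalg.lstsq(Z, y, rcond=None)
            rss = np.sum((y - Z @ fit) ** 2)
            value = criterion(rss, T1, j)
            if value < best_value:
                best_value, best_cols = value, cols
    return best_cols

rng = np.random.default_rng(4)
T1, n_units = 40, 6
f = rng.standard_normal(T1)
X = np.outer(f, rng.standard_normal(n_units)) + 0.5 * rng.standard_normal((T1, n_units))
y = 2 * f + 0.3 * rng.standard_normal(T1)
chosen = best_subset(y, X, aicc)
```

Exhaustive search over all subsets is feasible only for small $N$; with many candidate units the per-size search of Step 1 would in practice be restricted or done stepwise.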
The first consists of two ($K=2$) stationary factors:
$$f_{1,t}=0.3f_{1,t-1}+u_{1,t}, \qquad f_{2,t}=0.6f_{2,t-1}+u_{2,t}, \qquad (57)$$
where the innovations to the common factors, $u_{1,t}$ and $u_{2,t}$, follow $N(0,1)$. The second is a set of three stationary factors:
$$f_{1,t}=0.8f_{1,t-1}+u_{1,t}, \qquad f_{2,t}=0.6f_{2,t-1}+u_{2,t}+0.8u_{2,t-1}, \qquad f_{3,t}=u_{3,t}+0.9u_{3,t-1}+0.4u_{3,t-2}. \qquad (58)$$
The third is simply an i.i.d. factor. The last has a nearly non-stationary factor:
$$f_{1,t}=0.95f_{1,t-1}+u_{1,t}. \qquad (59)$$

Two model selection criteria are compared:
$$\mathrm{AIC}(p)=T_1\ln\left(\frac{\mathbf{e}_0'\mathbf{e}_0}{T_1}\right)+2(p+2), \qquad (60)$$
$$\mathrm{AICC}(p)=\mathrm{AIC}(p)+\frac{2(p+2)(p+3)}{T_1-(p+1)-2}, \qquad (61)$$
where $p$ is the number of countries included and $\mathbf{e}_0$ denotes the OLS residuals.

We repeat the experiment for each of the four data generating processes 500 times. All four sets of simulation results show that the pre-intervention MSE decreases in $p$, whereas the post-intervention MSE decreases initially and then increases in $p$. Denote the number of countries corresponding to the minimum AIC or AICC as $m$. The results are summarized in Tables 1-8. For all the experiments, the average number of countries $m$ is between $K$ and $N-1$. This is consistent with the notion of the bias-variance tradeoff. The frequency distributions show that not once were all 20 cross-sectional units selected in terms of AICC, and there is less than a 1% chance that the 20-unit models are chosen in terms of AIC.

Table 1: Optimal Choice of m and the Average MSPE, 2 Stationary Factors

                    σ² = 1                 σ² = 0.5               σ² = 0.1
               AIC     AICC    20      AIC     AICC    20      AIC     AICC    20
T1 = 25, T = 35
  m           11.726   4.432   -      11.42    4.468   -      11.39    4.664   -
  R²           0.929   0.841  0.95     0.944   0.881  0.96     0.986   0.97   0.99
  MSPE         6.189   2.359  8        2.875   1.221  3.785    0.6227  0.243  0.888
T1 = 40, T = 50
  m            6.872   4.684   -       6.924   4.79    -       7.096   4.73    -
  R²           0.818   0.794  0.853    0.88    0.864  0.904    0.962   0.956  0.969
  MSPE         1.823   1.67   2.17     0.92    0.849  1.098    0.19    0.174  0.22
T1 = 60, T = 70
  m            6.236   5.098   -       6.266   5.056   -       6.206   5.108   -
  R²           0.7711  0.761  0.801    0.855   0.848  0.874    0.953   0.952  0.959
  MSPE         1.424   1.39   1.542    0.721   0.715  0.778    0.146   0.143  0.159
On average, about 4 to 5 cross-sectional units are chosen in terms of AICC and 10 to 12 cross-sectional units in terms of AIC. When $T_1$ is small, say 25 or 40, the MSPEs of the models chosen by AICC and AIC are significantly smaller than those of the models using all 20 cross-sectional units. When $T_1$ becomes large, say 60, the MSPE of models using all 20 cross-sectional units converges towards the MSPE of the optimally chosen models in terms of AIC or AICC, but the MSPE based on 20 cross-sectional units is still larger than that based on only $m$ cross-sectional units. These Monte Carlo studies appear to support the theoretical finding that the optimal $m$ is between $K$ and $(N-1)$, and that if $\frac{N}{T_1}\to 0$, then using all available

Table 2: Frequency Distribution of Optimal Number of m, 2 Stationary Factors
(columns: T1=25,T=35 | T1=40,T=50 | T1=60,T=70; within each, σ² = 1, 0.5, 0.1; A = AIC, B = AICC)
1 1 12 1 14 0 2 5 10 0 4 1 6 3 4 1 5 0 0
2 4 59 3 55 5 47 13 34 8 36 8 30 14 31 10 23 7 16
3 9 103 4 89 15 104 23 81 22 82 20 75 28 61 38 64 35 60
4 17 107 16 112 11 109 44 122 42 110 49 125 59 95 50 105 69 110
5 14 81 15 96 18 90 72 97 77 113 79 120 88 120 88 112 89 126
6 17 73 21 65 27 65 82 86 104 73 65 74 96 91 98 95 84 94
7 28 33 37 42 33 43 77 41 66 41 71 42 86 45 81 56 88 60
8 25 19 26 19 29 25 62 20 56 29 57 17 56 29 64 24 55 20
9 34 7 47 3 29 7 46 5 47 10 54 10 34 18 35 10 43 12
10 37 6 40 3 46 5 28 4 30 0 45 1 15 2 19 5 19 2
11 36 44 2 29 2 20 3 25 1 27 10 3 8 1 6
12 45 43 41 1 17 3 13 1 10 9 1 6 5
13 46 35 37 6 4 12 1 2
14 39 37 44 2 3 2 1
15 45 44 38 1 3
16 35 28 39 1
17 38 23 30 0
18 18 23 8 1
19 10 11 16
20 2 2 5

Table 3: Optimal Choice of m and the Average MSPE, 3 Stationary Factors

                    σ² = 1                 σ² = 0.5               σ² = 0.1
               AIC     AICC    20      AIC     AICC    20      AIC     AICC    20
T1 = 25, T = 35
  m           11.19    4.438   -      11.696   4.58    -      11.704   4.738   -
  R²           0.931   0.86   0.952    0.959   0.914  0.971    0.988   0.974  0.991
  MSPE         6.083   2.348  8.302    3.053   1.243  4.139    0.646   0.247  0.812
T1 = 40, T = 50
m 6.692 4.61 - 7.01 4.714 - 7.05 4.834 - R 2 0.856 0.837 0.884 0.917 0.905 0.933 0.975 0.972 0.98 MSPE 1.88 1.713 2.237 0.966 0.874 1.134 0.186 0.165 0.221 T 1 = 60;T = 70 m 6.114 4.922 - 6.172 5.004 - 6.262 5.104 - R 2 0.826 0.818 0.849 0.886 0.881 0.901 0.968 0.967 0.972 MSPE 1.389 1.357 1.538 0.712 0.697 0.784 0.14 0.139 0.155 28 Table 4: Frequency Distribution of Optimal Number of m, 3 Stationary Factors T 1 =25,T=35 T 1 =40,T=50 T 1 =60,T=70 2 =1 2 =0.5 2 =0.1 2 =1 2 =0.5 2 =0.1 2 =1 2 =0.5 2 =0.1 A B A B A B A B A B A B A B A B A B 1 2 17 1 10 0 3 4 13 2 4 0 1 1 1 0 3 0 0 2 5 57 2 49 1 45 9 44 12 31 12 33 15 28 12 23 16 24 3 5 100 7 104 9 94 29 77 23 88 19 71 40 68 30 69 32 61 4 15 104 10 97 17 113 68 127 44 118 55 120 67 116 67 107 64 105 5 28 85 16 101 12 94 70 91 69 111 72 114 93 115 93 118 79 110 6 24 65 22 65 18 61 75 72 75 72 68 79 73 94 94 88 83 103 7 24 36 28 34 28 42 71 42 73 48 68 59 88 43 43 54 88 58 8 31 26 34 20 31 32 58 23 56 21 68 11 48 23 239 22 60 28 9 43 6 30 13 32 8 38 9 62 6 47 8 38 9 54 14 41 8 10 43 1 35 6 38 5 31 1 38 1 22 4 22 2 38 2 23 2 11 40 3 46 0 39 2 26 1 23 16 10 1 19 10 1 12 43 45 1 42 1 12 14 5 3 10 3 13 30 48 47 6 7 1 4 1 14 43 36 56 1 1 1 15 35 42 38 2 0 16 32 35 27 1 17 31 33 37 18 17 16 19 19 7 11 6 20 2 3 3 Table 5: Optimal Choice of m and the Average MSPE, i.i.d. Factor T 1 = 25;T = 35 2 = 1 2 = 0:5 2 = 0:1 AIC AICC 20 AIC AICC 20 AIC AICC 20 m 11.614 4.92 - 11.738 5.03 - 11.904 4.972 - R 2 0.939 0.871 0.956 0.966 0.927 0.976 0.99 0.978 0.993 MSPE 5.93 2.601 8.087 3.375 1.303 4.372 0.659 0.262 0.857 T 1 = 40;T = 50 m 6.174 3.99 - 6.474 4.184 - 6.224 4.05 - R 2 0.673 0.631 0.74 0.729 0.688 0.784 0.865 0.847 0.893 MSPE 1.664 1.486 2.057 0.89 0.767 1.098 0.19 0.15 0.211 T 1 = 60;T = 70 m 6.82 5.61 - 7.016 5.744 - 7.19 5.868 - R 2 0.818 0.81 0.84 0.892 0.887 0.905 0.975 0.974 0.978 MSPE 1.56 1.538 1.63 0.802 0.785 0.839 0.159 0.156 0.166 29 Table 6: Frequency Distribution of Optimal Number of m, i.i.d. 
Factor T 1 =25,T=35 T 1 =40,T=50 T 1 =60,T=70 2 =1 2 =0.5 2 =0.1 2 =1 2 =0.5 2 =0.1 2 =1 2 =0.5 2 =0.1 A B A B A B A B A B A B A B A B A B 1 0 4 1 3 0 0 8 26 8 27 9 21 1 3 0 0 0 0 2 5 33 1 20 2 21 33 77 23 67 25 72 5 10 4 6 0 3 3 3 73 6 77 5 76 50 113 35 104 36 95 18 343 12 28 12 27 4 11 121 10 110 8 134 63 104 65 108 66 128 38 75 35 80 23 62 5 13 104 18 110 19 98 72 84 68 80 71 95 67 112 67 123 55 114 6 18 69 23 81 22 72 73 58 73 56 82 51 94 126 89 109 111 134 7 35 49 32 49 29 55 54 23 71 33 65 25 103 78 106 88 99 90 8 24 26 22 34 19 32 44 9 42 14 47 10 77 44 74 35 87 45 9 42 13 37 12 29 5 28 3 37 7 41 3 46 14 54 23 45 16 10 37 5 52 2 48 5 36 1 30 2 27 30 4 33 5 36 8 11 41 2 35 2 40 2 18 2 24 0 15 13 16 2 18 1 12 50 0 45 47 11 12 1 11 5 3 1 10 13 59 1 34 46 6 6 1 3 2 6 3 14 46 41 44 1 4 2 1 0 0 15 37 42 36 1 0 1 1 16 27 33 35 0 2 17 18 22 32 2 18 18 25 27 19 13 16 10 20 3 5 2 Table7: OptimalChoiceofmandtheAverageMSPE,NearlyNon-stationaryFactor T 1 = 25;T = 35 2 = 1 2 = 0:5 2 = 0:1 AIC AICC 20 AIC AICC 20 AIC AICC 20 m 11.692 4.28 - 11.728 4.156 - 11.452 4.206 - R 2 0.913 0.805 0.938 0.931 0.839 0.95 0.962 0.922 0.973 MSPE 6.202 2.271 8.47 3.222 1.118 4.14 0.74 0.221 0.935 T 1 = 40;T = 50 m 6.358 4.16 - 6.342 4.114 - 6.488 4.214 - R 2 0.796 0.77 0.838 0.853 0.833 0.883 0.924 0.915 0.939 MSPE 1.777 1.621 2.159 0.851 0.757 1.029 0.179 0.156 0.212 T 1 = 60;T = 70 m 5.468 4.308 - 5.432 4.302 - 5.472 4.272 - R 2 0.77 0.76 0.802 0.824 0.816 0.848 0.914 0.91 0.926 MSPE 1.328 1.281 1.485 0.674 0.652 0.77 0.129 0.125 0.146 30 Table 8: Frequency Distribution of Optimal Number of m, Nearly Non-stationary Factor T 1 =25,T=35 T 1 =40,T=50 T 1 =60,T=70 2 =1 2 =0.5 2 =0.1 2 =1 2 =0.5 2 =0.1 2 =1 2 =0.5 2 =0.1 A B A B A B A B A B A B A B A B A B 1 0 25 2 32 3 25 4 22 2 19 6 15 11 17 6 11 8 20 2 3 73 7 72 7 74 28 75 30 72 22 73 30 55 40 65 24 56 3 12 99 12 101 5 99 47 99 39 98 41 94 53 101 58 102 75 94 4 9 106 11 103 16 102 57 105 64 130 52 124 87 113 70 116 75 115 5 22 78 12 73 24 
86 65 84 80 77 76 95 90 100 86 91 87 106
6 25 41 24 62 31 56 74 63 71 59 72 42 81 58 96 56 79 65
7 25 39 19 24 19 26 66 32 58 26 67 32 56 30 56 33 68 21
8 33 23 36 19 33 18 51 13 51 13 51 15 39 20 43 19 32 15
9 27 8 39 11 31 10 33 5 42 5 38 6 31 6 24 5 26 5
10 22 6 32 2 44 2 35 2 24 1 37 2 12 15 1 14 3
11 48 1 37 1 33 1 24 14 14 1 6 4 1 7
12 40 1 32 34 1 11 11 8 1 2 2 3
13 44 47 36 3 6 12 1 1
14 40 40 38 1 6 4 1 1
15 46 43 40 1 2
16 39 34 33
17 35 28 31
18 16 17 22
19 11 21 14
20 3 7 6

Table 9: Prediction comparison of model (1)
N = 20, T1 = 25, T = 35, K = 1; F ~ N(0,1), B ~ N(0,1), kmax = 8, σ = 1

        R²      MSPE    m
AIC     0.825   5.534   11.092
AICC    0.639   2.031   4.034
PC1     0.525   1.572   7.88
PC2     0.499   1.49    7.012
PC3     0.528   1.58    8
IC1     0.335   1.111   1.356
IC2     0.326   1.077   1.002
IC3     0.528   1.58    8

cross-sectional units will be fine, because the estimated $\hat{\mathbf{a}}_2$ will converge to zero.

We then compare the predictive performance of our approach based on AIC and AICC with the Bai and Ng (2002) PC and IC criteria. Tables 9 and 10 provide the results for a one-factor model based on setting the maximum number of factors, kmax, equal to 8 and 20, respectively. As one can see, the predictive performance of the Bai and Ng (2002) approach is very sensitive to the a priori specified maximum number of factors. The performance of the factor model also deteriorates when the number of factors is increased from 1 to 5 (Tables 10 and 11); when the average of $\mathbf{b}_i$ is changed from 0 to 0.3 or 1 (Tables 13 and 14); when the distribution of $\mathbf{b}_i$ is changed to Uniform(-1, 1) or N(2, 2) (Tables 12 and 15); when the idiosyncratic components $\varepsilon_{it}$ have heteroskedastic variances or are serially correlated (Tables 16 and 17); and when the signal-to-noise ratio is reduced (Table 19). However, the performance of the factor model does improve when $T$ increases (Table 18).
In short, when $N$ and $T$ are finite, the limited Monte Carlo results show that generating counterfactuals based on the Bai and Ng (2002) approach appears to be sensitive to (a) the signal-to-noise ratio; (b) the distribution of the factor loading matrix $B$; (c) the average values of $B$, $\frac{1}{N}\sum_{i=1}^N\mathbf{b}_i$; (d) the number of unknown factors $K$; (e) the a priori assumed maximum number of unknown factors; (f) serial correlation of the idiosyncratic components $\varepsilon_{it}$; and (g) heteroskedasticity of $\varepsilon_{it}$. On the other hand, our procedure of using $\mathbf{x}_t$ in lieu of $\mathbf{f}_t$ does not appear to be affected by any of these issues. On average, it yields much smaller prediction errors than the factor approach.

Table 10: Prediction comparison of model (2)
N = 20, T1 = 25, T = 35, K = 1; F ~ N(0,1), B ~ N(0,1), kmax = 20, σ = 1
        R²      MSPE    m
AIC     0.84    5.671   11.392
AICC    0.65    2.009   4.112
PC1     0.888   7.936   20
PC2     0.888   7.936   20
PC3     0.888   7.936   20
IC1     0.888   7.936   20
IC2     0.888   7.936   20
IC3     0.888   7.936   20

Table 11: Prediction comparison of model (3)
N = 20, T1 = 25, T = 35, K = 5; F ~ N(0,1), B ~ N(0,1), kmax = 8, σ = 1
        R²      MSPE    m
AIC     0.935   7.229   12.152
AICC    0.858   3.338   5.366
PC1     0.953   9.258   20
PC2     0.953   9.258   20
PC3     0.953   9.258   20
IC1     0.953   9.258   20
IC2     0.953   9.258   20
IC3     0.953   9.258   20

Table 12: Prediction comparison of model (4)
N = 20, T1 = 25, T = 35, K = 5; F ~ N(0,1), B ~ U(-1,1), kmax = 20, σ = 1
        R²      MSPE    m
AIC     0.882   7.877   11.98
AICC    0.736   2.9     4.864
PC1     0.915   9.908   20
PC2     0.915   9.908   20
PC3     0.915   9.908   20
IC1     0.915   9.908   20
IC2     0.915   9.908   20
IC3     0.915   9.908   20

Table 13: Prediction comparison of model (5)
N = 20, T1 = 25, T = 35, K = 5; F ~ N(0,1), B ~ N(0.3,1), kmax = 20, σ = 1
        R²      MSPE    m
AIC     0.939   8.587   12.018
AICC    0.866   3.052   5.162
PC1     0.955   11.683  20
PC2     0.955   11.683  20
PC3     0.955   11.683  20
IC1     0.955   11.683  20
IC2     0.955   11.683  20
IC3     0.955   11.683  20

Table 14: Prediction comparison of model (6)
N = 20, T1 = 25, T = 35, K = 5; F ~ N(0,1), B ~ N(1,1), kmax = 20, σ = 1
        R²      MSPE    m
AIC     0.959   8.432   12.296
AICC    0.907   3.753   5.524
PC1     0.969   11.118  20
PC2     0.969   11.118  20
PC3     0.969   11.118  20
IC1     0.969   11.118  20
IC2     0.969   11.118  20
IC3     0.969   11.118  20

Table 15: Prediction comparison of model (7)
N = 20, T1 = 25, T = 35, K = 5; F ~ N(0,1), B ~ N(2,2), kmax = 20, σ = 1
        R²      MSPE    m
AIC     0.982   9.498   12.39
AICC    0.961   5.086   6.034
PC1     0.987   11.899  20
PC2     0.987   11.899  20
PC3     0.987   11.899  20
IC1     0.987   11.899  20
IC2     0.987   11.899  20
IC3     0.987   11.899  20

Table 16: Prediction comparison of model (8)
N = 20, T1 = 25, T = 35, K = 5; F ~ N(0,1), B ~ N(0,1), kmax = 20, σ² ~ U(1,4)
        R²      MSPE    m
AIC     0.834   41.484  11.57
AICC    0.638   16.471  4.466
PC1     0.881   55.844  20
PC2     0.881   55.844  20
PC3     0.881   55.844  20
IC1     0.881   55.844  20
IC2     0.881   55.844  20
IC3     0.881   55.844  20

Table 17: Prediction comparison of model (9)
N = 20, T1 = 25, T = 35, K = 5; F ~ N(0,1), B ~ N(0,1), kmax = 20, e_it = 0.5 e_{i,t-1} + v_it, σ_v = 1
        R²      MSPE    m
AIC     0.945   10.588  12.188
AICC    0.878   4.345   5.744
PC1     0.959   13.868  20
PC2     0.959   13.868  20
PC3     0.959   13.868  20
IC1     0.959   13.868  20
IC2     0.959   13.868  20
IC3     0.959   13.868  20

Table 18: Prediction comparison of model (10)
N = 20, T1 = 60, T = 70, K = 5; F ~ N(0,1), B ~ N(1,1), kmax = 20, σ = 1
        R²      MSPE    m
AIC     0.887   1.826   8.024
AICC    0.88    1.842   6.66
PC1     0.899   1.879   20
PC2     0.899   1.879   20
PC3     0.899   1.879   20
IC1     0.899   1.879   20
IC2     0.899   1.879   20
IC3     0.899   1.879   20

Table 19: Prediction comparison of model (11)
N = 20, T1 = 25, T = 35, K = 5; F ~ N(0,1), B ~ N(0,1), kmax = 20, σ = 5
        R²      MSPE    m
AIC     0.851   32.096  11.77
AICC    0.662   13.559  4.618
PC1     0.892   42.919  20
PC2     0.892   42.919  20
PC3     0.892   42.919  20
IC1     0.892   42.919  20
IC2     0.892   42.919  20
IC3     0.892   42.919  20

2.6 Concluding Remark

A panel data methodology for constructing counterfactuals is proposed. The approach is based on the idea that individual outcomes are driven by a small number of common factors and idiosyncratic components. One can therefore use the information of untreated individuals to capture the main co-movements of the treated individuals in the absence of intervention. The method is easy to implement and the inference appears quite robust.
There is no need to distill the fundamental factors and their factor loading matrix as in Bai (2003), Bai and Ng (2002), Bernanke and Boivin (2003), etc. However, the method still allows us to capture the effects of those common factors through the observed outcomes of other individuals, which is not possible with a univariate intervention analysis with a short time series (e.g., Box and Tiao (1975)).

Chapter 3: Empirical Applications to Policy Evaluation

In this chapter we apply the methodology of Chapter 2 to assess the impact on Hong Kong's economy of two major policies, the political integration and the economic integration with Mainland China, by comparing what actually happened to Hong Kong's real GDP growth rate with what would have happened had there been no change of sovereignty in July 1997 or no Closer Economic Partnership Arrangement (CEPA) with Mainland China in 2003. More specifically, we wish to analyze how these events have changed the growth rate of Hong Kong. However, answering this question through conventional econometric modeling is not easy. We would need to know how and why the Hong Kong economy has grown over time and what role the China factor plays in Hong Kong's investment, labor migration, and Hong Kong's position as an entrepot between China and the rest of the world. Most of the growth literature is highly abstract. Empirical analysis based on the theoretical literature would often require the imposition of, as Sims (1980) claimed, "incredible" a priori identifying restrictions. The data demands would also be huge. Moreover, when external conditions change, people's optimal decision rules often change as well. There simply may not be enough post-change observations to provide reliable inferences about the post-change outcomes.

If we knew the outcomes of a subject both under intervention and in the absence of intervention, the effect of a policy intervention would just be the difference between the outcomes under intervention and in the absence of intervention.
However, we rarely observe simultaneously the outcomes of an individual under intervention and in the absence of intervention. To properly evaluate the effect of a policy intervention on a subject or unit, we need to construct the counterfactual of the missing outcomes. Our approach to constructing the counterfactual for the individual subject to intervention, say the $i$th unit, is to use other units that are not subject to the intervention to predict what would have happened to the $i$th unit had it not been subject to the policy intervention. The basic idea behind this approach is to rely on the correlations among the cross-sectional units. We attribute the cross-sectional dependence to the presence of common factors that drive all the relevant cross-sectional units, as in Chapter 2.

3.1 Background

Hong Kong was a fishing village when it was ceded to Britain after the Opium War in 1842. Many Mainland Chinese migrated to Hong Kong after the establishment of the People's Republic of China in 1949. The population in 1950 was about 2.6 million. In the 1960s and 1970s Hong Kong experienced rapid economic growth and came to be considered one of the four little dragons of East Asia. In 1961, its per capita income was USD 410, about 13.8% of that of the United States. By the eve of the reversion of sovereignty to China on July 1, 1997, Hong Kong's population stood at 6.5 million, with a per capita income of USD 21,441, 67.2% of that of the United States. The Hang Seng stock market index was at 15,196. Because Hong Kong had been growing rapidly prior to the reversion of sovereignty to China, many questions have been raised about the impact of the change of sovereignty on the growth of the Hong Kong economy (e.g., Sung and Wong (2000)). In addition, Hong Kong's economy has been subject to many external shocks since the reversion of sovereignty. The Asian financial crisis broke out in October 1997. The Thai baht/USD exchange rate dropped from 27 in June 1997 to 35.8 in September 1997, and further to 44.4 baht/USD in December.
The crisis in Thailand quickly spread to South Korea, Malaysia, Indonesia, the Philippines, Singapore, Taiwan, and other Pacific Rim countries with varying degrees of severity. Hong Kong was also hit by international speculative attacks on four occasions in 1998. In addition, H5N1 avian flu broke out in December 1997, causing 5 deaths and leading to the slaughter of more than a million chickens. By December 1997, the Hang Seng index had fallen to 10,722.

As a matter of fact, by the eve of Hong Kong's signing of CEPA with Mainland China in June 2003, the growth rate for the second quarter of 2003 was -0.67%. Per capita income was US$22,673 in 2003. In March 2003, Severe Acute Respiratory Syndrome (SARS) spread to Hong Kong from China. The Hang Seng Index fell to 8,717 in April 2003. CEPA aims to strengthen the linkage between Mainland China and Hong Kong by liberalizing trade in services, enhancing cooperation in the area of finance, promoting trade and investment facilitation, and establishing mutual recognition of professional qualifications. The implementation of CEPA started on January 1, 2004, when 273 types of Hong Kong products became exportable to the Mainland tariff-free; another 713 types followed on January 1, 2005, a further 261 on January 1, 2006, and another 37 in January 2007. Chinese citizens residing in selected cities were also allowed to visit Hong Kong as individual tourists, from 4 cities in 2003 to 49 cities in 2007, covering all 21 cities in Guangdong province.

3.2 Data

Because Hong Kong is a tiny city relative to other countries and regions, we believe that whatever happens in Hong Kong has no bearing on other countries. In other words, we expect Assumption (e) in Chapter 2 to hold.
Therefore, we use the quarterly real growth rates of Australia, Austria, Canada, China, Denmark, Finland, France, Germany, Indonesia, Italy, Japan, Korea, Malaysia, Mexico, the Netherlands, New Zealand, Norway, the Philippines, Singapore, Switzerland, Taiwan, Thailand, the UK, and the US to predict the quarterly real growth rate of Hong Kong in the absence of intervention. All the nominal GDP and CPI series are from OECD Statistics, International Financial Statistics, and the CEIC database.

There are many ways to compute quarterly growth rates. One can either measure the change relative to the corresponding quarter of the previous year (year-on-year) or measure the change since the previous quarter (e.g., Neo (2003)). We note that the four quarters within a year have different numbers of working days and that different countries have different seasonal effects on production and expenditure. For instance, Chinese New Year always falls in the first quarter and is a major holiday for Hong Kong, during which virtually all businesses and government agencies close for the celebration, but this is not so for other countries. Since our data are not seasonally adjusted and our interest is in finding the long-term trend, we compute the quarterly growth rate as the change relative to the corresponding quarter of the previous year.

3.3 Empirical Analysis

In this section we illustrate the use of our panel data approach for program evaluation by considering the impact on Hong Kong's real GDP growth rate of the reversion of sovereignty from the U.K. to China on July 1, 1997, and of the implementation of CEPA between Mainland China and Hong Kong starting in 2004Q1. The performance evaluation using the Bai and Ng approach is presented in Appendix B. We first evaluate the impact of the change of sovereignty on real GDP had Hong Kong stayed under British rule.
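The year-on-year growth convention described in Section 3.2 compares each quarter with the same quarter one year earlier, which nets out quarter-specific seasonal effects. A minimal sketch with made-up level data:

```python
import numpy as np

def yoy_growth(levels):
    """Year-on-year quarterly growth: compare each quarter with the
    corresponding quarter of the previous year."""
    levels = np.asarray(levels, dtype=float)
    return levels[4:] / levels[:-4] - 1

# A strongly seasonal series whose every quarter is 4% above a year earlier
gdp = np.array([100.0, 90.0, 95.0, 110.0,
                104.0, 93.6, 98.8, 114.4])
growth = yoy_growth(gdp)   # 0.04 in every quarter despite the seasonality
```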
Since there are only 18 observations between 1993Q1 and 1997Q2, we limit the countries under consideration for constructing the counterfactual to China, Indonesia, Japan, Korea, Malaysia, the Philippines, Singapore, Taiwan, Thailand, and the US, countries that are either in the region or economically closely associated with Hong Kong. Using AICC, Japan, Korea, the US, and Taiwan are selected to construct the hypothetical growth path of Hong Kong had there been no change of sovereignty. The OLS weights based on the 1993Q1 - 1997Q2 data are reported in Table 20 and the estimated treatment effects are reported in Table 21. The actual and hypothetical growth paths for the periods 1993Q1 - 1997Q2 and 1997Q3 - 2003Q4 are plotted in Figures 1 and 2, respectively.

Figure 1: AICC - Actual and Predicted Real GDP from 1993Q1 to 1997Q2

Figure 2: AICC - Actual and Counterfactual Real GDP from 1997Q3 to 2003Q4

Because the treatment effects appear to be serially correlated (see Figure 3), we fit an AR(2) model to the estimated treatment effects:
$$\hat\Delta_{1t}=\underset{(0.0068)}{-0.0063}+\underset{(0.1559)}{1.459}\,\hat\Delta_{1,t-1}-\underset{(0.1558)}{0.6547}\,\hat\Delta_{1,t-2}+\hat v_t, \qquad (62)$$
where the estimated standard errors are in parentheses. The implied long-run effect is -0.032. However, the t-statistic is only -1.04, which is not statistically significant.

Figure 3: AICC - Autocorrelation of Treatment Effect from 1997Q3 to 2003Q4

Table 20: AICC - Weights of Control Groups for the Period 1993Q1 - 1997Q2

           Estimate   Std. Error   t-score
Constant    0.0263     0.017        1.5427
Japan      -0.676      0.1117      -6.0522
Korea      -0.4323     0.0634      -6.8211
US          0.486      0.2195       2.2141
Taiwan      0.7926     0.3099       2.5576
R² = 0.9314, AICC = -171.771

The AIC criterion selects Japan, Korea, the Philippines, Taiwan, and the US. The OLS estimates of the weights are in Table 22 and the treatment effects are in Table 23, respectively. The actual and hypothetical growth paths for 1993Q1 - 1997Q2 and 1997Q3 - 2003Q4 are plotted in Figures 4 and 5, respectively. Again, the estimated treatment effects appear serially correlated.
The fitted AR(2) model takes the form

    Δ̂_1t = −0.0066 + 1.3821 Δ̂_1,t−1 − 0.5764 Δ̂_1,t−2 + ε̂_t.    (63)
            (0.0078)  (0.1722)         (0.1722)

The implied long-run effect is -0.033. However, the t-statistic is only -0.94, which again is not statistically significant.

Table 21: AICC - Treatment Effect of Political Integration, 1997Q3 - 2003Q4

           Actual    Control   Treatment
Q3-1997     0.061     0.0798    -0.0188
Q4-1997     0.014     0.0810    -0.0670
Q1-1998    -0.032     0.1294    -0.1614
Q2-1998    -0.061     0.1433    -0.2043
Q3-1998    -0.081     0.1319    -0.2129
Q4-1998    -0.065     0.1390    -0.2040
Q1-1999    -0.029     0.0876    -0.1166
Q2-1999     0.005     0.0670    -0.0620
Q3-1999     0.039     0.0400    -0.0010
Q4-1999     0.083     0.0445     0.0385
Q1-2000     0.107     0.0434     0.0636
Q2-2000     0.075     0.0398     0.0352
Q3-2000     0.076     0.0524     0.0236
Q4-2000     0.063     0.0318     0.0312
Q1-2001     0.027     0.0118     0.0152
Q2-2001     0.015    -0.0177     0.0327
Q3-2001    -0.001    -0.0177     0.0167
Q4-2001    -0.017     0.0184    -0.0354
Q1-2002    -0.010     0.0314    -0.0414
Q2-2002     0.005     0.0500    -0.0450
Q3-2002     0.028     0.0577    -0.0297
Q4-2002     0.048     0.0346     0.0134
Q1-2003     0.041     0.0538    -0.0128
Q2-2003    -0.009     0.0251    -0.0341
Q3-2003     0.038     0.0628    -0.0248
Q4-2003     0.047     0.0761    -0.0291
mean        0.0180    0.0576    -0.0396
std         0.0478    0.0429     0.0787
t           0.3761    1.3417    -0.5034

[Figure 4: AIC - Actual and Predicted Real GDP from 1993Q1 to 1997Q2]
[Figure 5: AIC - Actual and Counterfactual Real GDP from 1997Q3 to 2003Q4]
[Figure 6: AIC - Autocorrelation of Treatment Effect from 1997Q3 to 2003Q4]

The real GDP growth of Hong Kong appears to be approximated well by the chosen controls before treatment under either criterion. The estimated treatment effect is not statistically significant. Therefore, we may conclude that the political integration of Hong Kong with Mainland China does not appear to have had any significant impact on Hong Kong's economic growth. The lack of an intervention effect is hardly surprising given the "one country, two systems" concept proposed by Deng Xiaoping. It is generally recognized that, apart from the change of national flags, Hong Kong's institutional arrangements were basically left untouched during this period.
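The implied long-run effect of a fitted AR(2) is the intercept divided by one minus the sum of the autoregressive coefficients; a quick numerical check against the estimates reported in equation (62):

```python
def long_run_effect(c, phi1, phi2):
    """Implied long-run mean of an AR(2) process
    d_t = c + phi1*d_{t-1} + phi2*d_{t-2} + e_t,
    namely c / (1 - phi1 - phi2)."""
    return c / (1 - phi1 - phi2)

# Coefficients from the AICC-selected AR(2) in equation (62)
lr = long_run_effect(-0.0063, 1.459, -0.6547)
print(round(lr, 3))  # -0.032, matching the reported long-run effect
```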
Moreover, the change of sovereignty was known fourteen years in advance and the institutional arrangements were laid down in great detail in the Sino-British Joint Declaration of 1984. Presumably, all needed adjustments had already taken place before 1997.

Given that we do not find any effect of the change of sovereignty, we can pool the data from 1993Q1 to 2003Q4 to examine the effect of the CEPA, including the Individual Travel Scheme and the removal of preferential tariffs, which was signed on June 29, 2003, but whose implementation only started on January 1, 2004. Since we now have more degrees of freedom, we can use the model selection strategy discussed in Chapter 2 to generate the hypothetical growth path for Hong Kong had there been no CEPA with Mainland China. The AICC criterion selects Austria, Italy, Korea, Mexico, Norway and Singapore. OLS estimates of the weights are reported in Table 24. Actual and predicted growth paths from 1993Q1 to 2003Q4 are plotted in Figure 7. The availability of more pre-intervention data appears to allow more accurate estimates of the country weights and better tracing of the pre-intervention path. The estimated quarterly treatment effects are reported in Table 25. The actual and predicted counterfactual for the period 2004Q1 to 2008Q1 are presented in Figure 8. Under the AIC criterion, the selected group consists of Austria, Germany, Italy, Korea, Mexico, Norway, the Philippines, Singapore and Switzerland. The OLS estimates of the weights are in Table 26 and the estimated quarterly treatment effects are in Table 27. The pre- and post-intervention actual and predicted outcomes are plotted in Figures 10 and 11.

Table 22: AIC - Weights of Control Groups for the Period 1993Q1 - 1997Q2

              Estimate   Std. Error   t-score
Constant       0.0316     0.0164       1.9283
Japan         -0.6900     0.1056      -6.5341
Korea         -0.3767     0.0688      -5.4721
US             0.8099     0.2873       2.8193
Philippines   -0.1624     0.0999      -1.6248
Taiwan         0.6189     0.3110       1.9902
R^2 = 0.9438; AIC = 180.986

[Figure 7: AICC - Actual and Predicted Real GDP from 1993Q1 to 2003Q4]
It is notable that both groups of countries trace the actual Hong Kong path closely before the implementation of CEPA (with R^2 above 0.93). It is also quite remarkable that the post-sample predictions closely match the actual turning points, at a lower level, for the treatment period even though no Hong Kong data were used. The CEPA effect in each quarter is positive and appears to be serially uncorrelated; see Figures 9 and 12.

The average actual growth rate from 2004Q1 - 2008Q1 is 7.26%. The average projected growth rate without CEPA is 3.23% using the group of countries selected by AICC and 3.47% using the group selected by AIC. The estimated average treatment effect is 4.03% with a standard error of 0.016 based on the AICC group, and 3.79% with a standard error of 0.0151 based on the AIC group. The t-statistic is 2.5134 for the former group and 2.5122 for the latter group.

Table 23: AIC - Treatment Effect of Political Integration, 1997Q3 - 2003Q4

           Actual    Control   Treatment
Q3-1997     0.061     0.0839    -0.0229
Q4-1997     0.014     0.0811    -0.0671
Q1-1998    -0.032     0.1344    -0.1664
Q2-1998    -0.061     0.1438    -0.2048
Q3-1998    -0.081     0.1334    -0.2144
Q4-1998    -0.065     0.1472    -0.2122
Q1-1999    -0.029     0.0952    -0.1242
Q2-1999     0.005     0.0704    -0.0654
Q3-1999     0.039     0.0464    -0.0074
Q4-1999     0.083     0.0473     0.0357
Q1-2000     0.107     0.0310     0.0760
Q2-2000     0.075     0.0344     0.0406
Q3-2000     0.076     0.0394     0.0366
Q4-2000     0.063     0.0208     0.0422
Q1-2001     0.027     0.0155     0.0115
Q2-2001     0.015    -0.0101     0.0251
Q3-2001    -0.001    -0.0071     0.0061
Q4-2001    -0.017     0.0251    -0.0421
Q1-2002    -0.010     0.0375    -0.0475
Q2-2002     0.005     0.0473    -0.0423
Q3-2002     0.028     0.0593    -0.0313
Q4-2002     0.048     0.0270     0.0210
Q1-2003     0.041     0.0463    -0.0053
Q2-2003    -0.009     0.0302    -0.0392
Q3-2003     0.038     0.0593    -0.0213
Q4-2003     0.047     0.0770    -0.0300
mean        0.0180    0.0583    -0.0403
std         0.0478    0.0435     0.0815
t           0.3761    1.3393    -0.4953

Table 24: AICC - Weights of Control Groups for the Period 1993Q1 - 2003Q4

              Estimate   Std. Error   t-score
Constant      -0.0019     0.0037      -0.5240
Austria       -1.0116     0.1682      -6.0128
Italy         -0.3177     0.1591      -1.9971
Korea          0.3447     0.0469       7.3506
Mexico         0.3129     0.0510       6.1335
Norway         0.3222     0.0538       5.9912
Singapore      0.1845     0.0546       3.3812
R^2 = 0.931; AICC = 378.9427

Table 25: AICC - Treatment Effect for Economic Integration, 2004Q1 - 2008Q1

           Actual    Control   Treatment
Q1-2004     0.077     0.0493     0.0277
Q2-2004     0.120     0.0686     0.0514
Q3-2004     0.066     0.0515     0.0145
Q4-2004     0.079     0.0446     0.0344
Q1-2005     0.062     0.0217     0.0403
Q2-2005     0.071     0.0177     0.0533
Q3-2005     0.081     0.0333     0.0477
Q4-2005     0.069     0.0290     0.0400
Q1-2006     0.090     0.0471     0.0429
Q2-2006     0.062     0.0417     0.0203
Q3-2006     0.064     0.0250     0.0390
Q4-2006     0.066     0.0009     0.0651
Q1-2007     0.055    -0.0101     0.0651
Q2-2007     0.062     0.0092     0.0528
Q3-2007     0.068     0.0143     0.0537
Q4-2007     0.069     0.0508     0.0182
Q1-2008     0.073     0.0538     0.0192
mean        0.0726    0.0323     0.0403
std         0.0149    0.0213     0.0160
t           4.8814    1.5132     2.5134

[Figure 8: AICC - Actual and Counterfactual Real GDP from 2004Q1 to 2008Q1]
[Figure 9: AICC - Autocorrelation of Treatment Effect from 2004Q1 to 2008Q1]

Table 26: AIC - Weights of Control Groups for the Period 1993Q1 - 2003Q4

              Estimate   Std. Error   t-score
Constant      -0.0030     0.0042      -0.7095
Austria       -1.2949     0.2182      -5.9361
Germany        0.3552     0.2330       1.5243
Italy         -0.5768     0.1781      -3.2394
Korea          0.3016     0.0587       5.1342
Mexico         0.2340     0.0609       3.8395
Norway         0.2881     0.0562       5.1304
Switzerland    0.2436     0.1729       1.4092
Singapore      0.2222     0.0553       4.0155
Philippines    0.1757     0.1089       1.6127
R^2 = 0.9433; AIC = 385.7498

Table 27: AIC - Treatment Effect for Economic Integration, 2004Q1 - 2008Q1

           Actual    Control   Treatment
Q1-2004     0.077     0.0559     0.0211
Q2-2004     0.120     0.0722     0.0478
Q3-2004     0.066     0.0446     0.0214
Q4-2004     0.079     0.0314     0.0476
Q1-2005     0.062     0.0121     0.0499
Q2-2005     0.071     0.0126     0.0584
Q3-2005     0.081     0.0314     0.0496
Q4-2005     0.069     0.0278     0.0412
Q1-2006     0.090     0.0436     0.0464
Q2-2006     0.062     0.0372     0.0248
Q3-2006     0.064     0.0292     0.0348
Q4-2006     0.066     0.0122     0.0538
Q1-2007     0.055     0.0051     0.0499
Q2-2007     0.062     0.0279     0.0341
Q3-2007     0.068     0.0255     0.0425
Q4-2007     0.069     0.0589     0.0101
Q1-2008     0.073     0.0620     0.0110
mean        0.0726    0.0347     0.0379
std         0.0149    0.0193     0.0151
t           4.8814    1.7929     2.5122

[Figure 10: AIC - Actual and Predicted Real GDP from 1993Q1 to 2003Q4]
Either set of countries yields similar predictions and highly significant CEPA effects. In other words, through liberalization and increased openness with Mainland China, the real GDP growth rate of Hong Kong is raised by more than 4% compared to the growth rate had there been no CEPA agreement with Mainland China.

Hong Kong government statistics appear to corroborate this finding. A recent Hong Kong Legislative Council paper (LC Paper No. CD(1) 1849/06-07(04)) shows that the tariff-free access of goods produced in Hong Kong has induced rising capital investment, from HKD103 million in 2005 to HKD202 million in 2006 and HKD239 million in 2007. Liberalization of trade in services has further induced capital investment. Capital investments in transport, logistics, distribution, advertising and construction stood at HKD1.0 billion in 2004, but were at HKD2.4 billion in 2007. The Individual Visit Scheme (IVS) has led to a substantial increase in tourism from China. From the implementation of the scheme to the end of 2006, Mainland Chinese visitors made 17.2 million trips to Hong Kong. IVS visitors' spending in 2006 was HKD9.3 billion, about 38% higher than in 2004.

Moreover, the implementation of CEPA also helped to rebuild confidence in the economy after a prolonged period of economic stagnation. For instance, the value of total receipts in the restaurant sector in 2008Q1 was up by 15.8% compared with 2007Q1, and the value of total retail sales in March 2008 increased by 20% compared with a year earlier. If the fundamental relations between the aggregate and its components stay the same before and after 2004Q1, then one should expect the relative contribution of each component to the aggregate to stay the same in those two periods, and the impact of CEPA would operate through its impact on each component.

[Figure 11: AIC - Actual and Counterfactual Real GDP from 2004Q1 to 2008Q1]
[Figure 12: AIC - Autocorrelation of Treatment Effect from 2004Q1 to 2008Q1]
However, a simple sectoral analysis of regressing log(real GDP) on log(re-exports from China), log(imports) and log(number of visitors) also shows a statistically significant change in the impact of log(number of Chinese visitors), from 0.0638 in the pre-CEPA period to 0.1663 in the post-CEPA period, with highly significant t-values. In other words, it appears that the IVS is the most important component of CEPA, and its impact lies not just in increasing tourist revenue, but also in raising the confidence level of Hong Kong consumers and investors. As a result, the unemployment rate dropped from 7.9% in 2003 to 4.8% in 2006 and 4.2% in September-November 2007. Per-capita income in 2006 was USD27,604. The Hang Seng Index at the end of 2007 stood at 27,812.

3.4 Concluding Remarks

In this chapter we applied the panel data approach proposed in Chapter 2 to assess the impact of political and economic intervention effects on Hong Kong's economy. We find that the change of sovereignty in 1997 hardly had any impact on Hong Kong's economy. On the other hand, the implementation of the CEPA agreement in 2004 has had a significant impact: Hong Kong's real GDP growth rate is 4% higher than what would have happened in the absence of CEPA.

Chapter 4: Forecast Combination

4.1 Introduction

In Chapter 2, the counterfactual is constructed based on the best model selected in terms of some particular model selection criterion, for example, AIC or AICC. However, as Burnham et al. (2002) stated, there is always model uncertainty. Often there are a number of predictors for a variable of interest. A model may be chosen based on the current data set; however, if another data set were available, a different model might be selected. Instead of focusing on the selection of the best forecasting model, Bates and Granger (1969) suggested combining different forecasts. Since then, numerous forecast combination methods utilizing the information in the variance-covariance structure of the forecast errors have been proposed.
However, many empirical studies show that a simple average of the forecasts yields a smaller mean squared prediction error (MSPE) than other theoretically optimal methods. In this chapter we first review the existing literature on forecast combination. Then some additional forecast combination methods are proposed: methods focusing on adjusting the simple average forecasts (the mean corrected and the mean and scale corrected simple average) and a model selection approach. We evaluate the performance of these forecast combination approaches against some popular forecast combination methods through Monte Carlo studies.

4.2 Basic Framework

Let {y_t, x_t'; t = 1, ..., T_1} be our sample, where y_t is a scalar and x_t is an N x 1 vector. In Chapter 2, our goal is to construct {ŷ_t; t = T_1 + 1, ..., T} by choosing a subset of x_t based on either AIC or AICC. If we consider all subsets of x_t except the mean model, there are M = 2^N − 1 models in total, which could use up all the degrees of freedom in estimating the combination weights of the forecasts. Therefore, we trim the number of models down to N. For example, we can select the highest-likelihood model in each class, where a class is defined as the set of models with the same number of regressors. In the following, all the forecasts ŷ_t are generated in a linear manner in the form y_t = x_it' β_i, i = 1, ..., N, using the first T_1 time series observations, where x_it contains a constant term and i regressors of x_t, and β_i is the corresponding parameter vector to be estimated. Since some of the optimal weighting schemes rely on a post-estimation training sample, the time series observations are further divided into two periods. The first T_0 observations are used for estimating β. The optimal weight, w = (w_1, ..., w_N)', is then based on the forecasts f̂_it = x_it' β̂_i, i = 1, ..., N, t = T_0 + 1, ..., T_1. The combined forecast is denoted as ŷ_t = Σ_{i=1}^{N} w_i f̂_it.
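The trimming from M = 2^N − 1 subset models down to N class representatives can be sketched as follows. This is only an illustration on hypothetical data, not the dissertation's code; the best model within a class is chosen by smallest in-sample sum of squared residuals, which is equivalent to highest Gaussian likelihood:

```python
import numpy as np
from itertools import combinations

def best_model_per_class(y, X):
    """For each class j = 1..N (models using j regressors of X, plus a
    constant), keep the subset with the smallest in-sample SSR.
    Returns {j: (subset_indices, ssr)} -- N models instead of 2**N - 1."""
    T, N = X.shape
    best = {}
    for j in range(1, N + 1):
        for subset in combinations(range(N), j):
            Z = np.column_stack([np.ones(T), X[:, subset]])
            beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
            ssr = np.sum((y - Z @ beta) ** 2)
            if j not in best or ssr < best[j][1]:
                best[j] = (subset, ssr)
    return best

# Hypothetical data: y depends on regressors 0 and 2 of a 4-column X
rng = np.random.default_rng(0)
X = rng.standard_normal((60, 4))
y = X[:, 0] + 0.5 * X[:, 2] + rng.standard_normal(60)
kept = best_model_per_class(y, X)
print(kept[1][0], kept[2][0])  # e.g. (0,) and (0, 2)
```

Because the best (j+1)-regressor model nests some j-regressor model, the within-class minimum SSR is non-increasing in j.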
To evaluate the performance of various forecast combination methods, we consider a symmetric mean squared prediction error loss function,

    min E(y_t − ŷ_t)^2.    (64)

The sample version of the MSPE is

    MSPE = (1 / (T − T_1)) Σ_{t=T_1+1}^{T} (y_t − ŷ_t)^2.    (65)

4.3 Literature Review

Forecasting was originally considered of secondary importance by some econometricians, who claimed that the primary interest should be the understanding of the economy, in the sense that good forecasts would follow automatically from such an understanding. Therefore, much effort was devoted to developing economic theory. Recently, heavy emphasis has been placed on evaluating different models. Forecast combination has some role in the model evaluation literature (Chong and Hendry (1986)). For instance, when pooling various forecasts yields an improvement, it suggests that the individual models are misspecified. Diebold (1989) even states that "in a world in which information sets can be instantaneously and costlessly combined, there is no role (for forecast combination)". However, as Timmermann (2006) suggested in his handbook chapter, unless one can identify ex ante a particular forecasting model that generates smaller prediction errors than its competitors and whose forecast errors cannot be hedged by other models' forecast errors, forecast combinations offer diversification gains. Since the true data generating process is unknown, even the most complicated model is likely to be misspecified and can, at best, provide a reasonable local approximation. It is highly unlikely that a single model will dominate uniformly over time. Therefore, numerous optimal weighting schemes have been proposed since the seminal work of Bates and Granger (1969).
4.3.1 Bates and Granger

Bates and Granger (1969), BG, consider selecting w in terms of the relative precision of the predictive models, where the precision of the i-th model is measured by the inverse of the estimated forecast error variance, σ̂²(i) = (1/(T_1 − T_0)) Σ_{t=T_0+1}^{T_1} (y_t − f_it)². The i-th element, w_i, of the BG weighting scheme is then set at

    w_i^BG = σ̂²(i)^{-1} / Σ_{j=1}^{N} σ̂²(j)^{-1},    i = 1, ..., N.    (66)

4.3.2 Granger and Ramanathan

Granger and Ramanathan (1984), GR, consider selecting w by applying the least squares method to the following models:

    y_t = w'f_t + u_t,  subject to Σ_{j=1}^{N} w_j = 1,    (67)
    y_t = w'f_t + u_t,    (68)
    y_t = α + w'f_t + u_t.    (69)

We shall refer to the selection of w in terms of regression models (67), (68), and (69) as GR1, GR2, and GR3. GR3 yields unconstrained regression weights and the unconstrained minimum of (1/(T − T_1)) Σ_{t=T_1+1}^{T} (y_t − ŷ_t)². GR1 and GR2 can be viewed as constrained versions of the regression model GR3, with GR1 being the most restrictive. If some of the predictive models f_it are biased, then the weighting schemes generated by BG, GR1 and GR2 could be biased, but GR3 still generates an unbiased predictor, as the bias in f_it is captured by the intercept. Therefore, if the sample size is large, we should expect the mean squared prediction error of GR3 ≤ that of GR2 ≤ that of GR1, unless the restrictions are correct.

4.3.3 Variance-Covariance Approach

Let Σ be the N x N mean squared prediction error matrix of f_t = (f_1t, ..., f_Nt)'. It has been suggested (e.g. Timmermann (2006)) to choose the weight w to solve

    min w'Σw  subject to  e'w = 1,    (70)

where e is an N x 1 vector of ones, which yields

    w = (e'Σ^{-1}e)^{-1} Σ^{-1} e.    (71)

The sample analogue of Σ can be obtained as S = (1/(T_1 − T_0)) Σ_{t=T_0+1}^{T_1} (e y_t − f_t)(e y_t − f_t)'. Substituting S for Σ in (71), we obtain the solution w = (e'S^{-1}e)^{-1} S^{-1} e, which we refer to as VC.
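A minimal sketch of the BG weights (66) and the GR3 regression weights (69), on hypothetical forecasts rather than the dissertation's data:

```python
import numpy as np

def bg_weights(y, F):
    """Bates-Granger weights (66): each forecast is weighted by the
    inverse of its estimated forecast-error variance over the training
    window, normalized to sum to one.  F is T x N (one column per model)."""
    inv_var = 1.0 / np.mean((y[:, None] - F) ** 2, axis=0)
    return inv_var / inv_var.sum()

def gr3_weights(y, F):
    """Granger-Ramanathan GR3 (69): unrestricted least squares of y_t on
    a constant and the N individual forecasts; the intercept absorbs any
    bias in the individual forecasts."""
    Z = np.column_stack([np.ones(len(y)), F])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef[0], coef[1:]

# Hypothetical training sample: two unbiased forecasts of y with
# error variances 0.25 and 1.0
rng = np.random.default_rng(1)
y = rng.standard_normal(80)
F = np.column_stack([y + 0.5 * rng.standard_normal(80),
                     y + 1.0 * rng.standard_normal(80)])
w = bg_weights(y, F)
alpha, w3 = gr3_weights(y, F)
print(w)  # weights sum to one; the more precise model gets the larger weight
```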
Remark 4.1: Minimizing w'Sw subject to e'w = 1 transforms the norm back to the minimization of the distance between y_t and ŷ_t = w'f_t along the y-axis. Hence, the geometric approach of finding w to minimize (70) with the sample covariance S as a substitute is identical to the regression approach GR1. To see this, plugging w_N = 1 − w_1 − ... − w_{N−1} into the sample objective function yields

    w'Sw = (1/(T_1 − T_0)) w' Σ_{t=T_0+1}^{T_1} (e y_t − f_t)(e y_t − f_t)' w    (72)
         = (1/(T_1 − T_0)) Σ_{t=T_0+1}^{T_1} [(y_t − f_Nt) − w_1(f_1t − f_Nt) − ... − w_{N−1}(f_{N−1,t} − f_Nt)]².

Remark 4.2: The Bates and Granger (1969) method, BG, which sets the weights as in (66), is equivalent to imposing the prior restriction that S is diagonal in the VC solution w_VC = (e'S^{-1}e)^{-1} S^{-1} e.

4.4 A Model Selection Approach

When the underlying information of f_t is known, it may be beneficial to combine the information. Since x_t is an N x 1 vector of explanatory variables for y, if we use all possible subsets of the information x_t, except the mean model, then there are M = 2^N − 1 forecasts in total. One wonders whether trimming some less desirable models could lead to a better forecast combination. Suppose that the underlying information, x_t, generating the M linear forecasts is known.

Step 1: Categorize the M forecasting models into model classes M(j), for j = 1, ..., N, by the number of regressors of x_t, so that each model in class M(j) has j regressors of x_t.

Step 2: Use R² (or likelihood values) to select the best predictive model in each class.

Step 3: From M(1), M(2), ..., M(N), select the best model, say M(m), based on some model selection criterion.
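Remark 4.2 is easy to verify numerically; a sketch of the VC solution (71) with an illustrative diagonal S:

```python
import numpy as np

def vc_weights(S):
    """Variance-covariance optimal weights (71):
    w = S^{-1} e / (e' S^{-1} e), with e a vector of ones."""
    e = np.ones(S.shape[0])
    Sinv_e = np.linalg.solve(S, e)
    return Sinv_e / (e @ Sinv_e)

# Remark 4.2: with a diagonal S, the VC solution collapses to the
# Bates-Granger inverse-variance weights.
variances = np.array([0.5, 1.0, 2.0])
w = vc_weights(np.diag(variances))
bg = (1.0 / variances) / (1.0 / variances).sum()
print(np.allclose(w, bg))  # True
```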
For instance, we may use any one of three popular model selection criteria, AIC (Akaike (1973)), AICC (Hurvich and Tsai (1989)) or BIC (Schwarz (1978)):

    AIC_i = T_1 ln σ̂²(i) + 2(m + 1),    (73)
    AICC_i = AIC_i + 2(m + 1)(m + 2) / (T_1 − m − 2),    (74)
    BIC_i = T_1 ln σ̂²(i) + (m + 1) ln(T_1),    (75)

where the ML estimate σ̂²(i) = (1/T_1) Σ_{t=1}^{T_1} (y_t − f_it)², and m is the number of parameters of the i-th predictive model, including the constant term.

Remark 4.3: Step 2 downsizes the M models to a tractable number of forecasting models, N, which will be combined by various kinds of weighting schemes. Step 3 is equivalent to selecting the best predictive model based on some model selection criterion.

We conduct a simulation study to shed light on whether the full set or a subset of the information should be used. Table 28 provides some simulation results of optimally combining N = 15 explanatory variables selected in Step 2. All variables are generated by an i.i.d. factor structure. The mean squared prediction error initially declines when more explanatory variables are included in the combination, but then increases.
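Under an assumed Gaussian likelihood, the three criteria in (73)-(75) can be computed directly from a model's residuals; a minimal sketch with hypothetical residuals (the helper name and inputs are illustrative):

```python
import numpy as np

def info_criteria(resid, m, T1):
    """AIC (73), AICC (74) and BIC (75) for one predictive model with m
    parameters (including the constant), using the ML variance estimate
    sigma^2 = SSR / T1.  Requires T1 > m + 2 for the AICC correction."""
    sigma2 = np.sum(resid ** 2) / T1
    aic = T1 * np.log(sigma2) + 2 * (m + 1)
    aicc = aic + 2 * (m + 1) * (m + 2) / (T1 - m - 2)
    bic = T1 * np.log(sigma2) + (m + 1) * np.log(T1)
    return aic, aicc, bic

resid = np.array([0.1, -0.2, 0.05, 0.15, -0.1, 0.0, 0.2, -0.05])
aic, aicc, bic = info_criteria(resid, m=2, T1=len(resid))
print(aic < aicc)  # True: AICC adds a positive small-sample penalty
```

Since AIC and BIC share the same fit term, BIC exceeds AIC whenever ln(T_1) > 2, i.e. T_1 > e² ≈ 7.4.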
Table 28: Sample Variance-Covariance Matrix of the Prediction Errors of 15 Models

       1     2     3     4     5     6     7     8     9    10    11    12    13    14    15
 1  4.42  2.52  2.05  1.93  1.82  1.78  1.74  1.74  1.71  1.70  1.71  1.71  1.71  1.71  1.71
 2  2.52  3.09  2.23  2.09  2.00  1.97  1.96  1.94  1.94  1.93  1.93  1.94  1.93  1.93  1.93
 3  2.05  2.23  2.65  2.23  2.17  2.15  2.13  2.13  2.13  2.11  2.12  2.11  2.11  2.12  2.11
 4  1.93  2.09  2.23  2.63  2.40  2.35  2.32  2.32  2.32  2.31  2.31  2.31  2.31  2.31  2.31
 5  1.82  2.00  2.17  2.40  2.69  2.54  2.51  2.51  2.51  2.50  2.51  2.50  2.51  2.51  2.51
 6  1.78  1.97  2.15  2.35  2.54  2.79  2.71  2.70  2.68  2.67  2.68  2.68  2.68  2.69  2.69
 7  1.74  1.96  2.13  2.32  2.51  2.71  2.89  2.83  2.82  2.82  2.82  2.82  2.82  2.83  2.83
 8  1.74  1.94  2.13  2.32  2.51  2.70  2.83  3.00  2.95  2.95  2.94  2.95  2.96  2.96  2.96
 9  1.71  1.94  2.13  2.32  2.51  2.68  2.82  2.95  3.08  3.05  3.05  3.04  3.05  3.06  3.06
10  1.70  1.93  2.11  2.31  2.50  2.67  2.82  2.95  3.05  3.13  3.12  3.11  3.12  3.12  3.12
11  1.71  1.93  2.12  2.31  2.51  2.68  2.82  2.95  3.05  3.12  3.18  3.17  3.18  3.18  3.19
12  1.71  1.94  2.11  2.31  2.50  2.68  2.82  2.94  3.04  3.11  3.17  3.21  3.22  3.22  3.22
13  1.71  1.93  2.11  2.31  2.51  2.68  2.82  2.95  3.05  3.12  3.18  3.22  3.25  3.25  3.25
14  1.71  1.93  2.12  2.31  2.51  2.69  2.83  2.96  3.06  3.12  3.18  3.22  3.25  3.27  3.27
15  1.71  1.93  2.11  2.31  2.51  2.69  2.83  2.96  3.06  3.12  3.19  3.22  3.25  3.27  3.27

It suggests that combining all available information may not yield a more accurate prediction than combining just a subset of x_t. Therefore, when the information is available, a model selection criterion can be used to select the forecasting model that balances within-sample fit and forecast accuracy.

4.5 Mean and Mean and Scale Corrected Simple Averaging

A simple average of predictive models has been documented to produce good forecasts (e.g. Bryan and Molloy (2007), Stock and Watson (2004), Timmermann (2006)). However, some or all of the N predictive models could be biased, as noted by Palm and Zellner (1992).
We can correct for the possible bias of the simple average method by considering a mean corrected simple average (MCSA),

    ŷ_t = μ̂ + ȳ_t,    (76)

where ȳ_t denotes the simple average predictor of y_t. The mean μ̂ is obtained as the average of (y_t − ȳ_t). In addition to correcting the bias by adding an intercept to the simple average predictor, we can also make a scale correction by considering the predictive model (MSCSA)

    ŷ_t = μ̂ + ĉ ȳ_t,    (77)

where the mean μ̂ and scale ĉ are obtained by regressing y_t on a constant and ȳ_t.

4.6 Monte Carlo Studies

In this section we conduct small-scale Monte Carlo studies to compare the finite sample performance of our proposed forecast combination methods with some popular forecast combination methods, such as the simple average SA(N), Bates and Granger (1969), BG (66), and the Granger and Ramanathan (1984) models GR1, GR2 and GR3 ((67), (68), (69)).

In addition, we also consider Bayesian averaging. Given the prior that each model is equally likely to be the best model for forecasting y_t, Buckland et al. (1997) show that

    w_{BIC,i} = exp(−(1/2) ΔBIC_i) / Σ_{j=1}^{N} exp(−(1/2) ΔBIC_j),    i = 1, ..., N,    (78)

gives the posterior odds of the i-th model being the best predictive model, where ΔBIC_i = BIC_i − min_j (BIC_j). Weights for AIC and AICC can be similarly defined.

To compare the predictive performance of the various formulae, we generate y and the predictors x from a simple factor structure,

    (y_t, x_t')' = (b, B)' f_t + u_t,    (79)

where u_t and f_t are uncorrelated, E f_t = 0, E f_t f_t' = I_r, and E u_t u_t' = σ² I. Hence the covariance matrix of (y_t, x_t'), Σ, takes the form

    Σ = [ Σ_yy   Σ_yX ]  =  [ b'b + 1    b'B      ]    (80)
        [ Σ_Xy   Σ_XX ]     [ B'b        B'B + I_N ].

We have T_1 observations and would like to predict y_t, t = T_1 + 1, ..., T. The pre-break period is divided into two periods: t = 1, ..., T_0 and t = T_0 + 1, ..., T_1. The first period is used to estimate the parameters of the models.
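The posterior-odds weighting in equation (78) can be computed with a small routine; the criterion values below are hypothetical, and subtracting the minimum before exponentiating is a standard numerical-stability step:

```python
import numpy as np

def posterior_odds_weights(criterion_values):
    """Equation (78): Bayesian model-averaging weights
    w_i = exp(-0.5 * dC_i) / sum_j exp(-0.5 * dC_j),
    where dC_i = C_i - min_j C_j for BIC (or analogously AIC / AICC)."""
    c = np.asarray(criterion_values, dtype=float)
    delta = c - c.min()          # differencing avoids overflow/underflow
    w = np.exp(-0.5 * delta)
    return w / w.sum()

# Hypothetical BIC values for three competing models
w = posterior_odds_weights([100.0, 102.0, 110.0])
print(np.round(w, 3))  # the lowest-criterion model receives the largest weight
```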
In the second pre-break period, forecasts are made, the covariance matrix of the forecast errors is calculated, and the weights corresponding to the various methods are determined. We set N = 10, T_1 = 100, 200, 300, and T_0 = T_1/2. The forecast horizon, T − T_1, is fixed at 50. The number of replications is 1000. For Tables 29 to 31, the number of unobserved common factors, K, is either 1 or 5, and u_t is generated from independently normally distributed random variables with mean 0 and variance 1. For Table 32, we consider a three-factor model with ARMA(p, q) structure, (1 − φL) f_t = (1 + θ_1 L + θ_2 L²) u_t, as follows:

    f_1t = 0.8 f_1,t−1 + u_1t,
    f_2t = 0.6 f_2,t−1 + u_2t + 0.8 u_2,t−1,
    f_3t = u_3t + 0.9 u_3,t−1 + 0.4 u_3,t−2.

We also consider a one-factor model following the first AR(1) structure. The true covariance structure then becomes

    Σ = B* diag( σ²/(1 − φ_1²),  σ²(1 + θ_1² + 2φθ_1)/(1 − φ²),  σ²(1 + θ_1² + θ_2²) ) B*' + I_{N+1},

where B* = (b, B)' stacks the factor loadings and the diagonal entries are the unconditional variances of the three factors.

We compare the out-of-sample MSPE against the benchmarks, namely the best information combination model versus the forecast combination methods. The benchmark models we consider are ones where an investigator has the complete information set (79) and (80). The first benchmark model assumes that an investigator knows the parameters of the data generating process for (y_t, x_t'). Then the optimal information combination for x_t is given by

    ŷ_t = x_t' Σ_XX^{-1} Σ_Xy.    (81)

We shall indicate simulation results based on (81) by TVC.

The second benchmark model assumes that the parameters of the data generating process are unknown, but sample observations are available, so the weighting vector gives

    ŷ_t = x_t' S_XX^{-1} S_Xy,    (82)

where S_XX and S_Xy are sample estimates of Σ_XX and Σ_Xy. The simulation results based on (82) will be indicated by EVC.

The third benchmark model assumes an investigator takes a simple average to combine all possible forecasts. With 10 regressors, we have M = 2^10 − 1 models. Each model relies on the first T_1 observations to form the forecasts ŷ_t, t = T_1 + 1, ..., T.
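The i.i.d.-factor design in (79)-(80) can be sketched as follows. The loadings and sample size here are illustrative (not the study's actual draws), and the assertions simply check that the simulated moments match the implied covariance structure:

```python
import numpy as np

rng = np.random.default_rng(42)
N, r, T = 10, 5, 200_000   # many draws so the sample moments settle down

# Loadings for y (b, r x 1) and for x (B, r x N), as in (79)
b = rng.standard_normal(r)
B = rng.standard_normal((r, N))

# (y_t, x_t')' = (b, B)' f_t + u_t with f ~ N(0, I_r), u ~ N(0, I)
f = rng.standard_normal((T, r))
u = rng.standard_normal((T, N + 1))
y = f @ b + u[:, 0]
x = f @ B + u[:, 1:]

# Check the implied covariance structure (80)
assert abs(np.var(y) - (b @ b + 1)) < 0.1                # Sigma_yy = b'b + 1
Sxx = np.cov(x, rowvar=False)
assert np.allclose(Sxx, B.T @ B + np.eye(N), atol=0.1)   # Sigma_XX = B'B + I_N
print("simulated moments match the covariance structure in (80)")
```

The same arrays also give the TVC benchmark (81) directly, since Σ_Xy = B'b is known to the simulator.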
We use the least squares method to obtain β̂_i. The simple average of the forecasts assigns equal weight to each predictive model f_it. The simulation results of the simple average of all predictive models are indicated by SA(M).

However, one cannot always observe the full underlying information set; often only forecasts are available. We assume there are 10 forecast models, each consisting of 1, 2, ..., or up to 10 elements of x, where each j-element predictive model is generated using Step 1 of our model selection procedure described in the previous section; namely, we select the j out of 10 elements of x that yield the highest likelihood value, for j = 1, 2, ..., 10. The parameters of these 10 models are estimated based on the first pre-break period. The forecasts f_t(m), t = T_0 + 1, ..., T_1, m = 1, ..., 10, are available to the users.

We then consider the post-sample prediction performance of various forecast combination methods. We use BG, GR1, GR2, GR3 to denote the MSPE using (66), (67), (68) and (69); AIC, AICC, BIC denote the outcomes of our model selection approach to combining forecasts based on the AIC, AICC and BIC model selection criteria (73), (74) or (75); MAAIC, MAAICC, MABIC denote the Bayesian model averaging using the AIC, AICC, or BIC combination scheme of (78); and SA(N) denotes the MSPE with equal weights.

Tables 29 to 31 display results of 1000 replications with factor loadings following N(0,1), U(−2,2) and N(1,1), respectively. Note that, given the way the data and predictive models are generated, the different forecast combination schemes all yield unbiased predictions. The Monte Carlo results show the following.

First, if the parameters of the data generating process are known, then optimally combining the information set yields the most accurate prediction (TVC). This confirms Diebold's (1989) statement that "if information sets can be instantly and costlessly combined, there is no role for forecast combinations".
However, if the parameters of the data generating process are unknown, then information combination does not necessarily yield more accurate predictions than forecast combination in finite samples (EVC vs. AIC, AICC, BIC and MAAIC, MAAICC, MABIC for (50,100)).

Second, if the parameters of the data generating process are unknown, using the theoretically optimal way to combine the information does not generate more accurate predictions than simply taking the simple average of all possible predictive models if the correlations among the predictive models are due to one or a few common omitted factors. On the other hand, if the correlations among the predictive models are due to a number of omitted common factors, then using the theoretically superior ways to combine information performs better than the simple average of all predictive models (SA(M) vs. EVC, one factor vs. five factors).

Third, in finite samples, when only an estimated prediction error covariance matrix is available, combination methods that are theoretically superior could yield less accurate predictions than combination methods that use less information. In terms of the sample information exploited to obtain the optimal weighting, we have the ranking VC > BG and GR3 > GR1 > GR2 > BG. However, in terms of the accuracy of prediction, the rankings in most cases are the reverse. In other words, when there are only limited sample observations, using less information is probably better than using more information.

Fourth, in finite samples the Bayesian posterior odds weighting schemes appear to dominate the sampling-approach weighting schemes (MAAIC, MAAICC, MABIC vs. BG, GR1, GR2, GR3, VC) and to dominate the equal-weighted average (SA(N)).

Fifth, the weighting based on AIC or AICC appears to perform better than the weighting based on BIC, be it the classical sampling approach or the Bayesian averaging approach.

Table 29: MSPE with Factor Loading(s) N(0,1) and Factor(s) i.i.d. N(0,1)

(T_0, T_1)  (50,100)  (50,100)  (100,200)  (100,200)  (150,300)  (150,300)
K =             1         5          1          5          1          5
TVC          1.1003    1.8308     1.1271     1.8243     1.1064     1.8082
EVC          1.2309    2.0360     1.1868     1.9160     1.1403     1.8714
SA(M)        1.1678    2.3316     1.1741     2.3076     1.1437     2.2624
AIC          1.2248    2.0827     1.1910     1.9412     1.1456     1.8854
AICC         1.2253    2.0931     1.1919     1.9435     1.1457     1.8860
BIC          1.2239    2.1514     1.1985     1.9819     1.1537     1.9160
MAAIC        1.2143    2.0730     1.1845     1.9344     1.1422     1.8826
MAAICC       1.2149    2.0816     1.1849     1.9364     1.1425     1.8833
MABIC        1.2214    2.1381     1.1961     1.9782     1.1534     1.9104
SA(N)        1.2082    2.1144     1.1871     2.0170     1.1463     1.9692
BG           1.3212    2.2692     1.2354     2.0636     1.1765     1.9791
VC           1.4198    2.3268     1.2693     2.0402     1.1901     1.9444
GR1          1.4198    2.3268     1.2693     2.0402     1.1901     1.9444
GR2          1.4067    2.3289     1.2656     2.0363     1.1857     1.9450
GR3          1.4264    2.3549     1.2656     2.0439     1.1893     1.9484
MSCSA        1.2776    2.2818     1.2199     2.0869     1.1704     1.9949
MCSA         1.3272    2.2836     1.2367     2.0930     1.1794     2.0106

Sixth, the model selection approach for forecast combination performs almost as well as the Bayesian approach but is computationally simpler.

Seventh, the mean corrected and the mean and scale corrected simple averages do not appear to do better than the simple average of predictive models (MCSA, MSCSA vs. SA(N)), probably because all predictive models are unbiased.

Eighth, when the common factor f_t has an ARMA(p, q) structure, neither the information combination methods nor the forecast combination methods discussed here are optimal. The mean and scale corrected simple average (MSCSA) then dominates all the forecast combination methods.

Ninth, given that there is no structural change and the sample size is not small, theoretically superior combination methods do dominate the simple average of all predictive models.

Table 30: MSPE with Factor Loading(s) U(−2,2) and Factor(s) i.i.d.
N(0,1)

(T_0, T_1)  (50,100)  (50,100)  (100,200)  (100,200)  (150,300)  (150,300)
K =             1         5          1          5          1          5
TVC          1.0894    1.8350     1.1073     1.8403     1.0931     1.8341
EVC          1.2196    2.0338     1.1654     1.9346     1.1343     1.8961
SA(M)        1.1456    2.4371     1.1427     2.4185     1.1255     2.3804
AIC          1.2271    2.0885     1.1771     1.9605     1.1450     1.9112
AICC         1.2286    2.0995     1.1773     1.9615     1.1452     1.9125
BIC          1.2364    2.1549     1.1910     1.9999     1.1569     1.9397
MAAIC        1.2135    2.0723     1.1723     1.9548     1.1414     1.9107
MAAICC       1.2155    2.0825     1.1732     1.9567     1.1420     1.9115
MABIC        1.2316    2.1471     1.1909     1.9929     1.1574     1.9368
SA(N)        1.2079    2.1450     1.1765     2.0512     1.1479     2.0020
BG           1.3185    2.3005     1.2252     2.0895     1.1766     2.0101
VC           1.4038    2.3210     1.2497     2.0509     1.1845     1.9744
GR1          1.4038    2.3210     1.2497     2.0509     1.1845     1.9744
GR2          1.3991    2.3281     1.2427     2.0516     1.1824     1.9710
GR3          1.4240    2.3405     1.2496     2.0545     1.1859     1.9724
MSCSA        1.2860    2.3144     1.2170     2.1163     1.1729     2.0280
MCSA         1.3260    2.3189     1.2303     2.1239     1.1782     2.0469

Table 31: MSPE with Factor Loading(s) N(1,1) and Factor(s) i.i.d. N(0,1)

(T_0, T_1)  (50,100)  (50,100)  (100,200)  (100,200)  (150,300)  (150,300)
K =             1         5          1          5          1          5
TVC          1.1029    1.8606     1.1174     1.8563     1.1049     1.8143
EVC          1.2270    2.0842     1.1793     1.9515     1.1437     1.8728
SA(M)        1.1643    2.4489     1.1611     2.4021     1.1390     2.3022
AIC          1.2318    2.1309     1.1844     1.9807     1.1502     1.8858
AICC         1.2325    2.1434     1.1851     1.9834     1.1502     1.8866
BIC          1.2322    2.1932     1.1955     2.0257     1.1638     1.9104
MAAIC        1.2190    2.1194     1.1815     1.9736     1.1481     1.8816
MAAICC       1.2205    2.1290     1.1821     1.9757     1.1485     1.8823
MABIC        1.2329    2.1872     1.1945     2.0191     1.1616     1.9084
SA(N)        1.2130    2.1788     1.1842     2.0661     1.1547     1.9800
BG           1.3338    2.3245     1.2359     2.0927     1.1843     1.9825
VC           1.4034    2.3821     1.2579     2.0770     1.1947     1.9461
GR1          1.4034    2.3821     1.2579     2.0770     1.1947     1.9461
GR2          1.4029    2.3964     1.2529     2.0797     1.1919     1.9473
GR3          1.4220    2.4182     1.2557     2.0833     1.1940     1.9462
MSCSA        1.2996    2.3512     1.2227     2.1288     1.1791     2.0116
MCSA         1.3389    2.3529     1.2373     2.1284     1.1855     2.0250

Table 32: MSPE with Factor Loading(s) N(0,1)

(T_0, T_1)  (50,100)  (50,100)  (100,200)  (100,200)  (150,300)  (150,300)
K =             1         3          1          3          1          3
TVC          1.1253    1.4043     1.1021     1.4084     1.1118     1.4129
EVC          1.1107    1.1058     1.0486     1.0646     1.0354     1.0531
SA(M)        1.0354    1.0269     1.0104     1.0264     1.0120     1.0297
AIC          1.0769    1.0682     1.0282     1.0445     1.0248     1.0421
AICC         1.0728    1.0656     1.0280     1.0432     1.0246     1.0420
BIC          1.0522    1.0452     1.0144     1.0336     1.0173     1.0345
MAAIC        1.0543    1.0484     1.0189     1.0355     1.0182     1.0351
MAAICC       1.0509    1.0449     1.0180     1.0347     1.0178     1.0348
MABIC        1.0386    1.0301     1.0096     1.0266     1.0117     1.0296
SA(N)        1.0665    1.0608     1.0265     1.0428     1.0222     1.0398
BG           1.1889    1.1820     1.0836     1.1009     1.0573     1.0755
VC           1.3358    1.3144     1.1417     1.1580     1.0988     1.1143
GR1          1.3358    1.3144     1.1417     1.1580     1.0988     1.1143
GR2          1.2571    1.2423     1.1052     1.1194     1.0732     1.0908
GR3          1.2912    1.2755     1.1177     1.1311     1.0810     1.0968
MSCSA        1.0396    1.0321     1.0120     1.0292     1.0132     1.0310
MCSA         1.1940    1.1898     1.0843     1.1020     1.0579     1.0756

4.7 Concluding Remark

In this chapter we have suggested a model selection approach for the case where the sample size relative to the number of predictive models is finite. In addition, we have also suggested mean corrected and mean and scale corrected simple average methods as supplements to the simple averaging method, to correct biases in some of the predictive models. We conducted limited Monte Carlo studies to compare their finite sample performance with some popular sampling and Bayesian approaches to combining forecasts. When there is no structural break, the various forecast combination methods dominate the simple average of all forecasting models. However, since the optimality of the various forecasting methods depends on the true parameters, if the sample size is not large, it appears that using less sample information is better than trying to fully exploit the information in the sample. The Bayesian approach to combining forecasts appears to dominate the sampling approach. The model selection approach also appears to yield reasonably good forecasts. The mean squared prediction error based on the model selection approach is close to that obtained from the Bayesian approach, but it is computationally simpler.
Given that the optimal combination methods require knowledge of the true parameters and we frequently have only a finite sample, it would be very useful to obtain finite sample adjustment procedures for the optimal combination methods. Moreover, there may be frequent structural breaks, at least in a modeling sense, if a model is only good as a "local approximation." In the absence of reliable information about the break points and the size of the breaks, a rolling window simple average, or a mean or a mean and scale corrected simple average of all predictive models, appears to be a robust way to deal with such chance events. However, the performance of any forecast combination method will depend on the window size. Pesaran and Pick (2010) have provided some useful guidance for a random walk with a jump drift. It would be useful to generalize their approach to more general cases.

Chapter 5: Empirical Applications to Forecast Combination

Over the past twenty years, there has been considerable research on forecasting the real GDP growth rate using asset prices, including interest rates, spreads, and returns, and other macroeconomic fundamentals, for instance, changes in industrial production and monetary aggregates. The existing literature has identified several leading indicators. These include interest rates, term spreads, stock returns, dividend yields, and exchange rates. Business economists who need to keep track of the economy daily need to know how reliably these indicators can predict real GDP. Often, when forecasts based on asset prices break down, another wave of refining the macroeconomic models follows. Similarly, many researchers try to identify predictors of the stock market indices. However, Welch and Goyal (2008) show that many well-known predictors of the equity premium on the S&P 500 do not perform well in terms of forecasting when those predictors are tested in the particular time frame they pick.
All of this evidence suggests that the models for predicting GDP or the equity premium can only approximate the data well in some particular periods but not in others.

In this chapter, we examine the performance of various forecast combination methods in forecasting the US output growth rate and the excess equity premium on the S&P 500. These methods are examined under three forecasting frameworks: fixed forecasting, continuous updating, and a rolling window with window size equal to T_1 - T_0.

Under the fixed forecasting framework, the first T_0 observations are used to estimate the parameters, and forecasts are formed from T_0 + 1 to T. For those methods relying on an estimate of the prediction error covariance matrix S, the forecasts from T_0 + 1 to T_1 are used to calculate S. The weights of BG, GR1, GR2 and GR3 are then obtained. Using the same period of data, we can also obtain the regression parameters for MCSA and MSCSA. Based on these weighting schemes, forecasts are formed for the period from T_1 + 1 to T. For the model selection procedures and the simple averaging methods, MSPEs are based on the forecasts from T_1 + 1 to T for fair comparison.

Under the continuously updating forecasting framework, the newest time series observations are added when estimating the parameters and the prediction error covariance matrix. For example, the first set of regressions uses time series observations 1 to T_0 to form forecasts at T_0 + 1; the second set of regressions uses observations 1 to T_0 + 1 to form forecasts at T_0 + 2. This procedure continues until the forecast at T is obtained. S is then based on the forecasts from T_0 + 1 to T_1. MSPEs for the various methods are calculated in the same way as under the fixed forecasting framework.

The rolling framework with window size T_1 - T_0 works similarly, except that the earliest observation is dropped for each new set of regressions. This method is expected to perform better when there are structural breaks.
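As a concrete illustration, the three estimation schemes can be sketched as below. This is a hedged toy version: it uses a single AR(1) predictive model in place of the chapter's full set of models and combination weights, and the function names (`one_step_forecasts`, `mspe`) are our own, not the thesis's.

```python
import numpy as np

def one_step_forecasts(y, scheme="rolling", t0=82, window=32):
    """One-step-ahead AR(1) forecasts of y under three estimation schemes:
    'fixed'     - estimate once on the first t0 observations;
    'expanding' - continuously updating (re-estimate on obs 1..t);
    'rolling'   - re-estimate on the most recent `window` observations.
    Entry i of the result is the forecast of y[t0 + 1 + i]."""
    preds = []
    for t in range(t0, len(y) - 1):
        if scheme == "fixed":
            lo, hi = 0, t0
        elif scheme == "expanding":
            lo, hi = 0, t + 1
        else:  # rolling window
            lo, hi = t + 1 - window, t + 1
        seg = y[lo:hi]
        # OLS of y_s on (1, y_{s-1}) over the chosen estimation sample
        X = np.column_stack([np.ones(len(seg) - 1), seg[:-1]])
        beta, *_ = np.linalg.lstsq(X, seg[1:], rcond=None)
        preds.append(beta[0] + beta[1] * y[t])  # forecast of y[t + 1]
    return np.array(preds)

def mspe(y, preds, t0):
    """Mean squared prediction error over the evaluation sample."""
    return np.mean((y[t0 + 1:] - preds) ** 2)
```

Under a structural break, the rolling scheme discards pre-break observations fastest, which is the intuition behind its better performance in the applications below.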
5.1 Predicting Real Output Growth

The predictors for US output growth consist of the first lag of the dependent variable, y_{t-1} = 400(GDP_{t-1}/GDP_{t-2} - 1), the growth rate of real GDP in quarter t-1; the first lag of the term spread, defined as TS_{t-1} = GB_{t-1} - FFR_{t-1}, where GB_{t-1} is the long-term government bond yield at t-1 and FFR_{t-1} is the Fed Funds Rate at t-1; the change in the Treasury-Bill rate at t-1, dTB_{t-1}; the rate of change of seasonally adjusted M2, RM_{t-1}; and the rate of change of the S&P Industrials, RSP_{t-1}. Therefore x_t = (y_{t-1}, TS_{t-1}, dTB_{t-1}, RM_{t-1}, RSP_{t-1})'. All the data, except real output, are obtained from International Financial Statistics, while real output is downloaded from the Federal Reserve Bank of St. Louis. The data range from 1970Q3 to 2009Q3, with T = 157. We estimate the parameters with all the time series observations up to 1990Q4 (T_0 = 82). These parameters are used to form the forecasting paths from 1991Q1 to 1998Q4 (T_1 = 114).

Table 33: MSPE of US Real GDP Prediction

           Fixed     Continuously Updating   Rolling
SA(M)      7.1918    6.657                   5.9212
AIC        8.0156    7.0522                  6.2429
AICC       8.0156    7.0522                  6.2429
BIC        8.0156    7.0522                  6.3791
MAAIC      8.1917    7.055                   5.9168
MAAICC     8.1536    7.055                   5.9168
MABIC      8.083     7.055                   5.9168
SA(N)      9.027     7.9199                  6.6923
BG         12.0631   7.7821                  5.9574
VC         7.5227    7.9039                  6.7447
GR1        7.5227    7.9039                  6.7447
GR2        8.3539    7.308                   8.6456
GR3        9.0815    7.6344                  8.4355
MSCSA      11.0219   8.15                    6.7273
MCSA       9.3009    6.7991                  5.7617

Results are reported in Table 33. The actual and predicted paths of each method under the three forecasting frameworks are plotted in Figures 13 to 15. We note that, first, the rolling framework with fixed window size generally performs better than the other two frameworks, except for the three GR approaches. Second, the ranking of the different forecast combination methods appears to be exactly the opposite of the simulation results. Third, the mean corrected simple average in the rolling window framework appears to dominate all forecast combination methods.
Fourth, the Bayesian averaging methods are a close second.

Figure 13: Actual and Predicted US Real GDP under the fixed forecasting framework
Figure 14: Actual and Predicted US Real GDP under the continuously updating forecasting framework
Figure 15: Actual and Predicted US Real GDP under the rolling forecasting framework

5.2 Predicting Excess Equity Premium

We predict the annual excess equity premium on the S&P 500 index, defined as (P_t + D_t)/P_{t-12} - 1 - TB_{t-1}, where P_t is the closing index price on the last trading day of month t, obtained from CRSP; D_t is the corresponding dividend; and TB_{t-1} is the one-month lagged US Treasury Bill rate. The nine predictors are (1) the dividend yield DY_t = log D_t - log P_{t-1}; (2) the one-month and (3) two-month lagged T-bill rates TB_{t-1} and TB_{t-2}; (4) the rate of change of seasonally adjusted M2, dM_t = M_t/M_{t-12} - 1; (5) the two-month lagged inflation rate pi_{t-2}, which is computed using the producer price index; (6) the rate of change of seasonally adjusted industrial production, dIP_t = IP_t/IP_{t-12} - 1; (7) the earnings-price ratio EP_t = log E_t - log P_t, where E_t is the 12-month moving sum of earnings on the S&P 500 index; and (8) the one-month and (9) two-month lagged government bond rates GB_{t-1} and GB_{t-2}. Except for DY_t and EP_t, which are obtained from Welch and Goyal (2008), the other explanatory variables are obtained from International Financial Statistics.

The monthly data run from 1960M3 to 2008M12, with total time series observations T = 586. We use the first T_0 = 216 time series observations, which corresponds to 1978M2, for parameter estimation and forecasting. The forecasting paths between T_0 and T_1 = 376, corresponding to 1991M6, are used to estimate the prediction error covariance matrix. The same procedures as in the previous subsection are carried out. Results are reported in Table 34, and the figures under the various forecasting frameworks are plotted in Figures 16 to 18.
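The variable construction described above can be sketched directly. The sketch below is illustrative: it assumes monthly arrays already aligned by date, and the function names and the decimal-rate convention for the T-bill rate are our own choices, not the thesis's code.

```python
import numpy as np

def excess_equity_premium(P, D, TB):
    """Annual excess equity premium as defined above:
    (P_t + D_t) / P_{t-12} - 1 - TB_{t-1}, for monthly arrays
    P (closing price), D (dividend), TB (one-month T-bill rate, decimals).
    Entry i of the result corresponds to month t = i + 12."""
    P, D, TB = map(np.asarray, (P, D, TB))
    return (P[12:] + D[12:]) / P[:-12] - 1.0 - TB[11:-1]

def dividend_yield(P, D):
    """DY_t = log D_t - log P_{t-1}; entry i corresponds to t = i + 1."""
    P, D = map(np.asarray, (P, D))
    return np.log(D[1:]) - np.log(P[:-1])
```

The same slicing pattern extends to the other year-over-year predictors, e.g. M_t/M_{t-12} - 1.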
Figure 16: Actual and Predicted Excess Equity Premium on the S&P 500 under the fixed forecasting framework

Again, the continuously updating and rolling window approaches dominate the fixed window. Some variants of the simple average dominate all other combination approaches. The rolling window mean and scale corrected simple average performs substantially better than the combination methods that are supposed to be superior in theory.

5.3 Concluding Remark

We applied the various forecast combination methods listed in Chapter 4 to predict US real GDP growth and the excess equity premium. The rolling window approach dominates the fixed window and continuously updating approaches. Moreover, the rolling window mean corrected or mean and scale corrected simple average actually yields more accurate forecasts than all the forecast combination methods. This is probably because, in addition to the fact that our forecast models are likely to be misspecified and may at best provide reasonable "local" forecasts, there were structural breaks during this period. In a world where there could be structural changes, the issue of optimally combining forecasts is an ill-posed question. The relative accuracy of a predictive model in one year does not make it any more likely that it will yield more accurate predictions in the following years.
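One common way to implement the mean corrected and mean and scale corrected simple averages discussed here is to bias-correct the equal-weight forecast by a least-squares fit on a training window. The sketch below uses that formulation, which is an assumption on our part and may differ in detail from the thesis's exact variant.

```python
import numpy as np

def mean_scale_corrected_average(forecasts_train, y_train, forecasts_new,
                                 scale=True):
    """Mean corrected (scale=False) or mean and scale corrected (scale=True)
    simple average. forecasts_* are (T, N) arrays holding N models' forecasts.
    The equal-weight average fbar_t is corrected on the training window by
    fitting y_t = a + b * fbar_t (with b fixed at 1 when scale=False)."""
    fbar_tr = forecasts_train.mean(axis=1)
    fbar_new = forecasts_new.mean(axis=1)
    if scale:
        # mean-and-scale correction: intercept and slope by OLS
        X = np.column_stack([np.ones_like(fbar_tr), fbar_tr])
        a, b = np.linalg.lstsq(X, y_train, rcond=None)[0]
    else:
        # mean correction only: shift by the average bias
        a, b = np.mean(y_train - fbar_tr), 1.0
    return a + b * fbar_new
```

Because only one intercept (and at most one slope) is estimated, the correction stays usable in short rolling windows where full combination weights would be poorly estimated.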
Figure 17: Actual and Predicted Excess Equity Premium on the S&P 500 under the recursive framework
Figure 18: Actual and Predicted Excess Equity Premium on the S&P 500 under the rolling forecasting framework

Table 34: MSPE of Predicting Excess Equity Premium on the S&P 500

           Fixed      Continuously Updating   Rolling
SA(M)      326.7875   275.3401                282.3946
AIC        477.9411   342.124                 325.7437
AICC       477.9411   342.0089                326.7193
BIC        477.9411   343.293                 322.2143
MAAIC      489.1148   340.7402                325.2644
MAAICC     488.1087   340.7402                325.2644
MABIC      474.4861   340.7402                325.2644
SA(N)      409.4094   302.4649                299.7352
BG         509.4816   304.3279                228.2853
VC         336.6183   248.2815                331.6499
GR1        336.6183   248.2815                331.6499
GR2        353.9732   258.4531                272.8282
GR3        1453.659   280.5116                228.5891
MSCSA      627.5298   267.2967                208.487
MCSA       1043.044   325.0013                229.8807

Chapter 6: Conclusion

In this thesis, a new panel data methodology is proposed in Chapter 2. The estimator of the time-varying counterfactual is shown to be consistent. The treatment effect, defined as the difference between the observed data and the estimated counterfactual, is shown to be consistent and asymptotically normal. This methodology is applied to study the effects of the handover of sovereignty of Hong Kong from Britain to China in 1997 and of the implementation of CEPA in 2004 on Hong Kong's economy. Results show that there is not much of a significant effect from the handover. However, the CEPA agreement, which comprises the removal of trade barriers between Hong Kong and China and the Individual Visitor Scheme, has raised Hong Kong's real GDP by about 4%. One may wonder, however, whether the average treatment effect would change if another data set were observed. This problem of model uncertainty can be dealt with by forecast combination. Existing methods and several newly proposed combination approaches are presented in Chapter 4. Two applications, predicting US real GDP and the equity premium on the S&P 500, are used to evaluate those methods. They show that forecast combination does reduce the mean squared forecast error.
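The panel counterfactual idea summarized above — fit the treated unit's pre-treatment outcomes on the control units' outcomes, project the post-treatment counterfactual, and average the gap — can be sketched as follows. The variable names and the plain least-squares fit are illustrative assumptions, not the thesis's exact estimator.

```python
import numpy as np

def panel_treatment_effect(y_treated, Y_controls, T0):
    """Sketch of the panel counterfactual approach described above:
    regress the treated unit's outcomes on the control units' outcomes
    over the pre-treatment sample (t < T0), predict the counterfactual
    path after T0, and average the observed-minus-counterfactual gap.
    Returns (average treatment effect, counterfactual path)."""
    # pre-treatment fit: y1_t = a + Y_controls_t' w + e_t
    X_pre = np.column_stack([np.ones(T0), Y_controls[:T0]])
    coef, *_ = np.linalg.lstsq(X_pre, y_treated[:T0], rcond=None)
    # project the no-treatment outcome over the post-treatment sample
    X_post = np.column_stack([np.ones(len(y_treated) - T0),
                              Y_controls[T0:]])
    counterfactual = X_post @ coef
    att = np.mean(y_treated[T0:] - counterfactual)
    return att, counterfactual
```

Replacing the single least-squares fit by an average over several control-unit subsets is one simple way to fold the forecast combination idea of Chapter 4 into this estimator.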
References

Abadie, A. and J. Gardeazabal (2003), "The Economic Costs of Conflict: A Case Study of the Basque Country," The American Economic Review, 93, 113-132.
Abadie, A., A. Diamond and J. Hainmueller (2007), "Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California's Tobacco Control Program," mimeo.
Akaike, H. (1973), "Information Theory and an Extension of the Maximum Likelihood Principle," in Proc. 2nd Int. Symp. Information Theory, ed. by B.N. Petrov and F. Csaki, 267-281, Budapest: Akademiai Kiado.
Akaike, H. (1974), "A New Look at the Statistical Model Identification," IEEE Transactions on Automatic Control, AC-19, 716-723.
Amemiya, T. (1985), Advanced Econometrics, Cambridge: Harvard University Press.
Anderson, T.W. (2003), An Introduction to Multivariate Statistical Analysis, New York: John Wiley & Sons.
Ashenfelter, O. (1978), "Estimating the Effect of Training Programs on Earnings," Review of Economics and Statistics, 60(1), 47-57.
Bai, J. (2003), "Inferential Theory for Factor Models of Large Dimensions," Econometrica, 71, 135-171.
Bai, J. and S. Ng (2002), "Determining the Number of Factors in Approximate Factor Models," Econometrica, 70, 191-221.
Bates, J.M. and C.W.J. Granger (1969), "The Combination of Forecasts," Operational Research Quarterly, 20, 451-468.
Bernanke, B. and J. Boivin (2003), "Monetary Policy in a Data-Rich Environment," Journal of Monetary Economics, 50, 525-546.
Box, G.E.P. and G.M. Jenkins (1970), Time Series Analysis, Forecasting and Control, San Francisco: Holden-Day.
Box, G.E.P. and G. Tiao (1975), "Intervention Analysis with Applications to Economic and Environmental Problems," Journal of the American Statistical Association, 70, 70-79.
Bryan and Molloy (2007), "Mirror, Mirror, Who's the Best Forecaster of Them All?," Cleveland FRB report.
Buckland, S.T., K.P. Burnham and N.H. Augustin (1997), "Model Selection: An Integral Part of Inference," Biometrics, 53, 603-618.
Burnham, K.P. and D.R. Anderson (2002), Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach, 2nd ed., New York: Springer.
Card, D. and A.B. Krueger (1994), "Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania," American Economic Review, 84, 772-793.
Chong, Y.Y. and D.F. Hendry (1986), "Econometric Evaluation of Linear Macro-Economic Models," Review of Economic Studies, 53, 671-690.
Diebold, F.X. (1989), "Forecast Combination and Encompassing: Reconciling Two Divergent Literatures," International Journal of Forecasting, 5, 589-592.
Forni, M. and L. Reichlin (1998), "Let's Get Real: A Factor-Analytic Approach to Disaggregated Business Cycle Dynamics," Review of Economic Studies, 65, 453-473.
Granger, C.W.J. and R. Ramanathan (1984), "Improved Methods of Combining Forecasts," Journal of Forecasting, 3, 197-204.
Gregory, A. and A. Head (1999), "Fluctuations in Productivity, Investment, and the Current Account," Journal of Monetary Economics, 44, 423-452.
Heckman, J.J. and V.J. Hotz (1989), "Choosing Among Alternative Nonexperimental Methods for Estimating the Impact of Social Programs: The Case of Manpower Training," Journal of the American Statistical Association, 84, 862-874.
Heckman, J.J. and E.J. Vytlacil (2001), "Local Instrumental Variables," in Nonlinear Statistical Modeling, ed. by C. Hsiao, K. Morimune and J.L. Powell, Cambridge: Cambridge University Press, 1-46.
Holland, P.W. (1986), "Statistics and Causal Inference," Journal of the American Statistical Association, 81(396), 945-960.
Hong Kong Legislative Council Panel on Commerce and Industry (2006), "Mainland and Hong Kong Closer Economic Partnership Arrangement (CEPA): Impact on the Hong Kong Economy," LC Paper No. CD(1) 1849/06-07(04).
Hurvich, C.M. and C.-L. Tsai (1989), "Regression and Time Series Model Selection in Small Samples," Biometrika, 76(2), 297-307.
Hsiao, C. (2003), Analysis of Panel Data, 2nd ed., New York: Cambridge University Press (Econometric Society Monograph No. 34).
Hsiao, C. and A.K. Tahmiscioglu (1997), "A Panel Analysis of Liquidity Constraints and Firm Investment," Journal of the American Statistical Association, 92, 455-465.
Imbens, G.W. and J.D. Angrist (1994), "Identification and Estimation of Local Average Treatment Effects," Econometrica, 62(2), 467-475.
Imbens, G.W. and J.M. Wooldridge (2009), "Recent Developments in the Econometrics of Program Evaluation," Journal of Economic Literature, 47(1), 5-86.
LaLonde, R.J. (1986), "Evaluating the Econometric Evaluations of Training Programs with Experimental Data," American Economic Review, 76(4), 604-620.
Neo Poh Cheem (2003), "Quarterly Growth Rates," Statistics Singapore Newsletter, March 2003, 7-10.
Newey, W.K. and K.D. West (1987), "A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix," Econometrica, 55, 703-708.
Palm, F.C. and A. Zellner (1992), "To Combine or Not to Combine? Issues of Combining Forecasts," Journal of Forecasting, 11, 687-701.
Pesaran, M.H. and A. Pick (2010), "Forecast Combination across Estimation Windows," forthcoming in the Journal of Business and Economic Statistics.
Phillips, P.C.B. and S.N. Durlauf (1986), "Multiple Time Series Regression with Integrated Processes," Review of Economic Studies, 53(4), 473-495.
Rao, C.R. (1973), Linear Statistical Inference and Its Applications, 2nd ed., New York: Wiley.
Rosenbaum, P.R. and D.B. Rubin (1983), "The Central Role of the Propensity Score in Observational Studies for Causal Effects," Biometrika, 70, 41-55.
Rubin, D.B. (1974), "Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies," Journal of Educational Psychology, 66(5), 688-701.
Sargent, T.J. and C.A. Sims (1977), "Business Cycle Modeling without Pretending to Have Too Much A Priori Economic Theory," in C. Sims et al.,
eds., New Methods in Business Cycle Research, Minneapolis: Federal Reserve Bank of Minneapolis.
Schwarz, G. (1978), "Estimating the Dimension of a Model," Annals of Statistics, 6, 461-464.
Sims, C.A. (1980), "Macroeconomics and Reality," Econometrica, 48, 1-48.
Stock, J.H. and M.W. Watson (1989), "New Indexes of Coincident and Leading Economic Indicators," NBER Macroeconomics Annual, 351-393.
Stock, J.H. (2002), "Macroeconomic Forecasting Using Diffusion Indexes," Journal of Business and Economic Statistics, 20, 147-162.
Stock, J.H. (2005), "Implications of Dynamic Factor Models for VAR Analysis," mimeo.
Stock, J.H. and M.W. Watson (2004), "Combination Forecasts of Output Growth in a Seven-Country Data Set," Journal of Forecasting, 23, 405-430.
Sung, Y.W. and K.Y. Wong (2000), "Growth of Hong Kong Before and After Its Reversion to China: The China Factor," Pacific Economic Review, 5, 201-228.
Timmermann, A. (2006), "Forecast Combinations," in G. Elliott, C.W.J. Granger and A. Timmermann (eds.), Handbook of Economic Forecasting, Vol. 1, Amsterdam: Elsevier, 135-196.
Welch, I. and A. Goyal (2008), "A Comprehensive Look at the Empirical Performance of Equity Premium Prediction," The Review of Financial Studies, 21(4), 1455-1508.
Wooldridge, J. (2002), Econometric Analysis of Cross Section and Panel Data, Cambridge, MA: MIT Press.

Appendices

Appendix A

In this appendix we prove Lemma 5. Using the notation in Chapter 2.5.1, we decompose the time series of the control units X into X_1 and X_2. Note that

a = (BB' + \Omega)^{-1} B \tilde{b}_1   and   a_1 = (B_1 B_1' + \Omega_1)^{-1} B_1 \tilde{b}_1

minimize

E[(y_1^0 - X a)'(y_1^0 - X a)]   and   E[(y_1^0 - X_1 a_1)'(y_1^0 - X_1 a_1)],

respectively, where \tilde{b}_1' = (1, b_1') and

\Omega = E(\varepsilon_t \varepsilon_t') = \begin{pmatrix} \Omega_1 & 0 \\ 0 & \Omega_2 \end{pmatrix}.

Then

MSE(X_1) = \sigma_1^2 + \tilde{b}_1' [ I_K - B_1'(B_1 B_1' + \Omega_1)^{-1} B_1 ] \tilde{b}_1,
MSE(X)   = \sigma_1^2 + \tilde{b}_1' [ I_K - B'(B B' + \Omega)^{-1} B ] \tilde{b}_1.

We first show that if m < K, then MSE(X) < MSE(X_1).
MSE(X) < MSE(X_1) holds if and only if

I_K - B'(B B' + \Omega)^{-1} B < I_K - B_1'(B_1 B_1' + \Omega_1)^{-1} B_1

\iff \begin{pmatrix} (B_1 B_1' + \Omega_1)^{-1} & 0 \\ 0 & 0 \end{pmatrix} - \begin{pmatrix} B_1 B_1' + \Omega_1 & B_1 B_2' \\ B_2 B_1' & B_2 B_2' + \Omega_2 \end{pmatrix}^{-1} < 0

\iff \begin{pmatrix} A & J \\ J' & C \end{pmatrix} \equiv Z < 0,

where

A   = (B_1 B_1' + \Omega_1)^{-1} - (B_1 M_2 B_1' + \Omega_1)^{-1},
J   = (B_1 M_2 B_1' + \Omega_1)^{-1} B_1 B_2' (B_2 B_2' + \Omega_2)^{-1},
J'  = (B_2 B_2' + \Omega_2)^{-1} B_2 B_1' (B_1 M_2 B_1' + \Omega_1)^{-1},
C   = -(B_2 B_2' + \Omega_2)^{-1} B_2 B_1' (B_1 M_2 B_1' + \Omega_1)^{-1} B_1 B_2' (B_2 B_2' + \Omega_2)^{-1} - (B_2 B_2' + \Omega_2)^{-1},
M_2 = I_K - B_2'(B_2 B_2' + \Omega_2)^{-1} B_2.

Remark 1: M_2 = I_K - B_2'(B_2 B_2' + \Omega_2)^{-1} B_2 = I_K - B_2' D^{-1/2} D^{-1/2} B_2 \equiv I_K - U'U, where U = D^{-1/2} B_2 and D = B_2 B_2' + \Omega_2. By Theorem A.3.5 (see p. 639 of Anderson (2003)), I_K - U'U is positive definite if and only if I_L - UU' is. Thus

I_L - UU' = I_L - D^{-1/2} B_2 B_2' D^{-1/2} > 0 \iff D - B_2 B_2' = \Omega_2 > 0.

Since the last statement holds, we conclude that M_2 > 0. Moreover,

P_2 \equiv I_K - M_2 = B_2'(B_2 B_2' + \Omega_2)^{-1} B_2 > 0.

Thus

(B_1 B_1' + \Omega_1) - (B_1 M_2 B_1' + \Omega_1) = B_1 (I_K - M_2) B_1' = B_1 P_2 B_1' > 0,

and therefore A = (B_1 B_1' + \Omega_1)^{-1} - (B_1 M_2 B_1' + \Omega_1)^{-1} < 0.

Remark 2:

-C = (B_2 B_2' + \Omega_2)^{-1} B_2 B_1' (B_1 M_2 B_1' + \Omega_1)^{-1} B_1 B_2' (B_2 B_2' + \Omega_2)^{-1} + (B_2 B_2' + \Omega_2)^{-1} > 0
\iff B_2 B_1' (B_1 M_2 B_1' + \Omega_1)^{-1} B_1 B_2' (B_2 B_2' + \Omega_2)^{-1} + I_L > 0
\iff B_2 B_1' (B_1 M_2 B_1' + \Omega_1)^{-1} B_1 B_2' + (B_2 B_2' + \Omega_2) > 0
\iff B_2 [ B_1' (B_1 M_2 B_1' + \Omega_1)^{-1} B_1 + I_K ] B_2' + \Omega_2 > 0.

Therefore C is a negative definite matrix.
Remark 3: Write G = B_1 M_2 B_1' + \Omega_1 and H = B_1 B_1' + \Omega_1, so that A = H^{-1} - G^{-1}. The Schur complement of A in Z is S = C - J' A^{-1} J, with

-S = (B_2 B_2' + \Omega_2)^{-1} B_2 B_1' G^{-1} B_1 B_2' (B_2 B_2' + \Omega_2)^{-1} + (B_2 B_2' + \Omega_2)^{-1} - (B_2 B_2' + \Omega_2)^{-1} B_2 B_1' G^{-1} (H^{-1} - G^{-1})^{-1} G^{-1} B_1 B_2' (B_2 B_2' + \Omega_2)^{-1}.

Pre- and post-multiplying by (B_2 B_2' + \Omega_2),

(B_2 B_2' + \Omega_2)(-S)(B_2 B_2' + \Omega_2)
= B_2 B_1' G^{-1} B_1 B_2' + (B_2 B_2' + \Omega_2) - B_2 B_1' G^{-1} (H^{-1} - G^{-1})^{-1} G^{-1} B_1 B_2'
= B_2 B_1' G^{-1} B_1 B_2' + (B_2 B_2' + \Omega_2) - B_2 B_1' (G H^{-1} G - G)^{-1} B_1 B_2'
= B_2 B_1' [ G^{-1} - (G H^{-1} G - G)^{-1} ] B_1 B_2' + (B_2 B_2' + \Omega_2).

To see that the right-hand side is positive, we need to check that G^{-1} - (G H^{-1} G - G)^{-1} > 0. Note that G H^{-1} G - G = G (H^{-1} G - I_K) < 0, because H^{-1} G = (B_1 B_1' + \Omega_1)^{-1} (B_1 M_2 B_1' + \Omega_1) < I_K since I_K - M_2 > 0. Then

G^{-1} - (G H^{-1} G - G)^{-1} > 0 \iff G^{-1} > (G H^{-1} G - G)^{-1} = (H^{-1} G - I_K)^{-1} G^{-1} \iff I_K > (H^{-1} G - I_K)^{-1},

which is always true. Therefore -S > 0. By the positivity of this Schur complement, -Z > 0, that is, Z < 0.

We now show that, given the optimal choice of m cross-sectional units, any additional cross-sectional units yield no predictive power. Rewrite

MSE(Y) = E[(y_1^0 - X_1 a_1 - X_2 a_2)'(y_1^0 - X_1 a_1 - X_2 a_2)].   (83)

Minimizing (83) yields

a_1 = (B_1 B_1' + \Omega_1)^{-1} (B_1 \tilde{b}_1 - B_1 B_2' a_2),   (84)

and

a_2 = (B_2 B_2' + \Omega_2)^{-1} (B_2 \tilde{b}_1 - B_2 B_1' a_1).   (85)

Substituting (84) into (85) yields

[ (B_2 B_2' + \Omega_2) - B_2 B_1' (B_1 B_1' + \Omega_1)^{-1} B_1 B_2' ] a_2 = B_2 [ I_K - B_1' (B_1 B_1' + \Omega_1)^{-1} B_1 ] \tilde{b}_1 = B_2 (B_1' \Omega_1^{-1} B_1 + I_K)^{-1} \tilde{b}_1.   (86)

Therefore a_2 equals zero if and only if the right-hand side of (86) is equal to zero, in which case

a_1 = (B_1 B_1' + \Omega_1)^{-1} (B_1 \tilde{b}_1 - B_1 B_2' a_2).   (87)

Appendix B

This appendix presents the predictions of Hong Kong's real economic growth rate had there been no change in sovereignty or no CEPA implementation with Mainland China, using the factor approach. IC1 and IC2 are used to estimate the number of underlying common factors, with the maximum number of factors K equal to 20.
Both methods give K-hat = 20, and therefore the same predicted path. Figures 19-22 plot the within-sample and post-sample predictions under political and economic integration. As one can see from these figures, both the within-sample and post-sample predictions are a lot more volatile than those using the observed data. The estimated treatment effect fitting an AR(1) model is 4.63% and is statistically significant, with a t-statistic equal to 8.9.

Figure 19: Predicted Counterfactual for Political Integration from 1993Q1 to 1997Q2
Figure 20: Predicted Counterfactual for Political Integration from 1997Q3 to 2003Q4
Figure 21: Counterfactual for Economic Integration using Approximate Factor Model
Figure 22: Counterfactual for Economic Integration using Approximate Factor Model
Abstract
Much of the empirical literature in economics, the social sciences, and medical treatment studies the causal effects of programs, policies, or drugs. In the economic context, the major focus of the program evaluation literature is to measure the impact of a particular treatment on a set of individuals, regions, or countries exposed to that treatment. This is of particular importance to policy makers, medical practitioners, and others. In order to evaluate the effect of a treatment, Rubin (1974) proposed interpreting causal statements as comparisons of so-called potential outcomes, defined as a pair of outcomes associated with a particular individual given different levels of exposure to the treatment, with only one of the outcomes observed by researchers. Models are developed for this pair of potential outcomes: one for the treated state, another for the control state. In this thesis, a panel data methodology is proposed under the potential outcomes framework to measure the average treatment effect. This methodology is applied to measure the impact of two major Hong Kong policies on Hong Kong's economy. Due to the model uncertainty involved in forming the counterfactual, forecast combination methods are proposed as a way to reduce this uncertainty. Various existing methods together with some newly proposed methods are evaluated in small-scale Monte Carlo studies and two applications.