Three essays on econometrics
(USC Thesis Other)
Three Essays on Econometrics

by
Brian Finley

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(ECONOMICS)

May 2021

Copyright 2021 Brian Finley

For Muppin.

Acknowledgments

I am profoundly indebted and grateful to a long list of people without whose help, I would never have been able to begin or finish this PhD. Clearly first in the list is my advisor, M. Hashem Pesaran, who was always happy to take as much time as needed to discuss my latest drafts and half-baked ideas, while still allowing me the freedom I needed to pursue my (admittedly meandering) research agenda.

Arie Kapteyn was extremely generous with his time and funding in letting me spend four years research assisting at the Center for Economic and Social Research. Ave CESR! It has been a pleasure working with all my collaborators and friends at and associated with CESR over the years, including (in no particular order) Arie, Marco Angrisani, Adriaan Kalwij, Anna Saavedra, Peter Levine, Ying Liu, Jill Darling, Erik Meijer, Joanne Yoong, Amy Mahler, Andreas Aristidou, Jillian Wallace, Bryan Tysinger, Raquel Fonseca, Drystan Phillips, Michael Moldoff, Swaroop Samek, Francisco Perez-Arce, Arthur Stone, and Maria Jose Prados. Apologies to anyone I missed! I learned a lot working with y'all and I loved doing it.

Geert Ridder offered me substantial help and advice on several of my projects and I appreciate his taking the time to serve on my dissertation and qualifying exam committees as well as to write me a recommendation letter. Gourab Mukherjee and Jinchi Lv were extremely generous with their time, respectively serving as the extra-departmental members of my dissertation and qualifying exam committees, despite my being essentially a stranger when I first requested their presence. Michael Leung was similarly kind, coming from within the department.

Several of my projects have also benefited from helpful input by a number of seminar speakers visiting USC: Marcelo Moreira, Shu Shen, Alwyn Young, Manuel Arellano, Isiah Andrews, and José Luis Montiel Olea.

Both early in my PhD when I was contemplating a specialization in macro and later when we found a shared interest in horse racing, Caroline Betts has been a fantastic collaborator and friend. While none of our racing-related collaborations have come to fruition yet, I look forward to our eventual top 5(s).

Matt Kahn also deserves a special thanks for some critical conversations in which he basically supplied my (wildly successful) job market strategy. His perspective as an applied researcher and an administrator has been invaluable over the last year, and I doubt my career would look nearly as bright without his input.

Of course, I also would never have made it here without the help, support, and commiseration of all my fellow PhD students and postdocs. Among many others, I want to thank Mahrad Sharifvaghefi, Jeonghwan Yun, Jisu Cao, Eunjee Kwon, Simon Reese, Andrea Nocera, Fabrizio Piasini (you should have been here), Mallory Montgomery, Urvashi Jain, Ida Johnsson, Jorge Tarraso Argomedo, Andreas Aristidou, Bora Kim, Grigory Franguridi, Jingbo Wang, Yiwei Qian, Rachel Lee, Ruozi Song, Rajat Kochar, Juan Espinosa Torres, Rashad Ahmed, and Tal Roitberg.

The administrative staff in the department have also been fantastic and remarkably patient with my repeated oversights and tardiness in form-filling and deadline-meeting. My eternal thanks to Young Miller, Morgan Ponder, Alexander Karnazes, and Akiko Matsukiyo.

I also want to repeat my heartfelt thanks to everyone who made it possible for me to even get into a PhD program.
My undergraduate advisor, Junfu Zhang, was obviously critical on this count, and I continue to marvel at the amount of time he took in supervising my undergraduate thesis and listening to me hold forth about the economics of happiness. Lacking the mathematical background to go straight to a PhD from undergrad, I worked by day at Wayfair while, by night, I learned to count at the Harvard Extension School. Here, I owe a deep thanks to Paul Bamberg, for adding advanced math courses to the Extension School's catalogue, for teaching two of them, and for writing me a letter of recommendation. At Wayfair, too, I owe thanks to my managers Dave Drollette and John Kim for their dedication to work-life balance, for keeping me on despite knowing my career plans lay elsewhere, for letting me switch to part-time work to take more Extension School classes, and for giving me the flexibility to work on projects where I could learn statistical theory that I am still benefiting from. Finally, at Wayfair, I owe a special thanks to the economists Tilman Dette, Johann Blauth, Zhenyu Lai, and Robert McMillan for their advice and help with the PhD application process, as well as for Robert's letter of recommendation.

Finally, of course, there's no doing anything without my family. To Steph, Muppin, Pickle, Mort, and Garc: tee po. Love y'all.

Table of Contents

Dedication
Acknowledgments
List of Tables
List of Figures
Abstract
Chapter 1: Can Internet Match High-Quality Traditional Surveys? Comparing the Health and Retirement Study and its Online Version
  1 Introduction
  2 Methods and Outline
  3 HRS and UAS Descriptions
    3.1 The Health and Retirement Study
    3.2 The Understanding America Study
    3.3 UAS Sampling and Weighting Procedures
  4 Comparing Socioeconomic Variables in the HRS, UAS, and CPS
    4.1 Representativeness of the HRS and UAS Samples
  5 Comparing Survey Outcomes in the HRS and UAS
    5.1 Mode Effects
  6 Conclusions
Chapter 2: Testing for Weak-Instrument Bias in the Just-Identified 2SLS
  7 Introduction
    7.1 Relation to the Literature
  8 The Model
  9 Bounds and Inference
  10 Bounds in Empirical Work
  11 Conclusion
Chapter 3: Average Partial Effects in Short-T Panels with Correlated Random Coefficients
  12 Introduction
    12.1 Relation to the Literature
  13 The Model
    13.1 Notation
  14 Identification
    14.1 Nonidentification in the General Case Where T < K
    14.2 Dependence Restrictions
      14.2.1 Motivation: Essential Heterogeneity
    14.3 The Lemmas
      14.3.1 Lemma 2 - Ceteris Paribus Variation
      14.3.2 Corollaries to Lemma 2
      14.3.3 Lemmas 2 and 3 - Continuity and Extrapolation from Dependence Restrictions
      14.3.4 Identification With Functionally Dependent Regressors
    14.4 Identification Algorithm
    14.5 Examples
      14.5.1 Irregular Identification
  15 Estimation
  16 Empirical Application
  17 Conclusion
References
Appendix to Chapter 1
Appendix to Chapter 2
Appendix to Chapter 3

List of Tables

1 Comparison of Demographics Across Surveys
2 Home Ownership
3 Health Insurance Coverage
4 Whether Retired
5 Individual Earnings
6 Self-Reported Health
7 How Satisfied with Life
8 Critical Values
9 Confidence Intervals at F = 10 (|t-stat| = 3.162)
10 Estimated Average Effects of 60 Hours of Training in Subpopulation with Identified Training CAPEs
11 Self-reported Health With CPS Limited to HH Respondents
12 Home Ownership
13 Health Insurance Coverage
14 Self-reported Health
15 How Satisfied with Life

List of Figures

1 Predicted Mean Health Status by Age and Survey Mode
2 (a) Predicted Probability of Choosing the First Option (Excellent Health) by Age and Survey Mode and (b) Predicted Probability of Choosing the Last Option (Poor Health) by Age and Survey Mode.
3 Predicted Mean Life Satisfaction by Age and Survey Mode
4 (a) Predicted Probability of Choosing the First Option (Completely Satisfied) by Age and Survey Mode and (b) Predicted Probability of Choosing the Last Option (Not at All Satisfied) by Age and Survey Mode.
5 Confidence intervals for worst-case $P[\hat{\beta} < \beta]$ with asymptotic coverage probability in specifications collected by Andrews et al. (2019).
6 Predicted Mean Health Status by Age and Survey Mode, and CPS HH Respondent Status
7 Predicted Probability of Choosing the First Option (Excellent Health) by Age, Survey Mode, and CPS HH Respondent Status
8 Predicted Probability of Choosing the Last Option (Poor Health) by Age, Survey Mode, and CPS HH Respondent Status

Abstract

This dissertation compiles three essays on applied and theoretical econometrics.

Chapter 1, coauthored with Marco Angrisani and Arie Kapteyn, examines sample characteristics and elicited survey measures of two studies, the Health and Retirement Study (HRS), where interviews are done either in person or by phone, and the Understanding America Study (UAS), where surveys are completed online and a replica of the HRS core questionnaire is administered.
By considering variables in various domains, our investigation provides a comprehensive assessment of how Internet data collection compares to more traditional interview modes. We document clear demographic differences between the UAS and HRS samples in terms of age and education. Yet, sample weights correct for these discrepancies and allow one to satisfactorily match population benchmarks as far as key sociodemographic variables are concerned. Comparison of a variety of survey outcomes with population targets shows a strikingly good fit for both the HRS and the UAS. Outcome distributions in the HRS are only marginally closer to population targets than outcome distributions in the UAS. These patterns arise regardless of which variables are used to construct post-stratification weights in the UAS, confirming the robustness of these results. We find little evidence of mode effects when comparing the subjective measures of self-reported health and life satisfaction across interview modes. Specifically, we do not observe very clear primacy or recency effects for either health or life satisfaction. We do observe a significant social desirability effect, driven by the presence of an interviewer, as far as life satisfaction is concerned. By and large, our results suggest that Internet surveys can match high-quality traditional surveys.

Chapter 2 proposes a test and confidence procedure to gauge the possible impact of weak instruments in the linear model with one excluded instrument and one endogenous regressor, the model typically used with instrumental variables in applied work. Where $\hat{\beta}$ is the two-stage least squares estimator of the endogenous regressor's coefficient, $\beta$, we perform inference on worst-case asymptotic values of $P[\hat{\beta} < \beta]$.
The deviation of $P[\hat{\beta} < \beta]$ from .5 can be intuitively read as a deviation from median unbiasedness, providing an interpretable bias test for the just-identified model, where the mean bias $E[\hat{\beta} - \beta]$ is undefined. These inference procedures can easily be made robust to error heteroskedasticity and dependence such as clustering and serial correlation.

Chapter 3 studies the linear panel data model with correlated random coefficients and fewer time periods than regressors. Under unrestricted coefficient heterogeneity, average partial effects (APEs) are not identified in this model. We identify APEs by introducing exclusion restrictions that restrict each coefficient's mean, conditional on the regressors, to be a function of a coefficient-specific subset of the regressors. These subsets can be defined completely independently for each coefficient, extending past results that assumed each coefficient's mean to depend either on all or on no regressors. We develop an intuitive, generally applicable proof strategy for finding (sub)populations whose APEs are identified under a given set of exclusion restrictions and apply it to a number of examples to show that identification is possible under much weaker restrictions than previously studied. We develop inference and estimation results for APEs in a set of tractable models and, in more complex cases, for analogous pseudo-parameters from flexible parametric models informed by our nonparametric identification results. To illustrate the methods, we provide an empirical application estimating the returns to job training.

Chapter 1: Can Internet Match High-Quality Traditional Surveys? Comparing the Health and Retirement Study and its Online Version [1]

1 Introduction

The collection of high-quality data on households and individuals tends to be labor intensive, costly and slow.
When adopting traditional survey modes like face-to-face or telephone interviewing, typically several years elapse from the moment a survey is designed to final data availability. The Internet, with its promise of real-time results and looming ubiquity, provides a tempting alternative for faster and more cost-effective data collection.

Online surveys, however, differ from more traditional surveys in several respects which may affect both sample representativeness and data quality. First, Internet coverage is still not entirely pervasive, especially among more economically disadvantaged groups and the elderly. Data from the Pew Research Center showed that only 51% of Americans aged 65 or older had a home broadband connection in 2016, while the fraction of home broadband owners was about 77% among 18-29 year olds. Likewise, home broadband coverage was only 53% for Americans with incomes of less than $30,000 and 93% for those with incomes of $75,000 or greater. [2] As a result, the representativeness of online surveys may be jeopardized (Schonlau et al., 2009). Telephone surveys, however, face similar difficulties with the widespread adoption of voice mail and cell phones (Blumberg et al., 2004). Second, even with complete coverage of the population, individual characteristics are bound to influence the likelihood of completing an online survey versus a face-to-face or phone survey, thereby introducing relevant selectivity issues and nonresponse biases that may vary by interview mode (Couper, 2011). Third, mode effects need to be considered, as the same question may be answered differently in person, by phone or over the Internet (Schwarz and Sudman, 1992). Face-to-face and phone interviews leave more room for clarification and offer more control of who is actually answering the questionnaire.
On the other hand, Web surveys offer more privacy and could thereby encourage more accurate and honest reporting on personal and sensitive matters, while the presence of an interviewer in face-to-face and telephone interviews may induce interviewer effects.

[1] This is joint work, coauthored with Marco Angrisani and Arie Kapteyn.
[2] The fact sheet can be found at http://www.pewinternet.org/fact-sheet/internet-broadband/.

Chang and Krosnick (2009) compare sample representativeness and data quality of Internet-based surveys and phone-based surveys. They conclude that as long as Internet data are collected from a probability-based sample, these exhibit higher accuracy than data collected by phone. Their study, however, is limited to a rather specific topic, namely politics.

In view of the existing literature, the contribution of this paper is twofold. First, we revisit the question of comparability of online and more traditional interview modes by studying differences across Internet-based, face-to-face and phone-based surveys. Second, we focus on a diverse set of outcomes, ranging from home ownership and labor force status to self-reported health and life satisfaction. The aforementioned sources of differences between Web surveys and face-to-face or telephone surveys may affect each of these outcomes in different ways. Thus, by considering variables in various domains, our investigation provides a more robust and comprehensive assessment of how Internet data collection compares to more traditional interview modes. Moreover, while our analysis is performed at a time when Internet coverage has increased substantially in the population, we focus (because of data availability and comparability issues) on the subgroup of individuals aged 55 and older.
Within this segment of the population, barriers to adoption of new technology may still imply significant selectivity issues and limit study generalizability, as recently pointed out by Remillard et al. (2014). The extent to which Internet data are comparable to data collected with more traditional interview modes is then of particular scientific interest in research concerned with this subpopulation.

2 Methods and Outline

We consider and compare sample characteristics and elicited survey measures of two studies, the Health and Retirement Study (HRS), where interviews are done either in person or by phone, and the Understanding America Study (UAS), where surveys are completed online. Both the UAS and HRS maintain a panel of American households, but while the HRS focuses for the most part on its core questionnaire issued every two years, the UAS issues a wide variety of surveys to its panel. Included among the surveys administered in the UAS is a replica of the HRS core questionnaire (with some adaptation to accommodate differences in format between verbal and self-administered interviews). Thus, by examining responses to this questionnaire in the two studies, we can investigate not only differences in sample composition across these two studies but also differences in survey outcomes potentially stemming from different interview modes. In what follows, we will refer to the HRS core questionnaire simply as the HRS, and similarly refer to its replica in the UAS.

Whenever possible, we contrast HRS and UAS survey outcomes with comparable measures in the Current Population Survey (CPS), which we view as population benchmarks. The goal is to assess the extent to which HRS and UAS survey outcomes match population benchmarks and identify possible channels which observed discrepancies may stem from.

Section 3 briefly sketches the general features of the HRS and the UAS panels.
Like other probability-based Internet surveys, the UAS suffers from low recruitment rates. Hence, weighting becomes critical to matching the underlying population characteristics. Section 3.3, therefore, goes into detail about the sampling and weighting procedures adopted by the UAS. Section 4 follows with a comparison of surveys' basic demographics to each other and to the reference population, as represented by the CPS. Section 5 performs similar comparisons focusing on survey outcomes such as home ownership, health insurance coverage and labor force status, as well as self-reported health and subjective well-being. By using a diverse array of measures, for which biases induced by representativeness issues and survey modes may differ, we aim to provide a fairly complete picture of the surveys' ability to match each other and the reference population. This analysis also compares different weighting procedures for the UAS to examine the importance of the choice of post-stratification variables on the quality of weighting. Section 5.1 focuses on the comparison of mode effects for two subjective variables: self-reported health and satisfaction with life. Section 6 concludes.

3 HRS and UAS Descriptions

3.1 The Health and Retirement Study

The HRS is a multipurpose, longitudinal household survey representing the US population over the age of 50. Since 1992, the HRS has surveyed age-eligible respondents and their spouses every two years to track transitions from work into retirement, to measure economic wellbeing in later life and to monitor changes in health status as individuals age. Starting in 2006, study participants 80 years or younger have been randomly assigned to either a phone or an enhanced face-to-face interview. In the former case, the HRS questionnaire is administered via computer-assisted telephone interviewing.
In the latter case, the questionnaire is administered in person by an interviewer using computer-assisted personal interviewing technology and is complemented with a set of physical performance measures, collection of biomarkers and a survey on psychosocial topics. Respondents over the age of 80 are only interviewed face to face.

Initially, the HRS consisted of individuals born between 1931 and 1941 and their spouses, but additional cohorts have been added in 1998, 2004 and 2010, the youngest cohort to date comprising individuals born between 1954 and 1959. Once added, a cohort is indefinitely administered the HRS questionnaire on the same two-year cycle as previously existing panel members. Because of refresher samples over the years, the HRS is representative of households in which at least one member is 51 years old in 1998, 2004 and 2010, when new cohorts were added to the survey. In 2000, 2006 and 2012, the HRS represents households with members 53 or older; in 2002, 2008 and 2014, it represents households with members 55 and older.

We use the 2014 wave of the HRS and rely on the RAND version of the data, a large user-friendly subset of the HRS that combines data from all waves, adds information that may have been provided by the spouse to the respondent's record and has consistent imputation of financial variables. As mentioned above, the 2014 HRS wave is representative of individuals aged 55 or older. Accordingly, we will select only individuals who are 55 or older in both the CPS and the UAS to proceed with our comparison exercise.

For HRS waves 1992-2004, the CPS was used to establish population benchmarks for the post-stratification of sample weights. Starting from 2006, however, the American Community Survey (ACS) has served as the basis for post-stratification.
Hence, population targets for the 2014 HRS wave, which is used in this study, have been computed off the ACS. Post-stratification in the HRS is based on gender, age, race/ethnicity (Hispanic, Black non-Hispanic, other non-Hispanic) and geography (Metropolitan Statistical Area (MSA) and non-MSA counties).

3.2 The Understanding America Study

The UAS is a nationally representative Internet panel of approximately 6,000 respondents. It began in 2014 and is managed by the Center for Economic and Social Research at the University of Southern California. The UAS is based on a probability sample drawn from the US population aged 18 and older. Panel members are selected through address-based sampling. After joining the panel, individuals are invited to take, on average, two surveys each month. Invitations to panel members to take surveys are sent by email and surveys are answered online. Respondents are typically compensated $20 for a 30-minute survey. Individuals who do not have Internet access are provided with both access and a tablet for completing surveys. This is a very important feature of the recruitment procedure, extending coverage to groups that would otherwise not be reached.

The UAS has an estimated recruitment rate of 15 to 20%, which is comparable to or slightly higher than those of other probability-based Internet panels like the GfK KnowledgePanel or the RAND American Life Panel. Such low recruitment rates have led some researchers to argue that there is little practical difference between opting out of a probability sample and opting into a nonprobability (convenience) Internet panel (Rivers, 2013).
In general, probability-based panels tend to better represent the underlying population in terms of demographic characteristics (Chang and Krosnick, 2009). Yet, nonprobability Internet panels may still be used as a basis of population norms as long as the data can be appropriately weighted to compensate for coverage errors and selection bias. Hays et al. (2015) conclude that "No hard-and-fast rules determine when convenience panels are adequate for use in population inference or when response rates to probability Internet panels will be high enough to assume unbiased estimates." Our study does not try to address this issue: we will not compare the UAS, a probability-based Internet panel, to a convenience Internet panel. However, it is important to remember that the UAS shares low recruitment rates with other probability-based Internet panels. Because of this, weighting may become a crucial issue when the objective is to represent the underlying population. In view of this, the next section describes in some detail the weighting procedures used by the UAS. [3]

3.3 UAS Sampling and Weighting Procedures

An important feature of the UAS sampling procedure is that member recruitment is done in batches and that the first recruitment batch was sampled differently from subsequent batches. [4] The first batch was a simple random sample of addresses from the United States Postal Service (USPS) database. Subsequent batches were based on the sequential importance sampling (SIS) algorithm developed by Meijer (2014) and Angrisani et al. (2014). [5] This is a type of adaptive sampling (Groves and Heeringa, 2006; Tourangeau et al., 2016; Wagner et al., 2012) that generates unequal sampling probabilities with desirable statistical properties.
Specifically, before sampling an additional batch, the SIS algorithm computes the unweighted distributions of specific demographic characteristics (e.g., sex, age, marital status and education) in the UAS at that point in time. It then assigns to each zip code a nonzero probability of being drawn, which is an increasing function of the degree of desirability of the zip code. The degree of desirability is a measure of how much, given its population characteristics, a zip code is expected to move the current distributions of demographics in the UAS toward those of the US population. For example, if at a particular point in time the UAS panel underrepresents females with a high school degree, zip codes with a relatively high proportion of females with a high school degree receive a higher probability of being sampled. The SIS is implemented iteratively. That is, after selecting a zip code, the distributions of demographics in the UAS are updated according to the expected contribution of this zip code toward the panel's representativeness; updated measures of desirability are computed and new sampling probabilities for all other zip codes are defined. This procedure provides a list of zip codes to be sampled. For each zip code in this list, addresses are then drawn in a simple random sample from the USPS database.

In the UAS, sample weights are survey-specific. They are provided with each UAS survey and are meant to make each survey data set representative of the reference US population with respect to a predefined set of sociodemographic variables. Sample weights are constructed in two steps. In a first step, a base weight is created to account for unequal probabilities of sampling zip codes produced by the SIS algorithm and to reflect the probability of a household being sampled, conditional on its zip code being sampled. In a second step, final post-stratification weights are generated to correct for differential nonresponse rates and to bring the final survey sample in line with the reference population as far as the distribution of key variables of interest is concerned.

[3] Alattar et al. (2018) offer a comprehensive overview of the UAS.
[4] Details on UAS sample recruitment can be found at https://uasdata.usc.edu/index.php.
[5] The SIS algorithm is implemented to recruit all UAS respondents, except those belonging to two special purpose samples, namely Native Americans and Los Angeles County residents with young children, for whom different sampling procedures are adopted. Because of their specific sampling procedures, these two groups receive zero weight.

More precisely, to compute the base weight, the unit of analysis is a zip code. A logit model is estimated for the probability that a zip code is sampled as a function of its characteristics, namely census region, urbanicity and population size, as well as its sex, race, age, marital status and education composition. Estimation is carried out on an ACS file that contains five-year average characteristics at the zip code level, with urbanicity [6] derived from the 2010 Urban Area to ZIP Code Tabulation Area Relationship File of the US Census Bureau and merged to this. The outcome of this logit model is an estimate of the marginal probability of a zip code being sampled, which, because of the implementation of the SIS algorithm, is not known ex ante. Denote by $w^b_1$ the inverse of the logit-estimated probability of sampling each zip code. Next, for each sampled zip code, the ratio of the number of households in the zip code to the number of sampled households within the zip code, denoted by $w^b_2$, is computed.
For the first recruitment batch, which is a simple random sample of addresses from the US population and does not use the SIS algorithm, it is assumed (without loss of generality) that $w^b_1 = w^b_2 = 1$ instead. The base weight is a zip code-level weight defined as

$$\text{base weight} = w^b_1 \times w^b_2 \times a,$$

where $a$ is a correction factor such that the sum of the base weights is equal to the number of all selected households (if all of them respond). This number is equal to the size of the first recruitment batch (10,000) and to the number of sampled zip codes times 40 (the number of sampled households within each drawn zip code) for all subsequent recruitment batches. Hence, the correction factors take two values, one for the first recruitment batch and one for all subsequent recruitment batches. UAS members are assigned a base weight, computed as described above, depending on the zip code where they reside at the time of recruitment.

The post-stratification weights in the second step are generated by a raking algorithm that, starting from the base weight, compares, iteratively adjusts, and eventually matches relative frequencies in the target population with relative weighted frequencies in the survey sample for the following one- and two-way marginal distributions: race, gender × age, gender × education, household size × total household income, census regions and urbanicity. The benchmark distributions against which UAS surveys are weighted are derived from the CPS Annual Social and Economic Supplement administered in March of each year.

[6] The variable urbanicity takes three mutually exclusive values indicating whether the area of residence of a respondent is rural, mixed, or urban.

Post-stratification weights are trimmed to limit variability and improve the efficiency of estimators using the weights.
This is performed using the general weight trimming and redistribution procedure described by Valliant et al. (2013). More precisely, denoting by $w_{i,\mathrm{raking}}$ the raking weight for respondent $i$ and by $\bar{w}_{\mathrm{raking}}$ the sample average of raking weights within the survey sample, the procedure involves:

1. Setting the lower and upper bounds on weights equal to $L = 0.25\,\bar{w}_{\mathrm{raking}}$ and $U = 4\,\bar{w}_{\mathrm{raking}}$, respectively; [7]

2. Resetting any weights smaller than the lower bound to $L$ and any weights greater than the upper bound to $U$:

$$w_{i,\mathrm{trim}} = \begin{cases} L & \text{if } w_{i,\mathrm{raking}} \le L \\ w_{i,\mathrm{raking}} & \text{if } L < w_{i,\mathrm{raking}} < U \\ U & \text{if } w_{i,\mathrm{raking}} \ge U \end{cases}$$

3. Computing the amount of weight lost by trimming as $w_{\mathrm{lost}} = \sum_{i=1}^{N_c} (w_{i,\mathrm{raking}} - w_{i,\mathrm{trim}})$ and distributing it evenly among the respondents whose weights are not trimmed.

While raking weights can match population distributions of selected variables, trimmed weights typically do not. Therefore, the raking algorithm and the trimming procedure are iterated until post-stratification weights are obtained that respect the weight bounds and align sample and population distributions of selected variables. [8] The final post-stratification weight for each survey respondent, $w_{i,\mathrm{post}}$, is the weight generated by applying the raking/trimming procedure just described to the base weight. [9]

It should be noted that in the UAS weighting procedure, there is no explicit nonresponse adjustment to the base weights. Rather, it is the post-stratification factor that is meant to correct for differential nonresponse across survey invitees. A similar approach is adopted by the HRS. In the HRS, post-stratification of base weights is performed in the first wave to adjust for nonparticipation in the study and create a baseline weight. In subsequent waves, a post-stratification factor applied to baseline weights corrects for wave-specific nonresponse.
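As a rough illustration of the raking and trimming steps described above, the following Python sketch alternates a simple raking routine with the 0.25 and 4 bounds given in the text. It is not the UAS production code: the function names (`weighted_shares`, `rake`, `trim`, `rake_and_trim`), the toy gender and age margins and their target shares are all hypothetical, and the sketch assumes that every category of every margin is present in the sample.

```python
def weighted_shares(weights, codes, n_cat):
    """Weighted relative frequency of each category (codes are 0..n_cat-1)."""
    totals = [0.0] * n_cat
    for w, c in zip(weights, codes):
        totals[c] += w
    grand = sum(totals)
    return [t / grand for t in totals]


def rake(weights, margins, targets, max_iter=50, tol=1e-10):
    """Adjust weights margin by margin until weighted shares match the targets."""
    w = list(weights)
    for _ in range(max_iter):
        worst = 0.0
        for codes, target in zip(margins, targets):
            shares = weighted_shares(w, codes, len(target))
            worst = max(worst, max(abs(s - t) for s, t in zip(shares, target)))
            factors = [t / s for t, s in zip(target, shares)]
            w = [wi * factors[c] for wi, c in zip(w, codes)]
        if worst < tol:
            break
    return w


def trim(weights, lower=0.25, upper=4.0):
    """Clip weights to [lower, upper] times the mean and spread the lost
    weight evenly over the weights that were not clipped."""
    mean_w = sum(weights) / len(weights)
    lo, hi = lower * mean_w, upper * mean_w
    trimmed = [min(max(wi, lo), hi) for wi in weights]
    lost = sum(weights) - sum(trimmed)
    free = [i for i, wi in enumerate(weights) if lo < wi < hi]
    for i in free:
        trimmed[i] += lost / len(free)
    return trimmed


def rake_and_trim(base, margins, targets, max_rounds=50, eps=1e-9):
    """Alternate raking and trimming; the last step is always a raking pass,
    so the margins match even if the bounds cannot be respected."""
    w = list(base)
    for _ in range(max_rounds):
        w = rake(w, margins, targets)
        mean_w = sum(w) / len(w)
        if min(w) >= 0.25 * mean_w - eps and max(w) <= 4.0 * mean_w + eps:
            return w
        w = trim(w)
    return rake(w, margins, targets)


# Hypothetical toy sample: 200 respondents, two one-way margins.
gender = [0] * 80 + [1] * 120      # sample shares 0.4 / 0.6
age = [0, 1, 2, 2] * 50            # sample shares 0.25 / 0.25 / 0.5
final = rake_and_trim([1.0] * 200, [gender, age],
                      [[0.5, 0.5], [0.3, 0.3, 0.4]])
```

Ending on a raking pass mirrors the fallback described in the accompanying footnote, where exact alignment with population frequencies takes priority over the weight bounds when the two cannot both be met within the allowed iterations.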
Moreover, while the UAS uses the CPS to establish population benchmarks for post-stratification, the HRS wave considered in this study relies on the ACS for post-stratification. Population controls for CPS weights are derived from the census and the ACS. For the purposes of our exercise, we will consider weighted CPS measures as population targets to which UAS and HRS survey outcomes are compared. Owing to the CPS's close correspondence with the ACS, we do not expect that this should particularly favor one survey over the other.

[7] While these values are arbitrary, they are in line with those described in the literature and followed by other surveys (Battaglia et al., 2009).
[8] A maximum of 50 iterations is allowed. If an exact alignment respecting the weight bounds cannot be achieved, the trimmed weights will ensure the exact match between survey and population relative frequencies, but may take values outside the interval defined by the pre-specified lower and upper bounds.
[9] A complete description of the UAS weighting procedure can be found at https://uasdata.usc.edu/addons/documentation/UAS%20Weighting%20Procedures.pdf.

4 Comparing Socioeconomic Variables in the HRS, UAS, and CPS

In this section, we compare the distributions of key demographic variables in the HRS and in the UAS with those in the CPS. This will allow us to gauge the representativeness of these two studies relative to the reference population. To ensure full comparability of the underlying samples across these studies, we take the following two sample selection steps. First, throughout our analysis, we only consider respondents aged at least 55 in all surveys. Individuals aged 55 or older constitute the age group of which the 2014 HRS wave is representative and, consequently, those who receive a nonzero sample weight in the 2014 HRS wave.
Second, we drop HRS respondents living in nursing homes, as the UAS and CPS do not sample from this population.

4.1 Representativeness of the HRS and UAS Samples

Table 1 shows the distributions of basic demographic variables for the unweighted and weighted HRS and UAS samples. The first column reports the target distributions in the US population of individuals aged 55 and older. These are computed using the 2015 CPS and its provided sample weights. As mentioned above, while the UAS relies on the CPS to obtain population benchmarks for post-stratification, the HRS uses the ACS. Yet, since the CPS itself weights to match the census and the ACS, it is plausible to assume that both the UAS and HRS are weighted to align their samples to essentially the same population. The choice of referring to the year 2015 for population benchmarks is due to the fact that our comparison exercise uses data from the 2014 HRS wave and from the first HRS wave in the UAS, which, although based on the 2014 HRS questionnaire, was completed between the years 2015 and 2017. [10]

It is worth restating that the UAS final weights allow sample distributions to match the population distributions of gender, race/ethnicity, age, education, household income, household composition and location (i.e., census region and urbanicity). The HRS adopts a more parsimonious model, where final weights align the sample to the population along the dimensions of gender, age of respondent and spouse/partner, race/ethnicity and geography. When comparing demographic distributions, we use a similar specification for the UAS, excluding education and income from the set of raking factors. In this case, weight variability is slightly lower and the effective sample size is higher in the HRS than in the UAS. Specifically, the standard deviation of relative final weights is 0.8 in the HRS and 1.1 in the UAS. Effective sample sizes are 61% and 46%, respectively.

[10] Referring to the years 2014 and 2016 for population benchmarks does not change the results of the analysis.

Table 1: Comparison of Demographics Across Surveys

| | CPS | HRS Unweighted | HRS Weighted | UAS Unweighted | UAS Weighted |
|---|---|---|---|---|---|
| Gender | | | | | |
| Male | 0.462 | 0.425 | 0.461 | 0.491 | 0.462 |
| Female | 0.538 | 0.575 | 0.539 | 0.509 | 0.538 |
| Mean abs. diff | - | 0.036 | 0.000 | 0.030 | 0.000 |
| Race/Ethnicity | | | | | |
| White | 0.751 | 0.641 | 0.777 | 0.835 | 0.751 |
| Black | 0.098 | 0.191 | 0.100 | 0.067 | 0.098 |
| Other | 0.060 | 0.032 | 0.035 | 0.060 | 0.060 |
| Hispanic | 0.090 | 0.136 | 0.088 | 0.038 | 0.090 |
| Mean abs. diff | - | 0.069 | 0.013 | 0.042 | 0.000 |
| Age | | | | | |
| 55-64 | 0.466 | 0.405 | 0.467 | 0.571 | 0.466 |
| 65-74 | 0.313 | 0.281 | 0.309 | 0.327 | 0.313 |
| 75-84 | 0.156 | 0.236 | 0.161 | 0.087 | 0.156 |
| 85+ | 0.065 | 0.078 | 0.064 | 0.015 | 0.065 |
| Mean abs. diff | - | 0.046 | 0.003 | 0.059 | 0.000 |
| Education | | | | | |
| HS or less | 0.461 | 0.521 | 0.453 | 0.267 | 0.252 |
| Some college | 0.163 | 0.191 | 0.194 | 0.255 | 0.240 |
| Assoc. coll. degree | 0.088 | 0.060 | 0.067 | 0.131 | 0.107 |
| Bachelor | 0.171 | 0.140 | 0.171 | 0.191 | 0.232 |
| Postgrad | 0.117 | 0.089 | 0.115 | 0.156 | 0.169 |
| Mean abs. diff | - | 0.035 | 0.013 | 0.078 | 0.084 |
| Household Income | | | | | |
| <$30k | 0.306 | 0.388 | 0.313 | 0.287 | 0.268 |
| [$30k, $60k) | 0.301 | 0.262 | 0.248 | 0.299 | 0.284 |
| [$60k, $100k) | 0.205 | 0.169 | 0.189 | 0.243 | 0.255 |
| $100k+ | 0.189 | 0.181 | 0.250 | 0.171 | 0.194 |
| Mean abs. diff | - | 0.041 | 0.034 | 0.019 | 0.027 |
| N | 37,795 | 16,751 | 16,751 | 1,852 | 1,852 |

For each demographic variable, we report the mean absolute difference between the cells for the CPS and those for the HRS and UAS. We treat this as a summary statistic of the distance of the HRS and UAS sample distributions from their population counterparts.

Within the population aged 55 and older, the fraction of females is 54%. Before sample weights are applied, the HRS overrepresents female respondents, while the UAS underrepresents them. The latter, however, exhibits a slightly smaller discrepancy with the benchmark gender proportions (a mean absolute difference of 0.030, against 0.036 for the HRS).
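The effective sample sizes quoted above are in line with Kish's approximation, under which the effective share of the sample equals 1 / (1 + cv^2), where cv is the coefficient of variation of the weights (equal to the standard deviation when weights are expressed relative to their mean). The text does not state which formula was used, so treating it as Kish's approximation is an assumption, and the function names below are purely illustrative.

```python
def kish_effective_n(weights):
    """Kish effective sample size: (sum of w)^2 / (sum of w^2)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


def effective_share_from_sd(sd_relative):
    """Effective share n_eff / n when relative weights have mean 1
    and standard deviation sd_relative."""
    return 1.0 / (1.0 + sd_relative ** 2)


hrs_share = effective_share_from_sd(0.8)   # about 0.61
uas_share = effective_share_from_sd(1.1)   # about 0.45
```

With the rounded standard deviations reported in the text, the sketch gives roughly 61% for the HRS and 45% for the UAS; the small gap from the reported 46% is what one would expect if the published figure was computed from unrounded weights.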
The HRS oversamples minority groups by design, which is reflected in the difference between the unweighted HRS race/ethnicity distribution and its population counterpart. The fraction of White respondents is substantially lower than the one observed in the CPS, while the fractions of Black and Hispanic respondents are larger than in the CPS. After weighting, White respondents are slightly overrepresented, whereas Hispanics are slightly underrepresented. In the UAS, Whites are considerably overrepresented, while the unweighted fractions of Black and Hispanic respondents fall short of their population benchmark by about 3 to 4 percentage points. These differences disappear when weights are applied. Before weighting, the mean absolute difference from the CPS is 6.9 percentage points for the HRS and 4.2 for the UAS. After weighting, they become 1.3 and essentially 0 percentage points, respectively.

The unweighted age distribution in the UAS is slightly further from its benchmark than is the unweighted age distribution in the HRS. The latter notably overrepresents individuals in the 75-84 age range, while the UAS appears to overselect younger respondents, with a substantial overrepresentation of the 55-64 age group and fairly substantial underrepresentation of the 75+ age groups. Before weights are applied, the mean absolute difference from the CPS is 0.046 for the HRS and 0.059 for the UAS. The age brackets shown in the table are those used to construct weights in the UAS, but they do not necessarily overlap with those used in the HRS weighting procedure. Hence, the difference between weighted and population distributions is, by construction, minimized for the UAS. Nonetheless, after weighting, the mean absolute difference between the CPS and HRS is only 0.3 percentage points.
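The mean absolute difference used throughout this comparison is simple to reproduce. The sketch below, with a hypothetical helper name `mean_abs_diff`, recomputes it for the unweighted race/ethnicity rows of Table 1. Because the inputs are the rounded shares printed in the table, the results can differ from the published figures in the third decimal.

```python
def mean_abs_diff(sample, benchmark):
    """Average absolute gap between sample shares and benchmark shares."""
    return sum(abs(s - b) for s, b in zip(sample, benchmark)) / len(benchmark)


# Race/ethnicity shares from Table 1, in the order White, Black, Other, Hispanic.
cps = [0.751, 0.098, 0.060, 0.090]
hrs_unweighted = [0.641, 0.191, 0.032, 0.136]
uas_unweighted = [0.835, 0.067, 0.060, 0.038]

hrs_mad = mean_abs_diff(hrs_unweighted, cps)   # close to the reported 0.069
uas_mad = mean_abs_diff(uas_unweighted, cps)   # close to the reported 0.042
```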
As far as education is concerned, the HRS overrepresents individuals with high school or less and underrepresents individuals with college education or more. This is consistent with the fact that the HRS sample is biased toward the elderly. In sharp contrast, the UAS largely underrepresents low-educated individuals and overrepresents those with some college education. Individuals holding higher degrees (bachelor and postgraduate degrees) are overrepresented by a more modest margin. On average, the distance between the HRS unweighted education distribution and the one in the CPS is 3.5 percentage points and shrinks to about 1.3 percentage points when weights are applied. On average, the distance between the UAS unweighted education distribution and its population counterpart is 7.8 percentage points and increases to 8.4 when weights are applied.

Despite the observed divergence in the education distributions between the UAS and the CPS, the household income distributions align reasonably well. The HRS overrepresents low-income households and underrepresents high-income households. This is consistent with the oversample of minority groups, who tend to be less affluent. Since post-stratification is not based on income, differences between the surveys and the population income distribution remain even after weighting, with a mean absolute deviation of about 3.4 percentage points in the HRS. The UAS match to the population distribution deteriorates somewhat with weighting, to a level comparable to that of the HRS.

Before weighting, demographic deviations from the population are observed for both samples. They are more pronounced in the HRS as far as gender, race and household income are concerned, but larger in the UAS for age and education. We should note that in the HRS, the overrepresentation of minorities (and the implied deviation from the population income distribution) is by design.
After weighting, both surveys' distributions match the population on the dimensions used in the construction of the weights, as expected. However, some discrepancies in the distribution of variables not used for post-stratification are apparent. Specifically, the distribution of household income does not match its population counterpart in either the HRS or the UAS, and the distribution of education in the UAS remains far from its population target.

5 Comparing Survey Outcomes in the HRS and UAS

Having assessed the representativeness of the two samples, we move on to compare survey outcomes in the HRS and UAS. We primarily focus on survey outcomes for which population benchmarks can be obtained from the CPS. Relative to previous research investigating differences across studies stemming from different coverage as well as interview mode, we consider a broad and diverse range of outcomes, among which potential selection biases and mode effects may differ. Moreover, exploiting the randomization of the interview mode in the HRS, we can compare response quality of face-to-face, telephone and online interviews.

Before presenting and commenting on the results of this analysis, it is important to point out that the way survey questionnaires are administered varies across studies. In the HRS and in the UAS, questions are typically answered by individuals on their own behalf, although respondents are explicitly instructed to answer on behalf of their household in some instances (e.g., in the module eliciting household wealth). The CPS asks a single household respondent to answer questions (including individual-specific questions such as those about one's labor market outcomes and health) on behalf of all other household members. This difference may account for some divergence in outcome distributions across studies, as we will indicate in the following discussion.

We carry out the comparison of survey outcomes on the population aged 55 and older, referring to the 2015 CPS for population benchmarks. For the UAS, we apply several different weighting schemes to assess the effect of post-stratifying on different sociodemographic characteristics on weighted sample outcomes. We begin with the default UAS scheme (wgh0), where post-stratification factors are gender, race, age, education, income, household size, census region and urbanicity, and consider the following five alternative schemes. First, we adopt finer age brackets (wgh1) to better account for the underrepresentation of older respondents. Second, starting from the baseline weights, we drop education from the set of raking factors (wgh2); this is the demographic variable that exhibits the largest discrepancy from its population benchmark. Third, starting from the baseline weights, we drop household income (but retain education) from the set of raking factors (wgh3). Fourth, starting from the baseline weights, we drop both education and household income from the set of raking factors (wgh4); this specification is the closest to the one adopted by the HRS (and the one used in Table 1). Finally, starting from the baseline weights, we drop the geographic indicators (census region and urbanicity) from the set of raking factors (wgh5). In light of the demographic deviations from population benchmarks highlighted in the previous section, these various specifications allow us to gauge to what extent different weighting schemes impact survey outcomes and correct for possible sources of bias.

In Tables 2-7, the first column reports the population distribution of the outcome of interest as taken from the CPS (if available). The second, third and fourth columns present the weighted HRS distributions for the entire HRS sample and separately for the phone and in-person interview subsamples.
Column 5 reports the unweighted distribution in the UAS; columns 6-11 show the weighted UAS distributions for each of the six sets of post-stratification weights described above.

For all the outcomes considered in the analysis, item nonresponse is rather low in both the HRS and the UAS and largely comparable in the two studies. We note that 1.5% of the selected sample has no health insurance coverage information and 1% do not report retirement status in the UAS, while these fractions are essentially zero in the HRS. On the other hand, there is virtually no item nonresponse in the UAS as far as life satisfaction is concerned, while in the HRS, about 9% of respondents interviewed by phone and 2% interviewed in person have missing life satisfaction. Full details of item nonresponse across studies are provided in the Appendix. [11]

The first comparison exercise concerns home ownership. This is a rather objective measure, and as a result, we expect it to be relatively more affected by coverage/representativeness biases than by interview mode. In the population of adults aged 55 and older, 80% are home owners and 20% are renters. [12]

[11] We are not aware of significant differences in survey non-response rates between phone and in-person interview mode in the HRS.
[12] In the HRS and UAS, respondents can report whether they are owners, renters or other. The latter option is not available in the CPS. The proportion of respondents falling in the other category is minimal in both the HRS and the UAS and should not appreciably affect the comparison.

Table 2: Home Ownership

| | CPS | HRS Full | HRS Phone | HRS Person | UAS unwgh | UAS wgh0 | UAS wgh1 | UAS wgh2 | UAS wgh3 | UAS wgh4 | UAS wgh5 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Own | 0.803 | 0.790 | 0.803 | 0.779 | 0.805 | 0.760 | 0.760 | 0.770 | 0.758 | 0.784 | 0.775 |
| Does not own | 0.197 | 0.210 | 0.197 | 0.221 | 0.195 | 0.240 | 0.240 | 0.230 | 0.242 | 0.216 | 0.225 |
| Mean abs. diff | - | 0.013 | 0.000 | 0.025 | 0.002 | 0.043 | 0.043 | 0.033 | 0.045 | 0.019 | 0.028 |

Notes: wgh0, default UAS weights using gender, race, age, education, income, household size, census region, and urbanicity; wgh1, as wgh0 with finer age brackets; wgh2, as wgh0 without education; wgh3, as wgh0 without income; wgh4, as wgh0 without education and income; wgh5, as wgh0 without census region and urbanicity.

Within the HRS, phone interviewees are more likely to own a home than in-person interviewees by a statistically significant margin (p-value: 0.006). This difference is no longer statistically significant (p-value: 0.106) when we limit the sample to respondents younger than 80, among whom mode assignment is random. Thus, there seems to be no evidence of a mode effect for home ownership, as observed differences between in-person and phone interviewees likely stem from the different age composition of these two groups. In the UAS, the unweighted home ownership rate is 80% and ranges from 76% to 78% when weights are applied. When post-stratification is not based on education (wgh2 and wgh4), the weighted home ownership rate is closer to its population benchmark as well as to the population-level figures inferred from the HRS. Across the various weighting schemes, the mean absolute difference between the UAS and the CPS ranges from 1.9 to 4.5 percentage points, while the unweighted mean is right on the mark (mean absolute difference equal to 0.2 percentage points). When comparing the HRS (pooling the phone and in-person samples) and the CPS, the mean absolute difference is 1.3 percentage points.

In Table 3, we focus on another arguably objective measure, namely health insurance coverage. As can be seen, there exist some differences within the HRS. The fraction of insured individuals is 94.2% among those interviewed by phone and 95.7% among those interviewed in person.
While this difference is statistically significant for the entire sample (p-value: 0.002), it is not among respondents younger than 80 (p-value: 0.172). Again, this pattern suggests differences in age composition between the two samples, as all respondents 80 and over are eligible for Medicare. For the UAS, the unweighted fraction of insured individuals is 95%, as in the reference population, and does not change appreciably when weights are applied. It increases slightly to 95.8% when the weighting scheme does not use education and income (wgh4), but the difference with the CPS remains small, and under no weighting scheme is the difference between the UAS and CPS distributions statistically significant.

Table 3: Health Insurance Coverage

| | CPS | HRS Full | HRS Phone | HRS Person | UAS unwgh | UAS wgh0 | UAS wgh1 | UAS wgh2 | UAS wgh3 | UAS wgh4 | UAS wgh5 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| No | 0.052 | 0.050 | 0.058 | 0.043 | 0.049 | 0.050 | 0.050 | 0.045 | 0.052 | 0.042 | 0.046 |
| Yes | 0.948 | 0.950 | 0.942 | 0.957 | 0.951 | 0.950 | 0.950 | 0.955 | 0.949 | 0.958 | 0.954 |
| Mean abs. diff | - | 0.002 | 0.006 | 0.009 | 0.003 | 0.002 | 0.002 | 0.007 | 0.001 | 0.010 | 0.006 |

Notes: see Table 2.

Table 4: Whether Retired

| | CPS | HRS Full | HRS Phone | HRS Person | UAS unwgh | UAS wgh0 | UAS wgh1 | UAS wgh2 | UAS wgh3 | UAS wgh4 | UAS wgh5 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| No | 0.529 | 0.535 | 0.584 | 0.492 | 0.597 | 0.561 | 0.551 | 0.557 | 0.557 | 0.554 | 0.557 |
| Yes | 0.471 | 0.465 | 0.416 | 0.508 | 0.403 | 0.439 | 0.449 | 0.443 | 0.443 | 0.446 | 0.443 |
| Mean abs. diff | - | 0.007 | 0.055 | 0.037 | 0.068 | 0.032 | 0.022 | 0.028 | 0.028 | 0.025 | 0.028 |

Notes: see Table 2.

Questions about retirement status are bound to be subject to personal interpretation, and answers to them may also be affected by social desirability. In Table 4, we observe apparently sizeable, but statistically insignificant differences in the proportion of retirees across surveys. The unweighted UAS proportion of retirees is significantly different from the CPS and HRS proportions (p-value: 0.000 in both cases), but tests for differences in the proportion retired between all pairs of the CPS, full HRS, and UAS weightings have p-values between 0.1 and 0.4. It should be noted that, for this specific outcome, differences may also stem from the type of questions administered to respondents to elicit labor force status and the type of recoding used by each study. Specifically, we rely on the major labor force status recode of the CPS, which is based on answers to a series of labor force items in the main questionnaire. For the HRS and the UAS, we adopt the RAND-HRS indicator of retirement, which is based on a question where respondents can select more than one employment status at once (e.g., working part-time and retired).

Not surprisingly, the fraction of retired individuals is higher among HRS respondents interviewed in person than among those interviewed by phone. This plausibly reflects the fact that the former group is, on average, four years older (average age is 67 for the phone interview group and 71 for the face-to-face interview group). Overall, even after weighting, the HRS seems to somewhat underrepresent retirees, with a proportion of 46.5% compared to 47.1% in the reference population. In contrast, the unweighted proportion of retired individuals in the UAS is substantially (and statistically significantly) lower than in the CPS, at 40.3%. Such a difference is likely driven by representativeness/selection bias. Individuals who answer online surveys tend to be younger, better educated, and more attached to the labor force. When default weights are applied, the fraction of retirees in the UAS increases by more than 3 percentage points (to 43.9%).
Interestingly, the UAS weighting with the closest-to-target proportion of retired individuals (44.9%) and the lowest mean absolute difference with the CPS (0.022) is achieved when finer age brackets are used (wgh1), which better correct for the underrepresentation of seniors. Yet, differences across weighting schemes are rather modest.

Table 5: Individual Earnings

| | CPS | HRS Full | HRS Phone | HRS Person | UAS unwgh | UAS wgh0 | UAS wgh1 | UAS wgh2 | UAS wgh3 | UAS wgh4 | UAS wgh5 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| [0-$25k) | 0.726 | 0.745 | 0.715 | 0.771 | 0.695 | 0.747 | 0.756 | 0.736 | 0.754 | 0.718 | 0.746 |
| [$25k-$50k) | 0.118 | 0.098 | 0.105 | 0.092 | 0.125 | 0.109 | 0.108 | 0.108 | 0.110 | 0.113 | 0.111 |
| [$50k-$75k) | 0.070 | 0.067 | 0.077 | 0.058 | 0.070 | 0.051 | 0.049 | 0.053 | 0.051 | 0.058 | 0.052 |
| [$75k-$100k) | 0.036 | 0.035 | 0.037 | 0.033 | 0.045 | 0.043 | 0.040 | 0.046 | 0.041 | 0.049 | 0.040 |
| $100k+ | 0.050 | 0.055 | 0.065 | 0.046 | 0.064 | 0.049 | 0.047 | 0.057 | 0.045 | 0.062 | 0.052 |
| Mean abs. diff | - | 0.010 | 0.010 | 0.018 | 0.012 | 0.011 | 0.014 | 0.011 | 0.013 | 0.010 | 0.011 |

Notes: see Table 2.

In Table 5, we compare the distribution of individual earnings across surveys. The proportion of individuals with earnings below $25,000 per year is larger in the HRS than in the CPS by a statistically significant margin (p-value: 0.001) and apparently more sizeable among those who are administered a face-to-face interview. Conversely, the fraction of high earners (above $75,000) in the HRS is 0.4 percentage points higher than in the CPS, but this difference is not significant (p-value: 0.261). The UAS slightly underrepresents low earners, and the observed difference with the CPS is statistically significant (p-value: 0.006). When weights are applied, the fraction of individuals with earnings below $25,000 is closer to its population benchmark, with only borderline-significant differences with the CPS for some of the weights (p-values range from 0.051 to 0.605).
The fraction of workers with earnings above $75,000 is 2.4 percentage points larger in the UAS relative to the CPS, and the difference is statistically significant (p-value: 0.002). With weights, this gap varies between virtually 0 and 2.5 percentage points and only the difference using wgh4 is statistically significant at the 5% level (p-value: 0.013). When comparing various sets of weights in the UAS, we observe only minor differences among them in terms of mean absolute difference from the reference population. Only slightly larger deviations are shown when the set of raking factors features finer age brackets (wgh1) or does not include household income (wgh3). In general, the earnings distribution in both the UAS and the HRS matches the one in the CPS very closely. The mean absolute difference is about 1 percentage point in the HRS and the UAS, both before and after weighting.

Next, we examine two subjective outcomes, that is, self-reported health and life satisfaction. For both of them, we expect mode effects to be more apparent (we analyze mode effects for these two measures in more detail in the next section). Table 6 reports the distribution of self-reported health. All three surveys ask their respondents to rate their health on a five-point scale: excellent (1), very good (2), good (3), fair (4), and poor (5). As mentioned above, though, while in the UAS and in the HRS all respondents answer about their own health, in the CPS the household respondent reports about his/her own health as well as that of other household members. In Table 6, we rely on all household members' health status reports in the CPS. The distribution of health status in the CPS remains virtually unchanged when we only use health status referring to the household respondent (see Table 11 in Appendix).
The first thing to notice is the near absence of any difference between the measures elicited by the HRS via telephone and in-person interview. The only sizeable and marginally significant deviation is observed for the fraction of individuals reporting fair health (p-value: 0.041). No deviation is remotely significant when limiting the sample to respondents younger than 80, though. Mode effects, then, do not seem to be present for this outcome.

Table 6: Self-Reported Health
                 CPS    HRS                  UAS
                        Full   Phone  Person unwgh  wgh0   wgh1   wgh2   wgh3   wgh4   wgh5
Excellent        0.151  0.094  0.097  0.091  0.108  0.102  0.098  0.111  0.099  0.116  0.102
Very good        0.274  0.321  0.328  0.315  0.354  0.327  0.334  0.353  0.328  0.363  0.333
Good             0.329  0.328  0.328  0.328  0.327  0.339  0.335  0.337  0.342  0.335  0.345
Fair             0.171  0.191  0.183  0.198  0.171  0.187  0.188  0.157  0.186  0.148  0.177
Poor             0.076  0.067  0.066  0.068  0.041  0.045  0.045  0.043  0.045  0.039  0.043
Mean abs. diff   -      0.027  0.026  0.027  0.032  0.032  0.034  0.035  0.033  0.038  0.033
See Table 2.

It is common practice in the health economics literature to classify individuals into two groups, one in poor and fair health and another in good, very good and excellent health. Compared to the CPS, the HRS somewhat overrepresents individuals in fair and poor health. This fraction is 1.1 percentage points higher in the HRS and the difference is statistically significant (p-value: 0.029). The unweighted UAS distribution shows an underrepresentation of individuals in excellent health, an overrepresentation of those in very good health, and an underrepresentation of those in poor health relative to the CPS. These deviations from population benchmarks are not corrected by sample weights, regardless of the post-stratification scheme adopted.
When using the aforementioned binary health indicator, the unweighted proportion of individuals in fair and poor health in the UAS falls short of the CPS benchmark by 3.4 percentage points. This difference is statistically significant (p-value: 0.001). The gap is reduced to 1.4 percentage points (not statistically significant, with p-value: 0.356) when default UAS weights are applied. Overall, the HRS and UAS perform similarly in terms of their ability to match the population distribution of self-reported health after weighting. Specifically, mean absolute difference relative to the CPS is 0.027 for the HRS and 0.032 for the UAS with default weights (wgh0).

The HRS and UAS respondents are asked about their life satisfaction. Answers are on a five-point scale, from completely satisfied (1) to not at all satisfied (5). There is no analogous question in the CPS instrument, so we do not have a population benchmark for this outcome. The fraction of HRS respondents reporting complete satisfaction is 2.8 percentage points higher for the in-person than the phone interview and the difference is significant (p-value: 0.001). In contrast, those interviewed by phone tend to express more moderate judgments. The fraction of those stating that they are somewhat satisfied with their life is 2.1 percentage points higher when the questionnaire is administered over the phone and, again, the difference is statistically significant (p-value: 0.027).
Observed differences in the fractions of individuals who are not very or not at all satisfied with their life are very small in magnitude and not statistically significant. When restricting attention to respondents younger than 80, the only significant difference is for the somewhat satisfied group (p-value: 0.04). We will delve more into potential mode effects for self-reported life satisfaction in the next section and shed some light on the extent to which the different age composition of these two groups of HRS respondents may contribute to these results.

Table 7: How Satisfied with Life
             HRS                  UAS
             Full   Phone  Person unwgh  wgh0   wgh1   wgh2   wgh3   wgh4   wgh5
Completely   0.220  0.205  0.232  0.136  0.151  0.151  0.146  0.148  0.145  0.155
Very         0.464  0.466  0.462  0.491  0.473  0.475  0.489  0.469  0.493  0.483
Somewhat     0.265  0.277  0.256  0.304  0.301  0.300  0.300  0.307  0.296  0.292
Not very     0.040  0.041  0.039  0.059  0.061  0.060  0.058  0.063  0.059  0.057
Not at all   0.012  0.012  0.011  0.010  0.014  0.014  0.008  0.013  0.007  0.012
See Table 2.

In the UAS, unweighted and weighted life-satisfaction distributions are remarkably similar, regardless of which set of post-stratification weights is considered. UAS respondents are significantly less likely to report complete life satisfaction (p-values are 0.000 for all weighting schemes) and more inclined to state that they are somewhat (p-values range from 0.001 to 0.084 across different weights) or not very (p-values from 0.001 to 0.022) satisfied with their life. Based on this, it is not surprising to see that the UAS distribution is relatively closer to the one obtained from the HRS phone interview. Even so, differences between these two distributions are very pronounced. We construct a binary variable taking the value 1 for completely or very satisfied and 0 otherwise.
With default weights (wgh0), we estimate that the fraction of individuals with a positive outlook on their life (i.e., this indicator equal to 1) is 6 percentage points lower in the UAS than in the HRS and find that this difference is statistically significant (p-value: 0.002).

5.1 Mode Effects

Compared to face-to-face and online surveys, phone interviews lack visual aids. This is an important characteristic to account for when studying response quality across different interview modes. When respondents are asked to pick one category from a list (for example, in rating their health and life satisfaction on five-point scales like those adopted by the HRS instrument), there are two well-known response effects: a primacy effect (a tendency to pick the first response category) and a recency effect (a tendency to pick the last response category). Importantly, primacy and recency effects show age gradients. As discussed by Knäuper (1999), older respondents are more likely to choose the last category (recency), while younger respondents are more likely to choose the first category (primacy). A possible explanation for this phenomenon comes from the decline of memory when people age.

Figure 1: Predicted Mean Health Status by Age and Survey Mode. [Panels: UAS, HRS-Phone, HRS-in-Person, CPS; horizontal axis: age bracket, 55-59 through 75-79; vertical axis: predicted mean health status.]

In view of this and the order of response categories in the health status and life-satisfaction questions (whose distributions are reported in Tables 6 and 7, respectively), we may expect a sharper decline (or a less steep increase) in health and life satisfaction with age in auditory mode (i.e., by phone) than over the Internet and in person (since in the latter case the HRS uses show cards).
The CPS carries out interviews both by phone and in person. The preferred mode of interview is face-to-face for the first and last months of a household's time in the rotating panel, while the interview mode defaults to telephone during the intervening three months. The CPS data do not include a variable indicating whether an interview was conducted in person or over the phone. Also, there is no indication that show cards are used during face-to-face interviews, which makes these more akin to an HRS phone interview than to an HRS in-person interview.

A difference between the UAS and the other surveys is the absence of an interviewer. Interactions between interviewers and respondents may generate social desirability effects. As a result, the mere presence of an interviewer would most likely lead to higher levels of self-reported health and life satisfaction in face-to-face and phone interviews, compared to Internet surveys (Chang and Krosnick, 2009).

Figures 1-4 show the results of (unweighted) regressions of self-reported health and life satisfaction on a number of age dummies representing age brackets (55-59, 60-64, 65-69, 70-74, 75-79) and, as controls, gender, race, education, and household income. Since most respondents 80 and older in the HRS are interviewed face to face (while assignment to telephone or face to face is random for younger ages), we restrict the samples to respondents 79 years or younger to avoid the comparison of survey outcomes between the phone and in-person interview modes in the HRS being confounded by the different age composition of the two groups. The graphs report the average predicted levels of self-reported health and life satisfaction by age, with other demographic variables set to their weighted sample means.[13] The grey, capped spikes represent 95% pointwise confidence intervals.

Figure 2: (a) Predicted Probability of Choosing the First Option (Excellent Health) by Age and Survey Mode and (b) Predicted Probability of Choosing the Last Option (Poor Health) by Age and Survey Mode. [Panels in (a) and (b): UAS, HRS-Phone, HRS-in-Person, CPS; horizontal axis: age bracket, 55-59 through 75-79; vertical axis: predicted probability.]

In Figure 1, self-reported health is reverse coded so that higher numbers indicate better health. The bar charts presented in the figure appear to be in line with a hypothesized age gradient of the recency effect. The UAS and the HRS in-person mode show no statistically significant differences in predicted self-reported health between younger (55-59) and older (75-79) respondents (p-values: 0.766 for the UAS and 0.883 for the HRS in-person mode). The HRS phone mode shows a decrease at the oldest age bracket. The estimated gap with the youngest age bracket is sizeable and statistically significant (p-value: 0.014). Remarkably, the CPS shows the steepest health decline with age. To shed further light on this, we investigate patterns in the probability of choosing the first (excellent health) and last (poor health) option in Figure 2.
The probability of choosing the first option (excellent health) in the UAS and the HRS phone group exhibits no apparent age gradient, although in the HRS in-person group, the likelihood of reporting excellent health is statistically significantly higher among younger than older respondents (p-value: 0.015). It decreases monotonically and to a greater extent with age in the CPS. There is no clear age pattern in the probability of choosing the last option (poor health). Most importantly, the fraction of respondents aged 75-79 reporting poor health is rather comparable across surveys. Overall, we do not have clear evidence of mode effects in self-reported health. Yet, since the CPS interviews are either over the phone or face-to-face without visual aid, the sharper decline in health observed in the CPS could be consistent with an age-related recency effect. This interpretation remains rather speculative, as no recency effect can be detected for the HRS phone interview, where no visual aid is available.

We also compare self-reported health across interview modes to assess the extent of the social desirability effect induced by the presence of an interviewer. For this purpose, we regress (reverse coded) health on survey mode indicators, age-group dummies, and basic demographics. Relative to the UAS, where no interviewer is present, average self-reported health is significantly higher in the CPS (p-value: 0.010) but significantly lower in the HRS-phone (p-value: 0.019) and HRS-in-person (p-value: 0.003), although the size of these differences is small (within 2.4% of the overall sample mean). Thus, we conclude that, if present, social desirability effects in health self-reports are rather unimportant.

Figure 3 shows predicted life satisfaction by age for the UAS, HRS-phone and HRS-in-person (no measure of life satisfaction is available in the CPS).
For this variable, as well, we adopt reverse coding so that higher numbers indicate greater life satisfaction. There appears to be little difference between HRS phone and HRS face-to-face interview modes. The age gradient in life satisfaction is steeper for the UAS than for the HRS. Figure 4 shows that the probability of reporting the lowest value of life satisfaction (last option) ticks up in the HRS phone sample in the highest age bracket, but not in the face-to-face sample. It also ticks up in the UAS, but the confidence interval becomes very wide in the 75-79 age category.

[13] The graphs based on regressions without controls look very similar and are not reported here.

Figure 3: Predicted Mean Life Satisfaction by Age and Survey Mode. [Panels: UAS, HRS-Phone, HRS-in-Person; horizontal axis: age bracket, 55-59 through 75-79; vertical axis: predicted mean life satisfaction.]

When we regress life satisfaction on interview mode indicators, conditional on age and demographics, we find evidence consistent with a social desirability effect. Relative to the UAS, average life satisfaction is significantly higher in the HRS, both over the phone and face to face (p-value: 0.000 for both modes), although the magnitude of this effect is modest, being within 2.8% of the sample mean. There is no evidence that average life satisfaction is different in the HRS-in-person and in the HRS-phone samples.

6 Conclusions

We have documented some clear demographic differences between the UAS and HRS. Compared to the US population aged 55 and older, the UAS has relatively fewer respondents at older ages, while the HRS overrepresents older age groups. The UAS underrepresents individuals with high school or less, while the HRS underrepresents the higher education strata.
In general, sample weights correct for these discrepancies and allow one to satisfactorily match population benchmarks as far as key sociodemographic variables are concerned. For instance, acknowledging the significant underrepresentation of individuals with high school or less, the default UAS weights are post-stratified on the interaction of gender and education, thereby aligning sample distributions of education by gender with their population counterparts.

Figure 4: (a) Predicted Probability of Choosing the First Option (Completely Satisfied) by Age and Survey Mode and (b) Predicted Probability of Choosing the Last Option (Not at All Satisfied) by Age and Survey Mode. [Panels in (a) and (b): UAS, HRS-Phone, HRS-in-Person; horizontal axis: age bracket, 55-59 through 75-79; vertical axis: predicted probability.]

Comparison of a variety of survey outcomes with population targets taken from the CPS shows a strikingly good fit for both the HRS and the UAS. Outcome distributions in the HRS are marginally closer to those in the CPS than outcome distributions in the UAS. These patterns arise for the most part regardless of which variables are used to construct post-stratification weights in the UAS, confirming the robustness of these results. We find little evidence of mode effects when comparing the subjective measures of self-reported health and life satisfaction across interview modes. Specifically, we do not observe very clear primacy or recency effects for either health or life satisfaction.
We do find a significant social desirability effect, driven by the presence of an interviewer, as far as life satisfaction is concerned.

While relatively simple and merely descriptive, the analyses in this study offer a comprehensive comparison of surveys administered by different interview modes both in terms of sample representativeness and data quality. They also provide rather consistent empirical evidence which leads us to answer the question asked in the title of this paper with a tentative "Yes."

Chapter 2: Testing for Weak-Instrument Bias in Just-Identified 2SLS

7 Introduction

We provide inference procedures for an easily interpreted measure of weak-instrument bias in the linear model with a single endogenous regressor and a single excluded instrumental variable (IV). The model may include additional exogenous regressors. Where $\beta$ is the endogenous regressor's true coefficient and $\hat\beta$ is its two-stage least squares (2SLS) estimator,[14] Montiel Olea and Pflueger (2013) develop a widely used test for the Nagar bias of $\hat\beta$. The Nagar bias, due to Nagar (1959), is the asymptotic expectation of a tractable approximation to $\hat\beta - \beta$, and is effectively used to approximate the asymptotic mean bias, $E[\hat\beta - \beta]$. The mean bias, however, generally fails to exist when the model is just identified, leaving the appropriate interpretation of the Nagar bias unclear. Andrews et al. (2019) suggest that one could instead use a test for asymptotic size distortion in the Wald test associated with $\hat\beta$, but this still lacks a direct interpretation in terms of estimator bias. We propose a test closely related to that of Montiel Olea and Pflueger (2013), replacing the Nagar bias with the quantile of $\beta$ in $\hat\beta$'s sampling distribution: $P[\hat\beta < \beta]$.
When $P[\hat\beta < \beta] = .5$, the estimator is median unbiased, so the absolute value of $.5 - P[\hat\beta < \beta]$, while not the median bias itself, indicates the degree of deviation from median unbiasedness and serves as an easily understandable bias concept. In this paper, we establish bounds on asymptotic values of $P[\hat\beta < \beta]$ that are a function of instrument strength and provide inference procedures on these bounds to deliver tests and confidence intervals for worst-case asymptotic values of $P[\hat\beta < \beta]$.

While our focus on just-identified 2SLS estimation with a single IV may sound narrow, this setting is common enough in applied econometric practice to warrant special treatment. In a review of empirical practice using IV models, Young (2020) samples papers published between 2006 and 2016 in the American Economic Review and American Economic Journals by searching www.aeaweb.org with the keyword "instrument." Based on this sample, he states that "conventional linear two stage least squares . . . is the overwhelmingly dominant approach in this literature," and, after narrowing the sample down to 32 feasibly replicable papers, he finds that 1087 of their 1400 2SLS estimates (77.6%) use a single instrument and endogenous variable.

[14] $\hat\beta$ could also be regarded as the limited information maximum likelihood estimator, this being identical to 2SLS in the just-identified model.

7.1 Relation to the Literature

This work adds to the literature on testing for weak instruments, and, in particular, supplements the tests developed by Stock and Yogo (2005) and Montiel Olea and Pflueger (2013). To help empirical researchers assess instrument weakness, these papers develop tests of null hypotheses that instruments are weak enough to have practical consequences in terms of estimator bias and test size distortion.
Setting the stage for these tests, Staiger and Stock (1997) introduce weak-instrument asymptotics, a key technical tool for analyzing weak IV problems. Loosely, instrument strength is increasing in sample size and the first-stage coefficients' magnitudes, and these asymptotics hold instrument strength constant as the sample size grows by shrinking the first-stage coefficients such that the two effects cancel out. This results in biased, non-Gaussian asymptotic distributions for common IV estimators such as 2SLS. Compared to traditional asymptotics with fixed first-stage coefficients, weak-instrument asymptotics often better approximate the finite-sample distributions of IV-based estimators and test statistics when sample sizes are large and first-stage coefficients are small.

Stock and Yogo (2005) use this asymptotic regime to develop tests whose null hypotheses specify either that instruments are weak enough to cause a specified level of asymptotic bias in one of several IV estimators or that they are weak enough to cause a specified asymptotic size distortion in the estimators' associated Wald tests. For models with a single endogenous variable, the test statistic is simply the conventional homoskedastic F-statistic for joint significance of the instruments in the first stage. Both Staiger and Stock (1997) and Stock and Yogo (2005), however, assume homoskedastic, uncorrelated errors, and Stock and Yogo (2005) develop tests of estimator bias only for cases with at least two more instruments than endogenous regressors. Skeels and Windmeijer (2018) extend the Stock and Yogo (2005) bias tests to the case with two instruments and one endogenous regressor, under additional assumptions guaranteeing that the bias exists and is finite.
For models with a single endogenous variable, Montiel Olea and Pflueger (2013) create analogues of the Stock and Yogo (2005) bias tests for use with heteroskedastic or dependent errors and any number of instruments.

In the just-identified case, $E[\hat\beta]$ does not exist, and a change in the bias concept is key to developing a test here. The bias concept that Stock and Yogo (2005) test for is worst-case asymptotic mean bias relative to ordinary least squares (OLS): the maximal asymptotic value of $|E[\hat\beta_{IV} - \beta]| / |E[\hat\beta_{OLS} - \beta]|$, where $\hat\beta_{IV}$ is the estimator whose bias is being tested and the maximum is taken over the correlation structure of the first and second stage errors (i.e., the degree of endogeneity). The null of the Stock and Yogo (2005) bias test can then be thought of as saying that correlation between the first and second stage errors could be high enough that $\hat\beta_{IV}$'s asymptotic mean bias is at least $a$% of $\hat\beta_{OLS}$'s, for some value of $a$ specified by the researcher.

Montiel Olea and Pflueger (2013) instead adapt the bias concept developed by Nagar (1959). To obtain the Nagar bias, they express $\hat\beta_{IV}$ in terms of a parameter capturing instrument weakness and expand the result in a Taylor series about the point where instruments are perfectly strong.[15] An approximation to $\hat\beta_{IV}$ is formed by truncating the series to a second-order expansion, and the asymptotic distribution of this approximation is then used as an approximation to the asymptotic distribution of $\hat\beta_{IV}$. In particular, moments from the approximating distribution are used to approximate moments from the truth. The truncated series' mean exists in the just-identified case, allowing its use in our setting, but, as discussed by Srinivasan (1970), it is not clear how one should interpret such an approximation to a moment that does not exist.
Montiel Olea and Pflueger (2013) develop a test statistic they refer to as the effective F-statistic and use it to test the null hypothesis that, relative to a benchmark,[16] the worst-case asymptotic Nagar bias of an estimator is greater than a specified level. In the present, just-identified setting, the effective F-statistic is identical to the conventional robust F-statistic. In a recent review paper, Andrews et al. (2019) call for empirical researchers to use this statistic more widely in screening for weak instruments. Addressing the interpretability of the Nagar bias as a criterion by which to set thresholds, however, Andrews et al. (2019) point out that with a single instrument and endogenous regressor, it is valid and perhaps preferable to use the Stock and Yogo (2005) critical values for asymptotic test size distortion, after substituting a robust F-statistic for the original test's homoskedastic one. Of course, this alternative may be easily interpretable, but not in terms of bias.

In this paper, we introduce means to directly test the bias concept discussed above: the worst-case asymptotic values of $P[\hat\beta < \beta]$, where the worst case is again defined in terms of the correlation structure of the first and second stage errors. The principal contribution is to make available a test for bias that empirical researchers can, we hope, more easily interpret and use to decide whether an instrument is strong enough. The test statistic is the robust first-stage F-statistic (again, equivalent to the effective F-statistic in this setting), and, much like the Montiel Olea and Pflueger (2013) test, the test introduced here can be made robust to departures from error homoskedasticity and uncorrelatedness as long as the asymptotic variance of the OLS estimator of the instrument's first-stage coefficient can be consistently estimated. The test can be easily inverted to produce confidence intervals, as well.
Other related work includes the proposal by Zhan (2017) to test the null hypothesis of strong instruments by comparing the bootstrap distribution of $\hat\beta$ to the asymptotic normal distribution that $\hat\beta$ takes under the null. Sanderson and Windmeijer (2016) propose a test similar to that in Stock and Yogo (2005) for models with $K > 1$ endogenous variables, where instrument weakness is modeled by having the matrix of first-stage parameters be local to a matrix with rank $K - 1$, rather than local to zero.

[15] In our setting, the parameter capturing instrument weakness is $\mu^{-1}$, where $\mu^2$ is as defined in our Section 8. The Taylor expansion is in $\mu^{-1}$ about the point where $\mu^{-1} = 0$.

[16] The benchmark is based on a different approximation to the mean bias of $\hat\beta_{IV}$, but coincides with the bias of OLS when errors are homoskedastic and uncorrelated.

The remainder of this paper is organized as follows. Section 8 lays out the model and assumptions. Section 9 develops bounds on $P[\hat\beta < \beta]$ and a method to perform inference on them. Section 10 calculates the resulting confidence intervals for a variety of empirical papers, using data collected by Andrews et al. (2019) on papers published in the American Economic Review that estimate linear IV models. Section 11 concludes and proposes avenues for future research.

8 The Model

Our model centers around the following structural equation relating an outcome $y$ to exogenous regressors $X$ and a single endogenous regressor, $Y$, whose coefficient we want to estimate:

$$y = Y\beta + X\gamma + e. \tag{1}$$

We also assume the existence of a single instrumental variable, $z$, with the reduced form equation for $Y$ being

$$Y = z\pi + X\delta + u. \tag{2}$$

Where $n$ is the sample size and $J$ is the number of exogenous regressors, each of $y$, $Y$, and $z$ is an $n \times 1$ vector, and $X$ is an $n \times J$ matrix. The coefficients $\beta$ and $\pi$ are scalars, while $\gamma$ and $\delta$ are $J \times 1$ vectors.
Defining the orthogonal projection off of $X$ as $M \equiv I - X(X'X)^{-1}X'$, where $I$ is the $n \times n$ identity matrix, let

$$\hat\beta \equiv \frac{z'My}{z'MY} \quad \text{and} \quad \hat\pi \equiv \frac{z'MY}{z'Mz} \tag{3}$$

respectively denote the 2SLS estimate of $\beta$ and the OLS estimate of $\pi$. For our asymptotic analysis, we use the asymptotics introduced by Staiger and Stock (1997), wherein instrument weakness is expressed by modeling $\pi$ as local to zero.

Assumption L: $\pi = c/\sqrt{n}$, for a fixed scalar $c$.

The following assumptions are used to establish asymptotically valid bounds on $P[\hat\beta < \beta]$ as a function of an unknown parameter governing instrument strength. These bounds are therefore not feasible, but are used to obtain asymptotically valid confidence sets for worst-case values of $P[\hat\beta < \beta]$. As in Montiel Olea and Pflueger (2013), these assumptions are deliberately kept at a high level to encompass a variety of commonly assumed primitive conditions. In particular, this is intended to accommodate heteroskedastic, clustered, and serially correlated errors.

Assumption B: The following limits hold as $n \to \infty$:
1. $P[z'Me > 0] \to .5$ and $P[z'Me < 0] \to .5$
2. $\sqrt{n}\, \dfrac{z'Mu}{z'Mz} \xrightarrow{d} N(0, w)$, for a fixed scalar $w > 0$

Assumption B.1 is essentially a statement of instrument exogeneity. It is satisfied by, but weaker than, the more common assumption that $z'Me/\sqrt{n} \xrightarrow{d} N(0, w_e)$, and suggests that our asymptotic analysis will better approximate finite sample reality when $z'Me$ is symmetrically distributed about 0 in finite samples, and that slow convergence of $z'Me/\sqrt{n}$ to normality in the tails of the distribution poses no problem. Assumption B.2 simply provides for asymptotic normality of $\hat\pi$. The bounds on $P[\hat\beta < \beta]$ are a function of the parameter

$$\mu^2 \equiv \frac{c^2}{w}.$$

$\mu^2$ can be thought of as the population value of the (robust) first-stage F-statistic, or, alternatively, as an analogue of the concentration parameter discussed by, for example, Stock et al. (2002).
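The estimators in equation (3) are straightforward to compute once $X$ has been partialled out of $z$. The following is a minimal sketch, assuming NumPy is available; the function names and the synthetic data are illustrative only and do not come from the paper. Setting the structural error to zero gives a case where $\hat\beta$ must recover $\beta$ exactly, which makes for an easy sanity check.

```python
import numpy as np


def partial_out(X, v):
    """Return M v, the residual from projecting v on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, v, rcond=None)
    return v - X @ coef


def just_identified_2sls(y, Y, z, X):
    """2SLS estimate of beta and first-stage OLS estimate of pi, as in (3)."""
    Mz = partial_out(X, z)
    # M is symmetric and idempotent, so z'My = (Mz)'y and z'Mz = (Mz)'(Mz).
    beta_hat = (Mz @ y) / (Mz @ Y)
    pi_hat = (Mz @ Y) / (Mz @ Mz)
    return beta_hat, pi_hat


# Synthetic data (hypothetical values): e = 0, so beta_hat should equal beta.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
z = rng.normal(size=n)
u = rng.normal(size=n)
beta, gamma = 2.0, np.array([1.0, -0.5])
first_stage_coef, delta = 0.3, np.array([0.2, 0.4])
Y = z * first_stage_coef + X @ delta + u
y = Y * beta + X @ gamma
beta_hat, pi_hat = just_identified_2sls(y, Y, z, X)
```

Using least squares to apply $M$ avoids forming the $n \times n$ projection matrix explicitly, which matters only for large samples but costs nothing here.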
Intuitively, it captures instrument strength, being essentially the magnitude of the instrument's first-stage coefficient relative to the asymptotic variance of its OLS estimator. Performing inference on the bounds amounts to performing inference on $\mu^2$, which in turn requires a consistent estimator for $w$. The existence of such an estimator constitutes our final high-level assumption.

Assumption C: There exists an estimator $\hat w$ of $w$ such that $\hat w \xrightarrow{p} w$, as $n \to \infty$.

Assumption C is satisfied whenever one can consistently estimate standard errors for $\hat\pi$, and so accommodates common settings where $u$ is heteroskedastic, autocorrelated, or clustered.

9 Bounds and Inference

Asymptotic bounds on $P[\hat\beta < \beta]$ are established in the following theorem.

Theorem 1. Where $\Phi(\cdot)$ is the standard normal distribution function, the following statements hold for the model described by equations 1 and 2 under assumptions L and B:
1. $P[\hat\beta < \beta] \to a$ as $n \to \infty$, with $a \in [.5 - \Phi(-\mu),\ .5 + \Phi(-\mu)]$.
2. $a \in \{.5 - \Phi(-\mu),\ .5 + \Phi(-\mu)\}$ when $e = um$, for a fixed scalar $m \neq 0$.

See the appendix for a proof of theorem 1. The bounds are symmetric about .5, so the quantity $\Phi(-\mu)$ represents the maximal possible deviation from asymptotic median unbiasedness and $.5 \pm \Phi(-\mu)$ are worst-case values of $P[\hat\beta < \beta]$. With a perfectly uninformative instrument, where $\mu^2 = 0$, $\Phi(-\mu) = .5$, and the bounds are $[0, 1]$. Conversely, in the limiting case of an arbitrarily strong instrument, where $\mu^2 \to \infty$, $\Phi(-\mu) \to 0$, and the bounds converge to .5 from both sides, reflecting the estimator's asymptotic median unbiasedness. The bounds are obtained in the most endogenous case, where the first and second stage errors are perfectly correlated. Having obtained bounds as a function of $\mu^2$, we need now to perform inference on $\mu^2$ to determine how wide the bounds might be in a particular setting.
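The second statement of theorem 1 can be illustrated by simulation. The sketch below is not part of the paper: it assumes NumPy, drops the exogenous regressors, draws $z$ and $u$ as independent standard normals (so that $w = 1$ and $\mu = c$), and sets $e = um$. Under those assumptions the simulated frequency of $\hat\beta < \beta$ should sit near $.5 - \Phi(-\mu)$ for $m > 0$ and near $.5 + \Phi(-\mu)$ for $m < 0$.

```python
import math

import numpy as np


def std_normal_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def simulate_prob_below(c, m, n=200, reps=5000, seed=0):
    """Monte Carlo estimate of P[beta_hat < beta] with e = u * m and no X."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(reps, n))
    u = rng.normal(size=(reps, n))
    first_stage_coef = c / math.sqrt(n)   # Assumption L: local to zero
    Y = z * first_stage_coef + u
    e = u * m
    # Without exogenous regressors, beta_hat - beta = z'e / z'Y.
    estimation_error = (z * e).sum(axis=1) / (z * Y).sum(axis=1)
    return float(np.mean(estimation_error < 0))


c = 1.0                                   # unit variances give w = 1, mu = c
lower = 0.5 - std_normal_cdf(-c)
upper = 0.5 + std_normal_cdf(-c)
p_pos = simulate_prob_below(c, m=1.0)     # expected to be close to lower
p_neg = simulate_prob_below(c, m=-1.0)    # expected to be close to upper
print(lower, p_pos, upper, p_neg)
```

With $c = 1$ the theoretical bounds are roughly .34 and .66, so even a first stage of this strength leaves the estimator far from median unbiased in the worst case.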
Given a confidence interval $C_{\mu^2} \equiv (\underline{\mu}^2, \overline{\mu}^2)$ for $\mu^2$ with asymptotic coverage probability $\alpha$, a confidence interval for $P[\hat{\beta} < \beta]$ with asymptotic coverage probability at least $\alpha$ can be easily obtained as

$$C_{P[\hat{\beta}<\beta]} \equiv \left(.5 - \Phi(-\underline{\mu}),\ .5 + \Phi(-\underline{\mu})\right).$$

The intuition behind this is simple: $C_{P[\hat{\beta}<\beta]}$ contains everything in the bounds for the weakest set of instruments in $C_{\mu^2}$. To see this mathematically, note that the lower and upper ends of this interval are respectively increasing and decreasing in $\underline{\mu}^2$, and so the entire infeasible interval of theorem 1 is contained in $C_{P[\hat{\beta}<\beta]}$ if $\underline{\mu}^2 < \mu^2$. That is,

$$P[\hat{\beta} < \beta] \in [.5 - \Phi(-\mu),\ .5 + \Phi(-\mu)] \subset \left(.5 - \Phi(-\underline{\mu}),\ .5 + \Phi(-\underline{\mu})\right) \quad \text{if } \underline{\mu}^2 < \mu^2.$$

It follows that

$$P\left[P[\hat{\beta} < \beta] \in C_{P[\hat{\beta}<\beta]}\right] \geq P[\underline{\mu}^2 < \mu^2] \geq P[\mu^2 \in C_{\mu^2}] \to \alpha.$$

When $e = um$, so that the worst-case bias holds, equality holds in the first of the above inequalities, and if $C_{\mu^2}$ has the form $(\underline{\mu}^2, \infty)$, then equality holds in the second. Under these circumstances, then, the asymptotic coverage probability of $C_{P[\hat{\beta}<\beta]}$ is exactly $\alpha$. Accordingly, we use confidence intervals for $\mu^2$ with no finite upper bounds and interpret the entire procedure as providing inference on worst-case bias.

Our confidence interval for $\mu^2$ comes from inverting an asymptotically size-$(1-\alpha)$ test based on the robust first-stage F-statistic $F = \hat{\pi}^2 n / \hat{w}$. Note that, because we have only one instrument, this F-statistic equals the square of the conventional robust t-statistic for testing the null that $\pi = 0$.
We have by assumptions L, B.2, and C that this test statistic asymptotically has a non-central chi-square distribution with one degree of freedom and noncentrality parameter $\mu^2$:

$$F = \left(\pi + \frac{z'Mu}{z'Mz}\right)^2 \frac{n}{\hat{w}} = \left(\frac{\sqrt{n}\,\pi}{\sqrt{\hat{w}}} + \frac{\sqrt{n}}{\sqrt{\hat{w}}}\,\frac{z'Mu}{z'Mz}\right)^2 = \left(\frac{c}{\sqrt{\hat{w}}} + \frac{\sqrt{n}}{\sqrt{\hat{w}}}\,\frac{z'Mu}{z'Mz}\right)^2 \xrightarrow{d} \chi^2_1(\mu^2).$$

So that our confidence intervals have no finite upper bound, we invert the test of the null hypothesis $H_0: \mu^2 = \mu_0^2$ against the one-sided alternative $H_1: \mu^2 > \mu_0^2$. To obtain a test with asymptotic size equal to $1 - \alpha$, we reject the null when $F > G^{-1}(\alpha; \mu_0^2)$, where $G^{-1}(p; \mu_0^2)$ is the quantile function of the $\chi^2_1(\mu_0^2)$ distribution. $G^{-1}(\alpha; \mu_0^2)$ is strictly increasing in $\mu_0^2$, so the confidence interval formed by inverting the test is $(\underline{\mu}^2, \infty)$, where $\underline{\mu}^2$ is the largest value of $\mu_0^2$ rejected by this test. [17] Table 8 presents critical values for this test for $\alpha \in \{.9, .95, .99\}$ (that is, for test sizes $1 - \alpha$ of .1, .05, and .01) and for null hypotheses chosen such that $.5 \pm \Phi(-\mu_0)$, the worst-case values of $P[\hat{\beta} < \beta]$, take the listed values.

Table 8: Critical Values

          Values under null hypothesis        Critical values
alpha     Worst-case P[b^<b]    mu_0^2        F         |t-stat|
0.9       .5 +/- .4             0.064         2.88      1.697
0.9       .5 +/- .3             0.275         3.448     1.857
0.9       .5 +/- .2             0.708         4.544     2.132
0.9       .5 +/- .1             1.642         6.571     2.563
0.9       .5 +/- .05            2.706         8.564     2.926
0.9       .5 +/- .01            5.412         13.017    3.608
0.95      .5 +/- .4             0.064         4.086     2.021
0.95      .5 +/- .3             0.275         4.845     2.201
0.95      .5 +/- .2             0.708         6.203     2.491
0.95      .5 +/- .1             1.642         8.565     2.927
0.95      .5 +/- .05            2.706         10.822    3.29
0.95      .5 +/- .01            5.412         15.77     3.971
0.99      .5 +/- .4             0.064         7.045     2.654
0.99      .5 +/- .3             0.275         8.203     2.864
0.99      .5 +/- .2             0.708         10.043    3.169
0.99      .5 +/- .1             1.642         13.017    3.608
0.99      .5 +/- .05            2.706         15.77     3.971
0.99      .5 +/- .01            5.412         21.648    4.653

Critical values in terms of robust first-stage F-statistics and absolute values of robust first-stage t-statistics needed to reject the listed null hypotheses at the $\alpha$ level. In the table, P[b^<b] denotes $P[\hat{\beta} < \beta]$ and mu_0^2 denotes $\mu_0^2$.
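The test inversion described above can be sketched numerically. The code below is only an illustration under stated assumptions: it uses the fact that a noncentral chi-square variable with one degree of freedom is the square of a shifted standard normal, finds quantiles by bisection, and uses hypothetical function names (`ncx2_cdf`, `critical_f`, `mu_sq_lower_bound`) that do not come from the text. It relies on the Python standard library only.

```python
from statistics import NormalDist

_PHI = NormalDist().cdf


def ncx2_cdf(x, mu_sq):
    """CDF at x of a noncentral chi-square with 1 df and noncentrality mu_sq.

    Uses P[(Z + mu)^2 <= x] = Phi(sqrt(x) - mu) - Phi(-sqrt(x) - mu), Z ~ N(0, 1).
    """
    if x <= 0:
        return 0.0
    root, mu = x ** 0.5, mu_sq ** 0.5
    return _PHI(root - mu) - _PHI(-root - mu)


def critical_f(alpha, mu0_sq, tol=1e-10):
    """alpha-quantile of the chi-square(1, mu0_sq) distribution, by bisection."""
    low, high = 0.0, 1.0
    while ncx2_cdf(high, mu0_sq) < alpha:
        high *= 2.0
    while high - low > tol:
        mid = 0.5 * (low + high)
        if ncx2_cdf(mid, mu0_sq) < alpha:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


def mu_sq_lower_bound(f_stat, alpha, tol=1e-10):
    """Lower end of the one-sided confidence interval for the noncentrality."""
    if ncx2_cdf(f_stat, 0.0) <= alpha:
        return 0.0  # no null value is rejected, so the interval starts at 0
    low, high = 0.0, 1.0
    while ncx2_cdf(f_stat, high) > alpha:  # 'high' is still a rejected null
        high *= 2.0
    while high - low > tol:
        mid = 0.5 * (low + high)
        if ncx2_cdf(f_stat, mid) > alpha:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


def worst_case_interval(f_stat, alpha):
    """Confidence interval for worst-case P[beta_hat < beta] given F."""
    deviation = _PHI(-(mu_sq_lower_bound(f_stat, alpha) ** 0.5))
    return 0.5 - deviation, 0.5 + deviation
```

Under these assumptions, `critical_f(0.9, NormalDist().inv_cdf(0.4) ** 2)` should be close to the first critical value in table 8, and `worst_case_interval(10.0, 0.95)` should be close to the interval reported for F = 10 at coverage .95. A library routine for the noncentral chi-square quantile would serve equally well in place of the bisection.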
Because researchers may be more familiar with and have better intuition about first-stage t-statistics, critical values for rejection based on the absolute value of the first-stage t-statistic are provided, as well.

To add perspective to the commonly used rule of thumb that instruments are reasonably strong when $F \geq 10$, table 9 provides the confidence intervals for worst-case $P[\hat{\beta} < \beta]$ obtained when $F = 10$. This heuristic originated with Staiger and Stock (1997), who use evidence from Monte Carlo simulations to propose that a homoskedastic F-statistic of 10 or greater is sufficient to allay most concerns about instrument weakness when errors are homoskedastic and uncorrelated. This was widely repeated as an easily remembered rule of thumb (see, for example, Angrist and Pischke (2009) (pg. 213)), and the recent survey paper Andrews et al. (2019) updates this folk wisdom with simulations suggesting that an effective F-statistic [18] of 10 or more also appears to serve well in empirically plausible designs with heteroskedastic or dependent errors. Indeed, table 9 shows that $F = 10$ corresponds with fairly tight confidence intervals for worst-case $P[\hat{\beta} < \beta]$, if one is willing to accept an asymptotic coverage probability of .95 or lower.

[17] If no values of $\mu_0^2$ can be rejected, the confidence interval is instead $[0, \infty)$.

Table 9: Confidence Intervals at F = 10 (|t-stat| = 3.162)

alpha     Worst-case P[b^<b]    mu^2
0.9       (.47, .53)            (3.537, inf)
0.95      (.435, .565)          (2.303, inf)
0.99      (.298, .702)          (0.697, inf)

Confidence intervals for worst-case $P[\hat{\beta} < \beta]$ and $\mu^2$ with asymptotic coverage probability $\alpha$ when the first-stage robust F-statistic equals 10.

10 Bounds in Empirical Work

In this section, we look at the confidence intervals computed for a set of papers published in the American Economic Review (AER). In order to assess current empirical practice with IV models, Andrews et al.
(2019) catalogue a sample of 17 papers from the AER, and their supplementary materials contain statistics transcribed from every IV model estimate reported in these papers. These include the first-stage coefficients' point estimates and standard errors, which we use here to compute first-stage t-statistics and thereby the confidence intervals for worst-case $P[\hat{\beta} < \beta]$. This analysis uses the 90 specifications from 11 articles that involve a single instrument and endogenous variable and that report a point estimate and standard error for the instrument's first-stage coefficient.

The confidence intervals for $\alpha \in \{.99, .95, .9\}$ for each specification are plotted in figure 5, with the specifications ordered along the horizontal axis by the width of their confidence intervals. Each specification has the three confidence intervals (one at each value of $\alpha$) overlaid upon each other and distinguished by color, with the shorter (lower-$\alpha$) intervals at the fore. The figure contains two panels, one including all the specifications, and another including only the 20 main specifications, as designated by Andrews et al. (2019). The main specification designation is intended to remove robustness tests and the like, allowing us to focus on a set of specifications more representative of core research questions in economics. For full details on their selection criteria, see their supplementary materials.

Importantly, this exercise assumes that the first-stage standard errors are estimated appropriately by the studies' respective authors (i.e. that they correctly account for heteroskedasticity, clustering, serial correlation, and so on).

[18] As noted in the introduction, this is the test statistic used by Montiel Olea and Pflueger (2013) and is equivalent to the robust first-stage F-statistic in our setting.

[Figure 5 (plot): panels "All Specifications" and "Main Specifications"; horizontal axis: specification number; vertical axis: $P[\hat{\beta} < \beta]$; intervals shown for $\alpha = .9$, $.95$, and $.99$.]

Figure 5: Confidence intervals for worst-case $P[\hat{\beta} < \beta]$ with asymptotic coverage probability $\alpha$ in specifications collected by Andrews et al. (2019).

The results are mostly reassuring. A majority of specifications, as well as main specifications, have confidence intervals for all three values of $\alpha$ whose lower and upper bounds are graphically indistinguishable from .5. Still, some studies do appear to be at risk of weak-instrument bias. 6.66% of all specifications and 10% of main specifications have confidence intervals wider than $(.4, .6)$ at $\alpha = .95$. One specification has a confidence interval of $[0, 1]$ at $\alpha = .95$, and three more have this confidence interval at $\alpha = .99$, one of which is a main specification. While most specifications do not appear to have a weak instrument problem, grounds to worry about this issue appear to be common enough to recommend that researchers use this or a similar test to assess the risk.

11 Conclusion

This paper develops an alternative to the Montiel Olea and Pflueger (2013) test for weak-instrument bias in the 2SLS estimator with one instrument and one endogenous regressor. The key contribution is in the concept of bias used; the Montiel Olea and Pflueger (2013) test provides inference on the worst-case relative Nagar bias of the 2SLS estimator, whereas the present work instead concerns inference on the asymptotic worst-case value of $P[\hat{\beta} < \beta]$. This is the quantile that $\beta$ occupies in the estimator's asymptotic sampling distribution and may be simpler to interpret than the relative Nagar bias.
This is especially so in our setting, where the mean bias, $E[\hat{\beta} - \beta]$, which the Nagar bias attempts to approximate, does not exist. This test, much like that of Montiel Olea and Pflueger (2013), can be made robust to heteroskedasticity and dependence in the errors as long as the first-stage OLS estimator converges to normality and has a consistently estimable asymptotic variance.

Perhaps the most obvious avenue for future work is the extension of the test to apply to models with multiple instruments. The strategy used in this paper to derive bounds on $P[\hat{\beta} < \beta]$ relies heavily on a function, $s(\cdot)$, with particular properties (see the proof of theorem 1 in the appendix). We have not found it straightforward to find an analogous function with similar properties for use in the model with multiple instruments or to find a more direct approach to deriving bounds, and so leave the area for future research. The present case of a single instrument does, however, appear to be the most pressing, as it appears to be the most commonly used IV model in empirical work and is the case where the Nagar bias's proper interpretation is least clear.

Extending the method to accommodate multiple endogenous variables may also be of use, but would require a more fundamental shift to performing inference on a multivariate analogue of $P[\hat{\beta} < \beta]$ or another bias concept entirely. It is not immediately clear what form such an analogue should take to both be interpretable and make the derivation of an inference procedure tractable.

Extensions to perform inference on $P[\hat{\beta}_{IV} < \beta]$ for estimators other than 2SLS may also be useful, but the predominance of 2SLS (especially in the just-identified case, where it coincides with the limited information maximum likelihood estimator) makes it seem likely that such extensions would rarely be used in empirical practice.
A clear improvement over the present work, as well as its progenitors in Montiel Olea and Pflueger (2013) and Stock and Yogo (2005), would be a procedure that tests directly for bias, as opposed to worst-case bias. Dealing with worst-case bias makes the problem significantly more tractable, but the added complication may well be worth the effort, as the worst-case bias is likely a significant overstatement of the true bias in many or most empirical settings.

Chapter 3: Average Partial Effects in Short-T Panels with Correlated Random Coefficients

12 Introduction

This paper proposes identifying exclusion restrictions on the linear panel data model with correlated random coefficients (CRCs) and an arbitrarily short, fixed time dimension. A large literature has studied the CRC model in panels where the number of time periods, $T$, asymptotically diverges or is fixed at a value larger than the number of regressors, $K$. See Pesaran (2015) for a textbook treatment. Comparatively little progress has been made, however, for panels with $T < K$. This is not for lack of potential applications: regressors commonly outnumber time periods in analyses of microeconomic panel datasets such as household surveys, and there is little reason to expect that coefficients are any more homogeneous in these settings than in the longer panels that the literature has focused on. We contribute to the $T < K$ case by identifying average partial effects (APEs) through restrictions that exclude coefficient-specific subsets of the regressors from each coefficient's conditional expectation function (conditional on the regressors).
The CRC model centers on the following outcome equation for the $i$th individual, from whom we observe data in $T$ time periods:

$$y_i = \sum_{k=1}^{K} X_i^{\{k\}} \beta_i^{\{k\}} + e_i. \tag{4}$$

Here, $y_i$, $X_i^{\{1\}}, \dots, X_i^{\{K\}}$, and $e_i$ are respectively $T$-vectors of individual $i$'s outcomes, regressors, and idiosyncratic errors, stacked over time. The coefficients $\beta_i^{\{k\}}$ are individual-specific, and, without further assumptions, may depend arbitrarily on the regressors. We observe data from $N$ individuals sampled independently from a common population.

$E[\beta_i^{\{k\}}]$ is the average partial effect on $y_i$ of a hypothetical intervention that alters the value of $X_i^{\{k\}}$ without altering the coefficients, idiosyncratic errors, or other regressors. Forming the $T \times K$ individual-specific regressor matrix $X_i \equiv [X_i^{\{1\}} \ \dots \ X_i^{\{K\}}]$, $E[\beta_i^{\{k\}} \mid X_i = \chi]$ is the conditional average partial effect (CAPE) for individuals whose (pre-intervention) regressor matrix equals $\chi$. These (C)APEs are our parameters of interest.

Estimating the APEs when $T \geq K$ is conceptually straightforward: under regularity conditions, a consistent estimator can be obtained by estimating the coefficients separately for each individual using ordinary least squares and then averaging the resulting estimates over individuals. Pesaran and Smith (1995) term this the mean group estimator. We show, however, that the (C)APEs are generally not identified when $T < K$.

We obtain identification when $T < K$ by introducing exclusion restrictions that nonparametrically exclude regressors from coefficients' conditional expectation functions. Depending on these dependence restrictions, the regressors' support, and the CAPEs' continuity in the regressors, a given regressor's CAPEs may be identifiable everywhere, nowhere, or only in certain subpopulations.
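For readers who want to see the mean group estimator mentioned above in concrete form, here is a minimal sketch for the simplest case the text allows: an intercept and one slope regressor (so K = 2) with T at least 2. The function names and the toy panel are hypothetical, and the code assumes each individual's regressor varies over time so that the individual-level OLS fit exists.

```python
def ols_intercept_slope(x, y):
    """OLS of y on a constant and x for one individual's time series."""
    periods = len(x)
    x_bar = sum(x) / periods
    y_bar = sum(y) / periods
    s_xx = sum((xv - x_bar) ** 2 for xv in x)
    if s_xx == 0:
        raise ValueError("regressor does not vary over time for this individual")
    s_xy = sum((xv - x_bar) * (yv - y_bar) for xv, yv in zip(x, y))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


def mean_group_estimate(panel):
    """Average the individual-level OLS estimates over individuals.

    panel is a list of (x, y) pairs, one pair of equal-length lists per individual.
    Returns (average intercept, average slope).
    """
    estimates = [ols_intercept_slope(x, y) for x, y in panel]
    count = len(estimates)
    mean_intercept = sum(a for a, _ in estimates) / count
    mean_slope = sum(b for _, b in estimates) / count
    return mean_intercept, mean_slope


# Toy, noise-free panel: individual 1 has y = 1 + 2x, individual 2 has y = 2 - x.
toy_panel = [
    ([0.0, 1.0, 2.0], [1.0, 3.0, 5.0]),
    ([0.0, 1.0, 3.0], [2.0, 1.0, -1.0]),
]
print(mean_group_estimate(toy_panel))
```

The individual-level regression is exactly what becomes impossible when regressors outnumber time periods, which is the case the rest of the chapter addresses.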
APEs for (sub)populations with identified CAPEs can be identified simply as the (sub)populations' expected CAPEs, so we focus our analysis on identifying CAPEs.

We make one dependence restriction for each coefficient, specifying the set of regressors that the coefficient is mean dependent on. For example, one might assume that coefficient 1 mean-depends [19] only on regressors 1 and 2; coefficient 2 is mean independent of the regressors; and coefficient 3 mean-depends only on regressors 1, 4, and 9. These dependence restrictions are

$$E[\beta_i^{\{1\}} \mid X_i] = E[\beta_i^{\{1\}} \mid X_i^{\{1\}}, X_i^{\{2\}}]$$
$$E[\beta_i^{\{2\}} \mid X_i] = E[\beta_i^{\{2\}}]$$
$$E[\beta_i^{\{3\}} \mid X_i] = E[\beta_i^{\{3\}} \mid X_i^{\{1\}}, X_i^{\{4\}}, X_i^{\{9\}}].$$

Similar restrictions routinely, but implicitly, come up in applied work through the use of interaction terms to model effect heterogeneity. Through that lens, our results can be read as statements about the identification of nonparametric interaction terms. Researchers attempting to parametrically model heterogeneous CAPEs through a set of interactions should be aware that identification in such models may derive only from ad hoc functional form assumptions if the (C)APEs of interest are not nonparametrically identified by the dependence restrictions implied by their specifications.

Whether a given set of dependence restrictions identifies a given subpopulation's CAPEs depends heavily on the regressors' support, and we cannot possibly provide results for every empirically relevant case. Instead, we present an iterative proof strategy we refer to as the identification algorithm, which researchers can use to determine the extent of identification under a particular set of assumptions.
We present a number of examples applying the algorithm under various assumptions and show identification of all CAPEs under substantially weaker dependence restrictions than those treated in Chamberlain (1992) and Graham and Powell (2012), the two papers most closely related to ours.

Chamberlain (1992) develops an estimator of the APEs under dependence restrictions that, under some ordering of the coefficients and for some $r \leq T - 1$, leave $E[\beta_i^{\{k\}} \mid X_i]$ unrestricted for $k \leq r$ and impose the restriction $E[\beta_i^{\{k\}} \mid X_i] = E[\beta_i^{\{k\}}]$ for $k > r$. Graham and Powell (2012) extend this framework to instead allow $r = T$ coefficients to mean-depend on the regressors, but find that this leads to irregular identification of the APEs, meaning that $\sqrt{N}$-consistent estimation is impossible. Our setting nests theirs, and so inherits this difficulty.

[19] While perhaps nonstandard terminology, it is convenient to refer to $\beta^{\{k\}}$ mean-depending on the regressors that we do not assume to be excluded from $\beta^{\{k\}}$'s conditional expectation function.

Compared to these papers, which allow each coefficient to mean-depend either on all or on none of the regressors, our fundamental contribution is in allowing each coefficient to mean-depend on an arbitrary subset of the regressors. This larger class of dependence restrictions allows a given set of CAPEs to be identified under a much larger degree of total dependence, measured as the sum over coefficients of the number of regressors that each coefficient mean-depends on. While the largest degree of total dependence allowed by this prior work is linear in $K$, we find settings that allow identification of all CAPEs while total dependence grows quadratically with $K$, instead.

A natural setting in which this extra flexibility is useful arises in the presence of essential heterogeneity. The idea behind essential heterogeneity, so termed in Heckman et al.
(2006) (page 391), is that if the panel's individuals are economic agents that have an interest in the outcome, can influence their regressor values, and have at least partial knowledge of the coefficients, then individuals with more to gain from a larger value of regressor $k$ (i.e. individuals with larger values of coefficient $k$) will influence that regressor to take larger values. Each coefficient then has a clear effect on its own regressor, inducing mean dependence. This mechanism does not, however, immediately imply dependence between a coefficient and regressors other than its own.

Concrete examples of essential heterogeneity abound. People who expect a higher return to education will obtain more of it. Governments that expect larger returns to a policy will be more likely to enact it and to do so earlier in time. Ravallion (2009) (page 38) argues that essential heterogeneity deserves closer attention in development research, and Goodman-Bacon (2019) (page 15), without using the term explicitly, points out several cases where it would be of concern in differences-in-differences estimation.

We present a formal model of essential heterogeneity for our panel setting and show that it leads to the dependence restrictions $E[\beta_i^{\{k\}} \mid X_i] = E[\beta_i^{\{k\}} \mid X_i^{\{k\}}]$ for all $k$. Relaxing these restrictions to allow the intercept to mean-depend on all regressors (i.e. to include fixed effects), while imposing these exclusion restrictions on all slope coefficients, we show that each regressor's CAPEs are identified in the subpopulation where that regressor varies over time, and we do so requiring only that $T \geq 3$, regardless of the number of regressors. Allowing for such a dependence pattern using the methods of Chamberlain (1992) or Graham and Powell (2012) would require one to allow every coefficient to mean-depend on every regressor and therefore require $T > K$ or $T \geq K$, respectively.
We develop methods for estimation and inference based on the regression of $y_i$ on a set of technical regressors obtained by interacting each regressor in equation 4 with individual-specific dummy variables or a fixed, finite basis expansion of the regressors that its coefficient mean-depends on. This regression estimates what we call the projection model, which, when correctly specified, provides estimates of regularly identified (C)APEs. The projection model may not always be correctly specified, however; researchers may impose functional form restrictions for the sake of parsimony, and there are cases where the projection model's use of finite basis expansions prevents it from nesting the full set of possible mappings from the regressors to the CAPEs. In this event, we treat the projection model as a projection of the truth onto a flexible, approximating submodel, and develop estimation and inference results for its pseudo-parameters. Fully nonparametric estimation of our CRC model in its fullest generality is nontrivial and beyond the scope of this paper.

12.1 Relation to the Literature

This paper adds to a large literature proposing a variety of other approaches to estimating CRC models with panel data. As discussed above, when $T$ is sufficiently large, the APEs can be identified and estimated with the coefficients' dependence on the regressors left entirely unrestricted. The mean group estimator of Pesaran and Smith (1995) is prototypical of this approach. Breitung and Salish (2021) propose an estimator that augments the model with a set of regressors that control for the bias induced in least squares estimation by correlated coefficient heterogeneity. This generalizes Mundlak (1978b)'s alternative formulation of the common fixed effects estimator (Mundlak (1961)), wherein the model is augmented by the regressors' individual-specific means.
Other approaches to the problem impose additional structure on the model to enable the use of shorter $T$, improve efficiency, better estimate individuals' CAPEs, or include time-varying parameters [20]. Those most closely related to this paper are in Chamberlain (1992) and Graham and Powell (2012), as previously discussed. An early example is Mundlak (1978a), which extends the just-discussed augmented regression of Mundlak (1978b) by interacting the regressors' individual-specific means with the non-intercept regressors, as well. This model, however, is identified by its functional form and does not estimate APEs with the generality of Breitung and Salish (2021).

[20] For a way to model time-varying effects in our framework, see Section 14.5.1's discussion of Graham and Powell (2012).

A burgeoning, recent literature imposes additional structure by assuming that each individual belongs to one of a number of groups and that coefficients differ only between individuals in different groups. Sun (2005) and Hahn and Moon (2010) posit that such grouping structures could arise in economic settings with multiple equilibria. Sun (2005) proposes to capture the group structure via a finite mixture model that parametrically models individuals' probabilities of belonging to each group and provides a maximum likelihood estimator in settings with fixed $T > K$. Lin and Ng (2012) propose an estimator that attempts to minimize the sum of squared residuals over the groups' coefficients and individuals' group memberships using a coordinate descent algorithm inspired by a commonly used K-means clustering algorithm. Sarafidis and Weber (2015) show that the number of groups can be consistently estimated by a related method with fixed $T < K$.
Bonhomme and Manresa (2015) develop asymptotic results for this type of estimator, demonstrating consistency and valid inference for group coefficients and memberships under asymptotics where $N, T \to \infty$, but also inconsistency with fixed $T$. Cheng et al. (2019) relax the grouping assumption by allowing each individual to belong to multiple groups, each of which determines the value of a subset of their coefficients. Su et al. (2016) and Wang et al. (2018) also study the model with grouped coefficients, using penalized extremum estimators to estimate the coefficients and group memberships. Chernozhukov et al. (2019) gain parsimony instead by imposing a factor structure on the coefficients, which are allowed to vary over both time and individual. That paper proposes to estimate the factors via a least squares estimator penalized so as to promote sparsity in the number of factors. All of these papers use asymptotics where both $N, T \to \infty$.

A variety of other extensions of the CRC panel data model have also been considered. Murtazashvili and Wooldridge (2008) and Laage (2019) develop results for the model with endogeneity in the idiosyncratic error term. Arellano and Bonhomme (2012) work in a setting much like that of Chamberlain (1992), where up to $T - 1$ of the coefficients can depend arbitrarily on the regressors, while the other coefficients are homogeneous. They develop restrictions on the idiosyncratic error term that allow identification not only of the heterogeneous coefficients' means, but their higher moments and characteristic functions, as well.

The remainder of this paper is laid out as follows: Section 13 lays out the model in greater detail and introduces notation to simplify subsequent discussion. Section 14 develops the identification algorithm and uses it to show identification in a number of examples.
This section also develops a motivating model of essential heterogeneity and discusses irregular identification. Section 15 develops methods for estimation and inference. Section 16 presents an empirical application based on Frazis and Loewenstein (2005), estimating the effects of job training on wages and testing for the presence of essential heterogeneity. Section 17 concludes and discusses avenues for further research.

13 The Model

We consider the balanced panel data model with correlated random coefficients where we observe data on $N$ cross-sectional units in each of $T$ time periods generated according to the following data generating process (DGP):

$$y_{it} = \sum_{k=1}^{K} X_{it}^{\{k\}} \beta_i^{\{k\}} + e_{it} \quad \text{for } i \in \{1, \dots, N\},\ t \in \{1, \dots, T\}. \tag{5}$$

Here, there are $K$ regressors and their coefficients are allowed to vary over individuals, but not over time. $y_{it}$, $e_{it}$, and $X_{it}^{\{k\}}$ and $\beta_i^{\{k\}}$ for all $k \in \{1, \dots, K\}$ are scalars.

A more usable form of this model comes from stacking the outcome, regressors, and errors over time for each individual so that $y_i \equiv [y_{i1}, \dots, y_{iT}]'$, $e_i \equiv [e_{i1}, \dots, e_{iT}]'$, and $X_i^{\{k\}} \equiv [X_{i1}^{\{k\}}, \dots, X_{iT}^{\{k\}}]'$ for all $k \in \{1, \dots, K\}$ are all $T$-vectors. Also form the matrix of individual $i$'s regressors $X_i \equiv [X_i^{\{1\}}, \dots, X_i^{\{K\}}] \in \mathbb{R}^{T \times K}$ and the vector of individual $i$'s coefficients $\beta_i = [\beta_i^{\{1\}}, \dots, \beta_i^{\{K\}}]' \in \mathbb{R}^{K \times 1}$. With these quantities, we have

$$y_i = \sum_{k=1}^{K} X_i^{\{k\}} \beta_i^{\{k\}} + e_i \tag{6}$$
$$\phantom{y_i} = X_i \beta_i + e_i. \tag{7}$$

$y_i$ and $X_i$ are observable, while $\beta_i$ and $e_i$ are not. We also assume strict exogeneity of the regressors and existence of the conditional and unconditional expectations of $\beta_i$:

$$E[e_i \mid X_i] = 0_T. \tag{8}$$
$$E[|\beta_i^{\{k\}}| \mid X_i = \chi] < \infty \quad \text{for all } k \in \{1, \dots, K\} \text{ and } \chi \in \mathrm{supp}(X_i). \tag{9}$$
$$E[|\beta_i^{\{k\}}|] < \infty \quad \text{for all } k \in \{1, \dots, K\}. \tag{10}$$

As a function of the regressors, the right-hand sides of equations 5, 6, and 7 are assumed to yield the potential outcome of individual $i$ under any hypothetical value of $X_i \in \mathrm{supp}(X_i)$.
In this sense, these equations are structural, and $\beta_i^{\{k\}}$ is a causal parameter representing the partial effect on $y_i$ of a hypothetical, exogenous intervention that alters the value of $X_i^{\{k\}}$: for any $t \in \{1, \dots, T\}$,

$$\frac{\partial y_{it}}{\partial X_{it}^{\{k\}}} = \beta_i^{\{k\}}.$$

This hypothetical intervention is exogenous in the sense that it does not affect the values of $e_i$ or $\beta_i$. Accordingly, $E[\beta_i^{\{k\}}]$ is the APE of the same type of intervention and $E[\beta_i^{\{k\}} \mid X_i = \chi]$ is the CAPE for individuals whose (pre-intervention) regressor matrix equals $\chi$. The CRC model is interpreted similarly by Wooldridge (2005), Graham and Powell (2012), and Laage (2019).

We seek to identify, estimate, and perform inference on (functions of) the CAPEs. In particular, we are interested in the APEs, obtained by taking the CAPEs' expected values: $E[\beta_i^{\{k\}}] = E[E[\beta_i^{\{k\}} \mid X_i]]$. In the case that some subpopulations' CAPEs cannot be identified, we are still interested in the APEs for those subpopulations with identified CAPEs.

In studying identification, we treat the distribution of the observables as known. Of course, this is unhelpful in studying estimation and inference, so when we cover these topics in Section 15, we will further assume that individuals are independently sampled from a common population:

$$(\beta_i, e_i, X_i) \sim \text{i.i.d.} \tag{11}$$

13.1 Notation

The more standard notation we use includes the use of $1_L$ to denote the $L$-vector of ones, $0_L$ for the $L$-vector of zeros, $1_{L_1 \times L_2}$ and $0_{L_1 \times L_2}$ for the $L_1 \times L_2$ matrices of ones and zeros, and $I_L$ for the $L \times L$ identity matrix. With an arbitrary $L \times J$ matrix $M$, $M^+$ denotes the Moore-Penrose pseudoinverse of $M$, and $Q(M) \equiv I_L - MM^+$ is the orthogonal projection matrix off of the columns of $M$. For an arbitrary matrix $A$ and vector $a$ we denote the Frobenius matrix norm by $\|A\|_F$ and the Euclidean vector norm by $\|a\|$.
The theory presented here revolves around statements concerning sets of regressors and coefficients, and the following notation for submatrices and subvectors, while nonstandard, makes the ensuing discussion much more straightforward. All index sets in this paper are ordered and assumed to be in ascending order. Where $S \equiv \{s_1, \dots, s_P\} \subseteq \{1, \dots, K\}$ is an arbitrary index set of a subset of the regressors, $X_i^S \equiv [X_i^{\{s_1\}} \ \dots \ X_i^{\{s_P\}}]$ refers to the submatrix of individual $i$'s regressor matrix containing only the columns (regressors) whose indices are in $S$. Likewise, $\beta_i^S \equiv [\beta_i^{\{s_1\}} \ \dots \ \beta_i^{\{s_P\}}]'$ is the corresponding subvector of individual $i$'s coefficients. Note that the individual regressor vectors and coefficients $X_i^{\{k\}}$ and $\beta_i^{\{k\}}$ defined in Section 13 follow consistently from this notation when the singleton set $\{k\}$ is used.

Matrices $\chi$ and $\chi_p$ for some subscript $p$ are generally used in this paper to denote realizations of the random variable $X_i$, and the notation $\chi^S$ or $\chi_p^S$ similarly refers to the submatrices of $\chi$ or $\chi_p$ retaining only the columns with indices in $S$. When $S$ is the empty set, we define $X_i^{\{\}} = \chi^{\{\}} = \chi_p^{\{\}} \equiv 0_T$ and $\beta_i^{\{\}} \equiv 0_1$.

We define the complement of an index set $S \subseteq \{1, \dots, K\}$ to be $S^c \equiv \{1, \dots, K\} \setminus S$. Unions, intersections, complements, and set differences of ordered sets are taken to be ordered, as well. We denote the cardinality of $S$ by $|S|$.

We also introduce compressed notation for the conditional and unconditional expectations of the coefficients. For an arbitrary random variable $A$ and its realization $a$, we use

$$\beta(A) \equiv E[\beta_i \mid A]; \quad \beta(A = a) \equiv E[\beta_i \mid A = a]; \quad \text{and} \quad \beta \equiv E[\beta_i].$$

We also extend the above notation for subvectors of the coefficients to these objects:

$$\beta^S(A) \equiv E[\beta_i^S \mid A]; \quad \beta^S(A = a) \equiv E[\beta_i^S \mid A = a]; \quad \text{and} \quad \beta^S \equiv E[\beta_i^S].$$

14 Identification

Identification is based on the conditional moment restriction

$$E[y_i \mid X_i = \chi] = \chi\,\beta(X_i = \chi) \quad \text{for all } \chi \in \mathrm{supp}(X_i), \tag{12}$$

which follows from equations 7, 8, and 9.
Throughout this section, $E[y_i \mid X_i = \chi]$ is assumed to be known for all $\chi \in \mathrm{supp}(X_i)$.

If all the coefficients mean-depend only on regressors with finitely many points of support, $E[y_i \mid X_i]$ can be nonparametrically expressed as a linear function of a finite number of technical regressors. These technical regressors can be formed by interacting each regressor, $X_i^{\{k\}}$, with dummy variables that indicate each point in the support of the regressors which $\beta_i^{\{k\}}$ mean-depends on. Identification would then follow or fail from the same rules that govern the homogeneous linear model.

This approach becomes impossible under more general support and dependence restrictions, however. Instead, we present three lemmas and some corollaries that are useful for identifying CAPEs (and therefore APEs) via our identification algorithm. The algorithm is an iterative procedure that repeatedly applies the lemmas to the model, identifying more CAPEs in each iteration or terminating if no more CAPEs can be identified.

After developing the algorithm, we apply it to a number of example cases, both to provide immediately useful identification results and to demonstrate the method's use. We introduce the algorithm precisely because we cannot possibly analyze identification in every empirically relevant setting, and this paper is primarily intended to present the algorithm so that researchers can study identification in their own, particular settings.

Importantly, the lemmas and algorithm provide sufficient, but not necessary, conditions for identification. As a result, we cannot guarantee that the algorithm will find every CAPE that is actually identified by a given set of assumptions.
Nonetheless, our examples make clear that the algorithm can produce identification results substantially more general than those available from past work, and in all of our examples, all regressors' CAPEs are everywhere either identified by the algorithm or easily shown to be nonidentified.

14.1 Nonidentification in the General Case Where T < K

To frame the problem, we start by showing that neither $\beta(X_i = \chi)$ nor $\beta$ is identified under the assumptions stated thus far. Intuitively, nonidentification comes from regressor collinearity, just as it does in the model with homogeneous coefficients. The problem is more severe in the CRC model, however, because we cannot pool data between subpopulations with different regressor values; they may have different CAPEs. Here, then, regressor collinearity within the subpopulation where $X_i = \chi$ means that at least some of that subpopulation's CAPEs, $\beta(X_i = \chi)$, are nonidentified. When $T < K$, $X_i$ has more columns than rows, and this problem affects all subpopulations.

Formally, we show the observational equivalence of the true coefficients with alternative coefficients that have different conditional and unconditional means. Consider the alternative coefficients $\tilde{\beta}_i \equiv \beta_i + b_i$, where $b_i \equiv Q(X_i') a_i$ for an arbitrary random vector $a_i$ with support in $\mathbb{R}^K$. $Q(X_i')$ represents the projection onto the kernel of $X_i$.[21] The support of $b_i$ is in the kernel of $X_i$, so $X_i b_i = 0_T$, $X_i \beta_i = X_i \tilde{\beta}_i$, and
$$y_i = X_i \beta_i + e_i = X_i \tilde{\beta}_i + e_i,$$
implying observational equivalence. The CAPEs and APEs implied by the alternative coefficients are
$$\tilde{\beta}(X_i = \chi) \equiv E[\tilde{\beta}_i \mid X_i = \chi] = \beta(X_i = \chi) + Q(\chi') E[a_i \mid X_i = \chi], \quad \text{and} \quad \tilde{\beta} \equiv E[\tilde{\beta}_i] = \beta + E[b_i].$$
$E[a_i \mid X_i = \chi]$ can take any value in $\mathbb{R}^K$, so, in general, $\tilde{\beta}(X_i = \chi) \neq \beta(X_i = \chi)$ when the kernel of $\chi$ has positive dimension.
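As a minimal numerical sketch of this observational-equivalence argument (an illustration, not part of the original derivation), the following Python snippet takes one regressor realization with $T = 2 < K = 3$, forms the projection onto its kernel, and shifts the coefficients by $b = Q(\chi')a$. The numbers, the helper name `kernel_projection`, and the use of NumPy's pseudoinverse to build the projection are all assumptions made for the example.

```python
import numpy as np

def kernel_projection(chi):
    """Projection onto the kernel of chi (hypothetical helper).

    In the text's notation this plays the role of Q(chi'):
    I_K minus the projection onto the row space of chi.
    """
    K = chi.shape[1]
    return np.eye(K) - np.linalg.pinv(chi) @ chi

# Illustrative values only: T = 2 periods, K = 3 regressors, so T < K.
chi = np.array([[1.0, 2.0, 0.5],
                [1.0, -1.0, 3.0]])
beta = np.array([0.3, -1.2, 2.0])   # stand-in for the "true" coefficients
a = np.array([5.0, -4.0, 1.0])      # arbitrary K-vector

Q = kernel_projection(chi)
b = Q @ a                  # b lies in the kernel of chi
beta_alt = beta + b        # alternative coefficients

fitted_true = chi @ beta       # chi * beta
fitted_alt = chi @ beta_alt    # chi * (beta + b)
```

Under these assumed values, `fitted_true` and `fitted_alt` agree up to floating-point error even though `beta` and `beta_alt` differ, which is the sense in which the two coefficient vectors cannot be told apart from the conditional mean of the outcome alone.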
Likewise, when the kernel of $X_i$ has positive dimension with nonzero probability, at least one element of $E[b_i]$ can take any value in $\mathbb{R}$, leading to nonidentification of that APE. When $T < K$, the kernel of $X_i$ has positive dimension everywhere, presenting a fundamental difficulty in identifying CRC models with short panels.

[21] Recall that the kernel of $X_i$ is the space of $K$-vectors $w$ such that $X_i w = 0_T$.

14.2 Dependence Restrictions

To identify (C)APEs when $T < K$, we now introduce our dependence restrictions. These take the form
$$\beta^{\{k\}}(X_i = \chi) = \beta^{\{k\}}(X_i^{C_k} = \chi^{C_k}) \quad \text{for all } \chi \in \operatorname{supp}(X_i), \tag{13}$$
where $C_k \subseteq \{1, \ldots, K\}$ is the index set of all the regressors that coefficient $k$ mean-depends on. These sets are not unique: if $\beta^{\{k\}}(X_i = \chi) = \beta^{\{k\}}(X_i^{C_k} = \chi^{C_k})$, then $\beta^{\{k\}}(X_i = \chi) = \beta^{\{k\}}(X_i^{C_k \cup S} = \chi^{C_k \cup S})$ for any other index set $S \subseteq \{1, \ldots, K\}$. As will become clear, it is generally best to pick $C_k$ to be as small a set as possible, though exceptions arise when regressors have a single point in their support or are functionally dependent. Whether or not $C_k$ includes the indices of regressors with a single point of support, such as intercept terms or deterministic trends, has no effect on identification. We adopt the convention of letting all coefficients mean-depend on such regressors. Section 14.3.4 deals with the case of functionally dependent regressors.

Generally, we define $C_k$ so that it includes all the regressors that cannot be altered without possibly altering $\beta^{\{k\}}(X_i = \chi)$. With this interpretation in mind, the inclusion of regressors with a single point of support follows from the fact that they cannot be altered in the first place.

Importantly, mean dependence of a coefficient on a regressor means that the coefficient mean-depends on the entire time path of that regressor.
With $C_k = \{c_k^1, \ldots, c_k^p\} \subseteq \{1, \ldots, K\}$, this means $E[\beta_i \mid X_i^{C_k}]$ is a function of every element of the matrix $X_i^{C_k}$:
$$\beta(X_i^{C_k}) = \beta(X_{i1}^{\{c_k^1\}}, \ldots, X_{iT}^{\{c_k^1\}}, \ldots, X_{i1}^{\{c_k^p\}}, \ldots, X_{iT}^{\{c_k^p\}}).$$

As a sort of dual to $C_k$, we have $D_k$, the index set of all the coefficients that mean-depend on the $k$th regressor. Formally,
$$D_k \equiv \{j \mid k \in C_j \text{ and } j \in \{1, \ldots, K\}\} \quad \text{for all } k \in \{1, \ldots, K\}. \tag{14}$$

The dependence restrictions are usually easier to represent visually using what we term a dependence matrix. This is a $K \times K$ binary matrix whose rows and columns correspond to coefficients and regressors, respectively. The $(i, j)$th element of the dependence matrix equals 1 when the $i$th coefficient mean-depends on the $j$th regressor, and otherwise equals 0.

To demonstrate all of this notation with an example, suppose that $K = 3$ and that coefficient 1 mean-depends on regressors 1 and 2, coefficient 2 is mean-independent of the regressors, and coefficient 3 mean-depends on every regressor. In this case, $C_1 = \{1, 2\}$, $C_2 = \{\}$, and $C_3 = \{1, 2, 3\}$. The dependence matrix is as follows, with the rows and columns suggestively labeled:
$$\begin{array}{c|ccc}
 & X_i^{\{1\}} & X_i^{\{2\}} & X_i^{\{3\}} \\ \hline
\beta_i^{\{1\}} & 1 & 1 & 0 \\
\beta_i^{\{2\}} & 0 & 0 & 0 \\
\beta_i^{\{3\}} & 1 & 1 & 1
\end{array}$$
Note that the sets $C_k$ and $D_k$ can be easily read off the dependence matrix: $C_k$ contains the indices of columns with 1s in the $k$th row, while $D_k$ contains the indices of rows with 1s in the $k$th column. In this example, $D_1 = \{1, 3\}$, $D_2 = \{1, 3\}$, and $D_3 = \{3\}$.

14.2.1 Motivation: Essential Heterogeneity

In this section, we motivate the dependence restrictions by showing that they arise in a simple formalization of the essential heterogeneity concept discussed in the introduction.
We model the regressors $X_{it} \equiv [X_{it}^{\{1\}}, \ldots, X_{it}^{\{K\}}]$ as being determined in each period by individual $i$ to solve a dynamic decision problem wherein the individual's instantaneous utility in period $t$ is linear in $y_{it}$ and (potentially random) regressor-specific cost functions. Because the utility function is additively separable in the different regressors, we find that the equilibrium path of each regressor depends only on its own cost functions and coefficient, $\beta_i^{\{k\}}$. When every regressor's coefficient and cost functions are independent of the other regressors' coefficients and cost functions, we obtain the exclusion restrictions $\beta^{\{k\}}(X_i) = \beta^{\{k\}}(X_i^{\{k\}})$ for all $k \in \{1, \ldots, K\}$.

Consider the finite-horizon dynamic decision problem, where, in period $t \in \{1, \ldots, M\}$, individual $i$ uses backward induction to maximize expected, discounted future utility over the vector of choice variables $X_{it}$. The individual's instantaneous utility function in period $t$ is
$$u_t(y_{it}, (X_{im})_{m=1}^{t}) = y_{it} - \sum_{k=1}^{K} c_{it}^{\{k\}}((X_{im}^{\{k\}})_{m=1}^{t}),$$
their discount rate is $\delta$, and the outcome variable $y_{it}$ is determined, as before, by equation 5. For all $k \in \{1, \ldots, K\}$, the choice variable $X_{it}^{\{k\}}$ is constrained to take a value in the set $\Gamma_k$. Define $c_{im}(\cdot) \equiv (c_{im}^{\{k\}}(\cdot))_{k=1}^{K}$ to be the tuple of random, time- and individual-specific cost functions in period $m$. The random, individual-specific coefficients $\beta_i$ are known to the individual, as is their joint distribution with $(c_{it}(\cdot))_{t=1}^{M}$. The outcome shocks are independent of the coefficients and cost functions:
$$e_{it} \perp \beta_i, c_{im}(\cdot) \quad \text{for all } t, m \in \{1, \ldots, M\}.$$
The coefficients, cost functions, and outcome shocks are drawn by nature before the individual acts, but $c_{it}(\cdot)$ and $e_{it}$ are unknown to the individual until the beginning of period $t$.
$c_{it}^{\{k\}}(\cdot)$ can be a function of past, as well as present, realizations of the $k$th choice variable, allowing the model to include sources of state-dependence such as switching costs or habit formation. For example, if $X_{it}^{\{k\}}$ is a binary treatment variable, then $c_{it}^{\{k\}}(\cdot)$ may be a function giving the cost of entering or leaving treatment, depending on the individual's treatment history up to period $t$.

Denote the individual's period-$t$ information set by
$$\mathcal{I}_{it} \equiv (\beta_i, (c_{im}(\cdot), e_{im})_{m=1}^{t}, (X_{im}, y_{im})_{m=1}^{t-1}).$$
Starting the backward induction, the individual obtains the final period's policy functions for each choice variable, $\hat{X}_{iM}(\mathcal{I}_{iM}) \equiv (\hat{X}_{iM}^{\{k\}}(\mathcal{I}_{iM}))_{k=1}^{K}$, solving
$$\begin{aligned}
\hat{X}_{iM}(\mathcal{I}_{iM}) &= \arg\max_{X_{iM}} \; u_M(y_{iM}, (X_{im})_{m=1}^{M}) \quad \text{subject to } X_{iM}^{\{k\}} \in \Gamma_k \text{ for all } k \in \{1, \ldots, K\} \\
&= \arg\max_{X_{iM}} \; \sum_{k=1}^{K} \left[ X_{iM}^{\{k\}} \beta_i^{\{k\}} - c_{iM}^{\{k\}}((X_{im}^{\{k\}})_{m=1}^{M}) \right] + e_{iM} \quad \text{subject to } X_{iM}^{\{k\}} \in \Gamma_k \text{ for all } k \in \{1, \ldots, K\}.
\end{aligned}$$
Assuming a unique solution exists, the additive separability between terms in the objective function that involve different choice variables and the independence of the choice variables' constraints allow the problem to be decomposed into the $K$ subproblems
$$\hat{X}_{iM}^{\{k\}}(\mathcal{I}_{iM}) = \arg\max_{X_{iM}^{\{k\}}} \; X_{iM}^{\{k\}} \beta_i^{\{k\}} - c_{iM}^{\{k\}}((X_{im}^{\{k\}})_{m=1}^{M}) \quad \text{subject to } X_{iM}^{\{k\}} \in \Gamma_k,$$
for all $k \in \{1, \ldots, K\}$, making it clear that $\hat{X}_{iM}^{\{k\}}(\mathcal{I}_{iM})$ is a function only of $\beta_i^{\{k\}}$, $c_{iM}^{\{k\}}(\cdot)$, and $(X_{im}^{\{k\}})_{m=1}^{M-1}$.
Next, obtaining the policy functions for period $M - 1$, the individual solves
$$\begin{aligned}
\hat{X}_{i,M-1}(\mathcal{I}_{i,M-1}) = \arg\max_{X_{i,M-1}} \; & \sum_{k=1}^{K} \left[ X_{i,M-1}^{\{k\}} \beta_i^{\{k\}} - c_{i,M-1}^{\{k\}}((X_{im}^{\{k\}})_{m=1}^{M-1}) \right] + e_{i,M-1} \\
& + \delta E\left[ \sum_{k=1}^{K} \left[ \hat{X}_{iM}^{\{k\}}(\mathcal{I}_{iM}) \beta_i^{\{k\}} - c_{iM}^{\{k\}}(\hat{X}_{iM}^{\{k\}}(\mathcal{I}_{iM}), (X_{im}^{\{k\}})_{m=1}^{M-1}) \right] + e_{iM} \,\middle|\, \mathcal{I}_{i,M-1} \right] \\
& \text{subject to } X_{i,M-1}^{\{k\}} \in \Gamma_k \text{ for all } k \in \{1, \ldots, K\}.
\end{aligned}$$
This can again be decomposed into the $K$ subproblems,
$$\begin{aligned}
\hat{X}_{i,M-1}^{\{k\}}(\mathcal{I}_{i,M-1}) = \arg\max_{X_{i,M-1}^{\{k\}}} \; & X_{i,M-1}^{\{k\}} \beta_i^{\{k\}} - c_{i,M-1}^{\{k\}}((X_{im}^{\{k\}})_{m=1}^{M-1}) \\
& + \delta E\left[ \hat{X}_{iM}^{\{k\}}(\mathcal{I}_{iM}) \beta_i^{\{k\}} - c_{iM}^{\{k\}}(\hat{X}_{iM}^{\{k\}}(\mathcal{I}_{iM}), (X_{im}^{\{k\}})_{m=1}^{M-1}) \,\middle|\, \mathcal{I}_{i,M-1} \right] \\
& \text{subject to } X_{i,M-1}^{\{k\}} \in \Gamma_k,
\end{aligned}$$
for all $k \in \{1, \ldots, K\}$. Much as before, $\hat{X}_{i,M-1}^{\{k\}}(\mathcal{I}_{i,M-1})$ is a function only of $\beta_i^{\{k\}}$, $c_{i,M-1}^{\{k\}}(\cdot)$, $(X_{im}^{\{k\}})_{m=1}^{M-2}$, and now also the distribution of $c_{iM}^{\{k\}}(\cdot)$, conditional on $\mathcal{I}_{i,M-1}$. Continuing this recursion back through period 1, we find that $\hat{X}_{it}^{\{k\}}(\mathcal{I}_{it})$ is a function only of $\beta_i^{\{k\}}$, $c_{it}^{\{k\}}(\cdot)$, $(X_{im}^{\{k\}})_{m=1}^{t-1}$, and the distribution of $(c_{im}^{\{k\}}(\cdot))_{m=t+1}^{M}$, conditional on $\mathcal{I}_{it}$.

Recursing forwards through the periods from 1 to $M$ to substitute each period's policy function forward into the next's, we find that the individual's equilibrium choices $(\hat{X}_{it}^{\{k\}})_{t=1}^{M}$ depend directly on the realizations of $\beta_i^{\{k\}}$ and $(c_{it}^{\{k\}}(\cdot))_{t=1}^{M}$, and depend on the other choice variables' realized coefficients and cost functions only through their presence in the information sets $(\mathcal{I}_{it})_{t=1}^{M}$.

Now, where $\hat{y}_{it} = \sum_{k=1}^{K} \hat{X}_{it}^{\{k\}} \beta_i^{\{k\}} + e_{it}$, suppose a researcher observes the realized $(\hat{y}_{it}, \hat{X}_{it})_{t=1}^{T}$ for $T \leq M$,[22] and wants to estimate $\beta$ or $\beta(\hat{X}_i)$. We obtain nontrivial dependence restrictions if the different choice variables' coefficients and cost functions are independent of each other. Specifically, if
$$(c_{it}^{\{k\}}(\cdot), \beta_i^{\{k\}})_{t=1}^{M} \perp (c_{it}^{\{j\}}(\cdot), \beta_i^{\{j\}})_{t=1}^{M} \quad \text{for } k \neq j,$$
then we also have
$$(\hat{X}_{it}^{\{k\}}, c_{it}^{\{k\}}(\cdot), \beta_i^{\{k\}})_{t=1}^{M} \perp (\hat{X}_{it}^{\{j\}}, c_{it}^{\{j\}}(\cdot), \beta_i^{\{j\}})_{t=1}^{M} \quad \text{for } k \neq j,$$
leading to the dependence restrictions
$$\beta^{\{k\}}(\hat{X}_i) = \beta^{\{k\}}(\hat{X}_i^{\{k\}}) \quad \text{for all } k \in \{1, \ldots, K\}.$$

[22] The following also holds if any selection of $T$ periods, rather than just the first $T$, is observed.

An immediate generalization divides the tuples $(\beta_i^{\{k\}}, (c_{it}^{\{k\}}(\cdot))_{t=1}^{M})$ into mutually exclusive groups and assumes that tuples in different groups are independent of each other. This leads to dependence restrictions where each coefficient mean-depends only on regressors with coefficients in the same group. This generalization is also valid if the instantaneous utility function is modified so that choice variables in the same group enter nonlinear terms together or if the constraints are nonseparable in choice variables from the same groups.

Clearly, the additive separability of the utility function and the independence of the different choice variables' coefficients and cost functions are key to these results. Even when this model holds only approximately, however, the special relationship between a coefficient and its own regressor that we see here suggests that this source of dependence may be of particular practical importance. These dependence restrictions, and the various relaxations of them allowed by our identification results, may therefore often make useful working models, especially when using short-$T$ panels that do not identify the APEs without further restrictions.

14.3 The Lemmas

We now present the three lemmas used by the identification algorithm.

14.3.1 Lemma 2 - Ceteris Paribus Variation

Lemma 2 is the centerpiece of the algorithm.
To motivate the lemma, we start by considering identification in the model with homogeneous coefficients
$$y_i = \sum_{k=1}^{K} X_i^{\{k\}} \beta^{\{k\}} + e_i,$$
with the moment condition
$$E[y_i \mid X_i = \chi] = \chi \beta \quad \text{for all } \chi \in \operatorname{supp}(X_i). \tag{15}$$
An intuitive way to identify the coefficients in this model follows from the ceteris paribus interpretation commonly given to regression coefficients: $\beta^{\{k\}}$ represents the degree to which the function $E[y_i \mid X_i = \chi]$ changes in response to a change in $\chi^{\{k\}}$, when the other regressors are held constant. We see this mathematically by differencing equation 15 at two points that differ only in the $k$th regressor. These points are $\chi_0, \chi_1 \in \operatorname{supp}(X_i)$, chosen such that[23] $\chi_0^{\{k\}^c} = \chi_1^{\{k\}^c}$ and $\chi_1^{\{k\}} - \chi_0^{\{k\}} \neq 0_T$. The differenced moment condition is
$$E[y_i \mid X_i = \chi_1] - E[y_i \mid X_i = \chi_0] = \sum_{j=1}^{K} (\chi_1^{\{j\}} - \chi_0^{\{j\}}) \beta^{\{j\}} = (\chi_1^{\{k\}} - \chi_0^{\{k\}}) \beta^{\{k\}},$$
identifying
$$\beta^{\{k\}} = \frac{(\chi_1^{\{k\}} - \chi_0^{\{k\}})' (E[y_i \mid X_i = \chi_1] - E[y_i \mid X_i = \chi_0])}{(\chi_1^{\{k\}} - \chi_0^{\{k\}})' (\chi_1^{\{k\}} - \chi_0^{\{k\}})}.$$

[23] Recall that $\{k\}^c = \{1, \ldots, K\} \setminus \{k\}$, the index set of regressors other than the $k$th.

In the heterogeneous case, however, this approach may fail. Before examining the reason, it is important to clarify that this section, in order to build on the intuition from the homogeneous case, uses causal-sounding language about changing regressor values to obtain the points $\chi_0$ and $\chi_1$. However, when we talk about changing a regressor and differencing a moment condition, we are really differencing the moment conditions for two distinct subpopulations whose regressors equal $\chi_0$ and $\chi_1$, and whose CAPEs equal $\beta(X_i = \chi_0)$ and $\beta(X_i = \chi_1)$.

The problem in the heterogeneous case is that when we change $\chi^{\{k\}}$ to obtain $\chi_1$ from $\chi_0$, we also change any CAPEs whose coefficients mean-depend on it.
If too many CAPEs are affected, it can become impossible to disentangle the change's direct effect on $E[y_i \mid X_i = \chi]$ - the effect it would have if none of the CAPEs changed with it - from the indirect effects the change has through its impact on the CAPEs. To see this, it is useful to expand $\chi \beta(X_i = \chi)$, the right-hand side of equation 12, into three groups of regressor-CAPE products. The first contains only regressor $k$, whose CAPE we want to identify; the second contains the other regressors whose coefficients mean-depend on regressor $k$ (their indices are in $D_k \setminus \{k\}$); and the third contains the rest of the regressors, none of whose coefficients mean-depend on regressor $k$ (indices in $D_k^c \setminus \{k\}$). The important thing to notice here is that a change in $\chi^{\{k\}}$ can change the CAPEs in the second group, but will leave everything in the third group unaffected.
$$E[y_i \mid X_i = \chi] = \chi^{\{k\}} \overbrace{\beta^{\{k\}}(X_i = \chi)}^{\text{Object of interest}} + \chi^{D_k \setminus \{k\}} \overbrace{\beta^{D_k \setminus \{k\}}(X_i = \chi)}^{\text{Function of } \chi^{\{k\}}} + \chi^{D_k^c \setminus \{k\}} \underbrace{\beta^{D_k^c \setminus \{k\}}(X_i = \chi)}_{\text{Not a function of } \chi^{\{k\}}}. \tag{16}$$
Differencing equation 16 at $\chi_0$ and $\chi_1$, terms in the third group are equal at both points and cancel out. While regressors in the second group are equal at both points by construction, their CAPEs are not. We obtain
$$E[y_i \mid X_i = \chi_1] - E[y_i \mid X_i = \chi_0] = \chi_1^{\{k\}} \beta^{\{k\}}(X_i = \chi_1) - \chi_0^{\{k\}} \beta^{\{k\}}(X_i = \chi_0) + \chi_1^{D_k \setminus \{k\}} \Delta, \tag{17}$$
where $\Delta \equiv \beta^{D_k \setminus \{k\}}(X_i = \chi_1) - \beta^{D_k \setminus \{k\}}(X_i = \chi_0)$ captures changes in the CAPEs of regressors in the second group. This can be rearranged into a form reflecting the intuitive notions of the change's direct and indirect effects:
$$E[y_i \mid X_i = \chi_1] - E[y_i \mid X_i = \chi_0] = \overbrace{(\chi_1^{\{k\}} - \chi_0^{\{k\}}) \beta^{\{k\}}(X_i = \chi_0)}^{\text{Direct effect}} + \underbrace{\chi_1^{\{k\}} \Delta_k + \chi_1^{D_k \setminus \{k\}} \Delta}_{\text{Indirect effect}}, \tag{18}$$
where $\Delta_k \equiv \beta^{\{k\}}(X_i = \chi_1) - \beta^{\{k\}}(X_i = \chi_0)$ represents the change in the $k$th regressor's CAPE. This leaves us with the problem of isolating the direct effect.
We will attempt to do this by premultiplying equation 18 by the projection matrix $Q(\chi_1^{D_k})$ to wipe out the indirect effect term. Clearly, the product $Q(\chi_1^{D_k}) \chi_1^{D_k \setminus \{k\}} \Delta = 0_T$. To see that $Q(\chi_1^{D_k}) \chi_1^{\{k\}} \Delta_k = 0_T$, as well, note that $\Delta_k = 0$ when coefficient $k$ does not mean-depend on its own regressor, and that otherwise $\chi_1^{\{k\}}$ is a column of $\chi_1^{D_k}$ and $Q(\chi_1^{D_k}) \chi_1^{\{k\}} = 0_T$. This leads us to the expression
$$Q(\chi_1^{D_k}) (E[y_i \mid X_i = \chi_1] - E[y_i \mid X_i = \chi_0]) = Q(\chi_1^{D_k}) (\chi_1^{\{k\}} - \chi_0^{\{k\}}) \beta^{\{k\}}(X_i = \chi_0)$$
and therefore to
$$\beta^{\{k\}}(X_i = \chi_0) = \frac{(\chi_1^{\{k\}} - \chi_0^{\{k\}})' Q(\chi_1^{D_k}) (E[y_i \mid X_i = \chi_1] - E[y_i \mid X_i = \chi_0])}{(\chi_1^{\{k\}} - \chi_0^{\{k\}})' Q(\chi_1^{D_k}) (\chi_1^{\{k\}} - \chi_0^{\{k\}})}, \tag{19}$$
if
$$(\chi_1^{\{k\}} - \chi_0^{\{k\}})' Q(\chi_1^{D_k}) (\chi_1^{\{k\}} - \chi_0^{\{k\}}) > 0. \tag{20}$$
Condition 20 is equivalent to the condition $\chi_1^{\{k\}} - \chi_0^{\{k\}} \notin \operatorname{range}(\chi_1^{D_k})$, which can, in turn, be formulated as follows,[24] to make clearer what choices of $\chi_1^{\{k\}}$ suffice to identify $\beta^{\{k\}}(X_i = \chi_0)$ for a given value of $\chi_0$:
$$\begin{aligned}
& \chi_1^{\{k\}} - \chi_0^{\{k\}} \notin \operatorname{range}(\chi_0^{D_k}) && \text{if } k \notin C_k, \\
& \chi_1^{\{k\}} \notin \operatorname{range}(\chi_0^{D_k}) \text{ and } \chi_0^{\{k\}} \notin \operatorname{range}(\chi_0^{D_k \setminus \{k\}}) && \text{if } k \in C_k.
\end{aligned} \tag{21}$$

[24] This equivalence follows from the next section's corollary 3.

Satisfying this condition is where the panel aspect of the data comes into play. Clearly, if $\chi_0^{D_k}$ contains $T$ linearly independent columns, then $\operatorname{range}(\chi_0^{D_k}) = \mathbb{R}^T$ and no choice of $\chi_1^{\{k\}}$ will work. This cannot be the case, however, if $T > |D_k|$; $\chi_0^{D_k}$ only has $|D_k|$ columns. More generally, it seems intuitive that it will be in some sense easier to find an identifying value of $\chi_1^{\{k\}}$ when fewer coefficients depend on regressor $k$ - thereby shrinking $\operatorname{range}(\chi_0^{D_k})$ - and when $T$ is larger - thereby enlarging the space of which $\operatorname{range}(\chi_0^{D_k})$ is a subspace.

Lemma 2 formalizes this intuition in its conditions L2.2 and L2.3. The lemma treats a more general moment condition than equation 12, allowing some of the CAPEs to already be known and for $E[y_i \mid X_i = \chi]$ to be replaced by an arbitrary, known function $a(\chi)$.
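The projection-and-difference step behind equation 19 can be sketched numerically. The snippet below is only an illustration under assumed values: $T = 3$, $K = 2$, coefficient 1 mean-independent of the regressors ($C_1 = \{\}$) and coefficient 2 mean-dependent on both ($C_2 = \{1, 2\}$), so that $D_1 = \{2\}$. The CAPE function `beta_2`, the constant `BETA_1`, the two points, and the helper `annihilator` are hypothetical choices, and the conditional mean is treated as known, as in the text.

```python
import numpy as np

def annihilator(A):
    """Q(A): projection onto the orthogonal complement of range(A)."""
    T = A.shape[0]
    return np.eye(T) - A @ np.linalg.pinv(A)

# Hypothetical design: T = 3, K = 2, C_1 = {} and C_2 = {1, 2}, so D_1 = {2}.
BETA_1 = 0.7  # CAPE of regressor 1 (constant, because C_1 is empty)

def beta_2(chi):
    # Arbitrary illustrative CAPE that mean-depends on both regressors.
    return np.sin(chi[:, 0].sum()) + chi[:, 1].mean()

def cond_mean(chi):
    # E[y_i | X_i = chi] = chi^{1} beta^{1} + chi^{2} beta^{2}(chi)
    return chi[:, 0] * BETA_1 + chi[:, 1] * beta_2(chi)

chi0 = np.array([[1.0, 1.0],
                 [2.0, 0.0],
                 [0.0, 3.0]])
chi1 = chi0.copy()
chi1[:, 0] = [2.0, 2.0, 5.0]    # change only regressor 1, holding regressor 2 fixed

d = chi1[:, 0] - chi0[:, 0]      # chi_1^{1} - chi_0^{1}
Q = annihilator(chi1[:, [1]])    # Q(chi_1^{D_1}) with D_1 = {2}
denom = d @ Q @ d                # condition 20 requires this to be positive
cape_hat = d @ Q @ (cond_mean(chi1) - cond_mean(chi0)) / denom
```

In this assumed design the change in `beta_2` between the two points is multiplied by regressor 2, which `Q` annihilates, so `cape_hat` reproduces `BETA_1` up to floating-point error. If `d` were proportional to the second column of `chi0`, `denom` would be zero and the formula would not apply.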
This is critical to the algorithm, as its iterative procedure often identifies some regressors' CAPEs before getting to others', and it can simplify identification to treat any CAPEs that are already identified as known and absorb them into $a(\chi)$.

Lemma 2 also treats one other source of identification in its condition L2.1. This simply recognizes that if one regressor is not in the span of the others, then the others can all be projected out to isolate the effect of interest without going through the differencing procedure discussed above. That is, if $\chi_0^{\{k\}} \notin \operatorname{range}(\chi_0^{\{k\}^c})$, then $\chi_0^{\{k\}\prime} Q(\chi_0^{\{k\}^c}) \chi_0^{\{k\}} > 0$, so from the premultiplied moment condition
$$Q(\chi_0^{\{k\}^c}) E[y_i \mid X_i = \chi_0] = Q(\chi_0^{\{k\}^c}) \chi_0^{\{k\}} \beta^{\{k\}}(X_i = \chi_0),$$
we have
$$\beta^{\{k\}}(X_i = \chi_0) = \frac{\chi_0^{\{k\}\prime} Q(\chi_0^{\{k\}^c}) E[y_i \mid X_i = \chi_0]}{\chi_0^{\{k\}\prime} Q(\chi_0^{\{k\}^c}) \chi_0^{\{k\}}}.$$

Lemma 2. Suppose that the moment condition $a(\chi) = \chi^S \beta^S(X_i = \chi)$ holds for all $\chi \in \mathcal{X} \subseteq \operatorname{supp}(X_i)$, with the index set $S \subseteq \{1, \ldots, K\}$ and $a(\chi) : \mathcal{X} \to \mathbb{R}^T$ a known function. With $\chi_0 \in \mathcal{X}$ and $k \in S$, $\beta^{\{k\}}(X_i = \chi_0)$ is identified if either

L2.1. $\chi_0^{\{k\}} \notin \operatorname{range}(\chi_0^{S \setminus \{k\}})$;

or there exists $\chi_1 \in \mathcal{X}$ such that both of the following hold:

L2.2. $\chi_1^{\{k\}^c} = \chi_0^{\{k\}^c}$, and

L2.3. $\chi_1^{\{k\}} - \chi_0^{\{k\}} \notin \operatorname{range}(\chi_1^{S \cap D_k})$.

The proof is in the appendix. Note that this is not the most general version of this lemma. Section 14.3.4 presents a generalization intended to accommodate functionally dependent regressors. This is, however, a relatively simple version that suffices for many cases, and its conditions L2.2 and L2.3 are directly linked to the intuition presented above.

14.3.2 Corollaries to Lemma 2

This section presents some helpful corollaries to lemma 2 that facilitate its use in the algorithm. Corollary 3 simply provides an alternative formulation of lemma 2's conditions L2.2 and L2.3 which may be simpler to interpret when searching for suitable values of $\chi_1$, given some value of $\chi_0$.

Corollary 3. Define $a(\chi)$, $\mathcal{X}$, $S$, $k$, $\chi_0$, $\chi_1$ as in lemma 2.
Suppose that $\chi_1$ satisfies condition L2.2. $\chi_1$ satisfies L2.3 if and only if one of the following two conditions holds:

LC3.1. $k \notin C_k$ and $\chi_1^{\{k\}} - \chi_0^{\{k\}} \notin \operatorname{range}(\chi_0^{S \cap D_k})$.

LC3.2. $k \in C_k$, $\chi_1^{\{k\}} \notin \operatorname{range}(\chi_0^{S \cap D_k})$, and $\chi_0^{\{k\}} \notin \operatorname{range}(\chi_0^{(S \cap D_k) \setminus \{k\}})$.

The proof is in the appendix.

The next corollary provides weak support conditions guaranteeing the existence of a value of $\chi_1$ that satisfies conditions L2.2 and L2.3. The set $\Xi$, defined in the corollary, is the support of $\chi_1^{\{k\}}$, conditional on $\chi_1$ satisfying condition L2.2 and being in the set $\mathcal{X}$, where lemma 2's moment condition holds. In terms of the intuition from Section 14.3.1, it is the set of values that we can change regressor $k$ to while holding the other regressors constant. We use this corollary repeatedly in the examples.

Corollary 4. Define $a(\chi)$, $\mathcal{X}$, $S$, $k$, $\chi_0$, $\chi_1$ as in lemma 2. Also define $\Xi \equiv \operatorname{supp}(X_i^{\{k\}} \mid X_i^{\{k\}^c} = \chi_0^{\{k\}^c} \text{ and } X_i \in \mathcal{X})$. $\beta^{\{k\}}(X_i = \chi_0)$ is identified under either of the following two conditions:

LC4.1. $k \notin C_k$, and $\{\Xi - \chi_0^{\{k\}}\}$ contains more than $\dim(\operatorname{range}(\chi_0^{S \cap D_k}))$ linearly independent vectors.

LC4.2. $k \in C_k$, $\chi_0^{\{k\}} \notin \operatorname{range}(\chi_0^{(S \cap D_k) \setminus \{k\}})$, and $\Xi$ contains more than $\dim(\operatorname{range}(\chi_0^{S \cap D_k}))$ linearly independent vectors.

The proof is in the appendix. It can simplify this corollary further to note that $\dim(\operatorname{range}(\chi_0^{S \cap D_k})) \leq |S \cap D_k|$. In applying the identification algorithm, $S$ will usually be the index set of regressors whose CAPEs have not already been identified by previous iterations. In this case, the bound on $\dim(\operatorname{range}(\chi_0^{S \cap D_k}))$ means that whenever there are fewer than $T$ not-yet-identified CAPEs that depend on regressor $k$, conditions LC4.1 and LC4.2 can be satisfied under appropriate support conditions.

14.3.3 Lemmas 5 and 6 - Continuity and Extrapolation from Dependence Restrictions

Lemmas 5 and 6 are simple.
Lemma 5 states that if CAPEs are identified on a set where they are assumed to be continuous in the regressors, then they are identified on that set's boundary, as well.

Lemma 5. Suppose that $\beta^{\{k\}}(X_i = \chi)$ is identified for all $\chi \in \mathcal{X} \subseteq \operatorname{supp}(X_i)$ and is continuous in $\chi$ for all $\chi \in \tilde{\mathcal{X}} \subseteq \operatorname{supp}(X_i)$. $\beta^{\{k\}}(X_i = \chi)$ is identified at all points in the closure of $\mathcal{X} \cap \tilde{\mathcal{X}}$.

The proof is simple. By the continuity assumption, $\lim_{p \to \infty} \beta^{\{k\}}(X_i = \chi_p) = \beta^{\{k\}}(X_i = \chi)$ for any sequence $(\chi_p)_{p=1}^{\infty}$ of elements of $\mathcal{X}$ that converges to $\chi$. With $\beta^{\{k\}}(X_i = \chi_p)$ known at all points in $\mathcal{X}$, all points in the closure of $\mathcal{X} \cap \tilde{\mathcal{X}}$ can be identified as limits of sequences of identified values. Note that it is straightforward to adapt lemma 5 to cases where $\beta^{\{k\}}(X_i = \chi)$ is continuous only in certain elements of $\chi$.

Lemma 6 notes that the dependence restrictions imply that if regressor $k$'s CAPE is identified at $\chi$, it is also identified everywhere else with the same values of the regressors that its coefficient mean-depends on. For example, if coefficient 1 mean-depends only on regressor 2 and it is identified at any point where $\chi^{\{2\}} = x \in \operatorname{supp}(X_i^{\{2\}})$, then it is identified at all points where $\chi^{\{2\}} = x$. Likewise, if a regressor's coefficient is mean-independent of the regressors, identifying its CAPE anywhere identifies its CAPE everywhere and therefore its APE, as well.

Lemma 6. Suppose that $\beta^{\{k\}}(X_i = \chi)$ is identified for all $\chi \in \mathcal{X} \subseteq \operatorname{supp}(X_i)$. For all $\chi_0 \in \operatorname{supp}(X_i)$, $\beta^{\{k\}}(X_i = \chi_0)$ is identified if there exists $\chi \in \mathcal{X}$ such that $\chi^{C_k} = \chi_0^{C_k}$.

To see this, recall that we assume $\beta^{\{k\}}(X_i = \chi) = \beta^{\{k\}}(X_i^{C_k} = \chi^{C_k})$, so $\beta^{\{k\}}(X_i^{C_k} = \chi_0^{C_k}) = \beta^{\{k\}}(X_i^{C_k} = \chi^{C_k})$ if $\chi^{C_k} = \chi_0^{C_k}$.

14.3.4 Identification With Functionally Dependent Regressors

When regressors are functionally dependent on an observable, underlying variable whose partial effects are of interest, it is important to note that the coefficient expectations $\beta(X_i = \chi)$ and $\beta$ do not necessarily represent the underlying variables' (C)APEs.
Suppose one or several regressors $X_i^{S_0}$ for some index set $S_0 \subseteq \{1, \ldots, K\}$ are differentiable, invertible functions of an underlying, observable random vector $Z_i \equiv [Z_{i1} \; \cdots \; Z_{iK_z}]' \in \mathbb{R}^{K_z}$.[25] That is, $X_i^{S_0} = f(Z_i)$. $X_i^{S_0}$ could be basis expansions, lags or leads, or other functions of $Z_i$. The (vector-valued) average partial effect on $y_i$ of an element $Z_{ij}$ conditional on $X_i = \chi$, where $\zeta \equiv f^{-1}(\chi^{S_0})$ with elements $[\zeta_1 \; \cdots \; \zeta_{K_z}]'$, is
$$\begin{aligned}
E\left[ \frac{\partial E[y_i \mid X_i = \chi, \beta_i]}{\partial Z_{ij}} \,\middle|\, X_i = \chi \right]
&= E\left[ \frac{\partial E[f(Z_i) \beta_i^{S_0} + X_i^{S_0^c} \beta_i^{S_0^c} \mid X_i = \chi, \beta_i]}{\partial Z_{ij}} \,\middle|\, X_i = \chi \right] \\
&= \frac{\partial f(\zeta)}{\partial \zeta_j} E[\beta_i^{S_0} \mid X_i = \chi] \\
&= \frac{\partial f(\zeta)}{\partial \zeta_j} \beta^{S_0}(X_i = \chi).
\end{aligned}$$

[25] $K_z$ does not necessarily equal $T$.

While not conditional coefficient expectations themselves, CAPEs of underlying variables are then observable linear functions of them, and identification of $\beta(X_i = \chi)$ still suffices for identification of these CAPEs. These issues are also discussed by Graham and Powell (2012) (Section 3.1).

As alluded to in Section 14.2, a variety of dependence restrictions can be equivalent when coefficients mean-depend on an underlying vector $Z_i$. Take the case where the first 3 regressors are elementwise polynomials of $Z_i \in \mathbb{R}^T$: $X_i^{\{1\}} = Z_i$, $X_i^{\{2\}} = Z_i \odot Z_i$, and $X_i^{\{3\}} = Z_i \odot Z_i \odot Z_i$, where $\odot$ denotes the Hadamard product. $X_i^{\{1\}}$ and $X_i^{\{3\}}$ are each bijective functions of $Z_i$ and conditioning on either is equivalent to conditioning on $Z_i$. If coefficient $k$ were to mean-depend on $Z_i$, we could equivalently say $C_k = \{1\}$, $C_k = \{3\}$, $C_k = \{1, 2\}$, $C_k = \{2, 3\}$, or $C_k = \{1, 2, 3\}$. We adopt the convention of including in $C_k$ the set of all regressors that are functionally dependent on $Z_i$. This is in keeping with our interpretation of $C_k$ as the index set of regressors that we cannot change without possibly changing $\beta^{\{k\}}(X_i = \chi)$.

Identification of $\beta(X_i = \chi)$ through lemma 2's conditions L2.2 and L2.3 is usually impossible for coefficients whose regressors are functionally dependent on other regressors.
This is because condition L2.2 cannot be satisfied; in most such cases[26] these regressors cannot be changed without changing others, as well. Instead, we require a more general version of lemma 2 which allows us to simultaneously change entire groups of regressors. In terms of the example above, this will allow us to vary all the regressors in $X_i^{S_0}$ at once by varying $Z_i$.

Lemma 7. Suppose that the moment condition $a(\chi) = \chi^S \beta^S(X_i = \chi)$ holds for all $\chi \in \mathcal{X} \subseteq \operatorname{supp}(X_i)$, with the index set $S \subseteq \{1, \ldots, K\}$ and $a(\chi) : \mathcal{X} \to \mathbb{R}^T$ a known function. With $\chi_0 \in \mathcal{X}$ and $S_0 \subseteq S$, $\beta^{S_0}(X_i = \chi_0)$ is identified if either

L7.1. $Q(\chi_0^{S \setminus S_0}) \chi_0^{S_0}$ has full column rank,

or, with $D_0 \equiv \bigcup_{j \in S_0} D_j$, there exists $\chi_1 \in \mathcal{X}$ such that both of the following hold:

L7.2. $\chi_1^{S_0^c} = \chi_0^{S_0^c}$, and

L7.3. $Q(\chi_0^{S \cap D_0}) (\chi_1^{S_0} - \chi_0^{S_0})$ has full column rank.

The proof is largely the same as the proof of lemma 2 and can be found in the appendix.

Lemma 7's three conditions are directly analogous to their counterparts in lemma 2, with the difference that instead of varying the single regressor $X_i^{\{k\}}$ to identify its CAPE $\beta^{\{k\}}(X_i = \chi_0)$, it varies the set of regressors $X_i^{S_0}$ to identify all of their conditional coefficient expectations $\beta^{S_0}(X_i = \chi_0)$. Lemmas 2 and 7 are equivalent when $S_0$ contains only one element. With more regressors changing at once, isolating the direct effect of the changes in $X_i^{S_0}$ requires us to project out from the differenced moment equation all the other regressors whose coefficients mean-depend on any of the changing regressors. Accordingly, condition L7.3 requires the change in $X_i^{S_0}$ to have full column rank even after partialing these out.

Notably, conditions L7.1 and L7.3 cannot be satisfied when more regressors change than there are time periods, that is, when $|S_0| > T$.
This means that lemma 7 fails to provide identification in many common cases, such as the cross-sectional setting where some regressors are polynomials in an underlying variable. This is an artifact of our using only two points, $\chi_0$ and $\chi_1$, for identification, even when the support of $X_i$ is much richer. We consider only two points at a time to convey the intuition in Section 14.3.1, and intend future work to provide fuller identification results for the functionally dependent case.

[26] Counterexamples exist, such as in the case with $T = 1$, $K = 2$ where $X_i^{\{2\}} = (X_i^{\{1\}})^2$ and $X_i^{\{1\}}$ has support over the real line. Conditional on $X_i^{\{2\}} = \chi_1^{\{2\}} \neq 0$, $X_i^{\{1\}}$ has support over both the positive and negative square roots of $X_i^{\{2\}}$. Such examples are the exception rather than the rule, however. Even in this example, conditional on $X_i^{\{1\}} = \chi_1^{\{1\}}$, $X_i^{\{2\}}$ has support over only the single point $(\chi_1^{\{1\}})^2$.

14.4 Identification Algorithm

The identification algorithm revolves around the repeated application of lemma 2 to identify progressively more CAPEs. The algorithm proceeds iteratively in order to leverage the fact that lemma 2's identifying power increases with the number of already-identified CAPEs.

Suppose that after one or more iterations of the algorithm, for some index set $S$, we want to identify $\beta^S(X_i = \chi)$ wherever possible, and that in earlier iterations we already identified the other CAPEs, $\beta^{S^c}(X_i = \chi)$, for all $\chi$ in a set $\mathcal{X} \subseteq \operatorname{supp}(X_i)$. We can treat the already-identified CAPEs as known, and form more informative moment conditions than equation 12 by defining lemma 2's known function $a(\chi) \equiv E[y_i \mid X_i = \chi] - \chi^{S^c} \beta^{S^c}(X_i = \chi)$. This allows us to apply lemma 2 with the moment condition $a(\chi) = \chi^S \beta^S(X_i = \chi)$.
For the purposes of satisfying the lemma's conditions L2.1 and L2.3, this is the same as using equation 12's raw moment condition in a model where $a(X_i)$ is the outcome variable and the only regressors are $X_i^S$, with support over the set $\{\chi^S \mid \chi \in \mathcal{X}\}$. Essentially, we can separate out the part of $y_i$ that we have already explained and continue working on the rest. In terms of Section 14.3.1's motivation for lemma 2, this effective removal of regressors from the model simplifies the task of distinguishing the direct effect of a change in $\chi^{\{k\}}$ from the indirect effects mediated through the change's impact on CAPEs; there are simply fewer CAPEs left to impact. In terms of the lemma's conditions L2.1 and L2.3, this manifests as smaller sets of regressors in the matrices $\chi^{S \setminus \{k\}}$ and $\chi^{S \cap D_k}$.

While lemma 2 forms the backbone of the algorithm, it can leave gaps that we can fill with lemmas 5 and 6. In its basic form, each step of the algorithm has 3 substeps, corresponding to the 3 lemmas. We describe these somewhat informally here and illustrate their use in the next section's examples, but the interested reader can find a formalization in the appendix.

1. Using all the CAPEs that have been identified in previous steps, form moment conditions and use lemma 2 (or its corollaries or lemma 7) to identify the yet-unidentified CAPEs over as large a region as possible.

2. In the closure of any sets where a regressor's CAPEs are both already identified and assumed to be continuous in the regressors, identify any yet-unidentified CAPEs using lemma 5.

3. Use lemma 6 to extrapolate identification results from dependence restrictions wherever possible.

The algorithm repeats this sequence of substeps until it fails to identify any yet-unidentified CAPEs, or until it has identified the CAPEs everywhere except over a region in which the researcher knows them to be nonidentified.
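The iterate-until-no-progress structure of the steps above can be sketched in a few lines of Python. This is a deliberately coarse illustration and not the formalization referred to in the appendix: it replaces substep 1 with the counting bound $|S \cap D_k| < T$ noted after corollary 4, it ignores lemmas 5 and 6 entirely, and it assumes that whatever support and rank conditions the corollaries require hold everywhere. The function names `dependents` and `counting_screen` are hypothetical.

```python
def dependents(dep):
    """D_k for each regressor k (1-based): rows with a 1 in column k."""
    K = len(dep)
    return {k: {j for j in range(1, K + 1) if dep[j - 1][k - 1] == 1}
            for k in range(1, K + 1)}

def counting_screen(dep, T):
    """Order in which regressors pass the bound |S & D_k| < T.

    S starts as the set of all regressors and shrinks each time a
    regressor passes. Passing is NOT a proof of identification: the
    support and rank conditions of lemma 2 and its corollaries must
    still be checked for each regressor.
    """
    D = dependents(dep)
    remaining = set(D)      # index set S of not-yet-screened regressors
    order = []
    progress = True
    while remaining and progress:
        progress = False
        for k in sorted(remaining):
            if len(remaining & D[k]) < T:
                order.append(k)
                remaining.remove(k)
                progress = True
                break       # restart the scan with the smaller S
    return order, sorted(remaining)

# Dependence matrix from the K = 3 example in Section 14.2
# (rows are coefficients, columns are regressors).
example = [[1, 1, 0],
           [0, 0, 0],
           [1, 1, 1]]
order_T2, left_T2 = counting_screen(example, 2)
order_T1, left_T1 = counting_screen(example, 1)
```

With the example matrix and $T = 2$, regressor 3 passes first because only coefficient 3 depends on it; removing it from $S$ then lets regressors 1 and 2 pass, which mirrors how identifying some CAPEs shrinks $S \cap D_k$ for the rest. With $T = 1$ no regressor passes the bound and the loop stops immediately.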
We call this the template version of the algorithm because it is often simpler to omit certain substeps and to apply them only to a subset of the yet-unidentified CAPEs. As long as the algorithm reaches a point where none of the substeps can identify any further CAPEs, the outcome is the same. Frequently, we can devise a simplified version that places the regressors in some order and steps through them one by one, identifying just one regressor's CAPEs everywhere in each step, even when other CAPEs may also be identifiable. This allows us to formulate all or a part of the algorithm as a proof by induction.

14.5 Examples

We now demonstrate the algorithm's use by providing identification results in a number of settings with different support, continuity, and dependence restrictions. Each example contains a theorem giving an identification result, discussion, and a proof obtained through the algorithm.

Example 8. Continuous regressors and an intercept:

This example treats the case where all regressors except the intercept are continuous or mixed discrete-continuous and coefficients have a semi-triangular dependence structure.

Theorem 9. $\beta(X_i = \chi)$ is identified for all $\chi \in \operatorname{supp}(X_i)$ under the following assumptions:

A9.1. For all $k \in \{1, \ldots, K - 1\}$, $\operatorname{supp}(X_i^{\{k\}})$ is the union of an open subset of $\mathbb{R}^T$ and any portion of its boundary. $\operatorname{supp}(X_i^{\{1, \ldots, K-1\}}) = \operatorname{supp}(X_i^{\{1\}}) \times \cdots \times \operatorname{supp}(X_i^{\{K-1\}})$. $X_i^{\{K\}}$ is nonstochastic and equals $1_T$.

A9.2. For all $k \in \{1, \ldots, K\}$, $|\{k, \ldots, K\} \cap D_k| < T$.

A9.3. $\beta(X_i = \chi)$ is everywhere continuous in $\chi$.

The support conditions in assumption A9.1 state that (except for the $K$th regressor, which is an intercept term) the regressors all have support over an open subset of $\mathbb{R}^T$, plus any part of that subset's boundary. As examples, this allows regressors to have support over all of $\mathbb{R}^T$, hyperrectangles in $\mathbb{R}^T$, and the set of vectors in $\mathbb{R}^T$ with nonnegative elements.
The Cartesian product structure of $\operatorname{supp}(X_i)$ in assumption A9.1 implies that every non-intercept regressor's support, conditional on the other regressors, is equal to its unconditional support: for all $k < K$ and $\xi \in \operatorname{supp}(X_i)$, $\operatorname{supp}(X_i^{\{k\}} \mid X_i^{\{k\}^c} = \xi^{\{k\}^c}) = \operatorname{supp}(X_i^{\{k\}})$. Intuitively, this means that no matter what the values of the other regressors are, a given regressor can take any value in its support. This rules out the presence of functionally dependent regressors. Assumption A9.1 also implies that $\operatorname{supp}(X_i^{\{1,\dots,K-1\}})$ is itself the union of an open subset of $\mathbb{R}^{T \times (K-1)}$ with some subset of its boundary.

A9.2 describes a class of dependence restrictions where fewer than $T$ of the coefficients $k$ through $K$ mean-depend on regressor $k$. In terms of the dependence matrix, this means that elements above the main diagonal are unrestricted while elements below the superdiagonal are restricted to contain fewer than $T$ ones in each column. Two example dependence matrices with the greatest degree of total dependence (the number of ones) allowed under assumption A9.2 with $K = 5$ and $T = 3$ are:

$$
\begin{array}{c|ccccc}
 & X_i^{\{1\}} & X_i^{\{2\}} & X_i^{\{3\}} & X_i^{\{4\}} & X_i^{\{5\}} \\ \hline
\beta_i^{\{1\}} & 1 & 1 & 1 & 1 & 1 \\
\beta_i^{\{2\}} & 1 & 1 & 1 & 1 & 1 \\
\beta_i^{\{3\}} & 0 & 1 & 1 & 1 & 1 \\
\beta_i^{\{4\}} & 0 & 0 & 1 & 1 & 1 \\
\beta_i^{\{5\}} & 0 & 0 & 0 & 1 & 1
\end{array}
\qquad
\begin{array}{c|ccccc}
 & X_i^{\{1\}} & X_i^{\{2\}} & X_i^{\{3\}} & X_i^{\{4\}} & X_i^{\{5\}} \\ \hline
\beta_i^{\{1\}} & 1 & 1 & 1 & 1 & 1 \\
\beta_i^{\{2\}} & 0 & 0 & 1 & 1 & 1 \\
\beta_i^{\{3\}} & 0 & 1 & 1 & 1 & 1 \\
\beta_i^{\{4\}} & 0 & 1 & 0 & 1 & 1 \\
\beta_i^{\{5\}} & 1 & 0 & 1 & 1 & 1
\end{array}
$$

Elements below the superdiagonal are printed in grey to emphasize the fact that these elements can be either 0 or 1, as long as there are fewer than $T$ grey ones in each column.

The ability of the algorithm to identify all CAPEs under novel and flexible dependence restrictions is illustrated well by a comparison of theorem 9 with an analogous result from Chamberlain (1992). Chamberlain (1992)'s dependence restrictions allow a set of up to $T-1$ coefficients to mean-depend on all regressors, while the rest are assumed to mean-depend on no regressors.
The associated dependence matrix has up to $T-1$ rows of ones and is otherwise filled with zeros. An example such matrix, comparable with those displayed above, for the $K = 5$, $T = 3$ case is:

$$
\begin{array}{c|ccccc}
 & X_i^{\{1\}} & X_i^{\{2\}} & X_i^{\{3\}} & X_i^{\{4\}} & X_i^{\{5\}} \\ \hline
\beta_i^{\{1\}} & 1 & 1 & 1 & 1 & 1 \\
\beta_i^{\{2\}} & 1 & 1 & 1 & 1 & 1 \\
\beta_i^{\{3\}} & 0 & 0 & 0 & 0 & 0 \\
\beta_i^{\{4\}} & 0 & 0 & 0 & 0 & 0 \\
\beta_i^{\{5\}} & 0 & 0 & 0 & 0 & 0
\end{array}
$$

Note that for $T \le K$, the total number of ones allowed in the dependence matrix for the Chamberlain (1992) estimator is $K(T-1)$, which increases linearly with the number of regressors. The dependence matrices allowed by theorem 9's assumption A9.2 can have up to
$$\frac{K^2 + K(2T-3) - (T-1)^2 + T - 1}{2}$$
ones, which instead grows quadratically. The dependence restrictions allowed by theorem 9 also admit far more flexibility in where a given number of ones can be placed in the matrix, as seen, for example, in the two dependence matrices presented earlier and the fact that any allowable dependence pattern can be used with the non-intercept regressors permuted arbitrarily.

Clearly, the order of the regressors matters tremendously when applying theorem 9. As a rule of thumb, a researcher using theorem 9 should order their regressors so that those which fewer coefficients mean-depend on come first. It may also help to move regressors whose coefficients have many dependencies earlier in the order; this places more elements in their coefficients' rows above the main diagonal, where dependence is unrestricted.

An efficient application of the algorithm here largely boils down to a proof by induction. For $k \in \{1,\dots,K-1\}$, the $k$th step of the algorithm will identify $\beta^{\{k\}}(X_i = \xi)$ everywhere in $\operatorname{supp}(X_i)$. Each of these steps applies the lemmas in essentially the same way, and we formulate this procedure below as an induction rule.
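Before turning to the proof, the counting claims above can be checked numerically. The following is a minimal sketch, assuming a dependence matrix is stored as a list of rows in which entry `dep[j][k] == 1` means that coefficient `j+1` mean-depends on regressor `k+1`; the function names are hypothetical and chosen only for this illustration.

```python
def satisfies_a92(dep, T):
    """Assumption A9.2: in every column k, fewer than T ones may appear
    on or below the main diagonal (rows k through K)."""
    K = len(dep)
    return all(sum(dep[j][k] for j in range(k, K)) < T for k in range(K))


def max_ones_theorem9(K, T):
    """Closed-form maximum number of ones under A9.2 (stated for T <= K)."""
    return (K**2 + K * (2 * T - 3) - (T - 1) ** 2 + T - 1) // 2


def max_ones_direct(K, T):
    """Same maximum, counted directly: all entries above the diagonal plus
    at most T - 1 ones among the K - k entries on or below it in column k."""
    above = K * (K - 1) // 2
    below = sum(min(T - 1, K - k) for k in range(K))
    return above + below


def max_ones_chamberlain(K, T):
    """Up to T - 1 full rows of ones, as in the Chamberlain (1992) pattern."""
    return K * (T - 1)


# The two K = 5, T = 3 example matrices allowed by A9.2.
first = [[1, 1, 1, 1, 1],
         [1, 1, 1, 1, 1],
         [0, 1, 1, 1, 1],
         [0, 0, 1, 1, 1],
         [0, 0, 0, 1, 1]]
second = [[1, 1, 1, 1, 1],
          [0, 0, 1, 1, 1],
          [0, 1, 1, 1, 1],
          [0, 1, 0, 1, 1],
          [1, 0, 1, 1, 1]]

for name, dep in (("first", first), ("second", second)):
    print(name, satisfies_a92(dep, T=3), sum(map(sum, dep)))
print(max_ones_theorem9(5, 3), max_ones_chamberlain(5, 3))
```

The direct count is included only as a cross-check of the closed form; both functions return the same value whenever $T \le K$.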
The key to identification in this setting is that assumption A9.2 ensures that in the algorithm's step $k < K$, there are fewer than $T$ regressors whose CAPEs have not yet been identified and whose coefficients mean-depend on regressor $k$. In applying lemma 2, these will be the coefficients with index set $S \cap D_k$. In terms of Section 14.3.1's intuition, when we change regressor $k$, this lets us isolate the direct from the indirect effects of this change on the moment condition.

We now prove theorem 9 using the algorithm. We start by laying out the induction rule. Suppose that in step $k > 1$, we have already everywhere identified the CAPEs of the first $k-1$ regressors. For substep 1 of step $k \le K-1$, with $S \equiv \{k,\dots,K\}$, $\Xi \equiv \operatorname{supp}(X_i)$, and the known function
$$a(\xi) \equiv E[y_i \mid X_i = \xi] - \xi^{S^c} \beta^{S^c}(X_i = \xi),$$
we use the moment condition
$$a(\xi) = \xi^{S} \beta^{S}(X_i = \xi) \quad \text{for all } \xi \in \Xi.$$
By assumption A9.2,
$$\dim(\operatorname{range}(\xi^{S \cap D_k})) \le |S \cap D_k| < T. \tag{22}$$
Take an arbitrary point $\xi_0 \in \operatorname{supp}(X_i)$. We will now determine whether substep 1 identifies $\beta^{\{k\}}(X_i = \xi_0)$. Defining
$$\tilde{\Xi} \equiv \operatorname{supp}(X_i^{\{k\}} \mid X_i^{\{k\}^c} = \xi_0^{\{k\}^c} \text{ and } X_i \in \Xi),$$
as in corollary 4, we have from assumption A9.1 that
$$\tilde{\Xi} = \operatorname{supp}(X_i^{\{k\}}). \tag{23}$$
We now separately consider the two cases where $k \notin C_k$ and $k \in C_k$.

First, suppose that $k \notin C_k$. Because $\operatorname{supp}(X_i^{\{k\}})$ contains an open subset of $\mathbb{R}^T$, equations 22 and 23 immediately satisfy corollary 4's condition LC4.1, identifying $\beta^{\{k\}}(X_i = \xi_0)$.

Suppose instead that $k \in C_k$. As in the previous case, we can identify $\beta^{\{k\}}(X_i = \xi_0)$ by satisfying corollary 4's condition LC4.2, but only for $\xi_0$ such that $\xi_0^{\{k\}} \notin \operatorname{range}(\xi_0^{(S \cap D_k) \setminus \{k\}})$.

To complete substep 1, apply the logic above to every $\xi_0 \in \operatorname{supp}(X_i)$ to identify $\beta^{\{k\}}(X_i = \xi_0)$ wherever possible. When $k \notin C_k$, this identifies $\beta^{\{k\}}(X_i = \xi)$ everywhere and we have finished step $k$ of the algorithm.
When $k \in C_k$, substep 1 instead identifies $\beta^{\{k\}}(X_i = \xi)$ for $\xi$ in the set
$$\{\xi \mid \xi^{\{k\}} \notin \operatorname{range}(\xi^{(S \cap D_k) \setminus \{k\}}) \text{ and } \xi \in \operatorname{supp}(X_i)\}.$$
A subset of this is the set of $\xi$ where all the columns of $\xi^{S \cap D_k} = [\,\xi^{\{k\}} \;\; \xi^{(S \cap D_k) \setminus \{k\}}\,]$ are linearly independent. This is the set where $\xi^{S \cap D_k}$ has full rank, since $\xi^{S \cap D_k}$, a $T \times |S \cap D_k|$ matrix, has fewer columns than rows. This set is dense in $\operatorname{supp}(X_i)$, so in substep 2 we apply lemma 5 and the continuity assumption A9.3 to identify $\beta^{\{k\}}(X_i = \xi)$ everywhere in $\operatorname{supp}(X_i)$. This completes step $k$ of the algorithm.

This completes the induction rule. The base case follows from applying the induction rule in step 1 with $S = \{1,\dots,K\}$. After applying this rule for the first $K-1$ steps, we identify $\beta^{\{1,\dots,K-1\}}(X_i = \xi)$ everywhere in $\operatorname{supp}(X_i)$.

Finally, in step $K$, we have only the intercept left to identify. We use the moment condition above with $S = \{K\}$ and can trivially apply lemma 2's condition L2.1 to identify $\beta^{\{K\}}(X_i = \xi)$ everywhere in $\operatorname{supp}(X_i)$.

While not covered by theorem 9, similar settings that have multiple regressors with a single point of support, such as intercept terms or deterministic trends, can be analyzed with a similar formulation of the algorithm. In such a case, these regressors should always come last. The lack of variation in their support means that lemma 2's conditions L2.2 and L2.3 (and therefore corollary 4) can never be used to identify them. The algorithm should treat them using condition L2.1, instead, after the induction portion of the proof is complete.

Note also that the algorithm can be stopped after step $r \in \{1,\dots,K\}$, and the first $r$ regressors' CAPEs would still be identified everywhere, as long as assumption A9.2 holds for $k \in \{1,\dots,r\}$.
One strategy a researcher may use, then, is to place regressors whose effects are of primary interest (in the program evaluation literature this might be a single continuous treatment variable) relatively early in the regressor order, and then let all coefficients mean-depend on all the control variables coming after the last regressor of interest. With the last regressor of interest being in place $r$, this would fill columns $\{r+1,\dots,K\}$ of the dependence matrix with ones. This would place minimal restrictions on the model to identify the effects of interest and minimize restrictions on the effect heterogeneity of the regressors of interest by placing them so that, in the dependence matrix, more elements of their coefficients' rows are above the main diagonal. CAPEs of the control variables would not necessarily be identified, but this may be irrelevant in many empirical settings.

This strategy may also be used to include time-invariant regressors that coefficients mean-depend on, in a model where the intercept's heterogeneity is unrestricted (i.e. there are fixed effects). Placing the time-invariant regressors last leaves their CAPEs and the intercepts unidentified, but allows one to condition the other regressors' CAPEs on these variables.

Example 10. Essential heterogeneity with fixed effects: This example treats a setting with fixed effects and a minimal model of essential heterogeneity, as discussed in the introduction. In this model, the intercept is allowed to mean-depend on all regressors, but every other coefficient is allowed to mean-depend only on its own regressor. Borrowing a term from Graham and Powell (2012), we identify a given regressor's CAPEs everywhere in the regressor-specific subpopulation of movers: individuals for whom that regressor varies over time.
When the effect of a binary treatment is of interest and no individual is treated in every period, the movers' APE corresponds to the average treatment effect on the treated. We also show that the CAPEs not identified by the algorithm are nonidentified without further assumptions. As long as $T \ge 3$, this result holds with any number of regressors and under mild support restrictions that allow for discrete regressors. By way of comparison, identification based on the arguments of Chamberlain (1992) or Graham and Powell (2012) would require $T > K$ or $T \ge K$, respectively, because the dependence restrictions they consider only nest those of our essential heterogeneity model if every coefficient is allowed to mean-depend on every regressor.

Theorem 11. Consider the following assumptions:
A11.1. For all $k \in \{1,\dots,K-1\}$ and $\xi \in \operatorname{supp}(X_i)$, $\operatorname{supp}(X_i^{\{k\}} \mid X_i^{\{k\}^c} = \xi^{\{k\}^c})$ contains at least 3 linearly independent vectors. $X_i^{\{K\}}$ is nonstochastic and equals $1_T$.
A11.2. For all $k \in \{1,\dots,K-1\}$, $C_k = \{k, K\}$. $C_K = \{1,\dots,K\}$.
Under these assumptions, the following hold:
C11.1. For all $k \in \{1,\dots,K-1\}$, $\beta^{\{k\}}(X_i = \xi)$ is identified for all $\xi \in \operatorname{supp}(X_i \mid X_i^{\{k\}} \notin \operatorname{range}(1_T))$.
C11.2. $\beta^{\{K\}}(X_i = \xi)$ is identified for all $\xi \in \bigcap_{k \in \{1,\dots,K-1\}} \operatorname{supp}(X_i \mid X_i^{\{k\}} \notin \operatorname{range}(1_T))$.
C11.3. For all $k \in \{1,\dots,K-1\}$, $\beta^{\{k\}}(X_i = \xi)$ is nonidentified for all $\xi \in \operatorname{supp}(X_i \mid X_i^{\{k\}} \in \operatorname{range}(1_T))$ without further assumptions.
C11.4. $\beta^{\{K\}}(X_i = \xi)$ is nonidentified for all $\xi \in \bigcup_{k \in \{1,\dots,K-1\}} \operatorname{supp}(X_i \mid X_i^{\{k\}} \in \operatorname{range}(1_T))$.

Assumption A11.1 imposes the condition that each non-intercept regressor have support over a set of at least 3 linearly independent vectors, conditional on any values of the other regressors. This requires that $T \ge 3$, but for panels satisfying this requirement, the support condition is very weak.
A sufficient condition for this assumption to hold imposes the support condition on the regressors' marginal supports, instead, and assumes the Cartesian product structure from theorem 9's assumption A9.1: $\operatorname{supp}(X_i^{\{1,\dots,K-1\}}) = \operatorname{supp}(X_i^{\{1\}}) \times \dots \times \operatorname{supp}(X_i^{\{K-1\}})$, and for all $k \in \{1,\dots,K-1\}$, $\operatorname{supp}(X_i^{\{k\}})$ contains at least 3 linearly independent vectors.

A11.2 models essential heterogeneity of slope coefficients through mean dependence of each non-intercept coefficient on its own regressor (and, by our convention, on the intercept) and includes fixed effects by allowing the intercept to mean-depend on every regressor. The dependence matrix is 0 everywhere except along the main diagonal and the last column and row, which are filled with ones. This matrix is shown below when $K = 4$.

$$
\begin{array}{c|cccc}
 & X_i^{\{1\}} & X_i^{\{2\}} & X_i^{\{3\}} & X_i^{\{4\}} \\ \hline
\beta_i^{\{1\}} & 1 & 0 & 0 & 1 \\
\beta_i^{\{2\}} & 0 & 1 & 0 & 1 \\
\beta_i^{\{3\}} & 0 & 0 & 1 & 1 \\
\beta_i^{\{4\}} & 1 & 1 & 1 & 1
\end{array}
$$

We now prove conclusions C11.1 and C11.2 using the algorithm. It requires only two steps, each involving only substep 1.

In the first step, we prove conclusion C11.1 by applying corollary 4 with the raw moment condition in equation 12 to every non-intercept coefficient. For the purposes of lemma 2 and its corollaries, $a(\xi) \equiv E[y_i \mid X_i = \xi]$, $S \equiv \{1,\dots,K\}$, and $\Xi \equiv \operatorname{supp}(X_i)$. For all $k \in \{1,\dots,K-1\}$, we have $D_k = \{k, K\}$, so $\dim(\operatorname{range}(\xi^{S \cap D_k})) \le 2$ for all $\xi \in \operatorname{supp}(X_i)$. Between this and assumption A11.1, corollary 4's condition LC4.2 is immediately satisfied for any $\xi_0$ such that $\xi_0^{\{k\}} \notin \operatorname{range}(\xi_0^{\{K\}}) = \operatorname{range}(1_T)$. Conclusion C11.1 follows.

In the second step, we use the moment condition with $a(\xi) \equiv E[y_i \mid X_i = \xi] - \xi^{\{1,\dots,K-1\}} \beta^{\{1,\dots,K-1\}}(X_i = \xi)$, $S \equiv \{K\}$, and $\Xi \equiv \bigcap_{k \in \{1,\dots,K-1\}} \operatorname{supp}(X_i \mid X_i^{\{k\}} \notin \operatorname{range}(1_T))$, the set where step 1 identified the CAPEs of every non-intercept regressor. Lemma 2's condition L2.1 trivially identifies $\beta^{\{K\}}(X_i = \xi)$ for all $\xi \in \Xi$, proving conclusion C11.2.
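The regions in conclusions C11.1 and C11.2 are easy to read off a given regressor matrix. The sketch below is a hypothetical helper, assuming the assumptions of theorem 11 hold and that a point $\xi$ is stored as a list of $T$ rows with the intercept in the last column; it reports, for one such point, which CAPEs the two steps above identify.

```python
def column(xi, k):
    """Column k of a T x K matrix stored as a list of rows."""
    return [row[k] for row in xi]


def in_range_of_ones(vec):
    """True when vec is a multiple of the T-vector of ones (no time variation)."""
    return all(v == vec[0] for v in vec)


def identified_capes(xi):
    """Flags per coefficient at the point xi, following C11.1 and C11.2:
    slope k is identified when regressor k varies over time (a 'mover'),
    and the intercept is identified when every slope regressor varies."""
    K = len(xi[0])
    slopes = [not in_range_of_ones(column(xi, k)) for k in range(K - 1)]
    return slopes + [all(slopes)]


# T = 3, K = 3: first regressor varies over time, second does not.
xi = [[1.0, 2.0, 1.0],
      [2.0, 2.0, 1.0],
      [3.0, 2.0, 1.0]]
print(identified_capes(xi))
```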
Conclusions C11.3 and C11.4 can be obtained from the fact that $X_i^{\{k\}}$ is collinear with $X_i^{\{K\}}$ over the set where $X_i^{\{k\}} \in \operatorname{range}(1_T)$. This lets us construct an alternative set of CAPEs that are observationally equivalent but not equal to the truth, using a strategy much like that in Section 14.1's discussion of nonidentification. The proof is in the appendix.

14.5.1 Irregular Identification

So far, we have studied identification without considering the feasibility or ease of estimation and inference. This section serves to caution researchers using the algorithm that (sub)populations' APEs can be irregularly identified, in the sense of Khan and Tamer (2010), Graham and Powell (2012), and Lewbel (2019). An irregularly identified parameter is identified, but has an infinite semiparametric efficiency bound, making $\sqrt{N}$-consistent estimation impossible without further assumptions (Newey (1990)). In a special case of our model, Graham and Powell (2012) use results from Chamberlain (1992) to show that the APEs in their model are irregularly identified. In this section, we review their case and develop further examples where the sources of irregularity in Graham and Powell (2012) appear to be present as well. We reserve a thorough analysis of irregularity for future work on fully nonparametric estimation of the (C)APEs, but researchers using the algorithm should at least be aware that irregularity is possible, and be able to recognize cases where it is likely to obtain. Section 15 develops $\sqrt{N}$-consistent, asymptotically normal estimators for APEs under a variety of dependence and support assumptions where researchers can be assured of regularity.

In their benchmark model, Graham and Powell (2012) have a set of $T$ regressors $X_i^{\{1,\dots,T\}}$ whose coefficients mean-depend on every regressor. The remaining regressors $X_i^{\{T+1,\dots,K\}}$ are interactions of the first $T$ regressors with time dummies.
Their coefficients are assumed to be mean-independent of all the regressors, and represent shifts over time in the APEs of the underlying regressors $X_i^{\{1,\dots,T\}}$. [27] In our language, $C_1$ through $C_T$ equal $\{1,\dots,K\}$, and $C_{T+1}$ through $C_K$ equal $\{\}$ [28]. We decompose the model's raw moment condition to separate these two regressor groups:
$$E[y_i \mid X_i = \xi] = \xi^{\{1,\dots,T\}} \beta^{\{1,\dots,T\}}(X_i = \xi) + \xi^{\{T+1,\dots,K\}} \beta^{\{T+1,\dots,K\}},$$
for all $\xi \in \operatorname{supp}(X_i)$. Note that we have made the substitution
$$\beta^{\{T+1,\dots,K\}}(X_i = \xi) = \beta^{\{T+1,\dots,K\}} \quad \text{for all } \xi \in \operatorname{supp}(X_i), \tag{24}$$
implied by the dependence restrictions.

[27] It is worth noting that this and similar strategies can generally be used to model time-varying effects with our setting's time-invariant coefficients.
[28] This set could alternatively include any regressors with a single point of support.

The basic identification strategy they borrow from Chamberlain (1992) attempts to take two steps. In the first, $X_i^{\{1,\dots,T\}}$ is projected out and the population regression of $Q(X_i^{\{1,\dots,T\}}) y_i$ on $Q(X_i^{\{1,\dots,T\}}) X_i^{\{T+1,\dots,K\}}$ is used to identify $\beta^{\{T+1,\dots,K\}}$. If this step succeeds, then $\beta^{\{T+1,\dots,K\}}$ is already identified in the second step and a population analogue of the mean group estimator with the outcome variable $y_i - X_i^{\{T+1,\dots,K\}} \beta^{\{T+1,\dots,K\}}$ and regressors $X_i^{\{1,\dots,T\}}$ identifies $\beta^{\{1,\dots,T\}}$.

The first step fails, however, when $X_i^{\{1,\dots,T\}}$ has full rank everywhere in $\operatorname{supp}(X_i)$. Here, $\operatorname{range}(X_i^{\{1,\dots,T\}}) = \mathbb{R}^T$, since $X_i^{\{1,\dots,T\}}$ has $T$ columns, and no variation in $X_i^{\{T+1,\dots,K\}}$ is left after projecting out $X_i^{\{1,\dots,T\}}$. In terms of our identification algorithm, this corresponds to a failure of lemma 2's conditions L2.1 and L2.3 everywhere (or lemma 7's conditions L7.1 and L7.3, for the case with functional dependence).

Graham and Powell (2012) achieve identification by assuming that $\operatorname{supp}(X_i)$ contains some points where $X_i^{\{1,\dots,T\}}$ is rank deficient.
They call individuals with such regressor matrices stayers, as this rank deficiency can result from collinearity between an intercept term and a regressor with no variation over time. At these points, $\operatorname{range}(X_i^{\{1,\dots,T\}}) \ne \mathbb{R}^T$ and variation in $X_i^{\{T+1,\dots,K\}}$ remains after projecting out $X_i^{\{1,\dots,T\}}$. In our language, we would use lemmas 2 or 7 with $\xi_0$ chosen to lie in the set where $X_i^{\{1,\dots,T\}}$ is rank deficient. This only allows us to identify $\beta^{\{T+1,\dots,K\}}(X_i = \xi)$ for stayers, but equation 24 means this is sufficient to identify $\beta^{\{T+1,\dots,K\}}$. In our language, this is a use of lemma 6.

Identification based on this strategy can be irregular, however. If the set where $X_i^{\{1,\dots,T\}}$ is rank deficient has zero probability, as it likely will when the stochastic regressors in $X_i^{\{1,\dots,T\}}$ are continuously distributed, we are claiming identification from knowledge of a moment condition at points that we have zero probability of seeing in a dataset. The problem is much like that of nonparametrically estimating $E[y_i \mid X_i = \xi]$ at a single point $\xi$ with $P[X_i = \xi] = 0$. Typically, we would estimate this with a kernel-weighted average of $y_i$, among individuals for whom $X_i$ is in a shrinking neighborhood of $\xi$. This estimator is consistent only at a rate less than $\sqrt{N}$, owing to its use of a shrinking proportion of the data. Khan and Tamer (2010) and Lewbel (2019) refer to identification based on sets with zero probability as thin set identification.

As well as thin set identification, our and Graham and Powell (2012)'s setting is susceptible to another source of irregularity that is likely to arise when a set of $T$ coefficients all mean-depend on one another's regressors. To take a simple example, consider the case where $T = K = 1$, where the single regressor $X_i$ and idiosyncratic error $e_i$ are independent and standard-normally distributed. [29] The single coefficient $\beta_i$ is assumed to mean-depend on $X_i$.
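As a numerical companion to this $T = K = 1$ example, the following sketch simulates the mean group estimator $\frac{1}{N}\sum_i y_i / X_i$ across many replications and reports the interquartile range of the resulting estimates. The functional form $\beta_i = 1 + 0.5 X_i$, the function names, the sample sizes, and the seed are all assumptions made only for this illustration. If the estimator were $\sqrt{N}$-consistent, the spread would shrink by a factor of ten between $N = 20$ and $N = 2000$; in this design it should instead stay of the same order, for the reason given in the discussion that follows.

```python
import random
import statistics


def mean_group_estimate(n, rng):
    """Average of y_i / X_i in a sample of size n, where
    y_i = X_i * beta_i + e_i with X_i and e_i independent standard normals."""
    total = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        e = rng.gauss(0.0, 1.0)
        beta_i = 1.0 + 0.5 * x  # assumed form of the mean dependence on X_i
        y = x * beta_i + e
        total += y / x          # equals beta_i + e / x
    return total / n


def iqr_of_estimates(n, reps, seed):
    """Interquartile range of the estimator across `reps` simulated samples."""
    rng = random.Random(seed)
    estimates = [mean_group_estimate(n, rng) for _ in range(reps)]
    q1, _, q3 = statistics.quantiles(estimates, n=4)
    return q3 - q1


for n in (20, 2000):
    print(n, round(iqr_of_estimates(n, reps=200, seed=0), 2))
```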
Where $\xi \ne 0$, $\beta(X_i = \xi)$ is identified as $\beta(X_i = \xi) = E[y_i \mid X_i = \xi] / \xi$. An obvious way to estimate the APE ignores the zero-probability event that $X_i = 0$ and uses the mean group estimator
$$\hat{\beta} \equiv \frac{1}{N} \sum_{i=1}^{N} \frac{y_i}{X_i} = \frac{1}{N} \sum_{i=1}^{N} \beta_i + \frac{1}{N} \sum_{i=1}^{N} \frac{e_i}{X_i}.$$
Each term in the rightmost summand, however, is a ratio of two independent, standard normal variables, and therefore Cauchy-distributed. The sum of independent Cauchy random variables is itself Cauchy, and so $\hat{\beta}$ is inconsistent. Graham and Powell (2012) show that a similar problem often arises when $T$ coefficients all mean-depend on each others' regressors.

Graham and Powell (2012)'s estimator mimics the two steps of the identification strategy. For intuition, we examine the simple case where $T = 1$, $K = 2$, and $P[X_i^{\{1\}} = 0] = 0$. Here, the estimator's first step estimates $\beta^{\{2\}}$ with $\hat{\beta}^{\{2\}}$, computed by the regression of $y_i$ on $X_i^{\{2\}}$ in the subpopulation where $|X_i^{\{1\}}| < h_n$, with $h_n \to 0$ as $N \to \infty$. Under additional assumptions that ensure $\beta^{\{1\}}(X_i = \xi)$'s boundedness and continuity in $\xi$ where $\xi^{\{1\}} = 0$, this estimator is consistent, because $y_i = X_i^{\{2\}} \beta^{\{2\}} + e_i$ when $X_i^{\{1\}} = 0$. The second step then computes $\hat{\beta}^{\{1\}}$ via the mean group-like estimator that computes $(y_i - X_i^{\{2\}} \hat{\beta}^{\{2\}}) / X_i^{\{1\}}$, the estimate of $\beta_i^{\{1\}}$ from the individual-specific regression of $y_i - X_i^{\{2\}} \hat{\beta}^{\{2\}}$ on $X_i^{\{1\}}$, for each individual in the subpopulation where $|X_i^{\{1\}}| \ge h_n$, and averages the resulting estimates. In the first step, using all individuals for whom $|X_i^{\{1\}}| < h_n$ to cope with the thin-set problem is akin to using a kernel estimator for $E[y_i \mid X_i = \xi]$. In the second step, using only individuals for whom $|X_i^{\{1\}}| \ge h_n$ bounds the individual-specific regressions' denominators away from zero, averting the problem with infinite variances. As $h_n \to 0$, however, the first step uses a shrinking proportion of the data, and the second step allows progressively smaller denominators, leading the estimator to converge at a rate slower than $\sqrt{N}$.

[29] This example may seem more relevant if seen as the cross-section resulting from first-differencing a panel with $T = K = 2$, an intercept and a standard normal regressor, and unrestricted coefficient heterogeneity.

Example 12. Continuous regressors and an intercept: This example loosens the dependence restrictions in example 8, but requires stricter support conditions, and identification of some coefficients in this setting relies critically on using points where regressors of other coefficients are collinear. As discussed above, identification may be irregular when these sets have zero probability. We also deviate from example 8 by allowing the last $T$, rather than $T-1$, coefficients to depend on all of each others' regressors. Per our earlier discussion, this may also create conditions under which identification is irregular.

Theorem 13. $\beta(X_i = \xi)$ is identified for all $\xi \in \operatorname{supp}(X_i)$ under the following assumptions:
A13.1. $\operatorname{supp}(X_i^{\{1,\dots,K-1\}}) = \mathbb{R}^{T \times (K-1)}$. $X_i^{\{K\}}$ is nonstochastic and equals $1_T$.
A13.2. For all $k \in \{1,\dots,K-T\}$, $|\{k,\dots,K\} \cap D_k \cap C_k| < T$.
A13.3. $\beta(X_i = \xi)$ is everywhere continuous in $\xi$.

Assumption A13.2 allows regressor $k$ for $k \in \{1,\dots,K-T\}$ to be mean-depended on by up to $T-1$ of the coefficients indexed in $\{k,\dots,K\}$, much like in example 8. It adds to this by allowing regressor $k$ to be mean-depended on by any additional coefficients whose regressors coefficient $k$ does not mean-depend on, and by allowing all coefficients to mean-depend on the last $T$ regressors.

The allowed dependence patterns are far easier to understand visually, in matrix form. One can build a dependence matrix satisfying assumption A13.2 in a three-step procedure.
First, one starts with a dependence matrix satisfying the semi-triangular dependence pattern allowed by theorem 9's assumption A9.2. Recall that this is any dependence matrix with unrestricted entries above the main diagonal and up to $T-1$ ones in each column below the superdiagonal. Second, one may set the dependence patterns in the last $T$ columns to anything. Finally, any two elements across the main diagonal from each other at this point can be swapped. That is, for any $i, j \in \{1,\dots,K\}$, one can switch the values of the $(i,j)$th and the $(j,i)$th elements.

To illustrate the range of matrices allowed in the $K = 5$, $T = 3$ case, the following matrix is a valid dependence matrix under assumption A13.2, as is any matrix that sets the black-typed elements to any value, sets up to 2 of the grey-typed elements to 1 in each of the first two columns, and then swaps any values about the main diagonal.

$$
\begin{array}{c|ccccc}
 & X_i^{\{1\}} & X_i^{\{2\}} & X_i^{\{3\}} & X_i^{\{4\}} & X_i^{\{5\}} \\ \hline
\beta_i^{\{1\}} & 1 & 1 & 1 & 1 & 1 \\
\beta_i^{\{2\}} & 1 & 1 & 1 & 1 & 1 \\
\beta_i^{\{3\}} & 0 & 1 & 1 & 1 & 1 \\
\beta_i^{\{4\}} & 0 & 0 & 1 & 1 & 1 \\
\beta_i^{\{5\}} & 0 & 0 & 1 & 1 & 1
\end{array}
$$

Another example is

$$
\begin{array}{c|ccccc}
 & X_i^{\{1\}} & X_i^{\{2\}} & X_i^{\{3\}} & X_i^{\{4\}} & X_i^{\{5\}} \\ \hline
\beta_i^{\{1\}} & 0 & 1 & 0 & 1 & 1 \\
\beta_i^{\{2\}} & 1 & 0 & 1 & 1 & 1 \\
\beta_i^{\{3\}} & 1 & 1 & 1 & 0 & 1 \\
\beta_i^{\{4\}} & 0 & 1 & 1 & 1 & 1 \\
\beta_i^{\{5\}} & 1 & 1 & 1 & 1 & 1
\end{array}
$$

We now apply the algorithm to prove theorem 13. As in example 8, the bulk of the algorithm becomes a proof by induction. For $k \in \{1,\dots,K-T\}$, the $k$th step of the algorithm will identify $\beta^{\{k\}}(X_i = \xi)$ everywhere in $\operatorname{supp}(X_i)$ according to the same induction rule, which we lay out below.

Suppose that in step $k > 1$, we have already everywhere identified the CAPEs of the first $k-1$ regressors. For substep 1 of step $k \le K-T$, with $S \equiv \{k,\dots,K\}$ and the known function $a(\xi) \equiv E[y_i \mid X_i = \xi] - \xi^{S^c} \beta^{S^c}(X_i = \xi)$, we use the moment condition
$$a(\xi) = \xi^{S} \beta^{S}(X_i = \xi)$$
defined over the set
$$\Xi \equiv \{\xi \mid \xi \in \operatorname{supp}(X_i) \text{ and } \xi^{(S \cap D_k) \setminus C_k} \in \operatorname{range}(\xi^{S \cap D_k \cap C_k})\}.$$
This moment condition is defined over the entirety of $\operatorname{supp}(X_i)$, but we limit ourselves to the set $\Xi$.
To understand why, note that $(S \cap D_k) \setminus C_k$ is the index set of regressors that fulfill three criteria: their CAPEs have not yet been identified, their coefficients mean-depend on regressor $k$, and coefficient $k$ does not mean-depend on them. In terms of the intuition from Section 14.3.1, these regressors are important because their CAPEs can change when we change regressor $k$, and so contribute to the indirect effect that we need to project out. However, because coefficient $k$ does not mean-depend on them, if we can identify $\beta^{\{k\}}(X_i = \xi)$ in $\Xi$, then even though their regressors' values are restricted in this set, we can still use lemma 6 to identify $\beta^{\{k\}}(X_i = \xi)$ everywhere in $\operatorname{supp}(X_i)$ as long as $\operatorname{supp}(X_i^{C_k} \mid X_i \in \Xi) = \operatorname{supp}(X_i^{C_k})$.

The restriction placed on $\xi^{(S \cap D_k) \setminus C_k}$ in $\Xi$ has the following critical consequence:
$$\operatorname{range}(\xi^{S \cap D_k}) = \operatorname{range}(\xi^{S \cap D_k \cap C_k}) \quad \text{for all } \xi \in \Xi. \tag{25}$$
Intuitively, regressors in $\xi^{(S \cap D_k) \setminus C_k}$ are linearly dependent on those in $\xi^{S \cap D_k \cap C_k}$ and therefore redundant in determining the range of $\xi^{S \cap D_k}$. By assumption A13.2 and equation 25, we also have
$$\dim(\operatorname{range}(\xi^{S \cap D_k})) < T. \tag{26}$$
Take an arbitrary point $\xi_0 \in \Xi$. We will now determine whether substep 1 identifies $\beta^{\{k\}}(X_i = \xi_0)$. We separately consider the two cases where $k \notin C_k$ and $k \in C_k$ and we define
$$\tilde{\Xi} \equiv \operatorname{supp}(X_i^{\{k\}} \mid X_i^{\{k\}^c} = \xi_0^{\{k\}^c} \text{ and } X_i \in \Xi), \tag{27}$$
as in corollary 4. By assumption A13.2,
$$\tilde{\Xi} = \operatorname{supp}(X_i^{\{k\}}).$$
First, suppose that $k \notin C_k$. Equations 26 and 27 immediately satisfy corollary 4's condition LC4.1, and $\beta^{\{k\}}(X_i = \xi_0)$ is therefore identified.

Now suppose that $k \in C_k$. By equations 26 and 27, $\beta^{\{k\}}(X_i = \xi_0)$ is identified by corollary 4's condition LC4.2 wherever $\xi_0^{\{k\}} \notin \operatorname{range}(\xi_0^{(S \cap D_k) \setminus \{k\}})$.

Moving on to substep 2, we skip this substep if $k \notin C_k$; substep 1 already identified $\beta^{\{k\}}(X_i = \xi)$ everywhere in $\Xi$. If $k \in C_k$, we first note that substep 1 identified $\beta^{\{k\}}(X_i = \xi)$ everywhere in the set $\Xi_1 \equiv \{\xi \mid \xi \in \Xi \text{ and } \xi^{\{k\}} \notin \operatorname{range}(\xi^{(S \cap D_k) \setminus \{k\}})\}$.
As we show below, $\Xi_1$ is a dense subset of $\Xi$, and so substep 2 identifies $\beta^{\{k\}}(X_i = \xi)$ everywhere in $\Xi$, using lemma 5 and assumption A13.3.

To see that $\Xi_1$ is dense in $\Xi$, consider an arbitrary point $\xi_0 \in \Xi \setminus \Xi_1$ and the sequence of points $(\xi_p)_{p=1}^{\infty} \equiv [\,\xi_0^{\{1\}} + \delta/p \;\; \xi_0^{\{2\}} \;\cdots\; \xi_0^{\{K\}}\,]$, with $\delta \in \operatorname{supp}(X_i^{\{k\}})$ chosen such that $\delta \notin \operatorname{range}(\xi_0^{(S \cap D_k) \setminus \{k\}})$. Assumption A13.1 and equation 26 ensure that such a $\delta$ exists. $\xi_p \in \Xi_1$ for $p \in \{1,\dots,\infty\}$, and $\lim_{p \to \infty} \xi_p = \xi_0$.

Finally, in substep 3, we use lemma 6 to identify $\beta^{\{k\}}(X_i = \xi)$ everywhere in $\operatorname{supp}(X_i)$, because it is already identified everywhere in $\Xi$, and $\operatorname{supp}(X_i^{C_k} \mid X_i \in \Xi) = \operatorname{supp}(X_i^{C_k})$ by assumption A13.1.

This completes the induction rule. To provide the base case, simply note that this rule identifies $\beta^{\{1\}}(X_i = \xi)$ everywhere in $\operatorname{supp}(X_i)$ when applied with $S = \{1,\dots,K\}$.

The induction process continues until we reach step $K-T+1$, in which we instead identify the remaining CAPEs $\beta^{\{K-T+1,\dots,K\}}(X_i = \xi)$ using lemma 2's condition L2.1. In this step, with $S \equiv \{K-T+1,\dots,K\}$ and the known function $a(\xi)$ defined as above, we use the moment condition
$$a(\xi) = \xi^{S} \beta^{S}(X_i = \xi) \quad \text{for all } \xi \in \operatorname{supp}(X_i).$$
Here, $\Xi = \operatorname{supp}(X_i)$. $\xi^{S}$ is a $T \times T$ matrix, and $\operatorname{supp}(X_i^{S}) = \mathbb{R}^{T \times T}$. Wherever $\xi^{S}$ has full rank, condition L2.1 is satisfied for all $k \in S$, identifying $\beta^{S}(X_i = \xi)$ on a dense subset of $\operatorname{supp}(X_i)$. Substep 2 then uses lemma 5 and assumption A13.3 to identify $\beta^{S}(X_i = \xi)$ everywhere.

Example 14. Triangular dependence: This example again resembles example 8, but uses a more strictly triangular dependence structure, allows one more coefficient to mean-depend on each regressor, and requires stricter support conditions. This nests the dependence restrictions of Graham and Powell (2012). Each step of the algorithm applies lemma 2's condition L2.1 on a potentially thin set and extrapolates using lemma 6 to everywhere identify some regressors' CAPEs.
The first step does this for a set of $T$ coefficients that all mean-depend on each other's regressors, again combining both of the earlier-discussed potential sources of irregularity.

Theorem 15. $\beta(X_i = \xi)$ is identified for all $\xi \in \operatorname{supp}(X_i)$ under the following assumptions:
A15.1. $\operatorname{supp}(X_i^{\{1,\dots,K-1\}}) = \mathbb{R}^{T \times (K-1)}$. $X_i^{\{K\}}$ is nonstochastic and equals $1_T$.
A15.2. For all $k \in \{1,\dots,K\}$, $C_k \subseteq \{\max(1, k-T),\dots,K\}$.
A15.3. $\beta(X_i = \xi)$ is everywhere continuous in $\xi$.

The elements of the dependence matrix created by assumption A15.2 are unrestricted above and on the main diagonal, as well as in the $T-1$ elements immediately below the main diagonal in each column. Other elements are all 0. The dependence matrix with the greatest total degree of dependence is as follows for the case with $K = 5$, $T = 3$:

$$
\begin{array}{c|ccccc}
 & X_i^{\{1\}} & X_i^{\{2\}} & X_i^{\{3\}} & X_i^{\{4\}} & X_i^{\{5\}} \\ \hline
\beta_i^{\{1\}} & 1 & 1 & 1 & 1 & 1 \\
\beta_i^{\{2\}} & 1 & 1 & 1 & 1 & 1 \\
\beta_i^{\{3\}} & 1 & 1 & 1 & 1 & 1 \\
\beta_i^{\{4\}} & 0 & 1 & 1 & 1 & 1 \\
\beta_i^{\{5\}} & 0 & 0 & 1 & 1 & 1
\end{array}
$$

Compared to the first dependence matrix illustrated in example 8, this matrix has one more 1 in each column. Compared to the dependence restrictions of Graham and Powell (2012), where up to $T$ coefficients have unrestricted heterogeneity and the others are mean-independent of the regressors, this dependence matrix has ones in more than $T$ rows. The dependence matrix for Graham and Powell (2012)'s dependence restrictions is as follows, again for $K = 5$, $T = 3$:

$$
\begin{array}{c|ccccc}
 & X_i^{\{1\}} & X_i^{\{2\}} & X_i^{\{3\}} & X_i^{\{4\}} & X_i^{\{5\}} \\ \hline
\beta_i^{\{1\}} & 1 & 1 & 1 & 1 & 1 \\
\beta_i^{\{2\}} & 1 & 1 & 1 & 1 & 1 \\
\beta_i^{\{3\}} & 1 & 1 & 1 & 1 & 1 \\
\beta_i^{\{4\}} & 0 & 0 & 0 & 0 & 0 \\
\beta_i^{\{5\}} & 0 & 0 & 0 & 0 & 0
\end{array}
$$

The algorithm in this case works by applying lemma 2's condition L2.1 repeatedly to subsets of the coefficients where the regressors of coefficients outside that set are held equal to $0_T$.

In step 1, substep 1 uses the moment condition with $a(\xi) \equiv E[y_i \mid X_i = \xi]$, $S \equiv \{K-T+1,\dots,K\}$, and $\Xi = \operatorname{supp}(X_i \mid X_i^{S^c} = 0_{T \times (K-T)})$. In this region,
$$E[y_i \mid X_i = \xi] = \xi^{S} \beta^{S}(X_i = \xi) \quad \text{for all } \xi \in \Xi,$$
as all regressors not indexed in $S$ are zero. $X_i^{S}$ is a $T \times T$ matrix, and its support, conditional on $X_i^{S^c} = 0_{T \times (K-T)}$, is the same as its unconditional support. For any $\xi_0 \in \Xi$ with nonsingular $\xi_0^{S}$, condition L2.1 is satisfied for every $k \in S$, identifying every element of $\beta^{S}(X_i = \xi)$ on a dense subset of $\Xi$. The algorithm's substep 2 then uses lemma 5 and assumption A15.3 to identify $\beta^{S}(X_i = \xi)$ everywhere in $\Xi$. Finally, substep 3 identifies $\beta^{S}(X_i = \xi)$ everywhere in $\operatorname{supp}(X_i)$ by using lemma 6 and assumption A15.2's implication that coefficients in $\beta_i^{S}$ do not mean-depend on $X_i^{S^c}$.

The next step repeats this process, but instead using the moment condition with $a(\xi) \equiv E[y_i \mid X_i = \xi] - \xi^{\{K-T+1,\dots,K\}} \beta^{\{K-T+1,\dots,K\}}(X_i = \xi)$, $S \equiv \{1,\dots,K-T\}$, and $\Xi = \operatorname{supp}(X_i \mid X_i^{\{1,\dots,K-T-1\}} = 0_{T \times (K-T-1)})$. In this region,
$$E[y_i \mid X_i = \xi] - \xi^{\{K-T+1,\dots,K\}} \beta^{\{K-T+1,\dots,K\}}(X_i = \xi) = \xi^{\{K-T\}} \beta^{\{K-T\}}(X_i = \xi) \quad \text{for all } \xi \in \Xi.$$
We use lemma 2 to identify $\beta^{\{K-T\}}(X_i = \xi)$ everywhere in $\Xi$ where $\xi^{\{K-T\}} \ne 0_T$. Lemma 5 and assumption A15.3 then identify $\beta^{\{K-T\}}(X_i = \xi)$ everywhere in $\Xi$ where $\xi^{\{K-T\}} = 0_T$. Lemma 6 then uses A15.2's implication that $\beta_i^{\{K-T\}}$ does not mean-depend on $X_i^{\{1,\dots,K-T-1\}}$ to identify $\beta^{\{K-T\}}(X_i = \xi)$ everywhere.

The algorithm proceeds similarly through the remaining coefficients, working through them in reverse order. In each step, it first uses lemma 2 to identify one regressor's CAPEs wherever possible on the set where the regressors of the other yet-unidentified coefficient expectations are equal to zero. It then fills in any gaps in this region using lemma 5, and finally extrapolates these local identification results to the entirety of $\operatorname{supp}(X_i)$ through lemma 6.

15 Estimation

We propose to use standard linear regression techniques to estimate and perform inference on a projection of the DGP presented in Section 13 onto a flexible parametric submodel that we refer to as the projection model.
The projection model is formed by interacting each regressor with either a set of individual-specific dummy variables or a fixed, finite basis expansion of the regressors that its coefficients depend on. This model respects the dependence restrictions made about the DGP and can be structured to nest the DGP as long as each coefficient is either mean-dependent on every regressor or mean-dependent only on regressors with finitely many points of support. When, instead, there is a coefficient that mean-depends on a proper subset of the regressors and one of those regressors has infinitely many points of support, the use of fixed basis expansions means that the projection model may be misspecified unless the researcher makes further assumptions about the functional form of $\beta(X_i = \chi)$.

To distinguish the regressors and coefficients in the projection model from those in the DGP, we refer to these objects in the projection model as technical regressors and projection coefficients. We reserve the unmodified terms regressor and coefficient to refer to objects in the DGP.

Even when misspecified, the projection model is structured to respect any dependence restrictions made about the DGP to identify the CAPEs and APEs of interest. In estimating the projection of a model where parameters of interest are identified onto a flexible submodel, we hope that the identified parameters are usefully approximated by corresponding pseudo-parameters in the projection model. This is much in the same spirit as the common practice of using a presumably misspecified linear probability model in a binary choice setting, with the hope that the misspecification will not substantially bias the estimated APEs.

This approach can alternatively be thought of as truncating a nonparametric series estimator such that the number of technical regressors does not increase with the sample size.
While a typical series estimator asymptotically decreases bias to zero by adding progressively more technical regressors to a projection model as the sample size increases, our flexible parametric approach keeps the set of technical regressors fixed at a set that the researcher believes will reduce bias enough to make the pseudo-parameters good enough approximations for practical purposes.

While it appears possible to develop fully nonparametric series estimators of $\beta(X_i = \chi)$ under more general conditions, doing so is complicated by the potential for irregular estimation and an imposing curse of dimensionality. To see the latter point, note that $\beta^{\{k\}}(X_i^{C_k} = \chi^{C_k})$ is a nonparametric function of every element of a $T \times |C_k|$ matrix. Even when $T = 3$ and a single coefficient mean-depends on just four continuously distributed regressors, we have to cope with a series expansion in twelve variables. This problem may be surmountable through regularization or variable selection techniques like those in Chernozhukov et al. (2015) and Zhu and Bradic (2018), and Chen and Liao (2014) treat sieve estimation of potentially irregular parameters, but developing a combined method for this setting is nontrivial. We reserve this topic for future research and proceed with our parametric projection model.

In the projection model, any regressor whose coefficient mean-depends on every regressor may be interacted with individual-specific dummy variables to estimate individual-specific projection coefficients. When the intercept is such a regressor, this corresponds to including fixed effects in the projection model. Other coefficients' conditional mean functions are approximated in the projection model by linear combinations of time-invariant basis functions of the regressors they mean-depend on.
These basis functions are interacted with their respective coefficients' regressors to form technical regressors in the projection model.

When a coefficient mean-depends only on regressors with finite support, these basis functions can consist of indicator functions for each point in those regressors' (joint) support. If all the coefficients that are not given individual-specific projection coefficients satisfy this criterion and use this basis, the projection model nests the DGP. A researcher may alternatively choose to risk misspecification and use a different, smaller set of basis functions if this set is too large for practical purposes.

When one of these coefficients instead mean-depends on an infinitely-supported regressor, the researcher must choose a finite set of basis functions. Likely candidates include (functions of) means, (co)variances, higher moments, and rates of growth of the regressors it mean-depends on. When some of these regressors are infinitely supported and others are finitely supported, these basis functions may include the indicators for each point of the finitely supported regressors' support, as discussed above, as well as their interactions with other basis functions in the infinitely-supported regressors.

Researchers may also elect to use these finite basis expansions instead of individual-specific projection coefficients for coefficients that mean-depend on all regressors. This could be to facilitate a particular interpretation of the projection model or to force the projection model to respect continuity assumptions that affect identification.

We estimate the common (non-individual-specific) projection coefficients by projecting out the technical regressors with individual-specific projection coefficients and regressing the residuals of the outcome on the residuals of the remaining technical regressors.
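The partialling-out step just described can be illustrated with a short numerical sketch. This is not the dissertation's code: the function name `partial_out_ols` and the simulated data are hypothetical, and $Q(M_i)$ is assumed here to be the usual annihilator matrix $I_T - M_i M_i^{+}$.

```python
import numpy as np

def partial_out_ols(y_blocks, M_blocks, W_blocks):
    """OLS of y on W after projecting out each unit's own block M_i.

    y_blocks[i] has shape (T,), M_blocks[i] (T, L_M), W_blocks[i] (T, L_W).
    """
    WtQW = 0.0
    WtQy = 0.0
    for y_i, M_i, W_i in zip(y_blocks, M_blocks, W_blocks):
        # Assumed definition: Q(M_i) = I - M_i M_i^+ annihilates range(M_i).
        Q_i = np.eye(M_i.shape[0]) - M_i @ np.linalg.pinv(M_i)
        WtQW = WtQW + W_i.T @ Q_i @ W_i
        WtQy = WtQy + W_i.T @ Q_i @ y_i
    return np.linalg.solve(WtQW, WtQy)

# Hypothetical example: unit-specific intercepts (M_i is a column of ones)
# and one common slope equal to 2, with no noise.
rng = np.random.default_rng(0)
N, T = 50, 4
M_blocks = [np.ones((T, 1)) for _ in range(N)]
W_blocks = [rng.normal(size=(T, 1)) for _ in range(N)]
y_blocks = [rng.normal() + 2.0 * W_i[:, 0] for W_i in W_blocks]
theta_hat = partial_out_ols(y_blocks, M_blocks, W_blocks)
```

Because the individual-specific regressors enter block-diagonally, the projection can be done unit by unit, which avoids forming the full $NT \times NL_M$ matrix.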
While the individual-specific projection coefficients are not identified, we can consistently estimate their means or their subpopulation-specific means using a separate estimation procedure based on the mean group estimator of Pesaran and Smith (1995). Very similar estimators are discussed in Arellano and Bonhomme (2012) (page 1004) and Wooldridge (2010) (page 381).

We also provide means to cope with cases where projection coefficients are not uniquely defined, either in the population or in a finite sample. This non-uniqueness arises when some of the projection coefficients are simply not identified or when technical regressors are collinear in finite samples. This may often be the case when coefficients depend on discrete regressors, particularly when not all CAPEs are identified in the DGP. In example 10, we saw in the model with essential heterogeneity and fixed effects that CAPEs are not identified for regressor-specific subpopulations of stayers. Similar and more complex patterns of nonidentification can obtain with discrete regressors under different dependence restrictions, as continuity assumptions cannot fill in gaps left by the identification algorithm's substep 1 and the subpopulations with nonidentified parameters can have positive probability. When any projection coefficients are not defined, we remove a subset of the technical regressors with undefined projection coefficients from the projection model. The remaining technical regressors from this group effectively control for themselves as well as the removed subset, but do not have interpretable projection coefficients.

We begin by building matrices of technical regressors. The first contains the technical regressors with individual-specific projection coefficients. Define the index set of coefficients that mean-depend on every regressor to be $\Gamma \subseteq \{1,\dots,K\}$.
We define the $T \times L_M$ individual-specific technical regressor matrices $M_i \equiv X_i^{\Gamma}$ (the columns of $X_i$ indexed by $\Gamma$) for all $i \in \{1,\dots,N\}$ and create the block diagonal matrix $M$ where the $i$th diagonal block equals $M_i$:

$$
M \equiv \begin{bmatrix} M_1 & 0_{T \times L_M} & \cdots \\ 0_{T \times L_M} & M_2 & \cdots \\ \vdots & \vdots & \ddots \end{bmatrix}.
$$

For the matrix of technical regressors with common projection coefficients, define $\Gamma^c = \{1,\dots,K\} \setminus \Gamma$. For each $k \in \Gamma^c$, define the $1 \times P_k$ vector of basis functions $f_p^k$ for coefficient $k$ as
$$r_i^k \equiv \begin{bmatrix} f_1^k(X_i^{C_k}) & \cdots & f_{P_k}^k(X_i^{C_k}) \end{bmatrix}.$$
These basis functions should be chosen to keep the approximation error $\min_{b \in \mathbb{R}^{P_k}} E[(E[\beta_i^{\{k\}} \mid X_i] - r_i^k b)^2]$ as small as possible. Form the matrix of technical regressors corresponding to the $k$th regressor and coefficient as $R_i^k \equiv X_i^{\{k\}} r_i^k$. Stack these horizontally to obtain the $T \times L_W$ matrix of technical regressors for the $i$th individual:
$$W_i \equiv \begin{bmatrix} R_i^{k_1} & \cdots & R_i^{k_{|\Gamma^c|}} \end{bmatrix},$$
where the indices $k_1$ through $k_{|\Gamma^c|}$ are the elements of $\Gamma^c$. Stacking these matrices vertically over individuals gives us $W \equiv [W_1' \ \cdots \ W_N']'$. We similarly define $y \equiv [y_1' \ \cdots \ y_N']'$.

With these technical regressor matrices formed, the baseline projection model is
$$y_i = M_i \theta_i^{M} + W_i \theta^{W} + v_i, \quad \text{or} \quad y = M \theta^{M} + W \theta^{W} + v, \qquad (28)$$
where $\theta^{M} \equiv [\theta_1^{M\prime} \ \cdots \ \theta_N^{M\prime}]'$ is the vector of individual-specific projection coefficients, $\theta^{W}$ is the vector of common projection coefficients, $v \equiv [v_1' \ \cdots \ v_N']'$ is the vector of projection errors, and the model satisfies
$$M_i' v_i = 0_{L_M} \ \text{for all } i \in \{1,\dots,N\} \quad \text{and} \quad E[W_i' v_i] = 0_{L_W}. \qquad (29)$$

As discussed above, however, these equations may not uniquely define $\theta^{M}$ and $\theta^{W}$. In a finite sample, the least squares estimate of a projection coefficient will be uniquely defined if and only if its technical regressor is nonzero and linearly independent of the other technical regressors. This non-uniqueness will obtain with probability 1 when the projection coefficient is not identified.
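To make the construction of the technical regressors $R_i^k$ concrete, the following sketch builds one such block for a single individual. The particular basis (a constant plus the within-unit mean, maximum, and standard deviation) is only an assumed example of a finite basis, and the function names are hypothetical.

```python
import numpy as np

def basis_row(X_dep):
    """1 x P_k row of time-invariant basis functions of the T x |C_k| block X_dep.

    Assumed basis: a constant, then column means, maxima and standard deviations.
    """
    return np.concatenate(
        ([1.0], X_dep.mean(axis=0), X_dep.max(axis=0), X_dep.std(axis=0))
    )

def technical_regressors(x_k, X_dep):
    """R_i^k: the T-vector regressor x_k times the 1 x P_k basis row (T x P_k)."""
    return np.outer(x_k, basis_row(X_dep))

# Hypothetical unit with T = 3 whose k-th coefficient mean-depends on one regressor.
x_k = np.array([1.0, 2.0, 3.0])
X_dep = np.array([[0.0], [1.0], [2.0]])
R_k = technical_regressors(x_k, X_dep)
```

Stacking several such blocks side by side with `np.hstack` would give the matrix $W_i$ for that individual.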
As such, in estimation, we will flag projection coefficients with undefined estimates as being potentially nonidentified and drop a subset of them from the regression to make OLS possible. We now detail this process.

Defining the $NT \times (NL_M + L_W)$ matrix $Z \equiv [M \ \ W]$, the set of solutions to the least squares problem $\arg\min_{\hat\theta} \lVert y - Z\hat\theta \rVert$ is
$$\{\hat\theta \mid \hat\theta = Z^{+} y + (I_{NL_M + L_W} - Z^{+} Z)a, \ a \in \mathbb{R}^{NL_M + L_W}\}.$$
All solutions in this set have the same $k$th element if and only if the $k$th row of $(I_{NL_M + L_W} - Z^{+} Z)$ is a zero vector, and so the $k$th projection coefficient has a uniquely defined estimate under and only under this circumstance.

Define $\hat{\dot W} \equiv [\hat{\dot W}_1' \ \cdots \ \hat{\dot W}_N']'$ and $\hat{\tilde W} \equiv [\hat{\tilde W}_1' \ \cdots \ \hat{\tilde W}_N']'$ to be the submatrices of $W$ containing the columns whose projection coefficients' estimates are and are not defined, respectively (see footnote 30). We likewise define $\dot W \equiv [\dot W_1' \ \cdots \ \dot W_N']'$ and $\tilde W \equiv [\tilde W_1' \ \cdots \ \tilde W_N']'$ to be the submatrices of $W$ containing the columns whose projection coefficients are and are not identified. We will treat $\hat{\dot W}$ and $\hat{\tilde W}$ as estimates of their population counterparts, $\dot W$ and $\tilde W$. $\hat{\tilde W}$, $\hat{\dot W}$, $\dot W$, and $\tilde W$ respectively have $L_{\hat{\tilde W}}$, $L_{\hat{\dot W}}$, $L_{\dot W}$, and $L_{\tilde W}$ columns.

In the same way, we define $\hat{\dot M}_i$ and $\hat{\tilde M}_i$ to contain the columns of $M_i$ with and without defined projection coefficient estimates and define their population counterparts $\dot M_i$ and $\tilde M_i$ to contain the columns of $M_i$ with and without identified projection coefficients. As before, we define $\hat{\dot M}$, $\hat{\tilde M}$, $\dot M$, and $\tilde M$ to be the block diagonal matrices containing their respective $i$-subscripted matrices in their $i$th diagonal block. $\dot M_i$, $\tilde M_i$, $\dot M$, $\tilde M$, $\hat{\dot M}_i$, $\hat{\tilde M}_i$, $\hat{\dot M}$, and $\hat{\tilde M}$ respectively have $L_{\dot M_i}$, $L_{\tilde M_i}$, $L_{\dot M}$, $L_{\tilde M}$, $L_{\hat{\dot M}_i}$, $L_{\hat{\tilde M}_i}$, $L_{\hat{\dot M}}$, and $L_{\hat{\tilde M}}$ columns. Columns in all of these matrices derived from $W$ or $M$ have the same order as they had in $W$ or $M$.
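The uniqueness criterion above (a zero row of $I - Z^{+}Z$) is straightforward to check numerically. The sketch below is illustrative only: the function name `defined_coefficients`, the tolerance, and the small example matrix are assumptions, not part of the original text.

```python
import numpy as np

def defined_coefficients(Z, tol=1e-10):
    """Boolean mask that is True where the k-th row of I - Z^+ Z is numerically zero,
    i.e. where every least squares solution shares the same k-th element."""
    null_proj = np.eye(Z.shape[1]) - np.linalg.pinv(Z) @ Z
    return np.linalg.norm(null_proj, axis=1) < tol

# Hypothetical design: the second and third columns are identical, so only
# the first coefficient has a uniquely defined least squares estimate.
Z = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 1.0],
    [1.0, 1.0, 1.0],
    [2.0, 0.0, 0.0],
])
mask = defined_coefficients(Z)
```

Columns flagged `False` correspond to the tilde-accented matrices in the text, and those flagged `True` to the dotted ones.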
Define the index sets of identified projection coefficients in $W$ and $M$ as $\mathcal{I}_W$ and $\mathcal{I}_M$ and the index sets of the projection coefficients with defined estimates as $\hat{\mathcal{I}}_W$ and $\hat{\mathcal{I}}_M$. We assume through the remainder of this section that all identified projection coefficients asymptotically have probability 1 of having defined estimates. That is, as $N \to \infty$,
$$P[\hat{\mathcal{I}}_W = \mathcal{I}_W, \ \hat{\mathcal{I}}_M = \mathcal{I}_M] \to 1. \qquad (30)$$
This implies that $P[\hat{\dot W} = \dot W, \ \hat{\tilde W} = \tilde W, \ \hat{\dot M} = \dot M, \ \hat{\tilde M} = \tilde M] \to 1$.

We now pick the subset of the technical regressors with undefined projection coefficient estimates to retain in our final regression model. This will let us work with a technical regressor matrix that has full column rank. For this, we retain a subset of the technical regressors with undefined projection coefficient estimates such that the retained technical regressors are linearly independent and contain the dropped ones in their span. In the final regression model, the retained technical regressors effectively act as controls for both themselves and the dropped technical regressors.

To easily apply laws of large numbers and similar results to the resulting estimator, we require a deterministic process to build the submatrices of retained technical regressors. For concreteness, we use the following procedure, which steps through the columns of a given matrix, retaining each if and only if it is linearly independent of all the columns already retained by earlier steps and the columns of any other regressor matrices. We demonstrate with $\hat{\tilde W}$.

Footnote 30: When none or all of the projection coefficients are defined, one of these matrices is not defined, or is in some sense empty. The same will apply to similar matrices defined below. We believe it will be clear in what follows how to ignore such a matrix, and omit detailed discussion of this point to avoid a longer exposition and more cumbersome notation.
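Using notation like that we introduced in Section 13.1 for submatrices of $X_i$, define $\hat{\underline{\tilde W}}$ to be the submatrix $\hat{\tilde W}^{S}$, where $S$ is the index set formed by the following iterative procedure:

1. Initialize $j = 0$ and $S_0 \equiv \{\}$.

2. Increase $j$ by 1 and define
$$
S_j \equiv \begin{cases} S_{j-1} \cup \{j\} & \text{if } \hat{\tilde W}^{\{j\}} \notin \mathrm{range}([\hat{\tilde W}^{S_{j-1}} \ \ \hat{\dot W} \ \ M]) \\ S_{j-1} & \text{if } \hat{\tilde W}^{\{j\}} \in \mathrm{range}([\hat{\tilde W}^{S_{j-1}} \ \ \hat{\dot W} \ \ M]) \end{cases}
$$

3. If $j = L_{\hat{\tilde W}}$, terminate this procedure and define $S \equiv S_j$. Otherwise, repeat steps 2 and 3.

We similarly define $\underline{\tilde W}$, $\hat{\underline{\tilde M}}_i$, and $\underline{\tilde M}_i$ by applying the same procedure to $\tilde W$, $\hat{\tilde M}_i$, and $\tilde M_i$ instead of $\hat{\tilde W}$. As before, we define $\underline{\tilde M}$ and $\hat{\underline{\tilde M}}$ as the block diagonal matrices with $\underline{\tilde M}_i$ and $\hat{\underline{\tilde M}}_i$ as their $i$th diagonal blocks.

Finally, we define the full matrix of technical regressors derived from $W$ that we use in estimation as $\hat{\underline W} \equiv [\hat{\dot W} \ \ \hat{\underline{\tilde W}}]$ and its population analogue as $\underline W \equiv [\dot W \ \ \underline{\tilde W}]$. We also define their individual-level blocks such that $\hat{\underline W} = [\hat{\underline W}_1' \ \cdots \ \hat{\underline W}_N']'$ and $\underline W = [\underline W_1' \ \cdots \ \underline W_N']'$. When $W$'s projection coefficient estimates are all defined we simply have $\hat{\underline W} = W$, and when all of its projection coefficients are identified we have $\underline W = W$. For regressors with individual-specific projection coefficients, we likewise define $\hat{\underline M}_i \equiv [\hat{\dot M}_i \ \ \hat{\underline{\tilde M}}_i]$ and $\underline M_i \equiv [\dot M_i \ \ \underline{\tilde M}_i]$. $\hat{\underline M}_i = M_i$ and $\underline M_i = M_i$, respectively, when none of these regressors' projection coefficients have undefined estimates or are nonidentified. As before, we define $\hat{\underline M}$ and $\underline M$ to be the block-diagonal matrices with $i$th diagonal block equal to $\hat{\underline M}_i$ and $\underline M_i$. $\hat{\underline W}$, $\underline W$, $\hat{\underline M}_i$, $\underline M_i$, $\hat{\underline M}$, and $\underline M$ respectively have $L_{\hat{\underline W}}$, $L_{\underline W}$, $L_{\hat{\underline M}_i}$, $L_{\underline M_i}$, $L_{\hat{\underline M}}$, and $L_{\underline M}$ columns.

To help the reader keep all this notation straight, we briefly summarize our use of accents. We dot a matrix, as in $\dot W$ and $\hat{\dot W}$, to denote the submatrix containing the technical regressors which have identified projection coefficients or defined projection coefficient estimates. Tildes (i.e. $\tilde W$, $\hat{\tilde W}$) denote the rest of the technical regressors in the matrix - those with nonidentified projection coefficients or undefined estimates. Underbars (i.e. $\underline W$, $\hat{\underline W}$) denote matrices of technical regressors where all but a spanning subset of the columns with nonidentified projection coefficients or undefined estimates have been removed. Hats (i.e. $\hat{\dot W}$, $\hat{\underline W}$) indicate that any dots, tildes, and underbars refer to technical regressors with (un)defined projection coefficient estimates, rather than (non)identified projection coefficients.

We estimate the final, reduced projection model
$$y_i = \hat{\underline M}_i \theta_i^{\hat{\underline M}} + \hat{\underline W}_i \theta^{\hat{\underline W}} + v_i, \quad \text{or} \quad y = \hat{\underline M} \theta^{\hat{\underline M}} + \hat{\underline W} \theta^{\hat{\underline W}} + v, \qquad (31)$$
with
$$\hat{\underline M}_i' v_i = 0_{L_{\hat{\underline M}_i}} \ \text{for all } i \in \{1,\dots,N\} \quad \text{and} \quad E[\hat{\underline W}_i' v_i] = 0_{L_{\hat{\underline W}}}. \qquad (32)$$
When $\hat{\underline M} = \underline M$ and $\hat{\underline W} = \underline W$, this model is
$$y_i = \underline M_i \theta_i^{\underline M} + \underline W_i \theta^{\underline W} + v_i, \quad \text{or} \quad y = \underline M \theta^{\underline M} + \underline W \theta^{\underline W} + v,$$
with
$$\underline M_i' v_i = 0_{L_{\underline M_i}} \ \text{for all } i \in \{1,\dots,N\} \quad \text{and} \quad E[\underline W_i' v_i] = 0_{L_{\underline W}}.$$

Here, we can separate out the projection coefficients in this reduced model into the blocks $\theta^{\underline W} \equiv [\theta^{\dot W\prime} \ \ \theta^{\underline{\tilde W}\prime}]'$ and $\theta_i^{\underline M} \equiv [\theta_i^{\dot M\prime} \ \ \theta_i^{\underline{\tilde M}\prime}]'$, and likewise $\theta^{\hat{\underline W}} \equiv [\theta^{\hat{\dot W}\prime} \ \ \theta^{\hat{\underline{\tilde W}}\prime}]'$ and $\theta_i^{\hat{\underline M}} \equiv [\theta_i^{\hat{\dot M}\prime} \ \ \theta_i^{\hat{\underline{\tilde M}}\prime}]'$. The superscripts on these vectors indicate which matrix of technical regressors they multiply. Very importantly, $\theta^{\underline{\tilde W}}$ and $\theta_i^{\underline{\tilde M}}$ (and their finite-sample counterparts) multiply technical regressors whose projection coefficients in the full projection model (equations 28 and 29) are undefined. As a result, these cannot be interpreted in terms of the CAPEs or APEs of $X_i$. Rather, these are essentially uninterpretable coefficients on a set of control variables. The elements of $\theta^{\dot W}$ and $\theta_i^{\dot M}$, however, are equal to their technical regressors' projection coefficients in the full projection model. The parameters in this final model that we want to estimate and perform inference on are therefore (functions of) the elements of $\theta^{\dot W}$ and $\theta_i^{\dot M}$ that correspond to identified parameters of interest in the DGP.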
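The sequential column-retention procedure described earlier in this section (initialize an empty index set, then keep column $j$ only if it lies outside the range of the columns kept so far together with the other regressor matrices) can be sketched as follows. This is a hedged illustration: the function name `retained_columns`, the rank tolerance, and the toy matrices are hypothetical, and membership in the range is tested here by checking whether appending the column raises the numerical rank.

```python
import numpy as np

def retained_columns(W_tilde, others, tol=1e-10):
    """Indices S of the columns of W_tilde kept by the sequential rule.

    `others` holds the columns of the remaining regressor matrices and is
    assumed to have at least one column.
    """
    kept = []
    for j in range(W_tilde.shape[1]):
        base = np.hstack([W_tilde[:, kept], others])
        candidate = np.hstack([base, W_tilde[:, [j]]])
        # Column j is outside range(base) iff appending it raises the rank.
        if np.linalg.matrix_rank(candidate, tol=tol) > np.linalg.matrix_rank(base, tol=tol):
            kept.append(j)
    return kept

# Hypothetical example with four observations.
ones = np.ones((4, 1))
trend = np.array([1.0, 2.0, 3.0, 4.0])
W_tilde = np.column_stack([
    np.ones(4),                      # already spanned by `others`
    trend,                           # new direction: kept
    2.0 * trend + 1.0,               # spanned by trend and ones: dropped
    np.array([1.0, 0.0, 0.0, 0.0]),  # new direction: kept
])
S = retained_columns(W_tilde, ones)
```

Because the loop visits columns in a fixed order, the retained set is a deterministic function of the data, which is the property the text requires.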
That w e only consider DGP parameters kno wn to b e iden tied is also v ery imp ortan t; a researc her ma y not w an t to base iden tication of DGP parameters on the (lik ely somewhat arbitrary) functional form assumptions implied b y the c hoice of basis functions for the pro jection mo del. In this case, they should not use pro jection co ecien ts - ev en iden tied ones - to estimate corresp onding DGP parameters if iden tication of those parameters has not b een demonstrated through the iden tication algorithm or some other means. T o estimate ^ W , w e use the OLS estimator ^ ^ W after partialing out the tec hnical regressors with individual-sp ecic pro jection co ecien ts. Because w e assume that equation 30 holds and therefore that P [ ^ W = W ]! 1, w e in terpret ^ _ W as an estimator of all of W ’s iden tied pro jection co ecien ts. 74 Theorem 16. Dene the estimator ^ ^ W [ ^ W 0 Q(M) ^ W ] 1 ^ W 0 Q(M)y: Consider the fol lowing assumptions: A16.1. E[jjQ(M i )X i jj 8 F ]<1, E[jj i jj 8 ]<1, and E[jjQ(M i )e i jj 4 ]<1. A16.2. Equation 11 holds. A16.3. Equations 31 and 32 hold. A16.4. E[kQ(M i )W i k 4 F ]<1, E[kW i k 4 F ]<1 and E[kM i k 4 F ]<1. A16.5. E[W 0 i Q(M i )W i ] is p ositive denite. Under the ab ove assumptions, as N!1, we have 31 : C16.1. Dening V E[W 0 i Q(M i )W i ] 1 E[W 0 i v i v 0 i W i ]E[W 0 i Q(M i )W i ] 1 ; we have p N(^ W W ) d !N(0 L W ;V ): C16.2. Dene the r esiduals ^ v i Q(M i )y i Q(M i ) ^ W i ^ ^ W . Dening ^ V [ 1 N N X i=1 ^ W 0 i Q(M i ) ^ W i ] 1 [ 1 N N X i=1 ^ W 0 i ^ v i ^ v 0 i ^ W i ][ 1 N N X i=1 ^ W 0 i Q(M i ) ^ W i ] 1 ; we have ^ V p !V: The pro of can b e found in the app endix. The next theorem presen ts an estimator for the mean of one of the individual-sp ecic pro jection co e- cien ts o v er a sp ecied (sub)p opulation. 
This can b e used to estimate (the pro jection mo del’s appro ximation to) the APEs of X i o v er subp opulations in whic h these pro jection co ecien ts and their DGP coun terparts are iden tied. These (appro ximate) APEs ha v e the form P N i=1 l 0 i M i P N i=1 l 0 i li , where l i is a selection v ector that se- lects the elemen t of M i corresp onding to the pro jection co ecien t of in terest when individual i is in the subp opulation of in terest. Otherwise, l i =0 L M i . In b oth cases, w e refer to l i as a selection v ector. 31 When ^ W 6= W , the dierences ^ ^ W W and ^ VV (in the denition of the probabilit y limit in conclusion C16.2) ma y b e b et w een incompatible matrices. The follo wing conclusions hold with ^ ^ W W replaced b y an arbitrary v ector in R L W and ^ VV replaced b y an arbitrary matrix in R L W L W when ^ W 6= W . 75 In estimation, w e instead use the selection v ector ^ l i to select the elemen t of ^ ^ M i corresp onding to the pro jection co ecien t of in terest when that pro jection co ecien t’s estimate is dened and individual i is in the subp opulation of in terest. When ^ M = M , the iden tied pro jection co ecien ts all ha v e dened estimates, and ^ l i =l i . The estimator uses theorem 16 to estimate the common pro jection co ecien ts, subtracts out the p ortion of y thereb y explained, p erforms individual-sp ecic regressions of the remainder on ^ M i , and then tak es simple a v erages of the resulting estimates of M i . Theorem 17. Dene ^ l i l( ^ M ;X i ), wher e l(:;:) is a function which takes values in the set of binary L ^ M i -ve ctors with at most one element e qual to one. Dene l i l( M ;X i ). Dene y y ^ W ^ ^ W and y i y i ^ W i ^ ^ W , with ^ ^ W as dene d in the or em 16. With ^ ^ M i ^ l 0 i [ ^ M 0 i ^ M i ] 1 ^ M 0 i y i , ^ 1 N P N i=1 ^ l 0 i ^ ^ M i , and ^ p 1 N P N i=1 ^ l 0 i ^ l i , dene the estimator ^ ^ ^ p : Consider the fol lowing assumptions: A17.1. 
E[jjQ(M i )X i jj 8 F ]<1, E[jj i jj 8 ]<1, and E[jjQ(M i )e i jj 4 ]<1. A17.2. Equation 11 holds. A17.3. Equations 31 and 32 hold. A17.4. E[kQ(M i )W i k 4 F ]<1, E[kW i k 4 F ]<1 and E[kM i k 4 F ]<1. A17.5. E[W 0 i Q(M i )W i ] is p ositive denite. A17.6. E[k[M 0 i M i ] 1 M i k 4 F ]1 andk M i k 2 <1. A17.7. P [l 0 i l i = 1]> 0. Dene E[l 0 i M i jl 0 i l i = 1], E[l 0 i M i ]; i l 0 i M i , and ^ i (l 0 i ^ ^ M i ^ ): Under the ab ove assumptions, as N!1, we have: C17.1. With U 1 P [l 0 i l i = 1] 2 1 A 1 A 2 U 1 1 A 1 A 2 0 ; U 1 2 6 4 E[ 2 i ] E[ i v 0 i W i ] E[ i v 0 i W i ] 0 E[W 0 i v i v 0 i W i ] 3 7 5; A 1 E[l 0 i [M 0 i M i ] 1 M 0 i W i ], and A 2 E[W 0 i Q(M i )W i ] 1 ; 76 we have p N(^ ) d !N(0;U): C17.2. With ^ U 1 ^ p 2 1 ^ A 1 ^ A 2 ^ U 1 1 ^ A 1 ^ A 2 0 ; ^ U 1 2 6 4 E[^ 2 i ] E[^ i ^ v 0 i ^ W i ] E[^ i ^ v 0 i ^ W i ] 0 E[ ^ W 0 i ^ v i ^ v 0 i ^ W i ] 3 7 5; ^ A 1 1 N N X i=1 l 0 i [ ^ M 0 i ^ M i ] 1 ^ M 0 i ^ W i , ^ A 2 [ 1 N N X i=1 ^ W 0 i Q(M i ) ^ W i ] 1 ; and ^ v i dene d as in the or em 16 , we have ^ U p !U: The pro of can b e found in the app endix. 16 Empirical Application In this section, w e revisit F razis and Lo ew enstein (2005)’s analysis of the eect of job training on w ages. Using panel data from the 1979 cohort of the National Longitudinal Surv ey of Y outh (NLSY), they regress log w ages on v arious transformations of cum ulativ e hours sp en t in job training, a set of con trol v ariables, and xed eects for jobs. Here and b elo w, jobs refer to emplo y er-emplo y ee matc hes. They nd b oth a larger training eect and a higher lik eliho o d of b eing trained for managerial and professional jobs than for blue-collar jobs, whic h is consisten t with the presence of essen tial heterogeneit y . 
We apply our identification and estimation results to data from the NLSY's 1997 cohort (NLSY97) in a similar analysis where we allow and test for essential heterogeneity by allowing mean dependence of the training coefficient on the training regressor.

The NLSY97 is a longitudinal survey of 8,984 people living in the United States who were aged between 12 and 16 on December 31, 1996. 6,748 respondents come from a nationally representative sample, while the remaining 2,236 come from an oversample of Black and Hispanic people. Interviews were conducted in annual survey rounds from 1997 to 2011 and biennially from 2013 onwards. As of this writing, data is available from 18 rounds, the last dating from 2017. The survey includes questions about every job held during or between survey rounds, as well as about formal education and other occupational training. This other training includes GED programs, apprenticeships, vocational schooling (e.g. for business, cosmetology, or nursing), correspondence courses, community college, and company training, among other sources. The training questionnaire asks respondents a detailed set of questions about each training program they have attended since their last interview, including questions about duration; program completion; and which, if any, employer provided or helped pay for the training.

In keeping with Frazis and Loewenstein (2005), we focus on training that was provided or at least partially paid for by the employer, and use as our main regressor the cube root of cumulative hours spent in this type of training at the present job. We refer to this variable from here on out simply as training. Our outcome variable is the log of total hourly compensation in 1982-1984 US dollars, which we refer to from here on out as compensation.
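As a small illustration of the two variable definitions above, the sketch below computes the training regressor and the outcome for one job. The function names are hypothetical, and deflating by a price index normalized to 100 in the base period is an assumption; the text only states that compensation is expressed in 1982-1984 dollars.

```python
import math
from itertools import accumulate

def training_regressor(hours_by_round):
    """Cube root of cumulative employer-provided training hours at one job,
    one value per survey round."""
    return [total ** (1.0 / 3.0) for total in accumulate(hours_by_round)]

def log_real_compensation(nominal_hourly, price_index):
    """Log hourly compensation in base-period dollars.

    Assumes price_index equals 100 in the base period (hypothetical deflator).
    """
    return math.log(nominal_hourly * 100.0 / price_index)

# Hypothetical job observed in three rounds with 0, 8 and 19 new training hours.
training = training_regressor([0, 8, 19])
compensation = log_real_compensation(20.0, 200.0)
```

Cumulating before taking the cube root keeps the regressor weakly increasing over time within a job, which matches the monotonicity noted in the text.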
Compensation includes tips, bonuses, commissions, and similar forms of monetary compensation beyond base salaries or wage rates. An observation in our data is defined by a job and a survey round in which related data was reported.

Frazis and Loewenstein (2005)'s control variables center around a cubic polynomial in tenure and these tenure variables' interactions with a set of other covariates. Because training, like tenure, is (weakly) monotonically increasing over time, the two are correlated and it is crucial to adequately model tenure effects. To control for unobservables impacting both training and the tenure coefficients, Frazis and Loewenstein (2005)'s preferred specifications include the cube root of within-job final period training among the variables interacted with tenure. Their preferred specifications also include among the regressors a lead and a lag of training, in order to account for delayed and anticipatory training effects when compensation is not continuously adjusted. Analyzed by the identification algorithm, the inclusion of these regressors substantially increases the value of $T$ required to identify the training CAPEs in specifications that additionally feature essential heterogeneity in tenure and training. Accordingly, and because Frazis and Loewenstein (2005) find that these regressors have small and statistically insignificant coefficients in their preferred specifications, we omit these regressors from our models.

We now analyze a similar model with the identification algorithm to determine the extent of heterogeneity that we can allow with a given $T$. We consider a model where the only regressors are an intercept term, $X_i^{ncpt}$; a $p$-term polynomial expansion in tenure, denoted by $X_i^{tnr}, \dots, X_i^{tnr^p}$; a training variable, $X_i^{trn}$; a time-varying control variable, $X_i^{tvc}$; and a time-invariant control variable, $X_i^{tic}$.
We denote their coefficients and (C)APEs with corresponding superscripts. We treat $[X_i^{tnr} \ \ X_i^{trn}]$ as having support over the set of $T \times 2$ matrices with nonnegative elements that are weakly monotonically increasing from top to bottom within each column.

We impose dependence restrictions that leave the intercept, tenure, and time-invariant control's coefficients unrestricted. The restrictions do, however, restrict the training coefficient to mean-depend only on training, the time-invariant control, and the intercept, and restrict the time-varying control's coefficient to mean-depend only on the time-invariant control and the intercept. These restrictions accommodate fixed effects, essential heterogeneity in both training and tenure, dependence between the controls and the training and tenure coefficients, and the concern that tenure coefficients may covary with training levels. We also assume that $\beta^{trn}(X_i = \chi)$ is continuous in $[\chi^{trn} \ \ \chi^{tnr} \ \cdots \ \chi^{tnr^p}]$ at all points where $\chi^{trn} \neq 0_T$ (see footnote 32) and that $\beta^{tnr}(X_i = \chi), \dots, \beta^{tnr^p}(X_i = \chi)$ are everywhere continuous in all continuously distributed regressors. Our dependence matrix is

$$
\begin{array}{c|ccccccc}
 & X_i^{tvc} & X_i^{trn} & X_i^{tnr} & \cdots & X_i^{tnr^p} & X_i^{ncpt} & X_i^{tic} \\ \hline
\beta_i^{tvc} & 0 & 0 & 0 & 0 & 0 & 1 & 1 \\
\beta_i^{trn} & 0 & 1 & 0 & 0 & 0 & 1 & 1 \\
\beta_i^{tnr} & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
\beta_i^{tnr^p} & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
\beta_i^{ncpt} & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
\beta_i^{tic} & 1 & 1 & 1 & 1 & 1 & 1 & 1
\end{array}
$$

Before applying the algorithm, it is useful to start by noting that $\beta^{tic}(X_i)$ is not identified, as $X_i^{ncpt}$ and $X_i^{tic}$ are everywhere collinear. This simply reflects the well-known fact that any time-invariant regressors in the common fixed effects model are absorbed into the fixed effects. Further, this means that the range of any matrix that contains $X_i^{ncpt}$ and $X_i^{tic}$ as columns will be the same as the range of the same matrix with one of those columns removed.
Because $\beta_i^{ncpt}$ mean-depends on all regressors, this effectively lets us ignore the mean-dependence of $\beta_i^{tic}$ on other regressors when applying lemma 2 or its corollaries to identify those regressors' CAPEs. In a model with additional time-invariant regressors, this discussion applies to them, as well.

We start the algorithm by identifying $\beta^{tvc}(X_i = \chi_0)$ for all $\chi_0 \in \mathrm{supp}(X_i)$ using corollary 4's condition LC4.1, provided that for all $\chi_0 \in \mathrm{supp}(X_i)$, $X_i^{tvc} - \chi_0^{tvc}$ has support over at least $p + 2$ linearly independent vectors, conditional on the values of the other regressors. This requires that $T \geq p + 2$, but is otherwise a weak support restriction. In a model with additional time-varying control variables that obey the same dependence restrictions as $\beta_i^{tvc}$, we use the same argument to identify them in this step, as well.

In the algorithm's second step, we identify all of the training and tenure coefficients using lemma 2's condition L2.1. For training, $\beta^{trn}(X_i = \chi_0)$ is identified everywhere that $\chi_0^{trn} \notin \mathrm{range}([\chi_0^{tnr} \ \cdots \ \chi_0^{tnr^p} \ \ \chi_0^{ncpt}])$. Provided that $T \geq p + 2$, this identifies $\beta^{trn}(X_i = \chi_0)$ for all $\chi_0 \in \mathrm{supp}(X_i)$, except on a set with Lebesgue measure 0. Except at points where $\chi_0^{trn} = 0_T$, the remaining values are identified by our continuity assumption, and we can therefore identify the APE of training for the subpopulation of movers - jobs where training increased over time. Similar arguments identify $\beta_i^{tnr}, \dots, \beta_i^{tnr^p}$ everywhere.

Ultimately, we require $T \geq p + 2$ for identification of the movers' training APE. We can therefore admit a more flexible tenure polynomial only at the price of a higher required $T$. Frazis and Loewenstein (2005) use a cubic polynomial, which would require us to set $T = 5$.

Footnote 32: Assuming continuity everywhere would allow us to identify the effect of training for jobs that offer no training, but this seems like too great a demand to place on this assumption.
The majority of jobs in our data do not persist over 5 survey rounds, so we use a quadratic polynomial in tenure and set $T = 4$. Compared to Frazis and Loewenstein (2005), who do not balance their panel or trim tenure outliers, and whose maximum tenure is 22.77 years, the range of tenure over which we fit this polynomial is substantially narrower, with a maximum of 8.85 years. We expect that a quadratic function is sufficiently flexible over this smaller range.

It is worth noting that this identification result still holds if we allow the training coefficients to depend on tenure. We refrain from doing so, however, because this would create a set of $p + 2$ coefficients that all mean-depend on their own and each others' regressors. When $T = p + 2$, this risks irregular identification, as discussed in Section 14.5.1. Any projection model we use that satisfies the assumptions of theorem 16 has regularly identified projection coefficients, and the improvement would appear to come from the projection model's functional form. As we want to minimize the influence of these ad-hoc functional form assumptions on our results, we avoid estimating any models that are likely to be irregularly identified.

Noting that the removal of any dependencies (i.e. the replacement of any ones in the dependence matrix with zeros) leaves any already-identified CAPEs still identified, we estimate seven variations of this model with different dependence restrictions.
As time-invariant controls, we have dummies for occupation type (managerial and professional; service; sales and office; and blue-collar), ASVAB score[33] (zero-filled where missing), a dummy for whether ASVAB data is missing, a male dummy, age at start of job, dummies for race (Black; Hispanic; mixed race; non-Black and non-Hispanic), and two sets of time dummies: one for survey round and another for the year for which compensation is (potentially retrospectively) reported. All seven models' projection models include fixed effects and the following time-varying controls: years of education; a union dummy (zero-filled where missing); a dummy for whether union data is missing; a part-time dummy; an ever-married dummy; an enrolled-in-school dummy; and non-job training, which is the cube root of cumulative training hours not provided by the present job (cumulative over all time, including before the present job started).

[33] The ASVAB is the Armed Services Vocational Aptitude Battery, a test administered by the US military to evaluate potential recruits. The NLSY97 administered a version of the ASVAB to its respondents in 1997 and 1998.

In model 1, all non-intercept coefficients are assumed to be mean-independent of all regressors. In model 2, we allow the tenure coefficients to mean-depend on all the time-invariant and time-varying control variables, except for the time dummies. Our projection model for this case interacts the tenure variables with each of the time-invariant regressors; the mean, maximum, and standard deviation of non-job training; and the mean and maximum of each of the other time-varying controls. Here and in the other models, these means, maxima, and standard deviations are taken within job.
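The within-job summaries that enter the projection models as interaction terms can be sketched as follows. This is a minimal illustration, not the estimation code: the field names (`job_id`, `tenure`, `tenure_sq`, `training`), the helper names, and the use of the sample (n − 1) standard deviation are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean, stdev


def within_job_summaries(rows, var):
    """Return {job_id: {'mean': ..., 'max': ..., 'sd': ...}} for one variable."""
    values = defaultdict(list)
    for row in rows:
        values[row["job_id"]].append(row[var])
    return {
        job: {
            "mean": mean(v),
            "max": max(v),
            # Sample standard deviation (an assumption); 0.0 for one-row jobs.
            "sd": stdev(v) if len(v) > 1 else 0.0,
        }
        for job, v in values.items()
    }


def add_tenure_interactions(rows, var, tenure_vars=("tenure", "tenure_sq")):
    """Return copies of rows with tenure x within-job-summary interactions."""
    summaries = within_job_summaries(rows, var)
    out = []
    for row in rows:
        new_row = dict(row)
        for stat, value in summaries[row["job_id"]].items():
            for tv in tenure_vars:
                new_row[f"{tv}_x_{var}_{stat}"] = row[tv] * value
        out.append(new_row)
    return out


# Hypothetical two-job panel with two observations per job.
panel = [
    {"job_id": "a", "tenure": 1.0, "tenure_sq": 1.0, "training": 1.0},
    {"job_id": "a", "tenure": 2.0, "tenure_sq": 4.0, "training": 3.0},
    {"job_id": "b", "tenure": 1.0, "tenure_sq": 1.0, "training": 0.0},
    {"job_id": "b", "tenure": 2.0, "tenure_sq": 4.0, "training": 0.0},
]
augmented = add_tenure_interactions(panel, "training")
```

Because the summaries are constant within a job, each interaction varies over time only through the tenure terms.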
Model 3 relaxes model 2's assumptions to also allow mean-dependence of the tenure coefficients on training and adds to the projection model interactions between the tenure variables and the mean, maximum, and standard deviation of training. This model is the closest to the authors' preferred models in Frazis and Loewenstein (2005). Model 4 relaxes model 3 to allow mean-dependence of the tenure coefficients on tenure as well, and adds interactions between the tenure variables and the mean, maximum, and standard deviation of tenure. This is intended to capture essential heterogeneity in tenure. Model 5 adds to model 4 by also allowing the training coefficients to mean-depend on the time-invariant covariates and adds their interactions with training to the projection model. To incorporate essential heterogeneity in training, model 6 instead allows the training coefficients to mean-depend on training, but no controls. In the projection model, it interacts training with the mean, maximum, and standard deviation of training. Finally, model 7 allows all the dependencies allowed by models 1 through 6 and includes all of their interaction terms in the projection model. The training CAPEs are identified for the entire population in models where the training coefficient does not mean-depend on training, and instead for the subpopulation of movers in models with essential heterogeneity in training.

To arrive at our final dataset, we drop observations from military jobs, unpaid jobs at family businesses, self-employed jobs, internships, and non-traditional jobs such as those at temp agencies. We drop observations where respondents are younger than 18, where (non-logged) compensation is less than 1 or greater than 100, and from jobs where the maximum absolute difference of (log) compensation from its within-job mean is greater than 1.5.
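The age and compensation rules just described can be sketched as a two-pass filter. This is an illustration only: the field names (`job_id`, `age`, `comp`) and the function name are hypothetical, and the sketch assumes the within-job mean of log compensation is computed after the observation-level drops, an ordering the text does not specify.

```python
import math
from collections import defaultdict


def apply_sample_filters(rows, min_age=18, comp_low=1.0, comp_high=100.0,
                         max_log_dev=1.5):
    """Drop observations by age and compensation, then drop whole jobs whose
    log compensation strays too far from its within-job mean."""
    # Pass 1: observation-level filters.
    kept = [r for r in rows
            if r["age"] >= min_age and comp_low <= r["comp"] <= comp_high]
    # Pass 2: job-level filter on log compensation (assumed to run after pass 1).
    logs = defaultdict(list)
    for r in kept:
        logs[r["job_id"]].append(math.log(r["comp"]))
    bad_jobs = set()
    for job, vals in logs.items():
        center = sum(vals) / len(vals)
        if max(abs(v - center) for v in vals) > max_log_dev:
            bad_jobs.add(job)
    return [r for r in kept if r["job_id"] not in bad_jobs]


# Hypothetical observations.
raw = [
    {"job_id": "a", "age": 25, "comp": 10.0},
    {"job_id": "a", "age": 26, "comp": 12.0},
    {"job_id": "a", "age": 27, "comp": 11.0},
    {"job_id": "b", "age": 30, "comp": 5.0},
    {"job_id": "b", "age": 31, "comp": 200.0},   # compensation above 100
    {"job_id": "c", "age": 40, "comp": 2.0},
    {"job_id": "c", "age": 41, "comp": 90.0},    # log deviation above 1.5
    {"job_id": "d", "age": 17, "comp": 10.0},    # respondent under 18
    {"job_id": "d", "age": 18, "comp": 10.0},
]
clean = apply_sample_filters(raw)
```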
We also drop observations with missing values of regressors other than union status and ASVAB score. The methods developed in this paper are for balanced panels, and we simplify their application to the NLSY97 by dropping all remaining observations other than the first 4 observations from each job with at least 4 observations. We then additionally drop jobs where the tenure variables are not strictly monotonically increasing over time[34] and jobs whose maximum values of training, tenure, or non-job training are above these variables' respective 99th percentiles. Our resulting analytic sample has 26,760 observations from 6,690 distinct jobs held by 4,823 distinct survey respondents who lived in 4,079 distinct households in 1997. A more detailed discussion of our data-cleaning process can be found in the appendix.

[34] This can occur when an individual leaves and later re-enters the same job.

Table 10 presents results from fitting the seven projection models and calculating the estimated cumulative effect of 60 hours of training, averaged over the subpopulation with identified training CAPEs. 60 hours is the median amount of final-period training for jobs with non-zero training in the data used by Frazis and Loewenstein (2005), and we use this number to simplify comparisons between their and our estimates, though the corresponding median in our dataset is 48 hours. Table 10's second through sixth columns indicate the tenure and training coefficients' dependencies. $C_{tnr}$ and $C_{trn}$ are the index sets of regressors that the tenure and training coefficients mean-depend on, respectively. The control variables that the tenure coefficient mean-depends on in models 2 through 7 are denoted by Ctrls (All) in column 2, and the time-invariant controls that the training coefficient depends on in models 5 and 7 are denoted by Ctrls (TI) in column 5.
Columns 7 through 9 present each model's estimate and that estimate's differences with the estimates from models 1 and 7. These are denoted by Estimate, $\Delta^{trn}_{hmg}$, and $\Delta^{trn}_{het}$, respectively. Standard errors are in parentheses and are clustered by the household that respondents lived in in 1997. Column 10 presents p-values from Wald tests for heterogeneous CAPEs, with the null being that all projection coefficients on the interactions with training equal 0. Column 11 presents p-values from Wald tests for the existence of any nonzero training CAPEs. These test the null hypothesis that all projection coefficients on training and its interactions equal 0.

Model 1, with no correlated random slope coefficients, has our largest point estimate, .0424, which is highly statistically significant. Introducing dependence of the tenure coefficients on the controls in model 2 and then on training in model 3, however, reduces this estimate substantially to a statistically significant .0284 and then to a totally insignificant .0062. Model 4 introduces essential heterogeneity in tenure, and finds no meaningful effect on the estimate. As models 5 through 7 add dependence of the training coefficients on the time-invariant controls and on training itself, the effect estimate changes little, never exceeding .0109, and never achieving statistical significance at the 5% level. It is also worth noting that the estimates' standard errors increase in every case with the addition of more technical regressors to the projection model, rising from .0105 in model 1 to .0182 in model 7.

Despite finding no evidence for a nonzero mean training effect in the models where tenure coefficients are allowed to mean-depend on training, column 11's Wald test finds statistically significant nonzero training CAPEs at the 1.5% level for models 5 and 7, where the training coefficient is allowed to mean-depend on the time-invariant controls.
Likewise, column 10's test for CAPE heterogeneity is significant in both cases at the 1% level.

The impact of allowing correlated slope heterogeneity is clear in column 8, where we see that the differences between the estimates from model 1 and models 2 through 5 are significant, both statistically at the 5% level and economically. The differences between model 1 and models 6 and 7 are not statistically significant, though the point estimate of the difference is economically quite substantial. In column 9 we see that no model's estimate is statistically significantly different from model 7's, seemingly due to its larger standard error.

Table 10: Estimated Average Effects of 60 Hours of Training in Subpopulation with Identified Training CAPEs

          C_tnr contains                 C_trn contains
Model     Ctrls (All)  Training  Tenure  Ctrls (TI)  Training  Estimate    Δ^trn_hmg   Δ^trn_het   p^intr_trn  p^all_trn
1 (Hmg)   ✗            ✗         ✗       ✗           ✗         0.0424***   -           0.0323      -           0.0001
                                                               (0.0105)    -           (0.0177)
2         ✓            ✗         ✗       ✗           ✗         0.0284**    -0.0140***  0.0184      -           0.0073
                                                               (0.0106)    (0.0024)    (0.0176)
3         ✓            ✓         ✗       ✗           ✗         0.0062      -0.0362**   -0.0039     -           0.6962
                                                               (0.0158)    (0.0137)    (0.0130)
4         ✓            ✓         ✓       ✗           ✗         0.0080      -0.0343*    -0.0020     -           0.6099
                                                               (0.0158)    (0.0137)    (0.0127)
5         ✓            ✓         ✓       ✓           ✗         0.0088      -0.0335*    -0.0012     0.0093      0.0149
                                                               (0.0172)    (0.0152)    (0.0137)
6         ✓            ✓         ✓       ✗           ✓         0.0109      -0.0315     0.0009      0.4357      0.5679
                                                               (0.0178)    (0.0169)    (0.0031)
7 (Het)   ✓            ✓         ✓       ✓           ✓         0.0101      -0.0323     -           0.0060      0.0093
                                                               (0.0182)    (0.0177)    -

Standard errors, in parentheses, are clustered by household at initial interview. All models are estimated on the analytic sample described in the main text, with 26,760 observations and 4,079 households. C_tnr and C_trn respectively refer to the index sets of regressors that the tenure and training coefficients mean-depend on. ✓ and ✗ respectively indicate the presence and absence of the column's regressors in C_tnr or C_trn. Δ^trn_hmg refers to the difference between the estimates in the present row and model 1. Δ^trn_het refers to the difference between the estimates in the present row and model 7. All models include job-specific fixed effects and the time-varying controls listed in the main text. p^intr_trn denotes the p-value for the Wald test of the joint hypothesis that all projection coefficients on the interactions with training equal 0. p^all_trn denotes the p-value for the Wald test of the joint hypothesis that all projection coefficients on training and its interactions equal 0. In the columns of estimates and their differences, starred significance levels are as follows: with p the p-value for the Wald test of a difference from 0, *** indicates p ≤ .001, ** indicates p ∈ (.001, .01], * indicates p ∈ (.01, .05]. Estimates are computed using the felm function from the R package lfe.

Given the evidence of heterogeneity in the training effects, it is interesting to see the apparently small impact of allowing essential heterogeneity in either training or tenure, as respectively seen in the differences between models 4 and 6 and models 3 and 4. Wald tests (not reported in table 10 for model 7) for the joint significance of the terms capturing essential heterogeneity in training (the interactions between training and basis functions in training) in models 6 and 7 respectively have p-values of .4357 and .3152. Corresponding Wald tests (also not in table 10) for the joint significance of the essential heterogeneity terms for tenure in models 4 through 7, however, find these terms to be highly significant, with p-values smaller than .0033 in all three models. There is evidence, then, of essential heterogeneity in tenure, though its omission from the model does not apparently bias training effect estimates.

In specifications similar to our models 2 and 3, Frazis and Loewenstein (2005) respectively find training effects of .0476 and .0285.[35] These specifications include lags and leads of training, but in specifications without these terms, their estimates are virtually unchanged.[36] Our corresponding estimates are substantially smaller, but the differences are not statistically significant (p-values are .1394 for model 2 and .2170 for model 3). The differences between the estimates could derive from changes in the effect of training over time or from our analysis being confined to jobs with at least 4 observations.

[35] See table 7 of Frazis and Loewenstein (2005). These estimates are for their outlier-omitted sample, which drops observations with training in the top 1 percent of the training distribution, much as we do.
[36] See their table 5.

Ultimately, it is clear that accounting properly for correlated random coefficients, even in control variables such as tenure, can have a large impact on estimated mean effects. Despite the allowance for correlated random training coefficients and for essential heterogeneity in tenure having a negligible effect on our estimated mean effects, we do find evidence for both types of heterogeneity and conclude that models 5 or 7 are to be preferred in this setting.

17 Conclusion

We have developed identification and estimation results for the linear panel with correlated random coefficients and fewer time periods than regressors. Identification is based on exclusion restrictions that constrain each coefficient's mean, conditional on the regressors, to be a function of a coefficient-specific subset of the regressors. These identifying assumptions, which we call dependence restrictions, contribute to the literature by allowing different coefficients to mean-depend on different regressors. We demonstrate that this added flexibility enables identification of conditional and unconditional APEs under a substantially wider variety of dependence restrictions than previously considered, including those implied by an extension of the essential heterogeneity model of Heckman et al. (2006) that we develop for panel data. These results follow from our identification algorithm, an intuitive proof strategy which researchers can use to determine identification in settings other than those treated here. We propose estimation and inference procedures based on the use of flexible parametric projection models to approximate the DGP's nonparametric conditional APE functions. Applying the identification algorithm and the estimation and inference procedures to a wage equation to estimate the effect of job training, we find strong evidence of correlated coefficient heterogeneity.

Conspicuous among avenues for further research is the development of fully nonparametric estimation and inference procedures, and we plan to continue research in this direction. Further work applying the algorithm to study identification in additional common empirical settings, particularly those involving discrete regressors, is also a clear next step. Extending the algorithm to identify (conditional) APEs of functionally dependent regressors under weaker conditions than those presented here is clearly possible and would also make a useful future contribution. Lastly, extending our identification and estimation results for use with unbalanced panels would substantially enhance this research's empirical relevance.

References

Alattar, L., M. Messel, and D. Rogofsky (2018, May). An introduction to the Understanding America Study internet panel. Social Security Bulletin 78 (2), 13–28.
Andrews, I., J. H. Stock, and L. Sun (2019). Weak instruments in instrumental variables regression: Theory and practice. Annual Review of Economics 11 (1), 727–753.
Angrisani, A., A. Kapteyn, E. Meijer, and H. W. Saw (2014). Recruiting an additional sample for an existing panel. Presented at the Panel Survey Methods Workshop, Ann Arbor, MI.
Angrist, J. D.
and J.-S. Pischke (2009, June). Mostly Harmless Econometrics: An Empiricist's Companion. Number 8769 in Economics Books. Princeton University Press.
Arellano, M. and S. Bonhomme (2012). Identifying distributional characteristics in random coefficients panel data models. The Review of Economic Studies 79 (3), 987–1020.
Battaglia, M. P., D. Izrael, D. C. Hoaglin, and M. R. Frankel (2009, June). Practical considerations in raking survey data. Survey Practice 2 (5).
Blumberg, S. J., J. V. Luke, and M. L. Cynamon (2004, July). Has cord-cutting cut into random-digit-dialed health surveys? The prevalence and impact of wireless substitution. In S. B. Cohen and J. M. Lepkowski (Eds.), Proceedings of the Eighth Conference on Health Survey Research Methods, Hyattsville, MD, pp. 137–148. National Center for Health Statistics.
Bonhomme, S. and E. Manresa (2015). Grouped patterns of heterogeneity in panel data. Econometrica 83 (3), 1147–1184.
Breitung, J. and N. Salish (2021). Estimation of heterogeneous panels with systematic slope variations. Journal of Econometrics 220 (2), 399–415. Annals Issue: Celebrating 40 Years of Panel Data Analysis: Past, Present and Future.
Chamberlain, G. (1992). Efficiency bounds for semiparametric regression. Econometrica 60 (3), 567–596.
Chang, L. and J. A. Krosnick (2009). National surveys via RDD telephone interviewing versus the internet: Comparing sample representativeness and response quality. The Public Opinion Quarterly 73 (4), 641–678.
Chen, X. and Z. Liao (2014). Sieve M inference on irregular parameters. Journal of Econometrics 182 (1), 70–86.
Cheng, X., F. Schorfheide, and P. Shao (2019, November). Clustering for multi-dimensional heterogeneity.
Chernozhukov, V., C. Hansen, Y. Liao, and Y. Zhu (2019). Inference for heterogeneous effects using low-rank estimation of factor slopes.
Chernozhukov, V., C. Hansen, and M. Spindler (2015).
Valid post-selection and post-regularization inference: An elementary, general approach. Annual Review of Economics 7 (1), 649–688.
Couper, M. P. (2011). The future of modes of data collection. The Public Opinion Quarterly 75 (5), 889–908.
Frazis, H. and M. A. Loewenstein (2005). Reexamining the returns to training: Functional form, magnitude, and interpretation. The Journal of Human Resources 40 (2), 453–476.
Goodman-Bacon, A. (2019, July). Difference-in-differences with variation in treatment timing.
Graham, B. S. and J. L. Powell (2012). Identification and estimation of average partial effects in "irregular" correlated random coefficient panel data models. Econometrica 80 (5), 2105–2152.
Groves, R. M. and S. G. Heeringa (2006). Responsive design for household surveys: Tools for actively controlling survey errors and costs. Journal of the Royal Statistical Society. Series A (Statistics in Society) 169 (3), 439–457.
Hahn, J. and H. R. Moon (2010). Panel data models with finite number of multiple equilibria. Econometric Theory 26 (3), 863–881.
Hansen, B. E. (2019, August). Econometrics.
Hays, R. D., H. Liu, and A. Kapteyn (2015, Sep). Use of Internet panels to conduct surveys. Behavior Research Methods 47 (3), 685–90.
Heckman, J. J., S. Urzua, and E. Vytlacil (2006, August). Understanding Instrumental Variables in Models with Essential Heterogeneity. The Review of Economics and Statistics 88 (3), 389–432.
Khan, S. and E. Tamer (2010). Irregular identification, support conditions, and inverse weight estimation. Econometrica 78 (6), 2021–2042.
Knauper, B. (1999). The impact of age and education on response order effects in attitude measurement. The Public Opinion Quarterly 63 (3), 347–370.
Laage, L. (2019, January). A correlated random coefficient panel model with time-varying endogeneity.
Lewbel, A. (2019, December). The Identification Zoo: Meanings of Identification in Econometrics.
Journal of Economic Literature 57 (4), 835–903.
Lin, C.-C. and S. Ng (2012, August). Estimation of Panel Data Models with Parameter Heterogeneity when Group Membership is Unknown. Journal of Econometric Methods 1 (1), 1–14.
Meijer, E. (2014). Effective sample size metric for sequential importance sampling. mimeo, USC-CESR.
Montiel Olea, J. L. and C. Pflueger (2013, July). A Robust Test for Weak Instruments. Journal of Business & Economic Statistics 31 (3), 358–369.
Mundlak, Y. (1961). Empirical production function free of management bias. American Journal of Agricultural Economics 43 (1), 44–56.
Mundlak, Y. (1978a). Models with variable coefficients: Integration and extension. Annales de l'insee (30/31), 483–509.
Mundlak, Y. (1978b). On the pooling of time series and cross section data. Econometrica 46 (1), 69–85.
Murtazashvili, I. and J. M. Wooldridge (2008). Fixed effects instrumental variables estimation in correlated random coefficient panel data models. Journal of Econometrics 142 (1), 539–552.
Nagar, A. L. (1959). The bias and moment matrix of the general k-class estimators of the parameters in simultaneous equations. Econometrica 27 (4), 575–595.
Newey, W. K. (1990). Semiparametric efficiency bounds. Journal of Applied Econometrics 5 (2), 99–135.
Pesaran, M. H. (2015). Time Series and Panel Data Econometrics. Number 9780198759980 in OUP Catalogue. Oxford University Press.
Pesaran, M. H. and R. Smith (1995, July). Estimating long-run relationships from dynamic heterogeneous panels. Journal of Econometrics 68 (1), 79–113.
Ravallion, M. (2009). Evaluation in the practice of development. World Bank Research Observer 24 (1), 29–53.
Remillard, M. L., K. M. Mazor, S. L. Cutrona, J. H. Gurwitz, and J. Tjia (2014). Systematic Review of the Use of Online Questionnaires of Older Adults. Journal of the American Geriatrics Society 62 (4), 696–705.
Rivers, D. (2013, November). Comment.
Journal of Survey Statistics and Methodology 1 (2), 111–117.
Sanderson, E. and F. Windmeijer (2016). A weak instrument F-test in linear IV models with multiple endogenous variables. Journal of Econometrics 190 (2), 212–221.
Sarafidis, V. and N. Weber (2015). A partially heterogeneous framework for analyzing panel data. Oxford Bulletin of Economics and Statistics 77 (2), 274–296.
Schonlau, M., A. van Soest, A. Kapteyn, and M. Couper (2009, February). Selection Bias in Web Surveys and the Use of Propensity Scores. Sociological Methods & Research 37 (3), 291–318.
Schwarz, N. and S. Sudman (Eds.) (1992). Context Effects in Social and Psychological Research. New York: Springer-Verlag.
Skeels, C. L. and F. Windmeijer (2018, November). On the Stock-Yogo tables. Econometrics 6 (4), 44.
Srinivasan, T. (1970). Approximations to finite sample moments of estimators whose exact sampling distributions are unknown. Econometrica 38 (3), 533–41.
Staiger, D. and J. H. Stock (1997). Instrumental variables regression with weak instruments. Econometrica 65 (3), 557–586.
Stock, J. and M. Yogo (2005). Identification and Inference for Econometric Models: Essays in Honor of Thomas Rothenberg, Chapter Testing for Weak Instruments In Linear IV Regression, pp. 80–108. Cambridge: Cambridge University Press.
Stock, J. H., J. H. Wright, and M. Yogo (2002, October). A Survey of Weak Instruments and Weak Identification in Generalized Method of Moments. Journal of Business & Economic Statistics 20 (4), 518–529.
Su, L., Z. Shi, and P. C. B. Phillips (2016). Identifying latent structures in panel data. Econometrica 84 (6), 2215–2264.
Sun, Y. (2005, August). Estimation and inference in panel structure models.
Tourangeau, R., J. M. Brick, S. Lohr, and J. Li (2016, March). Adaptive and responsive survey designs: a review and assessment.
Journal of the Royal Statistical Society: Series A (Statistics in Society) 180 (1), 203–223.
Valliant, R., J. A. Dever, and F. Kreuter (2013). Practical Tools for Designing and Weighting Survey Samples. New York: Springer.
Wagner, J., B. T. West, N. Kirgis, J. M. Lepkowski, W. G. Axinn, and S. K. Ndiaye (2012). Use of Paradata in a Responsive Design Framework to Manage a Field Data Collection. Journal of Official Statistics 28 (4), 477–499.
Wang, W., P. C. B. Phillips, and L. Su (2018). Homogeneity pursuit in panel data models: Theory and application. Journal of Applied Econometrics 33 (6), 797–815.
Wooldridge, J. M. (2005). Fixed-effects and related estimators for correlated random-coefficient and treatment-effect panel data models. The Review of Economics and Statistics 87 (2), 385–390.
Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data, Volume 1 of MIT Press Books. The MIT Press.
Young, A. (2020, February). Consistency without inference: Instrumental variables in practical application.
Zhan, Z. (2017, May). Detecting weak identification by bootstrap.
Zhu, Y. and J. Bradic (2018). Significance testing in non-sparse high-dimensional linear models. Electronic Journal of Statistics 12 (2), 3312–3364.

Appendix to Chapter 1

Table 11: Self-reported Health With CPS Limited to HH Respondents

                    CPS     HRS                     UAS
                            Full    Phone   Person  unwgh   wgh0    wgh1    wgh2    wgh3    wgh4    wgh5
Excellent           0.149   0.094   0.097   0.091   0.108   0.102   0.098   0.111   0.099   0.116   0.102
Very good           0.277   0.321   0.328   0.315   0.354   0.327   0.334   0.353   0.328   0.363   0.333
Good                0.330   0.328   0.328   0.328   0.327   0.339   0.335   0.337   0.342   0.335   0.345
Fair                0.172   0.191   0.183   0.198   0.171   0.187   0.188   0.157   0.186   0.148   0.177
Poor                0.071   0.067   0.066   0.068   0.041   0.045   0.045   0.043   0.045   0.039   0.043
Mean abs. diff      -       0.025   0.024   0.026   0.030   0.030   0.031   0.033   0.031   0.036   0.030

See Table 2.
Table 12: Home Ownership

                              CPS     HRS                     UAS
                                      Full    Phone   Person  unwgh   wgh0    wgh1    wgh2    wgh3    wgh4    wgh5
N                             37,795  16,751  7,654   9,097   1,852   1,852   1,852   1,852   1,852   1,852   1,852
# Missing                     0       0       0       0       0       0       0       0       0       0       0
Proportion missing            0       0       0       0       0       0       0       0       0       0       0
Weighted proportion missing   0       0       0       0       0       0       0       0       0       0       0

Notes: wgh0, default UAS weights using gender, race, age, education, income, household size, census region, and urbanicity; wgh1, as wgh0 with finer age brackets; wgh2, as wgh0 without education; wgh3, as wgh0 without income; wgh4, as wgh0 without education and income; wgh5, as wgh0 without census region and urbanicity.

Table 13: Health Insurance Coverage

                              CPS     HRS                     UAS
                                      Full    Phone   Person  unwgh   wgh0    wgh1    wgh2    wgh3    wgh4    wgh5
N                             37,795  16,751  7,654   9,097   1,852   1,852   1,852   1,852   1,852   1,852   1,852
# Missing                     0       0       0       0       27      27      27      27      27      27      27
Proportion missing            0       0       0       0       0.015   0.015   0.015   0.015   0.015   0.015   0.015
Weighted proportion missing   0       0       0       0       0.015   0.011   0.010   0.009   0.011   0.009   0.010

See Table 12.

Table 14: Self-reported Health

                              CPS     HRS                     UAS
                                      Full    Phone   Person  unwgh   wgh0    wgh1    wgh2    wgh3    wgh4    wgh5
N                             37,795  16,751  7,654   9,097   1,852   1,852   1,852   1,852   1,852   1,852   1,852
# Missing                     0       14      6       8       3       3       3       3       3       3       3
Proportion missing            0       0.001   0.001   0.001   0.002   0.002   0.002   0.002   0.002   0.002   0.002
Weighted proportion missing   0       0.001   0.001   0.001   0.002   0.001   0.001   0.001   0.001   0.001   0.001

See Table 12.

Table 15: How Satisfied with Life

                              CPS     HRS                     UAS
                                      Full    Phone   Person  unwgh   wgh0    wgh1    wgh2    wgh3    wgh4    wgh5
N                             37,795  16,751  7,654   9,097   1,852   1,852   1,852   1,852   1,852   1,852   1,852
# Missing                     0       931     706     225     3       3       3       3       3       3       3
Proportion missing            0       0.056   0.092   0.025   0.002   0.002   0.002   0.002   0.002   0.002   0.002
Weighted proportion missing   0       0.050   0.084   0.020   0.002   0.001   0.001   0.001   0.001   0.001   0.001

See Table 12.
Figure 6: Predicted Mean Health Status by Age and Survey Mode, and CPS HH Respondent Status
[Figure: five panels (UAS, HRS-Phone, HRS-in-Person, CPS-HH-Resp, CPS-Non-HH-Resp); horizontal axis: age group, 55-59 through 75-79; vertical axis: predicted mean health status, 3 to 3.5.]

Figure 7: Predicted Probability of Choosing the First Option (Excellent Health) by Age, Survey Mode, and CPS HH Respondent Status
[Figure: five panels (UAS, HRS-Phone, HRS-in-Person, CPS-HH-Resp, CPS-Non-HH-Resp); horizontal axis: age group, 55-59 through 75-79; vertical axis: predicted probability, 0 to .2.]

Figure 8: Predicted Probability of Choosing the Last Option (Poor Health) by Age, Survey Mode, and CPS HH Respondent Status
[Figure: five panels (UAS, HRS-Phone, HRS-in-Person, CPS-HH-Resp, CPS-Non-HH-Resp); horizontal axis: age group, 55-59 through 75-79; vertical axis: predicted probability, 0 to .2.]

Appendix to Chapter 2

To derive the bounds stated in theorem 1, first consider the case with a wholly irrelevant instrument and perfectly correlated errors: $\pi = c = 0$ and $e = um$ for a fixed scalar $m \neq 0$. When this holds (and $z'Mu \neq 0$),

$$\hat\beta = \frac{z'My}{z'MY} = \frac{z'M(Y\beta + e)}{z'MY} = \beta + \frac{z'Me}{z'M(z\pi + u)} = \beta + m\,\frac{z'Mu}{z'Mu} = \beta + m.$$

Clearly, $P[\hat\beta < \beta] = 0$ when $m > 0$ and $P[\hat\beta < \beta] = 1$ when $m < 0$, so the bounds are simply $[0, 1] = [.5 - \Phi(0),\ .5 + \Phi(0)]$.
To develop the bounds when $\pi \neq 0$, it suffices to treat the case where $\pi > 0$. The bounds for the case where $\pi < 0$ follow immediately from the bounds for a reparameterization of the model that replaces $z\pi$ in the first stage with $\tilde z \tilde\pi$, where $\tilde z = -z$ and $\tilde\pi = -\pi > 0$. Because the 2SLS estimator is invariant to this reparameterization ($\frac{z'My}{z'MY} = \frac{-z'My}{-z'MY} = \frac{\tilde z'My}{\tilde z'MY}$) and the true value of $\beta$ is the same in both parameterizations, the bounds on $P[\hat\beta < \beta]$ under the reparameterization with a positive first-stage coefficient apply to both cases.

Assuming now that $\pi > 0$, we introduce the following function:

$$s(\beta_0) \equiv \frac{z'M(y - Y\beta_0)}{z'Mz} = \frac{z'My}{z'Mz} - \hat\pi\beta_0.$$

Three things are important to note about $s(\beta_0)$. First, when evaluated at $\beta$, we have $s(\beta) = \frac{z'Me}{z'Mz}$, and so $P[s(\beta) < 0] \to .5$ as $n \to \infty$, by assumption B.1. Second, $s(\hat\beta) = 0$,[37] and, third, $s(\beta_0)$ is monotonically decreasing in $\beta_0$ when $\hat\pi > 0$, monotonically increasing in $\beta_0$ when $\hat\pi < 0$, and constant when $\hat\pi = 0$. As a result,

$$\mathrm{sgn}(s(\beta_0)) = \begin{cases} \mathrm{sgn}(\hat\beta - \beta_0) & \text{if } \hat\pi > 0 \\ \mathrm{sgn}(z'My) & \text{if } \hat\pi = 0 \\ \mathrm{sgn}(\beta_0 - \hat\beta) & \text{if } \hat\pi < 0, \end{cases}$$

and this gives rise to the following cases:

[37] The numerator of $s(\hat\beta)$ is $z'M\left(y - Y\frac{z'My}{z'MY}\right) = z'My - z'MY\,\frac{z'My}{z'MY} = 0$.

$$\text{If } \hat\pi > 0, \text{ then } \begin{cases} \beta_0 < \hat\beta & \text{if } s(\beta_0) > 0 \\ \beta_0 = \hat\beta & \text{if } s(\beta_0) = 0 \\ \beta_0 > \hat\beta & \text{if } s(\beta_0) < 0. \end{cases} \tag{33}$$

$$\text{If } \hat\pi < 0, \text{ then } \begin{cases} \beta_0 < \hat\beta & \text{if } s(\beta_0) < 0 \\ \beta_0 = \hat\beta & \text{if } s(\beta_0) = 0 \\ \beta_0 > \hat\beta & \text{if } s(\beta_0) > 0. \end{cases} \tag{34}$$

With these results, we can obtain the following, where the $o(1)$ terms collect terms involving $P[\hat\pi = 0]$ and $P[\hat\beta = \beta \mid \hat\pi < 0]$:[38]

$$P[s(\beta) < 0] = P[s(\beta) < 0 \cap \hat\pi > 0] + P[s(\beta) < 0 \cap \hat\pi < 0] + P[s(\beta) < 0 \cap \hat\pi = 0] \tag{35}$$

$$= P[s(\beta) < 0 \mid \hat\pi > 0]P[\hat\pi > 0] + P[s(\beta) < 0 \mid \hat\pi < 0]P[\hat\pi < 0] + P[s(\beta) < 0 \mid \hat\pi = 0]P[\hat\pi = 0] \tag{36}$$

$$= P[\hat\beta < \beta \mid \hat\pi > 0]P[\hat\pi > 0] + P[\beta < \hat\beta \mid \hat\pi < 0]P[\hat\pi < 0] + o(1) \tag{37}$$

$$= P[\hat\beta < \beta \mid \hat\pi > 0]P[\hat\pi > 0] + \left(1 - P[\hat\beta < \beta \mid \hat\pi < 0] - P[\hat\beta = \beta \mid \hat\pi < 0]\right)P[\hat\pi < 0] + o(1) \tag{38}$$

$$= P[\hat\beta < \beta \mid \hat\pi > 0]P[\hat\pi > 0] + \left(1 - P[\hat\beta < \beta \mid \hat\pi < 0]\right)P[\hat\pi < 0] + o(1). \tag{39}$$

Equation 37 follows from equations 36, 33, and 34. With the above, we can move on to bounding $P[\hat\beta < \beta]$ by first noting the simple decomposition

$$P[\hat\beta < \beta] = P[\hat\beta < \beta \mid \hat\pi > 0]P[\hat\pi > 0] + P[\hat\beta < \beta \mid \hat\pi < 0]P[\hat\pi < 0] + o(1), \tag{40}$$

rearranging equation 39 into

$$P[\hat\beta < \beta \mid \hat\pi > 0]P[\hat\pi > 0] = P[s(\beta) < 0] - \left(1 - P[\hat\beta < \beta \mid \hat\pi < 0]\right)P[\hat\pi < 0] + o(1),$$

and substituting this and then $P[s(\beta) < 0] = .5 + o(1)$ into equation 40 to obtain:

$$P[\hat\beta < \beta] = .5 - P[\hat\pi < 0] + 2P[\hat\beta < \beta \mid \hat\pi < 0]P[\hat\pi < 0] + o(1). \tag{41}$$

[38] $P[\hat\pi = 0] \to 0$ by assumption B.2, and $P[\hat\beta = \beta \cap \hat\pi < 0] \leq P[\hat\beta = \beta] \to 0$ by assumption B.1.

Because $P[\hat\beta < \beta]$ is monotonically increasing in $P[\hat\beta < \beta \mid \hat\pi < 0]$, we can obtain upper and lower bounds on $P[\hat\beta < \beta]$ by evaluating this last expression with $P[\hat\beta < \beta \mid \hat\pi < 0]$ equal to its own bounds of 0 and 1. Doing so provides bounds of $.5 \pm P[\hat\pi < 0] + o(1)$, and setting the $o(1)$ term to its limit of zero provides the asymptotic bounds in part 1 of theorem 1.

To show these bounds' tightness, we show that they are attained when $e = um$ for some fixed scalar $m \neq 0$. This is done by showing that under these conditions $P[\hat\beta < \beta \mid \hat\pi < 0]$ does, in fact, attain its own bounds of 0 and 1, whereupon the discussion above implies that the bounds on $P[\hat\beta < \beta]$ are also reached.

$$P[\hat\beta < \beta \mid \hat\pi < 0] = P\left[\frac{z'My}{z'MY} < \beta \,\Big|\, \hat\pi < 0\right] \tag{42}$$

$$= P\left[\frac{z'MY\beta + z'Me}{z'MY} < \beta \,\Big|\, \hat\pi < 0\right] \tag{43}$$

$$= P\left[\frac{z'Me}{z'MY} < 0 \,\Big|\, \hat\pi < 0\right]. \tag{44}$$

Notice that $z'MY < 0$ when $\hat\pi = \frac{z'MY}{z'Mz} < 0$, so that the denominator of $\frac{z'Me}{z'MY}$ is negative. It follows that

$$P[\hat\beta < \beta \mid \hat\pi < 0] = P[z'Me > 0 \mid \hat\pi < 0] = P\left[z'Me > 0 \,\Big|\, \pi + \frac{z'Mu}{z'Mz} < 0\right] = P[z'Me > 0 \mid z'Mu < -\pi z'Mz].$$

Now consider the case where $e = um$. Here, we have $P[\hat\beta < \beta \mid \hat\pi < 0] = P[z'Mu\,m > 0 \mid z'Mu < -\pi z'Mz]$. Clearly, $z'Mu < 0$, conditional on $z'Mu < -\pi z'Mz$, as $-\pi z'Mz < 0$. $P[z'Mu\,m > 0 \mid z'Mu < -\pi z'Mz]$ is then simply 0 if $m > 0$ and 1 if $m < 0$. The bounds are therefore attained when the errors are perfectly correlated.
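Two pieces of the argument above are easy to check numerically: the identity that the estimator equals β + m when the instrument is irrelevant and the errors are perfectly correlated, and the way equation 41 maps the two extreme values of the conditional probability into the bounds. The sketch below is only illustrative and rests on assumptions not in the text: it takes M = I (no exogenous controls), draws z and u as independent standard normals, and uses hypothetical function names.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def iv_estimate(z, y, Y):
    """Just-identified IV estimate z'y / z'Y, assuming M = I."""
    return dot(z, y) / dot(z, Y)


def simulate_beta_hat(n, beta, pi, m, seed):
    """One draw of the estimator with perfectly correlated errors e = u * m."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    u = [rng.gauss(0.0, 1.0) for _ in range(n)]
    e = [m * ui for ui in u]
    Y = [pi * zi + ui for zi, ui in zip(z, u)]      # first stage
    y = [beta * Yi + ei for Yi, ei in zip(Y, e)]    # structural equation
    return iv_estimate(z, y, Y)


def prob_below(p_pi_hat_negative, q):
    """Equation 41 without its o(1) term; q stands in for
    P[beta_hat < beta | pi_hat < 0]."""
    return 0.5 - p_pi_hat_negative + 2.0 * q * p_pi_hat_negative


# Irrelevant instrument (pi = 0): the estimate is beta + m in every sample.
beta_hat = simulate_beta_hat(n=200, beta=1.0, pi=0.0, m=0.5, seed=0)

# Evaluating equation 41 at q = 0 and q = 1 gives .5 -/+ P[pi_hat < 0].
lower, upper = prob_below(0.2, 0.0), prob_below(0.2, 1.0)
```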
96 Appendix to Chapter 3 F ormal algorithm template Here, w e presen t a more formal v ersion of Section 14.4’s algorithm template, whic h w e encourage researc hers to simplify according to their needs. The template algorithm starts b y dening the empt y sets 0 k =fg for k2f1;:::;Kg. A t the end of the J th step, J k will b e dened to con tain the subset of supp (X i ) where fkg (X i =) has b een iden tied. The set ! s k will serv e the same role in substep s. The substeps of step J are as follo ws. W e include a fourth substep that simply up dates the sets o v er whic h eac h regressor’s CAPEs are iden tied: 1. Apply lemma 2 to al l available moment c onditions: (a) F or eac h subset S 0 f1;:::;Kg (including the empt y set): i. F orm the set o v er whic h S0 (X i =) is already iden tied as =\ k2S J1 k . ii. F or eac h l2Sf1;:::;KgS 0 and 0 2 : A. Apply lemma 2 to iden tify flg (X i = 0 ) wherev er p ossible, with and S as dened ab o v e anda() set to E[y i jX i =]X S0 i S0 (X i =). (b) F or eac h l2f1;:::;Kg: i. Dene ! 1 l supp(X i ) as the set o v er whic h substep 1a iden tied flg (X i = 0 ). 2. Apply lemma 5 to al l sets wher e c ontinuity is assume d and CAPEs ar e identie d: (a) F or eac h l2f1;:::;Kg: i. Where flg (X i = ) is assumed to b e con tin uous in o v er the set l supp(X i ), iden tify flg (X i = 0 ) at all p oin ts in the closure of l \ ( J1 l [! 1 l ) using lemma 5. ii. Dene ! 2 l as the closure of l \ ( J1 l [! 1 l ). 3. Apply lemma 6 to extr ap olate identie d CAPEs to p oints wher e dep endenc e r estrictions al low: (a) F or eac h l2f1;:::;Kg and 0 2 supp(X i ): i. Cho ose a p oin t 2 J1 l [! 1 l [! 2 l suc h that C l = C l 0 if one exists. If suc h a exists, iden tify flg (X i = 0 ) = flg (X i =). ii. Dene ! 3 l f 0 j 0 2 supp(X i ) and (92 J1 l [! 1 l [! 2 l j C l 0 = C l )g. 39 39 This is the set o v er whic h substep 3(a)i iden tied flg (X i = 0 ). 97 4. 
Up date the identie d r e gions and che ck the stopping c ondition: (a) F or eac h k2f1;:::;Kg dene J k J1 k [! 1 k [! 2 k [! 3 k (b) If J k = J1 k for all k2f1;:::;Kg then terminate the algorithm. Otherwise, pro ceed to step J + 1. This form ulation of the algorithm is written to minimize the n um b er of sub-substeps, whic h leads to some ob vious redundancies. A t the b eginning of step 1, for example, no CAPEs ha v e y et b een iden tied and substep 1a therefore sets tofg for all S 0 6=fg and fails to iden tify an ything for those v alues. Substep 1 ma y also use the same momen t conditions in m ultiple steps, iden tifying the same CAPEs eac h time. Again, w e emphasize that this template serv es to demonstrate the algorithm’s functioning, but is not a v ery practical pro of strategy in itself. See Section 14.5 for examples using a v ariet y of practical mo dications of the algorithm. Pro of of lemma 2 The pro of that condition L2.1 suces for iden tication is simple: Prem ultiplying the lemma’s momen t condition on b oth sides b y the pro jection matrix Q( Sfkg 0 ) yields Q( Sfkg 0 )a( 0 ) =Q( Sfkg 0 ) fkg 0 fkg (X i = 0 ); and b ecause condition L2.1 implies that Q( Sfkg 0 ) fkg 0 6= 0 T , w e can prem ultiply again b y fkg0 0 and rearrange to obtain fkg (X i = 0 ) = fkg0 0 Q( Sfkg 0 )a( 0 ) fkg0 0 Q( Sfkg 0 ) fkg 0 : Note that when Sfkg =fg, Q( Sfkg 0 ) =Q(0 T ) =I T . T o sho w that conditions L2.2 and L2.3 suce for iden tication, w e use a pro of that directly follo ws the iden tication strategy laid out in Section 14.3.1. T o iden tify fkg (X i = 0 ) w e use the c hange in a() observ ed when c hanges from 0 to another v alue 1 . Condition L2.2 ensures that the resulting c hange in a() will b e attributable solely to the c hange in fkg , since the other regressors tak e the same v alues at 0 and 1 . Condition L2.3 then limits the co ecien t-regressor dep endence enough for the direct and indirect eects of the c hange to b e distinguished from eac h other. 
The pro of follo ws. Supp ose that conditions L2.2 and L2.3 hold. T aking the dierence b et w een the v alues of the momen t 98 conditiona() = S S (X i =) at 0 and 1 , w e ha v e a( 1 )a( 0 ) = S 1 S (X i = 1 ) S 0 S (X i = 0 ): (45) No w consider the expansion of S S (X i = ) in to three groups of regressor-CAPE pro ducts, lik e w e did in equation 16. The rst con tains only regressor k , whose CAPE w e in tend to iden tify; the second con tains the subset of the other regressors whose co ecien ts mean-dep end on regressor k (these ha v e indices in (S\D k )fkg); and the third con tains the subset whose co ecien ts do not (indices in (SD k )fkg). a() = fkg fkg (X i =) + (S\D k )fkg F unction of fkg z }| { (S\D k )fkg i (X i =) (46) + (SD k )fkg (SD k )fkg i (X i =) | {z } Not a function of fkg : Expanding the dierenced momen t condition in equation 45 similarly and rearranging, w e obtain an analogue of equation 18: a( 1 )a( 0 ) = Direct eect z }| { ( fkg 1 fkg 0 ) fkg (X i = 0 ) (47) + fkg 1 k + (S\D k )fkg 1 | {z } Indirect eect ; where k fkg (X i = 1 ) fkg (X i = 0 ) and (S\D k )fkg (X i = 1 ) (S\D k )fkg (X i = 0 ). In this form, ( fkg 1 fkg 0 ) fkg (X i = 0 ) is in terpretable as the direct eect of the c hange in fkg , while the other terms capture the indirect eects that w ork through c hanges in the co ecien t exp ectations. W e no w isolate the direct eect b y pro jecting out the indirect eect terms. W e do this b y prem ultiplying b oth sides of equation 47 b y Q( S\D k 1 ). Clearly , Q( S\D k 1 ) (S\D k )fkg 1 = 0 T . T o see that Q( S\D k 1 ) fkg 1 k = 0 T , consider separately the cases when k2 C k and k = 2 C k . When k2 C k , then k2 S\D k as w ell, and Q( S\D k 1 ) fkg 1 = 0 T . When k = 2 C k , then fkg (X i = 1 ) = fkg (X i = 0 ), so k = 0. 
The prem ultiplied momen t condition is Q( S\D k 1 )[a( 1 )a( 0 )] =Q( S\D k 1 )[ fkg 1 fkg 0 ] fkg (X i = 0 ): Condition L2.3 guaran tees that [ fkg 1 fkg 0 ] 0 Q( S\D k 1 )[ fkg 1 fkg 0 ]> 0, so b y prem ultiplying again b y 99 [ fkg 1 fkg 0 ] 0 and rearranging, w e nd the follo wing form ula for fkg (X i = 0 ): fkg (X i = 0 ) = [ fkg 1 fkg 0 ] 0 Q( S\D k 1 )[a( 1 )a( 0 )] [ fkg 1 fkg 0 ] 0 Q( S\D k 1 )[ fkg 1 fkg 0 ] : (48) Pro of of corollary 3 W e consider separately the cases where k = 2C k andk2C k . First, supp ose that k = 2C k . 1 satises condition L2.2 b y assumption, so S\D k 1 = S\D k 0 . Substituting this in to condition L2.3 yields condition LC3.1. No w supp ose that k2C k . Because 1 satises condition L2.2, condition L2.3 can b e restated as fkg 1 fkg 0 = 2 range( fkg 1 (S\D k )fkg 0 ): Recall that this holds if and only if there do es not exist a solution v ector a to the equation fkg 1 (S\D k )fkg 0 a = fkg 1 fkg 0 : P artitioning a as a a 1 a 2 0 with a 1 2 R, this is equiv alen t to there existing no solution v ector b 1a 1 a 2 0 to fkg 1 (S\D k )fkg 0 b = fkg 0 : (49) W e no w sho w the equiv alence of condition LC3.2 and condition L2.3 separately in the cases where fkg 0 is and is not in range( (S\D k )fkg 0 ). If fkg 0 2 range( (S\D k )fkg 0 ), then condition LC3.2 do es not hold, and there is a solution v ector b 2 to the subproblem (S\D k )fkg 0 b 2 = fkg 0 so L2.3 fails to hold as w ell. No w supp ose that fkg 0 = 2 range( (S\D k )fkg 0 ). In this case, no v alue of b with b 1 = 0 will solv e equation 49 and the equation can b e rearranged to obtain fkg 0 (S\D k )fkg 0 c = fkg 1 ; where c = 1 b1 1 b 2 0 exists if and only if b do es. A solution c to this equation exists and therefore condition L2.3 fails to hold if and only if fkg 1 2 range( (S\D k ) 0 ). The same is ob viously true of condition LC3.2. 
100 Pro of of corollary 4 The pro of of corollary 4 can b e completed b y considering separately the t w o cases where conditions LC4.1 or LC4.2 hold. T o treat the rst case, supp ose that condition LC4.1 holds. k = 2 C k , so w e need to nd a v alue of 1 satisfying lemma 2’s condition L2.2 and corollary 3’s condition LC3.1. Dene the set 1 =f fkg fkg 0 j2 f fkg 0 gg and note that if 2 f fkg 0 g is a set of r linearly indep enden t v ectors, then f fkg 0 j2 2 g 1 is also a set of r linearly indep enden t v ectors. 1 is the set of v alues tak en b y fkg 1 fkg 0 for v alues of 1 6= 0 satisfying condition L2.2. F rom this and condition L2.2, w e can pic k a set of dim(range( S\D k 0 )) + 1 linearly indep enden t v ectors from 1 , and not all of these v ectors can b e con tained in range( S\D k 0 ); a set of linearly indep enden t v ectors c hosen from range ( S\D k 0 ) can ha v e no more than dim(range( S\D k 0 )) elemen ts. This means w e can pic k a condition-L2.2-complian t v alue of 1 that satises condition LC3.1. T o treat the second case, supp ose that condition LC4.2 holds. By assumption, k 2 C k and fkg 0 = 2 range( (S\D k )fkg 0 ). is the set of v alues of fkg 1 tak en on b y v alues of 1 satisfying condition L2.2, and, just lik e in the rst case, the assumption that there are at least dim (range( S\D k 0 )) + 1 linearly indep enden t v ectors in implies that w e can alw a ys nd a v alue of 1 suc h that fkg 1 = 2 range( S\D k 0 ). This satises corollary 3’s condition LC3.1. Pro of of lemma 7 The pro of of the suciency of condition L7.1 is the same as the pro of of the suciency of condition L2.1 for lemma 2 in Section 14.3.1. T o pro v e the suciency of conditions L7.2 and L7.3, w e use a pro of to similar to that for the suciency of conditions L2.2 and L2.3 in lemma 2. The main dierence is that w e allo w 0 and 1 to dier in all the regressors indexed in S 0 , p er condition L2.2. Supp ose that conditions L2.2 and L2.3 hold. 
Dierencing the lemma’s momen t condition at these p oin ts, w e ha v e a( 1 )a( 0 ) = S 1 S (X i = 1 ) S 0 S (X i = 0 ): (50) This set con tains the indices of the co ecien ts that mean-dep end on the regressors indexed b y S 0 . W e expand S S (X i =) in to three groups of regressor-co ecien t pro ducts. The rst con tains the regressors in S 0 ; the second has other regressors whose co ecien ts mean-dep end on regressors indexed b y S 0 (these ha v e indices in (S\D 0 )S 0 ); and the third con tains the remaining regressors (indices in (SD 0 )S 0 ). 101 a() = S0 S0 (X i =) + Not a function of S 0 z }| { (S\D0)S0 F unction of S 0 z }| { (S\D0)S0 i (X i =) (51) (SD0)S0 (SD0)S0 i (X i =) | {z } Not a function of S 0 Expanding the dierenced momen t condition in equation 50 similarly and rearranging w e obtain a( 1 )a( 0 ) = Direct eect z }| { ( S0 1 S0 0 ) S0 (X i = 0 ) (52) + S0 1 l + (S\D0)S0 1 | {z } Indirect eect ; where l S0 (X i = 1 ) S0 (X i = 0 ); and (S\D0)S0 (X i = 1 ) (S\D0)S0 (X i = 0 ): In this form, ( S0 1 S0 0 ) S0 (X i = 0 ) is in terpretable as the (y et-uniden tied part of ) the direct eect of the c hange in S0 , while the other terms capture the indirect eects that w ork through c hanges in the co ecien t exp ectations. W e no w isolate the direct eect b y pro jecting out the indirect eect terms. W e do this b y prem ultiplying b oth sides of equation 47 b y Q( S\D0 1 ). Clearly , Q( S\D0 1 ) (S\D0)S0 1 = 0 T . T o see that Q( S\D0 1 ) S0 1 l = 0 T , consider an arbitrary regressor in (column of ) S0 1 . Note that when this regressor’s co ecien t mean-dep ends on an y of the regressors in S0 1 , its index is in S\D 0 , and its pro duct with Q( S\D0 1 ) is 0 T . If, instead, that regressor’s co ecien t do es not mean-dep end on an y of the regressors in S0 1 , its corresp onding elemen t of equals 0. 
The prem ultiplied momen t condition is Q( S\D0 1 )[a( 1 )a( 0 )] =Q( S\D0 1 )[ S0 1 S0 0 ] S0 (X i = 0 ): Condition L7.3 guaran tees that [ S0 1 S0 0 ] 0 Q( S\D0 1 )[ S0 1 S0 0 ] is in v ertible, so b y prem ultiplying again b y 102 [ S0 1 S0 0 ] 0 and solving for S0 (X i = 0 ), w e obtain S0 (X i = 0 ) = ([ S0 1 S0 0 ] 0 Q( S\D0 1 )[ S0 1 S0 0 ]) 1 [ S0 1 S0 0 ] 0 Q( S\D0 1 )[a( 1 )a( 0 )]: Pro of of theorem 11’s conclusions C11.3 and C11.4 F or all k2f1;:::;K 1g, w e construct alternativ e CAPEs ~ (X i =) that are observ ationally equiv alen t to the true CAPEs(X i =), but for whic h ~ fkg (X i =)6= fkg (X i =) and ~ fKg (X i =)6= fKg (X i =) for all 2 supp(X i jX fkg i 2 range(1 T )). This suces to pro v e C11.3 and C11.4. Cho ose an arbitrary k2f1;:::;K 1g. W e lea v e elemen ts of ~ (X i = ) other than the k th and K th unaltered: ~ fk;Kg c (X i =) fk;Kg c (X i ): (53) Next, with B( fkg ) an arbitrary function of fkg satisfying B( fkg ) = 0 where X fkg i = 2 range(1 T ), dene ~ fkg (X i =) fkg (X i =) +B( fkg ) and ~ fKg (X i =) fKg (X i =)B( fkg ) fkg0 1 T T : Note that ~ (X i = ) ob eys the same dep endence restrictions as (X i = ) do es under assumption A11.2. X fKg i = 1 T b y assumption A11.1 and X fkg i =1 T for some 2R wherev er X fkg i 2 range(1 T ). F or ev ery 2 supp(X i jX fkg i 2 range(1 T )), there is therefore some 2R suc h that fk;Kg ~ fk;Kg (X i =) = fkg ~ fkg (X i =) + fKg ~ fKg (X i =) =1 T [ fkg (X i =) +B( fkg )] +1 T [ fKg (X i =)B( fkg ) 1 0 T 1 T T ] =1 T fkg (X i =) +1 T B( fkg ) +1 T fKg (X i =)1 T B( fkg ) =1 T fkg (X i =) +1 T fKg (X i =) = fkg fkg (X i =) + fKg fKg (X i =) = fk;Kg fk;Kg (X i =): 103 It follo ws from this and equation 53 that ~ (X i = ) = (X i = ) for all 2 supp(X i ), pro viding observ ational equiv alence, and ~ fk;Kg (X i = )6= fk;Kg (X i = ) for 2 supp(X i jX fkg i 2 range(1 T )), pro ving noniden tication o v er this set (conclusion C11.3). 
This also pro v es noniden tication of fKg (X i = ) on the set where an y of the other CAPEs are noniden tied, [ k2f1;:::;K1g supp(X i jX fkg i 2 range(1 T )) (conclusion C11.4). Pro of of theorem 16 Dene the OLS estimator regressing y on W instead of ^ W as ^ W [W 0 Q(M)W ] 1 W 0 Q(M)y: P [ ^ W = W ; ^ M = M ]! 1 b y assumption, so P [^ ^ W = ^ W ]! 1 and it suces to sho w that conclusions C16.1 and C16.2 hold when ^ ^ W is replaced b y ^ W . The rest of the pro of treats this case. W e start b y establishing that assumptions A16.1 and A16.4 imply that E[kv i k 4 ]<1. T o see this, note that Q(M i )X i i +Q(M i )e i =Q(M i )y i =Q(M i )W i W +Q(M i )v i and Q(M i )v i =v i b y equation 32. Therefore, v i =Q(M i )X i i +Q(M i )e i Q(M i )W i W : W e then ha v e E[kv i k 4 ] =E[kQ(M i )X i i +Q(M i )e i Q(M i )W i W k 4 ] E[(kQ(M i )X i i k +kQ(M i )e i k +kQ(M i )W i W k) 4 ] E[4 3 (kQ(M i )X i i k 4 +jjQ(M i )e i k 4 +kQ(M i )W i W k 4 )] E[4 3 (kQ(M i )X i k 4 F k i k 4 +kQ(M i )e i k 4 +k W k 4 kQ(M i )W i k 4 F )] 4 3 ( q E[kQ(M i )X i k 8 F ]E[k i k 8 ] +E[kQ(M i )e i k 4 ] +k W k 4 E[kQ(M i )W i k 4 F ]) <1: 104 F rom here, the pro of is standard. Decomp ose p N(^ ^ W W ) = p N[W 0 Q(M)W ] 1 W 0 Q(M)v = [ 1 N N X i=1 W 0 i Q(M i )W i ] 1 1 p N N X i=1 W 0 i v i Because, b y assumption A16.4 E[kW 0 i v i k 2 ]E[kW 0 i k 2 F kv i k 2 ]< q E[kW 0 i k 4 F ]E[kv i k 4 ]<1; andE[W 0 i v i ] =0 L W , our assumption of i.i.d. sampling lets us apply the Lindeb erg-Levy CL T to nd that 1 p N N X i=1 W 0 i v i d !N(0 L W ;E[W 0 i v i v 0 i W i ]): Lik ewise, assumption A16.4 lets us nd b y the WLLN that 1 N N X i=1 W 0 i Q(M i )W i p !E[W 0 i Q(M i )W i ]; and the con tin uous mapping theorem immediately giv es us the follo wing. p N(^ W W ) d !N(0 L W ;V ); with V E[W 0 i Q(M i )W i ] 1 E[W 0 i v i v 0 i W i ]E[W 0 i Q(M i )W i ] 1 ; pro ving conclusion C16.1. 
W e no w pro v e the consistency of the standard error estimate - conclusion C16.2, still for the case where ^ ^ W = ^ W . Here, the residuals ^ v i =Q(M i )y i Q(M i )W i ^ W : Assumption A16.4’s assumption of nite 4 th momen ts of Q(M i )W i and the i.i.d. assumption imply that max i2f1;:::;Ng kQ(M i )W i k F = o p (N 1=4 ) (see, for example, Hansen (2019), pages 200-201). By its p N - consistency , ^ W W =O p (N 1=2 ). With the dierence b et w een the residuals and true pro jection errors v i ^ v i v i =Q(M i )W i ( W ^ W ); 105 it follo ws that q v max i2f1;:::;Ng k v i k F =o p (N 1=4 )O p (N 1=2 ) =o p (N 1=4 ): Our estimate of E[W 0 i v i v 0 i W i ] expands as 1 N X i [W 0 i ^ v i ^ v 0 i W i ] = 1 N X i [W 0 i v i v 0 i W i ] + 1 N X i [W 0 i (^ v i ^ v 0 i v i v 0 i )W i ] =E[W 0 i v i v 0 i W i ] +o p (1) + 1 N X i [W 0 i ( v i v 0 i + v i v 0 i +v i v 0 i )W i ]; and the summation on the righ t con v erges to 0 b ecause k 1 N X i [W 0 i ( v i v 0 i + v i v 0 i +v i v 0 i )W i ]k F 1 N X i kW 0 i ( v i v 0 i + v i v 0 i +v i v 0 i )W i k F 1 N X i [kW i k 2 F (k v i k 2 + 2k v i kkv i k)] 1 N X i [q 2 v kW i k 2 F + 2q v kW i k 2 F kv i k] =q 2 v 1 N X i kW i k 2 F +2q v 1 N X i kW i k 2 F kv i k =o p (1)(E[kW i k 2 F ] +o p (1)) +o p (1)(E[kW i k 2 F kv i k] +o p (1)) =o p (1): As a result, 1 N X i [W 0 i ^ v i ^ v 0 i W i ] =E[W 0 i v i v 0 i W i ] +o p (1): Conclusion C16.2 follo ws from this and the con tin uous mapping theorem. Pro of of theorem 17 Muc h as in our pro of of theorem 16, it suces to treat the case where ^ W = W and ^ M = M (and therefore ^ l i = l i ), and w e do so in what follo ws. Assumptions A17.1 through A17.5 are the same as theorem 16’s assumptions, and w e can reuse an y results dev elop ed in that theorem’s pro of. Dene pP [l 0 i l i = 1]. Note that ^ p p !p b y the WLLN. 
106 F or later use, recall that ^ A 1 = 1 N N X i=1 l 0 i [M 0 i M i ] 1 M 0 i W i , A 1 =E[l 0 i [M 0 i M i ] 1 M 0 i W i ]; ^ A 2 = [ 1 N N X i=1 W 0 i Q(M i )W i ] 1 , and A 2 =E[W 0 i Q(M i )W i ] 1 ; and note that under the WLLN, ^ A 1 p ! A 1 under assumptions A17.4 and A17.6., and ^ A 2 p ! A 2 under assumptions A17.4 and A17.5. Recall that i l 0 i M i . Noting that y =M i M i W i (^ W W ) +v i , w e ha v e ^ 1 N N X i=1 l 0 i [M 0 i M i ] 1 M 0 i y i = 1 N N X i=1 l 0 i [M 0 i M i ] 1 M 0 i (M i M i W i (^ W W ) +v i ) = 1 N N X i=1 [l 0 i M i l 0 i [M 0 i M i ] 1 M 0 i W i (^ W W )] = + 1 N N X i=1 [ i l 0 i [M 0 i M i ] 1 M 0 i W i (^ W W )] Substituting ^ W W = [ P N i=1 W 0 i Q(M i )W i ] 1 P N i=1 W 0 i v i (from theorem 16), in to the ab o v e, w e ha v e ^ = 1 N N X i=1 ( i l 0 i [M 0 i M i ] 1 M 0 i W i [ N X j=1 W 0 j Q(M j )W j ] 1 N X k=1 W 0 k v k ) = 1 N N X i=1 i + ^ A 1 ^ A 2 N X k=1 W 0 k v k = 1 ^ A 1 ^ A 2 1 N N X i=1 2 6 4 i W 0 i v i 3 7 5: By assumption A17.6, w e ha v e E[k i k 2 ]E[kl 0 i M i k 2 ]E[k M i k 2 ]1; and the pro of of theorem 16 sho w ed that E[k W 0 i v i k 2 ]<1. W e can therefore apply the Lindeb erg-Levy 107 CL T to nd that 1 p N N X i=1 2 6 4 i W 0 i v i 3 7 5 d !N(0 W +1 ;U 1 ); with U 1 2 6 4 E[ 2 i ] E i v 0 i W i ] E[ i v 0 i W i ] 0 E[W 0 i v i v 0 i W i ] 3 7 5; from whic h it follo ws that p N(^ ) d !N(0; 1 A 1 A 2 U 1 1 A 1 A 2 0 ); and p N(^ ) d !N(0;U); with U 1 p 2 1 A 1 A 2 U 1 1 A 1 A 2 0 : This pro v es conclusion C17.1. W e no w pro v e the consistency of the asymptotic v ariance estimate in conclusion C17.2. Recall that the empirical analogue to i is ^ i = (l 0 i ^ M i ^ ) and the empirical residuals ^ v i =Q(M i )y i Q(M i )W i ^ W , as in theorem 16. ^ U 1 = 2 6 4 1 N P N i=1 [^ 2 i ] 1 N P N i=1 [^ i ^ v 0 i W i ] 1 N P N i=1 [^ i ^ v 0 i W i ] 0 1 N P N i=1 [W 0 i ^ v i ^ v 0 i W i ] 3 7 5 . 
W e ha v e already established the con v ergence of ^ A 1 , ^ A 2 , and ^ p to their probabilit y limits, and the pro of of theorem 16 sho ws that 1 N P N i=1 [W 0 i ^ v i ^ v 0 i W i ] p ! E[W 0 i v i v 0 i W i ]. Here, w e only need to demonstrate the con v ergence of the other blo c ks of ^ U 1 in order to establish conclusion C17.2. W e no w establish some results used later. First, note that 1 N P N i=1 2 i p !E[ 2 i ] and 1 N P N i=1 j i j p !E[j i j] b y the WLLN under assumption A17.6. Dene the follo wing quan tities i ^ i i =l 0 i (^ M i M i ) + ^ , and v i ^ v i v i =Q(M i )W i ( W ^ W ): W e will treatk i k andk v i k as measures of ho w far o ^ i and ^ v i are as estimates of i andv i , and our pro of will rely on q = max i2f1;:::;Ng k i k, and q v = max i2f1;:::;Ng k v i k b oth b eing o p (1). In the pro of of theorem 16, w e already established that q v =o p (N 1=4 ): 108 F or q , note that ^ M i M i =l 0 i [M 0 i M i ] 1 M 0 i (M i M i W i (^ W W ) +v i ) M i =l 0 i [M 0 i M i ] 1 M 0 i W i (^ W W ); so q = max i2f1;:::;Ng kl 0 i (^ M i M i ) + ^ k k^ k + max i2f1;:::;Ng kl 0 i [M 0 i M i ] 1 M 0 i W i (^ W W )k k^ k +k(^ W W )k(max i2f1;:::;Ng k[M 0 i M i ] 1 M i k F )(max i2f1;:::;Ng kW i k F ): Under the assumptions A17.4 and A17.6, w e ha v e q M max i2f1;:::;Ng k[M 0 i M i ] 1 M i k F =o p (N 1=4 ) q W max i2f1;:::;Ng kW i k F =o p (N 1=4 ): W e therefore ha v e q Op(N 1=2 ) z }| { k^ k + Op(N 1=2 ) z }| { k(^ W W )k op(N 1=4 ) z}|{ q M op(N 1=4 ) z}|{ q W =o p (1): No w, to see that 1 N P N i=1 ^ 2 i p !E[ 2 i ], w e ha v e 1 N N X i=1 [^ 2 i ] = 1 N N X i=1 [ 2 i ] + 1 N N X i=1 [2 i i + 2 i ] =E[ 2 i ] +o p (1) + 1 N N X i=1 [2 i i + 2 i ]: T o sho w that the last term is o p (1), k 1 N N X i=1 [2 i i + 2 i ]k 1 N N X i=1 [2k i kk i k +k i k 2 ] (54) op(1) z}|{ q 2 + op(1) z}|{ q Op(1) z }| { 1 N N X i=1 2j i j (55) =o p (1): (56) Finally , w e use essen tially the same pro cedure to sho w that 1 N P N i=1 [^ i ^ v 0 i W i ] p ! 
E[ i v 0 i W i ]. We established in the proof of theorem 16 that E[kv 0 i W i k 2 ] <1, so first note that E[k i v 0 i W i k]E[j i jkv 0 i W i k] p E[ 2 i ]E[kv 0 i W i k 2 ]<1. We therefore have by the WLLN that 1 N P N i=1 [ i v 0 i W i ] p !E[ i v 0 i W i ]. Now decompose 1 N P N i=1 [^ i ^ v 0 i W i ] as follows: 1 N N X i=1 [^ i ^ v 0 i W i ] = 1 N N X i=1 [ i v 0 i W i ] + 1 N N X i=1 [ i v 0 i + i v 0 i + i v 0 i ]W i =E[ i v 0 i W i ] +o p (1) + 1 N N X i=1 [ i v 0 i + i v 0 i + i v 0 i ]W i : To show that the last term is o p (1), note that E[k i W i k] p E[ 2 i ]E[kW i k 2 F ]<1, so we have k 1 N N X i=1 [ i v 0 i + i v 0 i + i v 0 i ]W i k 1 N N X i=1 [k i kk v i kkW i k F +k i W i k F k v i k +k i kkv 0 i W i k q |{z} op(1) q v |{z} op(1) 1 N N X i=1 kW i k F | {z } Op(1) + q v |{z} op(1) 1 N N X i=1 k i W i k F | {z } Op(1) + q |{z} op(1) 1 N N X i=1 kv 0 i W i k | {z } Op(1) =o p (1): This establishes conclusion C17.2.

NLSY97 data

We used NLSY97 data downloaded between 7-27-2020 and 10-17-2020 from the Bureau of Labor Statistics' NLS Investigator website. Compensation in the NLSY97 is reported for each job the respondent has had since their last interview. Compensation is reported as of the interview date if the respondent is currently employed at the job and the job has been ongoing for more than 13 weeks (26 weeks for survey rounds after 2012). For jobs that started less than 13 (26) weeks prior to the interview, compensation as of the job's start date is reported. If a respondent is no longer working at the job, they report compensation as of the job's end date. We refer to the date for which compensation is reported as the wage date in what follows. The NLSY97's training questionnaire asks respondents for each training program's start and end dates, though only the year and month are in the publicly available data.
The survey questions used to obtain the number of hours spent in each training program changed in 2004. Before 2004, respondents were first asked whether each training program had lasted for two or more weeks, and, if so, they were asked how many weeks they did not attend the training. Respondents were then asked how many days per week and how many hours per day they usually attended. Because we do not observe training programs' full start and end dates, we assume that training programs lasting less than 2 weeks were one week long. For programs lasting more than two weeks, we computed the maximum possible number of weeks they could have attended as the number of months between (and including) the start and end months, multiplied by the number of weeks per month (assumed to be (365/12)/7 ≈ 4.35) and rounded down to the nearest week. We then subtracted from this the number of weeks they reported not attending, and used the midpoint between the resulting estimate of the maximum number of weeks attending and two (the minimum number of weeks). Multiplying these estimates of weeks attended by the reported days per week and hours per day in attendance yielded our estimate of training hours. We drop any training programs with negative hours. These arise due to apparent misreporting of the number of weeks spent not attending a training and cases where training programs' end dates precede their start dates.

After 2004, the survey simply asked "How many weeks did you attend this school, course or training program for at least one hour?" and "During the weeks that you were attending, how many hours per week did you usually spend in the school, course, or training program?" We compute the number of hours spent in training as the product of these two questions' answers.
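The hours calculation just described can be sketched in a few lines. This is an illustrative reading of the prose, not the authors' cleaning code: the function names, the argument names, and the representation of dates as (year, month) tuples are all assumptions, and returning `None` stands in for dropping a program with negative hours.

```python
import math

# Weeks per month as assumed in the text: (365 / 12) / 7, about 4.35.
WEEKS_PER_MONTH = (365 / 12) / 7


def hours_pre_2004(lasted_two_plus_weeks, start_ym, end_ym,
                   weeks_not_attended, days_per_week, hours_per_day):
    """Training hours under the pre-2004 question format.

    start_ym and end_ym are (year, month) tuples (hypothetical layout).
    Returns None when the implied hours are negative (program dropped).
    """
    if lasted_two_plus_weeks:
        # Months between and including the start and end months.
        months = (end_ym[0] - start_ym[0]) * 12 + (end_ym[1] - start_ym[1]) + 1
        # Maximum possible weeks, rounded down, less weeks not attended.
        max_weeks = math.floor(months * WEEKS_PER_MONTH) - weeks_not_attended
        # Midpoint between that maximum and the two-week minimum.
        weeks = (max_weeks + 2) / 2
    else:
        # Programs shorter than two weeks are assumed to be one week long.
        weeks = 1
    hours = weeks * days_per_week * hours_per_day
    return hours if hours >= 0 else None


def hours_post_2004(weeks_attended, hours_per_week):
    """Training hours under the post-2004 format: a simple product."""
    return weeks_attended * hours_per_week


# March through May 2001, 3 weeks missed, 3 days a week, 2 hours a day.
print(hours_pre_2004(True, (2001, 3), (2001, 5), 3, 3, 2))
```

In the example, three inclusive months give floor(13.04) = 13 possible weeks, 10 after subtracting the missed weeks, and a midpoint of 6 weeks, so the estimate is 36 hours.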
Training programs can continue over multiple survey rounds, and the full battery of training questions is asked about these programs in each round. We use training duration data from the final round in which a given training program is reported. When a training program reports different employers providing or paying for the training in different rounds, we attribute the training program to the employer reported in the earliest round with an employer reported. We compute our final on-the-job training regressor as the cube root of the sum of hours spent in trainings that were provided by or paid for by the observation's job, which the respondent completed, and whose end month preceded the observation's wage date.

Regarding other regressors, we impute missing union status for jobs with non-missing union status in at least one round if the non-missing statuses all take the same value. In that case, we set missing union statuses equal to the job's non-missing union status. For jobs where different occupation types are reported in different rounds, we use the first round's report. Years of education is determined by the highest degree attained. We attribute 10 years to observations with no degree, 12 to those with a high school diploma or GED, 14 to those with an associate's degree, 16 to those with a bachelor's degree, 18 to those with a master's degree, 20 to those with a professional degree, and 21 to those with a PhD. The variables we use for years of education and the ever-married and enrolled-in-school dummies are calculated as of the interview date, which may not be equal to the wage date, as discussed above.

The code we used to clean the data and perform the analysis is available by request. Please email brianfinley12@gmail.com with any requests or questions.
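The regressor construction rules above translate directly into small helper functions. The sketch below is not the authors' code (which is available by request, as noted above). The record layout, the dictionary keys, and the function names are hypothetical, and comparing (year, month) tuples is one assumed reading of "end month preceded the wage date".

```python
# Years of education by highest degree attained, as listed in the text.
# The dictionary keys are hypothetical labels.
EDUCATION_YEARS = {
    "none": 10,
    "high_school_or_ged": 12,
    "associate": 14,
    "bachelor": 16,
    "master": 18,
    "professional": 20,
    "phd": 21,
}


def training_regressor(trainings, job_id, wage_ym):
    """Cube root of summed hours over qualifying trainings.

    A training qualifies if it was provided or paid for by this job, was
    completed, and ended in a month before the wage date's month.
    """
    total_hours = sum(
        t["hours"]
        for t in trainings
        if t["job_id"] == job_id and t["completed"] and t["end_ym"] < wage_ym
    )
    return total_hours ** (1.0 / 3.0)


def impute_union_status(statuses):
    """Fill missing (None) statuses only if all observed values agree."""
    observed = {s for s in statuses if s is not None}
    if len(observed) == 1:
        fill = next(iter(observed))
        return [fill if s is None else s for s in statuses]
    return list(statuses)


trainings = [
    {"job_id": 1, "completed": True, "end_ym": (2005, 3), "hours": 20.0},
    {"job_id": 1, "completed": True, "end_ym": (2005, 6), "hours": 7.0},
    {"job_id": 1, "completed": False, "end_ym": (2005, 1), "hours": 50.0},
    {"job_id": 2, "completed": True, "end_ym": (2005, 1), "hours": 99.0},
]
print(training_regressor(trainings, job_id=1, wage_ym=(2006, 1)))
print(impute_union_status([True, None, True]))
```

The cube root keeps zero-hour observations defined while compressing large hour totals.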
Abstract
This dissertation compiles three essays on applied and theoretical econometrics.

Chapter 1, coauthored with Marco Angrisani and Arie Kapteyn, examines sample characteristics and elicited survey measures of two studies, the Health and Retirement Study (HRS), where interviews are done either in person or by phone, and the Understanding America Study (UAS), where surveys are completed online and a replica of the HRS core questionnaire is administered. By considering variables in various domains, our investigation provides a comprehensive assessment of how Internet data collection compares to more traditional interview modes. We document clear demographic differences between the UAS and HRS samples in terms of age and education. Yet, sample weights correct for these discrepancies and allow one to satisfactorily match population benchmarks as far as key sociodemographic variables are concerned. Comparison of a variety of survey outcomes with population targets shows a strikingly good fit for both the HRS and the UAS. Outcome distributions in the HRS are only marginally closer to population targets than outcome distributions in the UAS. These patterns arise regardless of which variables are used to construct post-stratification weights in the UAS, confirming the robustness of these results. We find little evidence of mode effects when comparing the subjective measures of self-reported health and life satisfaction across interview modes. Specifically, we do not observe very clear primacy or recency effects for either health or life satisfaction. We do observe a significant social desirability effect, driven by the presence of an interviewer, as far as life satisfaction is concerned. By and large, our results suggest that Internet surveys can match high-quality traditional surveys.

Chapter 2 proposes a test and confidence procedure to gauge the possible impact of weak instruments in the linear model with one excluded instrument and one endogenous regressor, the model typically used with instrumental variables in applied work. Where b is the two-stage least squares estimator of the endogenous regressor's coefficient, B, we perform inference on worst-case asymptotic values of P[b<B]. The deviation of P[b<B] from .5 can be intuitively read as a deviation from median unbiasedness, providing an interpretable bias test for the just-identified model, where the mean bias E[b-B] is undefined. These inference procedures can easily be made robust to error heteroskedasticity and dependence such as clustering and serial correlation.

Chapter 3 studies the linear panel data model with correlated random coefficients and fewer time periods than regressors. Under unrestricted coefficient heterogeneity, average partial effects (APEs) are not identified in this model. We identify APEs by introducing exclusion restrictions that restrict each coefficient's mean, conditional on the regressors, to be a function of a coefficient-specific subset of the regressors. These subsets can be defined completely independently for each coefficient, extending past results that assumed each coefficient's mean to depend either on all or on no regressors. We develop an intuitive, generally applicable proof strategy for finding (sub)populations whose APEs are identified under a given set of exclusion restrictions and apply it to a number of examples to show that identification is possible under much weaker restrictions than previously studied. We develop inference and estimation results for APEs in a set of tractable models and, in more complex cases, for analogous pseudo-parameters from “flexible parametric” models informed by our nonparametric identification results. To illustrate the methods, we provide an empirical application estimating the returns to job training.
Asset Metadata
Creator
Finley, Brian (author)
Core Title
Three essays on econometrics
School
College of Letters, Arts and Sciences
Degree
Doctor of Philosophy
Degree Program
Economics
Publication Date
04/13/2021
Defense Date
03/09/2021
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
econometrics,heterogeneity,OAI-PMH Harvest,panel data,pre-testing,random coefficients,Statistics,survey methodology,weak instruments
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Pesaran, Mohammad Hashem (committee chair), Kapteyn, Arie (committee member), Mukherjee, Gourab (committee member), Ridder, Geert (committee member)
Creator Email
bfinley@usc.edu,brianfinley12@gmail.com
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c89-442195
Unique identifier
UC11667835
Identifier
etd-FinleyBria-9443.pdf (filename),usctheses-c89-442195 (legacy record id)
Legacy Identifier
etd-FinleyBria-9443.pdf
Dmrecord
442195
Document Type
Dissertation
Rights
Finley, Brian
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA