Essays on Econometrics

by

Bora Kim

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY (ECONOMICS)

August 2021

Copyright 2021 Bora Kim

Acknowledgments

I am deeply grateful to Roger Hyungsik Moon, Michael Leung, and Geert Ridder for their guidance. This dissertation would not have been possible without their support and advice. They have been outstanding teachers, mentors, and friends. I would like to thank Emily Nix for her encouragement and cheerful comments. I greatly benefited from conversations with Cheng Hsiao and Hashem Pesaran as well. Their excellent comments greatly motivated me. I would also like to thank Sangsoo Park, who encouraged me to enter this amazing world of academia. Finally, I would like to thank my parents, YoungKook Kim and SangOk Sim, for their endless patience and unconditional support. Thank you to my sister, Hana Kim, as well.

Table of Contents

Acknowledgments
List of Tables
List of Figures
Abstract
1 Introduction
2 Analysis of Randomized Experiments with Network Interference and Noncompliance
  2.1 Introduction
  2.2 Model of Treatment Choices and Outcomes
    2.2.1 Treatment Choice Model with Spillovers
    2.2.2 Potential Outcomes Model with Spillovers
    2.2.3 Parameters of Interest
    2.2.4 Source of Endogeneity
  2.3 Identification
    2.3.1 The Problem of Conventional IV Methods
    2.3.2 Control Function Approach
  2.4 Estimation
    2.4.1 First-Stage Estimation
    2.4.2 Second-Stage Estimation
    2.4.3 Inference
      2.4.3.1 Inference for First-Stage Game
      2.4.3.2 Inference for Second-Stage Regression
    2.4.4 Monte Carlo Simulation
  2.5 Application
    2.5.1 Background and Data
    2.5.2 Estimation Results
    2.5.3 Impact of Counterfactual Policies
  2.6 Concluding Remarks
3 Identification of Causal Effects in Cluster Randomized Experiments with Spillover and Noncompliance: Difference-in-Differences Approach
  3.1 Introduction
  3.2 Setup and Parameters of Interest
    3.2.1 Setup and Potential Outcomes Framework
    3.2.2 Parameters of Interest
  3.3 Identification Using Baseline Outcome
  3.4 Estimation
  3.5 Empirical Application
    3.5.1 The Empirical Specification of ITT for Compliers and Never-takers
    3.5.2 Estimates of ITT for Compliers and Never-takers
    3.5.3 Tests of the Equal-trend Assumption
    3.5.4 The Comparison with the LATE in Crépon et al. (2015)
  3.6 Conclusion
4 On the Use of an Instrumental Variable in Causal Mediation Analysis
  4.1 Introduction
  4.2 Framework and Identifiability
    4.2.1 Potential Outcomes and Causal Effects
    4.2.2 Identification Issues and Sequential Ignorability Assumption
  4.3 Instrumental Variable Approach to Mediation Analysis
    4.3.1 What Does IV Identify?
    4.3.2 Monotonicity Conditions
  4.4 Concluding Remarks
Bibliography
A Appendix to Chapter 1
  A.1 Proof of Theorem 3
  A.2 Proofs of Asymptotic Results
    A.2.1 Proof of Consistency of First-Stage Estimators
    A.2.2 Proof of Asymptotic Normality of First-Stage Estimators
    A.2.3 Proof of Consistency of Second-Stage Estimators
    A.2.4 Proof of Asymptotic Normality of Second-Stage Estimators
  A.3 Auxiliary Lemmas
B Appendix to Chapter 2
  B.1 Wald Estimator Without Exclusion Restriction
  B.2 Comparison to Usual DID Estimator
  B.3 Additional Tables
C Appendix to Chapter 3
  C.1 Proof of Lemma 2

List of Tables

2.1 Simulation Study with 3000 Simulations
2.2 Summary Statistics (n = 583)
2.3 Estimation Results for First-Stage Model (n = 583)
2.4 Estimation Results for Second-Stage Model (n = 583)

List of Figures

2.1 Plot of Estimated $(\hat{\sigma}_i, \hat{\mu}_i)$
2.2 Counterfactual Impact of Means-Tested Subsidy on Long-Run Adoption
4.1 Mediation Diagram

Abstract

This dissertation studies econometric methods of causal inference. Chapter II provides a method to analyze randomized experiments when there is network-mediated interference and noncompliance.
Using a game-theoretic approach, I propose a framework to incorporate interference and noncompliance while allowing for general effect heterogeneity in outcomes. In Chapter III, my coauthor and I analyze identification and estimation methods for cluster randomized experiments when there are spillover effects and one-sided noncompliance. We propose a modified difference-in-differences estimation to incorporate spillover effects. Finally, Chapter IV discusses the use of instrumental variables in mediation analysis, which is becoming increasingly popular in the social sciences. I show that when there is unobserved effect heterogeneity, additional monotonicity assumptions are needed in order for instrumental variables estimators to have causal interpretations.

Chapter 1
Introduction

This dissertation is composed of three papers which study econometric methods of causal inference. In the first paper (Chapter II), I study randomized experiments when there are network-mediated interference and noncompliance. I apply the method to evaluate price subsidies for mosquito nets. In the second paper (Chapter III), my coauthor and I develop a method to analyze cluster randomized experiments when there are spillover effects. We propose to modify the usual difference-in-differences identification strategy to incorporate spillover effects and apply our methods to a microfinance program. These two papers address spillover effects in causal inference methods. Such spillover effects are closely related to mediation analysis. In the last paper (Chapter IV), I study the use of instrumental variables in a mediation setting. I show that in a heterogeneous effect setup, additional monotonicity assumptions are required in order for instrumental variables estimators to have causal interpretations.

In Chapter II, I study causal effects in randomized experiments with network interference and noncompliance. Randomized experiments have become a standard tool in economics. In analyzing randomized experiments, the traditional approach has been based on the Stable Unit Treatment Value Assumption (SUTVA; Rubin (1990)), which dictates that there is no interference between individuals. However, the SUTVA assumption fails to hold in many applications due to social interaction, general equilibrium, and/or externality effects. While much progress has been made in relaxing the SUTVA assumption, most of this literature has only considered a setting with perfect compliance to treatment assignment. In practice, however, noncompliance occurs frequently: the actual treatment receipt differs from the assignment to treatment. This chapter thus proposes a method to identify causal effects in randomized experiments with network interference and noncompliance. Spillovers are allowed to occur at both the treatment choice stage and the outcome realization stage. In particular, I explicitly model treatment choices of agents as a binary game of incomplete information where the resulting equilibrium treatment choice probabilities affect outcomes of interest. Outcomes are further characterized by a random coefficient model to allow for general unobserved heterogeneity in the causal effects. After defining our causal parameters of interest, I propose a simple control function estimator and derive its asymptotic properties under large-network asymptotics.
I apply our methods to the randomized subsidy program of Dupas (2014) where I find evidence of spillover effects on both short-run and long-run adoption of insecticide-treated bed nets. Finally, I illustrate the usefulness of our methods by analyzing the impact of counterfactual subsidy policies. In Chapter III, my coauthor and I study clustered randomized experiments when there are spillover effects. Previous literature on instrumental variables method in the context of randomized experiments has assumed an exclusion restriction which states that an instru- ment does not directly affect an outcome of interest. In a clustered randomized experiment, thisassumptionfailstoholdwhenthereisaspilloverorinterferenceacrossindividuals. When the exclusion restriction fails, many causal parameters such as local average treatment effect (LATE) are not point-identified. We show that in clustered randomized experiments with one-sided noncompliance, point-identification of local causal effect for compliers, which is often the parameter of interest, is possible under a mild difference-in-differences type as- sumption. Furthermore, we can identify an indirect effect of intervention for never-takers, which can be used to test whether there is spillover effect or not. We illustrate our method in 2 our empirical analysis of a microcredit program in rural Morocco from Crépon et al. (2015), where we find evidence of spillover effects. Finally, Chapter IV discusses the use of instrumental variables in mediation setting. Em- pirical researchers are often interested in not only whether a treatment affects an outcome of interest, but also how the treatment effect arises. Causal mediation analysis provides a for- mal framework to identify causal mechanisms through which a treatment affects an outcome. The most popular identification strategy relies on so-called sequential ignorability (SI) as- sumption which requires that there is no unobserved confounder that lies in the causal paths between the treatment and the outcome. Despite its popularity, such assumption is deemed to be too strong in many settings as it excludes the existence of unobserved confounders. This limitation has inspired recent literature to consider an alternative identification strategy based on an instrumental variable (IV). This paper discusses the identification of causal me- diation effects in a setting with a binary treatment and a binary instrumental variable that is both assumed to be random. In this chapter, I show that while IV methods allow for the pos- sible existence of unobserved confounders, additional monotonicity assumptions are required unless the strong constant effect is assumed. Furthermore, even when such monotonicity assumptions are satisfied, IV estimands are not necessarily equivalent to target parameters. 3 Chapter 2 Analysis of Randomized Experiments with Network Interference and Noncompliance 2.1 Introduction Randomized experiments have become a standard tool for causal inference in economics. In analyzing randomized experiments, the traditional approach is based on the Stable Unit TreatmentValue(SUTVA:Rubin(1990))assumptionwhichdictatesthatthereisnointerfer- ence between individuals. However, there are many settings where the SUTVA assumption fails to hold. For instance, deworming treatment given to some student may affect academic achievements of other students through externality effects (See for instance, Miguel and Kre- mer (2004a)). In labor market, Crépon et al. 
(2013) show that a large-scale job placement program affects non-participant’s employment probability through general equilibrium ef- fects. Ferracci et al. (2014) also report similar results. In such cases, there is interference or spillover effect where an individual’s behavior either directly or indirectly affects others’ outcomes through social interactions, externalities, or general equilibrium effects. 4 In recent years, there has been substantial progress in relaxing the SUTVA assumption in causal inference framework. Examples include Manski (2013), Hudgens and Halloran (2008), Leung (2020b), Vazquez-Bare (2020), and Baird et al. (2018). Much of the literature, however, has been built on the restrictive assumption of perfect compliance to intervention in which experimental units perfectly comply with their assignment of treatment. In practice, noncompliance occurs commonly — some units assigned to treatment group may opt out of the treatment, while some units assigned to control group may decide to take the treatment. In studies of labor market, for example, Crépon et al. (2013) report that only 35% of those who were offered intensive job counseling actually took up the offer. While instrumental variables(IV)methodsarewidelyusedtoaddressthenoncomplianceproblem, thesemethods are developed based on the assumption that rules out interference between units (Imbens and Angrist (1994a)). The goal of this paper is to develop a formal framework to conduct causal inference in randomized experiments with both spillovers and noncompliance. In the presence of non- compliance, spillovers can occur at two stages: at the treatment decision stage, and at the outcome realization stage. In the first stage in which each agent chooses their treatment status, spillovers may occur if the utility from choosing treatment depends on the treatment choices of others. In the second stage where outcomes (or responses) are realized, agent’s outcome can be affected not only by their own treatment choice, but also by treatment choices of others either directly or indirectly. While most of existing literature has only addressed the spillover effects at the outcome level (i.e., at the second stage), we allow for spillover effects both at the treatment choice (first stage) and at the outcome (second stage). Tomodelspillovers, wetakeagame-theoreticapproach. Weconsiderafirststagemodelin whichagentsplayabinarygameofincompleteinformation. Suchbinarygamesofincomplete information have been used in various economic applications, e.g., in empirical industrial or- ganization literature (Bajari et al. (2010)), to model binary choices under peer effects (Brock and Durlauf (2001), Brock and Durlauf (2007) and Xu (2018)), and recently, to model net- 5 work formation process (Leung (2015), and Ridder and Sheng (2020)). We apply the method to the problem of endogenous treatment choices in the presence of spillovers. Specifically, we assume that agents simultaneously choose their treatment status as to maximize their expected utilities, given beliefs about anticipated treatment choices of their neighbors. In equilibrium, agents’ subjective beliefs coincide with objective choice probabilities. Assuming that the unique equilibrium exists, the reduced-form model of agent’s treatment choice can be written as a single threshold-crossing model where the threshold is a function of agent’s own treatment assignment and the average equilibrium treatment choice probability of their neighbors. 
In the second stage, outcomes are modeled as being a function of agent’s own treatment choice and the equilibrium average treatment choice probability of their neighbors, as it is determined in the first stage game. As in the first stage choice model, spillovers are captured by the equilibrium treatment choice probabilities. In our model, therefore, equilibrium treatment choice probabilities work as a mediator of spillover effects. This is different from the existing literature which often models the spillover at the outcome level by the proportion of treated neighbors. See for instance Hudgens and Halloran (2008), Leung (2020b), and Vazquez-Bare (2020). As we show later, when the outcome of interest represents a choice or behavior of individuals, their formulation implicitly assumes that the proportion of treated neighbors is fully observable to agents, i.e., agents possess a complete information over behaviors of their peers. However, the assumption of complete information is unrealistic especially in a single large network setting as ours where each individual has a considerable number of peers. 1 In such cases, it is more reasonable to assume that agents face uncertainty over others’ behavior, making an incomplete information framework more adequate approximation of reality. We then characterize outcomes as a random coefficient model to allow for general un- observed heterogeneity. Our parameters of interest are average causal effects which include an average direct effect of own treatment take-up and an average spillover effect from direct 1 In our application, for instance, agents have 17 neighbors on average. 6 neighbors. After rigorously defining our parameters of interest, we show our identification result. We first note that under general unobserved heterogeneity, the conventional instru- mental variables (IV) methods do not identify the causal parameters when we allow for general heterogeneity in the outcome. We therefore propose our alternative identification based on a control function approach. We then propose a simple two-step estimator where the first step estimates the payoff parameters of treatment choice games using nested fixed-point maximum-likelihood estima- tion and the second step estimates the average potential outcome functions using control function regression. Our estimator extends canonical Heckman (1979) sample selection es- timator (“Heckit”) to incorporate possible spillover effects. We show that the estimators are p n-consistent and asymptotically normal under the “large-network” asymptotics in which a number of individuals connected in a single network increases to infinity. We study finite- sample properties of our estimators through Monte Carlo simulation. Our methods are applied to the randomized subsidy program of Dupas (2014). While the use of insecticide-treated nets (ITNs) has been shown to be effective in controlling malaria, the rate of adoption remains low. Given that the mosquito nets need to be re-purchased and replaced regularly, understanding the factors affecting household’s short-run and long-run decision to purchase the bednet is an important task to achieve sufficiently high equilibrium adoption rate. In our application, we study the effect of short-run purchase of the bednet on the long-run purchase decision while incorporating possible spillovers from neighbors defined by geographical proximity. 
The treatment is a binary indicator for purchasing a mosquito net in the short run (in Phase 1) and the outcome is a binary indicator for purchasing a mosquito net in the long run (in Phase 2). We find evidence of positive spillover effects in the short-run bednet purchase decision. More specifically, in Phase 1, households were more likely to purchase the bednet when the average expected purchase rate of their neighbors was higher. On the contrary, we find evidence of negative spillover effects in the long run, although the statistical power is limited.

Specifically, households were less likely to purchase the bednet in Phase 2 when the average expected purchase rate in Phase 1 was higher. Our results also suggest that the average direct effect of the bednet purchase in Phase 1 on the purchase in Phase 2 declines monotonically with respect to the expected neighborhood purchase rate in Phase 1. When the Phase-1 neighborhood purchase rate was 0% (no spillover), households who purchased the bednet in Phase 1 were 36.9 percentage points more likely to purchase the bednet in Phase 2 than those who did not purchase the bednet in Phase 1. This effect becomes almost zero at the other extreme, where the neighborhood purchase rate was 100% (full spillover). Ignoring spillover effects leads to the misleading conclusion that the average direct effect of the short-run purchase on the long-run purchase is almost zero when, in fact, the effect varies from 0% to 36% depending on the degree of spillovers.

Our structural modeling allows researchers to analyze the impact of counterfactual policies on the outcome of interest. We illustrate this by analyzing the impact of a counterfactual subsidy program on long-run adoption in which a policy-maker implements a means-tested subsidy rule where the subsidy is given only when the household's income level is below some pre-specified threshold. We predict the average long-run adoption rate under different subsidy regimes defined by different values of the eligibility threshold. We find that even under a very generous subsidy regime where almost everyone in the sample receives the subsidy, the average long-run adoption rate does not exceed 20%, due to the large negative spillover in the long run.

Related Literature

Recent works on causal inference under spillovers mainly concentrate on the case with random treatment, i.e., they do not address treatment choice endogeneity. Examples include Hudgens and Halloran (2008), Leung (2020b), and Vazquez-Bare (2020).

In the causal inference literature, game-theoretic models have been used in several papers. Lazzati (2015) proposes a structural model of treatment responses using games of complete information. However, the paper does not address the endogeneity of treatment choices. Balat and Han (2019) allow spillovers at both the choice and outcome stages using a game-theoretic approach. Their model is different from ours in that they model treatment choice as a binary game of complete (perfect) information. Also, Balat and Han (2019) consider interaction within groups while we consider interaction on a general network. While the assumption of complete information may be appropriate for interactions in a relatively small group, the incomplete information assumption is more reasonable under network interactions, especially when the network size is large. Jackson et al. (2020) model treatment choices as a binary game of incomplete information.
However, they do not consider spillovers at the outcome level while we are interested in separately identifying the individual treatment effect and spillover effect. Meanwhile a literature from statistics has started to incorporate spillovers and noncom- pliance in network setting. See Imai et al. (2020) for the most recent progress. Unlike our game-theoretic model, their model is reduced-form in nature and consequently, important aspects of economic mechanism behind treatment choices such as utility maximization are largely ignored. Outline We describe our model in Section 2.2. We first outline our model of treatment choices and then the model of potential outcomes. Parameters of interest are also discussed. Section 2.3 discusses identification of parameters of interest. We first show that the conventional IV methods are not valid in the presence of treatment effect heterogeneity. We then show how to use control function approach to achieve point identification. In Section 2.4, we propose a simple two-stage estimation procedure. Asymptotic properties are derived and simulation results are also presented. Section 2.5 applies our methods to empirical setting. 9 2.2 Model of Treatment Choices and Outcomes In this section, we first describe our treatment choice models as a binary game under incomplete information. We then describe our model of treatment responses under spillovers. Let N n =f1; ;ng denote a set of agents. n-many agents are connected through a single, large network. Let G be a symmetric nn adjacency matrix where ijth entry (G ij ) represents a connection or link between agents. Specifically, G ij = 1 if agent i and j are connected and G ij = 0 otherwise. We assume G ii = 0 for all i2 N n (no self-link). When G ij = 1, we say thati andj are (direct) peers or neighbors. LetN i be a set ofi’s peers, i.e., N i =fj2N n :G ij = 1g. The number of i’s neighbors or degree of i is denoted asjN i j. 2.2.1 Treatment Choice Model with Spillovers We consider a game theoretic model of treatment choice. Specifically, we characterize a realized treatment choice as a solution to a binary game under incomplete information played by agents in a given network. In this framework, agents simultaneously choose their treat- ment status in order to maximize their expected utility, given beliefs about the anticipated behaviors of their peers. Utility Each agent i has a vector of observed characteristics X i 2 X and an unobserved utility shockv i 2R. Throughout the paper, we assume thatX is a bounded subset ofR k . In addition, eachi is randomly assigned to treatment. LetZ i 2f0; 1g representi’s randomized treatment assignment where Z i = 1 if i is assigned to treatment and Z i = 0 if i is assigned to control. Let Z = (Z i ) i2Nn and X = (X i ) i2Nn . There is noncompliance if Z6= D, i.e., for some i, the treatment assignment is different from the actual treatment received. There are two possible cases for this: (Z i ;D i ) = (1; 0) and (Z i ;D i ) = (0; 1). The former indicates that i who was assigned to treatment group has refused to take the treatment. The latter indicates that i has received the treatment even when i was assigned to control group. In this paper, we allow for both cases, i.e., we consider a setting with two-sided noncompliance. 10 Unlike Z i , D i is self-selection. We assume that each i chooses D i 2f0; 1g by utility maximization where the utility that i receives depends on the choices of i’s peers. 
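Since every spillover term in the model is built from the same peer-averaging operation over the adjacency matrix, a small numerical sketch may help fix the notation. The code below is illustrative only: the tiny random graph, the array names, and the helper `peer_average` are ours, not the dissertation's. It constructs the neighbor sets $N_i$, the degrees $|N_i|$, and the average $\frac{1}{|N_i|}\sum_{j\in N_i} a_j$ of an arbitrary node-level quantity, which is the form of the spillover term entering the utility specification 2.1 below.

```python
import numpy as np

# Illustrative only: a tiny synthetic network standing in for the single large
# network G of Section 2.2 (symmetric 0/1 adjacency matrix, no self-links).
rng = np.random.default_rng(0)
n = 8
G = np.triu((rng.random((n, n)) < 0.4).astype(int), k=1)
G = G + G.T

neighbors = [np.flatnonzero(G[i]) for i in range(n)]  # N_i = {j : G_ij = 1}
degrees = G.sum(axis=1)                               # |N_i|

def peer_average(G, a):
    """(1/|N_i|) * sum_{j in N_i} a_j for every i; isolated nodes are guarded
    (the empirical application drops them)."""
    deg = G.sum(axis=1)
    return (G @ a) / np.maximum(deg, 1)

D = rng.integers(0, 2, size=n)           # hypothetical treatment choices
frac_treated_peers = peer_average(G, D)  # the spillover term in the utility 2.1
```

The same `peer_average` operation applied to a vector of choice probabilities, rather than realized choices, produces the neighborhood score that mediates spillovers in the outcome model.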
Let the utility function of agent $i$ be $U(D_i, D_{-i}, X_i, Z_i, v_i)$, where $D_{-i} \in \{0,1\}^{n-1}$ is the vector of treatment choices of all agents except $i$. We specify the utility function as the following linear model:

\[
U(D_i, D_{-i}, X_i, Z_i, v_i) =
\begin{cases}
X_i'\theta_1 + \theta_2 Z_i + \theta_3 \frac{1}{|N_i|}\sum_{j\in N_i} D_j - v_i & \text{if } D_i = 1,\\
0 & \text{if } D_i = 0.
\end{cases}
\tag{2.1}
\]

First note that the utility from choosing $D_i = 0$ is normalized to zero. This is without loss of generality, as only the difference in utilities is identified. The utility from choosing $D_i = 1$ depends on other agents' treatment choices through the term $\sum_{j\in N_i} D_j / |N_i|$, the fraction of peers taking up the treatment. This term represents social interactions or spillover effects in treatment choice. When $\theta_3 = 0$, there are no spillovers and the model becomes a usual single-agent binary choice model as in McFadden (1984). When $\theta_3 > 0$, we have positive spillovers: the utility of choosing $D_i = 1$ is higher when members of $i$'s reference group (direct neighbors in our specification) behave similarly. $\theta_3 > 0$ thus implies that agents have a preference for conformity. On the other hand, when $\theta_3 < 0$, we conclude that there are negative spillovers in treatment choice.

We assume that $v_i$ is private information, i.e., $v_i$ is known only to $i$, and other agents cannot observe $v_i$. Therefore agents have incomplete information over others' choices. In other words, $i$ cannot observe other players' treatment choices at the time their own choice is made. Instead, each agent $i$ chooses an action that maximizes their expected utility given their beliefs about $\sum_{j\in N_i} D_j / |N_i|$. Beliefs are formed under the information set available to $i$. Let $\mathcal{I}_i$ denote $i$'s information set. We specify $\mathcal{I}_i$ as follows:

Assumption 1 (informational structure). Let $G = (G_{ij})_{i,j\in N_n}$, $X = (X_i)_{i\in N_n}$, and $Z = (Z_i)_{i\in N_n}$. We assume that $(G, X, Z)$ is public information, i.e., every agent knows the entire network structure ($G$), the vector of observed characteristics ($X$), and the vector of treatment assignments ($Z$). On the other hand, $v_i$ is private information of $i$ whose value is known only to $i$. Therefore $\mathcal{I}_i = (G, X, Z, v_i)$ summarizes the information available to $i$.

Assumption 1 is standard in the literature on games of incomplete information. Let $S = (G, X, Z)$ be the set of public information. This is often called the public state variable as well. For the private information $v_i$, we make the following assumption:

Assumption 2 (unobserved heterogeneity). For all $i \in N_n$, the private information $v_i$ is (i) i.i.d. with a standard normal cdf and (ii) independent of $S$.

As in standard single-agent binary choice models, the distribution of $v_i$ must be known up to a finite-dimensional parameter. We use the normal distribution only for convenience; other distributional assumptions, such as the logit, can be used as well. The assumption that the $v_i$'s are independent of each other is critical for our identification analysis. This assumption implies that knowledge of $v_i$ does not help predict $v_j$ for any $j \neq i$. To our knowledge, identification of incomplete information games with correlated private information in a general network setting is an open question. Assumption 2 (ii) is trivially satisfied if we treat $S$ as fixed. Consequently, we do not address the issue of network endogeneity, as it is not a focus of this paper.

Strategy. Let $D_i(\mathcal{I}_i; \theta)$ denote $i$'s pure strategy, which maps $i$'s information set $\mathcal{I}_i = (S, v_i)$ to a treatment choice $D_i \in \{0,1\}$ given a parameter value $\theta = (\theta_1, \theta_2, \theta_3)$. Agent $i$ chooses her optimal action by maximizing her expected utility $E[U(D_i, D_{-i}, X_i, Z_i, v_i) \mid \mathcal{I}_i]$, where the expectation is taken with respect to $D_{-i}$ given her belief about $D_{-i}$. Let $\sigma_{j,i}$ be $i$'s belief over the event $\{D_j = 1\}$ given the information $\mathcal{I}_i$. Then

\[
\sigma_{j,i} \overset{\text{def}}{=} \Pr(D_j = 1 \mid \mathcal{I}_i) \tag{2.2}
\]
\[
= \Pr(D_j(\mathcal{I}_j; \theta) = 1 \mid \mathcal{I}_i) \tag{2.3}
\]
\[
= \Pr(D_j(S, v_j; \theta) = 1 \mid S, v_i) \tag{2.4}
\]
\[
= \Pr(D_j(S, v_j; \theta) = 1) \tag{2.5}
\]
\[
= \sigma_j(S, \theta) \tag{2.6}
\]

where the fourth equality follows from Assumption 2. From the last equality, we see that $\sigma_{j,i} = \sigma_j$ for all $i \neq j$, i.e., every agent shares a common belief about $j$'s choice. This common belief must be consistent with the actual probability of $j$ choosing $D_j = 1$ under rational expectations, as we show below.

Equilibrium. Given the belief profile $\{\sigma_j(S,\theta)\}_{j\neq i}$, agent $i$ calculates the expected utility from choosing $D_i = 1$ as follows:

\[
E\big[U(1, D_{-i}, X_i, Z_i, v_i) \mid \mathcal{I}_i\big]
= E\Big[ X_i'\theta_1 + \theta_2 Z_i + \theta_3 \tfrac{1}{|N_i|}\textstyle\sum_{j\in N_i} D_j - v_i \,\Big|\, S, v_i \Big] \tag{2.7}
\]
\[
= X_i'\theta_1 + \theta_2 Z_i + \theta_3 \tfrac{1}{|N_i|}\textstyle\sum_{j\in N_i} \underbrace{\Pr(D_j = 1 \mid S)}_{=\,\sigma_j(S,\theta)} - v_i \tag{2.8}
\]
\[
= X_i'\theta_1 + \theta_2 Z_i + \theta_3 \tfrac{1}{|N_i|}\textstyle\sum_{j\in N_i} \sigma_j(S,\theta) - v_i. \tag{2.9}
\]

Agent $i$ chooses $D_i = 1$ if $E[U(1, D_{-i}, X_i, Z_i, v_i) \mid \mathcal{I}_i] \geq 0$. Therefore,

\[
D_i = 1\Big\{ v_i \leq X_i'\theta_1 + \theta_2 Z_i + \theta_3 \tfrac{1}{|N_i|}\textstyle\sum_{j\in N_i} \sigma_j(S,\theta) \Big\}.
\]

A Bayes-Nash equilibrium (BNE) is defined by a vector of choice probabilities $\sigma^*(S,\theta) = \big(\sigma_i^*(S,\theta)\big)_{i\in N_n}$ that is consistent with the observed decision rule in the sense that it satisfies the following system of equations:

\[
\sigma_i^*(S,\theta) = \Pr\Big( v_i \leq X_i'\theta_1 + \theta_2 Z_i + \theta_3 \tfrac{1}{|N_i|}\textstyle\sum_{j\in N_i} \sigma_j^*(S,\theta) \Big), \quad \forall i \in N_n \tag{2.10}
\]
\[
= \Phi\Big( X_i'\theta_1 + \theta_2 Z_i + \theta_3 \tfrac{1}{|N_i|}\textstyle\sum_{j\in N_i} \sigma_j^*(S,\theta) \Big), \quad \forall i \in N_n. \tag{2.11}
\]

Here we use the superscript $*$ to emphasize that $\sigma^*(S,\theta)$ is an equilibrium quantity. In other words, a Bayes-Nash equilibrium given $(S,\theta)$ is a vector $\sigma^*(S,\theta)$ defined as a fixed point of the system of equations above. By the implicit function theorem, it can be shown easily that $\sigma^*(S,\theta)$ is smooth in both $S$ and $\theta$. Therefore the existence of a fixed point is guaranteed by Brouwer's fixed point theorem for any realized data $S$ and parameter value $\theta$. However, there can be many fixed points $\sigma^*(S,\theta)$ solving the system. We show that a unique equilibrium exists if we restrict the value of $\theta_3$ to be sufficiently mild. Formally,

Theorem 3 (unique equilibrium). Let the pdf of $v_i$ be $\phi(v)$. Define $\Lambda = |\theta_3| \sup_u \phi(u)$. For any $S$ and $\theta$, there exists a unique equilibrium $\{\sigma_j^*(S,\theta)\}_{j\in N_n}$ if $\Lambda < 1$.

See Appendix A.1 for the proof. When $v_i$ is normally distributed, we have $\sup_u \phi(u) = 1/\sqrt{2\pi}$. Therefore $\Lambda < 1$ is equivalent to $|\theta_3| < \sqrt{2\pi} \approx 2.5$. Throughout the paper we assume that $\Lambda < 1$ so that the degree of interaction is not so strong as to breed multiple equilibria.

Assumption 4 (unique equilibrium). $|\theta_3| < \sqrt{2\pi}$.

Under the unique equilibrium, the agent's treatment choice can be written as the following reduced-form equation:

\[
D_i = 1\Big\{ v_i \leq X_i'\theta_1 + \theta_2 Z_i + \theta_3 \tfrac{1}{|N_i|}\textstyle\sum_{j\in N_i} \sigma_j^*(S,\theta) \Big\} \tag{2.12}
\]
\[
\iff D_i = 1\Big\{ \Phi(v_i) \leq \Phi\Big( X_i'\theta_1 + \theta_2 Z_i + \theta_3 \tfrac{1}{|N_i|}\textstyle\sum_{j\in N_i} \sigma_j^*(S,\theta) \Big) \Big\} \tag{2.13}
\]
\[
\iff D_i = 1\big\{ \Phi(v_i) \leq \sigma_i^*(S,\theta) \big\} \tag{2.14}
\]

where the last step follows from 2.11.

The story goes like this: for given $S$ and $\theta$, the equilibrium choice probabilities $\sigma_i^*(S,\theta)$, $\forall i \in N_n$, are realized. Observing this equilibrium, each agent chooses their treatment status according to 2.12, 2.13, or 2.14, which are equivalent.

2.2.2 Potential Outcomes Model with Spillovers

In this section, we propose our model of treatment response in settings with spillovers. Previous research on treatment response has been based on the SUTVA assumption, which requires that an individual's outcome depends only on their own treatment status.
Under the SUTVA assumption, i’s outcome or response Y i can be written as Y i = Y i (D i ). Let d2f0; 1g be the possible treatment value that agents can get. Potential outcome under the SUTVA assumption is denoted by Y i (d), which delivers the response of i when assigned to D i =d. Unlike the SUTVA case, however, there is no obvious way to model spillovers in the treatment response. As Manski (2013) and Kline and Tamer (2020) show, there are many ways to relax the SUTVA assumption, each of which is based on different restrictions on the nature of interference between agents. In our paper, we assume that i’s outcome is a function of a direct effect from own treatment status and an indirect effect or spillover effect fromi’s neighbors. Spillover effects are assumed to be mediated by P j2N i j (S;)=jN i j. For notational simplicity, let us define 15 i (S;) = P j2N i j (S;)=jN i j. Also, i and i (S;) will be used interchangeably. Thus, we write the realized outcome of i as follows: Y i =Y i (D i ; i ) where i = i (S;) is the average of equilibrium treatment choice probabilities of i’s neigh- bor. From now on, we simply refer to i as i’s “neighborhood (propensity) score”. This is the average value of propensity scores of i’s direct neighbors where each score measures the probability of taking up the treatment given the public information S. Jackson et al. (2020) have termed the same object as “peer-influenced propensity score”. Let 2 [0; 1] be the possible value that i can take. The potential outcome Y i (d;) represents i’s response when we exogenously assign D i =d and i =. Concretely, Y i (1;) represents i’s outcome when i is required to be treated and i’s neighborhood score has been exogenously set to . Similarly Y i (0;) isi’s outcome when i is forbidden to be treated and i’s neighborhood score has been exogenously set to . Underlying assumption is that it is possible to manipulate the value of D i and i . Since i is a function of the public state variable S = (G;X;Z), we can conceivably manipulate the value of i by changing Z for a given (G;X), which is assumed to be predetermined and non-manipulable. Thus Y i (d;) can be realized through changingZ profile in the population in a way that it induces i = as an equilibrium in the first-stage and then requiring i to choose D i =d. 2 Comparison to other approaches The existing literature with interference often models potential outcomes as a function of own treatment status and the proportion of treated neighbors or the number of treated neighbors (e.g. Hudgens and Halloran (2008), Leung (2020b), Vazquez-Bare (2020)). Define D i P j2N i D j =jN i j with a generic value d2 [0; 1]. Such models then write the realized outcome as Y i =Y i (D i ; D i ) and the potential outcomes 2 Note that some combination (d;) may represent off-the-equilibrium quantity. Thus, the resulting Y i (d;) may not be a policy-relevant counterfactual. Nevertheless, to define causal effects rigorously, we need to consider every possible combinations of (d;)2f0; 1g [0; 1]. 16 asY i (d; d). Our model differs from theirs in that we model spillovers via ex ante (anticipated) expectation of D i rather than ex post realization of D i itself. Recall that i (S;) =E[ D i jS]. Since the difference between D i and i (S;) has a mean zero (i.e., E[ D i i (S;)jS] = 0), in practice the values of these two quantities may not be too different, especially whenjN i j is large. Nevertheless, they are based on two different behavioral assumptions. 
Suppose that the outcome of interest represents decision or behavior of agents. Then the formulation Y i = Y i (D i ; D i ) is derived under the assumption that agents base their decisions on D i rather than expected D i . This is realistic only when D i is fully observed at the time decision on Y i is made. Thus, the model could be interpreted as a model with complete or perfect information. On the other hand, our specification Y i = Y i (D i ; i ) assumes that agents do not fully observe D i when they decide theirY i . Thus agents face an intrinsic uncertainty over others’treatmentchoicesevenatthesecond-stage. Thisisplausiblewhenthereferencegroup is relatively large so that it is not easy for agents to fully observe the value of D i . Also, there are settings where agents are reluctant to reveal their treatment status — For instance when treatment represents learning about their HIV status as in Godlonton and Thornton (2012). In such cases, it may be more realistic to assume that agents have private information even in the second stage. Unlike D i , the equilibrium neighborhood score i is always observable to agents as it is a function of public information S. Thus it is plausible that agents base their decisions on the equilibrium quantity i which signals a priori prevalence of treatment adoption in the neighborhood. Random Coefficients Model of Potential Responses We put more structure on Y i (d;) by using random coefficients model where we allow for a correlation between in- dividual treatment status and random coefficients. Therefore our model can be seen as a correlated random coefficient model as in Masten and Torgovitsky (2016) and Wooldridge (2003). 17 Assumption 5 (random coefficient model). (i) For any i2N n , d2f0; 1g and 2 [0; 1], we have Y i (1;) = 1i + 1i ; Y i (0;) = 0i + 0i where ( 1i ; 1i ) and ( 0i ; 0i ) are unit-specific coefficients. (ii) For S = (G;X;Z), unit-specific coefficients satisfy the following restrictions: E[ 1i jS] =E[ 1i jX i ] =X 0 i 1 ; & E[ 1i jS] =E[ 1i jX i ] =X 0 i 1 and similarly, E[ 0i jS] =E[ 0i jX i ] =X 0 i 0 ; & E[ 0i jS] =E[ 0i jX i ] =X 0 i 0 : Recall that Y i (1;) represent i’s response when i is given the treatment and i’s neigh- borhood score had been exogenously set to . Under the Assumption 5 (i), such response is assumed to be linear in with the intercept 1i and the slope 1i that are allowed to be different across agents. Similarly, Y i (0;) is assumed to be linear in with the intercept 0i and the slope 0i . Note that unit-specific coefficients under the treatment, ( 1i ; 1i ), are allowed to be different from those without the treatment, ( 0i ; 0i ) for generality. The assumption that affects the potential outcomesY i (1;) andY i (0;) in a linear way is only for convenience. It is straightforward to extend our model to include higher-order terms such as 2 , e.g., Y i (d;) = d;i + d;i + d;i 2 for d2f0; 1g. Unit-specific coefficients are unobservable random variables that are potentially depen- dent on unit’s observed covariates. By Assumption 5 (ii), we assume that the observed parts of the coefficients depend on the public state variable S = (G;X;Z) only through X i . Importantly, this assumption implies that Z is irrelevant for the random coefficients. This rules out the case that the treatment assignment vector Z = (Z i ;Z i ) directly affects Y i . 18 This is the standard exclusion restriction of instruments. Therefore under this assumption, Z is given a status of an instrumental variable. 
The assumption that G is redundant is only for convenience as we can always include network statistics such as the number of direct peers in X i . Finally, that the conditional expectation is linear in X i is also for convenience as we can always allow X i to include nonlinear functions of underlying covariates. Under Assumption 5 (ii), we can decompose the unit-specific coefficients into its mean part given X i , and its deviation from mean as follows: 1i =X 0 i 1 +u 1i ; E[u 1i jS] = 0; 1i =X 0 i 1 +e 1i ; E[e 1i jS] = 0: Analogously for D i = 0 as well: 0i =X 0 i 0 +u 0i ; E[u 0i jS] = 0; 0i =X 0 i 0 +e 0i ; E[e 0i jS] = 0: Therefore the potential outcomes can be written as Y i (1;) =X 0 i 1 +u 1i + X 0 i 1 +e 1i ; E[u 1i jS] =E[e 1i jS] = 0; Y i (0;) =X 0 i 0 +u 0i + X 0 i 0 +e 0i ; E[u 0i jS] =E[e 0i jS] = 0; while the observed outcome is given as follows: Y i =Y i (D i ; i ) = 8 > > < > > : X 0 i 1 +u 1i + i X 0 i 1 +e 1i if D i = 1 X 0 i 0 +u 0i + i X 0 i 0 +e 0i if D i = 0 19 Our model contains the four-dimensional error term: i = (u 1i ;e 1i ;u 0i ;e 0i ). By construction, i are uncorrelated with S, i.e., E[ i jS] = 0. By having i , random coefficients are allowed to be heterogeneous even after controlling for relevant observed characteristics X i . The importanceofallowingforsuchunobservedheterogeneityhasbeenemphasizedinthemodern program evaluation literature (See, e.g., Heckman (2001), Heckman et al. (2006) and Imbens (2007)). 2.2.3 Parameters of Interest In this section, we formally define our parameters of interest, the class of average casual effects. For this purpose, let us first study average potential outcomes functions. Average potential outcomes Under our specifications, average potential outcomes for agents with X i =x are computed as follows: for 2 [0; 1], E[Y i (1;)jX i =x] =x 0 1 + (x 0 1 ); E[Y i (0;)jX i =x] =x 0 0 + (x 0 0 ): Integrating them over identically distributed X i gives the unconditional average potential outcomes. Letting X =E[X i ], E[Y i (1;)] = 0 X 1 + ( 0 X 1 ) (2.15) = 1m + 1m ; (2.16) E[Y i (0;)] = 0 X 0 + ( 0 X 0 ) (2.17) = 0m + 0m (2.18) where ( 1m ; 1m ; 0m ; 0m ) = ( 0 X 1 ; 0 X 1 ; 0 X 0 ; 0 X 0 ). Since X is identifiable from the data, identification of ( 1m ; 1m ; 0m ; 0m ) requires one to identify ( 1 ; 1 ; 0 ; 0 ). ( 1m ; 0m ) represent the baseline mean potential outcomes when we set = 0, i.e., ( 1m ; 0m ) = (E[Y i (1; 0)];E[Y i (0; 0)]). Effect of is captured by ( 1m ; 0m ). 20 On the other hand, ( 1 ; 1 ; 0 ; 0 ) measures the heterogeneous effect of X i on the mean potential outcomes. To see this, notice that the following equations hold: E[Y i (1;)jX i =x] = x 0 1 +x 0 1 = E[Y i (1;)] + (x X ) 0 1 +(x X ) 0 1 ; E[Y i (1;)jX i =x] = x 0 0 +x 0 0 = E[Y i (0;)] + (x X ) 0 0 +(x X ) 0 0 : Therefore ford2f0; 1g, ( d ; d ), without constant coefficients parts, explains the difference between E[Y i (d;)jX i =x] and E[Y i (d;)]. Average causal effects Given the average response functions, we now define average causal effects, which are our parameters of interest. Let us define the average direct effect (ADE) of own treatment under as follows: ADE() =E[Y i (1;)Y i (0;)]: ADE() measures the average change in outcomes under the regime in which i is required to chooseD i = 1, compared to the regime in which i is forbidden to choose D i = 1 whilei’s neighborhood score is fixed to . 
Under our random coefficients specification, ADE() can be written as ADE() = 1m 0m + ( 1m 0m ): Similarly, we define average spillover effect (ASE) from changing the neighborhood score from to ~ for each d2f0; 1g as follows: ASE(; ~ ;d) =E[Y i (d; ~ )Y i (d;)] = (~ ) dm ; 21 which measures the effect of changing the neighborhood score from to ~ while fixing agent’s treatment status atD i =d. Whether 0m = 0 or 1m = 0 is of interest as it indicates whether there are treatment spillovers at the outcome level. 2.2.4 Source of Endogeneity In sum, our model of treatment choices and outcomes can be written as the following semi-triangular system: Y i =Y i (D i ; i ) = 8 > > < > > : X 0 i 1 +u 1i + X 0 i 1 +e 1i i if D i = 1 X 0 i 0 +u 0i + X 0 i 0 +e 0i i if D i = 0 (2.19) D i =1fv i X 0 i 1 + 2 Z i + 3 i g (2.20) s.t. i = X 0 i 1 + 2 Z i + 3 i ; 8i2N n : (2.21) Using the formulaY i =D i Y i (1; i ) + (1D i )Y i (0; i ) =Y i (0; i ) +D i (Y i (1; i )Y i (0; i )), 2.19 can be written as follows: Y i =X 0 i 0 + i X 0 i 0 +D i X 0 i ( 1 0 ) +D i i X 0 i ( 1 0 ) + i (2.22) where i =u 0i + i e 0i +D i u 1i u 0i + i (e 1i e 0i ) : (2.23) Equation2.22givestheconventionallinearregressionmodel. Naturally, onemayconsider estimating ( 1 ; 0 ; 1 ; 0 ) by the least squares regression of Y i on (X i ; i X i ;D i X i ;D i i X i ). Resulting OLS estimator is consistent only when i is uncorrelated with the regressors, i.e., 22 E[ i jD i ;X i ; i ] = 0 which requires that the following two conditions hold: E[u 0i + i e 0i jD i = 0;X i ; i ] = (a) E[u 0i + i e 0i jX i ; i ] = (b) 0; E[u 1i + i e 1i jD i = 1;X i ; i ] = (a) 0 E[u 1i + i e 1i jX i ; i ] = (b) 0 0: Since i = (u 1i ;u 0i ;e 1i ;e 0i ) are uncorrelated withS = (G;X;Z) by construction, (b) and (b) 0 are automatically satisfied. Therefore, we only need to show that (a) and (a) 0 are satisfied. This is true only whenD i is uncorrelated with i conditional on (X i ; i ). This is the familiar selection-on-observables assumption. Such assumption is unlikely to hold if the treatment group and control group are systematically different in their unobserved factors i even after controlling for all relevant observables. Indeed, the very fact that agents with the same observed characteristics (X i ; i ) have made different treatment choices suggests that they differ in their unobserved factors. Thus, the source of endogeneity comes from the correlation between v i and i even after conditional on S. More specifically, note that the selection-on-observables assumption requires that the following two conditions hold: Corr(Y i (0; i );D i jX i ; i ) = 0 (2.24) and Corr(Y i (1; i )Y i (0; i );D i jX i ; i ) = 0: (2.25) Condition 2.24 requires that the idiosyncratic part of Y i (0; i ) is uncorrelated with D i , i.e., in the absence of the treatment, there should be no difference in the mean potential outcomes across treatment group and control group once we account for relevant observables (X i ; i ). However, agents who take up the treatment may have unusual values of Y i (0;) even after controlling for (X i ; i ). If individuals who take up the treatment tend to have 23 higher values of Y i (0;) in terms of unobservables, then the naive least squares regression would suffer from an upward bias sincecov(D i ; i jS)> 0. This is the case of classic selection problem. The requirement 2.25 is also troublesome as the condition implies that the unobserved gain from the treatment given i should not vary across treatment group and control group. 
This is not satisfied if the treatment choice is correlated with unobserved gains from the treatment. It is plausible that agents have some knowledge of likely idiosyncratic gains from the treatment at the time they choose their treatment status. If agent’s treatment choice is partially based on such knowledge, then 2.25 would not be satisfied. This type of sorting on the unobserved gain, termed “essential heterogeneity" by Heckman et al. (2006), has been emphasized in the modern program literature. Inconclusion, wheneverselectionproblemoressentialheterogeneityexists, thenaiveOLS regression delivers inconsistent estimates of structural parameters ( 1 ; 0 ; 1 ; 0 ). 2.3 Identification In the previous section, we showed that the OLS regression of 2.22 suffers from bias when v i is correlated with i = (u 1i ;u 0i ;e 1i ;e 0i ) even when we control forS. In this section, we first show that the IV methods do not identify the casual parameters of interest in the presence of general heterogeneity. We then propose the alternative method known as control function approach. 2.3.1 The Problem of Conventional IV Methods Endogeneity is often addressed by IV methods such as two-stage least squares (2SLS). In our setup, Z i is a valid IV for D i since (i) D i is correlated with Z i , and (ii) Z i is exogenous and is excluded from the outcome equation. In fact, in the presence of spillovers in the first stage, not only Z i but also n-dimensional vector Z = (Z i ;Z i ) is a valid instrument for D i 24 since in that case, D i is a function of entire assignment vector Z. 3 . Therefore, we may run an IV regression to 2.22 where we instrument D i by Z i or by Z = (Z i ;Z i ), depending on whether spillovers exist in the first stage. We argue that such strategy does not identify ( 0 ; 0 ; 1 ; 1 ) in our setup. Suppose we instrumentD i byZ i . TheresultingIVestimatorisconsistentonlywhentheE[ i jZ i ;X i ; i ] = 0 where i =u 0i + i e 0i +D i u 1i u 0i + i (e 1i e 0i ) as in 2.23. Note that, E[ i jZ i ;X i ; i ] = E[u 0i + i e 0i +D i u 1i u 0i + i (e 1i e 0i ) jZ i ;X i ; i ] = E[u 0i + i e 0i jZ i ;X i ; i ] | {z } A +E[u 1i u 0i + i (e 1i e 0i )jD i = 1;Z i ;X i ; i ] | {z } B Pr(D i = 1jZ i ;X i ; i ) | {z } C : A = 0 since i = (u 1i ;u 0i ;e 1i ;e 0i ) is uncorrelated with S, and thereby with (Z;X i ; i ). C cannotbezeroexceptfortrivialcases. ThereforeE[ i jZ;X i ; i ] = 0onlywhenB = 0. Thisis satisfiedwhenE[u 1i u 0i + i (e 1i e 0i )jD i = 1;Z i ;X i ; i ] =E[u 1i u 0i + i (e 1i e 0i )jZ i ;X i ; i ] as E[ i jS] = 0 implies that the last term is zero. Note that u 1i u 0i + i (e 1i e 0i ) can be interpreted as an idiosyncratic part ofY i (1; i )Y i (0; i ). Therefore we need to assume that D i is uncorrelated with the idiosyncratic gain from taking the treatment once we condition on (Z i ;X i ; i ). Such requirement is unrealistic when agents have some knowledge on their idiosyncratic gains and base their treatment decision on such knowledge, i.e., when there is sorting on unobserved gains. Whether the u 1i u 0i + i (e 1i e 0i ) is correlated with D i is an empirical matter and should not be settled a priori. IV methods rule out the possibility of such correlation and are subject to failure when the correlation exists. This point has also been pointed out in 3 Recall that when there exist spillovers in the first stage choice model, not only i’s direct neighbor’s Z but indirect neighbors’Z also affectD i . ThereforeZ j forj that are eventually connected toi is also relevant for D i . 
However as the network distance between i and j becomes greater, the dependence between Z j and D i decays exponentially when < 1. (See Xu (2018) and Leung (2020a)). Therefore, using Z j that is too far from i as an IV may incur weak IV problem. 25 the traditional treatment effect literature which rules out spillover effects. (See Hahn and Ridder (2011)). For instance, it is now well established in the literature that IV/2SLS does not recover the average causal parameters such as ATE under the heterogeneous responses model such as random coefficients models (See Imbens and Angrist (1994a)). 2.3.2 Control Function Approach Wenowproposethealternativestrategyknownasthecontrolfunctionapproach. Control function approach addresses the endogeneity problem by explicitly formulating the depen- dence between outcomes and treatments. To apply this method, we first write the observed conditional means E[Y i jD i = 1;S] and E[Y i jD i = 0;S] as follows: E[Y i jD i = 1;S] = E[Y i jD i = 1; i (S;); i (S;);S] = E[Y i (1; i (S;))jv i 1 ( i (S;)); i (S;); i (S;);S] = X 0 i 1 +E[u 1i jv i 1 ( i (S;));S] + i (S;) n X 0 i 1 +E[e 1i jv i 1 ( i (S;));S] o since D i = 1() (v i ) i (See 2.14). Similarly, the observed conditional mean for the control group is, E[Y i jD i = 0;S] = X 0 i 0 +E[u 0i jv i > 1 ( i (S;));S] + i (S;) n X 0 i 0 +E[e 0i jv i > 1 ( i (S;));S] o : The terms E[u 1i jv i 1 ( i (S;));S];E[e 1i jv i 1 ( i (S;));S] 26 and E[u 0i jv i > 1 ( i (S;));S];E[e 0i jv i > 1 ( i (S;));S] are“controlfunctions"whichaccountfortheendogeneityofD i . Assumption6belowrestricts the form of these control functions. Assumption 6. For all i2N n , i = (u 1i ;u 0i ;e 1i ;e 0i ) satisfies the following conditions. (i) i is i.i.d. and is independent of S. (ii) E[ i jv i ] is a linear function of v i . Under these two conditions, we write E[u 1i jv i ;S] =E[u 1i jv i ] = u 1 v i ; E[e 1i jv i ;S] =E[e 1i jv i ] = e 1 v i ; E[u 0i jv i ;S] =E[u 0i jv i ] = u 0 v i ; E[e 0i jv i ;S] =E[e 0i jv i ] = e 0 v i where = ( u 1 ; e 1 ; u 0 ; e 0 ) captures the covariances between each component of i and v i . Assumption 6 (i) is often referred to as “separability" assumption and has been utilized in literature as in Carneiro et al. (2011) and Brinch et al. (2017). Under this assumption, the control functions depend only on the individual propensity score i (S;), e.g., E[u 1i jv i 1 ( i (S;));S] =E[u 1i jv i 1 ( i (S;))] so that the control functions are separated from S. As a result, E[Y i jD i = 1;S] and E[Y i jD i = 0;S] depend on S only though (X i ; i ; i ). This step is necessary since it is not possible to control for S = (G;X;Z) itself as our data consist of one large network. Assumption 6 (ii) further allows us to write E[u 1i jv i 1 ( i )], for instance, as u 1 E[v i jv i 1 ( i )]: 27 Combined with the normality assumption onv i , we effectively assume that ( i ;v i ) are jointly normal. However, it can easily accommodate alternative distributional assumptions on v i other than normality. Under the joint normality assumption, control functions take a form of inverse mills ratio. Define 1 () and 0 () as follows: For 2 (0; 1), 1 () = ( 1 ()) ; 0 () = ( 1 ()) 1 : It follows that E[Y i jD i = 1;S] =X 0 i 1 + u 1 1 ( i ) + i X 0 i 1 + e 1 1 ( i ) ; E[Y i jD i = 0;S] =X 0 i 0 + u 0 0 ( i ) + i X 0 i 0 + e 0 0 ( i ) : Let i = D i 1i + (1D i ) 0i . We see that ( 1 ; 1 ; u 1 ; e 1 ) is identified by regressing Y i on (X 0 i ; i ; i X 0 i ; i i ) 0 usingthesubsampleofD i = 1. 
Similarly, wecanidentify ( 0 ; 0 ; u 0 ; e 0 ) by regressingY i onX i ; i and their interactions with i using the subsample ofD i = 0. The inclusion of i accounts for the correlation between i and v i so that we can test for the endogeneity of D i by checking whether correlations are collectively zero or not. Our model achieves a point identification by exploiting a functional form assumption between i andv i . We can relax the linearity assumption and have more flexible parametric functional form by adding higher-order terms. For instance, we may specify E[u 1i jv i ] as the quadratic function of v i as follows: E[u 1i jv i ] = u 1 v i + ~ u 1 v 2 i : Then it can be shown that E[u 1i jv i 1 ( i )] = u 1 ( 1 ( i )) i + ~ u 1 h 1 ( i ) ( 1 ( i )) i + n ( 1 ( i )) i o 2 i : 28 This also offers a way to test for linearity assumption in a spirit of Lee (1984). 2.4 Estimation We propose a two-stage estimation procedure. In the first-stage, we estimate the treat- ment choice games using a nested fixed point maximum likelihood (NFXP-ML) method. In the second-stage, using first-stage estimates, we estimate regression models of treatment outcomes with generated regressors. 2.4.1 First-Stage Estimation Recall that the treatment choice models boil down to equation 2.20 subject to the fixed- point requirement 2.28. Our sample log-likelihood function are defined as follows: b L n () = 1 n n X i=1 n D i ln i (S;) + (1D i ) ln(1 i (S;)) o (2.26) Our estimator ^ = ( ^ 1 ; ^ 2 ; ^ 3 ) is defined as the maximizer of b L n () subject to the constraint thatf i (S; ^ )g satisfies the fixed-point requirement. Formally, ^ = arg max 2 b L n () (2.27) subject to i (S; ^ ) = X 0 i ^ 1 + ^ 2 Z i + ^ 3 1 jN i j X j2N i j (S; ^ ) ; 8i2N n (2.28) For computation, we use the nested fixed point (NFXP) algorithm. Specifically, starting with an arbitrary initial guess for ^ , we find the fixed point of 2.28 via contraction iterations (it can be shown that 2.28 is a contraction mapping when < 1). We then compute the 29 log-likelihood function 2.26 using the obtained conditional choice probabilities. Update ^ to ^ 0 according to, say, Newton’s method. Iterate the procedure until a sequence of estimates converges. Our NFXP-ML estimator is taken as its limit. 2.4.2 Second-Stage Estimation Let us define the set of regressors as W i = [X 0 i ; i ; i (S;)X 0 i ; i (S;) i ] 0 where i =D i 1i + (1D i ) 0i with 1i = 1 ( i (S;)) and 0i = 0 ( i (S;)). Our estimators are based on the following moment conditions E[Y i jD i = 1;S] =W 0 i 1 ; E[Y i jD i = 0;S] =W 0 i 0 where 1 = ( 1 ; u 1 ; 1 ; e 1 ) 0 and 0 = ( 0 ; u 0 ; 0 ; e 0 ) 0 . This suggests that 1 and 0 can be estimated by regressing Y i on W i , separately to the subsample with D i = 1 and D i = 0, respectively. However, since i and i are functions of unknown first-stage parameters , we need to replace with ^ . Define ^ 1i = 1 ( i (S; ^ )) and ^ 0i = 0 ( i (S; ^ )). Let ^ i = D i ^ 1i + (1D i ) ^ 0i . Similarly, we replace the unknown quantity i (S;) 1 jN i j P j2N i j (S;) with ^ i = i (S; ^ ) = 1 jN i j P j2N i j (S; ^ ). 
Thus, our generated regressor ^ W i for W i is ^ W i = [X 0 i ; ^ i ; ^ i X 0 i ; ^ i ^ i ] 0 : 30 Estimator for 1 is then defined as ^ 1 = arg min 1 1 n n X i=1 D i Y i ^ W 0 i 1 2 = n n X i=1 D i ^ W i ^ W 0 i o 1 n X i=1 D i ^ W i Y i : Similarly, estimator for 0 is ^ 0 = arg min 0 1 n n X i=1 (1D i ) Y i ^ W 0 i 0 2 = n n X i=1 (1D i ) ^ W i ^ W 0 i o 1 n X i=1 (1D i ) ^ W i Y i : 2.4.3 Inference For the asymptotic analysis, we consider large-network asymptotics in which a number of individuals connected in a single network goes to infinity. Moreover, for each n, we treat S = (G;X;Z) as fixed. This is justified since S is an ancillary statistics, i.e., S does not contain any information on the parameters of interest. 2.4.3.1 Inference for First-Stage Game We first establish p n-consistency and asymptotic normality of the first-stage estimator ^ . The true parameter is denoted by 0 . Therefore our datafD i g n i=1 is assumed to be generated from D i =1fv i X 0 i 0 1 + 0 2 Z i + 0 3 i (S; 0 )g subject to i (S; 0 ) = X 0 i 0 1 + 0 2 Z i + 0 3 i (S; 0 ) for all i2N n . Theorem 7 (consistency of ^ ). Under the following assumptions, ^ 0 p ! 0. 1. The true parameter 0 = ( 0 1 ; 0 2 ; 0 3 ) lies in a compact set R dim() andj 0 3 j< p 2. The support of X i is a bounded subset ofR k . 31 2. Let R i = (X 0 i ;Z i ; i (S; 0 )) 0 . For large enough n, P n i=1 R i R 0 i is invertible, i.e., lim inf n!1 det( n X i=1 R i R 0 i )> 0: See Appendix A.2.1 for the proof. Assumption 1 ensures that there is unique equilibrium at the true parameter (See The- orem 3) and that each equilibrium probability i (S;)2 (0; 1) for all i. Assumption 2 is the rank condition for identification which requires that for all large enough n. the moment matrix of regressors has full rank. We now establish asymptotic normality of ^ . Let us define the information matrix as follows: I n () =E h 1 n n X i=1 r l i ()r l i () 0 S i wherel i () =D i ln i (S;)+(1D i ) ln(1 i (S;)) is the individual log-likelihood function. Thereforer l i () is given by r l i () =D i r i (S;) i (S;) + (1D i ) r i (S;) 1 i (S;) : (2.29) Theorem 8 (asymptotic normality of ^ ). In addition to the conditions for Theorem 7, assume 1. The true parameter 0 lies in the interior of the compact set R dim() . 2. For any n,I n ( 0 ) is nonsingluar. Then (I 1 n ( 0 )) 1=2 p n( ^ 0 ) d !N(0;I dim() ) (2.30) 32 where I dim() is the dim()dim() identity matrix. See Appendix A.2.2 for proof. Variance Estimation The asymptotic variance of ^ can be estimated by d Var( ^ ) = b I 1 n =n where b I n 1 n n X i=1 r l i ( ^ )r l i ( ^ ) 0 : In order to computer l i ( ^ ) using equation 2.29, we need to evaluater i (S; ^ ). For this we use the numerical approximation method: Take ^ + for a small perturbation (e.g., = 10 5 ), then compute the new equilibriumf i (S; ^ +)g n i=1 by solving the fixed point. r i (S; ^ ) is then computed by ( i (S; ^ +) i (S; ^ ))=. 2.4.3.2 Inference for Second-Stage Regression Next, we establish p n-consistency and asymptotic normality of the second-stage estima- tors (^ 1 ; ^ 0 ). Let us denote the true parameters by ( 0 1 ; 0 0 ). We assume that our model is correctly specified, i.e., Y i satisfies the following conditional moment restrictions: E[Y i jS;D i = 1] =W 0 i 0 1 ; E[Y i jS;D i = 0] =W 0 i 0 0 : We maintain the conditions for p n-consistency and asymptotic normality of the first-stage estimator ^ . Theorem 9 (consistency of (^ 1 ; ^ 0 )). Under the following assumptions, ^ 0 1 0 1 p ! 0 and ^ 0 0 0 0 p ! 0 1. 
The true parameter 0 1 lies in a compact set 1 R dim( 1 ) . Similarly, the true parameter 0 0 lies in a compact set 0 R dim( 0 ) . 33 2. Let lim inf n!1 det n n X i=1 E[D i W i W 0 i jS] o > 0 and lim inf n!1 det n n X i=1 E[(1D i )W i W 0 i jS] o > 0: See Appendix A.2.3 for proof. Next, we derive the asymptotic results for the second-step estimators. For compactness, we only report results for ^ 1 , as ^ 0 case can be derived in an analogous way. Theorem 10 (asymptotic normality of ^ 1 ). Define n = E[ 1 n n X i=1 D i W i W 0 i jS] n = E[ 1 n n X i=1 D i W i W 0 i 2 1i jS] + E h 1 n n X i=1 D i W i 00 1 r 1 W i ( 0 1 ) S i E h 1 n n X i=1 r l i ( 0 )r l i ( 0 ) 0 S i 1 E h 1 n n X i=1 D i W i 00 1 r 1 W i ( 0 1 ) S i 0 In addition to the conditions for Theorem 9, assume 1. The true parameter 0 1 lies in the interior of the compact set 1 R dim( 1 ) . 2. For any n, n and n are nonsingular. Then we have 1=2 n p n(^ 1 0 1 ) d !N(0;I dim( 1 ) ) where n = 1 n n 1 n . See Appendix A.2.4 for proof. If we ignore first-stage estimation, the asymptotic variance would be 1 n E h 1 n n X i=1 D i W i W 0 i 2 1i jS i 1 n 34 which is smaller, in the positive semi-definite sense, than the correct asymptotic variance 1 n n 1 n . Variance Estimation The asymptotic variance n can be estimated by replacing the population means by sample counterparts. Specifically, ^ n = 1 n n X i=1 D i ^ W i ^ W 0 i ^ n = 1 n n X i=1 D i ^ W i ^ W 0 i ^ 2 1i + 1 n n X i=1 D i W i ^ 0 1 r 1 W i (^ 1 ) 1 n n X i=1 r l i ( ^ )r l i ( ^ ) 1 n n X i=1 D i W i ^ 0 1 r 1 W i (^ 1 ) 0 where ^ 1i =D i (Y i ^ W 0 i ^ 1 ). 2.4.4 Monte Carlo Simulation In this section, we illustrate the finite sample properties of our estimators through sim- ulation exercises. Exogenous Variables For simulation purpose, we imitate the environment of Dupas (2014). The network G is constructed from the GPS data of Dupas (2014). Specifically, two households i and j are considered connected if they live within 500-meter radius. After removing isolated nodes, we have a sample size of 538. The instrumental variable Z is also taken from Dupas (2014) where the binary Z i represents whether i received a high level of subsidy or not. Summary statistics of (G;Z) can be found in the next section. Throughout the simulation replications, G and Z are treated fixed. We do not consider X. 35 Generating Endogenous Variables Treatment choices are determined according to the following equation: D i =1fv i 1 + 2 Z i + 3 i g where v i iid N(0; 1). We set = ( 1 ; 2 ; 3 ) = (2; 1; 1:5) under which the probability of D = 1 is around 0.8. Sincej 3 j < 2:5, there exists a unique equilibrium by the Theorem 3. Given our parameter values, we can compute the unique equilibriumf i (G;Z;)g n i=1 by calculating the fixed point to the following system: i (G;Z;) = 1 + 2 Z i + 3 1 jN i j X j2N i j (G;Z;) ; 8i2N n i is then computed by i (G;Z;) = P j2N i j (G;Z;)=jN i j. Outcomes are realized according to the following rule: Y i = 8 > > < > > : 1i + 1i i if D i = 1 0i + 0i i if D i = 0: We generate the random coefficients according to 1i jv i iid N(2 + 0:3v i ; 1); 1i jv i iid N(1 + 0:4v i ; 1); 0i jv i iid N(4 + 0:2v i ; 1); 0i jv i iid N(3 + 0:2v i ; 1); so that (E[ 1i ];E[ 1i ];E[ 0i ];E[ 0i ]) or ( 1 ; 1 ; 0 ; 0 ) is given as (2; 1; 4; 3). Correlations between ( 1i ; 1i ; 0i ; 0i ) and v i are given by ( 1 ; 1 ; 0 ; 0 ) = (0:3; 0:4; 0:2; 0:2) so that D i is endogenous with respect to all coefficients. 
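The design above is straightforward to simulate. The sketch below (Python) is a minimal illustration rather than the code behind Table 2.1: the adjacency matrix G and subsidy indicator Z are generated synthetically instead of being taken from the Dupas (2014) data, and the parameter values are placeholders since the exact signs are immaterial for the illustration. The inner contraction loop is the same fixed-point computation used by the NFXP algorithm of Section 2.4.1.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def solve_equilibrium(G, Z, theta, tol=1e-10, max_iter=5000):
    """Solve pi_i = Phi(theta1 + theta2*Z_i + theta3 * mean_{j in N_i} pi_j)
    by contraction iteration (the fixed point is unique when |theta3| < sqrt(2*pi))."""
    deg = np.maximum(G.sum(axis=1), 1.0)
    pi = np.full(len(Z), 0.5)
    for _ in range(max_iter):
        gamma = (G @ pi) / deg                       # neighborhood exposure
        pi_new = norm.cdf(theta[0] + theta[1] * Z + theta[2] * gamma)
        if np.max(np.abs(pi_new - pi)) < tol:
            break
        pi = pi_new
    return pi, (G @ pi) / deg

# Synthetic network and subsidy assignment (placeholders for the Dupas data).
n = 538
G = (rng.random((n, n)) < 0.03).astype(float)
G = np.maximum(G, G.T)
np.fill_diagonal(G, 0.0)
Z = rng.binomial(1, 0.27, size=n).astype(float)

# Illustrative first-stage parameters (not the exact values used in the text).
theta = np.array([-0.5, 1.0, 1.5])
pi, gamma = solve_equilibrium(G, Z, theta)

# Treatment choices and random-coefficient outcomes correlated with v,
# which makes D endogenous with respect to all coefficients.
v = rng.standard_normal(n)
D = (v <= theta[0] + theta[1] * Z + theta[2] * gamma).astype(float)
b1 = 2 + 0.3 * v + rng.standard_normal(n)
a1 = 1 + 0.4 * v + rng.standard_normal(n)
b0 = 4 + 0.2 * v + rng.standard_normal(n)
a0 = 3 + 0.2 * v + rng.standard_normal(n)
Y = D * (b1 + a1 * gamma) + (1 - D) * (b0 + a0 * gamma)
```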
Table 2.1 reports the results for the bias, standard errors, and coverage probability for 3000 replications. The target coverage probability is 0.95. As we observe from the first column, our estimators are unbiased. Our estimators perform well in terms of coverage probabilities as well. 36 coeff. bias se cov.prob. FS 1 0.007 0.276 0.948 2 -0.034 0.181 0.937 3 0.026 0.231 0.942 SS 1 0.004 0.277 0.964 1 -0.005 0.530 0.979 0 -0.004 0.333 0.959 0 0.004 0.783 0.972 Table 2.1: Simulation Study with 3000 Simulations 2.5 Application 2.5.1 Background and Data Malaria is a life-threatening infectious disease responsible for approximately 1-3 million deaths per year. Most of these deaths are in children less than five years of age in rural sub-Saharan Africa. The use of insecticide-treated nets (ITNs) has been shown to be a cost-effective way to control malaria. However, the rate of adoption remains low and many households exhibit low willingness to pay (WTP) for ITNs. In addition, positive health externalities generated from using ITNs render the private adoption level that is less than the socially optimal one. For these reasons, public subsidy programs have been proposed to achieve socially optimal coverage rate. While it has been shown that distributing ITNs for free or at highly subsidized prices is effective in increasing the adoption in the short run, there have been concerns that the short-run, one-time subsidies would lower household’s WTPs for the product later, and thus reduce the adoption rate in the long-run. This could happen, for instance, when there exist reference dependence effects in which households anchor their WTPs to previously paid subsidized prices. Consequently, households may be unwilling to pay a higher price for the product later once the subsidies end. On the other hand, some argue that short-run subsidies would be beneficial for the long- run adoption since households could learn the benefits of the product better with prior 37 variable definition mean min max degree number of neighbors 16.41 1.00 38.00 Z 1(high subsidy) 0.27 0.00 1.00 D 1(adoption at phase 1) 0.47 0.00 1.00 Y 1(adoption at phase 2) 0.16 0.00 1.00 female_educ years of educ of female head 5.37 0.00 22.00 wealth wealth level 20367.00 0.00 112273.00 Table 2.2: Summary Statistics (n = 583) experience. Such learning effects would increase consumer’s future WTPs. Moreover, the adoption process can be facilitated with social learning effects in which households learn benefitsoftheproductfromtheirneighbors’priorexperiences. Asaresult,one-timesubsidies would also be beneficial for long-run adoption rate and household’s WTP. Since ITNs need to be regularly replaced and re-purchased, understanding the factors determiningtheshort-runandlong-runadoptiondecisionisanimportanttaskforsustainable public subsidy schemes. Depending on whether reference dependence or learning effects exist, the subsidy schemes would lead to different predictions on the short run and long run demand for ITNs. In this application, therefore, we study the factors affecting the short- run and long-run adoption (purchase) decision of ITNs. In doing so, we allow for possible spillover effects in both short-run and long-run adoption decision. As Dupas (2014) showed, social interactions seem to play an important role in household’s bednet purchase decision. Depending on whether there exist positive or negative peer effects in the short run and in the long run, subsidy effectiveness may vary greatly. 
Design of Experiment We use data from a two-stage randomized pricing experiment conducted in Kenya by Dupas (2014). In Phase 1, households within six villages were given a voucher for the bednet at the randomly assigned subsidy level varying from 100% to 40% with the corresponding prices varying from 0 to 250 Ksh. In Phase 2, a year later, all study households in four villages were given a second voucher for a bednet. This time, however, all households faced the same subsidy level of 36%. 38 Data Let Z i be a binary indicator representing that household i received a high subsidy (defined as the assigned price less than Ksh 50) in Phase 1. Treatment variable D i equals to 1 ifi purchased a bednet in Phase 1. Y i is also binary taking value 1 ifi purchased a bednet in Phase 2. Following Dupas (2014), we may interpret Y i as a proxy for i’s WTP for the future bednet. Network Using GPS data, we construct the binarized spatial network. Two households i and j are considered connected (i.e., G ij = 1) if they live within 500-meter radius. We also consider 250-m, and 750-m radius. Since the results do not differ much, we only report results for 500-m radius. Other Covariates For household pre-treatment covariates, we consider wealth, and the education level of the female head. Summary statistics of the variables can be found on the Table 2.2. After deleting 25 isolated nodes, we have n = 538 observations from four villages. Figure 2.1: Plot of Estimated ( i ; i ) 39 variable estimates marginal effects p-value spillover () 2.308 0.661 0.000 subsidy 0.694 0.199 0.000 female-educ 0.223 0.064 0.026 wealth 0.005 0.001 0.001 Table 2.3: Estimation Results for FS Model (n = 583) D = 1 estimates p-value D = 0 estimates p-value cons 0.497 0.043 cons 0.128 0.174 female-educ -0.094 0.530 female-educ -0.070 0.519 wealth 0.003 0.388 wealth -0.002 0.325 lambda 0.059 0.767 lambda 0.036 0.841 -0.347 0.324 -0.021 0.940 *female-educ 0.031 0.906 *female-educ 0.176 0.513 *wealth -0.003 0.610 *wealth 0.013 0.098 *lambda 0.317 0.375 *lambda -0.063 0.832 Table 2.4: Estimation results for SS model (n = 583) 2.5.2 Estimation Results Results on the short-run adoption We first estimate the equation for the short-run adoption decision using our game-theoretic model. Table 2.3 displays the estimates of coeffi- cients, marginal effects 4 , as well as associated standard errors and p-values. As anticipated, high-subsidy level is associated with higher adoption of the bednet. Education and wealth are also positively associated with adoption decision in the short run. These variables are all significant at 1 percent level. Figure 2.1 shows the estimated plot of (^ i ; ^ i ) by the value of Z i . The plot shows clearly that individual Z i is relevant for the treatment choice. Our results show strong evidence of the existence of positive spillover effects in the short- run adoption decision. When the average adoption probability of neighbors ( i ) increases by 10 percentage points, i’s short-run adoption probability ( i ) increases by 6.6 percentage points. The resulting conformity effects implies that if we ignore spillover effects in the specification, we would underestimate the full effect of the programs. 4 Marginal effects are computed as the sample average of conditional effects. For instance, the marginal effect of Z i is computed as 1 n P n i=1 (X 0 i ^ 1 + ^ 2 Z i + ^ 3 i (S; ^ )) ^ 2 . 
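For reference, the marginal effects in Table 2.3 follow footnote 4: each coefficient is scaled by the sample average of the standard normal density evaluated at the fitted first-stage index. A minimal sketch (Python); index_hat is a placeholder name for the fitted index X_i'θ̂1 + θ̂2 Z_i + θ̂3 γ_i(S; θ̂).

```python
import numpy as np
from scipy.stats import norm

def average_marginal_effect(index_hat, coef_hat):
    """Sample-average marginal effect: (1/n) * sum_i phi(index_hat_i) * coef_hat."""
    return np.mean(norm.pdf(index_hat)) * coef_hat

# e.g. the marginal effect of the subsidy indicator, using its first-stage coefficient:
# ame_subsidy = average_marginal_effect(index_hat, theta_hat_subsidy)
```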
40 Results on the long-run adoption Table 2.4 presents the estimates of own short-run adoption experience (D i ) and average adoption probability of neighbors ( i ) on the long- run adoption decision. Unfortunately, we have very limited statistical power except for few constants due to small sample size. However, in terms of magnitudes, estimated coefficients have implications on the spillover effects in the long-run adoption decision. Using the formula 2.16 and 2.18, we get the following estimated mean response functions: b E[Y i (1;)] = 0:497 0:347; b E[Y i (0;)] = 0:128 0:02 (2.31) First, let us consider b E[Y i (1;)]. Although the coefficient on is not significant, we observe considerable negative spillover effects in terms of magnitude: If increases by 10 percentage points, the probability of the second-period adoption probability decreases by 3.4 percentage points. This is contrary to the positive spillovers observed in the first period adoption decision. 5 One possible explanation for such negative spillovers in the treated response is that they result from positive health spillovers occurring over time. For instance, household with higher value of would anticipate higher coverage rate in their area, which would result in lower malaria prevalence in the long run. This might make households less likely to re-invest the product later. Such results highlight the importance of distinguishing the mechanism of static spillovers from that of dynamic spillovers. Such effects do not seem to apply to the untreated households as b E[Y i (0;)] shows. However, the statistical power is very limited. Average Direct Effect From 2.31, the average direct effect (ADE) of own short-run adoption on the long-run adoption is computed as follows: b E[Y i (1;)Y i (0;)] = 0:369 0:326 (2.32) 5 Dupas (2014) also report similar results from their reduced-form regression models. Their results show that the adoption in Phase 2 is negatively affected by the share of neighbors who received a high subsidy in Phase 1. 41 The result suggests that the values of ADE vary greatly depending on the value of : when = 0, treated households are 36.9 percentage points more likely to invest in the second bednet. However, such effect declines with the neighborhood exposure rate . When = 1, the effect is almost zero. The fact that ADE is positive for all possible values of points to the existence of learning effects from prior experience, rather than reference dependence effects. Bias from ignoring spillovers Suppose that we falsely ignore spillover effects in re- sponses. Using the conventional Heckit model, we obtain the following estimated average treatment effect (ATE): ^ E[Y i (1)Y i (0)] = 0:038: Above result suggests that the effect of D on Y is very limited. However as equation 2.32 shows, there is substantial heterogeneity in the effect of D on Y depending on values of : the effect of D varies from almost 0 percent to 37 percent. Thus, by ignoring the spillover effects, we would draw a misleading conclusion that there is no treatment effect. Observed heterogeneity in effects Let us turn to the effect heterogeneity due to ob- servable covariates, education and wealth. For the treated, the effect of education and wealth on the adoption rate seems to be trivial in magnitude: coefficients are close to zero and their associated p-values are large. We also compute the estimates without covariates. The mag- nitude of the estimates resembles that with covariates. Therefore we do not report the result here. 
This also suggests that there seems to be little observed heterogeneity in E[Y i (1;)] in terms of education and wealth. On the other hand, forD i = 0 case, the magnitudes of the estimates on the covariates are much higher than those for D i = 1 case. Consider education first. The interaction between and education suggests that higher education is associated with higher spillover effect — 42 one more year of education increases the effect of from0:02 to0:02 + 0:17 = 0:15. Similarly if wealth level increases by 1000 units, the effect on increases by 1:2 percentage point which is significant at 10 percent. Such results suggest that control households with higher education and higher wealth receive higher positive spillover effect. 2.5.3 Impact of Counterfactual Policies One advantage of our structural approach is that it allows researchers to simulate coun- terfactual policies. Suppose that a policy-maker is interested in implementing means-tested subsidy schemes where Z is determined according to the following rule: Z i =1fwealth i g; 8i2N n (2.33) i.e., householdi gets high subsidy only when their wealth levelisbelow some specified thresh- old. The question is: what would be the expected outcome under this new, counterfactual subsidy rule? This problem is related to the literature on the policy-relevant treatment effects (PRTE: Heckman and Vytlacil (2001)). In this framework, each intervention or policy is defined by a manipulation on the exogenous variable S = (G;X;Z). In our setup, we assume that a policy maker has no means of changing the underlying network structureG or pre-treatment covariatesX. Thus, the only way to changeS is through changingZ. Let us denote the new counterfactual policy asS new = (G;X;Z new ) where we set the value ofZ asZ =Z new , which is not in the data. i’s expected outcome under the new policy is given as E[Y i jS = S new ]. Note that for any S, E[Y i jS] =E[Y i jD i = 1;S]Pr(D i = 1jS) +E[Y i jD i = 0;S]Pr(D i = 0jS) (2.34) 43 Under our control function specification, E[Y i jS] can be written as follows:: E[Y i jS] = i (S) h X 0 i 1 + 1 ( i (S)) + n X 0 i 1 + 1 ( i (S)) o i (S) i +(1 i (S)) h X 0 i 0 + 0 ( i (S)) + n X 0 i 0 + 0 ( i (S)) o i (S) i = E[Y i jX i ; i (S); i (S)] Note that E[Y i jS] is a function of S only through (X i ; i (S); i (S)), thus we write: E[Y i jX i ; i (S); i (S)]: i’s expected outcome under new policy is then given by E[Y i jX i ; i (S new ); i (S new )]. To estimate this, we first need to compute the new equilibrium choice probabilities: f i (G;X;Z new )g i2Nn whereZ new is determined according to 2.33. Under the identified first- stage parameters, this is done by solving the new fixed point of the best-response functions under the new data set S new = (G;X;Z new ). We then estimate ^ Y i ^ E[Y i jX i ; i (S new ); i (S new )] for each i2 N n using the formula above. Overall impact of policy S new is computed by P n i=1 ^ Y i =n. Results See 2.2. The red line shows the effect of on the overall long-run adoption level when we ignore interference effects. In such case, as increases, the long-run adoption level increases monotonically. This is because as increases, more households get subsidy, and without interference, treated agents are more likely to adopt in the long-run. In the presence of spillovers, the effect of does not increase monotonically anymore as the blue line shows. Higher also induces higher i which affect long-run adoption negatively. 
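The counterfactual exercise behind Figure 2.2 can be sketched as follows (Python). This is a minimal illustration under the control-function specification above, not the implementation used in the paper: solve_equilibrium refers to the fixed-point routine sketched in the simulation section (extended to include X_i'θ̂1 in the index for the application), second_stage is a placeholder tuple of the eight estimated second-stage parameters, and the remaining argument names are likewise hypothetical.

```python
import numpy as np
from scipy.stats import norm

def expected_outcome(X, pi, gamma, beta1, alpha1, rho_u1, rho_e1,
                     beta0, alpha0, rho_u0, rho_e0):
    """E[Y_i | X_i, pi_i, gamma_i]: a pi-weighted average of the treated and
    untreated conditional means implied by the control-function model."""
    c = norm.ppf(pi)
    lam1 = -norm.pdf(c) / pi            # E[v | v <= c]
    lam0 = norm.pdf(c) / (1.0 - pi)     # E[v | v >  c]
    m1 = X @ beta1 + rho_u1 * lam1 + gamma * (X @ alpha1 + rho_e1 * lam1)
    m0 = X @ beta0 + rho_u0 * lam0 + gamma * (X @ alpha0 + rho_e0 * lam0)
    return pi * m1 + (1.0 - pi) * m0

def means_tested_impact(alpha_grid, wealth, G, X, theta_hat, second_stage):
    """Average predicted long-run adoption under Z_new = 1{wealth <= alpha},
    re-solving the first-stage equilibrium for each threshold alpha."""
    impacts = []
    for a in alpha_grid:
        Z_new = (wealth <= a).astype(float)
        pi_new, gamma_new = solve_equilibrium(G, Z_new, theta_hat)
        impacts.append(np.mean(expected_outcome(X, pi_new, gamma_new, *second_stage)))
    return np.array(impacts)
```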
Therefore a priori, we cannot expect that higher would give higher overall 44 Figure 2.2: Counterfactual Impact of Means Tested Subsidy on LR-Adoption long-run adoption rate in the population. In fact, as the blue line shows, the highest long- run adoption rate is achieved under the subsidy scheme targeting the very lowest percentile households. The result also highlights complication involved in the use of subsidies to increase long- run adoption rate. As the result shows, the highest expected coverage is only 17 percent. 45 2.6 Concluding Remarks In this paper, we propose a new methodological framework to analyze randomized experi- ments with spillovers and noncompliance in a general network setup. Using a game-theoretic framework, we allow for spillover effects to occur at two stages: at the choice stage and out- come stage. Potential outcomes are modeled as a random coefficient model to account for general unobserved heterogeneity. We extend the traditional control function estimator of Heckman (1979) to incorporate spillovers. Finally, we illustrate our methods using Dupas (2014) data and show that our model can be used to evaluate the counterfactual policies. In our treatment choice games, we assumed that private information is independently distributed across agents. Relaxing this assumption to allow for network dependence in privateinformationwouldbearewardingtask. Anotherimportantissueismultipleequilibria – formalizing a problem of policy evaluation and counterfactual prediction in the presence of multiple equilibria is important for realistic policy design. Finally, we conclude by noting that our model can be used to derive an ex ante optimal treatment assignment rule under interference, especially in settings where a social planner should take possible noncompliance and spillover into account. 46 Chapter 3 Identification of Causal Effects in Cluster Randomized Experiments with Spillover and Noncompliance: Difference-in-Differences Approach 3.1 Introduction Cluster randomized controlled experiment (RCT hereafter) is widely applied in the eco- nomics and other social science fields ((Miguel and Kremer, 2004b); (Bloom, 2005)). In practice, partial compliance to intervention where some individuals do not take up the in- tended intervention occurs frequently in cluster-RCT. For instance, see (Miguel and Kremer, 2004b) and (Del Rosso and Marek, 1996). In such case, researchers often estimate the intent- to-treat (ITT) to identify the impact of intervention assignment. However, researchers are often interested in knowing the effect of the treatment take-up, rather than that of the treat- ment assignment. This is particularly true if the purpose of the research is to understand the 0 Joint Work with Yiwei Qian 47 effect of the intervention that could be delivered by policy-makers with different experiment designs, i.e., when one is interested in the impacts of counterfactual policy ((Duflo et al., 2007)). To identify the effect of treatment take-up in the settings with partial compliance, in- strumental variable (IV) methods have been widely used. Specifically, (Imbens and Angrist, 1994b)) showed that IV estimator identifies the local average treatment effect (LATE). How- ever, as we show later, the traditional IV estimator does not identify any causal parameter when the nature of interventions in cluster-RCTs introduces interference within clusters. 
This could happen, for instance, when there are spillover effects among individuals in the same cluster or when there is a general equilibrium effect. In such cases, the exclusion re- striction of the treatment assignment is no longer satisfied making traditional IV estimators invalid. Inthispaper, wesystematicallystudyhowtodefinecausaleffects, andwhateffectscanbe identified in the cluster-RCTs with interference and noncompliance under simple identifying assumptions. Our focus is to identify the treatment effect for the treated as opposed to the general ITT effect. We show that in the general potential outcomes framework under interference, the usual LATE, taken as the treatment effect of the compliers (or treatment effect of the treated under one-sided noncompliance), is not well defined. Instead, we propose a simple DID-style estimator to estimate the ITT of compliers——the treatment effect of the treated in this case—— as long as we have baseline (pre-treatment) information on the outcomes and the case of one-sided noncompliance. In addition, our estimator also identifies the ITT of never-takers, which captures the potential spillover effects or general equilibrium effects introduced by the cluster-level intervention. As an empirical application, we apply our estimator to a cluster-RCT microcredit pro- gram in Morocco ((Crépon et al., 2015)). Given the possibility that the uptake of the mi- crocredit by households could introduce spillover effects to non-takers, we estimate the ITT effect of the compliers and the never-takers. We found the ITT effects for the never-takers 48 are significantly different from zero, confirming the existence of spillovers. Besides, the aver- age treatment effect of the treated measured by the ITT effect for compliers is smaller than the LATE estimates proposed in (Crépon et al., 2015), suggesting the differences between the estimates stem from falsely ignoring spillover effects. Our research makes two contributions to the literature. First, we propose an alternative approach with minimal assumptions to estimate the treatment effect of the treated when the exclusion restriction assumption of the treatment assignment fails. Previous literature has approached this problem with partial identification approach ((Flores and Flores-Lagunes, 2013); (Mealli and Pacini, 2013)) or Bayesian approach ((Imbens and Rubin, 1997); (Rubin and Zell, 2010)). The advantage of our approach is that our key identifying assumption is mild and testable. Second, we contribute to the literature in estimating the spillover effect of intervention programs when there is no information available to identify counter-factual never-takers. With the aid of eligibility rule of intervention take-up from the programs, (Lalive et al., 2015) and (Angelucci and De Giorgi, 2009) were able to construct counter- factual never-takers to identify the spillover effect of the unemployment insurance and condi- tional cash transfer, respectively. Our approach can be applied to the program even without any specific intervention eligibility rule. Theoutlineofthispaperisasfollows. Insection3.2, wepresentoursetupandparameters of interests; in section 3.3, we show our identification assumptions and strategies; in section 3.4, we explain our estimation strategy; finally, we apply our estimator to a microcredit program and test the key identifying assumption empirically in section 3.5. 
3.2 Setup and Parameters of Interest 3.2.1 Setup and Potential Outcomes Framework We consider a setting where there are G non-overlapping clusters such as classrooms and villages. Each cluster is indexed by g2f1; ;Gg. For each cluster g, we observe a 49 random sample of individuals indexed by i = 1; ;n g . The sample size is N = P G g=1 n g . Let Z g 2 f0; 1g indicate a binary policy intervention or program at the cluster level g. Specifically, Z g = 1 if the cluster g receives an intervention (i.e., treated cluster) and Z g = 0 otherwise (i.e., control cluster). Everyone in treated clusters is eligible to receive the treatment while everyone in control clusters is ineligible to receive the treatment. In the presence of noncompliance, however, the actual treatment receipt can be different from the intended intervention. For instance, even when the microfinance program was introduced in the village, some households may not borrow from the program. Let D ig 2f0; 1g denote the actual treatment received by individual i in a cluster g where D ig = 1 if i takes up the treatment and D ig = 0 otherwise. When Z g 6=D ig for some i in village g, we say there is a noncompliance problem. Outcome of interest is denoted by Y ig . Our aim is to assess causal effects of treatment take-upD ig on the outcomeY ig while allowing for possible spillover effects across individuals within the same cluster. Throughout the paper, we assume that the clusters are i.i.d. which imply that there is no spillover effect across different clusters. This is justified, for instance, when clusters are located far away from each other to affect one another. To rigorously define causal effects, we use the potential outcomes framework. Let z g 2 f0; 1g be the possible values that Z g can take. Potential treatment is denoted by D ig (z g ). Specifically, D ig (0) denotes the treatment status of i when the i’s village was allocated as the control village, whileD ig (1) denotesi’s treatment status wheni is in the treated village. In a perfect compliance case, we have D ig (1) = 1 and D ig (0) = 0 for all i. There is a noncompliance if D ig (0) = 1 or D ig (1) = 0 for some i. Observed treatment D ig can be written as D ig =Z g D ig (1) + (1Z g )D ig (0). Recall that in the presence of noncompliance, D ig 6= Z g for some i. In other words, the assigned treatment is different from the actual treatment receipt. Letd ig 2f0; 1g be the possible value thatD ig can take. We model the potential outcome for Y ig by Y ig (z g ;d ig ), under which we have four potential outcomes for each i in the cluster 50 g: fY ig (1; 1);Y ig (1; 0);Y ig (0; 1);Y ig (0; 0)g. In our formulation, the intervention assignment variable Z g is allowed to affect Y ig directly. In other words, we do not require the exclusion restrictionofZ g . Theexclusionrestrictionisunlikelytoholdwhenthereisaspillovereffector general equilibrium effect of the policy intervention. For instance, even when the individual is not treated, the intervention itself may affect her outcome through general equilibrium effect or through direct interference from social interaction. In such case we would have Y ig (1; 0)6= Y ig (0; 0), i.e., the potential outcome of i in the counterfactual world where i’s village is assigned to the treated villages will not be the same as the potential outcome in the counterfactual world wherei’s village is assigned to the control villages, even wheni does not take up the treatment in both cases. 
Similarly, if there is a general equilibrium effect, we would have Y_{ig}(0, d_{ig}) ≠ Y_{ig}(1, d_{ig}) for any d_{ig}. This happens, for instance, when a large-scale intervention changes market prices in the village, such as wages or interest rates. We note that we do not try to identify the exact mechanism of the spillover effect; in other words, we do not attempt to distinguish whether it comes from a general equilibrium effect or from social interactions/peer effects.

3.2.2 Parameters of Interest

Throughout the paper, we maintain the following assumptions:

Assumption 11 (Independence). Program assignment Z_g is jointly independent of the potential outcomes and potential treatment statuses. Formally,

Z_g ⊥ (Y_{ig}(z_g, d_{ig}), D_{ig}(z'_g)), for all d_{ig}, z_g, z'_g.

This assumption is trivially satisfied when Z_g is randomly assigned to clusters, as in a randomized experiment.

Assumption 12 (One-sided noncompliance). Throughout the paper, we assume that there is only one-sided noncompliance, i.e., D_{ig}(0) = 0 for all i.

Assumption 12 restricts our attention to settings where no one in a control cluster can access the treatment. In many applications, this is satisfied by design. For instance, in our application to microfinance, there is no way that an individual in a control village can access the microcredit program.

We can classify each individual in terms of their potential treatment values. In the terminology of (Angrist et al., 1996), let us define the compliance type in the following way:

T_{ig} = co ⇔ (D_{ig}(0), D_{ig}(1)) = (0, 1);
T_{ig} = nt ⇔ (D_{ig}(0), D_{ig}(1)) = (0, 0).

Under Assumption 12, the population consists only of compliers (co) and never-takers (nt). While we cannot identify the type of each individual, as we only observe one of the two potential treatment statuses, we can identify the proportions of types in the population. Define π_co = Pr(T_{ig} = co) and π_nt = Pr(T_{ig} = nt). They are identified by

π_nt = Pr(D_{ig} = 0 | Z_g = 1), π_co = 1 - π_nt,

since Pr(D_{ig} = 0 | Z_g = 1) = Pr(D_{ig}(1) = 0) = Pr(T_{ig} = nt).

Many empirical papers have focused on the effect of Z_g on Y_{ig} through the intent-to-treat (ITT) effect, defined as

ITT = E[Y_{ig}(1, D_{ig}(1)) - Y_{ig}(0, D_{ig}(0))].

With exogenous Z_g, the ITT is easily identified by E[Y_{ig} | Z_g = 1] - E[Y_{ig} | Z_g = 0], i.e., the difference in mean outcomes between the treated clusters and the control clusters. However, in our setup, the ITT effect identifies a weighted average of two distinct effects, as the following derivation shows:

ITT = E[Y_{ig}(1, D_{ig}(1)) - Y_{ig}(0, D_{ig}(0))]
    = E[Y_{ig}(1, D_{ig}(1)) - Y_{ig}(0, 0)]   (by Assumption 12)
    = E[Y_{ig}(1, 1) - Y_{ig}(0, 0) | D_{ig}(1) = 1] Pr(D_{ig}(1) = 1)
      + E[Y_{ig}(1, 0) - Y_{ig}(0, 0) | D_{ig}(1) = 0] Pr(D_{ig}(1) = 0),

where the first conditional expectation is ITT(co) and the second is ITT(nt). Therefore, the ITT is a weighted average of two different effects, ITT(co) and ITT(nt), where ITT(co), the intent-to-treat effect for compliers, is defined as

ITT(co) = E[Y_{ig}(1, 1) - Y_{ig}(0, 0) | D_{ig}(1) = 1] = E[Y_{ig}(1, 1) - Y_{ig}(0, 0) | T_{ig} = co].

With one-sided noncompliance, ITT(co) is equivalent to E[Y_{ig}(1, 1) - Y_{ig}(0, 0) | D_{ig} = 1]. Therefore ITT(co) measures the average treatment effect on the treated (ATT). On the other hand, ITT(nt), the intent-to-treat effect for never-takers, is defined as

ITT(nt) = E[Y_{ig}(1, 0) - Y_{ig}(0, 0) | D_{ig}(1) = 0].

ITT(nt) measures the spillover effect of the intervention on never-takers.
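The type shares and the overall ITT are simple sample means. A minimal sketch (Python); Y, D, Z are individual-level arrays of outcomes, take-up, and cluster assignment, all hypothetical names.

```python
import numpy as np

def shares_and_itt(Y, D, Z):
    """Sample analogues of pi_nt, pi_co and the overall ITT."""
    pi_nt = (D[Z == 1] == 0).mean()               # Pr(D = 0 | Z = 1)
    pi_co = 1.0 - pi_nt
    itt = Y[Z == 1].mean() - Y[Z == 0].mean()     # treated minus control cluster means
    return pi_nt, pi_co, itt
```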
Our purpose is to separately identify ITT(co) and ITT(nt) from the overall ITT effect. This is an important task, as it sheds light on treatment effect heterogeneity: even when the overall ITT effect is positive, ITT(co) can be negative depending on the spillover effect captured by ITT(nt). In such a case, using the overall ITT estimate to infer the ATT, as is often done in the literature, can be misleading. To fully understand the impact of the intervention, therefore, it is essential to identify ITT(co) and ITT(nt) separately.

Unfortunately, unlike the overall ITT effect, which is identified by a simple mean comparison across treated and control villages, local ITT effects such as ITT(co) and ITT(nt) are not identified without further assumptions. ((Angrist et al., 1996) show that the Wald ratio identifies ITT(co) when the exclusion restriction is satisfied; this is no longer true when the exclusion restriction fails, as in our case. See Appendix B.1 for the proof.) To see this point, note that

E[Y_{ig} | Z_g = 1, D_{ig} = 0] = E[Y_{ig}(1, 0) | nt],   (3.1)
E[Y_{ig} | Z_g = 1, D_{ig} = 1] = E[Y_{ig}(1, 1) | co],   (3.2)
E[Y_{ig} | Z_g = 0, D_{ig} = 0] = π_co E[Y_{ig}(0, 0) | co] + π_nt E[Y_{ig}(0, 0) | nt].   (3.3)

From 3.1 and 3.2, we see that E[Y_{ig}(1, 0) | nt] and E[Y_{ig}(1, 1) | co] are identified. However, 3.3 shows that we cannot separate E[Y_{ig}(0, 0) | co] from E[Y_{ig}(0, 0) | nt]. Therefore ITT(co) = E[Y_{ig}(1, 1) - Y_{ig}(0, 0) | co] is not point identified without imposing an additional assumption.

One possible assumption is that E[Y_{ig}(0, 0) | co] = E[Y_{ig}(0, 0) | nt], i.e., in the absence of the intervention, never-takers and compliers have the same mean potential outcomes. Under such an assumption, 3.3 implies that we can identify E[Y_{ig}(0, 0) | co] by E[Y_{ig} | Z_g = 0, D_{ig} = 0]. However, this assumption is very strong, as it imposes homogeneity of mean untreated outcomes across compliance types. The fact that compliers and never-takers select into different treatments suggests that they may be systematically different in their potential outcomes, even when we control for relevant characteristics. Our aim, therefore, is to come up with a milder assumption to identify ITT(co) and ITT(nt). In the next section, we propose a way to achieve point identification by leveraging baseline information on the outcome variable.

3.3 Identification Using Baseline Outcome

We incorporate baseline information to facilitate our identification. Let us assume that there are two time periods for simplicity. Denote T = t for the pre-treatment period and T = t + 1 for the post-treatment period. Outcome variables are denoted by (Y^t_{ig}, Y^{t+1}_{ig}). The randomized program assignment dummy is denoted by (Z^t_g, Z^{t+1}_g), where Z^t_g = 0 for all clusters g. In the post-treatment period, Z^{t+1}_g = 1 for treated clusters and Z^{t+1}_g = 0 for control clusters. Potential treatment statuses in the two periods are denoted by D^t_{ig}(z^t_g) and D^{t+1}_{ig}(z^{t+1}_g). Since there is no intervention in the pre-treatment period, we have D^t_{ig}(z^t_g) = 0 for all i. We assume that the baseline outcome variable Y^t_{ig} = Y^t_{ig}(0, 0) is observed for all observations. Our parameters of interest are

ITT^{t+1}(co) = E[Y^{t+1}_{ig}(1, 1) - Y^{t+1}_{ig}(0, 0) | co]

and

ITT^{t+1}(nt) = E[Y^{t+1}_{ig}(1, 0) - Y^{t+1}_{ig}(0, 0) | nt],

which measure the local effect of the intervention for compliers and never-takers in the post-treatment period.
Point identification of ITT^{t+1}(co) is not possible without a further assumption, since Y^{t+1}_{ig}(0, 0) is a counterfactual outcome for compliers. Similarly, point identification of ITT^{t+1}(nt) requires information on E[Y^{t+1}_{ig}(0, 0) | nt], which is not revealed by the data. To identify these, we impose an equal-trend assumption:

Assumption 13 (Equal-Trend).

E[Y^{t+1}_{ig}(0, 0) - Y^t_{ig}(0, 0) | nt] = E[Y^{t+1}_{ig}(0, 0) - Y^t_{ig}(0, 0) | co] = E[Y^{t+1}_{ig}(0, 0) - Y^t_{ig}(0, 0)].

Under the equal-trend assumption, the common trend E[Y^{t+1}_{ig}(0, 0) - Y^t_{ig}(0, 0)] is identified by E[Y^{t+1}_{ig} - Y^t_{ig} | Z^{t+1}_g = 0]. That is, for those who were never given a treatment encouragement, the change in their outcomes is taken to be the common trend in the absence of the intervention.

Lemma 1 (Identification of E[Y^{t+1}_{ig}(0, 0) | co]). Under Assumption 13, the counterfactual outcome E[Y^{t+1}_{ig}(0, 0) | co] is identified by

E[Y^{t+1}_{ig}(0, 0) | co] = E[Y^{t+1}_{ig} - Y^t_{ig} | Z^{t+1}_g = 0] + E[Y^t_{ig} | D^{t+1}_{ig} = 1, Z^{t+1}_g = 1].

The lemma shows that E[Y^{t+1}_{ig}(0, 0) | co] is identified by the baseline outcome of the observed compliers in the treated villages plus the change in outcomes in the control villages.

Theorem 14 (Identification of ITT^{t+1}(co)). ITT^{t+1}(co) is identified by

ITT^{t+1}(co) = E[Y^{t+1}_{ig}(1, 1) - Y^{t+1}_{ig}(0, 0) | co]
= E[Y^{t+1}_{ig} | D^{t+1}_{ig} = Z^{t+1}_g = 1] - { E[Y^{t+1}_{ig} - Y^t_{ig} | Z^{t+1}_g = 0] + E[Y^t_{ig} | D^{t+1}_{ig} = 1, Z^{t+1}_g = 1] }
= E[Y^{t+1}_{ig} - Y^t_{ig} | D^{t+1}_{ig} = Z^{t+1}_g = 1] - E[Y^{t+1}_{ig} - Y^t_{ig} | Z^{t+1}_g = 0].

ITT^{t+1}(co) equals the change in the outcome for compliers after netting out the common trend. It measures an overall effect consisting of a direct effect of treatment take-up and an indirect effect of the treatment assignment.

Theorem 15 (Identification of ITT^{t+1}(nt)). We can also identify the local ITT of never-takers in the post-period,

ITT^{t+1}(nt) = E[Y^{t+1}_{ig}(1, 0) - Y^{t+1}_{ig}(0, 0) | nt],

which, under Assumption 13, is identified analogously by

ITT^{t+1}(nt) = E[Y^{t+1}_{ig} - Y^t_{ig} | D^{t+1}_{ig} = 0, Z^{t+1}_g = 1] - E[Y^{t+1}_{ig} - Y^t_{ig} | Z^{t+1}_g = 0].

This is the effect of the treatment assignment on the outcome for never-takers. Note that when the exclusion restriction is satisfied, we should have ITT^{t+1}(nt) = 0. Therefore, we can test the plausibility of the exclusion restriction with the estimate of ITT^{t+1}(nt).

Our DID-style estimator is different from the traditional DID estimator, in which the exclusion restriction is assumed to be satisfied (see Appendix B.2). Specifically, in the usual DID setting, the common trend assumption requires that the trends of the treated and the untreated samples are the same, and the trend is obtained from E[Y_{i,t+1} - Y_{it} | D_{i,t+1} = 0]. We instead assume that the trend is the same for never-takers and compliers, and the common trend is estimated from the sample under the control assignment, i.e., Z^{t+1}_g = 0. Nevertheless, our DID-style estimator has a similarity with the traditional DID: if we had more than two periods of observations before the intervention, as in many traditional DID applications, we could evaluate the validity of our equal-trend assumption using the observed compliers and never-takers in the treatment group.

3.4 Estimation

In the last section, we have shown that we can identify

ITT^{t+1}(co) = E[Y^{t+1}_{ig}(1, 1) - Y^{t+1}_{ig}(0, 0) | co],
ITT^{t+1}(nt) = E[Y^{t+1}_{ig}(1, 0) - Y^{t+1}_{ig}(0, 0) | nt].

Using their sample analogs, these effects can be estimated from the data conditional on covariates using nonparametric methods (e.g., matching). In this section, we instead use a regression-based estimation approach, given its convenience and popularity in applied work. Let us denote the potential outcome by Y^{zd}_{igt} = Y_{igt}(z, d).
We specify4Y 00 ig;t+1 Y 00 ig;t+1 Y 00 igt , the potential outcome changes without any intervention by the following linear model: 4Y 00 ig;t+1 =X 0 ig +4u ig;t+1 : 3 We show the proof in Appendix B.2 57 CovariatesX ig mayincludepre-treatmentvillage-specificvariables. Undersuchspecification, our identifying DID assumption becomes E[4u ig;t+1 jX ig =x;nt] =E[4u ig;t+1 jX ig =x;co]: That is, after controlling for X ig and comparing for villages in the same pair, idiosyncratic changes are the same across never-takes and compliers without intervention. Let us denote the various treatment effects as Y 11 igt = Y 00 igt + igt Y 10 igt = Y 00 igt + igt : Our parameters of interest then become ITT t+1 (nt) = E[ ig;t+1 jnt] and ITT t+1 (co) = E[ ig;t+1 jco]. Observed changes in outcome is given as 4Y ig;t+1 = Z g;t+1 D ig;t+1 (Y 11 ig;t+1 Y 00 ig;t ) +Z g;t+1 (1D ig;t+1 )(Y 10 ig;t+1 Y 00 ig;t+1 ) +(1Z g;t+1 )(Y 00 ig;t+1 Y 00 ig;t ) = 4Y 00 ig;t+1 +Z g;t+1 (Y 10 ig;t+1 Y 00 ig;t+1 ) +Z g;t+1 D ig;t+1 (Y 11 ig;t+1 Y 10 ig;t+1 ) = 4Y 00 ig;t+1 + ig;t+1 Z g;t+1 + ig;t+1 Z g;t+1 D ig;t+1 = X 0 ig + ig;t+1 Z g;t+1 + ig;t+1 Z g;t+1 D ig;t+1 +4u ig;t+1 : Since there are only two periods, without loss of generality, we can write it as 4Y ig =X 0 ig + ig Z g + ig Z g D ig +4u ig : 58 Let us denote nt E[ ig jnt] and co E[ ig jco]. Let us write 4Y ig = X 0 ig + nt Z g + co Z g D ig + ig where ig 4u ig + ( ig nt )Z g (1D ig ) + ( ig co )Z g D ig so that E[ ig jX ig ;D ig ] = 0: SinceE[4u ig jX ig ;Z g ;D ig ] = 0byDIDassumptionandE[( ig nt )Z g (1D ig )jX ig ;Z g ;D ig ] = 0 when Z g = 0 or D ig = 1. When (Z g ;D ig ) = (1; 0), E[ ig nt jZ g = 1;D ig = 0] = E[ ig nt jnt] = 0. Therefore the least square estimators to the regression model 4Y ig = X 0 ig + nt Z g + co Z g D ig + ig Inference is standard so we omit this part. Note that we need to use clustered standard errors. 3.5 Empirical Application In this section, we estimate the local Intent-to-Treat (ITT) effect for compliers of mi- crocredit uptake in rural Morocco. (Crépon et al., 2015) evaluated a micrcredit program, operated by Al Amana, introduced in rural areas of Morocco in 2006. The program is a pair-wise cluster randomized controlled trial where the intervention took place at the village level. One village is randomly assigned to the treatment group and one village to the control group in each pair. 81 villages were randomly selected as treated villages while 81 villages 59 were selected as control villages. In the treatment village of the program, every household in treated villages was given access to microfinance program where households can borrow a loan at the low-interest rates. Thirteen percent of the households in treatment villages took a loan, while households in the control villages were not able to access the same intervention. This context is in line with assumptions of our analytic framework where the non-compliance is one-sided in a cluster randomized controlled trial. We use the online data of (Crépon et al., 2015) to provide the following analysis. We replicate the summary statistics of the whole sample in (Crépon et al., 2015) in Table C1. We also provide the summary statistics of non- attrited sample and the non-attrited sample with high probability to borrow in the Table C2 and Table C3, respectively. 
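Before turning to the empirical specification, the estimators developed in Section 3.4 can be illustrated on synthetic data. The sketch below (Python, statsmodels) is a minimal illustration, not the code used for the application: the data frame and its column names (dY, Z, D, x1, x2, village) are hypothetical, the data-generating numbers are arbitrary, and the pair fixed effects used in the application below are omitted for brevity.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def did_itt(dY, Z, D):
    """Sample analogues of Theorems 14 and 15: two DID contrasts that net out
    the common trend estimated from the control clusters."""
    trend = dY[Z == 0].mean()
    itt_co = dY[(Z == 1) & (D == 1)].mean() - trend
    itt_nt = dY[(Z == 1) & (D == 0)].mean() - trend
    return itt_co, itt_nt

# Synthetic example data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
n, n_villages = 4000, 80
df = pd.DataFrame({"village": rng.integers(0, n_villages, n),
                   "x1": rng.standard_normal(n),
                   "x2": rng.standard_normal(n)})
df["Z"] = (df["village"] % 2).astype(float)            # half the clusters treated
df["D"] = df["Z"] * rng.binomial(1, 0.13, n)           # one-sided noncompliance
df["dY"] = (0.5 * df["x1"] + 0.2 * df["Z"] + 1.0 * df["Z"] * df["D"]
            + rng.standard_normal(n))

itt_co, itt_nt = did_itt(df["dY"].to_numpy(), df["Z"].to_numpy(), df["D"].to_numpy())

# Regression version with covariates and village-clustered standard errors.
fit = smf.ols("dY ~ x1 + x2 + Z + Z:D", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["village"]})
beta_nt = fit.params["Z"]      # estimate of ITT(nt)
beta_co = fit.params["Z:D"]    # estimate of ITT(co)
```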
3.5.1 The Empirical Specification of ITT for Compliers and Never- takers Previous literature on evaluation of microcredit program has focused only on ITT effect whichidentifiestheeffectof"microcreditaccess"ratherthantheeffectofactual"microcredit take-up". Due to the potential spillover of microcredit takeup, it is likely that exclusion re- striction does not hold. Therefore, the previous literature, rightly, avoids using the standard LATE framework to identify the effect of take-up of microcredit. Though (Crépon et al., 2015) estimated the externalities of borrowing and argued the externalities of borrowing are not the main driving force for the program effect, they only claimed their LATE estimates are suggestive. We use our DID assumption to point identify the ITT effect for compliers. Though the ITT effect for compliers can be interpreted as the impact of microcredit uptake on compliers, it is not the same as LATE as we have shown above. We make some adaptions to the data in the need of the assumptions enlisted above. Since we have yet incorporated the cases of attrition to our framework and (Crépon et al., 2015) argued attrition was not a major concern in this program, we only use the non-attrited sampleforthefollowingspecification. Sinceourapproachreliesonobservationsofindividuals 60 from two periods, we exclude the sample collected only from the endline. We also add pair- specific fixed effect as the program is a pair-wise cluster randomized controlled trial. Our identification in this empirical application relies on the assumption that the parallel trends over time between the never-takers and compliers within each pair of villages. Specifically, E[4u ig;t+1 jX ig =x;P g =p;nt] =E[4u ig;t+1 jX ig =x;P g =p;co] Let P g be an index for the village pair, i.e., P g = 1; 2; ; 81. Since the program is based on the pair-wise randomization, we are assuming that time-effects are the same only for two villages in the pair. Following on our assumption, the main specification for the estimation of the ITT (co) and the ITT (nt) is as follows: Y i;j;g;t+1 =X 0 i;j;g;t+1 + nt Z j;g;t+1 + co Z j;g;t+1 D i;j;g;t+1 +f Pg + i;j;g;t+1 (3.4) where i stands for individual, j for villages, g for village pairs; the treatment assignment Z j;g;t+1 was assigned at village level;X 0 i;j;g;t+1 time-variant covariates that are not likely af- fected by the uptake of microcredit and the treatment assignment are controlled; f Pg pair- specific fixed effect controlled for the pair-specific time-variant changes. 3.5.2 Estimates of ITT for Compliers and Never-takers The ITT estimates for all compliers indicates that the uptake of the microcredit has a substantial effect on the households. Asset value increased by 4846, significant at 1 % level; the sales and home consumption combined increased by 25715, significant at 1% level; expenses increased by 16886, significant at 5% level, and the profit of home business also increases by 8138, significant at 10% level (Table 1, Panel A, Row 1, Column 1, 2, 3, 4). In addition, the microcredit program increased the self-employment by 5 hours per week for compliers, significant at 10% level (Table 1, Panel A, Column 8). 61 Besides, we also find the never-takers were also impacted by the program though they didn’t borrow from the program. The program increased the sales and home consumption combinedandexpensesby4819and3294, respectively, bothsignificantat5%(Table1, Panel A, Row 2, Column 2, 3). Never-takers also decreased their monthly household consumption by 104. 
The statistically significant impacts on these outcomes for never-takers, even though they did not borrow from the microcredit program, indicate that microcredit take-up may have a general equilibrium effect or spillover effects within the treatment village. As we discussed in Section 2, the significant ITT effects on never-takers suggest that the exclusion restriction is likely to fail in this context.

We also apply the same estimation to the non-attrited sample with a high probability to borrow, as (Crépon et al., 2015) suggested. We find the estimates are slightly higher but still of similar magnitude. This is consistent with the fact that this group of individuals is more likely to borrow, and to borrow larger amounts.

We estimate the ITT effects for compliers and never-takers with a few variations of the specification, and the results are robust across all of them. In Table C4, we show that when individual baseline characteristics are not controlled for, the estimates are almost the same as those in Table 1. Similarly, when not controlling for the village-pair fixed effect (Table C5), we still obtain estimates similar to Table 1.

3.5.3 Tests of the Equal-trend Assumption

Since our estimator only identifies the treatment effect on the treated under the equal-trend assumption (Assumption 13), it is crucial to validate this assumption with every possible effort. As we mentioned in Section 3, if the cluster-RCT had two periods of surveys before the treatment assignment, we would be able to test whether the pre-trends of compliers and never-takers are the same using the treatment sample, where types are observed during the intervention period. Unfortunately, that is not the case with (Crépon et al., 2015): the study did not collect pre-baseline data.

An alternative approach to testing the equal-trend assumption uses the control sample. We first use the treatment sample to estimate a statistical model of the borrowing decision. Then, we predict the probability of borrowing for the control individuals with the coefficients from that model. Finally, we classify individuals into types by their predicted probability of borrowing and test whether the types differ in the trend of the outcome variables. No matter how we choose the threshold for types, the results consistently confirm that our equal-trend assumption is valid in this application (see Table 2); a sketch of this procedure is given at the end of this section.

3.5.4 The Comparison with the LATE in (Crépon et al., 2015)

Though the LATE in (Crépon et al., 2015) is potentially problematic, it is still relevant to compare our ITT estimates for the treated to the LATE, because both estimators are intended to measure the impact of take-up. Since we only use the non-attrited sample in this microcredit program to estimate the ITT for compliers, we restrict the sample to the non-attrited individuals as well when estimating the LATE (see Table 3). The LATE estimates are much higher than the ITT for compliers in asset values and expenses (Table 3, Panel A, Row 1, Columns 1 and 3), but are comparable in consumption level and profit. The LATE estimates also indicate that the uptake of microcredit reduces the probability of being self-employed and of working outside, in much larger magnitudes than the ITT for compliers. The difference between our ITT estimates for compliers and the LATE estimates further points to a potential violation of the exclusion restriction.
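The placebo check described in Section 3.5.3 can be sketched along the following lines (Python, statsmodels), reusing the hypothetical synthetic data frame df from the earlier sketch; the covariate list X_list, the classification cutoff threshold, and the column names are all hypothetical.

```python
import statsmodels.formula.api as smf

X_list = ["x1", "x2"]          # hypothetical baseline covariates
threshold = 0.5                # hypothetical classification cutoff

treat = df[df["Z"] == 1]
ctrl = df[df["Z"] == 0].copy()

# 1. Model the borrowing decision on the treatment sample.
borrow = smf.logit("D ~ " + " + ".join(X_list), data=treat).fit(disp=0)

# 2. Predict borrowing propensities for control individuals.
ctrl["p_borrow"] = borrow.predict(ctrl)

# 3. Classify predicted types and compare their outcome trends.
ctrl["pred_complier"] = (ctrl["p_borrow"] > threshold).astype(float)
placebo = smf.ols("dY ~ pred_complier", data=ctrl).fit(
    cov_type="cluster", cov_kwds={"groups": ctrl["village"]})
# A coefficient on pred_complier close to zero supports the equal-trend assumption.
```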
3.6 Conclusion We proposed a simple DID-based estimator for the average treatment on the treated, ITT for compliers, when the exclusion restriction of treatment assignment fails due to possibility of spillover or general equilibrium effect while allowing for one-sided noncompliance. We also point identify ITT effect for never-takers which can be used to test for the existence of 63 spillover effect. In the application of microcredit program ((Crépon et al., 2015)), we detect the evidence of spillover effects for never-takers which suggest that the traditional LATE estimates are likely to be invalid. 64 Chapter 4 On the Use of an Instrumental Variable in Causal Mediation Analysis 4.1 Introduction Understanding causal mechanisms through which a treatment or an intervention (D) affects an outcome (Y) is a fundamental goal of social science. Researchers are interested not only in identifying whether there is a treatment effect, but also in understanding how such a treatment effect arises. Suppose, for example, that an early childhood program (D) shows a positive effect on an adult outcome (Y). An important question that follows is whether and to what extent such effect can be attributed to a change in an educational achievement that is itself induced by the program, (see Heckman et al. (2013)). Causal mediation analysis offers a formal framework to uncover causal mechanism, a set of casual pathways connecting D and Y, underlying observed treatment effects. Specifically, it aims at decomposing a total effect of D on Y into an indirect effect operating through a third variable called “mediator”,M, (e.g., through years of education) and a direct effect that does not operate through that mediator (e.g., through personality traits). Understanding the 65 mechanism of causal effects allows one to design more effective policies which may involve altering specific causal pathways. Decomposing the total treatment effect into direct and indirect effects is a challenging task. Even with the “gold standard” randomized controlled trial (RCT) where the treatment is randomized, direct and indirect effects are not identified without further assumptions since the mediator, a post-intervention outcome, is in general non-random, thus making it difficult to identify the causal effect of the mediator on the outcome. The so-called black box critique of RCTs illustrates the difficulty of performing causal mediation analysis. The additional assumption that is commonly invoked in the literature in order to identify direct and indirect effects is so-called “sequential ignorability" (SI) assumption, which is essentially a selection-on-observables assumption on both D and M. Under the sequential ignorability assumption, D and M can be considered as-if random after controlling for the relevant set of observable covariates. To illustrate the identification power of SI, let us consider a linear regression model for the random D as follows: for simplicity, let us assume that there is no covariate: Y i =b 0 +b 1 D i +b 2 M i +b 3 D i M i +u i ; M i =a 0 +a 1 D i +v i : (4.1) Here, (u i ;v i ) are unobservables. SI assumes that corr(u i ;D i ) = corr(u i ;M i ) = 0 and cov(D i ;v i ) = 0, which in turn implies corr(u i ;v i ) = 0 after controlling for relevant co- variatesX i . Under these assumptions, coefficients (a;b) can be consistently estimated using least-squares method and direct and indirect effects are then estimated as a function of estimated coefficients. 
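To fix ideas, if the linear specification (4.1) is correctly specified with constant coefficients, the natural direct and indirect effects defined formally in Section 4.2 below reduce to the usual product-of-coefficients expressions, so under SI they can be computed from two OLS fits. A minimal sketch (Python, statsmodels); this is an illustration of that standard computation, not code from the text.

```python
import numpy as np
import statsmodels.api as sm

def mediation_effects_ols(Y, D, M):
    """Direct/indirect effects implied by model (4.1) under sequential ignorability.
    Returns (NDE_0, NDE_1, NIE_0, NIE_1); covariates are omitted as in the text."""
    # Outcome equation: Y = b0 + b1*D + b2*M + b3*D*M + u
    b = sm.OLS(Y, sm.add_constant(np.column_stack([D, M, D * M]))).fit().params
    # Mediator equation: M = a0 + a1*D + v
    a = sm.OLS(M, sm.add_constant(D)).fit().params
    nde0 = b[1] + b[3] * a[0]             # effect of D with M held at its d = 0 level
    nde1 = b[1] + b[3] * (a[0] + a[1])    # ... held at its d = 1 level
    nie0 = b[2] * a[1]                    # effect of shifting M_0 -> M_1 at D = 0
    nie1 = (b[2] + b[3]) * a[1]           # ... at D = 1
    return nde0, nde1, nie0, nie1
```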
Sequential ignorability assumption is arguably strong as it excludes unobserved con- founders affecting both M and Y, which is unlikely to hold in many realistic settings. For instance in early childhood programs, SI fails if there is an unobserved individual trait such as perseverance which affects both education levels (M) and earnings (Y) regardless of the 66 program participation (D). In the context of linear regression model above, even after con- trolling a rich set of covariates, we may still have corr(u i ;v i ) 6= 0, leading to failure of SI. Recently, several papers have proposed an alternative identification strategy based on instrumental variables (IV) to address the possible existence of unobserved confounders. Theoretical papers include Frölich and Huber (2017), Dippel et al. (2020), Imai et al. (2013) and Mattei and Mealli (2011). Empirical paper include Chen et al. (2019) and Dippel et al. (2021). Assuming that D random, the method supposes the existence of valid IV, denoted by Z, for M in the sense that (i) Z is exogenous to both D and M, and (ii) Z affects Y only through M. The method attempts to exploit the resulting exogenous variation in M generated by Z. In a linear regression model, we now have Y i =b 0 +b 1 D i +b 2 M i +b 3 D i M i +u i ; M i =a 0 +a 1 D i +a 2 Z i +a 3 D i Z i +v i (4.2) where coefficients are estimated using IV methods where (D i ;M i ;D i M i ) is instrumented by (D i ;Z i ;D i Z i ). This produces a consistent estimator of the coefficients (a;b), and thus (in)direct effects as well even when corr(u i ;v i )6= 0. While IV methods can address unobserved confounders in the relationship between M and Y, they generally involve additional assumptions other than exogeneity and exclusion restriction once we attempt to move beyond the linear regression model. In a non-mediation setting where we aim to identify the causal effect of D on Y using Z as an IV for D, Imbens and Angrist (1994a) shows that either (i) constant effect assumption or (ii) so-called monotoncity assumption is required for IV estimand to have a causal interpretation. Imbens and Angrist (1994a) also shows that under the heterogeneous effect setting, even when the monotonicity assumption is satisfied, the average treatment effect is not identified; instead we can only identify the average treatment effect for a certain subpopulation known as the compliers. 67 Similarargumentsareexpectedtoholdinmediationsettingsaswell, albeitmorecomplex. However, despite its increasing popularity, there is no formal result outlining the formal identification result under IV in a mediation setting. This paper fills this gap by formally deriving a set of required assumptions in the context of randomized treatment. Our first result shows that when the constant effect is assumed, IV estimator for (in)direct effects identifies the true (in)direct effects. Such constant effect amounts to assuming that the linear model above is correctly specified: that the coefficients (a;b) are truly constant after controlling for all observable covariates. When such assumption is violated due to random (unobserved) coefficients (that is, the true model has (a i ;b i ) rather than (a;b)), we show that certain monotonicity assumptions are required for IV estimands to have causal interpretations, in the sense that it identifies a positively weighted averages of some subgroup effects. Specifically, we needM to be partially monotonic in both (D;Z) similar to Imbens and Angrist (1994a)’s no-defier assumption. 
Finally, we show that even when such partial monotonicity assumptions are satisfied, there is no guarantee that the IV estimands are informative about the target parameters, in parallel with Imbens and Angrist (1994a)'s result that IV methods identify the LATE, not the ATE.

4.2 Framework and Identifiability

[Figure 4.1: Mediation Diagram. The treatment (D) affects the outcome (Y) both directly and indirectly through the mediator (M).]

The aim of causal mediation analysis is to quantify the extent to which the effect of a treatment on an outcome is mediated by a third variable, called the "mediator". We decompose the total effect of a treatment on an outcome into an indirect effect (or mediated effect), which operates through the mediator, and a direct effect (or unmediated effect), which does not operate through the mediator, as depicted in Figure 4.1.

Throughout the paper we consider the simple case of a binary treatment and a binary mediator. For each individual i, let $D_i \in \{0,1\}$ be an indicator of treatment (1: treated, 0: not treated), $M_i \in \{0,1\}$ a binary mediator, and $Y_i \in \mathbb{R}$ an outcome of interest. We observe $(D_i, M_i, Y_i)$ for a random sample of individuals. For simplicity, we suppress the individual index i. In addition, we abstract from covariates and implicitly condition on them.

4.2.1 Potential Outcomes and Causal Effects

Following the literature, we define indirect and direct effects using the potential outcomes (or counterfactual) framework (see, for example, Pearl (2001)). Let $M_d$ denote the potential mediator value when the treatment is set to $D = d$. Let $Y_{d,m}$ be the potential outcome when the treatment is set to $D = d$ and the mediator is set to $M = m$. Similarly, $Y_{d,M_{d'}}$ is the potential outcome when the treatment is set to $D = d$ while the mediator is set to $M_{d'}$, i.e., the value it would take under treatment state $d'$. In this way, each individual has two potential mediators, $(M_1, M_0)$, and four potential outcomes, $(Y_{1,M_1}, Y_{1,M_0}, Y_{0,M_1}, Y_{0,M_0})$. Only one of each is observed. The realized outcome Y and realized mediator M satisfy

$M = M_D = D M_1 + (1 - D) M_0, \qquad Y = Y_{D,M_D} = D Y_{1,M_1} + (1 - D) Y_{0,M_0}.$

Note that the counterfactual $Y_{d,M_{d'}}$ for $d \neq d'$ is never observed in the data, unless an individual has $M_{d'} = M_d$. Since only one of $(M_1, M_0)$ is revealed by the data, it is not known a priori whether a given individual has $M_1 = M_0$ or not.

In this paper we focus on mean effects. The Average Total Effect (ATE) of the treatment is defined as

$ATE = E[Y_{1,M_1} - Y_{0,M_0}].$

Note that when the treatment is randomized, the ATE is easily identified by $E[Y \mid D = 1] - E[Y \mid D = 0]$. There are two ways to decompose the ATE. First,

$ATE = \underbrace{E[Y_{1,M_1} - Y_{1,M_0}]}_{NIE_1} + \underbrace{E[Y_{1,M_0} - Y_{0,M_0}]}_{NDE_0}.$

Second,

$ATE = \underbrace{E[Y_{1,M_1} - Y_{0,M_1}]}_{NDE_1} + \underbrace{E[Y_{0,M_1} - Y_{0,M_0}]}_{NIE_0}.$

Following Pearl (2001), we define natural direct effects (NDE) as

$NDE_d \overset{def}{=} E[Y_{1,M_d} - Y_{0,M_d}], \quad d = 0, 1,$   (4.3)

which measures the average change in outcomes due to the treatment while the mediator is kept at the level it would take when $D = d$. Since the mediator is held fixed at $M = M_d$, the NDE measures the effect that does not operate through M. To motivate $NDE_0$, imagine the status quo in which everyone is untreated, so that $Y^{pre} = Y_{0,M_0}$ for all. Now suppose that a new policy requires everyone to be treated, while the policy-maker deactivates the path from D to M so that M is kept unchanged. Then $Y^{post} = Y_{1,M_0}$ would be realized.
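A minimal sketch (illustrative only, with a hypothetical data-generating process) of the observation rule $M = M_D$ and $Y = Y_{D,M_D}$: for each unit, only one potential mediator and one potential outcome are ever revealed, and the cross-world outcome $Y_{d,M_{d'}}$ with $d \neq d'$ is not in the data.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
# Hypothetical potential mediators and the four potential outcomes per unit
M0 = rng.binomial(1, 0.3, n)
M1 = rng.binomial(1, 0.7, n)
Y = {(d, m): 1.0 * d + 2.0 * m + rng.normal(0, 0.1, n) for d in (0, 1) for m in (0, 1)}

D = rng.binomial(1, 0.5, n)
M = np.where(D == 1, M1, M0)                               # M = M_D
Y_obs = np.where(D == 1,
                 np.where(M1 == 1, Y[(1, 1)], Y[(1, 0)]),  # Y_{1, M_1}
                 np.where(M0 == 1, Y[(0, 1)], Y[(0, 0)]))  # Y_{0, M_0}
# Y_{1, M_0} and Y_{0, M_1} are never revealed by (D, M, Y_obs), and whether
# M_1 == M_0 holds for a given unit is itself unobserved.
print(np.column_stack([D, M0, M1, M]))
```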
$NDE_0 = E[Y^{post} - Y^{pre}]$ measures the effect of such a policy change. This definition of the direct effect is called the natural direct effect, due to Pearl (2001), in contrast to the non-natural or controlled direct effect defined by $E[Y_{1,m} - Y_{0,m}]$ for $m = 0, 1$. Unlike natural direct effects, controlled direct effects set the value of the mediator to a specific level m.¹ While controlled direct effects can be of interest as well, we focus on natural direct effects, following the majority of the literature.

¹ That is, the natural DE is relevant when we hypothesize a path-disabling intervention. This is in contrast to the variable-setting intervention, in which we set $M = m$ for everyone, as hypothesized by the controlled DE (CDE). Implicit in the definition of the CDE is that a policy-maker can conceivably set M to a specific value for everyone regardless of D. Whether that is realistic or policy-relevant depends on the context of the study.

Similarly, natural indirect effects are defined as

$NIE_d \overset{def}{=} E[Y_{d,M_1} - Y_{d,M_0}],$   (4.4)

which measures the average change in outcomes when the value of the mediator changes from the value that would be realized under the control state (i.e., $M = M_0$) to the value that would be realized under the treatment state (i.e., $M = M_1$), while the treatment is fixed at its reference level $D = d$.

4.2.2 Identification Issues and the Sequential Ignorability Assumption

Identification of natural direct and indirect effects is challenging. Even when the treatment is randomized, there is no guarantee that the mediator is exogenous. Since our aim is to understand the implications of mediator endogeneity, we maintain the assumption that the treatment is random in order to isolate the essence of the problem:

Assumption 16 (treatment exogeneity). For all $d, d'$ and $m$,

$(Y_{d,m}, M_{d'}) \perp\!\!\!\perp D.$

Hereafter, we use $\perp\!\!\!\perp$ to denote mean independence. This assumption is satisfied, for instance, when we have a randomized experiment in which the treatment is randomly allocated across individuals. Recall that we implicitly condition on observable covariates; thus the assumption also covers observational studies in which researchers can reasonably assume that the treatment is unconfounded after controlling for sets of covariates.

M is endogenous if there exists a common factor simultaneously affecting both M and Y. In order to identify the causal effect of M on Y, it is necessary to control for all of these common factors. The "sequential ignorability" (SI) assumption proposed by Imai et al. (2010) requires that these common factors are all observable, and thus can be controlled for. Specifically, SI assumes that for all $d, d', m$ we have $M \perp\!\!\!\perp Y_{d,m} \mid D = d'$ after controlling for observable covariates. Under SI, natural direct and indirect effects are nonparametrically identified, as shown in Imai et al. (2010).

4.3 Instrumental Variable Approach to Mediation Analysis

The SI assumption requires that confounders affecting both M and Y are entirely observable. As acknowledged by Imai et al. (2010), this assumption is rather strong: it cannot be proven and, in many cases, is difficult to justify. Typically, an instrumental variable (IV) method is used when we want to identify the causal effect of an endogenous variable whose endogeneity stems from possibly unobserved confounders. Recent papers have therefore started to propose an alternative identification strategy based on IVs. We seek to understand the identification power of such an IV in a mediation setting.
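The two decompositions of the ATE above are telescoping identities, which a small numerical check (illustrative only, hypothetical potential-outcome draws) makes explicit: $ATE = NIE_1 + NDE_0 = NDE_1 + NIE_0$ holds exactly in any population of potential outcomes.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1_000_000
M0 = rng.binomial(1, 0.3, n)
M1 = rng.binomial(1, 0.7, n)
# Four potential outcomes per unit, with heterogeneous effects (hypothetical DGP)
Ydm = {(d, m): rng.normal(d + 2 * m, 1, n) + (d * m) * rng.normal(0.5, 1, n)
       for d in (0, 1) for m in (0, 1)}

def Y_of(d, Md):
    """Y_{d, M_{d'}}: pick Y_{d,1} or Y_{d,0} unit by unit according to Md."""
    return np.where(Md == 1, Ydm[(d, 1)], Ydm[(d, 0)])

ate  = np.mean(Y_of(1, M1) - Y_of(0, M0))
nie1 = np.mean(Y_of(1, M1) - Y_of(1, M0)); nde0 = np.mean(Y_of(1, M0) - Y_of(0, M0))
nde1 = np.mean(Y_of(1, M1) - Y_of(0, M1)); nie0 = np.mean(Y_of(0, M1) - Y_of(0, M0))
print(ate, nie1 + nde0, nde1 + nie0)   # the three numbers coincide (telescoping)
```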
To do so, we consider the case where we have access to a binary instrument, $Z \in \{0,1\}$, satisfying the following assumption:

Assumption 17 (binary instrument). There exists a binary instrument, $Z \in \{0,1\}$, such that for all values of $d, d', m, z$, the following statements hold:

(i) randomization of the instrument: $(Y_{d,m}, M_{d'}, D) \perp\!\!\!\perp Z$;

(ii) exclusion restriction: $Y_{d,m,z} = Y_{d,m}$;

(iii) relevance: $\Pr(M \mid Z = z)$ is a nontrivial function of z.

Part (i) requires that Z is exogenous; recall that we are implicitly conditioning on covariates. Part (ii) requires that Z does not affect Y directly, while (iii) requires that M be affected by Z. Taken together, (ii) and (iii) require that Z affects Y only through M.

With such a Z at hand, we now augment the potential mediator notation from $M_d$ to $M_{d,z}$. The observed mediator is $M = M_{D,Z}$. We also use the notation $M_0 = M_{0,Z}$ and $M_1 = M_{1,Z}$, so that

$E[M_0] = E[M_{0,Z}] = E[M_{01} \mid Z = 1]\Pr(Z = 1) + E[M_{00} \mid Z = 0]\Pr(Z = 0) = E[M_{01}]\Pr(Z = 1) + E[M_{00}]\Pr(Z = 0),$

where the last equality follows from $M_{d,z} \perp\!\!\!\perp Z$. IV estimators for mediation are then defined by postulating the following linear regression model for Y and M:

$Y = \beta_0 + \beta_1 M + u$ for $D = 0$, $\qquad Y = \gamma_0 + \gamma_1 M + u$ for $D = 1$,

and

$M = \alpha_0 + \alpha_1 Z + v$ for $D = 0$, $\qquad M = \delta_0 + \delta_1 Z + v$ for $D = 1$.

Equivalently, we have the following system of linear equations for the endogenous variables:

$Y = D(\gamma_0 + \gamma_1 M) + (1 - D)(\beta_0 + \beta_1 M) + u,$   (4.5)
$M = D(\delta_0 + \delta_1 Z) + (1 - D)(\alpha_0 + \alpha_1 Z) + v,$   (4.6)

where it is assumed that $E[u \mid D, Z] = E[v \mid D, Z] = 0$. The linear coefficients, $\theta = (\alpha, \beta, \gamma, \delta)$, are estimated by running an IV regression of Y on M using Z as an instrument, separately for each value of D. Let $\hat\theta^{IV}$ denote the resulting IV estimator. Given $\hat\theta^{IV}$, direct and indirect effects are estimated by

$\widehat{NIE}_0 = \hat\beta_1\,[\hat\delta_0 - \hat\alpha_0 + (\hat\delta_1 - \hat\alpha_1)\hat{E}[Z]]$   (4.7)
$\widehat{NIE}_1 = \hat\gamma_1\,[\hat\delta_0 - \hat\alpha_0 + (\hat\delta_1 - \hat\alpha_1)\hat{E}[Z]]$   (4.8)
$\widehat{NDE}_0 = \hat\gamma_0 - \hat\beta_0 + (\hat\gamma_1 - \hat\beta_1)(\hat\alpha_0 + \hat\alpha_1\hat{E}[Z])$   (4.9)
$\widehat{NDE}_1 = \hat\gamma_0 - \hat\beta_0 + (\hat\gamma_1 - \hat\beta_1)(\hat\delta_0 + \hat\delta_1\hat{E}[Z])$   (4.10)

where $\hat{E}[Z] = \sum_{i=1}^n Z_i / n$ (see VanderWeele (2016)). Let $\theta^{IV}$ be the probability limit of $\hat\theta^{IV}$. The probability limits of the estimated mediation effects are given as follows:

$NIE^{IV}_0 = \beta^{IV}_1\,[\delta^{IV}_0 - \alpha^{IV}_0 + (\delta^{IV}_1 - \alpha^{IV}_1)E[Z]]$   (4.11)
$NIE^{IV}_1 = \gamma^{IV}_1\,[\delta^{IV}_0 - \alpha^{IV}_0 + (\delta^{IV}_1 - \alpha^{IV}_1)E[Z]]$   (4.12)
$NDE^{IV}_0 = \gamma^{IV}_0 - \beta^{IV}_0 + (\gamma^{IV}_1 - \beta^{IV}_1)(\alpha^{IV}_0 + \alpha^{IV}_1 E[Z])$   (4.13)
$NDE^{IV}_1 = \gamma^{IV}_0 - \beta^{IV}_0 + (\gamma^{IV}_1 - \beta^{IV}_1)(\delta^{IV}_0 + \delta^{IV}_1 E[Z])$   (4.14)

On the other hand, nonparametrically, $NDE_d$ and $NIE_d$, when augmented with Z, are defined as follows:

$NIE_0 = E[Y_{0,M_{1,Z}} - Y_{0,M_{0,Z}}] = \sum_{z\in\{0,1\}} E[Y_{0,M_{1,z}} - Y_{0,M_{0,z}}]\Pr(Z = z)$   (4.15)
$NIE_1 = E[Y_{1,M_{1,Z}} - Y_{1,M_{0,Z}}] = \sum_{z\in\{0,1\}} E[Y_{1,M_{1,z}} - Y_{1,M_{0,z}}]\Pr(Z = z)$   (4.16)
$NDE_0 = E[Y_{1,M_{0,Z}} - Y_{0,M_{0,Z}}] = \sum_{z\in\{0,1\}} E[Y_{1,M_{0,z}} - Y_{0,M_{0,z}}]\Pr(Z = z)$   (4.17)
$NDE_1 = E[Y_{1,M_{1,Z}} - Y_{0,M_{1,Z}}] = \sum_{z\in\{0,1\}} E[Y_{1,M_{1,z}} - Y_{0,M_{1,z}}]\Pr(Z = z)$   (4.18)

Compare, for instance, $NIE^{IV}_0$ and $NIE_0$. We show that when the linear model (4.5)-(4.6) is correctly specified (i.e., the relationship is truly linear with an additive heterogeneity term), we have $NIE^{IV}_d = NIE_d$ and $NDE^{IV}_d = NDE_d$ for all $d = 0, 1$. However, while the linearity assumption may be justified on the grounds of the discreteness of M and D, the assumption of constant slopes is strong, as it imposes homogeneity of effects conditional on covariates. This is likely to be violated when individuals select into M based on their unobservable gains, a case of what Heckman (2001) calls "essential heterogeneity". In such cases, the outcome coefficients $(\beta, \gamma)$ are random even when we control for all observables and, worse, $(\beta_i, \gamma_i)$ may be correlated with $M_i$.
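The plug-in formulas (4.7)-(4.10) amount to two stratified IV fits plus two stratified OLS fits. The sketch below is an illustrative implementation under the notation used here (the variable names b, g, a, l mirror the coefficients of the outcome and mediator equations for D = 0 and D = 1); the helper functions are my own and not code from the dissertation.

```python
import numpy as np

def iv_slope(y, x, z):
    """IV (Wald) slope of y on x with a binary instrument z: cov(y,z)/cov(x,z)."""
    return np.cov(y, z)[0, 1] / np.cov(x, z)[0, 1]

def mediation_iv(Y, M, D, Z):
    d0, d1 = D == 0, D == 1
    b1 = iv_slope(Y[d0], M[d0], Z[d0]); b0 = Y[d0].mean() - b1 * M[d0].mean()  # outcome, D = 0
    g1 = iv_slope(Y[d1], M[d1], Z[d1]); g0 = Y[d1].mean() - g1 * M[d1].mean()  # outcome, D = 1
    a1, a0 = np.polyfit(Z[d0], M[d0], 1)                                       # mediator, D = 0
    l1, l0 = np.polyfit(Z[d1], M[d1], 1)                                       # mediator, D = 1
    EZ = Z.mean()
    return {"NIE_0": b1 * (l0 - a0 + (l1 - a1) * EZ),        # eq. (4.7)
            "NIE_1": g1 * (l0 - a0 + (l1 - a1) * EZ),        # eq. (4.8)
            "NDE_0": g0 - b0 + (g1 - b1) * (a0 + a1 * EZ),   # eq. (4.9)
            "NDE_1": g0 - b0 + (g1 - b1) * (l0 + l1 * EZ)}   # eq. (4.10)
```

It can be applied to arrays (Y, M, D, Z) drawn, for example, as in the earlier simulation sketches.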
Our aim is to understand how to interpret $NIE^{IV}_d$ and $NDE^{IV}_d$, and whether they are informative about the target parameters, $NDE_d$ and $NIE_d$, when the model is misspecified.

4.3.1 What Does IV Identify?

As a first step, the following lemma shows what the IV identifies:

Lemma 2 (causal interpretation of $\theta^{IV}$).

$\beta^{IV}_1 = \dfrac{E[Y_{0,M_{01}} - Y_{0,M_{00}}]}{E[M_{01} - M_{00}]}, \qquad \beta^{IV}_0 = E[Y_{0,M_0}] - \beta^{IV}_1 E[M_0],$
$\gamma^{IV}_1 = \dfrac{E[Y_{1,M_{11}} - Y_{1,M_{10}}]}{E[M_{11} - M_{10}]}, \qquad \gamma^{IV}_0 = E[Y_{1,M_1}] - \gamma^{IV}_1 E[M_1],$

and

$\alpha^{IV}_1 = E[M_{01} - M_{00}], \qquad \alpha^{IV}_0 = E[M_0] - \alpha^{IV}_1 E[Z],$
$\delta^{IV}_1 = E[M_{11} - M_{10}], \qquad \delta^{IV}_0 = E[M_1] - \delta^{IV}_1 E[Z],$

where

$\beta^{IV}_1 = \dfrac{E[Y_{01} - Y_{00} \mid M_{01} > M_{00}]\Pr(M_{01} > M_{00}) - E[Y_{01} - Y_{00} \mid M_{01} < M_{00}]\Pr(M_{01} < M_{00})}{\Pr(M_{01} > M_{00}) - \Pr(M_{01} < M_{00})}$

and

$\gamma^{IV}_1 = \dfrac{E[Y_{11} - Y_{10} \mid M_{11} > M_{10}]\Pr(M_{11} > M_{10}) - E[Y_{11} - Y_{10} \mid M_{11} < M_{10}]\Pr(M_{11} < M_{10})}{\Pr(M_{11} > M_{10}) - \Pr(M_{11} < M_{10})}.$

See C.1 for proof.

Now, let us focus on the question of what $NIE^{IV}_0$ identifies (the $NIE^{IV}_1$ case can be handled symmetrically). From equation (4.11), we have

$NIE^{IV}_0 = \beta^{IV}_1\,\underbrace{[\delta^{IV}_0 - \alpha^{IV}_0 + (\delta^{IV}_1 - \alpha^{IV}_1)E[Z]]}_{=\,E[M_1 - M_0]}$   (4.19)
$\phantom{NIE^{IV}_0} = \beta^{IV}_1 \sum_{z\in\{0,1\}} E[M_{1z} - M_{0z}]\,\pi_z,$   (4.20)

where $\pi_z \equiv \Pr(Z = z)$, while $NIE_0 = E[Y_{0,M_{1,Z}} - Y_{0,M_{0,Z}}]$ can be written as

$\sum_{z\in\{0,1\}} \big[ E[Y_{01} - Y_{00} \mid M_{1z} > M_{0z}]\Pr(M_{1z} > M_{0z})$   (4.21)
$\qquad\qquad - E[Y_{01} - Y_{00} \mid M_{1z} < M_{0z}]\Pr(M_{1z} < M_{0z}) \big]\,\pi_z.$   (4.22)

Suppose that the effect $Y_{01} - Y_{00}$ is constant across individuals. In that case, we have $\beta^{IV}_1 = E[Y_{01} - Y_{00}]$ and $NIE_0 = NIE^{IV}_0$, so that the IV estimator of $NIE_0$ identifies the true $NIE_0$. The result can be generalized to the other (in)direct effects as well:

Proposition 1 (IV estimand under constant effects). Conditional on observables, if $Y_{d',m'} - Y_{d,m}$ is constant across all individuals, then $NIE^{IV}_d = NIE_d$ and $NDE^{IV}_d = NDE_d$ for all $d = 0, 1$.

4.3.2 Monotonicity Conditions

In general, however, such a constant effect assumption is hard to justify. It is violated when individuals with different potential mediator values, $\{M_{d,z}\}_{(d,z)\in\{0,1\}^2}$, experience systematically different (in)direct effects. Note that since $M_{d,z}$ is not observed for all possible values of $(d, z)$ (i.e., the compliance type is unknown), we cannot control for it. Once (unobserved) effect heterogeneity is allowed, it is not clear whether and how $NIE^{IV}_0$ (eq. 4.20) is comparable to $NIE_0$ (eq. 4.22).

We argue that $NIE^{IV}_0$ in fact does not have a causal interpretation under the heterogeneous effect setting without an additional assumption on M, namely a monotonicity assumption.

Our first claim is that the target parameter $NIE_0$ itself has no causal interpretation under the heterogeneous effects setting unless monotonicity of M with respect to D, for a given value of Z, is assumed. To see this, let us implicitly condition on $Z = z$. Note that

$NIE_0 \overset{def}{=} E[Y_{0,M_1} - Y_{0,M_0}] = E[Y_{01} - Y_{00} \mid M_1 > M_0]\Pr(M_1 > M_0) - E[Y_{01} - Y_{00} \mid M_1 < M_0]\Pr(M_1 < M_0).$

Thus $NIE_0$ identifies a weighted difference of two effects: (i) $E[Y_{01} - Y_{00} \mid M_1 > M_0]$, the average of $Y_{01} - Y_{00}$ for those with $(M_0, M_1) = (0, 1)$, and (ii) $E[Y_{01} - Y_{00} \mid M_1 < M_0]$, the average of $Y_{01} - Y_{00}$ for those with $(M_0, M_1) = (1, 0)$. Since $NIE_0$ is a weighted difference of two different effects, we may have $NIE_0 = 0$ even when $Y_{0,M_1} - Y_{0,M_0} \neq 0$ for everyone, because the two effects cancel each other; we may then wrongly conclude that there is no mediation. This problem occurs because some individuals change their mediator status from 0 to 1 when given the treatment, while others do the opposite and change their mediator value from 1 to 0.
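The complier/defier weighted-difference representation of $\beta^{IV}_1$ in Lemma 2 can be checked numerically. The sketch below (illustrative only, hypothetical data-generating process) builds a population with known $(M_{00}, M_{01})$ and $(Y_{00}, Y_{01})$ and confirms that the two expressions for $\beta^{IV}_1$ coincide.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000
# Potential mediators under D = 0 for z = 0, 1: both compliers and defiers w.r.t. Z exist
M00 = rng.binomial(1, 0.3, n)
M01 = rng.binomial(1, 0.6, n)
# Potential outcomes under D = 0, with a heterogeneous mediator effect Y01 - Y00
Y00 = rng.normal(0, 1, n)
Y01 = Y00 + rng.normal(1.0, 2.0, n)

# beta_1^IV = E[Y_{0,M01} - Y_{0,M00}] / E[M01 - M00]
beta1_iv = (np.mean(np.where(M01 == 1, Y01, Y00) - np.where(M00 == 1, Y01, Y00))
            / np.mean(M01 - M00))

# Weighted-difference expression from Lemma 2
delta = Y01 - Y00
up, dn = (M01 > M00), (M01 < M00)
weighted = ((delta[up].mean() * up.mean() - delta[dn].mean() * dn.mean())
            / (up.mean() - dn.mean()))
print(beta1_iv, weighted)   # the two expressions coincide in the sample
```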
As a result, the overall impact of giving the treatment does not contain any information about the causal effect of the treatment on any particular person. The problem here is analogous to the fact that the Wald estimand does not identify any causal parameter when compliers and defiers coexist. Similarly, the problem can be avoided when we assume that the treatment affects M in the same direction for everyone:

Assumption 18 (weak monotonicity of M in D). For all $z \in \{0,1\}$, $\Pr(M_{1,z} \geq M_{0,z}) = 1$.

This assumption requires that, for a given z, being treated only weakly increases the value of M. Under this assumption, equations (4.20) and (4.22) become

$NIE_0 = \sum_{z\in\{0,1\}} E[Y_{01} - Y_{00} \mid M_{1z} > M_{0z}]\,P_{1z}\,\pi_z$   (4.23)

and

$NIE^{IV}_0 = \dfrac{E[Y_{0,M_{01}} - Y_{0,M_{00}}]}{E[M_{01} - M_{00}]} \sum_{z\in\{0,1\}} P_{1z}\,\pi_z,$   (4.24)

where $P_{1z} \equiv \Pr(M_{1z} > M_{0z})$ and $E[Y_{0,M_{01}} - Y_{0,M_{00}}]/E[M_{01} - M_{00}]$ equals

$\dfrac{E[Y_{01} - Y_{00} \mid M_{01} > M_{00}]\Pr(M_{01} > M_{00}) - E[Y_{01} - Y_{00} \mid M_{01} < M_{00}]\Pr(M_{01} < M_{00})}{\Pr(M_{01} > M_{00}) - \Pr(M_{01} < M_{00})}.$

For simplicity, let $Q_1 = \Pr(M_{01} > M_{00})$ and $Q_2 = \Pr(M_{01} < M_{00})$, so that equation (4.24) can be written as

$NIE^{IV}_0 = \dfrac{E[Y_{01} - Y_{00} \mid M_{01} > M_{00}]\,Q_1 - E[Y_{01} - Y_{00} \mid M_{01} < M_{00}]\,Q_2}{Q_1 - Q_2} \sum_{z\in\{0,1\}} P_{1z}\,\pi_z,$   (4.25)

which is still a non-convex combination of group-specific average effects. Comparing equations (4.23) and (4.25), we conclude that unless either $Q_1 = 0$ or $Q_2 = 0$, $NIE^{IV}_0$ is not informative about $NIE_0$. We show this using a numerical example taken from Angrist and Imbens (1995): suppose $Q_1 = 2/3$ and $Q_2 = 1/3$, while $E[Y_{01} - Y_{00} \mid M_{01} > M_{00}] = \Delta$ and $E[Y_{01} - Y_{00} \mid M_{01} < M_{00}] = 2\Delta$ with $\Delta > 0$. Even though both subgroup effects are positive, we would have $NIE^{IV}_0 = 0$; on the other hand, $NIE_0$ can take any sign. Again, we have the same problem as in Angrist and Imbens (1995): the effects for those whose mediator value is shifted from 0 to 1 when Z is switched on can be cancelled out by the effects of those whose mediator value is shifted from 1 to 0. To avoid this problem, we again impose a monotonicity assumption, this time for a given value of $D = d$:

Assumption 19 (weak monotonicity of M in Z). For all $d \in \{0,1\}$, $\Pr(M_{d,1} \geq M_{d,0}) = 1$.

Our final result shows that even when both Assumptions 18 and 19 are satisfied, $NIE^{IV}_0$ identifies a different quantity from $NIE_0$:

$NIE_0 = \sum_{z\in\{0,1\}} E[Y_{01} - Y_{00} \mid M_{1z} > M_{0z}]\Pr(M_{1z} > M_{0z})\,\pi_z$   (4.26)

while

$NIE^{IV}_0 = E[Y_{01} - Y_{00} \mid M_{01} > M_{00}] \sum_{z\in\{0,1\}} \Pr(M_{1z} > M_{0z})\,\pi_z.$   (4.27)

Here $NIE_0$ measures the overall effect for those who change their M due to a change in the treatment value, weighted over the different values of Z. In contrast, $NIE^{IV}_0$ measures the effect for those who change M in response to Z in a fixed $D = 0$ world, multiplied by the constant $\sum_{z\in\{0,1\}} \Pr(M_{1z} > M_{0z})\,\pi_z$. While we no longer have the non-convex weights problem, so that both quantities have causal interpretations, it is not clear how informative $NIE^{IV}_0$ is about the target parameter, $NIE_0$. It would therefore be desirable to examine the degree of effect heterogeneity across the different compliance groups. (Note that in the extreme case where there is no effect heterogeneity, the two are equivalent, as expected.) Although we have focused on the $NIE_0$ case, it follows easily that the same conclusions hold for $NIE_1$ as well as for $NDE_d$. Our results thus imply that careful examination is needed when using IV methods whose target parameters take the form of natural (in)direct effects.
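The Angrist and Imbens (1995) style example in the text can be verified with one line of arithmetic; the sketch below does so with a placeholder value of $\Delta$.

```python
# Back-of-the-envelope check of the numerical example (all values hypothetical):
# Q1 = 2/3 of units have M_{01} > M_{00} with effect delta,
# Q2 = 1/3 have M_{01} < M_{00} with effect 2 * delta.
delta = 1.0
Q1, Q2 = 2 / 3, 1 / 3
beta1_iv = (delta * Q1 - 2 * delta * Q2) / (Q1 - Q2)
print(beta1_iv)  # 0.0, so NIE_0^IV = beta1_iv * (positive weight) = 0,
# even though both subgroup effects (delta and 2*delta) are strictly positive,
# while NIE_0 depends on Pr(M_{1z} > M_{0z}) and is unrestricted here.
```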
While IV has the benefit of allowing unobserved confounders, that benefit comes at a cost: either strong effect homogeneity or monotonicity assumptions, together with concerns about external validity, are needed.

4.4 Concluding Remarks

This paper investigates the identification of direct and indirect effects in a mediation setting using an instrumental variable. We have considered the simple case of a binary treatment, a binary mediator and a binary instrumental variable, where there exists an unobserved confounder affecting both mediator and outcome. We have shown that instrumental variable estimators based on linear models can identify natural direct and indirect effects when there is no unobserved heterogeneity in effects. Under effect heterogeneity, we show that the instrumental variable estimators of natural direct and indirect effects do not deliver causally meaningful quantities without certain monotonicity assumptions restricting how the mediator responds to the treatment and to the instrumental variable. We also show that even when these monotonicity assumptions are satisfied, the instrumental variable estimators do not necessarily correspond to natural (in)direct effects.

In conclusion, while IV methods have the benefit of addressing unobserved confounders, caution is needed. The comparative advantage of IV methods over traditional methods based on selection on observables declines as the degree of unobserved effect heterogeneity increases. Careful examination of the plausibility of the homogeneity assumption, along with sensitivity analysis with respect to unobserved effect heterogeneity, would be fruitful.

Bibliography

Abbring, J. H. and Heckman, J. J. (2007). Econometric evaluation of social programs, part III: Distributional treatment effects, dynamic treatment effects, dynamic discrete choice, and general equilibrium policy evaluation. Handbook of Econometrics, 6, 5145–5303.

Andrews, D. W. K. (1992). Generic uniform convergence. Econometric Theory, 8 (2), 241–257.

Angelucci, M. and De Giorgi, G. (2009). Indirect effects of an aid program: How do cash transfers affect ineligibles' consumption? American Economic Review, 99 (1), 486–508.

Angrist, J. D. and Imbens, G. W. (1995). Two-stage least squares estimation of average causal effects in models with variable treatment intensity. Journal of the American Statistical Association, 90 (430), 431–442.

—, — and Rubin, D. B. (1996). Identification of causal effects using instrumental variables. Journal of the American Statistical Association, 91 (434), 444–455.

Baird, S., Bohren, J. A., McIntosh, C. and Özler, B. (2018). Optimal design of experiments in the presence of interference. The Review of Economics and Statistics, (5), 844–860.

Bajari, P., Hong, H., Krainer, J. and Nekipelov, D. (2010). Estimating static models of strategic interactions. Journal of Business & Economic Statistics, 28 (4), 469–482.

Balat, J. and Han, S. (2019). Multiple treatments with strategic interaction. arXiv.

Bloom, H. S. (2005). Randomizing groups to evaluate place-based programs. Learning More from Social Experiments: Evolving Analytic Approaches, pp. 115–172.

Brinch, C. N., Mogstad, M. and Wiswall, M. (2017). Beyond LATE with a discrete instrument. Journal of Political Economy, 125 (4), 985–1039.

Brock, W. and Durlauf, S. (2007). Identification of binary choice models with social interactions. Journal of Econometrics, 140 (1), 52–75.

Brock, W. A. and Durlauf, S. N. (2001). Discrete Choice with Social Interactions.
The Review of Economic Studies, 68 (2), 235–260. 82 Carneiro, P., Heckman, J. J. and Vytlacil, E. J. (2011). Estimating marginal returns to education. American Economic Review, 101 (6), 2754–81. Chen, S. H., Chen, Y.-C. and Liu, J.-T. (2019). The impact of family composition on educational achievement. Journal of Human Resources, 54 (1). Crépon, B.,Devoto, F.,Duflo, E.andParienté, W.(2015).Estimatingtheimpactof microcredit on those who take it up: Evidence from a randomized experiment in morocco. American Economic Journal: Applied Economics, 7 (1), 123–50. —, Duflo, E., Gurgand, M., Rathelot, R. andZamora, P. (2013). Do Labor Market Policies have Displacement Effects? Evidence from a Clustered Randomized Experiment *. The Quarterly Journal of Economics, 128 (2), 531–580. Del Rosso, J. M. and Marek, T. (1996). Class action: Improving school performance in the developing world through better Health, Nutrition and Population. The World Bank. Dippel, C., Robert, G., Stephan, H. and Pinto, R. (2020). Mediation analysis in iv settings with a single instrument. working. —, —, — and Rodrigo, P. (2021). The effect of trade on workers and voters. Economic Journal, accepted. Duflo, E., Glennerster, R. and Kremer, M. (2007). Using randomization in develop- ment economics research: A toolkit. Handbook of development economics, 4, 3895–3962. Dupas, P. (2014). Short-run subsidies and long-run adoption of new health products: Evi- dence from a field experiment. Econometrica, 82 (1), 197–228. Ferracci, M., Jolivet, G. and van den Berg, G. J. (2014). Evidence of treatment spillovers within markets. The Review of Economics and Statistics, 95 (5), 812–823. Flores, C. A. and Flores-Lagunes, A. (2013). Partial identification of local average treatment effects with an invalid instrument. Journal of Business & Economic Statistics, 31 (4), 534–545. Frölich, M. and Huber, M. (2017). Direct and indirect treatment effects–causal chains and mediation analysis with instrumental variables. Journal of the Royal Statistical Soci- ety: Series B (Statistical Methodology), 79 (5), 1645–1666. Gallant, A. and White, H. (1988). A Unified Theory of Estimation and Inference for Nonlinear Dynamic Models. Oxford: Basil Blackwell. Godlonton, S. and Thornton, R. (2012). Peer effects in learning hiv results. Journal of Development Economics, 97 (1), 118 – 129. Hahn, J. and Ridder, G. (2011). Conditional moment restrictions and triangular simulta- neous equations. The Review of Economics and Statistics, 93 (2), 683–689. 83 Heckman, J., Pinto, R. and Savelyev, P. (2013). Understanding the mechanisms through which an influential early childhood program boosted adult outcomes. Ameri- can Economic Review. Heckman, J. J. (1979). Sample selection bias as a specification error. Econometrica,47 (1), 153–161. — (2001). Micro data, heterogeneity, and the evaluation of public policy: Nobel lecture. Journal of Political Economy, 109 (4), 673–748. —, Lochner, L. and Taber, C. (1998a). Explaining rising wage inequality: Explorations with a dynamic general equilibrium model of labor earnings with heterogeneous agents. Review of economic dynamics, 1 (1), 1–58. —, — and— (1998b). General equilibrium treatment effects: A study of tuition policy. Tech. rep., National Bureau of Economic Research. —, — and — (1998c). Tax policy and human capital formation. Tech. rep., National Bureau of Economic Research. —, Urzua, S. and Vytlacil, E. (2006). Understanding instrumental variables in models with essential heterogeneity. 
The Review of Economics and Statistics, 88 (3), 389–432. — andVytlacil, E. (2001). Policy-relevant treatment effects. American Economic Review, 91 (2), 107–111. Hudgens, M. G. andHalloran, M. E. (2008). Toward causal inference with interference. Journal of the American Statistical Association, 103 (482), 832–842, pMID: 19081744. Imai, K., Jiang, Z. and Malani, A. (2020). Causal inference with interference and non- compliance in two-stage randomized experiments. Journal of the American Statistical As- sociation, 0 (0), 1–13. —, Keele, L. and Yamamoto, T. (2010). Identification, inference and sensitivity analysis for causal mediation effects. Statistical Science, 25 (1), 51–71. —, Tingley, D. and Yamamoto, T. (2013). Experimental designs for identifying causal mechanisms. Journal of the Royal Statistical Society: Series A (Statistics in Society), 176 (1), 5–51. Imbens, G. W. (2007). Nonadditive Models with Endogenous Regressors, Cambridge Uni- versity Press, Econometric Society Monographs, vol. 3, pp. 17–46. Advances in economics and econometrics: Theory and applications, ninth world congress edn. — and Angrist, J. D. (1994a). Identification and estimation of local average treatment effects. Econometrica, 62, 467–475. — and — (1994b). Identification and estimation of local average treatment effects. Econo- metrica, 62 (2), 467–475. 84 — andRubin, D. B. (1997). Bayesian inference for causal effects in randomized experiments with noncompliance. The annals of statistics, pp. 305–327. Jackson, M. O., Lin, Z. and Yu, N. N. (2020). Adjusting for peer-influence in propensity scoring when estimating treatment effects. Kline, B. and Tamer, E. (2020). Chapter 7 - econometric analysis of models with social interactionssome of this chapter had been previously distributed as “the empirical content of models with social interactions” and “some interpretation of the linear-in-means model of social interactions” by the same authors. In B. Graham and Á. de Paula (eds.), The Econometric Analysis of Network Data, Academic Press, pp. 149 – 181. Lalive, R., Landais, C. and Zweimüller, J. (2015). Market externalities of large unem- ployment insurance extension programs. American Economic Review, 105 (12), 3564–96. Lazzati, N. (2015). Treatment response with social interactions: Partial identification via monotone comparative statics. Quantitative Economics, 6 (1), 49–83. Lee, L.-F. (1984). Tests for the bivariate normal distribution in econometric models with selectivity. Econometrica, 52 (4), 843–863. Leung, M. P. (2015). Two-step estimation of network-formation models with incomplete information. Journal of Econometrics, 188 (1), 182 – 195. — (2020a). Causal inference under approximate neighborhood interference. arXiv. — (2020b). Treatment and spillover effects under network interference. The Review of Eco- nomics and Statistics, 102 (2), 368–380. Manski, C. F. (2013). Identification of treatment response with social interactions. The Econometrics Journal, 16 (1), S1–S23. Masten, M. A. and Torgovitsky, A. (2016). Identification of instrumental variable correlated random coefficients models. The Review of Economics and Statistics, 98 (5), 1001–1005. Mattei, A. and Mealli, F. (2011). Augmented designs to assess principal strata direct effects. Journal of the Royal Statistical Society: Series B (Statistical Methodology). McFadden, D. (1984).Econometricanalysisofqualitativeresponsemodels.InZ.Griliches† and M. D. Intriligator (eds.), Handbook of Econometrics, vol. 2, 24, 1st edn., Elsevier, pp. 1395–1457. Mealli, F. 
and Pacini, B. (2013). Using secondary outcomes to sharpen inference in ran- domizedexperimentswithnoncompliance. Journal of the American Statistical Association, 108 (503), 1120–1131. Miguel, E.andKremer, M.(2004a).Worms: Identifyingimpactsoneducationandhealth in the presence of treatment externalities. Econometrica, 72 (1), 159–217. 85 — and — (2004b). Worms: identifying impacts on education and health in the presence of treatment externalities. Econometrica, 72 (1), 159–217. Pearl, J. (2001). Direct and indirect effects. In Proceedings of the Seventeenth Confer- ence on Uncertainty in Artificial Intelligence, UAI’01, San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., pp. 411–420. Ridder, G. and Sheng, S. (2020). Estimation of large network formation games. arXiv. Rubin, D. B. (1990). Comments on “on the application of probability theory to agricultural experiments. essay on principles. section 9” by j. splawa-neyman translated from the polish and edited by d. m. dabrowska and t. p. speed. Statistical Science, 5, 472–480. — andZell, E. R. (2010). Dealing with noncompliance and missing outcomes in a random- ized trial using bayesian technology: Prevention of perinatal sepsis clinical trial, soweto, south africa. Statistical Methodology, 7 (3), 338–350. VanderWeele, T. J. (2016). Mediation analysis: A practitioner’s guide. Annual Review of Public Health, 37 (1), 17–32. Vazquez-Bare, G. (2020). Causal spillover effects using instrumental variables. arXiv. Wooldridge, J. M. (2003). Further results on instrumental variables estimation of average treatment effects in the correlated random coefficient model. Economics Letters, 79 (2), 185 – 191. Xu, H. (2018). Social interactions in large networks: A game theoretic approach. Interna- tional Economic Review, 59 (1), 257–284. 86 Appendix A Appendix to Chapter 1 A.1 Proof of Theorem 3 Following Xu (2018), we show this by contradiction. Define i = P j2N i j =jN i j. Let (X i ;Z i ; i ;) = (X 0 i 1 + 2 Z i + 3 i ) be i’s best-response function to inputs (X i ;Z i ; i ), and parameter value . Suppose there are two non-identically equilibria = ( i ) i2Nn and + = ( + i ) i2Nn . By definition, they should satisfy i = (X i ;Z i ; i ;); 8i2N n and + i = (X i ;Z i ; + i ;); 8i2N n : Taking difference and applying mean-value theorem, we have i + i = (X i ;Z i ; i ;) (X i ;Z i ; + i ;) = @(X i ;Z i ; m i ;) @ i ( i + i ) 87 where m i is a mean value between i and + i . Taking an absolute value to the LHS, j i + i j @(X i ;Z i ; m i ;) @ i j i + i j (A.1) @(X i ;Z i ; m i ;) @ i max j2N i j j + j j: (A.2) From the definition of (), observe that @(X i ;Z i ; i ;) @ i = (X 0 i 1 + 2 Z i + 3 i ) @ i = X 0 i 1 + 2 Z i + 3 i 3 : Thus, @(X i ;Z i ; m i ;) @ i j 3 j sup u (u): (A.3) Therefore we can write A.2 as j i + i j max j2N i j j + j j: Taking max i2Nn to both sides gives, max i2Nn j i + i j max i2Nn max j2N i j j + j j max k2Nn j k + k j which leads to contradiction when < 1. A.2 Proofs of Asymptotic Results A.2.1 Proof of Consistency of First-Stage Estimators Let l i ()D i ln i (S;) + (1D i ) ln(1 i (S;)) be an individual log-likelihood function of i. Then b L n () = 1 n P n i=1 l i (). Define L n () =E[ b L n ()jS] 88 where the population objective function, L n (), depends on n through the public state S = (G;X;Z). Recall that the true parameter is denoted by 0 . Following Gallant and White (1988) Theorem 3.3, we establish consistency result by showing identifiable uniqueness and uniform con- vergence result. 
Identifiable Uniqueness We show that lim inf n!1 (L n ( 0 )L n ())> 0 for any such that j 0 j> 0. lim inf n!1 (L n ()L n ( 0 )) = lim inf n!1 1 n n X i=1 E h D i ln i (S;) i (S; 0 ) + (1D i ) ln 1 i (S;) 1 i (S; 0 ) S i = lim inf n!1 1 n n X i=1 h i (S; 0 ) ln i (S;) i (S; 0 ) + (1 i (S; 0 )) ln 1 i (S;) 1 i (S; 0 ) S i lim inf n!1 1 n n X i=1 ln i (S;) + 1 i (S;) = 0: The second equality follows from E[D i jS] = i (S; 0 ) and the last weak inequality is due to Jensen’s inequality. To show that the inequality holds strictly, we need to rule out the case of lim inf n!1 (L n ( 0 )L n ()) = 0. This happens when for some large enoughn, i (S;) = i (S; 0 ) for all i2 N n =f1; 2; ;ng, i.e., there exists n that delivers observationally equivalent choice probabilities. Suppose this is the case. By the fixed point requirement, the following needs to be satisfied for any arbitrary , including the true parameter 0 : 1 ( i (S;)) =X 0 i 1 + 2 Z i + 3 1 jN i j X j2N i j (S;); 8i2N n and 1 ( i (S; 0 )) =X 0 i 0 1 + 0 2 Z i + 0 3 1 jN i j X j2N i j (S; 0 ); 8i2N n : 89 If i (S;) = i (S; 0 );8i2N n , we have, X 0 i ( 1 0 1 ) +Z i ( 2 0 2 ) + ( 3 0 3 ) 1 jN i j X j2N i j (S; 0 ) = 0; 8i2N n : Equivalently, R 0 i ( 0 ) = 0; 8i 2 N n where R i is defined as in Theorem 7. It follows that ( 0 ) 0 P n i=1 R i R 0 i ( 0 ) = 0. Given the assumption that P n i=1 R i R 0 i is positive definite for all large enough n, above equation holds only under = 0 leading to contradiction. Next, we verify that sup 2 j b L n ()L n ()j p ! 0: We first shows the pointwise convergence holds. Uniform convergence follows then from Lipschitz conditions. Pointwise Convergence We first show that for any 2 ,j b L n ()L n ()j p ! 0: It can be shown that b L n ()L n () = 1 n n X i=1 n (D i i (S; 0 )) ln i (S;) 1 i (S;) | {z } i o : f i g n i=1 is conditionally independent with mean zero given S. It is also uniformly bounded due to Lemma 3. Therefore we can apply a LLN for independent observations (e.g., Markov) and the result follows. Uniform Convergence Given pointwise convergence result, uniform convergence follows if we can establish thatf b L n ()L n ()g n is stochastically equicontinuous on (theorem 1 in Andrews (1992)). Sufficient condition for this is to show that the summand in the sample objective function fl i ()g is Lipschitz (Assumption W-LIP in Andrews (1992)). Note that r l i () =D i r i (S;) i (S;) + (1D i ) r i (S;) 1 i (S;) which is bounded by jr l i ()j r i (S;) i (S;) + r i (S;) 1 i (S;) : 90 By Lemma 3 and Lemma 4, i (S;) andr i (S;) are uniformly bounded. Thereforefl i ()g is Lipschitz-continuous and the result follows. A.2.2 Proof of Asymptotic Normality of First-Stage Estimators ^ should satisfy the first-order condition for maximization: r b L n ( ^ ) = 0. Given that b L n ( ^ ) is smooth, wecanapplythemean-valuetheoremtothefirst-orderconditionaroundthetrueparameter 0 : r b L n ( ^ ) =r b L n ( 0 ) +r b L n ( )( ^ 0 ) = 0 (A.4) () p n( ^ 0 ) =(r b L n ( )) 1 p nr b L n ( ^ 0 ) (A.5) where is a mean value of the line joining ^ and 0 . Define the Hessian matrix as H n () =E h 1 n n X i=1 r l i () S i and the information matrix as I n () =E h 1 n n X i=1 r l i ()r l i () 0 S i : We first show thatr b L n ( )H n ( 0 ) p ! 0 (ULLN of the Hessian matrix) and then CLT on the score: p nI 1 n ( 0 )r b L n ( ^ 0 ) d !N(0;I dim() ): ULLN of the Hessian Matrix We show thatr b L n ( )H n ( 0 ) p ! 0. 
Note that r b L n ( )H n ( 0 ) = 1 n n X i=1 r l i ( ) 1 n n X i=1 r l i ( 0 ) | {z } A + 1 n n X i=1 r l i ( 0 )E h 1 n n X i=1 r l i ( 0 ) S i | {z } B 91 First, A = o p (1) since ^ 0 p ! 0 andr l i () is continuous as a result of Lemma 5. Next, note that B = 1 n n X i=1 n r l i ( 0 )E r l i ( 0 ) S o | {z } i f i g is independent conditional on S with mean zero. Also by Lemma 5, it is uniformly bounded. Therefore by LLN for independent observations, B =o p (1). CLT on the Score Note that p nr b L n ( 0 ) = p n 1 n P n i=1 r l i ( 0 ) and thatfr l i ( 0 )g is in- dependently distributed conditional on S with the uniformly bounded conditional variance I n ( 0 ). Therefore we can apply Lyapunov’s CLT for independent observations to get p nI 1=2 n ( 0 )r b L n ( 0 ) d !N(0;I): Combining all these results, we see that the equation A.5 can be written as p n( ^ 0 ) =(H n ( 0 ) +o p (1)) 1 I n ( 0 ) 1=2 p nI n ( 0 ) 1=2 r c L n ( 0 ): By the information matrix inequality, when the model is correctly specified, H n ( 0 ) =I n ( 0 ) so that we have p n( ^ 0 ) = (I n ( 0 ) +o p (1)) 1 I n ( 0 ) 1=2 p nI n ( 0 ) 1=2 r c L n ( 0 ): Under the assumption thatI n ( 0 ) is nonsingular, we get the desired result: p n(I 1 n ( 0 )) 1=2 ( ^ 0 ) d !N(0;I dim() ): 92 A.2.3 Proof of Consistency of Second-Stage Estimators Our estimators are based on the following moment conditions E[Y i jD i = 1;S] =W 0 i 0 1 ; E[Y i jD i = 0;S] =W 0 i 0 0 Let us focus on ^ 1 case as ^ 0 case can be analyzed in an analogous way. Given the moment condition E[Y i jD i = 1;S] =W 0 i 0 1 , we write the equation in error form as Y i =W 0 i 0 1 + 1i ; E[ 1i jD i = 1;S] = 0: Estimator for 1 is defined as ^ 1 = arg min 1 1 n n X i=1 D i Y i ^ W 0 i 1 2 (A.6) = arg min 1 1 n n X i=1 D i Y i D i ^ W 0 i 1 2 (A.7) = n n X i=1 D i ^ W i ^ W 0 i o 1 n X i=1 D i ^ W i Y i (A.8) Note that D i Y i =D i Y i (1; i (S; 0 )) =D i (W 0 i 0 1 + 1i ) =D i ^ W 0 i 0 1 + 1i ( ^ W i W i ) 0 0 1 . Plugging this into A.8 gives that ^ 1 = n X i=1 D i ^ W i ^ W 0 i 1 n X i=1 D i ^ W i ^ W 0 i 0 1 + 1i ( ^ W i W i ) 0 0 1 = 0 1 + n X i=1 D i ^ W i ^ W 0 i 1 X i D i ^ W i 1i ( ^ W i W i ) 0 0 1 so that ^ 1 0 1 = 1 n n X i=1 D i ^ W i ^ W 0 i | {z } A 1 1 n X i D i ^ W i 1i ( ^ W i W i ) 0 0 1 | {z } B =A 1 B: (A.9) 93 Part A We show that 1 n P i=1 D i ^ W i ^ W 0 i E[ 1 n P n i=1 D i W i W 0 i jS] =o p (1). Decompose 1 n X i=1 D i ^ W i ^ W 0 i E[ 1 n n X i=1 D i W i W 0 i jS] into two parts as follows: 1 n X i=1 D i ^ W i ^ W 0 i 1 n n X i=1 D i W i W 0 i | {z } (a) + 1 n n X i=1 D i W i W 0 i 1 n n X i=1 E[D i W i W 0 i jS] | {z } (b) : (a) = o p (1) since ^ 0 p ! 0 and W i () is continuous in . For (b), note that the summand fD i W i W 0 i E[D i W i W 0 i jS]g is conditionally independent givenS with mean zero. It is also uniformly bounded. Therefore by LLN, (b) =o p (1). Finally, invertibility ofE[ 1 n P n i=1 D i W i W 0 i jS] follows from the identification condition. Part B Since ^ W i W i =o p (1), we can write itB as 1 n n X i=1 D i (W i +o p (1))( 1i o p (1)) = 1 n n X i=1 D i W i 1i Similar argument as above shows that 1 n n X i=1 D i W i 1i E[D i W i 1i jS] =o p (1): It follows from the moment condition E[ 1i jD i = 1;S] = 0 that E[D i W i 1i jS] = 0. Therefore we conclude that B = 1 n n X i=1 D i W i 1i +o p (1) =o p (1): Combining with the result on part A, we conclude that ^ 1 1 =o p (1). 
94 A.2.4 Proof of Asymptotic Normality of Second-Stage Estimators From A.9, p n(^ 1 0 1 ) = 1 n n X i=1 D i ^ W i ^ W 0 i 1 1 p n X i D i ^ W i 1i 00 1 ( ^ W i W i ) (A.10) = E[ 1 n n X i=1 D i W i W 0 i jS] +o p (1) 1 1 p n X i D i ^ W i 1i 00 1 ( ^ W i W i ) | {z } C (A.11) where the last step has been established in the previous section. Consider the term ^ W i W i inC. By mean-value theorem, ^ W i W i =W i (^ 1 )W i ( 0 1 ) =r 1 W i ( 1 )(^ 1 0 1 ) =) p n( ^ W i W i ) =r 1 W i ( 1 ) p n(^ 1 0 1 ) where 1 is a mean value of the line joining ^ 1 and 0 1 . By the asymptotic normality of the first- step estimator ^ as in the equation 2.30, we can show that p n( ^ 0 ) is asymptotically linear. Specifically, define the influence function as i =E[ 1 n P n i=1 r l i ( 0 )r l i ( 0 ) 0 jS]r l i ( 0 ), then p n( ^ 0 ) = 1 p n n X i=1 i +o p (1): Therefore the termC in p n(^ 1 0 1 ) can be written as 1 p n X i D i ^ W i ( 1i 00 1 ( ^ W i W i )) = 1 p n n X i=1 D i ^ W i 1i 1 n n X i=1 D i ^ W i 00 1 p n( ^ W i W i ) = 1 p n n X i=1 D i ^ W i 1i | {z } C(a) n 1 n n X i=1 D i ^ W i 00 1 r 1 W i ( 1 ) o | {z } C(b) 1 p n n X i=1 i +o p (1) We first show that C(a) can be replaced by 1 p n P n i=1 D i W i 1i and that C(b) can be replaced byE[ 1 n P n i=1 D i W i 00 1 r 1 W i ( 0 1 )]. 95 Part C(a) We show that 1 p n n X i=1 D i ^ W i 1i D i W i 1i p ! 0 Note htat 1 p n n X i=1 D i ( ^ W i W i ) 1i = 1 p n n X i=1 D i r 1 W i ( 1 )(^ 1 0 1 1i (A.12) = 1 n n X i=1 D i r 1 W i ( 1 ) p n(^ 1 0 1 ) 1i (A.13) = 1 n n X i=1 D i r 1 W i ( 1 ) 1 p n n X i=1 i 1i (A.14) = 1 n n X i=1 D i r 1 W i ( 1 ) 1i 1 p n n X i=1 i (A.15) It can be shown easily that 1 n P n i=1 D i r 1 W i ( 0 1 ) 1i E[D i r 1 W i ( 1 ) 1i jS] p ! 0 where E[D i r 1 W i ( 0 1 ) 1i jS] = 0 from the moment condition. Therefore equation A.15 becomeso p (1)O p (1) and the result follows. Part C(b) We show that 1 n n X i=1 D i ^ W i 00 1 r 1 W i ( 1 )E[ 1 n n X i=1 D i W i 00 1 r 1 W i ( 0 1 )jS] =o p (1): Decompose the LHS as 1 n n X i=1 D i ^ W i 00 1 r 1 W i ( 1 ) 1 n n X i=1 D i W i 00 1 r 1 W i ( 0 1 ) | {z } A + 1 n n X i=1 D i W i 00 1 r 1 W i ( 0 1 )E[ 1 n n X i=1 D i W i 00 1 r 1 W i ( 0 1 )jS] | {z } B : 96 A =o p (1) since ^ 0 p ! 0. Also, sincefD i W i 00 1 r 1 W i ( 0 1 )g are conditionally independent given S and uniformly bounded, we can apply Markov LLN to show that B =o p (1). Combining all the results, termC can be written as C = 1 p n n X i=1 D i W i 1i E h 1 n n X i=1 D i W i 00 1 r 1 W i ( 0 1 ) S i 1 p n n X i=1 i +o p (1) = 1 p n n X i=1 n D i W i 1i E h 1 n n X i=1 D i W i 00 1 r 1 W i ( 0 1 ) S i i o | {z } i : Since i jS has a mean zero and is independently distributed, we can apply CLT for the independent observation and get 1=2 n 1 p n P n i=1 i d !N(0;I dim( 1 ) ) where n = 1 n P n i=1 E[ i 0 i jS] which can be simplified as 1 n n X i=1 E[D i W i W 0 i 2 1i ] +E h 1 n n X i=1 D i W i 00 1 r 1 W i ( 0 1 ) S i 1 n n X i=1 E[ i 0 i jS]E h 1 n n X i=1 D i W i 00 1 r 1 W i ( 0 1 ) S i 0 as the cross-terms get crossed out due toE[ 1i 0 i jS] = 0, i.e., the first- and second-stage moments are uncorrelated. Finally, from A.11, and by defining n =E[ 1 n P n i=1 W i W 0 i jS], we have 1=2 n p n(^ 1 0 1 ) d !N(0;I dim( 1 ) ) for n = 1 n n 1 n as desired. A.3 Auxiliary Lemmas Lemma3 (uniformboundednessof i (S;)). There exists aconstantC2 (0; 1) such that i (S;) C for any i;S; and n. (Proof) As in A.1, let us define agent’s best-response function as (X i ;Z i ; i ;) = (X 0 i 1 + 2 Z i + 3 i ). 
Recall that i (S;) = (X 0 i 1 + 2 Z i + 3 i (S;)).The result follows since X i is bounded, Z i is binary, and i (S;) 1, 97 Lemma 4 (uniform boundedness ofr i ). Suppose < 1. There exists a finite constant C 1 such that sup i;n;S;;k @ i (S;) @ k <C 1 <1: (Proof) Recall that i (S;) = (X i ;Z i ; i (S;);): Differentiating above equation with respect to k gives @ i (S;) @ k = @(X i ;Z i ; i ;) @ k + @(X i ;Z i ; i ;) @ i @ i (S;) @ k Equivalently, @ i (S;) @ k = @(X i ;Z i ; i ;) @ k + 1 jN i j X j2N i @(X i ;Z i ; i ;) @ i @ j (S;) @ k (A.16) whichgivestheimplicitfunctionof [@ i (S;)=@ k ] i2Nn . LetuswriteA.16inmatrixformbydefining the following: • Let n be n 1 vector with ith component @ i (S;)=@ k . • Let D n be nn matrix with ijth element 1 jN i j @(X i ;Z i ; i ;) @ i if G ij = 1 and zero if G ij = 0. • Let n be n 1 vector with ith component @(X i ;Z i ; i ;) @ k . Then we can write the system A.16 as n =D n n + n or equivalently, (I n D n ) n = n 98 which is invertible ifjjD n jj 1 < 1 where the induced matrix normjjD n jj 1 is the maximum of the absolute values of row sums, i.e., jjD n jj 1 = max i2Nn @(X i ;Z i ; i ;) @ i : A.3 implies thatjjD n jj 1 , thusjjD n jj 1 < 1. Therefore D n is invertible and (I n D n ) 1 = P 1 t=0 D t n . It follows that n = ( P 1 t=0 D t n ) n . Taking sup norm gives jj n jj 1 1 X t=0 jjD t n jj 1 jj n jj 1 = jj n jj 1 1 < C 1 since RHS does not depend on (i;n;z n ;;k), we have the desired result. Lemma 5 (uniform boundedness ofr 2 i ). Suppose < 1. There exists a finite constant C 2 such that j @ 2 i (S;) @ m @ k j<C 2 <1 for any i;n;S;;k;m a.s. (Proof) Fix m. Differentiating the equation A.16 w.r.t. m gives @ 2 i @ m @ k = @ 2 @ m @ k + @ 2 @ i @ k @ i @ m + @ @ i @ 2 i @ m @ k + @ i @ k n @ 2 @ 2 i @ i @ m + @ 2 @ m @ i o : Let us write it compactly as follows: @ 2 mk i = mk + k @ m i + @ 2 mk i + @ k i @ m i + m @ k i : (A.17) Write A.17 in a matrix form by defining • Let ~ n be n 1 vector with ith component @ 2 mk i . • Let ~ n be n 1 vector with ith component mk + k @ m i + @ k i @ m i + m @ k i : 99 Then A.17 can be written as (I n D n )~ n = ~ n : As we have shown before, D n is invertible. For any i2N n ,j i j B ; + 2B C @ +B ; C 2 @ , so thatjj n jj 1 = max i j i j is uniformly bounded. Therefore, jj~ x n jj 1 C 1 and the result follows. 100 Appendix B Appendix to Chapter 2 B.1 Wald Estimator Without Exclusion Restriction In this section, we show that the Wald estimator does not identify the ITT (co) or ATT when the exclusion restriction is not satisfied. First note that the observed outcome is Y ig = Y ig (Z g ;D ig (Z g )) = Z g D ig Y ig (1; 1) +Z g (1D ig )Y ig (1; 0) + (1Z g )Y ig (0; 0) = Z g (Y ig (1; 0)Y ig (0; 0)) +Z g D ig (Y ig (1; 1)Y ig (1; 0)) +Y ig (0; 0) so that E[Y ig jZ g = 1] = E[Y ig (1; 0)Y ig (0; 0)] +E[D ig (Y ig (1; 1)Y ig (1; 0))] +E[Y ig (0; 0)] = E[Y ig (1; 0)] +E[Y ig (1; 1)Y ig (1; 0)jD ig (1) = 1]Pr(D ig (1) = 1) E[Y ig jZ g = 0] = E[Y ig (0; 0)] 101 and, E[Y ig jZ g = 1]E[Y ig jZ g = 0] =E[Y ig (1; 0)Y ig (0; 0)]+E[Y ig (1; 1)Y ig (1; 0)jD ig (1) = 1]Pr(D ig (1) = 1) while E[D ig jz g = 1]E[D ig jz g = 0] =Pr(D ig (1) = 1) so that E[Y ig jZ g = 1]E[Y ig jZ g = 0] E[D ig jZ g = 1]E[D ig jZ g = 0] = E[Y ig (1; 0)Y ig (0; 0)] Pr(D ig (1) = 1) +E[Y ig (1; 1)Y ig (1; 0)jD ig (1) = 1]: Therefore, the Wald ratio on the LHS does not identify E[Y ig (1; 1)Y ig (1; 0)jD ig (1) = 1], average effect for the treated. 
102 B.2 Comparison to Usual DID Estimator Traditional Difference-in-Difference estimand is defined as below: ATET = E[Y t+1 ig Y t ig jD ig = 1]E[Y t+1 ig Y t ig jD ig = 0] = E[Y t+1 ig Y t ig jD ig = 1;Z g = 1] n E[Y t+1 ig Y t ig jD ig = 0;Z g = 0]Pr(Z g = 0jD ig = 0) | {z } p +E[Y t+1 ig Y t ig jD ig = 0;Z g = 1]Pr(Z g = 1jD ig = 0) | {z } 1p o = E[Y t+1 ig (1; 1)Y t ig (0; 0)jD ig = 1;z g = 1] n E[Y t+1 ig (0; 0)Y t ig (0; 0)jD ig = 0;z g = 0]p +E[Y t+1 ig (1; 0)Y t ig (0; 0)jD ig = 0;Z g = 1](1p) = E[Y t+1 ig (1; 1)Y t ig (0; 0)jD ig = 1] n E[Y t+1 ig (0; 0)Y t ig (0; 0)jD ig = 0]p +E[Y t+1 ig (1; 0)Y t ig (0; 0)jD ig = 0](1p) = E[Y t+1 ig (1; 1)Y t ig (0; 0)jD ig = 1] +pE[Y t+1 ig (1; 0)Y t+1 ig (0; 0)jD ig = 0]E[Y t+1 ig (1; 0)Y t ig (0; 0)jD ig = 0] = E[Y t+1 ig (1; 1)Y t+1 ig (0; 0)jD ig = 1] +pE[Y t+1 ig (1; 0)Y t+1 ig (0; 0)jD ig = 0]E[Y t+1 ig (1; 0)Y t+1 ig (0; 0)jD ig = 0] = E[Y t+1 ig (1; 1)Y t+1 ig (0; 0)jD ig = 1] (1p)E[Y t+1 ig (1; 0)Y t+1 ig (0; 0)jD ig = 0] the first equality is derived from fD = 1g = fD = 1;Z = 1g, the second equality uses the independence of the randomization assignment, the fifth equality from the equal-trend assumption: E[Y t+1 (0; 0)Y t (0; 0)jD = 1] =E[Y t+1 (0; 0)Y t (0; 0)jD = 0]. Therefore, DID estimator identifies ITT (co) (1p)ITT (nt), and not their effects separately. Note that the last term disappears only 103 when either p = 1 or exclusion restriction holds. Here, Pr(Z g = 0jD ig = 0) = Pr(D ig = 0jZ g = 0)Pr(Z g = 0) Pr(D ig = 0) = Pr(D ig = 0jZ g = 0)Pr(Z g = 0) Pr(D ig = 0jZ g = 0)Pr(Z g = 0) +Pr(D ig = 0jZ g = 1)Pr(Z g = 1) = Pr(D ig = 0jZ g = 0)Pr(Z g = 0) Pr(D ig = 0jZ g = 0)Pr(Z g = 0) +Pr(D ig = 0jZ g = 1)Pr(Z g = 1) = Pr(Z g = 0) Pr(Z g = 0) + nt Pr(Z g = 1) so that p = 1 happens only when nt Pr(Z g = 1) = 1 when nt Pr(Z g = 1) = 1, p=0 because Pr(z g = 0) = 0, p=1 only when Pr(z g = 0) = 1. In conclusion, if the researcher ignores the spillover effect when there is such effect, the above equation shows that DID estimator does not identify ATET. 1 . 1 Regarding this point, see (Abbring and Heckman, 2007), p.5277: "The analysis of (Heckman et al., 1998a), (Heckman et al., 1998b), (Heckman et al., 1998c) has important implications for the widely-used difference-in-differences estimator. If the tuition subsidy changes the aggregate skill prices, the decisions of nonparticipants will be affected. The "no treatment" benchmark group is affected by the policy and the difference-in-differences estimator does not identify the effect of the policy for anyone compared to a no treatment state." 
104 B.3 Additional Tables Table C1: Summary Statistics (Replica of Table 1 in Crépon et al., 2015) Control Group Treatment-Control Obs Obs Mean SD coefficient p-value Panel A: Baseline household sample Number members 4465 2266 5.137 2.695 0.043 0.583 Number adults (> 16) 4465 2266 3.45 1.993 0.031 0.564 Number children (< 16) 4465 2266 1.677 1.645 0.007 0.859 Male head 4465 2266 0.935 0.246 0.001 0.813 Head age 4465 2266 47.75 15.955 1.077 0.012 Head with no education 4465 2266 0.615 0.487 -0.013 0.353 Access to credit: Loan from Al Amana 4465 2266 0.007 0.084 -0.003 0.425 Loan from other institution 4465 2266 0.06 0.238 0.03 0.023 Informal Loan 4465 2266 0.068 0.251 0.023 0.006 Electricity or water connection Loan 4465 2266 0.156 0.363 0.013 0.523 Amount borrowed from (in MAD): Al Amana 4465 2266 34.424 459.78 -12.72 0.534 Other institution 4465 2266 354.656 2339.502 91.782 0.188 Informal Loan 4465 2266 247.645 2248.117 -8.25 0.88 Electricity or water connection Loan 4465 2266 528.032 1369.743 22.056 0.758 Self-employment activities Number of activity 4465 2266 1.569 1.227 0.031 0.435 Farms 4465 2266 0.599 0.49 0.017 0.321 Investment 4465 2266 13.002 72.287 -0.436 0.775 Sales 4465 2266 9334.765 36980.76 -392.053 0.665 Expense 4465 2266 3368.652 8428.288 265.911 0.241 Savings 4465 2266 1270.938 3504.987 -76.682 0.433 Employment 4465 2266 22.006 95.363 -1.346 0.477 Self-employment 4465 2266 60.598 101.913 5.17 0.122 Non-farm Business 4465 2266 0.217 0.412 -0.034 0.011 Number activities managed by women 4465 2266 0.218 0.585 0.004 0.75 Share of HH activities by women 4465 2266 0.16 0.367 0.007 0.466 Distance to Souk 4125 2077 20.064 25.157 0.162 0.87 Has income from: Self-employment activity 4465 2266 0.78 0.414 -0.016 0.163 Day labor 4465 2266 0.58 0.494 -0.016 0.194 Risks: Any lost to argiculture & livestock 4125 2077 0.131 0.338 0.002 0.868 Lost more than 50 percent of the harvest 4125 2077 0.106 0.308 0.004 0.642 Lost more than 50 percent of the livestock 4125 2077 0.03 0.172 0.003 0.606 Lost any livestock 4465 2266 0.189 0.392 0.029 0.012 Illness death and/or house sinister 4465 2266 0.218 0.413 0.013 0.168 Consumption: Consumption 4465 2266 2271.611 1349.195 28.302 0.44 Non-durable consumption 4465 2266 2226.511 1295.439 20.161 0.559 Durable consumption 4465 2266 45.099 235.935 8.141 0.231 HH is poor 4465 2266 0.247 0.431 0.002 0.858 Panel B: Attrition Attrition 4465 2266 0.068 0.252 0.018 0.018 Note: Unit of Obs: household. Panel A and B: Sample includes all households surveyed at baseline. 
105 Table C2: Summary Statistics of the Non-attrited Sample Control Group Treatment-Control Obs Obs Mean SD coefficient p-value Number members 4105 2101 5.168 2.68 0.061 0.453 Number adults (> 16) 4105 2101 3.467 1.99 0.041 0.472 Household age 4105 2101 47.892 15.912 1.115 0.011 Animal Husbandry 4105 2101 0.537 0.499 0.041 0.031 Run a non-farm business 4105 2101 0.218 0.413 -0.036 0.011 Has an outstanding loan over the past 12 months 4105 2101 0.261 0.439 0.057 0.006 HH spouse responded to the survey 4105 2101 0.061 0.24 0.02 0.007 other HH member responded to the survey 4105 2101 0.04 0.195 0.006 0.243 Missing value in HH spouse responded to the survey 4105 2101 0.155 0.362 0.001 0.91 Missing value in other HH member responded to the survey 4105 2101 0.155 0.362 0.001 0.91 106 Table C3: Summary Statistics of the Non-attrited Sample with High Probability to Borrow Control Group Treatment-Control Obs Obs Mean SD coefficient p-value Number members 3525 1793 5.202 2.675 0.11 0.207 Number adults (> 16) 3525 1793 3.473 1.998 0.078 0.203 Household age 3525 1793 47.423 15.884 1.75 0 Animal Husbandry 3525 1793 0.52 0.5 0.053 0.007 Run a non-farm business 3525 1793 0.228 0.42 -0.039 0.012 Has an outstanding loan over the past 12 months 3525 1793 0.255 0.436 0.067 0.002 HH spouse responded to the survey 3525 1793 0.063 0.243 0.021 0.009 other HH member responded to the survey 3525 1793 0.041 0.198 0.006 0.205 Missing value in HH spouse responded to the survey 3525 1793 0.149 0.356 0.004 0.679 Missing value in other HH member responded to the survey 3525 1793 0.149 0.356 0.004 0.679 107 Table C4: ITT(co) & ITT(nt) estimates with baseline covariates but no fixed effect Panel A: Whole Non-attrited Sample Asset Sales+home Has a self- Income from day Weekly hours worked (stock) consumption Expense Profit employment activity labor/ salaried Self-employment Outside ITT, complier 6197.366 33232.931 21591.955 10768.910 0.017 -1813.735 7.914 -1.573 (2653.878) (10424.903) (9612.722) (5546.228) (0.044) (1651.587) (6.548) (4.069) ITT, never taker 560.941 3865.499 2574.157 1395.822 0.008 -544.950 1.274 -0.979 (1057.104) (3249.111) (2119.834) (2332.300) (0.032) (1017.368) (4.150) (2.077) Covariates X X X X X X X X Observations 4049 4049 4049 4049 4049 4049 4037 4037 control mean 1798.850 4721.573 4266.519 1152.695 0.030 3692.911 21.208 0.313 Panel B: Non-attrited Sample with High Probability to Borrow ITT, complier 6217.462 35708.779 22073.686 13009.093 0.017 -2208.563 6.425 -2.299 (2790.298) (11481.811) (10265.416) (6524.136) (0.044) (1690.263) (6.626) (4.383) ITT, never taker 939.905 5775.956 4363.076 1835.398 0.008 -736.753 0.039 -1.270 (1099.590) (3564.183) (2405.483) (2419.284) (0.034) (1050.461) (4.103) (2.095) Covariates X X X X X X X X Observations 3479 3479 3479 3479 3479 3479 3467 3467 control mean 1811.076 4421.010 4150.940 772.065 0.033 4057.138 22.350 0.381 p< 0:10,p< 0:05,p< 0:01. Covariates are changes of number of adults in the households and number of children in the households. 
108 Table C5: ITT(co) & ITT(nt) estimates with pair fixed effect but no baseline covariates Panel A: Whole Non-attrited Sample Asset Sales+home Has a self- Income from day Weekly hours worked (stock) consumption Expense Profit employment activity labor/ salaried Self-employment Outside ITT, complier 4853.085 28454.159 18445.625 9659.636 -0.009 -1161.752 6.027 -2.026 (2053.512) (8313.713) (7207.594) (4863.209) (0.031) (1506.700) (3.012) (3.211) ITT, never taker 643.233 4805.897 3273.928 1629.004 0.013 -539.652 2.458 -0.724 (691.388) (2005.170) (1571.865) (1493.556) (0.014) (622.442) (1.436) (1.226) Pair fixed effect X X X X X X X X Observations 4105 4105 4105 4105 4105 4105 4093 4093 control mean 1879.392 4958.848 4451.599 1190.183 0.034 3756.474 21.127 0.607 Panel B: Non-attrited Sample with High Probability to Borrow ITT, complier 4881.309 30061.156 18734.238 11133.811 -0.012 -1573.765 4.886 -2.683 (2240.657) (8999.712) (7437.518) (5782.038) (0.033) (1586.692) (3.121) (3.447) ITT, never taker 989.909 6810.969 5268.462 1958.207 0.013 -667.858 1.731 -0.826 (729.018) (2333.616) (1824.041) (1624.414) (0.015) (697.904) (1.434) (1.271) Pair fixed effect X X X X X X X X Observations 3525 3525 3525 3525 3525 3525 3513 3513 control mean 1924.312 4704.979 4247.166 951.485 0.038 3982.046 22.325 0.526 p< 0:10,p< 0:05,p< 0:01. 109 Table 1: ITT(co) & ITT(nt) estimates with Baseline Covariates & Pair Fixed Effect Panel A: Whole Non-attrited Sample Asset Sales+home Has a self- Income from day Weekly hours worked (stock) consumption Expense Profit employment activity labor/ salaried Self-employment Outside ITT, complier 4846.328 25715.447 16886.286 8138.491 -0.007 -1386.387 4.993 -2.101 (2047.501) (8155.426) (6948.572) (4918.364) (0.027) (1505.514) (2.992) (3.168) ITT, never taker 751.717 4819.461 3294.213 1575.185 0.015 -518.162 2.468 -0.673 (703.962) (2028.907) (1571.594) (1504.569) (0.012) (634.364) (1.395) (1.245) Baseline Covariates X X X X X X X X Pair fixed effect X X X X X X X X Observations 4049 4049 4049 4049 4049 4049 4037 4037 control mean 1798.850 4721.573 4266.519 1152.695 0.030 3692.911 21.208 0.313 Panel B: Non-attrited Sample with High Probability to Borrow ITT, complier 4807.066 26789.667 16614.551 9728.067 -0.009 -1853.078 3.567 -2.758 (2227.440) (8846.859) (7125.683) (5841.032) (0.028) (1592.119) (3.040) (3.376) ITT, never taker 1189.898 7038.431 5324.645 2072.388 0.018 -699.157 1.709 -0.772 (753.183) (2387.208) (1847.035) (1620.218) (0.013) (710.586) (1.384) (1.260) Baseline Covariates X X X X X X X X Pair fixed effect X X X X X X X X Observations 3479 3479 3479 3479 3479 3479 3467 3467 control mean 1811.076 4421.010 4150.940 772.065 0.033 4057.138 22.350 0.381 p< 0:10,p< 0:05,p< 0:01. Covariates are changes of number of adults in the households and number of children in the households. 
Appendix C

Appendix to Chapter 3

C.1 Proof of Lemma 2

The IV estimator is derived under the following conditions:

$E[DZu] = 0 \quad \text{and} \quad E[Du] = 0.$   (C.1)

Since $u = Y - D(\gamma_0 + \gamma_1 M) - (1 - D)(\beta_0 + \beta_1 M)$, the above equations can equivalently be written as

$E[DZY] = E[DZ(\gamma_0 + \gamma_1 M)] \quad \text{and} \quad E[DY] = E[D(\gamma_0 + \gamma_1 M)].$   (C.2)

Equivalently,

$E[Y \mid D = 1, Z = 1] = \gamma_0 + \gamma_1 E[M \mid D = 1, Z = 1],$   (C.3)
$E[Y \mid D = 1] = \gamma_0 + \gamma_1 E[M \mid D = 1],$   (C.4)

which gives two equations in two unknowns, with

$\gamma_1 = \dfrac{E[Y \mid D = 1, Z = 1] - E[Y \mid D = 1]}{E[M \mid D = 1, Z = 1] - E[M \mid D = 1]}, \qquad \gamma_0 = E[Y \mid D = 1] - \gamma_1 E[M \mid D = 1].$

Using the fact that $E[W \mid D = 1] = E[W \mid D = 1, Z = 1]\Pr(Z = 1) + E[W \mid D = 1, Z = 0]\Pr(Z = 0)$ for any random variable W, $\gamma_1$ can be rewritten as

$\gamma_1 = \dfrac{E[Y \mid D = 1, Z = 1] - E[Y \mid D = 1, Z = 0]}{E[M \mid D = 1, Z = 1] - E[M \mid D = 1, Z = 0]}, \qquad \gamma_0 = E[Y \mid D = 1] - \gamma_1 E[M \mid D = 1],$

which can be expressed in terms of counterfactuals as

$\gamma_1 = \dfrac{E[Y_{1,M_{11}} - Y_{1,M_{10}}]}{E[M_{11} - M_{10}]}, \qquad \gamma_0 = E[Y_{1,M_1}] - \gamma_1 E[M_1].$

Similarly, the expressions for $(\beta_0, \beta_1)$ can be derived using the moment conditions $E[(1 - D)Zu] = E[(1 - D)u] = 0$:

$\beta_1 = \dfrac{E[Y_{0,M_{01}} - Y_{0,M_{00}}]}{E[M_{01} - M_{00}]}, \qquad \beta_0 = E[Y_{0,M_0}] - \beta_1 E[M_0].$