Essays on Causal Inference
by Youngmin Ju

A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, in Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (ECONOMICS)

August 2021
Copyright 2021 Youngmin Ju

Acknowledgments

I want to thank and acknowledge my main advisor, Geert Ridder, for his guidance, unwavering support, and belief in me. I am glad that I got a chance to learn from him and I will forever be in his debt. I am also very indebted to Myoung-jae Lee, who took a special interest in nurturing me academically by investing many hours of his time to assist me with my paper and develop my other research ideas, while constantly keeping an eye on my progress and subtly nudging me to do better. I would also like to thank my dissertation committee members, Cheng Hsiao, Hyungsik Roger Moon, and Emily Nix, for their encouragement and supervision. Their comments and suggestions about my work always helped me to improve and become a better researcher. I additionally thank Jeffrey Weaver and Sandra Rozo for their part in my Qualifying Exam. Finally, I give massive appreciation to Kihong Park, who selflessly invested endless hours discussing my ideas and helping me develop my dissertation paper during my PhD program.

Completing a PhD program is so much more than academics. I am very grateful for the support I received from staff at the Department of Economics, especially Young Miller, Morgan Ponder and Alex Karnazes, staff at the Office of International Services, the Graduate School, medical professionals at the Engemann Student Health Center, and faculty at the American Language Institute, especially Barry Griner.

I have been lucky to share the past six years with wonderful friends.
This program has blessed me with the life-changing friendships of Andreas Aristidou, Hayun Song, Bada Han, Jeehyun Ko, Eunjee Kwon, Bora Kim, Andrew Yimeng Xie, Mike Yinqi Zhang, Sheryl Weiran Deng, Qin Jiang, Ray Yiwei Qian, Lidan Tan, Yinan Liu, Kanika Aggarwal, Jaehong Kim, Dongwook Kim, Jeongwhan Yun, Chris Jeong Yoo, Jason Choi, Seungwoo Chin, Eunhae Shin, Hay Yeun Park, Minsoo Cho, Ida Johnson, Jisu Cao, Mahrad Sharifvaghefi, Brian Finley, Rachel Lee, Grigori Frangouridi, Rashad Ahmed, Jake Schneider, Chris Zhen Chen, Usman Ghaus, and several others. I will forever be bonded with my incoming cohort of the PhD program. The connection created by the shared struggles of the first year in the program and the core exams is unparalleled.

While this was my first time living away from my country, the Korean community blessed me with friends who look, speak, think and, most importantly, eat like me. Additionally, academia sometimes has a way of encapsulating you in a bubble. Thus, friends outside the PhD program played a crucial role in helping me balance life. I am grateful to Haneol Lim, Hyungwoo Choi, Jinwoo Lim, Sohyeon Lim, Yeowon Yoon, Mina Park, Sangkyu Jo, Hankyung Jeon, Heejin Cho, and Yongwan Lim, among many others. Their presence made life in Los Angeles feel so much closer to home.

I am grateful to several of my best friends from my high school days back in Korea who, on several occasions, lifted me up by reminding me of who I am and where I come from, and helped instill some perspective in my life when I needed it the most. Kiyoul Yang, Jaewon Nam, Junhyuk Kwon, Hyunhyun Lee, Jaejoon Choi, Byungok Jo, Youngsae Ham, Jungsu Park and many others, thank you for not giving up on me, even after all these years being so far apart.

I would like to express my gratitude to Bokhyun Um, Yoonsung Choi, and Soonseop Hwang, who served with me as fellow officers at the Army Academy before I came to study abroad.
Sometimes as seniors who had lived life before me, and sometimes as best friends who could play like children with me, they gave me generous advice on life and helped me overcome the difficulties of studying abroad.

I am grateful to my loving family. The unwavering support of my parents, Miwon Park and Taemoon Ju, for the pursuit of my dreams, even though they do not fully understand them and do not want me to be so far away from them, has been a great source of motivation as well as inspiration. They have been praying for me every day and night since the beginning of the PhD program. I am grateful for the most amazing older sister, Aram Ju, who always supports me with her heart. She took good care of my parents so that I did not have to worry about them too much and could concentrate on the PhD program.

Table of Contents

Acknowledgments
List of Tables
List of Figures
Abstract
1 Fuzzy MRD (Multiple-assignment variable Regression Discontinuity) Identification
  1.1 Introduction
  1.2 Canonical (single-score) Regression Discontinuity
  1.3 Multiple-score RD (MRD)
    1.3.1 Two types of MRD
      1. Multiple scores for a single treatment (left panel)
      2. Multiple scores for mutually exclusive treatment (right panel)
    1.3.2 Fuzzy MRD Identification
      Potential outcomes
      Assumptions (Fuzzy MRD Continuity)
      Theorem (Fuzzy MRD)
      Lemma 1
      Lemma 2
      Lemma 3
  1.4 Conclusion
2 Affirmative Action in Korea - Regression Discontinuity with Multiple Assignment Variables
  2.1 Introduction
  2.2 Implementation Procedure of AA in Korea
  2.3 Multiple-score RD (MRD)
    2.3.1 Potential outcomes and Identification with Partial effects
    2.3.2 Estimation of MRD
      Frontier approach (weighted average of boundary effects)
      Centering approach (minimum score)
      Univariate approach (one-dimensional localization)
    2.3.3 Bandwidth Selection of MRD
      Step 1: Estimate the conditional outcome variance $v(x)$, the density $f(x)$, and the standard deviations $\sigma_1$ and $\sigma_2$
      Step 2: Estimate the four second derivatives $m_0^{11}(x)$, $m_1^{11}(x)$, $m_0^{22}(x)$, and $m_1^{22}(x)$
      Step 3: Calculate $\hat{h}_{opt}(x_k)$ for $K$ evenly spaced points
      Step 4: Select the rule-of-thumb bandwidth $h_{ROT}$
  2.4 MRD analysis of Affirmative Action
    2.4.1 Data
    2.4.2 Model
    2.4.3 Empirical Results
      2.4.3.1 Canonical RD Empirical Results for full sample
        Women Employment Rate, $S_{rate}$
        Company size, $S_{size}$
      2.4.3.2 Canonical RD Empirical Results for subgroups
        Subgroup 1: $S_{rate}$
        Subgroup 2: $S_{size}$
        Subgroup 3: $S_{size}$
        Subgroup 4: $S_{rate}$
      2.4.3.3 MRD Empirical Results
        Optimal Bandwidths for MRD
        MRD Estimations
  2.5 Conclusion
3 Control Function Approach for Partly Ordered Endogenous Treatments: Military Rank Premium in Wage
  3.1 Introduction
  3.2 Treatment Effect Model and Estimators
    3.2.1 Causal Model, Endogeneity and Instrument
    3.2.2 Primary Nearly-Parametric CF Approach
    3.2.3 Secondary Double-Index CF Approach
  3.3 Empirical Analysis: Military Rank Premium
    3.3.1 Literature Related to Military Rank Effect on Wage
    3.3.2 Descriptive Statistics and LSE
    3.3.3 Control Function Approach Results
  3.4 Conclusions
Bibliography
A Appendix to Chapter 1
  A.1 Additional Figures
B Appendix to Chapter 2
  B.1 Additional Figures
    B.1.1 OECD data for employment rate and gender wage gap
    B.1.2 Detailed Affirmative Action Implementation Procedure
    B.1.3 Changes of the treatment groups
    B.1.4 MRD graphical analysis (3D)
  B.2 Additional Tables
C Appendix to Chapter 3
  C.1 Derivation of First-Stage Likelihood Components with R=4
  C.2 Derivation of Control Functions
  C.3 Closed-Form Control Functions Under $\rho_{0r} = 0$
  C.4 LSE Asymptotic Distribution Taking into Account First-Stage Error
  C.5 Computational Details
  C.6 Simulation Study for CF Approach
  C.7 Derivation of the CF's under Double-Index Assumption

List of Tables

2.1 Descriptive Statistics of All sample
2.2 RD analysis for $S_{rate}$
2.3 RD analysis for $S_{size}$
2.4 Descriptive Statistics of subgroups
2.5 RD analysis in subgroup 1
2.6 RD analysis in subgroup 2
2.7 RD analysis in subgroup 3
2.8 RD analysis in subgroup 4
2.9 Women Employment Rate Estimates for $\beta_d$, $\beta_1$ and $\beta_2$ with 95% CI
3.1 Mean (SD) of Variables (N = 3172)
3.2 LSE with Military Rank Dummies: $\hat{\beta}_{lse}$ (t-value)
3.3 MLE for $(D_0, D_r)$ and Endogeneity Correction for Y: estimate (t-value)
3.4 Correction Terms under Double Index Assumption: estimate (t-value)
B.1 Variables Description
B.2 Industry code (9th)
B.3 Women Employment Rate (%) Cutoffs in 2015
B.4 Estimates for Treatment and Partial Effects with 90% CI
B.5 Women Employment Rate Estimates for $\beta_D$, $\beta_S$ and $\beta_R$ with 95% CI
C.1 BIAS (SD) with N = 500 and $\rho = 0$
C.2 BIAS (SD) with N = 500 and $\rho = 0.7$
C.3 BIAS (SD) with N = 1000 and $\rho = 0.7$

List of Figures

1.1 An illustration of the difference between MRD for a single treatment (left panel) and MRD for mutually exclusive treatment (right panel)
1.2 An illustration of the difference between "AND" (left panel) and "OR" case (right panel)
2.1 The AA Implementation Procedure
2.2 Target vs Non-target Process
2.3 Affirmative Action's Treatment and Control regions
2.4 Two RD effects
2.5 Scatter plot and RD plot for $S_{rate}$
2.6 Graphical Illustration of Local Linear RD Effects for Covariates - $S_{rate}$
2.7 Histogram and Estimated Density of the score - $S_{rate}$
2.8 Scatter plot and RD plot for $S_{size}$
2.9 Graphical Illustration of Local Linear RD Effects for Covariates - $S_{size}$
2.10 Histogram and Estimated Density of the score - $S_{size}$
2.11 Subgroup 1
2.12 Subgroups
2.13 Scatter plot and RD plot for Subgroup 1
2.14 Graphical Illustration of Local Linear RD Effects for Covariates - Subgroup 1
2.15 Histogram and Estimated Density of the score - Subgroup 1
2.16 Scatter plot and RD plot for Subgroup 2
2.17 Graphical Illustration of Local Linear RD Effects for Covariates - Subgroup 2
2.18 Histogram and Estimated Density of the score - Subgroup 2
2.19 Scatter plot and RD plot for Subgroup 3
2.20 Graphical Illustration of Local Linear RD Effects for Covariates - Subgroup 3
2.21 Histogram and Estimated Density of the score - Subgroup 3
2.22 Scatter plot and RD plot for Subgroup 4
2.23 Graphical Illustration of Local Linear RD Effects for Covariates - Subgroup 4
2.24 Histogram and Estimated Density of the score - Subgroup 4
2.25 Square and Oval Neighbors (1 & 2 Bandwidths)
2.26 Multivariate Kernel Density
A.1 Sharp vs. Fuzzy Regression Discontinuity Design
A.2 Non-cumulative versus cumulative cutoffs in multiple cutoffs RD designs
A.3 An illustration of MRD
B.1 Women Employment Rate, % of working age population aged 15 to 64, 2019 in the OECD countries
B.2 Men Employment Rate, % of working age population aged 15 to 64, 2019 in the OECD countries
B.3 Gender Wage Gap, % of male median wage, 2019 in the OECD countries
B.4 Detailed 1st stage
B.5 Detailed 2nd stage
B.6 Detailed 3rd stage
B.7 Changes of targeted companies (blue area) in the first stage over time
B.8 Changes of the treatment groups (red area) over time
B.9 Local Linear Plane (LLP) near cutoffs
B.10 LLP by the scores near cutoffs: Square 1 bandwidth
B.11 LLP by the scores near cutoffs: Square 2 bandwidths
B.12 LLP by the scores near cutoffs: Oval 1 bandwidth
B.13 LLP by the scores near cutoffs: Oval 2 bandwidths

Abstract

This thesis brings together three research papers that investigate real-world situations both theoretically and empirically. The papers develop extended versions of basic methods of causal inference, which focus on uncovering causal relationships.

The first paper discusses a Multiple-assignment variable Regression Discontinuity design (MRD). Unlike the canonical regression discontinuity design (RD), this method uses more than one assignment variable. The main contribution of the paper is that it establishes identification for the Fuzzy MRD case.

The second paper applies the MRD to analyze the effect of Affirmative Action in Korea on the women employment rate. Affirmative Action in Korea has two assignment variables for the policy targets: the number of employees and the average women employment rate by industry. Since there are few empirical studies using MRD, this new empirical application is meaningful.
The third paper develops an approach to estimate the effects of partly ordered treatments while correcting for possible treatment endogeneity with nearly parametric control functions. My coauthor and I use this control function approach, along with its supplementary version, to estimate the effects of military ranks (ordered treatments) on wages relative to non-veteran status (control treatment) with the Wisconsin Longitudinal Study data. In our empirical analysis, the military rank effects differ considerably: officer rank has large positive effects, but enlisted ranks have small or no effects.

Chapter 1
Fuzzy MRD (Multiple-assignment variable Regression Discontinuity) Identification

1.1 Introduction

When treatment is not randomized, research designs that allow a careful examination of non-experimental treatments are gaining significance. The RD design has been established as one of the most trustworthy non-experimental methods for causal impact analysis in observational contexts. Regression discontinuity (RD) designs, first introduced by Thistlethwaite and Campbell (1960), identify causal effects from observational data through simple and transparent identification rules (Imbens and Lemieux (2008); Lee and Lemieux (2010); Hahn et al. (2001)). Subsequent work has developed the theory of estimation (Porter (2003); Lee and Card (2008); Frandsen et al. (2012)), methods for choosing the optimal bandwidth (Ludwig and Miller (2007); Imbens and Kalyanaraman (2012); Calonico et al. (2018)), and validation tests of the underlying assumptions (McCrary (2008)).

The RD design consists of three key components: an assignment/running variable (or score) X, a treatment D, and a cutoff c. Each unit in the RD design has a score, and treatment is allocated to those whose score exceeds a predetermined cutoff (or threshold). Without any one of these three fundamental components, the RD design cannot be employed. The canonical RD design has the following features:

1. there is only one score, and it is continuously distributed
2. there is only one cutoff
3. compliance with treatment assignment is perfect

We call this setup the Canonical Sharp RD design. In practice, however, it is very common that an RD design fails to satisfy one of these features.

The most common case is that compliance with treatment is imperfect. The RD design with imperfect compliance is referred to as the Fuzzy RD design. Figure A.1 illustrates Fuzzy RD. For example, Van der Klaauw (2002) examines the effect of scholarships on students' college enrollment in situations where scholarships are awarded discontinuously based on a linear combination of the SAT and high school GPA. Hahn et al. (2001) developed identification of a local average treatment effect (LATE) to analyze Fuzzy RD designs. Recent theoretical contributions to the Fuzzy RD design include Porter (2003), McCrary (2008), Imbens and Kalyanaraman (2012), Calonico et al. (2014), Calonico et al. (2015), Dong and Lewbel (2015), Dong (2018), Bertanha (2020), Gelman and Imbens (2019), and Imbens and Wager (2019).

The next case is that there is more than one cutoff: the value of the cutoff may vary by unit. Cattaneo et al. (2016) deal with this problem. The RD design with multiple cutoffs is referred to as the Multi-Cutoff Regression Discontinuity Design. Figure A.2 illustrates Multi-Cutoff RD. For example, in political science, the assignment variable is a vote share, the unit is an electoral constituency, and the treatment is winning an election. The cutoff can change with the number of candidates. The victory cutoff is 50% of the vote when there are only two candidates. If there are more than two candidates, however, a candidate with less than 50% of the vote may win the election, for example by 1 percentage point with 34% against two opponents who each receive 33%.

The final case is that there is more than one score: treatment is determined by more than one assignment variable.
The RD design with more than one assignment variable is referred to as the Multiple-assignment variable RD (MRD). Figure A.3 illustrates MRD. For example, students need to obtain a minimum score in each subject to advance to the next grade level. This case is the main idea of the second paper, Affirmative Action in Korea - Regression Discontinuity with Multiple Assignment Variables.

This paper deals intensively with the final case, RD with more than one score. For example, students need to obtain a minimum score in each subject to advance to the next grade level. Affirmative Action in Korea (the second chapter of this thesis) is also such a case. RD with multiple assignment variables (scores) is referred to as 'Multiple-score Regression Discontinuity (MRD)'. The assignment variables for observation $i$ are denoted by $X_{1i}$ and $X_{2i}$, and $X_i$ denotes the vector of assignment variables:

\[ D_i \equiv \delta_{1i}\delta_{2i} = I[X_{1i} \ge c_1 \text{ and } X_{2i} \ge c_2], \quad \text{where } \delta_{1i} \equiv I[X_{1i} \ge c_1] \text{ and } \delta_{2i} \equiv I[X_{2i} \ge c_2]; \quad c \equiv (c_1, c_2)' \text{ are known cutoffs.} \tag{1.1} \]

The rest of this paper is organized as follows. Section 1.2 reviews the canonical (single-score) regression discontinuity design. Section 1.3 reviews the multiple-score RD (MRD) design and develops Fuzzy MRD identification. Finally, Section 1.4 concludes.

1.2 Canonical (single-score) Regression Discontinuity

This section briefly reviews the canonical RD design (especially sharp RD), since most concepts in MRD are based on canonical RD. Let $D_i$ be the treatment indicator for unit $i$. In the sharp RD design, $D_i$ is completely determined by an assignment variable $X_i$ ($X_i$ is also called a 'forcing/assignment' variable or 'score'):

\[ D_i = 1[X_i \ge c]. \tag{1.2} \]

Let $Y_i(0)$ and $Y_i(1)$ denote the potential outcomes, where $Y_i = Y_i(0)$ for observations with $X_i < c$ and $Y_i = Y_i(1)$ for observations with $X_i \ge c$. $m(X_i)$ is a function of $X_i$ and $U_i$ is an error.
The model is

\[ Y_i = \beta_{RD} D_i + m(X_i) + U_i, \quad \text{where } E(U \mid X) = 0. \tag{1.3} \]

The key identification assumption for the canonical RD design is that the conditional expectation functions of the potential outcomes are continuous at the cutoff for the assignment variable $X_i$:

Assumption 1 (Continuity of Conditional Expectation Functions)

\[ E[Y(0) \mid X = x] \quad \text{and} \quad E[Y(1) \mid X = x] \tag{1.4} \]

are continuous in $x$.

Assumption 1 is stronger than required, as we only use continuity at the cutoff ($x = c$). In practice, however, it is hard to justify continuity at the cutoff value alone, so we make the stronger assumption. Under Assumption 1, the average treatment effect at $c$, $\beta_{RD}$, is

\[ \beta_{RD} = \lim_{x \to c^+} E[Y \mid X = x] - \lim_{x \to c^-} E[Y \mid X = x] = E[Y(1) - Y(0) \mid X = c] = \beta_{RD} + m(X = c^+) - m(X = c^-) = \beta_{RD} \quad [\text{by Assumption 1, } m(X = c^+) = m(X = c^-)]. \tag{1.5} \]

Put another way, comparisons of average outcomes in a small enough neighborhood of the cutoff $c$ estimate the treatment effect without requiring the correct specification of a model for $E[Y(0) \mid X]$. The nonparametric approach to RD requires accurate estimates of the mean of $Y$ in a small area near the cutoff $c$, and such estimates are not easy to obtain. The first problem is that working in a small area near the cutoff discards much of the data far from the cutoff. Furthermore, the sample average is biased for the conditional expectation function near the boundary. To address these problems, Hahn et al. (2001) proposed a nonparametric version of regression called local linear regression, and Porter (2003) developed the partial linear and local polynomial regression estimators.

The continuity assumption is the basis of three main features of RD: local randomization, the robustness of RD to the endogeneity of $D$ through $X$, and the ignorability of covariates other than $X$. First, in Sharp RD with $D_i = 1[X_i \ge c]$, those with $X_i \ge c$ form the treatment group and those with $X_i < c$ the control group.
RD overcomes the confounding variable problem by focusing on a local neighborhood of $X = c$: those just below the cutoff are likely to be almost the same as those just above the cutoff in terms of $X$ and the unobserved confounders, since $m(X = c^+) = m(X = c^-)$ by the continuity of $m(X)$. This is 'local randomization'.

Second, in many models with an error $U$ as in (1.3), $E(U \mid X)$ is assumed to be a constant that does not vary with $X$. RD, however, allows $E(U \mid X)$ to be a nontrivial function of $X$ as long as $E(U \mid X)$ is continuous at the cutoff $c$, so $E(U \mid X)$ can be merged into $m(X)$. This makes RD estimators robust to the endogeneity of $D$ through $X$, which is an important advantage of using RD.

Third, there may be covariates $W$ other than $X$ and $U$. Then the generalized model is

\[ Y_i = \beta_{RD} D_i + \tilde{m}(X_i, W_i) + U_i. \tag{1.6} \]

But this can be rewritten as

\[ Y_i = \beta_{RD} D_i + E[\tilde{m}(X, W) \mid X = X_i] + \big[ U_i + \tilde{m}(X_i, W_i) - E[\tilde{m}(X, W) \mid X = X_i] \big], \quad \text{where } E[\tilde{m}(X, W) \mid X = x] = \int \tilde{m}(x, w) \, \partial F_{W \mid X}(w \mid x). \tag{1.7} \]

We can regard $E[\tilde{m}(X, W) \mid X = X_i]$ as $m(X_i)$ and the term in brackets as the error $U_i$, as long as $E[\tilde{m}(X, W) \mid X = x]$ is continuous at $x = c$. So RD can ignore the covariates $W$. Being able to ignore $W$ is an advantage, because otherwise the functional form of the regression on $W$ would have to be specified. Despite this advantage, $W$ may still be controlled for, because removing $W$ from $U$ can reduce the variance of the error term.

1.3 Multiple-score RD (MRD)

1.3.1 Two types of MRD

Canonical RD is a well-developed study design, but in practice there are many cases where treatment is determined by more than one assignment variable (score). This section presents the main ideas and features of MRD, introducing key assumptions and notation that will be used throughout this paper. For MRD, see Cattaneo et al. (2016), Wong et al. (2013), Papay et al. (2011), Imbens and Zajonc (2009), Reardon and Robinson (2012) and Choi and Lee (2018).
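To make the sharp RD estimand in (1.5) from Section 1.2 concrete, the following is a minimal sketch that fits a weighted linear regression on each side of the cutoff and takes the difference of the two intercepts. It is an illustration under stated assumptions, not the estimator used in this thesis: the function name `sharp_rd_local_linear`, the triangular kernel, the fixed bandwidth, and the simulated data (a true jump of 2.0 with a linear $m$) are all hypothetical choices.

```python
import numpy as np

def sharp_rd_local_linear(y, x, cutoff, bandwidth):
    """Difference of one-sided local linear intercepts at the cutoff.

    Hypothetical helper: triangular kernel weights and a user-chosen bandwidth.
    """
    def intercept_at_cutoff(side_mask):
        xc = x[side_mask] - cutoff                  # center the score at the cutoff
        ys = y[side_mask]
        w = np.clip(1.0 - np.abs(xc) / bandwidth, 0.0, None)  # triangular weights
        keep = w > 0                                 # observations inside the window
        design = np.column_stack([np.ones(keep.sum()), xc[keep]])
        sw = np.sqrt(w[keep])                        # weighted least squares via sqrt weights
        coef, *_ = np.linalg.lstsq(design * sw[:, None], ys[keep] * sw, rcond=None)
        return coef[0]                               # fitted value at the cutoff

    right = intercept_at_cutoff(x >= cutoff)         # limit from the treated side
    left = intercept_at_cutoff(x < cutoff)           # limit from the control side
    return right - left

# Simulated sharp design (assumed values): Y = 2.0 * D + m(X) + U with linear m.
rng = np.random.default_rng(0)
n = 4000
x = rng.uniform(-1.0, 1.0, n)
d = (x >= 0.0).astype(float)
y = 2.0 * d + 1.0 + 0.5 * x + rng.normal(0.0, 0.1, n)

estimate = sharp_rd_local_linear(y, x, cutoff=0.0, bandwidth=0.3)
print(f"estimated jump at the cutoff: {estimate:.3f}")
```

Because only observations within the bandwidth receive positive weight, the sketch also shows the data loss far from the cutoff that the text mentions; the bandwidth here is arbitrary rather than optimally selected.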
First of all, MRDs can be classified into two categories: the single treatment case and the mutually exclusive treatment case.

1. Multiple scores for a single treatment (left panel): units have two assignment variables (e.g., math and reading test scores) and treatment is determined by a boundary depending on both scores (e.g., whether or not the student advances to the next grade).
2. Multiple scores for mutually exclusive treatment (right panel): units have two assignment variables (e.g., math and reading test scores) and treatment is determined by a boundary depending on each score (e.g., grade retention, remedial classes, or moving on to the next grade).

[Figure 1.1: An illustration of the difference between MRD for a single treatment (left panel) and MRD for mutually exclusive treatment (right panel). Axes: $X_1$ and $X_2$. Left panel: one region with $D_i = 1$ and three regions with $D_i = 0$. Right panel: four regions with $D_{1i} = 1$, $D_{2i} = 1$, $D_{3i} = 1$ and $D_{4i} = 1$.]

In the first case (Figure 1.1, left panel), students must pass both math and reading tests in order to advance to the next grade level. There are two assignment variables (math and reading tests) and a single treatment (whether or not the student advances to the next level). This is the first MRD case, with multiple scores for a single treatment:

\[ D_i \equiv \delta_{1i}\delta_{2i} = I[X_{1i} \ge c_1 \text{ and } X_{2i} \ge c_2], \quad \text{where } \delta_{1i} \equiv I[X_{1i} \ge c_1] \text{ and } \delta_{2i} \equiv I[X_{2i} \ge c_2]; \quad c \equiv (c_1, c_2)' \text{ are known cutoffs.} \tag{1.8} \]

Without loss of generality, assume that the assignment rule is the "AND" case (i.e., both assignment variables must cross their cutoffs for the unit to be treated). An "OR" case can be converted to the "AND" case simply by 'switching' the treatment.
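As a quick numerical check of the "AND" rule in (1.8) and of the claim that an "OR" rule becomes an "AND" rule once the treatment is switched, the sketch below flips the treatment indicator and the signs of both scores and cutoffs. The helper names `treat_and` and `treat_or`, the cutoff values, and the simulated scores are assumptions made only for this illustration.

```python
import numpy as np

def treat_and(x1, x2, c1, c2):
    """'AND' rule: treated only if both scores reach their cutoffs."""
    return ((x1 >= c1) & (x2 >= c2)).astype(int)

def treat_or(x1, x2, c1, c2):
    """'OR' rule: treated if at least one score exceeds its cutoff."""
    return ((x1 > c1) | (x2 > c2)).astype(int)

rng = np.random.default_rng(1)
x1 = rng.normal(size=1000)   # hypothetical first score
x2 = rng.normal(size=1000)   # hypothetical second score
c1, c2 = 0.2, -0.1           # hypothetical cutoffs

d_or = treat_or(x1, x2, c1, c2)

# Switch the treatment (1 - D) and flip the signs of scores and cutoffs:
# not (x1 > c1 or x2 > c2)  <=>  (-x1 >= -c1 and -x2 >= -c2).
d_switched = treat_and(-x1, -x2, -c1, -c2)

switch_holds = bool(np.array_equal(1 - d_or, d_switched))
print("OR rule equals a switched AND rule:", switch_holds)
```

The equality is just De Morgan's law applied to the two indicator events, so it holds for every unit, not only on average.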
To switch an "OR" assignment rule to an "AND" assignment rule, start from the "OR" case

\[ D_i = I[X_{1i} > c_1 \text{ or } X_{2i} > c_2]. \]

In this case, we can simply redefine the treatment indicator, assignment variables and cutoffs by

\[ \tilde{D}_i \equiv 1 - D_i, \quad \tilde{X}_{1i} \equiv -X_{1i}, \quad \tilde{X}_{2i} \equiv -X_{2i}, \quad \tilde{c}_1 \equiv -c_1, \quad \tilde{c}_2 \equiv -c_2. \]

This yields

\[ \tilde{D}_i = I[X_{1i} \le c_1 \text{ and } X_{2i} \le c_2] = I[\tilde{X}_{1i} \ge \tilde{c}_1 \text{ and } \tilde{X}_{2i} \ge \tilde{c}_2], \]

which has the same form as an "AND" assignment rule. Figure 1.2 illustrates the difference between the "AND" case (left panel) and the "OR" case (right panel).

[Figure 1.2: An illustration of the difference between "AND" (left panel) and "OR" case (right panel). Axes: $X_1$ and $X_2$. Left panel: one region with $D_i = 1$ and three regions with $D_i = 0$. Right panel: three regions with $D_i = 1$ and one region with $D_i = 0$.]

The second case (Figure 1.1, right panel) is similar to the first but slightly different: students who fail one of the exams are required to attend remedial classes. There are two assignment variables (math and reading tests) and mutually exclusive treatments (math and reading remedial classes). This is the second MRD case, with multiple scores for mutually exclusive treatments.

1.3.2 Fuzzy MRD Identification

Let $s_{1i}$ and $s_{2i}$ be the scores, and let $d_{1i} = f_1(s_{1i}, \eta_{1i})$ and $d_{2i} = f_2(s_{2i}, \eta_{2i})$ be the indicators for each score. It is known that $E[d_1 \mid s_1]$ is discontinuous at $s_1 = 0$, and likewise $E[d_2 \mid s_2]$ at $s_2 = 0$. In the sharp design, we would have $d_1 = 1\{s_1 \ge 0\}$ and $d_2 = 1\{s_2 \ge 0\}$. The policy requires that an individual is treated ($d = 1$) only when the score $s_i = (s_{1i}, s_{2i})$ is in the region $A = \{s \mid s_1 > c_1 = 0, \; s_2 > c_2 = 0\}$, where the cutoffs are normalized to zero. With a fuzzy design, however, we may observe $d = 1$ even if $s \notin A$, or $d = 0$ even if $s \in A$.

Potential outcomes

Let $y_{d_1 d_2}$ denote the potential outcome under the pair $(d_1, d_2)$. This notation allows for "partial effects".
Then
$$\begin{aligned} y &= d_1 d_2 y_{11} + d_1 (1 - d_2) y_{10} + (1 - d_1) d_2 y_{01} + (1 - d_1)(1 - d_2) y_{00} \\ &= y_{00} + d_1 (y_{10} - y_{00}) + d_2 (y_{01} - y_{00}) + d_1 d_2 (y_{11} - y_{10} - y_{01} + y_{00}) \\ &\equiv \alpha + d_1 \beta_1 + d_2 \beta_2 + d_1 d_2 \beta_3, \end{aligned}$$
where $\alpha = y_{00}$ and $\beta_1 = y_{10} - y_{00}$ is the causal effect of the “first treatment” when the individual does not take up the second treatment. Similarly, $\beta_2 = y_{01} - y_{00}$ is the causal effect of the second treatment when there is no first treatment. Finally, $\beta_3 = y_{11} - y_{10} - y_{01} + y_{00}$ is the effect of the “joint treatments”. (Writing $\beta_3 = (y_{11} - y_{10}) - (y_{01} - y_{00})$, we can interpret the two treatments as complements if $\beta_3$ is positive and as substitutes if it is negative.)

The purpose of the study is to show that the means of $(\beta_1, \beta_2, \beta_3)$ are separately identified for some subpopulation of individuals, from which we can identify the two individual treatment effects and the joint treatment effect. It will be interesting to test whether such “partial effects” exist (e.g. $H_0: \beta_{1i} = 0$).

Assumptions (Fuzzy MRD Continuity)
• (A1) $E[\alpha \mid s_1 = c_1, s_2 = c_2]$ is continuous around $(c_1, c_2) = (0, 0)$ (or only at $c_1 = 0$ or $c_2 = 0$: boundary effects, consider heterogeneity).
• (A2) $E[\beta_j \mid s_1 = c_1, s_2 = c_2]$ is continuous around $(c_1, c_2) = (0, 0)$.
• (A3) $E[d_j \beta_j \mid s_1 = c_1, s_2 = c_2] = E[d_j \mid s_1 = c_1, s_2 = c_2] \, E[\beta_j \mid s_1 = c_1, s_2 = c_2]$ around $(c_1, c_2) = (0, 0)$.

(A1) is a standard assumption in RDD. It requires that individuals around the cutoff are comparable without treatment. (A3) means that individuals do not select into or out of treatment based on the perceived gain from treatment (no self-selection). (A3) may be too strong; it can be relaxed to a monotonicity assumption (no defiers), in which case we can use LATE (as in Hahn et al. (2001), case 3).

Using the assumptions, we get
$$\begin{aligned} E[y \mid s_1, s_2] &= E[\alpha + d_1 \beta_1 + d_2 \beta_2 + d_1 d_2 \beta_3 \mid s_1, s_2] \\ &= E[\alpha \mid s_1, s_2] + E[d_1 \mid s_1, s_2] E[\beta_1 \mid s_1, s_2] + E[d_2 \mid s_1, s_2] E[\beta_2 \mid s_1, s_2] + E[d_1 d_2 \mid s_1, s_2] E[\beta_3 \mid s_1, s_2]. \end{aligned}$$
Taking limits,
$$\begin{aligned} y^{++} &= E[\alpha \mid 0,0] + d_1^{++} E[\beta_1 \mid 0,0] + d_2^{++} E[\beta_2 \mid 0,0] + (d_1 d_2)^{++} E[\beta_3 \mid 0,0] \\ y^{+-} &= E[\alpha \mid 0,0] + d_1^{+-} E[\beta_1 \mid 0,0] + d_2^{+-} E[\beta_2 \mid 0,0] + (d_1 d_2)^{+-} E[\beta_3 \mid 0,0] \\ y^{-+} &= E[\alpha \mid 0,0] + d_1^{-+} E[\beta_1 \mid 0,0] + d_2^{-+} E[\beta_2 \mid 0,0] + (d_1 d_2)^{-+} E[\beta_3 \mid 0,0] \\ y^{--} &= E[\alpha \mid 0,0] + d_1^{--} E[\beta_1 \mid 0,0] + d_2^{--} E[\beta_2 \mid 0,0] + (d_1 d_2)^{--} E[\beta_3 \mid 0,0] \end{aligned}$$
where we use the notation
$$y^{++} \equiv \lim_{e_1 \to 0^+,\, e_2 \to 0^+} E[y \mid s_1 = e_1, s_2 = e_2],$$
and analogously for the other sign combinations and for $d_1$, $d_2$ and $d_1 d_2$. We have four equations in four unknowns; therefore, we have exact identification, with complicated formulas given as follows.

Theorem (Fuzzy MRD) Given the system of four limit equations above, we consider two conditions:

(i) One treatment is uncorrelated with the other score:
$$E[d_1 \mid s_1, s_2] = E[d_1 \mid s_1] \quad \text{and} \quad E[d_2 \mid s_1, s_2] = E[d_2 \mid s_2].$$
ex) Math score $\nRightarrow$ English summer camp.
counter ex) Geographical RD with (latitude, longitude): when latitude or longitude changes, geographical information changes accordingly, and this can affect the treatment decision.

(ii) The two treatment indicators are conditionally independent:
$$E[d_1 d_2 \mid s_1, s_2] = E[d_1 \mid s_1, s_2] \, E[d_2 \mid s_1, s_2].$$
Our “fuzziness” comes from the fact that $d_1$ is not a deterministic function of $s_1$, and similarly for $d_2$, while we maintain that the treatment rule is sharp in $(d_1, d_2)$: $d = d_1 d_2$. Note that the treatment rule is not sharp in the scores, since $d \ne 1\{s_1 \ge 0, s_2 \ge 0\}$.
ex) Companies of different sizes face different cutoffs for the women employment rate under Affirmative Action in Korea.
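The four limit equations above form a linear system in the four unknowns, so they can be solved numerically once the limits are estimated. The following is a minimal sketch, not the author's implementation: the function names (`solve_linear`, `solve_fuzzy_mrd`) and all numeric limits are hypothetical and chosen only for illustration, and the limits of the outcome are generated from assumed "true" effects so that the solution can be checked.

```python
# Sketch: solve the four limit equations for
# (E[alpha|0,0], E[beta1|0,0], E[beta2|0,0], E[beta3|0,0]).
# All names and numbers here are hypothetical illustrations.

REGIONS = ["++", "+-", "-+", "--"]  # side of cutoff 1, side of cutoff 2


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: the effects are not identified")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def solve_fuzzy_mrd(y, d1, d2, d3):
    """Each argument maps a region label to the corresponding one-sided limit;
    d3 holds the limits of E[d1*d2 | s1, s2]."""
    a = [[1.0, d1[k], d2[k], d3[k]] for k in REGIONS]
    b = [y[k] for k in REGIONS]
    return solve_linear(a, b)


# Hypothetical take-up limits (general case: neither condition imposed).
d1 = {"++": 0.9, "+-": 0.8, "-+": 0.2, "--": 0.1}
d2 = {"++": 0.7, "+-": 0.3, "-+": 0.6, "--": 0.2}
d3 = {"++": 0.65, "+-": 0.25, "-+": 0.1, "--": 0.02}

# Assumed effects used only to generate consistent outcome limits.
truth = [1.0, 0.5, -0.3, 0.8]
y = {k: truth[0] + truth[1] * d1[k] + truth[2] * d2[k] + truth[3] * d3[k]
     for k in REGIONS}

estimates = solve_fuzzy_mrd(y, d1, d2, d3)
print(estimates)
```

In practice the limits would be replaced by nonparametric estimates on each side of the cutoffs; the solver raises an error when the coefficient matrix is singular, which corresponds to a failure of identification.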
Case 1: Both (i) and (ii) hold.
Condition (i) implies $d_1^{++} = d_1^{+-} \equiv d_1^+$, $d_1^{-+} = d_1^{--} \equiv d_1^-$, $d_2^{++} = d_2^{-+} \equiv d_2^+$ and $d_2^{+-} = d_2^{--} \equiv d_2^-$, and condition (ii) implies $(d_1 d_2)^{ab} = d_1^{ab} d_2^{ab}$ for all $ab \in \{++, +-, -+, --\}$. Then we have
$$E[\beta_1 \mid 0,0] = \frac{d_2^+ (y^{+-} - y^{--}) - d_2^- (y^{++} - y^{-+})}{(d_1^+ - d_1^-)(d_2^+ - d_2^-)}$$
$$E[\beta_2 \mid 0,0] = \frac{d_1^+ (y^{-+} - y^{--}) - d_1^- (y^{++} - y^{+-})}{(d_1^+ - d_1^-)(d_2^+ - d_2^-)}$$
$$E[\beta_3 \mid 0,0] = \frac{y^{++} - y^{+-} - y^{-+} + y^{--}}{(d_1^+ - d_1^-)(d_2^+ - d_2^-)}$$

Case 2: Only (i) holds.
Then we only have $d_1^{++} = d_1^{+-} \equiv d_1^+$, $d_1^{-+} = d_1^{--} \equiv d_1^-$, $d_2^{++} = d_2^{-+} \equiv d_2^+$ and $d_2^{+-} = d_2^{--} \equiv d_2^-$. Let $d_3^{ab} \equiv (d_1 d_2)^{ab}$ for all $ab \in \{++, +-, -+, --\}$ and $\Delta_3 \equiv d_3^{++} - d_3^{+-} - d_3^{-+} + d_3^{--}$. Then we have
$$E[\beta_1 \mid 0,0] = \frac{(d_3^{++} - d_3^{-+})(y^{+-} - y^{--}) - (d_3^{+-} - d_3^{--})(y^{++} - y^{-+})}{(d_1^+ - d_1^-)\,\Delta_3}$$
$$E[\beta_2 \mid 0,0] = \frac{(d_3^{++} - d_3^{+-})(y^{-+} - y^{--}) - (d_3^{-+} - d_3^{--})(y^{++} - y^{+-})}{(d_2^+ - d_2^-)\,\Delta_3}$$
$$E[\beta_3 \mid 0,0] = \frac{y^{++} - y^{+-} - y^{-+} + y^{--}}{\Delta_3}$$

Case 3: Neither (i) nor (ii) holds.
With $d_3^{ab} \equiv (d_1 d_2)^{ab}$, write the system as $A\theta = \mathbf{y}$, where $\theta = (E[\alpha \mid 0,0], E[\beta_1 \mid 0,0], E[\beta_2 \mid 0,0], E[\beta_3 \mid 0,0])'$, $\mathbf{y} = (y^{++}, y^{+-}, y^{-+}, y^{--})'$ and
$$A = \begin{pmatrix} 1 & d_1^{++} & d_2^{++} & d_3^{++} \\ 1 & d_1^{+-} & d_2^{+-} & d_3^{+-} \\ 1 & d_1^{-+} & d_2^{-+} & d_3^{-+} \\ 1 & d_1^{--} & d_2^{--} & d_3^{--} \end{pmatrix}.$$
Then, by Cramer’s rule,
$$E[\beta_k \mid 0,0] = \frac{\det A_k}{\det A}, \qquad k = 1, 2, 3,$$
where $A_k$ is $A$ with the column of $d_k$ replaced by $\mathbf{y}$. Expanding $\det A$ by complementary $2 \times 2$ minors gives a sum of six signed products of the form $(d_1^{a} d_2^{b} - d_1^{b} d_2^{a})(d_3^{c} - d_3^{e})$, where $(a, b)$ and $(c, e)$ split the four regions into two pairs. The numerators expand in the same way with $\mathbf{y}$ in place of the replaced column: for example, $\det A_1$ is a sum of six signed products of the form $(d_2^{a} d_3^{b} - d_2^{b} d_3^{a})(y^{c} - y^{e})$.

Case 4: Only (ii) holds.
Then we only have $(d_1 d_2)^{ab} = d_1^{ab} d_2^{ab}$ for all $ab \in \{++, +-, -+, --\}$. Setting $d_3^{ab} = d_1^{ab} d_2^{ab}$ for all $ab$, we can use the formulas of Case 3.

Lemma 1 Suppose we have a sharp design in both scores ($d_j^+ = 1$, $d_j^- = 0$). Then, in cases 1 and 2,
$$E[\beta_1 \mid 0,0] = y^{+-} - y^{--}, \qquad E[\beta_2 \mid 0,0] = y^{-+} - y^{--}, \qquad E[\beta_3 \mid 0,0] = y^{++} - y^{+-} - y^{-+} + y^{--}.$$

Lemma 2 Suppose we have a fuzzy design in one score and a sharp design in the other (say score 1 is sharp and score 2 is fuzzy, so $d_1^+ = 1$, $d_1^- = 0$). Then:
• case 1:
$$E[\beta_1 \mid 0,0] = \frac{d_2^+ (y^{+-} - y^{--}) - d_2^- (y^{++} - y^{-+})}{d_2^+ - d_2^-}, \qquad E[\beta_2 \mid 0,0] = \frac{y^{-+} - y^{--}}{d_2^+ - d_2^-}, \qquad E[\beta_3 \mid 0,0] = \frac{y^{++} - y^{+-} - y^{-+} + y^{--}}{d_2^+ - d_2^-}$$
• case 2:
$$E[\beta_1 \mid 0,0] = \frac{(d_3^{++} - d_3^{-+})(y^{+-} - y^{--}) - (d_3^{+-} - d_3^{--})(y^{++} - y^{-+})}{\Delta_3}$$
$$E[\beta_2 \mid 0,0] = \frac{(d_3^{++} - d_3^{+-})(y^{-+} - y^{--}) - (d_3^{-+} - d_3^{--})(y^{++} - y^{+-})}{(d_2^+ - d_2^-)\,\Delta_3}$$
$$E[\beta_3 \mid 0,0] = \frac{y^{++} - y^{+-} - y^{-+} + y^{--}}{\Delta_3}$$
where $\Delta_3 \equiv d_3^{++} - d_3^{+-} - d_3^{-+} + d_3^{--}$.

Lemma 3 Suppose we have one-sided noncompliance ($d_j^- = 0$). Then:
• case 1:
$$E[\beta_1 \mid 0,0] = \frac{d_2^+ (y^{+-} - y^{--})}{d_1^+ d_2^+}, \qquad E[\beta_2 \mid 0,0] = \frac{d_1^+ (y^{-+} - y^{--})}{d_1^+ d_2^+}, \qquad E[\beta_3 \mid 0,0] = \frac{y^{++} - y^{+-} - y^{-+} + y^{--}}{d_1^+ d_2^+}$$
• case 2:
$$E[\beta_1 \mid 0,0] = \frac{(d_3^{++} - d_3^{-+})(y^{+-} - y^{--}) - (d_3^{+-} - d_3^{--})(y^{++} - y^{-+})}{d_1^+ \,\Delta_3}$$
$$E[\beta_2 \mid 0,0] = \frac{(d_3^{++} - d_3^{+-})(y^{-+} - y^{--}) - (d_3^{-+} - d_3^{--})(y^{++} - y^{+-})}{d_2^+ \,\Delta_3}$$
$$E[\beta_3 \mid 0,0] = \frac{y^{++} - y^{+-} - y^{-+} + y^{--}}{\Delta_3}$$
where $\Delta_3 \equiv d_3^{++} - d_3^{+-} - d_3^{-+} + d_3^{--}$.

1.4 Conclusion

Although MRD is an effective tool for a variety of policy evaluations, current research on MRD is limited. As a result, there is little agreement on how to conduct MRD analysis. This study reviews existing research on MRD and proposes new methods for its extension.
In particular, this paper provides Fuzzy MRD identification, which is a new extension of MRD. Since Fuzzy MRD combines two methods, Fuzzy RD and MRD, each of which needs specific data for analysis, it is hard to find data to which Fuzzy MRD can be applied. Here, I suggest three related directions for further study. First, if we can find data that contain every variable required for Fuzzy MRD analysis, an empirical Fuzzy MRD analysis would be an interesting topic. Second, MRD estimation with covariates or clustering would be an interesting topic. With covariates or clustering, MRD is expected to remain consistent and to improve estimation and inference. Finally, one of the advantages of the RD design is its clear graphical analysis. Methods for presenting MRD analysis graphically would therefore merit further investigation.

Chapter 2
Affirmative Action in Korea - Regression Discontinuity with Multiple Assignment Variables

2.1 Introduction

Korean women’s economic status has remained poor in comparison to Korean men’s and also in comparison to the status of women in the majority of other OECD countries. Despite a long-term trend toward equality, gender inequality in work and earnings persists in the Korean labor market. The labor force participation rate of Korean women (women employment rate) is nearly the lowest among OECD nations, as is the female employment-to-population ratio (Figure B.1, OECD, 2019). Compared with other OECD nations, the average income of Korean women is also significantly lower relative to that of Korean men (Figure B.3, OECD, 2019). According to Korean research on the gender pay disparity, a significant portion of the reported gap is attributed to non-productivity-based discrimination against women.

In previous studies on the effects of Affirmative Action in Korea on the women employment rate, Chang et al. (2006) found it difficult to see statistically significant quantitative increases.
They assessed the effect of AA negatively because it mainly affected the employment of temporary workers. They argued that the AA policy is not very effective in Korea because, whereas the U.S. system is legally binding and compulsory, the Korean system lacks sufficient enforcement and incentives. In particular, even though the AA policy has increased the women employment rate, most of the new positions are filled with temporary workers; therefore, the AA system should be improved so that the quality of employment increases as well. They pointed out that more in-depth and comprehensive institutional improvements should be made to the overall personnel system, such as vocational education and promotion.

Jeon and Kim (2008) analyzed the performance of the AA policy for three years after it was introduced and, unlike Chang et al. (2006), found a significant positive effect. In other words, since the introduction of the AA policy, the women employment rate has steadily increased, and the gap between men and women in positions has also decreased. However, it is difficult to determine whether the quality of women’s employment has improved, because the AA reports do not state whether a newly hired female worker is a permanent or a temporary worker.

Cho et al. (2010) analyzed the effects of Affirmative Action in Korea using Korean Information Service (KIS, 2008) data. Unlike previous studies, this study estimated the factors that would enhance the effectiveness of the AA policy, and it reached conclusions similar to those of prior studies: the current AA policy lacks incentives to encourage companies to participate and is not designed to take the quality of employment into account.

This paper examines the economic outcomes of AA in Korea.
In contrast to the wide body of foreign studies on the socioeconomic consequences of AA (Smith and Welch (1984), Leonard (1984), Leonard (1990), Coate and Loury (1993), Holzer and Neumark (1999), Holzer and Neumark (2000), Paola et al. (2010)), only a few studies have been done on Korea because of the short history of AA there. This research mostly addressed institutional design and implementation problems (Chang et al. (2006), Kim et al. (2010)).

The goal of this paper is to investigate the effect of the Affirmative Action policy by applying MRD, which distinguishes it from most other Affirmative Action studies in Korea. This paper also allows for partial effects in MRD, as introduced in Reardon and Robinson (2012) and Choi and Lee (2018):
$$E(Y \mid X) = \beta_0 + \beta_1 \delta_1 + \beta_2 \delta_2 + \beta_d D, \qquad \delta_j \equiv 1[c_j \le X_j] \text{ for } j = 1, 2, \quad D = \delta_1 \delta_2 \qquad (2.1)$$
where $c_1, c_2$ are known cutoffs and the $\beta$’s are parameters. Extending $E(Y \mid X)$ to conditional quantiles or the mode appears to be possible for ‘sharp RD’, where $D$ is fully determined by the assignment variables.

A sharp MRD design is the natural design for AA because the target companies are determined only by company size and the women employment rate. However, there can be non-compliance because the AA policy only imposes incentives and penalties on companies to implement their plans, not compulsory enforcement. Due to the absence of severe punishment for non-compliance, along with a weak incentive system, corporate success in AA regulation is heavily reliant on firms’ voluntary involvement in the program. Therefore, in order to estimate the true effect of the AA policy, we would need to consider a Fuzzy MRD design. Since a Fuzzy MRD design needs more assumptions and instrumental variables to estimate the effect, and the data do not contain instrumental variables or indicators for treatment actually received, the paper applies only the Sharp MRD design.

The rest of this paper is organized as follows. Section 2.2 overviews the implementation procedure of Affirmative Action in Korea.
Section 2.3 reviews the multiple-score RD (MRD) design. Section 2.4 provides an empirical model for AA and the results on the effect of AA. Finally, Section 2.5 concludes.

2.2 Implementation Procedure of AA in Korea

Affirmative Action (AA) was first implemented in Korea in 2006 as an active initiative to increase women’s employment and resolve widely rooted discriminatory structures against women. It was first introduced for public companies and private companies with 1,000 or more workers, and after a two-year grace period, it was expanded to smaller private firms (with 500-999 employees).

[Figure 2.1: The AA Implementation Procedure]
[Figure 2.2: Target vs Non-target Process]

As illustrated in Figure 2.1, Affirmative Action in Korea is implemented in four stages.¹ To begin, under the AA clause, firms with more than 500 employees become the targets for the first stage. These firms must submit an initial report detailing their male and female employee counts by job and rank (Figure B.4: Detailed 1st stage). Figure 2.2 illustrates the process of target determination. Second, firms with a women employment rate that is less than 60% of the industry average (changed to 70% in 2015) must submit an AA implementation plan outlining how they intend to increase female hiring in the next year (Figure B.5: Detailed 2nd stage). Third, firms that have filed an implementation plan need to submit a progress report for performance evaluation (Figure B.6: Detailed 3rd stage). Finally, based on the evaluation, firms that have achieved significant progress get incentives, while firms that did not meet the required standards get penalties (4th stage).

¹ See appendix for detailed processes.

Incentives are
1. Additional points for product bidding qualification examination by the Public Procurement Service
2. Additional points for contract performance evaluation of competitive product by the Small and Medium Business Administration
3. Funding Priorities
4. Exemption from regular supervision at local labor offices

Penalties are
1. Deduction for product bidding by the Public Procurement Service
2. Exclusion from family-friendly certification by the Ministry of Gender Equality and Family
3. Distribution of press releases by the Ministry of Employment and Labor

2.3 Multiple-score RD (MRD)

The canonical Regression Discontinuity design is itself a well-established study design, but in practice there are many cases where treatment is determined by more than one assignment variable (score). This special regression discontinuity design is referred to as Multiple-score RD (MRD). Several MRD works have appeared recently (Cattaneo et al. (2016), Wong et al. (2013), Papay et al. (2011), Imbens and Zajonc (2009), Reardon and Robinson (2012) and Choi and Lee (2018)).

2.3.1 Potential outcomes and Identification with Partial effects

Define potential outcomes $(Y^{00}, Y^{10}, Y^{01}, Y^{11})$ when $(\delta_1, \delta_2) = (0,0), (1,0), (0,1), (1,1)$. Although the interaction $D = \delta_1 \delta_2$ is the treatment of interest, $\delta_1$ and $\delta_2$ can have separate effects on $Y$. For instance, consider the effect of the Affirmative Action policy ($D = 1$) on women employment $Y$, where treatment is determined by fulfilling both the company size requirement ($\delta_1 = 1$) and the women employment requirement ($\delta_2 = 1$). Even if a company is not selected for AA, a company that fulfills the size requirement still has a duty to submit employment data to the government. At first glance, the individual treatment effect of interest appears to be $Y^{11} - Y^{00}$ because $D = \delta_1 \delta_2$, but this is not the true effect if there is a partial effect. Hence the “true” individual treatment effect of interest should be
$$Y^{11} - Y^{00} - (Y^{10} - Y^{00}) - (Y^{01} - Y^{00}) = Y^{11} - Y^{10} - Y^{01} + Y^{00} \qquad (2.2)$$
where the two partial effects are subtracted from $Y^{11} - Y^{00}$.
Rewrite $E(Y \mid X)$ as
$$\begin{aligned} E(Y \mid X) &= E(Y^{00} \mid X)(1 - \delta_1)(1 - \delta_2) + E(Y^{10} \mid X)\delta_1 (1 - \delta_2) + E(Y^{01} \mid X)(1 - \delta_1)\delta_2 + E(Y^{11} \mid X)\delta_1 \delta_2 \\ &= E(Y^{00} \mid X) + \{E(Y^{10} \mid X) - E(Y^{00} \mid X)\}\delta_1 + \{E(Y^{01} \mid X) - E(Y^{00} \mid X)\}\delta_2 \\ &\quad + \{E(Y^{11} \mid X) - E(Y^{10} \mid X) - E(Y^{01} \mid X) + E(Y^{00} \mid X)\} D \end{aligned} \qquad (2.3)$$
The coefficient of $D = \delta_1 \delta_2$ has the same form as $Y^{11} - Y^{10} - Y^{01} + Y^{00}$ above, and it resembles a Difference in Differences (DD), with $E(Y^{11} \mid X) - E(Y^{10} \mid X)$ as the “treatment group difference” and $E(Y^{01} \mid X) - E(Y^{00} \mid X)$ as the “control group difference”. In this setting, DD is widely used to identify the interaction (treatment) effect by removing the partial effects. If there are no partial effects ($E(Y^{10} \mid X) = E(Y^{01} \mid X) = E(Y^{00} \mid X)$), then $E(Y \mid X)$ becomes
$$E(Y \mid X) = E(Y^{00} \mid X) + \{E(Y^{11} \mid X) - E(Y^{00} \mid X)\} D \qquad (2.4)$$
Identification in RD is based on observations near the boundary. In a canonical RD setting, sharp RD identifies a treatment effect at the cutoff, $X = c$. For the identification of MRD, we need higher dimensions. To simplify the problem, we extend the canonical RD setting to a two-dimensional setting. To identify the sharp MRD effects², we need a continuity assumption in higher dimensions.

Assumption 2 (Continuity of Conditional Expectation Functions)
$$\begin{aligned} \text{(i)}&: \; E[Y^{01} \mid X_1 = c_1^-, X_2 = c_2^+] = E[Y^{01} \mid X_1 = c_1^+, X_2 = c_2^+] \\ \text{(ii)}&: \; E[Y^{10} \mid X_1 = c_1^+, X_2 = c_2^-] = E[Y^{10} \mid X_1 = c_1^+, X_2 = c_2^+] \\ \text{(iii)}&: \; E[Y^{00} \mid X_1 = c_1^-, X_2 = c_2^-] = E[Y^{00} \mid X_1 = c_1^+, X_2 = c_2^+] \end{aligned} \qquad (2.5)$$
where $c^-$ denotes the left-side limit and $c^+$ the right-side limit. Continuity of the conditional expectation functions means that units near the cutoffs have comparable potential outcomes. Using (2.5), the coefficient of $D$ in (2.3) becomes
$$\begin{aligned} \beta_d &= E(Y^{11} \mid X) - E(Y^{10} \mid X) - E(Y^{01} \mid X) + E(Y^{00} \mid X) \\ &= E[Y^{11} \mid c_1^+, c_2^+] - E[Y^{10} \mid c_1^+, c_2^+] - E[Y^{01} \mid c_1^+, c_2^+] + E[Y^{00} \mid c_1^+, c_2^+] \\ &= E[Y^{11} - Y^{10} - Y^{01} + Y^{00} \mid c_1^+, c_2^+] \end{aligned} \qquad (2.6)$$
If there are no partial effects ($E(Y^{10} \mid c_1^+, c_2^+) = E(Y^{01} \mid c_1^+, c_2^+) = E(Y^{00} \mid c_1^+, c_2^+)$), then $\beta_d = E[Y^{11} - Y^{00} \mid c_1^+, c_2^+]$.
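The double-difference structure of (2.6) can be illustrated with a simple local-constant version: average the outcome in each of the four quadrants around the cutoffs and take the difference in differences. This is only a sketch under stated assumptions, not the estimator used in the paper; the function names and the toy data are hypothetical, the cutoffs are normalized to zero, and no slopes in the scores are allowed for.

```python
# Sketch: sample analogue of the double difference in (2.6) using
# quadrant means within a small window around the cutoffs (both set to 0).
# Function names and data are hypothetical illustrations.

def quadrant_means(rows, h1, h2):
    """rows: iterable of (x1, x2, y). Returns the mean of y in each quadrant,
    keyed by '++', '+-', '-+', '--' (sign of x1, then sign of x2)."""
    sums, counts = {}, {}
    for x1, x2, y in rows:
        if abs(x1) >= h1 or abs(x2) >= h2:
            continue  # keep only local observations
        key = ("+" if x1 >= 0 else "-") + ("+" if x2 >= 0 else "-")
        sums[key] = sums.get(key, 0.0) + y
        counts[key] = counts.get(key, 0) + 1
    return {k: sums[k] / counts[k] for k in sums}


def double_difference(means):
    """Interaction effect net of the two partial effects.
    Raises KeyError if a quadrant has no local observations."""
    return means["++"] - means["+-"] - means["-+"] + means["--"]


# Toy data: partial effects 0.5 (score 1) and 0.2 (score 2), interaction 0.3.
rows = []
for x1 in (-0.5, -0.1, 0.1, 0.5):
    for x2 in (-0.5, -0.1, 0.1, 0.5):
        d1, d2 = float(x1 >= 0), float(x2 >= 0)
        rows.append((x1, x2, 1.0 + 0.5 * d1 + 0.2 * d2 + 0.3 * d1 * d2))

means = quadrant_means(rows, h1=1.0, h2=1.0)
beta_d = double_difference(means)
naive = means["++"] - means["--"]  # ignores partial effects
print(beta_d, naive)
```

In this toy example the naive contrast between the treated quadrant and the opposite quadrant mixes the interaction effect with both partial effects, while the double difference isolates the interaction.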
Partial effects are not suitable for certain MRD cases. For instance, in a geographic MRD setting with latitude and longitude as scores, crossing only one cutoff may not have any effect. But if there is a mountain range running from north to south (so that the longitude score matters), then a partial effect can exist because the weather on the right side of the mountain range can differ considerably from that on the left, for example by being rainier, hotter or drier.

² See appendix for the identification for fuzzy MRD.

Another example is the effect of Affirmative Action in Korea on the women employment rate. The targets of affirmative action are the companies that have more than 500 employees and fall below the women’s employment standard for their industry. Even so, companies with more than 500 employees can differ from companies with fewer than 500 employees: companies that have more than 500 employees and exceed the industry standard still have to submit the first-year survey used to calculate the women’s employment standard by industry, and such companies always have to consider the women’s employment rate when they hire new employees.

2.3.2 Estimation of MRD

While (2.6) demonstrates that $\beta_d$ can be estimated by substituting sample versions of the four identified elements, in practice it is more convenient to use MRD with (2.3), which requires only the local observations satisfying $X_j \in (-h_j, h_j)$, $j = 1, 2$. Use parameters $\beta_1$, $\beta_2$ and $\beta_d$ to obtain
$$E(Y \mid X) = E(Y^{00} \mid X) + \beta_1 \delta_1 + \beta_2 \delta_2 + \beta_d D \qquad (2.7)$$
where $E(Y^{00} \mid X)$ is specified as a linear or quadratic function. Inference can then be based on the standard OLS asymptotic variance estimator applied to (2.7).
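The local regression in (2.7) can be sketched as an OLS fit on observations inside the bandwidth window. The code below is an illustrative sketch rather than the paper's implementation: it assumes cutoffs normalized to zero, a linear baseline $E(Y^{00} \mid X)$ in $X_1$ and $X_2$, and hypothetical function names and toy data. It returns coefficients only and omits the variance estimator.

```python
# Sketch: OLS for E(Y|X) = b0 + g1*X1 + g2*X2 + beta1*delta1 + beta2*delta2 + beta_d*D
# on local observations with |X_j| < h_j (cutoffs normalized to 0).
# Names, the linear baseline and the data are assumptions for illustration.

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def mrd_ols(rows, h1, h2):
    """rows: (x1, x2, y). Returns coefficients ordered as
    [intercept, x1, x2, delta1, delta2, D] from the normal equations."""
    design, outcome = [], []
    for x1, x2, y in rows:
        if abs(x1) < h1 and abs(x2) < h2:
            d1, d2 = float(x1 >= 0), float(x2 >= 0)
            design.append([1.0, x1, x2, d1, d2, d1 * d2])
            outcome.append(y)
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, outcome)) for i in range(k)]
    return solve_linear(xtx, xty)


# Toy data generated exactly from the model, so OLS recovers the coefficients.
grid = [-0.75, -0.25, 0.25, 0.75]
rows = []
for x1 in grid:
    for x2 in grid:
        d1, d2 = float(x1 >= 0), float(x2 >= 0)
        y = 1.0 + 0.4 * x1 - 0.2 * x2 + 0.5 * d1 + 0.2 * d2 + 0.3 * d1 * d2
        rows.append((x1, x2, y))

coef = mrd_ols(rows, h1=1.0, h2=1.0)
print(coef)
```

The last three coefficients correspond to $\beta_1$, $\beta_2$ and $\beta_d$; a richer baseline can be accommodated by adding the corresponding columns to the design.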
With $\delta_j^- \equiv 1[-h_j < X_j < 0]$ and $\delta_j^+ \equiv 1[0 \le X_j < h_j]$, $j = 1, 2$, we can specify $E(Y^{00} \mid X)$ as a piecewise-linear function around the cutoff (set to 0):
$$\begin{aligned} E(Y^{00} \mid X) = \beta_0 &+ \beta_{11} \delta_1^- \delta_2^- X_1 + \beta_{12} \delta_1^- \delta_2^- X_2 + \beta_{21} \delta_1^- \delta_2^+ X_1 + \beta_{22} \delta_1^- \delta_2^+ X_2 \\ &+ \beta_{31} \delta_1^+ \delta_2^- X_1 + \beta_{32} \delta_1^+ \delta_2^- X_2 + \beta_{41} \delta_1^+ \delta_2^+ X_1 + \beta_{42} \delta_1^+ \delta_2^+ X_2 \end{aligned} \qquad (2.8)$$
There are several other approaches for MRD. Wong et al. (2013) presented four estimation procedures: the frontier, centering, univariate, and IV approaches. The IV approach needs a strong assumption because it requires continuity in the expected potential outcomes for all units at $c_1$ in order to estimate the complier average treatment effect at cutoff $c_2$. Because of this strong assumption, Wong et al. (2013) did not recommend it for applied practice. We therefore explain the other three approaches.

Frontier approach (weighted average of boundary effects) The treatment frontiers are defined as
$$F_1 \equiv \{(x_1, x_2) \mid x_1 = c_1 \text{ and } x_2 \ge c_2\} \quad \text{and} \quad F_2 \equiv \{(x_1, x_2) \mid x_1 \ge c_1 \text{ and } x_2 = c_2\} \qquad (2.9)$$
We need continuity assumptions for the conditional expectation functions of the potential outcomes. The continuity assumptions for $Y(0)$ are
$$\begin{aligned} \lim_{x_1 \to c_1^+} E[Y_i(0) \mid X_{1i} = x_1, X_{2i} \ge c_2] &= \lim_{x_1 \to c_1^-} E[Y_i(0) \mid X_{1i} = x_1, X_{2i} \ge c_2] \\ \lim_{x_2 \to c_2^+} E[Y_i(0) \mid X_{1i} \ge c_1, X_{2i} = x_2] &= \lim_{x_2 \to c_2^-} E[Y_i(0) \mid X_{1i} \ge c_1, X_{2i} = x_2] \end{aligned} \qquad (2.10)$$
and similarly for $Y(1)$. Let $G_i = Y_i(1) - Y_i(0)$ be the difference between potential outcomes, $g(x_1, x_2)$ the difference in expected potential outcomes, and $f(x_1, x_2)$ the joint density function of $(X_1, X_2)$. Under these assumptions, we can obtain the average treatment effect on each frontier. The two frontier effects are
$$\begin{aligned} \tau_1 \equiv E[G_i \mid (X_{1i}, X_{2i}) \in F_1] &= \frac{\int_{x_2 \ge c_2} g(c_1, x_2) f(c_1, x_2)\, dx_2}{\int_{x_2 \ge c_2} f(c_1, x_2)\, dx_2}, \\ \tau_2 \equiv E[G_i \mid (X_{1i}, X_{2i}) \in F_2] &= \frac{\int_{x_1 \ge c_1} g(x_1, c_2) f(x_1, c_2)\, dx_1}{\int_{x_1 \ge c_1} f(x_1, c_2)\, dx_1} \end{aligned} \qquad (2.11)$$
respectively.
The frontier average treatment effect $\tau_{MRD}$ is
$$\tau_{MRD} \equiv E[G_i \mid (X_{1i}, X_{2i}) \in F_1 \cup F_2] = w_1 \tau_1 + w_2 \tau_2,$$
where
$$w_1 \equiv \frac{\int_{x_2 \ge c_2} f(c_1, x_2)\, dx_2}{\int_{x_1 \ge c_1} f(x_1, c_2)\, dx_1 + \int_{x_2 \ge c_2} f(c_1, x_2)\, dx_2}, \qquad w_2 \equiv \frac{\int_{x_1 \ge c_1} f(x_1, c_2)\, dx_1}{\int_{x_1 \ge c_1} f(x_1, c_2)\, dx_1 + \int_{x_2 \ge c_2} f(c_1, x_2)\, dx_2} \qquad (2.12)$$
that is, a weighted average of the two frontier treatment effects. The advantage of the frontier approach is that it estimates the average treatment effect on each frontier and overall simultaneously. The disadvantage is that the approach relies on the strong assumption that the response surface and kernel density are correctly specified. Moreover, nonparametric estimation of densities and numerical integration are burdensome and computationally expensive (particularly if standard errors are bootstrapped).

Centering approach (minimum score) The centering approach collapses multiple assignment variables into a single assignment variable by applying a minimum function to the assignment variables. With each score centered at its cutoff, define the minimum score
$$S_m \equiv \min(X_1 - c_1, X_2 - c_2) \implies D = 1[0 \le S_m] \qquad (2.13)$$
to set up
$$E(Y \mid X) = \beta_0 + \beta^- S_m (1 - D) + \beta^+ S_m D + \beta_m D \qquad (2.14)$$
where $\beta_m$ is the treatment effect of interest. The advantage of the centering approach is that it generalizes well to MRDs with more than two assignment variables. The disadvantage is that collapsing multiple assignment variables into a single assignment variable leads to a more complex functional form, which tends to generate misspecification bias even if nonparametric regression methods are used. In addition, the centering approach cannot estimate the average treatment effects on each frontier, $\tau_1$ and $\tau_2$, but only $\tau_{MRD}$. Because the scores are compared directly in the minimum, the approach should also take the assignment variables’ units and scales into account.

Univariate approach (one-dimensional localization) The univariate approach uses a subpopulation to solve the dimensionality problem. In the subpopulation, one assignment variable is already greater than its cutoff. Define potential outcomes $(Y^{00}, Y^{10}, Y^{01}, Y^{11})$ when $(\delta_1, \delta_2) = (0,0), (1,0), (0,1), (1,1)$.
Let $\delta_1 = 1$ and $D = \delta_2$:
$$E(Y \mid X) = E(Y^{10} \mid X) + \{E(Y^{11} \mid X) - E(Y^{10} \mid X)\}\delta_2 \qquad (2.15)$$
$E(Y^{10} \mid X)$ is the baseline. We now treat $x_2$ as a canonical RD case with $x_1 \ge c_1$. Assume continuity near the cutoff:
$$E(Y^{10} \mid x_1, c_2^+) = E(Y^{10} \mid x_1, c_2^-) \quad \forall\, x_1 \ge c_1 \qquad (2.16)$$
The advantage of the univariate approach is that it is efficient, because using only one dimension yields larger control and treatment groups. The disadvantage is that it does not compute $\tau_{MRD}$ and that a bias arises if there is a partial effect.

The methods discussed above identify “boundary-specific” effects, which are then averaged with weights to eliminate partial effects (Imbens and Zajonc (2009), Wong et al. (2013) and Keele and Titiunik (2015)). Unlike these papers, some other papers allow for partial effects in MRD, as introduced in Papay et al. (2011), Reardon and Robinson (2012) and Choi and Lee (2018). The approaches introduced above serve as useful references for MRD with partial effects in the empirical analysis section.

2.3.3 Bandwidth Selection of MRD

Choosing optimal bandwidths $h$ in MRD is important. Following Imbens and Kalyanaraman (2012) for the optimal bandwidth in standard RD and Zajonc (2012) for the optimal bandwidth in MRD, we can derive a rule-of-thumb bandwidth selection rule. Two types of kernel neighborhood are used. With $\rho \equiv COR(X_1, X_2)$, $\sigma_j \equiv SD(X_j)$ and $\nu_j \equiv h_j / \sigma_j$ for $j = 1, 2$:

(i) square-neighborhood kernel:
$$K_h(X_1, X_2) = 1\left[\left|\frac{X_1}{\nu_1 \sigma_1}\right| \le 1\right] 1\left[\left|\frac{X_2}{\nu_2 \sigma_2}\right| \le 1\right]$$
(ii) oval-neighborhood kernel:
$$K_h(X_1, X_2) = 1\left[\left(\frac{X_1}{\nu_1 \sigma_1}\right)^2 - 2\rho\, \frac{X_1}{\nu_1 \sigma_1} \frac{X_2}{\nu_2 \sigma_2} + \left(\frac{X_2}{\nu_2 \sigma_2}\right)^2 \le 1\right]$$
The optimal plug-in bandwidth for MRD at $x = (x_1, x_2)$ is
$$h_{opt}(x) \equiv 4.75 \left( \frac{v(x) / (f(x)\, \sigma_1 \sigma_2)}{\left( C_1 \sigma_1^2 [m_0^{11}(x) - m_1^{11}(x)] + C_2 \sigma_2^2 [m_0^{22}(x) - m_1^{22}(x)] \right)^2} \right)^{1/6} n^{-1/6} \qquad (2.17)$$
where $v(x)$ is the conditional outcome variance, $f(x)$ the density, $\sigma_1$ and $\sigma_2$ the standard deviations of the scores, and $m_0^{11}(x)$, $m_1^{11}(x)$, $m_0^{22}(x)$ and $m_1^{22}(x)$ the four second derivatives. Since the denominator may be close to zero, the optimal bandwidth can be unstable.
This instability occurs because the first-order asymptotic biases may cancel, leaving only the asymptotic variance, which is minimized by an infinite bandwidth. Unlike the standard RD setting in Imbens and Kalyanaraman (2012), bias cancellation here arises not only from the RD setup but also from the multivariate setting. There is a single optimal bandwidth in the standard RD setting, but the optimal bandwidth (2.17) varies with the position on the boundary.

There are four steps in the rule-of-thumb bandwidth selection rule of Zajonc (2012).
1. Estimate the conditional outcome variance $v(x)$, the density $f(x)$, and the standard deviations $\sigma_1$ and $\sigma_2$.
2. Estimate the four second derivatives $m_0^{11}(x)$, $m_1^{11}(x)$, $m_0^{22}(x)$, and $m_1^{22}(x)$.
3. Calculate $\hat h_{opt}(x_k)$ for $K$ evenly spaced points.
4. Select the minimum $\hat h_{opt}(x_k)$ as the rule-of-thumb bandwidth $h_{ROT}$.

Step 1: Estimate the conditional outcome variance $v(x)$, the density $f(x)$, and the standard deviations $\sigma_1$ and $\sigma_2$
Calculate the standard deviations of the scores, $\hat\sigma_1$ and $\hat\sigma_2$. For temporary bandwidths, use Scott’s rule with $d = 2$: for $j = 1, 2$,
$$\hat h_j = \hat\sigma_j n^{-1/6}, \qquad (2.18)$$
which is roughly optimal under a normal kernel and multivariate normal data. Denote the sets of treated and untreated units within the uniform kernel by $H_d \equiv \{i : |X_{1i} - x_1| \le h_1, |X_{2i} - x_2| \le h_2, D_i = d\}$ for $d \in \{0, 1\}$. Further, let $N_d \equiv \sum_{i \in H_d} 1$ be the number of units in $H_d$. The density $f(x)$ at $(x_1, x_2)$ is estimated as
$$\hat f(x_1, x_2) = \frac{N_0 + N_1}{N h_1 h_2} \qquad (2.19)$$
and the conditional variance $v(x)$ at $(x_1, x_2)$ as
$$\hat v(x_1, x_2) = \frac{1}{N_0 + N_1} \left( \sum_{i \in H_0} \Big( Y_i - \frac{1}{N_0} \sum_{j \in H_0} Y_j \Big)^2 + \sum_{i \in H_1} \Big( Y_i - \frac{1}{N_1} \sum_{j \in H_1} Y_j \Big)^2 \right) \qquad (2.20)$$

Step 2: Estimate the four second derivatives $m_0^{11}(x)$, $m_1^{11}(x)$, $m_0^{22}(x)$, and $m_1^{22}(x)$
With a simple rule-of-thumb temporary bandwidth $h_{j,d} = 4^{1/4} \sigma_j n_d^{-1/8}$, bandwidth matrix $H_d^{1/2} = \mathrm{diag}[h_{1,d}, h_{2,d}]$ and a uniform kernel, the local quadratic regression is
$$Y_i = \gamma_0 + \gamma_1 (X_1 - x_1) + \gamma_2 (X_2 - x_2) + \gamma_3 (X_1 - x_1)^2 + \gamma_4 (X_2 - x_2)^2 + \gamma_5 (X_1 - x_1)(X_2 - x_2) + v_i \qquad (2.21)$$
for units with $D_i = 1$ and, separately, for units with $D_i = 0$. Using the coefficients in (2.21) for $D_i = d$, we can calculate estimates of the second derivatives as $\hat m_d^{11}(x) = 2\hat\gamma_3$ and $\hat m_d^{22}(x) = 2\hat\gamma_4$.

Step 3: Calculate $\hat h_{opt}(x_k)$ for $K$ evenly spaced points
Choose $K$ points along the boundary. Plugging the estimates calculated in Steps 1 and 2 into the optimal bandwidth (2.17) produces $\hat h_{opt}(x_k)$ for each point on the boundary.

Step 4: Select the rule-of-thumb bandwidth $h_{ROT}$
Select the minimum $\hat h_{opt}(x_k)$ as the final rule-of-thumb bandwidth $h_{ROT}$.

2.4 MRD analysis of Affirmative Action

This section applies the multiple-score Regression Discontinuity design developed above to measure the effect of the Affirmative Action policy on the women employment rate in Korea.

2.4.1 Data

The data for the empirical analysis come from the Workplace Panel Survey (WPS) for 2009, 2011, 2013 and 2015.³ A detailed description of the variables is given in Table B.1 in the Appendix. Table 2.1 presents descriptive statistics of the final sample. The final sample has 3266 observations. On average, companies have around 327.5 employees, and the size cutoff is 500. We therefore center the assignment variable ($X_{size}$) around the cutoff ($S_{size} = X_{size} - 500$). 19.6% of the companies have more than 500 employees. The women employment rate (WER) is around 0.289 on average, and the cutoffs vary by industry and company size (e.g. the 2015 cutoffs in Table B.3). We center the assignment variable ($X_{rate}$) around the cutoffs ($S_{rate} = X_{rate} - \text{cutoffs}$).
32.3% of the companies do not satisfy the women employment rate cutoffs. Companies that get treatment (Affirmative Action) are only 9.5% of the sample. The dependent variable $Y$ is the change in the women employment rate after Affirmative Action. Its mean is positive (0.003), which means that overall the women employment rate two years later is greater than the current women employment rate. 17.1% of the companies provide childcare support, and 88.6% of the companies provide parental leave.

³ 2005 and 2007 were the initial stage of Affirmative Action in Korea, so some changes are hard to control for together with the more recent data. For example, 2005 and 2007 have different cutoffs for company size. These years are therefore eliminated from the sample.

Table 2.1: Descriptive Statistics of All sample

| Variable | Explanation | Mean (SD) | Min | Max |
|---|---|---|---|---|
| $X_{size}$ | Total number of employees (size) | 327.5 (352.5) | 50 | 1998 |
| $S_{size}$ | Normalized $X_{size}$ | -172.5 (352.5) | -450 | 1498 |
| $\delta_{size}$ | Indicator-size | 0.196 (0.397) | 0 | 1 |
| $X_{rate}$ | Women Employment Rate (WER, rate) | 0.289 (0.240) | 0 | 1 |
| $S_{rate}$ | Normalized $X_{rate}$ | 0.114 (0.180) | -0.255 | 0.828 |
| $\delta_{rate}$ | Indicator-rate | 0.323 (0.468) | 0 | 1 |
| $D$ | Treatment | 0.095 (0.293) | 0 | 1 |
| $Y$ | WER Changes after AA (2 years) | 0.0031 (0.070) | -0.431 | 0.805 |
| Child | Childcare Support | 0.171 (0.377) | 0 | 1 |
| Parent | Parental Leave | 0.886 (0.318) | 0 | 1 |
| N | Observations | 3266 | | |

2.4.2 Model

[Figure 2.3: Affirmative Action’s Treatment and Control regions]

Figure 2.3 illustrates Affirmative Action’s treatment and control regions. Section 2.2 described the procedure of Affirmative Action in Korea. Companies that have more than 500 employees advance to the first stage. Therefore, the control 1 and treatment regions are the first targets after the first stage of AA. After that, based on industry and company size (500-999 or more than 1000), each company faces different criteria (cutoffs), which determine the final targets of Affirmative Action.
Companies in the treatment region have a women employment rate below the criterion, while companies in the control 1 region have a rate above it. Because this policy has multiple stages and criteria, we can analyze each step. We deal with this in the empirical results section (Section 2.4.3) based on canonical RD before handling the MRD problem.

Let $X_{Size,i,t}$ be a variable that measures the size of company $i$ in year $t$; let $X_{Rate,i,t}$ be a variable that measures the women’s employment rate of company $i$ in year $t$; and let $c_{Rate,Ind,Size,t}$ be the cutoff based on the average women’s employment rate of companies in the same industry and of similar size in year $t$. $\delta_{Size,i,t}$ is a dummy variable that takes a value of one if the size of company $i$ is at or above the size cutoff in year $t$:
$$\delta_{Size,i,t} = 1[X_{Size,i,t} \ge 500] \qquad (2.22)$$
$\delta_{Rate,i,t}$ is a dummy variable that takes a value of one if the women’s employment rate of company $i$ is at or below the women employment rate cutoff:
$$\delta_{Rate,i,t} = 1[X_{Rate,i,t} \le c_{Rate,Ind,Size,t}] \qquad (2.23)$$
This equation has multiple cutoffs. Cattaneo et al. (2016) dealt with the multiple-cutoffs problem, and a simple solution is the normalizing-and-pooling approach.⁴ $D_{i,t}$ is a treatment dummy variable that takes a value of one if both dummy variables $\delta_{Rate,i,t}$ and $\delta_{Size,i,t}$ take a value of one:
$$D_{i,t} = \delta_{Rate,i,t}\, \delta_{Size,i,t} \implies \text{Multiple scores} \qquad (2.24)$$
Implementing AA can affect companies in various ways. First, it has a direct effect, which is the main reason why the AA policy was implemented. The direct effect is the effect on the women employment rate, because AA is designed to increase the women employment rate in order to reduce gender discrimination. Second, it has an indirect effect, which is not the main reason the AA policy was introduced but an additional result of AA. An example of an indirect effect is the effect on corporate performance, such as the sales and profit of companies. For companies, corporate performance is more important than any other result. Compliance of targeted companies with AA therefore depends on the indirect effect of AA.
Exploring the indirect effect is also an interesting research question, but we do not deal with it in this paper. We use a lagged variable because the women's employment rate of firm $i$ in year $t$ is related to its women's employment rate in year $t+1$, and treatment in year $t$ affects the dependent variable in year $t+1$. There is no reason to consider any partial effect involving the female worker proportion, because there is simply no basis to believe that this variable alone affects anything. However, there is good reason to suspect a partial effect involving firm size, because firm size affects many outcomes such as work hours and wages (e.g., Kim and Lee 2019). In AA in particular, firm size has an effect because targeted companies in the first stage must submit an initial report detailing their male and female employee counts by job and rank. This means that companies have to manage their women's employment rate: even if they pass the criterion this year, they may not pass next year. So, the model is

$$Y_{i,t} = \beta_S \delta_{Size,i,t} + \beta_R \delta_{Rate,i,t} + \beta_D D_{i,t} + \beta_Z' Z_{i,t} + \varepsilon_{i,t},$$
$$D_{i,t} = \delta_{Size,i,t}\,\delta_{Rate,i,t} = 1[X_{Size,i,t} \geq 500]\; 1[X_{Rate,i,t} \leq c_{Rate,Ind,Size,t}] \qquad (2.25)$$

where $Y_{i,t}$ is a dependent variable that measures the change in the women's employment rate of firm $i$ in year $t$; $Z_{i,t}$ is a vector of firm characteristics such as industry, region, childcare and parental leave; and $\varepsilon_{i,t}$ is an error term. The first line in (2.25) is a standard regression model describing the causal impacts of $\delta_{Size,i,t}$, $\delta_{Rate,i,t}$, $D_{i,t}$ and $Z_{i,t}$ on $Y_{i,t}$. The main problem is that elements of $Z_{i,t}$ may be unobservable to the researcher, so OLS will suffer from omitted variable bias, since the omitted elements of $Z_{i,t}$ affect $Y_{i,t}$ and may be correlated with $D_{i,t}$. This is why a simple comparison is likely to be biased. But an RD can plausibly be used here.

Footnote 4: Although the cutoff varies across individuals, this can be easily accommodated as long as the cutoff is observed.
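As a minimal sketch of how the indicators in (2.22)-(2.24) combine into the treatment dummy of (2.25), the snippet below builds them for a few hypothetical firms. The function name `treatment_indicators`, the toy values, and the weak-inequality conventions (size at or above 500, rate at or below the cutoff) are assumptions made for illustration; this is not the code used for the estimates reported here.

```python
import numpy as np

def treatment_indicators(x_size, x_rate, c_rate, size_cutoff=500):
    """Return (delta_size, delta_rate, D) for arrays of firm-year observations."""
    x_size = np.asarray(x_size, dtype=float)
    x_rate = np.asarray(x_rate, dtype=float)
    c_rate = np.asarray(c_rate, dtype=float)
    delta_size = (x_size >= size_cutoff).astype(int)  # assumed: at or above 500
    delta_rate = (x_rate <= c_rate).astype(int)       # assumed: at or below cutoff
    d = delta_size * delta_rate                       # treated only if both hold
    return delta_size, delta_rate, d

# Hypothetical firms: employees, women's employment rate, industry-size cutoff.
x_size = [820, 820, 310, 310]
x_rate = [0.10, 0.35, 0.10, 0.35]
c_rate = [0.20, 0.20, 0.20, 0.20]
delta_size, delta_rate, d = treatment_indicators(x_size, x_rate, c_rate)

# Normalized scores (cutoff moved to zero), as in normalizing-and-pooling.
s_size = np.asarray(x_size) - 500
s_rate = np.asarray(x_rate) - np.asarray(c_rate)
```

Only the first hypothetical firm is both large and below its rate cutoff, so it is the only one with D equal to one; the other three correspond to the three control regions.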
If the continuity assumption holds, we can obtain the average treatment effect at the cutoff. To adjust inference, we can add covariates $Z_{i,t}$ to the RD analysis. We apply this in the canonical RD empirical results section. Adding covariates to MRD is also an interesting research question, but we do not pursue it in this paper.

2.4.3 Empirical Results

2.4.3.1 Canonical RD Empirical Results for full sample

In Affirmative Action in Korea, we have two scores, and an RD treatment effect can be obtained for each score. Therefore, we can analyze the effects of the two AA policy criteria on the women's employment rate. This is illustrated in Figure 2.4.

[Figure 2.4: Two RD effects. Panels: (a) Score: Women Employment Rate; (b) Score: Company size.]

Women Employment Rate, $S_{rate}$

Table 2.2: RD analysis for $S_{rate}$

| | Coef. | 95% C.I. | Bandwidth (h) | Eff. obs | Number of obs |
|---|---|---|---|---|---|
| - | 0.008 | [-0.002, 0.019] | 0.089 | [824, 708] | [1054, 2212] |
| Covariates | 0.006 | [-0.004, 0.017] | 0.092 | [840, 733] | [1054, 2212] |
| Clustering (reg) | 0.008 | [0.001, 0.017] | 0.090 | [828, 718] | [1054, 2212] |
| Clustering (ind) | 0.007 | [-0.005, 0.020] | 0.101 | [869, 795] | [1054, 2212] |
| Cov and Clust (reg) | 0.006 | [-0.003, 0.016] | 0.102 | [872, 801] | [1054, 2212] |
| Cov and Clust (ind) | 0.007 | [-0.004, 0.018] | 0.097 | [853, 766] | [1054, 2212] |

[Figure 2.5: Scatter plot and RD plot for $S_{rate}$. Panels: (a) Scatter plot; (b) RD plot. Axes: Score (horizontal), Outcome (vertical).]

In the raw scatter plot we cannot detect any RD effect, but the RD plot reveals a negative jump (the left region is the treatment region). The results of RD estimation and inference are in Table 2.2. The RD analysis treats the left region as the control region, so the sign of the coefficients is opposite to that of the actual treatment effect. We accommodated additional covariates (year, childcare, parental leave, company size) in the model specification, as well as clustering (by region or industry) of observations.
There is a significant effect with regional clustering.

[Figure 2.6: Graphical Illustration of Local Linear RD Effects for Covariates - $S_{rate}$. Panels: (a) Company size; (b) Childcare; (c) Parental leave. Horizontal axis: Score.]

Falsification tests are essential steps of the RD design. The tests assess the validity of the RD assumptions and provide empirical support for the feasibility of the RD design. Figure 2.6 and the estimated effects for the covariates (Coefficient (p-value): -0.850 (.937) for Company size, 0.017 (.496) for Childcare and 0.023 (.861) for Parental leave) show no evidence that the treatment and control groups differ systematically near the cutoff.

[Figure 2.7: Histogram and Estimated Density of the score - $S_{rate}$. Panels: (a) Histogram (Score vs. Number of Observations); (b) Estimated Density (Score vs. Density; Control, Treatment).]

Figure 2.7 provides a histogram and an estimated density of the score with shaded 95% confidence intervals. As demonstrated in Figure 2.7, the density estimates near the cutoff are very close and the confidence intervals overlap. The value of the density test statistic is -0.2697 and the associated p-value is 0.7874. Hence we fail to reject the null hypothesis that there is no difference in the density of the control and treatment groups at the cutoff.

Company size, $S_{size}$

Table 2.3: RD analysis for $S_{size}$

| | Coef. | 95% C.I. | Bandwidth (h) | Eff. obs | Number of obs |
|---|---|---|---|---|---|
| - | 0.016 | [-0.006, 0.044] | 138.1 | [330, 189] | [2624, 642] |
| Covariates | 0.015 | [-0.006, 0.044] | 142.3 | [341, 191] | [2624, 642] |
| Clustering (reg) | 0.017 | [-0.003, 0.043] | 128.8 | [307, 179] | [2624, 642] |
| Clustering (ind) | 0.017 | [-0.006, 0.048] | 112.5 | [257, 158] | [2624, 642] |
| Cov and Clust (reg) | 0.016 | [-0.003, 0.040] | 137.9 | [326, 189] | [2624, 642] |
| Cov and Clust (ind) | 0.017 | [-0.007, 0.049] | 120.8 | [282, 170] | [2624, 642] |

[Figure 2.8: Scatter plot and RD plot for $S_{size}$. Panels: (a) Scatter plot; (b) RD plot. Axes: Score (horizontal), Outcome (vertical).]

In the raw scatter plot we cannot detect any RD effect, but the RD plot reveals a positive jump. The results of RD estimation and inference are in Table 2.3. Most specifications show no significant effect, but those that use regional clustering show a significant positive effect if we consider the 90% confidence interval.

[Figure 2.9: Graphical Illustration of Local Linear RD Effects for Covariates - $S_{size}$. Panels: (a) Women employment rate; (b) Childcare; (c) Parental leave. Horizontal axis: Score.]

Figure 2.9 and the estimated effects for the covariates (Coefficient (p-value): 0.093 (.053) for Women employment rate, 0.123 (.137) for Childcare and 0.004 (.794) for Parental leave) suggest a difference, significant at the 10% level, in the women's employment rate between the treatment and control groups near the cutoff. There are two possible interpretations: bigger companies, which have more than 500 employees, are more likely to hire more women to achieve a higher women's employment rate; or, since bigger companies have more facilities and policies for women, women tend to work in bigger companies.
[Figure 2.10: Histogram and Estimated Density of the score - $S_{size}$. Panels: (a) Histogram (Score vs. Number of Observations); (b) Estimated Density (Score vs. Density; Control, Treatment).]

As we can see in Figure 2.10, the density estimates near the cutoff are very close and the confidence intervals overlap. The value of the density test statistic is -0.3412 and the associated p-value is 0.7329. Hence we fail to reject the null hypothesis that there is no difference in the density of the control and treatment groups at the cutoff.

2.4.3.2 Canonical RD Empirical Results for subgroups

From Figure 2.3, we can obtain four subgroups for canonical RD analysis. The first subgroup consists of the target companies in the first stage. Figure 2.11 illustrates this subgroup. Using this subgroup, we can analyze the effect of the actual AA policy on the women's employment rate among the companies in the first stage. We consider the other subgroups (Figure 2.12) for comparison.

[Figure 2.11: Subgroup 1]

[Figure 2.12: Subgroups. Panels: (a) Subgroup 2; (b) Subgroup 3; (c) Subgroup 4.]

Table 2.4: Descriptive Statistics of subgroups (Mean (SD))

| Variable | All | Subgroup 1 | Subgroup 2 | Subgroup 3 | Subgroup 4 |
|---|---|---|---|---|---|
| $X_{size}$ | 327.5 (352.5) | 916.4 (373.1) | 310.9 (332.0) | 335.4 (361.7) | 183.6 (123.4) |
| $S_{size}$ | -172.5 (352.5) | 416.4 (373.1) | -189.1 (332.0) | -164.6 (361.7) | -316.3 (123.4) |
| $\delta_{size}$ | 0.196 (0.397) | 1 (0) | 0.171 (0.376) | 0.208 (0.406) | 0 (0) |
| $X_{rate}$ | 0.289 (0.240) | 0.305 (0.252) | 0.079 (0.067) | 0.389 (0.228) | 0.285 (0.237) |
| $S_{rate}$ | 0.114 (0.180) | 0.120 (0.165) | -0.057 (0.053) | 0.196 (0.161) | 0.113 (0.184) |
| $\delta_{rate}$ | 0.323 (0.468) | 0.281 (0.450) | 1 (0) | 0 (0) | 0.333 (0.471) |
| $D$ | 0.095 (0.293) | 0.281 (0.450) | 0.171 (0.376) | 0 (0) | 0 (0) |
| $Y$ | 0.0031 (0.070) | 0.0016 (0.053) | 0.013 (0.053) | -0.0014 (0.077) | 0.0034 (0.074) |
| Child | 0.171 (0.377) | 0.349 (0.477) | 0.143 (0.351) | 0.185 (0.388) | 0.128 (0.334) |
| Parent | 0.886 (0.318) | 0.944 (0.230) | 0.811 (0.392) | 0.921 (0.269) | 0.872 (0.334) |
| N | 3266 | 641 | 1054 | 2212 | 2625 |
Table 2.4 presents descriptive statistics of all subgroups. The assignment and indicator variables differ markedly across subgroups because the subgroups are defined by the assignment variables. In subgroup 3, the mean of the dependent variable is negative rather than positive. One interpretation is that companies that have already hired enough women are free in their choice of new employees and therefore hired more men (negative dependent variable). Bigger companies have more childcare facilities (34.9%) than smaller companies (12.8%). Installing childcare facilities in a company requires a lot of resources, and small companies have difficulty investing resources in amenities, so bigger companies have more childcare facilities. Parental leave, however, requires fewer resources than childcare facilities, so it does not differ much among subgroups.

Subgroup 1: $S_{rate}$

Table 2.5: RD analysis in subgroup 1

| | Coef. | 95% C.I. | Bandwidth (h) | Eff. obs | Number of obs |
|---|---|---|---|---|---|
| - | 0.007 | [-0.017, 0.041] | 0.083 | [147, 149] | [180, 461] |
| Covariates | 0.005 | [-0.018, 0.028] | 0.101 | [153, 167] | [180, 461] |
| Clustering (reg) | 0.009 | [-0.012, 0.032] | 0.068 | [133, 122] | [180, 461] |
| Clustering (ind) | 0.008 | [-0.017, 0.033] | 0.083 | [147, 149] | [180, 461] |
| Cov and Clust (reg) | 0.008 | [-0.012, 0.029] | 0.075 | [139, 132] | [180, 461] |
| Cov and Clust (ind) | 0.007 | [-0.014, 0.025] | 0.140 | [163, 207] | [180, 461] |

[Figure 2.13: Scatter plot and RD plot for Subgroup 1. Panels: (a) Scatter plot; (b) RD plot. Axes: Score (horizontal), Outcome (vertical).]

In the raw scatter plot we cannot detect any RD effect, but the RD plot reveals a negative jump (the left region is the treatment region). The results of RD estimation and inference are in Table 2.5. The RD analysis treats the left region as the control region, so the sign of the coefficients is opposite to that of the actual treatment effect.
However, no specification yields a significant effect.

[Figure 2.14: Graphical Illustration of Local Linear RD Effects for Covariates - Subgroup 1. Panels: (a) Company size; (b) Childcare; (c) Parental leave. Horizontal axis: Score.]

Figure 2.14 and the estimated effects for the covariates (Coefficient (p-value): 17.4 (.831) for Company size, 0.031 (.491) for Childcare and 0.024 (.915) for Parental leave) show no evidence that the treatment and control groups differ systematically near the cutoff.

[Figure 2.15: Histogram and Estimated Density of the score - Subgroup 1. Panels: (a) Histogram (Score vs. Number of Observations); (b) Estimated Density (Score vs. Density; Control, Treatment).]

Figure 2.15 provides a histogram and an estimated density of the score with shaded 95% confidence intervals. As we can see in Figure 2.15, the density estimates near the cutoff are very close and the confidence intervals overlap. The value of the density test statistic is -0.9521 and the associated p-value is 0.3411. Hence we fail to reject the null hypothesis that there is no difference in the density of the control and treatment groups at the cutoff.

Subgroup 2: $S_{size}$

Table 2.6: RD analysis in subgroup 2

| | Coef. | 95% C.I. | Bandwidth (h) | Eff. obs | Number of obs |
|---|---|---|---|---|---|
| - | 0.020 | [-0.003, 0.052] | 162.4 | [136, 53] | [874, 180] |
| Covariates | 0.016 | [-0.002, 0.039] | 191.8 | [166, 61] | [874, 180] |
| Clustering (reg) | 0.016 | [-0.008, 0.047] | 198.8 | [172, 62] | [874, 180] |
| Clustering (ind) | 0.020 | [-0.011, 0.061] | 164.1 | [138, 53] | [874, 180] |
| Cov and Clust (reg) | 0.018 | [0.001, 0.039] | 172.4 | [147, 54] | [874, 180] |
| Cov and Clust (ind) | 0.019 | [-0.006, 0.052] | 168.9 | [142, 53] | [874, 180] |

[Figure 2.16: Scatter plot and RD plot for Subgroup 2. Panels: (a) Scatter plot; (b) RD plot. Axes: Score (horizontal), Outcome (vertical).]

In the raw scatter plot we cannot detect any RD effect, but the RD plot reveals a positive jump. The results of RD estimation and inference are in Table 2.6. Most specifications show no significant effect, but the specification that adds covariates and regional clustering shows a significant positive effect. Some other specifications also show a significant positive effect if we consider the 90% confidence interval.

[Figure 2.17: Graphical Illustration of Local Linear RD Effects for Covariates - Subgroup 2. Panels: (a) Women employment rate; (b) Childcare; (c) Parental leave. Horizontal axis: Score.]

Figure 2.17 and the estimated effects for the covariates (Coefficient (p-value): 0.01 (.873) for Women employment rate, 0.114 (.355) for Childcare and -0.124 (.336) for Parental leave) show no evidence that the treatment and control groups differ systematically near the cutoff.

[Figure 2.18: Histogram and Estimated Density of the score - Subgroup 2. Panels: (a) Histogram (Score vs. Number of Observations); (b) Estimated Density (Score vs. Density; Control, Treatment).]

As we can see in Figure 2.18, the density estimates near the cutoff are very close and the confidence intervals overlap.
The value of the density test statistic is -0.4146 and the associated p-value is 0.6784. Hence we fail to reject the null hypothesis that there is no difference in the density of the control and treatment groups at the cutoff.

Subgroup 3: $S_{size}$

Table 2.7: RD analysis in subgroup 3

| | Coef. | 95% C.I. | Bandwidth (h) | Eff. obs | Number of obs |
|---|---|---|---|---|---|
| - | 0.018 | [-0.016, 0.058] | 135.7 | [208, 139] | [1750, 462] |
| Covariates | 0.015 | [-0.017, 0.055] | 139.2 | [213, 140] | [1750, 462] |
| Clustering (reg) | 0.019 | [-0.010, 0.053] | 123.0 | [190, 131] | [1750, 462] |
| Clustering (ind) | 0.019 | [-0.012, 0.057] | 121.8 | [181, 128] | [1750, 462] |
| Cov and Clust (reg) | 0.017 | [-0.009, 0.049] | 130.2 | [202, 138] | [1750, 462] |
| Cov and Clust (ind) | 0.017 | [-0.011, 0.053] | 126.8 | [194, 133] | [1750, 462] |

[Figure 2.19: Scatter plot and RD plot for Subgroup 3. Panels: (a) Scatter plot; (b) RD plot. Axes: Score (horizontal), Outcome (vertical).]

In the raw scatter plot we cannot detect any RD effect, but the RD plot reveals a positive jump. The results of RD estimation and inference are in Table 2.7. There is no significant RD effect in subgroup 3.

[Figure 2.20: Graphical Illustration of Local Linear RD Effects for Covariates - Subgroup 3. Panels: (a) Women employment rate; (b) Childcare; (c) Parental leave. Horizontal axis: Score.]

Figure 2.20 and the estimated effects for the covariates (Coefficient (p-value): 0.034 (.539) for Women employment rate, 0.065 (.454) for Childcare and 0.023 (.602) for Parental leave) show no evidence that the treatment and control groups differ systematically near the cutoff.
[Figure 2.21: Histogram and Estimated Density of the score - Subgroup 3. Panels: (a) Histogram (Score vs. Number of Observations); (b) Estimated Density (Score vs. Density; Control, Treatment).]

As we can see in Figure 2.21, the density estimates near the cutoff are very close and the confidence intervals overlap. The value of the density test statistic is 0.5564 and the associated p-value is 0.5779. Hence we fail to reject the null hypothesis that there is no difference in the density of the control and treatment groups at the cutoff.

Subgroup 4: $S_{rate}$

Table 2.8: RD analysis in subgroup 4

| | Coef. | 95% C.I. | Bandwidth (h) | Eff. obs | Number of obs |
|---|---|---|---|---|---|
| - | 0.009 | [-0.002, 0.022] | 0.068 | [584, 456] | [874, 1751] |
| Covariates | 0.007 | [-0.003, 0.020] | 0.068 | [586, 456] | [874, 1751] |
| Clustering (reg) | 0.009 | [-0.000, 0.021] | 0.074 | [616, 484] | [874, 1751] |
| Clustering (ind) | 0.009 | [-0.004, 0.025] | 0.070 | [593, 463] | [874, 1751] |
| Cov and Clust (reg) | 0.007 | [-0.004, 0.021] | 0.093 | [688, 575] | [874, 1751] |
| Cov and Clust (ind) | 0.007 | [-0.004, 0.020] | 0.066 | [574, 450] | [874, 1751] |

[Figure 2.22: Scatter plot and RD plot for Subgroup 4. Panels: (a) Scatter plot; (b) RD plot. Axes: Score (horizontal), Outcome (vertical).]

In the raw scatter plot we cannot detect any RD effect, but the RD plot reveals a negative jump (the left region is the treatment region). The results of RD estimation and inference are in Table 2.8. The RD analysis treats the left region as the control region, so the sign of the coefficients is opposite to that of the actual treatment effect. Most specifications show no significant effect at the 95% confidence level, but some show a significant negative effect if we consider the 90% confidence interval.
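The sign convention used in these rate-score analyses (treatment region on the left, estimator reporting right minus left) can be illustrated with a small simulation. This is a minimal sketch assuming a sharp design, a triangular kernel and a fixed, hand-picked bandwidth; the function `local_linear_rd` and all numbers are hypothetical, and the sketch does not reproduce the bias-corrected inference or data-driven bandwidths behind the tables.

```python
import numpy as np

def local_linear_rd(score, y, h):
    """Sharp RD jump at score = 0 via triangular-kernel weighted least squares."""
    score = np.asarray(score, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = np.abs(score) < h
    s, yy = score[keep], y[keep]
    w = 1.0 - np.abs(s) / h                 # triangular kernel weights
    right = (s >= 0).astype(float)          # 1 on the right of the cutoff
    # Separate intercept and slope on each side of the cutoff.
    X = np.column_stack([np.ones_like(s), right, s, right * s])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], yy * sw, rcond=None)
    return coef[1]                          # right-minus-left jump at zero

rng = np.random.default_rng(0)
n = 4000
score = rng.uniform(-0.3, 0.3, n)
treated = (score < 0).astype(float)         # treatment region is on the left
y = 0.01 + 0.05 * score + 0.02 * treated + rng.normal(0, 0.02, n)

jump = local_linear_rd(score, y, h=0.1)     # right minus left
effect = -jump                              # flip sign: treated minus control
```

Because the simulated treated firms sit to the left of the cutoff, the reported jump has the opposite sign of the simulated treatment effect, and negating it recovers the effect up to sampling noise.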
[Figure 2.23: Graphical Illustration of Local Linear RD Effects for Covariates - Subgroup 4. Panels: (a) Company size; (b) Childcare; (c) Parental leave. Horizontal axis: Score.]

Figure 2.23 and the estimated effects for the covariates (Coefficient (p-value): -0.832 (.959) for Company size, 0.025 (.562) for Childcare and 0.037 (.966) for Parental leave) show no evidence that the treatment and control groups differ systematically near the cutoff.

[Figure 2.24: Histogram and Estimated Density of the score - Subgroup 4. Panels: (a) Histogram (Score vs. Number of Observations); (b) Estimated Density (Score vs. Density; Control, Treatment).]

As we can see in Figure 2.24, the density estimates near the cutoff are very close and the confidence intervals overlap. The value of the density test statistic is 0.1615 and the associated p-value is 0.8717. Hence we fail to reject the null hypothesis that there is no difference in the density of the control and treatment groups at the cutoff.

Across all canonical RD analyses, except for a few specifications, most cases show no significant RD effect. One explanation is that the policy depends on the voluntary participation of companies, which submit their own implementation plans; it is therefore hard to control what they actually implement. Another explanation is that non-targeted companies may be making their own efforts so as not to be targeted in the next period.

2.4.3.3 MRD Empirical Results

Optimal Bandwidths for MRD

In this section, to simplify the notation, we denote size = 1 and rate = 2 (e.g., $X_{size} = X_1$, $X_{rate} = X_2$). For the bandwidths $h = (h_1, h_2)'$, we first use Scott's rule $SD(S_j) N^{-1/6}$ for $j = 1, 2$, and then the optimal bandwidths described in Figure 2.25.
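As a rough illustration of the rule-of-thumb step, the sketch below computes Scott's rule from the standard deviations of the two scores and N reported in Table 2.4, and then selects local observations with a square and an oval neighborhood. The function names are hypothetical, the scores are simulated rather than taken from the data, and the exact neighborhood definitions (a rectangle with half-widths $h_1, h_2$ and the ellipse inscribed in it) are assumptions for illustration.

```python
import numpy as np

def scott_bandwidths(sd_scores, n):
    """Rule-of-thumb bandwidth SD(S_j) * N^(-1/6) for each of two scores."""
    return np.asarray(sd_scores, dtype=float) * n ** (-1.0 / 6.0)

def square_neighborhood(s1, s2, h1, h2):
    """Observations with |S1| < h1 and |S2| < h2 (a rectangle around the cutoff)."""
    return (np.abs(s1) < h1) & (np.abs(s2) < h2)

def oval_neighborhood(s1, s2, h1, h2):
    """Observations inside the ellipse (S1/h1)^2 + (S2/h2)^2 < 1."""
    return (s1 / h1) ** 2 + (s2 / h2) ** 2 < 1.0

# SD(S_size), SD(S_rate) and N as reported in Table 2.4.
h1, h2 = scott_bandwidths([352.5, 0.180], n=3266)

# Simulated scores (hypothetical data, only to exercise the functions).
rng = np.random.default_rng(1)
s1 = rng.normal(-172.5, 352.5, 3266)
s2 = rng.normal(0.114, 0.180, 3266)
in_square = square_neighborhood(s1, s2, h1, h2)
in_oval = oval_neighborhood(s1, s2, h1, h2)
```

Under these definitions every observation in the oval neighborhood also lies in the square one for the same bandwidths, so the oval neighborhood uses fewer local observations at a given $(h_1, h_2)$.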
For the common single bandwidth, the bandwidth selection rule gave $(h_1, h_2) = (172.4, 0.074)$ with the square-neighborhood kernel and $(191.8, 0.089)$ with the oval-neighborhood kernel (see footnote 5). When we allowed different bandwidths, the square-neighborhood kernel gave $(h_1, h_2) = (232, 0.044)$ and the oval-neighborhood kernel gave $(245, 0.055)$. Figure 2.25 illustrates the local observations picked by these four distinct bandwidths. Since $COR(S_1, S_2) = 0.004$ in the data, the observations are distributed along the horizontal line.

Footnote 5: The single bandwidth appears as two different values because we used normalized scores to obtain the optimal bandwidth. After the single optimal bandwidth is calculated, we un-normalize it for each score.

[Figure 2.25: Square and Oval Neighbors (1 & 2 Bandwidths). Panels: (a) Square Neighbor, single bandwidth h; (b) Square Neighbor, bandwidths h1 and h2; (c) Oval Neighbor, single bandwidth h; (d) Oval Neighbor, bandwidths h1 and h2.]

[Figure 2.26: Multivariate Kernel Density]

As we can see in Figure 2.26, the density estimates near the cutoff are very close. There is thus little evidence of a difference in the density of the control and treated groups at the cutoff.

MRD Estimations

The estimation results for the effect of Affirmative Action in Korea are in Table 2.9. In the first section, 'Sq' means the square-neighborhood kernel, 'Oval' means the oval-neighborhood kernel, 'ROT' means the rule-of-thumb bandwidth, 'CV1' means CV with a single bandwidth, and 'CV2' means CV with two bandwidths. '$N_1$ to $N_4$' give the local number of observations in each quadrant, and '$\sum_j N_j / N$' gives the fraction of local observations used relative to the total number of observations ($N = 3266$). The second section displays the treatment effect estimates using OLS (allowing for partial effects) and the other approaches from the literature.
'BW' means the frontier approach, which uses boundary weights; 'MIN' means the centering approach, which uses the minimum score; 'RD1' means the univariate approach with $S_1 \mid \delta_2 = 1$; and 'RD2' means the univariate approach with $S_2 \mid \delta_1 = 1$. The last section presents the partial effect estimates by OLS. Since MIN, RD1 and RD2 use a single bandwidth with the "square-neighborhood" kernel, their estimates are shown only in the 'Sq' columns. For inference, 95% and 90% confidence intervals (CI) were constructed using a bootstrap with 1,000 iterations. Statistical significance is determined by whether or not the confidence interval includes zero. The 95% CIs are presented in Table 2.9 and the 90% CIs are presented in the Appendix.

Table 2.9: Women Employment Rate Estimates for $\beta_d$, $\beta_1$ and $\beta_2$ with 95% CI

| $(\delta_1, \delta_2)$ | Sq-ROT | Sq-CV1 | Sq-CV2 | Oval-ROT | Oval-CV1 | Oval-CV2 |
|---|---|---|---|---|---|---|
| $N_1$ (1, 1) | 30 | 65 | 61 | 25 | 76 | 69 |
| $N_2$ (0, 1) | 51 | 160 | 306 | 42 | 191 | 304 |
| $N_3$ (0, 0) | 33 | 135 | 178 | 22 | 150 | 185 |
| $N_4$ (1, 0) | 17 | 45 | 36 | 13 | 51 | 39 |
| $\sum_j N_j / N$ | 0.055 | 0.171 | 0.246 | 0.043 | 0.198 | 0.253 |
| Treatment Effect ($\beta_d$) | | | | | | |
| OLS | 0.041 (-0.030, 0.109) | 0.049 (0.003, 0.096) | 0.036 (-0.009, 0.082) | 0.037 (-0.046, 0.122) | 0.052 (0.02, 0.094) | 0.027 (-0.018, 0.073) |
| BW | | 0.009 (-0.025, 0.043) | 0.009 (-0.035, 0.055) | | 0.012 (-0.030, 0.056) | 0.019 (-0.032, 0.064) |
| MIN | 0.026 (-0.030, 0.075) | 0.016 (-0.019, 0.051) | 0.023 (-0.036, 0.074) | | | |
| RD1 | 0.044 (-0.022, 0.113) | 0.048 (0.010, 0.092) | 0.052 (0.020, 0.081) | | | |
| RD2 | -0.016 (-0.074, 0.040) | -0.023 (-0.067, 0.019) | -0.023 (-0.086, 0.036) | | | |
| Partial Effects ($\beta_1$ and $\beta_2$) by OLS | | | | | | |
| $\beta_1$ | 0.026 (-0.061, 0.117) | -0.005 (-0.054, 0.042) | 0.017 (-0.027, 0.062) | 0.023 (-0.079, 0.130) | -0.008 (-0.051, 0.036) | 0.020 (-0.025, 0.063) |
| $\beta_2$ | -0.002 (-0.068, 0.063) | -0.030 (-0.068, 0.003) | 0.001 (-0.033, 0.035) | -0.002 (-0.083, 0.066) | -0.026 (-0.061, 0.005) | 0.002 (-0.030, 0.032) |

In Table 2.9, BW did not work with the rule-of-thumb bandwidths (Sq-ROT and Oval-ROT) because they use too small a sample (fractions 0.055 and 0.043) to leave enough observations in the neighborhood of every boundary point.
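To make the inference step concrete, the sketch below runs OLS of the outcome on the two indicators and their product within a (simulated) local sample, and forms bootstrap intervals. It is a simplified illustration, not the estimator behind Table 2.9: the text says only that a bootstrap with 1,000 iterations was used, so the percentile method is an assumption, the regression omits any score terms, and the function names and data-generating values are hypothetical.

```python
import numpy as np

def mrd_ols(d1, d2, y):
    """OLS of y on (1, d1, d2, d1*d2); returns (beta_1, beta_2, beta_d)."""
    X = np.column_stack([np.ones_like(y), d1, d2, d1 * d2])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]

def bootstrap_ci(d1, d2, y, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap intervals, resampling observations with replacement."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty((n_boot, 3))
    for b in range(n_boot):
        idx = rng.integers(0, n, n)
        draws[b] = mrd_ols(d1[idx], d2[idx], y[idx])
    lo, hi = (1 - level) / 2, 1 - (1 - level) / 2
    return np.quantile(draws, [lo, hi], axis=0)

# Hypothetical local sample: no partial effects, treatment effect of 0.05.
rng = np.random.default_rng(2)
n = 400
d1 = rng.integers(0, 2, n).astype(float)    # delta_1: size indicator
d2 = rng.integers(0, 2, n).astype(float)    # delta_2: rate indicator
y = 0.05 * d1 * d2 + rng.normal(0, 0.05, n)

beta_1, beta_2, beta_d = mrd_ols(d1, d2, y)
ci = bootstrap_ci(d1, d2, y)                # rows: lower, upper; cols: beta_1, beta_2, beta_d
significant = (ci[0] > 0) | (ci[1] < 0)     # True where the interval excludes zero
```

Keeping the two indicators as separate regressors is what allows partial effects: the coefficient on the product is the treatment effect net of whatever each indicator does on its own.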
Because Sq-ROT and Oval-ROT use too small a sample, the results from the other bandwidths (Sq-CV1, Sq-CV2, Oval-CV1 and Oval-CV2) are more trustworthy. The OLS treatment effects in the range 0.049 to 0.052 are statistically significant. These numbers are similar to the RD1 results, whose treatment effects fall in the range 0.048 to 0.052. This similarity is plausible because there is no significant partial effect. If there were any significant partial effect, the MRD approaches without partial effects, such as BW, MIN, RD1 and RD2, would be inconsistent.

Overall, in the MRD analysis, except for some bandwidth selections, most cases show no significant RD effect. The results indicate that the policy is contingent on the voluntary participation of firms, which submit their own implementation plans. As a result, it is difficult to monitor their actual implementation of the plans. Another explanation is that non-targeted companies may be acting proactively to avoid being targeted in the future.

2.5 Conclusion

In this paper, a multiple-score regression discontinuity design (MRD) was used to analyze the effect of Affirmative Action in Korea on the women's employment rate in the private sector. In the empirical application, this paper allowed for 'partial effects' in MRD, unlike most other papers. According to the results derived by various methods, the partial effects are not significant for the women's employment rate, while the treatment effect is significant depending on the bandwidth selection. Most studies that analyze the effect of Affirmative Action in Korea conclude that there is no significant effect on the women's employment rate. This is because the policy depends on companies that submit their own implementation plans; consequently, it is difficult to track the actual implementation of the plans. Another possible scenario is that non-targeted companies also make some effort to raise their women's employment rate to avoid being targeted in the future.
In terms of MRD analysis, MRD allowing partial effects gives results similar to those of previous approaches. This is because AA does not have any significant partial effects; therefore, MRD estimation without partial effects gives results similar to MRD with partial effects.

Given that Affirmative Action in Korea aims at increasing the women's employment rate in the workforce, the findings lend some support to a women's-employment-raising effect of AA. However, there are opposing views. The rationale is as follows. First, both the lack of strong incentives for compliant companies and the lack of penalties for non-compliant companies are often cited as factors contributing to low corporate compliance. Furthermore, it may take more time for the effect of AA to be fully realized, since specific rules changed recently to make the policy more effective.

The current AA policy is also often criticized for its limitations in tackling gender inequality issues. Among other things, it currently covers only a small portion of the total number of female workers, the vast majority of whom work in small and medium-sized firms. It also focuses only on the overall size of female employment, ignoring both the quality of employment and wage inequality.

Chapter 3

Control Function Approach for Partly Ordered Endogenous Treatments: Military Rank Premium in Wage (see footnote 1)

3.1 Introduction

In the treatment effect literature (see, e.g., Rosenbaum (2011); Lee (2008), Lee (2016); Pearl (2009); Morgan and Winship (2020); Imbens and Rubin (2019)), typically a binary treatment appears where the control group receives zero treatment. But there are many cases where the treatment is ordered and the control group receives, not zero treatment, but a treatment of a different kind; e.g., doses of a medicine are ordered, and its effect relative to a surgery/therapy (control treatment) may be of interest.
An economic example is effects on unemployment duration of a job training length (ordered) versus unemployment insurance benefit (control treatment) as in Lee and Lee (2005). Also, military ranks are ordered and their effect on wage relative to non-veteran status can be of interest (Hirsch and Mehay (2003); Dechter and Elder (2004)).

Footnote 1: This chapter was published in Oxford Bulletin of Economics and Statistics 2017; 79: 1176-1194. The final publication is available at: https://doi.org/10.1111/obes.12199.

Consider a control treatment $D = 0$ versus $R$ ordered treatments $D = 1, \ldots, R$. Depending on the number of equations involved in generating $D = 0, 1, \ldots, R$, three different cases arise: a single ordered response equation, $R$ multinomial choice equations, and "in-between" cases with more-than-one but fewer-than-$R$ equations. For multiple treatments in general, see Imbens (2000), Lechner (2001), Frolich (2004), Cattaneo (2010), Lee (2018), and references therein.

To illustrate the "in-between" cases, let $1[A] = 1$ if $A$ holds and 0 otherwise, and suppose that the decision for $D = 0$ may be of a different type from choosing $D = 1, \ldots, R$ such that two equations appear: one for $D_0 = 0, 1$ (binary), and the other for $D_r = 1, \ldots, R$ (ordered) with '$r$' standing for rank:

$$D_{0i} \equiv 1[0 \leq W_{0i}'\alpha_0 + \varepsilon_{0i}], \quad D_{ri} \equiv 1 + \sum_{d=1}^{R-1} 1[\gamma_d \leq W_{ri}'\alpha_r + \varepsilon_{ri}], \quad D_i \equiv (1 - D_{0i}) D_{ri} \text{ taking on } 0, 1, \ldots, R \qquad (1.1)$$

where $W_0$ and $W_r$ are regressors including unity, $\alpha_0$ and $\alpha_r$ are parameters, $\varepsilon_0$ and $\varepsilon_r$ are error terms, and $\gamma_1 < \cdots < \gamma_{R-1}$ are thresholds. The observed treatment $D$ is 0 iff $D_0 = 1$ (i.e., taking the control treatment), and $D = 0$ is not ranked with the other ordered numbers. Let $W$ be the collection of the regressors in $W_0$ and $W_r$.

Whereas our treatment model is (1.1), we adopt the following linear model for response $Y$:

$$Y_i = X_i'\beta_x + \sum_{d=1}^{R} \beta_d \delta_{di} + U_i \quad \text{where } \delta_{di} \equiv 1[D_i = d]; \qquad (1.2)$$

$X$ is a subvector of $W$ with unity as its first element, $W$ has an instrument $Z$, the $\beta$'s are parameters, and $U$ is an error term.
This model can be generalized to allow for effect heterogeneity using interaction terms between treatments and $X$ (but effect heterogeneity due to treatments interacting with $U$ cannot be entertained).

Without endogeneity of $D$, the least squares estimator (LSE) can be applied to (1.2); this is the conventional dummy-variable approach. If $D$ is endogenous, the instrumental variable estimator (IVE) is applicable as long as $R$ instruments are available. But finding even a single instrument is hard in reality, not to mention as many as $R$. Facing the problem of instrument paucity for possibly endogenous treatments that are partly ordered as in (1.1), we develop control function (CF) approaches in this paper which are new in the literature. Our primary CF approach is nearly parametric, going against the current trend in econometrics. To overcome this shortcoming, we also explore a secondary CF approach based on a semiparametric double-index assumption.

Before proceeding further, some words on notation are needed. '$A \perp\!\!\!\perp B \mid C$' means the conditional independence between $A$ and $B$ given $C$. For parameters, say $\alpha$ and $\beta$, estimators are denoted as $\hat\alpha$ and $\hat\beta$. The standard normal density and distribution functions are denoted with $\phi$ and $\Phi$. Since we assume iid (independent and identically distributed) observations for individuals $i = 1, \ldots, N$, often the subscript $i$ indexing individuals is omitted; when specific models are introduced, however, we keep $i$ to make it clear what changes across $i = 1, \ldots, N$ and what does not. For simplicity, we call multiple non-binary $D$ just 'multiple $D$'.

In essence, our primary CF approach goes as follows under the assumption that, for some parameters $\zeta_0$ and $\zeta_r$,

$$(\varepsilon_0, \varepsilon_r) \text{ is jointly normal}, \quad (\varepsilon_0, \varepsilon_r) \perp\!\!\!\perp W, \quad E(U \mid W, \varepsilon_0, \varepsilon_r) = \zeta_0 \varepsilon_0 + \zeta_r \varepsilon_r. \qquad (1.3)$$

1. Obtain the maximum likelihood estimator (MLE) $\hat\alpha$ for all treatment parameters

$$\alpha \equiv (\alpha_0', \alpha_r', \gamma_2, \ldots, \gamma_{R-1}, \rho_{0r})' \quad \text{with } \gamma_1 = 0 \text{ and } \rho_{0r} \equiv COR(\varepsilon_0, \varepsilon_r) \qquad (1.4)$$

and construct the usual binary-treatment CF $\lambda_0(W_0'\alpha_0) \equiv \phi(W_0'\alpha_0)/\Phi(W_0'\alpha_0)$ for $D_0$, and the CF's for $D_r$:

$$\lambda_{jd}(W, \alpha) \equiv \frac{\int_{-W_r'\alpha_r + \gamma_{d-1}}^{-W_r'\alpha_r + \gamma_d} \int_{-\infty}^{-W_0'\alpha_0} t_j\, \psi(t_0, t_r; \rho_{0r})\, dt_0\, dt_r}{\int_{-W_r'\alpha_r + \gamma_{d-1}}^{-W_r'\alpha_r + \gamma_d} \int_{-\infty}^{-W_0'\alpha_0} \psi(t_0, t_r; \rho_{0r})\, dt_0\, dt_r} \quad \text{for } j = 0, r;\ d = 1, \ldots, R,$$

where $\gamma_0 = -\infty$ and $\gamma_R = \infty$, and $\psi(\cdot, \cdot; \rho_{0r})$ is the bivariate normal density with unit variance and correlation $\rho_{0r}$.

2. With $\sigma_{0u} \equiv COV(\varepsilon_0, U)$, do the LSE of

$$Y \text{ on } X,\ \delta_1, \ldots, \delta_R,\ D_0 \lambda_0(W_0'\hat\alpha_0),\ \sum_{d=1}^{R} \delta_d \lambda_{0d}(W, \hat\alpha),\ \sum_{d=1}^{R} \delta_d \lambda_{rd}(W, \hat\alpha)$$

for the parameters

$$\omega_1 \equiv (\beta_x', \beta_1, \ldots, \beta_R, \sigma_{0u}, \zeta_0, \zeta_r)'. \qquad (1.5)$$

3. The treatment exogeneity can be tested with $H_0: \sigma_{0u} = \zeta_0 = \zeta_r = 0$.

To weaken the restrictions in (1.3), low-order polynomials of the two indices $W_0'\hat\alpha_0$ and $W_r'\hat\alpha_r$ and their interactions with treatments may be used instead of the above CF's, under the assumption that $(\varepsilon_0, \varepsilon_r, U)$ follows an unknown distribution that may depend on $W$ only through $(W_0'\alpha_0, W_r'\alpha_r)$, which is our secondary CF approach.

The rest of this paper is organized as follows. Section 2 introduces our treatment-response models and the CF approaches. Section 3 applies the methods to the aforementioned military rank example, after reviewing the relevant literature related to effects of military ranks on wage. Finally, Section 4 concludes our findings. The appendix contains proofs and simulation studies.

3.2 Treatment Effect Model and Estimators

3.2.1 Causal Model, Endogeneity and Instrument

Consider a control treatment 0 versus ordered treatments $D_r = 1, \ldots, R$ observed only when the control treatment is not taken (i.e., $D_0 = 0$), where the control treatment is not ordered along with $D_r = 1, \ldots, R$. This gives $1 + R$ partly ordered treatments: $D \equiv (1 - D_0) D_r$ taking on $0, 1, \ldots, R$ (with $D = 0$ for the control treatment). Corresponding to the treatments are $1 + R$ potential responses $Y^d$, $d = 0, 1, \ldots, R$.
We are interested in the mean treatment effect on the population (or 'dose-response' relation) $E(Y^d - Y^0)$, $d = 1,\ldots,R$. More generally, $E(Y^d - Y^{d'})$ with $d' \neq 0, d$ may be of interest, which can be found from $E(Y^d - Y^0) - E(Y^{d'} - Y^0)$. The observed response can be written as

$$Y_i = D_{0i}Y_i^0 + \sum_{d=1}^R \delta_{di}Y_i^d.$$

What is observed is $(D_i, W_i, Y_i)$, $i = 1,\ldots,N$.

In our military rank example, $D_0 = 1$ is never joining the military, and $D_r = 1,\ldots,4$ are being private, corporal, sergeant and officer, respectively. The observed treatment $D$ takes on $0, 1,\ldots,4$ with 0 being non-veteran. For veterans ($D_0 = 0$), we can imagine their potential wage $Y^0$ as non-veterans. For non-veterans, we can imagine their potential military ranks and potential wages $Y^d$ with $d \neq 0$ after getting discharged from the military, had they joined the military contrary to the fact.

Our model is (1.1) with potential responses augmented:

$$D_{0i} \equiv 1[0 \le W_{0i}'\alpha_0 + \varepsilon_{0i}],\quad D_{ri} \equiv 1 + \sum_{d=1}^{R-1} 1[\tau_d \le W_{ri}'\alpha_r + \varepsilon_{ri}],\quad \tau_1 < \cdots < \tau_{R-1},$$
$$Y_i^0 = X_i'\beta_x + U_i \quad\text{and}\quad Y_i^d = Y_i^0 + \sum_{j=1}^R \beta_j\delta_{ji} \quad\text{for } d = 1,\ldots,R, \tag{2.1}$$

where $(W_0, W_r, X)$ consists of components of $W$ such that at least one component of $W_0$ does not appear in $X$, and $(\varepsilon_0,\varepsilon_r,U)$ are errors possibly related to one another; $\tau_1 = 0$ is a location normalization for $D_r$. In the $Y^d$ equation, $\delta_j$ increases the baseline $Y^0$ by $\beta_j$. This 'parallel shift' or 'constant effect' assumption can be relaxed by allowing the $\delta_j$'s to interact with components of $W$.

Substituting the potential response equations into $Y_i = D_{0i}Y_i^0 + \sum_{d=1}^R \delta_{di}Y_i^d$ gives

$$Y_i = X_i'\beta_x + \sum_{d=1}^R \beta_d\delta_{di} + U_i. \tag{2.2}$$

If the $\delta_d$'s interact with some elements of $W$, say $W_\delta$, then the terms $\beta_{dw}'W_\delta\delta_d$ should appear in (2.2), where $\beta_{dw}$ is the slope vector; here, $\beta_{dw}$ is to be construed as the original slope of $W_\delta\delta_d$ minus the slope $\beta_{0w}$ of $W_\delta D_0$, because the implicit presence of $\beta_{0w}'W_\delta D_0 = \beta_{0w}'W_\delta(1 - \sum_{d=1}^R \delta_d)$ subtracts $\beta_{0w}$ from $\beta_{dw}$.
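The substitution behind (2.2) can be spelled out in a few lines. This is a sketch under the assumption that the treatment dummies are defined as $\delta_{di} \equiv 1[D_i = d]$, so that exactly one of $D_{0i}, \delta_{1i},\ldots,\delta_{Ri}$ equals one and $\delta_{di}\delta_{ji} = 0$ whenever $j \neq d$; the Greek symbols are the ones assumed in this rewrite of the notation.

```latex
\begin{aligned}
Y_i &= D_{0i}Y_i^0 + \sum_{d=1}^R \delta_{di}\Big(Y_i^0 + \sum_{j=1}^R \beta_j\delta_{ji}\Big) \\
    &= \Big(D_{0i} + \sum_{d=1}^R \delta_{di}\Big)Y_i^0 + \sum_{d=1}^R \beta_d\delta_{di}
       \qquad (\text{since } \delta_{di}\delta_{ji} = 1[j=d]\,\delta_{di}) \\
    &= X_i'\beta_x + \sum_{d=1}^R \beta_d\delta_{di} + U_i
       \qquad (\text{since } D_{0i} + \textstyle\sum_{d}\delta_{di} = 1).
\end{aligned}
```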
Given the exogeneity of $W$, if $(\varepsilon_0,\varepsilon_r)$ is unrelated to $U$, then the parameters can be estimated by the LSE to (2.2), where the $\delta_d$'s are exogenous. If the $\delta_d$'s are endogenous and there are enough instruments, say $Z$, for the $\delta_d$'s, then IVE can be applied to (2.2). In our military rank data, however, we have only a couple of instruments at most, which are insufficient for the IVE; instead, we construct CF's to remove the treatment endogeneity.

To illustrate how a CF can circumvent instrument paucity, consider a continuously distributed $\tilde D$ with a dummy instrument $Z$, and

$$\tilde D_i = \alpha_0 + \alpha_z Z_i + \varepsilon_i,\quad \tilde Y_i = \beta_0 + \beta_1\tilde D_i + \beta_2\tilde D_i^2 + \tilde U_i,\quad \tilde U_i = \gamma\varepsilon_i + \tilde V_i,\quad\text{with } Z, \varepsilon, \tilde V \text{ independent of one another}, \tag{2.3}$$

where $\varepsilon$ and $\tilde V$ are error terms and $\gamma$ is a parameter. Here, $\tilde D$ is endogenous because $\varepsilon$ affects both $\tilde D$ and $\tilde U$, and IVE to the $\tilde Y$ equation fails because the single $Z$ is not enough for the two endogenous regressors $\tilde D$ and $\tilde D^2$. In contrast, the CF approach uses the LSE residual $\hat\varepsilon$ from the $\tilde D$ equation as an extra regressor in the $\tilde Y$ equation, which makes $\tilde V$ the $\tilde Y$ equation error term; hence there is no more endogeneity problem.

Differently from (2.3), however, $D$ is not continuously distributed in our partly ordered treatments, which makes getting a residual similar to $\hat\varepsilon$ impossible. Instead, imposing nearly parametric assumptions on $(\varepsilon_0,\varepsilon_r,U)$ renders CF's which, when added to the $Y$ equation, make the $Y$ equation regressors exogenous. Those CF's are often called 'generalized residuals'; see Lee (2012) for more on CF approaches.

In general, an instrument $Z$ should meet three conditions: included in the treatment equation, excluded from the $Y$ equation, and $E(ZU) = 0$. In the CF approach, strictly speaking, if the specified functional form of the outcome regression and the form of the CF are both correct, then no instrument is needed. Nevertheless, this is hardly ever the case in reality, and $Z$ is used for two reasons in practice.
First, when both forms are correct, having $Z$ in the CF's alleviates multicollinearity between $X$ and the CF's. Second, when the outcome regression function is over-specified, $Z$ helps separate the selection problem from the over-specified regression function as follows. Suppose first that, for a regressor $X_k$, $X_k^2$ is wrongly omitted from the regression function and that there is no selection problem. Since the CF's are non-linear functions of $W_0'\alpha_0$ and $W_r'\alpha_r$, their slopes may come out significant because the CF's partly play the role of the omitted $X_k^2$. To avoid this problem, we may then over-specify the regression function. Now, suppose there is a selection problem and the regression function is over-specified. In this case, the CF's may come out insignificant because they have no variation independent of $X$; here $Z$ helps because it gives an independent variation to the CF's despite the over-specified regression function.

We just explained why we need a variable $Z$ excluded from the outcome equation while included in the treatment equations to address a selection problem with CF's. Do we need $E(ZU) = 0$ as well, which is required for $Z$ to be an instrument? In principle, no, because we can add a function of $Z$ into the outcome equation to account for $E(U \mid Z)$ being a function of $Z$, assuming that we know its functional form. But then, there will be no exclusion restriction left. In short, if we know the functional forms of the outcome regression equation, the CF's and $E(U \mid Z)$, then instruments are not needed for our approach; this being unrealistic, we do require an instrument satisfying all three conditions mentioned above to address a selection problem with CF's.

In our military rank data, there are two candidate instruments included in $W_0$ and excluded from $X$. We could also assume that $W_r$ has instruments to make things easier, but we refrain from this, as $W_r$ does not have plausible instruments in the data.
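The continuous-treatment illustration in (2.3) can be checked numerically. The following is a minimal simulation sketch, not part of the original analysis: the parameter values, sample size, seed, and the helper names `ols` and `simulate_cf` are illustrative assumptions, and the error structure is read as $\tilde U = \gamma\varepsilon + \tilde V$. It compares the naive LSE of $\tilde Y$ on $(1, \tilde D, \tilde D^2)$ with the CF regression that adds the first-stage residual $\hat\varepsilon$.

```python
import numpy as np


def ols(y, X):
    """Least-squares coefficients of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def simulate_cf(n=20000, gamma=0.8, seed=0):
    """Simulate a design like (2.3); return (naive LSE, CF-augmented LSE)."""
    rng = np.random.default_rng(seed)
    z = rng.integers(0, 2, size=n).astype(float)  # dummy instrument Z
    eps = rng.standard_normal(n)                  # treatment-equation error
    v = rng.standard_normal(n)                    # outcome error unrelated to eps
    d = 0.0 + 1.0 * z + eps                       # D~ = alpha_0 + alpha_z Z + eps
    u = gamma * eps + v                           # U~ = gamma * eps + V~
    y = 1.0 + 0.5 * d - 0.2 * d**2 + u            # Y~ with (beta_0, beta_1, beta_2)

    ones = np.ones(n)
    # Naive LSE ignores that D~ and D~^2 are correlated with U~.
    naive = ols(y, np.column_stack([ones, d, d**2]))
    # First stage: LSE of D~ on (1, Z); the residual estimates eps.
    first = np.column_stack([ones, z])
    eps_hat = d - first @ ols(d, first)
    # Second stage: the residual enters as the control function.
    cf = ols(y, np.column_stack([ones, d, d**2, eps_hat]))
    return naive, cf


if __name__ == "__main__":
    naive, cf = simulate_cf()
    print("naive (b0, b1, b2):      ", np.round(naive, 3))
    print("CF    (b0, b1, b2, gamma):", np.round(cf, 3))
```

In this design the single dummy $Z$ could not serve as an instrument for both $\tilde D$ and $\tilde D^2$, yet the CF regression identifies both slopes because $\hat\varepsilon$ absorbs the part of $\tilde U$ that is related to the treatment.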
3.2.2 Primary Nearly-Parametric CF Approach

Recall $\rho_{0r} \equiv COR(\varepsilon_0,\varepsilon_r)$ and suppose

$$\varepsilon \equiv \begin{bmatrix}\varepsilon_0\\ \varepsilon_r\end{bmatrix} \sim N(0,\Sigma),\quad \Sigma \equiv \begin{bmatrix}1 & \rho_{0r}\\ \rho_{0r} & 1\end{bmatrix},\quad \varepsilon \amalg W,\quad E(U \mid W,\varepsilon) = \gamma_0\varepsilon_0 + \gamma_r\varepsilon_r, \tag{2.4}$$

where $SD(\varepsilon_0) = SD(\varepsilon_r) = 1$ is a scale normalization for $D_0$ and $D_r$. Estimating $\rho_{0r}$ is done with the MLE for $(D_0, D_r)$, and the $Y$ equation is estimated by LSE with CF's.

The linearity assumption in (2.4) consists of two parts: $E(U \mid W,\varepsilon) = E(U \mid \varepsilon)$, and $E(U \mid \varepsilon)$ being a linear function of $\varepsilon$. Assumption (2.4) is only partially parametric, because the joint distribution of $(\varepsilon_0,\varepsilon_r,U)$ is not specified. Since $U$ can be generated by $U = \gamma_0\varepsilon_0 + \gamma_r\varepsilon_r + error$, where $error$ can be any mean-zero (possibly asymmetric) variable, the assumption for $U$ is much weaker than the normality of $U$. Clearly, the linearity assumption holds if $(\varepsilon_0,\varepsilon_r,U)$ is jointly normal.

In the following, we provide more details of our CF approach that was outlined in the introduction; some derivations are relegated to the appendix. Recall $\tau_1 = 0$ and $\alpha \equiv (\alpha_0', \alpha_r', \tau_2,\ldots,\tau_{R-1}, \rho_{0r})'$, and define

$$\Psi(\varepsilon_0,\varepsilon_r;\rho_{0r}) \equiv \int_{-\infty}^{\varepsilon_r}\int_{-\infty}^{\varepsilon_0} \psi(t_0,t_r;\rho_{0r})\,dt_0\,dt_r.$$

The MLE $\hat\alpha$ is obtained by maximizing

$$\sum_{i=1}^N \Big\{ D_{0i}\ln P(D_{0i}=1 \mid W_{0i}) + \sum_{d=1}^R \delta_{di}\ln P(D_{0i}=0, D_{ri}=d \mid W_i) \Big\} = \sum_{i=1}^N \Big\{ D_{0i}\ln\Phi(W_{0i}'\alpha_0) + \sum_{d=1}^R \delta_{di}\ln P(D_{0i}=0, D_{ri}=d \mid W_i) \Big\}.$$

For the military rank example with $R = 4$, the appendix shows that

$$P(D_0=0, D_r=1 \mid W) = \Psi(-W_0'\alpha_0, -W_r'\alpha_r;\rho_{0r}) \equiv p_1(W;\alpha),$$
$$P(D_0=0, D_r=2 \mid W) = \Psi(-W_0'\alpha_0, -W_r'\alpha_r+\tau_2;\rho_{0r}) - p_1(W;\alpha) \equiv p_2(W;\alpha),$$
$$P(D_0=0, D_r=3 \mid W) = \Psi(-W_0'\alpha_0, -W_r'\alpha_r+\tau_3;\rho_{0r}) - p_1(W;\alpha) - p_2(W;\alpha) \equiv p_3(W;\alpha),$$
$$P(D_0=0, D_r=4 \mid W) = 1 - \Phi(W_0'\alpha_0) - p_1(W;\alpha) - p_2(W;\alpha) - p_3(W;\alpha).$$

The log-likelihood function is concave if $\rho_{0r} = 0$, because $\rho_{0r} = 0$ reduces the log-likelihood to the sum of the probit log-likelihood for $D_0$ and the ordered probit log-likelihood for $D_r \mid (D_0 = 0)$. If $\rho_{0r} \neq 0$, however, there is no guarantee that the log-likelihood function is concave.
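The cell probabilities entering this log-likelihood can be sketched in code. The block below is a minimal illustration under stated assumptions, not the author's implementation: the function names (`norm_cdf`, `biv_cdf`, `cell_probs`, `log_lik`) are hypothetical, the indices $W_0'\alpha_0$ and $W_r'\alpha_r$ are passed in as already-computed numbers, and the bivariate normal CDF $\Psi$ is evaluated by a one-dimensional Gauss-Legendre quadrature of $\int_{-\infty}^{a}\phi(t)\,\Phi\big((b-\rho t)/\sqrt{1-\rho^2}\big)\,dt$, a numerical route chosen here for self-containment (it requires $|\rho_{0r}| < 1$).

```python
import numpy as np
from math import erf, sqrt, pi


def norm_cdf(x):
    """Standard normal distribution function Phi(x) for a scalar x."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def biv_cdf(a, b, rho, nodes=200):
    """Psi(a, b; rho) for unit variances; b may be -inf or +inf."""
    if b == -np.inf:
        return 0.0
    if b == np.inf:
        return norm_cdf(a)
    lo = -8.0                                  # effective lower limit for -infinity
    if a <= lo:
        return 0.0
    x, w = np.polynomial.legendre.leggauss(nodes)
    t = 0.5 * (a - lo) * x + 0.5 * (a + lo)    # map nodes from [-1, 1] to [lo, a]
    dens = np.exp(-0.5 * t**2) / sqrt(2.0 * pi)
    cond = np.array([norm_cdf((b - rho * ti) / sqrt(1.0 - rho**2)) for ti in t])
    return float(0.5 * (a - lo) * np.sum(w * dens * cond))


def cell_probs(idx0, idxr, taus, rho):
    """Return [P(D=0), P(D=1), ..., P(D=R)] for one observation.

    idx0 = W_0'alpha_0, idxr = W_r'alpha_r, taus = (tau_1 = 0, ..., tau_{R-1}).
    """
    cuts = np.concatenate(([-np.inf], np.asarray(taus, dtype=float), [np.inf]))
    # Psi(-idx0, tau_d - idxr; rho) for d = 0, ..., R
    upper = np.array([biv_cdf(-idx0, c - idxr, rho) for c in cuts])
    return np.concatenate(([norm_cdf(idx0)], np.diff(upper)))


def log_lik(d, idx0, idxr, taus, rho):
    """Sum over i of ln P(D_i = d_i | W_i), with d_i in {0, 1, ..., R}."""
    return float(sum(np.log(cell_probs(a, b, taus, rho)[k])
                     for k, a, b in zip(d, idx0, idxr)))
```

For $R = 4$, `cell_probs(0.3, -0.2, [0.0, 0.7, 1.5], 0.4)` returns five probabilities that sum to one; with `rho = 0` the rank-1 cell reduces to $\Phi(-W_0'\alpha_0)\Phi(-W_r'\alpha_r)$, in line with the probit and ordered probit factorization noted above.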
It is thus recommended to obtain the probit estimator for $D_0$ and the ordered probit estimator for $D_r \mid (D_0 = 0)$ and use them as initial values in the above MLE, where a grid search over fixed values of $\rho_{0r}$ may help convergence. This kind of problem is not unique to this paper though, as estimating correlations such as $\rho_{0r}$ is never easy in any multiple choice framework.

The appendix also shows that the second-stage LSE is to be done for

$$Y_i = X_i'\beta_x + \sum_{d=1}^R \beta_d\delta_{di} + \sigma_{0u}D_{0i}\lambda_0(W_{0i}'\hat\alpha_0) + \gamma_0\sum_{d=1}^R \delta_{di}\lambda_{0d}(W_i;\hat\alpha) + \gamma_r\sum_{d=1}^R \delta_{di}\lambda_{rd}(W_i;\hat\alpha) + V_i,\quad\text{where } \lambda_0(W_{0i}'\hat\alpha_0) \equiv \frac{\phi(W_{0i}'\hat\alpha_0)}{\Phi(W_{0i}'\hat\alpha_0)}, \tag{2.5}$$

the $\lambda_{jd}$'s were defined between (1.4) and (1.5), and $V$ is defined as $U$ minus the selection correction terms. The parameter for this LSE is $\omega_1 \equiv (\beta_x', \beta_1,\ldots,\beta_R, \sigma_{0u}, \gamma_0, \gamma_r)'$. If desired, interaction terms between covariates and the $\delta_d$'s can be allowed in $X$. $D_0$ does not appear as a regressor on its own, but the CF $\lambda_0(W_0'\hat\alpha_0)$ is still necessary, because those with $D_0 = 1$ appear implicitly with $\delta_d = 0$, $d = 1, 2, 3, 4$.

The requisite assumptions for this primary, nearly-parametric CF-based endogeneity correction approach are (2.1) and (2.4). The appendix provides a simple simulation study to demonstrate that this CF approach works as it is supposed to.

If $D_0$ were the only treatment, then the second stage would be the LSE of $Y$ on $\{X, D_0\lambda_0(W_0'\hat\alpha_0)\}$. In this case, the asymptotic variance taking into account the first-stage error $\hat\alpha_0 - \alpha_0$ takes the form $\Omega_1 + \sigma_{0u}A$ for a matrix $A$, where $\Omega_1$ is the LSE asymptotic variance when $\alpha_0$ is known. The reason is that the 'generated regressor' $D_0\lambda_0(W_0'\hat\alpha_0)$ enters the model additively with the slope $\sigma_{0u}$. Here the null hypothesis of interest '$H_0: \sigma_{0u} = 0$' for no endogeneity can be tested using $\Omega_1$, ignoring $\sigma_{0u}A$ under the null. For our partly ordered treatments, the null hypothesis of no endogeneity is $H_0: \sigma_{0u} = \gamma_0 = \gamma_r = 0$.
Since the generated regressors in (2.5) enter the model additively with the slopes $(\sigma_{0u}, \gamma_0, \gamma_r)$, '$H_0: \sigma_{0u} = \gamma_0 = \gamma_r = 0$' can be tested using the LSE asymptotic variance as if the first-stage parameter $\alpha$ were known. Other hypothesis tests, however, require taking the first-stage error $\hat\alpha - \alpha$ into account, and the LSE asymptotic distribution for this is provided in the appendix. In our application, taking $\hat\alpha - \alpha$ into account makes hardly any difference, because the estimates for $(\sigma_{0u}, \gamma_0, \gamma_r)$ turn out to be almost zero. Alternatively, nonparametric bootstrap resampling from the original sample with replacement may be done.

3.2.3 Secondary Double-Index CF Approach

To relax the restrictive assumptions in (2.4), suppose $(\varepsilon_0,\varepsilon_r,U)$ follows an unknown distribution that may depend on $W$ only through $(W_0'\alpha_0, W_r'\alpha_r)$ such that, for some functions $\mu_0(W_0'\alpha_0)$ and $\mu_d(W_0'\alpha_0, W_r'\alpha_r)$,

$$E(U \mid W, D_0 = 1) = \mu_0(W_0'\alpha_0),\quad E(U \mid W, \delta_d = 1) = E(U \mid W, D_0 = 0, D_r = d) = \mu_d(W_0'\alpha_0, W_r'\alpha_r).$$

Analogously to the appendix derivation for (2.5), this gives

$$Y_i = X_i'\beta_x + \sum_{d=1}^R \beta_d\delta_{di} + D_{0i}\mu_0(W_{0i}'\hat\alpha_0) + \sum_{d=1}^R \delta_{di}\mu_d(W_{0i}'\hat\alpha_0, W_{ri}'\hat\alpha_r) + V_i', \tag{2.6}$$

where $V_i'$ is defined as $U_i$ minus the correction terms. To make (2.6) operational, approximate the functions polynomially with

$$\mu_0(W_0'\alpha_0) = \pi_{00} + \pi_{01}W_0'\alpha_0 + \pi_{02}(W_0'\alpha_0)^2,$$
$$\mu_d(W_0'\alpha_0, W_r'\alpha_r) = \pi_{d0} + \pi_{d1}W_0'\alpha_0 + \pi_{d2}(W_0'\alpha_0)^2 + \pi_{d3}(W_0'\alpha_0 \cdot W_r'\alpha_r), \tag{2.7}$$

where the $\pi$'s are parameters. Using $W_r'\alpha_r$ and $(W_r'\alpha_r)^2$ as well in (2.7) for a full quadratic approximation does not work, because there is no excluded variable in $W_r$ for our data. This shortfall in approximation makes the following CF approach 'secondary'.
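Once the polynomials in (2.7) are interacted with the treatment dummies, the correction terms reduce to a fixed set of regressor columns. The sketch below assembles them under stated assumptions: the function name `double_index_terms` and its argument layout are hypothetical, the two indices are taken as already estimated, and the constant of the $D_0$ polynomial is assumed to be absorbed by the intercept in $X$ (using $D_0 = 1 - \sum_d \delta_d$).

```python
import numpy as np


def double_index_terms(delta, idx0, idxr):
    """Polynomial control-function regressors for the double-index approach.

    delta : (N, R) array of treatment dummies delta_1, ..., delta_R
    idx0  : (N,) estimated index W_0'alpha_0
    idxr  : (N,) estimated index W_r'alpha_r
    Returns an (N, 2 + 3R) array with columns ordered as
    [idx0, idx0^2, then for each d: delta_d*idx0, delta_d*idx0^2, delta_d*idx0*idxr].
    """
    delta = np.asarray(delta, dtype=float)
    idx0 = np.asarray(idx0, dtype=float)
    idxr = np.asarray(idxr, dtype=float)
    base = np.column_stack([idx0, idx0**2])              # uninteracted terms
    poly = np.column_stack([idx0, idx0**2, idx0 * idxr])  # terms to interact
    # Broadcast (N, R, 1) * (N, 1, 3) -> (N, R, 3), then flatten per observation.
    inter = (delta[:, :, None] * poly[:, None, :]).reshape(len(idx0), -1)
    return np.hstack([base, inter])
```

With $R = 4$ this yields $2 + 3 \times 4 = 14$ columns; rows for the control treatment (all dummies zero) carry only the two uninteracted terms.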
The appendix shows that substituting (2.7) into (2.6) gives

$$Y = X'\beta_x + \sum_{d=1}^R (\pi_{d0} - \pi_{00} + \beta_d)\delta_d + \pi_{01}W_0'\hat\alpha_0 + \pi_{02}(W_0'\hat\alpha_0)^2 + \sum_{d=1}^R (\pi_{d1} - \pi_{01})\delta_d W_0'\hat\alpha_0 + \sum_{d=1}^R (\pi_{d2} - \pi_{02})\delta_d (W_0'\hat\alpha_0)^2 + \sum_{d=1}^R \pi_{d3}\delta_d (W_0'\hat\alpha_0 \cdot W_r'\hat\alpha_r) + V'. \tag{2.8}$$

The regressors and parameters of this LSE are

$$X,\ \delta_1,\ldots,\delta_R,\ W_0'\hat\alpha_0,\ (W_0'\hat\alpha_0)^2,\ \delta_d W_0'\hat\alpha_0,\ \delta_d (W_0'\hat\alpha_0)^2,\ \delta_d W_0'\hat\alpha_0 W_r'\hat\alpha_r,\quad d = 1,\ldots,R,$$

for

$$\omega_2 \equiv (\beta_x',\ \beta_1,\ldots,\beta_R;\ \pi_{d0} - \pi_{00} + \beta_d,\ d = 1,\ldots,R;\ \pi_{01},\ \pi_{02}\ \text{and}\ \pi_{d1} - \pi_{01},\ \pi_{d2} - \pi_{02},\ \pi_{d3},\ d = 1,\ldots,R)'. \tag{2.9}$$

In our empirical analysis, we also apply this semiparametric alternative. For this approach, three conditions are sufficient: $(\varepsilon_0,\varepsilon_r,U)$ following a distribution that may depend on $W$ only through $(W_0'\alpha_0, W_r'\alpha_r)$, the approximation (2.7) being adequate, and the availability of an instrument $Z$ that gives $W_0'\alpha_0$ a variation independent of $X$. This kind of CF appeared in Newey et al. (1990), Melenberg and Soest (1996), and Lee (2017).

3.3 Empirical Analysis: Military Rank Premium

Military ranks matter because the effects of serving as a foot soldier are likely to differ much from those of serving as an officer. Officers may acquire leadership skills that foot soldiers do not, and sergeants may do so as well, although to a lesser degree than officers. Indeed, the literature has much to show on this kind of heterogeneous human capital accumulation, including leadership; see Case and Paxson (2008) and references therein. Differently from studies that ignored the distinctions among the enlisted ranks, we look at the three enlisted ranks (private, corporal and sergeant) along with officer to find their effects on wage relative to non-veterans.

3.3.1 Literature Related to Military Rank Effect on Wage

Effects of military service on wage/earnings have been examined extensively with mixed findings. Since the characteristics of military service (combat experience, peace-time service, reservist, etc.)
and compensating laws differ much (e.g., the GI Bill in the U.S. to help education), military service effects are heterogeneous accordingly. Focusing on relatively recent studies, Angrist and Chen (2011) and Angrist et al. (2011) presented negative short-term effects of military service during the Vietnam era, which however disappeared in the long run as the veterans made up by increasing their schooling due to the GI Bill. Grenet et al. (2011) showed a near-zero effect for peacetime service during 1949-1960 using British data. Card and Cardoso (2012) used Portuguese draft data for men born in 1967 to find a positive effect for primary school graduates but no effect for better educated men.

In contrast to the above studies of military service effects, studies examining military rank effects on wage or other work-related outcomes are scarce: we are aware of only Hirsch and Mehay (2003), Dechter and Elder (2004), Maclean (2008), who looked at only two ranks (officer and the enlisted ranks combined), and Grönqvist and Lindqvist (2016). Related to these studies, it is notable that Maclean and Edwards (2010) examined military rank effects on health, whereas Sampson and Laub (1996) and Maclean and Elder (2007) reviewed military service effects on various response variables (health, criminal, socioeconomic or marital outcomes).

Examining the military rank effect studies in detail, Hirsch and Mehay (2003) used the 'Reserve Component Surveys' to compare the reservist non-veterans and reservist veterans with N = 41413. Hirsch and Mehay did matching on age and matching on the propensity score using age and race, along with the usual wage equation approach. Using only the reservists removes the first selection problem of joining the military versus remaining civilian, but the findings there apply only to the reservists, which limits the 'external validity' of the study. Hirsch and Mehay found a 10% wage premium for officers and almost zero wage premium for the enlisted veterans.
Dechter and Elder (2004) used the 'Stanford-Terman longitudinal data' that followed children born around 1900 with IQ above 135 in California. Applying logit to a sample with N = 508, they found that officers are more likely to have a work-life progress compared with the other veterans as well as the non-veterans. Maclean (2008) applied LSE and a "multiple-indicator and multiple-cause" method to the Wisconsin Longitudinal Study (N = 2960). Maclean found positive earnings and occupation-status effects for officers compared with the enlisted veterans and non-veterans. Grönqvist and Lindqvist (2016) applied regression discontinuity (RD) to Swedish data, as officers have to score above a threshold in a cognitive skill exam. Compared with regular soldiers, platoon officers have a 0.05 higher probability of becoming a civilian manager, which amounts to a 75% increase because the baseline probability is 0.067. Grönqvist and Lindqvist attributed this large effect to the officer training enhancing leadership-specific human capital.

Regardless of whether one looks at only the military service dummy or the rank, there always is a possibility that those treatments are endogenous. For instance, officers may have a higher wage, not because of the military training/experience, but because they have higher ability than others. In the literature, three methods have been dominant in dealing with endogeneity: IVE, RD, and difference in differences (DD). Angrist (1989), Angrist (1990), Angrist and Chen (2011) and Angrist et al. (2011) used the Vietnam-era draft lottery number as instruments. Angrist and Krueger (1994) used birth quarter dummies as instruments for World War II veterans, as men were drafted in the chronological order of birth. Imbens and Van Der Klaauw (1995) used government-induced variations in the conscription rate for different birth cohorts in the Netherlands as instruments. Grenet et al. (2011) and Bauer et al.
(2012) used RD, because getting called for the military service depends on age crossing a cutoff. DD has been used, e.g., by Card and Cardoso (2012).

3.3.2 Descriptive Statistics and LSE

Our data source, the Wisconsin Longitudinal Study, contains a random sample of about one third of the 1957 Wisconsin high school graduates. Using only the working males who were not in the military in the current year (1974), 57% in the data are veterans because the cohort overlaps with the Vietnam era. The original sample size was 4991, and our final sample size became 3172 after deleting missing observations for 1957 parental income (258), 1957 number of activities (553), 1964 schooling (398), zero wage (336) and so on.

The US military has in fact many ranks, which are aggregated into just four in this paper: private (enlisted ranks E1-E3), corporal (enlisted rank E4), sergeant (enlisted ranks E5-E9) and officer (warrant officers W1-W5 and officers O1-O11). Do these US military ranks fit the model (2.1)? The promotion process in the US military is quite complex, which we illustrate with the Army in the following.

For enlisted ranks, each year the US Congress determines what percentage can serve in each enlisted rank above E4 (no limits for E4 and below). Then the Army assigns the slots to each 'military occupation specialty', and promotion can be done within this limit using three systems: (i) "decentralized" for E2 to E4, where the commander of the unit (company) decides, mostly using time-in-grade and time-in-service; (ii) "semi-centralized" for E5 and E6, where the unit plays a part but it is the Army that decides, considering time-in-grade, time-in-service, duty performance (competence, military bearing, leadership, etc.), awards/decorations, education and training; and (iii) "centralized" for E7 to E9, where the Army decides. For officers, there are four routes: the U.S. Military Academy, the Army Reserve Officers' Training Corps, the Officer Candidate School, and direct appointment.
All in all, there is no way for a single model to incorporate this kind of promotion complexity, and our model is a simple approximation to the reality, just as most models are.

Table 3.1: Mean (SD) of Variables (N = 3172)

Variable | Non-Veterans (N = 1356) | Veterans (N = 1816)
1974 wage (exp(Y)) | 15,941 (8,083) | 15,374 (7,472)
1974 schooling years | 14.5 (2.42) | 13.6 (1.93)
1957 parent wage | 6,458 (6,111) | 6,330 (5,513)
1957 # activities | 1.40 (1.50) | 1.38 (1.47)
1957 IQ | 103 (16.0) | 100 (14.5)
1957 father alive | 0.952 | 0.951
1957 mother alive | 0.975 | 0.977
1957 any religion | 0.789 | 0.758
1957 friend military | 0.097 | 0.219
1974 single | 0.073 | 0.059
1974 married | 0.875 | 0.895
private | ..... | 0.376
corporal | ..... | 0.349
sergeant | ..... | 0.202
officer | ..... | 0.073

Table 3.1 shows the mean and SD of the variables, where exp(Y), schooling years, 'single' and 'married' are for 1974, and the variables preceded by '1957' are for the graduation year 1957. '# activities' denotes the number of activities that the person participated in actively, which reflects how outgoing or active the person is. The veterans are about one year less educated than the non-veterans and have a much higher proportion of 'friend military' (whether or not they have any friends who joined the military); in the other variables, the veteran and non-veteran differences are rather small. The proportion of officers is fairly low (0.073) among the veterans.

Table 3.2 presents two LSE's, omitting the intercept estimates; the response variable is Y = ln(wage), and the slope estimates and their t-values in parentheses are shown. In the column 'LSE-Military', the military dummy is used, whereas the detailed rank dummies are used in the column 'LSE-Ranks'. The two columns are similar except for the slopes of the military dummy and ranks.
Table 3.2: LSE with Military Rank Dummies: $\hat\beta_{lse}$ (t-value)

Variable | LSE-Military | LSE-Ranks
1974 schooling years | 0.042 (9.34) | 0.038 (8.39)
1957 ln(parent wage) | 0.084 (6.42) | 0.083 (6.36)
1957 # activities | 0.015 (2.03) | 0.014 (1.96)
1957 IQ/100 | 0.410 (6.51) | 0.395 (6.25)
1957 father alive | -0.098 (-2.95) | -0.095 (-2.89)
1957 mother alive | -0.038 (-0.90) | -0.042 (-1.00)
1974 single | -0.190 (-2.99) | -0.190 (-3.00)
1974 married | 0.106 (2.38) | 0.104 (2.33)
military | 0.014 (0.80) |
private | | -0.020 (-0.84)
corporal | | 0.009 (0.45)
sergeant | | 0.008 (0.29)
officer | | 0.165 (3.07)
$R^2$ | 0.127 | 0.131

In Table 3.2, schooling years, parental wage, # activities, IQ and father alive are significant. The wage of the married tends to be higher than the wage of the baseline (neither single nor married), which in turn is higher than the wage of singles. In the LSE-Military column, the military service effect is statistically insignificant and small. In the LSE-Ranks column, only the officer effect is statistically significant and large (0.165), whereas the other rank effects are insignificant and small. The large officer effect but almost no effect for the enlisted ranks is in line with the findings in the military rank effect literature.

3.3.3 Control Function Approach Results

Table 3.3 presents the first-stage MLE $\hat\alpha_0$ and $\hat\alpha_r$, and the second-stage CF LSE. '$\hat\beta$-Normal' uses the CF's derived under (2.4), and '$\hat\beta$-Index' uses the 14 CF's in (2.9) following the semiparametric double-index approach in (2.6)-(2.9); we show only two CF's for $D_0$ in the $\hat\beta$-Index column of Table 3.3, and put the other CF's for the $\delta_d$'s in Table 3.4. As was mentioned already, we would like to use $W_r'\hat\alpha_r$ and its square as well in $\hat\beta$-Index, which is however impossible because there is no variable in $W_r$ excluded from $X$, whereas $W_0$ has 'friend military' and religion excluded from $X$. As in Table 3.2, the intercepts are omitted in Table 3.3; also omitted are the two ordered probit thresholds.
In Table 3.3, excluding 'friend military' and religion from the $D_r$ equation is not critical, as we can simply put the two variables there. Excluding them from the $Y$ equation is, however, critical so that the two variables can serve as instruments. If the two variables appear in the $Y$ equation, then the significance of the CF's cannot be necessarily taken as removing the treatment endogeneity, because the CF's might be picking up misspecifications in the $Y$ regression function. We note that, as with almost any instrument, there are possible scenarios that make 'friend military' and religion invalid instruments; e.g., people with lower ability may prefer the military and hang around with such friends while low ability affects wage negatively, people with sincerity tend to be religious while sincerity affects wage positively, and so on.

Since $\hat\rho_{0r} = -0.069$ with t-value $-0.293$ in the MLE for $(D_0, D_r)$, '$H_0: \rho_{0r} = 0$' is not rejected. Because the Wald test for $\sigma_{0u} = \gamma_0 = \gamma_r = 0$ in $\hat\beta$-Normal has the p-value 0.463, the $H_0$ of no treatment endogeneity is not rejected. For $\hat\beta$-Index, one CF in Table 3.4 is significant with t-value 2.06 and there is another CF that is nearly so with t-value 1.63. Note that, with 14 CF's, there would be about one CF that is falsely significant because $14 \times 0.05 = 0.7$. Whereas no rank dummies are significant in $\hat\beta$-Normal, officer in $\hat\beta$-Index is nearly so with estimate 0.114 and t-value 1.59; the officer effect in $\hat\beta$-Index is more modest.
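The back-of-the-envelope count above, and the size of the officer coefficients, can be made concrete. The first expression reproduces the expected number of false rejections among 14 tests at the 5% level. The second adds, under the extra assumption (not made in the text) that the 14 tests are independent, the chance of at least one false rejection. The last two convert the officer coefficients in the log-wage equations (0.165 from LSE-Ranks in Table 3.2 and 0.114 from the Index column of Table 3.3) into approximate proportional wage premiums, using the usual $e^{b} - 1$ reading of a dummy coefficient in a log-linear model.

```latex
14 \times 0.05 = 0.7, \qquad 1 - 0.95^{14} \approx 0.51, \\
e^{0.165} - 1 \approx 0.179, \qquad e^{0.114} - 1 \approx 0.121 .
```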
Table 3.3: MLE for $(D_0, D_r)$ and Endogeneity Correction for $Y$: estimate (t-value)

Variable | $\hat\alpha_0$ for Civilian | $\hat\alpha_r$ for Rank | $\hat\beta$-Normal | $\hat\beta$-Index
1974 schooling years | 0.123 (9.62) | 0.117 (4.54) | 0.039 (4.46) | 0.034 (3.29)
1957 ln(parent wage) | -0.098 (-2.77) | 0.011 (0.28) | 0.083 (6.07) | 0.086 (5.93)
1957 # activities | -0.028 (-1.76) | 0.052 (3.01) | 0.014 (1.82) | 0.015 (2.00)
1957 IQ/100 | -0.170 (-0.99) | 0.703 (3.55) | 0.396 (5.73) | 0.417 (6.36)
1957 father alive | 0.085 (0.79) | 0.009 (0.07) | -0.095 (-2.90) | -0.096 (-2.87)
1957 mother alive | -0.082 (-0.54) | 0.102 (0.49) | -0.041 (-0.99) | -0.045 (-1.05)
1957 religion | 0.177 (3.15) | | |
1957 friend military | -0.494 (-7.41) | | |
1974 single | | | -0.190 (-3.02) | -0.185 (-2.91)
1974 married | | | 0.105 (2.34) | 0.105 (2.34)
private | | | -0.003 (-0.02) | -0.053 (-1.54)
corporal | | | 0.008 (0.07) | -0.004 (-0.12)
sergeant | | | -0.005 (-0.04) | -0.025 (-0.57)
officer | | | 0.156 (0.88) | 0.114 (1.59)
$\lambda_0(W_0'\hat\alpha_0)$ | | | -0.035 (-0.39) |
$\sum_d \delta_d\lambda_{0d}(W;\hat\alpha)$ | | | 0.051 (0.65) |
$\sum_d \delta_d\lambda_{rd}(W;\hat\alpha)$ | | | 0.016 (0.19) |
$W_0'\hat\alpha_0$ | | | | 0.039 (0.47)
$(W_0'\hat\alpha_0)^2$ | | | | -0.039 (-0.42)
log-likelihood (joint MLE) | $-4258.009$ | | |
$R^2$ | | | 0.131 | 0.136

Table 3.4: Correction Terms under Double Index Assumption: estimate (t-value)

Treatment dummy | $\delta_d W_0'\hat\alpha_0$ | $\delta_d (W_0'\hat\alpha_0)^2$ | $\delta_d W_0'\hat\alpha_0 W_r'\hat\alpha_r$
$\delta_1$ | -0.065 (-0.51) | 0.162 (1.08) | 0.434 (2.06)
$\delta_2$ | 0.125 (0.76) | 0.259 (1.63) | -0.313 (-1.19)
$\delta_3$ | -0.187 (-0.77) | -0.022 (-0.08) | 0.300 (0.86)
$\delta_4$ | -0.390 (-0.65) | 1.567 (1.30) | 0.021 (0.02)

3.4 Conclusions

In treatment effect analysis, typically the treatment is binary or ordered, with zero treatment meaning 'no treatment'. But zero is a special number, and it sometimes stands for control treatment of a different kind.
In this paper, we examined how to find the effects of ordered, possibly endogenous, treatments (e.g., military ranks) relative to a control treatment (e.g., non-veteran) when the control treatment is not ordered along with the treatments of interest; i.e., the treatments are partly ordered. In this case, a double decision problem arises: choosing the ordered treatment of interest over the control treatment, and then selecting the level of the ordered treatment if the control treatment is not taken.

Our main methodological contribution is proposing a nearly parametric endogeneity-correcting control function (CF) approach for partly ordered treatments to allow for treatment endogeneity. To relax restrictive parametric assumptions in this method, we also suggested a supplementary CF approach under a semiparametric double-index assumption. Our empirical contribution is applying these methods to find the effects on wage of military ranks relative to non-veterans. The empirical finding is that the rank effects differ: large positive officer effects, and near-zero enlisted-rank effects. Using only a single dummy variable for military service would be misleading.

Bibliography

Angrist, J. D. (1989). Using the draft lottery to measure the effect of military service on civilian labor market outcomes. Research in Labor Economics, 10, 265–310.
Angrist, J. D. (1990). Lifetime earnings and the Vietnam era draft lottery: Evidence from social security administrative records. American Economic Review, 80, 313–336.
Angrist, J. D. and Chen, S. H. (2011). Schooling and the Vietnam-era GI Bill: Evidence from the draft lottery. American Economic Journal: Applied Economics, 3 (2), 96–118.
Angrist, J. D., Chen, S. H. and Song, J. (2011). Long-term consequences of Vietnam-era conscription: New estimates using social security data. American Economic Review, 101 (3), 334–338.
Angrist, J. D. and Krueger, A. B. (1994). Why do World War II veterans earn more than nonveterans? Journal of Labor Economics, 12 (1), 74–97.
Bauer, T. K., Bender, S., Paloyo, A.
R. and Schmidt, C. M. (2012). Evaluating the labor-market effects of compulsory military service. European Economic Review, 56 (4), 814–829.
Bertanha, M. (2020). Regression discontinuity design with many thresholds. Journal of Econometrics, 218 (1), 216–241.
Calonico, S., Cattaneo, M. D. and Farrell, M. H. (2018). Optimal Bandwidth Choice for Robust Bias Corrected Inference in Regression Discontinuity Designs. Papers 1809.00236, arXiv.org.
Calonico, S., Cattaneo, M. D. and Titiunik, R. (2014). Robust nonparametric confidence intervals for regression-discontinuity designs. Econometrica, 82 (6), 2295–2326.
Calonico, S., Cattaneo, M. D. and Titiunik, R. (2015). Optimal data-driven regression discontinuity plots. Journal of the American Statistical Association, 110 (512), 1753–1769.
Card, D. and Cardoso, A. R. (2012). Can compulsory military service raise civilian wages? Evidence from the peacetime draft in Portugal. American Economic Journal: Applied Economics, 4 (4), 57–93.
Case, A. and Paxson, C. (2008). Stature and status: Height, ability, and labor market outcomes. Journal of Political Economy, 116 (3), 499–532.
Cattaneo, M. D. (2010). Efficient semiparametric estimation of multi-valued treatment effects under ignorability. Journal of Econometrics, 155 (2), 138–154.
Cattaneo, M. D., Keele, L., Titiunik, R. and Vazquez-Bare, G. (2016). Interpreting regression discontinuity designs with multiple cutoffs. The Journal of Politics, 78 (4), 1229–1248.
Chang, J., Cho, J., Lee, J., Jo, Y., Shin, D., Sung, S. and Kim, H. (2006). Discrimination in the labor market and affirmative action. Korea Labor Institute.
Cho, J., Kwon, T. and Ahn, J. (2010). Half success, half failure in Korean affirmative action: An empirical evaluation on corporate progress. Women's Studies International Forum, 33 (3), 264–273.
Choi, J.-Y. and Lee, M.-J. (2018). Regression discontinuity with multiple running variables allowing partial effects. Political Analysis, 26 (3), 258–274.
Coate, S. and Loury, G. C. (1993).
Will affirmative-action policies eliminate negative stereotypes? American Economic Review, 83 (5), 1220–1240.
Cunningham, S. (2021). Causal Inference: The Mixtape. Yale University Press.
Dechter, A. R. and Elder, J. G. H. (2004). World War II mobilization in men's work lives: Continuity or disruption for the middle class? American Journal of Sociology, 110 (3), 761–793.
Dong, Y. (2018). Alternative assumptions to identify LATE in fuzzy regression discontinuity designs. Oxford Bulletin of Economics and Statistics, 80 (5), 1020–1027.
Dong, Y. and Lewbel, A. (2015). Identifying the effect of changing the policy threshold in regression discontinuity models. The Review of Economics and Statistics, 97 (5), 1081–1092.
Frandsen, B. R., Frölich, M. and Melly, B. (2012). Quantile treatment effects in the regression discontinuity design. Journal of Econometrics, 168 (2), 382–395.
Frolich, M. (2004). Programme evaluation with multiple treatments. Journal of Economic Surveys, 18 (2), 181–224.
Gelman, A. and Imbens, G. (2019). Why High-Order Polynomials Should Not Be Used in Regression Discontinuity Designs. Journal of Business and Economic Statistics, 37 (3), 447–456.
Grenet, J., Hart, R. A. and Roberts, J. E. (2011). Above and beyond the call: Long-term real earnings effects of British male military conscription in the post-war years. Labour Economics, 18 (2), 194–204.
Grönqvist, E. and Lindqvist, E. (2016). The making of a manager: Evidence from military officer training. Journal of Labor Economics, 34 (4), 869–898.
Hahn, J., Todd, P. and Van Der Klaauw, W. (2001). Identification and estimation of treatment effects with a regression-discontinuity design. Econometrica, 69 (1), 201–209.
Heckman, J. J. (1979). Sample selection bias as a specification error. Econometrica, 47 (1), 153–161.
Hirsch, B. T. and Mehay, S. L. (2003). Evaluating the labor market performance of veterans using a matched comparison group design. Journal of Human Resources, 38 (3), 673–700.
Holzer, H. and Neumark, D. (1999).
Are affirmative action hires less qualified? Evidence from employer-employee data on new hires. Journal of Labor Economics, 17 (3), 534–569.
Holzer, H. J. and Neumark, D. (2000). What does affirmative action do? Industrial and Labor Relations Review, 53 (2), 240.
Imbens, G. W. (2000). The role of the propensity score in estimating dose-response functions. Biometrika, 87 (3), 706–710.
Imbens, G. W. and Kalyanaraman, K. (2012). Optimal bandwidth choice for the regression discontinuity estimator. The Review of Economic Studies, 79 (3), 933–959.
Imbens, G. W. and Lemieux, T. (2008). Regression discontinuity designs: A guide to practice. Journal of Econometrics, 142 (2), 615–635.
Imbens, G. W. and Rubin, D. B. (2019). Causal inference: for statistics, social, and biomedical sciences: an introduction. Cambridge Univ. Press.
Imbens, G. W. and Van Der Klaauw, W. (1995). Evaluating the cost of conscription in the Netherlands. Journal of Business and Economic Statistics, 13 (2), 207–215.
Imbens, G. W. and Wager, S. (2019). Optimized Regression Discontinuity Designs. The Review of Economics and Statistics, 101 (2), 264–278.
Imbens, G. W. and Zajonc, T. (2009). Regression discontinuity design with vector-argument assignment rules. Unpublished paper.
Jeon, M. and Kim, H. (2008). Affirmative action: 2-year evaluation and challenges. Korea Labor Institute.
Keele, L. J. and Titiunik, R. (2015). Geographic boundaries as regression discontinuities. Political Analysis, 23 (1), 127–155.
Kim, T., Kang, M. and Kwon, T. (2010). Performance evaluation of the affirmative action in Korea and strategies to improve its effectiveness. Korea Women Development Institute.
Lechner, M. (2001). Identification and estimation of causal effects of multiple treatments under the conditional independence assumption. Econometric Evaluation of Labour Market Policies, ZEW Economic Studies, p. 43–58.
Lee, D. S. and Card, D. (2008). Regression discontinuity inference with specification error. Journal of Econometrics, 142 (2), 655–674.
Lee, D. S. and Lemieux, T. (2010).
Regression discontinuity designs in economics. Journal of Eco- nomic Literature, 48 (2), 281–355. Lee, M.-j. (2008). Micro-econometrics for policy, program, and treatment effects. Oxford Univ. Press. — (2012). Semiparametric estimators for limited dependent variable (ldv) models with en- dogenous regressors. Econometric Reviews, 31 (2), 171–214. — (2016). Matching, Regression Discontinuity, Difference in Differences, and Beyond. Ox- ford University Press. — (2017). Extensive and intensive margin effects in sample selection models: racial effects on wages. Journal of the Royal Statistical Society: Series A (Statistics in Society), 180 (3), 817–839. — and Lee, S.-J. (2005). Analysis of job-training effects on korean women. Journal of Applied Econometrics, 20 (4), 549–562. Lee, Y.-Y. (2018). Efficient propensity score regression estimators of multivalued treatment effects for the treated. Journal of Econometrics, 204 (2), 207–222. Leonard, J. (1984). The impact of affirmative action on employment. Journal of Labor Economics, 2 (4), 439–463. Leonard, J. S. (1990). The impact of affirmative action regulation and equal employment law on black employment. Journal of Economic Perspectives, 4 (4), 47–63. Ludwig, J. and Miller, D. L. (2007). Does head start improve childrens life chances? evidence from a regression discontinuity design. The Quarterly Journal of Economics, 122 (1), 159–208. Maclean, A. (2008). The privileges of rank: The peacetime draft and later-life attainment. Armed Forces and Society, 34 (4), 682–713. — and Edwards, R. D. (2010). The pervasive role of rank in the health of u.s. veterans. Armed Forces and Society, 36 (5), 765–785. — and Elder, G. H. (2007). Military service in the life course. Annual Review of Sociology, 33 (1), 175–196. McCrary, J. (2008). Manipulation of the running variable in the regression discontinuity design: A density test. Journal of Econometrics, 142 (2), 698–714, the regression discon- tinuity design: Theory and applications. 
Melenberg, B. and Soest, A. V. (1996). Parametric and semi-parametric modelling of vacation expenditures. Journal of Applied Econometrics, 11 (1), 59–76. Morgan, S. L. and Winship, C. (2020). Counterfactuals and causal inference: methods and principles for social research. Cambridge University Press. 81 Newey, W., Powell, J. and Walker, J. (1990). Semiparametric estimation of selection models: some empirical results. American Economic Review, 80, 324–328. Paola, M. D., Scoppa, V. and Lombardo, R. (2010). Can gender quotas break down negative stereotypes? evidence from changes in electoral rules. Journal of Public Eco- nomics, 94 (5-6), 344–353. Papay, J. P., Willett, J. B. and Murnane, R. (2011). Extending the regression- discontinuityapproachtomultipleassignmentvariables. Journal of Econometrics,161(2), 203–207. Pearl, J. (2009). Causality. Cambridge University Press. Porter, J. (2003). Estimation in the regression discontinuity model. Unpublished manuscript, Department of Economics, Harvard University. Reardon, S. F. and Robinson, J. P. (2012). Regression discontinuity designs with multi- ple rating-score variables. Journal of Research on Educational Effectiveness,5 (1), 83–104. Rosenbaum, P. R. (2011). Observational studies. Springer. Sampson, R. J. and Laub, J. H. (1996). Socioeconomic achievement in the life course of disadvantaged men: Military service as a turning point, circa 1940-1965. American Sociological Review, 61 (3), 347–367. Smith, J. P. and Welch, F. (1984). Affirmative action and labor markets. Journal of Labor Economics, 2 (2), 269–301. Thistlethwaite, D. L. and Campbell, D. T. (1960). Regression-discontinuity analysis: An alternative to the ex post facto experiment. Journal of Educational Psychology, 51 (6), 309–317. Van der Klaauw, W. (2002). Estimating the effect of financial aid offers on college en- rollment: A regression-discontinuity approach. International Economic Review, 43 (4), 1249–1287. 
Wong, V. C., Steiner, P. M. and Cook, T. D. (2013). Analyzing regression-discontinuity designs with multiple assignment variables. Journal of Educational and Behavioral Statistics, 38 (2), 107–141.
Zajonc, T. (2012). Essays on causal inference for public policy. Doctoral dissertation, Harvard University.

Appendix A
Appendix to Chapter 1

A.1 Additional Figures
Figure A.1: Sharp vs. Fuzzy Regression Discontinuity Design
Source: Cunningham (2021)
Figure A.2: Non-cumulative versus cumulative cutoffs in multiple cutoffs RD designs
Source: Figure in Cattaneo et al. (2016), Journal of Politics
Figure A.3: An illustration of MRD (axes X_1 and X_2; regions labeled D_i = 1 and D_i = 0)

Appendix B
Appendix to Chapter 2

B.1 Additional Figures
B.1.1 OECD data for employment rate and gender wage gap
Figure B.1: Women Employment Rate, % of working age population aged 15 to 64, 2019 in the OECD countries
Source: OECD (2021), Employment rate (indicator)
Figure B.2: Men Employment Rate, % of working age population aged 15 to 64, 2019 in the OECD countries
Source: OECD (2021), Employment rate (indicator)
Figure B.3: Gender Wage Gap, % of male median wage, 2019 in the OECD countries
Source: OECD (2021), Gender wage gap (indicator)

B.1.2 Detailed Affirmative Action Implementation Procedure
Figure B.4: Detailed 1st stage
Figure B.5: Detailed 2nd stage
Figure B.6: Detailed 3rd stage

B.1.3 Changes of the treatment groups
Figure B.7: Changes of targeted companies (blue area) in the first stage over time
Figure B.8: Changes of the treatment groups (red area) over time

B.1.4 MRD graphical analysis (3D)
Figure B.9: Local Linear Plane (LLP) near cutoffs
Figure B.10: LLP by the scores near cutoffs: Square 1 bandwidth
Figure B.11: LLP by the scores near cutoffs: Square 2 bandwidths
Figure B.12: LLP by the scores near cutoffs: Oval 1 bandwidth
Figure B.13: LLP by the scores near cutoffs: Oval 2 bandwidths

B.2 Additional Tables
Table B.1: Variables Description
Variable         Explanation
Id               ID of Companies
Year             Year (2009, 2011, 2013, 2015)
Assignment variables and related variables
Wemp             Women employment rate (now)
Empnum           Total number of employees
Nextwemp         Women employment rate at next period (now + 2 years)
Avg-wemp         Average female employment rate by industry and size
Indicator variables
Indicator-rate   Whether female employment rate crosses a cutoff or not
Indicator-size   Whether total number of employees crosses a cutoff or not
Treatment        Whether both scores cross cutoffs or not
Covariates
Child            Childcare support (1: company provides childcare support, 0: none)
Parent           Parental leave (1: company provides parental leave, 0: none)
Region           Regions (11-Seoul, 21-Busan, 22-Daegu, 23-Incheon, 24-Gwangju, 25-Daejeon, 26-Ulsan, 29-Sejong, 31-Gyeonggi, 32-Gangwon-do, 33-Chungbuk, 34-Chungnam, 35-Jeonbuk, 36-Jeonnam, 37-Gyeongsangbuk-do, 38-Gyeongsangnam-do, 39-Jeju)
Industry         Industry code (9th) used to calculate the cutoff rate by industry (see Table B.2)

Table B.2: Industry code (9th)
1   Light Industry 1: Grocery, beverage, tobacco manufacturing
2   Light Industry 2: Textile product manufacturing, clothing products manufacturing
3   Light Industry 3: Wood products, Pulp–Paper products, Furniture manufacturing
4   Chemical Industry
5   Heavy Industries
6   Electronics industry: Manufacturing electronic components
7   Electricity, Gas and Waterworks
8   Sewage, Waste disposal, raw material regeneration and environmental restoration
9   Construction industry
10  Wholesale and Retail and Accommodation business
11  Restaurant business
12  Transport: Land transport and pipeline transport
13  Air transportation industry
14  Publishing, video, broadcasting and information services
15  Financial and insurance industries
16  Real Estate and Rental
17  R&D and professional services-related businesses
18  Technology services related business
19  Business related to business facility management
20  Business Support Services Industry
21  Education service industry
22  Health and social welfare services
23  Arts,
sports and leisure-related services
24  Associations and organizations, repair and other personal services

Table B.3: Women Employment Rate (%) Cutoffs in 2015
                 ≥ 1000 employees      500 ≤ employees < 1000
Industry code    average    70%        average    70%
Total            38.22      26.75      36.87      25.81
1                36.01      25.21      38.83      27.18
2                54.41      38.09      55.25      38.68
3                26.05      18.24      19.8       13.86
4                18.02      12.61      20.4       14.28
5                7.81       5.47       11.08      7.76
6                32.52      22.76      25.07      17.55
7                11.68      8.18       10.73      7.51
8                7.16       5.01       14.33      10.03
9                10.31      7.22       8.18       5.73
10               52.82      36.97      52.55      36.79
11               52.73      36.91      50.43      35.30
12               14.24      9.97       9.00       6.30
13               53.72      37.60      45.22      31.65
14               34.23      23.96      30.91      21.64
15               47.33      33.13      38.35      26.85
16               23.55      16.49      31.46      22.02
17               30.64      21.45      36.31      25.42
18               12.72      8.90       14.79      10.35
19               36.68      25.68      39.63      27.74
20               55.79      39.05      52.23      36.56
21               42.94      30.06      50.75      35.53
22               70.09      49.06      70.88      49.62
23               45.79      32.05      44.59      31.21
24               46.12      32.28      37.92      26.54

Table B.4: Estimates for Treatment and Partial Effects with 90% CI
Sq-ROT   Sq-CV1   Sq-CV2   Oval-ROT   Oval-CV1   Oval-CV2
Treatment Effect ( d ) by OLS and Other Estimators
OLS   0.005 (0.07, 0.06)   0.044 (0.01, 0.08)   0.017 (0.02, 0.06)   0.023 (0.08, 0.04)   0.034 (0.001, 0.06)   0.028 (0.004, 0.06)
BW    0.030 (0.01, 0.05)   0.018 (0.01, 0.04)   0.027 (0.003, 0.05)   0.018 (0.01, 0.04)
MIN   0.005 (0.04, 0.03)   0.009 (0.02, 0.04)   0.005 (0.04, 0.03)
RD1   0.022 (0.04, 0.07)   0.040 (0.01, 0.07)   0.038 (0.01, 0.06)
RD2   0.013 (0.05, 0.02)   0.007 (0.02, 0.03)   0.013 (0.05, 0.02)
Partial Effects ( 1 and 2 ) by OLS
1     0.001 (0.09, 0.08)   0.007 (0.03, 0.04)   0.011 (0.03, 0.05)   0.015 (0.08, 0.10)   0.002 (0.03, 0.03)   0.013 (0.04, 0.02)
2     0.047 (0.11, 0.02)   0.011 (0.04, 0.01)   0.011 (0.04, 0.01)   0.045 (0.10, 0.02)   0.010 (0.03, 0.01)   0.032 (0.05, 0.005)
Sq: square-neighbor kernel, Oval: oval-neighbor kernel
RT: rule of thumb bandwidth, CV1: CV with 1 bandwidth, CV2: CV with 2
BW: boundary-weight, MIN: min(S 1 , S 2 ), RD1: S 1 | 2 = 1, RD2: S 2 | 1 = 1
90% bootstrap CI in ()
**, * for 5, 10% level significance

Table B.5: Women Employment Rate Estimates for D , S and
R with 95% CI Sq-ROT Sq-CV1 Sq-CV2 Oval-ROT Oval-CV1 Oval-CV2 Treatment Effect ( D ) OLS 0:041 (0:030;0:109) 0:049 (0:003;0:096) 0:036 (0:009;0:082) 0:037 (0:046;0:122) 0:052 (0:02;0:094) 0:027 (0:018;0:073) RD1 0:044 (0:022;0:113) 0:048 (0:010;0:092) 0:052 (0:020;0:081) Partial Effects ( S and R ) by OLS S 0:026 (0:061;0:117) 0:005 (0:054;0:042) 0:017 (0:027;0:062) 0:023 (0:079;0:130) 0:008 (0:051;0:036) 0:020 (0:025;0:063) R 0:002 (0:068;0:063) 0:030 (0:068;0:003) 0:001 (0:033;0:035) 0:002 (0:083;0:066) 0:026 (0:061;0:005) 0:002 (0:030;0:032) Sq: square-neighbor kernel, Oval: oval-neighbor kernel RT: rule-of-thumb bandwidth, CV1: CV with 1 bandwidth, CV2: CV with 2 RD1: S Size j Rate = 1 95% bootstrap CI in () **, * for 5, 10% level significance 99 Appendix C Appendix to Chapter 3 C.1 DerivationofFirst-StageLikelihoodComponentswith R=4 P (D 0 = 0;D r =djW ), d = 1; 2; 3, are P (" 0 <W 0 0 0 ; " r <W 0 r r ) = (W 0 0 0 ;W 0 r r ; 0r )p 1 (W ;); P (" 0 <W 0 0 0 ;W 0 r r <" r <W 0 r r + 2 ) = P (" 0 <W 0 0 0 ; " r <W 0 r r + 2 )P (" 0 <W 0 0 0 ; " r <W 0 r 1 ) = (W 0 0 0 ;W 0 r r + 2 ; 0r )p 1 (W ;)p 2 (W ;); P (" 0 <W 0 0 0 ;W 0 r r + 2 <" r <W 0 r r + 3 ) = P (" 0 <W 0 0 0 ; " r <W 0 r r + 3 )P (" 0 <W 0 0 0 ; " r <W 0 r r + 2 ) = (W 0 0 0 ;W 0 r r + 3 ; 0r )fp 1 (W ;) +p 2 (W ;)gp 3 (W ;): 100 C.2 Derivation of Control Functions Since the CF for D 0 involves only (" 0 ;U), analogously to the Heckman (1979) correction, we have E(UjW;D 0 = 1) = 0u 0 (W 0 0 0 ). 
For the other CF’s, observe E(UjW; d = 1) =E(UjW; D 0 = 0;D r =d) = 0 E(" 0 jD 0 = 0;D r =d) + r E(" r jD 0 = 0;D r =d) = 0 E(" 0 jW; " 0 <W 0 0 0 ;W 0 r r + d1 <" r <W 0 r r + d ) + r E(" r jW; " 0 <W 0 0 0 ;W 0 r r + d1 <" r <W 0 r r + d ) = 0 0d (W ;) + r rd (W ;): Hence, the CF for P R d=1 d d is R X d=1 d f 0 0d (W ; ^ ) + r rd (W ; ^ )g = 0 R X d=1 d 0d (W ; ^ ) + r R X d=1 d rd (W ; ^ ): C.3 Closed-Form Control Functions Under 0r = 0 ‘E(UjW;D 0 = 1) = 0u 0 (W 0 0 0 )’ holds regardless of the 0r value, but the other CF’s can be written in closed forms only if 0r = 0. To see the closed forms, observe E(UjW; D 0 = 0;D r =d) = 0 E(" 0 jD 0 = 0;D r =d) + r E(" r jD 0 = 0;D r =d) = 0 E(" 0 jW; " 0 <W 0 0 0 ;W 0 r r + d1 <" r <W 0 r r + d ) + r E(" r jW; " 0 <W 0 0 0 ;W 0 r r + d1 <" r <W 0 r r + d ) = 0 E(" 0 jW; " 0 <W 0 0 0 ) + r E(" r jW;W 0 r r + d1 <" r <W 0 r r + d ) = 0 (W 0 0 0 ) (W 0 0 0 ) + r (W 0 r r + d1 )(W 0 r r + d ) (W 0 r r + d ) (W 0 r r + d1 ) 101 usingE(Gjs<G<t) =f(s)(t)g=f(t) (s)g forGN(0; 1). The second stage is the LSE to a rewritten version of (2.2): Y i =X 0 i x + R X d=1 d di + 0u D 0i 0 (W 0 0i ^ 0 ) + 0 (1D 0i ) 0 (W 0 0i ^ 0 ) + r R X d=1 di rd (W 0 ri ^ r ; ^ ) +V 0 i where V 0 is defined as U minus the selection correction terms here, and 0 (W 0 0 0 ) (W 0 0 0 ) (W 0 0 0 ) ; rd (W 0 r r ; ) (W 0 r r + d1 )(W 0 r r + d ) (W 0 r r + d ) (W 0 r r + d1 ) : C.4 LSE Asymptotic Distribution Taking into Account First-Stage Error Letz i ( ^ )denotethesecond-stageLSEregressorsin(2.5)timesV i ,andlet ^ HN 1 P i z i ( ^ )z i ( ^ ) 0 . The first-stage error ^ matters forz i ( ^ ), but not for the second-order matrix ^ H. Hence, it holds that p N(^ ! 1 ! 1 ) = 1 p N X i ^ H 1 z i ( ^ ) = 1 p N X i H 1 fz i () +E(z 0) i g +o p (1) whereHEfz()z() 0 g,z 0 denotesthederivativeofz()for, i =fE(SS 0 )g 1 S i isthe‘influence function’ for , and S i is the MLE score function; z 0 can be obtained with numerical derivatives. 
See, e.g., Lee (2010) for more details on this way of accounting for the first-stage estimation error. Denoting convergence in law as ‘;’, p N(^ ! 1 ! 1 );Nf0;H 1 E(## 0 )H 1 g where #z() +E(z 0): E(## 0 ) can be estimated by replacing (;! 1 ) with ( ^ ; ^ ! 1 ), and the expected values in # with the corresponding sample means. If E(z 0) = 0 (this occurs under H 0 : 0u = 0 = r = 0), there is no first-stage estimation error effect on the second-stage. 102 C.5 Computational Details First, the first-stage MLE can be done with Newton-Raphson guarding against 0r going out of the bound (1; 1). If there is a convergence problem due to 0r , choose grid points for 0r over (1; 1) to do MLE only with respect to ( 0 ; r ; 2 ;:::; R1 ) for each grid point. Some probabilities in the likelihood function may be near zero to cause a problem when ln is taken; in this case, replace them with a small number such as 10 5 or drop the observations from the likelihood function. Second, in constructing jd (W i ; ^ ), the denominator can cause a numerical stability problem if it is almost zero. If this problem occurs, set jd (W i ; ^ ) = 0 or drop the observation from the second-stage. Note that the denominators for jd ’s are probabilities that are easily obtainable once the first-stage MLE is done. If a reliable numerical integration program for the numerator of jd is available, it should be used as is done in this paper; otherwise, the following Monte Carlo integral can be used. From the first-stage MLE, the covariance matrix for (" 0 ;" r ) can be estimated by replacing 0r with ^ 0r . Obtain its square matrix ^ C such that ^ C 0 ^ C = ^ ; e.g., ^ C is the upper triangular Cholesky decomposition of ^ . Draw an error term j = ( 0j ; rj ) 0 where 0j and rj are iid N(0; 1) to get j = ( 0j ; rj ) 0 ^ C 0 , j = 1;:::;N s ; N s is the number of simulated errors (e.g., N s = 5000). Then E ( 0 ) = E ( ^ C 0 0 ^ C) = E ( ^ C 0 I 2 ^ C) = ^ C 0 ^ C = ^ . 
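The simulated draws described above for Section C.5 can be sketched as follows. This is a minimal illustration, not the code used in the dissertation: the value `rho_hat = 0.7`, the seed, and all variable names are assumptions made for the example. It builds the 2x2 correlation matrix of the two first-stage errors, takes an upper-triangular Cholesky factor `C` with `C'C` equal to that matrix, transforms iid standard normal draws, and computes the sample second-moment matrix of the result, which should be close to the target matrix.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed, illustrative only

rho_hat = 0.7                           # hypothetical estimate of the first-stage error correlation
sigma_hat = np.array([[1.0, rho_hat],
                      [rho_hat, 1.0]])  # estimated covariance matrix of the two first-stage errors

# np.linalg.cholesky returns a lower-triangular L with L @ L.T == sigma_hat,
# so C = L.T is upper triangular and satisfies C.T @ C == sigma_hat.
C = np.linalg.cholesky(sigma_hat).T

n_sim = 5000                            # number of simulated errors (N_s in the text)
zeta = rng.standard_normal((n_sim, 2))  # each row is an iid N(0, I_2) draw
eta = zeta @ C                          # each row is zeta_j' C, whose covariance is C'C

sample_cov = eta.T @ eta / n_sim        # Monte Carlo estimate of E(eta eta')
```

With draws of this kind, the truncated means that enter the control functions are approximated by averaging the relevant component of `eta` over the draws that fall in the truncation region.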
The numerator for jd (W i ; ^ ), j = 0;r, can be found with 1 N s Ns X l=1 jl 1[W 0 0i ^ 0 < 0l ;W 0 ri ^ r + ^ d1 < rl <W 0 ri ^ r + ^ d ]: 103 C.6 Simulation Study for CF Approach We show first how we set ( 0 ; r ) as a function of ( 0r ; 0u ; ru ; u ) where 0u COR(" 0 ;U), ru COR(" r ;U) and u SD(U): E(U" 0 )fE("" 0 )g 1 = 0u u ru u 1 1 2 0r 2 6 4 1 0r 0r 1 3 7 5 = u 1 2 0r 0u ru 0r ru 0u 0r f = u 0u 1 2 0r 1 0r 1 0r = u 0u 1 + 0r 1 1 if 0u = ru g: Our simulation design is (( 0 ; r ) is obtained with the preceding display) N = 500; 1000; 500 Repetitions, 0r = 0u = ru ; M 2 ;M 3 2N(0; 1); W 0 = (1;M 2 ;M 3 ) 0 ; W r = (1;M 2 ) 0 ; X = (1;M 2 ) 0 ; (" 1 ;" r ;U) 0 N(0;A) with A 2 6 6 6 6 4 1 u 1 u u u 2 u 3 7 7 7 7 5 =) 2 6 4 0 r 3 7 5 = 2 6 4 u =(1 +) u =(1 +) 3 7 5 u = 2; 0u = ru = 0r = 0; 0:7; 0 = (1; 1; 1) 0 ; r = (1; 1) 0 ; ( 1 ; 2 ; 3 ) = (0:5; 0:5; 1:5); x = (1; 1) 0 ; ( 1 ; 2 ; 3 ; 4 ) = (0; 1; 2; 3): In each table, what is presented are ‘OPR’ (ordered probit for D r jD 0 = 0), the MLE for D in the first-stage, ‘OLS’ that is the LSE under no selection bias ( = 0), and ‘SelCorec’ that is the LSE for the second-stage with the correction terms; the estimates for 0u , 0 and r are omitted. For each entry, Bias and SD from 500 simulation repetitions are shown. In Table C.1, we set N = 500 and = 0; as = 0, OPR and OLS are consistent. OPR and MLE perform almost the same; also OLS and SelCorec perform similarly. Despite the relatively small sample size (N = 500), hardly any bias can be seen. In Table C.2, we set N = 500 and = 0:7 under which OPR and OLS are inconsistent. Whereas only the intercept (and 3 1 ) is 104 much biased in OPR, OLS is heavily biased all around. In contrast, MLE and SelCorec have almost no bias. 
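The mapping from the correlations to the projection coefficients in the simulation design of Section C.6 can be checked numerically. The sketch below is an illustration under the stated design only (a common correlation `rho` among the two first-stage errors and U, unit variances for the first-stage errors, and SD(U) = 2); the function and variable names are assumptions for the example, not the author's code. It builds the 3x3 covariance matrix A, computes E(U eps'){E(eps eps')}^{-1}, and prints it next to the closed form sigma_u * rho / (1 + rho).

```python
import numpy as np

def projection_coefs(rho, sigma_u):
    """Return E(U eps') {E(eps eps')}^{-1} when all three correlations equal rho."""
    # Covariance matrix A of (eps_0, eps_r, U) in the simulation design.
    A = np.array([[1.0,           rho,           rho * sigma_u],
                  [rho,           1.0,           rho * sigma_u],
                  [rho * sigma_u, rho * sigma_u, sigma_u ** 2]])
    cov_eps = A[:2, :2]     # E(eps eps'), the 2x2 block for the first-stage errors
    cov_u_eps = A[2, :2]    # E(U eps'), the covariances of U with the two errors
    # cov_eps is symmetric, so solving cov_eps x = cov_u_eps gives the same
    # vector as the row-vector product cov_u_eps @ inv(cov_eps).
    return np.linalg.solve(cov_eps, cov_u_eps)

sigma_u = 2.0
for rho in (0.0, 0.7):
    delta = projection_coefs(rho, sigma_u)
    closed_form = sigma_u * rho / (1.0 + rho)
    print(rho, delta, closed_form)
```

Both elements of the returned vector coincide because the two first-stage errors have the same correlation with U in this design.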
Table C.1: BIAS (SD) with N = 500 and = 0 First Stage Second Stage OPR MLE OLS SelCorec r intercept 0.03 (0.20) 0.03 (0.21) x intercept 0.00 (0.12) 0.00 (0.16) r slope 0.03 (0.11) 0.02 (0.11) x slope 0.00 (0.06) 0.00 (0.08) 2 1 = 1 0.01 (0.15) 0.01 (0.15) 1 = 0 0.00 (0.32) 0.00 (0.43) 3 1 = 2 0.04 (0.21) 0.04 (0.22) 2 = 1 0.01 (0.38) 0.01 (0.42) = 0 0.01 (0.21) 3 = 2 0.00 (0.40) -0.01 (0.44) 4 = 3 0.02 (0.41) 0.02 (0.52) Table C.2: BIAS (SD) with N = 500 and = 0:7 First Stage Second Stage OPR MLE OLS SelCorec r intercept -0.27 (0.17) 0.04 (0.17) x intercept 0.47 (0.11) 0.00 (0.15) r slope -0.04 (0.11) 0.02 (0.10) x slope -0.24 (0.06) 0.00 (0.08) 2 1 = 1 0.06 (0.16) 0.02 (0.15) 1 = 0 -1.90 (0.30) 0.01 (0.44) 3 1 = 2 0.12 (0.23) 0.04 (0.22) 2 = 1 -0.99 (0.33) 0.01 (0.35) = 0:7 0.00 (0.13) 3 = 2 -0.56 (0.35) -0.01 (0.38) 4 = 3 0.05 (0.41) -0.01 (0.47) In Table C.3, we setN = 1000 and = 0:7 under which OPR and OLS are inconsistent. Indeed the bias in OPR and OLS stays the same as in Table C.2 despite the twice greater sample size; SD’s clearly decreased. The biases in MLE and SelCorec are all near zero; SD’s decreased obviously. 
105 Table C.3: BIAS (SD) with N = 1000 and = 0:7 First Stage Second Stage OPR MLE OLS SelCorec r intercept -0.29 (0.11) 0.02 (0.11) x intercept 0.47 (0.08) 0.00 (0.11) r slope -0.05 (0.07) 0.01 (0.06) x slope -0.24 (0.04) 0.00 (0.06) 2 1 = 1 0.05 (0.10) 0.01 (0.10) 1 = 0 -1.89 (0.20) 0.00 (0.29) 3 1 = 2 0.11 (0.16) 0.03 (0.15) 2 = 1 -0.99 (0.22) 0.00 (0.24) = 0:7 0.00 (0.09) 3 = 2 -0.56 (0.25) -0.01 (0.27) 4 = 3 0.04 (0.27) -0.02 (0.31) C.7 Derivation of the CF’s under Double-Index Assump- tion Substitute (2.7) into the selection correction part of (2.6) to obtain D 0 f 00 + 01 W 0 0 ^ 0 + 02 (W 0 0 ^ 0 ) 2 g + R X d=1 d f d0 + d1 W 0 0 ^ 0 + d2 (W 0 0 ^ 0 ) 2 + d3 (W 0 0 ^ 0 W 0 r ^ r )g = (1 R X d=1 d ) 00 + 01 (1 R X d=1 d )W 0 0 ^ 0 + 02 (1 R X d=1 d )(W 0 0 ^ 0 ) 2 + R X d=1 d0 d + R X d=1 d1 d W 0 0 ^ 0 + R X d=1 d2 d (W 0 0 ^ 0 ) 2 + R X d=1 d3 d (W 0 0 ^ 0 W 0 r ^ r ): In (1 P R d=1 d ) 00 = 00 00 P R d=1 d , 00 is merged into the intercept inX 0 x , and 00 P R d=1 d is merged into P R d=1 d d in the regression function which also absorbs P R d=1 d0 d in this display; i.e., P R d=1 d d in the regression function becomes P R d=1 ( d0 00 + d ) d . Using this and collecting terms with W 0 0 ^ 0 , (W 0 0 ^ 0 ) 2 and W 0 0 ^ 0 W 0 r ^ r gives (2.8). 106
Abstract
This thesis brings together three research papers that investigate real-world situations both theoretically and empirically. The papers develop extended versions of basic causal-inference methods that focus on uncovering causal relationships.
The first paper discusses a Multiple-assignment variable Regression Discontinuity design (MRD). Unlike the canonical regression discontinuity (RD) design, this method uses more than one assignment variable. The paper's main contribution is to establish identification for the fuzzy MRD case.
The second paper applies the MRD to analyze the effect of Affirmative Action in Korea on the women employment rate. Affirmative Action in Korea uses two assignment variables to determine the policy targets: the number of employees and the average women employment rate by industry. Because few empirical studies have used MRD, this new application adds to a thin empirical literature.
The third paper develops an approach to estimate the effects of partly ordered treatments while correcting for possible treatment endogeneity with nearly parametric control functions. My coauthor and I use this control function approach, along with its supplementary version, to estimate the effects of military ranks (ordered treatments) on wages relative to non-veteran status (control treatment) with the Wisconsin Longitudinal Study data. In our empirical analysis, the military rank effects differ considerably: officer rank has large positive effects, whereas enlisted ranks have small or no effects.
Asset Metadata
Creator
Ju, Youngmin (author)
Core Title
Essays on causal inference
School
College of Letters, Arts and Sciences
Degree
Doctor of Philosophy
Degree Program
Economics
Degree Conferral Date
2021-08
Publication Date
06/28/2021
Defense Date
05/31/2021
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
Affirmative Action,causal inference,control function approach,fuzzy,identification,military rank premium,MRD,multiple assignment variables,multiple-assignment variable regression discontinuity,OAI-PMH Harvest,partly ordered endogenous treatments,RD,regression discontinuity,women employment rate
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Ridder, Geert (committee chair), Hsiao, Cheng (committee member), Moon, Hyungsik Roger (committee member), Nix, Emily (committee member)
Creator Email
ymju86@gmail.com
Permanent Link (DOI)
http://doi.org/10.25549/usctheses-c89-471615
Unique identifier
UC13012643
Identifier
etd-JuYoungmin-9674.pdf (filename), usctheses-c89-471615 (legacy record id)
Legacy Identifier
etd-JuYoungmin-9674
Dmrecord
471615
Document Type
Dissertation
Rights
Ju, Youngmin
Internet Media Type
application/pdf
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA