ESSAYS ON IMPROVING HUMAN INTERACTIONS WITH HUMANS, ALGORITHMS, AND TECHNOLOGIES FOR BETTER HEALTHCARE OUTCOMES

by Wilson Wai-Gin Lin

A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (BUSINESS ADMINISTRATION)

August 2022

Copyright 2022 Wilson Wai-Gin Lin

Dedicated to my family, all of my teachers, and the teachers of my teachers.

Acknowledgements

I am greatly indebted to my advisors, Professor Sampath Rajagopalan, Professor Song-Hee Kim, Professor Tianshu Sun, and Professor Amy Ward, for their strong support throughout my time at the University of Southern California. Professor Sampath Rajagopalan provided sound guidance on my research trajectory and mentoring on my academic job market and career. Professor Song-Hee Kim helped me gain expertise in empirical operations management, healthcare management, and behavioral operations management research, and connected me with Keck Medicine of USC, which has led to meaningful discussions and collaborations. Professor Tianshu Sun helped me gain expertise in empirical research and has introduced me to a number of interdisciplinary research communities. Professor Amy Ward introduced me to operations management research as an undergraduate, and served as my first-year PhD advisor. All have helped build confidence in myself and have enabled me to become an interdisciplinary researcher. It has been a privilege to have the opportunity to collaborate with each of them. Moreover, I am also grateful to Professor Yan Liu for serving on my dissertation committee, and to Professor Florenta Teodoridis and Professor Peng Shi for serving on my qualifying exam committee.

I would like to also thank my co-authors Professor Susan Lu and Professor Jordan Tong for their expertise, mentorship, and gracious support in my development as a researcher. I am also appreciative of my clinical collaborators, Mary Rose Deraco, Anjali Mahoney, Sirisha Mohan, Francis Reyes Orozco, and Jehni Robinson, as well as Keck Medicine of USC, for supporting me and making our research possible. My appreciation goes to the faculty and staff in the Department of Data Sciences and Operations for their inspiration and sound guidance on research, teaching, and life. I am also grateful for my DSO PhD student community for their insight and camaraderie.

Finally, I would like to thank my family for their continual support and belief in me and my pursuits in life.

Table of Contents

Dedication
Acknowledgements
List of Tables
List of Figures
Abstract

Chapter 1: What Drives Algorithm Use? An Empirical Analysis of Algorithm Use in Type 1 Diabetes Self-Management
1.1 Introduction
1.2 Hypotheses Development
1.2.1 Previous Performance Feedback
1.2.2 Task Difficulty and the Need for Precision
1.2.3 Previous Deviations
1.2.4 Multiple Algorithm Input Sources
1.3 Empirical Setting and Data Overview
1.3.1 Empirical Setting
1.3.2 Data Description
1.4 Does Dynamic Algorithm Aversion Exist in the Field?
1.4.1 Data and Measures
1.4.2 Sample Selection
1.4.3 Summary Statistics
1.4.4 Model Free Evidence
1.4.5 Identification
1.4.6 Results
1.4.7 Discussion
1.5 Do People use Algorithms More for Difficult Tasks?
1.5.1 Data and Measures
1.5.2 Sample Selection
1.5.3 Summary Statistics and Model Free Evidence
1.5.4 Identification
1.5.5 Results
1.5.6 Discussion
1.6 Do Previous Deviations from Algorithm Recommendations Impact Algorithm Use?
1.6.1 Data Set and Measures
1.6.2 Sample Selection
1.6.3 Summary Statistics and Model Free Evidence
1.6.4 Identification
1.6.5 Results
1.6.6 Discussion
1.7 Does Reducing Algorithm Input Sources Impact Algorithm Use?
1.7.1 Data and Measures
1.7.2 Sample Selection
1.7.3 Model Free Evidence
1.7.4 Identification
1.7.5 Results
1.7.6 Mechanism Analysis
1.7.7 Discussion
1.8 General Discussion
1.8.1 Results Summary and Managerial Implications
1.8.2 Interpreting our Rejection of the Dynamic Algorithm Aversion Hypothesis
1.8.3 Limitations and Future Work
1.9 Additional Tables
Chapter 2: Worker Experience and Donor Heterogeneity: The Impact of Charitable Workers on Donors' Blood Donation Decisions
2.1 Introduction
2.2 Related Literature
2.2.1 Charitable Giving and Non-Profit Operations
2.2.2 Staffing Management
2.3 Empirical Setting: Operations in the Blood Bank
2.4 Hypotheses Development
2.5 Data and Empirical Strategy
2.5.1 Data Description
2.5.2 Measures
2.5.2.1 Outcome Variables and Explanatory Variables
2.5.2.2 Control Variables
2.5.3 Identification and Specification
2.6 Results
2.6.1 The Effect of Experience: Voluntary versus Group
2.6.2 The Heterogeneous Effect of Experience by Donor Self-Efficacy
2.7 Robustness Checks
2.7.1 Alternative Measure: Days Worked
2.7.2 Analysis on Group Donors
2.7.3 Alternative Explanations
2.7.4 Incentives
2.7.5 Workload
2.8 Managerial Implications for Non-Profit Operations
2.8.1 Back of the Envelope Calculation on Potential Benefits of Alternative Matching
2.8.2 The Learning-Limited Learning Nature of Voluntary Donation Experience
2.9 Discussion and Conclusion
2.10 Appendix
2.10.1 Supplemental Descriptive Statistics and Background
2.10.1.1 Discussion on Nurse Location Assignment
2.10.2 Information About Donor Satisfaction Surveys
2.10.3 Regression Tables for Robustness Checks
2.10.4 Managerial Implications: Counterfactual Analyses and Intervention Comparisons
2.10.4.1 Simulation Methods
2.10.4.2 Simulation with Personalized Nurse-Donor Pairings
2.10.4.3 Constrained Optimization
Chapter 3: Patient Perceptions of Synchronous Telemedicine Video Visits in a Fee-for-Service Model
3.1 Introduction
3.2 Methods
3.2.1 Study Design and Setting
3.2.2 USC Telemedicine Program
3.2.3 Telemedicine Video Visit Survey
3.2.4 Analyses
3.2.4.1 Survey Sampling
3.2.4.2 Data Coding
3.2.4.3 Distance from Clinic
3.2.4.4 Statistical Analysis
3.3 Results
3.3.1 Primary Analyses
3.3.1.1 Demographics and Baseline Medical Appointment Visits
3.3.1.2 Transportation
3.3.1.3 Distance
3.3.1.4 Logistics
3.3.1.5 Comparisons to Office Visits
3.3.1.6 Provider Experience
3.3.1.7 Privacy
3.3.1.8 Outlook
3.3.2 Secondary Analyses
3.3.3 Copayment
3.3.4 Other Feedback
3.4 Discussion
Chapter 4: Future Directions
4.1 Future Directions for Chapter 1
4.2 Future Directions for Chapter 2
4.3 Future Directions for Chapter 3
4.4 Broader Research Directions
References

List of Tables

1.1 Testing Hypothesis 1: Summary Statistics
1.2 Testing Hypothesis 1: Model Free Evidence
1.3 Testing Hypothesis 1: Regression Results
1.4 Testing Hypothesis 2: Regression Results
1.5 Testing Hypothesis 3a and 3b: Regression Results
1.6 Testing Hypothesis 4a and 4b: Regression Results
1.7 Mechanism Evidence for Hypothesis 4b
1.8 Results Summary
1.9 Testing Hypothesis 1: Regression Results (Daily Level)
1.10 Testing Hypothesis 2: Regression Results (Logit Model)
1.11 Testing Hypotheses 3a and 3b: Regression Results (Logit Model)
1.12 Testing Hypothesis 4a and 4b: Regression Results (Daily Level)
1.13 Mechanism Evidence for Hypothesis 4a (Daily Level)
2.1 Summary Statistics and Correlations of Variables of Interest
2.2 The Impact of Voluntary versus Group Donation Experience on Donation Volume Decisions
2.3 The Heterogeneous Effects of Voluntary Donation Experience on Donor Groups for Donation Volume Decisions
2.4 The Learning-Limited Learning Nature of Voluntary Donation Experience
2.5 Comparison between Voluntary and Group Donations
2.6 Donor Groups and Donation Outcomes
2.7 Extended Summary Statistics
2.8 Relationship between Nurse Experience and Location Type Assignment
2.9 Locations Worked
2.10 Alternative Measures for Voluntary Donation Experience (Days Worked)
2.11 Effect of Different Experiences using the Sample of Group Donations
2.12 Mechanism Analysis: Peer Effects
2.13 Certain Nurses Driving Result
2.14 Multinomial Logit Model Specifications for Hypotheses 1-2
2.15 The Effect of Incentives
2.16 Analysis on Pass Rate
2.17 Workload Analysis
2.18 Summary of Interventions That Could be Done at the Blood Bank and Our Nurse Experience Staffing Framework
3.1 Participant Baseline Medical Appointment Experiences and Characteristics
3.2 Participant Demographics and Visit Distances
3.3 Secondary Analyses (Means of Responses Provided by Demographic Group)

List of Figures

1.1 Testing Hypothesis 1: Predicted Percent Algorithm Use by Previous Algorithm Use and Out of Target
1.2 Testing Hypothesis 2: Model Free Evidence
1.3 Variation in Deviations from Algorithm Recommendations on Algorithm Use
1.4 Testing Hypotheses 4a and 4b: Model Free Evidence
1.5 Mechanism Evidence for Hypothesis 4b: Model Free Evidence
3.1 Video Visits Rated Better than Office Visits in 3 Domains
3.2 Times and days of the week in which patients were willing to use video visits (multiple selections allowed)

Abstract

Significant new health technologies and corresponding processes and digital infrastructure have emerged recently to improve operational and health outcomes. However, human behavior, such as non-compliance, beliefs, and biases, can limit the effectiveness of these developments. This dissertation utilizes empirical methods (causal inference and surveys) to study how we can improve human interactions with other humans, algorithms, and technologies to enable better healthcare outcomes. We summarize the three primary chapters below:

1. Examining Human-Algorithm Interactions. Chapter 1, which is a joint work with Song-Hee Kim and Jordan Tong, explores the question of what circumstances may drive individuals to use or avoid using algorithms in a field setting. Using the bolus calculator (algorithm) use behavior in over 306,000 bolus insulin decisions from a field experiment on Type 1 Diabetes self-management (Aleppo et al. 2017), we contribute field analysis to identify drivers of algorithm use. We first focus on an influential experimental finding from Dietvorst et al. (2015) that we refer to as dynamic algorithm aversion – an asymmetric usage response to performance feedback that favors humans over the algorithm. Using panel data models, we reject this hypothesis, instead finding an asymmetric usage response in favor of the algorithm over the human. Moreover, we find that previous algorithm use strongly predicts future algorithm use, and that algorithm use declines from morning to evening. We explore three additional factors that affect algorithm use: one's need to be precise, deviations from algorithm recommendations, and exposure to multiple, potentially conflicting algorithm input sources. Using linear probability models, we find that algorithm use increases as one's need for precision increases, and that previous deviations from algorithm recommendations lead to lower future algorithm use. Finally, we leverage an experimental design feature from the original field data with a differences-in-differences analysis to show that increasing the number of measurements provided to the user for a single algorithm input decreases algorithm use. Our field results complement experimental findings and generate new insight into levers that affect algorithm use.

2. Improving Charitable Productivity Via Worker Experience and Donor Heterogeneity. Chapter 2, which is a joint work with Susan Lu and Tianshu Sun, explores whether and how a charitable organization's front-line staff members can be effectively positioned to encourage donors to donate more (in compliance with the eligibility rules) during their in-person interactions. Specifically, we consider how charitable organizations can use micro-level data on worker-donor interactions to increase donation amounts by maximizing the influence of a staff member on donors' decisions, informed by an understanding of workers' experiences and donors' characteristics. Using a dataset at the nurse-donor interaction level from a blood bank partner, we analyze the role of nurses' experience in driving charitable productivity as well as the moderating effect of nurse experience with donors' self-efficacy.
We find that the effect of the charitable worker on charitable productivity strongly depends on the nurse's experience with voluntary blood donors, with whom nurses can discuss donation volume choices, whereas experience gained with group donors (who primarily come from organization-sponsored blood drives and typically choose the lowest donation amount) appears to be less beneficial. Moreover, a nurse's voluntary donation experience can encourage donors who perceive reduced control over their donation to have greater self-efficacy in choosing higher donation volumes. We identify that when taking these insights on staffing-donor interactions into account, improved matching between nurses and donors can increase blood donation volume by up to 5%, which is an economically significant benefit for the blood bank. Our proposed framework can help charitable organizations improve their productivity simply from the personnel end.

3. Developing Streamlined Telehealth Services. Chapter 3, which is a joint work with Sirisha Mohan, Francis Reyes Orozco, Jehni Robinson, and Anjali Mahoney, studies patient perceptions of experience and co-payment for those who receive telemedicine services in the fee-for-service setting of an academic medical center's family medicine department. To the best of our knowledge, this study is the first to investigate patient sentiments on both experiential and financial aspects of telemedicine primary care services when co-payments have been collected as pertinent to one's insurance. We report the results of a 53-question cross-sectional digital survey delivered to patients' e-mail addresses after their telemedicine visits. Of 3,414 potential respondents, 903 patients responded, corresponding to a 26.7% effective response rate; 797 completed the survey. Of these, 91% described their video visit experience as more convenient than office-based care, 74% reported shorter wait times, 87% felt confident about protection of privacy, and 91% are willing to use telemedicine again. However, 29% perceived copayments to be unreasonable. The findings suggest that telemedicine offers convenience and consistency with continuity, and they corroborate previous studies investigating telemedicine viewpoints. Payors should consider copayments in detail when designing telehealth benefits to ensure they do not become a barrier to seeking care. Chapter 3 is published in the Journal of the American Board of Family Medicine: https://www.jabfm.org/content/35/3/497.

These three chapters demonstrate the value of understanding the micro behavior of both workers and patients, as such behaviors are non-trivial determinants of operational and performance outcomes. Consequently, as part of an organization's digital transformation, organizations should leverage both the digital trace data that comes from operational processes and the routinely collected feedback from individuals (e.g., via surveys) to identify improvement opportunities. Conducting such analyses, driven by an understanding of the operational context as well as the underlying theory and behavior, can lead to new insights and identify new areas for improvement. We describe several future directions inspired by these three chapters in Chapter 4.
Chapter 1
What Drives Algorithm Use? An Empirical Analysis of Algorithm Use in Type 1 Diabetes Self-Management

Joint work with Song-Hee Kim and Jordan Tong

1.1 Introduction

Advancements in decision-support algorithms show significant promise for enhancing operations in a multitude of contexts, such as retail (Kesavan and Kushwaha 2020, Caro and de Tejada Cuenca 2018), warehouse management (Bai et al. 2020, Sun et al. 2020), and healthcare (Adjerid et al. 2019, Ayvaci et al. 2021, Kamalzadeh et al. 2021, Bertsimas et al. 2017), by facilitating monitoring and improving users' decision-making (Bretthauer and Savin 2018, Dai and Tayur 2020, de Véricourt and Perakis 2020). Across a wide range of contexts, algorithms tend to outperform human judgment (Meehl 1954, Grove et al. 2000). One significant challenge to achieving such benefits in practice, however, is behavioral: people sometimes simply choose to use their own judgment over that of an algorithm (Frick 2015). Thus, even when the technology functions properly and algorithms recommend superior prescriptions, the operational benefits are not realized because people fail to use the algorithm.

Why would people fail to use algorithms even if they are superior? Perhaps the most well-known and popularized reason provided is that people exhibit so-called "algorithm aversion," a term made influential by the experimental paper Dietvorst et al. (2015). As observed by Logg et al. (2019) and Harrell (2016), the idea is so prevalent that the popular press, business leaders, and companies have invested in ways to avoid or mitigate it. A precise definition of algorithm aversion is, however, somewhat elusive. Some use it to refer to a general preference to use humans over algorithms, controlling for performance (Meehl 1954, Fildes and Goodwin 2007, Harrell 2016). However, Dietvorst et al. (2015) conjectures (and finds evidence for) the more specific observation of "a general tendency for people to more quickly lose confidence in algorithmic than human forecasters after seeing them make the same mistake" (p. 115). This more precise notion of algorithm aversion – an asymmetric usage response to performance feedback that favors humans over algorithms – is what we focus on first in this paper. We will refer to it as dynamic algorithm aversion to highlight the idea that the definition is a reaction to feedback over time and is distinct from a general preference to avoid algorithms.

Despite the attention algorithm aversion has received and its potential importance, the existing evidence to date on dynamic algorithm aversion – indeed, for algorithm aversion broadly defined – is largely limited to laboratory experiments. According to Burton et al. (2020), only 3 out of 61 papers on algorithm aversion were field/ethnography studies – and none of these 3 (Sanders and Courtney 1985, Lamberti and Wallace 1990, Christin 2017) address dynamic algorithm aversion. Sanders and Courtney (1985) and Lamberti and Wallace (1990) discuss conditions that are associated with decision support tool and expert system use, and the studies are conducted via a survey and a field experiment, respectively. Christin (2017) is an ethnographic study describing how journalists and criminal justice professionals use and systematically avoid decision support tools in the field, in contrast to their managements' positive perceptions of such tools. The environmental conditions of the lab and field may have meaningful differences that affect whether dynamic algorithm aversion applies.
For example, although dynamic algorithm aversion regards the response to performance feedback over time, laboratory experiments typically consider only a limited number of rounds of feedback and decisions in experiments that last on the order of minutes. Dietvorst et al. (2015) has only two stages – an initial stage during which participants receive performance feedback, and a second stage for which participants decide whether to use the algorithm. Prahl and Van Swol (2017) also finds laboratory evidence akin to dynamic algorithm aversion in a task with 14 forecasting decisions, though their setting is somewhat different (the subject decides how to adjust their forecast after advice from an algorithm or a different human advisor). Some laboratory evidence suggests that more feedback matters, too. For example, Filiz et al. (2021) find that people learn to increase their algorithm use over 40 rounds of feedback (for an algorithm that generally performs better than humans). That is, there are practical limitations of laboratory experiments that make them unable to capture features of interactions with algorithms that can last weeks or months and involve hundreds of rounds of feedback and usage opportunities, as is often required as part of real-world operations.

To answer this call, we study Type 1 Diabetes patients who use algorithm-enabled technologies in their diabetes self-management. Such patients cannot generate insulin in their pancreas, and so they need to deliver insulin appropriately (for food, stress, etc.) in stochastic control of their glucose levels. Typically, the objective is to keep glucose levels within 70-180 mg/dl (Battelino et al. 2019). We focus on the decisions they make for bolus insulin dosing, which typically occur 5-7 times in a day (Snider 2018) and can involve the use of a bolus calculator (henceforth, the "algorithm"). The algorithm takes the patient's settings, conditions, and inputs, and provides a recommendation on how much insulin to bolus. Using the algorithm has been shown to reduce postprandial blood glucose levels and improve quality of life (Gross et al. 2003, Klupa et al. 2008, Gonzalez et al. 2016). The American Diabetes Association's Standards of Care suggests that patients may find such an algorithm helpful in their diabetes self-management (Association et al. 2021).

Utilizing detailed diabetes device data from a randomized controlled trial on Type 1 Diabetes patients (Aleppo et al. 2017), we perform an econometric study on when and how patients interact with this algorithm by analyzing over 306,000 bolus decisions over 6 months. First, we reject the dynamic algorithm aversion hypothesis in this insulin dosing context. Using panel data models, we show that patients' likelihood to use the algorithm is impacted by previous performance from algorithm-driven decisions or human-driven decisions in a manner that is inconsistent with dynamic algorithm aversion. Conditional on algorithm-driven decisions last period, patients did not significantly avoid algorithms more after seeing the algorithm err relative to seeing it succeed (0.1 percentage points less likely to use the algorithm). Conditional on human-driven decisions last period, patients did significantly favor using algorithms more after seeing themselves err relative to seeing themselves succeed (1.8 percentage points more likely to use the algorithm).
These findings suggest an asymmetric reaction to performance feedback that favors the algorithm – patients punished themselves more severely for seeing the human err than they punished the algorithm for seeing the algorithm err.

Our findings not only reject dynamic algorithm aversion, but also show more broadly that past performance is not as strong a predictor of algorithm use as other – perhaps less popularized – factors. Notably, there is a strong habitual component to algorithm use; strong enough to overpower the effect of past performance. Moreover, there is a significant time-of-day effect, with algorithms being used at higher rates in the morning than later in the day.

Given the evidence contrary to the dynamic algorithm aversion hypothesis, we next consider three other possible algorithm use determinants: the need for precision, deviations from algorithm recommendations, and the number of input sources to consider for an algorithm. First, we find that algorithm use increases as the need for precision increases. This finding is consistent with an egocentric bias (Kruger 1999) – more specifically, a pattern of overconfidence in one's own judgment for easy tasks (requiring less precision) but under-confidence in one's own judgment for hard tasks (requiring more precision), as argued in Dietvorst et al. (2015). Second, we find that deviations from algorithm recommendations lead to reduced future algorithm use. This finding is consistent with the notion that deviations represent lower appreciation for the algorithm's advice (Logg et al. 2019). Third, leveraging a fortunate experimental design feature from the original study, we use a differences-in-differences (DID) analysis to show that exposing patients to multiple (potentially differing) measurements required for an algorithm input reduces their algorithm use. This result suggests that multiple measurements may cause people to experience cognitive dissonance (Festinger 1957) and react by avoiding the algorithm altogether.

Our results, as summarized in Table 1.8, contribute to the existing literature by analyzing field evidence that tests several of the literature's hypotheses regarding algorithm use: challenging some of these conjectures, while supporting others. In doing so, we also generate new insight into when we should expect algorithm use to be higher or lower, and levers one can pull to influence these usage patterns. More generally, our work contributes to the burgeoning operations management literature on how human-algorithm interactions impact performance, such as when and why people may deviate from an algorithm's recommendations (Caro and de Tejada Cuenca 2018, Sun et al. 2020, Kesavan and Kushwaha 2020) and how fairness concerns may lead to greater appreciation for algorithms (Bai et al. 2020), as well as on precision medicine (Dai and Tayur 2020, Adjerid et al. 2019, Ayvaci et al. 2021, Kamalzadeh et al. 2021, Bertsimas et al. 2017). Going beyond the operations management literature, there is a rising number of empirical works surrounding human-algorithm interactions that answer inquiries such as how expert algorithm recommendations can impact one's health insurance choices (Bundorf et al. 2019), and how the representation of AI matters in impacting performance outcomes (Luo et al. 2019, Glikson and Woolley 2020, Tong et al. 2021).

The rest of the paper is structured as follows. First, we describe our data and setting in §1.3.
§1.4 describes the analyses we perform for testing dynamic algorithm aversion in the field. §1.5, §1.6, and §1.7 follow up on our results on dynamic algorithm aversion and consider the impact of three algorithm use determinants. §1.8 provides a discussion and concludes the paper.

1.2 Hypotheses Development

In this section, we use the literature to motivate several hypotheses about factors that may influence algorithm use. Here, we define algorithm "use" as following the required steps to observe the algorithm's recommendation. It is distinct from adherence to the algorithm's advice. We start with the role of previous performance feedback (§1.2.1). Then, we address the desire to be precise (§1.2.2). Next, we consider past deviations from the algorithm's recommendation (§1.2.3). Finally, we discuss the issue of conflicting algorithm inputs (§1.2.4).

1.2.1 Previous Performance Feedback

In settings with repeated decisions under risk and outcome feedback, people have the opportunity to learn and adapt from their experience. In such settings, people (a) naturally gravitate towards the option that appears the best in light of their experienced outcomes with that option, and (b) tend to weight recent outcomes more than earlier ones (Hertwig et al. 2006). These intuitive patterns suggest that a positive outcome with an algorithm should increase one's likelihood of using the algorithm next time, while a negative outcome should decrease one's likelihood of doing so again next time. Similarly, a positive outcome using only human judgment should increase one's likelihood of using only human judgment next time, while a negative outcome should decrease one's likelihood of doing so again next time.

However, as mentioned in the introduction, recent notable experimental evidence suggests that these usage responses to positive and negative outcomes are asymmetric for algorithms relative to human judgment. Specifically, Dietvorst et al. (2015) hypothesize that there exists "a general tendency for people to more quickly lose confidence in algorithmic than human forecasters after seeing them make the same mistake" (p. 115). We therefore formulate the following hypothesis:

Hypothesis 1 (Dynamic Algorithm Aversion). There is an asymmetric usage response to performance feedback that favors the human over the algorithm.

We emphasize that we refer to this hypothesis as the dynamic algorithm aversion hypothesis, where we add the word "dynamic" to indicate that it refers to a response pattern to previous decisions and outcomes. It does not refer to a general tendency for humans to favor humans over algorithms. In fact, as discussed in Logg et al. (2019), people generally tend to appreciate and take the advice of an algorithm more than they do a human's advice – and such behavior does not contradict dynamic algorithm aversion.

1.2.2 Task Difficulty and the Need for Precision

When making comparative ability judgments, the literature in judgment and decision-making suggests that people tend to display an egocentric bias that causes them to (i) overestimate their relative ability for easy tasks for which they have high absolute performance, but (ii) underestimate their relative ability for hard tasks for which they have low absolute performance (e.g., see Kruger 1999). In the context of choosing between one's own judgment and an algorithm's recommendation, this egocentric bias implies a relatively stronger reliance on one's own judgment for "easier" tasks relative to "harder" tasks. Consistent with this idea, Dietvorst et al.
(2015) found that participants chose to use the algorithm more when they needed to be very precise (within 5 percentile points) in order to earn a bonus compared to when they did not need to be as precise (within 20 percentile points). Supporting this finding, they found that "although participants' confidence in the model's forecasting ability did not differ between the 20-percentile and the other payment conditions, they were significantly more confident in their own forecasting ability in the 20-percentile condition" (p. 120). In other words, people's confidence in their relative ability (compared to the algorithm) was higher for easier tasks requiring less precision than it was for harder tasks requiring greater precision. We therefore hypothesize:

Hypothesis 2 (Need for Precision). People will tend to rely on the algorithm more for harder tasks requiring greater precision relative to easier tasks requiring less precision.

1.2.3 Previous Deviations

Commonly, people over-ride or deviate from an algorithm's recommendation (Kesavan and Kushwaha 2020, Caro and de Tejada Cuenca 2018, Sun et al. 2020, Karlinsky-Shichor and Netzer 2019, Wladawsky-Berger 2020). Depending on the context, researchers find that these deviations sometimes improve performance but other times they degrade performance (Kesavan and Kushwaha 2020, Tan and Staats 2020, Ibanez et al. 2018, Fildes et al. 2009). Deviations are more likely to improve performance in situations where the human has more private information or domain knowledge that the algorithm cannot access (Lawrence et al. 2006, Sun et al. 2020, Ibrahim et al. 2021).

The literature is relatively quiet about whether deviations from algorithms are associated with increased or decreased algorithm use. However, based on findings in which using the algorithm is either mandatory or never used, we believe there are reasons to hypothesize that observing deviations could lead to higher or lower algorithm use.

On one hand, observing deviations from an algorithm's recommendation may indicate that a person is using the algorithm's advice and that they appreciate their flexibility to modify it. As mentioned above, a person may be using the algorithm's recommendation as an anchor, then adjusting based on their private information. Relatedly, Dietvorst et al. (2018) found that people are more willing to use algorithms when they are allowed to deviate freely or up to a certain extent from its recommendations (versus forced to take the algorithm's recommendations). Thus, we have:

Hypothesis 3a (Deviations From Algorithm Recommendations A). Deviations from algorithm recommendations are associated with an increase in algorithm use.

On the other hand, researchers tend to interpret a human who weights the (always observed) advice of an algorithm more heavily as they make their judgments as showing "algorithm appreciation" (Logg et al. 2019). Therefore, one might conjecture that observing fewer deviations from the algorithm indicates that a person appreciates the process of using the algorithm more and therefore will also use it more in the future. Thus, we also have:

Hypothesis 3b (Deviations From Algorithm Recommendations B). Deviations from algorithm recommendations are associated with a decrease in algorithm use.

1.2.4 Multiple Algorithm Input Sources

Using an algorithm often requires the user to provide input data on which the algorithm would make its recommendation. Of course, algorithms rely on accurate data inputs to make their recommendations; as the saying goes, "garbage in, garbage out." So, how do users respond when they receive multiple, potentially conflicting measurements but can only input one value into the algorithm? We again believe there exist theories that suggest that users could respond to such conflicting information by either increasing or decreasing their algorithm use.

On one hand, having multiple measurements may increase one's confidence in their ability to provide an accurate single input due to an appreciation for the wisdom of crowds (e.g., see Surowiecki 2005). Having multiple measurements is likely to reduce the problem of measurement noise if one is able to aggregate the data appropriately (e.g., by taking the average, see Clemen 1989). Therefore, we hypothesize:

Hypothesis 4a (Conflicting Inputs A). People who receive multiple sources of measurements for an algorithm input are more likely to use the algorithm than people who receive only one measurement.

On the other hand, having multiple measurements but being required to only provide one number may make the user uncomfortable with the whole process. That is, they may experience cognitive dissonance (Festinger 1957). In these situations, research on cognitive dissonance suggests that people are motivated to reduce this tension – and in this case, one way to do so is to avoid using algorithms altogether. Relatedly, people often underappreciate the value of the wisdom of crowds (Larrick and Soll 2006) or may only consider the uncertainty when provided multiple conflicting data points (e.g., see Juslin et al. 2007). These phenomena support the following opposite hypothesis:

Hypothesis 4b (Conflicting Inputs B). People who receive multiple sources of measurements for an algorithm input are less likely to use the algorithm than people who receive only one measurement.

1.3 Empirical Setting and Data Overview

1.3.1 Empirical Setting

To test our hypotheses, we leverage a data set from a field experiment that tracked Type 1 Diabetes patients' algorithm usage patterns and performance (Aleppo et al. 2017). Type 1 Diabetes patients, who compose about 1 in 250 people worldwide (DiMeglio et al. 2018), have multiple opportunities to use algorithms and receive performance feedback on such decisions as part of their daily routines. The condition is such that patients can no longer generate insulin in their pancreas (which helps to decrease glucose levels), and so they need to deliver insulin appropriately in stochastic control of their glucose levels. Typically, the objective is to keep one's glucose levels within 70-180 mg/dl and to maintain such levels for 70 percent of the time or more (Battelino et al. 2019). The proportion of time that one's glucose levels are within range is commonly referred to as "time in range."

Dosing insulin can occur in two domains: basal (insulin to cover baseline needs) and bolus (insulin to cover larger blood glucose spikes from food or to correct from a period of higher blood glucose). The delivery of basal insulin is typically pre-set according to a schedule determined by the patient and provider. On the other hand, bolus insulin delivery decisions require dynamic decision-making to determine the appropriate amount of insulin to cover blood glucose spikes from food or correct from being above range. Bolus insulin dosing is where the algorithm that we study – bolus calculators – has been developed to aid with human decision making.

The algorithm, which is part of a patient's insulin pump, acts as a decision support tool to determine the appropriate bolus amount. At the time of a proposed bolus, patients can input their glucose level (for the algorithm to correct for being out of range) and/or carbohydrates (for the algorithm to correct for food intake). The algorithm, being a set of structural equations that considers human inputs (e.g., carbohydrate entries and settings determined by the patient and their care team, such as insulin-to-carb ratio and insulin sensitivity factor) and calculated fields (e.g., insulin on board), then determines the dosing recommendation. Patients can then modify the algorithm's recommendation before dosing. Documentation on how recommendations are computed is available in an insulin pump's manual. In this sense, the algorithm is "simulatable" (Murdoch et al. 2019). The American Diabetes Association's Standards of Care encourages interested patients to use the algorithm in their diabetes self-management instead of doing the computation by themselves (Association et al. 2021). The use of the algorithm has been shown to reduce postprandial blood glucose levels and improve quality of life (Gross et al. 2003, Klupa et al. 2008, Gonzalez et al. 2016), although the body of evidence is somewhat limited and causal evidence on this matter is hard to entangle (Schmidt and Nørgaard 2014).

We utilize data from Aleppo et al. (2017) to conduct our study. The authors of Aleppo et al. (2017) ran a 6-month randomized clinical trial to understand whether there would be any performance impacts of requiring (versus not requiring) patients to perform finger stick measurements for insulin dosing decisions. At the time of their study, patients were required to perform finger stick measurements to make insulin dosing decisions. However, some patients utilized a continuous glucose monitor (CGM) – a tool that provides real-time performance feedback – and were relying only on the CGM information to make such decisions. They were doing so even though, at the time, it was not recommended by the FDA. Following a pre-intervention period of 2 to 10 weeks to understand self-management behaviors, Aleppo et al. (2017) randomized eligible participants into two conditions: one-third of them had access to their finger stick measurements (which we will refer to as the "Finger Stick Confirmation" condition), and the rest were blinded from such information for self-management decisions (which we will refer to as the "No Finger Stick Confirmation" condition). Note that patients in both conditions were instructed to perform the same measurement-related actions; the conditions were designed to only be different in terms of the measurements observed.[1] The findings from the study suggested that patients who were blinded from the results of finger stick measurements for self-management decisions were able to obtain similar performance outcomes as those who were not, i.e., an average 65% time in range was achieved in both conditions. This suggested that finger stick measurements were not a necessary task to perform prior to making a self-management decision.

Aleppo et al. (2017)'s primary objective was to study the impact of (not) having finger stick measurements available on performance. It collected data on, but did not report, insights regarding algorithm use. Our work extends their original study by investigating the human-algorithm interactions that occurred for insulin boluses over the course of the experiment.
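To make the structure of the bolus calculator described above concrete, the sketch below implements the textbook form of this computation: a food bolus from carbohydrates and the insulin-to-carb ratio, a correction from the glucose gap and the insulin sensitivity factor, net of insulin on board. It is an illustrative simplification under stated assumptions, not the exact formula of any particular pump used in the study; real pumps differ, for example, in how insulin on board is applied and how doses are rounded.

```python
def bolus_recommendation(carbs_g, bg_mgdl, target_mgdl, icr, isf, iob):
    """Illustrative bolus-calculator logic (not any specific pump's exact formula).

    carbs_g     planned carbohydrate intake in grams
    bg_mgdl     current glucose reading (mg/dL); target_mgdl is the target glucose
    icr         insulin-to-carb ratio (grams of carbohydrate covered per unit)
    isf         insulin sensitivity factor (mg/dL drop per unit of insulin)
    iob         insulin on board (units still active from earlier boluses)
    """
    food_bolus = carbs_g / icr                      # cover the meal
    correction = (bg_mgdl - target_mgdl) / isf      # correct for being above/below target
    recommendation = food_bolus + correction - iob  # net out insulin already working
    return max(recommendation, 0.0)                 # pumps do not recommend negative doses


# Example: 60 g of carbs at 180 mg/dL with a 120 mg/dL target, ICR 10, ISF 40, 0.5 U on board
print(bolus_recommendation(60, 180, 120, icr=10, isf=40, iob=0.5))  # -> 7.0 units
```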
1.3.2 Data Description

Aleppo et al. (2017)'s detailed data collection, using platforms like Tidepool (Snider 2018) to pull diabetes device data onto one platform, enabled our study.[2] The data contain deidentified data for 224 patients enrolled in the study.[3] After removing duplicate records, we have 71,613 bolus decisions during the pre-intervention period and 235,149 bolus decisions during the experimental period.

For every patient, we are able to see event-level data on every bolus delivered, interactions with the algorithm (which are linked to a corresponding bolus, enabling us to identify when a patient used the algorithm or not for a bolus decision[4]), and glucose readings from their CGM (1 record every 5 minutes of being worn). Moreover, we have patients' outpatient clinic visit data during the study as well as their demographics data.

Patients had many opportunities to decide whether or not to use the algorithm, with an average (standard deviation) of 5.85 (2.15) boluses performed per patient per day during the pre-intervention and experimental periods. Algorithm use across all patients was generally high. Among the 224 patients, 91% or 204 patients used the algorithm at least once during the pre-intervention and experimental periods. For the 204 patients who used the algorithm at least once during these periods, 79.4% of their boluses were associated with algorithm use.

Utilizing the insulin pump and CGM data, which are available at the event level, we build our datasets for analysis at two levels of aggregation: bolus-level and 6-hour level. We describe the level of aggregation, variables we utilize, and sample selection for each analysis in their respective sections.

[1] The number of finger sticks executed daily during the experimental period was, on average (standard deviation), 5.40 (1.50) in the Finger Stick Confirmation condition and 5.97 (1.54) in the No Finger Stick Confirmation condition. Only when performing calibrations for their CGM or under certain physical conditions were the No Finger Stick Confirmation condition patients allowed to see the finger stick measurement. If a finger stick measurement performed for a CGM calibration coincided with a mealtime bolus, patients were instructed to utilize the CGM reading rather than the finger stick measurement as the basis for determining their insulin dose.
[2] The source of the data is the REPLACE-BG Study Group, but the analyses, content and conclusions presented herein are solely the responsibility of the authors and have not been reviewed or approved by the REPLACE-BG Study Group.
[3] Aleppo et al. (2017) originally randomized 226 patients, but 2 patients did not come back to clinic and hence they do not have data for the 2 patients.
[4] When using the algorithm, the user is asked to provide two inputs: carbohydrate and/or blood glucose inputs. If the two inputs into the algorithm are 0 or missing, this implies that the user skipped over this input or noted that it was not applicable. The combination of both inputs being 0 or missing leads to a bolus amount recommendation of 0. Therefore, we consider both inputs being 0 or missing as no algorithm use.
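As a rough illustration of how a bolus-level algorithm-use flag could be built from such device exports, the sketch below links bolus records to bolus-calculator records and applies the rule in footnote [4] (both calculator inputs zero or missing counts as no algorithm use). The file and column names (boluses.csv, bolus_calculator.csv, bolus_id, carb_input, bg_input) are hypothetical placeholders, not the actual field names in the study's Tidepool exports.

```python
import pandas as pd

# Hypothetical file and column names; the study's exports are structured differently.
boluses = pd.read_csv("boluses.csv", parse_dates=["time"])               # one row per delivered bolus
calculator = pd.read_csv("bolus_calculator.csv", parse_dates=["time"])   # one row per calculator interaction

# Footnote [4] rule: a calculator record whose carbohydrate and glucose inputs are both
# zero or missing yields a zero recommendation and is treated as no algorithm use.
calculator["calc_used"] = ~(
    calculator["carb_input"].fillna(0).eq(0) & calculator["bg_input"].fillna(0).eq(0)
)

# Each calculator interaction is linked to its corresponding bolus; we assume a shared bolus_id key.
boluses = boluses.merge(calculator[["bolus_id", "calc_used"]], on="bolus_id", how="left")
boluses["algorithm_use"] = boluses["calc_used"].fillna(False).astype(int)  # AlgorithmUse_ij
```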
1.4 Does Dynamic Algorithm Aversion Exist in the Field?

Recall, Hypothesis 1 regards dynamic algorithm aversion – an asymmetric response to performance feedback that favors the human over the algorithm. To test for it in our context of Type 1 Diabetes, we need to operationalize the meaning of an asymmetric response to the "same mistake." To do so, we consider whether how people respond to high versus low performance differs for algorithms versus humans. For this section, as for §1.5-1.7, we proceed by describing the data set and relevant measures, summary statistics and model-free evidence, followed by our empirical strategy and corresponding results.

1.4.1 Data and Measures

For this analysis, we build a patient-6 hour level panel to understand the effects of previous algorithm use and performance feedback on subsequent algorithm use. We perform this level of aggregation as boluses typically occur during mealtimes, with additional boluses performed on an as-needed basis. As a bolus insulin decision can impact the body for up to 4 hours, this patient-6 hour panel approach allows us to understand how previous periods' decisions, and the consequent performance impacts of such decisions, affect subsequent algorithm use decisions in a way that rules out simultaneity concerns. The majority of patients do not typically bolus from 12am to 6am as they are asleep, and only about 10% of the bolus decisions occurred from 12am to 6am. Hence, we focus on the time periods from 6am to 12am and define three 6-hour periods for each day: morning (6am-12pm), afternoon (12pm-6pm), and evening (6pm-12am).

We construct our dependent measure, PctAlgorithmUse_{it}, as follows. We determine whether a bolus j of patient i is associated with algorithm use by merging the data files on algorithm use and boluses together – if there is a match, then we note bolus j for patient i as one that was delivered with use of the algorithm (AlgorithmUse_{ij} = 1, else = 0). Then, we build the measure of interest, PctAlgorithmUse_{it}, by computing the algorithm use percentage for patient i in 6-hour period t, defined as the number of boluses in period t that used the algorithm for patient i divided by the number of boluses in period t for patient i.

We have two independent variables of interest, HighAlgorithmUse_{i,t-1} and OutofTarget_{i,t-1}. When constructing HighAlgorithmUse_{i,t-1}, we classify patient i's algorithm use in period t-1 as high or low based on their pre-intervention algorithm use behavior. We do this as algorithm use can vary strongly across individuals, with some using it frequently while others do not use it often. The pre-intervention period considers the time between enrollment of the patient into the study and the start of the experimental portion of the study. For each patient i, we calculate the patient's algorithm use in each period t during the pre-intervention period. We then set HighAlgorithmUse_{i,t-1} equal to 1 if PctAlgorithmUse_{i,t-1} is greater than the median value of the patient's algorithm use in the pre-intervention period and 0 otherwise; if the median value of the patient's algorithm use in the pre-intervention period is equal to 100%, then we set HighAlgorithmUse_{i,t-1} = 1 if PctAlgorithmUse_{i,t-1} = 100 and 0 otherwise.

We utilize the CGM records to construct OutofTarget_{i,t-1}. Recall that guidelines recommend that patients maintain a 70 percent or higher time in range (70-180 mg/dL) for excellent glucose control (Battelino et al. 2019). As such, we define OutofTarget_{i,t-1} as a binary variable that equals 1 when patient i's performance in time period t-1 achieves less than 70% time in range – defined as the number of CGM readings that are within 70-180 mg/dL divided by the total number of CGM readings in the 6-hour period t-1 – and 0 otherwise.

We include the following control variables. We include patient fixed effects because algorithm use varies across individuals. DaysSinceVisit_{it} represents the number of days between the last outpatient clinic visit for patient i and the current period t of the observation. Time_t tracks the number of 6-hour time periods t that have passed since the start of the experiment for patient i. Note that the dates were anonymized so that one can only detect the days since the patient enrolled in the study. Both variables allow us to control for time-related trends. We also control for variation in algorithm use across the span of a day by including the following variables: Morning_t, which represents 6am-12pm and is the base category for comparison; Afternoon_t, which represents 12pm-6pm; and Evening_t, which represents 6pm-12am.
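A sketch of how these panel measures could be assembled with pandas is shown below, continuing from the bolus-level flag above. The column names (patient_id, time, glucose), the pre_panel frame holding the pre-intervention periods, and the cgm frame of sensor readings are assumptions for illustration; the thresholds follow the definitions in this subsection.

```python
import numpy as np
import pandas as pd

def add_blocks(df):
    # Assign records to 6-hour blocks: morning (6am-12pm), afternoon (12pm-6pm), evening (6pm-12am).
    df["date"] = df["time"].dt.date
    df["block"] = pd.cut(df["time"].dt.hour, bins=[6, 12, 18, 24], right=False,
                         labels=["morning", "afternoon", "evening"])
    return df.dropna(subset=["block"])  # drops 12am-6am records

boluses = add_blocks(boluses)
cgm = add_blocks(cgm)  # cgm: one glucose reading every 5 minutes (assumed frame)

# PctAlgorithmUse_it: share of boluses in the patient-period delivered with the calculator.
panel = (boluses.groupby(["patient_id", "date", "block"], observed=True)["algorithm_use"]
                .mean().mul(100).rename("pct_algorithm_use").reset_index())

# HighAlgorithmUse: compare each period's use to the patient's pre-intervention median,
# with the special case where that median equals 100%.
pre_median = pre_panel.groupby("patient_id")["pct_algorithm_use"].median().rename("pre_median")
panel = panel.merge(pre_median, on="patient_id", how="left")
panel["high_algorithm_use"] = np.where(panel["pre_median"] >= 100,
                                       panel["pct_algorithm_use"].eq(100),
                                       panel["pct_algorithm_use"].gt(panel["pre_median"])).astype(int)

# OutofTarget: 6-hour time in range (70-180 mg/dL) below 70%, from CGM readings.
cgm["in_range"] = cgm["glucose"].between(70, 180)
tir = (cgm.groupby(["patient_id", "date", "block"], observed=True)["in_range"]
          .mean().rename("time_in_range").reset_index())
panel = panel.merge(tir, on=["patient_id", "date", "block"], how="left")
panel["out_of_target"] = (panel["time_in_range"] < 0.70).astype(int)

# Lag the explanatory variables to the previous 6-hour period within each patient
# (the sample-selection steps described next handle gaps between periods).
panel = panel.sort_values(["patient_id", "date", "block"])
panel[["high_use_lag", "out_of_target_lag"]] = (
    panel.groupby("patient_id")[["high_algorithm_use", "out_of_target"]].shift(1))
```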
1.4.2 Sample Selection

We perform the following sample selection procedure. As explained above, we first construct a patient-6 hour level panel using 224 patients and their bolus decisions from 6am to 12am during the experimental period, which leads to 125,956 observations. We also drop 23,541 observations belonging to 224 patients during which patients did not complete a bolus. Second, we focus on patients who went through the entire experiment, to ensure that the effects we observe are not being driven by patients who found the conditions set forth by Aleppo et al. (2017) to be problematic. This leads us to drop 7 patients and their 1,583 observations. Third, as Hypothesis 1 considers the effect of algorithm use in the previous period on current algorithm use, we filter out records where patients did not complete a bolus in both the focal and previous period (where the previous period is 6pm-12am of the prior day for focal periods 6am-12pm), dropping 9,261 observations from 216 patients. Fourth, as the hypothesis considers the effect of previous performance on current algorithm use, we drop another 4,504 records from 160 patients that are missing performance data from the previous period. This can occur when patients are not wearing their CGM. Lastly, recall that one of our independent variables of interest, HighAlgorithmUse_{i,t-1}, considers patient pre-intervention algorithm use data. We drop 25 patients who do not have pre-intervention algorithm use data and their 7,973 observations. This leads us to focus on 192 patients and their 79,094 observations. Of the 192 patients, the average age is 44.38 (standard deviation 13.80), with 75 patients (39.06%) older than 50, 98 patients (51.04%) female, and 127 patients (66.15%) part of the No Finger Stick Confirmation condition.

1.4.3 Summary Statistics

Table 1.1: Testing Hypothesis 1: Summary Statistics

Variable  Mean  Std. Dev.  1)  2)  3)  4)  5)  6)  7)  8)
1) PctAlgorithmUse_{it}  83.31  32.81  1
2) HighAlgorithmUse_{i,t-1}  0.78  0.41  0.5055*  1
3) OutOfTarget_{i,t-1}  0.53  0.50  0.0118*  -0.0512*  1
4) DaysSinceVisit_{it}  19.49  14.16  -0.0296*  -0.0243*  0.0044  1
5) Time_t  358.95  211.75  -0.0389*  -0.0347*  -0.0013  0.5325*  1
6) Morning_t  0.34  0.47  0.0479*  -0.0368*  0.0576*  -0.002  0.0019  1
7) Afternoon_t  0.33  0.47  -0.0170*  0.0588*  -0.0945*  0.0012  -0.0028  -0.4945*  1
8) Evening_t  0.34  0.47  -0.0310*  -0.0214*  0.0361*  0.0009  0.0009  -0.5088*  -0.4967*  1
Note. This table provides summary statistics regarding the patient-6 hour panel data we use to test Hypothesis 1. N = 79,064 patient-6 hours.

Summary statistics for the variables of interest are available in Table 1.1. In general, overall algorithm use is high in the sample, with PctAlgorithmUse_it having a mean of 83.31%. We also observe that the average value of HighAlgorithmUse_{i,t-1} is high at 0.78. This is because the median value of algorithm use in the pre-intervention period is equal to 100% for 153 of the 192 patients, which, again, leads to HighAlgorithmUse_{i,t-1} equaling 1 only if PctAlgorithmUse_{i,t-1} is 100% in period t-1. The correlation between HighAlgorithmUse_{i,t-1} and PctAlgorithmUse_it of 0.5055 suggests a habitual momentum effect: patients' reliance on the algorithm in the previous period is positively correlated with use of the algorithm in the current period.

1.4.4 Model Free Evidence

To begin our analysis of dynamic algorithm aversion, we note some summary statistics of algorithm use by previous frequent algorithm use and out of target. Table 1.2 shows that when patients rely on the algorithm in the previous period (HighAlgorithmUse_{i,t-1} = 1), if they do not err with the algorithm (OutofTarget_{i,t-1} = 0), the average value of PctAlgorithmUse_it is 91.84%, while if they do err with the algorithm (OutofTarget_{i,t-1} = 1), the average value of PctAlgorithmUse_it is 92.20%. On the other hand, when users rely on themselves (HighAlgorithmUse_{i,t-1} = 0) and do not err (OutofTarget_{i,t-1} = 0), the average value of PctAlgorithmUse_it is 45.77%, while if they do err (OutofTarget_{i,t-1} = 1) the average value of PctAlgorithmUse_it is 56.11%, a 10.34 percentage point increase. In other words, if the algorithm errs, there is almost no change in algorithm use the next period relative to when the algorithm does not err, whereas if a human errs, there is an increase in algorithm use the next period relative to when they do not err. We also observe that PctAlgorithmUse_it is substantially higher when HighAlgorithmUse_{i,t-1} = 1 compared to when HighAlgorithmUse_{i,t-1} = 0, which suggests that momentum may be playing a large role in driving algorithm use. In what follows, we test these insights formally in reduced-form models.

Table 1.2: Testing Hypothesis 1: Model Free Evidence

                                   OutofTarget_{i,t-1} = 0       OutofTarget_{i,t-1} = 1
HighAlgorithmUse_{i,t-1} = 1       91.84 (22.19), N = 30,046     92.20 (22.28), N = 31,942
HighAlgorithmUse_{i,t-1} = 0       45.77 (44.04), N = 7,229      56.11 (43.01), N = 9,877

Note. This table provides the mean and standard deviation (in parentheses) of PctAlgorithmUse_it, a patient's algorithm use in a 6-hour window, by previous frequent algorithm use, HighAlgorithmUse_{i,t-1}, and out of target, OutofTarget_{i,t-1}. N = 79,094 patient-6 hours.

1.4.5 Identification

To formally test dynamic algorithm aversion as stated in Hypothesis 1, we utilize the following panel data model:

PctAlgorithmUse_it = α + β_1 HighAlgorithmUse_{i,t-1} + β_2 OutofTarget_{i,t-1}
                     + β_3 HighAlgorithmUse_{i,t-1} × OutofTarget_{i,t-1} + γ_i + τ_t + ε_it    (1.1)

We are interested in understanding how previous frequent use of the algorithm (HighAlgorithmUse_{i,t-1}) and its interplay with previous performance feedback (OutofTarget_{i,t-1}) affect subsequent use of the algorithm (PctAlgorithmUse_it). Hence, we include HighAlgorithmUse_{i,t-1} as a main effect, and we include OutofTarget_{i,t-1} and its interaction with HighAlgorithmUse_{i,t-1} to understand the differential responses to performance feedback. We control for patient fixed effects γ_i and time controls τ_t, which include DaysSinceVisit_it, Time_t, Afternoon_t, and Evening_t.
We cluster our standard errors by patient. Note that β_1 enables us to compare the main effect of using the algorithm frequently last period versus limited or no use of the algorithm last period. If Hypothesis 1 holds, then the sum of β_2 and β_3 should be negative and statistically significant. That is, relative to the effect of human-driven failure (rather than algorithm-driven failure), when people use algorithms and err, they should be less likely to use the algorithm. Comparing the magnitudes of β_2 + β_3 versus β_2 will also let us understand the differences in reactions to negative performance feedback when one had relied on algorithm-driven versus human-driven decisions.

1.4.6 Results

Table 1.3 shows the results for the model outlined in §1.4.5, and Figure 1.1 illustrates the main findings. Patients did not significantly avoid using algorithms after erring with an algorithm, with algorithm use changing by only β_2 + β_3 = -0.124 percentage points (not statistically significant at p < 0.05; refer to the green solid line in Figure 1.1). However, conditional on human-driven decisions last period, patients did significantly favor algorithm use after seeing a human-driven error versus success, being 1.798 percentage points more likely to use the algorithm per β_2 (refer to the dashed blue line in Figure 1.1). This asymmetry is in the opposite direction of the dynamic algorithm aversion hypothesis (Hypothesis 1).

Table 1.3: Testing Hypothesis 1: Regression Results

                                                     PctAlgorithmUse_it
HighAlgorithmUse_{i,t-1}                             6.089*** (1.599)
OutOfTarget_{i,t-1}                                  1.798* (0.817)
HighAlgorithmUse_{i,t-1} × OutOfTarget_{i,t-1}       -1.922* (0.884)
DaysSinceVisit_it                                    -0.015 (0.013)
Time_t                                               -0.005** (0.002)
Afternoon_t                                          -3.319*** (0.606)
Evening_t                                            -3.501*** (0.658)
Constant                                             45.864*** (0.733)
Observations                                         79,094
R^2                                                  0.603
Patient Fixed Effects                                Yes
Time Controls                                        Yes
Mean of Dependent Variable                           83.31

Note. Robust standard errors in parentheses, clustered by patient. Significance reported as + p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001.

Figure 1.1: Testing Hypothesis 1: Predicted Percent Algorithm Use by Previous Algorithm Use and Out of Target
Note. The figure shows the predicted percentage algorithm use in the focal 6-hour period, by the previous period's algorithm use (where Human corresponds to HighAlgorithmUse_{i,t-1} = 0 and Algorithm corresponds to HighAlgorithmUse_{i,t-1} = 1) and out of target (where In Target corresponds to OutofTarget_{i,t-1} = 0 and Out of Target corresponds to OutofTarget_{i,t-1} = 1).

The regression results also indicate that previous-period algorithm use is an important predictor. If the patient used the algorithm last period, they are more likely to use the algorithm again this period irrespective of whether or not they went out of target. For example, conditional on failing to be in target, patients are still more likely to use the algorithm next period if they went out of target from algorithm-driven decisions rather than human-driven decisions (β_1 + β_3 = 4.167; refer to the gap between the green and blue lines conditional on being out of target). Similarly, conditional on successfully being in target, patients are more likely to use the algorithm again if the success comes from algorithm-driven decisions rather than human-driven decisions (β_1 = 6.089 percentage points more likely). Finally, we note that algorithm use is higher in the morning (6am-12pm) than in the afternoon (12pm-6pm) or evening (6pm-12am). These differences are statistically significant (-3.319 and -3.501 percentage points, respectively).
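For concreteness, the following is a minimal sketch in Python (statsmodels) of how a specification like (1.1) could be estimated with patient fixed effects and patient-clustered standard errors. It is a sketch under assumed column names (building on the panel dataframe from the earlier sketch, with OutOfTarget_lag, DaysSinceVisit, and Time assumed to have been added), not the authors' actual code.

import statsmodels.formula.api as smf

# Mirror the sample selection: drop patient-periods without a lagged observation or
# lagged performance data before estimating.
df = panel.dropna(subset=["PctAlgorithmUse", "HighAlgorithmUse_lag", "OutOfTarget_lag"]).copy()

# Explicit interaction column so the beta_2 + beta_3 test below is easy to read.
df["HighXOut"] = df["HighAlgorithmUse_lag"] * df["OutOfTarget_lag"]

fit = smf.ols(
    "PctAlgorithmUse ~ HighAlgorithmUse_lag + OutOfTarget_lag + HighXOut"
    " + DaysSinceVisit + Time + C(period) + C(patient_id)",
    data=df,
).fit(cov_type="cluster", cov_kwds={"groups": df["patient_id"]})

# beta_2 + beta_3: the usage response to going out of target after algorithm-driven decisions.
print(fit.params["OutOfTarget_lag"] + fit.params["HighXOut"])
print(fit.t_test("OutOfTarget_lag + HighXOut = 0"))

The final t-test corresponds to the β_2 + β_3 comparison discussed above; entering the patient fixed effects as dummies is a simple, if memory-hungry, way to obtain the within-patient estimates.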
We find similar results running the model (1.1) at the daily level, as shown in Table 1.9.

1.4.7 Discussion

Our results are inconsistent with, and opposite to, the dynamic algorithm aversion hypothesis (Hypothesis 1). We find an asymmetry in usage response to performance feedback that favors the human over the algorithm: patients more readily switch to using algorithms if they went out of target with human-driven decisions than they switch away from using algorithms if they went out of target with algorithm-driven decisions. Thus, we refer to this result as supporting "dynamic algorithm appreciation." The evidence also suggests that algorithm use is habitual or "sticky" – recent algorithm use is a strong predictor of future algorithm use – and that it is higher in the morning than later in the day.

1.5 Do People use Algorithms More for Difficult Tasks?

Hypothesis 2 predicts that one's need for precision would increase one's likelihood to use the algorithm. To test this hypothesis in our context, we proxy a patient's desire for precision via their glucose level at the time of the bolus decision. One's glucose level is associated with substantially different needs for precision when it comes to dosing insulin. Being below the recommended range of 70-180 mg/dL, i.e., in hypoglycemia, is a key concern in diabetes care, and patients are strongly advised to minimize the time spent in that condition (Vigersky 2015, Battelino et al. 2019). One must be careful not to overdose on insulin when in hypoglycemia, given that consequential side effects, such as loss of consciousness and death, can be felt more immediately. On the other hand, being above the recommended range, i.e., in hyperglycemia, is not desirable but can be addressed via insulin and activity; consequential side effects (e.g., ketoacidosis) are not experienced unless one spends prolonged time at substantially high glucose levels. Had one overdosed on insulin when above range, they would have a larger time window to mitigate the consequences via other countermeasures (e.g., food intake, glucose tablets, stopping insulin delivery). Consequently, patients must be careful when dosing insulin at lower glucose levels and should be precise if they do so, whereas at higher glucose levels they can be less precise. As noted in Heinemann (2018), a patient observed that "the higher I am the less the accuracy [of the reading] matters because I am going to bolus extra insulin anyway," suggesting that patients may care less about precision regarding their status and corresponding decisions when their glucose levels are high.

1.5.1 Data and Measures

Given that our measure of one's need for precision – one's glucose level at the time of the bolus – is observed at the bolus level, we perform our empirical analysis at this level. Our dependent variable of interest is AlgorithmUse_ij, a binary variable which represents whether patient i used the algorithm (= 1) or not (= 0) for bolus j. Our independent variable of interest is AvgCGMReadingAtBolus_ij, the average of the glucose reading(s) from 15 minutes before and up to the time of bolus j. We utilize the following control variables. We include patient fixed effects as algorithm use varies across individuals.
We also control for DaysSinceVisit_ij, which represents the number of days between the last outpatient clinic visit for patient i and the date of bolus j; Time_j, a variable defined as the time from the start of the randomization day (measured in days, with partial days allowed); and hour fixed effects.

1.5.2 Sample Selection

We continue to focus on patients who went through the entire experiment to ensure that the effects we observe are not being driven by patients who found the conditions set forth by Aleppo et al. (2017) to be problematic. This leads us to drop 3% (7) of the original 224 patients, who withdrew from the study, and their 3,440 boluses from the experimental period. Hence, we initially consider 217 patients and their 231,709 boluses during the experiment. We additionally drop 20 patients (and their 18,132 boluses) who never used the algorithm during the experiment. We also remove 19,042 boluses, coming from 197 unique patients, which do not have CGM reading data observed in the 15 minutes before the bolus. Ultimately, this leads us to focus on 197 patients and their 194,535 boluses for our analysis. Of the 197 patients, the average age is 44.36 (standard deviation 13.76), with 77 patients (39%) older than 50, 100 patients (50.76%) female, and 129 patients (65.48%) part of the No Finger Stick Confirmation condition.

1.5.3 Summary Statistics and Model Free Evidence

The average value of AvgCGMReadingAtBolus_ij is 177.31 mg/dL, with a standard deviation of 67.98 mg/dL. We find that 53.17% (103,426) of boluses occur within 70-180 mg/dL, 2.4% (4,662) of boluses occur below 70 mg/dL, and 44.44% (86,447) of boluses occur above 180 mg/dL. Thus, the majority of boluses occur within range, with more boluses occurring at higher glucose levels when the glucose level is not within range.

Figure 1.2: Testing Hypothesis 2: Model Free Evidence
Note. The average algorithm use and 95 percent confidence interval, computed across patients by average glucose reading level 15 minutes prior to the bolus, is reported.

Figure 1.2 plots the average algorithm use and 95 percent confidence interval across patients, computed by different average glucose reading levels 15 minutes prior to the bolus (every 10 mg/dL from 40 mg/dL to 400 mg/dL). We observe that algorithm use generally decreases as one's glucose level at the time of the bolus increases, which provides preliminary support for Hypothesis 2.

1.5.4 Identification

To test Hypothesis 2, we build the following linear probability model:

AlgorithmUse_ij = α + β_1 AvgCGMReadingAtBolus_ij + γ_i + τ_j + ε_ij    (1.2)

We control for patient fixed effects γ_i and time fixed effects τ_j, which include the hour fixed effects, DaysSinceVisit_ij, and Time_j. We cluster our robust standard errors by patient. To show support for Hypothesis 2, β_1 should be negative and significant.

1.5.5 Results

Table 1.4 shows the regression results. Consistent with Hypothesis 2, β_1 is -0.000390 and is significant at the p < 0.001 level. The standard deviation of AvgCGMReadingAtBolus_ij is 67.98 mg/dL, which means that a one standard deviation increase in the CGM reading at the time of bolus is associated with a 2.65 percentage point decrease in algorithm use (-0.000390 × 67.98 ≈ -0.0265). We note that we observe similar results when we run the model in (1.2) with a logit specification, as shown in Table 1.10.
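For completeness, the regressor behind these estimates, AvgCGMReadingAtBolus_ij (§1.5.1), can be built from raw CGM and bolus records with a simple backward-window average. The sketch below is in Python (pandas); the dataframes cgm and boluses and their column names (patient_id, timestamp, glucose) are illustrative assumptions rather than the study's actual data schema.

import pandas as pd

# AvgCGMReadingAtBolus_ij: mean of CGM readings in the 15 minutes up to each bolus.
# A simple (if not the fastest) windowed lookup, done patient by patient.
WINDOW = pd.Timedelta(minutes=15)

def window_mean(bolus_rows, cgm_rows):
    out = []
    for t in bolus_rows["timestamp"]:
        mask = (cgm_rows["timestamp"] > t - WINDOW) & (cgm_rows["timestamp"] <= t)
        readings = cgm_rows.loc[mask, "glucose"]
        out.append(readings.mean() if len(readings) else float("nan"))
    return pd.Series(out, index=bolus_rows.index)

boluses["AvgCGMReadingAtBolus"] = (
    boluses.groupby("patient_id", group_keys=False)
           .apply(lambda g: window_mean(g, cgm[cgm["patient_id"] == g.name]))
)

# Boluses with no CGM reading in the window are dropped, mirroring the sample selection in §1.5.2.
boluses = boluses.dropna(subset=["AvgCGMReadingAtBolus"])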
Table 1.4: Testing Hypothesis 2: Regression Results

                                    AlgorithmUse_ij
AvgCGMReadingAtBolus_ij             -0.000390*** (0.000080)
Constant                            0.382388*** (0.016877)
Observations                        194,535
R^2                                 0.457019
Patient Fixed Effects               Yes
Time Controls                       Yes
Mean of Dependent Variable          0.795

Note. Robust standard errors in parentheses, clustered by patient. Significance reported as + p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001.

1.5.6 Discussion

We find evidence that people are more likely to use algorithms when their blood glucose levels are lower. Because lower glucose levels tend to require the patient to achieve more precision in their bolus amounts, these results suggest that people are more likely to use algorithms for more difficult tasks (Hypothesis 2).

1.6 Do Previous Deviations from Algorithm Recommendations Impact Algorithm Use?

Recall that Hypotheses 3a and 3b predict that previous deviations from algorithm recommendations are associated with increases or decreases in algorithm use, respectively. To test these hypotheses in our context, we first identify whether a deviation from the algorithm's recommendation occurred in the previous bolus decision. Then, we analyze whether such a deviation, relative to not deviating from the algorithm, changes the likelihood of using the algorithm for the subsequent bolus.

1.6.1 Data Set and Measures

As we are interested in understanding the effect of deviating from algorithm recommendations on subsequent algorithm use decisions, we conduct our analysis at the bolus level. For this analysis, we use the dependent variable AlgorithmUse_ij, as in §1.5, which continues to represent whether patient i used the algorithm (= 1) or not (= 0) for bolus j.

The primary independent variable of interest is Deviation_{i,j-1}, which represents whether patient i deviated from the algorithm's recommendation for bolus j-1. We construct this variable as follows. First, we calculate DeviationAmount_{i,j-1} as the difference between the actual bolus insulin amount delivered (measured in U) and the recommended bolus amount for bolus j-1. We then let Deviation_{i,j-1} equal 1 if |DeviationAmount_{i,j-1}| > 0.1U, and 0 otherwise. We use 0.1U as the threshold for identifying a deviation given that the precision of the data points is at the tenths or hundredths digit.

We build three additional sets of independent variables to understand 1) whether there may be asymmetric effects of deviating below as opposed to above the algorithm's recommendation and 2) whether the degree of deviation from the algorithm matters. First, we let DeviateBelow_{i,j-1} = 1 if DeviationAmount_{i,j-1} < -0.1U and 0 otherwise, and DeviateAbove_{i,j-1} = 1 if DeviationAmount_{i,j-1} > 0.1U and 0 otherwise. Second, to examine whether the absolute amount of deviation matters, we first observe that the pre-intervention median deviation amount conditional on deviating below is -0.68U and that the pre-intervention median deviation amount conditional on deviating above is 0.8U. We then define LargeDeviationBelow_{i,j-1} = 1 if DeviationAmount_{i,j-1} < -0.68U and 0 otherwise; SmallDeviationBelow_{i,j-1} = 1 if -0.68U ≤ DeviationAmount_{i,j-1} < -0.1U and 0 otherwise; SmallDeviationAbove_{i,j-1} = 1 if 0.1U < DeviationAmount_{i,j-1} ≤ 0.8U and 0 otherwise; and LargeDeviationAbove_{i,j-1} = 1 if DeviationAmount_{i,j-1} > 0.8U and 0 otherwise. Lastly, to understand whether the relative deviation amount matters, we define PctDeviation_{i,j-1} = DeviationAmount_{i,j-1} / (algorithm recommendation for bolus j-1).
Furthermore, we find that the pre-intervention median percentage deviation conditional on deviating below is -30% and the pre-intervention median percentage deviation conditional on deviating above is 40%. We define LargePctDeviationBelow_{i,j-1} = 1 if PctDeviation_{i,j-1} < -30% and 0 otherwise; SmallPctDeviationBelow_{i,j-1} = 1 if -30% ≤ PctDeviation_{i,j-1} < 0% and 0 otherwise; SmallPctDeviationAbove_{i,j-1} = 1 if 0% < PctDeviation_{i,j-1} ≤ 40% and 0 otherwise; and LargePctDeviationAbove_{i,j-1} = 1 if PctDeviation_{i,j-1} > 40% and 0 otherwise.

1.6.2 Sample Selection

Following the same steps described in §1.5, we initially consider 197 patients and their 213,577 boluses during the experimental period. We perform two additional filters for this analysis. First, to determine whether or not one previously deviated from the algorithm, we focus on boluses j where the previous bolus j-1 used the algorithm. This drops 44,141 boluses belonging to 190 patients. Second, some bolus recommendations may be capped by insulin pump settings on how much insulin can be delivered in one bolus. We estimate a patient's settings based on the frequency at which the maximum bolus insulin amount is delivered (e.g., if 20U of insulin is delivered multiple times, there is likely a cap of 20U per bolus defined in the patient's settings) and drop 581 boluses corresponding to 34 patients, leading to 168,855 boluses belonging to 194 patients for this analysis. Of the 194 patients, the average age is 44.29 (standard deviation 13.75), with 75 patients (38.66%) older than 50, 100 patients (51.55%) female, and 128 patients (65.98%) part of the No Finger Stick Confirmation condition.

1.6.3 Summary Statistics and Model Free Evidence

Deviation_{i,j-1} is equal to 1 for 16.72% of boluses in the sample (28,228 out of 168,855 boluses), suggesting that when patients use algorithms, they tend not to deviate from the algorithm's recommendations. Of the 28,228 boluses where one previously deviated, 65.31% had deviated above the algorithm's recommendation and 34.69% had deviated below it. The mean of DeviationAmount_{i,j-1} for these boluses is 0.46U, with standard deviation 2.05U. The mean of PctDeviation_{i,j-1} is 83.79%, with standard deviation 453%.

Figure 1.3: Variation in Deviations from Algorithm Recommendations on Algorithm Use. (a) By Previous Deviation Amount. (b) By Previous Deviation Percentage.

Figure 1.3(a) plots the average algorithm use and 95 percent confidence interval across boluses, computed by different deviation amounts (ranging from below -1U to above 2.2U; the cutoffs for the categories approximately correspond to the 10th and 90th percentiles of DeviationAmount_{i,j-1}). Figure 1.3(b) plots the average algorithm use and 95 percent confidence interval across boluses, computed by different percentage deviations from the algorithm recommendation (ranging from below -50% to above 160%, approximately corresponding to the 10th and 90th percentiles of PctDeviation_{i,j-1}). The general observation is that larger deviations tend to have a negative effect on subsequent algorithm use, with the effect appearing to be significant especially when deviating below. Hence, this figure is inconsistent with Hypothesis 3a but consistent with Hypothesis 3b.
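For reference, a minimal sketch in Python (pandas/NumPy) of the deviation coding described in §1.6.1 is shown below. The column names (delivered_units, recommended_units, patient_id, timestamp) are illustrative assumptions; the 0.1U precision cutoff and the pre-intervention medians of -0.68U and 0.8U follow the text.

import numpy as np

# DeviationAmount: delivered minus recommended insulin (in U) for each algorithm-aided bolus.
d = boluses["delivered_units"] - boluses["recommended_units"]
boluses["DeviationAmount"] = d
boluses["Deviation"] = (d.abs() > 0.1).astype(int)
boluses["DeviateBelow"] = (d < -0.1).astype(int)
boluses["DeviateAbove"] = (d > 0.1).astype(int)

# Four mutually exclusive bins based on the pre-intervention medians.
boluses["DeviationBin"] = np.select(
    [d < -0.68, (d >= -0.68) & (d < -0.1), (d > 0.1) & (d <= 0.8), d > 0.8],
    ["LargeBelow", "SmallBelow", "SmallAbove", "LargeAbove"],
    default="NoDeviation",
)

# The regressions that follow use the previous bolus j-1, so each measure is lagged within patient.
boluses = boluses.sort_values(["patient_id", "timestamp"])
for col in ["Deviation", "DeviateBelow", "DeviateAbove", "DeviationBin"]:
    boluses[col + "_lag"] = boluses.groupby("patient_id")[col].shift(1)

The percentage-deviation bins are constructed analogously, dividing DeviationAmount by the recommended amount and applying the -30% and 40% thresholds.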
1.6.4 Identification

To test Hypotheses 3a and 3b, we build the linear probability model:

AlgorithmUse_ij = α + β_1 Deviation_{i,j-1} + γ_i + τ_j + ε_ij    (1.3)

Like the models in §1.5, we control for patient fixed effects γ_i and time fixed effects τ_j, which include the hour fixed effects, DaysSinceVisit_ij, and Time_j. We cluster our robust standard errors by patient. To show support for Hypothesis 3a, β_1 should be positive and significant. However, as suggested by Figure 1.3, β_1 may instead be negative and significant, which would provide support for Hypothesis 3b. Furthermore, to examine the potentially asymmetric effects of deviating below as opposed to above the algorithm's recommendation, as well as whether the absolute or relative degree of deviation from the algorithm matters, we substitute Deviation_{i,j-1} with the three sets of independent variables explained in §1.6.1 and estimate the respective models.

1.6.5 Results

Table 1.5 shows the regression results. Column 1 demonstrates that, overall, deviations have a negative impact on algorithm use, with a 2.1 percentage point reduction in algorithm use for the focal bolus when one deviated previously. Column 2 decomposes whether deviating below or above the algorithm's recommendation has differential effects. Relative to not deviating from the algorithm's recommendation, we observe that patients use algorithms 4.4 percentage points less when they deviated below and 0.8 percentage points less when they deviated above. Columns 3 and 4 test whether smaller or larger deviations from the algorithm may be driving the results. They suggest that larger deviations, both in magnitude and on a relative basis, are driving the results for deviating below. Note that Column 4 has a smaller sample size because the recommended amount for bolus j-1 is 0 for 4,413 boluses, leading to an inability to calculate PctDeviation_{i,j-1}. We observe similar findings when we run the models above under logit specifications, as seen in Table 1.11.

1.6.6 Discussion

In summary, the results contradict Hypothesis 3a and instead support Hypothesis 3b: observing deviations from algorithm recommendations tends to lead to lower algorithm use. Moreover, the evidence suggests that observing larger deviations predicts even lower algorithm use.

Table 1.5: Testing Hypotheses 3a and 3b: Regression Results

Dependent variable: AlgorithmUse_ij           (1)                 (2)                 (3)                 (4)
Deviation_{i,j-1}                             -0.021*** (0.005)
DeviateBelow_{i,j-1}                                              -0.044*** (0.012)
DeviateAbove_{i,j-1}                                              -0.008+ (0.004)
LargeDeviationBelow_{i,j-1}                                                           -0.079*** (0.019)
SmallDeviationBelow_{i,j-1}                                                           -0.010 (0.007)
SmallDeviationAbove_{i,j-1}                                                           -0.004 (0.004)
LargeDeviationAbove_{i,j-1}                                                           -0.011+ (0.006)
LargePctDeviationBelow_{i,j-1}                                                                            -0.065** (0.021)
SmallPctDeviationBelow_{i,j-1}                                                                            -0.023*** (0.007)
SmallPctDeviationAbove_{i,j-1}                                                                            -0.011** (0.004)
LargePctDeviationAbove_{i,j-1}                                                                            -0.002 (0.005)
Constant                                      0.406*** (0.009)    0.407*** (0.008)    0.408*** (0.008)    0.411*** (0.008)
Observations                                  168,855             168,855             168,855             164,442
R^2                                           0.241               0.242               0.242               0.242
Patient Fixed Effects                         Yes                 Yes                 Yes                 Yes
Time Controls                                 Yes                 Yes                 Yes                 Yes
Mean of Dependent Variable                    0.897               0.897               0.897               0.898

Note. Robust standard errors in parentheses, clustered by patient. Column 4 has a smaller number of observations as 0U was the algorithm's recommendation for the previous bolus j-1 of 4,413 boluses, leading to an inability to calculate PctDeviation_{i,j-1}. Significance reported as + p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001.

1.7 Does Reducing Algorithm Input Sources Impact Algorithm Use?
Recall Hypotheses 4a and 4b regard how the presence of multiple algorithm input sources may increase or decrease algorithm use, respectively. To test these hypotheses in our context, we use the experimental manipulation in Aleppo et al. (2017), which varied the number of information sources to be considered when making self-management decisions. Both conditions performed finger stick measurements as part of their diabetes care, and patients who utilize the algorithm are asked to input their current glucose reading as an input into the algorithm. Patients in the Finger Stick Confirmation condition were instructed to use the finger stick information as the primary source of information to make their decision and used their CGM as an adjunct-tool for decision making. In contrast, patients in the No Finger Stick Confirmation condition used a blinded finger stick meter for the majority of their bolus decisions. Hence, these patients relied on their CGM only as an algorithm input source. 1.7.1 Data and Measures Given that the choice to perform a insulin bolus decision is patient-dependent, some patients may tend to bolus less while others bolus more and each patient’s bolus frequency may be inconsistent pre- and post- intervention. This within-patient variation may lead to a poor evaluation of the experimental manipulation’s effect on algorithm use if we conduct the analysis at the bolus level. Therefore, we perform our analysis at the patient-6 hour level, as done inx1.4. This allows for more even representation of observations across patients. For our tests of Hypotheses 4a and 4b, our dependent variable of interest is PctAlgorithmUse it as defined inx1.4, that is number of boluses in periodt that used the algorithm for patienti number of boluses in periodt for patienti . We identify patients who belong to the No Finger Stick Confirmation condition usingNoFingerStick i = 1 and 0 otherwise. To understand how the intervention effects change over time, we letFirstMonth t = 1 if dayt belongs to the first 30 days in the experiment,SecondMonth t = 1 if dayt belongs to the second 30 days in the experiment, and so on up toSixthMonth t = 1 if dayt belongs to the sixth 30 days in the experiment. 1.7.2 Sample Selection We consider how algorithm use varies every 6 hour period from about 14 days before intervention (as the wide majority of patients had at least 2 weeks of pre-intervention data) to 6 months into the experiment (as 26 the majority of patients spend at least 180 days in the experiment). Note that there could be 224 patients (14+180 days) 3 time chunks from 6am to 12am in a day = 130,368 observations in the panel. We have 129,744 observations, because for 8 patients, we had the pre-intervention data from 12 or 13 days instead of 14 days, and for 63 patients, they complete the experiment between 176-179 days since the intervention. Next, we remove 14,481 observations belonging to 25 patients, who did not use the algorithm at all or had no algorithm use data in the pre-intervention period. This is important as we want to observe how patient behavior in the two conditions change after the intervention. We also drop 15,825 observations that belong to 198 patients, as patients did not complete a bolus during these periods. This leaves us with 99,438 observations belonging to 199 patients for this analysis. 
Of the 199 patients, the average age is 44.11 (standard deviation 13.83), with 76 patients (38.19%) older than 50, 102 patients (51.26%) female, and 133 patients (66.83%) part of the No Finger Stick Confirmation condition.

1.7.3 Model Free Evidence

Figure 1.4: Testing Hypotheses 4a and 4b: Model Free Evidence
Note. The average daily percentage algorithm use, computed across patients by treatment condition, is reported.

Figure 1.4 shows the average algorithm use across patients, plotted across days and by treatment condition; we plot at the daily level to better present the pattern, as the 6-hour period data exhibits more variation. We observe a lift in algorithm use at the start of the experiment for those in the No Finger Stick Confirmation condition, but this increase appears to decay over time. Next, we investigate at what point(s) in time the effect of the experimental intervention is significant using a DiD model.

1.7.4 Identification

To test Hypotheses 4a and 4b, we build the following DiD model:

PctAlgorithmUse_it = α + β_1 NoFingerStick_i × FirstMonth_t + β_2 NoFingerStick_i × SecondMonth_t
                     + β_3 NoFingerStick_i × ThirdMonth_t + β_4 NoFingerStick_i × FourthMonth_t
                     + β_5 NoFingerStick_i × FifthMonth_t + β_6 NoFingerStick_i × SixthMonth_t
                     + γ_i + τ_t + ε_it    (1.4)

We control for patient fixed effects γ_i. We also include time controls τ_t, which include DaysSinceVisit_it, Time_t, Afternoon_t, Evening_t, and FirstMonth_t through SixthMonth_t. We cluster our standard errors by patient. This specification allows us to identify when the effect of being in the No Finger Stick Confirmation condition was significant, and whether there may be time contingencies to this effect. To show support for Hypothesis 4a, we should observe that one or more of β_1 through β_6 are negative and significant, whereas to show support for Hypothesis 4b, one or more of β_1 through β_6 should be positive and significant.

1.7.5 Results

Table 1.6 shows the regression results. Columns 1 and 2 show the results for testing Hypotheses 4a and 4b without and with patient fixed effects, respectively. We observe that only β_1 has a statistically significant coefficient (p < 0.1). Column 2 shows that for the first 30 days of the experiment, there was a 2.797 percentage point increase in algorithm use. The effect of being in the No Finger Stick Confirmation condition was not significant in later parts of the experiment. This provides support for Hypothesis 4b, though the effect is only statistically significant in the short run. We find similar results when running the specifications in Table 1.6 at the daily level, as shown in Table 1.12.

Table 1.6: Testing Hypotheses 4a and 4b: Regression Results

                                          (1)                  (2)
                                          PctAlgorithmUse_it   PctAlgorithmUse_it
NoFingerStick_i × FirstMonth_t            2.582+ (1.494)       2.797+ (1.460)
NoFingerStick_i × SecondMonth_t           1.756 (1.916)        1.788 (1.797)
NoFingerStick_i × ThirdMonth_t            1.223 (1.870)        1.841 (1.700)
NoFingerStick_i × FourthMonth_t           0.764 (2.138)        1.405 (1.984)
NoFingerStick_i × FifthMonth_t            -0.966 (2.188)       -0.323 (2.097)
NoFingerStick_i × SixthMonth_t            -1.659 (2.342)       -0.941 (2.164)
DaysSinceVisit_it                         -0.003 (0.007)       -0.004 (0.004)
Time_t                                    -0.004 (0.003)       -0.002 (0.003)
Afternoon_t                               -3.036*** (0.581)    -2.924*** (0.560)
Evening_t                                 -3.689*** (0.685)    -3.458*** (0.635)
Constant                                  83.953*** (3.187)    36.334*** (1.089)
Observations                              99,438               99,438
R^2                                       0.005                0.601
Patient Fixed Effects                     No                   Yes
Time Controls                             Yes                  Yes
Mean of Dependent Variable                82.87                82.87

Note. Robust standard errors in parentheses, clustered by patient. Significance reported as + p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001.
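As an illustration, a specification in the spirit of (1.4) could be estimated as below in Python (statsmodels). The column names (experiment_day, NoFingerStick, PctAlgorithmUse, period, patient_id) are illustrative assumptions; this is a sketch of the approach, not the authors' code.

import numpy as np
import statsmodels.formula.api as smf

# experiment_day is assumed to be the day relative to the intervention
# (<= 0 before the intervention; 1, 2, ... afterwards).
day = panel["experiment_day"]
panel["month_bucket"] = np.where(day <= 0, 0, np.minimum(np.ceil(day / 30), 6)).astype(int)

# Explicit NoFingerStick x month-k indicators; the pre-intervention bucket (0) is the baseline.
for k in range(1, 7):
    panel[f"nfs_m{k}"] = panel["NoFingerStick"] * (panel["month_bucket"] == k).astype(int)

did = smf.ols(
    "PctAlgorithmUse ~ " + " + ".join(f"nfs_m{k}" for k in range(1, 7))
    + " + C(month_bucket) + DaysSinceVisit + Time + C(period) + C(patient_id)",
    data=panel,
).fit(cov_type="cluster", cov_kwds={"groups": panel["patient_id"]})

# The nfs_m1 ... nfs_m6 coefficients trace out the treatment effect month by month (Table 1.6).
print(did.params.filter(like="nfs_m"))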
1.7.6 Mechanism Analysis

To further examine the mechanism behind how the experimental manipulation may have impacted algorithm use, we consider variation in patients' behavior in entering inputs into the algorithm. Recall from §1.3 that there are two main inputs the patient can provide when interacting with the algorithm at the time of the bolus: 1) a glucose level (for the algorithm to potentially correct for being out of range) and 2) carbohydrates (for the algorithm to potentially correct for food intake). If having two glucose level measurements causes cognitive dissonance with that part of algorithm use, we may expect patients to avoid inputting a glucose level measurement altogether.

For this analysis, we use the same sample described in §1.7.2. As this section focuses specifically on how people use algorithms, we additionally drop 10,527 observations, corresponding to 151 patients, where patients did not use the algorithm during the 6-hour period. This leads to 88,911 observations from 196 patients for this analysis. Of the 196 patients, the average age is 44.03 (standard deviation 13.82), with 74 patients (37.76%) older than 50, 102 patients (52.04%) female, and 132 patients (67.35%) part of the No Finger Stick Confirmation condition.

We first classify each patient's algorithm input by whether they inputted their glucose level, carbohydrates, or both. To identify whether a patient inputted their glucose into the algorithm, we define MissingBGInput_ij = 0 if the glucose value entered is non-zero and not missing, and set MissingBGInput_ij = 1 otherwise; a value of 0 appears for 20.23% of the boluses that used the algorithm, and the value is missing for 2.04%. We define MissingCarbInput_ij = 0 if the carbohydrate value entered is non-zero and not missing, and MissingCarbInput_ij = 1 otherwise; a value of 0 appears for 25.46% of the boluses that used the algorithm, and the value is missing for 4.42%. We then classify a patient's input activity for each bolus as "Entered Carb Only" if MissingCarbInput_ij = 0 and MissingBGInput_ij = 1, "Entered Carb and BG" if MissingCarbInput_ij = 0 and MissingBGInput_ij = 0, or "Entered BG Only" if MissingBGInput_ij = 0 and MissingCarbInput_ij = 1. Lastly, our dependent variable of interest is PctMissingBG_it, defined as (number of boluses in period t for patient i that used the algorithm and have MissingBGInput_ij = 1) / (number of boluses that used the algorithm in period t for patient i).

In Figures 1.5(a) and 1.5(b), we plot the average daily proportion of algorithm-aided boluses performed across patients which 1) only included a carbohydrate entry, 2) included entries for both carbohydrates and glucose, and 3) only included a glucose entry.
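Before turning to what the figures show, a minimal sketch in Python (pandas/NumPy) of the input classification just described is given below; the column names (bg_input, carb_input, algorithm_use, date, period) are illustrative assumptions about the bolus calculator records rather than the study's actual schema.

import numpy as np

# Restrict to algorithm-aided boluses and flag missing (or zero) calculator inputs.
algo = boluses[boluses["algorithm_use"] == 1].copy()
algo["MissingBGInput"] = (~(algo["bg_input"].notna() & (algo["bg_input"] != 0))).astype(int)
algo["MissingCarbInput"] = (~(algo["carb_input"].notna() & (algo["carb_input"] != 0))).astype(int)

# Input-type classification used in Figure 1.5.
algo["input_type"] = np.select(
    [
        (algo["MissingCarbInput"] == 0) & (algo["MissingBGInput"] == 1),
        (algo["MissingCarbInput"] == 0) & (algo["MissingBGInput"] == 0),
        (algo["MissingCarbInput"] == 1) & (algo["MissingBGInput"] == 0),
    ],
    ["Entered Carb Only", "Entered Carb and BG", "Entered BG Only"],
    default="Neither",
)

# PctMissingBG_it: share of algorithm-aided boluses in the patient-period without a glucose input.
pct_missing_bg = (
    algo.groupby(["patient_id", "date", "period"], observed=True)["MissingBGInput"]
        .mean().mul(100).rename("PctMissingBG").reset_index()
)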
While Figure 1.5(a) suggests that no significant changes occurred over time for those in the Finger Stick Confirmation condition, Figure 1.5(b) suggests that, for those in the No Finger Stick Confirmation condition, the proportion of boluses which only included entries for carbohydrates decreased significantly after the intervention start, while the proportion of boluses which had entries for both carbohydrates and glucose increased significantly. This suggests patients were more willing to input a glucose level into the algorithm when they were exposed to only one glucose level measurement (instead of two).

Figure 1.5: Mechanism Evidence for Hypothesis 4b: Model Free Evidence. (a) Finger Stick Confirmation. (b) No Finger Stick Confirmation.

Table 1.7 shows the results on patients' algorithm input behavior, where we substitute PctAlgorithmUse_it in (1.4) with PctMissingBG_it in the model specification. The coefficient values β_1 to β_6 in both Columns 1 and 2 are negative and significant, showing an approximately 6-9 percentage point decrease in not inputting glucose entries into the algorithm throughout the experiment when patients received a reduced amount of information. We find similar results running the specifications in Table 1.7 at the daily level, as shown in Table 1.13.

Table 1.7: Mechanism Evidence for Hypothesis 4b

                                          (1)                  (2)
                                          PctMissingBG_it      PctMissingBG_it
NoFingerStick_i × FirstMonth_t            -7.574*** (2.089)    -6.572** (2.039)
NoFingerStick_i × SecondMonth_t           -6.494** (2.364)     -6.487** (2.322)
NoFingerStick_i × ThirdMonth_t            -7.104** (2.504)     -7.699*** (2.300)
NoFingerStick_i × FourthMonth_t           -9.085*** (2.718)    -8.583*** (2.384)
NoFingerStick_i × FifthMonth_t            -9.998*** (2.880)    -9.249*** (2.507)
NoFingerStick_i × SixthMonth_t            -9.464*** (2.803)    -8.648*** (2.374)
DaysSinceVisit_it                         0.006 (0.007)        0.004 (0.003)
Time_t                                    0.000 (0.004)        0.002 (0.003)
Afternoon_t                               4.168*** (0.762)     3.018*** (0.679)
Evening_t                                 3.430*** (0.856)     2.483** (0.760)
Constant                                  27.731*** (3.453)    26.667*** (1.406)
Observations                              88,911               88,911
R^2                                       0.032                0.480
Patient Fixed Effects                     No                   Yes
Time Controls                             Yes                  Yes
Mean of Dependent Variable                20.69                20.69

Note. Robust standard errors in parentheses, clustered by patient. Significance reported as + p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001.

1.7.7 Discussion

In support of Hypothesis 4b and against Hypothesis 4a, the evidence suggests that exposing patients to multiple glucose measurements lowers their algorithm use (though the effect appears to weaken over time). Consistent with the hypothesized mechanism of cognitive dissonance, the evidence suggests that when patients are exposed to multiple glucose measurements, they allocate their algorithm interactions in a manner that reduces the need for glucose measurement inputs, relying more on algorithm interactions that require no glucose measurement inputs at all.

1.8 General Discussion

1.8.1 Results Summary and Managerial Implications

Table 1.8 summarizes the results from §1.4 through §1.7. These findings provide insight into when we should anticipate higher or lower algorithm use and suggest potential design features or interventions that one can implement to increase algorithm use. We not only reject the dynamic algorithm aversion hypothesis but find evidence in the opposite direction, or "dynamic algorithm appreciation." This finding naturally implies that strategies to increase algorithm use by attempting to mitigate dynamic algorithm aversion are unlikely to be successful and may even backfire.

Table 1.8: Results Summary

                                                                              Impact on Algorithm Use
If the patient...                                                             Decrease   No Effect   Increase
... went out-of-target with human judgment last period                                               X
... went out-of-target using the algorithm last period                                    X
... used the algorithm for the last period                                                           X
... is making a decision later on in the day                                  X
... is facing a harder task requiring greater precision                                              X
... deviated from algorithm recommendation previously                         X
... is exposed to multiple measurements required for one algorithm input      X

For example, to mitigate dynamic algorithm aversion, one might obscure performance feedback or limit the number of rounds of feedback, even if the algorithm performs better than human judgment. In contrast, our results suggest that highlighting performance feedback when algorithms are superior to human judgment would increase algorithm use. Our results on habitual use and time-of-day patterns also suggest that interventions may wish to take aim at different mechanisms unrelated to performance altogether. Namely, interventions designed to establish algorithm use habits early on, or nudges to "unstick" patients who are not currently using the algorithm, may be fruitful. The time-of-day effects suggest that one may be able to increase algorithm use by designing algorithms to be used when people are less fatigued and are able to follow more regular routines.

We find that people tend to use the algorithm more for difficult tasks requiring higher precision. Thus, all else equal, managers who wish to induce higher algorithm use should set incentives that require high precision for their employees. Similarly, even if direct incentives for precision are not practical, this result implies that decision frames which emphasize a steep increase of negative consequences in the magnitude of the errors (when applicable) may help increase algorithm use.

Our results also suggest that observing a person deviating from an algorithm's recommendation (especially large deviations) can signal that the person is less likely to use the algorithm in the future. Thus, one may wish to time interventions accordingly.

Finally, our results imply designers can increase algorithm use by addressing human-algorithm interactions that may cause cognitive dissonance for the human. For example, if users face multiple estimates but the algorithm requires only a single input, to prevent potential cognitive dissonance, algorithm designers may wish to allow for multiple inputs in the user-input process and let the algorithm perform the aggregation automatically instead.

1.8.2 Interpreting our Rejection of the Dynamic Algorithm Aversion Hypothesis

Given the influence of Dietvorst et al. (2015) and our finding in direct opposition to the dynamic algorithm aversion hypothesis, we now highlight a few points about differences between the two studies that may contribute to explaining the opposing findings. Before we do so, we remind readers that what we refer to (and reject) as "dynamic algorithm aversion" is not the same as a general failure to use a superior algorithm 100% of the time, nor is it the same as a general preference for algorithms over humans, controlling for performance. Rather, it refers to an asymmetric usage response to performance feedback that favors the algorithm over the human. As noted by Logg et al. (2019), Dietvorst et al. (2015)'s results have often been mis-cited, and we believe that imprecise or ambiguous definitions of algorithm aversion are partly to blame. We also note that Dietvorst et al.
(2015) contains more than one result – and, in fact, our results supporting Hypothesis 2 is consistent with one of Dietvorst et al. (2015)’s other findings. The high-level difference of our study being “in the field” versus Dietvorst et al. (2015)’s study being “in the lab” includes, within it, several differences that may explain why we obtain opposite results. Most evident are the differences related to the task context and the population. Our field setting occurs over a long time horizon (several months) with relatively high stakes (potentially severe health-related consequences) and with relatively experienced users (Type 1 Diabetes requires lifelong management) who receive expert advice and coaching from their health care team. Lab experiments on algorithm aversion generally occur over a short time horizon (several minutes) with relatively low stakes (a few dollars) and with relatively inexperienced users (e.g., generally no direct prior experience with the task) who receive no coaching from experts. In our field setting, the algorithm requires input(s) and is calibrated for each patient. In lab studies, algorithms typically require no effort or inputs to observe the algorithm’s recommendation, nor are they calibrated for each subject. Perhaps less evident but potentially critical are differences in the structure of the “algorithm use” deci- sion itself. In Dietvorst et al. (2015), the decision-maker chooses only once whether to use the algorithm and must commit to following it exactly for several decisions without deviation. In our empirical setting, the patient repeatedly chooses whether or not to use the algorithm for each decision and is free to deviate even if they choose to use the algorithm. Because lab evidence suggests that people will use algorithms 34 more if they can deviate (Dietvorst et al. 2018), it is possible that people also respond differently to perfor- mance feedback with more freedom to choose whether to use the algorithm for each decision and freedom to deviate from the algorithms’ recommendation. 1.8.3 Limitations and Future Work Our study does not provide a direct explanation why we find an opposite result to dynamic algorithm aver- sion. As discussed above, there are several differences from the field and laboratory settings. Consistent with other recent calls for future work on human-algorithm interactions (e.g., see Chugunova and Sele 2020), we believe a constructive future direction would be seeking to develop explanations for seemingly contradictory findings regarding algorithm use. More generally, our field setting with Type 1 Diabetes patients managing their insulin boluses – while possessing many desirable characteristics for algorithm use study – does not avoid standard generalization concerns to other algorithm use contexts. We need more research that helps to address the issue of when and why we should expect algorithm use patterns to hold as we move from one setting to another. For example, in a demand forecasting setting with a data-driven algorithm decision support system, should we also expect higher algorithm use when the task requires higher precision (as we found inx1.5)? Our collection of findings regarding algorithm use also generates questions into the relative magnitude and importance of algorithm use determinants – which would be helpful to identify to design the most impactful interventions. 
Factors such as being exposed to multiple measurements (x1.7) as well as habitual momentum and time-of-day patterns (x1.4) appear to have a relatively large impact on algorithm use, yet less attention has been given to these types of issues in the literature on human-algorithm interactions. Future studies on algorithm use behaviors can shed light on such issues and uncover important new yet understudied factors. 35 1.9 Additional Tables Table 1.9: Testing Hypothesis 1: Regression Results (Daily Level) PctAlgorithmUseit HighAlgorithmUseit1 6.898*** (1.233) OutOfTargetit1 1.226+ (0.720) HighAlgorithmUseit1OutOfTargetit1 -1.193 (0.798) DaysSinceVisitit -0.019+ (0.011) Timet -0.018** (0.006) Constant 31.525*** (0.580) Observations 32,423 R 2 0.798 Patient Fixed Effects Yes Time Controls Yes Mean of Dependent Variable 81.67 Note. From an initial sample of 41,990 patient-days from 224 patients, we drop 2,343 observations from 7 patients who did not complete the entire experiment, 2,389 observations belonging to 96 patients where we do not observe any boluses occurring, 213 observations from 80 patients where the patient did not bolus the previous day, 1,406 observations where performance data from the previous day is missing, and 3,216 observations from 25 patients who do not have pre-intervention algorithm use data.PctAlgorithmUse it now measures the proportion of boluses in dayt which use the algorithm, whileHighAlgorithmUse it1 is now based off of the patients’ pre-intervention daily algorithm use median value. OutOfTarget it1 observes whether the patient achieved less than 70% time in range during dayt 1, while Timet increments by 1 for each day in the experiment. Robust standard errors in parentheses, clustered by patient. Significance reported as +p < 0:1, *p < 0:05, **p< 0:01, ***p< 0:001. 36 Table 1.10: Testing Hypothesis 2: Regression Results (Logit Model) V ARIABLES AlgorithmUseij AvgCGMReadingAtBolusij 0.995680*** (0.000816) Observations 185,998 Patient Fixed Effects Yes Time Controls Yes Mean of Dependent Variable 0.798 Note. Coefficients reported as odds ratios. Robust standard errors in parentheses, clustered by patient. Relative to Table 1.4, 11 patients are dropped during the estimation process, because they either always used the algorithm or never used the algorithm. Significance reported as +p < 0:1, *p < 0:05, **p< 0:01, ***p< 0:001. 37 Table 1.11: Testing Hypotheses 3a and 3b: Regression Results (Logit Model) (1) (2) (3) (4) V ARIABLES AlgorithmUseij AlgorithmUseij AlgorithmUseij AlgorithmUseij Deviationij1 0.769*** (0.051) DeviateBelowij1 0.574*** (0.078) DeviateAboveij1 0.905+ (0.047) LargeDeviationBelowij1 0.399*** (0.078) SmallDeviationBelowij1 0.896 (0.081) SmallDeviationAboveij1 0.949 (0.050) LargeDeviationAboveij1 0.881+ (0.060) LargePctDeviationBelowij1 0.464*** (0.101) SmallPctDeviationBelowij1 0.756*** (0.063) SmallPctDeviationAboveij1 0.872** (0.039) LargePctDeviationAboveij1 0.966 (0.055) Constant 0.593*** 0.601*** 0.606*** 0.628*** (0.060) (0.061) (0.062) (0.064) Observations 162,390 162,390 162,390 157,982 Patient Fixed Effects Yes Yes Yes Yes Time Controls Yes Yes Yes Yes Mean of Dependent Variable 0.893 0.893 0.893 0.894 Note. Coefficients reported as odds ratios. Robust standard errors in parentheses, clustered by patient. 
Relative to Table 1.5, 8 patients and their corresponding observations are dropped in columns (1)-(3) and 9 patients are dropped in column (4) during the estimation process, as they either always used the algorithm or never used the algorithm. Significance reported as +p < 0:1, *p < 0:05, ** p< 0:01, ***p< 0:001. 38 Table 1.12: Testing Hypothesis 4a and 4b: Regression Results (Daily Level) (1) (2) V ARIABLES PctAlgorithmUseit PctAlgorithmUseit NoFingerStickiFirstMonthit 2.787+ 2.968* (1.477) (1.434) NoFingerStickiSecondMonthit 2.396 2.253 (1.921) (1.760) NoFingerStickiThirdMonthit 1.961 2.097 (1.914) (1.725) NoFingerStickiFourthMonthit 1.211 1.671 (2.038) (1.858) NoFingerStickiFifthMonthit -0.452 0.173 (2.132) (2.008) NoFingerStickiSixthMonthit -1.568 -0.769 (2.382) (2.196) DaysSinceVisitit -0.008 -0.016 (0.030) (0.013) Timet -0.021 -0.010 (0.013) (0.010) Constant 80.459*** 31.823*** (3.301) (1.103) Observations 36,636 36,636 R 2 0.003 0.786 Patient Fixed Effects No Yes Time Controls Yes Yes Mean of Dependent Variable 81.43 81.43 Note.The sample is built from 43,248 patient-days from 224 patients (as opposed to 224 patients (14+180 days) = 43,456 patient-days, as 8 patients enrolled less than 2 weeks into the experiment and 63 patients who completed the experiment before 180 days into the experiment). We drop 3,225 patient-days that belong to 96 patients, as during such days patients did not have any algorithm use (i.e. bolus) data available, and 3,387 patient-days belonging to 25 patients who did not have any pre-intervention algorithm use prior to the experiment start.PctAlgorithmUse it measures the pro- portion of boluses in dayt which use the algorithm, whileTimet increments by 1 for each day in the experiment. Robust standard errors in parentheses, clustered by patient. Significance reported as + p< 0:1, *p< 0:05, **p< 0:01, ***p< 0:001. 39 Table 1.13: Mechanism Evidence for Hypothesis 4a (Daily Level) (1) (2) V ARIABLES PctMissingBGit PctMissingBGit NoFingerStickiFirstMonthit -6.102** -6.085** (1.906) (1.888) NoFingerStickiSecondMonthit -5.473* -6.041** (2.190) (2.142) NoFingerStickiThirdMonthit -6.019** -7.096*** (2.254) (2.116) NoFingerStickiFourthMonthit -8.332*** -8.319*** (2.415) (2.168) NoFingerStickiFifthMonthit -8.807*** -8.446*** (2.598) (2.329) NoFingerStickiSixthMonthit -8.751*** -8.476*** (2.552) (2.224) DaysSinceVisitit 0.011 0.009 (0.023) (0.011) Timet -0.001 0.007 (0.015) (0.012) Constant 29.186*** 24.323*** (3.449) (1.260) Observations 34,391 34,391 R 2 0.044 0.690 Patient Fixed Effects No Yes Time Controls Yes Yes Mean of Dependent Variable 19.65 19.65 Note. Proceeding from Table 1.12, this sample is reduced by 2,245 observations from 46 patients who do not use algorithms at all during the day.PctMissingBG it measures the proportion of boluses in dayt which used the algorithm but are missing a glucose input, andTimet increments by 1 for each day in the experiment. Robust standard errors in parentheses, clustered by patient. Significance reported as +p< 0:1, *p< 0:05, **p< 0:01, ***p< 0:001. 40 Chapter 2 Worker Experience and Donor Heterogeneity: The Impact of Charitable Workers on Donors’ Blood Donation Decisions JointworkwithSusanLuandTianshuSun 2.1 Introduction Charitable organizations face a challenge of retaining a steady supply of donations to deliver upon their mission. 
To secure a reliable donation flow to maintain its operations, charitable organizations spend significant effort on donor recruitment and retention (Bénabou and Tirole 2006, Gneezy and Rustichini 2000, Lacetera et al. 2014, Masser et al. 2008, Reich et al. 2006, Ryzhov et al. 2016). For example, during the COVID-19 pandemic, blood reserves have been critically low in multiple countries, and significant benefits, such as gift cards and coronavirus antibody testing, have been offered to motivate donations (Marcus 2020). However, such fundraising-related costs are non-trivial, with "a typical charity spending from 5 to 25 percent of its donation on further fund-raising activities" (Andreoni and Payne 2011).

In this study, we approach this challenge by shifting focus from the donors to the workers in charitable organizations. We ask whether and how a charitable organization's front-line staff members can be effectively positioned to encourage donors to donate more (in compliance with the eligibility rules) during their in-person interactions. Specifically, we consider how charitable organizations can use micro-level data on worker-donor interactions to increase donation amounts by maximizing the influence of a staff member on donors' decisions, informed by an understanding of workers' experiences and donors' characteristics. We find evidence that staffing decisions, when made with such considerations, can lead to improvements in productivity, and we propose solutions to improve productivity without increasing operational costs.

Prior studies in operations management have examined how staff members can impact outcomes like efficiency (e.g., processing time) and quality (e.g., mortality rate), but to the best of our knowledge, no studies have considered the impact of such factors on organizational outcomes, like the amount of donations collected, in a charitable giving context. Charitable giving is fundamentally different in two ways. First, beyond fulfilling operational needs, staff members can influence donation decisions when directly interacting with donors. Thus, the relevant experience of a staff member is crucial in determining organizational outcomes. Second, and importantly, unlike in most operational contexts, an individual's donation decision is partially driven by their charitable motive. Thus, it is important to understand when the specific experience of a staff member may have greater influence on a donor's choices.

We study the importance of staff experience and its interplay with donor characteristics on charitable productivity in the context of blood donation in China, where eligible donors donating whole blood can choose how much blood they would like to donate in a session: a choice of 200, 300, or 400 milliliters. (Such a decision when making voluntary whole blood donations is also prevalent in other countries, e.g., Taiwan has options of 250 or 500 ml; Japan 200 ml or 400 ml; South Korea 320 or 400 ml. Donors may also choose between different donation options, e.g., one unit of whole blood versus two units of red blood cells in the United States.) Nurses, the front-line workers in our context, assist donors in deciding the donation amount and perform the entire service experience for the donor. Successfully encouraging donors to choose the higher amounts has immediate implications for blood supply and the healthcare system, and would increase blood supply significantly (Shi et al. 2014).
Using unique, comprehensive data about hundred thousands of nurse-donor interactions, we find strong causal evidence that a workers’ relevant experience enhances their charitable productivity. Understanding how staffing influences productivity is important, but surprisingly, no empirical work so far has systematically studied whether and how charitable organizations could leverage staffing to achieve better outcomes. The lack of research can be attributed to two fundamental challenges in measuring and identifying the effect of worker-customer interactions on outcomes, namely the attribution challenge and the causality challenge. First, in a range of operational contexts, customer outcomes cannot be clearly attributed to her matching with a single staff member. For instance, some blood banks have several nurses handling a single donor’s donation experience. Moreover, when working with charitable organizations, personnel information and 42 outcomes tied to a staff member may be considered as confidential, preventing researchers to study the role of personnel on outcomes in such context. Second, in many important contexts, the assignment of staff to a customer is endogenous. For instance, firms often dispatch their best salesperson for highly valuable customers. Hospitals often match their best physicians to patients under risk. When the matching is endogenous and involving unobserved factors, it is hard to draw conclusions about the causal effect of staffing. Our research context overcomes these issues with the exogenous inflow of donors to blood mobiles and exogenous matching between donors and nurses. In this study, we hypothesize that relevant experience can help nurses improve charitable productivity as such experience can help nurses inform donors on their appropriate donation volume and potential benefits and side effects associated with each option. In contrast, experience that does not have much opportunity for such discussions may be less helpful, as nurses instead primarily gain experience on blood collection. We refer to an nurses’ experience gained with working with voluntary donors (i.e., relevant experience) as voluntary donation experience and experience gained working with group donation donors (who usually donate a fixed amount) (i.e., complementary experience) asgroupdonationexperience. We leverage micro-level data involving 766,104 donation sessions at the nurse-donor level to empirically test the above hypotheses. First, we find that versus a new nurse, a nurse with average accumulatedvoluntary donation experience can obtain on average a 2.27% marginal increase in donation volume, while group donation experience has a limited positive effect. Moreover, we find that the effect of these dimensions varies by donors’ self-efficacy towards blood donation; those who exhibit less self-efficacy are encouraged to donate more when paired with a nurse with higher voluntary donation experience. Taking these insights into account in improved matching schemes, which do not incur additional operational costs, suggest that our blood bank can incur economically significant benefits. Our work has implications for blood banks, and more generally charitable and non-profit organizations. It illustrates that charities can improve organizational outcomes by boosting relevant experience in their interactions with donors. 
This observation extends the literature, which has primarily focused on understanding the benefits of worker experience on immediate operational outcomes and on non-interpersonal tasks. Our focus on how staff experience can influence outcomes introduces an operational lens to the charitable organization literature, which has otherwise largely focused on the donor side, and emphasizes the importance of worker-side interventions in such contexts. Our study also contributes to the nascent field of non-profit and charitable operations, highlighting two important but underexplored considerations when designing people operations: (1) staff members' role in encouraging donors to contribute for social good (beyond having staff members fulfill appropriate staffing levels) and (2) donors' complex motivations and potential non-compliance with donating amounts that best fit an organization's goals. With continued efforts seeking to improve charitable organizations' productivity, our research suggests a cost-effective direction: improved matching of nurses and donors via an understanding of factors related to nurse experience and donors' charitable motives.

2.2 Related Literature

Our research builds on two streams of operations management literature: (1) charitable giving and non-profit operations and (2) staffing management.

2.2.1 Charitable Giving and Non-Profit Operations

Extensive research has been done on understanding donor motives and what incentives charitable organizations should offer to encourage donors to donate. This literature has explored factors underlying donation behavior, such as social ties (List and Price 2009, Meer 2011), peer pressure and conditional cooperation (Frey and Meier 2004), and incentives (Bénabou and Tirole 2006, Gneezy and Rustichini 2000). It also considers mechanisms to encourage donors to participate in blood donation: Reich et al. (2006) study a range of recruitment methods (T-shirts, phone messages, and email recruitment), Lacetera et al. (2012) and Lacetera et al. (2014) study monetary incentives, and Ryzhov et al. (2016) study direct-mail marketing design. Mellström and Johannesson (2008) find that donor compensation can crowd out donations among women, while donating such compensation to charities can counteract the crowding-out effect. Sun et al. (2016) and Sun et al. (2019) evaluate mobile messages, family replacement programs, and motivating offline group formation to increase donors' blood donation activities.

In contrast to prior studies focusing on donors, or on how to optimize processes before or after donations take place (e.g., Aprahamian et al. 2019, Ayer et al. 2019, Cohen 1976, Cohen and Pierskalla 1979), we focus on the impact of charitable workers on donations. We examine how charitable organizations can better position their personnel so that they can encourage donors to donate more within each interaction, hence improving outcomes. That is, conditional on certain donor profiles, how can the blood bank improve outcomes by matching nurses to donors via an understanding of nurse experience, instead of a quasi-random match? To our knowledge, there is scant literature on the worker dimension, with mixed evidence showing that giving charitable workers incentives can either increase or backfire on donations (Gneezy and List 2006, Gneezy and Rustichini 2000). We are among the first to provide insights on the impact of charitable workers on donation outcomes.
More broadly, our work answers the call to focus on improving non-profit operations (Berenguer and Shen 2019) through the lens of improving the efficiency of nonprofit organizations via their operations. We contribute insights regarding how relevant experience can drive productivity in the charitable giving setting, how to manage experience accumulation among charitable workers, and how donor self-efficacy interplays with the effectiveness of worker experience. Our work thus contributes to the increasing interest in non-profit operations and the evaluation of corporate social responsibility in operations (e.g., Kraft et al. 2018, Singh et al. 2019).

2.2.2 Staffing Management

Existing literature in operations management regarding staffing primarily focuses on contexts such as for-profit organizations (e.g., manufacturing) and some non-profit contexts (e.g., healthcare). In such domains, operations management scholars have studied the impact of factors such as staffing levels (Bard and Purnomo 2005, Green et al. 2013, Konetzka et al. 2008, Mani et al. 2015, Needleman et al. 2002, Yankovic and Green 2011), workload (Berry Jaeker and Tucker 2017, Cui et al. 2020, Freeman et al. 2017, KC and Terwiesch 2009, Kuntz et al. 2015, Powell et al. 2012), schedules (Emadi and Kesavan 2019, Ibanez and Toffel 2020, Kamalahmadi et al. 2021), and time of day (Deo and Jain 2019) on operational outcomes.

Several studies consider how various characteristics of workers affect outcomes: for example, temporary workers (Kesavan et al. 2014), peers (Tan and Netessine 2019), educational background (Sharma et al. 2020), and experience (Argote and Epple 1990, Ball et al. 2017, Han et al. 2019, KC and Staats 2012, Madiedo et al. 2020, Maranto and Rodgers 1984, Staats et al. 2018). In contrast, a lack of data access has led to little study of such factors in charitable organizations, even though charitable giving is economically significant in many countries, with "charitable gifts exceeding 2 percent of gross domestic product" (List 2011). Our study overcomes these data limitations and adds new insights along two dimensions: (1) staff members' role in encouraging donors to contribute for social good and (2) the role of donor attributes and the resulting salience of staff experience. We identify nuanced effects via donor characteristics, which provides more insight into the heterogeneous nature of staffing. These effects also contribute to the individual differences literature regarding differing responses across individuals (e.g., Burton-Jones and Hubona 2005, Croson and Gneezy 2009).

Empirical and analytical studies have further explored the role of staffing via data-driven matching informed by worker characteristics. Wang et al. (2019, 2021) identify how to better match patients and healthcare providers using patient-centric information to improve outcomes, while Adhvaryu et al. (2020) study negative assortative matching between managers and workers in garment manufacturing. Guajardo and Cohen (2018) emphasize the need for service differentiation via operating segments due to customer heterogeneity. Analytical studies in this area utilize information about such characteristics to make better operational decisions, e.g., multi-skilled abilities of workers (Garnett and Mandelbaum 2000, Inman et al. 2005) and consideration of both workers and customers (Armony et al. 2021, Gurvich and Whitt 2010, Wallace and Whitt 2005, Ward and Armony 2013).
Our study extends this research stream on using information about the characteristics of personnel and donors to improve outcomes in charitable giving. More importantly, our context involves a task in which significant interpersonal discussion and collaboration between a donor (customer) and a worker is needed to fulfill the task at hand. This contrasts with the existing experience literature (e.g., Boh et al. 2007, Narayanan et al. 2009, Staats and Gino 2012), where the tasks studied do not involve significant, if any, interpersonal discussion or collaboration between the employee and the customer in order to complete the task. Our work explores the matching between donors and workers, observing that the importance of relevant experience for donation outcomes can be moderated by donors' perceived self-efficacy.

2.3 Empirical Setting: Operations in the Blood Bank

We collaborate with a major Chinese blood bank in a provincial capital city with a population of more than 8 million for our empirical analysis. The blood center is responsible for supplying blood to hospitals in the city and is expected to equalize the demand and supply of whole blood on its own. The blood bank accepts both whole blood donations and platelet donations. Platelet donations have a much more stable donor pool with a repeat donor base. In contrast, for whole blood donations, it is challenging for the bank to recruit new donors and to motivate existing donors to donate repeatedly. Two reasons help explain this: first, a conservative constraint of 6 months between subsequent blood donations (in contrast, in the United States, for example, one can donate every 56 days); and second, cultural factors: "traditional Chinese culture holds that the loss of even a small amount of blood has a substantial detrimental effect on health. Some people also believe that donating blood is a disloyal act against one's ancestors" (Shan et al. 2002). Yin et al. (2015) note that "blood is usually compared to lifeline and carries great weight in the Chinese culture." Studies have shown that almost 70% of donations in China are made by first-time donors (Guo et al. 2011, Wang et al. 2010).

Prior to 1998, a significant portion of blood donations came from paid donors (Shan et al. 2002); this changed after 1998, when the Law of the People's Republic of China on Blood Donation mandated voluntary blood donations. The law defined the opportunities to donate different amounts of whole blood (Article 9), required that blood collection be done in compliance with operational procedures and regulations (Article 10), and stipulated that violating such operational procedures could lead to strong consequences (Article 19). To address the challenge of securing enough blood supply, Chinese blood banks began allowing donors to decide how much blood they would like to donate in a session. Instead of having only the option to donate 200 ml, eligible donors can now decide whether they want to donate 200, 300, or 400 ml (Shan et al. 2002), all of which are safe for the donor per World Health Organization guidelines (WHO 2012): donors should donate volumes that are less than 13% of their total blood volume, i.e., a donor should weigh 45 kg to donate 350 ml (+/- 10%) or 50 kg to donate 450 ml (+/- 10%). Nurses help the donor decide the appropriate donation volume. This decision has not only operational impact in providing healthcare effectively to patients in need, but also financial impact given the market for blood (Slonim et al. 2014).
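To make the 13% guideline concrete, a quick arithmetic check using only the thresholds quoted above (no additional clinical parameters are assumed):

\[
\frac{400\ \text{ml}}{0.13} \approx 3{,}077\ \text{ml}
\qquad\text{and}\qquad
\frac{450\ \text{ml}}{0.13} \approx 3{,}462\ \text{ml}.
\]

That is, a donor choosing the largest menu option (400 ml) needs a total blood volume of at least roughly 3.1 liters, and the 50 kg threshold for a 450 ml collection corresponds to an implied blood volume of about 69 ml per kilogram of body weight, which is why the 200-400 ml menu is safely covered for donors meeting the weight-based eligibility criteria.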
Consequently, any opportunity to increase donation volumes to 400 ml could significantly benefit the blood supply (Shi et al. 2014). To further address demand issues with voluntary blood donation, Chinese blood banks also collect whole blood under another format: group donation, which occurs when employer- or institution-organized blood drives (e.g., at universities) solicit their members (e.g., workers, students) to donate blood (Sun et al. 2016). Unlike protocols in other countries, where an organization asks its workers to donate at designated blood donation locations, blood banks in China send mobile booths to the site where the organization is located. Nurses there focus solely on "group" donors; in other words, the two types of donors are not mixed in a waiting line for a specific mobile booth, and a nurse typically focuses on only one type of donor/location within a day.

This group format may involve donors who are not truly volunteers, as they may be specifically motivated by a third party that offers other benefits/incentives (Shan et al. 2002). Such practice is indeed observed in other countries, such as Italy (Lacetera and Macis 2013). Moreover, the likelihood of discussions that moderate the donation volume choice is lowest in this setting for two reasons. First, donor throughput is high, so there is little or no opportunity to discuss the donation volume options with a donor. Second, the donation volume outcomes in this setting are not tied to nurse performance.

In contrast, voluntary donors simply walk in to donate and do not make appointments to donate blood. Once on the blood mobile, nurses handle the entire donation experience for the donor. Hence, both the decision on whether the donor is eligible to donate and the decision on how much the donor donates can be attributed to the nurse. Table 2.5 compares donor profiles between the voluntary and group donation settings. We see that the voluntary donation setting is more representative of the city population, whereas the group donation setting is skewed toward the younger population. More details about staffing and donor interactions, which aid our identification, are provided in §2.5.3.

Voluntary donors do not receive much beyond minor incentives, such as gift cards/snacks after blood donation and potential blood donor reimbursement as stipulated by Article 14 of the Blood Donation Law. That is, donors who donate large volumes of blood over time would not need to pay for their own blood transfusions if such transfusions are needed. Family members of the donor (spouse, parents, children) could also benefit, as they would be able to get blood transfusions, up to the amount of blood cumulatively donated by the donor, for free.[1] However, group donors have more significant extrinsic incentives (e.g., time off work, higher scholarship likelihood) to donate blood. Studies of Chinese blood donors suggest that donors donate primarily to help others in need (Ou-Yang et al. 2017) and not primarily because of such incentives (Yu et al. 2013).

On the nurse side, the blood bank introduced nurse-specific incentive payment schemes on March 26, 2013. Prior to this policy, nurse pay was either the same for all nurses (before 2011/3/26) or the same within a team (2011/3/26-2013/3/25), with bonuses assigned to nurses based on their within-district performance.
The nurse-specific policy provides bonuses tied to the amount of voluntary and family replacement donations that successfully passed blood quality tests, relative to a baseline monthly quota (specific to the district but not to the individual). The scheme had a nonlinear payment structure and did not penalize nurses who did not meet baseline needs. We discuss the effect of incentive payment schemes in §2.7.4. All nurses were equally affected by these incentives; moreover, all nurses are subject to first-year evaluations.

[1] In practice, there is a significant gap between what the blood bank promises and what hospitals can deliver. When a donor's family member needs blood, the promise is often not fulfilled, especially when blood is in shortage in China. Some patients had relied on the family replacement program to get blood transfusions (discontinued in 2018), and others had to postpone their surgeries for a while (Sun et al. 2016). Hence, while the reimbursement may encourage some donors to donate (and not all donors object to it), it is not the primary rationale for donation for most, and moreover would not relate to increased donation volumes within a session.

2.4 Hypotheses Development

In this section, we theorize how organizations can leverage information about workers and donors to improve donation outcomes during worker-donor interactions.

First, a long line of literature has documented the volume-outcome relationship. Exploring experience at both the individual and organizational level, the learning curve effect has been documented in contexts like manufacturing and services (Argote and Epple 1990, Pisano et al. 2001, Valentine et al. 2019). The experience-outcome relationship has been further unpacked by looking at dimensions of experience such as specialization and variety (Narayanan et al. 2009, Staats and Gino 2012) and focal and related experience (KC and Staats 2012), suggesting that experience is helpful, but that some types of experience may be more helpful than others.

Nurses in our blood bank can work in multiple settings (e.g., voluntary donation and group donation). Consequently, all settings allow nurses to gain experience with blood collection. However, the depth of the interaction between nurse and donor varies across settings, leading to different skills learned and practiced by the nurse. In the voluntary setting, the notion of donation volume choice is most pronounced, with nurses playing an active role in informing the donor and recommending appropriate donation volume choices. Donor knowledge about blood donation policies and practices is generally low (Zaller et al. 2005), and nurses can address this knowledge gap in their interactions. Most donors in the voluntary setting, especially new donors, who comprise most of the donor population, may be unaware of the second-stage decision when donating whole blood: how much to donate. If a donor is eligible to donate blood, then they can safely choose any of the options (200, 300, or 400 ml). The nurse can help translate the donor's intention to donate into the appropriate enacted level of behavior (donation volume) by improving and/or confirming the donor's sense of perceived behavioral control or self-efficacy: the belief in one's capability to exercise control over a particular event or behavior (Bandura, Fishbein and Ajzen 2011).
Such increases in self-efficacy can be achieved through the effective delivery, that is, verbal persuasion, of codified knowledge (known facts about blood donation and its effects) and uncodified, tacit knowledge (i.e., how to personalize the message for the donor and understand in-the-moment behavior) (Bandura, Rodriguez Perez and Ordóñez de Pablos 2003). Such information helps the donor understand the potential trade-offs in the donation volume decision: whether to (a) be fully altruistic in helping to provide blood for others (normative belief: the belief that others support/oppose blood donation or the act of blood donation) and (b) consider their ability to recuperate from blood loss post-donation (behavioral and control beliefs: whether performing blood donation would lead to good or bad outcomes, and whether one could actually perform the blood donation), even if they are deemed capable of donating the maximum amount (Fishbein and Ajzen 2011). Examples of providing such information include "您的身体情况可以每六个月回来献血" and "您的身体条件包括体重和脉搏可以献400毫升", which translate as follows: "Your physical condition suggests that you can come back every 6 months to donate" and "Your physical condition, including your weight and pulse, suggests you can donate 400 ml."

In other words, accumulating experience in this modality helps strengthen nurses' ability to share codified knowledge but, most importantly, bolsters their tacit knowledge. Experience is ultimately something that the central blood bank can influence to improve operations via nurse-location assignment as part of staffing decisions, as different locations largely belong to one specific donation setting (i.e., one location may only accept voluntary donations while another is only active for group donations). This leads to our first hypothesis:

Hypothesis 1 (Voluntary Donation Experience). A nurse's experience with voluntary donations is positively associated with a higher donation volume choice.

In contrast, the other settings involve nurse-donor interactions where the donation volume may already be pre-determined. For example, in group donation, the median donation volume is the baseline 200 ml, likely due to the higher throughput of donors in such settings and the extrinsic motivation to donate provided by the host organization. Hence, there is limited or no opportunity to influence donors' donation volume choice outside of the voluntary setting.

Experience gained in the group donation setting therefore serves primarily to reinforce codified knowledge (blood collection and its impact), but not to build tacit knowledge on helping donors enact the appropriate level of behavior (donation outcome). One could argue that such experience is a related task and therefore may be helpful in improving donation volume outcomes (KC and Staats 2012). More specifically, since the skill set that a nurse needs for group donors is a proper subset of the skill set needed for voluntary donors—that is, knowing how to perform blood collection is sufficient for a nurse dealing with group donors—experience gained in the group setting, which we refer to as group donation experience, will not necessarily increase donation volume outcomes as much as voluntary donation experience would. This leads to the following hypothesis.

Hypothesis 2 (Group Donation Experience). A nurse's experience with group donations has less impact on donation volume choice in comparison to voluntary donation experience.
We next examine the heterogeneous effects of voluntary donation experience on different types of donors. Although all donors, by virtue of coming to the blood bank, have signaled that their beliefs associated with blood donation have led them to intend to perform the act of blood donation, donors may have different levels of perceived behavioral control or self-efficacy (Bandura, Fishbein and Ajzen 2011)—a significant predictor of blood donation intentions (Masser et al. 2008). Moreover, donors may derive warm glow from donation, with the majority of blood donors being altruistic (Ou-Yang et al. 2017), and consequently higher donation amounts may actually bring more utility (Andreoni 1989, 1990). Voluntary donation experience gained from seeing how donors make these decisions can allow the nurse to mediate a donor's perceived behavioral control and subsequent donation choice, by teaching the nurse how best to heighten the donor's sense of control and activate their warm glow. For donors who have high perceived and/or actual behavioral control, the effect of voluntary donation experience may be less salient. On the other hand, for donors with reduced perceived and/or actual behavioral control, the nurse has more opportunity to heighten their sense of control. We consider three dimensions associated with reduced control: donor status, gender, and weight.

New donors are likely new to blood donation in general. Hence, more time may be needed for such donors to process the donation experience and the decisions that need to be made. In contrast, Veldhuizen et al. (2011) find that self-efficacy (a closely related sub-component of perceived behavioral control) increases for donors with additional donation history, suggesting that "the more donation experience a donor has, the more motivated he or she becomes to donate again. The donation experiences feed back into donation intentions, making the motivation to donate more salient" (p. 2432). They also find that "at different stages of the donor career path self-efficacy emerges as the key predictor in relation to donation motivation" (p. 2433). In other words, existing donors have the information from their previous donation experiences—within the session and during post-session recovery—to inform their future decisions. Hence, we conjecture that new donors are more likely to be influenced by nurses when making the volume decision compared to existing donors. This leads to the following sub-hypothesis:

Hypothesis 3a (New Donors). The effect of nurses' voluntary donation experience on donations is more salient for new donors.

Second, we consider gender, as France et al. (2008) find that self-efficacy matters more as a predictor of blood donation for women than for men in a study of experienced donors. Moreover, Bani and Giussani (2010)'s review of the role of gender in blood donation highlights that female donors show greater altruism compared to males, and hence may respond more to efforts that activate their warm glow. This is despite adverse reactions being more prevalent in women and women being influenced negatively by such reactions. Ou-Yang et al. (2017) find in their survey of individuals living in Guangzhou, China, that female donors consider responses to questions and enquiries about their physical condition to be a more important service component than males do (70.9 vs. 60.6%, p=0.02).
These observations on female donors suggest that self-efficacy over the donation action is a larger concern for females and that the role of the nurse may be more salient. Hence, we hypothesize:

Hypothesis 3b (Gender). The effect of nurses' voluntary donation experience on donations is more salient for female donors.

Third, we consider weight, which is a factor in donation eligibility, as weight is correlated with the amount of blood in the body. Although all three donation volume choices are safe if one meets the eligibility criteria, individuals of lower weight may only just meet the eligibility criteria and hence feel less comfortable about donating higher volumes. Indeed, Hu et al. (2019) observe that weight increases in the blood donor population parallel increased donation volume in Zhejiang province and emphasize that heavier donors may experience a lessened impact of blood donation on the body, as they have higher blood volume. Nurse experience can play a role in helping lower weight donors increase their self-efficacy with respect to donating the higher options. Hence, we also hypothesize:

Hypothesis 3c (Weight). The effect of nurses' voluntary donation experience on donations is more salient for lower weight donors.

2.5 Data and Empirical Strategy

2.5.1 Data Description

Our dataset includes whole blood donation records from 2005/1/1 to 2017/07/04, tracking the exact time, location, donation format, donation amount (200 ml, 300 ml, or 400 ml) and quality ("pass" or "fail") of the donation, the identifier of the nurse who performed the donation, as well as the donor's age, gender, blood type, marriage status, education, weight, pulse, and blood pressure at the time of donation. The blood center carefully removes all identity-related information and identifies each donor (nurse) by a unique, scrambled donor (nurse) ID, allowing us to follow the donation (collection) behavior of each donor (nurse) over time.

Starting with the full dataset of donations (766,104 records), we generate the measures of interest (described in §2.5.2). Afterward, we filter for voluntary donations only, which comprise 55.9% of the original sample. We focus on donation records that take place at locations that have collected blood for over 30 days (excluding 2,536 donations, 0.05% of all voluntary donations), as locations with few days in operation may represent one-off, non-standard operations. To resolve the left-censoring issue regarding nurse experience and donor donation history, we remove all observations associated with nurses who started working before July 1, 2005 (i.e., within the first 6 months of the dataset), corresponding to 157,915 donations (20.6% of voluntary donations) performed by 36 nurses (out of 108 nurses originally). Next, we drop records with apparent data entry issues (i.e., blood pressure, pulse, age, or time of donation outside normal ranges) or unusual donation volumes (i.e., not 200, 300, or 400 ml). We drop the unusual donation volumes (which form very few records in our dataset) because it cannot be fully determined from the data whether these outcomes occurred due to donor reluctance to donate blood during the session (which would be relevant) or due to an inability to provide blood (an exogenous shock). Four nurses with singleton observations in the remaining records are also dropped. This leads to a total of 267,941 records corresponding to 68 nurses included in our main analyses.
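For transparency, a minimal Stata sketch of this sample construction is given below. The variable names (donation_format, donation_date, location_id, nurse_id, nurse_start_date, volume) are hypothetical placeholders rather than the blood bank's actual field names, and the sketch omits the vital-sign range checks.

    * keep voluntary donations only
    keep if donation_format == "voluntary"
    * drop locations that collected blood on 30 or fewer distinct days
    egen day_tag = tag(location_id donation_date)
    bysort location_id: egen days_open = total(day_tag)
    drop if days_open <= 30
    * drop nurses who began work before July 1, 2005 (left censoring)
    drop if nurse_start_date < td(01jul2005)
    * keep only the standard donation volumes
    keep if inlist(volume, 200, 300, 400)
    * drop nurses with a single remaining observation
    bysort nurse_id: drop if _N == 1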
Additionally, we have aggregate-level results from donor satisfaction surveys run by the blood bank; these surveys are described in §2.10.2. The high satisfaction demonstrated by donors over time in these surveys, alongside the structural requirements for compliance with operational procedures and medical guidelines set forth by the Law of the People's Republic of China on Blood Donation, helps corroborate that taking a supply-side approach to improving charitable productivity is feasible per World Health Organization guidelines (WHO 2012) and can be beneficial.

2.5.2 Measures

2.5.2.1 Outcome Variables and Explanatory Variables

We measure charitable productivity as the donation volume achieved by a nurse during a nurse-donor interaction session. Voluntary donation experience is measured as the number of voluntary donations the nurse has performed up to the focal donation. Group donation experience is measured as the number of group donations the nurse has performed up to the focal donation. In our analyses, we apply a log(variable + 1) transformation to both variables to capture non-linearities and to account for their high standard deviations. This also allows us to reflect the learning curve, in which learning tends to be more prominent initially and limited later.

We enter these variables of interest separately in the regressions due to the high multicollinearity between them. We note, however, that correlation is an imperfect measure of the relationship between the two variables; it does not fully capture the cross-sectional and longitudinal variation between nurses. Additionally, we create two further variables: total experience, the number of donations (of any type) that the nurse has performed up to the focal donation, and percentage voluntary donations, computed as voluntary donation experience divided by total experience. We enter both variables together into our models for a comparison between the two major types of experience. Summary statistics and correlations of these variables are shown in Table 2.1. Additional summary statistics are available in Tables 2.6 and 2.7.

2.5.2.2 Control Variables

We control for a range of observable donor demographics, including gender (a binary variable indicating male or female), the age of the donor (modeled as a linear term), education level (a factor variable with the following levels: 9 years (less than high school), 12 years (high school), 16 years (undergraduate), 18 years (graduate), and Other), and marriage status (a binary variable indicating married or single). We also control for health indicators including weight, systolic blood pressure, and pulse, all of which can affect the recommendation on whether to donate blood and how much blood to donate; these are modeled as continuous variables. We create a factor variable to control for the donor's previous number of donations with the blood bank within our sample (0, 1, 2, ..., 10 or more), with 0 as the base category. We also create a factor variable to control for previous interactions that the nurse may have had with the donor (0, 1, 2, 3 or more), with 0 as the base category.
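A brief Stata sketch of how the experience measures in §2.5.2.1 could be constructed is shown below; the variable names are again hypothetical, and the running counts assume the data are sorted chronologically within nurse.

    sort nurse_id donation_datetime
    by nurse_id: gen vol_exp = sum(donation_format == "voluntary") - (donation_format == "voluntary")
    by nurse_id: gen grp_exp = sum(donation_format == "group") - (donation_format == "group")
    by nurse_id: gen total_exp = _n - 1            // donations performed before the focal one
    gen log_vol_exp   = log(vol_exp + 1)
    gen log_grp_exp   = log(grp_exp + 1)
    gen log_total_exp = log(total_exp + 1)
    gen pct_voluntary = cond(total_exp > 0, vol_exp / total_exp, 0)

Subtracting the current-record indicator from each running sum ensures the counts reflect experience accumulated strictly before the focal donation session.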
2.5.3 Identification and Specification

Our hypotheses are ideally tested in an empirical context where nurses work individually in blood mobiles and are randomly paired with donors who also arrive at random. Aside from donors deciding that they want to donate (akin to participants agreeing to participate in a trial), our context resembles a quasi-random experiment due to the largely exogenous pairing of nurses and donors, for the following reasons.

First, donors could not schedule their blood donation sessions during our study period. Instead, donors simply walk onto the blood mobile of their choice and cannot specify a nurse to interact with for their donation. This contrasts with other non-profit settings, e.g., healthcare, where appointments are usually made in advance, and even with some other charitable giving contexts, e.g., the American Red Cross, where appointments are recommended. The lack of appointments lessens the certainty of donor flow at a location at a specific time. We observe in the data that while returning donors (defined as those who make more than one donation in our study period) tend to return to the same location over our study period, with 68.2% of donors making a repeat visit to the same location at some point (but only 18.8% of donors going to the same location as their previous donation), an overwhelming 87.4% of donors who make multiple donations never interact with the same nurse again. The staffing pattern is thus quite randomized from the donor's perspective: the probability that a donor interacts with a nurse again conditional on the day's staffing is 4.5%; conditional on the location-day's staffing, it is 8%; and donors have a 1.6% chance of interacting with the same nurse in their immediate next donation.

Second, nurses rotate through various locations within one district of the city to conduct blood donation. The planning of the locations assigned to a particular district is done at a yearly level, and the assigned locations for each district are relatively constant across years. Importantly, nurses do not have additional within-district location preferences that are accounted for. They can only select which district to work in when joining the blood bank, and that choice is largely driven by which district is closest to their residence. At most of these locations, only one nurse works for the entire day (median: 1, mean: 1.43), while occasionally multiple staff members may work at the same location during the day. This helps reduce the presence of nurse peer effects that may be induced when nurses communicate with one another or work together (Tan and Netessine 2019). Staffing and location availability have increased over the years, with the median number of staff active per day rising from 11 in 2005 to 21 in 2017 and locations open per day from 8 in 2005 to 16 in 2017. Nurses also rotate between different donation modes and locations, with a nurse staying at one location for 1-2 days (median 1 day, mean 1.9 days); consequently, most nurses work at only one location in a day (median 1, mean 1.04). With such patterns, donors have a low probability of repeat interactions with a nurse. To address endogeneity concerns regarding nurses potentially being selected into particularly busy or higher performing locations, we check whether nurses with higher voluntary donation (or group donation) experience are assigned to busy/high-volume locations. We describe the corresponding analysis in §2.10.1.1. The results, shown in Table 2.8, demonstrate that nurse experience does not determine nurse pairing.
We also consider whether a nurse's overall experience with locations plays a role in increasing donation volumes and find in Table 2.9 that it does not—rather, relevant experience is the main driver.

Third, nurses cannot choose their assignments to voluntary or group donations. Fairness across all teams is a major consideration for the blood bank. Nurses are assigned to group donation locations in the same rotating manner discussed above. Although there may be initial variation in whether nurses work in voluntary or group donation, over time the allocation toward voluntary locations is such that a nurse obtains approximately 60% of donations in the voluntary setting, consistent with the overall distribution of donations for the blood bank. Therefore, we can assume that the different types of experience a nurse gains are also largely exogenous.

Fourth, nurse recruitment of donors outside of the bloodmobile is a rare phenomenon according to our blood bank. Instead, nurses typically sit inside the bloodmobile and wait for donors to walk on. The "conversion rate" of donors directly from the street is extremely low even if nurses exert effort and many people pass by a given blood mobile during the day, because donation rates are low in the general population and it is hard to predict whether a passerby may be willing to donate. Hence, we can also rule out potential selection of donors by nurses when considering a nurse's impact on donors' donation decisions.

The nurse's effect on the donation outcome can also be cleanly attributed. Nurses primarily stay on board the bloodmobiles, waiting for donors to walk in. One nurse then takes ownership of the entire service experience with the donor, from registering the donor to collecting their blood if eligible. This is unlike other blood center operations, where multiple nurses may help a given donor with their blood donation experience.

The specification of our primary interest is as follows:

DONATIONVOL_{ijmt} = \beta_0 + \beta_1 \log(\text{Experience}_{it}) + \beta_2 X_{jmt} + \gamma_i + \delta_m + \tau_t + \epsilon_{ijmt}   (2.1)

We test our hypotheses by estimating the multivariate fixed effects regression model (2.1), which predicts the donation volume outcome in an individual session. To test our hypotheses regarding nurse experience (Hypotheses 1 and 2), we run linear regressions relating the dependent variable DONATIONVOL_{ijmt}, the donation volume outcome for nurse i and donor j at location m during time t, to the independent variable log(voluntary donation experience) or log(group donation experience), denoted log(Experience_{it}). We include the following controls. First, we control for donor characteristics (X_{jmt}), which is important as the donation volume decision may depend on the donor's demographics and physical condition. Second, we include nurse fixed effects (\gamma_i) to control for time-invariant characteristics of nurses. Third, we include location fixed effects (\delta_m) to address environmental characteristics or differences in donor populations. We do not include donor fixed effects in our main model because a large proportion of donors in our dataset do not make repeat donations. Fourth, to control for temporal factors related to blood donation, we include time fixed effects (\tau_t), including hour of day, day of the week, and month-year fixed effects. We cluster the standard errors by nurse and location (reghdfe command in Stata).
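As a concrete illustration, a minimal reghdfe call implementing specification (2.1) might look as follows; donationvol, log_vol_exp, the donor controls, and the fixed-effect identifiers are hypothetical variable names standing in for the dataset's actual fields.

    * specification (2.1): nurse, location, and time fixed effects with two-way clustering
    reghdfe donationvol log_vol_exp i.female age i.education i.married weight sbp pulse ///
        i.donor_history i.prior_interactions, ///
        absorb(nurse_id location_id hour dow month_year) ///
        vce(cluster nurse_id location_id)

The same call with log_grp_exp in place of log_vol_exp, or with log_total_exp and pct_voluntary entered together, corresponds to the remaining columns of Table 2.2.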
We also estimate the following specification, which includes both types of experience in one regression:

DONATIONVOL_{ijmt} = \beta_0 + \beta_1 \log(\text{TotalExperience}_{it}) + \beta_2 \text{PctVoluntary}_{it} + \beta_3 X_{jmt} + \gamma_i + \delta_m + \tau_t + \epsilon_{ijmt}   (2.2)

Here, Total Experience refers to the total number of donations the nurse has obtained in all settings prior to the focal donation, and Pct Voluntary refers to the proportion of those prior donations obtained in the voluntary setting. This specification allows us to consider the two types of experience together. X_{jmt}, \gamma_i, \delta_m, and \tau_t are defined as in specification (2.1).

To explore the heterogeneous effects of voluntary donation experience (Hypotheses 3a, 3b, 3c), we add to specification (2.1) an interaction term between a particular donor characteristic and log(voluntary donation experience), while continuing to include the donor characteristic as a main effect. The controls for these models follow those described for specification (2.1).

Table 2.1: Summary Statistics and Correlations of Variables of Interest

Variable | mean | sd | 1) | 2) | 3) | 4) | 5) | 6) | 7) | 8)
1) Donation Volume | 360.08 | 57.76 | 1
2) Log(Voluntary Donation Experience) | 7.66 | 1.18 | 0.0416* | 1
3) Log(Group Donation Experience) | 7.04 | 1.55 | 0.0342* | 0.8986* | 1
4) Log(Total Experience) | 8.18 | 1.22 | 0.0399* | 0.9851* | 0.9469* | 1
5) Percentage Voluntary Donations | 0.61 | 0.12 | -0.0040 | -0.1917* | -0.5479* | -0.3516* | 1
6) New Donor | 0.67 | 0.47 | -0.1576* | -0.0276* | -0.0310* | -0.0305* | 0.0267* | 1
7) Female | 0.41 | 0.49 | -0.2156* | -0.001 | -0.0042 | -0.0014 | 0.0046 | -0.0059* | 1
8) Low Weight | 0.48 | 0.50 | -0.3487* | -0.0151* | -0.0200* | -0.0183* | 0.0247* | 0.0776* | 0.5330* | 1

Note. The summary statistics are calculated on a sample size of N = 267,941, except for Low Weight and the correlation matrix (N = 231,915). Significance: *p < 0.01. Voluntary Donation Experience is measured as the number of voluntary donations performed by the nurse since joining the blood bank and before the donation session. Group Donation Experience is measured as the number of donations performed by the nurse in the group donation setting since joining the blood bank and before the donation session. Total Experience is measured as the number of donations performed by the nurse since joining the blood bank and before the donation session. Percentage Voluntary Donations is the ratio of Voluntary Donation Experience to Total Experience. New Donor is a binary variable that equals 1 if the donation is the first observed donation for a donor ID. Female is a binary variable that equals 1 if the donor's gender is female and 0 if male. Low Weight is a binary variable that equals 1 if the donor has a median or lower weight (≤ 64 kg) in the entire donation-level data. Active Nurse for Location-Day equals 1 if the donation is performed by a nurse working alone for the location-day and 0 otherwise. Workload is measured as the number of donations registered by the nurse in the last hour prior to the donation session.

2.6 Results

2.6.1 The Effect of Experience: Voluntary versus Group

Table 2.2 shows the results for how the two types of experience—voluntary donation experience and group donation experience—affect the resulting donation decisions. Columns (1) and (2) provide support for Hypothesis 1, with the coefficient of log(voluntary donation experience) being positive and significant.
On the other hand, Columns (3) and (4) show that the coefficient of log(group donation experience) is positive but significant only at the p < 0.1 level. Columns (5) and (6) compare both types of experience in one regression. We observe that experience overall is helpful, but the percentage of that experience gained in the voluntary setting serves to further increase donation volume, supporting Hypothesis 2 and further supporting Hypothesis 1.

Table 2.2: The Impact of Voluntary versus Group Donation Experience on Donation Volume Decisions
Dependent Variable: Donation Volume

VARIABLES | (1) | (2) | (3) | (4) | (5) | (6)
Log(Voluntary Donation Experience) | 1.342*** (0.296) | 1.403*** (0.371) | | | |
Log(Group Donation Experience) | | | 0.403* (0.223) | 0.492* (0.267) | |
Log(Total Experience) | | | | | 1.456*** (0.276) | 1.500*** (0.357)
Past Percentage Voluntary Donations | | | | | 13.586*** (2.891) | 12.931*** (3.106)
Constant | 346.064*** (6.616) | 227.970*** (12.197) | 356.285*** (6.001) | 238.221*** (11.423) | 334.297*** (7.212) | 216.844*** (12.934)
Observations | 267,941 | 231,610 | 267,941 | 231,610 | 267,941 | 231,610
R-squared | 0.047 | 0.225 | 0.047 | 0.225 | 0.048 | 0.226
Donor Controls | No | Yes | No | Yes | No | Yes
Location Fixed Effects | Yes | Yes | Yes | Yes | Yes | Yes
Nurse Fixed Effects | Yes | Yes | Yes | Yes | Yes | Yes
Time Fixed Effects | Yes | Yes | Yes | Yes | Yes | Yes

Note. Robust standard errors in parentheses, clustered by nurse and location. Significance reported as *p < 0.1, **p < 0.05, ***p < 0.01.

Interpreting the coefficient of log(voluntary donation experience) in Column (2), a 10 percent increase in voluntary donation experience is associated with a 0.037% (= 1.403 × log(1.10) / 360.7) increase in mean productivity per donation. To put this in more concrete terms, comparing a very new nurse (who has performed 10 voluntary donations) to an average nurse (3,444 voluntary donations), we see a 1.403 × log(3444/10) = 8.20 ml increase, or a 2.27% marginal increase, in donation volume per donation, demonstrating that more experienced nurses can, on average, benefit the blood bank more than a newly hired nurse. Nurses perform about 1,000 voluntary donations in a year on average. Comparing the yearly trajectory of a newly hired nurse with 10 voluntary donations to that of a nurse one standard deviation above the mean (6,432 voluntary donations), our model predicts that the more experienced nurse would obtain an additional 4,036 ml of blood; the overall benefit from nurse experience is an additional 12,407 ml of blood for the experienced nurse and 8,371 ml for the new nurse. Such amounts correspond to roughly two days' to a week's worth of additional voluntary whole blood collection.

Overall, Table 2.2 suggests that gaining the relevant type of experience—that is, voluntary donation experience—is important for increasing outcomes within a session, and consequently overall productivity. In contrast, group donation experience does not help a nurse as much in improving charitable productivity in the voluntary donation setting.
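For readers who wish to retrace the back-of-the-envelope interpretation above, the arithmetic (taking logs as natural logs and using the 360.7 ml mean donation volume referenced in the text) is:

\[
1.403 \times \ln(1.10) \approx 0.134\ \text{ml}, \qquad \frac{0.134}{360.7} \approx 0.037\%,
\]
\[
1.403 \times \ln\!\left(\tfrac{3444}{10}\right) \approx 1.403 \times 5.84 \approx 8.20\ \text{ml}, \qquad \frac{8.20}{360.7} \approx 2.27\%.
\]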
2.6.2 The Heterogeneous Effect of Experience by Donor Self-Efficacy

Recall that Hypotheses 3a, 3b, and 3c posit that new donors, female donors, and lower weight donors may be more affected by voluntary donation experience. To operationalize differences in donor weight, we identify a "low weight" donor as one whose weight is less than or equal to the median donor weight in the entire dataset (64 kg), and a "high weight" donor as one above the median. We test these hypotheses by running regressions interacting log(voluntary donation experience) with the indicator variable identifying such donors (new donor, female, or low weight). To support Hypothesis 3a, 3b, or 3c, we expect the corresponding interaction effect to be positive and significant.

As model-free evidence, Table 2.6 shows the mean and standard deviation of donation volume for donors with differing characteristics. We find that new, female, and lower weight donors exhibit a lower average donation volume and a higher standard deviation compared to their respective counterparts. This suggests that these groups may generally perceive reduced control and hence donate smaller amounts of blood, while the higher standard deviation suggests there may be more room for nurses to help these donors feel comfortable enacting the higher donation volumes.

Table 2.3 shows how new donors, female donors, and low weight donors respond to voluntary donation experience. Columns (1) to (3) show that the interaction coefficients for new donors, female donors, and lower weight donors with voluntary donation experience are positive and significant. These results suggest that donors with less self-efficacy can be encouraged to donate more when paired with a nurse with greater voluntary donation experience, supporting Hypotheses 3a, 3b, and 3c. Hence, if a location is forecasted to have more of these types of donors, it is more valuable to have nurses with high voluntary donation experience assisting those donors' blood donation experiences.

Table 2.3: The Heterogeneous Effects of Voluntary Donation Experience on Donor Groups for Donation Volume Decisions
Dependent Variable: Donation Volume

VARIABLES | (1) | (2) | (3)
Log(Voluntary Donation Experience) | 0.889* (0.482) | 1.026** (0.383) | 1.074** (0.408)
New Donor | -21.843*** (3.328) | |
New Donor x Log(Voluntary Donation Experience) | 0.782* (0.401) | |
Female | | -3.634 (2.231) |
Female x Log(Voluntary Donation Experience) | | 0.909*** (0.278) |
Low Weight | | | -38.079*** (2.382)
Low Weight x Log(Voluntary Donation Experience) | | | 0.806** (0.370)
Constant | 212.011*** (12.077) | 230.997*** (12.694) | 339.619*** (13.946)
Observations | 231,610 | 231,610 | 231,610
R-squared | 0.223 | 0.225 | 0.194
Donor Controls | Yes (donation history variables excluded) | Yes | Yes (continuous weight measure excluded)
Location Fixed Effects | Yes | Yes | Yes
Nurse Fixed Effects | Yes | Yes | Yes
Time Fixed Effects | Yes | Yes | Yes

Note. Robust standard errors in parentheses, clustered by nurse and location. Significance reported as *p < 0.1, **p < 0.05, ***p < 0.01.

2.7 Robustness Checks

Our results are robust to alternative measures and specifications.

2.7.1 Alternative Measure: Days Worked

Instead of measuring experience by the number of donations performed by the nurse, we consider the number of days the nurse has worked in the voluntary or group setting. These measures increment by one for each day in which a nurse obtains one or more donations in the voluntary or group setting, respectively. Table 2.10 shows that measuring experience by the number of days worked (in the voluntary or group setting) also leads to similar results for Hypotheses 1 and 2.

2.7.2 Analysis on Group Donors

Voluntary donation experience is not likely to affect donation volumes in other donation formats, especially when donors have limited opportunity to decide how much they want to donate.
This generally holds in the other format of whole blood donation, group donation, where the number of donors is higher and there is limited or no time to discuss donation volume options. Moreover, group donation experience is unlikely to affect donation volume choices, as the experience gained there concerns blood collection only. We perform a placebo test to see whether voluntary donation experience, as well as group donation experience, benefits group donors. We run specifications similar to those for Hypotheses 1 and 2 but on a different sample: group donors only. Table 2.11 shows that there are no significant effects of voluntary donation experience or group donation experience in this sample.

These results provide additional evidence for the hypothesized mechanisms of nurse-donor interaction and further illustrate the role of voluntary donation experience in benefiting interactions where there is room to influence donation volume. Moreover, they corroborate the institutional details on the different donation formats and highlight the unique experience needed for voluntary donations. If donations are to come primarily from the voluntary format, then nurses should gain more experience in it to further organizational outcomes.

2.7.3 Alternative Explanations

First, a potential confounding factor is that nurses may exhibit peer effects on one another when multiple staff members work together (Tan and Netessine 2019). Nurses may benefit differentially in learning how to persuade donors depending on whether peers are available to discuss their work and/or encourage each other during their workdays, and may therefore perform differently depending on whether they work alone or with peers. To alleviate this concern, we create a variable indicating whether a location-day had only one staff member performing active blood collection. We interact this variable with voluntary donation experience to see whether nurses are differentially influenced when working alone versus working with peers. Column (1) in Table 2.12 suggests that we do not find differential impacts of peer effects with voluntary donation experience. We also perform subsample analyses for donations performed by nurses working alone versus working with peers and find similar results for voluntary donation experience, as shown in Columns (2) and (3) of Table 2.12.

Second, we explore whether our results may be driven by certain nurses via subsample analysis. We check whether the most experienced nurses may be driving outcomes by choosing a more restrictive cutoff month for the start of our analyses—October 2005 or January 2006—instead of July 2005. Columns (1)-(4) in Table 2.13 demonstrate that our results are robust to this subsample check. One may also wonder whether nurses who performed well (high donation volumes) and left the blood bank (potentially due to promotions) may be driving the results. To check this, we exclude nurses who "left" before 2017 (defined as not having performed any donations in 2017). Columns (5)-(6) in Table 2.13 report the results of this subsample check, which show that our effects are still present. This suggests that our effects are not driven by nurses who have left the blood bank—showing that learning is still relevant for active nurses.

Third, a concern could be that experimental interventions performed by the blood bank, such as those reported in Sun et al.
(2016) and Sun et al. (2019), could have an impact on our results. To alleviate the concern that these interventions are driving the results, we re-estimate our models excluding the data periods that align with the experimental interventions in those papers. Our main results remain robust, as shown in Columns (7)-(8) of Table 2.13.

Fourth, our main specification uses a linear regression, but since our donation volume outcomes are discrete, a discrete choice model may be preferred. Hence, we re-run the models used to test our main hypotheses with a multinomial logit specification. Because this model does not converge with our full set of fixed effects (location, nurse, donor history, interactions with the nurse, and time fixed effects), we estimate it with the key independent variables of interest, nurse/location fixed effects, time fixed effects (month, year, hour, day of the week), and donor characteristics, and we model the donor's donation history (overall and with a nurse) as continuous variables instead. Table 2.14 reports the results as relative risk ratios using 200 ml as the base category. We observe insights similar to those from the linear regression model for log(voluntary donation experience), as well as for the combination of log(total experience) and percentage voluntary donations.

2.7.4 Incentives

§2.3 mentioned that performance pay incentives were implemented at the blood bank. These may influence nurses to obtain more donations, as their pay would be tied to the amount of blood collected. Although our main models control for month-year fixed effects, which should largely absorb any effects of the incentive schemes, it is of interest to understand whether such worker-side incentives could explain away our findings. Moreover, it is also interesting to understand whether performance pay incentives necessarily improve charitable productivity.

To explore this more closely, we control for the presence of such incentive schemes in our models as follows. We create a factor variable distinguishing the exact dates on which the fixed wage (prior to 2011/3/26), group pay (2011/3/26-2013/3/25), or individual pay (2013/3/26 onward) regime was in effect, with the fixed wage as the base category. We then include this variable as an additional covariate in our primary regression models, estimated on the same set of observations as our primary analyses. Columns (1)-(4) in Table 2.15 show that, having explicitly controlled for the presence of incentives, Hypotheses 1 and 2 still hold. They also suggest a significant positive effect of the individual pay incentive on donation outcomes. Because the individual incentive seems to have significantly boosted donation volume, we also perform a subsample analysis on donations prior to March 2013. We find that our results for Hypotheses 1 and 2 are robust in Columns (5)-(8) of Table 2.15.

Additionally, we consider whether such incentives, as well as experience, may affect the quality of blood donation. We run regressions of blood donation pass rates on donation volume, experience, and incentives, while controlling for donor characteristics, nurse and location fixed effects, and time-related controls as in equations (2.1) and (2.2). We find no relationship between our key independent variables and pass rates (which are generally very high to begin with) in Table 2.16.
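A minimal Stata sketch of the incentive-regime control described above is shown below; the cutoff dates follow the text, while donation_date and the other variable names are hypothetical placeholders.

    * incentive regime: 0 = fixed wage (base), 1 = group pay, 2 = individual pay
    gen byte pay_regime = 0
    replace pay_regime = 1 if donation_date >= td(26mar2011)
    replace pay_regime = 2 if donation_date >= td(26mar2013)
    label define regime 0 "Fixed wage" 1 "Group pay" 2 "Individual pay"
    label values pay_regime regime
    * i.pay_regime is then added as an extra regressor in the models reported in Table 2.15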
We acknowledge that the estimates regarding the effect of incentives are not necessarily causal; strictly speaking, they are pre-post comparisons. Nonetheless, they suggest that incentives do not backfire on donation outcomes, as has been reported in some studies of non-profits (e.g., Gneezy and Rustichini 2000).

2.7.5 Workload

Given that empirical studies have shown that workload can influence outcomes via a speed-up effect (e.g., Berry Jaeker and Tucker 2017, KC and Terwiesch 2009, Tan and Netessine 2014), which may reduce productivity, we consider whether workload affects outcomes in our setting. We define a nurse's workload at a given time point as her hourly throughput, measured as the number of donors registered by the nurse in the 60 minutes prior to the current donation session. To reduce the influence of outliers, we top-code this workload variable at the 99th percentile.

Unlike settings such as hospitals where nurse workload is very high, workload for nurses in the blood donation setting is relatively low. We calculate an approximate average daily utilization using the average interarrival time and the number of staff members working that day, assuming a touch/service time of 20 minutes per donor (covering pre-registration, discussing the donation volume choice, and blood collection tasks, excluding observation time during and after collection). We observe that the average (median) daily utilization across all locations is sufficiently low at 52.5% (33.1%). Moreover, workload in our context is exogenous. For one, there was no appointment system available at the time of the study, so donor arrivals cannot be smoothed over time. This is unlike studies of workload in healthcare settings, where workload has been defined as the total of patients planned for arrival and patients in the current unit meeting a threshold (KC and Staats 2012). Hospitals can divert and turn away patients given their existing load; in the blood donation context, donors arrive as they please and are only turned away if they do not meet the eligibility guidelines. Given that nurses typically work alone, they do not have much ability to "gatekeep" even under high workload. Hence, an OLS analysis should be sufficient to understand the workload effect in this setting. Table 2.17 shows the results of including the workload variable as a control. We find that Hypotheses 1 and 2 are still supported, with a positive and significant effect of log(voluntary donation experience) (or of log(total experience) and percentage voluntary donations).

2.8 Managerial Implications for Non-Profit Operations

2.8.1 Back of the Envelope Calculation on Potential Benefits of Alternative Matching

Having established how nurse experience can be salient for donation outcomes and how its effects are moderated by a donor's level of control, we perform a series of counterfactual estimates to understand the potential benefits of using these insights to match nurses and donors for improved outcomes. To illustrate the potential benefits, we use donation records from the first half of 2011 for the counterfactual estimates. We perform two types of counterfactual estimates. The first pairs nurses and donors individually without considering resource constraints, to establish an upper bound on the value of improved matching.
Such a system could be implemented if organizations have an appointment system that can dynamically incorporate information to improve staffing decisions. The second considers matching existing nurses to locations each day, which parallels the blood bank's practice of scheduling nurses. We also consider a variant of the second in which nurses are assumed to be fully specialized in voluntary donations rather than also working in other donation settings. Our methodology and results are described in §2.10.4.2; we briefly report key results here.

We find that if we could pair nurses and donors individually without considering resource constraints, total donation volume would increase by 293,490 ml. This is equivalent to approximately 1,467 additional units of blood (a unit being 200 ml); in other words, 52% of the potential gain (defined as the additional blood collected had everyone donated 400 ml) would be achieved. This can also be viewed as a 5.1% increase over the existing blood collected. When we account for resource constraints, the generated donation volume equals 5,762,440 ml, a 45,440 ml increase in donated blood. This achieves 8.05% of the potential gain, equivalent to an additional 227 units of blood. Considering resource constraints together with the possibility that nurses were fully specialized in voluntary donations, 9.32% of the potential gain would be achieved.

2.8.2 The Learning-Limited Learning Nature of Voluntary Donation Experience

The positive sign on the log(voluntary donation experience) measure in Table 2.2 suggests a concave learning curve; that is, as voluntary donation experience is accumulated, the initial accumulation of experience provides more benefit to outcomes than accumulation once a large base of experience has been gained. To further unpack this, we run a piecewise linear model for voluntary donation experience using the nl hockey package in Stata, which optimizes the breakpoint of the piecewise linear model to identify where the learning pattern changes. To minimize the effect of outliers, we top-code voluntary donation experience at the 99th percentile in this section. The model can be described mathematically as

DONATIONVOL_{ijmt} = α_1 + β_1 VoluntaryDonationExperience_{it} + δ X_{jmt} + θ_i + μ_m + τ_t + ε_{ijmt}   if VoluntaryDonationExperience_{it} ≤ c,
DONATIONVOL_{ijmt} = α_2 + β_2 VoluntaryDonationExperience_{it} + δ X_{jmt} + θ_i + μ_m + τ_t + ε_{ijmt}   if VoluntaryDonationExperience_{it} > c,
α_1 + β_1 c = α_2 + β_2 c   (continuity at the breakpoint c).

Table 2.4: The Learning-Limited Learning Nature of Voluntary Donation Experience
Dependent Variable: Donation Volume
VARIABLES (1) (2) (3) (4)
Voluntary Donation Experience (before breakpoint: β_1) 0.0024*** 0.0022*** 0.0024*** 0.0022***
 (0.0006) (0.0006) (0.0006) (0.0006)
Voluntary Donation Experience (after breakpoint: aggregate effect β_2) 0.0010* 0.0006
 (0.0005) (0.0005)
Voluntary Donation Experience (after breakpoint: marginal effect, change in slope from before breakpoint, β_2 - β_1) -0.0014*** -0.0015***
 (0.0004) (0.0004)
Observations 267,941 231,610 267,941 231,610
R^2 0.0431 0.2217 0.0431 0.2217
Donor Controls No Yes No Yes
Location Fixed Effects Yes Yes Yes Yes
Nurse Fixed Effects Yes Yes Yes Yes
Time Fixed Effects Yes Yes Yes Yes
Marginal Spline No No Yes Yes
Spline Cutoff 4898 4850 4898 4850
Note.
Columns (1) and (2) describe the aggregate effects of voluntary donation experience before/after the breakpoint, while Columns (3) and (4) describe the marginal effects of voluntary donation experience before and after the breakpoint. Due to estimation challenges, instead of month-year fixed effects we include a set of month effects and a set of year effects. Robust standard errors in parentheses, clustered by nurse and location. Significance reported as * p < 0.1, ** p < 0.05, *** p < 0.01.

Table 2.4 shows the results of this exercise. Columns (1) and (2) describe the aggregate impact of experience before and after the breakpoint and suggest a learning and limited learning nature of voluntary donation experience: before the breakpoint the impact of voluntary donation experience is positive and significant, while after the breakpoint it is positive but no longer significant. Column (2) suggests that at the breakpoint, the cumulative impact of voluntary donation experience on outcomes is 0.0022 × 4850 ≈ 10.7 ml, or a 2.96% increase in donation volume. We observe that the selected breakpoint is relatively consistent regardless of whether we include donor characteristics, with breakpoints of 4898 and 4850 voluntary donations respectively. Both breakpoints are larger than the 75th percentile (4799 voluntary donations) and the median (2556 voluntary donations) of voluntary donation experience, suggesting that the blood bank has nurses collecting donations who have already passed the initial, steeper portion of the learning curve. From a tenure standpoint, this level of voluntary donation experience is typically reached at around 4.6 years with the blood bank. Columns (3) and (4) describe the marginal effects of voluntary donation experience (calculated as β_2 - β_1), from which we see that the change in slope for voluntary donation experience after the cutoff point is negative and significant. This confirms the overall reduced benefit of learning reflected in β_2; the reduction in slope does not fully wash out the benefits of voluntary donation experience gained at the beginning of one's tenure at the blood bank. This observation also helps confirm the log specification of experience used in our main analyses (we appreciate a reviewer's comment on this). Overall, the results corroborate our findings in §2.6.1, emphasizing the non-linear nature of experience.

One explanation is career boredom, which was suggested by our blood bank: nurses with extended tenure may gain less satisfaction from their work. Consequently, such nurses may become less focused on achieving the highest productivity for the blood bank and may be prone to making more mistakes. Our collaborating blood bank has attempted to utilize this insight in designing the staff lifecycle, such that after an extended tenure (e.g., 10 years) with the blood bank, nurses move on from front-line blood collection to other tasks, while new nurses are hired to work on front-line blood collection. Another explanation is that, given this staff lifecycle, highly persuasive nurses who continued to exhibit learning may have been promoted to administrative roles. Hence, we do not fully observe their learning curves, but we do observe this learning and limited learning pattern.
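The breakpoint in Table 2.4 was selected with the nl hockey routine in Stata. As an illustration of the underlying idea, the sketch below re-implements a continuous hockey-stick fit in Python with a simple grid search over candidate breakpoints; it is a minimal sketch on simulated data, and the fixed effects and donor controls of the actual specification are omitted.

import numpy as np

def fit_hockey(x, y, candidate_breaks):
    """Continuous piecewise-linear (hockey-stick) fit:
    y = a + b1*x + b2*max(x - c, 0), where b2 is the change in slope at break c.
    Returns the breakpoint and coefficients that minimize the sum of squared errors."""
    best = None
    for c in candidate_breaks:
        X = np.column_stack([np.ones_like(x), x, np.maximum(x - c, 0.0)])
        coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
        sse = float(np.sum((y - X @ coef) ** 2))
        if best is None or sse < best[0]:
            best = (sse, c, coef)
    sse, c, (a, b1, b2) = best
    return {"breakpoint": c, "slope_before": b1, "slope_change": b2, "sse": sse}

# Simulated data mimicking a learning/limited-learning pattern with a break near 4800
rng = np.random.default_rng(0)
x = rng.uniform(0, 10000, size=5000)          # voluntary donation experience
y = 350 + 0.002 * np.minimum(x, 4800) + rng.normal(0, 5, size=x.size)
print(fit_hockey(x, y, candidate_breaks=np.arange(500, 9500, 100)))

In this parameterization, slope_before corresponds to β_1 and slope_before + slope_change to β_2, so a negative slope_change mirrors the negative and significant change in slope reported in Columns (3) and (4).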
Regardless of which explanation holds, we still observe the learning curve phenomenon when we only consider nurses who were still actively working at the blood bank near the end of the study period (Columns (5)-(6) of Table 2.13, as discussed in §2.7.3). Hence, we believe such considerations can be operationalized by charitable organizations to optimize the career trajectory of front-line staff members for improved outcomes.

2.9 Discussion and Conclusion

In this study, we find strong evidence that, in charitable organizations, worker experience plays a non-trivial role in donation outcomes, and that the relevance of that experience matters. That is, accumulating relevant voluntary donation experience increases donation outcomes, while group donation experience, which familiarizes workers with only one of the subtasks in the donation process, is less beneficial in increasing productivity. This voluntary donation experience moreover has a larger impact on donors with lower levels of behavioral control.

Our work makes the following contributions. First, our results illustrate, in a charitable giving context, that staff experience is helpful in the interpersonal task of blood donation and improves the business outcome of donation volume. This builds on the existing literature on experience and outcomes, which has primarily focused on the benefits of experience for operational outcomes (task completion time) and on non-interpersonal tasks. Importantly, however, our study also indicates that the type of experience matters in achieving improved outcomes. Thus, non-profits should take a granular look at the nature of their workers' experience. These results extend the literature that investigates the differential value of various types of experience (KC and Staats 2012, Narayanan et al. 2009, Staats and Gino 2012). Second, we provide evidence that the effect of voluntary donation experience varies depending on the donor: namely, it has a stronger effect on donors with a reduced sense of control over their donation. To help alleviate such donors' concerns, more experienced nurses should be paired with them to better inform their decision-making and reinforce that all donation volume options are safe for the donor.

Overall, our study shows that charitable organizations can leverage data at the donation session level to gauge the importance of voluntary donation experience and consequently improve staffing performance. Our work suggests that utilizing big data to leverage the relevant experience of charitable workers can provide economically significant gains, enabling charitable organizations to better realize their objectives, and offering blood banks a way to help alleviate shortages. Our findings are especially critical in the current COVID-19 pandemic, with severe blood shortages occurring due to heightened demand for blood and a decreased number of blood donors.

Our study considers staffing in a context where appointment/reservation systems are not available. We think that organizations could also utilize such insights when they have an appointment/reservation system for their customers or when multiple staff members work at a particular location.
By taking each customer's profile into consideration, individually or in aggregate, an organization can fine-tune staffing at its various locations to better match workers with customers. For example, at the American Red Cross, donors can pre-register and make an appointment with a donation center or blood drive prior to donating. Given the certainty of such appointments, the American Red Cross could use this information to further customize its staffing and recruitment strategies, leading to better outcomes. Prior data about walk-ins at various locations can also help inform which nurses may be a better match for such donors and can convince them to donate what is most in need (whole blood, platelets, or red blood cells). Interestingly, our blood bank has recently implemented such a scheduling system, which opens the possibility of testing the insights from this study in practice. Table 2.18 provides a summary of interventions that could be performed at the blood bank, drawing on the interventions ideated in Sun et al. (2016) and Sun et al. (2019). We observe that our suggested intervention has similar if not larger lifts in the amount of blood collected, while also applying to all donors (not just existing donors). This demonstrates the value of considering workers' relevant experience as a lever, rather than looking only at the donor side to improve outcomes.

A limitation of our study is that our data come from one blood bank. Future research can explore whether our results generalize to blood banks in other countries that offer multiple donation options as part of their collection strategy. We advise readers to be mindful when generalizing the insights of the study to other charitable giving contexts by carefully considering the details of those contexts.

Our work, to the best of our knowledge, is the first to take a granular look at the effects of worker experience on organizational productivity in a charitable giving context. Our results suggest ways in which workers can accumulate experience to improve outcomes and how this affects donors with varying degrees of self-efficacy. However, we do not directly investigate exactly what nurses do as they gain such experience that subsequently leads to higher donation volumes. Future research should delve further into the interaction process between charitable workers and donors to understand which factors within the interaction can help charitable organizations induce better outcomes. One potential avenue for such research is to observe these on-site interactions to understand what types of interactions should be encouraged to improve the efficiency of charitable organizations, as well as donor satisfaction. Additionally, previous research in contexts such as banking and software (Madiedo et al. 2020, Narayanan et al. 2009, Staats and Gino 2012) has demonstrated the importance of task variety for outcomes. Although variety is not prevalent in the day-to-day experience of a nurse in our context, such factors may play a role in other charitable contexts, and understanding their potential impacts could lead to additional relevant managerial implications.
2.10 Appendix

2.10.1 Supplemental Descriptive Statistics and Background

Table 2.5: Comparison between Voluntary and Group Donations
 Voluntary (n = 268,168) Group (n = 173,168) t stat
New Donor 0.667 (0.471) 0.739 (0.439) 51.1
Donor's # of Times Donating 1.774 (1.643) 1.440 (0.978) -76.2
Female 0.407 (0.491) 0.353 (0.478) -35.8
Weight 65.580 (11.367) 64.251 (10.649) -38.9
Married 0.434 (0.496) 0.281 (0.449) -87.25
Age 28.195 (9.155) 23.720 (7.909) -1700
Note. Mean (standard deviation) reported for each dimension (row) and donation type (column). All t statistics correspond to p < 0.0000. The slightly different number of records (i.e., voluntary n = 268,168 instead of 267,941) is due to the application of the inclusion criteria (i.e., locations with blood collection over 30 days) across both donation types.

Table 2.6: Donor Groups and Donation Outcomes
Donor Group N Mean SD Donate 200ml Donate 300ml Donate 400ml
New Donor 178593 353.6 59.8 5.4% 35.5% 59.1%
Existing Donor 89348 372.9 51.1 3.2% 20.7% 76.1%
Male 158910 370.4 52.5 3.3% 22.9% 73.7%
Female 109031 345.0 61.7 6.7% 41.6% 51.7%
High Weight 138316 379.6 46.6 2.7% 15.0% 82.3%
Low Weight 129599 339.3 61.2 6.8% 47.2% 46.1%
Total 267941 360.1 57.8 4.7% 30.5% 64.8%

2.10.1.1 Discussion on Nurse Location Assignment

Despite the institutional details that should rule out endogeneity issues with the rotations, we perform additional analyses on the pairing of nurses to specific locations, depending on whether a location's typical donation volume is below/above the median and whether the location tends to obtain more donations or not. We measure this by identifying, for each location in 2005-2010, whether its average daily voluntary donation volume choice/average daily number of voluntary donations is above the median observed across locations. In particular, we define HighVol_imt as a variable that equals 1 if location m, worked at during day t by nurse i, has an average donation volume (recorded from 2005-2010) that is higher than the average donation volume of the median location performing voluntary donations, and Busy_imt as a variable that equals 1 if location m, worked at during day t by nurse i, has a higher average number of donations per day than the median location performing voluntary donations in 2005-2010.
71 Table 2.7: Extended Summary Statistics Variable N Mean SD 1) Donation V olume 267941 360.1 57.8 2) log(V oluntary Donation Experience) 267941 7.7 1.2 3) log(Group Donation Experience) 267941 7.0 1.5 4) log(Total Experience) 267941 8.2 1.2 5) Percentage V oluntary Donations 267941 0.6 0.1 6) log(V oluntary Days Worked) 267941 5.8 1.1 7) log(Group Days Worked) 267941 4.1 1.2 8) log(Days Worked) 267941 6.0 1.1 9) Percentage V oluntary Days 267941 0.9 0.1 10) New Donor 267941 0.7 0.5 11) Female 267941 0.4 0.5 12) Age 267941 28.2 9.2 13) Weight 267915 65.6 11.4 14) Low Weight 267915 0.5 0.5 15) Sbp 267922 115.5 10.6 16) Pulse 267777 72.5 3.5 17) Married 234442 0.4 0.5 18) Education - 12 years 264091 0.3 0.5 19) Education - 16 years 264091 0.5 0.5 20) Education -18 years 264091 0.0 0.2 21) Education - 9 years 264091 0.2 0.4 22) Education - Other 264091 0.0 0.0 23) PastDonation = 0 267941 0.7 0.5 24) PastDonation = 1 267941 0.2 0.4 25) PastDonation = 2 267941 0.1 0.3 26) PastDonation = 3 267941 0.0 0.2 27) PastDonation = 4 267941 0.0 0.1 28) PastDonation = 5 267941 0.0 0.1 29) PastDonation = 6 267941 0.0 0.1 30) PastDonation = 7 267941 0.0 0.1 31) PastDonation = 8 267941 0.0 0.1 32) PastDonation = 9 267941 0.0 0.1 33) PastDonation = 10+ 267941 0.0 0.1 34) Nurse Donor Past Interactions = 0 267941 1.0 0.2 35) Nurse Donor Past Interactions = 1 267941 0.0 0.1 36) Nurse Donor Past Interactions = 2 267941 0.0 0.0 37) Nurse Donor Past Interactions = 3+ 267941 0.0 0.0 38) Blood Type A 267814 0.3 0.5 39) Blood Type AB 267814 0.1 0.3 40) Blood Type B 267814 0.3 0.4 41) Blood Type O 267814 0.3 0.5 42) 1 Nurse Active at Location-Day 267941 0.5 0.5 43) Workload 267941 1.6 1.8 44) No Incentive 267941 0.3 0.4 45) Team Incentive 267941 0.1 0.4 46) Individual Incentive 267941 0.6 0.5 47) TestPass 267814 1 0.2 72 We then build a panel at the nurse-day level that captures the nurse experience gained up to the end of day t-1, and the correspondingHighVol imt andBusy imt for nurse i and day t; 2 We run regressions with the dependent variable as eitherHighVol imt orBusy imt and independent variables of either log(V oluntary Donation Experience), log(Group Donation Experience), time fixed effects (day of week, month, year) and nurse fixed effects. We find no evidence that nurse experience plays a role in determining their paired location in Columns (1)-(4) of Table A4, with the coefficients being small in magnitude and not significant. Considering that planning is done at a yearly level for the locations that nurses rotate between in, we also build a nurse-year panel, wherebyHighVol it andBusy it are now the average values across nurse i ofHighVol imt andBusy imt from the nurse-day level panel, and Log(V oluntary Donation Experience) or Log(Group Donation Experience) are the accumulated experience by the end of year t-1. We continue to find no evidence for nurse experience to play a role in determining location allocation in Column (5)-(8) of Table 2.8. All of these results are robust to not including nurse fixed effects. 
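To make the construction of these assignment checks concrete, the sketch below shows one way to build the nurse-day panel with the HighVol and Busy indicators and to run the corresponding regression. It is an illustrative Python sketch, not the dissertation's code; the file and column names (donations.csv, nurse_id, location_id, date, volume_ml, type) are hypothetical, the experience count is simplified, day-of-week effects are omitted, and standard errors are clustered by nurse only rather than two-way by nurse and location.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

don = pd.read_csv("donations.csv", parse_dates=["date"])  # hypothetical donation-level data
don["day"] = don["date"].dt.normalize()

# Location benchmarks from voluntary donations in 2005-2010
hist = don[(don["date"].dt.year <= 2010) & (don["type"] == "voluntary")]
daily = (hist.groupby(["location_id", "day"])
             .agg(day_vol=("volume_ml", "mean"), day_n=("volume_ml", "size"))
             .reset_index())
bench = (daily.groupby("location_id")
              .agg(avg_vol=("day_vol", "mean"), avg_n=("day_n", "mean"))
              .reset_index())

# Nurse-day panel: last location worked that day, plus experience through day t-1
panel = (don.sort_values("day")
            .groupby(["nurse_id", "day"])
            .agg(location_id=("location_id", "last"), n_don=("volume_ml", "size"))
            .reset_index())
panel["exp_through_yesterday"] = panel.groupby("nurse_id")["n_don"].cumsum() - panel["n_don"]
panel["log_exp"] = np.log1p(panel["exp_through_yesterday"])

panel = panel.merge(bench, on="location_id", how="left").dropna(subset=["avg_vol", "avg_n"])
panel["HighVol"] = (panel["avg_vol"] > bench["avg_vol"].median()).astype(int)
panel["Busy"] = (panel["avg_n"] > bench["avg_n"].median()).astype(int)
panel["month"], panel["year"] = panel["day"].dt.month, panel["day"].dt.year

# Does accumulated experience predict assignment to a high-volume location?
res = smf.ols("HighVol ~ log_exp + C(nurse_id) + C(month) + C(year)", data=panel).fit(
    cov_type="cluster", cov_kwds={"groups": panel["nurse_id"]})
print(res.params["log_exp"], res.pvalues["log_exp"])

A small and insignificant coefficient on log_exp, as in Table 2.8, is consistent with nurse experience playing no role in location assignment.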
Table 2.8: Relationship between Nurse Experience and Location Type Assignment Model Level Nurse Day Nurse Year Dependent Variables HighV olume Busy HighV olume Busy (1) (2) (3) (4) (5) (6) (7) (8) Log(V oluntary Donation Experience) 0.017 0.019 0.003 0.004 (0.020) (0.020) (0.009) (0.009) Log(Group Donation Experience) -0.007 -0.005 -0.002 -0.001 (0.012) (0.011) (0.009) (0.009) Constant 0.660*** 0.897*** 0.624*** 0.871*** 0.948*** 0.953*** 0.938*** 0.945*** (0.227) (0.157) (0.227) (0.157) (0.095) (0.096) (0.094) (0.095) Observations 32,083 32,083 32,083 32,083 360 360 360 360 R 2 0.148 0.147 0.155 0.154 0.356 0.356 0.363 0.362 Nurse Fixed Effects Yes Yes Yes Yes Yes Yes Yes Yes Time Fixed Effects Yes Yes Yes Yes Yes Yes Yes Yes Mean of Dependent Variable 0.844 0.844 0.848 0.848 0.815 0.815 0.818 0.818 Note. Robust standard errors in parentheses, clustered by nurse and location. Significance reported as *p< 0:1, **p< 0:05, ***p< 0:01. Additionally, a concern could be that the total number of locations worked at could play a significant role, which could complicate the tests on Hypothesis 1 and 2. We run the specifications which we use for Hypothesis 1 and 2 and add an additional variable – the number of previous locations worked by the nurse prior to the donation to the model. We observe the corresponding results in Table 2.9. We see in the models with voluntary donation experience/ percentage of voluntary donations and controls for donor characteristics (Columns 2 and 6) the previous locations worked variable has a positive but not significant 2 In the rare cases where nurses work more than one location in a day, the last location that the nurse works at during the day is used. 73 effect. The median number of locations worked is 18 per nurse, so the effect is small. More importantly, our main findings remain to be robust. Table 2.9: Locations Worked Dependent Variable: Donation V olume V ARIABLES (1) (2) (3) (4) (5) (6) Log(V oluntary Donation Experience) 1.342*** 1.403*** (0.296) (0.371) Log(Group Donation Experience) 0.403* 0.492* (0.223) (0.267) Log(Total Experience) 1.456*** 1.500*** (0.276) (0.357) Past Percentage V oluntary Donations 13.586*** 12.931*** (2.891) (3.106) Constant 346.064*** 227.970*** 356.285*** 238.221*** 334.297*** 216.844*** (6.616) (12.197) (6.001) (11.423) (7.212) (12.934) Observations 267,941 231,610 267,941 231,610 267,941 231,610 R 2 0.047 0.225 0.047 0.225 0.048 0.226 Donor Controls No Yes No Yes No Yes Location Fixed Effects Yes Yes Yes Yes Yes Yes Nurse Fixed Effects Yes Yes Yes Yes Yes Yes Time Fixed Effects Yes Yes Yes Yes Yes Yes Note. Robust standard errors in parentheses, clustered by nurse and location. Significance reported as *p< 0:1, **p< 0:05, *** p< 0:01. 2.10.2 Information About Donor Satisfaction Surveys The donor satisfaction survey is conducted by the blood bank on a half year basis, with surveys collected on a random subsample of donors (60-200) in an anonymous form for aggregated results. 
The survey contains several sections, including: Background information about the donor Whether donating blood is harmful to the body (original language before translation: ) How many times has one donated previously (1, 2, 3, 4+) What is the best time to donate blood (9-11, 11-14, 14-17, 17-20) How to increase your enthusiasm for blood donation (increased publicity, blood donation points, elimination of unit subsidies, other) Where the blood center can increase public relation efforts (TV , newspaper, broadcast, bus, WeChat, Weibo) Quality of Service 74 Blood donor staff reception/consultation experience (工作人员献血宣传及对献血者接待咨询) Blood center’s work on safety and health for blood donors (目前血液中心在献血者安全及健康方 面的工作) Medical laboratory examiner’s service attitude and technical level (体检化验医生服务态度技术水 平) Blood collection staff’s service attitude and technical level (采血人员服务态度技术水平) Blood Donation Card Issuer’s Service Attitude (发放献血证人员服务态度) Environment of blood donation/rest (献血及休息环境) The Quality of Service rating is the key focus in the survey. The donors would answer such questions with their satisfaction rate (0-100%), and an average score is computed based off the responses to the ques- tions. The blood bank sets very high goals for the satisfaction ranking. Hence, nurses are motivated to follow through with good service quality, else they risk not achieving the service goals. We note here that the quality of service questions may seem like different staff members are handling a donor’s donations. However, in our blood bank one nurse handles all of these roles. The data supports the standards set by the blood bank, with almost no variation (99%+ satisfaction) in the quality of service ratings. This provides additional support for no improper blood collection occurring. 75 2.10.3 Regression Tables for Robustness Checks Table 2.10: Alternative Measures for V oluntary Donation Experience (Days Worked) Dependent Variable: Donation V olume V ARIABLES (1) (2) (3) (4) (5) (6) log(V oluntary Days Worked) 1.393*** 1.435*** (0.341) (0.405) log(Group Days Worked) 0.987*** 1.065** (0.346) (0.434) log(Total Days Worked) 1.121*** 1.158*** (0.323) (0.399) Past Percentage V oluntary Donation Days 12.355** 11.956** (5.001) (5.528) Constant 348.070*** 230.137*** 353.853*** 235.948*** 338.946*** 221.489*** (5.868) (11.808) (5.944) (11.642) (6.879) (12.535) Observations 267,941 231,610 267,941 231,610 267,941 231,610 R 2 0.047 0.225 0.047 0.225 0.047 0.225 Donor Controls No Yes No Yes No Yes Location Fixed Effects Yes Yes Yes Yes Yes Yes Nurse Fixed Effects Yes Yes Yes Yes Yes Yes Time Fixed Effects Yes Yes Yes Yes Yes Yes Note. V oluntary (Group) Days Worked is the number of days worked by the nurse prior to the donation in which at least 1 donation was performed in the voluntary (group) setting, while Total Days Worked tracks the total number of days worked by the nurse (as defined as having collected a blood donation during a day) prior to the blood donation. Past % V oluntary Donation Days is the ratio between V oluntary Days Worked and Total Days Worked. Robust standard errors in parentheses, clustered by nurse and location. Significance reported as *p< 0:1, **p< 0:05, ***p< 0:01. 
76 Table 2.11: Effect of Different Experiences using the Sample of Group Donations Dependent Variable: Donation V olume V ARIABLES (1) (2) (3) (4) (5) (6) Log(V oluntary Donation Experience) 0.116 -0.482 (0.606) (0.536) Log(Group Donation Experience) 0.079 -0.965 (0.684) (0.879) Log(Total Experience) 0.415 -0.663 (0.623) (0.832) Past Percentage V oluntary Donations -2.405 2.333 (5.644) (9.439) Constant 167.285*** 195.666*** 167.666*** 191.824*** 169.356*** 193.439*** (4.465) (6.849) (3.785) (8.090) (4.910) (11.533) Observations 173,168 85,855 173,168 85,855 173,168 85,855 R 2 0.143 0.279 0.143 0.279 0.143 0.279 Donor Controls No Yes No Yes No Yes Location Fixed Effects Yes Yes Yes Yes Yes Yes Nurse Fixed Effects Yes Yes Yes Yes Yes Yes Time Fixed Effects Yes Yes Yes Yes Yes Yes Mean of Dependent Variable 259 261.6 259 261.6 259 261.6 Note. Robust standard errors in parentheses, clustered by nurse and location. Significance reported as *p< 0:1, **p< 0:05, *** p< 0:01. Table 2.12: Mechanism Analysis: Peer Effects Donation V olume Full Sample Working Alone Working with Peers V ARIABLES (1) (2) (3) Log(V oluntary Donation Experience) 1.492*** 1.333** 1.377*** (0.320) (0.592) (0.460) 1 Active Nurse for Location-Day 2.327 (3.336) Log(V oluntary Donation Experience) x 1 Active Nurse for Location-Day -0.211 (0.421) Constant 227.134*** 234.496*** 225.620*** (12.087) (18.949) (16.475) Observations 231,610 109,133 122,477 R 2 0.225 0.222 0.232 Donor Controls Yes Yes Yes Location Fixed Effects Yes Yes Yes Nurse Fixed Effects Yes Yes Yes Time Fixed Effects Yes Yes Yes Note. Robust standard errors in parentheses, clustered by nurse and location. Significance reported as *p< 0:1, **p< 0:05, *** p< 0:01. 77 Table 2.13: Certain Nurses Driving Result Dependent Variable: Donation V olume Check: October 2005 Nurse January 2006 Nurse 2017 Worker No Report Experiment V ARIABLES (1) (2) (3) (4) (5) (6) (7) (8) Log(V oluntary Donation Experience) 1.014*** 0.889** 1.205** 1.380*** (0.353) (0.333) (0.445) (0.368) Log(Group Donation Experience) 0.368 0.247 0.333 0.478* (0.309) (0.297) (0.316) (0.268) Constant 240.905*** 245.470*** 237.938*** 242.321*** 236.245*** 241.905*** 229.192*** 239.316*** (11.274) (11.255) (11.904) (11.940) (13.070) (12.637) (12.243) (11.570) Observations 168,547 168,547 156,645 156,645 193,307 193,307 225,159 225,159 R 2 0.224 0.224 0.224 0.224 0.226 0.226 0.226 0.225 Donor Controls Yes Yes Yes Yes Yes Yes Yes Yes Location Fixed Effects Yes Yes Yes Yes Yes Yes Yes Yes Nurse Fixed Effects Yes Yes Yes Yes Yes Yes Yes Yes Time Fixed Effects Yes Yes Yes Yes Yes Yes Yes Yes Mean of Dependent Variable 361 361 361.1 361.1 360.9 360.9 360.8 360.8 Note. Robust standard errors in parentheses, clustered by nurse and location. Significance reported as *p< 0:1, **p< 0:05, *** p< 0:01. 
Table 2.14: Multinomial Logit Model Specifications for Hypotheses 1-2 (1) (2) (3) (4) (5) (6) (7) (8) V ARIABLES / 200 is base category 300 400 300 400 300 400 300 400 Log(V oluntary Donation Experience) 1.025 1.069*** 1.024 1.084** (0.0211) (0.0253) (0.0245) (0.0341) Log(Total Experience) 1.026 1.074*** 1.026 1.091*** (0.0206) (0.0249) (0.0237) (0.0342) Past Percentage V oluntary Donations 1.400 2.082*** 1.388 2.303*** (0.409) (0.574) (0.428) (0.696) Constant 1.951* 2.534** 3.018** 0.00131*** 1.382 1.191 2.153 0.000553*** (0.668) (1.099) (1.642) (0.00103) (0.543) (0.541) (1.231) (0.000453) Observations 267,941 267,941 231,610 231,610 267,941 267,941 231,610 231,610 Donor Controls No No Yes Yes No No Yes Yes Location Fixed Effects Yes Yes Yes Yes Yes Yes Yes Yes Nurse Fixed Effects Yes Yes Yes Yes Yes Yes Yes Yes Time Fixed Effects Yes Yes Yes Yes Yes Yes Yes Yes Note. Coefficients reported as odds ratios. Robust standard errors in parentheses, clustered by nurse and location. Significance reported as *p< 0:1, **p< 0:05, ***p< 0:01. Table 2.15: The Effect of Incentives Dependent Variable: Donation V olume Full Sample A Subsample Prior to March 2013 V ARIABLES (1) (2) (3) (4) (5) (6) (7) (8) Log(V oluntary Donation Experience) 1.337*** 1.394*** 1.459* 1.337* (0.309) (0.378) (0.784) (0.743) Log(Group Donation Experience) 0.401* 0.489* -0.135 0.088 (0.231) (0.271) (0.522) (0.403) Team Incentive 5.437 5.315 5.495 5.371 6.351 5.875 6.356 5.886 (4.558) (5.066) (4.568) (5.066) (4.855) (5.331) (4.942) (5.440) Individual Incentive 11.545* 14.738** 11.747* 14.973** (6.152) (5.839) (6.089) (5.764) Constant 334.557*** 213.280*** 344.539*** 223.232*** 328.690*** 238.849*** 343.522*** 251.214*** (8.687) (12.990) (8.386) (12.645) (11.299) (11.387) (7.502) (8.020) Observations 267,941 231,610 267,941 231,610 111,146 106,905 111,146 106,905 R 2 0.047 0.225 0.047 0.225 0.075 0.243 0.075 0.243 Donor Controls No Yes No Yes No Yes No Yes Location Fixed Effects Yes Yes Yes Yes Yes Yes Yes Yes Nurse Fixed Effects Yes Yes Yes Yes Yes Yes Yes Yes Time Fixed Effects Yes Yes Yes Yes Yes Yes Yes Yes Note. Robust standard errors in parentheses, clustered by nurse and location. Significance reported as *p< 0:1, **p< 0:05, *** p< 0:01. 78 Table 2.16: Analysis on Pass Rate Dependent Variable: TestPass V ARIABLES (1) (2) (3) Donation V olume 0.00002 0.00002 0.00002 (0.00001) (0.00001) (0.00001) Log(V oluntary Donation Experience) -0.00130 (0.00083) Log(Group Donation Experience) -0.00088 (0.00059) Log(Total Experience) -0.00118 (0.00084) Past Percentage V oluntary Donations 0.00606 (0.00604) Team Incentive -0.01323 -0.01325 -0.01317 (0.01523) (0.01525) (0.01523) Individual Incentive -0.01298 -0.01312 -0.01295 (0.01552) (0.01559) (0.01557) Constant 1.15858*** 1.15311*** 1.15305*** (0.03855) (0.03636) (0.04015) Observations 231,610 231,610 231,610 R 2 0.02930 0.02930 0.02931 Donor Controls Yes Yes Yes Location Fixed Effects Yes Yes Yes Nurse Fixed Effects Yes Yes Yes Time Fixed Effects Yes Yes Yes Note. Robust standard errors in parentheses, clustered by nurse and location. Signif- icance reported as *p< 0:1, **p< 0:05, ***p< 0:01. 
Table 2.17: Workload Analysis
Dependent Variable: Donation Volume
VARIABLES (1) (2) (3)
Log(Voluntary Donation Experience) 1.453***
 (0.358)
Log(Group Donation Experience) 0.523*
 (0.263)
Log(Total Experience) 1.550***
 (0.346)
Past Percentage Voluntary Donations 12.865***
 (3.107)
Workload -0.671*** -0.663*** -0.667***
 (0.218) (0.218) (0.220)
Constant 228.925*** 239.402*** 217.821***
 (12.360) (11.574) (13.101)
Observations 231,610 231,610 231,610
R^2 0.226 0.226 0.226
Donor Controls Yes Yes Yes
Location Fixed Effects Yes Yes Yes
Nurse Fixed Effects Yes Yes Yes
Time Fixed Effects Yes Yes Yes
Note. Robust standard errors in parentheses, clustered by nurse and location. Significance reported as * p < 0.1, ** p < 0.05, *** p < 0.01.

2.10.4 Managerial Implications: Counterfactual Analyses and Intervention Comparisons

2.10.4.1 Simulation Methods

Having understood how nurse experience can be salient for donation outcomes and how its effects can be moderated by donors' level of control, we perform a series of counterfactual estimates to understand the potential benefits of using these insights to better match nurses and donors for improved outcomes. To illustrate the potential benefits, we utilize donation records from the first half of 2011. We perform two types of counterfactual estimates. The first pairs nurses and donors individually without considering resource constraints, to establish an upper bound on the value of improved matching. Such a system could be implemented if organizations have an appointment system that can dynamically incorporate information to improve staffing decisions. The second considers the matching of existing nurses to locations each day, which parallels the blood bank's practice of scheduling nurses.

We make the following assumptions regarding donors as part of the simulation. First, we assume that the donor would have donated regardless of the nurse staffed at the location on that day, i.e., the nurse impacts the donor not on the extensive margin (recruiting them to come to the blood center/mobile unit) but on the intensive margin (their choice of donation volume and their experience at the blood bank). We also simplify the notion of a different nurse by considering nurses as having one of five degrees of voluntary donation experience (hereafter the "nurse-counterfactual"), defined by quintiles of the data.

Following this, we build a multinomial logit model to predict the probability that nurse i and donor j result in donation volume d ∈ D = {200 ml, 300 ml, 400 ml}, using covariates C_ij, which include the factor-variable representation of voluntary donation experience interacted with the donor characteristics identified in Hypotheses 3a-3c (new donor, female, and low weight), several donor characteristics (age, education, marital status, past donation history, and blood type), and time effects (day of week and month):

p_{ijd} = exp(β_d' C_{ij}) / Σ_{d' ∈ D} exp(β_{d'}' C_{ij}),   ∀ d ∈ D.

We run two simulation types. The first aims to answer: if we could pair nurses and donors individually without considering resource constraints, what would be the effect? This provides a potential upper bound to the value of personalized matching; it considers what organizations could do if they have an appointment system and utilize the information to improve their staffing decisions.
The second hews closer to the blood bank's practice of scheduling nurses at locations: given our existing nurses, how should we staff locations for a day? For the latter, we use the predicted outcomes generated from our multinomial logit model and plug them into a constrained optimization framework.

2.10.4.2 Simulation with Personalized Nurse-Donor Pairings

To determine the optimal nurse-counterfactual for each nurse-donor donation interaction, we first estimate the model on data from the first half of 2011 using an alternate experience variable: voluntary donation experience by quintile (0-20th, 20-40th, 40th-60th, 60th-80th, or 80th-100th percentile). Then, we expand the dataset to create our counterfactual comparisons: we vary the degree of nurse experience (by the same quintiles) that each donor could receive. To evaluate the effect of these alternative matchings, we apply the estimated model to calculate the predicted donation volume for each counterfactual index, where the prediction for a given counterfactual c is calculated as p_{cj,200} · 200 + p_{cj,300} · 300 + p_{cj,400} · 400. We identify the nurse-counterfactual that provides the highest predicted donation volume and denote it the best nurse. If the chosen nurse-counterfactual is identical to the original nurse's experience category, we note that a change in nurse is not necessarily helpful.

Before showing the results, it is helpful to review what happened in the first half of 2011. As a reference point, out of the 15,704 voluntary donors in that period, 807 donated 200 ml, 4,032 donated 300 ml, and 10,865 donated 400 ml. This led to a total donation volume of 5,717,000 ml. Had everyone donated 400 ml, the total donation volume would have been 6,281,600 ml; in other words, the upper bound on the potential gain is 564,600 ml. Under this simulation framework, we observe a respectable lift in total donation volume of 293,490 ml. This is equivalent to approximately 1,467 additional units of blood (a unit being 200 ml); in other words, 52% of the potential gain (defined as the additional blood collected had everyone donated 400 ml) would be achieved. This can also be viewed as a 5.1% increase over the existing blood collected. Switching the nurse counterfactual was associated with an average gain in donation volume of 74 ml, with a standard deviation of 39 ml, for 25.4% of donors (at most 30.8% of donors could have been impacted). Extrapolating the result to a year, the blood bank could gain approximately 62% of a month's worth of additional donations.

2.10.4.3 Constrained Optimization

§2.10.4.2 discusses the simulation in which we can allocate nurses to donors individually, without consideration of resource constraints; that is, we might allocate nurses in the highest experience category more often than our data show such nurses to be available. Hence, one would also want to incorporate resource constraints into the simulation framework. We formulate a constrained optimization problem that accounts for these resource constraints and pairs nurses to locations at the location-day level, as is mostly done in current practice.
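The inputs to the optimization below are the predicted volumes from the multinomial logit of §2.10.4.1, aggregated to the counterfactual-location-day level. The sketch that follows is an illustrative Python version of that step, not the dissertation's actual code: the file and column names (donations_2011h1.csv, volume_choice, exp_quintile, new_donor, female, low_weight, age, location_id, day) are hypothetical, and the covariate list is abbreviated.

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical donation-level data from the first half of 2011;
# volume_choice takes values 200/300/400 and exp_quintile takes values 1-5
df = pd.read_csv("donations_2011h1.csv", parse_dates=["day"])
df["month"] = df["day"].dt.month

# Multinomial logit of volume choice on the experience quintile and abbreviated controls
mnl = smf.mnlogit(
    "volume_choice ~ C(exp_quintile) * (new_donor + female + low_weight) + age + C(month)",
    data=df,
).fit()

# Expand each donation to all five counterfactual experience quintiles and predict volume
volumes = [200, 300, 400]
frames = []
for q in range(1, 6):
    cf = df.copy()
    cf["exp_quintile"] = q
    probs = mnl.predict(cf)             # one probability column per category (200, 300, 400)
    cf["pred_volume"] = probs.values @ volumes
    cf["counterfactual"] = q
    frames.append(cf)
cf_all = pd.concat(frames, ignore_index=True)

# u_{ijt}: predicted blood collected if counterfactual type i staffed location j on day t
u = (cf_all.groupby(["counterfactual", "location_id", "day"])["pred_volume"]
           .sum().rename("u_ijt").reset_index())

These u_{ijt} values, together with the availability counts ncf_{it} observed in the data, are what the assignment problem below takes as given.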
Our decision variable is which nurse-counterfactual type should be assigned to a location for the day; i.e., a_{ijt} equals 1 if we assign nurse-counterfactual type i to location j on day t. Utilizing the estimates in §C.1, we aggregate the results to the nurse-counterfactual-location-day level to determine how much blood would be collected had the nurse-counterfactual performed all the donations at that location-day. We then formulate the following optimization problem:

maximize   Σ_{t ∈ T} Σ_{i ∈ I_t} Σ_{j ∈ J_t} u_{ijt} a_{ijt}
subject to Σ_{j ∈ J_t} a_{ijt} ≤ ncf_{it}   ∀ i ∈ I_t, t ∈ T
           Σ_{i ∈ I_t} a_{ijt} ≤ 1   ∀ j ∈ J_t, t ∈ T
           a_{ijt} ∈ {0, 1}   ∀ i ∈ I_t, j ∈ J_t, t ∈ T

In this problem, we maximize the total predicted blood collected; in other words, we maximize the triple sum of a_{ijt} u_{ijt}, where u_{ijt} represents the predicted blood collected by nurse-counterfactual type i at location j on day t if that nurse-counterfactual type is indeed assigned to the location on that day. We constrain the number of a particular nurse-counterfactual type assigned during a day to be at most what appears in our dataset (ncf_{it}), and, to make a more general assumption given that the median staffing is one nurse at most locations, we allow at most one nurse-counterfactual to be assigned to a location.

Solving this optimization problem, we see that the generated donation volume equals 5,762,440 ml, a 5,762,440 - 5,717,000 = 45,440 ml increase in donated blood. This is an additional gain of 45,440/5,717,000 = 0.80% in overall blood donated, or 8.05% of the potential gain achieved, equivalent to an additional 227 units of blood. Extrapolating this result to a year, this would bring in approximately 9.54% of an average month's voluntary donation volume.

Suppose instead that nurses focused completely on voluntary donations. Under the same optimization problem, but with ncf_{it} updated by mapping each nurse's total number of donations to the corresponding total voluntary donation experience category, we re-solve the optimization problem and see that the generated donation volume equals 5,769,610 ml, a 5,769,610 - 5,717,000 = 52,610 ml increase in donated blood. This is an additional gain of 52,610/5,717,000 = 0.92% in overall blood donated, or 9.32% of the potential gain achieved, equivalent to an additional 263 units of blood. Extrapolating this result to a year, this would bring in approximately 11% of an average month's voluntary donation volume.
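As an illustration of how this assignment problem can be solved, the sketch below formulates the single-day version with scipy's linear programming solver; because the constraints do not couple days, the full problem decomposes into one such assignment per day. This is an illustrative sketch rather than the dissertation's implementation, with u (predicted volume of counterfactual type i at location j) and ncf (number of type-i nurses available that day) assumed given; since the constraint matrix of this transportation-style problem is totally unimodular, the linear relaxation of the binary variables yields an integral optimal assignment.

import numpy as np
from scipy.optimize import linprog

def assign_day(u, ncf):
    """Maximize predicted blood collected for one day.
    u   : (n_types, n_locations) array of predicted volumes u[i, j]
    ncf : length-n_types array of available nurse-counterfactual counts
    Returns a 0/1 assignment matrix a[i, j]."""
    n_types, n_locs = u.shape
    c = -u.ravel()  # linprog minimizes, so negate the objective

    # Capacity constraints: sum_j a[i, j] <= ncf[i]
    A_cap = np.zeros((n_types, n_types * n_locs))
    for i in range(n_types):
        A_cap[i, i * n_locs:(i + 1) * n_locs] = 1.0
    # Location constraints: sum_i a[i, j] <= 1
    A_loc = np.zeros((n_locs, n_types * n_locs))
    for j in range(n_locs):
        A_loc[j, j::n_locs] = 1.0

    res = linprog(c, A_ub=np.vstack([A_cap, A_loc]),
                  b_ub=np.concatenate([ncf, np.ones(n_locs)]),
                  bounds=(0, 1), method="highs-ds")
    return np.round(res.x).reshape(n_types, n_locs)

# Toy example: 5 experience quintiles, 4 locations open that day
rng = np.random.default_rng(1)
u = 300 + 5 * np.arange(1, 6)[:, None] + rng.normal(0, 10, size=(5, 4))
ncf = np.array([1, 1, 0, 1, 2])
a = assign_day(u, ncf)
print(a, (u * a).sum())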
Table 2.18: Summary of Interventions That Could be Done at the Blood Bank and Our Nurse Experience Staffing Framework
Intervention | Scope/Target | Policy Type | Economic Cost | Short Term Effect | Long Term Effect | Cost Effectiveness/Impact
Shortage Message (SM) | Existing Donors | Message | Message administration | 1.8% increase in donation rate; 5.926 ml in donation volume | No significant effect on donating after 6 months of receiving the message | Helpful for resolving short-run issues with blood donation
Family Replacement (FR) | Patient's network | Targeting | Administration | -3.6% decrease in donation rate for no-history FR donors, 2.9% for existing FR donors; -11.53 ml per half year per no-history donor, 10.15 ml more per half year per existing FR donor | Motivation loss for donors who donated less in the past; increase in voluntary donation for donors who donate more | Increases donations from existing donors in the long run, but may cause a small motivation loss for new donors
Reminder Message (Behavioral) | Existing Donors | Message | Message administration | 0.274% increase in donation rate; 1.02 ml increase | Helpful reminder for existing donors to come to donate | An effective approach to motivate donations
Economic Reward Message - Individual | Existing Donors | Message | 30-50 RMB for 200-400 ml | 0.379% increase in donation rate; 1.58 ml increase | More costly than the reminder message | Not cost effective, as the majority of these donors would have donated anyway
Economic Reward Message - Group | Existing Donors | Message | 30-50 RMB for 200-400 ml and a group gift | 0.478% increase in donation rate; 2.47 ml increase | May introduce more donors to the blood bank and have a spillover effect for increasing blood donation | May increase the group donation amount but weaken individual donation frequency if economic rewards are not applied; 4x more cost effective than rewarding individual donors
Nurse Experience | All donors | Targeting | Administration | 1-5% increase in total blood collected; 9-52% of the maximum possible donation amount achieved with the same number of donors | Tailored experiences can generate higher satisfaction, leading to a larger donor base | Personalized matching can help improve short-term outcomes by effectively informing donors who self-select to donate about their donation choice
Note. The group donation in Sun et al. (2019), under Economic Reward Message - Individual, is about motivating donors to donate in groups in the voluntary donation setting, as opposed to participating in group drives (group donation).

Chapter 3

Patient Perceptions of Synchronous Telemedicine Video Visits in a Fee-for-Service Model

Joint work with Sirisha Mohan, Francis Reyes Orozco, Jehni Robinson, and Anjali Mahoney

3.1 Introduction

Synchronous or live telemedicine video visits have been implemented in various healthcare settings (Barnett et al. 2018) and have rapidly scaled up (Artandi et al. 2020, Mehrotra et al. 2020, Peden et al. 2020, Wosik et al. 2020) due to COVID-19. Prior to the pandemic, strict reimbursement policies resulted in limited use of these services. Since the current public health crisis, legislation under the CARES Act (2020) has enabled more payors to reimburse for telemedicine services. Our study examines patient perceptions of telemedicine visits, with specific regard to experience and copayment, in the primary care setting of an academic medical center located in Los Angeles, California.

While previous research reveals that video visits garner positive impressions from patients and providers in a variety of specialties (Donelan et al. 2019, Izquierdo et al.
2003, O Parsonson et al. 2021, Odeh et al. 2015, Powell et al. 2018, Thelen-Perry et al. 2018), the literature on patient perceptions of telemedicine in primary care (Donaghy et al. 2019, Mueller et al. 2020, Polinksi et al. 2016, Powell et al. 2017), as well as provider perceptions of telemedicine in primary care (Gomez et al. 2021, Samples et al. 2021) is scarce (Thiyagarajan et al. 2020). Several studies support these positive experiences but are limited to instances when telemedicine is provided at no cost and/or to established patients only. A study performed at the Massachusetts General Hospital (Donelan et al. 2019), for example, appraises the value between telemedicine and office-based visits when telemedicine is provided free of charge. Another study considers 85 the video visit experience of patients from the National Health Service of Scotland (Donaghy et al. 2019), where beneficiaries are offered “free care, based on need and funded by general taxation.” Such studies may not be applicable in the fee-for-service model, where copayment or other costs may be required. With the pandemic, our study investigates the perceptions of new and established patients who have been offered synchronous telemedicine services, regardless of insurance carrier. Some studies (Baum et al. 2016, Kiil and Houlberg 2014, Scott et al. 2009) evaluate the financial view of clinical care in the emergency room and urgent care settings; to the best of our knowledge, we believe we are one of the first studies to focus on both the experiential and financial views of telemedicine primary care services when patients may need to provide a copayment. 3.2 Methods 3.2.1 Study Design and Setting Our study describes completed survey results from patients who completed a telemedicine video visit be- tween April to December 2020. 3.2.2 USC Telemedicine Program The Department of Family Medicine launched telemedicine services in October 2019 with one provider in one clinic location. Given physical distancing recommendations and health policy waivers prompted by the COVID-19 pandemic, these services expanded to over 20 providers across four clinical sites. A workflow was designed to replicate the face-to-face office visit experience, which allowed clerical staff to collect copayments efficiently. Once a telemedicine visit was scheduled, a personalized link was sent to the patient’s email. At the time of the visit, the patient used the link to log into the telemedicine platform which then initiated the encounter. The patient received and signed an electronic consent form which enabled the delivery of healthcare services. Like face-to-face office visits, the patient was greeted by clerical staff and a copayment was collected as pertinent to their health insurance. Next, the patient connected to the medical assistant and eventually to the provider who completed the virtual visit. Based on the level of complexity or time spent with the patient, an evaluation and management charge was submitted and billed to insurance. 86 3.2.3 Telemedicine Video Visit Survey Study participants were identified through the electronic medical record. A personalized link generated through Qualtrics was sent to the participant’s email address with an invitation to participate in a 53-item electronic survey following their telemedicine visit. Participants who did not respond to the initial email notification were sent email reminders via Qualtrics on days 7, 14, 21 and 28. 
The cross-sectional digital survey queried participants on demographics like age, sex, race/ethnicity, marital status, education, employment status, and income. Patient experience and satisfaction questions were developed by the Family Medicine faculty, along with insights from review of the literature. In particular, the National Quality Forum’s report for quality measures in telehealth guided development of questions (201 2017) related to quality of care received, satisfaction, technology, communication and time with the provider, transportation, and copayment. An earlier version of the survey was piloted during the department’s pilot of telemedicine services, from October 2019 to March 2020. The study was reviewed and approved by the USC Institutional Review Board as an expedited study (IRB ID: HS-19-00678). 3.2.4 Analyses 3.2.4.1 Survey Sampling 903 participants completed the survey. Respondents who declined to participate or provided feedback that were not consistent with that of Family Medicine telemedicine practices (e.g., a phone call took place, or the provider was not from Family Medicine) were excluded from the analyzed sample. 3.2.4.2 Data Coding Respondents who entered values corresponding to an unreasonable age, e.g., 775, had their age cleared. Free text responses with “other” were reviewed and re-coded into default responses when possible. Free text entries noting a copayment to be unreasonable were coded under assigned themes. 87 3.2.4.3 Distance from Clinic In determining differences between typical and actual distance from the patient and the clinic, each patient’s zip code was cross referenced with their provider’s clinic zip code using the NBER Zip Code Distance Database (Roth). 3.2.4.4 Statistical Analysis The data is described as absolute numbers and percent frequency of occurrence. We perform a statistical analysis to compare the actual and typical distance from the clinic using paired t tests and subgroup analyses between reported demographics using 2 sample t tests. Stata 15 and the usespss package was used to process the Qualtrics data. 3.3 Results 3,414 potential respondents received the survey, of which 3,377 were successfully contacted. 903 respon- dents responded to the study, corresponding to a 26.7% effective response rate. Of these, 73 declined to participate, and 33 provided feedback inconsistent with Family Medicine telemedicine practices. 797 sur- veys were analyzed and represent patient experiences from 25 Family Medicine physicians and advanced practice providers. 3.3.1 Primary Analyses 3.3.1.1 Demographics and Baseline Medical Appointment Visits Baseline demographics and medical appointment information of study participants are reported in Table 3.1 and 3.2. Survey respondents were of similar backgrounds to commercial telehealth users (Barnett et al. 2018). 54% (427/797) of participants reported being first-time telehealth users. Table 3.1: Participant Baseline Medical Appointment Experiences and Characteristics Question and Response n (percent) Howdoyouusuallygettoyourmedicalappointment? I drive in my personal vehicle 666 (84%) A friend or family member drives me in their personal vehicle 134 (17%) Bus, train, or some form of public transportation 34 (4%) 88 Rideshare (e.g. Uber, Lyft, etc.) 36 (5%) Senior ride program (Dial-a-ride, etc.) 4 (1%) Taxi 3 (0%) Bicycle 7 (1%) Walk 23 (3%) Scooter 0 (0%) Approximatelyhowlongdoesittakeyoutogettoyourmedicalappointments? 
Less than 10 minutes 62 (8%) 10 to 15 minutes 172 (22%) 16 to 30 minutes 308 (39%) 31 to 45 minutes 147 (18%) 46 to 60 minutes 67 (8%) Over 1 hour 41 (5%) Approximately how many miles do you usually travel to get to your medical appoint- ments? 1 to 3 miles 93 (12%) 4 to 6 miles 207 (26%) 7 to 10 miles 199 (25%) 11 to 15 miles 126 (16%) Over 15 miles 172 (22%) Whatisyourstatedgender? Male 199 (25%) Female 591 (74%) Other: 7 (1%) Whatisyourrace? Chooseallthatapply. White 403 (51%) Black or African American 44 (6%) American Indian or Alaska Native 6 (1%) Asian Indian 14 (2%) Chinese 43 (5%) Filipino 18 (2%) Japanese 20 (3%) Korean 15 (2%) Vietnamese 2 (0%) Other Asian 15 (2%) Native Hawaiian 0 (0%) Guamanian or Chamorro 1 (0%) Samoan 0 (0%) Other Pacific Islander 4 (1%) Mexican, Mexican American, Chicano/a 197 (25%) Puerto Rican 3 (0%) Cuban 4 (1%) Another Hispanic, Latino/a or Spanish O 64 (8%) Whatisyourinsurancetype? EPO 200 (25%) PPO 410 (51%) HMO 11 (1%) Medicare 122 (15%) 89 Medicaid/Medi-Cal 39 (5%) Other 15 (2%) What is your current employment status? Employed 491 (62%) Unemployed 50 (6%) Homemaker 23 (3%) Student 31 (4%) Retired 137 (17%) Disabled 34 (4%) Other: 31 (4%) WhatisyourroleatUSC? Student 22 (3%) Faculty 51 (6%) Staff 168 (21%) Alumni 49 (6%) Family member/Dependent 72 (9%) Not Affiliated with USC 435 (55%) Whatisyourmaritalstatus? Markonlyone: Married 394 (49%) Not married but living with a partner 57 (7%) Divorced 91 (11%) Widowed 47 (6%) Separated 6 (1%) Single, never been married 202 (25%) What is the highest grade or level of schooling you completed? Less than 8 years 8 (1%) 8 to 11 years 9 (1%) 12 years or completed high school (including GED) 39 (5%) Post high school training other than college 36 (5%) Some college 145 (18%) College graduate 252 (32%) Postgraduate 308 (39%) Thinkingaboutmembersofyourfamilylivinginthishousehold,whatisyourcombined annualincome(totalpre-taxincomefromallsources)? $0 to $9,999 20 (3%) $10,000 to $14,999 18 (2%) $15,000 to $19,999 19 (2%) $20,000 to $34,999 34 (4%) $35,000 to $49,999 68 (9%) $50,000 to $74,999 115 (14%) $75,000 to $99,999 116 (15%) $100,000 to $199,999 216 (27%) $200,000 or more 127 (16%) Omitted 64 (8%) Areyouanewpatienttofamilymedicine? Yes 295 (37%) No 498 (62%) 90 Omitted 4 (1%) 3.3.1.2 Transportation Participants used their personal vehicle most frequently to travel to their in-person visits (84%, 666/797) and most frequently mentioned that they were four to ten miles away (51%, 480/797) or 16 to 30 minutes away (39%, 308/797) from their provider’s clinic. 3.3.1.3 Distance Table 3.2 shows the typical and actual distance to the indicated provider’s clinical site, as measured by zip code. Using a paired two-tailed t-test, we found statistical significance between the two metrics when looking at respondents whose typical distances were less than 200 miles (n = 721; typical: mean 11.2 miles (standard deviation 16.7 miles), actual: mean 14.1 miles (standard deviation 39.1 miles)). The difference between the two values suggest that respondents are located further away from their medical home during the pandemic, possibly due to a permanent or temporary move of place of residence. 3.3.1.4 Logistics While 82% (650/797) of participants spent less than 15 minutes coordinating their visit, 7% (58/797) re- ported coordination times lasting longer than 30 minutes. Most participants conducted their visits from home (91%, 724/797) or workplace (6%, 51/797). 
Other indicated locations included their vehicle and the homes of other family members. Participants typically used a laptop (48%, 379/797) or smartphone (34%, 268/797) for their visit; desktops (11%, 87/797) and tablets (7%, 59/797) were used less often.

Table 3.2: Participant Demographics and Visit Distances
Attribute n mean sd p50 min max
Age 788 48.70 17.67 47.00 18.00 98.00
Children Under Age of 18 in Household 797 0.48 0.79 0.00 0.00 4.00
Typical Distance From Clinic 729 12.86 40.11 6.27 0.00 955.97
Actual Distance From Clinic 735 14.43 40.24 6.56 0.00 823.27
Difference Between Typical and Actual Distance From Clinic 723 1.56 50.78 0.00 -944.78 805.51

Figure 3.1: Video Visits Rated Better than Office Visits in 3 Domains (n = 797).

3.3.1.5 Comparisons to Office Visits

Figure 3.1 reports survey respondents' perceptions of video visits compared to office visits. 91% (727/797) of respondents described their telemedicine video visit as somewhat or very convenient. 74% (592/797) reported somewhat or much shorter wait times, and 72% (572/797) reported a shorter time for coordinating and taking part in their visit.

3.3.1.6 Provider Experience

Survey respondents revealed positive interactions with providers, and nearly all respondents agreed that they could see (93%, 743/797) and hear (96%, 762/797) their provider clearly. An overwhelming majority felt that the provider spent enough time with them (96%, 767/797) and that they had enough time to discuss their issues (93%, 745/797). When asked about communication, 97% (775/797, 773/797) found that their clinician explained things in a way that was easy to understand and that they were carefully listened to. Most felt confident about the care plan provided (89%, 710/797).

3.3.1.7 Privacy

While 87% (693/797) of respondents felt confident that their privacy was protected during their video visit, 70% (558/797) were comfortable discussing their concerns through this modality, and 26% (205/797) were uncomfortable doing so.

3.3.1.8 Outlook

81% (653/797) of respondents strongly agreed that a telemedicine visit was a good option to see their doctor due to COVID-19 concerns. Over 90% of respondents (719/797) were satisfied or very satisfied with their visit, with 91% (723/797) expressing that they would be fairly (19%, 158/797) or completely (71%, 565/797) willing to use telemedicine again. Figure 3.2 shows that preferred times for video visits were 9AM-3PM, generally on weekdays more than weekends.

3.3.2 Secondary Analyses

To understand whether there was heterogeneity in survey responses, we ran two-sample t-tests on the 5-point Likert scale questions by demographics (above-median income ($75,000 or above), some college education, gender, elderly (above 65 years old), patient status with Family Medicine, and new user of telehealth). We found significant variation in responses in the following areas, as summarized in Table 3.3:

1. Respondents with higher household income (p<0.001) and new users of telehealth (p<0.001) were more willing to be seen by other Family Medicine providers (not their regularly assigned provider for a returning patient, or not the assigned provider for the visit of a new patient).

2. Respondents who were new users of telehealth were more willing to use video visits again in the future (p=0.0303).

3.
3. Respondents with some college education felt less confident about their privacy being protected (p=0.0415), while individuals of higher income and elderly individuals felt more confident about their privacy being protected (p=0.0139 and p=0.0151, respectively).
4. Elderly respondents felt more satisfied with their video visits (p=0.0123) but also felt less strongly that a telehealth visit took less time to coordinate and take part in than an office visit (p<0.001).

Figure 3.2: Times and days of the week in which patients were willing to use video visits (multiple selections allowed).

Table 3.3: Secondary Analyses (Means of Responses Provided by Demographic Group)
How willing would you be using the TeleCARE platform to be seen by other physicians at USC Family Medicine? (1 = Completely willing, ..., 5 = Not at all willing)
  Income: <$75,000 (n = 459) mean 1.861 vs. $75,000 or above (n = 274) mean 1.588; p = 0.0002
  Telehealth Use: Existing User (n = 365) mean 1.825 vs. New User (n = 427) mean 1.595; p = 0.0008
In the future, how willing are you to use a TeleCARE video visit again? (1 = Completely willing, ..., 5 = Not at all willing)
  Telehealth Use: Existing User (n = 365) mean 1.485 vs. New User (n = 427) mean 1.365; p = 0.0303
How confident are you that this TeleCARE video visit has protected your privacy as a patient? (1 = Not at all confident, ..., 5 = Completely confident)
  Education: No college education (n = 92) mean 4.554 vs. Some college education (n = 705) mean 4.356; p = 0.0415
  Income: <$75,000 (n = 459) mean 4.296 vs. $75,000 or above (n = 274) mean 4.458; p = 0.0139
  Age: <65 (n = 626) mean 4.343 vs. 65+ (n = 162) mean 4.531; p = 0.0151
Overall, how satisfied are you with your telehealth visit? (1 = Very satisfied, ..., 5 = Very dissatisfied)
  Age: <65 (n = 626) mean 1.597 vs. 65+ (n = 162) mean 1.389; p = 0.0123
Compared to an office visit, how long did it take for you to coordinate and participate in your TeleCARE video visit? (1 = Much shorter than an office visit, ..., 5 = Much longer than an office visit)
  Age: <65 (n = 626) mean 1.823 vs. 65+ (n = 162) mean 2.204; p = 0.0000

3.3.3 Copayment
29% (235/797) of respondents indicated that a copayment was unreasonable. Of these respondents, 25% (59/235) were EPO (Exclusive Provider Organization) carriers, 47% (111/235) were PPO (Preferred Provider Organization) carriers, 23% (54/235) were Medicare/Medi-Cal beneficiaries, and 5% (11/235) had other insurance or had no insurance. Using two-sample t-tests, we find that individuals with some college education (73% on average with vs. 51% without, p = 0.0000), above-median household income (78% on average if above versus 60% if below, p = 0.0000), and existing Family Medicine patients (73% on average if existing, 66% if new, p = 0.0317) were more likely to find a copayment reasonable. Willingness to see other Family Medicine physicians (rho = -0.2575) and satisfaction with video visits (rho = -0.2077) had the strongest correlations with willingness to submit a copayment.
A lack of a physical exam (34%, 79/235) was the most common explanation for the sentiment. Some patients felt strongly about this: "They can't really check your vitals. That's mainly the point of visiting a doctor." "I can't have my doctor check for concerns that need to be assessed through looking or feeling" "touch and smell are not involved." A perceived sense of lessened care was noted in two primary aspects: 1) respondents felt the health system/provider saved resources/overhead costs by not needing to use a clinic space, or noted that the provider conducted the visit from their own home (12%, 28/235), and 2) the reason for a video visit was straightforward
(e.g., a prescription refill, bloodwork) or took less time than an in-person visit (25%, 59/235). Financial concerns (14%, 34/235) consisted of 1) respondents who reported not needing to submit a copayment (9%, 20/235), 2) a preference for a reduced copayment relative to an in-person visit (7%, 17/235), or 3) a combination of other reasons. Technical concerns and COVID-19 were given as justification in thirteen and eight responses, respectively.

3.3.4 Other Feedback
Additional free-text commentary on video visit experiences is described here. Some respondents were especially satisfied with their experience ("It was amazing and long over-due", "I have more healthcare appointments than most people and have been DELIGHTED by telecare system. So much less commute time!") and expressed interest in continued use of the service post-pandemic ("Please continue telehealth even after covid"). Some commented on the strengths of the modality ("I do have anxiety about seeing the doctor, so this was good for my comfort level"), while other respondents described difficulties of interacting through video ("hard to explain myself through a computer", "Also, I've back problems, and sitting waiting an hour for the visit to begin can aggravate it"). Seven patients were resistant towards telehealth ("During COVID 19 I can understand its need, but when conditions return to normal, I am not interested in this process").

3.4 Discussion
Conducted in the family medicine setting of an academic medical institution, where patients are charged for services rendered, our study finds that survey respondents perceive telemedicine services favorably, with 71% of respondents perceiving a copayment to be reasonable. However, 29% of respondents felt that a copayment was unreasonable, listing a lack of a physical exam and personal financial concerns as reasons for this sentiment. Despite these observations, we find that respondents rate telemedicine visits as equal to or better than traditional office visits and describe positive experiences with efficiency, convenience, decreased wait time to see a provider, and decreased time spent coordinating and participating in care. This illustrates that telemedicine is useful in increasing access to primary care and suggests that video visits are an acceptable medium for care, as an overwhelming proportion of respondents (91%) were fairly or completely willing to utilize telemedicine video visits again in the future.
Our findings contribute to the nascent literature on primary care telemedicine practices (Lawrence et al. 2020, Martinez et al. 2018, Polinksi et al. 2016, Powell et al. 2017, Thiyagarajan et al. 2020), telemedicine video visits (Donelan et al. 2019, Samples et al. 2021, Thelen-Perry et al. 2018, Welch et al. 2017), and patients' copayment perceptions (Baum et al. 2016, Kiil and Houlberg 2014, Scott et al. 2009). To the best of our knowledge, this study is one of the first to explore patient perceptions of telemedicine services in primary care when patients may need to pay for services, unlike previous studies where visits were free of charge (Donelan et al. 2019). This study took place during the COVID-19 pandemic, a time when patients were offered telemedicine as an alternative to in-person visits. Under these circumstances, the study collected responses from the patients of a multitude of providers. This differs from other studies where respondents were either self-selected or selected by their provider to participate in a telemedicine visit (Donelan et al. 2019, Powell et al. 2018).
Given this population sample, we believe our study to be more externally valid. One limitation of the study is that we received a relatively low response rate. The low response rate may have introduced bias, as suggested by the overrepresentation of specific demographic groups (e.g., female respondents and those with higher educational/income backgrounds). Also, our study was partially conducted during the shelter-at-home orders put in place by the state of California, which likely affected responses on where patients conducted their visits from. Additionally, providers may have been regarded as front-line workers during this period, possibly leading to higher satisfaction ratings in responses. Respondents were neither queried on the amount or existence of a copayment nor on other costs related to care. Notably, about half of respondents were first-time users of telehealth, a substantial increase from prior studies (Welch et al. 2017). Similarly, the percentage of respondents who expressed high willingness to use telemedicine in the future may be biased upward, as the study was conducted during a period when patients may have especially feared having in-person visits. Lastly, as our study focuses on those who were able to conduct a telemedicine video visit, it does not capture the perceptions of individuals who lack access to telemedicine video visits, such as those without internet access, those who do not have access to a computer or smartphone, and the very elderly who may be unable to operate technologies by themselves, among other groups. Given these limitations, we suggest that generalizing these results should be done with careful consideration, although our results may be applicable to other academic medical centers as well.
The findings of this study suggest that telemedicine video visits continue to be a promising modality for accessing primary care. Moreover, the findings suggest that payors should consider copayment in detail when designing telehealth benefits to ensure such copayments do not become a barrier to seeking care. That is, payors may be able to affect telemedicine use by setting the copayment rates and reimbursement rates for telemedicine visits accordingly. For example, payors could discourage telemedicine use by setting very high copayments for patients. The findings also underline the need to establish best practices for patients and providers, which could determine the use of telehealth in the future. Our study suggests that further inquiry is needed to determine the significance of the internet as a household utility and of technology as determinants of healthcare access. Additional studies may consider the factors that contribute to telemedicine use to ensure that telemedicine does not widen health disparities in our communities.

Chapter 4
Future Directions
This dissertation presents several empirical analyses, performed in various healthcare and charitable settings, which study how we can improve human interactions with humans, algorithms, and technologies for better outcomes. They highlight the importance of leveraging data analytics to better understand human behavior, as such efforts inform process and system improvements which can enhance outcomes. Going forward, I hope to 1) develop human-centric solutions for organizations and individuals to better interface with one another and 2) enhance the role of algorithms and technologies in operations.
In this chapter, I describe some future directions for research, several of which are inspired by the preceding chapters.

4.1 Future Directions for Chapter 1
Chapter 1 presents a detailed empirical analysis of how patients utilize algorithms for diabetes care. Since Aleppo et al. (2017)'s field experiment, the field continues to advance its interest in algorithms and data-driven decisions, with closed-loop algorithms (Brown et al. 2019) gaining FDA approval, efforts to quantify and visualize patient data more informatively, and remote patient monitoring emerging as a channel for chronic care management.
Closed-loop algorithms represent an evolution in the role of algorithms in care, raising the algorithm's role from decision support tool to continuous monitor and decision maker. That is, closed-loop algorithms for insulin dosing perform micro-adjustments over time for basal insulin (insulin for baseline needs). Such adjustments could even substitute for a bolus insulin dose (taken for meals or high glucose levels). Some of these algorithms can even automate the bolus insulin dose itself, potentially removing any human input into the insulin delivery process. Data from field experiments evaluating the efficacy of closed-loop systems, e.g., Brown et al. (2019), can allow us to explore additional drivers of algorithm use. Specifically, we can study how patient perceptions of technology, certain alarms/alerts, and patient discretion, such as setting changes, impact closed-loop algorithm use. This research effort would also add to the emerging body of field evidence on drivers of algorithm use, as done in Chapter 1, by 1) exploring additional drivers of algorithm use and 2) considering drivers where the algorithm has a larger, continuous role in the decision-maker's life.
Diabetes is a highly quantifiable disease, with platforms such as Tidepool (Snider 2018) providing unification of various sources of diabetes device data. These granular data raise the question of how clinicians and patients can best interpret the data and take action from it. Existing efforts such as the Ambulatory Glucose Profile (AGP) (Johnson et al. 2019) and weekly reporting (e.g., Dexcom Clarity), as well as new metrics like the Glycemia Risk Index (Klonoff et al. 2022), show promise for impacting practice. Anecdotal evidence suggests that there is more room for improvement: while patients primarily rely on providers for additional perspective and rarely review their own aggregated data, providers need training to review the data and/or may not take much time to interpret the data. To move forward in this regard, we can consider answering some of the following questions. Would presenting average patterns of performance be most meaningful, or should outlier situations also be noted? Would highlighting certain types of performance feedback (e.g., positive versus negative) lead to more improvements by patients and clinicians? In what order should these pieces of performance data be highlighted to motivate further improvements? Field experiments can allow us to test the effectiveness of various approaches, informed by patient and provider preferences, insights from user experience/interface design, and mechanisms to build health and data literacy.
The quantified nature of diabetes care raises another opportunity for change: changing how care is delivered. COVID-19 helped transform chronic care management towards telehealth over office visits.
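To make the aggregated glucose metrics discussed above more concrete, the following is a minimal illustrative sketch of how time in range could be computed from CGM readings, assuming the consensus 70-180 mg/dL target range (Battelino et al. 2019); the data and the function are hypothetical and not part of the dissertation's analyses.

```python
# Illustrative sketch only (hypothetical data and function, not the dissertation's code):
# computing time in range (TIR) from continuous glucose monitoring readings,
# assuming the consensus 70-180 mg/dL target range.
from typing import Sequence


def time_in_range(readings_mg_dl: Sequence[float], low: float = 70.0, high: float = 180.0) -> float:
    """Return the percentage of CGM readings falling within [low, high] mg/dL."""
    if not readings_mg_dl:
        raise ValueError("No CGM readings provided")
    in_range = sum(low <= g <= high for g in readings_mg_dl)
    return 100.0 * in_range / len(readings_mg_dl)


# A hypothetical hour of 5-minute CGM readings (mg/dL)
example_readings = [95, 110, 150, 185, 210, 165, 140, 120, 100, 90, 75, 68]
print(f"Time in range: {time_in_range(example_readings):.1f}%")  # 9 of 12 readings in range -> 75.0%
```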
Moving beyond this shift, remote patient monitoring provides an emerging channel to reduce physical resource needs for healthcare providers and improve convenience for patients, with many unanswered questions. How should physicians and providers allocate their time to focus on remote patient monitoring? Specifically, what frequency of check-ins meaningfully improves patients' health outcomes (e.g., time in range)? Could such check-ins, coupled with secure messages, substitute for routine check-ins that might otherwise happen via face-to-face or telehealth visits? What might be the corresponding operational impacts (e.g., productivity, financial implications) of these changes? Thinking beyond these operational questions, there are also policy- and payment-oriented questions. Informed by these operational insights, payors should consider what types of benefit designs would be most appropriate to induce these improved health outcomes. An ideal design should help ensure that remote patient monitoring does not become overused for the sole sake of revenue (e.g., a provider checks in multiple times in a day and bills for each interaction with the patient record) but still induces utilization of this modality. Understanding the answers to these questions will help transform and potentially simplify healthcare operations and improve healthcare outcomes.

4.2 Future Directions for Chapter 2
Chapter 2 demonstrates the potential of matching schemes to improve outcomes through understanding the impact of charitable workers in influencing blood donors' donation decisions. We can go further in this direction by conducting a field experiment that quantifies the changes in outcomes under different matching schemes (versus the quasi-random assignment observed during the study period of Chapter 2). As alluded to in Chapter 2, mixed-methods research, such as interviews with donors and nurses and additional survey instruments, can help further unpack mechanisms within the interaction itself (for example, the sharing of certain pieces of information) that may impact the blood donation decision.
One challenge that remains unsolved by our initial analyses is how to motivate future donations through approaches that do not rely on costly incentives. For example, commitment mechanisms such as planning prompts have been shown to increase vaccination rates (Milkman et al. 2011) and preventative screening rates (Milkman et al. 2013). Such mechanisms can be re-examined to see whether they could benefit charitable organizations. For example, we can explore whether blood banks could foster a habit of blood donation by testing whether nudging donors, at the end of their current donation experience, to think about their next donation and schedule a future donation appointment induces future donations to occur.
Blood banks face challenges in fulfilling demand for different product types, such as plasma, platelets, whole blood, and red blood cells, which may be further complicated by blood type (A, B, AB, O). How can blood banks organize potential supply to match demand? Could scheduling systems, which are increasingly being utilized by blood banks, play a more significant role? Beyond asking donors to schedule blood donations in general, we should also consider whether adding specificity to this scheduling request may be beneficial.
Specifically, we should study whether prompting donors to sign up for certain product types when those products are in greater need may be more valuable; such targeting could be informed by the development of predictive models. This can help us understand the value of predictive models in such environments, where heuristics have more commonly been used.

4.3 Future Directions for Chapter 3
Chapter 3 suggests that the benefits structure for patients and the reimbursement structure for providers can influence telehealth usage. Given recent regulations and coverage shifts brought forth by the COVID-19 pandemic, greater understanding of the mechanisms that drive telehealth usage and the corresponding healthcare outcomes in this modality can shed light on how to better navigate subsequent changes.
Benefit designs incentivize certain patients and levels of care over others. Given this, greater understanding of the demand for telehealth can help us identify which subpopulations may need more attention with respect to telehealth-seeking behaviors. Survey data that inquire about patients' telehealth use during the pandemic and whether they would be willing to use telehealth services post-pandemic can provide some further insights. Furthermore, claims data can allow us to track the changes in telehealth use across specialties and appointment/procedure types.
Telehealth payment parity laws, which aim to put reimbursement for telehealth services on equal footing with office visits, are also coming into effect across numerous states. Using claims data from multiple states that have implemented such payment parity policies, we can understand the implications of these payment parity laws for operations and health outcomes via a staggered differences-in-differences design (a sketch of one such specification appears below). Operationally, such laws should induce significant changes in operations for health organizations, and indeed organizations have started building up their service lines in response to such laws. Consequently, such laws should have meaningful impacts on the mix of healthcare modalities used (e.g., face-to-face visits versus video visits and telephone visits). The increased access to telehealth from such laws may help level disparities across patient populations; hence, we should also explore whether these laws influence how various population types use healthcare services. Moreover, physician behavior should also be significantly impacted by these organization-level changes. We should consider how these laws may impact granular operational metrics such as physician RVUs, value-added time with patients, and patient panel sizes. All of these changes would ideally also benefit patients' health outcomes. Specifically, we should consider how more severe needs such as hospitalizations/readmissions and emergency department visits change with the implementation of such laws, but also explore how other health metrics, such as patients' utilization of various healthcare services (number of visits, visits across different specialties, duration between visits), prescription needs, and physiological metrics (e.g., cholesterol, A1c), change as a result of such laws.
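The following is a minimal sketch of what the staggered differences-in-differences specification referenced above might look like; the notation is hypothetical and intended only to illustrate the design, not to state the eventual estimating equation.

```latex
% Illustrative two-way fixed effects sketch (hypothetical notation, not the dissertation's specification)
\begin{equation}
  Y_{ist} = \beta \, \mathrm{Parity}_{st} + \gamma_i + \delta_t + X_{ist}' \theta + \varepsilon_{ist},
\end{equation}
```

where $Y_{ist}$ is an outcome for unit $i$ (e.g., a provider or patient panel) in state $s$ at time $t$, such as the share of visits conducted via telehealth or a downstream health metric; $\mathrm{Parity}_{st}$ equals one once state $s$'s payment parity law is in effect; $\gamma_i$ and $\delta_t$ are unit and time fixed effects; and $X_{ist}$ collects time-varying controls. Because adoption is staggered, an event-study version of this specification, or estimators designed for heterogeneous treatment effects under staggered adoption, would be natural robustness checks.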
Beyond policy-level and demand-driven questions, on the provider side, as healthcare institutions move towards the model of providing care when, where, and how patients want it, they must also address the questions of 1) how clinicians should specialize and cycle between in-person and virtual visits to improve their productivity and patients' health outcomes and 2) which types of visits are more effectively done via telehealth. Answering these questions can help organizations more intelligently build their telehealth practices and develop improved workflows for care teams to interact with patients.

4.4 Broader Research Directions
The preceding sections highlight potential directions of interest inspired by the dissertation chapters. Here, I summarize those directions into broader research questions and highlight other research questions of interest.
1. What are the unintended consequences of algorithms when applied to decision-making? For instance, might an algorithm require significant changes over time to work for the patient, or turn out not to be compatible? How can we ensure that when algorithms make unsuitable judgments, humans can override the algorithm and (temporarily) return to decision-making without the algorithm?
2. How can organizations and individuals best make sense of technology and data to improve their performance? How can we design health information systems (e.g., electronic health records, digital platforms for connected health devices) with behavioral insights (e.g., reminders) to encourage healthy behaviors (e.g., medication adherence, productive patterns for completing work tasks)?
3. How can we address disparities in how people are treated in service operations? How do we ensure similar service levels and experiences in access and treatment across parts of an organization's operations? With increased digitization efforts, how do we ensure that the technology benefits not only the technology-savvy, but also those who may need more training?
4. How can we use trace data to a) identify operational inefficiencies and biases in human decision-making and b) motivate interventions for improvement? For instance, audit log data from electronic health records can help us identify how operational structures (e.g., schedule templates, workload sharing (nurses and physicians working on a single patient)) impact healthcare outcomes and how physician work patterns impact their decision-making.
5. How can we design and operationalize reimbursement policies to improve operational and health outcomes? This question is applicable not only to the modality of services used (e.g., remote patient monitoring and telehealth), as discussed previously, but also to the usage of certain types of services (e.g., mental health, nutrition services). Relatedly, how can we better match individuals to health insurance plans that best fit their healthcare needs while reducing costs?

Bibliography
(2012) Blood donor selection: guidelines on assessing donor suitability for blood donation ISSN 10523359.
(2017) Creating a Framework to Support Measure Development for Telehealth. URL http://www.qualityforum.org/Publications/2017/08/Creating_a_Framework_to_Support_Measure_Development_for_Telehealth.aspx.
(2020) H.R. 748: Coronavirus Aid, Relief, and Economic Security Act. URL https://www.govtrack.us/congress/bills/116/hr748/text.
Adhvaryu A, Bassi V, Nyshadham A, Tamayo J (2020) No Line Left Behind: Assortative Matching Inside the Firm. URL http://dx.doi.org/10.3386/w27006.
Adjerid I, Ayvaci M, ¨ Ozer ¨ O (2019) Saving Lives With Algorithm-Enabled Process Innovation for Sepsis Care.SSRN ElectronicJournal ISSN 1556-5068, URLhttp://dx.doi.org/10.2139/ssrn.3456870. Aleppo G, Ruedy KJ, Riddlesworth TD, et al. (2017) REPLACE-BG: A randomized trial comparing continuous glu- cose monitoring with and without routine blood glucose monitoring in adults with well-controlled type 1 dia- betes.DiabetesCare 40(4), ISSN 19355548, URLhttp://dx.doi.org/10.2337/dc16-2482. Andreoni J (1989) Giving with impure altruism: Applications to charity and ricardian equivalence.JournalofPolitical Economy 97(6), ISSN 0022-3808, URLhttp://dx.doi.org/10.1086/261662. Andreoni J (1990) Impure altruism and donations to public goods: A theory of warm-glow giving. The Economic Journal 100(401), ISSN 00130133, URLhttp://dx.doi.org/10.2307/2234133. Andreoni J, Payne AA (2011) Is crowding out due entirely to fundraising? evidence from a panel of charities. Jour- nal of Public Economics 95(5–6):334–343, ISSN 00472727, URL http://dx.doi.org/10.1016/j. jpubeco.2010.11.011. Aprahamian H, Bish DR, Bish EK (2019) Optimal risk-based group testing. Management Science 65(9):4365–4384, ISSN 15265501, URLhttp://dx.doi.org/10.1287/mnsc.2018.3138. Argote L, Epple D (1990) Learning curves in manufacturing. Science 247(4945):920–924, ISSN 00368075, URL http://dx.doi.org/10.1126/science.247.4945.920. Armony M, Roels G, Song H (2021) Pooling queues with strategic servers: The effects of customer ownership. OperationsResearch 69(1), ISSN 15265463, URLhttp://dx.doi.org/10.1287/OPRE.2020.2004. Artandi M, Thomas S, Shah NR, Srinivasan M (2020) Rapid system transformation to more than 75% primary care video visits within three weeks at stanford: Response to public safety crisis during a pandemic. New England JournalofMedicine URLhttp://dx.doi.org/10.1056/CAT.20.0100. Association AD, et al. (2021) 7. diabetes technology: Standards of medical care in diabetes—2021. Diabetes Care 44(Supplement 1):S85–S99. Ayer T, Zhang C, Zeng C, White CC, Roshan Joseph V (2019) Analysis and improvement of blood collection op- erations. Manufacturing and Service Operations Management 21(1):29–46, ISSN 15265498, URL http: //dx.doi.org/10.1287/msom.2017.0693. Ayvaci M, Mobini Z, ¨ Ozer ¨ O (2021) To catch a killer: A data-driven personalized and compliance-aware sepsis alert system.AvailableatSSRN3805931 . Bai B, Dai H, Zhang D, Zhang F, Hu H (2020) The Impacts of Algorithmic Work Assignment on Fairness Perceptions and Productivity: Evidence from Field Experiments. SSRN Electronic Journal ISSN 1556-5068, URLhttp: //dx.doi.org/10.2139/ssrn.3550887. 105 Ball GP, Siemsen E, Shah R (2017) Do plant inspections predict future quality? the role of investigator experience. Manufacturing&ServiceOperationsManagement 19(4):534–550, ISSN 1526-5498, URLhttp://dx.doi. org/10.1287/msom.2017.0661. Bandura A (????) Self-efficacy: Toward a unifying theory of behavioral change. Psychological Review 84(2), ISSN 0033295X, URLhttp://dx.doi.org/10.1037/0033-295X.84.2.191. Bani M, Giussani B (2010) Gender differences in giving blood: A review of the literature. Blood Transfusion 8(4), ISSN 17232007, URLhttp://dx.doi.org/10.2450/2010.0156-09. Bard JF, Purnomo HW (2005) Short-term nurse scheduling in response to daily fluctuations in supply and demand. HealthCareManagementScience 8(4):315–324, ISSN 13869620, URLhttp://dx.doi.org/10.1007/ s10729-005-4141-9. 
Barnett ML, Ray KN, Souza J, Mehrotra A (2018) Trends in telemedicine use in a large commercially insured population, 2005-2017. JAMA - Journal of the American Medical Association ISSN 15383598, URL http: //dx.doi.org/10.1001/jama.2018.12354. Battelino T, Danne T, Bergenstal RM, Amiel SA, Beck R, Biester T, Bosi E, Buckingham BA, Cefalu WT, Close KL, Cobelli C, Dassau E, Hans DeVries J, Donaghue KC, Dovc K, Doyle FJ, Garg S, Grunberger G, Heller S, Heinemann L, Hirsch IB, Hovorka R, Jia W, Kordonouri O, Kovatchev B, Kowalski A, Laffel L, Levine B, Mayorov A, Mathieu C, Murphy HR, Nimri R, Nørgaard K, Parkin CG, Renard E, Rodbard D, Saboo B, Schatz D, Stoner K, Urakami T, Weinzimer SA, Phillip M (2019) Clinical targets for continuous glucose monitoring data interpretation: Recommendations from the international consensus on time in range. Diabetes Care 42(8), ISSN 19355548, URLhttp://dx.doi.org/10.2337/dci19-0028. Baum Z, Simmons MR, Guardiola JH, Smith C, Carrasco L, Ha J, Richman P (2016) Potential impact of co-payment at point of care to influence emergency department utilization. PeerJ ISSN 21678359, URLhttp://dx.doi. org/10.7717/peerj.1544. Berenguer G, Shen ZJM (2019) Challenges and strategies in managing nonprofit operations: An operations manage- ment perspective. Manufacturing & Service Operations Management 22(5):888–905, ISSN 1523-4614, URL http://dx.doi.org/10.1287/msom.2018.0758. Berry Jaeker J, Tucker AL (2017) Past the point of speeding up: The negative effects of workload saturation on efficiency and patient severity. Management Science 63(4):1042–1062, ISSN 15265501, URL http://dx. doi.org/10.1287/mnsc.2015.2387. Bertsimas D, Kallus N, Weinstein AM, Zhuo YD (2017) Personalized diabetes management using electronic medical records 40(2), ISSN 19355548, URLhttp://dx.doi.org/10.2337/dc16-0826. Boh WF, Slaughter SA, Espinosa JA (2007) Learning from experience in software development: A multilevel analysis. Management Science 53(8):1315–1331, ISSN 00251909, URL http://dx.doi.org/10.1287/mnsc. 1060.0687. Bretthauer KM, Savin S (2018) Introduction to the Special Issue on Patient-Centric Healthcare Management in the Age of Analytics. URLhttp://dx.doi.org/10.1111/poms.12976. Brown SA, Kovatchev BP, Raghinaru D, Lum JW, Buckingham BA, Kudva YC, Laffel LM, Levy CJ, Pinsker JE, Wadwa RP, et al. (2019) Six-month randomized, multicenter trial of closed-loop control in type 1 diabetes.New EnglandJournalofMedicine 381(18):1707–1717. Bundorf K, Polyakova M, Tai-Seale M (2019) How do humans interact with algorithms? experimental evidence from health insurance. Technical report, National Bureau of Economic Research. Burton JW, Stein MK, Jensen TB (2020) A systematic review of algorithm aversion in augmented decision making. Journal of Behavioral Decision Making 33(2), ISSN 10990771, URL http://dx.doi.org/10.1002/ bdm.2155. Burton-Jones A, Hubona GS (2005) Individual Differences and Usage Behavior: Revisiting a Technology Acceptance Model Assumption. Data Base for Advances in Information Systems 36(2), ISSN 00950033, URL http:// dx.doi.org/10.1145/1066149.1066155. B´ enabou R, Tirole J (2006) Incentives and prosocial behavior. American Economic Review 96(5):1652–1678, ISSN 00028282, URLhttp://dx.doi.org/10.1257/aer.96.5.1652. Caro F, de Tejada Cuenca AS (2018) Believing in analytics: Managers adherence to price recommendations from a dss . 
106 Christin A (2017) Algorithms in practice: Comparing web journalism and criminal justice.BigDataandSociety 4(2), ISSN 20539517, URLhttp://dx.doi.org/10.1177/2053951717718855. Chugunova M, Sele D (2020) We and it: An interdisciplinary review of the experimental evidence on human-machine interaction.MaxPlanckInstituteforInnovation&CompetitionResearchPaper (20-15). Clemen RT (1989) Combining forecasts: A review and annotated bibliography. International journal of forecasting 5(4):559–583. Cohen MA (1976) Analysis of single critical number ordering policies for perishable inventories.OperationsResearch 24(4):726–741, URLhttp://dx.doi.org/10.1287/opre.24.4.726. Cohen MA, Pierskalla WP (1979) Target inventory levels for a hospital blood bank or a decentralized regional blood banking system. Transfusion 19(4):444–454, ISSN 00411132, URL http://dx.doi.org/10.1046/j. 1537-2995.1979.19479250182.x. Croson R, Gneezy U (2009) Gender differences in preferences.JournalofEconomicLiterature 47(2), ISSN 00220515, URLhttp://dx.doi.org/10.1257/jel.47.2.448. Cui H, Rajagopalan S, Ward AR (2020) Impact of task-level worker specialization, workload, and product personal- ization on consumer returns. Manufacturing & Service Operations Management 1–21, ISSN 1523-4614, URL http://dx.doi.org/10.1287/msom.2019.0836. Dai T, Tayur S (2020) OM Forum—Healthcare Operations Management: A Snapshot of Emerging Research. Man- ufacturing & Service Operations Management 22(5), ISSN 1523-4614, URL http://dx.doi.org/10. 1287/msom.2019.0778. de V´ ericourt F, Perakis G (2020) Frontiers in service science: The management of data analytics services: New challenges and future directions.ServiceScience 12(4):121–129. Deo S, Jain A (2019) Slow first, fast later: Temporal speed-up in service episodes of finite duration. Production and operationsmanagement 28(5):1061–1081. Dietvorst BJ, Simmons JP, Massey C (2015) Algorithm aversion: People erroneously avoid algorithms after seeing them err. Journal of Experimental Psychology: General 144(1), ISSN 00963445, URL http://dx.doi. org/10.1037/xge0000033. Dietvorst BJ, Simmons JP, Massey C (2018) Overcoming algorithm aversion: People will use imperfect algorithms if they can (even slightly) modify them. Management Science 64(3), ISSN 15265501, URLhttp://dx.doi. org/10.1287/mnsc.2016.2643. DiMeglio LA, Evans-Molina C, Oram RA (2018) Type 1 diabetes.TheLancet 391(10138):2449–2462. Donaghy E, Atherton H, Hammersley V , McNeilly H, Bikker A, Robbins L, Campbell J, McKinstry B (2019) Ac- ceptability, benefits, and challenges of video consulting: A qualitative study in primary care. British Journal of GeneralPractice ISSN 14785242, URLhttp://dx.doi.org/10.3399/bjgp19X704141. Donelan K, Barreto EA, Sossong S, Michael C, Estrada JJ, Cohen AB, Wozniak J, Schwamm LH (2019) Patient and clinician experiences with telehealth for patient follow-up care. American Journal of Managed Care ISSN 10880224. Emadi S, Kesavan S (2019) Can “ very noisy ” information go a long way ? an exploratory analysis of personalized scheduling in service systems . Festinger L (1957)Atheoryofcognitivedissonance, volume 2 (Stanford university press). Fildes R, Goodwin P (2007) Against your better judgment? How organizations can improve their use of management judgment in forecasting.Interfaces 37(6), ISSN 00922102, URLhttp://dx.doi.org/10.1287/inte. 1070.0309. 
Fildes R, Goodwin P, Lawrence M, Nikolopoulos K (2009) Effective forecasting and judgmental adjustments: an em- pirical evaluation and strategies for improvement in supply-chain planning. International journal of forecasting 25(1):3–23. Filiz I, Judek JR, Lorenz M, Spiwoks M (2021) Reducing algorithm aversion through experience. Journal of Behav- ioralandExperimentalFinance 100524. Fishbein M, Ajzen I (2011) Predicting and changing behavior: The reasoned action approach. URL https://books.google.com/books?hl=en&lr=&id=2rKXqb2ktPAC&oi=fnd&pg=PR2& 107 dq=Predicting+and+changing+behavior:+The+reasoned+action+approach.+&ots= zbcoRIowpo&sig=XMEtzaOnxcCSvcUz9pFsGBwm3DI. France JL, France CR, Himawan LK (2008) Re-donation intentions among experienced blood donors: Does gender make a difference? TransfusionandApheresisScience 38(2), ISSN 14730502, URLhttp://dx.doi.org/ 10.1016/j.transci.2008.01.001. Freeman M, Savva N, Scholtes S (2017) Gatekeepers at work: An empirical analysis of a maternity unit. Manage- mentScience 63(10):3147–3167, ISSN 1526-5501, URLhttp://dx.doi.org/10.1287/mnsc.2016. 2512. Frey BS, Meier S (2004) Social comparisons and pro-social behavior: Testing conditional cooperation in a field experiment. American Economic Review 94(5):1717–1722, ISSN 00028282, URL http://dx.doi.org/ 10.1257/0002828043052187. Frick W (2015) Here’s why people trust human judgment over algorithms.HarvardBusinessReview . Garnett O, Mandelbaum A (2000) An Introduction to Skills-Based Routing and its Operational Complexities. URL https://ie.technion.ac.il/ ˜ serveng/course2012spring/Lectures/SBR.pdf. Glikson E, Woolley AW (2020) Human trust in artificial intelligence: Review of empirical research. Academy of ManagementAnnals 14(2):627–660. Gneezy U, List JA (2006) Putting behavioral economics to work: Testing for gift exchange in labor markets using field experiments.Econometrica 74(5):1365–1384, ISSN 00129682, URLhttp://dx.doi.org/10.1111/j. 1468-0262.2006.00707.x. Gneezy U, Rustichini A (2000) Pay enough or don’t pay at all*. Quarterly Journal of Economics 115(3):791–810, ISSN 0033-5533, URLhttp://dx.doi.org/10.1162/003355300554917. Gomez T, Anaya YB, Shih KJ, Tarn DM (2021) A qualitative study of primary care physicians’ experiences with telemedicine during covid-19. Journal of the American Board of Family Medicine 34, ISSN 15587118, URL http://dx.doi.org/10.3122/JABFM.2021.S1.200517. Gonzalez C, Pic´ on MJ, Tom´ e M, Pujol I, Fern´ andez-Garc´ ıa JC, Chico A (2016) Expert study: Utility of an auto- mated bolus advisor system in patients with type 1 diabetes treated with multiple daily injections of insulin - A crossover study. Diabetes Technology and Therapeutics 18(5), ISSN 15578593, URL http://dx.doi. org/10.1089/dia.2015.0383. Green LV , Savin S, Savva N (2013) “nursevendor problem”: Personnel staffing in the presence of endogenous absen- teeism.ManagementScience 59(10):2237–2256, ISSN 00251909, URLhttp://dx.doi.org/10.1287/ mnsc.2013.1713. Gross TM, Kayne D, King A, Rother C, Juth S (2003) A bolus calculator is an effective means of controlling postpran- dial glycemia in patients on insulin pump therapy.DiabetesTechnologyandTherapeutics 5(3), ISSN 15209156, URLhttp://dx.doi.org/10.1089/152091503765691848. Grove WM, Zald DH, Lebow BS, Snitz BE, Nelson C (2000) Clinical versus mechanical prediction: a meta-analysis. Psychologicalassessment 12(1):19. Guajardo JA, Cohen MA (2018) Service differentiation and operating segments: A framework and an application to after-sales services. 
Manufacturing & Service Operations Management 20(3):440–454, ISSN 1526-5498, URL http://dx.doi.org/10.1287/msom.2017.0645. Guo N, Wang J, Ness P, Yao F, Dong X, Bi X, Mei H, Li J, He W, Lu Y , Ma H, et al (2011) Analysis of chinese donors’ return behavior. Transfusion 51(3), ISSN 00411132, URL http://dx.doi.org/10.1111/j. 1537-2995.2010.02876.x. Gurvich I, Whitt W (2010) Service-level differentiation in many-server service systems via queue-ratio routing.Oper- ations Research 58(2):316–328, ISSN 1526-5463, URL http://dx.doi.org/10.1287/opre.1090. 0736. Han L, Fine J, Robinson SM, Boyle AA, Freeman M, Scholtes S (2019) Is seniority of emergency physician as- sociated with the weekend mortality effect? an exploratory analysis of electronic health records in the uk. Emergency Medicine Journal 36(12):708–715, ISSN 14720213, URL http://dx.doi.org/10.1136/ emermed-2018-208114. Harrell E (2016) Managers shouldn’t fear algorithm-based decision making.HarvardBusinessReview . 108 Heinemann L (2018) Continuous glucose monitoring (cgm) or blood glucose monitoring (bgm): interactions and implications.Journalofdiabetesscienceandtechnology 12(4):873–879. Hertwig R, Barron G, Weber EU, Erev I (2006) The role of information sampling in risky choice.Informationsampling andadaptivecognition 72–91. Hu W, Meng H, Hu Q, Feng L, Qu X (2019) Blood donation from 2006 to 2015 in zhejiang province, china: An- nual consecutive cross-sectional studies.BMJOpen 9(5), ISSN 20446055, URLhttp://dx.doi.org/10. 1136/bmjopen-2018-023514. Ibanez MR, Clark JR, Huckman RS, Staats BR (2018) Discretionary task ordering: Queue management in radiological services.ManagementScience 64(9):4389–4407. Ibanez MR, Toffel MW (2020) How Scheduling Can Bias Quality Assessment: Evidence from Food-Safety Inspec- tions. Management Science 66(6):2396–2416, ISSN 0025-1909, URL http://dx.doi.org/10.1287/ mnsc.2019.3318. Ibrahim R, Kim SH, Tong J (2021) Eliciting human judgment for prediction algorithms. Management Science 67(4):2314–2325. Inman RR, Blumenfeld D, Ko A (2005) Cross-training hospital nurses to reduce staffing costs. Health Care Manage- mentReview 30(2):116–125, URLhttp://dx.doi.org/10.1097/00004010-200504000-00006. Izquierdo RE, Knudson PE, Meyer S, Kearns J, Ploutz-Snyder R, Weinstock RS (2003) A comparison of diabetes education administered through telemedicine versus in person. Diabetes Care ISSN 01495992, URL http: //dx.doi.org/10.2337/diacare.26.4.1002. Johnson ML, Martens TW, Criego AB, Carlson AL, Simonson GD, Bergenstal RM (2019) Utilizing the ambulatory glucose profile to standardize and implement continuous glucose monitoring in clinical practice.Diabetestech- nology&therapeutics 21(S2):S2–17. Juslin P, Winman A, Hansson P (2007) The na¨ ıve intuitive statistician: A na¨ ıve sampling model of intuitive confidence intervals.Psychologicalreview 114(3):678. Kamalahmadi M, Yu Q, Zhou YP (2021) Call to duty: Just-in-time scheduling in a restaurant chain. Management Science 67(11):6751–6781. Kamalzadeh H, Ahuja V , Hahsler M, Bowen ME (2021) An Analytics-Driven Approach for Optimal Individualized Diabetes Screening.ProductionandOperationsManagement ISSN 19375956, URLhttp://dx.doi.org/ 10.1111/poms.13422. Karlinsky-Shichor Y , Netzer O (2019) Automating the b2b salesperson pricing decisions: A human-machine hybrid approach.ColumbiaBusinessSchoolResearchPaperForthcoming . KC DS, Staats BR (2012) Accumulating a portfolio of experience: The effect of focal and related experience on surgeon performance. 
Manufacturing and Service Operations Management 14(4):618–633, ISSN 15234614, URLhttp://dx.doi.org/10.1287/msom.1120.0385. KC DS, Terwiesch C (2009) Impact of workload on service time and patient safety: An econometric analysis of hospital operations. Management Science 55(9):1486–1498, ISSN 00251909, URLhttp://dx.doi.org/ 10.1287/mnsc.1090.1037. Kesavan S, Kushwaha T (2020) Field experiment on the profit implications of merchants’ discretionary power to override data-driven decision-making tools.ManagementScience 66(11):5182–5190. Kesavan S, Staats BR, Gilland W (2014) V olume flexibility in services: The costs and benefits of flexible labor re- sources. Management Science 60(8):1884–1906, ISSN 15265501, URLhttp://dx.doi.org/10.1287/ mnsc.2013.1844. Kiil A, Houlberg K (2014) How does copayment for health care services affect demand, health and redistribution? a systematic review of the empirical evidence from 1990 to 2011. European Journal of Health Economics ISSN 16187601, URLhttp://dx.doi.org/10.1007/s10198-013-0526-8. Klonoff DC, Wang J, Rodbard D, Kohn MA, Li C, Liepmann D, Kerr D, Ahn D, Peters AL, Umpierrez GE, et al. (2022) A glycemia risk index (gri) of hypoglycemia and hyperglycemia for continuous glucose monitoring validated by clinician ratings.Journalofdiabetesscienceandtechnology 19322968221085273. Klupa T, Benbenek-Klupa T, Malecki M, Szalecki M, Sieradzki J (2008) Clinical usefulness of a bolus calculator in maintaining normoglycaemia in active professional patients with type 1 diabetes treated with continuous 109 subcutaneous insulin infusion.JournalofInternationalMedicalResearch 36(5), ISSN 03000605, URLhttp: //dx.doi.org/10.1177/147323000803600531. Konetzka RT, Stearns SC, Park J (2008) The staffing-outcomes relationship in nursing homes. Health Services Research 43(3):1025–1042, ISSN 00179124, URL http://dx.doi.org/10.1111/j.1475-6773. 2007.00803.x. Kraft T, Vald´ es L, Zheng Y (2018) Supply chain visibility and social responsibility: Investigating consumers’ behav- iors and motives. Manufacturing and Service Operations Management 20(4):617–636, ISSN 15265498, URL http://dx.doi.org/10.1287/msom.2017.0685. Kruger J (1999) Lake wobegon be gone! the” below-average effect” and the egocentric nature of comparative ability judgments.Journalofpersonalityandsocialpsychology 77(2):221. Kuntz L, Mennicken R, Scholtes S (2015) Stress on the ward: Evidence of safety tipping points in hospitals. Man- agement Science 61(4):754–771, ISSN 1526-5501, URLhttp://dx.doi.org/10.1287/mnsc.2014. 1917. Lacetera N, Macis M (2013) Time for blood: The effect of paid leave legislation on altruistic behavior. Journal of Law,Economics,andOrganization 29(6), ISSN 87566222, URLhttp://dx.doi.org/10.1093/jleo/ ews019. Lacetera N, Macis M, Slonim R (2012) Will there be blood? incentives and displacement effects in pro-social behavior. AmericanEconomicJournal: EconomicPolicy 4(1):186–223, ISSN 19457731, URLhttp://dx.doi.org/ 10.1257/pol.4.1.186. Lacetera N, Macis M, Slonim R (2014) Rewarding volunteers: A field experiment.ManagementScience 60(5):1107– 1129, ISSN 1526-5501, URLhttp://dx.doi.org/10.1287/mnsc.2013.1826. Lamberti DM, Wallace WA (1990) Intelligent interface design: An empirical assessment of knowledge presentation in expert systems. MIS Quarterly: Management Information Systems 14(3), ISSN 02767783, URL http: //dx.doi.org/10.2307/248891. Larrick RP, Soll JB (2006) Intuitions about combining opinions: Misappreciation of the averaging principle.Manage- mentscience 52(1):111–127. 
Lawrence K, Hanley K, Adams J, Sartori DJ, Greene R, Zabar S (2020) Building telemedicine capacity for trainees during the novel coronavirus outbreak: a case study and lessons learned. Journal of General Internal Medicine 35(9), ISSN 15251497, URLhttp://dx.doi.org/10.1007/s11606-020-05979-9. Lawrence M, Goodwin P, O’Connor M, ¨ Onkal D (2006) Judgmental forecasting: A review of progress over the last 25 years.InternationalJournalofforecasting 22(3):493–518. List JA (2011) The market for charitable giving. Journal of Economic Perspectives 25(2):157–180, ISSN 08953309, URLhttp://dx.doi.org/10.1257/jep.25.2.157. List JA, Price MK (2009) The role of social connections in charitable fundraising: Evidence from a natural field experiment. Journal of Economic Behavior and Organization 69(2):160–169, ISSN 01672681, URL http: //dx.doi.org/10.1016/j.jebo.2007.08.011. Logg JM, Minson JA, Moore DA (2019) Algorithm appreciation: People prefer algorithmic to human judgment. Or- ganizational Behavior and Human Decision Processes 151, ISSN 07495978, URLhttp://dx.doi.org/ 10.1016/j.obhdp.2018.12.005. Luo X, Tong S, Fang Z, Qu Z (2019) Frontiers: Machines vs. humans: The impact of artificial intelligence chatbot disclosure on customer purchases.MarketingScience 38(6):937–947. Madiedo JP, Chandrasekaran A, Salvador F (2020) Capturing the benefits of worker specialization: Effects of man- agerial and organizational task experience. Production and Operations Management 29(4):973–994, ISSN 19375956, URLhttp://dx.doi.org/10.1111/poms.13145. Mani V , Kesavan S, Swaminathan JM (2015) Estimating the impact of understaffing on sales and profitability in retail stores. Production and Operations Management 24(2):201–218, ISSN 19375956, URL http://dx.doi. org/10.1111/poms.12237. Maranto CL, Rodgers RC (1984) Does work experience increase productivity? a test of the on-the-job training hy- pothesis. The Journal of Human Resources 19(3):341, ISSN 0022166X, URL http://dx.doi.org/10. 2307/145877. 110 Marcus AD (2020) U.s. blood reserves are critically low. URL https://www.wsj.com/articles/ u-s-blood-reserves-are-critically-low-11591954200. Martinez KA, Rood M, Jhangiani N, Kou L, Rose S, Boissy A, Rothberg MB (2018) Patterns of use and correlates of patient satisfaction with a large nationwide direct to consumer telemedicine service.JournalofGeneralInternal Medicine 33(10), ISSN 15251497, URLhttp://dx.doi.org/10.1007/s11606-018-4621-5. Masser BM, White KM, Hyde MK, Terry DJ (2008) The psychology of blood donation: Current research and future di- rections. Transfusion Medicine Reviews 22(3):215–233, ISSN 08877963, URLhttp://dx.doi.org/10. 1016/j.tmrv.2008.02.005. Meehl PE (1954) Clinical versus statistical prediction: A theoretical analysis and a review of the evidence. . Meer J (2011) Brother, can you spare a dime? peer pressure in charitable solicitation. Journal of Public Eco- nomics 95(7–8):926–941, ISSN 00472727, URL http://dx.doi.org/10.1016/j.jpubeco.2010. 11.026. Mehrotra A, Ray K, Brockmeyer DM, Barnett ML, Bender JA (2020) Rapidly converting to “virtual practices”: Out- patient care in the era of covid-19.NEJMCatalyst 1(2):1–5, URLhttp://dx.doi.org/10.1056/CAT. 20.0091. Mellstr¨ om C, Johannesson M (2008) Crowding out in blood donation: Was titmuss right? Journal of the Eu- ropean Economic Association 6(4):845–863, ISSN 1542-4766, URL http://dx.doi.org/10.1162/ JEEA.2008.6.4.845. 
Milkman KL, Beshears J, Choi JJ, Laibson D, Madrian BC (2011) Using implementation intentions prompts to en- hance influenza vaccination rates.ProceedingsoftheNationalAcademyofSciences 108(26):10415–10420. Milkman KL, Beshears J, Choi JJ, Laibson D, Madrian BC (2013) Planning prompts as a means of increasing preven- tive screening rates.PreventiveMedicine 56(1):92. Mueller M, Knop M, Niehaves B, Adarkwah CC (2020) Investigating the acceptance of video consultation by patients in rural primary care: Empirical comparison of preusers and actual users.JMIRMedicalInformatics 8(10), ISSN 22919694, URLhttp://dx.doi.org/10.2196/20813. Murdoch WJ, Singh C, Kumbier K, Abbasi-Asl R, Yu B (2019) Definitions, methods, and applications in interpretable machine learning.ProceedingsoftheNationalAcademyofSciences 116(44):22071–22080. Narayanan S, Balasubramanian S, Swaminathan JM (2009) A matter of balance: Specialization, task variety, and individual learning in a software maintenance environment. Management Science 55(11):1861–1876, URL http://dx.doi.org/10.1287/mnsc.1090.1057. Needleman J, Buerhaus P, Mattke S, Stewart M, Zelevinsky K (2002) Nurse-staffing levels and the quality of care in hospitals. New England Journal of Medicine 346(22):1715–1722, ISSN 00284793, URLhttp://dx.doi. org/10.1056/NEJMsa012247. O Parsonson A, Grimison P, Boyer M, Horvath L, Mahon K, Beith J, Kao S, Hui M, Sutherland S, Kumar S, Heller G, et al (2021) Patient satisfaction with telehealth consultations in medical oncology clinics: A cross-sectional study at a metropolitan centre during the covid-19 pandemic.JournalofTelemedicineandTelecare ISSN 1357-633X, URLhttp://dx.doi.org/10.1177/1357633X211045586. Odeh B, Woo R Kayyali, Nabhani-Gebara S, Philip N, Robinson P, Wallace CR (2015) Evaluation of a telehealth ser- vice for copd and hf patients: Clinical outcome and patients’ perceptions.JournalofTelemedicineandTelecare 21(5), ISSN 17581109, URLhttp://dx.doi.org/10.1177/1357633X15574807. Ou-Yang J, Bei CH, He B, Rong X (2017) Factors influencing blood donation: a cross-sectional survey in guangzhou, china.TransfusionMedicine 27(4), ISSN 13653148, URLhttp://dx.doi.org/10.1111/tme.12410. Peden C, Mohan S, Pag´ ın V (2020) Telemedicine and covid-19: an observational study of rapid scale up in a us academic medical system. Journal of General Internal Medicine 19–21, ISSN 15251497, URLhttp://dx. doi.org/10.1007/s11606-020-05917-9. Pisano GP, Bohmer RM, Edmondson AC (2001) Organizational differences in rates of learning: Evidence from the adoption of minimally invasive cardiac surgery. Management Science 47(6):752–768, ISSN 00251909, URL http://dx.doi.org/10.1287/mnsc.47.6.752.9811. Polinksi JM, Barker T, Gagliano N, Sussman A, Brennan TA, Shrank WH (2016) Patients’ satisfaction with and preference for telehealth visits. Journal of General Internal Medicine ISSN 15251497, URL http://dx. doi.org/10.1007/s11606-015-3489-x. 111 Powell A, Savin S, Savva N (2012) Physician workload and hospital reimbursement: Overworked physicians generate less revenue per patient. Manufacturing & Service Operations Management 14(4):512–528, ISSN 1526-5498, URLhttp://dx.doi.org/10.1287/msom.1120.0384. Powell RE, Henstenburg JM, Cooper G, Hollander JE, Rising KL (2017) Patient perceptions of telehealth primary care video visits. Annals of Family Medicine ISSN 15441717, URL http://dx.doi.org/10.1370/afm. 2095. 
Asset Metadata
Creator
Lin, Wilson Wai-Gin (author)
Core Title
Essays on improving human interactions with humans, algorithms, and technologies for better healthcare outcomes
School
Marshall School of Business
Degree
Doctor of Philosophy
Degree Program
Business Administration
Degree Conferral Date
2022-08
Publication Date
07/19/2024
Defense Date
05/11/2022
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
behavioral operations management, blood donation, charitable giving, charitable operations, empirical operations management, family medicine, healthcare management, healthcare operations management, human-algorithm interactions, non-profit operations, OAI-PMH Harvest, patient perception, physician-patient relations, Public Health, telemedicine, type 1 diabetes
Format
application/pdf (imt)
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Rajagopalan, Sampath (committee chair), Kim, Song-Hee (committee member), Liu, Yan (committee member), Sun, Tianshu (committee member)
Creator Email
wilson.wl.lin@gmail.com, wilsonli@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-oUC111373336
Unique identifier
UC111373336
Legacy Identifier
etd-LinWilsonW-10859
Document Type
Dissertation
Rights
Lin, Wilson Wai-Gin
Type
texts
Source
20220719-usctheses-batch-956 (batch), University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright. The original signature page accompanying the original submission of the work to the USC Libraries is retained by the USC Libraries and a copy of it may be obtained by authorized requesters contacting the repository e-mail address given.
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email
cisadmin@lib.usc.edu