Towards Trustworthy and Data-Driven Social Interventions

by Aida Rahmattalabi

A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, in Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (COMPUTER SCIENCE)

August 2022

Copyright 2022 Aida Rahmattalabi

Dedication

To my beloved Amin, my dear parents, and brother.

Acknowledgements

I would like to express my deepest gratitude to my advisors, Dr. Phebe Vayanos and Dr. Milind Tambe, for their constant support and guidance through the course of my PhD. Thank you so much, Phebe, for taking me under your guidance. Your investment in my professional and personal development has been truly remarkable and I cannot thank you enough for it. Milind, I am very grateful for having you as my mentor. You inspired me to always reach for the best and persevere until I achieve it. Thank you for trusting in me and helping me find my path.

This thesis is the result of close collaboration with other amazing faculty members, researchers and experts inside and outside of USC. I would like to especially thank Dr. Eric Rice for his unwavering support and guidance throughout these years. Eric, I am constantly amazed by the depth of your knowledge and dedication to community-driven research. It has been an absolute pleasure to work with you and learn from you. I would also like to thank my other thesis committee members, Dr. Cyrus Shahabi and Dr. Bistra Dilkina. Thank you so much for your invaluable feedback and mentorship over the years. I also thank Dr. Barman-Adhikari, Dr. Ryan Brown, Dr. Anthony Fulginiti, Dr. David Gray Grant, Daniel Ho (J.D., Ph.D.), Maxwell Izenberg, Dr. Shahin Jabbari, Dr. Ece Kamar, Dr. Christopher D. Kiekintveld, Dr. Hima Lakkaraju and Alice Xiang (J.D.), whom I had the opportunity to work with and learn from. In the last six years, I have had the chance to meet and work with so many great people at USC CAIS.
In particular, I would like to thank Sina Aghaei, Elizabeth Bondi, Sarah Cooney, Aaron Ferber, Ben Ford, Shahrzad Gholami, Tye Hines, Qing Jin, Caroline Johnston, Nathan Justin, Debarun Kar, Jackson Killian, Laksh Matai, Aditya Mate, Han Ching Ou, Caleb Robinson, Aaron Schlenker, Bill Tang, Omkar Thakoor, Kai Wang, Bryan Wilder, Hailey Winetrobe, Lily Xu, Amulya Yadav, Yingxiao Ye and Han Yu. It was absolutely amazing to get to know each and every one of you. I have also been blessed with many great friends who have created a home for me and been there by my side when I most needed it. In particular, I thank Niloufar Zarei, who is more than just a friend to me. I thank Haleh Akrami, Arash Fayazi and Zohre Azizi for all the fun days and nights together. I also thank Arman Massahi for being my sports inspiration (and teaching me how to play tennis properly!), Nripsuta Saxena for being such a fun spirit, and Ehsan EmamJomeZade, Nami Mogharabin and Shiva Navabi for all the thought-provoking (and fun) conversations. I would also like to especially thank my dear Amin for his love, support and contagious positive attitude. Most importantly, I would like to thank my parents and brother for their encouragement and support throughout my life. I am who I am today because of you.

Funding

My PhD work was supported in part by the Smart and Connected Communities program of the National Science Foundation under NSF award No. 1831770 and the US Army Research Office under grant number W911NF1710445.

Table of Contents

Dedication
Acknowledgements
List of Tables
List of Figures
Abstract
Introduction

I Fairness in Social Network-Based Interventions

Chapter 1: Robust and Fair Graph Covering
  1.1 Introduction
  1.2 Related Work
  1.3 Fair and Robust Graph Covering Problem
  1.4 Price of Group Fairness
    1.4.1 Deterministic Case
    1.4.2 Uncertain Case
  1.5 Solution Approach
    1.5.1 Equivalent Reformulation
    1.5.2 Bender's Decomposition
  1.6 Results on Social Networks of Homeless Youth
  1.7 Conclusion and Broader Impact

Chapter 2: Fair Influence Maximization via Welfare Optimization
  2.1 Introduction
  2.2 Related Work
  2.3 Problem Formulation
  2.4 Existing Notions of Fairness
  2.5 Fair Influence Maximization
    2.5.1 Cardinal Welfare Theory Background
    2.5.2 Group Fairness and New Principles
    2.5.3 Group Fairness and Welfare Maximization
    2.5.4 Connection to Existing Notions of Fairness
  2.6 Computational Results
  2.7 Conclusion and Broader Impact

II Algorithmic Fairness under Observational Data

Chapter 3: Fair and Efficient Housing Allocation Policy Design
  3.1 Introduction
  3.2 Related Work
  3.3 Housing Allocation as a Queuing System
    3.3.1 Preliminaries
    3.3.2 Matching Policy
    3.3.3 Policy Optimization
    3.3.4 Optimization Formulation
  3.4 Solution Approach
    3.4.1 Assumptions
    3.4.2 Building the Partitioning Function
  3.5 Computational Results
    3.5.1 Synthetic Experiments
    3.5.2 HMIS Data of Youth Experiencing Homelessness
    3.5.3 Data Pre-Processing and Estimation
    3.5.4 Policy Optimization Results
  3.6 Conclusion and Broader Impact

Chapter 4: Causal Inference for Ethical Decision-Making
  4.1 Introduction
  4.2 Related Work
  4.3 Causal Fairness: A Potential Outcomes Perspective
    4.3.1 Causal Assumptions for Identification
    4.3.2 Fairness Evaluation
    4.3.3 Unfairness Mitigation
  4.4 Trade-offs under the Lens of Causality
    4.4.1 Causal Fairness Definitions
    4.4.2 Trade-offs among Causal Criteria of Fairness
  4.5 Computational Results
  4.6 Conclusion and Broader Impact

Conclusions
Bibliography
Appendices

Appendix Chapter A: Technical Appendix to Chapter 1
  A.1 Experimental Results in Section 1.6
  A.2 Proof of Statements in Section 1.3
  A.3 Proofs of Statements in Section 1.4
    A.3.1 Worst-Case PoF
    A.3.2 Supporting Results for the PoF Derivation
    A.3.3 PoF in the Deterministic Case
    A.3.4 PoF in the Robust Case
  A.4 Proofs of Statements in Section 1.5
    A.4.1 Equivalent Reformulation as a Max-Min-Max Optimization
    A.4.2 Exact MILP Formulation of the K-Adaptability Problem
  A.5 Bender's Decomposition

Appendix Chapter B: Technical Appendix to Chapter 2
  B.1 Omitted Proofs from Section 2.5.2
  B.2 Leximin Fairness and Social Welfare
  B.3 Omitted Proofs from Table 2.1
    B.3.1 Monotonicity
    B.3.2 Symmetry
    B.3.3 Independence of Unconcerned Individuals
    B.3.4 Affine Invariance
    B.3.5 Influence Transfer Principle
    B.3.6 Utility Gap Reduction
  B.4 Omitted Details from Section 2.6
    B.4.1 Estimating the SBM Parameters for Landslide Risk Management
    B.4.2 Relative Community Sizes
    B.4.3 Suicide Prevention Application

Appendix Chapter C: Technical Appendix to Chapter 3
  C.1 Proof of Proposition 8
  C.2 Proof of Proposition 9
  C.3 Computational Results

List of Tables

1.1 Racial discrimination in node coverage resulting from applying the algorithm in [176] on real-world social networks from two homeless drop-in centers in Los Angeles, CA [17], when 1/3 of nodes (individuals) can be selected as monitors, out of which at most 10% will fail. The numbers correspond to the worst-case percentage of covered nodes across all monitor availability scenarios.

1.2 Improvement on the worst-case coverage of the worse-off group and associated PoF for each of the five real-world social networks from Table 1.1. The first five rows correspond to the setting I = N/3. In the interest of space, we only show averages for the settings I = N/5 and I = N/7. In the deterministic case (J = 0), the PoF is measured relative to the coverage of the true optimal solution (obtained by solving the integer programming formulation of the graph covering problem). In the uncertain case (J > 0), the PoF is measured relative to the coverage of the greedy heuristic of [176].

2.1 Summary of the properties of different fairness notions through the lens of welfare principles for influence maximization.

3.1 Out-of-sample estimated policy performance measured in terms of rates of stable exit from homelessness and wait times.

4.1 Fairness violation of statistical criteria and the classification accuracy.

4.2 Fairness violation of causal criteria.

A.1 Racial composition (%) of the social networks considered after preprocessing.

A.2 Values of W output by our search procedure and used in the experiments associated with Table 1.2.

A.3 Reduction in racial discrimination in node coverage resulting from applying our proposed algorithm relative to that of [176] on the five real-world social networks from Table A.1, when 1/3 of nodes (individuals) can be selected as monitors, out of which at most 10% may fail. The numbers correspond to the worst-case percentage of covered nodes across all monitor availability scenarios. The numbers in the parentheses are solutions to the state-of-the-art algorithm [176] (same numbers as in Table 1.1).

A.4 Companion figure to Lemma 2. The figures illustrate a network sequence {G_N} (N ≥ 5) parameterized by N and consisting of two disconnected clusters: a small and a large one, with 4 and N−4 nodes, respectively. The small cluster remains intact as N grows. The nodes in the large cluster form a clique. In the figures, each color (white, grey, black) represents a different group and we investigate the price of imposing fairness across these groups. The subfigures show the original graph (a) and an optimal solution when I = 2 monitors can be selected in the cases (b) when fairness constraints are not imposed and (c) when fairness constraints are imposed, respectively. It holds that OPT_fair(G_N, 2, 0) = 4 and OPT(G_N, 2, 0) = N−3, so that the PoF in G_N converges to one as N tends to infinity.

B.1 Racial composition (%) after pre-processing as well as the number of vertices and edges of the social networks [17].

B.2 Summary of the utility gap and PoF results averaged over 6 different real-world social networks for various budgets, fairness approaches and baselines. Numbers in bold highlight the best values in each setting (row) across different approaches.

C.1 Prediction accuracy for propensity estimation using HMIS data.

C.2 Propensity calibration within group for PSH (left) and RRH (right) of the random forest model. None of the coefficients of the demographic attributes are found to be significant. In addition, the coefficient associated with the predicted probability is close to 1 in both models, suggesting that the model is well-calibrated even when we control for the demographic attributes.

C.3 Out-of-sample accuracy (%) of different outcome estimation models (outcome definition in Figure 3.4).

C.4 Outcome calibration of the logistic regression model within group under PSH, RRH and SO. None of the coefficients of the demographic attributes are found to be significant. In addition, the coefficient associated with the predicted probability is close to 1 in both models, suggesting that the model is well-calibrated even when we control for the demographic attributes.

List of Figures

1 Project collaborators at RAND Corporation gather comments at the Sitka Sound Science Center in Sitka, Alaska, on our research on landslide preparedness.

1.1 PoF in the uncertain (top) and deterministic (bottom) settings for SBM networks consisting of two communities (C = {1, 2}) where the size of the first community is fixed at |N_1| = 20 and the size of the other community is increased from |N_2| = 20 to 10,000. In the uncertain setting, γ denotes the fraction of nodes that fail.

1.2 Left figure: solution quality (overall worst-case coverage versus worst-case coverage of the group that is worse off) for each approach (DC, Greedy, and K-adaptability for K = 1, 2, 3). The points represent the results of each approach applied to each of the five real-world social networks from Table 1.1; each shaded area corresponds to the convex hull of the results associated with each approach; approaches that are more fair (resp. efficient) are situated in the right- (resp. top-)most part of the graph. Right figure: average ratio of the objective value of the master problem to the network size (across the five instances) as a function of solver time for the Bender's decomposition approach (dotted line) and the Bender's decomposition approach augmented with symmetry-breaking constraints (solid line). For both sets of experiments, the setting was I = N/3 and J = 3.

2.1 The effect of network structure, and in particular between-community edges, on the coupling of the utilities of communities. The figure shows two sample networks consisting of three communities, differentiated by shape: (a) is the same as (b) except that between-community edges are removed. Black fillings show the choice of influencers. We further assume p is small enough such that influence spread dissipates after one step. Transferring an influencer from circles to squares (top to bottom panel) affects the utility of diamonds in (b) but not in (a).

2.2 Left and right panels: utility gap and PoF for different K and α values for our framework and baselines.

2.3 PoF vs. utility gap trade-off curves. Each line corresponds to a different budget K across different α values.

2.4 Utility gap and PoF for various levels of q_3. All results are compared across different values of α and the baselines.

3.1 NST-recommended resource allocation policy utilized by housing allocation agencies in the homelessness context. The policy is in the form of a resource eligibility structure. According to this figure, individuals with score eight and above qualify for PSH, scores 4 to 7 are assigned to the RRH wait list, and finally individuals who score below 4 are not assigned to any of the housing interventions.

3.2 Example partitioning by sample causal trees for PSH and RRH interventions.

3.3 Synthetic data experiments: policy value vs. the minimum propensity weight (left) and policy value vs. the number of queues (right). Each line corresponds to a different estimator.

3.4 HMIS data: success definition flow chart (left) and heterogeneous treatment effect using the DR method (right).

3.5 Out-of-sample rates of exit from homelessness by race (left panel) and age (right panel) using the DR estimator.

3.6 Optimal topology.

3.7 Matching topology split by resource type: left (SO), middle (RRH) and right (PSH). Individuals are divided into four different score groups: S < 6, S = {6, 7}, S = {8, 9}, S > 9. Queues are constructed based on score groups and race jointly. Solid lines indicate that a resource is connected to the entire score group (a collection of queues). Dotted lines indicate connection to a single queue within the score group. For example, in the left figure, SO is only connected to the individuals with S = {6, 7} and race White.

3.8 Fair topology (race).

4.1 Decision-making timeline: the time when one's sensitive attribute A is perceived determines pre- and post-treatment variables. Here, X is the vector of pre-treatment variables, X̃ is a post-treatment variable, and Y is the outcome or decision.

4.2 Synthetic results in the hiring scenario. Colors denote the evaluation method: causal pre-interview, causal post-interview and statistical. From top to bottom, each row corresponds to a different value of α ∈ {−0.5, 0, 0.5}. Columns are different fairness evaluation criteria. On the x-axis, we vary the value of β, which reflects the dependence of the interview score on one's gender. The y-axis shows fairness violation across four different definitions. We note that for causal approaches we use the causal variants of the fairness definitions. The value of γ is set to 0.2. The error bars show the 95% confidence interval. Depending on the joint setting of the parameters, statistical criteria may erroneously result in an over- or under-estimation of fairness violation. Further, post-interview fairness evaluation does not capture discrimination at earlier points in time.

B.1 An illustration of the graph used in the proof of Proposition 5 without the correct scaling. There are three communities (circle, square and diamond) and they all have size 100. The circle community consists of an "all-circle" star structure with 80 vertices, 14 isolated vertices and a mixed star structure (shared with the diamond community) with 6 circle vertices. The square community consists of two "all-square" star structures with sizes 60 and 10 plus a set of 30 isolated vertices. The diamond community consists of an "all-diamond" star structure with 30 vertices, 66 isolated vertices and a mixed star structure (shared with the circle community) with 4 diamond vertices.

B.2 The difference W_α(u) − W_α(u′) on the vertical axis versus α on the horizontal axis for different welfare functions (this difference is scaled by a factor of 10^−24 on the bottom panel). Top panel: W_α(u) = Σ_{c∈C} N_c u_c^α / α for α ∈ (0, 1); bottom panel: W_α(u) = Σ_{c∈C} N_c u_c^α / α for α < 0.

B.3 Companion figure to Proposition 14. The network consists of two communities, circle and square, each of size N.

B.4 Companion figure to Proposition 15. The network consists of two communities, circle and square, each of size N. All edges except the two shown by arrows are undirected, meaning that influence can spread both ways.

B.5 Companion figure to Proposition 17. The network consists of two communities, circle and square, each of size N.

B.6 Companion figure to Proposition 25 of a graph with two communities: N black vertices and N/3 white vertices for N = 9. We choose K = 4 and arbitrary p < 1. All edges are undirected, meaning that influence can spread both ways.

B.7 Companion figure to Proposition 26 for the case of p = 1. The network consists of three groups: white, blue and black. The edges are undirected so the influence can spread both ways. For arbitrary p, the number of isolated black vertices should scale as ⌈21/p⌉.

B.8 Utility gap and PoF for various relative community sizes where the ratio changes from 1 to 9.

B.9 Top and bottom panels: utility gap and PoF for each real-world network instance (K = 30).

C.1 Probability of exiting homelessness across the NST score range estimated using the DR method.

C.2 Reliability diagram of propensity estimation, RRH (top) and PSH (bottom).

C.3 Reliability diagram of outcome, SO (top), RRH (middle) and PSH (bottom).

C.4 The matching topology split by resource type: left (SO), middle (RRH) and right (PSH). The solid line indicates that the resource is connected to the entire queue. The dotted line indicates connection to a sub-group within the queue, e.g., SO is only connected to the individuals with NST = 6 and age > 17.

C.5 Fair topology (age).

Abstract

This thesis examines social interventions conducted to address societal challenges such as homelessness, substance abuse or suicide. In most of these applications, it is challenging to purposefully collect data. Hence, we need to rely on social data (e.g., social network data) or observational data (e.g., administrative data) to guide our decisions. Problematically, these datasets are prone to different statistical or societal biases. When optimized and evaluated on these data, ostensibly impartial algorithms may result in disparate impacts across different groups. In addition, these domains are plagued by limited resources and/or limited data, which creates a computational challenge with respect to improving the delivery of these interventions. In this thesis, I investigate the interplay of fairness and these computational challenges, which I present in two parts. In the first part, I introduce the problem of fairness in social network-based interventions, where I propose to use social network data to enhance interventions that rely on individuals' social connectedness, such as HIV/suicide prevention or community preparedness against natural disasters.
I demonstrate how biases in the social network can manifest as disparate outcomes across groups and describe my approach to mitigating such unfairness. In the second part, I focus on fairness challenges when data is observational. Motivated by the homelessness crisis in the U.S., I study the problem of learning fair resource allocation policies using observational data, where I develop a methodology that handles selection bias in the data. I conclude with a critique of the fairness metrics proposed in the literature, both causal and observational (statistical). I present a novel causal view that addresses the shortcomings of existing approaches and sheds new light on well-known impossibility results from the fair machine learning literature.

Introduction

Societies around the globe have long struggled with complex societal problems in the areas of social justice and welfare, education or health that disproportionately impact the most vulnerable. Over the years, researchers, practitioners and policymakers have examined a variety of interventions to address these social problems, which I refer to as "social interventions." Fueled by recent algorithmic advances, there has been increasing interest in developing evidence-based, AI-augmented social interventions that have greater reach and impact and are tailored to the needs of affected communities. In particular, this thesis investigates three social problems that are critical in the current state of our society and aims to develop trustworthy and data-driven algorithmic solutions to address them. First is suicide prevention. Suicide is a critical public health problem in the United States, especially among youth populations such as college students, where suicide takes more than 1,000 lives each year [9]. In this regard, the present thesis studies how we can leverage individuals' social support to mitigate the risk of suicidal ideation and death.
Another application of this thesis is landslide risk management. In particular, Sitka, Alaska experiences frequent landslide incidents, which cause significant damage and disruption to the lives of those affected. Effective risk management depends heavily on timely and reliable access to risk information [144]. In this thesis, I study how we can use social influence to create resilient and informed communities that can protect themselves against landslides. Finally, this thesis investigates solutions to mitigate homelessness. Cities with large homeless populations often suffer from a shortage of resources to address this problem. For instance, in Los Angeles County, there are over 63,000 homeless individuals and far fewer housing units to accommodate them. Furthermore, there is a significant disparity in the rate of homelessness across different racial groups, hitting those from minority groups the hardest [78]. To address these problems, this thesis explores equitable and data-driven policies to help match individuals with suitable resources in order to guarantee a high chance of a safe and stable exit from homelessness.

Figure 1: Project collaborators at RAND Corporation gather comments at the Sitka Sound Science Center in Sitka, Alaska, on our research on landslide preparedness.

The problems studied in this thesis were identified through close collaborations with social scientists at RAND Corporation, a nonprofit global policy think tank, and social work academics who specialize in social network science and community-based research. Despite the wealth of knowledge in fields such as public health, public policy or social work on the underlying social phenomena, transferring that knowledge into practical computational models of interventions is non-trivial. Further, these interventions often give rise to highly intractable models which are difficult to optimize.
In addition, designing algorithmic solutions in such complex real-world settings is faced with several unique challenges. Below, I highlight some of the challenges that are central to this thesis.

Fairness: Social problems do not affect all groups equally. For instance, most minority groups in the United States experience homelessness at higher rates than Whites, and therefore make up a disproportionate share of the homeless population. African Americans make up 13% of the general population, but more than 40% of the homeless population. Further, in Los Angeles County, recent studies have found racial inequities in outcomes for Black residents of homeless services, particularly residents of Permanent Supportive Housing, a long-term housing intervention, where Black residents are 39% more likely to return to homelessness than White residents [128]. Similar disparities exist in other areas such as risk of suicide, where studies have identified evidence of widening gaps in the rate of suicide across sex, sexual orientation, race/ethnicity, age and socioeconomic status subgroups among college students [91, 119]. As algorithms enter such socially sensitive domains, it is critical that they take the welfare of every group and individual into consideration and strive towards equitable outcomes for all.

Data Bias: Data is central to modern decision-making. However, in these settings controlled experiments are typically costly. As a result, most of the available data comes from passive observations, which are prone to different forms of bias. For example, recent studies have identified structural racism as one of the main factors behind the high rate of homelessness among Black people [12]. Such societal biases will inevitably creep into the data that is used to inform the interventions, which is problematic as it may result in algorithms that discriminate against certain individuals or groups, entrenching existing inequalities. There are also naturally occurring biases.
For example, it has been shown that individuals have a tendency to associate and bond with similar others, a phenomenon known as homophily [125]. While these natural biases are not intrinsically objectionable, care must be taken when using this type of data for various decision-making tasks as it may lead to undesirable disparities in the outcomes of different individuals or groups.

Resource Limitation: Designing social interventions typically involves the allocation of scarce resources, e.g., limited housing units or social worker hours. In Los Angeles County, there are over 63,000 homeless persons and only 21,000 housing units, most of which are temporary housing assistance. In such settings, providers often need to make complex decisions under great uncertainty with small margins of error. This may lead to fraught decisions that either under- or over-serve specific groups. Resource limitation further compounds fairness challenges as it raises the question of who should or should not receive these resources.

Data Scarcity: Real-world settings are permeated by different forms of uncertainty, e.g., the unknown availability of intervention participants, that may negatively affect the outcome of these interventions. In practice, there may not be enough data to inform those uncertainties.

This thesis is concerned with tackling the above challenges, with a special emphasis placed on the issue of fairness and its interplay with resource and data limitations. Specifically, the overarching question that the present thesis aims to address is: How can we develop fair, efficient and data-driven algorithms to enhance social interventions? I investigate this question within the context of the aforementioned social problems, namely suicide prevention, landslide risk management and mitigating homelessness, where I propose computational models of popular interventions as well as equitable and efficient algorithmic solutions.
It is noteworthy that the proposed intervention models are not restricted to the above applications and can be generalized to address other problems that share the same underlying characteristics.

Overview of Contributions

This thesis is divided into two parts. The first part introduces the computational problem of fairness in social network-based interventions, i.e., interventions that rely on social support and individuals' connectedness to succeed, such as suicide prevention and community resilience for landslide risk management. This thesis draws on material published in [76, 98, 148, 149, 150, 151, 152, 154].

Chapter 1 focuses on suicide prevention. Gatekeeper training is a widely used suicide prevention intervention which involves teaching individuals to recognize and support those in crisis. A successful intervention seeks to achieve good coverage of the individuals in a social network (e.g., a student population). Targeted enlistment of individuals helps achieve more desirable coverage than baseline strategies [77]. However, the performance is significantly affected by uncertainty in the availability and performance of the training candidates. In collaboration with the schools of social work at the University of Denver and the University of Southern California, this work proposes a novel intervention model to select a limited number of individuals, with uncertain performance, to identify warning signs of suicide among their peers in a social network. Using social networks of youth experiencing homelessness, this work demonstrates how purely coverage-centric algorithms, such as those introduced in [42, 114, 176], may result in discriminatory coverage across different social groups. Devising efficient solutions that perform well across groups, even under worst-case uncertain scenarios, also poses a highly intractable problem.
Chapter 1 addresses this problem by providing a novel formulation of the problem as a robust graph covering problem with group fairness constraints. The solution approach is in the form of a tractable approximation applicable to real-world instances. In addition, this work provides a theoretical analysis of the price of group fairness (PoF), with and without uncertainty. Specifically, it shows that uncertainty can lead to a greater PoF compared to the deterministic case, which highlights the trade-off between fairness and robustness. Empirically, the proposed method yields competitive node coverage while significantly improving group fairness over state-of-the-art methods.

Chapter 2 investigates interventions for community preparedness against natural hazards such as landslide risk. In collaboration with scientists at RAND Corporation and the Sitka Sound Science Center, we identified a major challenge associated with landslide risk management: timely and reliable access to risk information. Community-based interventions can improve risk communication and access to information, particularly in rural and remote contexts. These interventions often seek to engage and educate a limited set of individuals who can act as community leaders to spread information to others. Algorithmic influence maximization can aid with the choice of "peer leaders" or "influencers" in such interventions. Existing techniques for fair influence maximization require committing to a single fairness measure, or impose fairness as strict constraints, leading to undesirable properties such as wastage of resources [171, 175]. Chapter 2 revisits the problem of fairness in influence maximization from a welfare optimization perspective. It provides a principled characterization of the properties that a fair influence maximization algorithm should satisfy. As a result, it proposes a framework that aggregates the cardinal utilities derived by each community using isoelastic social welfare functions.
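To make the welfare-based aggregation idea concrete, the following sketch (my own simplified illustration, not the thesis implementation) evaluates an isoelastic social welfare function over per-community utilities. A single inequality-aversion parameter alpha controls the fairness/efficiency trade-off: alpha = 0 recovers the utilitarian sum, alpha = 1 gives logarithmic utilities (the proportional-fairness/Nash case), and large alpha increasingly favors the worst-off community, approaching leximin.

```python
import math

def isoelastic_welfare(utilities, alpha):
    """Aggregate positive per-community utilities with inequality aversion alpha.

    alpha = 0 is the utilitarian sum; alpha = 1 sums log-utilities
    (proportional fairness); larger alpha penalizes inequality more.
    """
    if alpha == 1.0:
        return sum(math.log(u) for u in utilities)
    return sum(u ** (1.0 - alpha) / (1.0 - alpha) for u in utilities)

# Two allocations with the same total utility; higher alpha prefers the equal one.
equal, unequal = [5.0, 5.0], [9.0, 1.0]
assert isoelastic_welfare(equal, 0.0) == isoelastic_welfare(unequal, 0.0)  # utilitarian tie
assert isoelastic_welfare(equal, 1.0) > isoelastic_welfare(unequal, 1.0)   # proportional
assert isoelastic_welfare(equal, 2.0) > isoelastic_welfare(unequal, 2.0)   # more averse
```

The utilitarian objective cannot distinguish the two allocations, while any alpha > 0 strictly prefers the equal one, which is the lever the chapter uses to interpolate between efficiency and fairness.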
Under this framework, the trade-off between fairness and efficiency can be controlled by a single inequality-aversion design parameter, which is crucial especially when these solutions are deployed at scale. In addition, the proposed framework encompasses leximin and proportional fairness as special cases. It is further shown that the resulting optimization problem is monotone and submodular and can be solved efficiently with optimality guarantees. Extensive experiments on synthetic and real-world datasets, including a case study on landslide risk management, demonstrate the efficacy of the proposed framework.

The second part of this thesis focuses on challenges that arise when data is observational. In this setting, a decision-maker has to rely on passive data observations prone to selection bias. In particular, selection bias occurs when the assignment of individuals to different groups is not completely at random. For instance, individuals who have been exposed to a certain treatment may be systematically different from those who have been assigned to a control group. Similarly, individuals' sensitive attributes may be correlated with other risk factors important for decision-making. One can view this as a selection bias, as individuals in different sensitive groups have different underlying risk distributions. Selection bias poses unique challenges for designing data-driven interventions as well as for unfairness evaluation, which I explore in Chapters 3 and 4.

Chapter 3 focuses on the problem of mitigating homelessness. Homeless services authorities commonly consider housing as a key solution to homelessness [92]. Despite different government funding programs and services, the number of homeless individuals in the U.S. surpasses the available resources, which necessitates strategic allocations to maximize the intervention's effectiveness.
A natural, or rather complex, objective for housing allocation is to optimize the expected number of people exiting homelessness from different social groups (e.g., racial groups). However, the treatment effects of different interventions are unknown and heterogeneous. In other words, the likelihood of a successful outcome depends on the joint characteristics of the resource and the individual, which is unknown to the decision-maker and should be estimated from data. Historical data, on the other hand, suffers from selection bias, which poses a challenge for evaluating and optimizing policies that perform well across different protected groups. To address this problem, this work proposes a computational model to match heterogeneous individuals and resources that arrive stochastically over time. Each individual, upon arrival, is assigned to a queue where they wait to be matched to a resource. The resources are assigned in a first-come-first-served (FCFS) fashion according to an eligibility structure that encodes the resource types that serve each queue. This work provides a methodology based on techniques in modern causal inference to construct the individual queues as well as learn the matching outcomes, and provides a mixed-integer optimization (MIO) formulation to optimize the eligibility structure. The MIO problem maximizes the policy outcome subject to wait time and fairness constraints. It is very flexible, allowing for additional linear domain constraints. Empirical results using data from the U.S. Homeless Management Information System (HMIS) show wait times as low as those of an FCFS policy while improving the rate of exit from homelessness for underserved or vulnerable groups (7% higher for Black individuals and 15% higher for those below 17 years old).

Finally, Chapter 4 studies unfairness evaluation and mitigation in more generic decision-making applications.
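The FCFS eligibility-structure matching that Chapter 3 builds on can be sketched in a few lines. The toy example below is my own illustration (queue names and resource types are invented, not from HMIS): individuals wait in queues, and each arriving resource serves the longest-waiting individual among the queues its type is eligible for.

```python
from collections import deque

def fcfs_match(arrivals, resources, eligibility):
    """arrivals: list of (time, queue_id), sorted by time.
    resources: list of (time, resource_type), sorted by time.
    eligibility: resource_type -> set of queue_ids that type may serve.
    Returns a list of (individual_index, resource_index) matches."""
    queues, matches, ai = {}, [], 0
    for ri, (rtime, rtype) in enumerate(resources):
        # Enqueue everyone who has arrived by the time the resource arrives.
        while ai < len(arrivals) and arrivals[ai][0] <= rtime:
            queues.setdefault(arrivals[ai][1], deque()).append(ai)
            ai += 1
        # Serve the longest-waiting individual among the eligible, non-empty
        # queues (arrival index is a proxy for arrival order, hence FCFS).
        eligible = [q for q in eligibility[rtype] if queues.get(q)]
        if eligible:
            q = min(eligible, key=lambda q: queues[q][0])
            matches.append((queues[q].popleft(), ri))
    return matches

arrivals = [(0, 'low_need'), (1, 'high_need')]
resources = [(2, 'PSH'), (3, 'RRH')]
eligibility = {'PSH': {'high_need'}, 'RRH': {'low_need', 'high_need'}}
assert fcfs_match(arrivals, resources, eligibility) == [(1, 0), (0, 1)]
```

The `eligibility` dictionary plays the role of the eligibility structure: the chapter's MIO chooses which entries it contains, while the matching rule itself stays FCFS.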
In recent years, there has been increasing interest in causal reasoning for designing fair decision-making systems due to its compatibility with legal frameworks, interpretability for human stakeholders, and robustness to spurious correlations inherent in observational data, among other factors. The recent attention to causal fairness, however, has been accompanied by great skepticism due to practical and epistemological challenges with applying current causal fairness approaches in the literature. Motivated by the long-standing empirical work on causality in econometrics, social sciences, and biomedical sciences, this work lays out the conditions for appropriate application of causal fairness under the "potential outcomes framework." Specifically, it highlights key aspects of causal inference that are often ignored in the causal fairness literature, namely the importance of specifying the nature and timing of interventions on social categories such as race or gender. Precisely, instead of postulating an intervention on immutable attributes, this work proposes a shift in focus to their perceptions and discusses the implications for fairness evaluation. Such a conceptualization of the intervention is key to evaluating the validity of causal assumptions and conducting sound causal analyses, including avoiding post-treatment bias (a form of bias due to variables that have materialized after one's sensitive attribute is observed). Sound application of causal fairness can further address the limitations of existing fairness metrics, including those that depend upon statistical correlations. Specifically, I introduce causal variants of common statistical notions of fairness, and make a novel observation that under the causal framework there is no fundamental disagreement between different notions of fairness.
Finally, extensive experiments demonstrate the effectiveness of the proposed approach for evaluating and mitigating unfairness, especially when post-treatment variables are present.

Related Work

Interest in the fairness properties of algorithms can be broadly categorized into two themes: fairness in prediction and fairness in decision. In recent years, there has been an explosion of research focusing on fairness in machine learning (ML). These works aim to ensure that predictions made by ML algorithms are equitable. To this end, different notions of fairness are defined based on one or more sensitive attributes such as age, race or gender [85, 111, 191]. Despite the variety of individual and group fairness definitions, there is still a lack of expressiveness [127]. Most of these definitions focus solely on the inputs and outputs of the algorithm without taking into account the complexities of the downstream task, such as constrained allocation or heterogeneity in the utility of different individuals or groups [71]. A few exceptions exist in which the authors study a welfare-based prediction model with fairness considerations [54, 86, 93]. It is worth noting that there is a line of work on budgeted ML which considers resource limitations such as computational cost, time or information input [7, 55]. However, these applications do not directly relate to our settings, which require resource constraints on the model prediction.

Research on fairness in decision-making and resource allocation, on the other hand, has a long history. Different disciplines, from operations research, computer science and mathematics to mechanism design and welfare economics, have studied fair allocation under different assumptions. In this regard, a typical setting concerns a scenario where a central decision-maker must make an allocation of goods to a number of distinct entities (e.g., individuals) in a fair manner.
A line of work studies the fair allocation problem among individuals [16, 20, 41], or groups of agents [15, 66, 117, 166], by defining various fairness criteria. The literature tends to focus on several primary notions of fairness: proportional division [174] (every agent receives at least 1/n of her perceived value of the resources); equitability [73] (every agent equally values their allocation); envy-freeness [180] (every agent values their allocation at least as much as any other's); and maximin fairness [156] (the value received by the worst-off agent is maximized). While these notions capture fairness of allocations in many real-world applications, there are several barriers to their adoption in practice. First, the common assumption in these works is that utilities are given, which overlooks the fact that in practice utilities are unknown and predicted utilities, trained on past behavior, are subject to bias. In addition, they assume that individuals' utilities are independent of one another, i.e., changing an individual's utility will not affect other individuals as long as their share of resources is fixed. Moreover, real-world decisions are subject to different forms of uncertainty. Works that study fairness under uncertainty (e.g., unknown demand) [19, 61, 64, 131] often assume full distributional information about the uncertain parameters. In some social settings, however, distributional information is not available and there may be little data to inform our decisions. Finally, the fairness/efficiency trade-off is another crucial consideration that arises in a variety of applications including organ allocation [172] and disaster response [148]. Prior work is often limited to point solutions, with little quantitative understanding of the trade-off between efficiency and fairness, which impedes the applicability of these solutions.
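The classical criteria listed above are easy to state programmatically. The sketch below is purely illustrative (the valuation matrix is invented): `v[i][j]` is agent i's additive value for the bundle assigned to agent j, with bundle i given to agent i.

```python
def is_proportional(v):
    """Each agent values her own bundle at >= 1/n of her value for everything."""
    n = len(v)
    return all(v[i][i] >= sum(v[i]) / n for i in range(n))

def is_envy_free(v):
    """No agent strictly prefers another agent's bundle to her own."""
    n = len(v)
    return all(v[i][i] >= v[i][j] for i in range(n) for j in range(n))

def maximin_value(v):
    """Value of the worst-off agent; maximin fairness maximizes this quantity."""
    return min(v[i][i] for i in range(len(v)))

# An envy-free (hence, with additive values, proportional) allocation.
v = [[6, 4], [3, 7]]
assert is_envy_free(v) and is_proportional(v)
assert maximin_value(v) == 6

# Agent 0 envies agent 1 here, and the allocation is not proportional either.
v_bad = [[2, 8], [3, 7]]
assert not is_envy_free(v_bad) and not is_proportional(v_bad)
```

Chapter 1 adopts the last of these notions, maximin fairness, as its criterion.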
In spite of recent efforts to discover, evaluate and mitigate algorithmic bias and unfairness, data-driven allocation problems continue to form an important topic for fairness considerations, especially as automated systems enter a wide range of application domains far beyond the original computational settings of the problem. As highlighted above, there are many unresolved challenges that arise when we consider developing these solutions for real-world settings. This thesis focuses on three social domains. However, it is noteworthy that the fairness challenges and the proposed solutions are not restricted to the above applications and can be generalized to other domains that share the same underlying characteristics.

Part I: Fairness in Social Network-Based Interventions

Chapter 1: Robust and Fair Graph Covering

1.1 Introduction

We consider the problem of selecting a subset of nodes (which we refer to as 'monitors') in a graph that can 'cover' their adjacent nodes. We are mainly motivated by settings where monitors are subject to failure and we seek to maximize worst-case node coverage. We refer to this problem as the robust graph covering problem. This problem finds applications in several critical real-world domains, especially in the context of optimizing social interventions on vulnerable populations. Consider, for example, the problem of designing gatekeeper training interventions for suicide prevention, wherein a small number of individuals can be trained to identify warning signs of suicide among their peers [97]. A similar problem arises in the context of disaster risk management in remote communities, wherein a moderate number of individuals are recruited in advance and trained to watch out for others in case of natural hazards (e.g., in the event of a landslide [155]). Previous research has shown that social intervention programs of this sort hold great promise [97, 155].
Unfortunately, in these real-world domains, intervention agencies often have very limited resources, e.g., a moderate number of social workers to conduct the intervention and a small amount of funding to cover the cost of training. This makes it essential to target the right set of monitors to cover a maximum number of nodes in the network. Further, in these interventions, the performance and availability of individuals (monitors) is unknown and unpredictable. At the same time, robustness is desired to guarantee high coverage even in worst-case settings, to make the approach suitable for deployment in the open world. Robust graph covering problems similar to the one we consider here have been studied in the literature, see e.g., [42, 176]. Yet, a major consideration distinguishes our problem from previous work: namely, the need for fairness. Indeed, when deploying interventions in the open world (especially in sensitive domains impacting life and death like the ones that motivate this work), care must be taken to ensure that algorithms do not discriminate among people with respect to protected characteristics such as race, ethnicity, disability, etc. In other words, we need to ensure that, independently of their group, individuals have a high chance of being covered, a notion we refer to as group fairness.

                             Worst-case coverage of individuals by racial group (%)
Network Name   Network Size   White   Black   Hispanic   Mixed   Other
SPY1           95             70      36      –          86      94
SPY2           117            78      –       42         76      67
SPY3           118            88      –       33         95      69
MFP1           165            96      77      69         73      28
MFP2           182            44      85      70         77      72

Table 1.1: Racial discrimination in node coverage resulting from applying the algorithm in [176] on real-world social networks from two homeless drop-in centers in Los Angeles, CA [17], when 1/3 of the nodes (individuals) can be selected as monitors, out of which at most 10% will fail. The numbers correspond to the worst-case percentage of covered nodes across all monitor availability scenarios.
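The quantity reported in Table 1.1 can be made concrete with a brute-force sketch (a toy illustration of the metric, not the algorithm of [176]): for a fixed monitor set, enumerate every failure scenario of at most J monitors and record the worst covered fraction per group.

```python
from itertools import combinations

def worst_case_group_coverage(neighbors, groups, monitors, J):
    """neighbors[n]: set of nodes that can cover node n; groups[n]: group label.
    Returns, per group, the minimum covered fraction over all failure sets."""
    monitors = set(monitors)
    labels = sorted(set(groups.values()))
    members = {g: [n for n in groups if groups[n] == g] for g in labels}
    worst = {g: 1.0 for g in labels}
    for j in range(J + 1):
        for failed in combinations(sorted(monitors), j):
            alive = monitors - set(failed)
            for g in labels:
                frac = sum(1 for n in members[g]
                           if neighbors[n] & alive) / len(members[g])
                worst[g] = min(worst[g], frac)
    return worst

# Tiny example: node 1 is the only node covering group B, so if it fails,
# group B's worst-case coverage drops to zero while group A keeps half.
neighbors = {0: {1}, 1: {0}, 2: {1}, 3: {1}}
groups = {0: 'A', 1: 'A', 2: 'B', 3: 'B'}
print(worst_case_group_coverage(neighbors, groups, monitors={0, 1}, J=1))
# → {'A': 0.5, 'B': 0.0}
```

Even on four nodes, the example reproduces the phenomenon in the table: a coverage-optimal monitor set can leave one group with zero worst-case coverage.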
To motivate our approach, consider deploying in the open world a state-of-the-art algorithm for robust graph covering (which does not incorporate fairness considerations). Specifically, we apply the solutions provided by the algorithm from [176] on five real-world social networks. The results are summarized in Table 1.1 where, for each network, we report its size and the worst-case coverage by racial group. In all instances, there is significant disparity in coverage across racial groups. As an example, in network SPY1, 36% of Black individuals are covered in the worst case compared to 70% (resp. 86%) of White (resp. Mixed race) individuals. Thus, when maximizing coverage without fairness, (near-)optimal interventions end up mirroring any differences in the degree of connectedness of different groups. In particular, well-connected groups at the center of the network are more likely to be covered (protected). Motivated by the desire to support those that are less well off, we employ ideas from maximin fairness to improve coverage of those groups that are least likely to be protected.

We investigate the robust graph covering problem with fairness constraints. Formally, given a social network, where each node belongs to a group, we consider the problem of selecting a subset of I nodes (monitors), when at most J of them may fail. When a node is chosen as a monitor and does not fail, all of its neighbors are said to be 'covered' and we use the term 'coverage' to refer to the total number of covered nodes. Our objective is to maximize worst-case coverage when any J nodes may fail, while ensuring fairness in coverage across groups. We adopt maximin fairness from the Rawlsian theory of justice [156] as our fairness criterion: we aim to maximize the utility of the groups that are worst off. To the best of our knowledge, ours is the first work enforcing fairness constraints in the context of graph covering subject to node failure.
We make the following contributions: (i) We achieve maximin group fairness by incorporating constraints inside a robust optimization model, wherein we require that at least a fraction W of each group is covered in the worst case; (ii) We propose a novel two-stage robust optimization formulation of the problem for which near-optimal conservative approximations can be obtained as a moderately-sized mixed-integer linear program (MILP). By leveraging the decomposable structure of the resulting MILP, we propose a Benders' decomposition algorithm augmented with symmetry breaking to solve practical problem sizes; (iii) We present the first study of the price of group fairness (PoF), i.e., the loss in coverage due to fairness constraints, in the graph covering problem subject to node failure. We provide upper bounds on the PoF for Stochastic Block Model networks, a widely studied model of networks with community structure; (iv) Finally, we demonstrate the effectiveness of our approach on several real-world social networks of homeless youth. Our method yields competitive node coverage while significantly improving group fairness relative to state-of-the-art methods.

1.2 Related Work

This work relates to three streams of literature which we review.

Algorithmic Fairness. With the increase in deployments of AI, OR, and ML algorithms for decision and policy-making in the open world has come increased interest in algorithmic fairness. A large portion of this literature is focused on resource allocation systems, see e.g., [33, 112, 192]. Group fairness in particular has been studied in the context of resource allocation problems [53, 165, 173]. A nascent stream of work proposes to impose fairness by means of constraints in an optimization problem, an approach we also follow. This is for example proposed in [4], in [24, 64], and in [5] for machine learning, resource allocation, and matching problems, respectively. Several authors have studied the price of fairness.
In [33], the authors provide bounds for maximin fair optimization problems. Their approach is restricted to convex and compact utility sets. In [21], the authors study the price of fairness for indivisible goods with additive utility functions. In our graph covering problem, this property does not hold. Several authors have investigated notions of fairness under uncertainty, see e.g., [18, 72, 131, 192]. These papers all assume full distributional information about the uncertain parameters and cannot be employed in our setting where limited data is available about node availability. Motivated by data scarcity, we take a robust optimization approach to model uncertainty, which does not require distributional information. This problem is highly intractable due to the combinatorial nature of both the decision and uncertainty spaces. When fair solutions are hard to compute, "approximately fair" solutions have been considered [112]. In our work, we adopt an approximation scheme. As such, our approach falls under the "approximately fair" category. Recently, several authors have emphasized the importance of fairness when conducting interventions in socially sensitive settings, see e.g., [13, 115, 175]. Our work most closely relates to [175], wherein the authors propose an algorithmic framework for fair influence maximization. We note that, in their work, nodes are not subject to failure and therefore their approach does not apply in our context.

Submodular Optimization. One can view the group-fair maximum coverage problem as a multi-objective optimization problem, with the coverage of each community being a separate objective. In the deterministic case, this problem reduces to the multi-objective submodular optimization problem [48], as coverage has the submodularity (diminishing returns) property. In addition, moderately sized problems of this kind can be solved optimally using integer programming technology.
However, when considering uncertainty in node performance/availability, the objective function loses the submodularity property, while exact techniques fail to scale to even moderate problem sizes. Thus, existing (exact or approximate) approaches do not apply. Our work more closely relates to the robust submodular optimization literature. In [42, 142], the authors study the problem of choosing a set of up to $I$ items, out of which $J$ fail (which encompasses as a special case the robust graph covering problem without fairness constraints). They propose greedy algorithms with a constant (0.387) approximation factor, valid for $J = o(\sqrt{I})$ and $J = o(I)$, respectively. Finally, in [176], the authors propose another greedy algorithm with a general bound based on the curvature of the submodular function. These heuristics, although computationally efficient, are coverage-centered and do not take fairness into consideration. Thus, they may lead to discriminatory outcomes, see Table 1.1.

Robust Optimization. Our solution approach closely relates to the robust optimization paradigm, which is a computationally attractive framework for obtaining equivalent or conservative approximations based on duality theory, see e.g., [23, 33, 189]. Indeed, we show that the robust graph covering problem can be written as a two-stage robust problem with binary second-stage decisions, which is highly intractable in general [36]. One stream of work proposes to restrict the functional form of the recourse decisions to functions of benign complexity [32, 35]. Other works rely on partitioning the uncertainty set into finite sets and applying constant decision rules on each partition [35, 38, 84, 146, 182]. The last stream of work investigates the so-called $K$-adaptability counterpart [30, 47, 84, 153, 181], in which $K$ candidate policies are chosen in the first stage and the best of these policies is selected after the uncertain parameters are revealed. Our work most closely relates to [84, 153].
In [84], the authors show that, for bounded polyhedral uncertainty sets, linear two-stage robust optimization problems can be approximately reformulated as MILPs. Paper [153] extends this result to a special case of discrete uncertainty sets. We prove that we can leverage this approximation to reformulate the robust graph covering problem with fairness constraints exactly for a much larger class of discrete uncertainty sets.

1.3 Fair and Robust Graph Covering Problem

We model a social network as a directed graph $\mathcal{G} = (\mathcal{N}, \mathcal{E})$, in which $\mathcal{N} := \{1, \ldots, N\}$ is the set of all nodes (individuals) and $\mathcal{E}$ is the set of all edges (social ties). A directed edge from $\nu$ to $n$ exists, i.e., $(\nu, n) \in \mathcal{E}$, if node $n$ can be covered by $\nu$. We use $\delta(n) := \{\nu \in \mathcal{N} : (\nu, n) \in \mathcal{E}\}$ to denote the set of neighbors (friends) of $n$ in $\mathcal{G}$, i.e., the set of nodes that can cover node $n$. Each node $n \in \mathcal{N}$ is characterized by a set of attributes (protected characteristics) such as age, race, gender, etc., for which fair treatment is important. Based on these node characteristics, we partition $\mathcal{N}$ into $C$ disjoint groups $\mathcal{N}_c$, $c \in \mathcal{C} := \{1, \ldots, C\}$, such that $\cup_{c \in \mathcal{C}} \mathcal{N}_c = \mathcal{N}$.

We consider the problem of selecting a set of $I$ nodes from $\mathcal{N}$ to act as 'peer-monitors' for their neighbors, given that the availability of each node is unknown a priori and at most $J$ nodes may fail (be unavailable). We encode the choice of monitors using a binary vector $x$ of dimension $N$ whose $n$th element is one iff the $n$th node is chosen. We require $x \in \mathcal{X} := \{x \in \{0,1\}^N : e^\top x \le I\}$, where $e$ is a vector of all ones of appropriate dimension. Accordingly, we encode the (uncertain) node availability using a binary vector $\xi$ of dimension $N$ whose $n$th element equals one iff node $n$ does not fail (is available). Given that data available to inform the distribution of $\xi$ is typically scarce, we avoid making distributional assumptions on $\xi$. Instead, we view uncertainty as deterministic and set based, in the spirit of robust optimization [23].
Thus, we assume that $\xi$ can take on any value from the set $\Xi$, which is often referred to as the uncertainty set in robust optimization. The set $\Xi$ may, for example, conveniently capture failure rate information. Thus, we require $\xi \in \Xi := \{\xi \in \{0,1\}^N : e^\top (e - \xi) \le J\}$. A node $n$ is counted as 'covered' if at least one of its neighbors is a monitor and does not fail (is available). We let $y_n(x, \xi)$ denote whether $n$ is covered for the monitor choice $x$ and node availability $\xi$:

$$y_n(x, \xi) := \mathbb{I}\left( \textstyle\sum_{\nu \in \delta(n)} \xi_\nu x_\nu \ge 1 \right).$$

The coverage is then expressible as $F_{\mathcal{G}}(x, \xi) := e^\top y(x, \xi)$. The robust covering problem, which aims to maximize the worst-case (minimum) coverage under node failures, can be written as

$$\max_{x \in \mathcal{X}} \; \min_{\xi \in \Xi} \; F_{\mathcal{G}}(x, \xi). \tag{RC}$$

Problem (RC) ignores fairness and may result in discriminatory coverage with respect to (protected) node attributes, see Table 1.1. We thus propose to augment the robust covering problem with fairness constraints. Specifically, we propose to achieve max-min fairness by imposing fairness constraints on each group's coverage: we require that at least a fraction $W$ of nodes from each group be covered. In [175], the authors show that, by conducting a binary search for the largest $W$ for which the fairness constraints are satisfied for all groups, the max-min fairness optimization problem is equivalent to the one with fairness constraints. Thus, we write the robust covering problem with fairness constraints as

$$\max_{x \in \mathcal{X}} \left\{ \min_{\xi \in \Xi} \sum_{c \in \mathcal{C}} F_{\mathcal{G},c}(x, \xi) \;:\; F_{\mathcal{G},c}(x, \xi) \ge W |\mathcal{N}_c| \quad \forall c \in \mathcal{C},\; \forall \xi \in \Xi \right\}, \tag{RC_fair}$$

where $F_{\mathcal{G},c}(x, \xi) := \sum_{n \in \mathcal{N}_c} y_n(x, \xi)$ is the coverage of group $c \in \mathcal{C}$. Note that if $|\mathcal{C}| = 1$, Problem (RC_fair) reduces to Problem (RC), and if $\Xi = \{e\}$, Problem (RC_fair) reduces to the deterministic covering problem with fairness constraints.
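The binary-search reduction from [175] mentioned above can be sketched abstractly. The code below is illustrative: `feasible` is a stand-in oracle (in the chapter it would solve the fairness-constrained covering problem for a given W), assumed monotone in the sense that feasibility for some W implies feasibility for every smaller W.

```python
def largest_feasible_W(feasible, tol=1e-4):
    """Binary search for the largest coverage fraction W in [0, 1] for which
    the fairness-constrained problem is feasible. `feasible` must be monotone:
    feasible(W) implies feasible(W') for all W' <= W."""
    if not feasible(0.0):
        return None  # even W = 0 admits no solution
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if feasible(mid):
            lo = mid
        else:
            hi = mid
    return lo  # within tol of the true threshold

# Stand-in oracle: feasible exactly when W <= 0.7.
W_star = largest_feasible_W(lambda w: w <= 0.7)
assert abs(W_star - 0.7) < 1e-3
```

Each oracle call solves one constrained problem, so the max-min fair solution costs only a logarithmic number of solves of (RC_fair)-type problems.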
We emphasize that our approach can handle fairness with respect to more than one protected attribute by either: (a) partitioning the network based on joint values of the protected attributes and imposing a max-min fairness constraint for each group; or (b) imposing max-min fairness constraints for each protected attribute separately. Problem (RC_fair) is computationally hard due to the combinatorial nature of both the uncertainty and decision spaces. Lemma 1 characterizes its complexity. Proofs of all results are in the supplementary document.

Lemma 1. Problem (RC_fair) is NP-hard.

1.4 Price of Group Fairness

In Section 1.3, we proposed a novel formulation of the robust covering problem incorporating fairness constraints, Problem (RC_fair). Unfortunately, adding fairness constraints to Problem (RC) comes at a price to overall worst-case coverage. In this section, we study this price of group fairness.

Definition 1. Given a graph $\mathcal{G}$, the Price of Group Fairness $\mathrm{PoF}(\mathcal{G}, I, J)$ is the ratio of the coverage loss due to fairness constraints to the maximum coverage in the absence of fairness constraints, i.e.,

$$\mathrm{PoF}(\mathcal{G}, I, J) := 1 - \frac{\mathrm{OPT}_{\mathrm{fair}}(\mathcal{G}, I, J)}{\mathrm{OPT}(\mathcal{G}, I, J)}, \tag{1.1}$$

where $\mathrm{OPT}_{\mathrm{fair}}(\mathcal{G}, I, J)$ and $\mathrm{OPT}(\mathcal{G}, I, J)$ denote the optimal objective values of Problems (RC_fair) and (RC), respectively, when $I$ monitors can be chosen and at most $J$ of them may fail.

In this work, we are motivated by applications related to social networks, where it has been observed that people with similar (protected) characteristics tend to interact more frequently with one another, forming friendship groups (communities). This phenomenon, known as homophily [126], has been observed for characteristics such as race, gender, education, etc. [56]. This motivates us to study the PoF in Stochastic Block Model (SBM) networks [70], a widely accepted model for networks with community structure. In SBM networks, nodes are partitioned into $C$ disjoint communities $\mathcal{N}_c$, $c \in \mathcal{C}$.
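Definition 1 can be computed exactly on a tiny graph by enumeration. The sketch below is purely illustrative (a hand-built toy network, not an instance from the thesis): it brute-forces both $\mathrm{OPT}$ and $\mathrm{OPT}_{\mathrm{fair}}$ over all monitor sets of size $I$ and all failure scenarios of size at most $J$.

```python
from itertools import combinations

def worst_case_cov(neighbors, monitors, J, members=None):
    """Worst-case number of covered nodes (optionally restricted to `members`)
    over all failure sets of at most J monitors."""
    members = list(neighbors) if members is None else members
    return min(
        sum(1 for n in members if neighbors[n] & (set(monitors) - set(failed)))
        for j in range(J + 1) for failed in combinations(monitors, j)
    )

def price_of_fairness(neighbors, groups, I, J, W):
    """Brute-force PoF(G, I, J) = 1 - OPT_fair / OPT for a small graph.
    Returns None if no monitor set satisfies the fairness constraints."""
    grp = {g: [n for n in groups if groups[n] == g] for g in set(groups.values())}
    opt, opt_fair = 0, None
    for mon in combinations(sorted(neighbors), I):
        cov = worst_case_cov(neighbors, mon, J)
        opt = max(opt, cov)
        if all(worst_case_cov(neighbors, mon, J, ms) >= W * len(ms)
               for ms in grp.values()):
            opt_fair = cov if opt_fair is None else max(opt_fair, cov)
    return None if opt_fair is None else 1 - opt_fair / opt

# Two hubs (0 and 5) each cover two group-A nodes; group B's node 7 is
# coverable only by node 6. Covering B at all forces giving up one hub.
neighbors = {0: set(), 1: {0}, 2: {0}, 3: {5}, 4: {5}, 5: set(), 6: set(), 7: {6}}
groups = {0: 'A', 1: 'A', 2: 'A', 3: 'A', 4: 'A', 5: 'A', 6: 'B', 7: 'B'}
print(price_of_fairness(neighbors, groups, I=2, J=0, W=0.25))  # → 0.25
```

Here the unconstrained optimum picks both hubs (coverage 4), while any fair solution must spend a monitor on node 6 (coverage 3), so the PoF is 1 − 3/4 = 0.25.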
Within each community $c$, an edge between two nodes is present independently with probability $p^{\text{in}}_c$. Between a pair of communities $c$ and $c' \in \mathcal{C}$, edges exist independently with probability $p^{\text{out}}_{cc'}$, and we typically have $p^{\text{in}}_c > p^{\text{out}}_{cc'}$ to capture homophily. Thus, SBM networks are well suited to our purpose. We assume w.l.o.g. that the communities are labeled such that $|\mathcal{N}_1| \le \dots \le |\mathcal{N}_C|$.

1.4.1 Deterministic Case

We first study the PoF in the deterministic case, for which $J = 0$. Lemma 2 shows that there are worst-case networks for which the PoF can be arbitrarily bad.

Lemma 2. Given $\epsilon > 0$, there exists a budget $I$ and a network $G$ with $N \ge 4/\epsilon + 3$ nodes such that $\mathrm{PoF}(G,I,0) \ge 1 - \epsilon$.

Fortunately, as we will see, this pessimistic result is not representative of the networks that are seen in practice. We thus investigate the loss in expected coverage due to fairness constraints, given by
$$\mathrm{PoF}(I,J) := 1 - \frac{\mathbb{E}_{G \sim \text{SBM}}\left[\mathrm{OPT}_{\text{fair}}(G,I,J)\right]}{\mathbb{E}_{G \sim \text{SBM}}\left[\mathrm{OPT}(G,I,J)\right]}. \tag{1.2}$$
We emphasize that we investigate the loss in the expected coverage rather than the expected PoF for analytical tractability reasons. We make the following assumptions about the SBM network.

Assumption 1. For all communities $c \in \mathcal{C}$, the probability of an edge between two individuals in the community is inversely proportional to the size of the community, i.e., $p^{\text{in}}_c = \Theta(|\mathcal{N}_c|^{-1})$.

Assumption 2. For any two communities $c, c' \in \mathcal{C}$, the probability of an edge between two nodes $n \in \mathcal{N}_c$ and $\nu \in \mathcal{N}_{c'}$ is $p^{\text{out}}_{cc'} = O\big((|\mathcal{N}_c| \log^2 |\mathcal{N}_c|)^{-1}\big)$.

Assumption 1 is based on the observation that social networks are usually sparse: most individuals do not form too many links, even if the network is very large. Sparsity is characterized in the literature by the number of edges being proportional to the number of nodes, which is a direct consequence of Assumption 1. Assumption 2 is necessary for a meaningful community structure in the network. We now present results for the upper bound on the PoF in SBM networks.
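An SBM instance satisfying Assumptions 1 and 2 can be sampled directly from the definition. The sketch below (Python; the multiplicative constants and helper names are our own illustrative choices, not parameters from the thesis) draws each edge independently with the community-dependent probabilities:

```python
import math
import random

def sample_sbm(sizes, seed=0):
    """Sample an SBM graph whose edge probabilities follow Assumptions 1-2:
    within-community p_in = Theta(1/|N_c|) and between-community
    p_out = O(1/(|N_c| log^2 |N_c|)). Constants are illustrative."""
    rng = random.Random(seed)
    comm = [c for c, n_c in enumerate(sizes) for _ in range(n_c)]
    adj = {v: set() for v in range(len(comm))}
    for u in range(len(comm)):
        for v in range(u + 1, len(comm)):
            if comm[u] == comm[v]:
                p = min(1.0, 3.0 / sizes[comm[u]])           # p_in
            else:
                s = sizes[comm[u]]
                p = 1.0 / (s * max(1.0, math.log(s)) ** 2)   # p_out
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj, comm

adj, comm = sample_sbm([20, 100], seed=1)
avg_degree = sum(len(adj[v]) for v in adj) / len(adj)  # stays bounded (sparsity)
```

Because $p^{\text{in}}_c$ scales as $1/|\mathcal{N}_c|$, the average degree remains bounded as the community sizes grow, reflecting the sparsity discussed under Assumption 1.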
Proposition 1. Consider an SBM network model with parameters $p^{\text{in}}_c$ and $p^{\text{out}}_{cc'}$, $c, c' \in \mathcal{C}$, satisfying Assumptions 1 and 2. If $I = O(\log N)$, then
$$\mathrm{PoF}(I,0) = 1 - \frac{\sum_{c \in \mathcal{C}} |\mathcal{N}_c|}{\sum_{c \in \mathcal{C}} |\mathcal{N}_c| \, d(C)/d(c)} - o(1), \quad \text{where } d(c) := \log|\mathcal{N}_c| \, (\log\log|\mathcal{N}_c|)^{-1}.$$

Proof Sketch. First, we show that under Assumption 1, the coverage within each community is the sum of the degrees of the monitoring nodes. Then, using the assumption on $I$ in the premise of the proposition (which can be interpreted as a "small budget assumption"), we evaluate the maximum coverage within each community. Next, we show that between-community coverage is negligible compared to within-community coverage. We thus determine the distribution of the monitors in the presence and absence of fairness constraints; the PoF is computed from these two quantities. ■

1.4.2 Uncertain Case

Here, imposing fairness is more challenging, as we do not know a priori which nodes may fail. Thus, we must ensure that the fairness constraints are satisfied under all failure scenarios.

Proposition 2. Consider an SBM network model with parameters $p^{\text{in}}_c$ and $p^{\text{out}}_{cc'}$, $c, c' \in \mathcal{C}$, satisfying Assumptions 1 and 2. If $I = O(\log N)$, then
$$\mathrm{PoF}(I,J) = 1 - \frac{\eta \sum_{c \in \mathcal{C}} |\mathcal{N}_c|}{(I-J) \, d(C)} - \frac{J \sum_{c \in \mathcal{C} \setminus \{C\}} d(c)}{(I-J) \, d(C)} - o(1),$$
where $d(c)$ is as in Proposition 1 and $\eta := (I - CJ) \left( \sum_{c \in \mathcal{C}} |\mathcal{N}_c|/d(c) \right)^{-1}$.

Proof Sketch. The steps of the proof are similar to those in the proof of Proposition 1, with the difference that, under uncertainty, monitors should be distributed such that the fairness constraints are satisfied even after $J$ nodes fail. Thus, we quantify a minimum number of monitors that should be allocated to each community. We then determine the worst-case coverage both in the presence and absence of fairness constraints. The PoF is computed from these two quantities. ■

Figure 1.1: PoF in the uncertain (top) and deterministic (bottom) settings for SBM networks consisting of two communities ($\mathcal{C} = \{1,2\}$), where the size of the first community is fixed at $|\mathcal{N}_1| = 20$ and the size of the other community is increased from $|\mathcal{N}_2| = 20$ to 10,000. In the uncertain setting, $\gamma$ denotes the fraction of nodes that fail.

Propositions 1 and 2 show how the PoF changes with the relative sizes of the communities for the deterministic and uncertain cases, respectively. Our analysis shows that without fairness, one should place all the monitors in the biggest community. Under a fair allocation, however, monitors are more evenly distributed (although larger communities still receive a bigger share). Figure 1.1 illustrates the PoF results in the case of two communities for different failure rates $\gamma$ ($J = \gamma I$), ignoring the $o(\cdot)$ order terms. We keep the size of the first (smaller) community fixed and vary the size of the larger community. In both cases, if $|\mathcal{N}_1| = |\mathcal{N}_2|$, the PoF is zero, since a uniform distribution of monitors is optimal. As $|\mathcal{N}_2|$ increases, the PoF increases in both cases. Further increases in $|\mathcal{N}_2|$ result in a decrease in the PoF for the deterministic case: under a fair allocation, the bigger community receives a higher share of monitors, which is aligned with the total coverage objective. Under uncertainty, however, the PoF is non-decreasing: to guarantee fairness, additional monitors must be allocated to the smaller groups. This also explains why the PoF increases with $\gamma$.

1.5 Solution Approach

Given the intractability of Problem (RC_fair), see Lemma 1, we adopt a conservative approximation approach. To this end, we proceed in three steps. First, we note that a difficulty of Problem (RC_fair) is the discontinuity of its objective function. Thus, we show that (RC_fair) can be formulated equivalently as a two-stage robust optimization problem by introducing a fictitious counting phase after $\xi$ is revealed.
Second, we propose to approximate the decision made in the counting phase (which decides, for each node, whether or not it is covered). Finally, we demonstrate that the resulting approximate problem can be formulated equivalently as a moderately sized MILP, wherein the trade-off between suboptimality and tractability can be controlled by a single design parameter.

1.5.1 Equivalent Reformulation

For any given choice of $x \in \mathcal{X}$ and $\xi \in \Xi$, the objective $F_G(x,\xi)$ can be explicitly expressed as the optimal objective value of a covering problem. As a result, we can express (RC_fair) equivalently as the two-stage linear robust problem
$$\max_{x \in \mathcal{X}} \; \min_{\xi \in \Xi} \; \max_{y \in \mathcal{Y}} \; \left\{ \sum_{n \in \mathcal{N}} y_n \;:\; y_n \le \sum_{\nu \in \delta(n)} \xi_\nu x_\nu \;\; \forall n \in \mathcal{N} \right\}, \tag{1.3}$$
see Proposition 3 below. The second-stage binary decision variables $y \in \mathcal{Y} := \{y \in \{0,1\}^N : \sum_{n \in \mathcal{N}_c} y_n \ge W |\mathcal{N}_c| \;\; \forall c \in \mathcal{C}\}$ admit a very natural interpretation: at an optimal solution, $y_n = 1$ if and only if node $n$ is covered. Henceforth, we refer to $y$ as a covering scheme.

Definition 2 (Upward Closed Set). A set $X$, given as a subset of the partially ordered set $[0,1]^N$ equipped with the element-wise inequality, is said to be upward closed if for all $x \in X$ and $\bar{x} \in [0,1]^N$ such that $\bar{x} \ge x$, it holds that $\bar{x} \in X$.

Intuitively, sets involving lower bound constraints on (sums of) the parameters satisfy this definition; for example, sets that require a minimum fraction of nodes to be available. We can also consider group-based availability and require a minimum fraction of nodes to be available in every group.

Assumption 3. The set $\Xi$ is defined through $\Xi := \{0,1\}^N \cap \mathcal{T}$ for some upward closed set $\mathcal{T}$ given by $\mathcal{T} := \{\xi \in \mathbb{R}^N : A\xi \ge b\}$, with $A \in \mathbb{R}^{R \times N}$ and $b \in \mathbb{R}^R$.

Proposition 3. Problems (RC_fair) and (1.3) are equivalent.

K-adaptability Counterpart. Problem (1.3) has the advantage of being linear. Yet, its max-min-max structure precludes us from solving it directly.
We investigate a conservative approximation to Problem (1.3), referred to as the K-adaptability counterpart, wherein $K$ candidate covering schemes are chosen in the first stage and the best (feasible and most accurate) of those candidates is selected after $\xi$ is revealed. Formally, the K-adaptability counterpart of Problem (1.3) is
$$\max_{x \in \mathcal{X}, \; y^k \in \mathcal{Y}, \, k \in \mathcal{K}} \;\; \min_{\xi \in \Xi} \;\; \max_{k \in \mathcal{K}} \; \left\{ \sum_{n \in \mathcal{N}} y^k_n \;:\; y^k_n \le \sum_{\nu \in \delta(n)} \xi_\nu x_\nu \;\; \forall n \in \mathcal{N} \right\}, \tag{1.4}$$
where $y^k$ denotes the $k$th candidate covering scheme, $k \in \mathcal{K}$. We emphasize that the covering schemes are not inputs but rather decision variables of the K-adaptability problem; only the value $K$ is an input. The optimization problem will identify the best $K$ covering schemes that satisfy all the constraints, including the fairness constraints. The trade-off between optimality and computational complexity of Problem (1.4) can conveniently be tuned using the single parameter $K$.

Reformulation as an MILP. We derive an exact reformulation of the K-adaptability counterpart (1.4) of the robust covering problem as a moderately sized MILP. Our method extends the results from [153] to significantly more general uncertainty sets that are useful in practice, and to problems involving constraints on the set of covered nodes. Henceforth, we let $\mathcal{L} := \{0,\dots,N\}^K$, and we define $\mathcal{L}_+ := \{\ell \in \mathcal{L} : \ell > 0\}$ and $\mathcal{L}_0 := \{\ell \in \mathcal{L} : \ell \ngtr 0\}$. We present a variant of the generic K-adaptability Problem (1.4), where the uncertainty set $\Xi$ is parameterized by the vectors $\ell \in \mathcal{L}$. Each $\ell$ is a $K$-dimensional vector whose $k$th component encodes whether the $k$th covering scheme satisfies the constraints of the second-stage maximization problem; in this case, $\ell_k = 0$. Else, if the $k$th covering scheme is infeasible, $\ell_k$ is equal to the index of a constraint that is violated.

Theorem 1. Under Assumption 3, Problem (1.4) is equivalent to the mixed-integer bilinear program
$$
\begin{array}{ll}
\max & \tau \\[2pt]
\text{s.t.} & \tau \in \mathbb{R}, \;\; x \in \mathcal{X}, \;\; y^k \in \mathcal{Y} \;\; \forall k \in \mathcal{K} \\[4pt]
& \left.
\begin{array}{l}
\theta(\ell),\, \beta^k(\ell) \in \mathbb{R}^N_+, \;\; \alpha(\ell) \in \mathbb{R}^R_+, \;\; \nu(\ell) \in \mathbb{R}^K_+, \;\; \lambda(\ell) \in \Delta_K(\ell) \\[4pt]
\tau \le -e^\top \theta(\ell) + \alpha(\ell)^\top b - \displaystyle\sum_{k \in \mathcal{K}:\, \ell_k \neq 0} \big(y^k_{\ell_k} - 1\big)\, \nu_k(\ell) + \sum_{k \in \mathcal{K}:\, \ell_k = 0} \; \sum_{n \in \mathcal{N}} y^k_n \, \beta^k_n(\ell) + \sum_{k \in \mathcal{K}} \lambda_k(\ell) \sum_{n \in \mathcal{N}} y^k_n \\[4pt]
\theta_n(\ell) \le \big[A^\top \alpha(\ell)\big]_n + \displaystyle\sum_{k \in \mathcal{K}:\, \ell_k \neq 0} \; \sum_{\nu \in \delta(\ell_k)} x_\nu \, \nu_k(\ell) - \sum_{k \in \mathcal{K}:\, \ell_k = 0} \; \sum_{\nu \in \delta(n)} x_\nu \, \beta^k_n(\ell) \quad \forall n \in \mathcal{N}
\end{array}
\right\} \;\; \forall \ell \in \mathcal{L}_0 \\[4pt]
& \left.
\begin{array}{l}
\theta(\ell) \in \mathbb{R}^N_+, \;\; \alpha(\ell) \in \mathbb{R}^R_+, \;\; \nu(\ell) \in \mathbb{R}^K_+ \\[4pt]
1 \le -e^\top \theta(\ell) + \alpha(\ell)^\top b - \displaystyle\sum_{k \in \mathcal{K}:\, \ell_k \neq 0} \big(y^k_{\ell_k} - 1\big)\, \nu_k(\ell) \\[4pt]
\theta_n(\ell) \le \big[A^\top \alpha(\ell)\big]_n + \displaystyle\sum_{k \in \mathcal{K}:\, \ell_k \neq 0} \; \sum_{\nu \in \delta(\ell_k)} x_\nu \, \nu_k(\ell) \quad \forall n \in \mathcal{N}
\end{array}
\right\} \;\; \forall \ell \in \mathcal{L}_+,
\end{array}
\tag{1.5}
$$
which can be reformulated equivalently as an MILP using standard "Big-M" techniques, since all bilinear terms are products of continuous and binary variables. The size of this MILP scales with $|\mathcal{L}| = (N+1)^K$; it is polynomial in all problem inputs for any fixed $K$.

Proof Sketch. The reformulation relies on three key steps. First, we partition the uncertainty set using the parameter $\ell$. Next, we show that relaxing the integrality constraint on the uncertain parameters $\xi$ leaves the problem unchanged; this is the key result that enables us to provide an equivalent formulation of Problem (1.4). Finally, we employ linear programming duality theory to reformulate the robust optimization problem over each subset. As a result, the formulation has two sets of decision variables: (a) the decision variables of the original problem; and (b) dual variables, parameterized by $\ell$, which emerge from the dualization. ■

1.5.2 Bender's Decomposition

In Problem (1.5), once the binary variables $x$ and $\{y^k\}_{k \in \mathcal{K}}$ are fixed, the problem decomposes across $\ell$: all remaining variables are real-valued and can be found by solving a linear program for each $\ell$. Bender's decomposition is an exact solution technique that leverages such decomposable structure for a more efficient solution [25, 37]. Each iteration of the algorithm starts with the solution of a relaxed master problem, which is fed into the subproblems to identify violated constraints to add to the master problem. The process repeats until no more violated constraints can be identified. The formulations of the master and subproblems are provided in Appendix A.
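The semantics of the K-adaptability counterpart (1.4), i.e., commit to $K$ covering schemes up front and apply the best feasible one after $\xi$ is revealed, can be checked by exhaustive enumeration on a toy instance. The sketch below (Python; names and the instance are ours, and it is a brute-force stand-in for, not an implementation of, the MILP of Theorem 1):

```python
from itertools import combinations

def scheme_value(adj, x, xi, y):
    """Value of candidate covering scheme y under (x, xi): -inf if y
    claims coverage for a node with no available chosen monitor."""
    for n in adj:
        if y[n] and not any(x[v] and xi[v] for v in adj[n]):
            return float("-inf")
    return sum(y.values())

def k_adapt_objective(adj, x, schemes, J):
    """Objective of (1.4): worst case over failure scenarios of the
    best feasible candidate scheme, by exhaustive enumeration."""
    chosen = [v for v in adj if x[v]]
    worst = float("inf")
    for j in range(min(J, len(chosen)) + 1):
        for fail in combinations(chosen, j):
            xi = {v: v not in fail for v in adj}
            worst = min(worst, max(scheme_value(adj, x, xi, y) for y in schemes))
    return worst

adj = {0: {1, 2}, 1: {0}, 2: {0, 3}, 3: {2}}
x = {v: v in {0, 2} for v in adj}   # monitors at nodes 0 and 2
y1 = {0: 1, 1: 1, 2: 1, 3: 1}       # scheme used when nothing fails
y2 = {0: 1, 1: 0, 2: 0, 3: 1}       # fallback if monitor 0 fails
y3 = {0: 0, 1: 1, 2: 1, 3: 0}       # fallback if monitor 2 fails
obj = k_adapt_objective(adj, x, [y1, y2, y3], J=1)   # K = 3 schemes
```

With a single scheme ($K = 1$), no fixed $y$ is feasible under every failure scenario of this instance, which illustrates why increasing $K$ trades computational effort for a better (less conservative) approximation.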
Symmetry Breaking Constraints. Problem (1.5) presents a large amount of symmetry. Indeed, given $K$ candidate covering schemes $y^1, \dots, y^K$, their indices can be permuted to yield another, distinct, feasible solution with identical cost. This symmetry results in a significant slowdown of the Branch-and-Bound procedure [39]. Thus, we introduce symmetry breaking constraints into formulation (1.5) that stipulate that the candidate covering schemes be lexicographically decreasing. We refer to [181] for details.

1.6 Results on Social Networks of Homeless Youth

We evaluate our approach on the five social networks from Table 1.1. Details on the data are provided in Section A.1. We investigate the robust graph covering problem with maximin racial fairness constraints. All experiments were run on a Linux 16GB RAM machine with Gurobi v6.5.0.

Figure 1.2: Left figure: Solution quality (overall worst-case coverage versus worst-case coverage of the group that is worse-off) for each approach (DC, Greedy, and K-adaptability for $K = 1,2,3$); the points represent the results of each approach applied to each of the five real-world social networks from Table 1.1; each shaded area corresponds to the convex hull of the results associated with each approach; approaches that are more fair (resp. efficient) are situated in the right- (resp. top-)most part of the graph. Right figure: Average of the ratio of the objective value of the master problem to the network size (across the five instances) as a function of solver time, for the Bender's decomposition approach (dotted line) and the Bender's decomposition approach augmented with symmetry breaking constraints (solid line). For both sets of experiments, the setting was $I = N/3$ and $J = 3$.
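The effect of the lexicographic rule can be illustrated in isolation: permuting the $K$ candidate schemes never changes the objective, so requiring a decreasing lexicographic order keeps exactly one representative per permutation class. The snippet below is a standalone illustration (our own, not the actual constraints of [181]):

```python
from itertools import permutations

# K = 3 hypothetical covering schemes, encoded as 0/1 tuples over 3 nodes.
schemes = [(1, 1, 0), (1, 0, 1), (0, 1, 1)]

# Permuting scheme indices yields K! symmetric copies of the same solution.
orderings = list(permutations(schemes))   # 3! = 6 copies

# Symmetry-breaking rule: keep only lexicographically decreasing orderings.
kept = [p for p in orderings if all(a >= b for a, b in zip(p, p[1:]))]
```

Pruning the search tree down to one canonical ordering per class is what yields the Branch-and-Bound speed-ups reported in Figure 1.2 (right).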
First, we compare the performance of our approach against the greedy algorithm of [176] and the degree centrality heuristic (DC). The results are summarized in Figure 1.2 (left). From the figure, we observe that an increase in $K$ results in an increase in performance along both axes, with a significant jump from $K = 1$ to $K = 2,3$ (recall that $K$ controls the complexity/optimality trade-off of our approximation). We note that the gain starts diminishing from $K = 2$ to $K = 3$; thus, we only run up to $K = 3$. In addition, the computational complexity of the problem increases exponentially with $K$, preventing us from increasing $K$ beyond 3 for the considered instances. As demonstrated by our results, $K \approx 3$ was sufficient to considerably improve the fairness of the covering at a moderate price to efficiency. Compared to the baselines, with $K = 3$, we significantly improve the coverage of the worse-off group over greedy (resp. DC) by 11% (resp. 23%) on average across the five instances.

Second, we investigate the effect of uncertainty on the coverage of the worse-off group and on the PoF, for both the deterministic ($J = 0$) and uncertain ($J > 0$) cases, as the number of monitors $I$ is varied in the set $\{N/3, N/5, N/7\}$. These settings are motivated by numbers seen in practice (typically, the number of people that can be invited is 15-20% of the network size). Our results are summarized in Table 1.2.

Name            Size N | Improvement in Min. Percentage Covered (%) |            PoF (%)
                       | J =  0    1    2    3    4    5            | J =  0    1    2    3    4    5
SPY1            95     |     15   16   14   10   10    9            |    1.4  1.0  2.1  1.3  3.3  4.2
SPY2            117    |     20   14    9   10    8   10            |    0.0  1.2  3.7  3.3  3.6  3.7
SPY3            118    |     20   16   16   15   11   10            |    0.0  3.4  4.8  6.4  3.2  4.0
MFP1            165    |     17   15    7   11   14    9            |    0.0  3.1  5.4  2.4  6.3  4.4
MFP2            182    |     11   12   10    9   12   12            |    0.0  1.0  1.0  2.2  2.4  3.6
Avg. (I = N/3)         |   16.6 14.6 11.2 11.0 11.0 10.0            |    0.3  1.9  3.4  3.1  3.8  4.0
Avg. (I = N/5)         |   15.0 13.8 14.0 10.0  9.0  6.7            |    0.6  2.1  3.2  3.2  3.9  3.8
Avg. (I = N/7)         |   12.2 11.4 11.2 11.4  8.2  6.4            |    0.1  2.5  3.5  3.2  3.5  4.0

Table 1.2: Improvement in the worst-case coverage of the worse-off group and the associated PoF for each of the five real-world social networks from Table 1.1. The first five rows correspond to the setting $I = N/3$; in the interest of space, we only show averages for the settings $I = N/5$ and $I = N/7$. In the deterministic case ($J = 0$), the PoF is measured relative to the coverage of the true optimal solution (obtained by solving the integer programming formulation of the graph covering problem). In the uncertain case ($J > 0$), the PoF is measured relative to the coverage of the greedy heuristic of [176].

Indeed, from the table, we see, for example, that for $I = N/3$ and $J = 0$ our approach is able to improve the coverage of the worse-off group by 11-20%, and for $J > 0$ the improvement in the worst-case coverage of the worse-off group is 7-16%. On the other hand, the PoF is very small: 0.3% on average in the deterministic case and at most 6.4% in the uncertain case. These results are consistent across the range of parameters studied. We note that the PoF numbers also match our analytical results on the PoF, in that uncertainty generally induces a higher PoF.

Third, we perform a head-to-head comparison of our approach for $K = 3$ with the results in Table 1.1. Our findings are summarized in Table A.3 in Section A.1. As an illustration, in SPY3, the worst-case coverage by racial group under our approach is: White 90%, Hispanic 44%, Mixed 85%, and Other 87%. These numbers suggest that the coverage of Hispanics (the worse-off group) has increased from 33% to 44%, a significant improvement in fairness. To quantify the overall loss due to fairness, we also compute PoF values; the maximum PoF across all instances was at most 4.2%, see Table A.3.

Finally, we investigate the benefits of augmenting our formulation with symmetry breaking constraints.
Thus, we solve all five instances of our problem with the Bender's decomposition approach, with and without symmetry breaking constraints. The results are summarized in Figure 1.2 (right). Across our experiments, we set a time limit of 2 hours, since little improvement was seen beyond that. In all cases, and in particular for $K = 2$ and $3$, symmetry breaking results in significant speed-ups. For $K = 3$ (and contrary to Bender's decomposition augmented with symmetry breaking), Bender's decomposition alone fails to solve the master problem to optimality within the time limit. We remark that employing K-adaptability is necessary: indeed, Problem (RC_fair) would not fit in memory. Similarly, using Bender's decomposition is needed: even for moderate values of $K$ (2 to 3), the K-adaptability MILP (1.5) could not be loaded into memory.

1.7 Conclusion and Broader Impact

We believe that the robust graph covering problem with fairness constraints is worthwhile to investigate. It poses a huge number of challenges and holds great promise in terms of the realm of possible real-world applications with important potential societal benefits, e.g., to prevent suicidal ideation and death and to protect individuals during disasters such as landslides.

Chapter 2

Fair Influence Maximization via Welfare Optimization

2.1 Introduction

The success of many behavioral, social, and public health interventions relies heavily on effectively leveraging social networks [97, 175, 178]. For instance, health interventions such as suicide/HIV prevention [190] and community preparedness against natural disasters involve finding a small set of well-connected individuals who can act as peer-leaders to detect warning signals (suicide prevention) or disseminate relevant information (HIV prevention or landslide risk management). The influence maximization framework has been employed to find such individuals [185].
However, such interventions may lead to discriminatory solutions, as individuals from racial minorities or LGBTQ communities may be disproportionately excluded from the benefits of the intervention [151, 175].

Recent work has incorporated fairness directly into influence maximization by proposing various notions of fairness, such as maximin fairness [151] and diversity constraints [175]. Maximin fairness aims at improving the minimum amount of influence that any community receives. Inspired by the game theory literature, diversity constraints ensure that each community is at least as well-off as if it had received its share of resources proportional to its size and allocated them internally. Each of these notions offers a unique perspective on fairness. However, they also come with drawbacks. For example, maximin fairness can result in significant degradation in total influence due to its stringent requirement to help the worst-off group as much as possible, whereas in reality it may be hard to spread influence to some communities due to their sparse connections. On the other hand, while diversity constraints aim to take each community's ability to spread influence into account, they do not explicitly account for reducing inequality (i.e., they do not exhibit inequality aversion). Consequently, there is no universal agreement on what fairness means, and in fact it is widely known that fairness is domain dependent [136]. For example, excluding vulnerable communities from suicide prevention might have higher negative consequences compared to interventions promoting a healthier lifestyle.

Building on cardinal social welfare theory from the economics literature and principles of social welfare, we propose a principled characterization of the properties of social influence maximization solutions.
In particular, we propose a framework for fair influence maximization based on social welfare theory, wherein the cardinal utilities derived by each community are aggregated using isoelastic social welfare functions [26]. Isoelastic functions have the general form $u^\alpha/\alpha$ for $\alpha < 1$, $\alpha \neq 0$, and $\log u$ for $\alpha = 0$, where $\alpha$ is a constant that controls the aversion to inequality and $u$ is the utility value. They are used to measure the goodness or desirability of a utility distribution. However, due to the structural dependencies induced by the underlying social network, i.e., between-community and within-community edges, social welfare principles cannot be directly applied to our problem. Our contributions are as follows:

• We extend the cardinal social welfare principles, including the transfer principle, to the influence maximization framework, where they are otherwise not applicable. We also propose a new principle, which we call utility gap reduction. This principle aims to avoid situations where high aversion to inequality leads to an even larger utility gap, caused by between-community influence spread.

• We generalize the theory regarding these principles and show that, for all problem instances, there does not exist a welfare function that satisfies all principles. Nevertheless, we show that if all communities are disconnected from one another (no between-community edges), isoelastic welfare functions satisfy all principles. This result highlights the importance of network structure, specifically between-community edges.

• Under this framework, the trade-off between fairness and efficiency can be controlled by a single inequality aversion parameter $\alpha$. This allows a decision-maker to effectively trade off quantities like utility gap and total influence by varying this parameter in the welfare function. We then incorporate these welfare functions as objectives into an optimization problem to rule out undesirable solutions.
• We show that the resulting optimization problem is monotone and submodular and, hence, can be solved with a greedy algorithm with optimality guarantees.

• Finally, we carry out a detailed experimental analysis on synthetic and real social networks to study the trade-off between total influence spread and utility gap. In particular, we conduct a case study on social network-based landslide risk management in Sitka, Alaska. We show that by choosing $\alpha$ appropriately, we can flexibly control the utility gap (4%-26%) and the resulting influence degradation (36%-5%).

2.2 Related Work

Artificial intelligence and machine learning algorithms hold great promise in addressing many pressing societal problems. These problems often pose complex ethical and fairness issues, which need to be addressed before the algorithms can be deployed in the real world. The nascent field of algorithmic fairness has emerged to address these fairness concerns. To this end, different notions of fairness are defined based on one or more sensitive attributes, such as age, race, or gender. For example, in the classification and regression settings, these notions mainly aim at equalizing a statistical quantity across different communities or populations [85, 191]. While surveying the entirety of this field is out of the scope of this work (see, e.g., [28] for a recent survey), we point out that there is a wide range of fairness notions defined across different settings; it has been shown that the right notion is problem dependent [27, 136] and that different notions of fairness can be incompatible with each other [111]. Thus, care must be taken when we employ these notions of fairness across different applications.

Motivated by the importance of fairness when conducting interventions in social initiatives, fair influence maximization has received a lot of attention recently [6, 72, 151, 175].
These works have incorporated fairness directly into the influence maximization framework by relying on (1) the Rawlsian theory of justice [156, 151], (2) game-theoretic principles [175], or (3) equality-based notions [6, 171]. We discuss the first two approaches in more detail in Sections 2.4 and 2.5, as well as in our experimental section. Equality-based approaches strive for equal outcomes across different communities. In general, strict equality is hard to achieve and may lead to wastage of resources. This is amplified in influence maximization, as different communities have different capacities for being influenced (e.g., marginalized communities are hard to reach). In [72], the authors investigate the notion of an information access gap, and propose maximizing the minimum probability that an individual is influenced/informed in order to constrain this gap. As a result, they study fairness at the individual level, while we study fairness at the group level. Also, their notion of access gap is limited to the gap in a bipartition of the network, which is in principle different from the utility gap that we study, as the latter accommodates an arbitrary number of protected groups. Similar to our work, the authors of [6] also study utility gap. They propose an optimization model that directly penalizes the utility gap, which they solve via a surrogate objective function. Their surrogate functions take the form of a sum of concave functions of the group utilities, aggregated with arbitrary weights. Unlike their work, ours takes an axiomatic approach with strong theoretical justifications, and it does not allow for arbitrary concave functions and weights, as these violate the welfare principles.

There has also been a long line of work considering fairness in resource allocation problems (see, e.g., [33, 112, 43, 41]).
More recently, group fairness has been studied in the context of resource allocation problems [53, 64, 24] and specifically in graph covering problems [151]. In the resource allocation setting, maximin fairness and proportional fairness are widely adopted fairness notions. Proportional fairness is a notion introduced for bandwidth allocation [43]: an allocation is proportionally fair if the sum of percentage-wise changes in the utilities of all groups cannot be improved with another allocation. In classical resource allocation problems, each individual or group has a utility function that is independent of the utilities of other individuals or groups. However, this is not the case in influence maximization due to the underlying social network structure, i.e., the between-community edges, which makes our problem distinct from classical resource allocation problems. We note that, while in the bandwidth allocation setting there is also a network structure, the utility of each vertex is still independent of the other vertices and is only a function of the amount of resources that the vertex receives.

Social welfare functions have been used within the economics literature to study trade-offs between equality and efficiency [167] and have been widely adopted in different decision-making areas, including health [1]. Recently, the authors of [86] proposed to study inequality aversion and welfare through cardinal welfare theory in the context of regression problems. Their main contribution is to use this theory to draw attention to other fairness considerations beyond equality. However, classical social welfare theory does not readily extend to our setting, due to the dependencies induced by the between-community connections; indeed, extending those principles is a contribution of our work.

2.3 Problem Formulation

We use $G = (V, E)$ to denote a graph (or network) in which $V$ is the set of $N$ vertices and $E$ is the set of edges.
In the influence maximization problem, a decision-maker chooses a set of at most $K$ vertices to influence (or activate). The selected vertices then spread the influence in rounds according to the Independent Cascade Model [106]. (Our framework is also applicable to other forms of diffusion, such as the Linear Threshold Model [106].) Under this model, each newly activated vertex spreads the influence to its neighbors independently and with a fixed probability $p \in [0,1]$. The process continues until no new vertices are influenced. We use $A$ to denote the initial set of vertices, also referred to as influencer vertices. The goal of the decision-maker is to select a set $A$ that maximizes the expected number of vertices influenced at the end of this process. Each vertex of the graph belongs to one of the disjoint communities (empty intersection) $c \in \mathcal{C} := \{1,\dots,C\}$ such that $V_1 \cup \dots \cup V_C = V$, where $V_c$ denotes the set of vertices that belong to community $c$. This partitioning can be induced by, e.g., the intersection of a set of (protected) attributes, such as race or gender, for which fair treatment is important. We use $N_c$ to denote the size of community $c$, i.e., $N_c = |V_c|$. Furthermore, communities may be disconnected, in which case for all distinct $c, c' \in \mathcal{C}$ and all $v \in V_c$, $v' \in V_{c'}$, there is no edge between $v$ and $v'$ (i.e., $(v,v') \notin E$). We define $\mathcal{A}^\star := \{A \subseteq V : |A| \le K\}$ as the set of budget-feasible influencers. Finally, for any choice of influencers $A \in \mathcal{A}^\star$, we let $u_c(A)$ denote the utility, i.e., the expected fraction of influenced vertices in community $c$, where the expectation is taken over randomness in the spread of influence. The standard influence maximization problem solves
$$\underset{A \in \mathcal{A}^\star}{\text{maximize}} \; \sum_{c \in \mathcal{C}} N_c \, u_c(A). \tag{2.1}$$
When clear from the context, we will drop the dependency of $u_c(A)$ on $A$ to minimize notational overhead.
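The community utilities $u_c(A)$ have no closed form in general, but can be estimated by Monte Carlo simulation of the Independent Cascade Model. The sketch below (Python; function names and the toy instance are ours) runs the cascade from a seed set and averages the per-community influenced fractions:

```python
import random

def icm_utilities(adj, community, seeds, p=0.3, trials=2000, seed=0):
    """Estimate u_c(A): expected fraction of each community influenced
    under the Independent Cascade Model with propagation probability p."""
    rng = random.Random(seed)
    comms = sorted(set(community.values()))
    sizes = {c: sum(1 for v in community if community[v] == c) for c in comms}
    totals = {c: 0.0 for c in comms}
    for _ in range(trials):
        active, frontier = set(seeds), list(seeds)
        while frontier:
            new = []
            for v in frontier:
                # each newly active vertex gets one chance per inactive neighbor
                for w in adj[v]:
                    if w not in active and rng.random() < p:
                        active.add(w)
                        new.append(w)
            frontier = new
        for v in active:
            totals[community[v]] += 1.0
    return {c: totals[c] / (trials * sizes[c]) for c in comms}

# Two triangles joined by a single bridge edge (2-3); seed in community 0.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
community = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
u = icm_utilities(adj, community, seeds=[0])
```

Seeding only in community 0 leaves community 1 with markedly lower utility, which is exactly the kind of disparity the fairness notions of Section 2.4 are designed to address.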
2.4 Existing Notions of Fairness

Problem (2.1) solely attempts to maximize the total influence, which is also known as the utilitarian approach. Existing fair influence maximization problems are variants of Problem (2.1) involving additional constraints. We detail these below.

Maximin Fairness (MMF). Based on the Rawlsian theory [156], MMF [175] aims to maximize the utility of the worst-off community. Precisely, MMF only allows $A \in \mathcal{A}^\star$ that satisfy the following constraint:
$$\min_{c \in \mathcal{C}} u_c(A) \ge \gamma, \quad \text{where } A \in \mathcal{A}^\star,$$
where the left-hand side is the utility of the worst-off community and $\gamma$ is the highest value for which the constraint is feasible.

Diversity Constraints (DC). Inspired by the game-theoretic notion of the core, DC requires that every community obtain a utility higher than when it receives resources proportional to its size and allocates them internally [175]. This is captured by the following constraint, where $U_c$ denotes the maximum utility that community $c$ can achieve with a budget equal to $\lfloor K N_c / N \rfloor$:
$$u_c(A) \ge U_c \quad \forall c \in \mathcal{C}, \; \text{where } A \in \mathcal{A}^\star. \tag{2.2}$$
DC sets utility lower bounds for communities based on their relative sizes and how well they can spread influence internally. As a result, it does not explicitly account for reducing inequalities and may lead to a high influence gap. We show this both theoretically and empirically in Sections 2.5.4 and 2.6.

Demographic Parity (DP). Formalizing the legal doctrine of disparate impact [191], DP requires the utility of all communities to be roughly the same. For any $\delta \in [0,1)$, DP implies the constraints [4, 6, 171]
$$u_c(A) - u_{c'}(A) \le \delta \quad \forall c, c' \in \mathcal{C}, \; \text{where } A \in \mathcal{A}^\star.$$
The degree of tolerated inequality is captured by $\delta$; higher $\delta$ values are associated with higher tolerance. We use exact and approximate DP to distinguish between $\delta = 0$ and $\delta > 0$.

2.5 Fair Influence Maximization

2.5.1 Cardinal Welfare Theory Background

Following cardinal welfare theory [26], our aim is to design welfare functions to measure the goodness of the choice of influencers.
Cardinal welfare theory proposes a set of principles and welfare functions that are expected to satisfy these principles. Given two utility vectors, these principles determine whether the vectors are indifferent or one of them is preferred. For ease of exposition, let W denote this welfare function defined over the utilities of all individuals in the population (we will formalize W shortly). Then the existing principles of social welfare theory can be summarized as follows. Throughout this section, without loss of generality, we assume all utility vectors belong to [0,1]^N.

(1) Monotonicity. If u ≺ u′, then W(u) < W(u′).† In other words, if u′ Pareto dominates u, then W should strictly prefer u′ to u. This principle also appears as the levelling down objection in political philosophy [143].

(2) Symmetry. W(u) = W(P(u)), where P(u) is any element-wise permutation of u. According to this principle, W does not depend on the naming or labels of the individuals, but only on their utility levels.

(3) Independence of Unconcerned Individuals. Let (u|_c b) be a utility vector that is identical to u, except for the utility of individual c, which is replaced by a new value b. The property requires that for all c, b, b′, u and u′, W(u|_c b) < W(u′|_c b) ⇔ W(u|_c b′) < W(u′|_c b′). Informally, this principle states that W should be independent of individuals whose utilities remain the same.

(4) Affine Invariance. For any a > 0 and b, W(u) < W(u′) ⇔ W(au + b) < W(au′ + b), i.e., the relative ordering is invariant to the choice of numeraire.

(⋆5) Transfer Principle [57, 145]. Consider individuals i and j in utility vector u such that u_i < u_j. Let u′ be another utility vector that is identical to u in all elements except i and j, where u′_i = u_i + δ and u′_j = u_j − δ for some δ ∈ (0, (u_j − u_i)/2). Then W(u) < W(u′). Informally, transferring utility from a high-utility to a low-utility individual should increase social welfare.
It is well known that any welfare function W that satisfies the first four principles is additive and of the form W_α(u) = Σ_{i=1}^N u_i^α / α for α ≠ 0 and W_α(u) = Σ_{i=1}^N log(u_i) for α = 0. Further, for α < 1 the last principle is also satisfied. In this case, α can be interpreted as an inequality aversion parameter, where smaller α values exhibit more aversion towards inequalities. We empirically investigate the effect of α in Section 2.6.

† u ≺ u′ means u_c ≤ u′_c for all c ∈ C and u_c < u′_c for some c ∈ C.

Informally, the influence transfer principle states that in a desirable transfer of utilities, the magnitude of the improvement in lower-utility communities should be at least as high as the magnitude of decay in higher-utility communities, while enforcing that at least one low-utility community receives a higher utility after the transfer. The original transfer principle is a special case of the influence transfer principle when communities are disconnected and the utilities transferred remain the same.

Next, we study whether any of the welfare functions that satisfy the first 4 principles satisfy the influence transfer principle. In Proposition 4, we show that any additive and strictly concave function satisfies the influence transfer principle. Since functions that satisfy the first 4 principles are strictly concave for α < 1, the influence transfer principle is automatically satisfied in this regime. We defer all proofs to Appendix B.

Proposition 4. Any strictly concave and additive function satisfies the influence transfer principle.

To measure inequality, the notion of utility gap (or analogous notions such as the ratio of utilities) is commonly used [72, 171]. The utility gap measures the difference between the utilities of a pair of communities. In this work, we focus on the maximum utility gap, i.e., the gap between the communities with the highest and lowest utilities (utility gap henceforth). For a utility vector u, we define ∆(u) = max_{c ∈ C} u_c − min_{c ∈ C} u_c to denote the utility gap.
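The welfare family W_α and the utility gap ∆(u) are simple to compute directly. A minimal sketch (function names are illustrative) that also demonstrates the inequality-aversion interpretation of α: for any α < 1, a more even utility vector is preferred over a skewed one with the same total utility.

```python
import math

def welfare(u, alpha):
    """Cardinal welfare W_alpha of a utility vector u (entries in (0, 1]).
    W_alpha(u) = sum_i u_i^alpha / alpha for alpha != 0,
                 sum_i log(u_i)        for alpha == 0."""
    if alpha == 0:
        return sum(math.log(x) for x in u)
    return sum(x ** alpha for x in u) / alpha

def utility_gap(u):
    """Maximum utility gap: Delta(u) = max_c u_c - min_c u_c."""
    return max(u) - min(u)
```

For example, with even = [0.5, 0.5] and skew = [0.9, 0.1] (equal total utility), welfare(even, α) exceeds welfare(skew, α) for α ∈ {0.5, 0, −2}, and the preference strengthens as α decreases.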
Fair interventions are usually motivated by the large utility gap before the intervention [124]. In [72], the authors have shown that in social networks the utility gap can further increase after an algorithmic influence-maximizing intervention. We extend this result to the entire class of welfare functions that we study in this work and we notice that the utility gap can increase even if we optimize for these welfare functions. This is a surprising result since, unlike the influence maximization objective, these welfare functions are designed to incorporate fairness, yet we may observe an increase in the utility gap. We now introduce another principle which aims to address this issue. Again, we focus on neighboring solutions.

(6) Utility Gap Reduction. Let A, A′ ∈ A* be two neighboring solutions with corresponding utility vectors u = u(A) and u′ = u(A′). If Σ_{c ∈ C} N_c u_c ≤ Σ_{c ∈ C} N_c u′_c and ∆(u) > ∆(u′), then W(u) < W(u′).

The utility gap reduction principle simply states that the welfare function should prefer the utility vector whose total utility is at least as high as the other's and which also has a smaller utility gap. We now show that, in general, it is not possible to design a welfare function that obeys the utility gap reduction principle along with the other principles.

Proposition 5. Let W be a welfare function that obeys principles 1-5. Then there exists an instance of influence maximization where W does not satisfy utility gap reduction.

Next, we show that on a special class of networks, i.e., networks with disconnected communities, the utility gap reduction principle is satisfied in all influence maximization problems.

Proposition 6. Let W be a welfare function that obeys principles 1-5. If the communities are disconnected, then W also satisfies the utility gap reduction principle.

Propositions 5 and 6 and their proofs establish new challenges in fair influence maximization.
These challenges arise due to the coupling of the utilities as a result of the network structure, and more precisely the between-community edges. The results in Propositions 5 and 6 leave open the following question: "In what classes of networks does there exist a welfare function that satisfies all 6 principles over all instances of influence maximization problems?" As an attempt to answer this question, we empirically show that over various real and synthetic networks, including stochastic block models, there exist welfare functions that obey all of our principles. We conclude this section with the following three remarks.

Remark 1 (Application to Other Settings). Our welfare-based framework can be theoretically applied to different graph-based problems (e.g., facility location), but the algorithmic solution is domain-dependent. The choice of influence maximization is motivated by evidence about discrimination studied in previous work [151, 171].

Remark 2 (Relationship between Principles and Fairness). Monotonicity ensures there is no wastage of utilities. Symmetry enforces that the decision-maker does not discriminate based on communities' names. According to the Independence of Unconcerned Individuals, between two solutions (choices of influencers), only those individuals/communities whose utilities change should impact the decision-maker's preference. Affine Invariance is a natural requirement that the preferences over different solutions should not change based on the choice of numeraire. Finally, the Transfer Principle promotes solutions that are more equitable.

Remark 3 (Selecting the Inequality Aversion Parameter in Practice). In our approach, α is a user-selected parameter that the user can vary to tune the trade-off between efficiency and fairness. Leaving the single parameter α in the hands of the user is a benefit of our approach, since the user can inspect the solution as α is varied to select their preferred solution.
Since a single parameter must be tuned, this can be done without the need for a tailored algorithm. In particular, we recommend that α be selected either by choosing among a moderate number of values and picking the one with the most desirable behavior for the user, or by using the bisection method. Typically, choosing α will reduce to letting the user select how much utility gap they are willing to tolerate: they will select the largest possible value of α for which the utility gap is acceptable.

2.5.3 Group Fairness and Welfare Maximization

The welfare principles reflect the preferences of a fair decision-maker between a pair of solutions. Thus, a welfare function that satisfies all the principles would always rank the preferred (in terms of fairness and efficiency) solution higher. As a result, we can maximize the welfare function to obtain the most preferred solution. We show that the two classes of welfare functions W_α(u) = Σ_{i=1}^N u_i^α / α for α < 1, α ≠ 0, and W_α(u) = Σ_{i=1}^N log(u_i) for α = 0 satisfy 5 of our principles. Hence, as a natural notion of fairness, we can define a fair solution to be a choice of influencers with the highest welfare, as defined in the following optimization problem:

maximize_{A ∈ A*}  W_α(u(A)).  (2.3)

Lemma 3. In the influence maximization problem, any welfare function that satisfies principles 1-5 is monotone and submodular.

It is well known that to maximize any monotone submodular function, there exists a greedy algorithm with a (1 − 1/e) approximation factor [137], which we can also use to solve the welfare maximization problem. Each choice of the inequality aversion parameter α results in a different welfare function and hence a fairness notion. A decision-maker can directly use these welfare functions as the objective of an optimization problem and study the trade-off between fairness and total utility by varying α; see Section 2.6.

2.5.4 Connection to Existing Notions of Fairness

Our framework allows for a spectrum of fairness notions as a function of α.
It encompasses as a special case leximin fairness,‡ a sub-class of MMF, for α → −∞. Proportional fairness [43], a notion for resource allocation problems, is also closely connected to the welfare function for α = 0. It is natural to ask which of the fairness principles are satisfied by the existing notions of fairness for influence maximization. As we discussed in Section 2.4, the existing notions of fairness are imposed by adding constraints to the influence maximization problem. However, our welfare framework directly incorporates fairness into the objective. In order to facilitate the comparison, instead of the constrained influence maximization problems we consider an equivalent reformulation in which we bring the constraints into the objective via the characteristic function of the feasible set. We then have a single objective function which we can treat as the welfare function corresponding to the fairness-constrained problem. More formally, given an influence maximization problem and fairness constraints written as a feasible set F:

max_{A ∈ A*}  Σ_{c ∈ C} N_c u_c(A)   s.t.  u(A) ∈ F.

‡ Leximin is a subclass of MMF. According to its definition, among two utility vectors, leximin prefers the one where the worst utility is higher. If the worst utilities are equal, leximin repeats this process by comparing the second-worst utilities, and so on.

Table 2.1: Summary of the properties of different fairness notions through the lens of welfare principles for influence maximization.

| Notion      | Mono.       | Sym. | Ind. of Unconcerned | Affine | Inf. Transfer | Gap Red.     |
| Exact DP    | ✗ (Prop. 5) | ✓    | ✗ (Prop. 8)         | ✓      | ✗ (Prop. 11)  | ✓ (Prop. 15) |
| Approx. DP  | ✗ (Prop. 6) | ✓    | ✗ (Prop. 8)         | ✗      | ✗ (Prop. 11)  | ✗ (Prop. 15) |
| DC          | ✓ (Cor. 1)  | ✗    | ✗ (Prop. 9)         | ✗      | ✗ (Prop. 12)  | ✗ (Prop. 16) |
| MMF         | ✓ (Cor. 1)  | ✓    | ✗ (Prop. 10)        | ✓      | ✗ (Prop. 13)  | ✗ (Prop. 17) |
| Utilitarian | ✓ (Cor. 1)  | ✓    | ✓                   | ✓      | ✗ (Prop. 14)  | ✗ (Prop. 18) |
| Welfare     | ✓           | ✓    | ✓                   | ✓      | ✓             | ✗ (Prop. 2)  |
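The constrained problem above can be folded into a single objective by assigning −∞ welfare to any infeasible utility vector. A minimal numeric sketch of this device (the helper names are illustrative, and exact demographic parity from Section 2.4 is used as an example feasible set F):

```python
import math

def welfare_constrained(u, sizes, feasible):
    """Characteristic-function form of a fairness-constrained problem:
    size-weighted total utility if u lies in the feasible set F,
    and -inf otherwise."""
    total = sum(n * x for n, x in zip(sizes, u))
    return total if feasible(u) else -math.inf

# exact demographic parity (delta = 0) as the feasible set F:
# all community utilities must be equal
dp_exact = lambda u: max(u) - min(u) == 0
```

Any solution violating the constraint is then automatically ranked below every feasible one, which is what lets the constrained notions be compared against the welfare principles on the same footing.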
We consider the following equivalent optimization problem:

max_{A ∈ A*}  Σ_{c ∈ C} N_c u_c(A) + I_F(u(A)) := max_{A ∈ A*} W_F(u(A)),

in which I_F(u) is equal to 0 if u ∈ F and −∞ otherwise. Using this new formulation, we can now examine each of the existing notions of fairness through the lens of the welfare principles. Given the new interpretation, to show that a fairness notion does not satisfy a specific principle, it suffices to show that there exist solutions A, A′ ∈ A* and corresponding utility vectors u = u(A) and u′ = u(A′) such that the principle prefers u over u′ but W_F(u) < W_F(u′). The results are summarized in Table 2.1, where in addition to comparing with the previous notions introduced in Section 2.4, we compare with the utilitarian notion, i.e., Problem (2.1). We provide formal proofs for each entry of Table 2.1 in Appendix B. We observe that none of the previously defined notions of fairness for influence maximization satisfies all of our principles, and each existing notion violates at least 3 out of 6 principles. We point out that exact DP is the only notion that satisfies utility gap reduction. However, this comes at a cost, as enforcing exact DP may result in a significant reduction in total utility in the fair solution compared to the optimal unconstrained solution [54].

We evaluate our approach in terms of both the total utility or spread of influence (to account for efficiency) and the utility gap (to account for fairness). We show that by changing the inequality aversion parameter, we can effectively trade off efficiency with fairness. As baselines, we compare with DC and MMF. To the best of our knowledge, there is no prior work that handles DP constraints over the utilities. We follow the approach of [175] for both problems and view them as multi-objective submodular optimization with the utility of each community being a separate objective.
They propose an algorithm and implementation with an asymptotic (1 − 1/e) approximation guarantee, which we also utilize here. We use the Price of Fairness (PoF), defined as the percentage loss in the total influence spread, as a measure of efficiency. Precisely, PoF := 1 − OPT_fair / OPT_IM, in which OPT_fair and OPT_IM are the total influence spread with and without fairness, respectively. Hence PoF ∈ [0,1] and smaller values are more desirable. The normalization in PoF allows for a meaningful comparison between networks with different sizes and budgets, as well as between different notions of fairness. In the PoF calculations, we utilize the generic greedy algorithm [106] to compute OPT_IM. To account for fairness, we compare the solutions in terms of the utility gap. Analogous measures are widely used in the fairness literature [85] and more recently in graph-based problems [72, 171]. We also note that our framework ranks solutions based on their welfare and does not directly optimize the utility gap; as such, our evaluation metric of fairness does not favor any particular approach.

We perform experiments on both synthetic and real networks. We study two applications: community preparedness against landslide incidents and suicide prevention among homeless youth. We discuss the latter in Appendix B. In the synthetic experiments, we use stochastic block model (SBM) networks, a widely used model for networks with community structure [70]. In SBM networks, vertices are partitioned into disjoint communities. Within each community c, an edge between two vertices is present independently with probability q_c. Between any two vertices in communities c and c′, an edge exists independently with probability q_{cc′}, and typically q_c > q_{cc′} to capture homophily [126]. SBM captures the community structure of social networks [81]. We report the average results over 20 random instances and set p = 0.25 in all experiments.

Landslide Risk Management in Sitka, Alaska.
Sitka, Alaska is subject to frequent landslide incidents. In order to improve communities' preparedness, an effective approach is to instruct people on how to protect themselves before and during landslide incidents. Sitka has a population of more than 8,000, and instructing everyone is not feasible. Our goal is to select a limited set of individuals as peer-leaders to spread information to the rest of the city. The Sitka population is diverse, including different age groups, political views, and seasonal and stable residents, where each person can belong to multiple groups. These groups differ in their degree of connectedness. This makes it harder for some groups to receive the intended information and also impacts the cost of imposing fairness. Since collecting the social network data for the entire city is cumbersome, we assume an SBM network and use in-person semi-structured interview data from 2018-2020 with members of Sitka to estimate the SBM parameters. Using the interview responses in conjunction with the voter lists, we identified 5,940 individuals belonging to 16 distinct communities based on the intersection of age groups, political views, and arrival time to Sitka (to distinguish between stable and transient individuals). The sizes of the communities range from 112 (stable, Democrat, 65+ years of age) to 693 (Republican, transient fishing community, age 30-65). See Appendix B for details on the estimation of network parameters.

Figure 2.2: Left and right panels: utility gap and PoF for different K and α values for our framework and baselines.

2.6 Computational Results

Figure 2.2 summarizes results across different budget values K, ranging from 2% to 10% of the network size N, for our framework (different α values) as well as the baselines.
In the left panel, we observe that as α decreases, our welfare-based framework further reduces the utility gap, achieving a lower gap than DC and a gap competitive with MMF. As we noted in Section 2.5.4, our framework recovers leximin (which has stronger guarantees than MMF) as α → −∞, though we show experimentally that this is achieved with moderate values of α. Overall, the utility gap shows an increasing trend with budget; however, the sensitivity to budget decreases when stricter fairness requirements are in place, e.g., in MMF and α = −9.0. From the right panel, PoF varies significantly across different approaches and budget values, surpassing 40% for MMF. This is due to the stringent requirement of MMF to raise the utility of the worst-off as much as possible. The same holds true for lower values of α, as they exhibit higher aversion to inequality. The results also indicate that PoF decreases as K grows, which captures the intuition that fairness becomes less costly when resources are in greater supply. Resource scarcity is present in many practical applications, including the landslide risk management domain, which makes it crucial for decision-makers to be able to study different fairness-efficiency trade-offs to come up with the most effective plan. Figure 2.3 depicts such trade-off curves, where each line corresponds to a different budget value across the range of α. Previous work only allows a decision-maker to choose among a very limited set of fairness notions regardless of the application requirements. Here, we show that our framework allows one to choose α to meaningfully study the PoF-utility gap trade-offs. For example, given a fixed budget and a tolerance on utility gap, one can choose an α with the lowest PoF. We now investigate the effect of relative connectedness. We provide the effect of relative community size in Appendix A.

Figure 2.3: PoF vs.
utility gap trade-off curves. Each line corresponds to a different budget K across different α values.

Figure 2.4: Utility gap and PoF for various levels of q_3. All results are compared across different values of α and the baselines.

Relative Connectedness. We sample SBM networks consisting of 3 communities, each of size 100, where communities differ in their degree of connectedness. We set q_1 = 0.06, q_2 = 0.03, q_3 = 0.0 to obtain three communities with high, mid, and low relative connectedness. We choose these values to reflect asymmetry in the structure of different communities, which mirrors real-world scenarios since not every community is equally connected. We set the between-community edge probabilities q_{cc′} to 0.005 for all c and c′, and K = 0.1N. We gradually increase q_3 from 0.0 to 0.06. Results are summarized in Figure 2.4, where each group of bars corresponds to a different approach. We observe that as q_3 increases, the utility gap and PoF decrease until they reach a minimum at around q_3 = 0.03. From this point, the trend reverses. This U-shaped behavior is due to structural changes in the network. More precisely, for q_3 < 0.03 we are in the high-mid-low connectedness regime for the three groups, where the third community receives the minimum utility. As a result, as q_3 increases it becomes more favorable to choose more influencer vertices in this community, which in turn reduces the utility gap. For q_3 > 0.03, the second community becomes the new worst-off community due to its lowest connectedness. Hence, further increase in q_3 causes more separation in connectedness and we see the previous behavior in reverse. Thus, by further increasing q_3, communities 1 and 3 receive more and more influencer vertices.
This behavior translates to PoF, as the relative connectedness of communities impacts how hard it is to achieve a desired level of utility gap. Finally, we see that the U-shaped behavior is skewed, i.e., we observe a higher gap and PoF in the lower range of q_3, which is due to a higher gap in the connectedness of communities. We can also compare the effect of relative connectedness and community size (see Appendix B). We observe that connectedness has a more significant impact on PoF (up to 25%) compared to community size (less than 4%). In other words, when communities are structurally different, it is more costly to impose fairness. This is an insightful result given that in different applications we may encounter different populations with structurally different networks. The utility gap, on the other hand, is affected by both size and connectedness. Finally, while our theory indicates that in the network setting no welfare function can satisfy all principles, including utility gap reduction, over all instances of influence maximization, we observe that our class of welfare functions satisfies all of the desiderata on the class of networks that we empirically study. Our theoretical results showed this for the special case of networks with disconnected communities. In particular, we see that a higher PoF is accompanied by a lower utility gap, which complies with the utility gap reduction principle.

2.7 Conclusion and Broader Impact

As the empirical evidence highlighting ethical side effects of algorithmic decision-making is growing [8, 129], the nascent field of algorithmic fairness has also witnessed significant growth. It is well established by this point that there is no universally agreed-upon notion of fairness, as fairness concerns vary from one domain to another [136, 27]. The need for different fairness notions can also be explained by theoretical studies which show that different fairness definitions are often in conflict with each other [111, 51, 74].
To this end, most of the literature on algorithmic fairness proposes different fairness notions motivated by different ethical concerns. A major drawback of this approach is the difficulty of comparing these methods against each other in a systematic manner to choose an appropriate notion for the domain of interest. Instead of following this trend, we propose a unifying framework controlled by a single parameter that can be used by a decision-maker to systematically compare different fairness measures, which typically result in different (and possibly also problem-dependent) trade-offs. Our framework also accounts for the social network structure while designing fairness notions – a consideration that has been mainly overlooked in the past. Given these two contributions, it is conceivable that our approach can be used in many public health interventions, such as suicide, HIV or tuberculosis prevention, that rely on social networks. This way, decision-makers can compare a menu of fairness-utility trade-offs proposed by our approach and decide which of these trade-offs is more desirable, without a need to understand the underlying mathematical details that are used in deriving these trade-offs.

There are crucial considerations when deploying our system in practice. First, cardinal welfare is one particular way of formalizing fairness considerations. This by no means implies that other approaches to fairness, e.g., equality-enforcing interventions, should be completely ignored. Second, we have assumed that the decision-maker has full knowledge of the network as well as possibly protected attributes of the individuals, which can be used to define communities. Third, while our experimental evaluation is based on utilizing a greedy algorithm, it is conceivable that this greedy approximation can create complications by imposing undesirable biases that we have not accounted for.
Intuitively (and as we have seen in our experiments), the extreme of inequality aversion (α → −∞) can be used as a proxy for pure equality. However, the last two concerns require more care and we leave the study of such questions as future work.

Part II: Algorithmic Fairness under Observational Data

Chapter 3: Fair and Efficient Housing Allocation Policy Design

3.1 Introduction

We study the problem of designing policies to effectively match heterogeneous individuals to scarce resources of different types. We consider the case where both individuals and resources arrive stochastically over time. Upon arrival, each individual is assigned to a queue where they wait to be matched to a resource. This problem arises in several public systems, such as those providing social services, posing unique challenges at the intersection of efficiency and fairness. In particular, the joint characteristics of individuals and their matched resources determine the effectiveness of an allocation policy, making it crucial to match individuals with the right type of resource. Furthermore, when a resource becomes available, a decision-maker should decide who among the individuals waiting in various queues should receive the resource, which impacts the wait time of different individuals. In addition, since there are insufficient resources to meet demand, there are inherent fairness considerations in designing such policies.

We are particularly motivated by the problem of allocating housing resources among individuals experiencing homelessness. According to the U.S. Department of Housing and Urban Development (HUD), more than 580,000 people experience homelessness on a given night [87]. The Voices of Youth Count study found youth homelessness has reached a concerning prevalence level in the United States; one in 30 teens (13 to 17) and one in 10 young adults (18 to 25) experience at least one night of homelessness within a 12-month period, amounting to 4.2 million persons a year [134].
Housing interventions are widely considered the key solution to address homelessness [92]. In the U.S., the government funds programs that assist the homeless using different forms of housing interventions and services [177]. The HMIS database collects information on the provision of these services. Unfortunately, the number of homeless individuals in the U.S. far exceeds the available resources, which necessitates strategic allocation to maximize the intervention's effectiveness. Many communities have attempted to address this problem by creating coordinated community responses, typically referred to as Coordinated Entry Systems (CES). In such systems, most agencies within a community pool their housing resources in a centralized system called a Continuum of Care (CoC).

Figure 3.1: NST-recommended resource allocation policy utilized by housing allocation agencies in the homelessness context. The policy is in the form of a resource eligibility structure. According to this figure, individuals with score eight and above (8-17) qualify for PSH, those who score 4 to 7 are assigned to the RRH wait list, and individuals who score below 4 (0-3) are not assigned to either of the housing interventions (SO). In the figure, λ_1, λ_2, λ_3 and μ_1, μ_2, μ_3 denote the arrival rates of individuals and resources in the respective queues.

A CoC is a regional or local planning body that coordinates housing and services funding—primarily from HUD—for people experiencing homelessness. Individuals in a given CoC who seek housing are first assessed for eligibility and vulnerability, and those identified as having the greatest need are matched to appropriate housing resources [157]. For example, in the context of youth homelessness, the most widely adopted tool for assessing vulnerability is the Transition Age Youth-Vulnerability Index-Service Prioritization Decision Assistance Tool (TAY-VI-SPDAT): Next Step Tool (NST), which was developed by OrgCode Consulting, Corporation for Supportive Housing (CSH), Community Solutions, and Eric Rice.
OrgCode claims that hundreds of CoCs in the USA, Canada and Australia have adopted this tool [141]. After assessment, each individual receives a vulnerability score ranging from 0 to 17. One of the main challenges that CoCs face is how to use the information about individuals to decide what housing assistance programs should be available to a particular homeless individual. In many communities, based on the recommendations provided in the NST tool documentation, individuals who score 8 to 17 are considered "high risk" and are prioritized for resource-intensive housing programs, or Permanent Supportive Housing (PSH). Those who score in the 4-7 range are typically assigned to short-term rental subsidy programs, or Rapid-ReHousing (RRH), and those with scores below 4 are eligible for services that meet basic needs, which we refer to as Service Only (SO) [158]. Figure 3.1 depicts how individuals are matched to resources according to the status-quo policy.

The aforementioned policy can be viewed as a resource eligibility structure, as from the onset it determines the resources an individual is eligible for. Such policies have the advantage of being interpretable, i.e., it is easy to explain why a particular allocation is made. Earlier work shows that most communities follow the policy recommendations when assigning housing [158]. However, controversy has surrounded the use of these cut scores, and as of December 2020, OrgCode has called for new approaches to using the data collected by HMIS [140]. There is also an overwhelming desire on the part of HUD to design systematic and data-driven housing policies, including the design of the cut scores and the queues that they induce [177]. Currently, the cut scores are not tied to the treatment effect of the interventions or the relative arrival rates of individuals and resources in the respective queues.
This is problematic, as it is not evidently clear that assigning high-scoring and mid-scoring individuals to particular housing interventions, such as PSH or RRH, actually increases their chances of becoming stably housed. Additionally, there may not be enough resources to satisfy the needs of all individuals matched to a particular resource, resulting in long wait times. Prolonged homelessness may in turn increase the chances of exposure to violence, substance use, etc., or of individuals dropping out of the system. In particular, OrgCode and others have called for a new equity focus in how vulnerability tools are linked to housing allocation [140, 128]. Despite recent efforts to understand and mitigate disparities in homelessness, the current system suffers from a significant gap in the prevalence of homelessness across different groups. For example, studies show that most racial minority groups experience homelessness at higher rates than Whites [78]. Also, recent work has revealed that PSH outcomes are worse for Black clients in Los Angeles [128], and based on the same HMIS data used in the present study, Black, Latinx, and LGBQ youth have been shown to experience worse housing outcomes [89]. Addressing these disparities requires an understanding of the distribution of individuals' vulnerability to homelessness, the heterogeneity in the treatment effect, and the associations with protected attributes such as race, gender, or age.

In this work, we build on the literature on causal inference and queuing theory, and we propose a methodology that uses historical data about waitlisted individuals and their allocated resources to evaluate and optimize new resource allocation policies that take policy effectiveness, fairness and wait time into account. We make the following contributions:

• We model the policy optimization problem as a multi-class multi-server queuing system between heterogeneous individuals and resources that arrive over time.
We extend the literature on queuing theory by proposing a data-driven methodology to construct the model from observational data. Specifically, we use tools from modern causal inference to learn the treatment effect of the interventions from data and construct the queues by grouping individuals that have similar average treatment effects. • We propose interpretable policies that take the form of a resource eligibility structure, encoding the resource types that serve each queue. We provide an MIO formulation to optimize the eligibility structure that incorporates flexibly defined fairness considerations or other linear domain-specific constraints. The MIO maximizes the policy effectiveness and guarantees minimum wait time. • Using HMIS data, we conduct a case study to demonstrate the effectiveness of our approach. Our results indicate superior performance along policy effectiveness, fairness, and wait time. Precisely, we are able to obtain wait times as low as a fully FCFS policy while improving the rate of exit from homelessness both overall and for traditionally underserved or vulnerable groups (7% higher for Black individuals and 15% higher for youth below 17 years old). The remainder of this chapter is organized as follows. In Section 3.2, we review the related literature. In Section 3.3, we introduce the policy optimization problem. In Section 3.4, we propose our data-driven methodology for solving the policy optimization problem. Finally, we summarize our numerical experiments and present a case study using HMIS data on youth experiencing homelessness in Section 3.5. Proofs and detailed numerical results are provided in Appendix C. 3.2 Related Work This work is related to several streams of literature which we review. Specifically, we cover queuing theory as the basis of our modelling framework. We also position our methodology within the literature on data-driven policy optimization and causal inference.
We conclude by highlighting recent works on fairness in resource allocation. A large number of scarce resource allocation problems give rise to one-sided queuing models. In these models, resources are allocated upon arrival, whereas individuals queue before being matched. Examples are organ matching [14] and public housing assignment [104]. One stream of literature studies dynamic matching policies to find asymptotically optimal scheduling policies under conventional heavy traffic conditions [10, 122]. Another stream focuses on the system behavior under the FCFS service discipline, aiming to identify conditions that ensure the stability of the queuing system and to characterize the steady-state matching flow rates, i.e., the average rate of individuals of a given queue (or customer class) that are served by a particular resource (server) [45, 67]. These works only focus on minimizing delay and do not explicitly model the heterogeneous service value among the customers. Recently, [60] studied a one-sided queuing system where resources are allocated to the customer with the highest score (or index), which is the sum of the customer's waiting score and matching score. The authors derive a closed-form index that optimizes the steady-state performance subject to specific fairness considerations. Their proposed fairness metric measures the variance in the likelihood of getting service before abandoning the queue. Contrary to their model, we consider FCFS policies subject to resource eligibility structures, which we optimize over. Our model is based on the policies currently being implemented for housing allocation among homeless individuals and is interpretable by design. In addition, our model allows for a more general class of fairness constraints commonly used in practice, including fairness in allocation and in outcome.
Our approach builds upon [3], in which the authors study the problem of designing a matching topology between customer classes and servers under an FCFS service discipline. They focus on finding matching topologies that minimize the customers' waiting time and maximize the matching rewards obtained by pairing customers and servers. The authors characterize the average steady-state wait time across all customer classes in terms of the structure of the matching model, under the heavy-traffic condition. They propose a quadratic program (QP) to compute the steady-state matching flows between customers and servers and prove the conditions under which the approximation is exact. We build on the theoretical results in [3] to design resource eligibility structures that match heterogeneous individuals and resources in the homelessness setting. Contrary to the model in [3], we do not assume that the queues or the matching rewards are given a priori. Instead, we propose to use observational data from the historical policy to learn an appropriate grouping of individuals into distinct queues, estimate the matching rewards, and evaluate the resulting policies. Another stream of literature focuses on designing data-driven policies, where fairness considerations have also received significant attention due to implicit or explicit biases that the models or the data may exhibit [34, 59, 148, 107]. In [34], the authors propose a data-driven model for learning scoring policies for kidney allocation that matches organs at their time of procurement to available patients. Their approach satisfies linear fairness constraints approximately and does not provide any guarantees on wait time. In addition, they take as input a model for the matching rewards (i.e., life years from transplant) to optimize the scoring policy.
In [13], the authors propose a data-driven mixed integer program with linear fairness constraints to solve a similar resource allocation problem, which provides an exact, rather than an approximate, formulation. They also give an approximate solution to achieve faster run-time. We consider a class of policies in the form of matching topologies, which is different from scoring rules and more closely related to the policies implemented in practice. Such policies offer more interpretability, as individuals know what resources they are eligible for from the onset. Several works have considered interpretable functional forms in policy design. For example, in [31, 99], the authors consider decision trees and develop techniques to obtain optimal trees from observational data. Their approach is purely data-driven and does not allow for explicit modelling of the arrival of resources and individuals, which impacts wait time. In the homelessness setting, our work is closely related to [115], which proposes a resource allocation mechanism to match homeless households to resources based on the probability of system re-entry. In that work, the authors provide a static formulation of the problem which requires frequent re-optimization and does not take waiting time into account. In [138], the authors propose a fairness criterion that prioritizes those who benefit the most from a resource, as opposed to those who are the neediest, and study the price of fairness under different fairness definitions. Similar to [115], their formulation is static and does not yield a policy to allocate resources in dynamic environments. 3.3 Housing Allocation as a Queuing System 3.3.1 Preliminaries We model the resource allocation system as an infinite stream of heterogeneous individuals and resources that arrive over time. Each individual is characterized by a (random) feature vector X ∈ X ⊆ R^n and receives an intervention R from a finite set of treatments indexed in the set R.
We note that R may include "no intervention" or minimal interventions such as SO in the housing allocation setting. Using the potential outcomes framework [161], each individual has a vector of potential outcomes Y(r) ∈ Y ⊆ R ∀r ∈ R, where Y(r) is the individual's outcome when matched to resource r. We assume access to N historical observations D := {(X_i, R_i, Y_i)}_{i=1}^N, generated by the deployed policy, where X_i ∈ X denotes the feature vector of the i-th observation, R_i ∈ R is the resource assigned to it, and Y_i = Y_i(R_i) is the observed outcome, i.e., the outcome under the resource received. A (stochastic) policy π(r|x) : X × R → [0,1] maps features x to the probability of receiving resource r. We define the value of a policy as the expected outcome when the policy is implemented, i.e., V(π) := E[ Σ_{r∈R} π(r|X) Y(r) ]. A major challenge in evaluating and optimizing policies is that we cannot observe the counterfactual outcomes Y_i(r), r ∈ R, r ≠ R_i, of resources that were not received by data point i. Hence, we need to make further assumptions to identify policy values from historical data. In Section 3.4, we elaborate on these assumptions and propose our methodology for evaluating and optimizing policies from data. We model the system as a multi-class multi-server (MCMS) queuing system where a set of resources R serves a finite set of individual queues indexed in the set Q. Upon arrival, individuals are assigned to different queues based on their feature vector. For example, in the housing allocation setting and according to the recommended policy, the assignment is based on the vulnerability score. We use p : X → Q to denote the function that maps the feature vector to the queue that the individual will join. We refer to p as the partitioning function (as it partitions X and assigns each subset to a queue) and note that it is unknown a priori.
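As a toy illustration of the policy value V(π) = E[ Σ_{r∈R} π(r|X) Y(r) ], the sketch below simulates a full table of potential outcomes, something never observable in practice, and evaluates a hypothetical stochastic policy against it. The feature distribution, outcome probabilities, and the policy itself are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 3 resources (0 = SO baseline, 1 = RRH, 2 = PSH) and
# N individuals with a one-dimensional feature (e.g., a score in [0, 1]).
N = 1000
X = rng.uniform(0.0, 1.0, size=N)

# Full table of potential outcomes Y_i(r); in the historical data D we would
# only observe Y_i = Y_i(R_i).
Y = np.column_stack([
    rng.uniform(size=N) < 0.3,            # Y(SO)
    rng.uniform(size=N) < 0.3 + 0.3 * X,  # Y(RRH)
    rng.uniform(size=N) < 0.3 + 0.4 * X,  # Y(PSH)
]).astype(float)

def policy(x):
    """A toy stochastic policy pi(r | x); each row sums to one."""
    p_psh = np.clip(x, 0.1, 0.9)
    p_rrh = 0.5 * (1.0 - p_psh)
    p_so = 1.0 - p_psh - p_rrh
    return np.column_stack([p_so, p_rrh, p_psh])

# V(pi) = E[ sum_r pi(r|X) Y(r) ], here a sample average over the cohort.
V = float(np.mean(np.sum(policy(X) * Y, axis=1)))
print(round(V, 3))
```

With access to all potential outcomes, the value is a plain average; the counterfactual problem described above is precisely that the off-diagonal entries of Y are missing in real data.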
In this work, we consider partitioning functions in the form of binary trees, similar to classification trees, due to their interpretability [13]. We assume that individuals arrive according to independent Poisson processes and that the inter-arrival times of resources follow an exponential distribution. These are common assumptions in queuing theory for modeling arrivals. We use λ := (λ_1, ..., λ_{|Q|}) and μ := (μ_1, ..., μ_{|R|}) to denote the vectors of arrival rates of individuals and resources, respectively. We define λ_Q := Σ_{q∈Q} λ_q and μ_R := Σ_{r∈R} μ_r as the cumulative arrival rates of individuals and resources, respectively. Without loss of generality, we assume that λ_q > 0 ∀q ∈ Q and μ_r > 0 ∀r ∈ R.

3.3.2 Matching Policy

Once a new resource becomes available, it is allocated according to a resource eligibility structure that determines which queues are served by any particular resource. The resource eligibility structure can be equivalently represented as a matching topology M := [M_qr] ∈ {0,1}^{|Q|×|R|}, where M_qr = 1 indicates that individuals in queue q are eligible for resource r. Resources are assigned to queues in an FCFS fashion subject to matching topology M. For a partitioning function p and matching topology M, we denote the allocation policy by π_{p,M}(r|x). We concern ourselves with the long-term steady state of the system. Proposition 7 gives the necessary and sufficient conditions for a steady state.

Proposition 7 ([2], Theorem 2.1). Given the MCMS system defined through (Q, R, λ, μ, M), under the FCFS service discipline, matching M admits a steady state if and only if the following condition is satisfied:

Σ_{r∈R̄} μ_r − Σ_{q∈Q_R̄(M)} λ_q > 0 ∀R̄ ⊆ R.

The left-hand side is the cumulative arrival rate of the resources in R̄ in excess of the cumulative arrival rate of all the queues in Q_R̄, where Q_R̄ is the set of queues that are only eligible for resources in R̄, i.e., Q_R̄ = {q ∈ Q : Σ_{r∈R\R̄} M_qr = 0}.
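For small systems, the stability condition of Proposition 7 can be checked mechanically by enumerating subsets of resources. The sketch below is a hypothetical helper, not part of the original implementation; it skips subsets whose induced queue set Q_R̄ is empty, for which the condition is vacuous, and the rates and topologies are invented.

```python
from itertools import chain, combinations

def admits_steady_state(lmbda, mu, M):
    """Check Proposition 7 for an MCMS system.

    lmbda: arrival rates of the |Q| queues; mu: arrival rates of the |R|
    resources; M: |Q| x |R| 0/1 eligibility matrix.  For every subset Rbar of
    resources with a nonempty Q_Rbar, we require
        sum_{r in Rbar} mu_r  >  sum_{q in Q_Rbar} lmbda_q,
    where Q_Rbar contains the queues eligible *only* for resources in Rbar.
    """
    R = range(len(mu))
    Q = range(len(lmbda))
    for Rbar in chain.from_iterable(combinations(R, k) for k in range(len(mu) + 1)):
        outside = [r for r in R if r not in Rbar]
        Q_Rbar = [q for q in Q if all(M[q][r] == 0 for r in outside)]
        if Q_Rbar and sum(mu[r] for r in Rbar) <= sum(lmbda[q] for q in Q_Rbar):
            return False
    return True

lmbda, mu = [0.4, 0.5], [0.6, 0.5]   # mu_R = 1.1 > lambda_Q = 0.9
M_full = [[1, 1], [1, 1]]            # fully connected topology: stable
M_bad = [[1, 0], [1, 0]]             # both queues depend on resource 0 alone
print(admits_steady_state(lmbda, mu, M_full), admits_steady_state(lmbda, mu, M_bad))
# → True False
```

In the second topology, both queues are eligible only for resource 0, whose arrival rate (0.6) cannot keep up with the combined demand (0.9), so no steady state exists.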
According to the above result, we can define the set of admissible matching topologies as those that satisfy the inequality in Proposition 7. In the housing allocation problem, we assume that SO resources are abundant, i.e., μ_R − λ_Q > 0. As a result, there exists at least one admissible matching: the fully connected matching topology M_qr = 1 ∀q ∈ Q, r ∈ R. The abundance assumption is necessary in order to avoid overloaded queues. However, in practice housing resources are strictly preferred. As a result, we propose to study the system under the so-called heavy traffic regime, where the system is loaded very close to its capacity, and we assume that the system utilization parameter ρ := λ_Q/μ_R approaches 1, i.e., ρ → 1. In general, we assume that λ and μ are such that λ_Q = ρ μ_R. This assumption additionally makes the analytical study of the matching system more tractable. In particular, in [3], the authors propose a quadratic program to approximate the exact steady-state flows of the stochastic FCFS matching system under heavy traffic conditions. They enforce the steady-state flows in an optimization model to find the optimal matching topology using KKT optimality conditions. We adopt the same set of constraints in the present work and discuss them in more detail when we present the final optimization formulation. We let F := [F_qr] ∈ R_+^{|Q|×|R|} denote the steady-state flow, where M_qr = 0 ⇒ F_qr = 0. Given a partitioning function p, the policy associated with a matching topology M is equal to π_{p,M}(r|x) = F_qr / Σ_{r'∈R} F_{qr'} = F_qr / λ_q, in which q = p(x), and the second equality follows from the flow balance constraints. In Proposition 8, we show how the policy value can be written using the matching model parameters and the treatment effects of the different interventions. We define the conditional average treatment effect (CATE) of resource r and queue q as τ_qr := E[Y(r) − Y(1) | p(X) = q] ∀r ∈ R, q ∈ Q, in which r = 1 is the baseline intervention.
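Given a flow matrix F and CATE estimates τ (both hypothetical numbers here), the induced policy π_{p,M}(r|x) = F_qr/λ_q and the flow-weighted policy value of Proposition 8 (up to the constant C) reduce to a few array operations:

```python
import numpy as np

# Hypothetical steady-state flows (queues x resources) and CATE estimates;
# in the methodology below, both are learned from data.
F = np.array([[0.30, 0.10],    # queue 1: flow to resources 1 and 2
              [0.05, 0.55]])   # queue 2
tau = np.array([[0.20, 0.05],
                [0.00, 0.35]])

lmbda = F.sum(axis=1)          # flow balance: row sums recover lambda_q
lambda_Q = lmbda.sum()

# Induced policy pi_{p,M}(r | x) = F_qr / lambda_q for q = p(x).
pi = F / lmbda[:, None]

# Proposition 8 (up to the constant C): V = (1/lambda_Q) sum_{q,r} F_qr tau_qr.
V = float((F * tau).sum() / lambda_Q)
print(np.round(pi, 3))
print(round(V, 4))  # → 0.2575
```

Each row of `pi` sums to one, so the flows fully determine the probability with which a queue's members receive each resource.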
In many applications, the baseline intervention corresponds to "no intervention" (also referred to as the control group). In the housing allocation context, we set r = 1 to be the SO intervention.

Proposition 8. Given a partitioning function p, an MCMS model (Q, R, λ, μ, M), and the steady-state flow F under the FCFS discipline, the value of the induced policy π_{p,M} is equal to:

V(π_{p,M}) = (1/λ_Q) Σ_{q∈Q} Σ_{r∈R} F_qr τ_qr + C,

where C is a constant that depends on the expected outcome under the baseline intervention.

3.3.3 Policy Optimization

We now introduce the policy optimization problem under the assumption that the joint distribution of X, Y(r), r ∈ R, as well as the partitioning function p, is known. In Section 3.4, we propose a methodology to construct p. Furthermore, we describe how we can use historical data to optimize new policies.

P(p) := max_{M∈M} V(π_{p,M}). (3.1)

In the above formulation, M is the set of feasible matching topologies. In addition to the constraints that impose steady-state flow, we incorporate fairness and wait time constraints in the set M, which we describe next.

Fairness. In this work, we focus on group-based notions of fairness, which have been widely studied in recent years in various data-driven decision making settings [151, 13, 138, 34]. Formally, we let G be a random variable describing the group that an individual belongs to, taking values in G. For example, G can correspond to protected features such as race, gender, or age. It is also possible to define fairness with respect to other features, such as the vulnerability score in the housing allocation setting. We give several examples to which our framework applies.

Example 1 (Maximin Fair in Allocation). Motivated by Rawls' theory of social justice [156], maximin fairness aims to help the worst-off group as much as possible. Formally, the fairness constraints can be written as

Σ_{q∈Q_g} F_qr ≥ w ∀g ∈ G, r ∈ R,

where w is the minimum acceptable flow across groups and Q_g ⊆ Q is the subset of queues whose individuals belong to G = g.
If queues contain individuals with different values of g, one should separate them by creating multiple queues, each with a unique g. By increasing the parameter w, one imposes stricter fairness requirements. This parameter can be used to control the trade-off between fairness and policy value. It can also be set to the highest value for which the constraint is feasible.

Example 2 (Group-based Parity in Allocation). Parity-based fairness notions strive for equal outcomes across groups:

Σ_{q∈Q_g} F_qr − Σ_{q∈Q_{g'}} F_qr ≤ ϵ ∀g, g' ∈ G, r ∈ R.

In words, for every resource, the difference between the cumulative flows of any pair of groups should be at most ϵ, where ϵ can be used to control the trade-off between fairness and policy value.

Example 3 (Maximin Fair in Outcome). For every group, the policy value should be at least w:

(1/λ_{Q_g}) Σ_{q∈Q_g} Σ_{r∈R} F_qr τ_qr ≥ w ∀g ∈ G.

Example 4 (Group-based Parity in Outcome). The difference between the policy values of any pair of groups is at most ϵ:

(1/λ_{Q_g}) Σ_{q∈Q_g} Σ_{r∈R} F_qr τ_qr − (1/λ_{Q_{g'}}) Σ_{q∈Q_{g'}} Σ_{r∈R} F_qr τ_qr ≤ ϵ ∀g, g' ∈ G.

In the experiments, we focus on fairness in outcome due to treatment effect heterogeneity. In other words, it is important to match individuals with the right type of resource, rather than to ensure that all groups have the same chance of receiving any particular resource. Further, we adopt maximin fairness, which guarantees Pareto optimal policies [148].

Wait Time. Average wait time depends on the structure of the matching topology. For example, the minimum average wait time is attainable in a fully FCFS policy where M_qr = 1 ∀q ∈ Q, r ∈ R. In [3], the authors characterize the general structural properties that impact average wait time. In particular, they show that under the heavy traffic condition, a matching system can be partitioned into a collection of complete resource pooling (CRP) subsystems that operate "almost" independently of each other.
A key property of this partitioning is that individuals who belong to the same CRP component experience the same average steady-state wait time. Furthermore, the average wait time is tied to the number of CRPs of a matching topology, where a single CRP achieves the minimum average wait time. In [3], the authors introduce necessary and sufficient constraints to ensure that the matching topology M induces a single CRP component. We adopt these constraints in order to achieve minimum wait time, as we discuss next.

3.3.4 Optimization Formulation

Suppose the joint distribution of X, Y(r) ∀r ∈ R is known. Given a partitioning function p to assign individuals to queues, problem (3.1) can be solved via the MIO below.

max Σ_{q∈Q} Σ_{r∈R} τ_qr f_qr (3.2a)
s.t. f_qr, ν_qr ∈ R_+, γ_r, θ_q ∈ R ∀q ∈ Q, r ∈ R (3.2b)
M_qr, z_qr ∈ {0,1} ∀q ∈ Q, r ∈ R (3.2c)
g^(k)_qr ∈ R_+ ∀q, k ∈ Q, r ∈ R (3.2d)
Σ_{q∈Q} f_qr = μ_r ∀r ∈ R (3.2e)
Σ_{r∈R} f_qr = λ_q ∀q ∈ Q (3.2f)
f_qr ≤ λ_q μ_r (θ_q + γ_r + ν_qr) + Z(1 − M_qr) ∀q ∈ Q, r ∈ R (3.2g)
f_qr ≥ λ_q μ_r (θ_q + γ_r + ν_qr) − Z(1 − M_qr) ∀q ∈ Q, r ∈ R (3.2h)
f_qr ≤ Z M_qr ∀q ∈ Q, r ∈ R (3.2i)
f_qr ≤ Z z_qr ∀q ∈ Q, r ∈ R (3.2j)
ν_qr ≤ (|Q| + |R| + 1) W (1 − z_qr) ∀q ∈ Q, r ∈ R (3.2k)
Σ_{q∈Q} g^(k)_qr = μ_r ∀r ∈ R, k ∈ Q (3.2l)
Σ_{r∈R} g^(k)_qr = λ_q − δ/(|Q| − 1) ∀q ∈ Q\{k}, k ∈ Q (3.2m)
g^(k)_qr ≤ Z M_qr ∀q, k ∈ Q, r ∈ R (3.2n)
Σ_{r∈R} g^(k)_kr = λ_k + δ ∀k ∈ Q (3.2o)
F ∈ F. (3.2p)

In this formulation, δ := (Π_{q∈Q} v_q Π_{r∈R} v_r)^{−1}, where λ_q = w_q/v_q and μ_r = w_r/v_r are rational number representations. The constants W and Z are defined as W := (1/2) max{max_{q∈Q} 1/λ_q, max_{r∈R} 1/μ_r} and Z := max_{q∈Q} λ_q · max_{r∈R} μ_r · (Σ_{q∈Q} 1/λ_q + Σ_{r∈R} 1/μ_r + (|Q| + |R| + 1)^2 W). Constraints (3.2e) and (3.2f) are the flow balance constraints. The constants W, Z ensure that constraints (3.2g)-(3.2k) impose the KKT conditions of the quadratic program that approximates the steady-state flow for a matching topology M. Constraints (3.2l)-(3.2o) enforce a single CRP component to ensure minimum wait time.
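To make the role of the objective (3.2a) and the flow-balance constraints (3.2e)-(3.2f) concrete, the sketch below solves the much simpler problem obtained by fixing the topology M and dropping the KKT and single-CRP constraints: what remains is a transportation-style LP. All rates and CATE values are hypothetical, and this is an illustration only, not a substitute for the full MIO.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical balanced system (lambda_Q = mu_R) with a fixed topology M.
lmbda = np.array([0.5, 0.5])       # queue arrival rates
mu = np.array([0.4, 0.6])          # resource arrival rates
tau = np.array([[0.3, 0.1],
                [0.0, 0.4]])       # CATE estimates tau_qr
M = np.ones((2, 2), dtype=int)     # fully connected eligibility structure

nQ, nR = tau.shape
c = -tau.ravel()                   # linprog minimizes, so negate (3.2a)

A_eq, b_eq = [], []
for r in range(nR):                # (3.2e): sum_q f_qr = mu_r
    row = np.zeros(nQ * nR); row[r::nR] = 1
    A_eq.append(row); b_eq.append(mu[r])
for q in range(nQ):                # (3.2f): sum_r f_qr = lambda_q
    row = np.zeros(nQ * nR); row[q * nR:(q + 1) * nR] = 1
    A_eq.append(row); b_eq.append(lmbda[q])

# f_qr = 0 wherever M_qr = 0 (here M is all ones, so all edges are free).
bounds = [(0, None) if M.ravel()[j] else (0, 0) for j in range(nQ * nR)]
res = linprog(c, A_eq=np.array(A_eq), b_eq=b_eq, bounds=bounds)

V = -res.fun / lmbda.sum()         # value as in Proposition 8 (up to C)
print(res.x.reshape(nQ, nR).round(3))
print(round(V, 3))  # → 0.33
```

The full formulation additionally searches over M (binary variables), pins f to the heavy-traffic steady-state flows via the KKT constraints, and restricts the search to single-CRP topologies.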
Finally, constraint (3.2p) is a fairness constraint, for which we can use any of the aforementioned examples. In order to solve problem (3.2), several parameters must be estimated. In particular, we need to estimate τ_qr and λ, which depend on p, as well as μ.

3.4 Solution Approach

We first partition X and then estimate the CATE in each subset of the partition. We propose to use causal trees to achieve both tasks simultaneously [183]. Causal trees estimate the CATE of binary interventions by partitioning the feature space into sub-populations that differ in the magnitude of their treatment effects. The method is based on regression trees, modified to estimate the goodness-of-fit of treatment effects. A key aspect of using causal trees for partitioning is that the cut points on features are chosen such that the treatment effect variance within each leaf node is minimized. In other words, individuals who are similar in their treatment effects are grouped together in a leaf node. This results in queues that are tied to the treatment effects of the resources, which in turn improves the policy value (see Section 3.5).

3.4.1 Assumptions

Causal trees rely on several key assumptions which are standard in causal inference for treatment effect estimation [88]. These assumptions are usually discussed for the case of binary treatments. Below, we provide a generalized form of the assumptions for multiple treatments.

Assumption 4 (Stable Unit Treatment Value Assumption (SUTVA)). The treatment that one unit (individual) receives does not change the potential outcomes of other units.

Assumption 5 (Consistency). The observed outcome agrees with the potential outcome under the treatment received.

The implication of this assumption is that there are no different forms of each treatment which lead to different potential outcomes. In the housing allocation setting, this requires that there is only one version of PSH, RRH, and SO.

Assumption 6 (Positivity).
For all feature values, the probability of receiving any form of treatment is strictly positive, i.e., P(R = r | X = x) > 0 ∀r ∈ R, x ∈ X.

The positivity assumption states that any individual should have a positive probability of receiving any treatment. Otherwise, there is no information about the distribution of the outcome under some treatments, and we will not be able to make inferences about it. In Section 3.5, we discuss the implications of this assumption in the context of the HMIS data.

Assumption 7 (Conditional Exchangeability). Individuals receiving a treatment should be considered exchangeable, with respect to the potential outcomes Y(r), r ∈ R, with those not receiving it, and vice versa. Mathematically, (Y(1), ..., Y(|R|)) ⊥ R | X = x ∀x ∈ X.

Figure 3.2: Example partitioning by sample causal trees for PSH and RRH interventions.

Conditional exchangeability means that there are no unmeasured confounders that are a common cause of both treatment and outcomes. If unmeasured confounders exist, it is impossible to accurately estimate the causal effects. In experimental settings, conditional exchangeability is obtained through stratified randomization. In observational settings, however, a decision-maker only relies on passive observations. As a result, in order to increase the plausibility of this assumption, researchers typically include as many features as possible in X to ensure that as many confounders as possible between treatment and outcome are accounted for. In the housing allocation setting, the HMIS database contains a rich set of features (54 features) associated with different risk factors for homelessness, which we use in order to obtain the treatment effect estimates. In Section 3.6, we discuss the consequences of violating the above assumptions.

3.4.2 Building the Partitioning Function

Next, we describe our approach for estimating the CATE.
We first consider a simple case with binary treatments, i.e., |R| = 2, as causal trees work primarily for binary treatments. After training the causal tree using the data on a pair of treatments, the leaves induce a partition on the feature space X. Hence, we can view the causal tree as the partitioning function p, where each individual is uniquely mapped to a leaf node, i.e., a queue. Extending to the case of |R| > 2 is non-trivial. Assuming r = 1 is the baseline intervention, we construct |R| − 1 separate causal trees to estimate the CATE for r ∈ R\{1}. We denote the resulting causal trees or partitioning functions p_r : X → Q ∀r ∈ R\{1}. We define X_r(q) = {x ∈ X : p_r(x) = q} ∀r ∈ R\{1}, q ∈ Q as the set of all individuals who belong to queue q according to partitioning function p_r. Also, let q_r = p_r(x). In order to aggregate the individual partitioning functions to obtain a unified partition on X, we consider the intersections of the sets X_r(q) created by each tree. We define subsets X(q_1, ..., q_{|R|−1}) = ∩_{r=1}^{|R|−1} X_r(q_r) for all combinations of q_r ∈ Q. We can view X(q_1, ..., q_{|R|−1}) as a new (finer) partition on X. We illustrate with an example using the housing allocation setting. Suppose we have constructed two causal trees for PSH and RRH according to Figure 3.2, such that the PSH tree splits the vulnerability score into the intervals [0,6], [7,10], [11,17] and the RRH tree creates the subsets [0,8], [9,17]. According to our procedure, the final queues are constructed using the intersection of these subsets. In other words, we obtain [0,6], [7,8], [9,10], [11,17], which corresponds to four queues. We note that the granularity of the partition can be controlled through the tree depth or the minimum allowable number of data points in each leaf, both of which are adjustable parameters. Finally, in order to estimate τ_qr, we should avoid using the estimates from each individual tree. The reason is that each tree estimates E[Y(r) − Y(1) | p(X) = q, R ∈ {1, r}] ∀r ∈ R\{1}.
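As an aside, for one-dimensional integer scores the intersection step illustrated with Figure 3.2 amounts to taking the union of the trees' cut points. The helper below is hypothetical (not part of the original implementation) and reproduces that example:

```python
def intersect_partitions(cuts_per_tree, lo=0, hi=17):
    """Finest common refinement of interval partitions of an integer range.

    cuts_per_tree: for each causal tree, the upper endpoint of every leaf
    interval except the last (e.g., [6, 10] encodes [0-6], [7-10], [11-17]).
    Returns the combined partition as (low, high) pairs.
    """
    cuts = sorted({c for cuts in cuts_per_tree for c in cuts})
    edges = [lo - 1] + cuts + [hi]
    return [(a + 1, b) for a, b in zip(edges, edges[1:])]

# PSH tree: [0,6], [7,10], [11,17]; RRH tree: [0,8], [9,17]
queues = intersect_partitions([[6, 10], [8]])
print(queues)  # → [(0, 6), (7, 8), (9, 10), (11, 17)]
```

Because the refinement only ever adds cut points, each resulting queue is homogeneous with respect to every tree simultaneously, at the cost of a larger number of queues.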
That is, a subset of the data associated with a pair of treatments is used to build each tree. Therefore, the τ_qr values are not generalizable to the entire population and need to be re-evaluated over all data points that belong to a subset. We adopt the Doubly Robust (DR) estimator for this task. Proposed in [62], DR combines an outcome regression with a model for the treatment assignment (propensity score) to estimate treatment effects. DR is an unbiased estimate of treatment effects if at least one of the two models is correctly specified. Hence, it has a higher chance of reliable inference. The CATE estimates τ̂_qr are provided below:

τ̂_qr = (1/|I_q|) Σ_{i∈I_q} [ ŷ(X_i, r) + (Y_i − ŷ(X_i, R_i)) 1(R_i = r)/π̄(R_i|X_i) ] − (1/|I_q|) Σ_{i∈I_q} [ ŷ(X_i, 1) + (Y_i − ŷ(X_i, R_i)) 1(R_i = 1)/π̄(R_i|X_i) ],

where I_q := {i ∈ {1, ..., N} : p(X_i) = q} is the set of indices in the historical data that belong to q. Further, ŷ and π̄ are the outcome and historical policy (i.e., propensity score) models, respectively. We end this section by discussing a practical consideration, namely the desire to design policies that depend on low-dimensional features, such as risk scores. In cases where we only use risk scores, not the full feature vector, it is critical that they satisfy the causal assumptions. We provide a risk score formulation that satisfies this requirement.

Proposition 9. We define risk score functions as S_r(x) = P[Y(r) = 1 | X = x] ∀r ∈ R. Suppose S ∈ S is a (random) vector of risk scores. Also, let Y = (Y(1), ..., Y(|R|)) be the vector of potential outcomes. The following statements hold:

1. If Y ⊥ R | X, then Y ⊥ R | S.
2. If P(P(R = r | X = x) > 0) = 1 ∀x ∈ X, then P(P(R = r | S = s) > 0) = 1 ∀s ∈ S.

Under the causal assumptions, S_r(x) = P(Y(r) = 1 | X = x, R = r) = P(Y = 1 | X = x, R = r), which relies on observed data rather than counterfactuals.
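A direct numpy transcription of the DR estimate τ̂_qr might look as follows. The data, outcome model ŷ, and propensities π̄ are invented for illustration, and the example is deliberately constructed so that the outcome model is exact (zero residuals), in which case DR reduces to the plug-in difference of regression predictions.

```python
import numpy as np

def dr_cate(Y, R_obs, q_mask, r, y_hat, pi_bar, baseline=0):
    """Doubly robust CATE estimate tau_hat_{qr} over the individuals in I_q.

    Y: observed outcomes; R_obs: received treatments; q_mask: boolean mask
    for queue q; y_hat[i, r]: outcome-model prediction yhat(X_i, r);
    pi_bar[i, r]: historical propensity pibar(r | X_i); baseline: index of
    the reference treatment (r = 1 in the text, index 0 here).
    """
    idx = np.flatnonzero(q_mask)
    resid = Y[idx] - y_hat[idx, R_obs[idx]]       # Y_i - yhat(X_i, R_i)
    p_obs = pi_bar[idx, R_obs[idx]]               # pibar(R_i | X_i)
    arm = y_hat[idx, r] + resid * (R_obs[idx] == r) / p_obs
    base = y_hat[idx, baseline] + resid * (R_obs[idx] == baseline) / p_obs
    return float(np.mean(arm - base))

# Tiny hypothetical example: 4 individuals, 2 treatments (0 = baseline).
Y = np.array([1.0, 0.0, 1.0, 1.0])
R_obs = np.array([0, 1, 1, 0])
y_hat = np.array([[1.0, 0.8],
                  [0.2, 0.0],
                  [0.5, 1.0],
                  [1.0, 0.9]])
pi_bar = np.full((4, 2), 0.5)
tau_hat = dr_cate(Y, R_obs, np.ones(4, dtype=bool), r=1,
                  y_hat=y_hat, pi_bar=pi_bar)
print(tau_hat)
```

With exact outcome predictions, the correction terms vanish and τ̂ equals the mean of ŷ(·, r) − ŷ(·, 1), which is essentially zero for these numbers; with a misspecified outcome model, the propensity-weighted residual terms supply the correction.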
According to Proposition 9, since in general individuals respond differently to different treatments, one risk score per resource may be required in order to summarize the information in X. Alternatively, one can utilize all features to learn the propensity and outcome models and use those estimates in the causal tree construction.

3.5 Computational Results

We conduct two sets of experiments to study the performance of our approach to designing resource allocation policies: (i) synthetic experiments where the treatment and potential outcomes are generated according to a known model; (ii) experiments on the housing allocation system based on HMIS data for youth experiencing homelessness. We use the causal tree implementation in the grf package in R. We control the partition granularity by changing the minimum node size parameter, which is the minimum number of observations in each tree leaf. We evaluate policies using three estimators from the causal inference literature [62]: Inverse Propensity Weighting (IPW), which corrects the mismatch between the historical policy and the new policy by re-weighting the data points with their propensity values; the Direct Method (DM), which uses regression models to estimate the unobserved outcomes; and DR. In addition, we include the objective value of Problem (3.2) obtained from the matching flow and CATE estimates (CT). When the models of outcome and propensity are correctly specified, the above estimators are all unbiased [62].

3.5.1 Synthetic Experiments

We generate synthetic potential outcomes and resource assignments in the HMIS data collected between 2015 and 2017 from 16 communities across the United States [46]. We use the following setting based on the vulnerability score S (unless mentioned otherwise): π̄(SO|S > 0.2) = 0.3, π̄(SO|0.0 < S ≤ 0.2) = 0.3, and π̄(SO|S ≤ 0.0) = 0.3. Additionally, π̄(RRH|S > 0.2) = 0.2, π̄(RRH|0.0 < S ≤ 0.2) = 0.4, and π̄(RRH|S ≤ 0.0) = 0.3. Finally, π̄(PSH|S > 0.2) = 0.5, π̄(PSH|0.0 < S ≤ 0.2) = 0.3, and π̄(PSH|S ≤ 0.0) = 0.4.
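The bucket-wise historical policy above can be simulated directly. In the sketch below the score distribution is hypothetical (the experiments use HMIS-derived scores), but the assignment probabilities are exactly the ones just listed; note that within each score bucket the three probabilities sum to one.

```python
import numpy as np

rng = np.random.default_rng(2)

# Historical policy pibar(. | S), bucket-wise; columns are [SO, RRH, PSH].
probs = {
    0: [0.3, 0.3, 0.4],   # S <= 0.0
    1: [0.3, 0.4, 0.3],   # 0.0 < S <= 0.2
    2: [0.3, 0.2, 0.5],   # S > 0.2
}

# Hypothetical score distribution standing in for the HMIS-derived scores.
S = rng.normal(0.1, 0.2, size=1000)

# right=True gives: S <= 0.0 -> 0, 0.0 < S <= 0.2 -> 1, S > 0.2 -> 2.
bucket = np.digitize(S, bins=[0.0, 0.2], right=True)
R_obs = np.array([rng.choice(3, p=probs[b]) for b in bucket])

# Empirical shares of SO / RRH / PSH in the simulated assignments.
print(np.bincount(R_obs, minlength=3) / len(R_obs))
```

The resulting (S, R_obs) pairs, combined with outcomes drawn from the score-dependent binomial model described next, give a complete synthetic dataset on which the IPW, DM, and DR estimators can be compared against the known ground truth.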
The potential outcomes are sampled from binomial distributions with probabilities that depend on S. For PSH, we use E[Y(PSH)|S ≤ 0.3] = 0.6, E[Y(PSH)|0.3 < S ≤ 0.5] = 0.2, and E[Y(PSH)|0.5 < S] = 0.6. For RRH, E[Y(RRH)|S ≤ 0.2] = 0.2, E[Y(RRH)|0.2 < S ≤ 0.7] = 0.6, and E[Y(RRH)|0.7 < S] = [...]

Figure 3.5: Out-of-sample rates of exit from homelessness by race (left panel) and age (right panel) using the DR estimator.

Figure 3.6: Optimal Topology

[...] policy does not necessarily result in policies that are fair in terms of their outcomes, either by age or by race. This is because FCFS policies ignore treatment effect heterogeneity. In other words, according to the FCFS discipline, everyone has the same probability of receiving any one of the resource types (fairness in allocation). However, not everyone benefits equally from the resources. Indeed, Black individuals seem to suffer the most under a fully FCFS policy. SQ also yields a low worst-case performance, mainly due to its low overall performance. SQ (data) has relatively better worst-case performance. However, there is still a significant gap between the performance of the Black/Other groups and Whites. By explicitly imposing fairness constraints on policy outcomes across protected groups, OPT-Fair significantly improves the performance for the Black and Other groups. In Figure 3.5, similar observations can be made for fairness by age, where, compared to baselines with no fairness considerations, OPT-Fair exhibits significant improvements in the policy value for those below age 17. We now present a schematic diagram of the OPT and OPT-Fair matching topologies. Figure 3.6 shows the matching topology corresponding to the OPT policy. Compared to SQ, OPT uses different cut points on the NST score, specifically for the lower-scoring individuals. Across the four score groups, we observe a gradual transition from eligibility for a more resource-intensive intervention (PSH) to a basic intervention (SO).
Figure 3.7 depicts the OPT-Fair topology for fairness on race, in which queues are constructed using the joint values of NST score and race. According to this figure, PSH is matched to all individuals with scores above 9, as well as to mid-scoring Black individuals, i.e., 6 ≤ score ≤ 9. RRH is connected to every individual in the mid-score range. Our modeling strategy uses the protected characteristics in order to ensure fairness. This is motivated by discussions with our community advisory board, including housing providers/matchers and people with a past history of homelessness, who suggested that in order to create a fair housing allocation system there ought to be special accommodations for historically disadvantaged people. Our policies align with affirmative action policies that take individuals' protected attributes into account in order to overcome present disparities resulting from past practices, policies, or barriers by prioritizing resources for underserved or vulnerable groups. In this regard, HUD recently restored the Affirmatively Furthering Fair Housing rule, which requires "HUD to administer its programs and activities relating to housing and urban development in a manner that affirmatively furthers the purposes of the Fair Housing Act", extending the existing non-discrimination mandates [58].

Figure 3.7: Matching topology split by resource type: left (SO), middle (RRH) and right (PSH). Individuals are divided into four different score groups: S < 6, S ∈ {6,7}, S ∈ {8,9}, S > 9. Queues are constructed based on score groups and race jointly.
Solid lines indicate that a resource is connected to the entire score group (a collection of queues). Dotted lines indicate connection to a single queue within the score group. For example, in the left figure, SO is only connected to the individuals with S ∈ {6,7} and race White.

Figure 3.8: Fair topology (race).

Our approach can also be extended to non-affirmative policies. This is possible by imposing constraints that ensure a topology has the same connections to all protected groups within a score group. Such constraints are expressible as linear constraints and can be easily incorporated in Problem (3.2). We demonstrate the result for fairness on race in Figure 3.8. We observe that all individuals who belong to a certain queue, regardless of their race, are eligible for the same types of resources. However, as a result of combining the queues, the worst-case policy value across the racial groups decreases from 0.76 to 0.73, which still outperforms SQ and SQ (data), with worst-case values of 0.61 and 0.69, respectively. We defer the results for fairness by age to Appendix C.

3.6 Conclusion and Broader Impact

Recently, there has been significant growth in algorithms that assist decision-making across various domains [159, 82, 168, 132]. Homelessness is a pressing societal problem with complex fairness considerations which can benefit greatly from data-driven solutions. As empirical evidence on the ethical side effects of algorithmic decision-making grows, care needs to be taken to minimize the possibility of indirect or unintentional harms from such systems. We take steps towards this goal. Specifically, we propose interpretable data-driven policies that make it easy for a decision-maker to identify and prevent potential harms. Further, we center our development around issues of fairness that can creep into data from different sources, such as past discriminatory practices.
We provide a flexible framework to design policies that overcome such disparities while ensuring efficient allocations in terms of wait time and policy outcome. There are also crucial considerations before applying our framework in the real world. Our approach relies on several key assumptions about the data. Specifically, the consistency assumption requires that there is only one version of PSH, RRH, and SO. In practice, different organizations may implement different variants of these interventions, for example, combining substance abuse intervention with PSH and RRH. Such granular information about the interventions, however, is not currently recorded in the data, which may impact CATE estimates. Further, the exchangeability assumption requires that there are no unobserved confounders between treatment assignment and outcomes. Even though our dataset consists of a rich set of features for each individual, in practice, unobserved factors may influence the allocation of resources, which calls for more rigorous inspection of service assignment processes. Unobserved confounders may lead to biased estimates of treatment effects, which in turn impacts the allocation policies. In addition, our dataset consists of samples from 16 communities across the U.S., which may not be representative of new communities or populations. Hence, the external validity of such policies should be carefully studied before applying them to new populations. Finally, there are other domain-specific constraints that we have not considered as they require collecting additional data. For example, resources cannot be moved between different CoCs. We leave such considerations to future work.

Chapter 4

Causal Inference for Ethical Decision-Making

4.1 Introduction

Recently, there has been growing interest in applying causality for unfairness evaluation and mitigation [116, 110, 135, 50].
Causality provides a conceptual and technical framework for addressing questions about the effect of (hypothetical) interventions on, in this context, sensitive attributes such as race, gender, etc. This is in contrast with fairness criteria that merely rely on passive observations [44, 102, 85, 191, 111]. Observational criteria achieve fairness by constraining the relationships between variables, often in conflicting ways. Consequently, it has been shown that it is impossible to satisfy these criteria simultaneously on a dataset [111, 51, 169]. Causality helps unify these different perspectives by shifting the focus from association to causation in order to identify and mitigate the sources of disparity. This perspective is also more compatible with the legal requirements of evaluating algorithmic bias discussed in earlier work [186]. Nevertheless, causal fairness too has been subject to criticism. One objection concerns the validity of the assumptions in causal modeling. The majority of recent research on causal fairness has focused on structural causal models, which encode the relationships between variables via a Directed Acyclic Graph (DAG) [50, 116, 135]. In realistic settings, however, constructing the DAG model is a challenging task. In particular, it is generally difficult to come up with arguments for the absence of links without conducting controlled experiments [96]. Causal discovery from observational data also relies on strong untestable assumptions and does not generally pin down all possible causal details in a DAG [121, 170]. There are also concerns about considering categories such as race or gender as a cause [123, 113, 93, 105]. From one perspective, most of these attributes are determined at the time of an individual's conception and are modeled as source nodes in a causal graph, which can directly or indirectly influence the descendant variables. This view raises several major issues.
Through such conceptualization, in order to evaluate and mitigate unfairness, one is inevitably required to identify all possible pathways through which sensitive attributes influence an outcome. In addition to the modeling challenge this view poses, in practice a single entity may not be held liable for the discrimination across an entire causal pathway. In this regard, many anti-discrimination mechanisms investigate whether a particular person or institutional actor has behaved in a discriminatory manner. For example, in the employment setting, a racial discrimination lawsuit aims to determine whether a firm has withheld some benefits, such as hiring, with regard to the racial identity of the applicant. However, disparities in hiring rates for different groups might be a reflection of either discrimination or differences in the applicant pool's qualifications. For example, if past discrimination in the educational system has led to some applicants having lower educational achievements, by hiring based on educational achievements, the employer will perpetuate the effects of this discrimination. Under anti-discrimination law, however, as long as the employer makes the hiring decision based on educational achievements that are legitimately connected with the job and business needs—with no regard to race—no liability is attached. In fact, if the employer seeks to proactively address past societal discrimination, this could lead to reverse discrimination lawsuits [130]. Another issue is post-treatment bias, which arises when one controls for post-treatment variables, resulting in biased estimates of the treatment effect [160]. Since some attributes such as race, gender, etc. are fixed at the time of one's conception, almost all measurable variables become post-treatment. Hence, conditioning on those variables may lead to misleading estimates of discrimination.
Removing those variables, e.g., as proposed in [116, 135], leaves little to no information for valid causal analysis. Alternatively, many view attributes such as race or gender as social constructs that evolve over the course of an individual's life. Recently, [94, 105] studied epistemological and ontological aspects of counterfactuals in the context of fairness evaluation. In [94], the authors argue that social categories such as race may not admit counterfactual manipulation. In [105], the authors aim to address this problem by proposing a set of tenets which require a decision-maker to state implicit and unspecified assumptions about social ontology as explicitly as possible. Despite recent efforts, there has been limited empirical investigation of how the nature of the intervention impacts the scope and validity of causal analysis of sensitive attributes and the conclusions one draws. In this work, we investigate the practical and epistemological challenges of applying causality for fairness evaluation. In particular, we highlight two key aspects that are often ignored in the current causal fairness literature: the nature and timing of the interventions on social categories such as race, gender, etc. Further, we discuss the impact of this specification on the plausibility of causal assumptions. To facilitate this discussion, we draw a distinction between intervening on immutable attributes and their perception, and demonstrate how such conceptualization allows us to disentangle the potential unfairness along causal pathways and attribute it to the respective actors. The idea that perceptions matter and can be manipulated is not new. For example, researchers have examined the effect of manipulated names associated with political speeches [164] and resumes [29]. Nevertheless, in the machine learning literature, little attention has been paid to the consequences for valid causal inference for unfairness evaluation and mitigation.
We make the following contributions:

• We propose a causal framework to investigate and mitigate unfairness of a particular actor's behavior, along a causal pathway. To the best of our knowledge, no prior work has aimed to isolate such effects for fair prediction. To tackle this problem, we highlight the importance of identifying the timing and nature of the intervention on social categories and its impact on conducting valid causal analysis, including avoiding post-treatment bias.

• We illustrate how causality can address the limitations of existing fairness criteria, including those that depend upon statistical correlations. In particular, we introduce the causal variants of the popular statistical criteria for fairness and we make a novel observation that under the causal framework there is indeed no fundamental disagreement between different fairness definitions.

• We conduct extensive experiments where we demonstrate the effectiveness of our methodology for unfairness evaluation and mitigation compared to common baselines. Our results indicate that the causal framework is able to effectively identify and remove disparities at various stages of decision-making.

4.2 Related Work

There are two main frameworks for causal inference: structural causal models [90], also referred to as DAGs, and the potential outcomes framework (POF) [161]. DAGs can be viewed as a sequence of steps for generating a distribution from independent noise variables. Causal queries are performed by changing the value of a variable and propagating its effect through the DAG [90]. POF, on the other hand, starts by defining the counterfactuals with reference to an intervention and postulates potential outcomes under different interventions, albeit some unobserved. In general, DAGs encode more assumptions about the relationships of the variables; i.e., one can derive potential outcomes from a DAG, but potential outcomes alone are not sufficient to construct the DAG.
Consequently, POF has been more widely adopted in empirical research, including bias evaluation outside of ML [29, 179]. A more detailed discussion of the differences between the two frameworks in relation to empirical research can be found in [96]. Causal inference on immutable attributes has appeared in several works, including [179, 110] via proxy variables and [83] through the perception of an immutable attribute. In this work, we follow in the footsteps of [83] and provide a rigorous framework to reason about the causal effect of immutable attributes, which helps avoid some of the common issues in causal inference, including post-treatment bias. Recently, there has been much interest in causality in the machine learning community, where the majority of works have adopted the DAG framework [116, 110, 194, 193, 120, 50], with a few exceptions that rely on POF [135, 108]. Specifically, [116] provides an individual-based causal fairness definition that renders a decision fair towards an individual if it is the same in the actual world and a counterfactual world where the individual possessed a different sensitive attribute. In [110], the authors propose proxy discrimination as (indirect) discrimination via proxy variables such as name, visual features, and language, which are more amenable to manipulation. Additionally, [135, 50] study path-specific discrimination, where the former proposes to remove the descendants of the protected attribute under the unfair pathway and the latter aims to correct those variables. In [108], the authors propose two causal definitions of group fairness: fair on average causal effect (FACE) and fair on average causal effect on the treated (FACT), and show how these quantities can be estimated for specific attributes such as race or gender as the treatment. The authors restrict their attention to the fairness evaluation task and do not discuss the distinction between pre- and post-treatment variables.
Further, [194, 193] discuss counterfactual direct, indirect, and spurious effects and provide formulas to identify these quantities from observational data. These works rely on a causal model, or DAG, and develop different methodologies to identify and mitigate unfairness. However, a clear discussion of the causal assumptions is typically missing, which consequently hinders the adoption of these methods in practice. In addition, the validity of the causal assumptions is influenced by the nature of the postulated intervention and its timing, which is not clearly articulated in the current literature. In many applications, discrimination by specific individuals or institutional actors is the subject of a study, not an entire causal pathway. Our work makes this distinction and discusses the importance of specifying the timing and nature of a hypothetical intervention to conduct such analyses. Finally, we briefly review the observational notions of fairness. Demographic parity and its variants have been studied in numerous papers [63, 69, 54]. Also referred to as statistical parity, this fairness criterion requires the average outcome to be the same across different sensitive groups. Conditional statistical parity [69, 54] imposes a similar requirement after conditioning on a set of legitimate factors. In the classification setting, equalized odds and a relaxed variant, equality of opportunity, have been proposed [85] to measure the disparities in the error rate across different sensitive groups. The aforementioned criteria can be expressed using probability statements involving the observed random variables at hand, hence the name observational. These criteria are often easy to state and interpret. However, they suffer from a major limitation: it is impossible to simultaneously achieve these criteria on any particular dataset [111, 51, 169].
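To make these observational criteria concrete, the following sketch computes the demographic-parity gap and the equalized-odds gaps from predictions; the function and variable names are our own illustrative choices, not from the text.

```python
import numpy as np

def demographic_parity_gap(y_hat, a):
    """|P(Yhat=1 | A=1) - P(Yhat=1 | A=0)|: zero under demographic parity."""
    y_hat, a = np.asarray(y_hat), np.asarray(a)
    return abs(y_hat[a == 1].mean() - y_hat[a == 0].mean())

def equalized_odds_gaps(y_hat, y, a):
    """Gaps in P(Yhat=1 | Y=y, A=a) for y in {0, 1}, i.e., differences in
    false-positive (y=0) and true-positive (y=1) rates across groups."""
    y_hat, y, a = map(np.asarray, (y_hat, y, a))
    gaps = {}
    for y_val in (0, 1):
        rates = [y_hat[(y == y_val) & (a == g)].mean() for g in (0, 1)]
        gaps[y_val] = abs(rates[1] - rates[0])
    return gaps
```

Both quantities are pure functions of the observed joint distribution of (Yhat, Y, A), which is exactly what makes them "observational" and subject to the impossibility results cited above.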
In this work, we revisit these notions and introduce their causal variants, where we show that under the causal framework, there is no fundamental disagreement between different criteria.

4.3 Causal Fairness: A Potential Outcomes Perspective

We consider a decision-making scenario where X ∈ X ⊆ R^n is the available set of attributes for an individual, which we aim to use in order to make a (discrete) decision Y ∈ {0,1}. An individual is further characterized by a sensitive attribute A ∈ {0,1} for which fair treatment is important. We assume A is a single binary variable; however, our discussion can naturally be extended to cases where A has more than two levels. It also applies when there is more than one sensitive attribute, such as the intersection of race and gender, by considering their joint values. Causal fairness views the unfairness evaluation and mitigation problem as a counterfactual inference problem. For example, we aim to answer questions of the type: What would have been the hiring decision, if the person had been perceived to be of a different gender? or Would the person have been arrested if they had been perceived to be a different race? Such causal criteria are centered around the notion of an intervention or treatment on social categories such as gender and race. Formally, we build on POF [161] and define Y(A), A ∈ {0,1}, as random variables describing the potential outcomes under different values of A, i.e., the outcome after we manipulate one's sensitive attribute A (its perception). It is important to note that for any individual, only one of the values of Y(A) is observed, which is the outcome corresponding to the possessed value of A. Other outcomes are considered counterfactual quantities and are treated as unobservable variables. In this work, we take a decision-maker's perspective, considering how their perception of one's sensitive attribute may lead to different decisions.
Through this conceptualization, it is possible for discrimination to operate not just at one point in time and within one particular domain but at various points within and across multiple domains throughout the course of an individual's life. For example, in the context of racial discrimination, earlier work has recognized potential points of discrimination across different domains including labor market, education, etc. [147].

Figure 4.1: Decision-making timeline: the time when one's sensitive attribute A is perceived determines pre- and post-treatment variables. Here, X is the vector of pre-treatment variables, ˜X is a post-treatment variable and Y is the outcome or decision.

Consequently, we need to specify the point in time at which we wish to measure and mitigate unfairness. In causal terms, this is closely related to the notion of timing of the intervention, i.e., the time at which one's sensitive attribute is perceived by an actor. To illustrate, consider a hiring scenario and suppose we are interested in evaluating whether the hiring decision is fair with respect to gender or not. We can investigate unfairness at different stages, e.g., from the first time an individual comes into contact with the company (e.g., resume review), progresses in the system (during interviews), or when the final decision is being made. We may even take a much broader perspective and investigate the effect of gender from the point an individual attends college and study how gender affects education and subsequently the opportunities in the job market. Indeed, as we expand our view, the causal inference problem we are faced with becomes increasingly more challenging, but the conceptual framework remains valid. Both timing and nature of the intervention impact the conclusions we draw.
For example, under an unfair educational system, a hiring decision that is based on educational achievements will perpetuate those biases, even if it treats individuals fairly given their educational background. Similarly, a discriminatory interview process will result in an unfair hiring decision. However, the difference is that in the latter, the company is now liable for the discriminatory behavior, as it stems from a point in its own decision-making process. The timing of the intervention is thus important in conducting causal analysis. In particular, consider an interview process which is discriminatory, resulting in unfair interview scores for a particular group. In our fairness evaluation, if we control for the interview score, we will find no relationship between gender and the hiring decision, contrary to our intuition that the decision-maker discriminates between female and male candidates through the interview score. This observation is due to post-treatment bias, cautioned against in the causal inference literature, which happens when variables that are fixed after the intervention are used in evaluating the treatment effect [133]. Figure 4.1 demonstrates this over a decision-making timeline. After we fix the point of (hypothetical) intervention on A, variables ˜X ∈ ˜X ⊆ R^m determined afterwards are considered post-treatment variables and in principle are affected by A. Hence, we introduce the counterfactual values ˜X(0), ˜X(1) to differentiate between pre-treatment and post-treatment variables. Consequently, the observed values of post-treatment variables are determined as ˜X = ˜X(0)(1 − A) + ˜X(1)A. Furthermore, the nature of the intervention influences the causal effect that we are able to uncover.
For instance, in the study conducted in [29], the authors manipulated the names on the resumes to measure racial discrimination, which only allowed them to capture the level of discrimination exhibited through the relationship between one's name and the perception of race. Under a different manipulation, e.g., the zip code of the applicant, the outcome of the study would have been different. In observational studies, where the analyst has no control over how an individual's sensitive attribute is perceived, a careful examination of the mechanisms through which one's attributes are perceived is necessary. Indeed, it is possible to identify several mechanisms affecting perceived attributes (e.g., name, clothing, language, etc.). In this case, it is possible to study the joint effect of the mechanisms by modeling the missing counterfactual values, under each mechanism, as random variables with a distribution. The distribution for each individual's missing counterfactual value can then be represented by a stochastic mixture of distributions associated with each mechanism [83]. Building on the above discussion, we define fairness in terms of the treatment effect of a specific intervention on the perceived sensitive attribute at a particular point in time. We refer to this notion as causal parity, and under the POF, we can express it mathematically via the following definition.

Definition 3 (Causal Parity). A decision-making process achieves causal parity if E[Y(1) − Y(0)] = 0.

In the above definition, τ = E[Y(1) − Y(0)] is the treatment effect of A on Y. As stated earlier, both potential outcomes Y(0), Y(1) are not simultaneously observed for any individual. In order to conduct meaningful causal inference to identify the treatment effects, several assumptions are necessary. We review the assumptions and discuss how the precise specification of the intervention helps establish their plausibility.

4.3.1 Causal Assumptions for Identification

Assumption 8.
There is a set of established conditions under which causal inference becomes possible:

• Stable Unit Treatment Value Assumption (SUTVA): It states that the treatment one unit (individual) receives does not change the potential outcomes of other units.

• Consistency: Formally, Y = Y(0)(1 − A) + Y(1)A. In words, Y agrees with the potential outcome under the respective treatment. The implication of this assumption is that there are no two "flavors" or versions of treatment such that A = 1 under both versions but the potential outcome for Y would be different under the alternative versions.

• Positivity: At each level of the pre-treatment variables X, the probability of receiving any form of treatment is strictly positive. Mathematically, P(P(A = a | X = x) > 0) = 1 ∀a ∈ {0,1}, x ∈ X.

• Conditional Exchangeability: It states that those individuals receiving the treatment should be considered exchangeable (with respect to the potential outcomes Y and the post-treatment variable ˜X) with those not receiving the treatment, and vice versa. Mathematically, Y, ˜X ⊥ A | X = x ∀x ∈ X, where Y = {Y(0), Y(1)}, ˜X = {˜X(0), ˜X(1)} and X is the vector of pre-treatment variables.

Earlier works have emphasized the criticality of these assumptions in determining the causal effects [162]. Here, we highlight their importance in the context of fairness evaluation. SUTVA can also be viewed as a non-interference assumption and depends very much on the problem under study and the choice of the decision-maker. For example, for a recruiter as the decider, one should think carefully about whether the recruiter's decision to proceed with an application is independent from case to case. If a recruiter screened three candidates in a row with exceptional resumes, they might raise their standards when judging the fourth resume. In this case, SUTVA is violated, as historical data on other candidates influences future candidates' outcomes.
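A minimal numeric sketch of two of these conditions on simulated data (all names and distributions are illustrative assumptions, not from the text): the observed outcome is assembled from the potential outcomes via the consistency formula, and positivity is checked empirically by requiring both treatment groups in every stratum of a discrete X. With continuous X, one would instead inspect estimated propensity scores.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000
X = rng.integers(0, 3, size=n)       # a discrete pre-treatment covariate (illustrative)
A = rng.binomial(1, 0.3 + 0.1 * X)   # perceived attribute, dependent on X
Y0 = rng.binomial(1, 0.5, size=n)    # potential outcome under A = 0
Y1 = rng.binomial(1, 0.6, size=n)    # potential outcome under A = 1

# Consistency: the observed outcome agrees with the potential outcome
# under the treatment actually received.
Y = Y0 * (1 - A) + Y1 * A

def positivity_holds(X, A):
    # Empirical positivity check: every level of X must contain both A = 0 and A = 1.
    return all({0, 1} <= set(A[X == x]) for x in np.unique(X))
```

Exchangeability, by contrast, cannot be checked this way: it restricts the joint distribution of (Y0, Y1, A) given X, which involves unobserved counterfactuals.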
The consistency assumption means, for example, that for candidates perceived as either male or female, an employer would not base the hiring decision on the level of "manliness." Similarly, the degree of "blackness" of an individual should not affect the decision made for an individual. This assumption, however, can potentially be relaxed with information beyond what is typically assumed. For example, if an accurate estimate of the level of "manliness" or skin color were recorded, then the treatment could be conceptualized as having multiple levels [83]. Consistency can also be viewed as treatment invariance, which we discussed in the previous section in the context of the nature of intervention on social categories. When intervening on social categories such as race, it is possible that different factors contribute to the perception of one's sensitive attribute. Under consistency, one needs to make sure that there is sufficient data in order to capture the different levels of "race." Without such nuanced data, it is still possible to measure the causal effect, but the interpretation changes, as the estimated causal effect is an average of multiple potential treatments. The positivity assumption is also essential in order to identify the treatment effect. It requires that there is not a complete overlap between the treatment assignment and pre-treatment variables. For example, if all of the women in a hiring pool have a PhD, and all of the men only have a Master's degree, then it is not possible to separate the effect of gender discrimination from the effect of educational attainment on the employment decision. Positivity is often easy to verify from the data once the pre-treatment variables X are determined. Conditional exchangeability is one of the cornerstone assumptions for causal inference, which is in principle impossible to verify in observational studies.
Conditional exchangeability in experimental settings can be obtained through stratified randomization. In order to increase the plausibility of this assumption in observational contexts, analysts typically include as many pre-treatment variables as possible to ensure that as many confounders as possible between treatment and outcome are accounted for. Intuitively, the goal is to ensure that once all of the pre-treatment variables X are controlled for, the allocation of individuals between treatment and control is as close to random as possible. In the fairness setting, this would mean that, after controlling for X, the only systematic difference between the two groups is the perception of their protected attribute (i.e., whether they were discriminated against), allowing for an empirical estimate of the effect of discrimination. We note that in the exchangeability assumption, we have the conditional independence of the counterfactuals of both ˜X and Y. This is a key distinction from earlier work [108], which does not differentiate between pre- and post-treatment variables. In more complicated settings, where an individual interacts with multiple parts of a system, we may have more than one choice of decision-maker to study. In such situations, an analyst may have to balance the need to make the exchangeability assumption plausible against the desire to study a decision-maker's behavior early in the decision-making chain. Choosing the timing of the intervention towards the later interactions renders more measured variables pre-treatment, which in turn can make the exchangeability assumption more plausible. However, by treating such variables as pre-treatment and thus conditioning on them in the analysis, the analyst forgoes the detection of any prior discrimination that may have affected the values of these variables.
In cases where there is sufficient data to detect discrimination starting from earlier stages of decision-making, it may still be important to pin down the different sources of discrimination throughout the decision-making process. For example, in the hiring context, suppose that from the onset (the first interaction of the applicant with the company), a rich set of data about the applicant's background and qualifications is collected that allows an analyst to determine the hiring process is unfair towards a group. In such a case, it is important to understand whether discrimination is attributed to the recruitment process, the interview stage or the final hiring decision. Additionally, there may be a long delay between the time of perceiving an individual's sensitive attribute and the outcome. In this case, it may be helpful to use post-treatment variables to improve the precision [11].

4.3.2 Fairness Evaluation

So far, we have examined the causal assumptions and their implications in the context of fairness evaluation. Once the plausibility of the assumptions is established, we can proceed to estimate the treatment effect of A on Y. While there are many approaches to estimating the causal effect, we mainly focus on the direct regression method. We first consider a case where post-treatment variables are absent. Under the causal assumptions, the treatment effect of A can be formulated as

τ = E[Y(1) − Y(0)] = E[E[Y(1) − Y(0) | X = x]] = E[E[Y | X = x, A = 1] − E[Y | X = x, A = 0]],

which can be estimated from observational data via two separate regression models. When post-treatment variables are present, it may be helpful to use them in order to improve the precision of treatment effect estimates. In this case, simply conditioning on those variables will introduce bias in the analysis. Instead, we should treat them as dependent on A.
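The two-regression estimator above can be sketched as follows. With a discrete covariate, the two fitted models reduce to per-stratum outcome means; this is an illustrative simplification under the stated assumptions, not the thesis's actual estimator.

```python
import numpy as np

def direct_regression_ate(X, A, Y):
    """Estimate tau = E[ E[Y|X,A=1] - E[Y|X,A=0] ] for a discrete covariate X.

    Two "regression models" are fit separately on the treated and control
    samples; for discrete X they reduce to per-stratum outcome means, which
    are then averaged over the empirical distribution of X. Positivity is
    assumed: every stratum must contain both treated and control units.
    """
    X, A, Y = map(np.asarray, (X, A, Y))
    tau = 0.0
    for x in np.unique(X):
        mask = X == x
        mu1 = Y[mask & (A == 1)].mean()   # fitted E[Y | X = x, A = 1]
        mu0 = Y[mask & (A == 0)].mean()   # fitted E[Y | X = x, A = 0]
        tau += mask.mean() * (mu1 - mu0)  # weight by the empirical P(X = x)
    return tau
```

With continuous X, the stratum means would be replaced by any two fitted regressions of Y on X, one per treatment arm, evaluated and averaged over the full sample.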
In order to emphasize the causal effect of post-treatment variables on the potential outcomes, we consider potential outcomes Y(A, ˜X(A)) that are indexed by both the treatment and the post-treatment counterfactuals. The treatment effect of A on Y is then given as τ = E[Y(1, ˜X(1)) − Y(0, ˜X(0))]. In the mediation literature, this quantity is known as the total effect [95]. Estimating the total effect poses a considerable identification challenge, as it depends on four counterfactuals (˜X(0), ˜X(1), Y(0, ˜X(0)), Y(1, ˜X(1))) which are not simultaneously observed for any individual. To tackle this problem, we propose to use imputation [163], which is commonly used in the causal inference literature to assign values to unobserved variables in the data. Precisely, in order to obtain the causal effect of A on Y, we sequentially impute the missing variables, each step conditional on the previous one. We first impute the counterfactual post-treatment variables ˜X as a function of the pre-treatment variables X and A. Next, we impute the unobserved Y(A, ˜X(A)) values as a function of the pre-treatment variables X, the post-treatment counterfactuals ˜X and A. Similar sequential imputation techniques have been used in the causal inference literature in order to evaluate the long-term impact of policy shifts [187].

4.3.3 Unfairness Mitigation

In the previous section, we focused solely on fairness evaluation, which we formulated as a causal inference problem on the effect of A on Y. Here, we discuss how we can mitigate unfairness if the treatment effect of A on Y is non-zero. Similar to the previous section, we distinguish between pre- and post-treatment variables, as the post-treatment variables are affected by A. The core idea of our unfairness mitigation approach is to adjust the post-treatment and outcome variables to achieve τ = 0. The idea of adjusting
The idea of adjusting downstream variables, affected by sensitive attributes, has recently been investigated in the fair ML literature in the context of mitigating path-specific effects under the DAG framework [50]. In this work, we are interested in mitigating unfairness that is attributed to a specific actor’s decision-making process, rather than an entire causal path. Intuitively, our approach is based on the assumption that in a fair world, everyone is treated with no regard to their group membership. In other words, we deem a decision-making process fair if everyone is treated as if they belong to the same group, which we refer to as a baseline group. The baseline group can be viewed as either the majority group or a historically advantaged group. We first consider a setting with no post-treatment variables and assume E[Y(1) − Y(0)] ≠ 0. Let A = 0 be the baseline group. If we had access to Y(0) for every individual in the population, we could use it to learn a fair classifier. That is, if we observed the outcome of individuals had they belonged to the baseline group, we could use this data to learn a predictive model. The reason is that when we use Y(0) to learn a model, we effectively eliminate the decision-maker’s unfavorable attitude towards membership in group A = 1. Consequently, we can model the prediction problem as P(Y(0) = 1 | X = x). In the presence of post-treatment variables, we employ a similar approach in order to eliminate the discriminatory effects of A. Therefore, we formulate the prediction problem as P(Y(0) = 1 | X = x, X̃(0) = x̃), in which we use X̃(0), the value of the post-treatment variable had the individual belonged to group A = 0. Remarkably, under this formalization causal parity is automatically achieved, as E[Y(0) − Y(0)] = 0. In words, we effectively assume that the potential outcome for an individual is the one they would receive under A = 0, regardless of the observed value of A.
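Concretely, the baseline-group construction can be sketched as follows (a minimal numpy sketch under illustrative assumptions: a logistic ground-truth model, no post-treatment variables, and Y(0) imputed by fitting an outcome model within the baseline group A = 0 — the specific models and coefficients are our illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 40_000
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Illustrative biased history: decisions directly favor group A = 1.
A = rng.binomial(1, 0.5, n)
X = rng.normal(0.0, 1.0, n)
Y = rng.binomial(1, sigmoid(X + 1.0 * A))  # observed, unfair outcomes

def fit_logistic(Z, y, steps=500, lr=1.0):
    """Plain full-batch gradient-descent logistic regression (numpy only)."""
    Z = np.column_stack([np.ones(len(Z)), Z])
    w = np.zeros(Z.shape[1])
    for _ in range(steps):
        w += lr * Z.T @ (y - sigmoid(Z @ w)) / len(y)
    return lambda Znew: sigmoid(np.column_stack([np.ones(len(Znew)), Znew]) @ w)

# A naive model trained on (X, A) reproduces the bias: at the same x, its
# predictions differ depending on a.
naive = fit_logistic(np.column_stack([X, A]), Y)
naive_gap = np.mean(naive(np.column_stack([X, np.ones(n)]))
                    - naive(np.column_stack([X, np.zeros(n)])))

# Baseline-group approach: estimate P(Y(0) = 1 | X = x) within group A = 0,
# impute Y(0) for everyone, and train the final predictor on imputed labels.
p_y0 = fit_logistic(X[A == 0].reshape(-1, 1), Y[A == 0])
y0_imputed = rng.binomial(1, p_y0(X[:, None]))
fair = fit_logistic(X[:, None], y0_imputed)
```

Because the final predictor is trained on (imputed) baseline-world labels and never sees A, identical individuals from the two groups receive identical scores by construction.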
A key challenge with this approach is that Y(0) values are not observed for every individual. We leverage imputation from the causal inference literature to tackle this problem [163].

4.4 Trade-offs under the Lens of Causality

We now turn to another important aspect of our analysis. We introduce causal variants of common statistical criteria of fairness to study their behavior under the causal lens.

4.4.1 Causal Fairness Definitions

We center our discussion on the criteria with known impossibility results in the fair ML literature.

Definition 4 (Conditional Causal Parity). A decision-making process achieves conditional causal parity if E[Y(1) − Y(0) | X = x] = 0 for all x ∈ X.

The above definition is closely related to conditional statistical parity, which aims to evaluate fairness after controlling for a limited set of “legitimate” factors [102]. The set of legitimate factors significantly impacts the conclusions we draw. However, it is typically assumed as given, e.g., by domain experts. In contrast, in our definition X collects all the pre-treatment variables. Hence, once the nature of the intervention is explicitly defined, all remaining pre-treatment variables can be considered legitimate, since the main effect we aim to identify is the effect of the treatment.

Definition 5 (Causal Equalized Odds). A predictor Ŷ satisfies causal equalized odds if:

P(Ŷ = 1 | Y(0) = 1) = P(Ŷ = 1 | Y(1) = 1),
P(Ŷ = 1 | Y(0) = 0) = P(Ŷ = 1 | Y(1) = 0).

The above definition is the causal counterpart of equalized odds proposed in [85]. It states that the probability of receiving a positive prediction Ŷ = 1 in worlds where everyone is treated as A = 0 or A = 1 should be the same. Therefore, an individual has no preference between these two worlds, since in either world the prediction is the same. Next, we define the causal variant of calibration [111]. Calibration is defined in the context of risk scores.

Definition 6 (Causal Calibration).
Let S ∈ S denote a random variable encoding an individual’s risk score. The risk assignment is well-calibrated within groups if it satisfies the following condition:

P(Y(0) = 1 | S = s) = P(Y(1) = 1 | S = s) for all s ∈ S.

Causal calibration states that a risk score S should have the same meaning in worlds where everyone is treated as A = 0 or A = 1, i.e., the proportion of positive outcomes in either world should be the same for any fixed S = s. Subsequently, we can define causal positive predictive parity.

Definition 7 (Causal Positive Predictive Parity). A predictor Ŷ satisfies causal positive predictive parity if:

P(Y(0) = 1 | Ŷ = 1) = P(Y(1) = 1 | Ŷ = 1).

Causal positive predictive parity is applicable in binary decision-making scenarios and has a similar interpretation to causal calibration in that it requires the proportion of positive outcomes in worlds with A = 0 and A = 1 to be the same given Ŷ = 1. Therefore, an individual with a positive prediction has no reason to feel discriminated against, since in both worlds the rate of positive outcomes is the same.

4.4.2 Trade-offs among Causal Criteria of Fairness

We investigate two main impossibility results known for the statistical fairness criteria and show that there is no fundamental disagreement between their causal variants.

Causal Parity and Conditional Causal Parity. It is easy to see that statistical parity and conditional statistical parity may not be satisfied simultaneously on a dataset. The Berkeley college admission study is a notorious example [40]. In this study, it was shown that while female students were admitted at a lower rate than male students, after controlling for department choice the difference in admission rates between the two groups became insignificant. This observation can be expressed formally as below.

Observation 1. There exists a joint distribution p(X, A, Y) such that conditional statistical parity does not imply statistical parity, i.e., E[Y | X = x, A = 1] − E[Y | X = x, A = 0] = 0 for all x ∈ X does not imply E[Y | A = 1] − E[Y | A = 0] = 0.
Contrary to the above result, it is straightforward to show that conditional causal parity implies causal parity.

Proposition 10. Conditional causal parity implies causal parity. Mathematically,

E[Y(1) − Y(0) | X = x] = 0 for all x ∈ X =⇒ E[Y(1) − Y(0)] = 0.

Proof. The proof follows simply from taking the expectation over X. ■

The intuition behind the above result is that E[Y | A] merely measures the statistical dependence between Y and A and does not differentiate between different sources of dependence, e.g., female students applying to more competitive departments than male students, or a discriminatory admission process. We note that conditional causal parity is a more stringent requirement than causal parity, and the reverse implication in Proposition 10 does not generally hold.

Causal Positive Predictive Parity and Causal Equalized Odds. It is well known that one cannot achieve positive predictive parity or calibration together with equalized odds simultaneously unless either the base rates P(Y | A = a) are equal or the classifier is perfect [111, 51]. Here, we show that no such restrictions are necessary for their causal variants. We first define f : S → {0, 1} as a mapping from the risk score S to the binary prediction Ŷ. For example, f(S) = I(S > θ) classifies the data points based on a threshold. We now present our main results.

Theorem 2. Causal calibration implies causal parity and causal equalized odds.

Proof. First, we note that causal calibration implies causal parity by taking the expectation over s ∈ S. Also, for all s ∈ S, the equality P(Y(0) = 1 | S = s) = P(Y(1) = 1 | S = s) implies:

P(S = s | Y(0) = 1) P(Y(0) = 1) = P(S = s | Y(1) = 1) P(Y(1) = 1)
⇒ P(S = s | Y(0) = 1) = P(S = s | Y(1) = 1)
⇒ P(Ŷ = f(s) | Y(0) = 1) = P(Ŷ = f(s) | Y(1) = 1).

Similarly, we can show:

P(Y(0) = 0 | S = s) = P(Y(1) = 0 | S = s) ⇒ P(Ŷ = f(s) | Y(0) = 0) = P(Ŷ = f(s) | Y(1) = 0).
■

Causal parity, i.e., P(Y(0) = 1) = P(Y(1) = 1), is satisfied if decisions are made regardless of one’s group membership, and differs from the equal base rate assumption, which does not necessarily hold in many applications. The above result shows that there is an inherent compatibility between different causal fairness criteria, as achieving one automatically implies one or two of the other criteria.

Lemma 4. If A/B = C/D and (1 − A)/(1 − B) = (1 − C)/(1 − D), then A = C and B = D.

Proof. From

A/B = C/D ⇒ (A − B)/B = (C − D)/D,
(1 − A)/(1 − B) = (1 − C)/(1 − D) ⇒ (A − B)/(1 − B) = (C − D)/(1 − D),

dividing the first relation by the second yields (1 − B)/B = (1 − D)/D. It follows that A = C and B = D. ■

Theorem 3. Causal equalized odds implies causal parity and causal positive predictive parity.

Proof.

P(Ŷ = 1 | Y(0) = 1) = P(Ŷ = 1 | Y(1) = 1) ⇒ P(Y(0) = 1 | Ŷ = 1)/P(Y(0) = 1) = P(Y(1) = 1 | Ŷ = 1)/P(Y(1) = 1),
P(Ŷ = 1 | Y(0) = 0) = P(Ŷ = 1 | Y(1) = 0) ⇒ P(Y(0) = 0 | Ŷ = 1)/P(Y(0) = 0) = P(Y(1) = 0 | Ŷ = 1)/P(Y(1) = 0).

From Lemma 4, it follows that P(Y(0) = 1) = P(Y(1) = 1) and P(Y(0) = 1 | Ŷ = 1) = P(Y(1) = 1 | Ŷ = 1), where the first and second equalities correspond to causal parity and causal positive predictive parity, respectively. ■

We conclude this section by providing a complementary result that relates conditional causal parity to causal calibration and causal positive predictive parity.

Proposition 11. Given a risk score that is a function of the pre-treatment variables X, i.e., S = h(X), it holds that conditional causal parity implies causal calibration and causal positive predictive parity.

Proof.

P(Y(0) = 1 | X = x) = P(Y(1) = 1 | X = x) for all x ∈ X
⇒ P(Y(0) = 1 | X ∈ h⁻¹(s)) = P(Y(1) = 1 | X ∈ h⁻¹(s)) for all s ∈ S
⇒ P(Y(0) = 1 | h(X) = s) = P(Y(1) = 1 | h(X) = s) for all s ∈ S
⇒ P(Y(0) = 1 | Ŷ = f(s)) = P(Y(1) = 1 | Ŷ = f(s)) for all f(s) ∈ {0, 1}.

Consequently, causal parity and causal equalized odds will be satisfied. ■
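Given samples of the (imputed) potential outcomes, the causal criteria above can be checked directly on finite data. A minimal numpy sketch (the population below is an illustrative one in which decisions ignore group membership, so all violations should vanish up to sampling noise, consistent with the compatibility results):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

# Illustrative fair population: Y(0) and Y(1) are identically distributed
# given the pre-treatment variable X.
X = rng.normal(0.0, 1.0, n)
p = 1.0 / (1.0 + np.exp(-X))
Y0 = rng.binomial(1, p)            # potential outcome Y(0)
Y1 = rng.binomial(1, p)            # potential outcome Y(1)
Yhat = (X > 0).astype(int)         # a classifier built from pre-treatment X

def causal_parity_violation(Y0, Y1):
    return Y1.mean() - Y0.mean()

def causal_equalized_odds_violation(Yhat, Y0, Y1):
    tp = Yhat[Y1 == 1].mean() - Yhat[Y0 == 1].mean()
    fp = Yhat[Y1 == 0].mean() - Yhat[Y0 == 0].mean()
    return tp, fp

def causal_ppp_violation(Yhat, Y0, Y1):
    return Y1[Yhat == 1].mean() - Y0[Yhat == 1].mean()
```

Each function mirrors the corresponding definition, with the two counterfactual “worlds” represented by the arrays Y0 and Y1.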
The above result implies that, for a given set of pre-treatment variables X, if conditional causal parity is satisfied then all other causal fairness criteria discussed in the present work are satisfied, provided that the risk score function h and the classifier f are functions of the pre-treatment variables. Conditional causal parity can be achieved using the imputation technique described in the previous section. Finally, it is important to note that the above results are based on the assumption that the joint distribution of variables is known. In practice, factors such as inadequate sample sizes, modelling choices, hyper-parameter selection, etc. can influence the performance of models across different groups. In the standard ML setting, previous work has aimed to address some of these limitations through careful model selection, additional training data collection, etc. [49].

4.5 Computational Results

We consider a stylized hiring scenario to illustrate our causal unfairness evaluation and mitigation approach. Specifically, we consider a decision-making process that involves two interactions: the interview and the final hiring decision. We study how the timing of the intervention impacts our conclusions. We use A to represent gender, which we draw from a Bernoulli distribution Bern(0.75), with the majority class being male, A = 1. An individual’s qualification is described by a random variable X drawn from a normal distribution N(2α(A − 0.5), 1), where α controls the difference in the average qualifications between genders. Each candidate has a score S reflecting their performance during the interview. We model the score as a binary variable whose mean depends on the qualifications and possibly gender. We have P(S = 1) = σ(X + 2β(A − 0.5)), where σ(z) = 1/(1 + e^{−z}) is the logistic function and β ≥ 0 determines the level of discrimination in S; e.g., when β > 0, being male (A = 1) increases one’s probability of receiving a higher score.
Subsequently, a decision Y is made, indicating whether the candidate receives an offer or not. We use the probabilistic model P(Y = 1) = σ(X + S + 2γ(A − 0.5)), with γ ≥ 0 controlling the level of discrimination in Y for fixed X, S. The vector of potential outcomes, for both S and Y, can be obtained by substituting the respective value of A in the model. We present results across a wide range of α, β and γ values. According to our causal framework, we need to specify the point in time from which the effect of gender is to be assessed. There are two possibilities: after the interview is conducted, or before the interview (as one may be concerned about an unfair interview process). Naturally, this choice will impact our conclusions about whether the system is fair or not. We generate 100,000 data points (X, A, S(0), S(1), Y(0, S(0)), Y(1, S(1))) according to the process explained above. For post-interview fairness evaluation we can use the observed S values, as the score is then a pre-treatment variable. However, when evaluating fairness before the interview, the score becomes post-treatment. In order to impute the missing counterfactual score values S(A), we use logistic regression to model P(S = 1 | X = x, A = a), a ∈ {0, 1}, from which we draw 10 samples. We use a second logistic regression, P(Y = 1 | X = x, S(a) = s, A = a) for all a, s ∈ {0, 1}, to impute the Y(A, S(A)) values. This approach is based on multiple imputation in the causal inference literature [163]. We then use these counterfactual values in the expression that evaluates the treatment effect of A. We compare our causal criteria against statistical fairness definitions, where we measure the fairness violation of a logistic regression model trained to predict Y using observed values of X, S and A. Figure 4.2 depicts a summary of results.

Figure 4.2: Synthetic results in the hiring scenario. Colors denote the evaluation method: causal pre-interview, causal post-interview and statistical. From top to bottom, each row corresponds to a different value of α ∈ {−0.5, 0, 0.5}. Columns are different fairness evaluation criteria (Parity, Positive Pred. Parity, E-Odds (TP), E-Odds (FP)). On the x-axis, we vary the value of β, which reflects the dependence of the interview score on one’s gender. The y-axis shows fairness violation across the four definitions. We note that for the causal approaches we use the causal variants of the fairness definitions. The value of γ is set to 0.2. The error bars show 95% confidence intervals. Depending on the joint setting of the parameters, statistical criteria may erroneously result in an over- or under-estimation of fairness violation. Further, post-interview fairness evaluation does not capture discrimination at earlier points in time.

We can make several key observations. First, the post-interview causal plot remains flat across different values of α, β, exhibiting a constant fairness violation of 0.07 due to the constant γ = 0.2, which is independent of prior discrimination in the interview step. This suggests that early discrimination cannot be captured when one chooses a later time as the point of fairness investigation. In other words, any unfairness in the pre-treatment variables used in the analysis will remain undetected. Pre- and post-interview lines intersect only at β = 0, and pre-interview fairness violation increases monotonically with β across all causal fairness definitions. Statistical fairness definitions exhibit significantly different results.
For example, when α = −0.5, all statistical lines lie below the causal ones, which suggests that they underestimate the true level of discrimination. This is due to the fact that when α < 0, males’ average qualification is lower than females’. However, since β, γ > 0, the interview score and the final decision are in favor of male candidates. Since the statistical criteria fail to disentangle different sources of disparities, these opposing effects cancel, resulting in lower estimates of unfairness. On the other hand, when α > 0, these effects reinforce each other, resulting in an over-estimation of unfairness. Only when α = 0 do statistical parity and causal parity, in Causal (Pre), match, which indicates the sensitivity of statistical criteria to baseline differences between groups (average qualifications). For β, γ = 0 (no discrimination in the interview or the hiring process), our results indicate near-zero estimates for all causal definitions of fairness across different values of α. This confirms that it is indeed possible to satisfy different causal fairness definitions simultaneously, even when there are baseline differences between the qualifications of different groups. Conversely, statistical criteria yield non-zero estimates except for the case where α, β, γ = 0, which points to the equal base rate condition highlighted in previous work [111]. Next, we study the power of our approach to mitigate unfairness. We focus on the setting where α = 0 and β, γ ≠ 0. This is because we aim to remove unfairness in the decision-making process, which is associated with β and γ. Since statistical approaches are not able to disentangle different sources of unfairness, by setting α = 0 we are able to compare our results against those criteria. Specifically, we train a model using our approach by imputing the missing potential outcomes.
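For concreteness, the synthetic generative process of this section can be sketched as follows (a minimal numpy sketch; it reproduces the model described above, and the final causal parity computation uses the drawn potential outcomes directly rather than the multiple-imputation pipeline):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
sigma = lambda z: 1.0 / (1.0 + np.exp(-z))  # logistic function

def generate(alpha, beta, gamma):
    """One draw of the synthetic hiring population described above."""
    A = rng.binomial(1, 0.75, n)                  # gender, majority male A = 1
    X = rng.normal(2 * alpha * (A - 0.5), 1.0)    # qualification
    # Potential interview scores S(a) and offers Y(a, S(a)), for a in {0, 1}:
    S = {a: rng.binomial(1, sigma(X + 2 * beta * (a - 0.5))) for a in (0, 1)}
    Y = {a: rng.binomial(1, sigma(X + S[a] + 2 * gamma * (a - 0.5)))
         for a in (0, 1)}
    # Observed score and decision follow the realized gender A.
    S_obs = np.where(A == 1, S[1], S[0])
    Y_obs = np.where(A == 1, Y[1], Y[0])
    return A, X, S, Y, S_obs, Y_obs

# Pre-interview causal parity violation for a discriminatory setting ...
A, X, S, Y, S_obs, Y_obs = generate(alpha=0.0, beta=1.0, gamma=0.2)
tau_unfair = Y[1].mean() - Y[0].mean()

# ... and for a fair process (beta = gamma = 0) with baseline differences.
A, X, S, Y, S_obs, Y_obs = generate(alpha=0.5, beta=0.0, gamma=0.0)
tau_fair = Y[1].mean() - Y[0].mean()
```

With β = γ = 0 the potential outcomes no longer depend on the substituted gender, so the pre-interview violation vanishes even when α ≠ 0, mirroring the near-zero causal estimates reported above.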
We compare the accuracy and fairness violation with an unconstrained model (No Fairness), as well as the same model after applying one of three common unfairness mitigation algorithms from the literature: the pre-processing method (Re-weighting) of [100], which weights the training examples in each combination of A and Y differently to ensure statistical parity; the in-processing method (PrejudiceRemover) of [103], which adds a regularization term to the learning objective; and the post-processing approach (RejectOption) of [101], which gives favorable outcomes to unprivileged groups and unfavorable outcomes to privileged groups in a confidence band around the decision boundary. We rely on the implementations in the AI Fairness 360 package [22]. The training model used in all of the methods is a logistic regression model. For fairness violation, we consider both the statistical criteria and their causal variants. Table 4.1 summarizes the statistical fairness violation results for β = 1.0 and γ = 0.2.

Fairness Violation (Statistical Criteria)
                    Parity   Positive Pred. Parity   E-Odds (TP)   E-Odds (FP)   Acc. (%)
No Fairness         0.31     0.02                    0.26          0.21          72.7
Re-weighting        0.10     0.11                    0.04          0.03          71.8
PrejudiceRemover    0.16     0.04                    0.02          0.24          74.0
RejectOption        0.05     0.19                    0.09          0.16          72.0
Causal (Pre)        0.03     0.14                    -0.02         -0.04         70.0
Causal (Post)       0.17     0.07                    -0.11         -0.09         72.0

Table 4.1: Fairness violation of statistical criteria and the classification accuracy.

Among all fair baselines, RejectOption and Causal (Pre) perform significantly better in terms of statistical parity. Despite the fact that the other fair baselines are also designed to remove average disparities between groups, they still exhibit significant disparities. On the other hand, RejectOption performs worse than Causal (Pre) with respect to all other criteria. We also note the difference between pre- and post-interview results.
The increase in parity violation in Causal (Post) can be explained by the fact that it only adjusts the outcome variable and assumes disparities in the interview score are acceptable. Disparities in the score will in turn result in different outcomes across groups, but in the post-interview analysis this effect remains undetected. Finally, in terms of accuracy, Causal (Post) achieves comparable results to the other methods. The difference in the accuracy of Causal (Pre) and (Post) is in part due to the fact that accuracy is measured with respect to the observed unfair outcomes. As a result, Causal (Pre), which significantly reduces the gap between female and male candidates, may not conform to the historical decisions. These results highlight the importance of determining the timing of the intervention. Specifically, they suggest that through the causal framework we are able to identify and remove sources of disparities by actively adjusting the affected variables. Finally, we evaluated our approach based on the causal criteria of fairness, choosing pre-interview as the starting time of the fairness assessment.

Fairness Violation (Causal Criteria)
                    Parity   Positive Pred. Parity   E-Odds (TP)   E-Odds (FP)
Causal (Pre)        0.00     0.00                    0.00          0.00
Causal (Post)       0.06     0.02                    0.05          0.01

Table 4.2: Fairness violation of causal criteria.

In Table 4.2, we observe no violation of fairness in Causal (Pre), as expected. However, Causal (Post) exhibits small violations, which is due to the fact that it only mitigates the unfairness due to γ, in the hiring decision.

4.6 Conclusion and Broader Impact

As empirical evidence on the ethical implications of algorithmic decision-making is growing [159, 132, 139, 168], a variety of approaches have been proposed to evaluate and minimize the harms of these algorithms.
In the statistical fairness literature, it is well established that it is not possible to satisfy every fairness criterion simultaneously, which results in significant trade-offs in selecting a metric. On the other hand, in the causal fairness literature, there is substantial ambiguity around how the proposed methods should be applied to a particular problem. Also, these methods rely on assumptions that are often too strong to be applicable in practice. In this work, we addressed some of these limitations. First, we illustrated the utility of applying concepts from the “potential outcomes framework” to algorithmic fairness problems. In particular, we emphasized the timing and nature of the intervention as two key aspects of causal fairness analysis. That is, for any valid causal analysis, it is critical to precisely define the starting point of the fairness evaluation and the postulated intervention. We argue that fairness evaluation is not a static problem and that unfairness can arise at various points, within and across multiple domains. This is in contrast with methods that rely on fixed DAG models. Next, we demonstrated how such a causal framework can address the limitations of existing approaches. Specifically, our theoretical investigation indicates that there is an inherent compatibility between the causal fairness definitions we propose. Finally, we showed the effectiveness of our approach in evaluating and mitigating unfairness associated with different stages of decision-making. We hope that our empirical observations spark additional work on collecting new datasets that lend themselves to temporal fairness evaluation.

Conclusions

This thesis identifies and addresses several challenges with respect to designing fair algorithmic social interventions. The contributions of this thesis are both technical and practical.
On a technical level, this work presents novel computational models that capture real-world complexities such as data and resource scarcity as well as fairness considerations. Specifically, this thesis investigates the interplay of fairness with data limitations, data biases and resource scarcity to develop fair and efficient algorithmic frameworks that solve the resulting optimization models. From a practical perspective, it contributes different intervention models to prevention and social sciences. In particular, this work proposes to use social network information to inform gatekeeper training for suicide prevention and to enhance community resilience against natural hazards. It also presents the first use of quantitative techniques to inform these interventions. Finally, this work proposes an implementable policy model to allocate scarce housing resources to individuals experiencing homelessness. All in all, this thesis covers a subset of challenges in developing effective algorithmic social interventions, and much work remains to be done. Specifically, due to the socio-technical nature of fairness research, there are several social and legal considerations before any of these solutions can be deployed. For instance, one question is related to the choice of social groups in the group-based definitions of fairness. This problem is known as intersectional fairness, which states that disparities can be amplified in subgroups that combine membership from different categories (e.g., race and gender), especially if such a subgroup is particularly under-represented historically [52, 79]. While the methodologies presented in this work cater for fairness over intersections of different groups, there is still ambiguity in how these groups should be identified.
Individual-based definitions, on the other hand, are often too restrictive and are not readily applicable to problems that suffer from resource scarcity, as we cannot ensure that every individual receives a fair share of resources. There are also issues with respect to the legal compatibility of these solutions. In particular, U.S. law prohibits policies that differentiate between individuals based on protected attributes such as race, gender or age [186]. On the other hand, there are exceptions that allow policies that aim to overcome present disparities of past practices, policies, or barriers by prioritizing resources for underserved or vulnerable groups. Existing methods typically use information about one’s group membership to ensure fair distribution of intervention benefits, which may not be immediately compatible with these legal frameworks. As a result, further research is needed to ensure the usability of these solutions from a legal perspective. There are also issues related to unobserved confounders. When estimating the treatment effect of different interventions, a common assumption is that all the confounding factors between treatment and outcome are captured in the data. In observational studies, this assumption is practically impossible to verify and there is always a threat to its validity. As a result, one avenue of research could consider robustifying the estimates against such unobserved factors. Furthermore, due to the context-dependent nature of fairness, re-purposing algorithmic solutions designed for one social context may be misleading or even inaccurate when applied to a different context. Existing frameworks of fairness also suffer from a lack of expressiveness, i.e., they provide point solutions tailored to a specific context.
This thesis aimed to tackle this problem by presenting two unifying frameworks for fairness: fairness in social-network interventions and a causal framework for fairness in decision-making under observational data. Even though the presented frameworks encompass a wide range of problems, they are not universal. For instance, social network-based intervention models with non-submodular utility functions are not handled by the presented framework, and further research is necessary to tackle those problems.

Bibliography

[1] Ignacio Abasolo and Aki Tsuchiya. “Exploring social welfare functions and violation of monotonicity: an example from inequalities in health”. In: Journal of Health Economics 23.2 (2004), pp. 313–329.

[2] Ivo Adan and Gideon Weiss. “A skill based parallel service system under FCFS-ALIS — steady state, overloads, and abandonments”. In: Stochastic Systems 4.1 (2014), pp. 250–299. ISSN: 1946-5238. DOI: 10.1214/13-ssy117.

[3] Philipp Afèche, René Caldentey, and Varun Gupta. “On the Optimal Design of a Bipartite Matching Queueing System”. In: Operations Research (2021). ISSN: 0030-364X. DOI: 10.1287/opre.2020.2027.

[4] Sina Aghaei, Mohammad Javad Azizi, and Phebe Vayanos. “Learning Optimal and Fair Decision Trees for Non-Discriminative Decision-Making”. In: Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence. 2019.

[5] Faez Ahmed, John P. Dickerson, and Mark Fuge. “Diverse weighted bipartite b-matching”. In: Proceedings of the 26th International Joint Conference on Artificial Intelligence. AAAI Press, 2017, pp. 35–41.

[6] Junaid Ali, Mahmoudreza Babaei, Abhijnan Chakraborty, Baharan Mirzasoleiman, Krishna Gummadi, and Adish Singla. “On the Fairness of Time-Critical Influence Maximization in Social Networks”. In: arXiv abs/1905.06618 (2019).

[7] Kareem Amin, Satyen Kale, Gerald Tesauro, and Deepak Turaga. “Budgeted prediction with expert advice”. In: Twenty-Ninth AAAI Conference on Artificial Intelligence. 2015.
[8] Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner. “Machine Bias”. In: ProPublica (2016).

[9] Bryan Anselm. “Suicide on Campus”. In: The New York Times (Aug. 2015). URL: https://www.nytimes.com/2015/08/09/opinion/sunday/suicide-on-campus.html.

[10] Barış Ata and Mustafa H. Tongarlak. “On scheduling a multiclass queue with abandonments under general delay costs”. In: Queueing Systems 74.1 (2013), pp. 65–104. ISSN: 1572-9443. DOI: 10.1007/s11134-012-9326-6.

[11] Susan Athey, Raj Chetty, Guido W. Imbens, and Hyunseung Kang. “The Surrogate Index: Combining Short-Term Proxies to Estimate Long-Term Treatment Effects More Rapidly and Precisely”. In: SSRN (2019). DOI: 10.3386/w26463.

[12] LOS ANGELES HOMELESS SERVICES AUTHORITY. REPORT AND RECOMMENDATIONS OF THE AD HOC COMMITTEE ON BLACK PEOPLE EXPERIENCING HOMELESSNESS. Tech. rep. Feb. 2019. URL: https://www.lahsa.org/news?article=514-groundbreaking-report-on-black-people-and-homelessness-released&ref=ces&te=1&nl=california-today&emc=edit_ca_20200815.

[13] Mohammad-Javad Azizi, Phebe Vayanos, Bryan Wilder, Eric Rice, and Milind Tambe. “Designing fair, efficient, and interpretable policies for prioritizing homeless youth for housing resources”. In: International Conference on the Integration of Constraint Programming, Artificial Intelligence, and Operations Research. Springer, 2018, pp. 35–51.

[14] Chaithanya Bandi, Nikolaos Trichakis, and Phebe Vayanos. “Robust multiclass queuing theory for wait time estimation in resource allocation systems”. In: Management Science 65.1 (2018), pp. 152–187.

[15] Siddharth Barman, Arpita Biswas, Sanath Krishnamurthy, and Yadati Narahari. “Groupwise maximin fair allocation of indivisible goods”. In: Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 32. 2018.

[16] Siddharth Barman, Sanath Kumar Krishnamurthy, and Rohit Vaish. “Finding fair and efficient allocations”. In: Proceedings of the 2018 ACM Conference on Economics and Computation. 2018, pp. 557–574.
[17] Anamika Barman-Adhikari, Stephanie Begun, Eric Rice, Amanda Yoshioka-Maxwell, and Andrea Perez-Portillo. “Sociometric network structure and its association with methamphetamine use norms among homeless youth”. In: Social Science Research 58 (2016), pp. 292–308.

[18] Mohammad-Hossein Bateni, Yiwei Chen, Dragos F. Ciocan, and Vahab Mirrokni. “Fair Resource Allocation in A Volatile Marketplace”. In: Proceedings of the 2016 ACM Conference on Economics and Computation. ACM, 2016, pp. 819–819.

[19] MohammadHossein Bateni, Yiwei Chen, Dragos Florin Ciocan, and Vahab Mirrokni. “Fair resource allocation in a volatile marketplace”. In: Operations Research 70.1 (2022), pp. 288–308.

[20] Xiaohui Bei, Zihao Li, Jinyan Liu, Shengxin Liu, and Xinhang Lu. “Fair division of mixed divisible and indivisible goods”. In: Artificial Intelligence 293 (2021), p. 103436.

[21] Xiaohui Bei, Xinhang Lu, Pasin Manurangsi, and Warut Suksompong. “The Price of Fairness for Indivisible Goods”. In: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19. International Joint Conferences on Artificial Intelligence Organization, July 2019, pp. 81–87. DOI: 10.24963/ijcai.2019/12.

[22] Rachel K. E. Bellamy, Kuntal Dey, Michael Hind, Samuel C. Hoffman, Stephanie Houde, Kalapriya Kannan, Pranay Lohia, Jacquelyn Martino, Sameep Mehta, Aleksandra Mojsilovic, Seema Nagar, Karthikeyan Natesan Ramamurthy, John Richards, Diptikalyan Saha, Prasanna Sattigeri, Moninder Singh, Kush R. Varshney, and Yunfeng Zhang. “AI Fairness 360: An extensible toolkit for detecting, understanding, and mitigating unwanted algorithmic bias”. In: arXiv (2018). ISSN: 2331-8422.

[23] Aharon Ben-Tal, Laurent El Ghaoui, and Arkadi Nemirovski. Robust Optimization. Vol. 28. Princeton University Press, 2009.

[24] Nawal Benabbou, Mithun Chakraborty, Xuan-Vinh Ho, Jakub Sliwinski, and Yair Zick. “Diversity constraints in public housing allocation”.
In: Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems. International Foundation for Autonomous Agents and Multiagent Systems. 2018, pp. 973–981.
[25] Jacques F. Benders. “Partitioning procedures for solving mixed-variables programming problems”. In: Computational Management Science 2.1 (2005), pp. 3–19.
[26] Abram Bergson. “A Reformulation of Certain Aspects of Welfare Economics”. In: The Quarterly Journal of Economics 52.2 (1938), pp. 310–334.
[27] Richard Berk, Hoda Heidari, Shahin Jabbari, Matthew Joseph, Michael Kearns, Jamie Morgenstern, Seth Neel, and Aaron Roth. “A Convex Framework for Fair Regression”. In: 4th Workshop on Fairness, Accountability, and Transparency in Machine Learning. 2017.
[28] Richard Berk, Hoda Heidari, Shahin Jabbari, Michael Kearns, and Aaron Roth. “Fairness in Criminal Justice Risk Assessments: The State of the Art”. In: Sociological Methods & Research (2018).
[29] Marianne Bertrand and Sendhil Mullainathan. “Are Emily and Greg More Employable Than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination”. In: American Economic Review 94.4 (2004). issn: 0002-8282. doi: 10.1257/0002828042002561.
[30] Dimitris Bertsimas and Constantine Caramanis. “Finite adaptability in multistage linear optimization”. In: IEEE Transactions on Automatic Control 55.12 (2010), pp. 2751–2766.
[31] Dimitris Bertsimas, Jack Dunn, and Nishanth Mundru. “Optimal Prescriptive Trees”. In: INFORMS Journal on Optimization 1.2 (2019), pp. 164–183. issn: 2575-1484. doi: 10.1287/ijoo.2018.0005.
[32] Dimitris Bertsimas and Iain Dunning. “Multistage robust mixed-integer optimization with adaptive partitions”. In: Operations Research 64.4 (2016), pp. 980–998.
[33] Dimitris Bertsimas, Vivek Farias, and Nikolaos Trichakis. “The Price of Fairness”. In: Operations Research 59.1 (2011), pp. 17–31.
[34] Dimitris Bertsimas, Vivek F. Farias, and Nikolaos Trichakis.
“Fairness, efficiency, and flexibility in organ allocation for kidney transplantation”. In: Operations Research 61.1 (2013), pp. 73–87. issn: 0030-364X. doi: 10.1287/opre.1120.1138.
[35] Dimitris Bertsimas and Angelos Georghiou. “Binary decision rules for multistage adaptive mixed-integer optimization”. In: Mathematical Programming 167.2 (2018), pp. 395–433.
[36] Dimitris Bertsimas and Angelos Georghiou. “Design of near optimal decision rules in multistage adaptive mixed-integer optimization”. In: Operations Research 63.3 (2015), pp. 610–627.
[37] Dimitris Bertsimas and John Tsitsiklis. “Introduction to linear programming”. In: Athena Scientific 1 (1997), p. 997.
[38] Dimitris Bertsimas and Phebe Vayanos. “Data-driven learning in dynamic pricing using adaptive optimization”. In: Optimization Online (2017).
[39] Dimitris Bertsimas and Robert Weismantel. Optimization over Integers. Vol. 13. 2005.
[40] P. J. Bickel, E. A. Hammel, and J. W. O’Connell. “Sex bias in graduate admissions: Data from Berkeley”. In: Science 187.4175 (1975). issn: 0036-8075. doi: 10.1126/science.187.4175.398.
[41] Arpita Biswas and Siddharth Barman. “Fair Division Under Cardinality Constraints”. In: IJCAI. 2018, pp. 91–97.
[42] Ilija Bogunovic, Slobodan Mitrović, Jonathan Scarlett, and Volkan Cevher. “Robust submodular maximization: A non-uniform partitioning approach”. In: Proceedings of the 34th International Conference on Machine Learning, Volume 70. JMLR.org. 2017, pp. 508–516.
[43] Thomas Bonald and Laurent Massoulié. “Impact of fairness on Internet performance”. In: Joint International Conference on Measurements and Modeling of Computer Systems. 2001, pp. 82–91.
[44] Toon Calders, Faisal Kamiran, and Mykola Pechenizkiy. “Building classifiers with independency constraints”. In: ICDM Workshops 2009 - IEEE International Conference on Data Mining. 2009. doi: 10.1109/ICDMW.2009.83.
[45] Francisco Castro, Hamid Nazerzadeh, and Chiwei Yan. “Matching queues with reneging: a product form solution”.
In: Queueing Systems 96.3-4 (2020), pp. 359–385. issn: 1572-9443. doi: 10.1007/s11134-020-09662-y.
[46] Hau Chan, Eric Rice, Phebe Vayanos, Milind Tambe, and Matthew Morton. “Evidence from the past: AI decision aids to improve housing systems for homeless youth”. In: AAAI Fall Symposium - Technical Report. Vol. FS-17-01 - FS-17-05. Stanford University, United States: AAAI Press, 2017.
[47] André Chassein, Marc Goerigk, Jannis Kurtz, and Michael Poss. “Faster Algorithms for Min-max-min Robustness for Combinatorial Problems with Budgeted Uncertainty”. In: European Journal of Operational Research (2019).
[48] Chandra Chekuri, Jan Vondrak, and Rico Zenklusen. “Dependent randomized rounding via exchange properties of combinatorial structures”. In: 2010 IEEE 51st Annual Symposium on Foundations of Computer Science. IEEE. 2010, pp. 575–584.
[49] Irene Y. Chen, Fredrik D. Johansson, and David Sontag. “Why is my classifier discriminatory?” In: Advances in Neural Information Processing Systems. Vol. 2018-December. 2018.
[50] Silvia Chiappa. “Path-specific counterfactual fairness”. In: 33rd AAAI Conference on Artificial Intelligence, AAAI 2019, 31st Innovative Applications of Artificial Intelligence Conference, IAAI 2019 and the 9th AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019. 2019. doi: 10.1609/aaai.v33i01.33017801.
[51] Alexandra Chouldechova. “Fair prediction with disparate impact: A study of bias in recidivism prediction instruments”. In: 3rd Workshop on Fairness, Accountability, and Transparency in Machine Learning. 2017.
[52] Alexandra Chouldechova and Aaron Roth. “The frontiers of fairness in machine learning”. In: arXiv preprint arXiv:1810.08810 (2018).
[53] Vincent Conitzer, Rupert Freeman, Nisarg Shah, and Jennifer W. Vaughan. “Group fairness for the allocation of indivisible goods”. In: Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI). 2019.
[54] Sam Corbett-Davies, Emma Pierson, Avi Feller, Sharad Goel, and Aziz Huq. “Algorithmic decision making and the cost of fairness”. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2017, pp. 797–806.
[55] Koby Crammer, Jaz Kandola, and Yoram Singer. “Online classification on a budget”. In: Advances in Neural Information Processing Systems 16 (2003).
[56] Sergio Currarini, Matthew O. Jackson, and Paolo Pin. “An economic model of friendship: Homophily, minorities, and segregation”. In: Econometrica 77.4 (2009), pp. 1003–1045.
[57] Hugh Dalton. “The Measurement of the Inequality of Incomes”. In: The Economic Journal 30.119 (1920), pp. 348–361.
[58] Department of Housing and Urban Development. Restoring Affirmatively Furthering Fair Housing Definitions and Certifications. Tech. rep. Office of Fair Housing and Equal Opportunity, HUD, 2021.
[59] John P. Dickerson and Tuomas Sandholm. “FutureMatch: Combining human value judgments and machine learning to match in dynamic environments”. In: Proceedings of the National Conference on Artificial Intelligence. Vol. 1. Austin, Texas, United States: AAAI Press, 2015, pp. 622–628.
[60] Yichuan Ding, S. Thomas McCormick, and Mahesh Nagarajan. “A fluid model for one-sided bipartite matching queues with match-dependent rewards”. In: Operations Research 69.4 (2021). issn: 1526-5463. doi: 10.1287/opre.2020.2015.
[61] Kate Donahue and Jon Kleinberg. “Fairness and utilization in allocating resources with uncertain demand”. In: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. 2020, pp. 658–668.
[62] Miroslav Dudik, John Langford, and Hong Li. “Doubly robust policy evaluation and learning”. In: Proceedings of the 28th International Conference on Machine Learning, ICML 2011. Bellevue, Washington, United States: Omnipress, 2011, pp. 1097–1104.
[63] Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. “Fairness through awareness”. In: 3rd Innovations in Theoretical Computer Science. 2012, pp. 214–226.
[64] Hadi Elzayn, Shahin Jabbari, Christopher Jung, Michael Kearns, Seth Neel, Aaron Roth, and Zachary Schutzman. “Fair algorithms for learning in allocation problems”. In: Proceedings of the Conference on Fairness, Accountability, and Transparency. 2019, pp. 170–179.
[65] Paul Erdős. “On the evolution of random graphs”. In: Publ. Math. Inst. Hungar. Acad. Sci 5 (1960), pp. 17–61.
[66] Brandon Fain, Kamesh Munagala, and Nisarg Shah. “Fair allocation of indivisible public goods”. In: Proceedings of the 2018 ACM Conference on Economics and Computation. 2018, pp. 575–592.
[67] Mohammad M. Fazel-Zarandi and Edward H. Kaplan. “Approximating the first-come, first-served stochastic matching model with Ohm’s law”. In: Operations Research 66.5 (2018), pp. 1423–1432. issn: 1526-5463. doi: 10.1287/opre.2018.1737.
[68] Uriel Feige. “A threshold of ln n for approximating set cover”. In: Journal of the ACM (JACM) 45.4 (1998), pp. 634–652.
[69] Michael Feldman, Sorelle A. Friedler, John Moeller, Carlos Scheidegger, and Suresh Venkatasubramanian. “Certifying and removing disparate impact”. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Vol. 2015-August. 2015. doi: 10.1145/2783258.2783311.
[70] Stephen E. Fienberg and Stanley S. Wasserman. “Categorical data analysis of single sociometric relations”. In: Sociological Methodology 12 (1981), pp. 156–192.
[71] Jessie Finocchiaro, Roland Maio, Faidra Monachou, Gourab K Patro, Manish Raghavan, Ana-Andreea Stoica, and Stratis Tsirtsis. “Bridging machine learning and mechanism design towards algorithmic fairness”. In: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. 2021, pp. 489–503.
[72] Benjamin Fish, Ashkan Bashardoust, Danah Boyd, Sorelle Friedler, Carlos Scheidegger, and Suresh Venkatasubramanian. “Gaps in Information Access in Social Networks?” In: The World Wide Web Conference. ACM. 2019, pp. 480–490.
[73] Duncan Karl Foley. Resource Allocation and the Public Sector. Yale University, 1966.
[74] Sorelle Friedler, Carlos Scheidegger, and Suresh Venkatasubramanian. “On the (im)possibility of fairness”. In: arXiv abs/1609.07236 (2016).
[75] Alan Frieze and Michał Karoński. Introduction to Random Graphs. Cambridge University Press, 2016.
[76] Anthony Fulginiti, Aida Rahmattalabi, Jarrod Call, Phebe Vayanos, and Eric Rice. “Using Algorithmic Solutions to Address Gatekeeper Training Issues for Suicide Prevention on College Campuses”. In: Artificial Intelligence for Healthcare: Interdisciplinary Partnerships for Analytics-driven Improvements in a Post-COVID World (2022), p. 83.
[77] Anthony Fulginiti, Aida Rahmattalabi, Jarrod Call, Phebe Vayanos, and Eric Rice. “Using Algorithmic Solutions to Address Gatekeeper Training Issues for Suicide Prevention on College Campuses”. In: Artificial Intelligence for Healthcare: Interdisciplinary Partnerships for Analytics-driven Improvements in a Post-COVID World (2022), p. 83.
[78] Vincent A. Fusaro, Helen G. Levy, and H. Luke Shaefer. “Racial and Ethnic Disparities in the Lifetime Prevalence of Homelessness in the United States”. In: Demography 55.6 (2018), pp. 2119–2128. issn: 1533-7790. doi: 10.1007/s13524-018-0717-0.
[79] Avijit Ghosh, Lea Genuit, and Mary Reagan. “Characterizing intersectional group fairness with worst-case comparisons”. In: Artificial Intelligence Diversity, Belonging, Equity, and Inclusion. PMLR. 2021, pp. 22–34.
[80] Edgar N Gilbert. “Random graphs”. In: The Annals of Mathematical Statistics 30.4 (1959), pp. 1141–1144.
[81] Michelle Girvan and Mark EJ Newman. “Community structure in social and biological networks”.
In: Proceedings of the National Academy of Sciences 99.12 (2002), pp. 7821–7826.
[82] Steven N. Goodman, Sharad Goel, and Mark R. Cullen. Machine Learning, Health Disparities, and Causal Reasoning. 2018. doi: 10.7326/M18-3297.
[83] D. James Greiner and Donald B. Rubin. “Causal effects of perceived immutable characteristics”. In: Review of Economics and Statistics 93.3 (2011). issn: 0034-6535. doi: 10.1162/REST_a_00110.
[84] Grani A. Hanasusanto, Daniel Kuhn, and Wolfram Wiesemann. “K-adaptability in two-stage robust binary programming”. In: Operations Research 63.4 (2015), pp. 877–891.
[85] Moritz Hardt, Eric Price, and Nati Srebro. “Equality of opportunity in supervised learning”. In: Advances in Neural Information Processing Systems 29 (2016).
[86] Hoda Heidari, Claudio Ferrari, Krishna Gummadi, and Andreas Krause. “Fairness behind a veil of ignorance: A welfare analysis for automated decision making”. In: Advances in Neural Information Processing Systems 31 (2018).
[87] Meghan Henry, Tanya de Sousa, Caroline Roddey, Swati Gayen, Thomas Joe Bednar, and Abt Associates. AHAR: Part 1—PIT Estimates of Homelessness in the US. Tech. rep. The U.S. Department of Housing and Urban Development, Office of Community Planning and Development, 2020.
[88] Miguel A. Hernán and James M. Robins. “Causal Inference Book”. In: http://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/ (2013).
[89] C Hill, H Hsu, M Holguin, M Morton, H Winetrobe, and E Rice. “An examination of housing interventions among youth experiencing homelessness: an investigation into racial/ethnic and sexual minority status”. In: Journal of Public Health (2021). issn: 1741-3842. doi: 10.1093/pubmed/fdab295.
[90] Christopher Hitchcock and Judea Pearl. “Causality: Models, Reasoning and Inference”. In: The Philosophical Review 110.4 (2001). issn: 0031-8108. doi: 10.2307/3182612.
[91] Adam G Horwitz, Taylor McGuire, Danielle R Busby, Daniel Eisenberg, Kai Zheng, Jacqueline Pistorello, Ronald Albucher, William Coryell, and Cheryl A King. “Sociodemographic differences in barriers to mental health care among college students at elevated suicide risk”. In: Journal of Affective Disorders 271 (2020), pp. 123–130.
[92] U.S. Dept. of Housing and Urban Development, Office of Policy Development and Research. The Applicability of Housing First Models to Homeless Persons with Serious Mental Illness: Final Report. Tech. rep. Office of Policy Development and Research (PD&R), 2007.
[93] Lily Hu and Yiling Chen. “Fair classification and social welfare”. In: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. 2020, pp. 535–545.
[94] Lily Hu and Issa Kohler-Hausmann. “What’s sex got to do with machine learning?” In: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. 2020, pp. 513–513.
[95] Kosuke Imai, Luke Keele, and Teppei Yamamoto. “Identification, inference and sensitivity analysis for causal mediation effects”. In: Statistical Science 25.1 (2010). issn: 0883-4237. doi: 10.1214/10-STS321.
[96] Guido W. Imbens. Potential Outcome and Directed Acyclic Graph Approaches to Causality: Relevance for Empirical Practice in Economics. 2020. doi: 10.1257/JEL.20191597.
[97] Michael Isaac, Brenda Elias, Laurence Y. Katz, Shay-Lee Belik, Frank P. Deane, Murray W. Enns, Jitender Sareen, and the Swampy Cree Suicide Prevention Team (12 members). “Gatekeeper training as a preventative intervention for suicide: a systematic review”. In: The Canadian Journal of Psychiatry 54.4 (2009), pp. 260–268.
[98] Maxwell Izenberg, Ryan Brown, Cora Siebert, Ron Heinz, Aida Rahmattalabi, and Phebe Vayanos. “A Community-Partnered Approach to Social Network Data Collection for a Large and Partial Network”. In: Field Methods (2022), p. 1525822X221074769.
[99] Nathanael Jo, Sina Aghaei, Andres Gomez, and Phebe Vayanos.
“Learning Optimal Prescriptive Trees from Observational Data”. 2021.
[100] Faisal Kamiran and Toon Calders. “Data preprocessing techniques for classification without discrimination”. In: Knowledge and Information Systems 33.1 (2012). issn: 0219-3116. doi: 10.1007/s10115-011-0463-8.
[101] Faisal Kamiran, Asim Karim, and Xiangliang Zhang. “Decision theory for discrimination-aware classification”. In: Proceedings - IEEE International Conference on Data Mining, ICDM. 2012. doi: 10.1109/ICDM.2012.45.
[102] Faisal Kamiran, Indre Žliobaite, and Toon Calders. “Quantifying explainable discrimination and removing illegal discrimination in automated decision making”. In: Knowledge and Information Systems 35.3 (2013). issn: 0219-3116. doi: 10.1007/s10115-012-0584-8.
[103] Toshihiro Kamishima, Shotaro Akaho, Hideki Asoh, and Jun Sakuma. “Fairness-aware classifier with prejudice remover regularizer”. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 7524 LNAI. 2012. doi: 10.1007/978-3-642-33486-3_3.
[104] Edward Harris Kaplan. “Managing the Demand for Public Housing”. PhD thesis. MIT, 1984.
[105] Atoosa Kasirzadeh and Andrew Smart. “The use and misuse of counterfactuals in ethical machine learning”. In: FAccT 2021 - Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. 2021. doi: 10.1145/3442188.3445886.
[106] David Kempe, Jon Kleinberg, and Éva Tardos. “Maximizing the spread of influence through a social network”. In: 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2003, pp. 137–146.
[107] Moniba Keymanesh, Tanya Berger-Wolf, Micha Elsner, and Srinivasan Parthasarathy. Fairness-aware Summarization for Justified Decision-Making. July 2021.
[108] Aria Khademi, David Foley, Sanghack Lee, and Vasant Honavar. “Fairness in algorithmic decision making: An excursion through the lens of causality”.
In: The Web Conference 2019 - Proceedings of the World Wide Web Conference, WWW 2019. 2019. doi: 10.1145/3308558.3313559.
[109] Shakeer Khan and Elie Tamer. “Irregular Identification, Support Conditions, and Inverse Weight Estimation”. In: Econometrica 78.6 (2010), pp. 2021–2042. issn: 0012-9682. doi: 10.3982/ecta7372.
[110] Niki Kilbertus, Mateo Rojas-Carulla, Giambattista Parascandolo, Moritz Hardt, Dominik Janzing, and Bernhard Schölkopf. “Avoiding discrimination through causal reasoning”. In: Advances in Neural Information Processing Systems. Vol. 2017-December. 2017.
[111] Jon Kleinberg, Sendhil Mullainathan, and Manish Raghavan. “Inherent Trade-Offs in the Fair Determination of Risk Scores”. In: 8th Conference on Innovations in Theoretical Computer Science. 2017, 43:1–43:23.
[112] Jon Kleinberg, Yuval Rabani, and Éva Tardos. “Fairness in routing and load balancing”. In: 40th Annual Symposium on Foundations of Computer Science (Cat. No. 99CB37039). IEEE. 1999, pp. 568–578.
[113] Issa Kohler-Hausmann. “Eddie Murphy and the dangers of counterfactual causal thinking about detecting racial discrimination”. In: Northwestern University Law Review 113.5 (2019). issn: 0029-3571.
[114] Andreas Krause and Carlos Guestrin. “Near-optimal observation selection using submodular functions”. In: AAAI. Vol. 7. 2007, pp. 1650–1654.
[115] Amanda Kube, Sanmay Das, and Patrick J. Fowler. “Allocating interventions based on predicted outcomes: A case study on homelessness services”. In: 33rd AAAI Conference on Artificial Intelligence, AAAI 2019, 31st Innovative Applications of Artificial Intelligence Conference, IAAI 2019 and the 9th AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019. Honolulu, Hawaii, United States: AAAI Press, 2019, pp. 622–629. doi: 10.1609/aaai.v33i01.3301622.
[116] Matt Kusner, Joshua Loftus, Chris Russell, and Ricardo Silva. “Counterfactual fairness”. In: Advances in Neural Information Processing Systems. Vol. 2017-December. 2017.
[117] Maria Kyropoulou, Warut Suksompong, and Alexandros A Voudouris. “Almost envy-freeness in group resource allocation”. In: Theoretical Computer Science 841 (2020), pp. 110–123.
[118] Hui Lin and Jeff Bilmes. “A class of submodular functions for document summarization”. In: 49th Annual Meeting of the Association for Computational Linguistics. 2011, pp. 510–520.
[119] Cindy H Liu, Courtney Stevens, Sylvia HM Wong, Miwa Yasui, and Justin A Chen. “The prevalence and predictors of mental health diagnoses and suicide among US college students: Implications for addressing disparities in service use”. In: Depression and Anxiety 36.1 (2019), pp. 8–17.
[120] David Madras, Elliot Creager, Toniann Pitassi, and Richard Zemel. “Fairness through causal awareness: Learning causal latent-variable models for biased data”. In: FAT* 2019 - Proceedings of the 2019 Conference on Fairness, Accountability, and Transparency. 2019. doi: 10.1145/3287560.3287564.
[121] Daniel Malinsky and David Danks. “Causal discovery algorithms: A practical guide”. In: Philosophy Compass 13.1 (2018). issn: 1747-9991. doi: 10.1111/phc3.12470.
[122] Avishai Mandelbaum and Alexander L. Stolyar. “Scheduling flexible servers with convex delay costs: Heavy-traffic optimality of the generalized cµ-rule”. In: Operations Research 52.6 (2004), pp. 836–855. issn: 0030-364X. doi: 10.1287/opre.1040.0152.
[123] Alexandre Marcellesi. “Is race a cause?” In: Philosophy of Science 80.5 (2013). issn: 0031-8248. doi: 10.1086/673721.
[124] Michael Marmot, Sharon Friel, Ruth Bell, Tanja AJ Houweling, Sebastian Taylor, Commission on Social Determinants of Health, et al. “Closing the gap in a generation: health equity through action on the social determinants of health”. In: The Lancet 372.9650 (2008), pp. 1661–1669.
[125] Miller McPherson, Lynn Smith-Lovin, and James M Cook. “Birds of a feather: Homophily in social networks”. In: Annual Review of Sociology 27.1 (2001), pp. 415–444.
[126] Miller McPherson, Lynn Smith-Lovin, and James M. Cook. “Birds of a feather: Homophily in social networks”. In: Annual Review of Sociology 27.1 (2001), pp. 415–444.
[127] Ninareh Mehrabi, Fred Morstatter, Nripsuta Saxena, Kristina Lerman, and Aram Galstyan. “A survey on bias and fairness in machine learning”. In: ACM Computing Surveys (CSUR) 54.6 (2021), pp. 1–35.
[128] Norweeta Milburn, Earl Edwards, Dean Obermark, and Janey Rountree. Inequity in the Permanent Supportive Housing System in Los Angeles: Scale, Scope and Reasons for Black Residents’ Returns to Homelessness. Tech. rep. California Policy Lab, 2021.
[129] Clair Miller. “Can an Algorithm Hire Better than a Human?” In: The New York Times (June 2015). Retrieved 4/28/2016. url: http://www.nytimes.com/2015/06/26/upshot/can-an-algorithm-hire-better-than-a-human.html/.
[130] Charles E. Mitchell. “An analysis of the U.S. Supreme Court’s decision in Ricci v. DeStefano: The New Haven firefighter’s case”. In: Public Personnel Management 42.1 (2013). issn: 0091-0260. doi: 10.1177/0091026013484574.
[131] Kaname Miyagishima. “Fair criteria for social decisions under uncertainty”. In: Journal of Mathematical Economics 80 (2019), pp. 77–87.
[132] John Monahan and Jennifer L. Skeem. “Risk Assessment in Criminal Sentencing”. In: Annual Review of Clinical Psychology 12 (2016). issn: 1548-5951. doi: 10.1146/annurev-clinpsy-021815-092945.
[133] Jacob M. Montgomery, Brendan Nyhan, and Michelle Torres. “How Conditioning on Posttreatment Variables Can Ruin Your Experiment and What to Do about It”. In: American Journal of Political Science 62.3 (2018). issn: 1540-5907. doi: 10.1111/ajps.12357.
[134] Matthew H. Morton, Amy Dworsky, Jennifer L. Matjasko, Susanna R. Curry, David Schlueter, Raúl Chávez, and Anne F. Farrell. “Prevalence and Correlates of Youth Homelessness in the United States”. In: Journal of Adolescent Health 62.1 (2018). issn: 1879-1972. doi: 10.1016/j.jadohealth.2017.10.006.
[135] Razieh Nabi and Ilya Shpitser.
“Fair inference on outcomes”. In: 32nd AAAI Conference on Artificial Intelligence, AAAI 2018. 2018.
[136] Arvind Narayanan. “Translation tutorial: 21 fairness definitions and their politics”. In: Proc. Conf. Fairness Accountability Transp., New York, USA. Vol. 2. 2018, pp. 6–2.
[137] George L Nemhauser and Laurence A Wolsey. “Maximizing submodular set functions: formulations and analysis of algorithms”. In: North-Holland Mathematics Studies. Vol. 59. Elsevier, 1981, pp. 279–301.
[138] Quan Nguyen, Sanmay Das, and Roman Garnett. “Scarce Societal Resource Allocation and the Price of (Local) Justice”. In: Proceedings of the AAAI Conference on Artificial Intelligence. Virtual Conference: AAAI Press, 2021, pp. 5628–5636.
[139] Ziad Obermeyer and Sendhil Mullainathan. “Dissecting Racial Bias in an Algorithm that Guides Health Decisions for 70 Million People”. In: 2019. doi: 10.1145/3287560.3287593.
[140] OrgCode. The Time Seems Right: Let’s Begin the End of the VI-SPDAT. Dec. 2020.
[141] OrgCode. Transition Age Youth – Vulnerability Index – Service Prioritization Decision Assistance Tool (TAY-VI-SPDAT): Next Step Tool for Homeless Youth. Tech. rep. http://ctagroup.org/wp-content/uploads/2015/10/Y-SPDAT-v1.0-Youth-Print.pdf, 2015.
[142] James B. Orlin, Andreas Schulz, and Rajan Udwani. “Robust monotone submodular function maximization”. In: International Conference on Integer Programming and Combinatorial Optimization. Waterloo, Canada: Springer, 2016, pp. 312–324.
[143] Derek Parfit. “Equality and priority”. In: Ratio 10.3 (1997), pp. 202–221.
[144] Douglas Paton and David Johnston. Disaster Resilience: An Integrated Approach. Charles C Thomas Publisher, 2017.
[145] Arthur Pigou. Wealth and Welfare. Macmillan and Company, 1912.
[146] Krzysztof Postek and Dick den Hertog. “Multistage adjustable robust mixed-integer optimization via iterative splitting of the uncertainty set”. In: INFORMS Journal on Computing 28.3 (2016), pp. 553–574.
[147] Lincoln Quillian.
“Measuring Racial Discrimination”. In: Contemporary Sociology: A Journal of Reviews 35.1 (2006). issn: 0094-3061. doi: 10.1177/009430610603500165.
[148] Aida Rahmattalabi, Shahin Jabbari, Himabindu Lakkaraju, Phebe Vayanos, Max Izenberg, Ryan Brown, Eric Rice, and Milind Tambe. “Fair Influence Maximization: a Welfare Optimization Approach”. In: Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 35. 2021, pp. 11630–11638.
[149] Aida Rahmattalabi, Phebe Vayanos, Kathryn Dullerud, and Eric Rice. “Learning Resource Allocation Policies from Observational Data with an Application to Homeless Services Delivery”. In: arXiv preprint arXiv:2201.10053 (2022).
[150] Aida Rahmattalabi, Phebe Vayanos, Kathryn Dullerud, and Eric Rice. “Learning Resource Allocation Policies from Observational Data with an Application to Homeless Services Delivery”. In: arXiv preprint arXiv:2201.10053 (2022).
[151] Aida Rahmattalabi, Phebe Vayanos, Anthony Fulginiti, Eric Rice, Bryan Wilder, Amulya Yadav, and Milind Tambe. “Exploring Algorithmic Fairness in Robust Graph Covering Problems”. In: Advances in Neural Information Processing Systems 32. 2019, pp. 15750–15761.
[152] Aida Rahmattalabi, Phebe Vayanos, Anthony Fulginiti, and Milind Tambe. “Robust Peer-Monitoring on Graphs with an Application to Suicide Prevention in Social Networks”. In: Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems. 2019, pp. 2168–2170.
[153] Aida Rahmattalabi, Phebe Vayanos, and Milind Tambe. “A Robust Optimization Approach to Designing Near-Optimal Strategies for Constant-Sum Monitoring Games”. In: International Conference on Decision and Game Theory for Security. Springer. 2018, pp. 603–622.
[154] Aida Rahmattalabi, Phebe Vayanos, and Milind Tambe. “A robust optimization approach to designing near-optimal strategies for constant-sum monitoring games”. In: International Conference on Decision and Game Theory for Security. Springer. 2018, pp. 603–622.
[155] Ab Rashid Ahmad, Zainal Arsad Md Amin, Che Hassandi Abdullah, and Siti Zarina Ngajam. “Public Awareness and Education Programme for Landslide Management and Evaluation Using a Social Research Approach to Determining “Acceptable Risk” and “Tolerable Risk” in Landslide Risk Areas in Malaysia”. In: Advancing Culture of Living with Landslides. Ed. by Kyoji Sassa, Matjaž Mikoš, and Yueping Yin. Springer International Publishing, 2017, pp. 437–447. isbn: 978-3-319-59469-9.
[156] John Rawls. A Theory of Justice. Harvard University Press, 2009.
[157] Eric Rice. Assessment Tools for Prioritizing Housing Resources for Homeless Youth. 2017.
[158] Eric Rice, Monique Holguin, Hsun-Ta Hsu, Matthew Morton, Phebe Vayanos, Milind Tambe, and Hau Chan. “Linking Homelessness Vulnerability Assessments to Housing Placements and Outcomes for Youth”. In: Cityscape 20.3 (2018), pp. 69–86. issn: 1936-007X.
[159] Lisa Rice and Deidre Swesnik. Discriminatory Effects of Credit Scoring on Communities of Color. 2012.
[160] Paul R. Rosenbaum and Donald B. Rubin. “The central role of the propensity score in observational studies for causal effects”. In: Biometrika 70.1 (Apr. 1983), pp. 41–55. issn: 0006-3444. doi: 10.1093/biomet/70.1.41.
[161] Donald B Rubin. “Causal Inference Using Potential Outcomes”. In: Journal of the American Statistical Association 100.469 (2005), pp. 322–331. issn: 0162-1459. doi: 10.1198/016214504000001880.
[162] Donald B. Rubin. “Bayesian Inference for Causal Effects: The Role of Randomization”. In: The Annals of Statistics 6.1 (2007). issn: 0090-5364. doi: 10.1214/aos/1176344064.
[163] Donald B. Rubin. “Multiple Imputation after 18+ Years”. In: Journal of the American Statistical Association 91.434 (1996). issn: 1537-274X. doi: 10.1080/01621459.1996.10476908.
[164] Virginia Sapiro. “If U.S. Senator Baker Were A Woman: An Experimental Study of Candidate Images”. In: Political Psychology 3.1/2 (1981). issn: 0162-895X. doi: 10.2307/3791285.
[165] Erel Segal-Halevi and Warut Suksompong. “Democratic fair allocation of indivisible goods”. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence. AAAI Press. 2018, pp. 482–488.
[166] Erel Segal-Halevi and Warut Suksompong. “Democratic fair allocation of indivisible goods”. In: Artificial Intelligence 277 (2019), p. 103167.
[167] Amartya Sen. On Economic Inequality. Oxford University Press, 1997.
[168] Tom Simonite. Meet the Secret Algorithm That’s Keeping Students Out of College. 2020.
[169] Solon Barocas, Moritz Hardt, and Arvind Narayanan. Fairness and Machine Learning: Limitations and Opportunities. 2020.
[170] Peter Spirtes and Kun Zhang. “Causal discovery and inference: concepts and recent methodological advances”. In: Applied Informatics 3.1 (2016). issn: 2196-0089. doi: 10.1186/s40535-016-0018-x.
[171] Ana-Andreea Stoica, Jessy Xinyi Han, and Augustin Chaintreau. “Seeding Network Influence in Biased Networks and the Benefits of Diversity”. In: Proceedings of The Web Conference. 2020, pp. 2089–2098.
[172] Xuanming Su and Stefanos Zenios. “Patient choice in kidney allocation: The role of the queueing discipline”. In: Manufacturing & Service Operations Management 6.4 (2004), pp. 280–301.
[173] Warut Suksompong. “Approximate maximin shares for groups of agents”. In: Mathematical Social Sciences 92 (2018), pp. 40–47.
[174] William Thomson. “Problems of fair division and the egalitarian solution”. In: Journal of Economic Theory 31.2 (1983), pp. 211–226.
[175] Alan Tsang, Bryan Wilder, Eric Rice, Milind Tambe, and Yair Zick. “Group-Fairness in Influence Maximization”. In: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19. 2019, pp. 5997–6005. doi: 10.24963/ijcai.2019/831.
[176] Vasileios Tzoumas, Konstantinos Gatsis, Ali Jadbabaie, and George J Pappas. “Resilient monotone submodular function maximization”. In: 2017 IEEE 56th Annual Conference on Decision and Control (CDC).
IEEE. 2017, pp. 1362–1367.
[177] United States Interagency Council on Homelessness. Opening Doors: Federal Strategic Plan to Prevent and End Homelessness. Tech. rep. US Interagency Council on Homelessness, 2015.
[178] Thomas Valente, Anamara Ritt-Olson, Alan Stacy, Jennifer Unger, Janet Okamoto, and Steve Sussman. “Peer acceleration: effects of a social network tailored substance abuse prevention program among high-risk adolescents”. In: Addiction 102.11 (2007), pp. 1804–1815.
[179] Tyler J. VanderWeele and Whitney R. Robinson. “On the causal interpretation of race in regressions adjusting for confounding and mediating variables”. In: Epidemiology 25.4 (2014). issn: 1531-5487. doi: 10.1097/EDE.0000000000000105.
[180] Hal R Varian. “Equity, envy, and efficiency”. In: (1973).
[181] Phebe Vayanos, Angelos Georghiou, and Han Yu. “Robust optimization with decision-dependent information discovery”. In: arXiv preprint arXiv:2004.08490 (2020).
[182] Phebe Vayanos, Daniel Kuhn, and Berç Rustem. “Decision rules for information discovery in multi-stage stochastic programming”. In: 2011 50th IEEE Conference on Decision and Control and European Control Conference. IEEE. 2011, pp. 7368–7373.
[183] Stefan Wager and Susan Athey. “Estimation and Inference of Heterogeneous Treatment Effects using Random Forests”. In: Journal of the American Statistical Association 113.523 (2018), pp. 1228–1242. issn: 1537-274X. doi: 10.1080/01621459.2017.1319839.
[184] Jean Walrand. “Lecture notes on probability theory and random processes”. In: (2004).
[185] Bryan Wilder, Laura Onasch-Vera, Graham Diguiseppi, Robin Petering, Chyna Hill, Amulya Yadav, Eric Rice, and Milind Tambe. Clinical Trial of an AI-Augmented Intervention for HIV Prevention in Youth Experiencing Homelessness. 2020. arXiv: 2009.09559 [cs.SI].
[186] Alice Xiang. “Reconciling legal and technical approaches to algorithmic bias”. In: Tenn. L. Rev. 88 (2020), p. 649.
[187] Alice Xiang and Donald B. Rubin.
“Assessing the potential impact of a nationwide class-based affirmative action system”. In: Statistical Science 30.3 (2015). issn: 0883-4237. doi: 10.1214/15-STS514. [188] Amulya Yadav, Hau Chan, Albert Jiang, Haifeng Xu, Eric Rice, and Milind Tambe. “Using social networks to aid homeless shelters: Dynamic influence maximization under uncertainty”. In: Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS. 2016. [189] İhsan Yanıkoğlu, Bram L. Gorissen, and Dick den Hertog. “A survey of adjustable robust optimization”. In: European Journal of Operational Research 277.3 (2019), pp. 799–813. [190] Naohiro Yonemoto, Yoshitaka Kawashima, Kaori Endo, and Mitsuhiko Yamada. “Gatekeeper training for suicidal behaviors: A systematic review”. In: Journal of Affective Disorders 246 (2019), pp. 506–514. [191] Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez Rodriguez, and Krishna P. Gummadi. “Fairness constraints: Mechanisms for fair classification”. In: Artificial Intelligence and Statistics. PMLR, 2017, pp. 962–970. [192] Chongjie Zhang and Julie A. Shah. “Fairness in multi-agent sequential decision-making”. In: Advances in Neural Information Processing Systems 27 (2014). [193] Junzhe Zhang and Elias Bareinboim. “Equality of opportunity in classification: A causal approach”. In: Advances in Neural Information Processing Systems. Vol. 2018-December. 2018. [194] Junzhe Zhang and Elias Bareinboim. “Fairness in decision-making: The causal explanation formula”. In: 32nd AAAI Conference on Artificial Intelligence, AAAI 2018. 2018.

Appendices

Appendix A
Technical Appendix to Chapter 1

A.1 Experimental Results in Section 1.6

Data and Data Preprocessing. The original datasets used throughout our paper are described in detail in [17]. They present 8 racial groups, with each individual belonging to a single group.
To avoid misinterpretation of the results, we collect racial groups with a population below 10% of the network size N under the “Other” category. The racial composition of the networks after this preprocessing is provided in Table A.1. For instance, network SPY1 consists of 54% White, 11% Black, 15% Mixed, and 20% Other individuals. The empty entry for Hispanic indicates that their population was less than 10%; as a result, they are categorized under “Other”.

Network Name   White   Black   Hispanic   Mixed   Other
SPY1           54      11      –          15      20
SPY2           55      –       11         21      13
SPY3           58      –       10         18      14
MFP1           16      38      22         16      8
MFP2           16      32      22         20      10

Table A.1: Racial composition (%) of the social networks considered, after preprocessing.

Setting of Parameter W. We now describe in detail the procedure we use to select W in our experiments. As noted in Section 1.3, to achieve maximin fairness, W must take the maximum value for which the problem is feasible (i.e., for which all fairness constraints can be satisfied). Its value thus depends on other parameters, including I, J, and K. In our experiments, we conduct a search to identify the best value of W for each setting. Specifically, we vary W from 0 to 1 in increments of 0.04 and employ the largest W for which the problem is feasible. By construction, this choice of W guarantees that all of the fairness constraints are satisfied. In Table A.2, we provide the values of W associated with the results in Table 1.2 for I = N/3 and K = 3, for each of the values of J.

Network Name   J = 1   J = 2   J = 3   J = 4   J = 5
SPY1           0.44    0.40    0.36    0.32    0.32
SPY2           0.56    0.52    0.48    0.44    0.36
SPY3           0.44    0.36    0.32    0.28    0.24
MFP1           0.52    0.48    0.44    0.40    0.32
MFP2           0.56    0.52    0.44    0.40    0.32

Table A.2: Values of W output by our search procedure and used in the experiments associated with Table 1.2.

Head-to-Head Comparison with Table 1.1. We conduct a head-to-head comparison of our approach with the results from Table 1.1, which motivated our work. The results are summarized in Table A.3.
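The grid search just described can be sketched in a few lines. This is a minimal sketch, assuming a feasibility oracle `is_feasible` that stands in for the actual fairness-constrained optimization check (a hypothetical placeholder, not part of our implementation):

```python
# Sketch of the W selection procedure: sweep W from 0 to 1 in increments of
# 0.04 and keep the largest value for which the fairness-constrained problem
# is feasible. `is_feasible` is a hypothetical stand-in for the real check.
def select_w(is_feasible, step=0.04):
    best = None
    w = 0.0
    while w <= 1.0 + 1e-9:
        grid_w = round(w, 2)       # snap accumulated floats back to the grid
        if is_feasible(grid_w):
            best = grid_w
        w += step
    return best

# Toy oracle mimicking the monotone structure of the fairness constraint:
# feasible exactly when W does not exceed some threshold.
print(select_w(lambda w: w <= 0.45))  # -> 0.44
```

Because feasibility is monotone in W in this setting, a binary search over the same grid would return the same value with fewer oracle calls.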
From the table, we observe a consistent increase of 8–14% in the worst-case coverage of the worse-off group. For example, in SPY3, the coverage of Hispanics has increased from 33% to 44%. We can also see that the PoF is moderate, ranging from 1% to 4.2%. The result for the MFP1 network shows a 36% increase in the coverage of the “Other” group. We note that, by construction, this group consists of racial minorities, each with a population of less than 10% of the network size. While this increase has impacted the coverage of the “majority” groups, the worst-case coverage of the worse-off group has increased by 14% with a negligible PoF of 2.6%.

A.2 Proofs of Statements in Section 1.3

Proof of Lemma 1. For the special case when all monitors are available (Ξ = {e}), there is a single community (C = 1), and no fairness constraints are imposed (W = 0), Problem (RC_fair) reduces to the maximum coverage problem, which is known to be NP-hard [68]. ■

Network Name   Network Size (N)   Worst-case coverage by racial group (%)              PoF (%)
                                  White     Black     Hispanic   Mixed     Other
SPY1           95                 65 (70)   45 (36)   –          79 (86)   88 (94)    3.3
SPY2           117                81 (78)   –         50 (42)    72 (76)   73 (67)    1.0
SPY3           118                90 (88)   –         44 (33)    85 (95)   87 (69)    4.2
MFP1           165                85 (96)   69 (77)   42 (69)    73 (73)   64 (28)    2.6
MFP2           182                56 (44)   80 (85)   70 (70)    71 (77)   72 (72)    3.4

Table A.3: Reduction in racial discrimination in node coverage resulting from applying our proposed algorithm relative to that of [176] on the five real-world social networks from Table A.1, when 1/3 of the nodes (individuals) can be selected as monitors, out of which at most 10% may fail. The numbers correspond to the worst-case percentage of covered nodes across all monitor availability scenarios. The numbers in parentheses are the solutions of the state-of-the-art algorithm [176] (the same numbers as in Table 1.1).

a. Original Graph   b. With fairness   c. Without fairness

Table A.4: Companion figure to Lemma 2.
The figures illustrate a network sequence {G_N}_{N=5}^∞, parameterized by N and consisting of two disconnected clusters: a small and a large one, with 4 and N − 4 nodes, respectively. The small cluster remains intact as N grows. The nodes in the large cluster form a clique. In the figures, each color (white, grey, black) represents a different group, and we investigate the price of imposing fairness across these groups. The subfigures show the original graph (a) and an optimal solution when I = 2 monitors can be selected, in the cases (b) when fairness constraints are imposed and (c) when fairness constraints are not imposed, respectively. It holds that OPT_fair(G_N, 2, 0) = 4 and OPT(G_N, 2, 0) = N − 3, so that the PoF in G_N converges to one as N tends to infinity.

A.3 Proofs of Statements in Section 1.4

In all of our analysis, we assume the graphs are undirected. This is without loss of generality, and the results hold for directed graphs.

A.3.1 Worst-Case PoF

Proof of Lemma 2. Let {G_N}_{N=5}^∞ denote the graph sequence shown in Figure A.4(a) (wherein all edges are bidirectional). The network consists of three groups (e.g., racial groups) for which fair treatment is important. Network G_N consists of two disjoint clusters: one involving four nodes and a bigger clique containing the remaining (N − 4) nodes. Suppose that we can choose I = 2 nodes as monitors and that all of them are available (J = 0). Observe that Problem (RC_fair) is feasible only if 0 ≤ W ≤ (N − 3)^{-1}. For W = (N − 3)^{-1}, the optimal solution places both monitors in the smaller cluster, see Figure A.4(b). This way, at least one node from each group is covered. The total coverage of the fair solution is then equal to OPT_fair(G_N, 2, 0) = 4. The maximum achievable coverage under no fairness constraints, however, is obtained by placing one monitor in each cluster, see Figure A.4(c). Thus, the total coverage is equal to OPT(G_N, 2, 0) = N − 3.
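Combining the two quantities just derived through the PoF definition in Equation (1.2), the limiting behavior of this construction can be sanity-checked numerically (an illustration, not part of the proof):

```python
# Sanity check of the Lemma 2 construction: OPT_fair = 4 and OPT = N - 3,
# so PoF = 1 - OPT_fair/OPT = 1 - 4/(N - 3) tends to one as N grows.
def pof(n):
    opt_fair, opt = 4.0, n - 3.0
    return 1.0 - opt_fair / opt

for n in (10, 100, 10_000):
    print(n, round(pof(n), 4))

# For N >= 4/eps + 3, the PoF is at least 1 - eps.
eps = 0.01
assert pof(int(4 / eps + 3)) >= 1 - eps
```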
As a result, PoF(G_N, 2, 0) = 1 − 4(N − 3)^{-1}, and for N ≥ 4/ϵ + 3, it holds that PoF(G_N, 2, 0) ≥ 1 − ϵ. The proof is complete. ■

A.3.2 Supporting Results for the PoF Derivation

In this section, we provide the preliminary results needed in the derivation of the PoF for both the deterministic and robust graph covering problems. First, we provide two results (Lemmas 4 and 5) from the literature which characterize the maximum degree, as well as the expected number of maximum-degree nodes, in sparse Erdős–Rényi graphs [65, 80]. We note that in the SBM graphs used in our PoF analysis, each community c ∈ C, when viewed in isolation, is an instance of an Erdős–Rényi graph in which each edge exists independently with probability p_c^in. These results are useful to evaluate the coverage of each community c ∈ C under the sparsity Assumption 1. Specifically, they enable us to show in Lemma 6 that, in sparse Erdős–Rényi graphs, the coverage can be evaluated approximately as the sum of the degrees of the monitoring nodes. Thus, the maximum coverage within each community in an SBM network can be obtained by selecting the maximum-degree nodes. Lastly, we prove Lemma 8, which will be useful to show that the coverage from monitoring nodes in other communities in SBM networks is negligible.

In what follows, we use G_{N,p} to denote a random instance of an Erdős–Rényi graph on vertex set N (= {1, ..., N}), where each edge occurs independently with probability p. Following the notational conventions in [75], we say that a sequence of events {A_n}_{n=1}^N occurs with high probability if lim_{n→∞} P(A_n) = 1 and, given a graph G, we let ∆(G) denote the maximum degree of the vertices of G.

Theorem 4 ([75, Theorem 3.4]). Let {G_{N,p}}_{N=1}^∞ be a sequence of graphs. If p = Θ(N^{-1}), then with high probability

lim_{N→∞} ∆(G_{N,p}) = logN / loglogN.

Lemma 5. Let {G_{N,p}}_{N=1}^∞ be a sequence of graphs with p = Θ(N^{-1}). Let σ(N) := logN (loglogN)^{-1}.
Then, it holds that E[X σ (N) (G N,p )]≥ N logloglogN− o(1) loglogN , whereX σ (N) (G N,p ) is the number of vertices of degreeσ (N) inG N,p . Proof. We borrow results from [75, Theorem 3.4], where the authors show that E[X σ (N) (G N,p )]=exp logN loglogN (logloglogN− o(1))+O logN loglogN +2logloglogN , We further simplify the expression in Lemma 5 by eliminating theO(.) term and we obtain E[X σ (N) (G N,p )]≥ N logloglogN− o(1) loglogN , ■ 134 Lemma 5 ensures that our budget for selecting monitors I = O(logN), is (asymptotically) smaller than number of nodes with degree∆( G N,p ). Lemma6. Let{G N,p } ∞ N=1 beasequenceofgraphswithp=Θ( N − 1 ). Supposethatthenumberofmonitors isI =O(logN). Then,forallν ,thereexistsagraphG N,p suchthatthedifferencebetweentheexpectedmax- imum coverage inG N,p and the expected number of neighbors of the monitoring nodes is bounded. Precisely, ifx(G N,p ) is the indicator vector of the highest degree nodes inG N,p , we have X n∈N E x n (G N,p )|δ G N,p (n)| − E F G N,p (x(G N,p ),e) ≤ ν, whereδ G N,p (n) is the set of neighbors ofn inG N,p andν is the error term and it isν =o(1). Proof. Let Y n be the event that node n is covered. Also, let Z i n the event that node n is covered by the ith highest degree node (and by potentially other nodes too). Without loss of generality, assume that the nodes with lower indexes have higher degrees, i.e.,|δ (1)|≥···≥| δ (N)|. The probability that noden is covered can be written as P(Y n )=P ∪ I i=1 Z i n . (A.1) From the Bonferroni inequalities, we have P(∪ I i=1 Z i n ) ≥ I X i=1 P(Z i n )− I X j=i P(Z i n ∩Z j n ) (A.2) and P(∪ I i=1 Z i n ) ≤ I X i=1 P(Z i n ). (A.3) Define Y := P N i=1 Y n as the (random) total coverage. With a slight abuse of notation, we viewY n and Z i n as Bernoulli random binary variables that are equal to 1 if and only if the associated event occurs. As a 135 result, we can substitute the probability terms with their expected values. 
Combining Equations (A.1), (A.2) and (A.3), we obtain I X i=1 E[Z i n ]− I X j=i E[Z i n Z j n ] ≤ E[Y n ] ≤ I X i=1 E[Z i n ], ∀n∈N, where we used the fact thatP(Z i n ∩Z j n )=P(Z i n )P(Z j n )=E(Z i n )E(Z j n )=E(Z i n Z j n ) by independence of the eventsZ i n andZ j n . Summing over alln yields X n∈N I X i=1 E[Z i n ]− I X j=i E[Z i n Z j n ] ≤ X n∈N E[Y n ] ≤ X n∈N I X i=1 E[Z i n ]. Changing the order of the summations, it follows that I X i=1 X n∈N E[Z i n ]− I X j=i X n∈N E[Z i n Z j n ] ≤ E[Y] ≤ I X i=1 X n∈N E[Z i n ], where we have usedE[Y] = P N i=1 E[Y n ]. By definition of δ G N,p (i), sincex i (G N,p ) = 1 fori = 1,...,I, it holds that the number of nodes covered by nodei, P n∈N E[Z i n ] = E[|δ G N,p (i)|]. Also, we remark that E[Y]=E[F G N,p (x(G N,p ), e)]. Thus, the above sequence of inequalities is equivalent to I X i=1 E[|δ G N,p (i)|]− I X j=i X n∈N E[Z i n Z j n ] ≤ E[F G N,p (x(G N,p ),e)] ≤ I X i=1 E[|δ G N,p (i)|], where, by reordering terms, we obtain 0 ≤ I X i=1 E[|δ G N,p (i)|]− E[F G N,p (x(G N,p ),e)] ≤ I X i=1 I X j=i X n∈N E[Z i n Z j n ]. 136 Note thatE[x n (G N,p )]=1,∀n≤ I since by assumption the nodes are ordered by decreasing order of their degree, so the nodes indexed from 1 toI are selected in each realization of the graph. Thus, I X i=1 E[|δ G N,p (i)|] = X n∈N E[x n (G N,p )]E |δ G N,p (n)| = X n∈N E x n (G N,p )|δ G N,p (n)| , which yields X n∈N E x n (G N,p )|δ G N,p (n)| − E[F G N,p (x(G N,p ),e)] ≤ I X i=1 I X j=i X n∈N E[Z i n Z j n ]. (A.4) The right-hand side of Equation (A.4) is the error term. We denote this term byν . This error term deter- mines the difference between the true value of the coverage and the expected sum of the degrees of the monitoring nodes. Given thatp = Θ( N − 1 ), we can precisely evaluate the error term. First, we note that since in the Erdős-Rényi model edges are drawn independently, we can writeE[Z i n Z j n ]=E[Z i n ]E[Z j n ]. 
Us- ing Theorem 4 and Lemma 5, and given that the monitors are the highest degree nodes in any realization of the graph, we can write E[Z i n ]=E[Z j n ]=Θ 1 N logN loglogN . We thus obtain ν =Θ I 2 N logN loglogN 2 . By the assumption on the order ofI, it follows thatlim N→∞ ν =0, which concludes the proof. ■ We now prove the following lemma which will be used in proof of the subsequent results. 137 Lemma 7. LetX i fori = 1,...,Q beQ i.i.d samples from normal distribution with meanµ and standard deviationσ . Also, letZ =max i∈{1,··· ,Q} X i . It holds that E[Z]≤ µ +σ p 2logQ. Proof. By Jensen’s inequality, exp(tE[Z])≤ E[exp(tZ)] = E[exp(t max i=1,...,Q X i )] ≤ Q X i=1 E[exp(tX i )] = Qexp(µt +t 2 σ 2 /2), where the last equality follows from the definition of the Gaussian moment generating function. Taking the logarithm of both sides of this inequality, we can obtain E[Z]≤ µ + logQ t + tσ 2 2 . For the tightest upper-bound, we sett= p 2logQ/σ . Thus, we obtain E[Z]≤ µ +σ p 2logQ. ■ Lemma 8. Consider B N,M,p to be a random instance of a bipartite graph on the vertex setN = L∪R, where N = |R∪L| and M := |R| and p = O (Mlog 2 M) − 1 is the probability that each edge exists 138 (independently). SupposethatmonitoringnodescanonlybechosenfromthesetLandthatatmostI monitors can be selected. Then, it holds that E max x∈{0,1} |L| : P n∈L xn=I F B N,M,p (x,e) =IO 1 log 2 M . Proof. We note that the degree of nodei,δ B N,M,p (i), follows a binomial distribution with meanMp. Given we are interested in N,M → ∞, we can approximate the binomial distribution with a normal distribu- tion [184] with meanMp and standard deviation p Mp(1− p). Using the result of Lemma 7, we obtain E[∆ B N,M,p ]=O Mp+ p Mp(1− p) p 2log(N− M) =O(Mp). Using the above result combined with the assumption onp, we can bound the expected maximum degree ofB. E[∆ B N,M,p ]=O 1 log 2 M . 
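As an aside, the bound of Lemma 7 used in the display above is easy to check by Monte-Carlo simulation; the parameter values below are purely illustrative.

```python
# Monte-Carlo check of Lemma 7: for Q i.i.d. samples from N(mu, sigma^2),
# E[max_i X_i] <= mu + sigma * sqrt(2 * log Q). Parameters are illustrative.
import math
import random

random.seed(0)
mu, sigma, Q, trials = 1.0, 2.0, 50, 20_000

est = sum(max(random.gauss(mu, sigma) for _ in range(Q))
          for _ in range(trials)) / trials
bound = mu + sigma * math.sqrt(2 * math.log(Q))

print(round(est, 3), "<=", round(bound, 3))
assert est <= bound
```

The bound is loose but of the right order: the empirical mean of the maximum grows like σ√(2 log Q), which is exactly the scaling exploited in the degree bound above.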
As a result, the maximum expected coverage of theI monitoring nodes is upper-bounded as E max x∈{0,1} N : P n∈L xn=I F B N,M,p (x, e) ≤ IE[∆ B N,M,p ] = IO 1 log 2 M . and the proof is complete. ■ A.3.3 PoFintheDeterministicCase Next, we prove the main result which is the derivation of the PoF for the deterministic graph covering problem. The idea of the proof is as follows: by Lemmas 5 and 6, we are able to evaluate the coverage 139 of each community. By Lemma 8, we upper bound the between-community coverage. In other words, based on Lemma 8, we conclude that in every instance of the coverage problem, the between-community coverage is zero (asymptotically) with high probability. Thus, the allocation of monitoring nodes is only dependant on the within-community coverage. Using this observation, we can determine the allocation of the monitors both in the presence and absence of fairness constraints. Subsequently, we are able to evaluate the coverage in both cases. PoF can be then computed based on these two quantities, see Equation (1.2). Proof of Proposition 1. LetS N be a random instance of the SBM network with sizeN. Considers(S N )∈ Z C to be the number of allocated monitoring nodes to each of theC communities, i.e.,s c (S N )= P n∈Nc x n (S N ). Using the result of Lemmas 6 and 8, we can measure the expected maximum coverage as lim N→∞ E[OPT(S N ,I,0)]= lim N→∞ E max x(S N )∈X F S N (x,e) =E lim N→∞ max x(S N )∈X F S N (x,e) , where the last equality is obtained by exchanging the expectation and limit. Using Lemma 4 and since the maximum degree is convergent tod(c), we can exchange the limit and maximization term. Thus, we will have E lim N→∞ max x(S N )∈X F S N (x,e) = E max x(S N )∈X lim N→∞ F S N (x,e) = E " max s(S N )∈Z C X c∈C s c (S N )d(c)+o(1) # , which given thatd(c) is only dependent on the size of the communities inS N is equivalent to lim N→∞ E[OPT(S N ,I,0)]= max s(S N ) X c∈C s c (S N )d(c)+o(1). 
(A.5) Equation (A.5) suggests that for large enoughN, the maximum coverage is only dependent on thenumber of the monitoring nodes allocated to each community. Also, the allocation is the same for all random 140 instances so we can drop the dependence ofs onS N . In right-hand side of Equation (A.5), the first term is the within-community (Lemma 6), and the second term is the between-community (Lemma 8) coverage. In the analysis below, all the evaluations are for large enoughN. Therefore, we drop thelim N→∞ for ease of notation. According to Equation (A.5) the between-community coverage is negligible, compared to the within-community coverage. This suggests that the maximum achievable coverage will be obtained by placing all the monitoring nodes in the largest community, with the largest value of d(c), where the assumption onI, as given in the premise of the proposition, combined with Lemma 5 guarantee that such a selection is possible. Thus, we obtain E[OPT(S N ,I,0)]=Id(C)+o(1). Next, we measureE[OPT fair (.)], where in addition to optimization problem in Equation (A.5), the allocation is further restricted to satisfy all the fairness constraints. s c |N c | d(c)+o(1)≥ W ∀c∈C, (A.6) in which, o(1) is the term that compensates for the coverage of the nodes in other communities, and is small due to the regimes ofp out cc ′ , ∀c,c ′ ∈C and the budgetI. At optimality and for the maximum value of W , we have s c |N c | − 1 d(c)− s c ′|N c ′| − 1 d(c ′ ) ≤ δ ∀c,c ′ ∈C,δ ≤ d(1)|N 1 | − 1 − d(C)|N C | − 1 . This holds because otherwise one can remove on node from the group with higher value ofs c |N c | − 1 d(c) to a group with less value and thus increase the normalized coverage of the worse-off group and this contradicts the fact that W is the maximum possible value. This suggests that in a fair solution, the 141 normalized coverage is almost equal across different groups, given that lim N→∞ δ = 0. 
As a result, the monitoring nodes should be such that W ≤ s c |N c | d(c)+o(1)≤ W +δ, ∀c∈C. From this, it follows that W − o(1)≤ s c |N c | d(c)≤ W +o(1). (A.7) By assumption, there must be an integrals c that satisfies the above relation. Note that if we could relax the integrality assumption,s c = W|N c |d(c) − 1 . Due to the integrality constraint, and according to Equa- tion (A.7), we sets c |N c | − 1 d(c) = W +o(1), where the o(1) term is to account for the discretizing error, which results in s c = W|N c |d(c) − 1 + O(1), where O(1) ≤ 1 (As we can not make a higher error in rounding). Also, since P c∈C s c =I, we can obtain the value ofW as W = I P c∈C |Nc| d(c) +o(1). As a result s c = I P c∈C |Nc| d(c) |N c | d(c) +O(1) ∀c∈C. We now define κ :=I P c∈C |Nc| d(c) − 1 for a compact representation. 142 So far, we obtained the allocation of the monitoring nodes to satisfy the fairness constraints. This is enough to evaluate the coverage under the fairness constraints. Now, we can evaluate the PoF as defined by Equation (1.2). E[OPT(S N ,I,0)] = Id(C) ⇒ − 1 E[OPT(S N ,I,0)] = − 1 Id(C) ⇒ − E[OPT fair (S N ,I,0)] E[OPT(S N ,I,0)] = − κ P c∈C |Nc| d(c) d(c) Id(C) − o(1) ⇒ 1− E[OPT fair (S N ,I,0)] E[OPT(S N ,I,0)] = 1− κ P c∈C |Nc| d(c) d(c) Id(C) − o(1) ⇒ PoF(I,0) = 1− κ P c∈C |N c | Id(C) − o(1) ⇒ PoF(I,0) = 1− P c∈C |N c | P c∈C |N c |d(C)/d(c) − o(1). ■ A.3.4 PoFintheRobustCase Proof of Proposition 2. The idea of the proof is similar to Proposition 1, with the exception that the fair allocation of the monitoring nodes will be affected by the uncertainty. Consider s to be the number of allocated monitoring nodes to each of theC communities, i.e.,s c = P n∈Nc x n . Using the result of lemma 6, and 8, we can measure the expected maximum coverage as E[OPT(S N ,I,J)]=(I− J)d(c)+o(1). 143 That is because, in the worst-case J nodes fail, thus only (I− J) nodes can cover the graph. 
Next, we measureE[OPT fair (.)], where in addition to optimization problem in Equation (A.5), the allocation is further restricted to satisfy all the fairness constraints. Given that at most J nodes may fail, we need to ensure after fairness constraints are satisfied after the removal of J nodes. We momentarily revisit the fairness constraint in the deterministic case. s c |N c | d(c)+o(1)≥ W ∀c∈C, in which, o(1) is the term that compensates for the coverage of the nodes in other communities, and is small due to the regimes of p out , and the budget I. Under the uncertainty, we need to ensure that these constraints are satisfied even after J nodes are removed. In other words (s c − J) |N c | d(c)+o(1)≥ W ∀c∈C. At optimality and for the maximum value ofW , we have (s c − J)|N c | − 1 d(c)− (s c ′− J)|N c ′| − 1 d(c ′ ) ≤ δ ∀c,c ′ ∈C,δ ≤ d(1)|N 1 | − 1 − d(C)|N C | − 1 . This holds because otherwise one can remove on node from the group with higher value ofs c |N c | − 1 d(c) to a group with less value and thus increase the normalized coverage of the worse-off group and this contradicts the fact thatW is the maximum possible value. This suggests that in a fair solution, the normalized coverage is almost equal across different groups, given thatδ → 0, asN c →∞,∀c∈C. Following the proof of Proposition 1, the discretizing error can be 144 handled by setting(s c − J)|N c | − 1 d(c)=W +o(1), where the o(1) term is to account for the discretizing error. As a result s c = |N c |W d(c) +J +O(1), whereO(1)≤ 1 (As we can not make a higher error in rounding). This suggests that a fair allocation is the one that places J nodes in each community, regardless of the community size. The remaining monitors are allocated with respect to the relative size of the communities. Summing over alls c and since P c∈C s c =I we obtain W = (I− CJ) P c∈C |Nc| d(c) +o(1). As a result s c = (I− CJ) P c∈C |Nc| d(c) |N c | d(c) +J +O(1) ∀c∈C. 
As defined in the premise of the proposition, η =(I− CJ) P c∈C |Nc| d(c) − 1 . So far, we obtained the allocation of the monitoring nodes, to satisfy the fairness constraints. Now, we evaluate the coverage, i.e., objective value of Problem (RC fair ), under the obtained fair alloca- tion. Since the fairness constraints are satisfied under all the scenarios, the worst-case scenario is the one that results in the maximum loss in the total coverage. This corresponds to the case thatJ nodes from the largest community(N C ) fail. As a result the expected coverage can be obtained by E[OPT fair (S N ,I,J)]= X c∈C η |N c | d(c) d(c)+Jd(c)+O(1)d(c) − Jd(C). 145 Now, we can evaluate the PoF as defined by Equation (1.2). E[OPT(S N ,I,J)] = (I− J)d(C) ⇒ − 1 E[OPT(S N ,I,J)] = − 1 (I− J)d(C) ⇒ − E[OPT fair (S N ,I,J)] E[OPT(S N ,I,J)] = − P c∈C (η |N c |+Jd(c))− Jd(C) (I− J)d(C) − o(1) ⇒ 1− E[OPT fair (S N ,I,J)] E[OPT(S N ,I,J)] = 1− P c∈C η |N c |+ P c∈C\{C} Jd(c) (I− J)d(C) − o(1) ⇒ PoF(I,J) = 1− P c∈C η |N c | (I− J)d(C) − J P c∈C\{C} d(c) (I− J)d(C) − o(1). ■ A.4 ProofsofStatementsinSection1.5 A.4.1 EquivalentReformulationasaMax-Min-MaxOptimization Proof of Proposition 3. Let¯ x be feasible in Problem (RC fair ). It follows that it is also feasible in Problem 1.3. For a fixed ¯ξ , we show that X c∈C F G,c (¯x, ¯ξ ) = max y X c∈C X n∈Nc y n s.t. y n ≤ X ν ∈δ (n) ¯ξ ν ¯x ν X n∈C y n ≥ W|N c |, ∀c∈C Since ¯x is feasible in Problem (RC fair ), it holds that 146 F G,c (¯x, ¯ξ ) = X n∈Nc y n (¯x, ¯ξ ) = X n∈Nc I X ν ∈δ (n) ¯ξ ν ¯x ν ≥ 1 ≥ W|N c | We define y ⋆ n = I P ν ∈δ (n) ¯ξ ν ¯x ν ≥ 1 which is feasible in Problem (1.3). Since the choice of ¯ξ was arbitrary, we showed that given a solution to Problem (RC fair ), we can always construct a feasible solution to Problem (1.3), thus the objective value of the latter is at least as high. We now prove the contrary, i.e., given a solution to Problem (1.3), we will construct a solution to Problem (RC fair ). 
Consider ¯x to be an optimal solution to Problem (RC fair ). Suppose there exists ¯ξ ∈ Ξ such that F G,c (¯x, ¯ξ ) < |N c |W ⇒ X n∈Nc I X ν ∈δ (n) ¯ξ ν ¯x ν ≥ 1 < |N c |W. However, since ¯x is feasible in Problem (RC fair ), we have that ∀ ˜ ξ ∈Ξ , ∃y n : y n ≤ X ν ∈δ (n) ˜ ξ ν ¯x ν X n∈Nc y n ≥|N c |W. 147 By construction,y n ≤ I P ν ∈δ (n) ˜ ξ ν ¯x ν ≥ 1 , ∀n∈N. Thus X c∈C X n∈Nc I X ν ∈δ (n) ˜ ξ ν ¯x ν ≥ 1 ≥ X c∈C X n∈Nc y n ≥ |N c |W. According to the above result, we showed that the optimal objective value of Problem (RC fair ) is at least as high as that of Problem (1.3). This completes the proof. ■ A.4.2 ExactMILPFormulationoftheK-AdaptabilityProblem In order to derive the equivalent MILP in Theorem 1, we start by a variant of the K-adaptability Prob- lem (1.4), in which we move the constraints of the inner maximization problem to the definition of the uncertainty set in the spirit of [84]. Next, we prove, via Proposition 12, that by relaxing the integrality constraint on the uncertain parametersξ , the problem remains unchanged, and this is the key result that enables us to provide an equivalent MILP reformulation for Problem (1.4). We replaceΞ with a collection of uncertainty sets parameterized by vectorsℓ∈L as in [84]. Specifi- cally, it follows from Proposition 2 in [84] that Problem (1.4) is equivalent to max min ℓ∈L min ξ ∈Ξ( x,y,ℓ) max k∈K: ℓ k =0 X n∈N y k n s.t. x∈X, y 1 ,...,y K ∈Y, (A.8) 148 whereΞ( x,y,ℓ) is defined through Ξ( x,y,ℓ) := ξ ∈Ξ: y k ℓ k > X ν ∈δ (ℓ k ) ξ ν x ν , ∀k∈K :ℓ k >0 y k n ≤ X ν ∈δ (n) ξ ν x ν ∀n∈N, ∀k∈K :ℓ k =0 , and, with a slight abuse of notation, we usey :={y 1 ,...,y K }. The vectorℓ∈L encodes which of the K candidate covering schemes are feasible. By introducingℓ, the constraints of the inner maximization problem are absorbed in the parameterized uncertainty setsΞ( x,y,ℓ), and in the inner-most maximization problem, any covering scheme can be chosen for whichℓ k =0. 
Note that, for any fixed x ∈ X , y ∈ Y K , and ℓ ∈ L, the strict inequalities in Ξ( x,y,ℓ) can be converted to (loose) inequalities as in Ξ( x,y,ℓ)= ξ ∈Ξ: y k ℓ k ≥ X ν ∈δ (ℓ k ) ξ ν x ν +1, ∀k∈K :ℓ k >0 y k n ≤ X ν ∈δ (n) ξ ν x ν ∀n∈N, ∀k∈K :ℓ k =0 . 149 This idea was previously leveraged in [153]. It follows naturally since all decision variables and uncertain parameters are binary. Next, we show that we can obtain an equivalent problem by relaxing the integrality constraint on the setΞ in the definition of Ξ( x,y,l). Consider the following problem max min ℓ∈L min ξ ∈Ξ( x,y,ℓ) max k∈K: ℓ k =0 X n∈N y k n s.t. x∈X, y∈Y K , (A.9) where the uncertainty set is obtained by relaxing the integrality constraints onξ , i.e., Ξ( x,y,ℓ)= ξ ∈T : y k ℓ k ≥ X ν ∈δ (ℓ k ) ξ ν x ν +1, ∀k∈K :ℓ k >0 y k n ≤ X ν ∈δ (n) ξ ν x ν ∀n∈N, ∀k∈K :ℓ k =0 . Proposition12. Under Assumption 3, Problems (A.8) and (A.9) are equivalent. Proof. Letx∈X ,y∈Y K , andℓ∈L. It suffices to show that min ξ ∈Ξ( x,y,ℓ) max k∈K: ℓ k =0 X n∈N y k n and min ξ ∈Ξ( x,y,ℓ) max k∈K: ℓ k =0 X n∈N y k n are equivalent. Observe that the these problems have the same objective function. Thus, the two problems have the same optimal objective value if and only if they are either both feasible or both infeasible. As a result, it suffices to show that Ξ( x,y,ℓ) is empty if and only ifΞ( x,y,ℓ) is empty. Naturally, ifΞ( x,y,ℓ)= ∅ thenΞ( x,y,ℓ)=∅ sinceT is the linear programming relaxation ofΞ . Thus, it suffices to show that the converse also holds, i.e., that ifΞ( x,y,ℓ)̸=∅, then alsoΞ( x,y,ℓ)̸=∅. 150 To this end, suppose thatΞ( x,y,ℓ)̸=∅ and let ˜ ξ ∈Ξ( x,y,ℓ). Then, ˜ ξ is such that ˜ ξ ∈T, y k ℓ k ≥ X ν ∈δ (ℓ k ) ˜ ξ ν x ν +1 ∀k∈K :ℓ k >0, y k n ≤ X ν ∈δ (n) ˜ ξ ν x ν ∀n∈N, ∀k∈K :ℓ k =0. (A.10) Next, define ˆ ξ n :=⌈ ˜ ξ n ⌉∀n∈N . We show that ˆ ξ ∈Ξ( x,y,ℓ). First, note that ˆ ξ ≥ ˜ ξ and by Assumption 3, it follows that ˆ ξ ∈T . Moreover, by construction, ˆ ξ ∈{0,1} N . Thus, it follows that ˆ ξ ∈Ξ . 
Next, we show that the constructed solution ˆ ξ also satisfies the remaining constraints in Ξ( x,y,ℓ). Fixk∈K such that ℓ k >0. Then, from (A.10) it holds that y k ℓ k ≥ X ν ∈δ (ℓ k ) ˜ ξ ν x ν +1 ⇒ y k ℓ k =1 and ˜ ξ ν x ν =0 ∀ν ∈δ (ℓ k ) ⇒ y k ℓ k =1 and ˜ ξ ν =0 ∀ν ∈δ (ℓ k ):x ν =1 ⇒ y k ℓ k =1 and ˆ ξ ν =0 ∀ν ∈δ (ℓ k ):x ν =1 ⇒ y k ℓ k ≥ X ν ∈δ (ℓ k ) ˆ ξ ν x ν +1, where the first and second implication follow since y andx are binary, respectively, and the third impli- cation holds by definition of ˆ ξ , 151 Next, fix k∈K such thatℓ k =0. Then, (A.10) yields y k n ≤ X ν ∈δ (n) ˜ ξ ν x ν ∀n∈N ⇒ y k n ≤ X ν ∈δ (n) ˆ ξ ν x ν ∀n∈N, which follows by definition of ˆ ξ . We have thus constructed ˆ ξ ∈ Ξ( x,y,ℓ) and therefore conclude that Ξ( x,y,ℓ)̸=∅. Since the choice ofx∈X ,y∈Y K , andℓ∈L was arbitrary, the claim follows. ■ Proposition 12 is key to leverage existing literature to reformulate Problem (1.4) as an MILP. The re- formulation is based on [84, 153]. Proof of Theorem 1. Note that the objective function of the Problem (A.8) is identical to min ℓ∈L min ξ ∈Ξ( x,y,ℓ) " max λ ∈∆ K (ℓ) X k∈K λ k X n∈N y k n # , where∆ K (ℓ) :={λ ∈ R K + : e ⊤ λ = 1, λ k = 0 ∀k ∈K :ℓ k ̸= 0}. We define ∂L :={ℓ∈L :ℓ≯ 0}, andL + := {ℓ ∈ L : ℓ > 0}. We remark that∆ K (ℓ) = ∅ if and only ifℓ > 0. IfΞ( x,y,ℓ) = ∅ for all ℓ∈L + , then the problem is equivalent to min ℓ∈∂L min ξ ∈Ξ( x,y,ℓ) " max λ ∈∆ K (ℓ) X k∈K λ k X n∈N y k n # . By applying the classical min-max theorem, we obtain min ℓ∈∂L max λ ∈∆ K (ℓ) min ξ ∈Ξ( x,y,ℓ) X k∈K λ k X n∈N y k n . 152 This problem is also equivalent to max λ (ℓ)∈∆ K (ℓ) min ℓ∈∂L min ξ ∈Ξ( x,y,ℓ) X k∈K λ k (ℓ) X n∈N y k n . If on the other handΞ( x,y,ℓ)̸=∅ for someℓ∈L + , the objective of Problem (A.8) evaluates to−∞ . Using the above results, we can write Problem (A.8) in epigraph form as max τ s.t. x∈X, y∈Y K , τ ∈R, λ (ℓ)∈∆ K (ℓ), ℓ∈∂L τ ≤ X k∈K λ k (ℓ) X n∈N y k n ∀ℓ∈∂L: Ξ( x,y,ℓ)̸=∅ Ξ( x,y,ℓ)=∅ ∀ℓ∈L + . 
(A.11) 153 We begin by reformulating the semi-infinite constraint associated with ℓ∈∂L in Problem (A.11). To this end, fix ℓ∈∂L and consider the linear program min 0 s.t. 0 ≤ ξ n ≤ 1 ∀n∈N A ⊤ ξ ≥ b y k ℓ k ≥ X ν ∈δ (ℓ k ) ξ ν x ν +1 ∀k∈K : ℓ k >0 y k n ≤ X ν ∈δ (n) ξ ν x ν ∀n∈N, ∀k∈K : ℓ k =0, whose dual reads max − e ⊤ θ (ℓ)+b ⊤ α (ℓ)− X k∈K ℓ k ̸=0 y k ℓ k − 1 ν k (ℓ)+ X k∈K ℓ k =0 X n∈N y k n β k n (ℓ) s.t. θ (ℓ)∈R N + , α (ℓ)∈R R + , β k (ℓ)∈R N + , ∀k∈K, ν (ℓ)∈R K + θ n (ℓ) ≤ A ⊤ α (ℓ)+ X k∈K ℓ k ̸=0 X ν ∈δ (ℓ k ) x ν ν k (ℓ)− X k∈K ℓ k =0 X ν ∈δ (n) x ν β k n (ℓ) ∀n∈N. In Problem (A.11) the constraint associated with eachℓ∈∂L is satisfied if and only if the objective value of the above dual problem is greater thanτ − P k∈K λ k (ℓ) P n∈N y k n . This follows since the dual is always feasible. Therefore, either the dual is unbounded in which case the primal is infeasible, i.e.,Ξ( x,y,ℓ)=∅, 154 and the constraint is trivial. Else, by strong duality, the primal and dual must have the same objective value (zero). As a result, the constraints in Problem (A.11) associated with eachℓ∈∂L can be written as τ ≤− e ⊤ θ (ℓ)+b ⊤ α (ℓ)− X k∈K ℓ k ̸=0 y k ℓ k − 1 ν k (ℓ)+ X k∈K ℓ k =0 X n∈N y k n β k n (ℓ)+ X k∈K λ k (ℓ) X n∈N y k n θ n (ℓ) ≤ A ⊤ α (ℓ)+ X k∈K ℓ k ̸=0 X ν ∈δ (ℓ k ) x ν ν k (ℓ)− X k∈K ℓ k =0 X ν ∈δ (n) x ν β k n (ℓ) ∀n∈N. Finally, the last constraint in Problem (A.11) is satisfied if the linear program min 0 s.t. 0 ≤ ξ n ≤ 1 ∀n∈N Aξ ≥ b y k ℓ k ≥ X ν ∈δ (ℓ k ) ξ ν x ν +1 ∀k∈K : ℓ k ̸=0 is infeasible. Using strong duality, this occurs if the dual problem max − e ⊤ θ (l)+α (ℓ) ⊤ b− X k∈K ℓ k ̸=0 y k ℓ k − 1 ν k (ℓ) s.t. θ (ℓ)∈R N + , α (ℓ)∈R R + , ν (ℓ)∈R K + θ n (ℓ) ≤ A ⊤ α (ℓ)+ X k∈K ℓ k ̸=0 X ν ∈δ (ℓ k ) x ν ν k (ℓ) ∀n∈N 155 is unbounded. Since the feasible region of the dual problem constitutes a cone, the dual problem is un- bounded if and only if there is a feasible solution with an objective value of 1 or more. 
■

A.5 Benders Decomposition

We do not detail all the steps of the Benders decomposition algorithm; we merely provide the initial relaxed master problem and the subproblems used to generate the cuts. We refer the reader to, e.g., [37] for more details.

Relaxed Master Problem. Initially, the relaxed master problem only involves the binary variables of Problem (1.5) and is expressible as
\[
\max\ \left\{\ \tau \ :\ \tau\in\mathbb R,\ x\in\mathcal X,\ y^1,\dots,y^K\in\mathcal Y\ \right\}.
\]

Subproblems. As discussed in Section 1.5, Problem (1.5) decomposes by $\ell$. Depending on the index $\ell$ of the subproblem, there are two types of subproblems to consider. If $\ell\in\mathcal L^0$, the subproblem is given by
\[
\begin{array}{ll}
\min & 0 \\
\text{s.t.} & \theta(\ell),\ \beta^k(\ell)\in\mathbb R^N_+,\ \alpha(\ell)\in\mathbb R^R_+,\ \nu(\ell)\in\mathbb R^K_+,\ \lambda(\ell)\in\Delta_K(\ell) \\
& \tau \;\leq\; -e^\top\theta(\ell) + b^\top\alpha(\ell) - \displaystyle\sum_{k\in\mathcal K:\,\ell_k\neq 0}\left(y^k_{\ell_k}-1\right)\nu_k(\ell) + \displaystyle\sum_{k\in\mathcal K:\,\ell_k=0}\ \sum_{n\in\mathcal N} y^k_n\,\beta^k_n(\ell) + \displaystyle\sum_{k\in\mathcal K}\lambda_k(\ell)\sum_{n\in\mathcal N} y^k_n \\[2mm]
& \theta_n(\ell) \;\leq\; \left[A^\top\alpha(\ell)\right]_n + \displaystyle\sum_{k\in\mathcal K:\,\ell_k\neq 0}\ \sum_{\nu\in\delta(\ell_k)} x_\nu\,\nu_k(\ell) - \displaystyle\sum_{k\in\mathcal K:\,\ell_k=0}\ \sum_{\nu\in\delta(n)} x_\nu\,\beta^k_n(\ell) \quad \forall n\in\mathcal N.
\end{array}
\tag{$Z^0(\ell)$}
\]
In a similar fashion, we define the subproblem associated with $\ell\in\mathcal L^+$, given by
\[
\begin{array}{ll}
\min & 0 \\
\text{s.t.} & \theta(\ell)\in\mathbb R^N_+,\ \alpha(\ell)\in\mathbb R^R_+,\ \nu(\ell)\in\mathbb R^K_+ \\
& 1 \;\leq\; -e^\top\theta(\ell) + b^\top\alpha(\ell) - \displaystyle\sum_{k\in\mathcal K:\,\ell_k\neq 0}\left(y^k_{\ell_k}-1\right)\nu_k(\ell) \\[2mm]
& \theta_n(\ell) \;\leq\; \left[A^\top\alpha(\ell)\right]_n + \displaystyle\sum_{k\in\mathcal K:\,\ell_k\neq 0}\ \sum_{\nu\in\delta(\ell_k)} x_\nu\,\nu_k(\ell) \quad \forall n\in\mathcal N.
\end{array}
\tag{$Z^+(\ell)$}
\]

Appendix B

Technical Appendix to Chapter 2

B.1 Omitted Proofs from Section 2.5.2

Proof of Proposition 4. Let $F:\mathbb R^N\to\mathbb R$ be an additive function of the form $F(u)=\sum_{i=1}^N f(u_i)$, where $f:\mathbb R\to\mathbb R$ is monotonically increasing and strictly concave. We focus on group fairness, where the utility of each individual is given by the average utility of their community; hence, we can rewrite $F(u)=\sum_{c\in\mathcal C} N_c f(u_c)$. Let $u=u(A)$ and $u'=u(A')$ denote the utility vectors corresponding to neighboring solutions $A$ and $A'$, respectively. Suppose $u$ and $u'$ are sorted in ascending order and, for all $c\in\mathcal C$, index $c$ in both vectors corresponds to the same community, i.e., after the transfer the ordering of the utilities has not changed.
Furthermore, assume $\sum_{\kappa\in\mathcal C:\,\kappa\leq c} N_\kappa\left(u_\kappa-u'_\kappa\right)\geq 0$ for all $c\in\mathcal C$, and $u_c>u'_c$ for some $c\in\mathcal C$. Clearly, $u$ and $u'$ satisfy the assumptions of the influence transfer principle. We need to show that $\sum_{c\in\mathcal C} N_c f(u_c) > \sum_{c\in\mathcal C} N_c f(u'_c)$, or equivalently, $\sum_{c\in\mathcal C} N_c\left(f(u_c)-f(u'_c)\right)>0$. The proof is by induction. We iteratively sweep the vectors $u$ and $u'$ from the smallest index to the largest and show that for any $\kappa\in\mathcal C$, $\sum_{c\leq\kappa} N_c\left(f(u_c)-f(u'_c)\right)\geq 0$, with the inequality becoming strict for at least one $\kappa$. To do so, we repeatedly use a property of strictly concave functions known as decreasing marginal returns. According to this property, $f(x+\delta_x)-f(x) > f(y+\delta_y)-f(y)$ for $x<y$ and $\delta_x\geq\delta_y>0$.

Figure B.1: An illustration of the graph used in the proof of Proposition 5 (not drawn to scale). There are three communities (circle, square, and diamond), each of size 100. The circle community consists of an "all-circle" star structure with 80 vertices, 14 isolated vertices, and a mixed star structure (shared with the diamond community) with 6 circle vertices. The square community consists of two "all-square" star structures with sizes 60 and 10, plus a set of 30 isolated vertices. The diamond community consists of an "all-diamond" star structure with 30 vertices, 66 isolated vertices, and a mixed star structure (shared with the circle community) with 4 diamond vertices.

More specifically, in our inductive step, we keep track of a "decrement budget," which we denote by $\Delta$. Intuitively, if we can show that $\sum_{c\leq\kappa} N_c\left(f(u_c)-f(u'_c)\right)>0$ with budget $\Delta$ for some $\kappa$, we can then use the decreasing marginal returns property, along with the assumption that $u'$ is sorted, to show that as long as $N_{\kappa+1}\left(u'_{\kappa+1}-u_{\kappa+1}\right)\leq\Delta$, it is the case that $\sum_{c\leq\kappa+1} N_c\left(f(u_c)-f(u'_c)\right)>0$. After each round, we update $\Delta$ and move on to the next element in the utility vectors. Formally, we let $\Delta=0$ at the beginning of this inductive process.
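The bookkeeping behind this sweep can be illustrated on a small numerical example. The sketch below is not part of the original analysis: it assumes $f(x)=\sqrt{x}$ and hypothetical unit-size communities with utility vectors satisfying the conditions of the influence transfer principle, and applies the budget update $\Delta\leftarrow\Delta+N_c\left(u_c-u'_c\right)$ at each step.

```python
# A numerical walkthrough of the sweep described above, assuming f(x) = sqrt(x)
# and hypothetical utility vectors u (after transfer) and u' (before transfer).
import math

N  = [1, 1, 1]           # community sizes (hypothetical)
u  = [0.3, 0.5, 0.7]     # sorted ascending, same community order as u'
up = [0.2, 0.5, 0.8]
f = math.sqrt            # monotonically increasing and strictly concave

delta, diff = 0.0, 0.0
for c in range(len(N)):
    delta += N[c] * (u[c] - up[c])        # budget update: prefix-sum condition
    diff  += N[c] * (f(u[c]) - f(up[c]))  # running welfare difference
    assert delta >= -1e-9                 # budget stays non-negative throughout
    assert diff  >= -1e-9                 # partial sums stay non-negative

assert diff > 0   # overall: F(u) > F(u'), as Proposition 4 claims
```

The in-loop assertions mirror the inductive invariant; the final assertion is the conclusion of the proposition on this instance.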
After visiting the $c$th community, we simply update $\Delta$ by $\Delta\leftarrow\Delta+N_c\left(u_c-u'_c\right)$. By the assumption of the transfer principle, $\Delta$ is non-negative at all points of this iterative process and is strictly positive at some point during the process. Observe that $f(u_1)\geq f(u'_1)$, since $u_1\geq u'_1$ by the assumption of the transfer principle and the monotonicity of $f$. We use this as the base case. Since $u$ and $u'$ are sorted and $\Delta$ is non-negative, the fact that $f$ is strictly concave (so that the decreasing marginal returns property can be used) immediately implies that $\sum_{c\leq\kappa} N_c\left(f(u_c)-f(u'_c)\right)\geq 0$ at any iteration $\kappa$ of the process. The inequality becomes strict for some $\kappa$ by the assumption of the transfer principle. This proves the claim. ■

Proof of Proposition 5. Figure B.1 illustrates the graph used in the proof to witness the statement. We set $p=1$ (deterministic spread) and the number of initial seeds $K=4$. Consider two choices of influencer vertices, $A$ and $A'$. Let $A$ denote the choice that consists of the centers of the star structures that each contain a single community. Let $A'$ denote the solution that is identical to $A$, with the sole difference
W α (u)−W α (u′) −4 −3 −2 −1 0 0.00 0.25 0.50 0.75 1.00 α value 1 10 −24 ( W α (u)−W α (u′ ) ) −2 −1 0 −50 −40 −30 −20 −10 0 α value Figure B.2: The difference of W α (u)− W α (u ′ ) on the vertical axis versus α on the horizontal axis for different welfare functions (this difference is scaled by a factor of 10 − 24 on the bottom panel). Top panel: W α (u)=Σ c∈C N c u α c /α forα ∈(0,1); bottom panel: W α (u)=Σ c∈C N c u α c /α forα< 0. We now show that no welfare function that satisfies the first 5 principles will prefer u overu ′ . Recall that such welfare functions are in the form W α (u) = Σ c∈C N c u α c /α for α < 1 and α ̸= 0, W α (u) = Σ c∈C N c log(u c ) for α = 0. We verify this claim numerically. In particular Figure B.2 plots the dif- ference of W α (u) − W α (u ′ ) for W α (u) = Σ c∈C N c u α c /α when α ∈ (0,1) (top panel) and α < 0 (bottom panel). This difference is always negative so u ′ is preferred by these welfare functions. For W α (u)=Σ c∈C N c log(u c ),W α (u)− W α (u ′ )≈− 4.3. We point out that the instance used in the proof (graph structure, probability of spread and the number of seeds) is designed with the sole purpose of simplifying the calculations of the utilities. It is possible to modify this instance to more complicated and realistic instances. ■ Proof of Proposition 6. LetA andA ′ denote two neighboring solutions with corresponding utility vectors u=u(A) andu ′ =u(A ′ ). Letu denote any of the two utility vectors such thatΣ c∈C N c u c ≥ Σ c∈C N c u ′ c . 160 Without loss of generality, we assumeu ′ is sorted in ascending order of the utilities andu is permuted so that index c ∈ C in bothu andu ′ corresponds to the same community. This is because we assume that W satisfies the symmetry principle due to which by permuting a utility vector the value of the welfare function does not change. 
Letν andκ ∈C denote the communities whose utilities are changed between u andu ′ , i.e., we assume ν and κ are the two communities where taking influencer vertices from ν and giving them toκ will transferu ′ intou. To satisfy the condition of the utility gap reduction principle, it should be the case thatu ′ ν ≥ u ′ κ (i.e., we transfer influencer vertices from the group with higher utility to a group with lower utility), otherwise after the transfer fromu ′ tou the utility gap could not get smaller (i.e.,∆( u)≥ ∆( u ′ ) in which case the utility gap reduction is not applicable). Assuming u ′ ν ≥ u ′ κ , if ∆( u) ≥ ∆( u ′ ), again the assumption of the utility gap reduction principle is not satisfied, hence the principle is not applicable and there is no need to study this case. Therefore, we further assume∆( u) < ∆( u ′ ). We would like to show in this case a welfare functionW that satisfies all the 5 other principles witnessesW(u)>W(u ′ ). By assumptionΣ c∈C N c u c ≥ Σ c∈C N c u ′ c . From this, it follows that: X c∈C N c u c − u ′ c ≥ 0 (B.1) ⇔N ν u ν − u ′ ν +N κ u κ − u ′ κ ≥ 0 (B.2) ⇔ X y∈C:y≤ x N y (u y − u ′ y )≥ 0, ∀x∈C, (B.3) where both inequalities (B.2) and (B.3) follow directly from the fact that the utilities of all the other communities are the same in bothu andu ′ . Finally, sinceu κ >u ′ κ (we are transferring influencer vertices to the community κ ), we can apply the influence transfer principle to show that W(u) > W(u ′ ) as claimed. ■ 161 Proof of Lemma 3. As we have shown earlier welfare functionsW α (u) = Σ c∈C N c log(u c ) forα = 0 and W α (u) = Σ c∈C N c u α c /α forα < 1,α ̸= 0 satisfy all the first 5 principles. In [118], the authors show that the composition of a non-decreasing concave function (in our caselog(x),α =0 orx α /α forα< 1,α ̸=0) and a non-decreasing submodular function (in our caseu c (A)) is submodular. Since the sum of submodular functions is submodular, our proposed class of welfare functions is submodular. 
Our welfare functions also satisfy monotonicity. This is because $u_c(A)$ is monotonically non-decreasing, so its composition with another monotonically non-decreasing function ($\log(x)$ for $\alpha=0$, or $x^\alpha/\alpha$ for $\alpha<1$, $\alpha\neq 0$) is monotonically non-decreasing. Since our welfare functions are sums of monotonically non-decreasing functions, they are also monotone. ■

B.2 Leximin Fairness and Social Welfare

In this section, we show that leximin fairness can be captured by our welfare-maximizing framework. See [86] for more details.

Proposition 13. Welfare optimization is equivalent to leximin fairness, i.e., there exists a constant $\alpha_0$ such that for $\alpha<\alpha_0$, an optimal solution to the welfare maximization problem satisfies leximin fairness, and vice versa.

Proof. Let $u=(u_1,\dots,u_N)\succcurlyeq u(A)$ for all $A\in\mathcal A^\star$, where "$\succcurlyeq$" denotes the lexicographic ordering, indicating that $u$ is a leximin fair solution (w.l.o.g., and with a slight abuse of notation, we assume that both $u$ and $u(A)$ are sorted in increasing order). We aim to show that there exists $\alpha_0<0$ such that for any $\alpha\leq\alpha_0$,
\[
\sum_{i=1}^N u_i^\alpha/\alpha \;\geq\; \sum_{i=1}^N u_i^\alpha(A)/\alpha \quad \forall A\in\mathcal A^\star.
\]
For simplicity, we multiply both sides of the inequality by $-1/\alpha$; since $\alpha<0$, the direction of the inequality sign does not change.

We now prove this inequality by contradiction. Suppose that for all $\alpha_1<0$ there exists $\alpha<\alpha_1$ such that $\sum_{i=1}^N -u_i^\alpha < \sum_{i=1}^N -u_i^\alpha(A)$ for some $A\in\mathcal A^\star$. Since $u$ is a leximin solution, by definition $u_1\geq u_1(A)$. We consider two cases. First, suppose $u_1>u_1(A)$, so that $\min(u_1,u_1(A))=u_1(A)$. Then
\[
\sum_{i=1}^N -u_i^\alpha < \sum_{i=1}^N -u_i^\alpha(A)
\;\Leftrightarrow\;
\frac{\sum_{i=1}^N -u_i^\alpha}{u_1(A)^\alpha} < \frac{\sum_{i=1}^N -u_i^\alpha(A)}{u_1(A)^\alpha}
\;\Rightarrow\;
\lim_{\alpha\to-\infty}\frac{\sum_{i=1}^N -u_i^\alpha}{u_1(A)^\alpha} \;\leq\; \lim_{\alpha\to-\infty}\frac{\sum_{i=1}^N -u_i^\alpha(A)}{u_1(A)^\alpha}
\;\Rightarrow\; 0 \leq -N_1,
\]
where $N_1>0$ denotes the number of entries of $u(A)$ equal to $u_1(A)$. This is a contradiction, since $N_1>0$. Now, suppose $u_1=u_1(A)$.
In this case, we can eliminate the first terms, which involve $u_1^\alpha$ and $u_1^\alpha(A)$, from the two sides of the inequality and repeat the above steps iteratively, starting from the second biggest element of $u^\alpha$.

Next, we prove the other direction. Let us assume $u$ is a utility vector such that there exists $\alpha_0<0$ with
\[
\sum_{i=1}^N -u_i^\alpha \;\geq\; \sum_{i=1}^N -u_i^\alpha(A) \quad \forall A\in\mathcal A^\star,\ \forall\alpha\leq\alpha_0.
\]
W.l.o.g., we assume that $u_1\neq u_1(A)$; otherwise, we can remove the terms that are equal and the proof still holds (we make this assumption for ease of exposition). It follows that
\[
\frac{\sum_{i=1}^N -u_i^\alpha}{\min(u_1,u_1(A))^\alpha} \;\geq\; \frac{\sum_{i=1}^N -u_i^\alpha(A)}{\min(u_1,u_1(A))^\alpha} \quad \forall A\in\mathcal A^\star.
\]
If $\min(u_1,u_1(A))=u_1$, meaning that $u_1(A)>u_1$, we have $-C-\epsilon(\alpha)\geq-\delta(\alpha,A)$ for all $A\in\mathcal A^\star$, where $C>0$ is a constant (equal to the number of entries of $u$ that are equal to $u_1$), and both $\epsilon\geq 0$ and $\delta\geq 0$ are functions of $\alpha$ that can be made arbitrarily small by decreasing $\alpha$. This is a contradiction, which means that $\min(u_1,u_1(A))=u_1(A)$, i.e., $u_1\geq u_1(A)$. By continuing this procedure, we can establish that $u\succcurlyeq u(A)$. This completes the proof. ■

B.3 Omitted Proofs from Table 2.1

In this section, we provide a detailed description of the entries of Table 2.1 and their derivations.

B.3.1 Monotonicity

Proposition 14. Exact DP does not satisfy monotonicity.

Figure B.3: Companion figure to Proposition 14. The network consists of two communities, circle and square, each of size $N$.

Proof. Let $K=2$ and $p\in(0,1)$. Consider a graph $G$, as shown in Figure B.3, consisting of two communities, square and circle, each of size $N$ (for large enough $N$). The circle community consists of a star network of size $N$. The square community contains a star network of size $2+p(N-2)$ and $(N-2)(1-p)$ singletons. Consider two solutions, $A$ and $A'$. $A$ selects a seed from the periphery of the star of the circle community and allocates the other seed to the center of the star of the square community. $A'$, on the other hand, allocates each of the seeds to the centers of the stars.
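The expected utilities of these two solutions can be computed directly from the construction in Figure B.3; the sketch below uses hypothetical values of $N$ and $p$, with spread reaching a vertex at distance one (respectively, two) from the seed with probability $p$ (respectively, $p^2$).

```python
# Expected utilities for the two solutions in the construction of Figure B.3.
# N and p below are hypothetical values satisfying the assumptions of the proof.
N, p = 1000, 0.5

u_circle_leaf   = (1 + p + p**2 * (N - 2)) / N   # seed a periphery vertex of the circle star
u_circle_center = (1 + p * (N - 1)) / N          # seed the center of the circle star instead
u_square_center = (1 + p + p**2 * (N - 2)) / N   # seed the center of the square star

u  = (u_circle_leaf,   u_square_center)   # solution A
up = (u_circle_center, u_square_center)   # solution A'

assert up[0] > u[0] and up[1] == u[1]     # u' dominates u, so monotonicity prefers u'
assert u[0] == u[1] and up[0] != up[1]    # but only u equalizes utilities (exact DP)
```

The two assertions exhibit the tension the proof exploits: monotonicity favors $A'$, while exact DP is only satisfied by $A$.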
Let $u=u(A)$ and $u'=u(A')$ denote the utility vectors corresponding to the allocations $A$ and $A'$. These are
\[
u=\left(\frac{1+p+p^2(N-2)}{N},\ \frac{1+p+p^2(N-2)}{N}\right)
\quad\text{and}\quad
u'=\left(\frac{1+p(N-1)}{N},\ \frac{1+p+p^2(N-2)}{N}\right),
\]
respectively. Clearly, $u'$ dominates $u$: its first component is strictly larger (since $p>p^2$) and its second component is equal. So by monotonicity, $u'$ is preferred to $u$. However, only $u$ satisfies exact DP. Hence, DP does not satisfy monotonicity. ■

Proposition 15. Approximate DP does not satisfy monotonicity.

Proof. Consider a graph $G$, as shown in Figure B.4, consisting of two communities, square and circle, each of size $N$. We choose an arbitrary $\delta\in(0,1)$ to reflect the arbitrary strictness of a decision maker. Let $\delta<p<\sqrt\delta$, $K=2$, and $N>\max\left(3p/(p-\delta),\ 1/(\delta-p^2)\right)$. The optimal solution $A$ of the influence maximization problem chooses the center of the star and any disconnected square vertex. Under $A$, the utilities of the circle and square communities are $(1+(N-1)p)/N$ and $(1+2p)/N$, respectively, and the utility gap exceeds $\delta$ (so this solution does not satisfy the DP constraints). Under DP, any fair solution chooses one vertex from the periphery of the circle community and one of the isolated square vertices. For a fair solution $A'$, the utilities of circle and square are $\left(1+p+p^2(N-2)\right)/N$ and $(1+2p^2)/N$, respectively. Given the range of $N$, the utility gap is less than $\delta$, so approximate DP is satisfied. Since the utilities of both communities have degraded, any monotone welfare function prefers $A$ (and its corresponding utility vector) over $A'$. However, only $A'$ is DP fair, and hence $A'$ is preferred over $A$ by DP. We point out that the graph used in the proof is directed; this is for ease of exposition, and it is possible to create a more complex example with an undirected graph. ■

Figure B.4: Companion figure to Proposition 15. The network consists of two communities, circle and square, each of size $N$. All edges except the two shown by arrows are undirected, meaning that influence can spread both ways.

Proposition 16.
Consider a general fairness notion expressed as a set of constraints of the form $F=\left\{u\in[0,1]^C : u_c\geq l_c\ \forall c\in\mathcal C\right\}$, where the $l_c$, $c\in\mathcal C$, are arbitrary lower-bound values. This fairness notion satisfies the monotonicity principle.

Proof. Let $A,A'\in\mathcal A^\star$ denote two solutions whose corresponding utility vectors $u=u(A)$ and $u'=u(A')$ are feasible ($u,u'\in F$) and such that $u<u'$. Given that the objective function of influence maximization is equivalent to $\sum_{c\in\mathcal C} N_c u_c(A)$ and that all $N_c$ values are positive, the objective value of $u'$ is strictly better than that of $u$. Hence, $W(u)<W(u')$ and monotonicity is satisfied. ■

As we have shown in Section 2.4, both maximin and DC can be written as constraints that are compatible with the fairness definition in Proposition 16. The utilitarian solution corresponds to setting all the lower bounds to 0. Hence, all of them satisfy monotonicity.

Corollary 1. DC, MMF and utilitarian satisfy monotonicity.

B.3.2 Symmetry

It is straightforward to show that DP (exact and approximate), maximin and utilitarian fairness satisfy the symmetry principle. DC, however, does not satisfy the symmetry principle. Based on its definition, DC can place different lower bounds on the utilities of different communities. Hence, after permuting a utility vector, we may no longer be able to satisfy the DC constraints (see Definition 2.2).

B.3.3 Independence of Unconcerned Individuals

Proposition 17. Exact and approximate DP do not satisfy the independence of unconcerned individuals.

Proof. Consider two utility vectors: $u=\left((1+3\delta)/8,\ (1-\delta)/8\right)$ and $u'=\left((1+\delta)/4,\ (1-\delta)/8\right)$ for $\delta\in[0,1)$. Both exact and approximate DP strictly prefer $u$ over $u'$. Let us substitute the second component of both vectors with $(1+\delta)/4$. We obtain $v=u|_2\,(1+\delta)/4=\left((1+3\delta)/8,\ (1+\delta)/4\right)$ and $v'=u'|_2\,(1+\delta)/4=\left((1+\delta)/4,\ (1+\delta)/4\right)$. In contrast to the previous case, both approximate and exact DP prefer $v'$ over $v$.
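This preference reversal can be checked mechanically, using the utility gap as the quantity by which exact and approximate DP rank utility vectors (smaller gap preferred):

```python
# Checking the preference reversal in the proof of Proposition 17 for several
# values of delta in [0, 1); DP ranks vectors by their utility gap.
def gap(w):
    return max(w) - min(w)

for delta in [0.0, 0.3, 0.9]:
    u  = ((1 + 3*delta)/8, (1 - delta)/8)
    up = ((1 + delta)/4,   (1 - delta)/8)
    assert gap(u) < gap(up)                # DP prefers u over u'

    sub = (1 + delta)/4                    # substitute the second (equal) component
    v, vp = (u[0], sub), (up[0], sub)
    assert gap(v) > gap(vp)                # after substitution, DP prefers v' over v
```

Since only the shared (unconcerned) component changed, a notion satisfying the independence principle would have to preserve the original ranking.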
Note that while the construction does not involve an instance of the influence maximization problem, it is possible to provide such an instance to witness the claim, as follows.

Figure B.5: Companion figure to Proposition 17. The network consists of two communities, circle and square, each of size $N$.

Figure B.5 demonstrates the instance witnessing the claim. We consider an influence maximization problem with two communities: circle (first component of the utility vector) and square (second component of the utility vector), each of size $N$ (for $N$ large enough). We assume $p=1$ and $K=2$. The circle community consists of three components: two star components of sizes $N(1+3\delta)/8$ (small) and $N(1+\delta)/4$ (large), and $5N(1-\delta)/8$ isolated vertices. The square community also consists of three components: two star components of sizes $N(1-\delta)/8$ (small) and $N(1+\delta)/4$ (large), and $N(5-\delta)/8$ isolated vertices. Solution $u$ ($u'$) corresponds to selecting a seed vertex from the small (large) star component of the circle community and a vertex from the small star component of the square community. Allocation $v$ ($v'$) corresponds to selecting a seed vertex from the small (large) star component of the circle community and a vertex from the large star component of the square community. Note that the choice of $p=1$ is merely for ease of exposition, and the example network can be modified to accommodate $p<1$. ■

Henceforth, we only discuss utility vectors where appropriate. In all such cases, there exist instances of the influence maximization problem which witness these utility vectors. We demonstrated one such instance in the proof of Proposition 17, but we omit such details from the remaining proofs for simplicity.

Proposition 18. DC does not satisfy the independence of unconcerned individuals.

Proof. Consider an instance of the influence maximization problem with two communities, where the lower bound set by DC for both communities is 0.4.
Also consider two solutions with corresponding utility vectors $u=(0.5,0.5)$ and $u'=(0.5,0.3)$. Only $u$ satisfies DC, and hence DC prefers $u$ over $u'$. Let us substitute the first component with 0.35. We obtain $v=u|_1\,0.35=(0.35,0.5)$ and $v'=u'|_1\,0.35=(0.35,0.3)$. In contrast to the previous case, both solutions are infeasible with respect to DC (hence have welfare $-\infty$; see Section 2.5.4). Therefore, while $W(u)>W(u')$, it does not hold that $W(v)>W(v')$. ■

Proposition 19. MMF does not satisfy the independence of unconcerned individuals.

Proof. Consider an instance of the influence maximization problem with three communities of equal size. Also consider two solutions with corresponding utility vectors $u=(0.3,0.6,0.4)$ and $u'=(0.3,0.2,0.8)$. Maximin fairness strictly prefers $u$ over $u'$. Let us substitute the first component with 0.1. We obtain $v=u|_1\,0.1=(0.1,0.6,0.4)$ and $v'=u'|_1\,0.1=(0.1,0.2,0.8)$. Maximin fairness is indifferent between $v$ and $v'$ (both have the same worst-case utility and total utility), which shows that maximin fairness does not satisfy the independence of unconcerned individuals. ■

Note that the utilitarian objective satisfies the independence of unconcerned individuals: if $W(u)=\sum_{c\in\mathcal C} N_c u_c>\sum_{c\in\mathcal C} N_c u'_c=W(u')$, then $W(u|_c\,b')>W(u'|_c\,b')$, since $u_c=u'_c$ for the unconcerned community $c$ and the substitution therefore changes both sides by the same amount, $N_c(b'-u_c)$.

B.3.4 Affine Invariance

Exact DP satisfies the affine invariance principle because a linear transformation of a uniform vector remains uniform. However, for approximate DP this is not the case. More particularly, for any utility vector $u$ that is $\delta$-DP for $\delta\in(0,1)$ and an affine transformation of the form $u'=\alpha u+\beta$, $u'$ satisfies $\alpha\delta$-DP. Therefore, for $\alpha>1$, $u'$ does not satisfy $\delta$-DP. Similarly, DC does not satisfy this principle either, because after the transformation the constraints may not be satisfied (e.g., when $\alpha<1/\min_{c\in\mathcal C} U_c$). It is known that MMF satisfies this principle [33].
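The scaling behavior of approximate DP described above can be verified numerically. The utility vector and transformation parameters in the sketch below are hypothetical, chosen so that the transformed vector stays within $[0,1]$.

```python
# Verifying that an affine map u -> a*u + b scales the DP violation by a,
# so approximate DP is not affine-invariant. Values below are hypothetical.
delta = 0.1
u = (0.30, 0.30 + delta)           # a vector that is exactly delta-DP
a, b = 1.5, 0.05                   # affine transformation with a > 1

v = tuple(a * x + b for x in u)
gap = lambda w: max(w) - min(w)

assert abs(gap(u) - delta) < 1e-12        # original gap is delta
assert abs(gap(v) - a * delta) < 1e-12    # transformed gap is a*delta > delta
```

The shift $b$ leaves the gap unchanged; only the scale $a$ matters, which is exactly the $\alpha\delta$-DP statement above.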
The same holds for the utilitarian objective: if $\sum_{c\in\mathcal C} N_c u_c > \sum_{c\in\mathcal C} N_c u'_c$, then $\sum_{c\in\mathcal C} N_c\left(\alpha u_c+\beta\right) > \sum_{c\in\mathcal C} N_c\left(\alpha u'_c+\beta\right)$, since $\alpha>0$.

B.3.5 Influence Transfer Principle

Proposition 20. Exact and approximate DP do not satisfy the influence transfer principle.

Proof. Let $\delta\in[0,1)$ denote the parameter of DP. Consider utility vectors $u=\left((1+\delta)/2,\ 0\right)$ and $u'=(\delta,0)$, where all communities have the same size. Based on the influence transfer principle, $W(u)>W(u')$; however, DP strictly prefers $u'$ over $u$. ■

Proposition 21. DC does not satisfy the influence transfer principle.
Suppose that δ = 0 and DP does not satisfy the utility gap reduction principle. From this, it follows that given two utility vectorsu,u ′ such that X c∈C N c u c ≥ X c∈C N c u ′ c if∆ u < ∆ u ′ , DP can strictly preferu ′ . Ifu ′ is preferred, thenu ′ is feasible and it must be that ∆ u ′ = 0 and ∆ u < 0 which is not possible. Next, we show that if δ ̸= 0, DP does not satisfy this principle over all instances of the influence maximization problem. 169 For the proof, we use the example used in the proof of Proposition 5. In that setting there are two solutions with utility gap 0.52 and 0.5 with the same total utility. First, let us assume δ > 0.02. In this case, since both solutions satisfy DP constraints, they are both feasible and the total utility of both solutions is equal (= 180), however, DP does not strictly prefer the solution with smaller gap. In fact, both solutions are feasible with the same objective value and DP does not favor one solution to the other. Forδ ≤ 0.02, we can use the same example graph (see Figure B.1) and add enough isolated vertices to each community until the gap between the solutions becomes small enough to pass theδ threshold. ■ Figure B.6: Companion figure to Proposition 25 of a graph with two communities: N black vertices and N/3 white vertices forN =9. We chooseK =4 and arbitraryp<1. All edges are undirected, meaning that influence can spread both ways. Proposition25. DC does not satisfy utility gap reduction principle. Proof. Consider the networkG as in Figure B.6 consisting of two communities white and black with size N/3 andN, respectively. SupposeK = 4 andp < 1. Without DC, an optimal solution places one seed vertex at the center of the black group and allocates the remaining 3 vertices to the white group. We letu denote this solution. Thus, the utility of the black and white groups will be equal to(1+(N− 1)p)/N and9/N. 
Since there is no edge between the black and white communities, DC (See Definition 2.2) reduces to how to optimally choose one seed vertex from white and the remaining 3 from the black group. After imposing DC, the utility of the black and white groups will be equal to(3+(N− 3)p)/N and3/N. We let w denote one such solution that satisfies DC. While u has a higher total utility and a smaller utility gap, DC strictly prefersw with higher utility gap and lower total utility. Therefore, DC does not satisfy utility gap reduction principle. ■ Proposition26. MMF does not satisfy the utility gap reduction 170 Proof. We prove the statement via the example in Figure B.7 which depicts a network with three groups: blue, black and white. We fix K =1 andp>3/4. The graph corresponds to the case wherep=1 but the example will hold for arbitraryp by setting the number of isolated green vertices to be⌈21/p⌉. Figure B.7: Companion figure to Proposition 26 for the case of p=1. The network consists of three groups: white, blue and black. The edges are undirected so the influence can spread both ways. For arbitrary p, the number of isolated black vertices should scale to⌈21/p⌉. Consider one solution that targets the center of the bigger star component. Thus, the utilities of blue, black and white will be (1+4p)/11, 4p/11 and 0, respectively. This results in utility gap equal to (1+ 4p)/11. By imposing leximin, the optimal fair solution selects the center of the smaller star component and the optimal fair utilities of blue, black and white will be6p/11,p/11 and1/(1+⌈21/p⌉), respectively and we observe a utility gap 6p/11− 1/(1+⌈21/p⌉) ≥ 6p/11− 1/22 > (1+4p)/11, where we used p > 3/4 in the last inequality. In conclusion, while the first solution has a higher total utility (= 9) and lower utility gap compared to the second solution (total utility = 8), leximin still strictly prefers the second solution. This concludes the proof. ■ Proposition27. 
Utilitarian does not satisfy the utility gap reduction Proof. Consider an instance of the influence maximization problem where all the communities are of size 10. Letu = (0.5,0.5,0.5) andu ′ = (0.2,0.5,0.8). Bothu andu ′ achieve the same total utility (=15). Thus, the utilitarian approach is indifferent between u andu ′ . According to the utility gap reductionu is strictly preferred. We note that in this special instance, both influence transfer principle and utility gap reduction apply and according to both principlesu is strictly preferred tou ′ . ■ 171 B.4 OmittedDetailsfromSection2.6 B.4.1 EstimatingtheSBMParametersforLandslideRiskManagement In order to qualitatively and formatively describe the network structure, the research team conducted several in-person semi-structured interviews in Sitka, Alaska from 2018-2020. These interviews were con- ducted with individuals who were identified as “community leaders” or “popular community members” through word-of-mouth, and then subsequently through respondent-driven sampling, a broader range of community members were interviewed (n=14). In these semi-structured interviews, respondents were asked to 1) sort and describe community groups and 2) identify “cliques” and “isolates” as they relate to an early landslide warning system. The former resulted in developing, to the extent possible, discrete a priori community groups. The latter helped to inform the relationships between and within these groups. The in- terviewer took notes which listed the responses and through a tallying and pile sorting exercise, attempted to seek consensus in definitions of community groups. The formative research resulted in cliques based on occupation, political affiliation, age, and local recreational activities. Many cliques were overlapping with shared attributes (e.g. 
people from two different occupations share a political affiliation and frequent the same local pub), however for the purposes of this formative exercise, these community groups were quali- tatively coerced into discrete classifications. These resulted in 16 community groups that include political affiliation, time spent in Sitka (e.g. new arrival or tourist vs. long-term resident), occupation, and whether or not a parent of a child in the public-school system. The community size estimates were developed based on a 2018 Sitka Economic Development Survey, particularly for the occupation-based community groups, as well as publicly available voter records for political affiliations. Several attributes, namely age, specific occupation, time spent in Sitka, and parental status were unavailable in existing datasets, and therefore required the use of proxies and assumptions for estimating community group sizes. Once the community 172 group sizes were estimated, based on the formative research notes on social cohesion, cliques, and iso- lates, we further developed assumptions on within and between-community connectedness. For example, if a respondent suggested that there may be very close relationships between two cliques, we assumed a higher relative p(b) than between two cliques which had less similar attributes. For simplicity, we limited the absolute probabilities for withing-community and between-community probabilities between 0.00 and 0.10. We then sense-checked these absolute probabilities with several of the initial formative research re- spondents. These absolute probabilities were then organized into a16× 16 adjacency matrix to facilitate simulations for influence maximization. B.4.2 RelativeCommunitySizes 0 10 20 30 −9 −7 −5 −2 0 0.5 DC MMF Method Utility Gap (%) Ratio 1 3 5 7 9 0 1 2 3 4 −9 −7 −5 −2 0 0.5 DC MMF Method PoF (%) Ratio 1 3 5 7 9 Figure B.8: Utility gap and PoF for various relative community sizes where the ratio changes from 1 to 9. 
In this section we study the effect of relative community size on both utility gap and efficiency. We consider synthetic samples of SBM network consisting of two communities each of size 100 and we grad- ually increase the size of one community from 100 to 900 in order to study its effect on the utility values of each community. We setq c =0.005 andq cc ′ =0.001. Results are summarized in Figure B.8. This result indicates that the utility gap increases with the relative community size, suggesting that minorities can be adversely affected without appropriate fairness measures in place. We also note that the strength of our approach is in its flexibility to trade-off fairness with efficiency. We may encounter scenarios where 173 the fairness-efficiency trade-offs are mild (as in the particular setting of Figure B.8), but this does not un- dermine our approach as there are many practical situations (as discussed in real-world applications in the paper) where clearly this is not the case and our approach can handle all those cases effectively. DC exhibits relatively high utility gap. This is because by definition DC allocates more resources to communi- ties that “will do better with the resources” and it does not always show an aversion to inequality, a result which we show theoretically in Section 2.5.4. 
B.4.3 Suicide Prevention Application

Influence maximization has previously been implemented for health-promoting interventions among homeless youth [188, 185]. In this section, we consider the problem of training a set of individuals who can share information to help prevent suicide (e.g., how to identify warning signs of suicide in others). We present simulation results over six different social networks of homeless youth from a major city in the US, described in detail in [17]. We provide aggregate summaries of these networks (e.g., size, edge density and community statistics) in Table B.1.

Network Name   # of Vertices   # of Edges   White   Black   Hispanic   Mixed Race   Other
W1MFP          219             217          16.4    41.5    20.5       16.4         5.0
W2MFP          243             214          16.8    36.6    21.8       22.2         2.4
W3MFP          296             326          22.6    34.4    15.2       22.9         4.7
W2SPY          133             225          55.6    10.5    –          22.5         11.3
W3SPY          144             227          63.0    –       –          16.0         20.0
W4SPY          124             111          54.0    16.1    –          14.5         15.3

Table B.1: Racial composition (%) after pre-processing as well as the number of vertices and edges of the social networks [17].

Measure (%)   K    α=−5   α=−2   α=0   α=0.5   α=0.9   DC     Maximin   IM
utility gap   5    4.8    7.5    8.5   9.7     11.4    8.2    3.5       12.5
              10   4.6    6.6    7.3   9.5     12.9    6.9    2.0       11.7
              15   3.6    5.2    5.9   8.9     13.5    7.6    2.4       15.3
              20   3.6    4.4    5.9   7.3     14.0    5.8    2.3       17.2
              25   2.6    3.5    4.6   5.8     13.2    7.0    2.0       16.6
              30   2.4    3.2    4.3   6.4     8.6     8.2    2.0       15.7
PoF           5    6.9    5.1    3.4   1.5     0.7     14.4   16.6      0.0
              10   6.6    3.8    2.8   0.7     0.1     14.3   13.1      0.0
              15   3.8    2.5    1.6   1.1     0.1     14.6   10.5      0.0
              20   4.6    3.8    2.9   2.0     1.0     13.9   10.5      0.0
              25   4.0    3.2    2.5   1.9     1.0     13.4   9.9       0.0
              30   3.9    3.4    2.9   2.3     1.8     12.7   10.8      0.0

Table B.2: Summary of the utility gap and PoF results averaged over 6 different real-world social networks for various budgets, fairness approaches and baselines. Numbers in bold highlight the best values in each setting (row) across different approaches.
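For reference, the two measures reported in Table B.2 can be computed from per-community utilities as below. This is a sketch of the definitions as we read them (utility gap: largest pairwise difference in community utility; PoF: relative loss in total utility against the unconstrained IM solution); the numbers are made up, not taken from the table:

```python
def utility_gap(utilities):
    """Largest difference in utility (%) between any two communities."""
    return max(utilities.values()) - min(utilities.values())

def price_of_fairness(fair_total, im_total):
    """Relative loss (%) in total utility from imposing fairness."""
    return 100.0 * (im_total - fair_total) / im_total

# Illustrative community utilities under a fair policy vs. unconstrained IM.
fair_utils = {"A": 22.0, "B": 18.5, "C": 20.0}
gap = utility_gap(fair_utils)                           # 3.5
pof = price_of_fairness(sum(fair_utils.values()), 63.0)
```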
Each social network consists of 7 racial groups, namely, Black or African American, Latino or Hispanic, White, American Indian or Alaska Native, Asian, Native Hawaiian or Other Pacific Islander, and Mixed race. Each individual belongs to a single racial group. We use this partitioning by race to define our communities. However, to avoid misinterpretation of the results, we combine racial groups with a population below 10% of the network size N under the "Other" category. After this pre-processing step, each dataset contains 3 to 5 communities. Results are summarized in Table B.1. We remark that the absence of a racial category in a given network is due to its small size and hence its being merged into the "Other" category after pre-processing (e.g., Hispanic in network W2SPY). We compare our welfare-based framework for different values of α against DC, MMF and influence maximization without fairness considerations (IM). Table B.2 provides a summary of the results averaged over all network instances, where the numbers in bold highlight the best values (minimum utility gap and PoF) for each budget across the different fairness approaches. As seen, IM typically has a large utility gap (up to 17.2% for K = 20, which is significant because the total influence is only 28.40%). By imposing fairness we can reduce this gap. In fact, we observe that across values of α ranging from −5 to 0.5, there is a decreasing trend in utility gap; for K = 20 and α = −5, we are able to decrease the utility gap to 3.6%. Consistent with previous results on SBM networks, both MMF and α = −5 exhibit very low utility gaps; however, MMF results in higher PoF. Furthermore, across the range of α we observe a mild trade-off between fairness and utility. This shows that in these networks enforcing fairness comes at a low cost, though as we see in the landslide setting, this is not always the case.
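The pre-processing step described at the start of this passage (merging racial groups below 10% of the network size into "Other") can be sketched as follows; the labels and counts here are toy values:

```python
from collections import Counter

def merge_small_groups(labels, threshold=0.10, other="Other"):
    """Relabel any group below a population threshold as `other`.

    Mirrors the pre-processing in the text: groups with fewer than
    10% of the network's nodes are merged into an "Other" category.
    """
    counts = Counter(labels)
    n = len(labels)
    return [g if counts[g] / n >= threshold else other for g in labels]

# Toy network of 20 nodes: one group falls below the 10% threshold.
labels = ["White"] * 12 + ["Black"] * 7 + ["Asian"] * 1
merged = merge_small_groups(labels)
```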
Figure B.9: Top and bottom panels: utility gap and PoF for each real-world network instance (K = 30).

Figure B.9 shows the results for each network separately (x-axis) for a fixed budget K = 30; it reveals that the trade-offs can also be very network-dependent (compare, e.g., W2SPY and W3MFP). This highlights the crucial need for a flexible framework that can be easily adjusted to meaningfully compare these trade-offs.

Appendix C

Technical Appendix to Chapter 3

C.1 Proof of Proposition 8

Proof. We let $X_q = 1$ be the event where $P(X) = q$ (and $X_q = 0$ otherwise). It holds that:
$$
\begin{aligned}
V(\pi_M) &= \mathbb{E}\Bigl[\sum_{r\in\mathcal{R}} \pi(r\,|\,X)\,Y(r)\Bigr] \\
&= \sum_{q\in\mathcal{Q}} \mathbb{P}(X_q=1)\,\mathbb{E}\Bigl[\sum_{r\in\mathcal{R}} \pi(r\,|\,X)\,Y(r)\,\Big|\,X_q=1\Bigr] \\
&= \sum_{q\in\mathcal{Q}} \mathbb{P}(X_q=1)\,\mathbb{E}\Bigl[\sum_{r\in\mathcal{R}} \pi(r\,|\,X)\,(Y(r)-Y(0))\,\Big|\,X_q=1\Bigr] + \mathbb{P}(X_q=1)\,\mathbb{E}\Bigl[\sum_{r\in\mathcal{R}} \pi(r\,|\,X)\,Y(0)\,\Big|\,X_q=1\Bigr] \\
&= \sum_{q\in\mathcal{Q}} \mathbb{P}(X_q=1)\,\mathbb{E}\Bigl[\sum_{r\in\mathcal{R}} \pi(r\,|\,X)\,(Y(r)-Y(0))\,\Big|\,X_q=1\Bigr] + \mathbb{P}(X_q=1)\,\mathbb{E}[Y(0)\,|\,X_q=1] \\
&= \sum_{q\in\mathcal{Q}} \mathbb{P}(X_q=1)\,\mathbb{E}\Bigl[\sum_{r\in\mathcal{R}} \pi(r\,|\,X)\,(Y(r)-Y(0))\,\Big|\,X_q=1\Bigr] + C \\
&= \sum_{q\in\mathcal{Q}} \mathbb{P}(X_q=1) \sum_{r\in\mathcal{R}} \pi(r\,|\,X)\,\mathbb{E}[Y(r)-Y(0)\,|\,X_q=1] + C \\
&= \sum_{q\in\mathcal{Q}} \frac{\lambda_q}{\lambda_{\mathcal{Q}}} \sum_{r\in\mathcal{R}} \frac{f_{qr}}{\lambda_q}\,\tau_{qr} + C \\
&= \sum_{q\in\mathcal{Q}} \sum_{r\in\mathcal{R}} \frac{f_{qr}\,\tau_{qr}}{\lambda_{\mathcal{Q}}} + C,
\end{aligned}
$$
where $C = \sum_{q\in\mathcal{Q}} \mathbb{P}(X_q=1)\,\mathbb{E}[Y(0)\,|\,X_q=1] = \mathbb{E}[Y(0)]$. ■

C.2 Proof of Proposition 9

Proof. We first prove part one and show the conditional independence for each component $Y_r$ of the potential outcome vector. The proof is in the same vein as that of balancing scores in the causal inference literature: a balancing score is essentially a low-dimensional summary of the feature space that facilitates causal inference for observational data in settings with many features. For binary potential outcomes, we have
$$
\begin{aligned}
\mathbb{P}(Y_r=1\,|\,S,R) &= \mathbb{E}[Y_r\,|\,S,R] \\
&= \mathbb{E}\bigl[\mathbb{E}[Y_r\,|\,S,R,X]\,\big|\,S,R\bigr] \\
&= \mathbb{E}\bigl[\mathbb{E}[Y_r\,|\,S,X]\,\big|\,S,R\bigr] \\
&= \mathbb{E}\bigl[\mathbb{E}[Y_r\,|\,X]\,\big|\,S,R\bigr] \\
&= \mathbb{E}[S_r\,|\,S,R] \\
&= S_r,
\end{aligned}
$$
where the third line follows from the assumption of the proposition and the fourth line holds since $S$ is essentially a function of $X$ and can be dropped.
We also show
$$
\begin{aligned}
\mathbb{P}(Y_r=1\,|\,S) &= \mathbb{E}[Y_r\,|\,S] \\
&= \mathbb{E}\bigl[\mathbb{E}[Y_r\,|\,S,X]\,\big|\,S\bigr] \\
&= \mathbb{E}\bigl[\mathbb{E}[Y_r\,|\,X]\,\big|\,S\bigr] \\
&= \mathbb{E}[S_r\,|\,S] \\
&= S_r.
\end{aligned}
$$
We thus proved $\mathbb{P}(Y_r=1\,|\,S,R) = \mathbb{P}(Y_r=1\,|\,S)$. We now prove the second part of the proposition:
$$
\mathbb{P}\bigl(\mathbb{P}(R=r\,|\,X=x)>0\bigr)=1 \;\Rightarrow\; \mathbb{P}\bigl(\mathbb{P}(R=r,X=x)>0\bigr)=1,
$$
$$
\mathbb{P}\bigl(\mathbb{P}(R=r,X=x)>0\bigr) = \mathbb{P}\bigl(\mathbb{P}(R=r,X=x,S=s)>0\bigr) \le \mathbb{P}\bigl(\mathbb{P}(R=r,S=s)>0\bigr).
$$
It follows that $\mathbb{P}(\mathbb{P}(R=r,S=s)>0)=1$ for all values of $s$. ■

C.3 Computational Results

HMIS Data Preparation. We used an HMIS dataset collected between 2015 and 2017 across 16 communities in the United States. The dataset contains 10,922 homeless youth and 3,464 PSH and RRH resources combined. We removed all those with veteran status (54 data points) and those with pending or unknown outcomes (4,713 data points). We grouped Hawaiian/Pacific Islander, Native American, Hispanic, and Asian under the 'Other' category, as no significant statistical inference can be made on the small set of observations within each individual category. Further, we removed 6 data points with no gender information. We use the median date, 08/13/2015, as the cut-off date to separate the train and test sets.

Outcome Estimation. Figure C.1 depicts the average outcome across different score values, $\mathbb{E}[Y(r)\,|\,S=s]$ for all $r\in\mathcal{R}$, using the DR estimate. Under SO, after S = 8, there is a significant drop in average outcome. Average outcomes under PSH and RRH also exhibit a decline with score. However, they remain highly effective even for high-scoring youth.

Propensity Score. In order to evaluate different policies using the IPW and DR methods, we estimated the propensity scores, i.e., $\pi_0(R=r\,|\,X=x)$. Table C.1 summarizes the accuracy across different models. We consider two models, one that uses only the NST score and one that uses the entire set of features in the data. We observe that, even though the policy recommendations only use the NST score, including other features helps improve the accuracy. In addition, the decision tree and random forest are the top-performing models.
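Before examining the individual models, it may help to recall how these propensities are used downstream. A minimal inverse propensity weighted (IPW) policy-value estimator can be sketched as follows; the data here are synthetic, and the 0.7 success rate under resource 1 is an assumption of the toy example, not an HMIS quantity:

```python
import random

def ipw_value(data, policy, propensity):
    """Inverse propensity weighted estimate of a target policy's value.

    data: list of (x, r, y) observations collected under the historical
          policy. policy(r, x) is the target assignment probability;
          propensity(r, x) is the estimated historical policy pi_0(r | x).
    """
    total = 0.0
    for x, r, y in data:
        total += policy(r, x) / propensity(r, x) * y
    return total / len(data)

# Synthetic logs: a uniform historical policy over two resources, with
# resource 1 succeeding 70% of the time and resource 0 only 30%.
random.seed(1)
data = []
for _ in range(20000):
    x = random.random()
    r = random.choice([0, 1])
    y = 1.0 if random.random() < (0.7 if r == 1 else 0.3) else 0.0
    data.append((x, r, y))

always_1 = lambda r, x: 1.0 if r == 1 else 0.0   # target: always assign 1
uniform = lambda r, x: 0.5                        # known historical policy
v = ipw_value(data, always_1, uniform)            # should be close to 0.7
```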
Although the random forest exhibits over-fitting (in-sample accuracy = 99.7%), its out-of-sample accuracy (79.3%) outperforms the other models. In addition to accuracy, the propensity models should be well-calibrated; that is, the observed probability should match the predicted probability. We plot the reliability diagrams in Figure C.2, where the y-axis is the observed probability in the data and the x-axis is the predicted value. The dots correspond to the values of different bins. A well-calibrated model should lie on the y = x diagonal line. As seen in Figure C.2, the random forest and neural network models have relatively better calibration properties.

Finally, in our model selection, we take fairness considerations into account. In particular, we study the calibration of the models across different demographic groups for which fair treatment is important. Since ultimately we use the probability estimates, not the binary predictions, it is important to ensure that the models are well-calibrated across different demographic groups. We adopted the test-fairness notion [51]. We fit a model to predict the resource one receives, based on the predicted propensities and demographic features. In a model that is well-calibrated across demographic groups, the coefficients of the demographic attributes should not be statistically significant in the prediction. For the predicted values of the random forest model, none of the demographic attribute coefficients were found to be statistically significant. In addition, the model was calibrated within groups, with a coefficient near 1. Regression results are summarized in Table C.2. Hence, we chose the random forest as the model of the historical policy π_0.

Figure C.1: Probability of exiting homelessness across the NST score range estimated using the DR method.

Features       Model                    In-Sample Accuracy (%)   Out-of-Sample Accuracy (%)
NST Score      Multinomial Regression   72.5                     73.7
               Neural Network           76.4                     76.5
               Decision Tree            76.3                     76.2
               Random Forest            76.4                     76.3
All Features   Multinomial Regression   75.4                     73.5
               Neural Network           80.4                     77.2
               Decision Tree            79.2                     78.5
               Random Forest            99.7                     79.3

Table C.1: Prediction accuracy for propensity estimation using HMIS data.

Figure C.2: Reliability diagram of propensity estimation, RRH (top) and PSH (bottom).

PSH:
Coeffs.      Estimates   p-value
Intercept    -0.012      0.066
PSH pred.    0.985       <2e-16
Race = 2     0.011       0.204
Race = 3     0.002       0.803
Gender = 2   -0.006      0.469
Age = 2      0.006       0.485

RRH:
Coeffs.      Estimates   p-value
Intercept    -0.054      5.7e-05
RRH pred.    1.125       <2e-16
Race = 2     -0.007      0.586
Race = 3     -0.014      0.394
Gender = 2   0.000       0.987
Age = 2      -0.003      0.813

Table C.2: Propensity calibration within group for PSH (left) and RRH (right) of the random forest model. None of the coefficients of the demographic attributes are found to be significant. In addition, the coefficient associated with the predicted probability is close to 1 in both models, suggesting that the model is well-calibrated even when we control for the demographic attributes.

Outcome Estimation.
In the direct method, one estimates the (counterfactual) outcomes under the different resources by fitting the regression models $P(Y\,|\,X=x,R=r)$ for all $r\in\mathcal{R}$. For model selection, we followed the same procedure as for propensity score estimation. Table C.3 summarizes the accuracy of the different models for each type of resource.

Features            Model                 PSH    RRH    SO
NST                 Logistic Regression   83.1   78.8   90.0
                    Neural Network        83.9   78.9   90.0
                    Decision Tree         83.9   78.9   90.0
                    Random Forest         83.1   78.6   90.0
NST + Demographic   Logistic Regression   83.1   78.8   90.0
                    Neural Network        81.6   78.3   90.3
                    Decision Tree         83.9   78.8   90.0
                    Random Forest         83.9   78.1   90.0
All Features        Logistic Regression   81.9   82.2   90.3
                    Neural Network        83.9   78.8   86.8
                    Decision Tree         74.3   81.1   90.0
                    Random Forest         83.9   81.4   90.0

Table C.3: Out-of-sample accuracy (%) of different outcome estimation models (outcome definition in Figure 3.4).

Considering the reliability diagrams in Figure C.3, we observe that the logistic regression models are well-calibrated across the different resources. We also investigated the test-fairness of logistic regression, where we fit the observed outcome against the predicted outcome and demographic features. Results are summarized in Table C.4. As seen, the coefficients of the demographic features are not significant, suggesting that test-fairness is satisfied.

Figure C.3: Reliability diagram of outcome, SO (top), RRH (middle) and PSH (bottom).

PSH:
Coeffs.      Estimates   p-value
Intercept    0.147       0.489
PSH pred.    0.853       0.000
Race = 2     -0.021      0.666
Race = 3     -0.061      0.324
Gender = 2   0.003       0.954
Age = 2      0.079       0.202

RRH:
Coeffs.      Estimates   p-value
Intercept    -0.122      0.645
RRH pred.    1.172       0.000
Race = 2     0.028       0.386
Race = 3     0.025       0.504
Gender = 2   -0.021      0.433
Age = 2      0.003       0.931

SO:
Coeffs.      Estimates   p-value
Intercept    0.035       0.148
SO pred.     0.974       <2e-16
Race = 2     -0.000      0.973
Race = 3     0.023       0.226
Gender = 2   -0.008      0.618
Age = 2      -0.011      0.542

Table C.4: Outcome calibration of the logistic regression model within group under PSH, RRH and SO. None of the coefficients of the demographic attributes are found to be significant. In addition, the coefficient associated with the predicted probability is close to 1 in all models, suggesting that the model is well-calibrated even when we control for the demographic attributes.

Optimal Matching Topology for Fairness over Age. Figure C.4 depicts the policies when fairness over age is imposed. According to this figure, youth below 17 years are eligible for PSH across all score values, while only mid- and high-scoring youth over 17 years old are eligible for PSH. We further imposed constraints to ensure that, within each score group, the connections are the same for the different age groups. Figure C.5 illustrates the resulting matching topology, according to which individuals who score above 7 are eligible for RRH and PSH, regardless of their age. Those who score 6 are eligible
Figure C.4: The matching topology split by resource type: left (SO), middle (RRH) and right (PSH). The solid line indicates that the resource is connected to the entire queue. The dotted line indicates a connection to a sub-group within the queue, e.g., SO is only connected to the individuals with NST = 6 and age > 17.

Figure C.5: Fair topology (age) for all three resource types.

Finally, all youth with a score below 6 are only eligible for SO. We observe that all individuals who belong to a certain queue, regardless of their age, are eligible for the same types of resources. As a result of combining the queues that depended on age, the worst-case policy value across the age groups decreased from 0.74 to 0.69, which still outperforms the SQ (data) with a worst-case performance of 0.64.
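The comparison in the last sentence reduces to a worst-case (maximin) evaluation across age groups. In the sketch below, only the worst-case numbers (0.69 for the fair topology, 0.64 for the status quo) come from the text; the per-group splits are invented for illustration:

```python
def worst_case_value(group_values):
    """Worst-case (minimum) policy value across protected groups."""
    return min(group_values.values())

# Illustrative per-group values; only the minima (0.69, 0.64) are from
# the text, the group-level breakdown is hypothetical.
fair_topology = {"age<=17": 0.71, "age>17": 0.69}
status_quo = {"age<=17": 0.64, "age>17": 0.70}
```

The fair, age-independent topology loses some value relative to the unconstrained policy (0.74 → 0.69) but still dominates the status quo in the worst case.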