FAIRNESS IN MACHINE LEARNING APPLIED TO CHILD WELFARE
By
Eunhye Ahn, MSW
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(SOCIAL WORK)
May 2022
Acknowledgements
I would like to first acknowledge and thank the First 5 Orange County and the USC
Graduate School for generously funding my dissertation. First 5 Orange County’s dedication to improving the health outcomes of children inspired this research, making them an essential part of my dissertation.
I am deeply indebted to all five members of my committee. This work would not have
been possible without the invaluable advice, unwavering support, and incredible patience of my
mentors, Jacquelyn McCroskey and Emily Putnam-Hornstein. Their immense knowledge and
plentiful experience have encouraged me throughout my academic research and daily life. I
would also like to extend my deepest gratitude to Jungeun Olivia Lee for her treasured guidance
in teaching and research. I am also grateful to Fred Morstatter and Rebecca Rebbe for their
mentorship throughout my dissertation.
This dissertation would not have been possible were it not for the tremendous
contributions of child welfare social workers who shared their experiences and invaluable
insights. I would also like to express my appreciation to Debra Waters-Roman for her insightful
suggestions for my research.
The Children’s Data Network was a critical part of my dissertation. Without the CDN’s
tremendous collaborative arrangements and infrastructure, my work would not have been possible. I am deeply appreciative of what Emily and Jacquelyn have built. They are pioneers in linking and analyzing administrative data, and their reputation as respected and trusted researchers throughout the state, and nationally, is beyond compare.
I would also like to extend my appreciation to each member of the Children’s Data
Network for their encouragement and support. A special thank you to John Prindle for
constructive advice and mentorship since we first met at a small café in Melbourne and to
Regan Foust for her unparalleled encouragement and support. I would also like to thank Lillie
Guo for her efforts to code qualitative data.
I wish to thank several individuals who have informed my research and my perspective
during my doctoral training. These individuals include: Julie Cederbaum, Yolanda Gil, Michael
Hurlburt, and Shinyi Wu. Yolanda Gil, in particular, has shaped my understanding of human-
centered data science.
I am deeply appreciative of my student colleagues and friends, Amy, Jessie, Monique,
Sara, and Yoewon, for sharing this journey with me.
I am humbled by the support my friends and family have given me over these last years.
A special thank you to Sungeun and Tanya for believing in me and to Siddharth and Idli for
holding my hand to the finish line. I must also acknowledge my mother, who has been amazingly
loving, caring, and supportive throughout my journey. I also wish to thank my father, brother,
and Ruby for their unwavering love.
Table of Contents
Acknowledgements
List of Tables
List of Figures
Abstract
Chapter 1: Introduction and Rationale
    Background
Chapter 2: (Study 1) Qualitative Exploration of Child Welfare Workers’ Decision-Making Experiences and Perspectives on Fairness
    Introduction
    Methods
    Results
    Discussion
Chapter 3: (Study 2) Application of Machine Learning to Children and Family Services Considering Fairness
    Introduction
    Methods
    Results
    Discussion
Chapter 4: (Study 3) Intersectional Fairness in Machine Learning Applied to Children and Family Services
    Introduction
    Methods
    Results
    Discussion
Chapter 5: Implications and Future Directions
    Major Findings
    Implications for Policy and Practice
    Implications for Research
    Conclusion
References
List of Tables
Table 1.1: Confusion Matrix: A Summary of Prediction Results
Table 2.1: Focus Group Questions
Table 2.2: Themes Emerged in Focus Groups
Table 3.1: Machine Learning Predicting Features and Bridges Risk Score Factors
Table 3.2: Maternal and Birth Characteristics and CPS Outcomes of Children
Table 3.3: Number of Births, Bedside Screening, and Substantiation by Hospital
Table 3.4: Recall Scores at k%
Table 3.5: Recall Scores at Screen-in% by Hospital
Table 4.1: Descriptive Summary of Mothers who Gave Birth between 2011-2016
Table 4.2: False-negative Rates between the Baseline Model and Machine Learning Model
Table 4.3: Multivariate Analysis of False-Negatives among Children with Substantiation
Table 4.4: Descriptive Summary of Children Born to Hispanic Mothers
List of Figures
Figure 3.1: Bridges Maternal Child Health Network Service Flow
Figure 3.2: Distributions of Predicted Risk Scores
Figure 3.3: Recall and Precision Scores at k%
Figure 3.4: Recall Scores of Baseline and LightGBM models at k% by Hospital
Figure 4.1: Predicted Risk Scores among Children Born to Hispanic Mothers
Figure 4.2: False-Negative Rates among Children Born to Hispanic and White Mothers
Abstract
Assessing and reducing the risk of child maltreatment has been a primary concern for
child welfare workers and agencies. As supporting children and families in need involves a wide
range of decisions to be made, the use of machine learning to inform decision-making has
received increasing attention. Yet, much remains to be explored in the area of applying machine
learning to child welfare with a focus on fairness. The objective of this dissertation is to generate
knowledge that will guide the fair and ethical use of machine learning to inform decision-making
in child welfare.
Study 1: Qualitative Exploration of Child Welfare Workers’ Decision-Making Experiences
To ensure machine learning serves the needs of human users, machine learning models
should be built and deployed based on users’ experiences. While the characteristics of decision-
making in the child welfare systems have been well established in the literature, less is known
about the experiences of child welfare workers. To understand and incorporate the perspectives
of child welfare workers into machine learning applications, this study used focus groups to learn
about their decision-making experiences and perceptions of fairness. From the thematic analysis,
several findings emerged offering practical implications for applying machine learning to child
welfare considering fairness. Machine learning models should be built and deployed considering
the complex, nonlinear decision-making process in child welfare that can be influenced by
narrative-based, nuanced information. It is also essential to operationalize the core values of
child welfare and incorporate them into decision-making informed by machine learning while
carefully considering related liability and accountability issues.
Study 2: Building a Machine Learning Model Examining Potential Biases
A primary concern about algorithmic intervention has been the potential of inadvertently
perpetuating racial, socioeconomic, and other biases in our society by making predictions based
on data laden with human prejudice. These biases may be further complicated by the unique
nature of child welfare, such as clients’ vulnerability, racial disproportionality, and the core
values that may conflict. To advance the understanding of using machine learning fairly and
equitably, this study adopted a real-world use case to provide an illustrative example of building
a machine learning model while examining fairness. First, this study identified potential biases that
may emerge in each step of machine learning modeling and addressed how they might be further
complicated in the context of child welfare. Then, in response to a question posed by the First 5
Orange County’s Bridges Maternal Child Health Network (MCHN) program, the study
developed a machine learning model that assesses the need for home visiting services among
families with a newborn by using a CPS outcome as a proxy for familial circumstances. The
findings suggested that the machine learning models built in this study could significantly
improve the assessment process by identifying more children and families in need. This points to
opportunities to use machine learning to enhance equitable access to current home visiting
services.
Study 3: Evaluating a Machine Learning Model Considering Intersectional Fairness
As machine learning has received increasing attention for its potential to inform decision-
making in child welfare, a growing body of literature has discussed various aspects of its ethics.
Yet, limited empirical work has been done on evaluating fairness in machine learning applied to
child welfare. Moreover, machine learning fairness has been examined considering a single
attribute, race/ethnicity, even though various attributes may have intertwined impacts on children and
families. To illustrate the process of examining the fairness of a machine learning model using a
real-world use case, this study conducted a fairness analysis on the machine learning model
developed in Study 2. Firstly, this study identified a relevant fairness measure considering the
service contexts and goals of the Bridges MCHN program. Using the measure, this study
examined whether the machine learning model treats children differently depending on their
maternal and birth characteristics. Then, the study drew upon the idea of intersectionality and
tested whether the intersectionality of maternal race/ethnicity and nativity is associated with
model performance. Compared to the Bridges pre-screening tool, the machine learning model
could successfully identify more children who would experience substantiated maltreatment overall, and particularly those born to mothers who were Black, under age 20, and without
paternity established at birth. However, the model was found to be less effective in assessing the
risk for children born to foreign-born Hispanic mothers.
CHAPTER 1: Introduction and Rationale
Every year, over seven million children in the United States are reported to the child
protection system for alleged abuse and neglect (U.S. Department of Health & Human Services,
2021). Assessing risk to those children’s safety and reducing the risk of child maltreatment has
been a primary concern in child welfare and other child and family-serving systems (Berrick,
2018). To understand and respond to the needs of those children and families who have or may
come to the attention of the system, child welfare workers and agencies make several difficult
decisions daily, from assessing familial circumstances to planning for children’s futures
(Benbenishty & Fluke, 2020). Complex needs of children and families, varying contexts and
constraints of each child welfare agency, and ambiguous and often inadequate information
available in the field make decision-making challenging for child welfare professionals who are
already overburdened (Benbenishty & Fluke, 2020; Berrick, 2018).
Recent advances in data technologies have shown the potential for using machine
learning approaches to support child welfare workers with their decision-making (Coulton et al.,
2015). In child welfare, machine learning has been primarily adopted as predictive analytics,
shedding light on the early identification of children and families in need (Allegheny County
Department of Human Services, 2019; Schwartz et al., 2017a). It has been used to assist child
welfare professionals with assessing the risk of child welfare outcomes, such as foster care
placement (Chouldechova, 2017), maltreatment re-reports (Schwartz et al., 2017a; Shroff, 2017),
and adverse birth outcomes (Pan et al., 2017).
Despite the insights machine learning has offered to both researchers and professionals in
the child welfare domain, its ethics are still in question. A primary concern has been the potential
of inadvertently perpetuating racial, socioeconomic, and other biases present in our society by
making predictions based on data laden with human prejudice (Barocas & Selbst, 2016a). In
addition, the unique nature of child welfare—clients’ vulnerability, racial disparity, stigma, and
conflicting interests (i.e., protecting a child’s rights by restricting a parent’s rights)—further
complicates understanding the ethical application of machine learning (Coulton et al., 2015;
Lanier et al., 2019).
As part of a broader effort to understand and evaluate algorithmic bias and fairness, a
burgeoning body of machine learning literature has proposed various definitions of fairness using
statistical approaches (Barocas & Selbst, 2016a; Binns, 2018; Chouldechova & Roth, 2018); at a
glance, those fairness notions and metrics seem to be easily plugged into machine learning
applied in child welfare. However, the ideas of fairness proposed by computer scientists have
been criticized because they often assume a simple, closed system, overlooking the direct and
indirect impacts of algorithmic interventions on individuals (Ahmad et al., 2020; McCradden et
al., 2020). Furthermore, little empirical work has been done applying fairness metrics to real-
world case scenarios (Saleiro et al., 2019). There is much to be explored about the practical
limitations and implications of various fairness notions proposed by the machine learning
community considering the domain-specific factors of child welfare, including core values,
workflow, policy, and culture.
This dissertation seeks to advance our understanding of using machine learning to inform
decision-making in child welfare in a fair and equitable manner. The primary objective is to
examine machine learning approaches and related fairness issues, considering the unique
characteristics of child welfare. Firstly, this dissertation qualitatively analyzed decision-making
experiences among child welfare workers and their perspectives on fairness. Then, it used a real-
world child welfare problem to provide an illustrative example of developing a machine learning
model considering potential algorithmic biases that may emerge in each element of machine
learning. Lastly, using the machine learning model developed in the second study, this research
demonstrated each step of examining machine learning fairness considering intersectionality and
discussed practical implications.
Background
Child Welfare Service
Child welfare services respond to a variety of children’s and families’ needs, ensuring that
all children live in safe, permanent, and stable environments that support their well-being
(Adoption and Safe Families Act of 1997, P.L. 105-89). Through an array of public and private
services, encompassing prevention, intervention, and treatment, child welfare services interact
with an entire family or a child directly (National Association of Social Workers, 2013). In the
United States, the focus of the child welfare system has been on child protection; there exists a
relatively high threshold for intervention in a family and a more limited mix of services offered
to the family (Berrick et al., 2015). The primary concern has been assessing risk to a child’s
safety from the family or caregivers and providing risk-averse practice reactively following a
report of child maltreatment allegations (Lonne et al., 2008a; Pecora et al., 2018). Therefore, the
notions of promoting safety, permanency, and well-being of children may have been more
aspirational than real, because the long-standing mission of the United States’ child welfare
services has been serving the needs of children, specifically, those who come to the attention of
the child protection systems (Berrick, 2018).
Since the enactment of the Family First Prevention Services Act (FFPSA) as part of
Public Law (P.L. 115-123) in 2018, the child welfare systems have shifted their efforts from
child protection and foster care services to more prevention-focused child and family well-being
services. The Title IV-E Prevention Program, established by the FFPSA, gives states the option
to use child welfare programming funds for evidence-based prevention services, such as mental
health care, substance abuse prevention and treatment, and in-home parent skills training. Given
this change, this study uses the term “child welfare” to refer to child and family welfare services.
Using Data to Inform Decision-Making in Child Welfare
While using data to inform decision-making in child welfare has been a matter of
continuing concern, approaches and methods have changed over the decades (Benbenishty &
Fluke, 2020). Actuarial assessment, a statistical approach that uses specific and measurable
variables from validated data, has been a primary approach to inform decision-making in child
welfare. Studies have shown that actuarial assessment can be helpful when used properly in
conjunction with clinical judgment (Cuccaro-Alamin et al., 2017; Dawes et al., 1989; Shlonsky
& Wagner, 2005). One of the most widely adopted actuarial tools in child welfare is Structured
Decision-Making (SDM), a case management system designed to assist child welfare workers in
making decisions about maltreatment risk (Child Welfare Information Gateway, n.d.-a;
Johnson, 2004).
Along with advances in digital technology and the size and richness of data in the public
sector, the potential for using algorithmic tools, such as machine learning, has increased in the
child welfare arena (Cuccaro-Alamin et al., 2017; Drake & Jonson-Reid, 2019). Similar to
actuarial tools, machine learning finds relationships between predictors and an outcome using
algorithmic approaches to improve the accuracy in a range of decision-making, including
prioritizing, classifying, associating, and filtering (Hastie et al., 2009). Over the past decade,
machine learning has been actively explored and used in various fields—from optimizing medical scoring systems (Ustun & Rudin, 2016) to predicting criminal justice recidivism
(Berk & Bleich, 2013).
In child welfare, machine learning has been primarily adopted as predictive analytics,
offering new insights into service areas where early identification of children and families in
need is of particular interest. Based on existing data, machine learning has been utilized to assess
future risk of adverse events, including foster care placement (Chouldechova et al., 2018),
maltreatment re-reports (Schwartz et al., 2017b; Shroff, 2017), substantiation of maltreatment
(Vaithianathan et al., 2013), hospital injury encounters (Vaithianathan et al., 2020), exiting foster
care without permanency (Elgin, 2018; Ahn et al., 2021), experiencing chronic homelessness
among transitional age youth (Chan et al., 2018), and adverse birth outcomes (Pan et al., 2017).
Machine learning has also been used to detect patterns in natural language (i.e., text data); Perron
et al. (2019) identified substance-related problems in maltreatment investigation narratives, and
Amrit et al. (2017) located child abuse in children’s health records.¹
Ethics and Fairness in Machine Learning Applied to Child Welfare
Despite the insights machine learning may offer to the child welfare domain, the ethics of
its use are still in question.² Studies have indicated that ethical concerns can arise in any element
of machine learning, from data collection to model deployment (Barocas & Selbst, 2016a; Dare,
2015; Mehrabi et al., 2019; Suresh & Guttag, 2020), and that they may become more nuanced
and complicated because of the unique nature of child welfare (e.g., client vulnerability, racial
and ethnic disparity, stigmatization, and conflicting interests) (Coulton et al., 2015; Dare, 2015).
Below are a few primary areas where ethical issues may emerge in machine learning applied to
child welfare.
¹ More studies that apply a computational approach to explore issues in the child welfare systems are introduced in the literature review conducted by Saxena and colleagues (2020).
² A broad range of ethical issues that may emerge in the application of predictive modeling to child welfare has been well established in peer-reviewed papers (Brown et al., 2019; Church & Fairchild, 2017; de Haan & Connolly, 2014; Drake et al., 2020; Gillingham, 2016; Glaberson, 2019; Keddell, 2015, 2019; Krakouer et al., 2021; Lanier et al., 2019; Russell, 2015), program evaluation reports (Dare, 2015; Dare & Gambrill, 2017), gray papers (Capatosto, 2017; Chadwick Center and Chapin Hall, 2018; Drake & Jonson-Reid, 2019; Roberts et al., 2018), and a book chapter (Dare, 2015). These ethical issues include the effectiveness of predictive modeling, algorithmic accountability, transparency and interpretability, data privacy, and algorithmic interventions' direct and indirect impacts on children and families.
Data Bias
A primary concern has been the potential that machine learning models might
discriminate against minority groups based on individual attributes, such as race and ethnicity,
by encoding human bias from real-world data (Capatosto, 2017; Eubanks, 2019; Hoffmann,
2019; O’Neil, 2016). Given long-standing, pronounced racial and socioeconomic
disproportionality in the population of children and families engaged with the child welfare
system (Dettlaff & Boyd, 2020; Fluke et al., 2010; Kim & Drake, 2018), it seems inevitable that
algorithmic models fitted to existing child welfare data would also suffer from those biases. Another
primary source of data bias is statistical bias, which is a systematic mismatch between the
collected data and the world as it is (Mitchell et al., 2021). Various reasons contribute to
statistical bias, including sampling bias and measurement error (Mehrabi et al., 2019).
Modeling Bias
The tendency of algorithmic models to prioritize larger groups over smaller groups and their inability to incorporate unknown counterfactual outcomes into modeling have surfaced as primary threats to machine learning fairness (Suresh & Guttag, 2020). These machine learning attributes are likely to disproportionately affect members of smaller racial or ethnic groups, and thus deepen the disparity among children and families (Capatosto, 2017).
Prediction Error
Some degree of prediction error is likely in all cases because any decision-making system predicting a future outcome will produce false determinations.
Relative ethical tolerance for false negatives and false positives depends on the context and the
consequences of either false determination (Drake et al., 2020). In the child welfare system,
consequences for the children and family of false determinations can be much more significant
later in the process when decisions are about court involvement, foster care placement, or
possible termination of parental rights (Dare, 2015).
Stigmatization
Unlike most health care contexts, where being assessed as at-risk carries relatively little stigma, being identified as at-risk in child welfare may have far-reaching consequences for
children and families, especially when they are falsely identified (Vaithianathan et al., 2012).
Stigmatization may place additional pressure on already struggling families by increasing the
level of surveillance and affecting the way service providers engage with them (Dare, 2015).
Machine Learning Bias and Fairness
To apply machine learning to child welfare in a fair and equitable manner, understanding
the definitions of machine learning fairness can be helpful. The literature on machine learning
fairness has evolved around measuring and mitigating biases in machine learning models,
particularly those applied to prediction-based decision-making (Barocas & Selbst, 2016a; Binns,
2018; Hardt et al., 2016). While social conceptualizations of fairness are much broader, relating
to social justice and the right to due process, machine learning fairness focuses on whether a
model performs differently depending on individual attributes, such as race, gender, or class
(Mitchell et al., 2021).
Setup and Notation
Fairness definitions in this study focus on the context of making binary decisions. D is
defined as a decision made by a decision-making model, and Y is defined as an observed
outcome. P( ) indicates a function that computes the probability of the notation within the
parentheses. The vertical bar ( | ) notates conditioning on a given subset. To display the state of
an outcome of interest, 0 and 1 are used (e.g., 1 = high risk and 0 = no risk if an outcome of
interest is a maltreatment referral). For example, P(Y=1|D=1) indicates the probability of
observing a positive outcome (Y=1), given that the decision made by a model is positive (D=1).
For an example of sensitive attributes, race and ethnicity are used and notated by R.
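To make the notation concrete, the following minimal sketch (not from the original text; the arrays and variable names are hypothetical) shows how these conditional probabilities can be estimated from observed outcomes and model decisions.

```python
import numpy as np

# Hypothetical observed outcomes (Y) and model decisions (D), coded 0/1.
Y = np.array([1, 0, 1, 1, 0, 0, 1, 0])
D = np.array([1, 0, 0, 1, 1, 0, 1, 0])

# P(Y=1 | D=1): among cases the model flagged as positive,
# the proportion observed to have the positive outcome.
ppv = (Y[D == 1] == 1).mean()

# P(D=1 | Y=1): among cases observed to be positive,
# the proportion the model flagged as positive (true positive rate).
tpr = (D[Y == 1] == 1).mean()

print(f"P(Y=1|D=1) = {ppv:.2f}, P(D=1|Y=1) = {tpr:.2f}")
```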
Machine Learning Fairness Definitions
Despite several definitions of machine learning fairness (Mehrabi et al., 2019), there is no
agreed single definition of fairness among scholars, because fairness is highly subjective and
context-dependent (Binns, 2018; Verma & Rubin, 2018). In this study, selected definitions of
fairness that have gained significant prominence are introduced, guided by the taxonomy used in
the work of Mitchell and colleagues (2021).
Table 1.1: Confusion Matrix: A Summary of Prediction Results

Decision = 1, Outcome = 1 (true positive)
• Positive predictive value: P(Y=1|D=1), the probability of the outcome = 1 given decision = 1
• True positive rate: P(D=1|Y=1), the probability of the decision = 1 given outcome = 1
Decision = 1, Outcome = 0 (false positive)
• False discovery rate: P(Y=0|D=1), the probability of the outcome = 0 given decision = 1
• False positive rate: P(D=1|Y=0), the probability of the decision = 1 given outcome = 0
Decision = 0, Outcome = 1 (false negative)
• False omission rate: P(Y=1|D=0), the probability of the outcome = 1 given decision = 0
• False negative rate: P(D=0|Y=1), the probability of the decision = 0 given outcome = 1
Decision = 0, Outcome = 0 (true negative)
• Negative predictive value: P(Y=0|D=0), the probability of the outcome = 0 given decision = 0
• True negative rate: P(D=0|Y=0), the probability of the decision = 0 given outcome = 0
Overall
• Accuracy: P(D=Y), the probability that outcomes and decisions match

Note. The confusion matrix is adopted from the work of Mitchell et al. (2021) and revised. A positive outcome is indicated by the number 1, and a negative outcome is indicated by the number 0. The outcome indicates an observed outcome that happened in the real world, whereas the decision indicates a prediction based on a predictive model. The table illustrates match and mismatch between outcome and decision, with margins expressing conditioning on decision and outcome. Within each cell, the first measure conditions on the decision (given decision = 1/0) and the second conditions on the outcome (given outcome = 1/0).

Equal prediction measures. Some of the most widely discussed fairness notions focus on
the parity of fairness metrics between different membership groups. These notations are used for
decision-making tools that use a single risk score threshold to arrive at the ultimate decision—that is, when using a threshold of 0.5, all individuals with a predicted risk score above 0.5 are classified as having the outcome of interest (D=1). When false positives and false negatives have equal costs, we can require comparable accuracy across membership groups. For example, P(D=Y|R=White) = P(D=Y|R=Black) indicates that the prediction accuracy (i.e., P(D=Y), the probability of decisions matching observed outcomes) is equal between White and Black groups.
Conditional on outcome. Fairness definitions conditioning upon outcomes reflect a
notion that people with the same outcomes should be treated the same, regardless of their group
membership (Mitchell et al., 2021). These are also called error rate balance (Chouldechova,
2017) or equalized odds (Hardt et al., 2016).
Conditioning on the outcome Y = 0, we focus on a subset of the population observed to
have an outcome defined by 0 (e.g., no risk). Equality of false-positive rate P(D=1|Y=0) requires
that an equal proportion of people be falsely predicted to have an outcome defined by 1 (e.g., high risk), and equality of true negative rate P(D=0|Y=0) requires that membership groups have an equal proportion of people correctly predicted to have an outcome defined by 0.
When conditioned on the outcome Y=1, we focus on the subset of the population
observed to have an outcome defined by 1 (e.g., high risk). Equality of true positive rate
P(D=1|Y=1) requires equal rates of correctly assessed individuals across membership groups.
Equality of false-negative rate P(D=0|Y=1) requires membership groups to have similar rates of misclassifying individuals as having an outcome of 0, even though they were observed
to have an outcome of 1. This notation is equivalent to equality of opportunity (Hardt et al.,
2016).
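A minimal sketch of these outcome-conditioned checks (equalized odds), again using hypothetical data, computes the true positive, false positive, and false negative rates separately by group:

```python
import numpy as np

def outcome_conditioned_rates(D, Y, R):
    """Return TPR P(D=1|Y=1), FPR P(D=1|Y=0), and FNR P(D=0|Y=1) for each group in R."""
    rates = {}
    for group in np.unique(R):
        d, y = D[R == group], Y[R == group]
        tpr = (d[y == 1] == 1).mean()  # equality of opportunity compares this across groups
        fpr = (d[y == 0] == 1).mean()  # equalized odds also compares this
        rates[group] = {"TPR": tpr, "FPR": fpr, "FNR": 1 - tpr}
    return rates

# Hypothetical decisions, outcomes, and group labels.
D = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
Y = np.array([1, 0, 0, 1, 1, 1, 0, 0, 1, 1])
R = np.array(["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"])
print(outcome_conditioned_rates(D, Y, R))
```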
Conditional on decision. Fairness notions conditioning on decision reflect the decision-
maker's viewpoint, rather than individual’s actual outcomes (Mitchell et al., 2021). Given that an
individual’s actual outcomes are not known at the decision, Field Dieterich and colleagues
(2016) argued that fairness notions conditioning decisions are more appropriate than that
conditioning on outcomes. These are also called predictive parity or outcome test
(Chouldechova, 2017; Verma & Rubin, 2018).
Both equality of false omission rate P(Y=1|D=0) and equality of negative predictive
value P(Y=0|D=0) focus on those who receive a decision of 0. The former requires that different membership groups have the same proportion of people who were observed to have the outcome (Y=1). Equality of negative predictive value, in turn, requires groups to have the same rates of decisions matching observed outcomes (Y=0 given D=0).
The other set of fairness notations conditioning on the decision focuses on individuals who receive a positive decision (D=1). Parity of positive predictive value P(Y=1|D=1) requires groups to have equal proportions of individuals observed to have a positive outcome. Equality of false discovery rate P(Y=0|D=1) requires membership groups to have similar misclassification error rates.
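Analogously, the decision-conditioned measures can be compared across groups; the sketch below uses hypothetical data to compute the positive predictive value and false omission rate by group (predictive parity):

```python
import numpy as np

# Hypothetical decisions, outcomes, and group labels.
D = np.array([1, 1, 0, 1, 0, 1, 1, 0, 0, 0])
Y = np.array([1, 0, 0, 1, 1, 1, 1, 0, 1, 0])
R = np.array(["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"])

for group in np.unique(R):
    d, y = D[R == group], Y[R == group]
    ppv = (y[d == 1] == 1).mean()             # P(Y=1 | D=1), compared for predictive parity
    false_omission = (y[d == 0] == 1).mean()  # P(Y=1 | D=0)
    print(f"Group {group}: PPV = {ppv:.2f}, false omission rate = {false_omission:.2f}")
```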
Equal decision measures. Unlike equal prediction measures, equal decision measures
focus only on decisions without considering observed outcomes. Equal decision measures may
be helpful when tradeoffs can be examined by looking at the decision alone (Mitchell et al.,
2021). Equal decision measures, such as demographic parity, require membership groups to
show equal decision rates (either 1 or 0) regardless of their observed outcomes (Kusner et al.,
2018). For example, when demographic parity is satisfied for home visiting service receipt,
Black and White mothers receive the service at the same rate regardless of their observed risk
outcomes. This approach can be motivated by a few reasons: 1) when one decision is always preferable to another regardless of an observed outcome; 2) when the outcomes being predicted may not be observed or are poorly measured, leaving error rates unknown; and 3) when the relationship between protected attributes and observed outcomes is considered to be unfair, even if the observed relationships accurately capture a real-world phenomenon (Mitchell et
al., 2021).
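Because equal decision measures ignore observed outcomes, a demographic-parity check only needs the decision rates themselves; the sketch below uses hypothetical home visiting decisions and maternal race labels:

```python
import numpy as np

# Hypothetical service decisions (1 = offered home visiting) and group labels.
D = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 1])
R = np.array(["Black", "Black", "Black", "Black", "Black",
              "White", "White", "White", "White", "White"])

# Demographic parity compares P(D=1) by group, ignoring observed outcomes.
for group in np.unique(R):
    rate = (D[R == group] == 1).mean()
    print(f"P(D=1 | R={group}) = {rate:.2f}")
```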
Fairness through Unawareness
Fairness through unawareness measures, or anti-classification measures (Corbett-Davies
& Goel, 2018), require that decisions be made independently of group membership
(Grgić-Hlača et al., 2016; Kusner et al., 2018). This can be achieved by not allowing a model to
directly access information about protected attributes. Given broad consensus in our society that
it is unfair to make a distinction based on sensitive attributes (e.g., race and ethnicity), excluding
any sensitive attributes as predicting features from modeling may seem reasonable to ensure
fairness. However, as non-sensitive attributes are often strongly linked to sensitive attributes
(e.g., race and ethnicity are associated with family income), seemingly neutral attributes can be
misused to produce discriminatory decisions (Barocas & Selbst, 2016a). Researchers have raised
concerns around simply removing sensitive attributes from algorithmic modeling, indicating that
sensitive attributes should be carefully used to observe and control any kind of discrimination
(Dwork et al., 2012; Ruf & Detyniecki, 2020).
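The proxy problem can be illustrated with a small, entirely hypothetical simulation: even when the sensitive attribute is dropped, a decision rule that thresholds a correlated "neutral" feature still produces different decision rates by group.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical sensitive attribute (1 = group A, 0 = group B).
group_a = rng.integers(0, 2, size=n)

# A seemingly neutral feature (e.g., a neighborhood income index) that is
# correlated with group membership.
income_index = rng.normal(loc=np.where(group_a == 1, -0.5, 0.5), scale=1.0)

# "Unaware" decision rule: uses only the neutral feature, never the group label.
decision = (income_index < 0.0).astype(int)

for label, mask in [("A", group_a == 1), ("B", group_a == 0)]:
    print(f"P(D=1 | group {label}) = {decision[mask].mean():.2f}")
```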
Impossibility Theorem
It has been shown that it is not possible to simultaneously satisfy three fairness
conditions: demographic parity, equalized odds, and predictive parity (Chouldechova, 2017;
Kleinberg et al., 2016). This indicates that some of the fairness notions are incompatible with
each other, and thus, trade-offs between them should be carefully considered when using those
fairness metrics.
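One way to see the tension, using a standard identity rather than anything from the original text, is that Bayes' rule ties the positive predictive value to the true positive rate, the false positive rate, and a group's prevalence p = P(Y=1):

```latex
\mathrm{PPV} = P(Y=1 \mid D=1)
            = \frac{\mathrm{TPR} \cdot p}{\mathrm{TPR} \cdot p + \mathrm{FPR} \cdot (1 - p)},
\qquad p = P(Y=1).
```

If two groups differ in prevalence p but are required to share the same true positive and false positive rates (equalized odds), their positive predictive values generally cannot also be equal, so predictive parity fails except in special cases such as equal base rates or a perfect predictor.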
Fairness Definitions beyond Machine Learning
The fact that much of fairness literature has been established in computer science and
statistics may lead to the mistaken impression that bias and fairness issues only arise when using
automated decision-making (Mitchell et al., 2021). However, these issues also apply to human
decision-making, and the trade-offs between different notions of fairness would apply regardless
of whether a human or a machine makes the prediction (Kleinberg et al., 2016). Recent advances
in data size and quality allow us to evaluate the fairness of human decision-making using the
various notions of machine learning fairness.
Examining Machine Learning Fairness in Child Welfare
Researchers in the field of fair machine learning have conceptualized fairness as a matter
of resource allocation (Donahue & Kleinberg, 2020; Lundgard, 2020). Rooted in theories of
equal opportunity, some machine learning fairness measures help us achieve an equal allocation of resources, arguably operationalizing theories of distributive justice (Hardt et al., 2016;
Hashimoto et al., 2018). For example, parity of true positive rates attempts to ensure equal
opportunities across membership groups by requiring that the groups have the same proportions
of people accurately predicted among individuals who have an outcome of interest. Considering
fairness as equal allocation of computational artifacts, such as true positive rates, is based on the
assumption that a resource is a single-valued property that can be compared and unambiguously
ranked (Lundgard, 2020).
The prevailing view on fairness in the machine learning community should be carefully
examined when those definitions and metrics are applied to child welfare. As in machine
learning, social justice in social work has also been established upon the notion of just
distribution of social goods. However, social work takes one step further and pursues “a society
in which people, individually and in community, can live decent lives and realize their full
human potential” (Reisch, 2002). In other words, the distribution of social goods in social work,
or child welfare, should promote social justice through the realization of full human potential
among individuals whose social and economic conditions are heterogeneous. Since individuals’
social and political conditions can heavily influence their actual attainment of desired outcomes,
simply ensuring equal allocation of resources may not be enough (Lundgard, 2020; Mehrabi et
al., 2020).
CHAPTER 2: (Study 1) Qualitative Exploration of Child Welfare Workers’ Decision-
Making Experiences and Perspectives on Fairness
Introduction
Previous literature has shown the potential of using machine learning to inform decision-
making in child welfare and examined some aspects of machine learning fairness (Ahn et al.,
2021; Chouldechova et al., 2018; Pan et al., 2017). Yet, relatively scant attention has been paid
to how the unique characteristics of child welfare can be incorporated into building and
evaluating machine learning. Careful consideration of child welfare-specific characteristics, such as values, workflows, cultures, and policies, is essential for aligning machine learning initiatives with the needs of child welfare workers (Drake et al., 2020; Roberts et al., 2018). While the
unique characteristics of decision-making in child welfare have been well established in the
literature, less is known about the experiences of child welfare workers. As part of a broader
effort to leverage machine learning to inform decision-making in child welfare in a fair and
equitable manner, this study qualitatively explored the decision-making experiences among child
welfare workers and their perspectives on fairness. This can help in understanding the practical implications and related fairness issues of applying machine learning to child welfare.
Machine Learning Applied to Child Welfare
Using data to inform decision-making in child welfare has been a continuing concern
(English et al., 2000; Johnson, 2004; Schoech et al., 2000). With the recent advances in digital
technologies and recognition of administration data in the public sector, increasing attention has
been given to leveraging data science approaches, such as machine learning, to inform decision-
making in child welfare (Cuccaro-Alamin et al., 2017). Machine learning has been primarily
adopted as predictive analytics, offering new insights into service areas where early
identification of children and families in need is of particular interest (Ahn et al., 2021;
Chouldechova et al., 2018; Schwartz et al., 2017a; Shroff, 2017; Vaithianathan et al., 2020).
While these studies have shown how machine learning can improve prediction accuracy for a
given problem, little has been discussed about how machine learning can be applied and evaluated
considering the unique characteristics of decision-making in child welfare. Further exploration is
needed to use machine learning to better assist child welfare professionals with their decision-
making within the existing workflow considering domain-specific conditions, including policy
and organizational structure.
Decision-Making in Child Welfare
Although the Family First Prevention Services Act, enacted in 2018 (P.L. 115-123), has redirected the United States’ child welfare system toward prevention-oriented services, the primary
focus is still on child protection (Berrick et al., 2015). Annually, child welfare agencies across
the country receive approximately 4 million referrals involving around 7 million children (U.S.
Department of Health & Human Services, 2021). As the backbone of the child welfare system,
child welfare workers respond to the needs of vulnerable children and troubled families who
come to the attention of the child protection system (CPS). On a daily basis, they make several
difficult and consequential decisions, from assessing familial circumstances to planning for
children’s futures (Benbenishty & Fluke, 2020; Berrick, 2018). As child welfare workers are key
decision-makers in child welfare, understanding their experiences, responsibilities, and
qualifications is essential when discussing the unique characteristics of decision-making in child
welfare.
Child Welfare Worker’s Responsibilities
The CPS prioritizes risk-averse practices based on the assessment of safety and risk to
children by family or caregivers (Berrick et al., 2015; Lonne et al., 2008b). Thus, the core
responsibilities of child welfare workers in responding to child maltreatment reports include:
assessing the risk of children who are reported to the hotline; removing children from harmful
home settings; working on family reunification in collaboration with a variety of stakeholders;
providing and connecting families to services, support, and interventions; and coordinating short-
and long-term care for children who cannot reunify with their family (National Association of
Social Workers, 2013).
Depending on their responsibilities, child welfare workers can be divided into emergency
response (ER) social workers and continuing services (CS) social workers. ER social workers are
first responders who conduct initial investigations or assessments on suspected child
maltreatment (Cage & Salus, 2010). If maltreatment is substantiated, a case is opened. Then, ER
social workers contact the CPS and Child Dependency Court to have the child removed while
documenting the evidence of maltreatment for the courts and CS social workers (Louie, n.d.).
Once a child is placed in the care of the Child Dependency Court, CS social workers conduct an
in-depth investigation of the children and their families to develop a case plan, which includes
behavioral changes for the parents and ways to support the recommended changes. Throughout
the process, CS social workers work closely with the family and provide them with emotional
support and advice (Louie, n.d.).
Training And Education for Child Welfare Workers
Given the significance of decision-making in child welfare, ensuring child welfare
workers have adequate education and training has been a primary concern for child protective
services agencies. Child welfare workers build their careers upon solid academic preparation,
internships, and strong social and collegial support systems. Although they may vary by state,
requirements to become a child welfare worker can include completion of a bachelor’s or
master’s degree in social work from a Council on Social Work Education (CSWE)-accredited
institution, experience through a child welfare internship or related field placement, passage of the
appropriate licensing exams, and completion of continuing education required to retain licensure
(National Association of Social Workers, 2013). While child welfare workers can work in entry-
level positions with a bachelor’s degree, a master’s in social work (MSW) is often desired to
ensure the clinical skills needed to assess and serve families and to progress to additional
responsibilities (National Association of Social Workers, 2013).
The importance of cultivating skilled child welfare workers has also been addressed by
federally funded initiatives, such as the Title IV-E programs. Under Title IV-E of the Social
Security Act (P.L. 103-432), states are entitled to claim federal reimbursement for funds targeted
to delivering child welfare curriculum and training students committed to service in child welfare
(Child Welfare Information Gateway, n.d.-b). While the requirements for fulfillment may vary
across jurisdictions, students are required to finish a year-for-year work obligation period in
public child welfare after completing their master’s degree in social work (Morazes et al., 2010).
The programs aim to improve practice outcomes in child welfare by investing in selected
students dedicated to serving children.
Decision-Making Dilemmas
Although education and training may help child welfare workers feel more confident in
making decisions, Berrick (2018) argues that it may be impossible to always make “correct” decisions, even for well-trained child welfare workers. First, it is often infeasible to know
whether solutions for each unique family are “right” at the point of decision-making. Child
welfare workers make decisions about long-term outcomes for children in troubled families
whose problems are complex and pertinacious, such as poverty and substance abuse. Therefore,
working with caregivers to improve family life requires preparation, support, and insight beyond
the baseline needs for case management, service coordination, and case completion. Second, the
core values of child welfare (safety, permanency, and well-being) often collide. For example, child welfare workers have to decide whether to remove children from home for their safety,
knowing that it would significantly compromise their permanency. Balancing those core values
can be mentally and emotionally taxing because child welfare workers are required to pose and
respond to complex ethical questions that may strain their moral compass (Berrick, 2018).
Further complicating these decisions are several other domain-specific factors, including
engagement of other supportive family members, caregivers’ behavioral issues, and the families’
previous CPS involvement history. Moreover, varying constraints of each child welfare agency,
ambiguous information available in the field, limited time and resources, and conflicting interests
among stakeholders make decision-making more challenging for child welfare workers who are
already overburdened with a large number of cases (Benbenishty & Fluke, 2020; Berrick, 2018).
One study found that, compared to other social workers, child welfare workers experience higher
workloads, more significant role conflict, and depersonalization (Kim, 2011). Therefore, child
welfare workers should be supported thoughtfully in every aspect of decision-making to better
develop and utilize their clinical expertise and make decisions with more confidence.
Self-Defensive Practice
In addition to the burden of making decisions that have long-term, consequential impacts
on children and their families, child welfare workers also often endure media scrutiny and
sometimes brutal public criticism when their decisions are associated with injury to or the death
of a child (Etehad & Winton, 2017). The public and media’s censorious attitudes towards the
public child welfare systems contribute to defensive cultures within those systems (Chenot, 2011).
Defensive practice includes a range of behaviors aimed at protecting oneself against later being
held responsible or blamed, including overly emphasizing documentation, intervening more than
needed, and refraining from intervening when service is required (Harris, 1987; Whittaker &
Havard, 2016). To provide adequate support to child welfare workers, careful consideration of
their deeply rooted concerns, response cycles, and fear is required.
The Current Study
This study examines the decision-making experiences among child welfare workers and
their perceptions of fairness. Advancing the previous literature on decision-making in child
welfare, this study adopted a qualitative approach to directly incorporate the voice of child
welfare workers discussing their decision-making experiences and insights on fairness. The
objectives of this study were: 1) to identify the unique characteristics of decision-making in child
welfare, 2) to understand how child welfare workers perceive fairness in decision-making, and 3)
to discuss how the unique characteristics of decision-making in child welfare can be considered
when applying machine learning and operationalizing fairness in child welfare. This study was
determined to be exempt by the Institutional Review Board.
Methods
Participants
Public child welfare workers were recruited to participate in focus groups to share their
experiences in making decisions and perspectives on fairness in the CPS. Study participants were
required to have at least six months of working experience at the same child protection service
organization in California. The eligibility criteria were intended to ensure that participants had sufficient practical insight and experience to share and that they were bound by a similar organizational culture and policy, both of which influence decision-making (Fluke et al.,
2020a). Study participants were recruited by sending emails to social work alumni with a link to
an online application form. A follow-up email was sent to schedule focus groups. Guided by the
wage-payment model (Dickert & Grady, 1999), participants were compensated US$60 for their
time.
Seventeen child welfare workers who met the inclusion criteria expressed their interest in
participating in the study by submitting an online application. Of the seventeen potential
applicants, two did not respond to the follow-up email. Of the remaining fifteen respondents who were placed in a scheduled focus group, five did not attend the meetings. Although the
respondents were not asked to provide a reason for cancellation, one participant voluntarily
reported that they could not join the focus group due to unexpected fieldwork. Three focus
groups were held in June and July 2021 via Zoom, an online meeting platform, with ten participants in total: one focus group included four participants, and the other two included three participants each.
Demographic and professional information that was relevant to the study aims (i.e., race,
years of working in the child protection systems, and the area of expertise) was collected by self-
report questions during the recruiting process. Of the participants, 40% were Black, 40% were
Hispanic, and 20% were White. A majority (70%) of the participants had four or more years of
working experience in the child protection systems, and the rest had between one and three years of experience. Thirty percent of participants were more familiar with emergency response services, 40% were more familiar with continuing services, and the remaining 30% responded that they had experience in both areas.
Focus Group
The focus group approach is appropriate for investigating participants’ perceptions and
attitudes on a given topic as it elicits opinions as a result of group discussion rather than
individual retrospection (Kitzinger, 1995). Using a semi-structured, funnel interview design
(Morgan, 1997), a list of broad questions was developed for the moderator to facilitate
discussions. The participants were asked the following questions: 1) what kind of decisions do
you make in child welfare practice? 2) what does fairness mean to you when you make
decisions? 3) how do you feel about your organization's policies and procedures when you think
about the fairness of your decisions? 4) What would ideally fair decision-making look like in
public child welfare? These broad questions were followed by prompts and questions based on
the participants’ responses to elicit additional perspectives from the group. Each group lasted
approximately 100 minutes and was led by a moderator who is a clinician with 30 years of child
welfare work experience at the organization from which the participants were recruited. The
principal investigator (PI) attended the focus groups to observe and collect data. All participants
were informed that the discussions might not be perfectly confidential given the nature of the
focus group because their names, experiences, appearances, and voices would inevitably be shared among the focus group participants. It was also noted that the focus groups would be video-
recorded and that they could opt not to turn on their cameras. Each participant verbally agreed to
the conditions.
Analysis
All recordings were auto-transcribed by Zoom. Once the accuracy of each transcription
was reviewed by the PI, the video recordings were destroyed. The PI and a research assistant
independently analyzed each transcript and identified themes using a thematic analysis approach
(Krueger & Casey, 2009). The focus group questions (Table 2.1) were used as a framework to
develop the initial set of themes, upon which additional themes and codes were developed.
Codes were compared and emerging themes were discussed to reach a consensus. Codes were
then reviewed, reduced, and organized to establish patterns throughout focus groups. The
analysis was performed using NVivo 12 software.
Table 2.1: Focus Group Questions

Topic: Characteristics of public child welfare decision-making

Question: What kind of decisions do you make?
Follow-up questions:
• Can you share some examples?
• What makes the decision easy or difficult?
• Who are directly and indirectly involved with the decisions you make?
• Are decisions you make in the public child welfare systems different from decisions in other public agencies?
• Differences between decisions in frontend vs. backend?

Question: What are the unique characteristics of decisions made in public child welfare?
Follow-up questions:
• What are your thoughts about the decision-making process at the DCFS?
• How do the DCFS policies and protocols affect your decision-making?

Topic: Defining fairness in public child welfare decision-making

Question: What does fairness mean to you when you make decisions?
Follow-up questions:
• Have you thought about fairness when you were making decisions? What was your experience? Did you have a dilemma?
• Are there any value conflicts when you think about fairness in your decision making?

Question: How do you feel about the DCFS policies and procedures when you think about fairness of your decisions?
Follow-up questions:
• How do the DCFS policies and procedures affect your decision making in terms of fairness?
• Do the DCFS policies and procedures align with your definition of fairness?

Question: What would ideally fair decision-making look like in public child welfare?
Follow-up questions:
• What should be considered when we discuss fairness in public child welfare decision-making?
• What does fairness mean in decisions made in public child welfare?
Results
Several important themes relevant to the decision-making experiences of frontline child
welfare workers and their perspectives on fairness were identified. Themes fell into six groups: (1) responsibilities of child welfare workers; (2) characteristics of decision-making in child welfare; (3) factors that complicate decision-making; (4) perspectives on fairness; (5) threats to fairness; and (6) suggestions to improve fairness in decision-making (see Table 2.2).
Table 2.2: Themes That Emerged in the Focus Groups

Responsibilities of child welfare workers
• Responsibilities of emergency response social workers
• Responsibilities of continuing services social workers
• Interactions with stakeholders

Characteristics of decision-making in child protection
• Child safety as a key priority
• Having long-term impacts
• The involuntary nature of services
• Using the Structural Decision Making tool

Factors that complicate decision-making
• Self-defensive decision-making
• Lack of time and support for decision-making
• Limitations of policy

Perspectives on fairness
• Equitable outcome

Threats to fairness
• Human bias
• Systemic bias

Suggestions to improve fairness in decision-making
• Provide peer support and guidance through case consultation meetings
• Work with communities and incorporate their suggestions
• Acknowledge systemic racism and take trauma-informed approaches
• Lower workload
• Reform policy to better meet family needs

Responsibilities of Child Welfare Workers
Three themes emerged under the category of responsibilities of child welfare workers: (a) responsibilities of emergency response (ER) social workers, (b) responsibilities of continuing services (CS) social workers, and (c) interactions with stakeholders. The findings showed that
the types of decisions may vary between ER and CS. First, decision-making in ER starts once a referral is generated by the hotline and focuses on evaluating immediate child safety concerns. ER social workers proactively make their own risk assessments so that they can respond appropriately to children at risk. When they visit homes and conduct assessments, their primary concern is whether it is safe to leave the child at home.
“In ER we make initial decisions, right? So, I'm going out, gathering
information, talk[ing] to the parties that are available. And then at that point, I
decide [whether] this family is safe or not […] and if I think they aren’t, then
contact […] my supervisor, and then we make a decision. I introduce our
supervisor management and middle management into the decision.”
“In emergency response, the hotline determines when we have to go on the
referral, whether it be expedited immediate or five-day response. [However,
sometimes] even though it's labeled as a five day, […] I [decide] like, no, this
sounds like I need to go today. So, I pretty much make the decision to go and
respond. Once I [make visit] to a home, I make the decision whether I can
leave the home and I can leave the children there safely. Or I need to make a
safety plan with the family, or I need to detain the child on exigent
circumstances.”
Based on their risk assessment, ER social workers also coordinate a service plan for the family
and make custody decisions.
“We also make decisions regarding aftercare services that the family may
need. We decide to [refer] them [to] specific agencies that can support the
family […] [I]f we do come to a point where we decide that the child is unsafe,
or there needs to be […] [a] chang[e] [in] the custody or dependency of the
children, we may ultimately make a decision to either remove custody from
parents or restrict custody from parents.”
Second, once a case is open following the decision of an ER social worker, CS workers
make a wide range of decisions, including placement, services, timeframe, reunification, safety
plan, court hearings, and termination of parental rights.
“Once a case is open, then there's a bunch of decisions that have to be made.
Some of those decisions are: what services are going to mitigate safety; what
placement is appropriate for the child; how long does that child need to be
away from their parents or how long do we need the case to be open; what
services are going to be helpful for the child to stabilize; and there's a lot of
follow up.”
“[Y]ou have to make decisions about family, you have to make decisions about
location, you have to make decisions regarding safety, you have to make a
decision regarding siblings, and what that looks like. So that in and of itself,
whether you can operate off of it, […] you have to make the best decision that
you can for them. Because our quote unquote placements are, you know, not
always going to be the best options.”
“I can’t think of a decision [..] that happens at the department that I don’t
have to as a backend worker sign off on. Everything touches our desk, at one
point or another. […] It always has to go through the worker, and that is a
lot.”
Beyond simply making decisions about families and children, some CS social workers made it
clear that their role is to advise and support families through the process.
“I [tell] my parents [that] my role is to walk [them] through this very
complicated system with as much transparency as possible. I'm here to give
[them] the tools that I know to help [them] get in and out of this system.”
Third, focus groups revealed complicated dynamics between child welfare workers and a
range of stakeholders who participate in decision-making, including supervisors, Assistant Regional Administrators (ARAs), children, parents, caregivers, therapists, courts, law enforcement,
and teachers. Decision-making processes are not linear and child welfare workers play a critical
role in navigating and integrating the interests of stakeholders whose roles, perspectives, and
expectations vary.
“[O]ften times, stakeholders get to come in and out. And we ultimately have to
pick up all everybody else’s decisions. A parent makes a decision that impacts
the kid, a therapist leaves, it impacts the kid. So, we are having to manage and
facilitate the relationship that this kid has with, with the system based off
decisions that we are not making. But ultimately, it comes back on us and that
affects our relationship with the kids.”
While some participants reported their decisions are collaborative, most participants reported that
they and their supervisors play the biggest role in decision-making. Some participants stated that
they make decisions on their own.
“My supervisors and upper management [administrators] are involved. Other
team members, so does service providers of outside therapist, regional center,
caregivers, parents, […] but I will say a lot of the time […] at least from the
DCFS side of things that I feel like I am making decisions on my own with not
a lot of consult or feedback, to be honest.”
“Making decisions on [their] own” does not necessarily mean that child welfare workers have the
authority to make key decisions on their own. Often, they are balancing information from
multiple sources, making recommendations that require approval from supervisors,
administrators, or courts. Their decisions and recommendations are bounded by organizational and systemic constraints, including policy, court orders, administrative decisions, and budgets.
“ [E]ven though we might have made a recommendation, sometimes the court
will order something different and then we will have to change kind of the
trajectory of where we think the case should go or how it should go.”
Characteristics of Decision-Making in Child Protection
In terms of the characteristics of decision-making in the child protection system, four
themes emerged across focus groups: (a) child safety as the key priority; (b) having long-term
impacts; (c) the involuntary nature of services; and (d) using the structural decision making
(SDM) tool. First, in all three focus groups, participants strongly underscored that child safety
was the top priority in any decision-making.
“Child safety is always our main priority. And it will always remain our
priority. And so sometimes even […] older children, or teenagers, they're like,
‘I can take care of myself’ [or] sometimes, [both] the parents [and children]
are like, ‘we don't need you guys,’ if we feel like it's an unsafe situation, we are
required to step in, and we must enforce whatever needs to be enforced in
order to keep the child safe. And so yes, absolutely. Child Safety is our number
one priority.”
“Every time we enter the home, we'll make a decision regarding the immediate
safety of the child [and if] there are any safety threats”
Second, across all focus groups, it was clear that the participants well understood the significant, long-term impacts of their decisions on children and families.
“[E]very single decision that we make, whether it's small or large, affects
somebody's life. […] It's all important, [because decisions can] change the
trajectory of where [children] are going to go. And I don't think any of us here,
take that lightly.”
“[W]e are making decisions on how much contact the children have with their
families, or whether or not if they're going to be with their families at all. It's
decisions that are really life changing and life altering.”
Third, according to the participants, one of the primary characteristics that differentiates child welfare services from other social welfare services is that participation is not voluntary. Often, families do not know they are being reported, and unsurprisingly, they may disagree with the findings of an investigation. They may not only be afraid of being involved with the CPS but also unwilling to receive services. This pushback often occurs “because most of [the] parents don’t believe that they need [an] intervention.” While child welfare workers cannot force parents to receive services, the consequences of not doing so can be serious: children may be removed from, or may not be able to reunify with, their families.
“[T]he biggest thing that comes to mind is the idea that we are dealing with
involuntary clients. And so often times, we have a lot of resistance if they don’t
agree with being involved or the department, [including] how we make our
decisions, where we place the children. There can be a ton of pushback from
the parents themselves or relatives. […] [M]ost of our parents don’t believe
that they need our intervention. […] [T]hey can be very difficult or try to
monopolize meetings or jeopardize placements if they don’t want their child
there.”
Lastly, attitudes toward the SDM, a comprehensive set of tools designed to support case management for CPS (Johnson, 2004), were mixed among the participants. Although most participants agreed that the SDM tools can help standardize or “equalize” decision-making across child welfare workers, they sometimes found the tools limited because they do not reflect the full picture or nuance of real-world problems. Participants reported that when there is a “gray area,” or the resulting risk level does not accurately reflect the true situation, they may opt to override the decision suggested by the SDM tool based on additional information and consultation with their supervisors.
“But sometimes the answer is not yes or no. Or sometimes we have to override
the tool, because […] the tool will give us one decision, but it is not accurate.
For example, a parent [can be] technically compliant, [without] behavior
change, […] like a mom could sit in a class for 10 weeks but playing cell
phone on entire time, not engaged and not learning, and not able to come to
apply any of the knowledge that she gained from the class, and that is not
behavior change [although the tool may indicate compliance]. ”
“The SDM tool […] we use that to determine or to kind of […] bounc[e] ideas
off of […] But sometimes I don’t agree with it. And that’s when I have a
conversation with my supervisor. […] And then we kind of talk about it and see
if we need to override that SDM decision.”
Factors that Complicate Decision-Making
Of several themes that emerged around the factors that complicate decision-making, the
three primary themes were: (a) self-defensive decision-making, (b) lack of support and time for
decision-making, and (c) limitations of policy. First, in all focus groups, participants reported that they have adopted, or have seen others adopt, a self-defensive, or fear-based, decision-making approach, which was called the ‘CYA’ (Cover-Your-Ass) approach. Some participants felt that much of their time was spent on self-defensive documentation, when they would rather spend most of it directly working with children and families (“I would rather it be 90% instilled with the kids and 10% of paperwork”). To avoid self-defensive decisions, participants said they try to “become more confident” or “follow [their] gut.” It was also discussed that, to avoid self-defensive decision-making, they need to be supported by “creative, comfortable, and confident supervisors and ARAs.”
“ I feel like so much of decision-making in DCFS is really rooted in fear. […]
But I guess for me, […] I feel like I have to trust and believe that […] my work
within itself should cover me. As long as I am doing good work, I should not
have to worry about getting in trouble. Because I just have seen just so many
fear-based decisions. […] [I]f my name is on this report, this recommendation
is going to be what I truly feel solid and comfortable that I believe in. […] I
just have seen so many cases, so many reunifications delayed based upon fear.
‘Oh, we need parents to do drug test just a few more times’ […] [W]e don’t
always need this additional thing to provide us our comfort. ”
Second, a lack of time and support also makes decision-making more difficult for child welfare workers. Unsurprisingly, several participants stated they would like more time to review their cases to make better decisions. In addition, a lack of guidance and support also appeared to hinder effective decision-making. While some participants had experience working with supervisors who “knew every resource in the book [and] was great at utilizing it”, others reported that they do not necessarily receive appropriate and adequate feedback and supervision. It was also mentioned that support staff and upper management can be disconnected from the experiences of frontline workers and that communication and connection across the agency need to be strengthened.
“I do feel confident in my recommendations, but I will say, if I didn’t, I just
don’t always feel like I am necessarily getting the actual supervision and
feedback.”
“I think there needs to be actual investment on support staff. […] There is this
disconnect from support staff to what their work actually means, because they
are not in the field.”
“ I just want social workers to feel more confident when they go out in the
field. I want us to feel more confident about our assessment and what our
recommendations are.”
“[T]he upper management is completely disconnected from the reality of what
we are doing out in the field. And it is really disheartening, because this is
what we need and what we want.”
Third, because child welfare workers have to sort through the different perspectives and interests of a range of stakeholders in varying circumstances, they “consider policy like [their] training wheels and use it as [their] foundation.” However, it was also pointed out that official policy often falls short of grappling with critical aspects of child welfare cases. More importantly, participants underscored that policy often fails to address the heterogeneity of children and families who come to the attention of the system. Many of the current policies, which child welfare workers are expected to take into account as they make decisions, can seem “archaic” and “bias[ed].” Participants also noted that not all supervisors and child welfare workers are up to date with policy.
“[W]e have to make decisions because our policy or procedure says we have
to do this. But it may not always sync with some of the research or some of the
evidence-based information that we have. But we are still bound by this law or
policy. And so we just have to move forward with what we have to do.”
“I don’t want to minimize my job, but I am at the bottom of the totem pole […]
We can have these conversations, but if it doesn’t reach the right people, then
[…] it stops at the bottom. […] So I just want […] people who are creating
policy and procedures […] at the top would have access to this information
because I think it is so needed, and it would be great to help drive.”
Perspectives on Fairness
A primary theme that emerged regarding fairness in decision-making was equity. In all
focus groups, it was repeatedly emphasized that each family is different, even when they may
appear to be the same based on variables in the SDM tools or summary description, and that it is
important to promote “the best interest of the child” based on “the accurate picture of the
family.” It is noteworthy that some participants stated that their understanding of fairness is also
based on “personal feeling.”
“There are no families that are the same or no situations that are completely
the same. They may have the same allegations, the same kind of incident that
happened, but there is always going to be something different in each family,
or even their statements what they say, […] I can’t say that it is really ever
going to be fair, because each situation [seems to be] the same, [but] different
at the same time.”
“When we are talking about being fair, it depends on who we are being fair to.
Are we treating each family the same? Or are we being fair to that family that
you are working with? […] We want to get that kid to the closest amount of
normal that kid recognizes. Their normal is not [other] kids normal and not my
normal. That is the goal while keeping them in a safe environment.”
Threats to Fairness
More than a few participants felt strongly that the current child protection system is not
fair due to (a) human bias and (b) systemic bias. First, various people, including child welfare
workers themselves, bring bias into the decisions made in the system. Participants acknowledged
that their own “point of view based on their experiences as a social worker or in their personal
life” could be the source of bias when they make decisions. The same applies to their supervisors
and administrators.
“It is dependent on the worker you get, […] what supervisor that worker has,
[…] what judge you end up getting, [and] what department in the courtroom
you are getting. […] just too much human error involved. I don’t know
fairness actually exists in our decision-making process.”
“I have had a white mother reunify when she has done in minimal work, just
kind of checking off the boxes, whereas I have had a family who has made the
progress, and they were a minority, and they did not reunify. It is frustrating.
[…] The person in the courtroom who has never worked with this family […]
makes the decision.”
Participants pointed out that “some families are called into the hotline more than other families”
and that the potential racial and societal biases among mandated reporters may further traumatize
already marginalized children and families.
“[I]t starts with the hotline. […] I mean, we've all seen the bogus referrals that
we have to go out on and that just further traumatize a family, [even if] we
know from the narrative, that it's, it's bogus, it's just not true. And I think we
need to do a better job of actually figuring out who, which families we touch,
because as soon as we touch this family, there's trauma, there's trauma as soon
as you knock on the door.”
The other primary theme for threats to fairness in decision-making was systemic bias.
One aspect of systemic bias is related to the current DCFS policy, upon which the perspectives
among child welfare workers are mixed. While some participants mentioned “protocols within
our system help keep things as fair as possible” and “policies are designed to create fairness,”
most participants expressed that “it is very rare that […] policy help [them] make a fair
decision.” It appeared that policy focuses more on liability than fairness, although it may not be
completely against fairness.
“Policy is more designed to cover liability than it is to create fairness.”
“Policies are reactive, to be specific, probably primarily child fatality. […]
And they are trying to fit every situation when you can’t possibly fit every
situation. […] I can’t tell you how many supervisors that I have [worked with]
had no policy. […] [E]verybody has their own idea of what is best practice.”
“Policies […] kind of wrong to me. […] In our particular area, […] a lot of
children’s families have criminal records, and I have seen how criminal
records in there and even non-violent ones. […] I have seen a child could not
be able to be placed with her grandmother because the grandfather […]
committed welfare fraud in the 70s. […] They were also not able to be
approved to be adoptive parents. So I think that there is still a lot to grow in
that particular area.”
Another aspect of systemic bias emerges when families navigate the system, which falls
short of providing equitable service to families who need more support.
“Our low-income families get the court appointed attorneys that meet with
them three minutes before court. And then you have other families that are
getting services out of more affluent areas, West LA office or whatever, and
they have their private attorneys who are actually able to check in with them
review their case and actually appropriately represent them. […] It is not
fair.”
Suggestions for Improving Fairness in Decision-Making
Several themes emerged regarding efforts that could improve fairness in decision-making. First, participants expressed a need for peer support and guidance through case consultation meetings.
“[What] would be helpful to make more fair decisions is [having case studies].
When [I was getting my] license, every week, [we] got together [as] a group
and [we did] case studies. And I wish that was more of a common practice in
our unit, and that's up to the supervisor.”
“[D]uring COVID, we haven't had a single unit meeting, so I don't feel the
support. So, I can't make fair decisions. If I'm burned out. I can't make fair
decisions.”
Second, the importance of involving communities and incorporating their perspectives
into decision-making was addressed.
“Churches, schools, and neighbors to be involved and to balance out the
system perspective to bring in the community perspective. And I guess that is
the idea of what a Child and Family Team (CFT) is.”
“I think […] it's a good idea to introduce to have a panel or to introduce more
people, more social workers into the decision-making process, because if it's
just one supervisor, one social worker, it's going to be a very biased decision
all the time.“
Third, participants emphasized the importance of acknowledging systemic racism and taking a trauma-informed approach.
“[H]aving the trauma-informed approach from every angle. […] We have all
seen bogus referrals. […] [A]s soon as we touch this family, there is trauma,
as soon as you knock on the door. So, I think that we have a responsibility to
our communities, to actually only intervene when it's necessary, not just
because you have a teacher who's scared, or you have a therapist who doesn't
know how to build rapport, or you have a therapist who's scared or whatever it
is. So, I would say, for me, I would love for the, quote unquote, fairness to start
at the beginning, before it even gets to any of us, and only to respond to what
is actually a child safety issue. But we're so scared of liability.”
Fourth, a decreased workload would help child welfare workers spend more time learning
about the cases and building trust with families.
“Lower caseloads and lower referrals, because I think the number and the
quantity of everything that I am doing definitely takes away the quality of what
I can produce. […] I think we are actually able to provide fairness, equality,
and equity. All of these would increase [if we have] more time [for] each of
our client and family.”
Lastly, policy reform, particularly regarding allowed time frames for complex cases, is
needed to better serve the needs of families involved with the system.
“I think [timeframe] is the biggest one that comes to mind, especially again, I
think more so I'm thinking in regards to like substance use cases, and how that
takes time. Because, as we know, addiction will not go away within six months.
So, for me, I'm just thinking [a] longer […] timeline [is needed]. [Because]
some of the [timeframes] can definitely be challenging.”
Discussion
This study conducted focus groups with child welfare workers to qualitatively examine their decision-making experiences and perceptions of fairness. The participants were recruited from a child protection service agency in California and were required to have at least six months of experience. During the focus groups, participants were asked about their roles and responsibilities in decision-making, the unique characteristics of decisions in child welfare, and their thoughts on fairness in the CPS. Several significant findings emerged that help clarify the unique characteristics of decision-making in child welfare and that may guide the application of machine learning with fairness in mind.
Implications for Applying Machine Learning to Child Welfare
First, the findings underscore the complexity and interdependency of decision-making in
child welfare. Consistent with the literature, the results show that child welfare workers make a
range of decisions through coordinating and balancing varying expectations and interests of
stakeholders (Berrick, 2018; Fluke et al., 2020b). In addition, the findings document that child
welfare workers make multiple decisions simultaneously considering different timelines,
documentation, and consequences. Many decisions are non-linear and inevitably interdependent;
they are dependent upon what came before or what may happen later. The intricacy of the
decision-making system in child welfare makes it difficult to understand the practical
implications of machine learning decision aids because machine learning models are often built
on an assumption of a simple, closed system (Ahmad et al., 2020; McCradden et al., 2020).
Therefore, machine learning should be applied and evaluated considering the contextual meaning
of the decision-making within the current workflow.
Second, this study highlights the importance of understanding the range of information used to make decisions. The decision-making process in child welfare is dynamic; child welfare workers consistently collect new information from various sources as they further assess and respond to alleged child maltreatment reports. Yet, the amount and types of information used at various decision points remain to be explored. While some information is recorded and managed by the SDM (Johnson, 2004), other narrative-based, nuanced information may remain only with the child welfare workers who worked with a family. As surfaced during the focus groups, some families look the same in the data even though their circumstances differ and they need different supports. This indicates the limited scope of child welfare data fields, which can compromise the utility of machine learning models that rely heavily on accessible data. Therefore, it is necessary to better operationalize, measure, record, and manage various types of information, particularly qualitative records. Future research should explore the potential of operationalizing and measuring an extended range of information.
Third, incorporating core values of child welfare that often collide with one another is a significant challenge not just for human decision-makers, such as child welfare workers, but also for machine learning decision aids. As highlighted in the results, the key principle of child welfare practice is child safety, but safety must also be balanced with permanency and well-being (Adoption and Safe Families Act of 1997, P.L. 105-89). These key concepts should be carefully operationalized and considered when incorporated into a machine learning model.
Lastly, discussions around self-defensive practice raise concerns around liability and
accountability, particularly when machine learning is used to inform decision-making. Of course,
if appropriately used, machine learning can help human decision-makers make decisions more
confidently by providing necessary information. Yet, it is not clear how machine learning-informed decision-making tools may influence liability and accountability issues for child welfare workers. The issue can be further complicated by the way human decision-makers and machine learning interact. Machine learning accountability in the context of child welfare has been discussed with a focus on impacted communities (Brown et al., 2019). Much remains to be explored regarding liability and accountability when machine learning is used in conjunction with child welfare workers’ clinical judgment.
Implications and Opportunities for Improving Fairness
The findings indicate that child welfare workers perceive fairness as promoting equitable
outcomes across children and families whose circumstances vary. Their view reaffirms that machine learning fairness definitions and measures that emphasize equal allocation of resources may not be sufficient to address fairness issues in child welfare.
In child welfare, the focus has been on discussing various aspects of ethical implications (Dare,
2015; Drake & Jonson-Reid, 2019; Keddell, 2015). Furthermore, the recent work of Drake and
colleagues (2020) has proposed a practical framework for the ethical application of machine
learning to decision-making in child welfare. Yet, further research is needed to operationalize the
notion of equity in the context of machine learning applied to child welfare.
The child welfare workers’ suggestions to improve fairness in decision-making indicate
some areas where algorithms can be used to improve fairness in the system. First, as suggested in
focus groups, conscious efforts should be made to incorporate community members to promote
fair and equitable CPS outcomes. Algorithmic approaches could be used to effectively collect,
summarize, and share concerns and suggestions from a range of community members. Second,
across all focus groups, lowering caseload per worker emerged as an urgent matter, as it would
allow child welfare workers to better understand and meet the needs of each family. Previous
studies have demonstrated how algorithms can be used to help human decision-makers prioritize
and focus on high-complexity cases (Ahn et al., 2021; Chouldechova et al., 2018; Pan et al.,
2017). Additionally, algorithms can assist child welfare workers by automating documentation
and producing periodic reports. If algorithms can be appropriately applied to decrease worker
caseloads, they could potentially support improvements in the quality and fairness of decision-
making in child welfare. Third, our findings suggested that inconsistency in individual decision-making in child protection systems may compromise fairness. When used properly, computational approaches could help minimize inconsistency across various decision points. Lastly, algorithms could be used to inform policy reform by producing analytic reports summarizing information available in data systems. For example, as surfaced in the focus groups, algorithms can help develop a more nuanced understanding of families affected by substance abuse.
Limitations
Several limitations should be noted when interpreting this work. First, due to the low turnout rate, the number of participants in each focus group (three to four) was too small to effectively elicit group interactions. In general, the recommended size for focus groups is six to eight (Plummer-D’Amato, 2008). Second, the generalizability of this study is limited because the participants were recruited from a single local child protective services organization in one California county. Third, although intentional, our questions were comprehensive yet fell short of capturing child welfare workers’ views on using algorithms to inform decision-making. By not explicitly mentioning machine learning, our questions allowed the participants to discuss fairness without being influenced by preconceptions about algorithmic approaches. At the same time, they may not capture a consensus about algorithmic decision aids and fairness among child welfare workers.
CHAPTER 3: (Study 2) Application of Machine Learning to Children and Family Services
Considering Fairness
Introduction
To use machine learning appropriately and ethically, it is essential to understand the
process of building and deploying machine learning as well as potential biases that may arise in
each step of modeling. Fairness in machine learning applied to child welfare should be examined
considering the unique nature of child welfare, including clients’ vulnerability, racial
disproportionality, and stigma. This study identifies potential biases that may surface in each
element of machine learning and addresses how they can be further complicated in the context of
child welfare, drawing on two bodies of literature: machine learning and child welfare. This study then provides an illustrative example of building a machine learning model while examining those biases, using a real-world case: the risk assessment tool used by the Bridges Maternal Child Health Network Program in Orange County.
Design
Problem Identification
Machine learning fairness should be carefully considered as early as the stage of identifying a problem. It is critical to align machine learning initiatives with agency priorities to ensure that a machine learning model improves existing practices, procedures, and service equity (Drake et
al., 2020; Roberts et al., 2018). Machine learning literature has also underscored the importance
of being guided by domain expertise when formulating a machine learning problem, rather than
simply being driven by data availability (Holstein et al., 2019; Veale et al., 2018).
Goal Setting
The overarching goal of using machine learning should be carefully addressed because
stakeholders may hold different or even conflicting views (Eubanks, 2019). In child welfare,
where social justice is at its core, the focus should be on allowing more opportunities for children
and families in need rather than simply speeding up the current process (J. McCroskey, personal
communication, March 2021; Rodriguez, 2021). It should be noted that machine learning does
not necessarily bring groundbreaking new insights into the practice (Veale et al., 2018).
Moreover, machine learning does not equate to policy or service; it is a mere assistive tool that
can inform decision-making by computing a risk score, whereas policy planning and service
coordination encompass a wide range of concerns and agendas, including providing access to
necessary supportive services in the community (Drake & Jonson-Reid, 2019; Goel, 2020).
Deployment Planning
With a clearly defined goal of using machine learning, the interaction between a machine
learning model and human decision-makers, such as child welfare workers, should be carefully
discussed and planned within the service horizon (Drake & Jonson-Reid, 2019; Veale et al.,
2018). If the boundary of individual workers’ discretion and the role of a machine learning model are not clearly explained within the system, model transparency and accountability may be difficult to track (Veale et al., 2018). Additionally, to ensure that the use of machine learning leads to meaningful change in practice, well-developed protocols are needed to provide an intervention when risk is determined (Cuccaro-Alamin et al., 2017). It is critical to ensure the efficacy and capacity of algorithmic interventions in advance so that the needs identified by machine learning are responded to appropriately and in a timely manner (Dare, 2013; Drake et al., 2020).
Data
Bias in Child Welfare Data
Predicting future events using real-world data inevitably induces undesirable properties,
called data bias (Barocas & Selbst, 2016a; Chouldechova, 2017; Mitchell et al., 2021). Data bias
is particularly relevant to predictive analytics applied to child welfare, a field in which racial and
socioeconomic disparities and disproportionalities have been a long-standing issue (Dettlaff &
Boyd, 2020; Drake et al., 2020; Glaberson, 2019). Machine learning models fitted to child
welfare data must not be considered neutral decision aid tools as data collected through the child
welfare systems have been influenced by previous human judgment, including policy agendas,
organizational practices, and legal frameworks (Friedler et al., 2021; Redden et al., 2020).
Mitchell and colleagues (2021) decomposed data bias into more precise notions: societal
and statistical biases. Societal bias concerns real-world social injustice, such as racial or
socioeconomic disparities, represented in the data. The overrepresentation of Black children in
child welfare data is an example of societal bias. Statistical bias, on the other hand, is a systematic mismatch between the data sample and the world as it is. Various factors contribute to statistical bias, including sampling bias (i.e., a data set is not representative of the total population) and measurement error (i.e., a measured quantity differs from the actual value). Potential data bias may also emerge in data linkage and preprocessing because cultural differences and sample size disparities may result in lower matching rates for minority groups (e.g., lower matching rates for non-anglophone names) (Barocas & Selbst, 2016a; Hardt et al., 2016). Aggregation bias occurs when false conclusions are drawn for a subgroup based on observations of other subgroups (e.g., erroneous conclusions may be drawn for Pacific Islander children when they are aggregated with Asian children) (Suresh & Guttag, 2020).
Counterfactual Analysis
Limited attention has been given to the problems related to fitting a model to data that
contain the impact of historical policy decisions. Careful consideration of the potential impact of
historical service involvement is needed when assessing the need or risk of children and families
using retrospective data (Coston et al., 2020). For example, the risk of mothers who have received intervention or prevention services might be underestimated due to service impact despite their high-risk profiles. This issue is particularly relevant to child welfare, in which predictive analytics are often used to assess the needs of marginalized populations who are likely to have been targeted by various public services. Although it might not be easy, the effects of historical policy decisions should be carefully considered.
Data Imbalance
A dataset is imbalanced if the classification categories are not approximately equally
represented (He & Garcia, 2009). While there is no strict definition of imbalance, Google
Developers (n.d.) suggested 3 degrees of imbalance: 20-40% is mild, 1-20% is moderate, and
<1% is extreme. Imbalanced classifications pose a challenge for predictive modeling because
most machine learning algorithms were designed assuming an equal number of observations in
each class (He & Garcia, 2009). Training a model on imbalanced data can produce poor predictive performance, particularly for small minority classes. In child welfare, data imbalance is of particular relevance because target outcomes are often rare events, such as various levels of child protection system (CPS) involvement: national estimates have shown that by age 18, 37.4% of children experience a CPS investigation (Kim et al., 2017), 12.5% experience a confirmed case of maltreatment (Wildeman et al., 2014), and 5.9% experience a foster care placement (Wildeman & Emanuel, 2014).
When a dataset is imbalanced, train and test sets should be carefully split, ensuring equal
or similar rates of outcome categories are assigned to each dataset. In addition, model
performance metrics should be carefully chosen because simple predictive accuracy may not be
appropriate when data is imbalanced (Veale et al., 2018). For example, if the outcome of interest
is only 1% of a dataset, it is possible to obtain 99% accuracy by simply predicting negative for
all cases. The Area Under the Curve (AUC), a widely accepted traditional performance metric
for a Receiver Operating Characteristics (ROC) curve (Swets, 1988), works well for models
trained on imbalanced datasets. The data imbalance problem can be mitigated by using the Synthetic Minority Over-sampling TEchnique (SMOTE) (Chawla et al., 2002) or by adding a parameter that balances the weight of the dominant label (Anand et al., 2010).
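To make these options concrete, the following minimal sketch (not part of the study's analysis) illustrates a stratified split, class weighting, and SMOTE oversampling using synthetic data in Python. It assumes the scikit-learn and imbalanced-learn packages are available, and all data and variable names are hypothetical.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from imblearn.over_sampling import SMOTE  # assumes imbalanced-learn is installed

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 10))            # hypothetical predicting features
y = (rng.random(5000) < 0.03).astype(int)  # rare outcome (~3% positive)

# A stratified split keeps the outcome rate similar in the train and test sets.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

# Option 1: re-weight classes instead of resampling.
clf_weighted = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_tr, y_tr)

# Option 2: oversample the minority class with SMOTE, applied to the training set only.
X_res, y_res = SMOTE(random_state=0).fit_resample(X_tr, y_tr)
clf_smote = LogisticRegression(max_iter=1000).fit(X_res, y_res)

# AUC is more informative than raw accuracy when the outcome is rare.
for name, clf in [("class weights", clf_weighted), ("SMOTE", clf_smote)]:
    print(name, roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))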
Modeling
Target Feature
A target feature, or an outcome of interest, is not always obvious. Thus, it should be carefully defined by converting the project objectives, requirements, and constraints into a prediction problem, rather than being driven by data availability (Swets, 1988). Once the outcome of interest is defined, a reliable and valid outcome variable should be selected, ensuring that it is recorded consistently within carefully designed information systems (Gillingham, 2016).
A few concerns may arise when defining a target feature. First, the meaning of a target feature may vary across groups, and using such a variable may perpetuate, or even aggravate, existing systematic disadvantages for already marginalized populations (Barocas et al., 2019; Barocas & Selbst, 2016a; Paulus & Kent, 2020). For example, while low service utilization may indicate service barriers in minority communities, it may reflect a low level of need or access to alternative fee-for-service supports in affluent communities. If
resources were blindly distributed using low service utilization as a proxy for service barriers,
service inequity could become more severe; it may prioritize individuals who do not need the
service or can afford alternative private supports. Second, potential feedback loops may occur
when a target feature is used as an element for the prediction (Drake et al., 2020; Keddell, 2019).
For example, if a model predicting a future CPS involvement uses previous CPS history as a
predictor, children may be screened in part because they have been screened in before. Third,
even if a target feature is defined addressing all the issues above, critical consideration should be
given to whether it is fair to make decisions based on the specified target feature (Barocas et al.,
2019). Lastly, given the limitations in data sources and the unique nature of child welfare, the
ground truth of an outcome is often not available in data. For example, the true risk of
maltreatment among children is often unknown or undocumented, and thus studies have used
CPS involvement or outcomes as a proxy of familial circumstances (Allegheny County
Department of Human Services, 2019; Chouldechova et al., 2018).
Predicting Features
Predicting features should be carefully selected and engineered because variables
included in a dataset may have been affected by societal or statistical biases, as mentioned above.
Missing values should be coded with caution because they may carry meaningful information, such as a language barrier among immigrants who could not fill out a form that was not offered in their preferred language. Using variables that serve as proxies for sensitive attributes may also introduce additional biases (Barocas & Selbst, 2016a; Mehrabi et al., 2020). For example, including
geographic coordinates to predict CPS involvement may introduce racial bias because the
residential mobility of the Black community has been historically restricted (e.g., redlining and
other forms of segregation). For these reasons, machine learning literature has underscored the
importance of being guided by domain experts, such as child welfare workers, in creating
predicting features (Barocas et al., 2019).
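As one illustration of coding missing values with caution, the brief sketch below (with hypothetical column names, not the study's data) keeps missingness visible as its own signal rather than silently dropping or imputing it.

import pandas as pd

# Hypothetical records; None marks values a family did not or could not provide.
df = pd.DataFrame({
    "maternal_education": ["HS", None, "College", None],
    "primary_language": ["Spanish", "English", None, "Vietnamese"],
})

for col in ["maternal_education", "primary_language"]:
    df[col + "_missing"] = df[col].isna().astype(int)  # explicit missingness indicator
    df[col] = df[col].fillna("Unknown")                # keep "Unknown" as its own category

print(df)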
Interpretability
One widespread, long-standing notion about machine learning is that it is a complex “black box” whose internal logic and working mechanisms cannot be understood by human users. With increasing interest in questioning, understanding, and trusting machine learning systems, efforts have been made over the past few years to interpret and explain machine learning models (Carvalho et al., 2019; Gilpin et al., 2018). Two primary paths towards
interpretability have been “creating accurate interpretable-by-nature models” and “creating
explanation methods for existing black box, opaque models” (Carvalho et al., 2019). In other
words, model interpretability can be achieved to some extent by using a model that is
interpretable by its nature or using different tools that help us understand machine learning
models, such as Sibyl, a visual analytics tool to increase the interpretability of a machine
learning model (Zytek et al., 2021). Of course, some machine learning models cannot be interpreted, and some do not need to be. Not all algorithms allow us to draw clear associations between predicting features and a target feature the way regression algorithms do; often, feature importance indicates only how much each feature contributes to model performance rather than its relationship to the risk of interest. Because model interpretability often comes with costs (e.g., time and resources, compromised prediction accuracy), trade-offs should be carefully considered.
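As a simple illustration of these two paths (not the study's model), the sketch below contrasts an interpretable-by-nature logistic regression, whose signed coefficients show the direction of association, with feature importances from a more opaque random forest, which only rank contributions to performance. Data and feature names are synthetic.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=5, n_informative=3,
                           weights=[0.9, 0.1], random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names

# Interpretable by nature: signed coefficients indicate the direction of association.
logit = LogisticRegression(max_iter=1000).fit(X, y)
for name, coef in zip(feature_names, logit.coef_[0]):
    print(f"{name}: coefficient = {coef:+.2f}")

# Opaque model: importances rank features but carry no sign or direction of risk.
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
for name, imp in zip(feature_names, forest.feature_importances_):
    print(f"{name}: importance = {imp:.2f}")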
Model Performance Evaluation
Models can be assessed and compared using various prediction performance metrics.
Performance metrics should be chosen considering the purpose of the model, agency goals,
resource constraints, and deployment (Veale et al., 2018). If the purpose of service is to reach as
many individuals at risk as possible and the risk of false positives is nonexistent or minimal,
recall (i.e., the rate of true positives among ground truth positives) would be an appropriate
measure. If an agency wants to maximize the accuracy among the individuals they reach out to,
and the risk of false positives is significant, precision (i.e., the rate of true positives among
predicted positives) can be considered. Overall performance measures, such as AUC, may not always be relevant in practice if a critical constraint exists, such as a limit on the number of families an agency can serve in a given period. In this case, performance metrics computed at the top k% can be considered.
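The following minimal sketch (illustrative only, with simulated scores) shows how recall, precision, AUC, and precision at the top k% can be computed and compared; the precision_at_top_k helper is a hypothetical convenience function written here, not an existing library call.

import numpy as np
from sklearn.metrics import precision_score, recall_score, roc_auc_score

def precision_at_top_k(y_true, scores, k=0.05):
    """Precision among the top k fraction of cases ranked by predicted risk."""
    n_top = max(1, int(len(scores) * k))
    top_idx = np.argsort(scores)[::-1][:n_top]
    return y_true[top_idx].mean()

# Simulated ground truth and model risk scores for a rare outcome.
rng = np.random.default_rng(1)
y_true = (rng.random(10000) < 0.05).astype(int)
scores = np.clip(0.4 * y_true + rng.normal(0.2, 0.15, size=10000), 0, 1)
y_pred = (scores >= 0.5).astype(int)

print("recall   :", recall_score(y_true, y_pred))     # share of true positives found
print("precision:", precision_score(y_true, y_pred))  # accuracy among flagged cases
print("AUC      :", roc_auc_score(y_true, scores))    # overall ranking quality
print("precision at top 5%:", precision_at_top_k(y_true, scores, k=0.05))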
Model Impact
Impact on Individuals
Before embarking upon a new machine learning model, it is critical to evaluate the
impact of an algorithmic intervention and identify the population directly and indirectly affected
by the tool (Drake & Jonson-Reid, 2019; Friedler et al., 2021). Since the model may have
different implications for individuals depending on the context and application, its potential
harms and benefits on each group should be critically examined (Dare, 2015; Holstein et al.,
2019). It should also be noted that the impact of a predictive model, regardless of whether a determination is false or not, can be more dire for marginalized children and families who are already struggling (Dare, 2015).
Family Participation in Decision-Making
Adopting a new algorithmic tool to inform decision-making may affect how families interact with practitioners and child welfare workers in the decision-making process (Drake & Jonson-Reid, 2019). Families’ involvement ranges from being informed about procedures to having support and opportunities to participate in decision-making (Crea & Berzin, 2009). In reviewing the impact of a machine learning model, the level and manner of family involvement in decision-making should be carefully considered.
Dismantling Structural Inequalities
Scholars argue that predictive modeling applied to child welfare should aim to explicitly
dismantle structural inequalities (Eubanks, 2019; Keddell, 2019). Child welfare, principally concerned with marginalized, vulnerable groups, should build predictive models that focus on pushing back against barriers to social justice rather than simply maximizing utility for the public (Rodriguez,
2021). The ethics of predictive modeling in child welfare should be considered in the context of
current child welfare policy, practice, and outcomes, noting that the focus of child welfare
service is on child and family well-being rather than on assessing guilt or meting out punishment
(Drake & Jonson-Reid, 2019; Friedler et al., 2021).
Ethical Implications
Some child welfare scholars have established ethical principles and frameworks that
could inform decision-making using machine learning (Dare & Gambrill, 2017; Gillingham,
2019; Roberts et al., 2018). Brown and colleagues (2019) explored accountable algorithmic
design using a qualitative approach, and Lanier and colleagues (2019) examined whether ethical
principles used in business and computer science communities can also be applied to child
welfare. Yet, because these frameworks focus on post-hoc evaluation, there is a need for practical guidance targeted at child welfare professionals who would like to lead or participate in building and implementing algorithmic models.
Deployment and Maintenance
In a deployment phase, it is essential to plan how the model results will be interpreted
and used by human decision-makers within the existing workflow. Careful reviews and
discussions are needed to decide how individuals can use their discretion. Child welfare literature
underscores that continuing the use of clinical expertise is necessary to safeguard the
implementation of a machine learning tool (Dare, 2013; Dare & Gambrill, 2017). On the other
hand, the computer science literature has raised concerns about individuals using discretion because
users may impose their biases on the machine learning loop (i.e., labeling bias) (Holstein et al.,
2019).
Current Study
This study provides an illustrative example of developing and implementing a machine
learning model to inform decision-making in children and family services. In parallel, this study
examines a range of potential biases that may emerge in each phase of model planning, building,
and application. The Bridges Maternal Child Health Network (MCHN) in Orange County,
California, offers home visitation services to families with a newborn child and is interested in
better triaging and serving families in need. Home visiting is one of several preventive and supportive services that may be used to prevent families with young children from becoming involved with the CPS system (Eastman et al., 2014). To identify families with a high demand for home visitation services, this study uses a linked dataset of MCHN program records, vital birth records, and Child Welfare Services/Case Management System (CWS/CMS) records. A range of potential biases that are particularly relevant to child welfare practice was addressed and examined.
Case Example: Orange County’s Bridges Maternal Child Health Network (MCHN) Program
The Bridges Maternal Child Health Network (MCHN) collaborates with ten hospitals and community agencies in Orange County, CA, to ensure that parents have the support and resources for a healthy start at birth. The program assesses the risk of mothers who give birth at the Bridges hospitals and refers them to a local home visitation program if needed. The Bridges screening system comprises two levels: prescreening and bedside screening. The prescreening conducts an initial triage, and the bedside screening further assesses risk for mothers identified as at-risk by the prescreening. Both Bridges risk scoring tools were informed by the Life Skills Progression, a tool that home visitors use to measure parents’ ability to achieve and maintain a healthy and satisfying life for their families (Design Options for Home Visiting Evaluation, 2011). Practitioners
refer mothers to home visitation services based on their bedside screening score.
The existing prescreening tool assesses all mothers who give birth at Bridges hospitals and automatically assigns each mother a risk score computed from a few maternal and birth characteristics: maternal age at birth, primary maternal language, maternal employment status at birth, marital status, and birth payment method (details are provided in Table 3.1).
Following the prescreening assessment, hospital coordinators approach mothers with high
prescreening scores and offer a further risk assessment, a bedside screening, to better understand
their needs. Unlike the prescreening assessment, a bedside screening assessment is in person and
comprises more in-depth questions. Coordinators refer mothers with a high bedside screening
score (equal to or higher than 40) to a local agency that provides home visitation service. During
this process, coordinators use their discretion, based on their professional judgment or qualitative assessment, to decide whether a mother needs to be further assessed through a bedside screening. Of note, mothers may decline to be further assessed or referred to services. (As of November 2021, the Bridges hospitals use the New Prescreening Algorithm, implemented in July 2018, to conduct the prescreening assessment. The prescreening tool used as the baseline model in this study is the tool that was in use before the New Prescreening Algorithm.)
Home Visiting Services
Early childhood home visiting connects pregnant mothers and new mothers with nurses,
parent educators, and other trained professionals to provide the tools, guidance, and support
necessary to raise a healthy family (Eastman et al., 2014). Home visiting is viewed as a critical intervention system that can potentially improve education, health, and socioeconomic outcomes (Eckenrode et al., 2010). For example, home visiting has been shown to be effective in delaying subsequent pregnancy and birth among participating mothers (Rubin et al., 2011; Yun et al., 2014); increasing positive and nonaggressive parenting practices (Dodge et al., 2013; Guterman et al., 2013); lowering annual government spending on food stamps, Medicaid, Aid to Families with Dependent Children, and Temporary Assistance for Needy Families (Olds et al., 2010); and benefiting children’s physical and cognitive development (Horwath, 2007). Home visiting services have also been found to decrease and delay the likelihood of child maltreatment. Chaiyachati and colleagues (2018) found that the receipt of home visiting decreased the likelihood of a maltreatment substantiation by 22% and delayed the first substantiation. Because the pre- and postnatal period is critical, providing adequate support to families with a newborn through home visiting can protect and benefit both babies and their families.
Goal Setting
First 5 Orange County is dedicated to optimizing the health and development of young
children so that they can reach their full potential (First 5 Orange County, 2021). To promote
proactive and timely identification as well as effective services for families in need, the Bridges
MCHN team wanted to improve the accuracy of their current prescreening tool to better assess
the needs of families while promoting service equity.
Planning
The prescreening tool targets all mothers who give birth at Bridges hospitals and
identifies those who would need a further assessment so that they can be referred to a home
visiting service agency. Since over 20,000 children are born in Bridges hospitals every year,
manually reviewing each family’s birth records and assessing their needs would be taxing and
time-consuming. Within the service horizon of the Bridges MCHN, shown in Figure 3.1, this
study focuses on improving the prescreening risk assessment process by using machine learning.
Figure 3.1: Bridges Maternal Child Health Network Service Flow
[Figure: service flow — Birth in Bridges hospitals → Prescreening Risk Assessment → Bedside Screening Risk Assessment (for mothers whose prescreening risk score is high and who agreed to receive a bedside screening) → Referral to Home Visitation Service (for mothers whose bedside screening risk score is high and who agreed to be referred to a home visitation service).]
53
Methods
Data
This study used a linked dataset of three kinds of administrative records. The MCHN
program records from 2011-2016 were linked to California vital birth records from 2011 to 2016
and CPS records from 2011 to 2019. The MCHN program records included only those mothers
who received a bedside screening. In other words, the mothers included in the MCHN program
records not only received a high prescreening risk score but also agreed to receive a further risk
assessment, a bedside screening.
The MCHN program data contain service engagement information as well as children
and caregivers’ sociodemographic characteristics documented at birth. The vital birth records,
accessed through the California Department of Public Health following the review and approval
by the state’s Vital Statistics Advisory Committee (California Department of Public Health,
2021), include child and maternal health information (e.g., child’s birth weight), birth
information (e.g., prenatal care, payment method), and caregivers’ sociodemographic
information (e.g., race/ethnicity, education). CPS records were extracted from California’s Child
Welfare Services / Case Management System, which was designed to assist caseworkers
managing individual child welfare cases. Thus, they contain client information along with
documentation of service involvement. Data used for this study were securely hosted at the
Children’s Data Network at the University of Southern California. The use of records was
governed by an active data-sharing agreement with the California Department of Social Services
and strict data security protocols falling under university (USC IRB UP-13-00455) and state (CA
CPHS 13-10-1366) human subject approvals.
Study Population
This study focuses on those children born in 8 of 10 Bridges hospitals between 2011 and
2016 (N=132,216). Two hospitals were excluded from this study for data linkage and quality
issues. The children included in this study make up 53.9% of all children born in Orange County
during the period.
Target Feature
To assess the need for home visitation services, this study defined the target outcome as
whether a child had a substantiated maltreatment allegation in Orange County during the first
three years of their life (1 if substantiated, 0 if not). While maltreatment substantiation may not
be the perfect indicator for families’ needs for home visiting, the fact that a state system
officially identified a child as the victim of abuse and neglect can be a very important proxy for
the families' circumstances. This study specifically focused on maltreatment referrals received and substantiated in Orange County, thereby focusing on mothers who remained in the county.
Counterfactual Analysis
Counterfactual analyses were conducted to examine whether receiving home visiting services affects the risk of substantiation. This study used a propensity score to match mothers who
received home visitation services to their counterpart mothers who did not receive home
visitation services. Then the average treatment effect on the treated group was estimated using
nearest neighbor matching. The analysis was completed using Stata 17.0.
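For illustration, a minimal Python sketch of this propensity-score matching step is given below. It is a sketch under stated assumptions, not the study's actual code: the analysis itself was run in Stata 17.0, and the column names (a home-visiting treatment indicator, a substantiation outcome, and a list of covariates) are hypothetical stand-ins for fields in the linked dataset.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

def att_nearest_neighbor(df: pd.DataFrame, treatment: str, outcome: str, covariates: list) -> float:
    """Estimate the average treatment effect on the treated (ATT) via 1:1
    nearest-neighbor matching on an estimated propensity score."""
    # 1. Model the probability of receiving home visiting given the covariates.
    ps_model = LogisticRegression(max_iter=1000)
    ps_model.fit(df[covariates], df[treatment])
    df = df.assign(pscore=ps_model.predict_proba(df[covariates])[:, 1])

    treated = df[df[treatment] == 1]
    control = df[df[treatment] == 0]

    # 2. For each treated mother, find the control mother with the closest score.
    nn = NearestNeighbors(n_neighbors=1).fit(control[["pscore"]])
    _, idx = nn.kneighbors(treated[["pscore"]])
    matched_controls = control.iloc[idx.ravel()]

    # 3. ATT = mean outcome difference between treated mothers and their matches.
    return treated[outcome].mean() - matched_controls[outcome].mean()
```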
Predicting Features
This study used two sets of predicting features. The first set of predicting features is
restricted to the information upon which the existing Bridges prescreening tool is based. This
was to examine whether a machine learning model based on the same or similar information as the current prescreening tool can generate a more accurate prediction. In accordance with the
Bridges hospital prescreening model, maternal age and birth payment method were coded. Since
maternal primary language, maternal employment status, and marital status were not documented
in the birth records, maternal birthplace was used as a proxy for maternal primary language,
maternal education was used as a proxy for maternal employment status, and documentation of
paternal information was used as a proxy for marital status (see Table 3.1).
The second set of predicting features extends the first one by adding more information on
sociodemographic characteristics of caregivers and birth. It includes parental education, prenatal
care initiation, maternal residence county, parental age at birth, maternal smoking during
pregnancy, and birth hospital.
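The proxy coding described above can be illustrated with a short pandas sketch. This is a hypothetical illustration: the column names (maternal_age, maternal_birthplace, maternal_education, paternal_age, payment_method) stand in for fields in the linked birth records and are not the study's actual variable names.

```python
import pandas as pd

def code_bridges_features(births: pd.DataFrame) -> pd.DataFrame:
    """Binary (1/0) and integer coding of the first set of predicting features."""
    X = pd.DataFrame(index=births.index)
    # Maternal age kept as an integer, with missing values coded as -1.
    X["maternal_age"] = births["maternal_age"].fillna(-1).astype(int)
    # Maternal birthplace as a proxy for maternal primary language.
    X = X.join(pd.get_dummies(births["maternal_birthplace"], prefix="mbirth"))
    # Maternal education as a proxy for maternal employment status.
    X = X.join(pd.get_dummies(births["maternal_education"], prefix="medu"))
    # Documented paternal information as a proxy for marital status.
    X["paternity_established"] = births["paternal_age"].notna().astype(int)
    # Birth payment method indicators.
    X = X.join(pd.get_dummies(births["payment_method"], prefix="pay"))
    return X
```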
Analysis
Assessing the child-level risk of being substantiated for maltreatment allegation within
the first three years of a child’s life was formulated as a supervised binary classification problem.
Two models were trained and tested: Model 1 was trained using only the Bridges predicting
features, and Model 2 was trained including additional features (see Table 3.1). Modeling was
completed in Python 3.7.4. using pandas, numpy, and scikit-learn packages. Visualizations were
conducted using matplotlib and seaborn.
Classification Algorithm
Nonlinear binary classification learners were considered over linear classification
learners for their flexibility and aptitude for nonlinear associations between predicting features
and the target outcome (Hastie et al., 2009). Based upon this consideration, this study chose a
light gradient boosting machine (LightGBM), a gradient boosting framework that uses tree-based
ensemble learning algorithm (Ke et al., 2017). Gradient boosting decision tree (GBDT) has been
widely known and used for its efficiency, accuracy, and interpretability (Friedman, 2001). Because conventional GBDT becomes computationally expensive when handling big data, Ke and colleagues (2017) proposed a new GBDT implementation, LightGBM, which can speed up training by more than 20 times while achieving similar accuracy.
Imbalanced Data
Given that 2.7% of children had a substantiated maltreatment allegation within the first
three years of their life, our binary classification problem has an imbalance of approximately 1 to
40. With data imbalance, model performance tends to be biased toward the majority class in
the dataset (Kaur et al., 2019). To resolve this issue, the synthetic minority over-sampling
technique (SMOTE) was used. The SMOTE oversamples the minority class (i.e., substantiation)
by synthesizing new examples from the existing observations (Chawla et al., 2002). As the
SMOTE did not significantly improve the performance, the data imbalance parameter
(‘is_unbalance = True’) was added to re-balance the weights of the majority class (LightGBM,
n.d.). The parameter increased the contribution of the minority class to the modeling.
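A minimal sketch of the two rebalancing options described above is shown below. It assumes the imbalanced-learn package for SMOTE (the text does not name a specific implementation) and hypothetical training arrays; it is illustrative rather than the study's actual code.

```python
from imblearn.over_sampling import SMOTE          # assumed SMOTE implementation
from lightgbm import LGBMClassifier

def fit_with_rebalancing(X_train, y_train):
    # Option explored first: oversample the minority class (substantiation)
    # by synthesizing new examples from existing observations.
    X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)
    smote_model = LGBMClassifier().fit(X_res, y_res)

    # Option ultimately used: keep the original data and re-balance class
    # weights via LightGBM's is_unbalance parameter.
    weighted_model = LGBMClassifier(is_unbalance=True).fit(X_train, y_train)
    return smote_model, weighted_model
```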
Model Training
The data were split into training (70%) and testing (30%) datasets to avoid overfitting to
the training dataset and to measure the performance on previously unseen data. Given the data
imbalance issue, the split was stratified on the outcome label so that both sets preserved the same substantiation rate. Using a grid search method, models were trained and evaluated with 5-fold cross-validation, tuning over multiple sets of prespecified parameter values. Given the data imbalance, models were tuned using the ROC-AUC score rather than accuracy.
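As a concrete illustration of this training procedure, a minimal sketch is given below. It assumes a feature matrix X and binary outcome y, and the parameter grid is hypothetical, since the prespecified values searched in the study are not listed here.

```python
from lightgbm import LGBMClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

def train_lgbm(X, y):
    """Stratified 70/30 split, then a grid search tuned on ROC-AUC with 5-fold CV."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.30, stratify=y, random_state=42)

    # Hypothetical grid of prespecified values.
    param_grid = {"num_leaves": [31, 63],
                  "learning_rate": [0.05, 0.1],
                  "n_estimators": [200, 500]}

    search = GridSearchCV(
        LGBMClassifier(is_unbalance=True),  # re-weight classes for the ~1:40 imbalance
        param_grid,
        cv=5,                               # 5-fold cross-validation
        scoring="roc_auc",                  # ROC-AUC rather than accuracy
    )
    search.fit(X_train, y_train)
    return search.best_estimator_, (X_test, y_test)
```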
Table 3.1: Machine Learning Predicting Features and Bridges Risk Score Factors

Target feature
  Baseline (Bridges Prescreening Model): Individual parent and infant/toddler progress*
  Model 1 (LGBM with Bridges Features) and Model 2 (LGBM with All Features): Maltreatment substantiation in Orange County within the first 3 years of a child's life

Algorithm
  Baseline: Risk = Age at birth + Primary language + Employment status + Marital status + Birth payment method
  Models 1 and 2: Light Gradient Boosting Machine algorithm was used to generate risk scores.

Predicting features
  Maternal Age at Birth
    Baseline: If age is <17 or >39, add 18; if between 17 and 19, add 12; if between 20 and 24, add 6; if between 25 and 39, add 0
    Model 1: Maternal age at birth as an integer (missing = -1)
    Model 2: Maternal and paternal age at birth as integers (missing = -1)
  Maternal Primary Language / Maternal Birthplace outside of the US
    Baseline: If language is not English, add 2; otherwise, add 0
    Models 1 and 2: Binary (1/0) indicators for maternal birthplace: Mexico, China, Vietnam, South Korea, Philippines, India, El Salvador, Guatemala, Iran, other
  Maternal Employment Status / Maternal Education
    Baseline: If disabled, unemployed, or unknown, add 2; otherwise, add 0
    Model 1: Binary (1/0) indicators for maternal education: no high school, high school graduate, some college, grad school, unknown
    Model 2: Binary (1/0) indicators for maternal and paternal education (same categories)
  Marital Status / Paternity Establishment
    Baseline: If single or widowed, add 6; if divorced, add 4; otherwise, add 0
    Models 1 and 2: Binary (1/0): whether the father's age is documented
  Birth Payment Method
    Baseline: If private/self-pay or cash, add 3; if Medi-Cal, add 2; otherwise, add 0
    Models 1 and 2: Binary (1/0) indicators: private, Medi-Cal, self-pay, other government, military, Indian, other, missing
  Additional features (Model 2 only)
    Prenatal Care Initiation: Binary (1/0): first trimester, second trimester, late
    Maternal Residence County: Binary (1/0): Orange County
    Maternal Smoking: Binary (1/0): self-stated cigarette smoking during pregnancy
    Birth Hospital: Binary (1/0) indicators for Bridges hospitals
Performance Metrics
A recall score was used to evaluate model performance to ensure that the prescreening
tool can identify as many high-risk children and families in need as possible. In the context of
this study, recall score indicates the rate of children who were correctly predicted to be at risk
among all children who were observed to have a substantiated maltreatment allegation. Recall
scores were computed for top 10%, 30%, 50%, and 100%. Given that Bridges hospitals can give
approximately one-third of mothers a further assessment (i.e., bedside screening) on average, the
focus was on the recall score at the top 30%.
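As a concrete illustration, a minimal sketch of the recall-at-k computation follows, with k expressed as a fraction (e.g., 0.30 for the top 30%); the function and array names are hypothetical.

```python
import numpy as np

def recall_at_k(y_true: np.ndarray, risk_scores: np.ndarray, k: float) -> float:
    """Share of observed substantiations captured within the top k fraction of risk scores."""
    n_top = int(np.ceil(k * len(risk_scores)))
    top_idx = np.argsort(risk_scores)[::-1][:n_top]   # highest-risk children first
    return y_true[top_idx].sum() / y_true.sum()
```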
Baseline Model
To examine model performances, the machine learning models were compared to the
baseline model. The baseline model in this study was defined as the existing Bridges Network
pre-screening tool. It should be noted that the baseline model was designed to assess a qualitatively defined outcome: parents' ability to achieve and maintain a healthy and satisfying life for their families (Design Options for Home Visiting Evaluation, 2011). Because it was not optimized for CPS substantiation, the baseline model may not show its best performance on the target feature used in this study; rather, the current Bridges pre-screening risk assessment tool aims to measure a qualitatively defined notion of children's well-being. In addition, model performances were compared only for the mothers screened in for a bedside screening due to data availability.
Results
Descriptive Analysis
Between 2011 and 2016, 245,209 children were born in Orange County, and 53.9%
(N=132,216) of them were born in Bridges hospitals. Of the children who were born in Bridges hospitals, 2.7% had a substantiated maltreatment allegation within the first 3 years of their life in
Orange County. Children who had a substantiated maltreatment allegation were less likely to be
Asian or White, but more likely to be Hispanic, Black, or other races and ethnicities. Children
with a substantiated allegation were more likely to be born to mothers under age 20 and without
high school education. Mothers of these children were more likely to be born in the US or
Mexico, less likely to initiate their prenatal care during the first trimester, more likely to make
their birth payment with public insurance, and less likely to have paternity established at birth.
Table 3.2: Maternal and Birth Characteristics and CPS Outcomes of Children

Characteristic — All Births in OC (N=245,209) col% | Births in Bridges Hospitals* (N=132,216) col% | Substantiated for Maltreatment (N=3,608) col%
CPS referred — 6.7 | 7.4 | 100.0
CPS substantiated — 2.5 | 2.7 | 100.0
Maternal Race and Ethnicity
  Asian/PI — 21.2 | 15.0 | 4.0
  Black — 1.3 | 1.0 | 1.7
  Hispanic — 45.9 | 50.7 | 70.8
  Native American — 0.1 | 1.7 | 0.2
  White — 29.7 | 31.5 | 21.6
  Other/missing — 1.9 | 0.1 | 1.8
Maternal Age at Birth
  <20 — 4.4 | 5.0 | 16.0
  20-25 — 17.6 | 18.4 | 33.0
  26+ — 78.0 | 76.7 | 51.0
Maternal Education
  No high school — 15.1 | 17.9 | 36.6
  High school — 84.9 | 82.1 | 63.4
Maternal Birth Place
  US — 55.5 | 57.3 | 66.0
  Mexico — 19.5 | 22.8 | 26.3
  China — 5.4 | 3.3 | 0.5
  Vietnam — 4.7 | 1.3 | 0.4
  Other/missing — 15.0 | 15.3 | 6.8
Prenatal Care Initiation
  1st trimester — 87.2 | 90.1 | 75.8
  Late/no — 12.8 | 9.9 | 24.2
Birth Payment
  Public — 35.7 | 39.4 | 70.8
  Private/Cash — 64.3 | 60.6 | 29.2
Paternity
  Yes — 95.1 | 94.8 | 81.5
  No — 4.9 | 5.2 | 18.5

Bridges Hospitals
The Bridges hospitals showed varying numbers of births and bedside screenings between 2011 and 2016. As shown in Table 3.3, the number of births and the number of bedside screening assessments were not always positively correlated. Hospital A showed the highest number of births (N=37,040) but the lowest rate of bedside screening assessments (9.8%). The number of children who had a substantiated maltreatment allegation within the first 3 years of their life also varied across hospitals. Hospital A and Hospital D showed the lowest percentages of substantiation (1.2%), while Hospital G showed the highest percentage (5.2%).
Table 3.3: Number of Births, Bedside Screenings, and Substantiations by Hospital

Hospital — Births n (col%) | Bedside Screened n (row%) | Substantiated n (row%)
Total — 132,216 (100.0%) | 42,897 (32.4%) | 3,608 (2.7%)
Hoag Memorial Hospital (Hospital A) — 37,040 (28.0%) | 3,625 (9.8%) | 435 (1.2%)
St. Joseph Hospital (Hospital B) — 29,450 (22.3%) | 8,773 (29.8%) | 941 (3.2%)
Mission Hospital (Hospital C) — 16,709 (12.6%) | 3,711 (22.2%) | 474 (2.8%)
St. Jude Hospital (Hospital D) — 13,144 (9.9%) | 4,710 (35.8%) | 161 (1.2%)
Orange County Global Medical Center (Hospital E) — 12,160 (9.2%) | 8,098 (66.6%) | 545 (4.5%)
Anaheim Regional Medical Center (Hospital F) — 9,524 (7.2%) | 3,973 (41.7%) | 410 (4.3%)
Anaheim Global Medical Center (Hospital G) — 7,404 (5.6%) | 5,193 (70.1%) | 388 (5.2%)
South Coast Global Medical Center (Hospital H) — 6,785 (5.1%) | 4,814 (71.0%) | 254 (3.7%)

Risk Scores
Figure 3.2 shows the distributions of predicted risk scores generated by the two machine learning models. Panel (a) compares the distributions of predicted scores for all children, and panel (b) compares them among children whose maltreatment allegation was substantiated. Compared to Model 1, Model 2 identified more children with low (<0.03) predicted risk scores in the total population.
Figure 3.2: Distributions of Predicted Risk Scores
(a) Predicted risk score distributions of all children
(b) Predicted risk score distributions of children who had a substantiated maltreatment allegation within the
first 3 years of their life
Model Performance
As shown in Table 3.4, both Model 1 and Model 2 outperformed the baseline Bridges model. If Bridges hospitals keep the current share of mothers screened in for a bedside risk assessment (around 30%), the baseline model would correctly screen in 46.2% of all mothers whose children would experience maltreatment substantiation, whereas the machine learning models could identify 75.3% (Model 1) and 84.1% (Model 2), respectively. In other words, if the Bridges program used the machine learning models, it could screen in substantially (62.0% to 82.0%) more mothers whose children would have a substantiated maltreatment allegation during the next 3 years.
Table 3.4: Recall Scores at k%

Top k% — Baseline: Bridges Prescreening | Model 1: LGBM with Bridges Features | Model 2: LGBM with All Features
10 — 0.229 | 0.396 | 0.489
30 — 0.462 | 0.753 | 0.841
50 — - | 0.918 | 0.951

Figure 3.3 demonstrates the recall and precision rates across all k% for the three models. It shows that the gap in recall rates between the machine learning models and the baseline model increases at higher k%. On the other hand, the precision rates remain low across all k%; this is because precision (i.e., the percentage of true positives among those who are predicted to be positive) is constrained by the low underlying CPS substantiation rate (2.7%).

Figure 3.3: Recall and Precision Scores at k%
Model Performances by Bridges Network Hospitals
As shown in Table 3.5, the Bridges Network hospitals showed varying levels of
screened-in rates, from 9.8% to 71.0%. Not surprisingly, the recall rate of the baseline model was
positively correlated with the percentage of mothers screened in for a bedside screening. One
exception was Hospital F, whose recall rate was only 37.8% even though they conducted a
bedside screening for 41.7% of mothers. Hospitals G and H also showed low recall rates
considering their high number of mothers screened in for a bedside screening.
Both machine learning models significantly improved recall rates across all Bridges
Network hospitals at varying bedside screen-in rates. If Hospital A adopted the LightGBM models and continued to target mothers in the top 9.8% of risk scores, it could identify almost twice as many mothers whose children would have a substantiated maltreatment allegation.
Table 3.5: Recall Scores at Screen-in % by Hospital

Hospital — Screened-in % | Recall at k%: Baseline | Model 1 (Bridges features) | Model 2 (All features)
Total — 32.4% | 0.458 | 0.780 | 0.866
Hospital A — 9.8% | 0.301 | 0.513 | 0.637
Hospital B — 29.8% | 0.412 | 0.700 | 0.784
Hospital C — 22.2% | 0.432 | 0.622 | 0.728
Hospital D — 35.8% | 0.478 | 0.845 | 0.969
Hospital E — 66.6% | 0.604 | 0.873 | 0.930
Hospital F — 41.7% | 0.378 | 0.846 | 0.878
Hospital G — 70.1% | 0.642 | 0.881 | 0.961
Hospital H — 71.0% | 0.650 | 0.906 | 0.941
Figure 3.4: Recall Scores of Baseline and LightGBM models at k% by Hospital
(a) Baseline model (b) Model 1 (LightGBM with Bridges features)
(c) Model 2 (LightGBM with all features)
Note. The baseline model recall rate lines are flat beyond the bedside screening percentages (k%) due to a lack of
data.
Discussion
Using a real-world use case, this study provides an illustrative example of building a machine learning model while examining potential biases that may arise at each step of modeling. Working with the MCHN program team, a machine learning model was developed to inform and improve their risk assessment tool, the Bridges pre-screening assessment tool. This research suggests that, in some settings, machine learning can be used to promote service equity by allowing child and family service agencies to prioritize and allocate more resources to families with more complex needs.
Several findings that emerge from this study have implications for applying machine learning to child welfare with fairness in mind. First, this study underscores the importance of being guided
by child welfare domain knowledge when developing a machine learning model. A widespread
notion of machine learning is that it is a ‘black box’ because its internal logic cannot be
understood or controlled by human users (Gillingham, 2016). Although this is true to some
extent, machine learning comprises multiple steps that require human discretion. Human users
have to identify a problem, ensure that machine learning initiatives are aligned with agency priorities, define a target feature, select and operationalize predicting features, and deploy a model
in the current workflow. Each step of modeling includes a wide range of decisions that may
significantly influence the fairness of a machine learning model (Barocas & Selbst, 2016a; Dare,
2015; Suresh & Guttag, 2020). Additionally, the potential biases that may arise in the modeling
process may be further complicated by the unique nature of child welfare, such as client
vulnerability, racial disproportionality, and stigma (Coulton et al., 2015; Dare, 2015). To
adequately address and examine algorithmic biases, it is essential to have a nuanced
understanding of the service context.
Second, the results demonstrate that machine learning algorithms can help us identify
more children and families in need by better organizing the information. Compared to the current
Bridges program’s pre-screening tool (i.e., the baseline model), the machine learning models
could identify significantly more children who were observed to have a substantiated
maltreatment allegation during the first three years of their lives. Among the top 30% of risk scores, the baseline model could identify almost half (46.2%) of all children whose maltreatment allegation was substantiated. In contrast, the LightGBM model, built upon a similar set of information,
successfully identified three-quarters (75.3%) of children whose maltreatment allegations were
substantiated. This suggests that if the Bridges MCHN program used the LightGBM model and
targeted their in-person bedside screening assessment to mothers who were assessed to be in the
top 30% of the risk, they could reach out to three-quarters of mothers whose children would have
a substantiated maltreatment allegation during the next three years.
Improved prediction can lead to a more equitable outcome as it allows for prioritizing and
serving children and families whose needs are more complex. To ensure that prediction
improvement can result in enhanced service equity, it is critical to consider the current service
capacity and workflow (Drake et al., 2020). If the MCHN home visiting agencies cannot serve the needs of families identified to be at high risk because of limited service capacity, that would raise another ethical question: whether it is ethical to have families wait for the service after being identified to be at high risk of CPS reporting and substantiation (Vaithianathan et al., 2017).
Additionally, it should be noted that not all decision-making can benefit from a machine learning approach. Machine learning worked well in this use case partly because the task was a relatively simple classification. This study modeled a clearly defined binary outcome (i.e., whether or not a child had a substantiated maltreatment allegation during the first 3 years of the child's life in Orange County) using a structured dataset extracted from administrative record sources.
achieve a similar level of accuracy at the same speed as the machine learning models. On the
contrary, for more nuanced decisions, such as diagnosing mild depression based on a client’s
body language and narratives, human decision-makers, particularly trained practitioners, may
perform better than machine learning models. Understanding the service goals and the strengths
and weaknesses of machine learning algorithms is essential to using the tool appropriately.
Finally, this study underscores the necessity of advancing the richness and size of data in
the child welfare arena. As shown in Table 3.4, Model 2 with additional features outperformed
Model 1; Model 2 improved the recall scores by 23.4% for the top 10% and 11.7% for the top
30%. Summarized differently, just by adding a few more features (i.e., prenatal care initiation,
maternal residence county, maternal smoking, birth hospital), the same machine learning
algorithm could identify 10-20% more mothers and children who were observed to have a
substantiated maltreatment allegation.
Having more data can also be beneficial for evaluating and mitigating algorithmic biases.
In this research, the target outcome was defined as substantiation of a CPS report during the first
three years of a child’s life. While a CPS substantiation could be a proxy for families’ needs to
some extent, this clearly falls short of capturing various aspects of families’ circumstances. To
improve the assessment tool, more information is needed to operationalize families’ needs and
willingness for service. One way to augment the current child welfare database would be linking
records across public service systems (Putnam-Hornstein et al., 2020). Through record linkage, it
is possible to transform agency-centered data that have been kept in silos into client-centered
data.
Limitations
Findings from this study should be considered in light of study limitations. First, this
study focused on selected algorithmic biases that may arise in machine learning development; therefore, the list of biases addressed and examined in this study may not be exhaustive. Second, the anonymized linked dataset included only those mothers who received a bedside screening, not
all of those who gave birth at the Bridges hospitals. This resulted in comparing the machine
learning models to the baseline model only for those who fall within the top 30% of risk scores.
Third, our machine learning models were trained using CPS substantiation as a proxy of
families’ needs. As mentioned in the discussion, this may not capture all families in need.
Additional information would be helpful for accurately assessing families.
CHAPTER 4: (Study 3) Intersectional Fairness in Machine Learning Applied to Children
and Family Services
Introduction
As machine learning has been increasingly adopted to inform decision-making in child
welfare, its ethics and fairness have received growing attention. A primary focus of child welfare
literature has been on discussing related ethical issues, particularly the impacts of an algorithmic
intervention on vulnerable communities (Brown et al., 2019; Dare, 2013; Dare & Gambrill,
2017; Drake & Jonson-Reid, 2019). Yet, limited empirical work has been done on evaluating
fairness in machine learning applied to real-world child welfare problems and discussing related
practical implications for child welfare service (Ahn et al., 2021; Chouldechova et al., 2018).
Application of Fairness Measures
The rapidly growing body of machine learning literature has proposed a wide range of
fairness definitions and metrics (Mehrabi et al., 2019). One of the notions of machine learning
fairness that has gained the most prominence over the decade is statistical parity, which requires
a model to show similar performance across membership groups defined by individual attributes,
such as race, gender, and class (Barocas et al., 2019; Barocas & Selbst, 2016b; Corbett-Davies &
Goel, 2018). While statistical notions of fairness may be useful for testing whether a machine
learning model treats individuals differently based on their attributes, they are limited to a
narrow definition of fairness. Even if a model is found to be ‘fair’ based on a statistical definition
of machine learning fairness, it does not necessarily mean that the model would lead to more equitable outcomes (Hu & Chen, 2020; Lundgard, 2020; McCradden et al., 2020). Rather, it is more accurate to say that the model satisfies at least one aspect of statistical parity in terms of model performance.
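In its simplest form, for a binary predictor and a single protected attribute, statistical (or demographic) parity can be written as the standard condition below; this formulation is included for illustration and is not quoted from the studies cited above.

$$P(\hat{Y} = 1 \mid A = a) = P(\hat{Y} = 1 \mid A = b) \quad \text{for all groups } a, b.$$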
Given these properties of statistical definitions of machine learning fairness, it is essential to
use fairness measures appropriately considering how the service is conceptualized, evaluated,
and executed. Depending on service and its direct and indirect impacts on children and families,
fairness should be carefully defined, and measures need to be chosen accordingly. To assist
practitioners with their use of fairness measures, some researchers have provided guidance on
selecting and applying fairness definitions and metrics. Saleiro and colleagues (2019) designed a
“fairness tree,” which navigates the most relevant bias metric given the characteristics of
decisions and the goals of decision-makers. A few software packages and toolkits have also been
provided to assist practitioners with identifying and mitigating machine learning bias using
algorithmic approaches (Bellamy et al., 2018; Galhotra et al., 2017).
Practical Implications of Fairness Analysis
Despite those efforts to enhance the application of fairness measures to real-world
problems, limited empirical work has been done using those measures. In child welfare, only a
few studies have attempted to analyze fairness using statistical fairness measures. Chouldechova
and colleagues (2018) examined the fairness in a machine learning model that assessed the risk
of foster care placement among children reported to the child protection system. They examined
the fairness of the machine learning model using a few statistical parity measures. Ahn and
colleagues (2021) also conducted a fairness analysis on the machine learning model developed to
assess the risk of exiting foster care without having permanency. While both studies have shown
the potential for examining machine learning fairness by using statistically defined fairness
notions, they fell short of discussing the practical implications of their fairness analyses. There is
much to be explored regarding defining fairness, selecting appropriate measures, and linking the
analytic results to practice.
The recent work of Drake and colleagues (2020) proposed a practical framework to guide
the ethical application of algorithmic predictive risk modeling in child welfare. They suggest
three core questions to be considered: 1) whether the tool improves accuracy in general and for
subgroups; 2) whether the tool is ethically equivalent or superior to current practice; and 3)
whether necessary evaluative and implementation procedures are established before, during, and after the introduction of the tool. Their framework guides thinking about ethical applications of machine learning approaches by delineating some key areas that warrant rigorous examination.
The first two questions are particularly related to examining the fairness of machine learning
applied to child welfare.
Intersectional Fairness
Statistical definitions of fairness have often been used with a focus on a single attribute,
such as race or gender. This single-axis framework can be misleading, particularly in child
welfare, because many families experience oppression in different ways depending on the
overlapping of their attributes. Therefore, heterogeneity in familial circumstances should be
carefully considered and incorporated into evaluating fairness in machine learning applied to
child welfare. One way to approach this is using the idea of intersectionality. Crenshaw (1989)
introduced intersectionality as a lens for examining societal unfairness. Intersectionality
emphasizes that systems of oppression built into society lead to systematic disadvantages along
intersecting dimensions, which include not only gender but also race, nationality, and
socioeconomic class.
Previous research has described efforts to incorporate Crenshaw's (1989) idea of
intersectionality into understanding fairness in machine learning. According to Foulds and
colleagues (2019), intersectional fairness is defined as “regardless of the combination of
protected attributes, the probabilities of the outcomes will be similar.” A few studies have
demonstrated the importance of considering intersectional fairness in machine learning
applications (Foulds et al., 2019b; Morina et al., 2020). For example, Buolamwini and Gebru
(2018) reported that several commercial facial recognition programs that aim to classify gender
suffer from substantial intersectional disparities in their accuracy rates. Their study demonstrated that those tools showed lower accuracy in classifying gender for darker-skinned women than for darker-skinned men. Their findings highlighted the need to investigate intersectional
fairness because gender and skin color alone may fall short of grappling with the complete
picture of the distribution of misclassifications. Similarly, in child welfare, intersections of
sensitive attributes should be carefully considered, given that a single-axis framework may easily
overlook the heterogeneity of familial circumstances.
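One common formalization in this spirit is the ε-differential fairness of Foulds and colleagues (2019); the notation below is a standard rendering rather than a formula quoted from this dissertation. For every pair of intersectional groups $s_i, s_j$ defined by combinations of protected attributes, and for every outcome $y$ of a model $M$,

$$e^{-\epsilon} \;\le\; \frac{P(M(X) = y \mid s_i)}{P(M(X) = y \mid s_j)} \;\le\; e^{\epsilon},$$

so that a smaller ε indicates that the model's outcome probabilities are more similar across all intersectional subgroups.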
Study 2: Application of Machine Learning to Children and Families Services
Study 2 was informed by the First 5 Orange County, dedicated to improving the health
outcomes of children under age 5. One of their primary prevention programs is the Bridges
Maternal Child Health Network (MCHN), which offers home visiting services to families with a
newborn. The Bridges MCHN collaborates with ten local hospitals and community agencies to
ensure that parents have support and resources for a healthy start at birth. Among the ten
hospitals, eight hospitals were included in this study.
Using the MCHN program as a real-world use case, Study 2 demonstrated how a
machine learning model could inform decision-making in child welfare. The model triaged
families with a newborn for a further in-person assessment (‘bedside screening’) that can lead to
a home visiting service referral. In Study 2, the target feature was defined as whether a child was
reported for maltreatment and the allegation was substantiated during the first 3 years of the
child’s life by the CPS in Orange County. Within the top 30% of the risk, the Bridges pre-
screening tool could identify 46.2% of children whose allegation was substantiated. Machine
learning models could identify significantly more children who were observed to have
maltreatment substantiation. A machine learning model (Model 1) that was trained using a
similar set of information that is currently used by the Bridges pre-screening tool could identify
75.3% of children who experienced substantiation. When additional information about caregivers
was included (Model 2), a machine learning model could identify 84.1% of those children who
had maltreatment substantiation. This suggests that the Bridges Maternal Child Health Network
(MCHN) program could better allocate resources among families based on accurately assessed
risk and improve service equity by using a machine learning approach.
Current Study
Building upon Study 2, this study provides an illustrative example of evaluating fairness
in machine learning applied to child and family services. To understand the impacts of an algorithmic
decision aid on service equity, this study examines whether the machine learning model
developed in Study 2 treats families differently depending on their attributes rather than their
risk. In Study 2, two machine learning models were developed: Model 1 was trained using a few
maternal and birth characteristics from the Bridges pre-screening tool, and Model 2 used parental
characteristics in addition to the features used by Model 1. Of those two models, this study
focuses on Model 1 because it is more realistic and accessible for the agency as this model does
not require additional effort to acquire information from other data sources for operationalization
and implementation.
Firstly, the model performances were computed and compared across varying groups
defined by sociodemographic attributes. Then, the associations between those attributes and the
outcome of the false determination were examined focusing on children who were observed to
experience substantiated maltreatment. Informed by the results, attributes related to the
heightened risk of false classification were identified and examined to test whether the model
fairly treats a group of children who fall at the intersection of those attributes. Additionally, the
practical implications and limitations of the machine learning model were discussed, focusing on
fairness and service equity given the service context.
Methods
Data
This study used an anonymized linked dataset consisting of three data sources: 1) the
MCHN program records from 2011-2016; 2) California vital birth records from 2011-2016; and
3) CPS records from 2011-2019. The MCHN program data included service engagement
information of mothers who received a bedside screening assessment; this is 32.4% of all
mothers who gave birth at the Bridges hospitals. The vital birth records included the
characteristics of the child, parents, and birth. They were accessed through the California
Department of Public Health, following review and approval by the state’s Vital Statistics
Advisory Committee (California Department of Public Health, 2021). Lastly, CPS records,
extracted from California’s Child Welfare Services / Case Management System, included client
information with engagement and service history (e.g., referral dates, maltreatment allegations,
disposition types). The data were securely hosted at the Children’s Data Network at the
University of Southern California. The use of records was governed by an active data-sharing
agreement with the California Department of Social Services and strict data security protocols
falling under university (USC IRB UP-13-00455) and state (CA CPHS 13-10-1366) human
subject approvals.
Study Population
This is a population-based study that includes all children (N=132,216) born at eight
Bridges hospitals in Orange County between 2011 and 2016.
Dependent Variable
The dependent variable of this study was defined considering the MCHN program’s
service goals and context. The MCHN program strives to reach out to as many families in need
as possible so that those at high risk can be assessed and referred to community-based voluntary
home visiting service programs. The machine learning model is an initial triage tool to screen for
families who should receive a further assessment (i.e., in-person bedside screening). The model
assessed families’ needs for home visiting using a CPS outcome (i.e., maltreatment
substantiation during the first three years of life) as a proxy for familial circumstances.
The potential harms of this model may surface primarily in two cases: false positives and
false negatives. Firstly, being falsely determined to be at high risk does not seem to expose
families to significant harm because home visiting service is a preventative and supportive
service that can improve the outcomes of children and families (Eckenrode, Campa, Luckey,
Henderson, et al., 2010; Rubin, O’Reilly, Luan, et al., 2011; Yun, Chesnokova, Matone, Luan,
Localio, et al., 2014). When it comes to the potential surveillance bias, researchers have
documented that the receipt of home visiting service does not significantly increase the risk of
CPS involvement (Drake et al., 2017; Chaffin & Bard, 2006). On the other hand, being falsely
determined to be at low risk (i.e., false-negative) could be harmful since families would be less
likely to be offered and engage in home visiting services despite their unmet needs. If service
providers failed to provide support for mothers who would end up with a substantiated allegation
of child maltreatment, they would lose an opportunity for prevention at a critical point in family
development.
Therefore, this study focuses on the false-negative cases that the Bridges program would
most like to avoid and where fine-grained data from multiple hospitals serving different
communities in Orange County could be most useful. To examine machine learning fairness
focusing on service equity, the outcome of interest was defined as whether a child was falsely
predicted to be at low risk (bottom 70%) although he or she was later reported and substantiated
for alleged maltreatment within the first three years of life in Orange County. This was binarily
coded (false classification: yes=1, no=0).
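As an illustration of this coding, a minimal sketch is shown below. It assumes a hypothetical data frame with a continuous risk_score column and a binary substantiated indicator, with the bottom 70% of predicted scores treated as low risk.

```python
import pandas as pd

def code_false_negative(df: pd.DataFrame) -> pd.Series:
    """1 if a child was predicted to be at low risk (bottom 70% of scores) but later
    had a substantiated maltreatment allegation; 0 otherwise."""
    cutoff = df["risk_score"].quantile(0.70)      # top 30% are screened in
    low_risk = df["risk_score"] < cutoff
    return (low_risk & (df["substantiated"] == 1)).astype(int)
```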
Independent Variables
Maternal and birth characteristics used to build the Bridges pre-screening tool and the
machine learning model were included in this study as independent variables. Maternal race was categorically coded as Asian/Pacific Islander, Black, White, and other/missing. Mothers of
any race who defined themselves as of Hispanic Origin were coded as Hispanic. Maternal age at
birth (<20, 20-25, >25), maternal education attainment (high school or more, less than high
school), and maternal birthplace (US, Mexico, and other/missing) were also categorically coded.
Birth characteristics were categorically coded: initiation of prenatal care (1st trimester, late or no prenatal care), birth payment method (private insurance or cash, public insurance), and paternity established at birth (yes, no).
Analysis
The distribution of maternal and birth characteristics of children was descriptively
examined. We then stratified the study population to compare the characteristics of children who
were accurately and falsely determined by the baseline model and the machine learning model.
Additionally, the performances of the two models were compared for each subgroup defined by maternal and birth characteristics using a metric that divides the false-negative rate of the machine learning model by that of the baseline model. To examine whether the likelihood of being falsely identified to be at low risk varied by maternal and birth characteristics, a logistic regression model was used. Then, intersections of attributes that may indicate a higher risk of false-negative determination were identified. Analyses were completed in Stata version 17.0 and Python 3.7.4 using pandas, NumPy, and scikit-learn packages.
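For illustration, the subgroup false-negative rates and the (B)/(A) ratio described above could be computed along the lines sketched below; the column names (substantiated, baseline_high_risk, ml_high_risk, and the grouping column) are hypothetical stand-ins for the analytic variables.

```python
import pandas as pd

def false_negative_rate(df: pd.DataFrame, pred_col: str) -> float:
    """Among children with a substantiated allegation, the share predicted to be at low risk."""
    substantiated = df[df["substantiated"] == 1]
    return (substantiated[pred_col] == 0).mean()

def fn_ratio_by_group(df: pd.DataFrame, group_col: str) -> pd.DataFrame:
    """Ratio (B)/(A) of the machine learning model's false-negative rate to the baseline's."""
    rows = []
    for group, sub in df.groupby(group_col):
        a = false_negative_rate(sub, "baseline_high_risk")   # (A) baseline model
        b = false_negative_rate(sub, "ml_high_risk")         # (B) machine learning model
        rows.append({group_col: group, "fn_baseline": a, "fn_ml": b, "ratio_B_over_A": b / a})
    return pd.DataFrame(rows)
```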
Results
Among children (N=132,216) born at the eight Bridges hospitals between 2011-2016,
2.7% were reported for alleged maltreatment that was substantiated during the first three years of
the child’s life by the CPS in Orange County. Using the Bridges pre-screening tool, the program
successfully identified 47.1% of children whose maltreatment allegation was substantiated
within the top 32.4% of children at risk. In contrast, the machine learning model identified
77.9% of children and their families for whom allegations were substantiated.
Table 4.1 demonstrates the distributions of maternal and birth characteristics of children
who were born at the Bridges hospitals between 2011-2016. The table also presents a subset of
those children whose maltreatment allegation was substantiated, and these children were
stratified by the prediction results of the two models (whether they were accurately predicted to
be at high risk (true positives) or falsely determined to be at low risk (false negatives)). The
results of the Chi-Square test show that children whose maltreatment allegation was
substantiated have distinctive characteristics compared to their counterpart children whose
maltreatment allegation was not substantiated.
Focusing on children whose allegations were substantiated, Table 4.2 displays the row
percentages of children falsely predicted to be at low risk by the baseline model and the machine learning model. Overall, the machine learning model successfully decreased the false-negative
rates by 58%, from 52.9% to 22.1%. To compare false-negative rates between the baseline model
and the machine learning model, a metric was proposed: (B)/(A) indicates the ratio of the false-
negative rate of the machine learning model to that of the baseline model. Compared to the
overall improvement ((B)/(A)=0.42), the model performed better for Black ((B)/(A)=0.19)
mothers than for White ((B)/(A)=0.45) and Asian ((B)/(A)=0.86) mothers. The machine learning
model was less likely to decrease the false-negative rates for mothers who were 26 years old or
older, those born outside the US (Mexico or other/missing), those who received prenatal care
during the first trimester, those who paid for delivery with private insurance or cash, and those
who had paternity established at birth.
Table 4.1: Descriptive Summary of Mothers who Gave Birth between 2011-2016
Table 4.2: False-negative Rates between the Baseline Model and Machine Learning Model
Informed by the results shown in Table 4.2, a logistic regression analysis was conducted
to examine the associations between false determination and maternal and birth characteristics
among children who experienced substantiated maltreatment. The results showed that among the
children with a substantiation, those born to mothers who were 26 years or older were more than
four times more likely to be falsely predicted (OR=4.66, 99% CI [3.4, 6.5]). Also, children of mothers who were born in Mexico or other countries were approximately ten times more likely to be falsely predicted to be at low risk (OR=11.24, 99% CI [7.3, 17.3] and OR=9.04, 99% CI [5.3, 15.5], respectively) compared to their counterpart children of US-born mothers.
Table 4.3: Multivariate Analysis of False-Negatives among Children with Substantiation
Table 4.4 shows the distributions of maternal and birth characteristics among children
born to Hispanic mothers (n=67,097), both US-born mothers (47.6%) and foreign-born mothers
(52.4%). Children with substantiation showed significantly different distributions of maternal
and birth characteristics depending on maternal nativity. Compared to US-born mothers, foreign-born mothers were more likely to have given birth at older ages, to have initiated prenatal care during the first trimester, to have paid for the birth with public insurance, and to have had paternity established at birth. Additionally, foreign-born mothers were less likely to have graduated from high school.
Table 4.4: Descriptive Summary of Children Born to Hispanic Mothers
Note. The distributions of maternal and birth characteristics among children were significantly different by maternal
nativity (p<.0001).
Figure 4.1 displays the distributions of predicted risk scores generated by the machine
learning model among children born to Hispanic mothers. According to Figures 4.1-(a) and 4.1-
(b), the predicted risk scores among children whose allegation was substantiated tend to be
higher than those of all Hispanic children. Figures 4.1-(c) and 4.1-(d) show that the distributions
of predicted risk scores among children with substantiated allegations differ by the maternal
nativity; children born to US-born mothers showed higher predicted risk scores than their
counterpart children whose mothers were born outside the US.
Figure 4.1: Predicted Risk Scores among Children Born to Hispanic Mothers
(a) All children; (b) Children with substantiation; (c) Children born to US-born mothers whose allegation was substantiated; (d) Children born to foreign-born mothers whose allegation was substantiated

Figure 4.2 compares false-negative rates at varying predicted risk score cut-offs. Without considering maternal nativity (Figure 4.2-(a)), children born to Hispanic and White mothers showed similar false-negative rates at risk cutoffs of 0.6 or higher. On the contrary, when the intersectionality of maternal ethnicity and nativity is considered (Figure 4.2-(b)), the false-negative rates among children born to foreign-born Hispanic mothers were significantly higher than those of their counterpart children born to US-born Hispanic mothers and to White mothers at cutoffs of 0.6 or higher.

Figure 4.2: False-Negative Rates among Children Born to Hispanic and White Mothers
(a) Without considering maternal nativity among Hispanic mothers; (b) Considering maternal nativity among Hispanic mothers
Discussion
Drawing upon the machine learning model developed in Study 2, this study provides an
empirical example of evaluating fairness in machine learning applied to children and family
services. First, this study identified relevant fairness measures considering the service context
and goals of the Bridges MCHN program (Drake et al., 2020; Saleiro et al., 2019). Since the
program wanted to reach out to as many children and families in need as possible, this study
examined the false-negative rates across subgroups defined by maternal and birth characteristics.
Then, the study developed a range of estimates of false-negative risk by maternal and birth
characteristics. Several key findings emerged and are discussed below.
The findings document that the machine learning model generated significantly lower
false-negative rates across all subgroups defined by maternal and birth characteristics compared
to the Bridges pre-screening tool. In other words, by using the machine learning model we can
identify more children who would have substantiated maltreatment allegations and connect them
to the in-person bedside screening assessment for potential home visiting service referral.
In particular, the machine learning model was found to effectively decrease the false-negative rates for children born to mothers who were Black, were under age 20 years, had not graduated from high school, paid for the birth with public insurance, and had no paternity established. Given that those characteristics have been associated with a heightened risk of CPS involvement (Baldwin et al., 2020; Putnam-Hornstein et al., 2021), identifying more children and families in need among those who are exposed to those risk factors can contribute to service equity.
Additionally, the findings underscore the importance of considering the heterogeneity of
familial circumstances when using a machine learning approach. Unlike previous studies whose
primary focus has been on a single attribute, race/ethnicity (Ahn et al., 2021; Chouldechova et
al., 2018), this study analyzed machine learning fairness considering an array of maternal and
birth characteristics. Although there remains more to be explored in terms of its applications and
implications for children and family services, the notion of intersectional fairness is an important
one. This allows for a more nuanced understanding of algorithmic fairness and service equity
when adopting a machine learning approach to inform decision-making in specific service
sectors, including children and family services.
Finally, the findings show that children born to foreign-born Hispanic mothers and who
had a substantiated allegation were more likely to experience false determination compared to
their counterpart children who were born to US-born mothers. When maternal race/ethnicity
alone was considered, children born to Hispanic mothers, in general, were significantly less
likely to be falsely predicted to be at low risk compared to children born to White mothers.
However, when intersectionality of maternal nativity and race/ethnicity was considered, children
born to foreign-born Hispanic mothers showed significantly higher false-negative rates compared
to their counterpart children born to US-born Hispanic mothers. In part, this can be attributed to
the fact that the characteristics of children born to foreign-born Hispanic mothers are
significantly different from those of children born to US-born Hispanic mothers. Consistent with
prior child maltreatment literature that has documented a lower risk of CPS involvement among
children born to foreign-born Hispanic mothers (i.e., “Hispanic paradox”) (Johnson-Motoyama
et al., 2015; Putnam-Hornstein et al., 2013), this study also finds that foreign-born Hispanic
mothers have distinctive characteristics compared to their counterpart mothers born in the US.
Given that machine learning algorithms model risk by prioritizing larger groups over smaller
groups, the unique characteristics of this small group of children who were born to foreign-born
Hispanic mothers could have led to the heightened false-negative rate for the group. Summarized
differently, the machine learning model did not perform well for children born to foreign-born
Hispanic mothers possibly because the way these children’s maternal and birth characteristics
were associated with the risk of substantiation was significantly different from the way other
children’s characteristics were related to the risk.
Interpreting and managing these intersectional disparities should be approached carefully, considering their direct and indirect impacts on marginalized communities. Previous studies have
proposed varying approaches to resolve disparities in error rates. Bellamy and colleagues (2018)
proposed a list of algorithmic approaches that can address and potentially mitigate biases; these
algorithms intervene in various stages of modeling—pre-processing, in-processing, and post-
processing—to mitigate biases by editing feature values and reweighing model training
examples. Another more straightforward way to mitigate different false-negative rates is setting
different thresholds for subgroups (Kleinberg et al., 2018). For example, we can decrease the
proportion of children falsely predicted to be at low-risk (or false-negative rates) among children
born to foreign-born Hispanic mothers by setting their risk threshold lower than that of other
subgroups.
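To make that mechanism concrete, the sketch below shows how group-specific screen-in thresholds could be applied; the subgroup label and cutoff values are hypothetical, and, as discussed next, such an adjustment addresses only the technical side of the problem.

```python
import pandas as pd

def screen_in(df: pd.DataFrame, default_cutoff: float, group_cutoffs: dict) -> pd.Series:
    """Flag children for bedside screening, using a lower risk cutoff for subgroups
    with elevated false-negative rates."""
    cutoffs = df["subgroup"].map(group_cutoffs).fillna(default_cutoff)
    return (df["risk_score"] >= cutoffs).astype(int)

# Example (hypothetical values):
# flags = screen_in(scores, default_cutoff=0.6,
#                   group_cutoffs={"foreign_born_hispanic": 0.4})
```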
However, framing fairness as a technical problem that can be quickly resolved by using
more advanced or complicated computations while sidestepping more profound questions about
power and equity can be ethically problematic (Hoffmann, 2019; West, 2020). Instead of
algorithmic approaches, Goel (2020) suggested an alternative, consequentialist perspective to
address and manage the limitations of predictive machine learning models, emphasizing that
prediction is not equal to policy. Unlike machine learning, which often assumes a closed, simple system, delivering services requires a broader perspective involving an array of decisions. One way to approach intersectional disparities is to engage with families differently, considering their needs and circumstances.
Limitations
This study has documented how intersectional machine learning fairness can be
addressed and considered in the context of children and family services. The findings document
substantial intersectional disparities in false-negative rates when maternal race/ethnicity and
nativity are considered together. However, the findings of this study should be interpreted in light of several limitations. Because this study used the machine learning model developed in Study 2, it also shares the limitations described in Study 2. In addition,
since this study focused on a use case of the MCHN program by the First 5 Orange County, the
results may not be generalizable to other cases. Moreover, this study is limited to examining the
intersection of maternal race/ethnicity and nativity, leaving other intersectional disparities largely
unknown.
CHAPTER 5: Implications and Future Directions
Major Findings
This dissertation advances the understanding of applying machine learning to child
welfare in a fair and equitable manner by using qualitative and quantitative analyses. Organized
as a three-paper project, this dissertation began by exploring decision-making experiences among
child welfare workers and their perspectives on fairness. Chapters 3 and 4 assess a real-world use
case, providing an illustrative example of building and evaluating a machine learning model
focusing on fairness. Key findings from each chapter are summarized below.
Chapter 2 (Study 1) used focus groups to explore child welfare workers’ decision-making
experiences in the child protection system and their perspectives on fairness. This study involved
child welfare workers who had worked at a public child welfare agency in California for at least
six months. Several important themes emerged from the thematic analysis: (1) roles and responsibilities of child welfare workers in both emergency response and continuing services; (2) unique characteristics of decision-making in child welfare; (3) factors that complicate decision-making; (4) child welfare workers' perspectives on fairness; (5) threats to fairness; and (6) suggestions to improve fairness in decision-making. Participating child welfare workers shared
how their decision-making involves an extensive range of information beyond the scope of data
currently collected and managed by the child welfare systems. In terms of fairness, they said that
“promoting equitable outcomes among children and families whose circumstances may widely
vary” was the best definition of fairness from their point of view. The findings underscore the
complexity of decision-making in child welfare and the importance of carefully considering child
welfare specific values, context, restrictions, and workflow when using machine learning.
Chapter 3 (Study 2) developed machine learning models to identify families with a high need for home visiting services, in response to a question posed by the First 5 Orange County's Bridges Maternal Child Health Network (MCHN) program. The chapter provided a thorough literature
review on potential biases that may emerge when applying machine learning to children and
family services. Examining relevant algorithmic biases, two machine learning models were
developed using maltreatment substantiation during the first three years of the child’s life as a
proxy measure for the family’s need for home visiting service. Model 1 was trained on the
information currently used by the MCHN pre-screening program, and Model 2 included
additional indicators of familial circumstances derived from birth records. Within the top 30% of risk scores, the Bridges pre-screening tool (the baseline model) identified 46.2% of children who were observed to experience substantiated allegations of child maltreatment. The performances of both Models 1 and 2 were substantially improved compared
to the Bridges pre-screening tool, with Model 1 identifying 75.3% and Model 2 identifying
84.1% of children whose allegations were substantiated. Although the eight Bridges hospitals
differed in respect to the number of births, families who received bedside screening, and children
whose allegations were substantiated, the results showed that all of the hospitals could
significantly improve their recall rates by adopting a machine learning approach.
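To make the evaluation behind these comparisons concrete, the sketch below shows how a gradient boosting classifier (LightGBM appears in this dissertation’s references) might be trained on birth-record-style indicators and scored by recall among the top 30% of children ranked by predicted risk. This is a minimal illustration under stated assumptions, not the study’s actual pipeline: the synthetic data, feature names, and model settings are placeholders.

```python
# Minimal sketch, not the study's pipeline: train a gradient boosting classifier
# on synthetic birth-record-style features and report recall among the top 30%
# of children ranked by predicted risk. All data and names are placeholders.
import numpy as np
import pandas as pd
from lightgbm import LGBMClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 20_000
births = pd.DataFrame({
    "maternal_age": rng.integers(15, 45, n),
    "public_insurance": rng.integers(0, 2, n),
    "paternity_established": rng.integers(0, 2, n),
    "late_prenatal_care": rng.integers(0, 2, n),
})
# Synthetic proxy outcome loosely tied to the features (illustration only).
logit = (-3.0 + 0.8 * births["public_insurance"]
         - 0.05 * (births["maternal_age"] - 28)
         - 0.7 * births["paternity_established"]
         + 0.6 * births["late_prenatal_care"])
births["substantiated_by_age_3"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

def recall_at_top_k(y_true: np.ndarray, risk_scores: np.ndarray, k: float = 0.30) -> float:
    """Share of children with the outcome who fall in the top-k fraction of predicted risk."""
    cutoff = np.quantile(risk_scores, 1 - k)
    flagged = risk_scores >= cutoff
    return float((flagged & (y_true == 1)).sum() / (y_true == 1).sum())

X = births.drop(columns="substantiated_by_age_3")
y = births["substantiated_by_age_3"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

model = LGBMClassifier(n_estimators=300, learning_rate=0.05)
model.fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]
print(f"Recall among top 30% highest-risk children: {recall_at_top_k(y_test.to_numpy(), scores):.3f}")
```

Holding the flagged share fixed at 30% is what makes the baseline tool and the two models directly comparable on how many eventually substantiated children each approach would have reached.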
Chapter 4 (Study 3) evaluated one of the machine learning models (Model 1) with respect to intersectional fairness. Drawing upon the MCHN program’s service goals and priorities, the study focused on children who had been falsely predicted to be at low risk despite having a substantiated maltreatment allegation during the first three years of life. The findings documented that the model performed differently across subgroups defined by maternal and birth characteristics. The model was more effective at identifying high-risk children among those born to mothers who were Black, were under age 20 at the time of birth, and had no paternity established. In contrast, the model failed to capture at-risk children born to mothers who were themselves born outside the US. Given that most foreign-born mothers were Hispanic, a descriptive analysis was conducted focusing on children born to Hispanic mothers.
The findings suggested that children born to foreign-born Hispanic mothers have significantly different distributions of maternal and birth characteristics than their peers born to US-born Hispanic mothers, regardless of their CPS substantiation outcomes. Informed by this finding, intersectional fairness was examined with respect to both maternal race/ethnicity and nativity, and substantial intersectional disparities were observed in false-negative rates. When the risk score cutoff was 0.6 or higher, children born to foreign-born Hispanic mothers showed significantly higher false-negative rates than children born to White mothers and children born to US-born Hispanic mothers.
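A minimal sketch of this kind of intersectional audit is shown below: false-negative rates computed within subgroups defined jointly by maternal race/ethnicity and nativity at a chosen risk-score cutoff (0.6 in the summary above). The evaluation table, column names, and toy values are hypothetical assumptions for illustration; the study’s analysis used its own held-out birth cohort.

```python
# Minimal sketch, assuming a hypothetical evaluation table with one row per
# child: predicted risk score, substantiation outcome, and maternal
# race/ethnicity and nativity. FNR = share of children with a substantiated
# allegation whose predicted risk fell below the cutoff.
import pandas as pd

def false_negative_rates(results: pd.DataFrame, cutoff: float = 0.6) -> pd.Series:
    positives = results[results["substantiated"] == 1].copy()
    positives["missed"] = positives["risk_score"] < cutoff
    return positives.groupby(["race_ethnicity", "foreign_born"])["missed"].mean()

# Toy example only; values are made up for illustration.
toy = pd.DataFrame({
    "risk_score":     [0.82, 0.41, 0.73, 0.35, 0.55, 0.91, 0.48, 0.66],
    "substantiated":  [1, 1, 1, 1, 1, 1, 0, 1],
    "race_ethnicity": ["Hispanic", "Hispanic", "White", "Hispanic",
                       "Hispanic", "Black", "White", "White"],
    "foreign_born":   [0, 1, 0, 1, 0, 0, 0, 0],
})
print(false_negative_rates(toy, cutoff=0.6))
```

Comparing the resulting rates against a reference group, such as children born to US-born White mothers, is one way to express the disparities summarized above as an auditable quantity.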
Implications for Policy and Practice
This work has several important implications for policy planning and service
coordination targeting children and families in need. Firstly, machine learning approaches can help improve practice in child and family service agencies by allowing for more timely service engagement and more equitable access to services. This research demonstrated that local service agencies can leverage machine learning to inform their decision-making without additional effort to obtain data from other sources, because machine learning algorithms can make fuller use of the data agencies already collect and manage. Moreover, machine learning offers insights into equitable resource allocation by proactively identifying and prioritizing children and families with more complex needs who may benefit from voluntary services such as home visiting programs. While machine learning may be useful for informing decision-making in a wide range of child and family service settings, its use should be weighed carefully against the problem that needs to be solved. For example, a classification algorithm like the one used in this study is well suited to predictive risk assessment problems with a clearly and quantitatively defined outcome, but the same algorithm may not work for more nuanced decision-making that involves qualitative information.
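As an illustration of the prioritization idea described above, the sketch below converts predicted risk scores into a ranked outreach list for a home visiting program with limited capacity. The table, scores, and capacity figure are hypothetical; this is not the MCHN program’s actual workflow.

```python
# Minimal sketch, assuming model scores have already been attached to each
# screened newborn. Families are ranked by predicted risk and the top slots
# (program capacity) are referred for voluntary outreach.
import pandas as pd

def prioritize_referrals(screened: pd.DataFrame, capacity: int) -> pd.DataFrame:
    """Return the `capacity` highest-risk families, keeping scores for staff review."""
    ranked = screened.sort_values("risk_score", ascending=False)
    return ranked.head(capacity).reset_index(drop=True)

screened = pd.DataFrame({
    "family_id":  [101, 102, 103, 104, 105],
    "risk_score": [0.12, 0.67, 0.45, 0.88, 0.30],
})
print(prioritize_referrals(screened, capacity=2))
```

Because home visiting is voluntary, a list like this would inform outreach and engagement rather than determine eligibility, consistent with the service context described above.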
While machine learning often assumes a simple, closed system, in reality, delivering
services involves complex, non-linear processes that may have direct and indirect disparate
impacts on individuals (Ahmad et al., 2020; McCradden et al., 2020). Expert knowledge specific
to the domain and service sector is therefore critical when applying machine learning to child
welfare. Of course, domain expertise itself cannot automatically guarantee bias-free algorithmic
intervention, but it is necessary for the ethical use of machine learning. Domain knowledge can
allow us to understand the most pressing matters, desired outcomes, and characteristics of
marginalized communities (Drake & Jonson-Reid, 2018; Rajkomar et al., 2018). It can also
provide information about related ethical concerns and guide the integration of an algorithmic
tool into clinical workflows considering both direct and indirect impacts on individuals
(Rajkomar et al., 2018). Thus, machine learning models should be built, evaluated, and
implemented considering the service context, workflow, values, and organizational culture.
Involving child welfare professionals in machine learning discourses can help us better align
machine learning initiatives with service priorities.
To encourage social welfare professionals to participate in conversations about machine learning applied to child welfare, it is essential to listen to their needs and concerns. Their insights and expertise should be the foundation of machine learning applications that take fairness and equity into account. To help the child welfare arena move toward open-source, non-proprietary algorithmic interventions, it is also necessary to empower child welfare service agencies by recognizing their data ownership and enhancing their data literacy.
Implications for Research
This study documents the unique characteristics of decision-making in child welfare. It
also provides an illustrative example of developing and evaluating a machine learning model
applied to child welfare considering relevant algorithmic biases. However, these findings also
point to several areas that remain largely unknown. First, this study has discussed the unique
characteristics and complexity of decision-making in child welfare and highlighted the
importance of domain expertise in machine learning. Machine learning applied to child welfare
should be designed and implemented based on an understanding of the preceding and subsequent events that may have direct and indirect impacts on the individuals involved in decision-making.
Yet, there is much to be explored in terms of implementing a machine learning model in the
current child welfare system with a consideration of a wide range of domain-specific factors,
such as organizational culture, workflows, values, and contexts.
Second, this study addressed and examined algorithmic biases that may surface in each element of machine learning, underscoring the need to establish a practical framework for evaluating fairness in machine learning applied to child welfare. Recently, Drake and colleagues (2020) proposed a practical framework for the use of predictive risk modeling in child welfare, but their work focuses on broadly defined ethics without offering concrete approaches for the child welfare setting. There is a need to operationalize the core values of child welfare and incorporate them into the building and evaluation of machine learning fairness. Researchers can help by developing notions of fairness that can be used by child welfare researchers and professionals who would like to understand and evaluate the fairness of machine learning.
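One hedged example of what such operationalization might look like is sketched below: a small audit routine that takes a per-group metric (here, false-negative rate, reflecting the MCHN priority of not missing high-need children), compares each group against a reference group, and flags gaps beyond a stated tolerance. The function names, tolerance, and toy data are assumptions for illustration, not an established child welfare fairness standard.

```python
# Minimal sketch of operationalizing one child welfare value as an auditable
# fairness check. The metric, tolerance, and data below are illustrative
# assumptions, not a validated standard.
import pandas as pd

def fnr(group: pd.DataFrame, cutoff: float = 0.6) -> float:
    """False-negative rate among children with a substantiated allegation."""
    pos = group[group["substantiated"] == 1]
    return float((pos["risk_score"] < cutoff).mean()) if len(pos) else float("nan")

def audit(results: pd.DataFrame, group_col: str, reference: str,
          metric=fnr, tolerance: float = 0.05) -> pd.DataFrame:
    """Report the metric per group and flag gaps above `tolerance` vs. the reference group."""
    per_group = pd.Series({name: metric(grp) for name, grp in results.groupby(group_col)})
    gaps = per_group - per_group[reference]
    return pd.DataFrame({"metric": per_group, "gap_vs_reference": gaps, "flagged": gaps > tolerance})

# Toy usage with made-up scores.
toy = pd.DataFrame({
    "group":         ["A", "A", "A", "B", "B", "B"],
    "risk_score":    [0.7, 0.5, 0.9, 0.3, 0.4, 0.8],
    "substantiated": [1, 1, 1, 1, 1, 1],
})
print(audit(toy, group_col="group", reference="A"))
```

Encoding a value this way forces the choice of metric, reference group, and tolerance to be made explicit, which is precisely the kind of translation from child welfare values to measurable criteria that this section argues for.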
Third, this study lays a foundation for future research focused on considering and
incorporating human factors into machine learning building and evaluation. Understanding the
human factors that are at play when humans interact with algorithmic tools can help make
machine learning tools more effective and more useful for a broader range of people, both the
professionals who use the tools and the families and children who are most likely to be affected
by algorithmic intervention (Saxena et al., 2020). Despite some efforts to understand the perceptions of field practitioners and families regarding machine learning interventions (Brown et al., 2019), research on human interaction with machine learning tools is still in its early stages and needs further exploration.
Conclusion
With recent advances in data technologies and recognition of the size and richness of data
in the public sector, increasing attention has been paid to using machine learning to inform
decision-making in child welfare. Machine learning has shown an unprecedented ability to process high-dimensional data and has offered insights into assessing and serving the needs of children and families. That said, machine learning suffers from a wide range of social and data biases, just like any other data science tool, and thus should be used with caution. Because machine learning is limited to computing the likelihood of an event based on a restricted sample dataset, its predictions should be used carefully to inform child and family services. And because delivering services to children and families in need involves complex, non-linear processes, machine learning results should be interpreted and applied with guidance from child welfare domain knowledge.
References
Ahmad, M. A., Patel, A., Eckert, C., Kumar, V., & Teredesai, A. (2020). Fairness in Machine
Learning for Healthcare. Proceedings of the 26th ACM SIGKDD International
Conference on Knowledge Discovery & Data Mining, 3529–3530.
https://doi.org/10.1145/3394486.3406461
Ahn, E., Gil, Y., & Putnam-Hornstein, E. (2021). Predicting youth at high risk of aging out of
foster care using machine learning methods. Child Abuse & Neglect, 117, 105059.
https://doi.org/10.1016/j.chiabu.2021.105059
Allegheny County Department of Human Services. (2019, May 1). Developing Predictive Risk
Models to Support Child Maltreatment Hotline Screening Decisions. Allegheny County
Analytics. https://www.alleghenycountyanalytics.us/index.php/2019/05/01/developing-
predictive-risk-models-support-child-maltreatment-hotline-screening-decisions/
Amrit, C., Paauw, T., Aly, R., & Lavric, M. (2017). Identifying child abuse through text mining
and machine learning. Expert Systems with Applications, 88, 402–418.
https://doi.org/10.1016/j.eswa.2017.06.035
Anand, A., Pugalenthi, G., Fogel, G. B., & Suganthan, P. N. (2010). An approach for
classification of highly imbalanced data using weighting and undersampling. Amino
Acids, 39(5), 1385–1391. https://doi.org/10.1007/s00726-010-0595-2
Baldwin, H., Biehal, N., Allgar, V., Cusworth, L., & Pickett, K. (2020). Antenatal risk factors for
child maltreatment: Linkage of data from a birth cohort study to child welfare records.
Child Abuse & Neglect, 107, 104605. https://doi.org/10.1016/j.chiabu.2020.104605
Barocas, S., Hardt, M., & Narayanan, A. (2019). Fairness and Machine Learning.
fairmlbook.org. www.fairmlbook.org
Barocas, S., & Selbst, A. D. (2016a). Big Data’s Disparate Impact (SSRN Scholarly Paper ID
2477899). Social Science Research Network. https://doi.org/10.2139/ssrn.2477899
Barocas, S., & Selbst, A. D. (2016b). Big Data’s Disparate Impact. SSRN Electronic Journal.
https://doi.org/10.2139/ssrn.2477899
Bellamy, R. K. E., Dey, K., Hind, M., Hoffman, S. C., Houde, S., Kannan, K., Lohia, P.,
Martino, J., Mehta, S., Mojsilovic, A., Nagar, S., Ramamurthy, K. N., Richards, J., Saha,
D., Sattigeri, P., Singh, M., Varshney, K. R., & Zhang, Y. (2018). AI Fairness 360: An
Extensible Toolkit for Detecting, Understanding, and Mitigating Unwanted Algorithmic
Bias. ArXiv:1810.01943 [Cs]. http://arxiv.org/abs/1810.01943
Benbenishty, R., & Fluke, J. D. (2020). Frameworks and Models in Decision-Making and
Judgment in Child Welfare and Protection. In R. Benbenishty & J. D. Fluke, Decision-
Making and Judgment in Child Welfare and Protection (pp. 3–26). Oxford University
Press. https://doi.org/10.1093/oso/9780190059538.003.0001
Berk, R. A., & Bleich, J. (2013). Statistical Procedures for Forecasting Criminal Behavior: A
Comparative Assessment. Criminology & Public Policy, 12(3), 513–544.
https://doi.org/10.1111/1745-9133.12047
Berrick, J. D. (2018). The impossible imperative: Navigating the competing principles of child
protection. Oxford University Press.
Berrick, J. D., Peckover, S., Pösö, T., & Skivenes, M. (2015). The formalized framework for
decision-making in child protection care orders: A cross-country analysis. Journal of
European Social Policy, 25(4), 366–378. https://doi.org/10.1177/0958928715594540
Binns, R. (2018). Fairness in Machine Learning: Lessons from Political Philosophy. Proceedings
of Machine Learning Research, 81, 149–159.
Brown, A., Chouldechova, A., Putnam-Hornstein, E., Tobin, A., & Vaithianathan, R. (2019).
Toward Algorithmic Accountability in Public Services: A Qualitative Study of Affected
Community Perspectives on Algorithmic Decision-making in Child Welfare Services.
Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems -
CHI ’19, 1–12. https://doi.org/10.1145/3290605.3300271
Buolamwini, J., & Gebru, T. (2018). Gender Shades: Intersectional Accuracy Disparities in
Commercial Gender Classification. In S. A. Friedler & C. Wilson (Eds.), Proceedings of
the 1st Conference on Fairness, Accountability and Transparency (Vol. 81, pp. 77–91).
PMLR. https://proceedings.mlr.press/v81/buolamwini18a.html
Cage, R., & Salus, M. (2010). The role of first responders in child maltreatment cases: Disaster
and nondisaster situations. https://www.childwelfare.gov/pubPDFs/firstresponders.pdf
California Department of Public Health. (2021). VSB Vital Statistics Advisory Committee
Meeting Information. https://www.cdph.ca.gov/Programs/CHSI/Pages/Vital-Statistics-
Advisory-Committee-Meeting-Information.aspx
Capatosto, K. (2017). Foretelling the future: A critical perspective on the use of predictive
analytics in child welfare. The Ohio State University, Kirwan Institute for the Study of
Race and Ethnicity. http://kirwaninstitute.osu.edu/wp-content/uploads/2017/05/ki-
predictive-analytics.pdf
Carvalho, D. V., Pereira, E. M., & Cardoso, J. S. (2019). Machine Learning Interpretability: A
Survey on Methods and Metrics. Electronics, 8(8), 832.
https://doi.org/10.3390/electronics8080832
Chadwick Center and Chapin Hall. (2018). Making the Most of Predictive Analytics: Responsive
and Innovative Uses in Child Welfare Policy and Practice. Intersection of Research and
Policy. https://www.chapinhall.org/wp-content/uploads/Making-the-Most-of-Predictive-
Analytics.pdf
Chaffin, M., & Bard, D. (2006). Impact of Intervention Surveillance Bias on Analyses of Child
Welfare Report Outcomes. Child Maltreatment, 11(4), 301–312.
https://doi.org/10.1177/1077559506291261
Chaiyachati, B. H., Gaither, J. R., Hughes, M., Foley-Schain, K., & Leventhal, J. M. (2018).
Preventing child maltreatment: Examination of an established statewide home-visiting
program. Child Abuse & Neglect, 79, 476–484.
https://doi.org/10.1016/j.chiabu.2018.02.019
Chan, H., Tran-Thanh, L., Wilder, B., Rice, E., Vayanos, P., & Tambe, M. (2018). Utilizing
Housing Resources for Homeless Youth Through the Lens of Multiple Multi-
Dimensional Knapsacks. Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics,
and Society, 41–47. https://doi.org/10.1145/3278721.3278757
Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic
Minority Over-sampling Technique. Journal of Artificial Intelligence Research, 16, 321–
357. https://doi.org/10.1613/jair.953
Chenot, D. (2011). The Vicious Cycle: Recurrent Interactions Among the Media, Politicians, the
Public, and Child Welfare Services Organizations. Journal of Public Child Welfare, 5(2–
3), 167–184. https://doi.org/10.1080/15548732.2011.566752
Child Welfare Information Gateway. (n.d.-a). Structured Decision-Making. Retrieved August 30,
2021, from
https://www.childwelfare.gov/topics/systemwide/assessment/approaches/structured-
decision-making/
Child Welfare Information Gateway. (n.d.-b). Title IV-E Training Programs. Retrieved October
2, 2021, from
https://www.childwelfare.gov/topics/management/workforce/preparation/education/titleI
VE/
Chouldechova, A. (2017). Fair Prediction with Disparate Impact: A Study of Bias in Recidivism
Prediction Instruments. Big Data, 5(2), 153–163. https://doi.org/10.1089/big.2016.0047
Chouldechova, A., Benavides-Prado, D., Fialko, O., & Vaithianathan, R. (2018). A case study of
algorithm-assisted decision making in child maltreatment hotline screening decisions. In
S. A. Friedler & C. Wilson (Eds.), Proceedings of the 1st Conference on Fairness,
Accountability and Transparency (Vol. 81, pp. 134–148). PMLR.
http://proceedings.mlr.press/v81/chouldechova18a.html
Chouldechova, A., & Roth, A. (2018). The Frontiers of Fairness in Machine Learning.
ArXiv:1810.08810 [Cs, Stat]. http://arxiv.org/abs/1810.08810
Church, C. E., & Fairchild, A. J. (2017). In Search of a Silver Bullet: Child Welfare’s Embrace
of Predictive Analytics. Juvenile and Family Court Journal, 68(1), 67–81.
https://doi.org/10.1111/jfcj.12086
Corbett-Davies, S., & Goel, S. (2018). The Measure and Mismeasure of Fairness: A Critical
Review of Fair Machine Learning. ArXiv:1808.00023 [Cs].
http://arxiv.org/abs/1808.00023
Coston, A., Mishler, A., Kennedy, E. H., & Chouldechova, A. (2020). Counterfactual Risk
Assessments, Evaluation, and Fairness. ArXiv:1909.00066 [Cs, Stat].
http://arxiv.org/abs/1909.00066
Coulton, C., Goerge, R., Putnam-Hornstein, E., & de Haan, B. (2015). Harnessing Big Data for
Social Good: A Grand Challenge for Social Work.
Crea, T. M., & Berzin, S. C. (2009). Family Involvement in Child Welfare Decision-Making:
Strategies and Research On Inclusive Practices. Journal of Public Child Welfare, 3(3),
305–327. https://doi.org/10.1080/15548730903129970
Crenshaw, K. (1989). Demarginalizing the Intersection of Race and Sex: Black Feminist Critique
of Antidiscrimination Doctrine, Feminist Theory and Antiracist Politics. University of
Chicago Legal Forum, 139–168.
Cuccaro-Alamin, S., Foust, R., Vaithianathan, R., & Putnam-Hornstein, E. (2017). Risk
assessment and decision making in child protective services: Predictive risk modeling in
context. Children and Youth Services Review, 79, 291–298.
https://doi.org/10.1016/j.childyouth.2017.06.027
Dare, T. (2013). Predictive risk modeling and child maltreatment: An ethical review.
https://csda.aut.ac.nz/__data/assets/pdf_file/0016/11923/00-predicitve-risk-modelling-
and-child-maltreatment-an-ethical-review.pdf
Dare, T. (2015). The Ethics of Predictive Risk Modeling. In Challenging Child Protection.
Jessica Kingsley Publishers.
Dare, T., & Gambrill, E. (2017). Ethical Analysis: Predictive Risk Models at Call Screening for
Allegheny County. http://www.alleghenycountyanalytics.us/wp-
content/uploads/2017/04/Developing-Predictive-Risk-Models-package-with-cover-1-to-
post-1.pdf
Dawes, R., Faust, D., & Meehl, P. (1989). Clinical versus actuarial judgment. Science,
243(4899), 1668–1674. https://doi.org/10.1126/science.2648573
de Haan, I., & Connolly, M. (2014). Another Pandora’s box? Some pros and cons of predictive
risk modeling. Children and Youth Services Review, 47, 86–91.
https://doi.org/10.1016/j.childyouth.2014.07.016
Design Options for Home Visiting Evaluation. (2011). Life Skills Progression Brief: Information
and Guidelines for Use in Meeting MIECHV Benchmarks.
https://www.mdrc.org/sites/default/files/img/LSP_Brief.pdf
Dettlaff, A. J., & Boyd, R. (2020). Racial Disproportionality and Disparities in the Child Welfare
System: Why Do They Exist, and What Can Be Done to Address Them? The ANNALS of
the American Academy of Political and Social Science, 692(1), 253–274.
https://doi.org/10.1177/0002716220980329
Dickert, N., & Grady, C. (1999). What’s the Price of a Research Subject? Approaches to
Payment for Research Participation. New England Journal of Medicine, 341(3), 198–203.
https://doi.org/10.1056/NEJM199907153410312
Dodge, K. A., Goodman, W. B., Murphy, R. A., O’Donnell, K., & Sato, J. (2013). Randomized
controlled trial of universal postnatal nurse home visiting: Impact on emergency care.
Pediatrics, 132, S140–S146. https://doi.org/10.1542/peds.2013-1021M
Donahue, K., & Kleinberg, J. (2020). Fairness and utilization in allocating resources with
uncertain demand. Proceedings of the 2020 Conference on Fairness, Accountability, and
Transparency, 658–668. https://doi.org/10.1145/3351095.3372847
Drake, B., Jonson-Reid, M., & Kim, H. (2017). Surveillance Bias in Child Maltreatment: A Tempest in a
Teapot. International Journal of Environmental Research and Public Health, 14(9), 971.
https://doi.org/10.3390/ijerph14090971
Drake, B., & Jonson-Reid, M. (2018). If We Had a Crystal Ball, Would We Use It? Pediatrics,
141(2), e20173469. https://doi.org/10.1542/peds.2017-3469
Drake, B., & Jonson-Reid, M. (2019). Administrative Data and Predictive Risk Modeling in
Public Child Welfare: Ethical Issues Relating to California.
https://www.semanticscholar.org/paper/Administrative-Data-and-Predictive-Risk-
Modeling-in-Drake-Jonson-Reid/ff6f5b52cc5894379143e531549e33ad9a2c614a#paper-
header
Drake, B., Jonson-Reid, M., Ocampo, M. G., Morrison, M., & Dvalishvili, D. (Daji). (2020). A
Practical Framework for Considering the Use of Predictive Risk Modeling in Child
Welfare. The ANNALS of the American Academy of Political and Social Science, 692(1),
162–181. https://doi.org/10.1177/0002716220978200
Dwork, C., Hardt, M., Pitassi, T., Reingold, O., & Zemel, R. (2012). Fairness through awareness.
Proceedings of the 3rd Innovations in Theoretical Computer Science Conference on -
ITCS ’12, 214–226. https://doi.org/10.1145/2090236.2090255
Eastman, A. L., Hoonhout, J., & Putnam-Hornstein, E. (2014). Newborn Home Visiting
Programs: A Scan of Services and Data.
Eckenrode, J., Campa, M., Luckey, D. W., Henderson, C. R., Jr., Cole, R., Kitzman, H., ..., &
Olds, D. (2010). Long-term effects of prenatal and infancy nurse home visitation on the
life course of youths: 19-year. Archives of Pediatrics & Adolescent Medicine, 164, 9–15.
https://doi.org/10.1001/archpediatrics.2009.240
Elgin, D. J. (2018). Utilizing predictive modeling to enhance policy and practice through
improved identification of at-risk clients: Predicting permanency for foster children.
Children and Youth Services Review, 91, 156–167.
https://doi.org/10.1016/j.childyouth.2018.05.030
English, D. J., Brandford, C. C., & Coghlan, L. (2000). Data-based organizational change: The
use of administrative data to improve child welfare programs and policy. Child Welfare,
79(5), 499–515.
Etehad, M., & Winton, R. (2017, March). 4 L.A. County social workers to face trial in horrific
death of 8-year-old boy. Los Angeles Times. https://www.latimes.com/local/lanow/la-me-
ln-social-worker-charges-20170320-story.html
Eubanks, V. (2019). Automating inequality: How high-tech tools profile, police, and punish the
poor.
First 5 Orange County. (2021). 2021-2025 Strategic Plan. First 5 Orange County.
http://occhildrenandfamilies.com/wp-content/uploads/2021/04/Strategic-Plan-2021.pdf
Fluke, J., Jones Harden, B., Jenkins, M., & Ruehrdanz, A. (2010). Research synthesis on child
welfare: Disproportionality and disparities. American Humane Association.
Fluke, J., López, M. L., Benbenishty, R., Knorth, E. J., & Baumann, D. (2020a). Decision
making and judgement in child welfare and protection: Theory, research, and practice.
Oxford University Press.
Fluke, J., López, M. L., Benbenishty, R., Knorth, E. J., & Baumann, D. (2020b). Decision
making and judgement in child welfare and protection: Theory, research, and practice.
Oxford University Press.
Foulds, J., Islam, R., Keya, K. N., & Pan, S. (2019a). An Intersectional Definition of Fairness.
ArXiv:1807.08362 [Cs, Stat]. http://arxiv.org/abs/1807.08362
Foulds, J., Islam, R., Keya, K. N., & Pan, S. (2019b). An Intersectional Definition of Fairness.
ArXiv:1807.08362 [Cs, Stat]. http://arxiv.org/abs/1807.08362
Friedler, S. A., Scheidegger, C., & Venkatasubramanian, S. (2021). The (Im)possibility of
fairness: Different value systems require different mechanisms for fair decision making.
Communications of the ACM, 64(4), 136–143. https://doi.org/10.1145/3433949
Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. The
Annals of Statistics, 29(5). https://doi.org/10.1214/aos/1013203451
Galhotra, S., Brun, Y., & Meliou, A. (2017). Fairness testing: Testing software for
discrimination. Proceedings of the 2017 11th Joint Meeting on Foundations of Software
Engineering, 498–510. https://doi.org/10.1145/3106237.3106277
Gillingham, P. (2016). Predictive Risk Modelling to Prevent Child Maltreatment and Other
Adverse Outcomes for Service Users: Inside the ‘Black Box’ of Machine Learning.
British Journal of Social Work, 46(4), 1044–1058. https://doi.org/10.1093/bjsw/bcv031
Gillingham, P. (2019). Can Predictive Algorithms Assist Decision‐Making in Social Work with
Children and Families? Child Abuse Review, 28(2), 114–126.
https://doi.org/10.1002/car.2547
Gilpin, L. H., Bau, D., Yuan, B. Z., Bajwa, A., Specter, M., & Kagal, L. (2018). Explaining
Explanations: An Overview of Interpretability of Machine Learning. 2018 IEEE 5th
International Conference on Data Science and Advanced Analytics (DSAA), 80–89.
https://doi.org/10.1109/DSAA.2018.00018
Glaberson, S. K. (2019). Coding over the cracks: Predictive analytics and child protection.
Fordham Urban Law Journal, 46(2), 307–363.
Goel, S. (2020, August). Designing Equitable Risk Models for Lending and Beyond.
https://www.youtube.com/watch?v=nk47RS3p4Xc
Google Developers. (n.d.). Imbalanced Data. Retrieved September 7, 2021, from
https://developers.google.com/machine-learning/data-prep/construct/sampling-
splitting/imbalanced-data
Grgić-Hlača, N., Zafar, M. B., Gummadi, K. P., & Weller, A. (2016). The Case for Process
Fairness in Learning: Feature Selection for Fair Decision Making. In Symposium on
Machine Learning and the Law at the 29th Conference on Neural Information Processing
Systems., 11.
Guterman, N. B., Tabone, J. K., Bryan, G. M., Taylor, C. A., Napoleon-Hanger, C., & Banman,
A. (2013). Examining the effectiveness of home-based parent aide services to reduce risk
for physical child abuse and neglect: Six-month findings from a randomized clinical trial.
Child Abuse and Neglect, 37, 566–577. https://doi.org/10.1016/j.chiabu.2013.03.006
Hardt, M., Price, E., & Srebro, N. (2016). Equality of Opportunity in Supervised Learning.
ArXiv:1610.02413 [Cs]. http://arxiv.org/abs/1610.02413
Harris, N. (1987). Defensive Social Work. The British Journal of Social Work.
https://doi.org/10.1093/oxfordjournals.bjsw.a055315
Hashimoto, T. B., Srivastava, M., Namkoong, H., & Liang, P. (2018). Fairness Without
Demographics in Repeated Loss Minimization. ArXiv:1806.08010 [Cs, Stat].
http://arxiv.org/abs/1806.08010
Hastie, T., Tibshirani, R., & Friedman, J. H. (2009). The elements of statistical learning: Data
mining, inference, and prediction (2nd ed). Springer.
He, H., & Garcia, E. A. (2009). Learning from Imbalanced Data. IEEE Transactions on
Knowledge and Data Engineering, 21(9), 1263–1284.
https://doi.org/10.1109/TKDE.2008.239
Hoffmann, A. L. (2019). Where fairness fails: Data, algorithms, and the limits of
antidiscrimination discourse. Information, Communication & Society, 22(7), 900–915.
https://doi.org/10.1080/1369118X.2019.1573912
Holstein, K., Wortman Vaughan, J., Daumé, H., Dudik, M., & Wallach, H. (2019). Improving
Fairness in Machine Learning Systems: What Do Industry Practitioners Need?
Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, 1–
16. https://doi.org/10.1145/3290605.3300830
Horwath, J. (2007). Child neglect: Identification and assessment. Palgrave Macmillan.
Hu, L., & Chen, Y. (2020). Fair classification and social welfare. Proceedings of the 2020
Conference on Fairness, Accountability, and Transparency, 535–545.
https://doi.org/10.1145/3351095.3372857
Johnson, W. (2004). Effectiveness of California’s child welfare structured decision making
(SDM) model: A prospective study of the validity of the California Family Risk
Assessment. Children’s Research Center.
Johnson-Motoyama, M., Putnam-Hornstein, E., Dettlaff, A. J., Zhao, K., Finno-Velasquez, M.,
& Needell, B. (2015). Disparities in Reported and Substantiated Infant Maltreatment by
Maternal Hispanic Origin and Nativity: A Birth Cohort Study. Maternal and Child
Health Journal, 19(5), 958–968.
Kaur, H., Pannu, H. S., & Malhi, A. K. (2019). A Systematic Review on Imbalanced Data
Challenges in Machine Learning: Applications and Solutions. ACM Computing Surveys,
52(4), 1–36. https://doi.org/10.1145/3343440
Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q., & Liu, T.-Y. (2017). LightGBM: A Highly Efficient Gradient Boosting Decision Tree. Advances in Neural Information Processing Systems, 30.
http://papers.nips.cc/paper/6907-lightgbm-a-highly-efficient-gradient-boosting-decision-
tree.pdf
Keddell, E. (2015). The ethics of predictive risk modelling in the Aotearoa/New Zealand child
welfare context: Child abuse prevention or neo-liberal tool? Critical Social Policy, 35(1),
69–88. https://doi.org/10.1177/0261018314543224
Keddell, E. (2019). Algorithmic Justice in Child Protection: Statistical Fairness, Social Justice
and the Implications for Practice. Social Sciences, 8(10), 281.
https://doi.org/10.3390/socsci8100281
Kim, H. (2011). Job conditions, unmet expectations, and burnout in public child welfare
workers: How different from other social workers? Children and Youth Services Review,
33(2), 358–367. https://doi.org/10.1016/j.childyouth.2010.10.001
Kim, H., & Drake, B. (2018). Child maltreatment risk as a function of poverty and race/ethnicity
in the USA. International Journal of Epidemiology, 47(3), 780–787.
https://doi.org/10.1093/ije/dyx280
Kim, H., Wildeman, C., Jonson-Reid, M., & Drake, B. (2017). Lifetime Prevalence of Investigating Child Maltreatment Among US Children. American Journal of Public Health, 107(2), 274–281.
https://doi.org/10.2105/AJPH.2016.303545
Kitzinger, J. (1995). Qualitative Research: Introducing focus groups. BMJ, 311(7000), 299–302.
https://doi.org/10.1136/bmj.311.7000.299
Kleinberg, J., Ludwig, J., Mullainathan, S., & Rambachan, A. (2018). Algorithmic Fairness. AEA
Papers and Proceedings, 108, 22–27. https://doi.org/10.1257/pandp.20181018
Kleinberg, J., Mullainathan, S., & Raghavan, M. (2016). Inherent Trade-Offs in the Fair
Determination of Risk Scores. ArXiv:1609.05807 [Cs, Stat].
http://arxiv.org/abs/1609.05807
Krakouer, J., Wu Tan, W., & Parolini, A. (2021). Who is analysing what? The opportunities,
risks and implications of using predictive risk modelling with Indigenous Australians in
child protection: A scoping review. Australian Journal of Social Issues, 56(2), 173–197.
https://doi.org/10.1002/ajs4.155
Krueger, R. A., & Casey, M. A. (2009). Focus groups: A practical guide for applied research
(4th ed). SAGE.
Kusner, M. J., Loftus, J. R., Russell, C., & Silva, R. (2018). Counterfactual Fairness.
ArXiv:1703.06856 [Cs, Stat]. http://arxiv.org/abs/1703.06856
Lanier, P., Rodriguez, M., Verbiest, S., Bryant, K., Guan, T., & Zolotor, A. (2019). Preventing
Infant Maltreatment with Predictive Analytics: Applying Ethical Principles to Evidence-
Based Child Welfare Policy. Journal of Family Violence. https://doi.org/10.1007/s10896-
019-00074-y
LightGBM. (n.d.). Parameters—LightGBM 3.2.1.99 documentation. Retrieved July 28, 2021,
from https://lightgbm.readthedocs.io/en/latest/Parameters.html
Lonne, B., Parton, N., Thomson, J., & Harries, M. (2008a). Reforming Child Protection (0 ed.).
Routledge. https://doi.org/10.4324/9780203894675
Lonne, B., Parton, N., Thomson, J., & Harries, M. (2008b). Reforming Child Protection (0 ed.).
Routledge. https://doi.org/10.4324/9780203894675
Louie, K. (n.d.). Child Welfare Social Work Guide.
https://www.onlinemswprograms.com/careers/child-welfare-social-work/
Lundgard, A. (2020). Measuring justice in machine learning. Proceedings of the 2020
Conference on Fairness, Accountability, and Transparency, 680–680.
https://doi.org/10.1145/3351095.3372838
McCradden, M. D., Joshi, S., Mazwi, M., & Anderson, J. A. (2020). Ethical limitations of
algorithmic fairness solutions in health care machine learning. The Lancet Digital Health,
2(5), e221–e223. https://doi.org/10.1016/S2589-7500(20)30065-0
McCroskey, J. (2021, March). Personal conversation [Personal conversation].
Mehrabi, N., Huang, Y., & Morstatter, F. (2020). Statistical Equity: A Fairness Classification
Objective. ArXiv:2005.07293 [Cs, Stat]. http://arxiv.org/abs/2005.07293
Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., & Galstyan, A. (2019). A Survey on Bias
and Fairness in Machine Learning. ArXiv:1908.09635 [Cs].
http://arxiv.org/abs/1908.09635
Mitchell, S., Potash, E., Barocas, S., D’Amour, A., & Lum, K. (2021). Algorithmic Fairness:
Choices, Assumptions, and Definitions. Annual Review of Statistics and Its Application,
8(1), 141–163. https://doi.org/10.1146/annurev-statistics-042720-125902
Morazes, J. L., Benton, A. D., Clark, S. J., & Jacquet, S. E. (2010). Views of Specially-trained
Child Welfare Social Workers: A Qualitative Study of their Motivations, Perceptions,
and Retention. Qualitative Social Work, 9(2), 227–247.
https://doi.org/10.1177/1473325009350671
Morgan, D. L. (1997). Focus groups as qualitative research (2nd ed.). Sage Publications.
Morina, G., Oliinyk, V., Waton, J., Marusic, I., & Georgatzis, K. (2020). Auditing and
Achieving Intersectional Fairness in Classification Problems. ArXiv:1911.01468 [Cs,
Stat]. http://arxiv.org/abs/1911.01468
National Association of Social Workers. (2013). NASW Standards for Social Work Practice in
Child Welfare. National Association of Social Workers.
https://www.socialworkers.org/LinkClick.aspx?fileticket=zV1G_96nWoI%3D&portalid=
0
Olds, D. L., Kitzman, H. J., Cole, R. E., Hanks, C. A., Arcoleo, K. J., Anson, E. A., ..., &
Stevenson, A. J. (2010). Enduring effects of prenatal and infancy home visiting by nurses
on maternal life course and government spending: Follow-up of a randomized trial
among children at age 12 years. Archives of Pediatrics & Adolescent Medicine, 164, 419–
424. https://doi.org/10.1001/archpediatrics.2010.49
O’Neil, C. (2016). Weapons of math destruction: How big data increases inequality and
threatens democracy (First edition). Crown.
Pan, I., Nolan, L. B., Brown, R. R., Khan, R., van der Boor, P., Harris, D. G., & Ghani, R.
(2017). Machine Learning for Social Services: A Study of Prenatal Case Management in
Illinois. American Journal of Public Health, 107(6), 938–944.
https://doi.org/10.2105/AJPH.2017.303711
Paulus, J. K., & Kent, D. M. (2020). Predictably unequal: Understanding and addressing
concerns that algorithmic clinical prediction may increase health disparities. Npj Digital
Medicine, 3(1), 99. https://doi.org/10.1038/s41746-020-0304-9
Pecora, P. J., Whittaker, J. K., Barth, R. P., Borja, S., & Vesneski, W. (2018). The Child Welfare
Challenge: Policy, Practice, and Research (4th ed.). Routledge.
https://doi.org/10.4324/9781351141161
Perron, B. E., Victor, B. G., Bushman, G., Moore, A., Ryan, J. P., Lu, A. J., & Piellusch, E. K.
(2019). Detecting substance-related problems in narrative investigation summaries of
child abuse and neglect using text mining and machine learning. Child Abuse & Neglect,
98, 104180. https://doi.org/10.1016/j.chiabu.2019.104180
Plummer-D’Amato, P. (2008). Focus group methodology Part 1: Considerations for design.
International Journal of Therapy and Rehabilitation, 15(2), 69–73.
https://doi.org/10.12968/ijtr.2008.15.2.28189
Putnam-Hornstein, E., Ahn, E., Prindle, J., Magruder, J., Webster, D., & Wildeman, C. (2021).
Cumulative Rates of Child Protection Involvement and Terminations of Parental Rights
in a California Birth Cohort, 1999–2017. American Journal of Public Health, 111(6),
1157–1163. https://doi.org/10.2105/AJPH.2021.306214
Putnam-Hornstein, E., Ghaly, M., & Wilkening, M. (2020). Integrating Data To Advance
Research, Operations, And Client-Centered Services In California: Integrating millions of
administrative records across California’s health and human services programs to
improve operations, coordinate services, develop targeted interventions, and more.
Health Affairs, 39(4), 655–661. https://doi.org/10.1377/hlthaff.2019.01752
Putnam-Hornstein, E., Needell, B., King, B., & Johnson-Motoyama, M. (2013). Racial and
ethnic disparities: A population-based examination of risk factors for involvement with
child protective services. Child Abuse & Neglect, 37(1), 33–46.
https://doi.org/10.1016/j.chiabu.2012.08.005
Rajkomar, A., Hardt, M., Howell, M. D., Corrado, G., & Chin, M. H. (2018). Ensuring Fairness
in Machine Learning to Advance Health Equity. Annals of Internal Medicine, 169(12),
866. https://doi.org/10.7326/M18-1990
Redden, J., Dencik, L., & Warne, H. (2020). Datafied child welfare services: Unpacking politics,
economics and power. Policy Studies, 41(5), 507–526.
https://doi.org/10.1080/01442872.2020.1724928
Reisch, M. (2002). Defining Social Justice in a Socially Unjust World. Families in Society: The
Journal of Contemporary Social Services, 83(4), 343–354. https://doi.org/10.1606/1044-
3894.17
Roberts, Y. H., O’Brien, K., & Pecora, P. J. (2018). Considerations for Implementing Predictive
Analytics in Child Welfare. Casey Family Programs.
Rodriguez, M. Y. (2021, March). The root of algorithmic bias and how to deal with it.
https://www.youtube.com/watch?v=epgPm8X-iyg&t=3041s
Rubin, D. M., O’Reilly, A. L. R., Luan, X., Dai, D., Localio, A. R., & Christian, C. W. (2011).
Variation in pregnancy outcomes following statewide implementation of a prenatal home
visitation program. Archives of Pediatrics & Adolescent Medicine, 165, 198–204.
https://doi.org/10.1001/archpediatrics.2010.221
Ruf, B., & Detyniecki, M. (2020). Active Fairness Instead of Unawareness. ArXiv:2009.06251
[Cs]. http://arxiv.org/abs/2009.06251
Russell, J. (2015). Predictive analytics and child protection: Constraints and opportunities. Child
Abuse & Neglect, 46, 182–189. https://doi.org/10.1016/j.chiabu.2015.05.022
Saleiro, P., Kuester, B., Hinkson, L., London, J., Stevens, A., Anisfeld, A., Rodolfa, K. T., &
Ghani, R. (2019). Aequitas: A Bias and Fairness Audit Toolkit. ArXiv:1811.05577 [Cs].
http://arxiv.org/abs/1811.05577
Saxena, D., Badillo-Urquiola, K., Wisniewski, P. J., & Guha, S. (2020). A Human-Centered
Review of Algorithms used within the U.S. Child Welfare System. Proceedings of the
2020 CHI Conference on Human Factors in Computing Systems, 1–15.
https://doi.org/10.1145/3313831.3376229
Schoech, D., Quinn, A., & Rycraft, J. R. (2000). Data mining in child welfare. Child Welfare,
79(5), 633–650.
Schwartz, I. M., York, P., Nowakowski-Sims, E., & Ramos-Hernandez, A. (2017a). Predictive
and prescriptive analytics, machine learning and child welfare risk assessment: The
Broward County experience. Children and Youth Services Review, 81, 309–320.
https://doi.org/10.1016/j.childyouth.2017.08.020
Schwartz, I. M., York, P., Nowakowski-Sims, E., & Ramos-Hernandez, A. (2017b). Predictive
and prescriptive analytics, machine learning and child welfare risk assessment: The
Broward County experience. Children and Youth Services Review, 81, 309–320.
https://doi.org/10.1016/j.childyouth.2017.08.020
Shlonsky, A., & Wagner, D. (2005). The next step: Integrating actuarial risk assessment and
clinical judgment into an evidence-based practice framework in CPS case management.
Children and Youth Services Review, 27(4), 409–427.
https://doi.org/10.1016/j.childyouth.2004.11.007
Shroff, R. (2017). Predictive Analytics for City Agencies: Lessons from Children’s Services. Big
Data, 5(3), 189–196. https://doi.org/10.1089/big.2016.0052
Suresh, H., & Guttag, J. V. (2020). A Framework for Understanding Unintended Consequences
of Machine Learning. ArXiv:1901.10002 [Cs, Stat]. http://arxiv.org/abs/1901.10002
Swets, J. (1988). Measuring the accuracy of diagnostic systems. Science, 240(4857), 1285–1293.
https://doi.org/10.1126/science.3287615
U.S. Department of Health & Human Services. (2021). Child Maltreatment 2019. Administration
for Children and Families, Administration on Children, Youth and Families, Children’s
Bureau. https://www.acf.hhs.gov/cb/report/child-maltreatment-2019
Ustun, B., & Rudin, C. (2016). Supersparse linear integer models for optimized medical scoring
systems. Machine Learning, 102(3), 349–391. https://doi.org/10.1007/s10994-015-5528-
6
Vaithianathan, R., Maloney, T., Putnam-Hornstein, E., & Jiang, N. (2013). Children in the Public
Benefit System at Risk of Maltreatment. American Journal of Preventive Medicine,
45(3), 354–359. https://doi.org/10.1016/j.amepre.2013.04.022
Vaithianathan, R., Maloney, T., Putnam-Hornstein, E., Jiang, N., De Haan, I., & Dare, T. (2012).
Vulnerable Children: Can Administrative Data Be Used To Identify Children At Risk Of
Adverse Outcomes? https://www.msd.govt.nz/documents/about-msd-and-our-
work/publications-resources/research/vulnerable-children/auckland-university-can-
administrative-data-be-used-to-identify-children-at-risk-of-adverse-outcome.pdf
Vaithianathan, R., Putnam-Hornstein, E., Chouldechova, A., Benavides-Prado, D., & Berger, R.
(2020). Hospital Injury Encounters of Children Identified by a Predictive Risk Model for
Screening Child Maltreatment Referrals: Evidence From the Allegheny Family Screening
Tool. JAMA Pediatrics, e202770. https://doi.org/10.1001/jamapediatrics.2020.2770
Vaithianathan, R., Putnam-Hornstein, E., Jiang, N., Nand, P., & Maloney, T. (2017). Developing
Predictive Risk Models to Support Child Maltreatment Hotline Screening Decisions.
Veale, M., Van Kleek, M., & Binns, R. (2018). Fairness and Accountability Design Needs for
Algorithmic Support in High-Stakes Public Sector Decision-Making. Proceedings of the
2018 CHI Conference on Human Factors in Computing Systems - CHI ’18, 1–14.
https://doi.org/10.1145/3173574.3174014
Verma, S., & Rubin, J. (2018, May 29). Fairness Definitions Explained. 2018 IEEE/ACM
International Workshop on Software Fairness (FairWare), Gothenburg, Sweden.
https://doi.org/10.23919/FAIRWARE.2018.8452913
West, S. M. (2020). Redistribution and Rekognition: A Feminist Critique of Algorithmic
Fairness. Catalyst: Feminism, Theory, Technoscience, 6(2).
https://doi.org/10.28968/cftt.v6i2.33043
Whittaker, A., & Havard, T. (2016). Defensive Practice as ‘Fear-Based’ Practice: Social Work’s
Open Secret? British Journal of Social Work, 46(5), 1158–1174.
https://doi.org/10.1093/bjsw/bcv048
Wildeman, C., & Emanuel, N. (2014). Cumulative Risks of Foster Care Placement by Age 18 for U.S. Children, 2000-2011. PLoS ONE, 9(3), 1–7. https://doi.org/10.1371/journal.pone.0092785
Wildeman, C., Emanuel, N., Leventhal, J. M., Putnam-Hornstein, E., Waldfogel, J., & Lee, H.
(2014). The Prevalence of Confirmed Maltreatment Among US Children, 2004 to 2011.
JAMA Pediatrics, 168(8), 706. https://doi.org/10.1001/jamapediatrics.2014.410
Yun, K., Chesnokova, A., Matone, M., Luan, X., Localio, A. R., & Rubin, D. M. (2014). Effect of maternal–child home visitation on pregnancy spacing for first-time Latina
mothers. American Journal of Public Health, 104, S152–S158.
https://doi.org/10.2105/AJPH.2013.301505
Zytek, A., Liu, D., Vaithianathan, R., & Veeramachaneni, K. (2021). Sibyl: Understanding and
Addressing the Usability Challenges of Machine Learning In High-Stakes Decision
Making. IEEE Transactions on Visualization and Computer Graphics, 1–1.
https://doi.org/10.1109/TVCG.2021.3114864
Abstract
Assessing and reducing the risk of child maltreatment has been a primary concern for child welfare workers and agencies. As supporting children and families in need involves a wide range of decisions to be made, the use of machine learning to inform decision-making has received increasing attention. Yet, much remains to be explored in the area of applying machine learning to child welfare with a focus on fairness. The objective of this dissertation is to generate knowledge that will guide the fair and ethical use of machine learning to inform decision-making in child welfare.
Study 1: Qualitative Exploration of Child Welfare Workers’ Decision-Making Experiences
To ensure machine learning serves the need of human users, machine learning models should be built and deployed based on users’ experiences. While the characteristics of decision-making in the child welfare systems have been well established in the literature, less is known about the experiences of child welfare workers. To understand and incorporate the perspectives of child welfare workers into machine learning applications, this study used focus groups to learn about their decision-making experiences and perception of fairness. From the thematic analysis, several findings emerged offering practical implications for applying machine learning to child welfare considering fairness. Machine learning models should be built and deployed considering the complex, nonlinear decision-making process in child welfare that can be influenced by narrative-based, nuanced information. It is also essential to operationalize the core values of child welfare and incorporate them into decision-making informed by machine learning while carefully considering related liability and accountability issues.
Study 2: Building a Machine Learning Model Examining Potential Biases
A primary concern about algorithmic intervention has been the potential to inadvertently perpetuate racial, socioeconomic, and other biases in our society by making predictions based on data laden with human prejudice. These biases may be further complicated by the unique nature of child welfare, such as clients’ vulnerability, racial disproportionality, and core values that may conflict. To advance the understanding of using machine learning fairly and equitably, this study adopted a real-world use case to provide an illustrative example of building a machine learning model examining fairness. First, this study identified potential biases that may emerge in each step of machine learning modeling and addressed how they might be further complicated in the context of child welfare. Then, in response to a question posed by the First 5 Orange County’s Bridges Maternal Child Health Network (MCHN) program, the study developed a machine learning model that assesses the need for home visiting services among families with a newborn by using a CPS outcome as a proxy for familial circumstances. The findings suggested that the machine learning models built in this study could significantly improve the assessment process by identifying more children and families in need. This points to opportunities to use machine learning to enhance equitable access to current home visiting services.
Study 3: Evaluating a Machine Learning Model Considering Intersectional Fairness
As machine learning has received increasing attention for its potential to inform decision-making in child welfare, a growing body of literature has discussed various aspects of its ethics. Yet, limited empirical work has been done on evaluating fairness in machine learning applied to child welfare. Moreover, machine learning fairness has been examined considering a single attribute, race/ethnicity, even if various attributes may have intertwined impacts on children and families. To illustrate the process of examining the fairness of a machine learning model using a real-world use case, this study conducted a fairness analysis on the machine learning model developed in Study 2. Firstly, this study identified a relevant fairness measure considering the service contexts and goals of the Bridges MCHN program. Using the measure, this study examined whether the machine learning model treats children differently depending on their maternal and birth characteristics. Then, the study drew upon the idea of intersectionality and tested whether the intersectionality of maternal race/ethnicity and nativity is associated with model performance. Compared to the Bridges pre-screening tool, the machine learning model could successfully identify more children who would experience substantiated maltreatment in general, but particularly for those born to mothers who were Black, under age 20, and without paternity established at birth. However, the model was found to be less effective in assessing the risk for children born to foreign-born Hispanic mothers.