ASSOCIATIONS OF AMBIENT AIR POLLUTION EXPOSURES WITH
PERCEIVED STRESS IN THE MADRES COHORT
by
Jeremy K. Yu
A Thesis Presented to the
FACULTY OF THE USC KECK SCHOOL OF MEDICINE
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
MASTER OF SCIENCE
BIOSTATISTICS
May 2021
Copyright 2021 Jeremy K. Yu
This thesis is dedicated to my family
for all of their love and support,
and for fostering a belief that the cause that is unknown should be searched out.
Acknowledgments
I would like to say a big thank you to my thesis advisor, Dr. Rima Habre, for her guidance
throughout the research process, helping to develop my interest in environmental health topics,
and being enthusiastic about including and connecting me with the rest of the MADRES research
team. Many thanks to that team, who attended my presentations on this research and made
suggestions. I would also like to express my gratitude to Dr. Kimberly Siegmund and Dr.
Meredith Franklin, two of my professors who were also sources of encouragement even before
joining my thesis committee and offering valuable feedback. I also wish to acknowledge the
participants of the MADRES cohort, who provided their information for the purpose of research.
TABLE OF CONTENTS
Dedication
Acknowledgments
List of Tables
List of Figures
ABSTRACT
1. Introduction
2. Methods
2.1. Subjects
2.2. Air Pollution Measures
2.3. Perceived Stress Measure
2.4. Imputation
2.5. Overview of Statistical Analysis
2.6. Confounding
2.6.1. Financial and socioeconomic status, and principal component analysis
2.6.2. Meteorology and principal component analysis
2.6.3. Variable-selection procedure
2.7. Logistic Regression Assumptions
2.8. Selecting Candidates for Final Models
2.9. Sensitivity Analysis
2.10. Effect Modification Assessment
2.11. Spatial Analysis
3. Results
3.1. Variable Selection
3.2. Multicollinearity
3.3. Linearity
3.4. Influential Observations
3.5. Final Preliminary Fixed-effects Binary Logit Models
3.5.1. NO2 models
3.5.2. O3 models
3.5.3. PM2.5 models
3.5.4. PM10 models
3.5.5. CALINE4 Total NOx models
3.5.6. CALINE4 Freeway NOx models
3.5.7. CALINE4 Major-road NOx models
3.5.8. CALINE4 Minor-road NOx models
3.6. Revised Fixed-effects Binary Logit Models
3.7. Sensitivity Analysis
3.8. Effect Modification
3.9. Spatial Analysis
4. Discussion
REFERENCES
List of Tables
Table 1: Descriptive statistics of selected variables, by cohort entry point
Table 2: Polychoric correlation matrix of individual-level binary or ordinal indicators of socioeconomic status
Table 3: Loadings of the first two PRINCALS components of individual-level socioeconomic status indicators, and percentages of variance explained
Table 4: Pearson (and Spearman) correlation matrix of geographic (CalEnviroScreen 3.0) indicators of socioeconomic status
Table 5: Loadings on the first two principal components of the geographic (CalEnviroScreen 3.0) indicators of socioeconomic status, and percentages of variance explained
Table 6: Smallest Pearson (and Spearman) correlation coefficients for n-day mean daily meteorological estimates at residential locations in the third trimester of pregnancy
Table 7: List of covariates before automated confounder selection
Table 8: Confounders selected for each air pollutant using augmented backward elimination-based procedure
Table 9: Generalized variance-inflation factor: for each pollutant, maximum GVIF^(1/(2ν)) over all exposure windows and variables, and mean of window highest GVIF^(1/(2ν))
Table 10: Generalized variance-inflation factor: for NO2, highest and second-highest GVIF^(1/(2ν)) values and variables by n-day exposure window
Table 11: Generalized variance-inflation factor: for squared O3 level, highest and second-highest GVIF^(1/(2ν)) values and variables by n-day exposure window
Table 12: DFBETAS summary statistics by pollutant
Table 13: Revised main-effects models
Table 14: Variations on revised log2 7-day PM2.5 model
Table 15: Effect modification: stratified estimates of association of log2 of 7-day mean PM2.5 level with binary perceived stress, by depression, meteorological or seasonal variable
List of Figures
Figure 1: Proportions of variance accounted for by principal components of selected n-day mean meteorological variables, by selected n-day window
Figure 2: Loadings on the first principal component (eigenvector elements) of the n-day mean selected meteorological variables, by n
Figure 3: Mean score on PC1 of n-day meteorological variables, versus PSS month, by n
Figure 4: NO2 binary logit model of T3 perceived stress
Figure 5: O3 logit model of T3 perceived stress
Figure 6: Short-term O3 logit model of T3 perceived stress
Figure 7: PM2.5 logit model of T3 perceived stress
Figure 8: Short-term PM2.5 logit model of T3 perceived stress
Figure 9: PM10 binary logit model of T3 perceived stress
Figure 10: Short-term PM10 logit model of T3 perceived stress
Figure 11: CALINE4 Total NOx logit model of T3 perceived stress
Figure 12: Short-term CALINE4 Total NOx logit model of T3 perceived stress
Figure 13: CALINE4 Freeway NOx logit model of T3 perceived stress
Figure 14: Short-term CALINE4 Freeway NOx logit model of T3 perceived stress
Figure 15: CALINE4 Major-road NOx logit model of T3 perceived stress
Figure 16: CALINE4 Minor-road NOx logit model of T3 perceived stress
Figure 17: Estimated log-odds of high perceived stress vs. log2 7-day PM2.5
Figure 18: Sensitivity analysis … for associations of an unmeasured binary confounder with exposure level and outcome
Figure 19: Geographically weighted logistic regression … one-half of optimal adaptive bandwidth
Figure 20: Geographically weighted logistic regression … one-quarter of optimal adaptive bandwidth
Abstract
Background
Observational research has shown associations of air pollutants (ambient particulate matter smaller than 2.5 µm in aerodynamic diameter [PM2.5], personal nitrogen dioxide [NO2], and near-roadway nitrogen oxides [NOx]) with perceived stress, as measured by different versions of the Perceived Stress Scale (PSS), in older adults in the northeastern United States and Western Europe, and in children in Southern California, respectively. Observational evidence has also connected perceived stress to obesity in mothers of young children with lower socioeconomic status. Murine experiments have shown that air pollution could change brain, cognitive or mental health at predetermined exposure intensities and for exposure durations longer than five days. However, for particular pollutants, the exposure time window over which ambient concentrations have the largest association with perceived stress in humans has not been determined. Also, the relationship between air pollution and perceived stress among lower-income pregnant persons in Los Angeles County might be particularly relevant to addressing the causes of obesity-related health outcomes in known environmental health disparities populations, such as the Hispanic and African or Black American communities. As such, we investigated the relationship between multiple pollutants and perceived stress in the Los Angeles Maternal and Developmental Risks from Environmental and Social Stressors (MADRES) pregnancy cohort, for several short and long periods of exposure estimated for participants’ locations of residence.
Methods
A total of 426 MADRES cohort participants who took the 10-item Perceived Stress Scale in their third trimester of pregnancy in 2016-2019 were selected to develop and fit binary logistic regression models of perceived stress (median-split PSS score). The regressor of interest was the mean air-pollutant level over the n days before the PSS administration date, for n = 1, 2, …, 9, 10, 15, 20, 25, 30, 45, 60, 75, 90, 105, 120. The pollutants examined were residential ambient NO2, ozone (O3), PM2.5, particulate matter less than 10 µm in aerodynamic diameter (PM10), and road-class-specific NOx that had been estimated using the CALINE4 line-source dispersion model. Principal component analysis was used to create approximate latent measures of meteorology and socioeconomic status. Confounders were selected using a priori information, followed by augmented backward elimination. Maternal age, time having lived in the United States, ethnicity, language, race, individual-level and geographic indicators of socioeconomic status, marital status, meteorological variables, month of PSS and physical activity were identified as possible confounders prior to automated selection. Clinical depression risk, meteorology and seasonality were considered as effect modifiers. After adjusting for multiple testing, the logistic models in which the pollutant parameter estimate was significant were evaluated by simulating a binary unmeasured confounder. The same models were evaluated for evidence of spatial non-stationarity and geographic confounding using geographically weighted logistic regression.
Results
There was a significant (adjusted P=.04) association between the base-2 logarithm of 7-day mean PM2.5 concentration and dichotomous perceived stress. In initial exploratory models, the association of PM2.5 with perceived stress was not significant for the very shortest exposure windows (one, two and three days) or for windows longer than seven days (unadjusted P>.05). The estimated odds ratio for a doubling of the 7-day PM2.5 level was 1.99 (95% CI 1.02, 4.07), where the Bonferroni-adjusted 95% confidence interval was calculated for 21 tests (a 99.76% CI). There was no significant spatial variation of the parameter estimate with either an optimal kernel bandwidth or selected non-optimal bandwidths. Also, no evidence of extreme geographic confounding was obtained. The estimated OR remained greater than 1 in simulations of an unmeasured confounder with various prevalences and associations with exposure and outcome, although there were exceptions involving an OR (association of the unmeasured confounder with the outcome) greater than 2.5 and an estimated difference of means (association of the unmeasured confounder with exposure level) of between approximately 20% and 30%, or between about 0.25 and 0.40 in log2 of the PM2.5 level. Stratification suggested that a positive association between 7-day PM2.5 and perceived stress might be larger in warmer months or at higher temperatures, but all of the interaction terms were nonsignificant.
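The arithmetic behind three numbers above can be checked with a short sketch, assuming only standard identities: the 99.76% level is 1 − 0.05/21; with log2(PM2.5) as the regressor, a doubling raises the regressor by exactly 1, so the OR per doubling is exp(β); and a difference d on the log2 scale corresponds to a ratio of 2^d on the original scale. The function names are hypothetical, and the normal critical value is shown only for the case of a Wald interval, which is an assumption since the interval method is not stated here.

```python
import math
from statistics import NormalDist


def bonferroni_level(alpha, m):
    """Per-test confidence level that keeps the familywise level at alpha over m tests."""
    return 1 - alpha / m


def or_per_doubling(beta_log2):
    """Odds ratio for a doubling of exposure when the model uses log2(exposure).

    Doubling x raises log2(x) by exactly 1, so the log-odds change by beta.
    """
    return math.exp(beta_log2)


def log2_diff_to_percent(d):
    """Percent difference (as a ratio) on the original scale implied by a log2 difference d."""
    return (2 ** d - 1) * 100


level = bonferroni_level(0.05, 21)
# Two-sided normal critical value; relevant only if a Wald interval is assumed.
z = NormalDist().inv_cdf(1 - (1 - level) / 2)
print(f"{level:.2%}")                                # 99.76%
print(round(or_per_doubling(math.log(1.99)), 2))     # 1.99 (round trip from the reported OR)
print(round(log2_diff_to_percent(0.25), 1),
      round(log2_diff_to_percent(0.40), 1))          # 18.9 32.0
```

The last line shows that 0.25 to 0.40 log2 units correspond to roughly 19% to 32% higher levels, in line with the "approximately 20% and 30%" stated in the text.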
Discussion
Overall, we found evidence of a positive association of 7-day mean ambient PM2.5 with perceived stress. Results were nonsignificant for shorter or longer windows and for other pollutants, or inconclusive due to the method of ceasing exploration of an exposure window when an unadjusted P-value less than .05 was obtained with a preliminary confounder set common to all windows.
1. Introduction
The purpose of this thesis is to contribute to understanding obesity and obesity-related health outcomes in Hispanic or Latinx (including but not limited to Chicana/Chicano) and African or Black American communities, by studying the association between air pollutants and psychological stress in participants of the Maternal And Developmental Risks from Environmental and Social Stressors (MADRES) cohort in Los Angeles, California. The MADRES cohort is a federally funded research center and pregnancy cohort that considers how various environmental and social exposures during developmental periods might be related to maternal and child obesity, one of several aspects of the health of underserved communities and one whose effects are often more serious in these communities. While neither body mass index (BMI) nor any other measure of obesity was included in the analysis in this thesis, this thesis focuses on a set of questions (one for each of several specific air pollutants) concerning the relationship between the environment and behavioral or mental states, with their possible physiological consequences. For each of ambient nitrogen dioxide (NO2), ozone (O3), fine particulate matter (with an aerodynamic equivalent diameter less than 2.5 micrometers, PM2.5), particulate matter with an aerodynamic diameter under 10 micrometers (PM10), traffic-related near-roadway contributions to nitrogen oxides (NOx) from three road classes (freeways and highways, major roads, and minor roads), and the sum of these three, is there a statistically significant association between the pollutant and perceived stress in the third trimester of pregnancy in the MADRES cohort? If so, this might have a bearing on associations between air pollution, perceived stress and obesity in other studies. This is the background against which the associations of air pollutants with perceived stress were studied.
The MADRES cohort is an ongoing pregnancy cohort whose study design and initial sample characteristics have been described elsewhere,[1] but we highlight that the cohort study continues to recruit from partner clinics in the City of Los Angeles, California, serving many individuals in a low-income classification. Presumably virtually all participants were assigned female at birth. The vast majority of cohort participants have indicated Hispanic ethnicity and/or Black or African American race, the latter being a categorization less common than either Hispanic or non-Hispanic ethnicity. The sample size for certain pairings of ethnicity and race, such as persons who are both non-Hispanic and Asian, could be too small for some statistical analyses. At the same time, the MADRES cohort presents an opportunity to adjust for other covariates available in the cohort dataset and examine the adjusted associations between air pollutants and perceived stress within a sample that might be considered more culturally, ethnically or socioeconomically homogeneous than other groups, despite internal differences such as the wide variety of individuals who might be classified as African or Black American, Hispanic, or low-income. Most MADRES participants belong to broad categories that might be statistically underrepresented in some other studies.
Among many other assessments or collections at time points during pregnancy or after birth, MADRES participants agree to take the Ten-item Perceived Stress Scale (PSS-10) during the third trimester of pregnancy (as well as during each of the previous trimesters). There are various versions of the Perceived Stress Scale (PSS), each of which has several five-point Likert-scale items. Together, the items on a PSS questionnaire are intended to measure psychological stress. In particular, the PSS asks respondents about their beliefs regarding their emotions and feelings with respect to stress in their lives or their ability to manage that stress. One example of the questions on the PSS is, “In the last month, how often have you been upset because of something that happened unexpectedly?” Another example on the PSS-10 is, “In the last month, how often have you felt difficulties were piling up so high that you could not overcome them?” The unexpected thing causing upset, or the difficulties, are left unspecified, and the queried frequencies depend on the individual’s own perception of their emotions or feelings as well as on their definition at the time of words such as “unexpectedly” and “overcome.” Answer choices range from “never” to “very often.” The PSS questions differ in scope and type from items on other, seemingly related assessments MADRES participants take, such as “I find weight gain during pregnancy troubling” in the Prenatal Distress Questionnaire (PDQ), and the even more objective “A close family member was very sick and had to go into the hospital” in the Pregnancy Risk Assessment Monitoring System (PRAMS) stressful life events questionnaire. As others have noted, the PSS does not ask about specific individual stressful events, but rather explores the respondent’s own appraisal of their experience of stress in general.[2,3] The appraisal could change even if the types and numbers of stressful events do not.
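To make the step from item responses to the binary outcome concrete, the sketch below assumes the standard PSS-10 coding, which this section does not spell out: each response coded 0 ("never") to 4 ("very often"), positively worded items 4, 5, 7 and 8 reverse-scored, and a total of 0 to 40. The median-split rule (scores strictly above the sample median coded as high) is a hypothetical choice, since the tie handling used in the thesis is not described here.

```python
from statistics import median

# Assumed standard PSS-10 coding: 0 = "never" ... 4 = "very often";
# positively worded items 4, 5, 7 and 8 are reverse-scored.
REVERSED_ITEMS = {4, 5, 7, 8}


def pss10_score(responses):
    """Total PSS-10 score (0-40) from ten responses coded 0-4, in item order."""
    if len(responses) != 10 or any(r not in range(5) for r in responses):
        raise ValueError("expected ten responses coded 0-4")
    return sum(4 - r if item in REVERSED_ITEMS else r
               for item, r in enumerate(responses, start=1))


def median_split(scores):
    """Binary outcome: 1 if a score is strictly above the sample median, else 0.

    Hypothetical tie rule: scores equal to the median are coded 0.
    """
    m = median(scores)
    return [int(s > m) for s in scores]


print(pss10_score([0] * 10))           # 16: only the four reversed items contribute
print(median_split([10, 16, 20, 24]))  # [0, 0, 1, 1]
```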
In a study that included only persons who already had a child younger than five years and did not include anyone who was pregnant, structural equation modeling showed that score on the fourteen-item version of the PSS (PSS-14) was positively associated with severe obesity independently of certain eating behaviors (emotional, uncontrolled or cognitively restrained eating) and diet quality.[4] The sample consisted of mothers with low income in North Carolina, a plurality of whom were classified as non-Hispanic Black. Since perceived stress might be independently important in explaining obesity, the risk factors for perceived stress might themselves be important. In addition to PSS scores, the MADRES dataset includes air pollutant concentrations and many other variables that might precede and be associated with perceived stress.
The PSS-10, which the MADRES cohort uses, has been reported to have good psychometric properties in comparison with the PSS-14 and a four-item PSS (PSS-4).[2] A question has been raised as to whether PSS-10 item responses should be used in analyses by summing all of the responses and interpreting a single score, or whether subsets of the PSS-10 items measure different things (such that a perceived helplessness subscale score and a perceived self-efficacy subscale score, for example, should be included separately or together in regression analysis),[3] but this critique would apply to how the PSS-10 is used in practice in many different studies and settings. A 2019 study validated the PSS-10 for use with English-preferring and Spanish-preferring Hispanic Americans.[5] In the validation study, participants were recruited from Southern California. The MADRES cohort focuses more specifically on people who live in Los Angeles County, a highly urbanized area smaller than Greater Los Angeles (which includes additional rural and suburban areas). Though the version of the PSS differs, the above-mentioned North Carolina study’s finding regarding PSS-14 score and severe obesity might be relevant. If it is, score on the PSS-10 could be viewed as measuring a possible mental intermediary between an air pollutant and obesity. The environmental–mental portion of a hypothetical pathway between specific air pollutants and obesity has already been studied, in a much different demographic and geographic context.
A 2015 study examined the relationship between residential air pollution and perceived stress in hundreds of older males (mean age 69 years), almost entirely white, who were participating in the Veterans Administration Normative Aging Study (NAS) cohort in 1995-2007. Most of the study’s subjects were living in or around Boston, Massachusetts. The study found a positive and significant (P<.05) linear association between NO2 and PM2.5 levels averaged over one month or less and perceived stress, after adjusting for covariates including age, education, race, meteorology, physical activity, and seasonality. For example, a 4.7 μg/m³ (study-sample interquartile range) increase in one-week PM2.5 was associated with a 0.5-point (95% CI: 0.2, 0.9) increase in PSS-14 score, and a 0.006 ppm rise in one-week NO2 was associated with a 0.8-point (95% CI: 0.4, 1.2) increase in score. Significant pooled (not season-specific) associations were reported for black carbon (which is traffic-related) as well, but not for O3. The study used the PSS-14.[6]
The existing dataset from which this thesis draws includes many variables similar to the covariates in the NAS, including daily meteorological variables estimated for MADRES participants, as well as daily 24-hour NO2, O3, PM2.5 and PM10 concentrations based on measurements taken at multiple air quality monitoring stations. This dataset also includes daily traffic-related NOx levels that were estimated for participants’ home addresses using the CALINE4 line-source Gaussian dispersion model: freeway NOx, non-freeway major-road NOx, minor-road NOx, and total traffic-related NOx (the sum of the previous three components). The extent and granularity of the dataset make it possible to study associations for lags and moving averages of a few or many specific days, from the day of taking the PSS to several months before PSS administration. The MADRES dataset thus presents an opportunity to explore air pollutant–perceived stress associations in a new study population and for exposure windows shorter or more recent than those previously examined in other settings.
For the present study, essentially a single time point was considered: a day during a MADRES participant’s third trimester, which might or might not overlap with another participant’s third trimester. Though observations in the first two trimesters were available, this thesis’ analysis included only one measurement of perceived stress per individual, the third-trimester PSS score. Adjusted associations were estimated for simple lagged averages of pollutant levels over the n days immediately preceding, but not including, the day of third-trimester PSS administration, for n = 1, 2, …, 9, 10, 15, 20, 25, 30, 45, 60, 75, 90, 105, 120 (that is, daily for the last ten days, every five days for the last thirty days, and every fifteen days for the last 120 days), a total of twenty overlapping exposure windows.
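The exposure windows described above can be sketched as follows. This is a minimal illustration, not the thesis's code: the input format (a dictionary from calendar date to daily level) and the rule of returning no value when any day in a window is missing are hypothetical choices. The key detail from the text is that each window covers the n days immediately before the PSS date and excludes the PSS day itself.

```python
from datetime import date, timedelta

# The twenty overlapping window lengths n (days) listed in the text.
WINDOWS = list(range(1, 11)) + list(range(15, 31, 5)) + list(range(45, 121, 15))


def lagged_mean(daily_levels, pss_date, n):
    """Mean pollutant level over the n days before pss_date, excluding pss_date itself.

    daily_levels is a hypothetical dict mapping datetime.date -> daily level.
    Returns None when any of the n days is missing (an assumed rule).
    """
    days = [pss_date - timedelta(days=k) for k in range(1, n + 1)]
    if any(d not in daily_levels for d in days):
        return None
    return sum(daily_levels[d] for d in days) / n


# Toy series for illustration only: each day's level equals its lag in days,
# so the PSS day itself has level 0 and is excluded from every window.
pss_day = date(2018, 6, 15)
levels = {pss_day - timedelta(days=k): float(k) for k in range(0, 121)}
exposures = {n: lagged_mean(levels, pss_day, n) for n in WINDOWS}
print(len(WINDOWS), exposures[3], exposures[120])  # 20 2.0 60.5
```

Because every longer window contains every shorter one, the twenty exposure measures for a participant are strongly correlated, which matters when several windows are tested for the same pollutant.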
The purpose of this thesis is not to establish causation, and the analysis is not based on a complete causal diagram. We also caution against inferring a causal relationship because this is an observational study with a single time point. It is a cross-sectional analysis in which the outcome assessed by the PSS might have preceded an air pollution exposure. Also, daily air pollutant levels and means for days before the PSS date were measured after the PSS was taken. However, we emphasize that the goal here is not predictive. In predictive modeling, causal relationships among outcome and predictor variables do not generally need to be considered. In this thesis, the goal is to estimate a single association for each pollutant, and this involves concepts such as confounding that, in recent methodological thinking, are difficult to separate from notions of causality. In considering a number of models (one for each exposure window) for the same pollutant, as mentioned above, one runs the risk of a false positive, so it is helpful to consider whether a causal relationship would even be consistent with psychology and biology.
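The size of the false-positive risk from testing many windows can be illustrated with the familywise error rate. The sketch below assumes independent tests, which overlapping exposure windows are not; positively correlated tests give a lower true familywise rate than the independence figure, while the Bonferroni correction still controls it. The count of twenty windows and the .05 level are used for illustration only.

```python
def familywise_error(alpha, m):
    """Chance of at least one false positive among m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m


# Twenty exposure windows per pollutant, each tested at alpha = .05.
print(round(familywise_error(0.05, 20), 3))       # 0.642
# Bonferroni: testing each window at alpha / m keeps the familywise rate below alpha.
print(round(familywise_error(0.05 / 20, 20), 4))  # 0.0488
```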
In this context, there seems to be both a precedent with other psychological outcomes and a plausible biological basis for an association between air pollution and perceived stress. Though not focusing on the association between air pollution and perceived stress directly, a review of evidence suggests that long-term PM2.5 exposure is associated with anxiety (symptom scale score) and psychotic experiences.[7] PM2.5 might then be associated with related conditions, even though anxiety and perceived stress (which is not a mental disorder) can involve different physiological or psychological pathways and can have different features or effects. For anxiety, two referenced studies examined the association in samples who tended to be much older than reproductive ages for females (all participants older than 56 years in both cases, with a mean age of 70 or 71). Both studies looked at multiple exposure windows and found the largest association for six months[8] or twelve months[9] prior to the assessment of anxiety. Though the context was older adults, the estimated association always became smaller when moving from the window of largest association to shorter or longer exposure windows, suggesting a peak in association size. For schizophrenia, a referenced study examined annual NO2, NOx and PM2.5 exposures among older teenagers and found significant (P<.05) associations with psychotic experiences.[10] A different study looked at very short-term one-day and cumulative lags for NO2 across a variety of ages and found several significant (P<.05) associations with schizophrenia hospital admission.[11] The same review, by Hahad et al., describes experimental evidence of NO2 and PM2.5 exposure causing anxiety-like behavior and memory impairment in mice or rats. There is also experimental evidence of nanoscale-particle exposure being associated with abnormal inflammation and oxidative stress.
In a much different setting (young children in a city in Korea), but using estimated personal PM2.5 exposure (directly measured in classrooms and homes, and indirectly measured for outdoor and other indoor locations), the association between daily PM2.5 exposure and a measure of anxiety was found to be nonsignificant (P=.24).[12] In fact, the point estimate of the association was negative. A linear mixed-effects model was used. The sample was about fifty children, and PM2.5 exposure and anxiety were measured every day for a week. The study noted sources of PM2.5, associations between PM2.5 measurements for different locations, and associations of cooking stove type with PM2.5 (not disaggregated) concentration, but, like many other PM2.5 studies, did not analyze the distinct associations between PM2.5 constituents and the dependent variable.
For the present study, pollutant composition was not considered directly. However, we
considered a type of effect modification by spatial location through the use of geographically
weighted regression. We also considered temporal effect modification by fitting models with
interaction terms for season. Geographic location and seasonality might impact the association
between an air pollutant and perceived stress for reasons that have nothing to do with the
composition of the pollutant. For example, a neighborhood might have a built environment (such
as one including more trees) mitigating the impact of the pollutant on perceived stress. However,
if the composition of a pollutant depends on geographic location or the season, addressing the
interactions might address pollutant composition as an effect modifier indirectly and partially.
It is possible that the association between a pollutant and perceived stress varies from location to location because of differences in the source of the pollutant, and a source might be either an effect modifier or a confounder. Construction, the electric power industry, manufacturing and transportation all directly or indirectly contribute to ground-level nitrogen oxides, ozone, and particulates. In the case of the chemical-substance (non-mixture) pollutants NO2 and O3, there might seem to be no reason for a model coefficient on the pollutant level to vary between locations, but the local sources of NO2 or O3 might themselves contribute to perceived stress separately from any biological effect of the pollutant itself. As an extreme example that also has a seasonal or temporal aspect, lawn and garden equipment and wildfires both contribute to NO2 and O3. In the absence of other sources, the level of one of these pollutants could represent lawn and garden equipment use that is less positively or even negatively associated with perceived stress. The equipment use could be a confounder.
With chemically diverse pollutants, it is more obvious that sources might play a role as an effect modifier. Within NOx, the ratio of nitric oxide (NO) to NO2 emitted from vehicles can vary with the proportions of vehicle or emission control technology types and even with the age of a vehicle,¹³ and the effects of NO and NO2 might differ. Particulate matter as small as PM2.5 can include a wide variety of material, including metals, nitrates, and organics. While the composition and sources of particulate matter can both vary considerably, the evidence regarding the way in which PM2.5 composition is associated with health (apart from laboratory or hypothetical/ideal conditions) is still growing. An editorial¹⁴ noted a study¹⁵ in which interquartile-range increases in components of previous-day PM2.5 were associated with different percentage increases in non-accidental mortality across several dozen U.S. cities at the beginning of the twenty-first century: elemental carbon (0.22; 95% posterior interval: 0.00, 0.44), organic carbon matter (0.39; 95% PI: 0.08, 0.70), total PM2.5 (0.30; 95% PI: 0.11, 0.50), and nitrate (0.07; 95% PI: -0.10, 0.24), the last being an example with a posterior interval including zero. Commenting on a study¹⁶ of whether the association of PM2.5 with emergency room visits for respiratory illness was modified by oxidative potential, the same editorial noted a finding that glutathione-related oxidative potential did not in general significantly modify (between cities) the association of three-day mean PM2.5 concentration with emergency room visits, but that the interaction term was significant (P<.05) when the data were subset to include only observations with a PM2.5 level ≤10 μg/m³. PM2.5 oxidative potential is thought to be associated with respiratory tract inflammation and asthma symptoms, which might in turn be associated with perceived stress even if PM2.5 oxidative potential does not cause perceived stress directly.
PM2.5 composition can vary temporally as well as spatially. If PM2.5 with a particular composition is associated with perceived stress, and the composition of PM2.5 depends on the season, seasonality could appear to be an effect modifier. A study in an urban area of Greater Cairo, Egypt, suggested that the composition and biological effects of PM2.5 might have varied with the season.¹⁷ PM2.5 was found to have higher levels of polycyclic aromatic hydrocarbons in both the summer and the winter. Autumn and winter PM2.5 were associated in the lab with a higher level of interleukin 6, a cytokine that has both pro-inflammatory and anti-inflammatory roles. Summer PM2.5 was associated in the lab with a disruption of cell division despite an apparent preservation of cell viability. Season might then appear as an effect modifier of the association between PM2.5 and various health outcomes because of the season-specific composition of PM2.5.
Duration might play a role, not just the date. Previous studies of psychological outcomes have each considered a small number of differing air pollution exposure lags and averaging periods. Observational and experimental studies have typically considered exposures over periods of between one and four weeks, inclusive. The observational Mehta et al.⁶ study in Massachusetts examined the association between ambient levels of multiple pollutants (including NO2, O3, and PM2.5) and perceived stress (PSS-14 score) for one, two and four weeks of exposure time. For NO2 and PM2.5, 95% confidence intervals contained only positive values for all exposure windows. Season-specific (cold or warm) associations of NO2 with perceived stress were significant, as were the cold-season associations of PM2.5, but the warm-season associations of PM2.5 were not significant. An observational study in Belgium, which used a linear mixed-effects model to examine the association between personal NO2 and PSS-10 score for a single, repeated exposure duration, reported the absence of a significant association.¹⁸ In 2013-2014, each participant had worn an NO2 diffusive sampler for the five days before the PSS was administered. There were six time points and twenty participants. Much longer exposure windows have also been considered in observational studies. Annual total and freeway near-roadway NOx (in 2010 and 2012) were each positively associated with PSS-4 score among more than two thousand children in the Southern California Children’s Health Study cohort.¹⁹
Related outcomes have been studied for widely varying pollution exposure durations. Though it is possible to have the highest PSS score without having anxiety, the association between PM2.5 and anxiety symptoms (a dichotomization of an anxiety scale score) was examined in older Americans for several windows ranging from one week to four years.⁸ For the cutoff point used, the association was positive and significant (P<.05) for all five of the exposure windows considered, but it was largest for the 180-day moving average. A study of 12-year-old children in England examined the association between annualized ambient NO2 and PM2.5 levels and concurrent and age-18 anxiety symptoms; it did not find any of the associations with anxiety symptoms to be significant.²⁰ Extremely short exposure windows have also been examined. Among hundreds of residents of Nanjing, China, an effect of instant ambient PM2.5 exposure on a mental stress scale score was not found to be significant (P>.05).²¹ The scale was originally intended to be a measure of phobic anxiety.²² More relevant to the present study, a study with pregnant women in Shanghai examined associations of NO2 and PM10, respectively, with a binary measure of emotional stress (a dichotomization of the Symptom Checklist-90-Revised Scale score), for 1-, 3-, 6-, 8- and 15-day exposure windows that included the day of psychological assessment.²³ 1-day PM10 and 5-day NO2 were positively and significantly (P<.05) associated with emotional stress in single-pollutant models. However, within neither the cold season nor the warm season was the NO2 association with emotional stress significant (P>.05). Within the cold season, the 95% confidence interval for 1-day PM10 contained only odds ratios greater than 1. The score used, the Global Severity Index, was a measure of overall distress. Associations with anxiety symptom score, and with Life Event Scale for Pregnant Women (LESPW) total score, were reported to be non-significant (P>.05). The LESPW dealt with specific events. We are not aware of any published air pollution study focusing on pregnant women’s perceived stress apart from specific events, for multiple exposure windows shorter than one week. Databases searched include Embase, MEDLINE, and Web of Science (all databases).
Observational studies such as those described can lead to false positives, or to findings of significant associations that do not represent causal relationships even if the associations are truly present in the population. Laboratory studies with non-human animals have suggested the biological plausibility of causal associations of air pollution with behavioral or cognitive conditions that might in humans be related to perceived stress, for exposure intensities decided in advance (unlike ambient exposures in humans) and typically for durations longer than a week. The review by Hahad et al.⁷ cites several experimental studies that examined associations between air pollution and brain, cognitive or mental health for specific exposure times, in mice or rats. In a study²⁴ of co-exposure to multiple air pollutants, mice were exposed for 28 days to 0.2 mg m⁻³ of NO2 and 0.5 mg m⁻³ of SO2 (dynamic inhalation) for 6 h/d, and every other day to 1 mg kg⁻¹ PM2.5 (oropharyngeal aspiration). Another group was similarly exposed for 28 days to 2 mg m⁻³ NO2, 3.5 mg m⁻³ SO2, and 3 mg kg⁻¹ PM2.5. In comparison to the filtered air and saline to which a control group was exposed, the higher co-exposure was negatively and significantly (P<.05) associated with measures of spatial learning and memory. The authors visually discerned damage to neuronal mitochondria in the higher-co-exposure group, as well as cortical changes associated with inflammation and cell death. Expression of certain apoptosis-related proteins was significantly (P<.05) associated with high co-exposure versus the control conditions. In a study²⁵ with rats, PM2.5 was administered at about 306 ± 242 μg/m³ for 6 h/d, on each of six days per week, for twelve weeks. PM2.5 was also administered at about 48 ± 24 μg/m³ for the same times. In comparison with filtered air, the lower and higher exposures were negatively and significantly (P<.05) associated with total distance moved in an open field test. Higher exposure was negatively and significantly (P<.05) associated with other indicators of depressive behavior, such as latency to feeding, and with time in the center of the field, an indication of anxiety-like behavior. In addition, the lower and higher PM2.5 exposures were each positively and significantly (P<.05) associated with expression of proteins related to inflammation. Wild-type and Nrf2-knockout mice were used to show that the transcription factor played a role in enabling a difference in depressive behavior and inflammation-related protein expression between exposed and non-exposed mice. In a different murine study,²⁶ a two-week administration of NO2 at 0.9 ppm, in combination with CO and CO2 (a mixture representing vehicle exhaust), for 5 h/d was positively and significantly (P<.05) associated with some anxiety-like behavior measures (such as less time in the open arms of the elevated plus maze), and with indicators of oxidative stress and mitochondrial impairment. In addition, rats have been exposed to O3 at 0.3 ppm for 4 h/d for fifteen days; the exposure was negatively and significantly (P<.05) associated with time in the open arms of the elevated plus maze and with a measure of object recognition memory.²⁷
We considered exposure periods shorter or longer than those dealt with in many
observational or experimental studies: exposure windows as short as one day and as long as 120
days. 120 days approximately coincides with the normal life span of red blood cells. Other cells
or units, such as mitochondria, can last for much shorter periods.
2. Methods
2.1. Subjects
In this study of associations of air pollutants with perceived stress, the subjects were a subset of
the larger MADRES study’s participants. The MADRES dataset as of July 1, 2020, was used. All
and only participants who took the PSS in the third trimester, were living in Los Angeles County
at that time, and were assigned air pollution measurements in the dataset, were included.
Among the three pregnancy trimesters, the third was chosen to maximize the number of
observations. Some participants either left the study before the third trimester, but picking the
third trimester made it possible to include late-entry participants. Late-entry participants enrolled
at less than 20 weeks gestational age but had their first study visit at 28-36 weeks gestation.
Participants outside of Los Angeles County were excluded to avoid the influence of isolated
observations in spatial analysis. This exclusion was done regardless of whether most of a
participant’s residential history was in Los Angeles during any or all of the exposure windows
considered.
As a result, a total of 426 participants with either of two cohort entry points were included: 117 late-entry participants and 309 regular-entry participants, who were enrolled at less than 20 weeks. Nine
participants were excluded for not residing in Los Angeles County at the time of PSS
administration. 84 participants (16 late-entry and 68 regular-entry) were excluded for not having
a record in the air pollution portion of the dataset. Among the included participants, the earliest
date of third-trimester PSS administration was in May 2016. The latest date was in October 2019.
The interquartile range of PSS dates was 462.5 days, about one and a quarter years.
Late-entry and regular-entry participants were judged to be similar in the variables
considered based on the independent samples t-test, the Mann–Whitney U test, and the
Kolmogorov–Smirnov test (for continuous variables); or the chi-square test of association and
Fisher’s exact test (for categorical variables). Exceptions were ethnicity and race. A higher
proportion (17%, compared with 9%) of late-entry participants were Black or African American,
and a lower proportion were Hispanic (69%, compared with 79%). However, effect modification
by race was considered, rather than effect modification by entry point. Table 1 partially describes
the sample, which is representative of the larger sample described in Bastain et al.¹ Among the
intersections not shown, 35 members of the 426-person sample are both non-Hispanic and white.
Table 1: Descriptive statistics of selected variables, by cohort entry point
2.2. Air Pollution Measures
The existing MADRES dataset provided daily measurements of air pollution. These were 24-hour means linked to MADRES participants’ residential locations. The present study considered ambient O3, NO2, PM10, and PM2.5, which were regional estimates that had been obtained by inverse distance weighted interpolation from measurements at Environmental Protection Agency air quality monitoring sites, and CALINE4 dispersion model freeway, non-freeway major-road and minor-road NOx, and their sum. (The variable names in the original dataset were O3_24h, NO2_24h, PM10_24h, PM25_24h, FwyHwy_NOx, nFwyMjr_NOx, nFwyMnr_NOx, and Total_NOx, respectively.) Simple averages of n consecutive daily measurements, for days before the day of PSS administration, were treated as an exposure for n = 1, 2, …, 9, 10, 15, 20, 25, 30, 45, 60, 75, 90, 105, 120. These n-day variables were named so that two-day mean NO2 concentration, for example, was MEAN2_NO2_24h.
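The window construction described above might be sketched in Python as follows. This is a minimal illustration, not the code used for the thesis (the analyses were done in R). It assumes that one participant's daily 24-hour means are stored in a dictionary keyed by calendar date, that the day of PSS administration itself is excluded from every window, and that a window with any missing day is left undefined; the function name n_day_means is hypothetical. Only the MEAN<n>_<variable> naming pattern and the list of window lengths come from the text.

```python
from datetime import date, timedelta
from statistics import fmean

# Window lengths n (days) listed in the text.
WINDOWS = list(range(1, 11)) + [15, 20, 25, 30, 45, 60, 75, 90, 105, 120]


def n_day_means(daily, pss_date, pollutant="NO2_24h"):
    """Return {"MEAN<n>_<pollutant>": mean} for each window length n.

    daily: dict mapping datetime.date -> 24-hour mean for one participant.
    Each window covers the n days immediately before pss_date (assumption:
    the PSS day itself is not included). If any day in a window is missing,
    the value is None (an assumed rule, not one stated in the thesis).
    """
    out = {}
    for n in WINDOWS:
        days = [pss_date - timedelta(days=k) for k in range(1, n + 1)]
        values = [daily.get(d) for d in days]
        name = f"MEAN{n}_{pollutant}"
        out[name] = None if any(v is None for v in values) else fmean(values)
    return out


# Toy data: the value recorded k days before the PSS date is simply k.
pss = date(2018, 6, 1)
toy = {pss - timedelta(days=k): float(k) for k in range(1, 121)}
means = n_day_means(toy, pss)
print(means["MEAN2_NO2_24h"])    # 1.5
print(means["MEAN120_NO2_24h"])  # 60.5
```

Keying by date rather than by list position makes gaps in the daily record visible instead of silently shifting a window onto earlier days.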
2.3. Perceived Stress Measure
Though transformations of PSS-10 score in this sample passed various normality tests we
conducted, and thus PSS-10 could have been treated as a continuous variable (not required to be
normally distributed), PSS-10 score was dichotomized by taking scores greater than the sample
median (16) to be “high perceived stress” (that is, scores ≥17). Scores less than or equal to the sample
median were “low perceived stress.” This dichotomization happened to be close to a criterion,
≥14, sometimes mentioned in the literature for moderate-to-severe perceived stress, two
examples being studies that use the PSS-10 in the context of mental health during pregnancy²⁸ or
preterm birth.²⁹ Using the moderate perceived stress threshold as the cut point would have placed
about two-thirds of the MADRES subsample in the higher category. Using the severe perceived
stress threshold, ≥27, would have placed only 5% (20 individuals) of the sample in the higher
category, running the risk of separation in logistic regression.
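The median split can be illustrated with a small sketch on made-up scores, not MADRES data. The function names are hypothetical. The coded rule is the one stated above: a score strictly greater than the sample median is "high," so a score equal to the median stays "low." The helper for the share of scores at or above a threshold mirrors the comparison with the ≥14 and ≥27 cut points.

```python
from statistics import median


def median_split(scores):
    """Return (median, flags), where flag 1 means score > sample median."""
    cut = median(scores)
    return cut, [1 if s > cut else 0 for s in scores]


def share_at_or_above(scores, threshold):
    """Fraction of scores >= threshold (for example 14 or 27 for PSS-10)."""
    return sum(s >= threshold for s in scores) / len(scores)


toy_scores = [10, 14, 16, 16, 17, 20, 27]  # illustrative values only
cut, high = median_split(toy_scores)
print(cut, high)                                     # 16 [0, 0, 0, 0, 1, 1, 1]
print(round(share_at_or_above(toy_scores, 27), 3))   # 0.143
```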
To maximize the number of participants in each of two categories and avoid separation,
the dichotomization of PSS-10 score was done at the median despite known issues with median
splits and dichotomization more generally. Here, the issue is not that a different cut point might
allow a more accurate predictive model. Rather, the goal in this thesis is to detect and estimate
the associations of pollution exposures with perceived stress, and choosing a different cut point
might affect estimates of association or their significance.³⁰ The arbitrariness of median splits
poses a problem of interpretation. Two individuals close to the median, but separated by the split,
are treated as different though they are both typical in the sense of being near the median. Also,
dichotomizing variables that approximate a continuous variable, or that originally have several
categories, can reduce statistical power. There has been a discussion regarding whether and
when median splits can even increase the probability of Type I error, which is an issue arguably
more serious in some circumstances than increasing the Type II error rate.³¹⁻³³ However, the
discussion has been more in the context of dichotomized independent variables and
multicollinearity, not dichotomized dependent variables.
2.4. Imputation
In addition to air pollution exposures, many variables (described under Confounding below)
were considered for inclusion in models. Among the covariates considered other than meteorological
and pollution variables, two percent of values were missing. Missing data were
replaced with the most frequent category (the mode) in the case of categorical variables. In the
case of continuous variables, the median was used. An exception was maternal race, for which
missing responses were treated as indicating membership in an additional category, Missing
(membership in which might be associated with particular experiences of ethnicity and race or
other social experiences).
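The single-value imputation rules above can be sketched as follows. This is an illustration under assumptions, not the thesis code. Missing values are represented as None. Ties for the most frequent category are broken by first appearance, because the thesis does not say how ties were handled. The label "Missing" is used for the extra race category. The function name impute_column and the toy values are hypothetical.

```python
from statistics import median, multimode


def impute_column(values, kind):
    """Fill None entries in one column.

    kind = "categorical":  use the most frequent observed category (mode);
    kind = "continuous":   use the median of the observed values;
    kind = "own_category": keep missingness as its own level, "Missing"
                           (the rule described for maternal race).
    """
    observed = [v for v in values if v is not None]
    if kind == "categorical":
        fill = multimode(observed)[0]  # first-encountered mode if tied (assumption)
    elif kind == "continuous":
        fill = median(observed)
    elif kind == "own_category":
        fill = "Missing"
    else:
        raise ValueError(f"unknown kind: {kind}")
    return [fill if v is None else v for v in values]


print(impute_column(["Married", None, "Single", "Married"], "categorical"))
# ['Married', 'Married', 'Single', 'Married']
print(impute_column([22, None, 31, 40], "continuous"))
# [22, 31, 31, 40]
```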
A simple type of random imputation was initially used (replacing a missing value with a
random non-missing value in the sample) with a random seed, but was discarded for reasons of
reproducibility and because it lacked the benefits of multiple imputation. This is noted for
transparency because the results of hypothesis testing were seen twice in many cases (pairings of
a pollutant and an exposure window). This might pose a multiple-testing issue though our
recollection is that results with the old imputation method were not inconsistent with results with
the modal/median imputation.
2.5. Overview of Statistical Analysis
The primary statistical method used in this study was multivariable binary logistic regression.
Geographically weighted logistic regression was done as one of several ways in which fitted
aspatial (global) logit models were evaluated. All analyses, including spatial analyses, were done
with R version 4.0.2.
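For concreteness, a generic form of one single-pollutant, single-window logistic model is written below. The notation is introduced here only for illustration and does not come from the thesis. Y_i = 1 if participant i's third-trimester PSS-10 score exceeds the sample median (16). x_in is the n-day mean level of one pollutant before PSS administration. z_i is the vector of covariates selected as confounders for that pollutant and window.

```latex
\operatorname{logit}\Pr(Y_i = 1 \mid x_{in}, \mathbf{z}_i)
  = \log\frac{\Pr(Y_i = 1 \mid x_{in}, \mathbf{z}_i)}{1 - \Pr(Y_i = 1 \mid x_{in}, \mathbf{z}_i)}
  = \beta_0 + \beta_1 x_{in} + \boldsymbol{\gamma}^{\top}\mathbf{z}_i,
\qquad
\mathrm{OR}_{\Delta} = \exp(\beta_1 \Delta)
```

Under this form, exp(β1 Δ) would be the adjusted odds ratio of high perceived stress for a Δ-unit increase in the n-day mean. Δ could, for example, be an interquartile range; the exposure scaling used in the thesis is not stated in this section.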
Before applying augmented backward elimination as a method of statistical confounder
selection, possible confounders were first identified based on a priori knowledge and causal
criteria. Variables were identified as possible confounders if they, before seeing any sample data
or statistic, were plausibly associated with the exposure, plausibly caused the outcome, and were
not likely to be caused by either the exposure or the outcome. Many measures of aspects of
socioeconomic status were available in the dataset, but were correlated among themselves or
were likely to be noisy measures of socioeconomic status individually. This was also true of the
several available meteorological variables. So, principal component analysis was used to develop
measures of socioeconomic status (individual-level and neighborhood) and a single
meteorological measure as described in the Confounding subsection below. The created variables
were treated as possible confounders.
For each of the eight air pollutants, preliminary models were built so that the covariates
for that pollutant were the same for all the exposure windows considered. In testing pollutant
variable coefficients, a significance level of .05 was used with unadjusted P-values in this
exploratory stage and adjusted P-values later. The preliminary models were used to search for
peaks in association significance and size, which might signal a meaningful association. For such
peak exposure windows with a pollutant coefficient P-value <.05, covariates were then selected
as confounders specific to the pollutant and exposure window. If the association of the pollutant
with dichotomous perceived stress was significant (adjusted P<.05 at this stage) with a resulting
model, the model underwent sensitivity analysis with several alternative variables and was
assessed for effect modification. To avoid false positives, the Bonferroni multiple testing
procedure was used. Throughout, models were assessed for multicollinearity, linearity, and
influential observations.
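The peak search and the multiple-testing step can be illustrated with a short sketch. Here a "peak" is defined, as an assumption, as a window whose coefficient estimate exceeds the estimates of its immediate neighbors in the ordered list of windows and whose unadjusted P-value is below .05. The thesis describes the search in terms of both significance and size and may have judged peaks differently. The Bonferroni adjustment multiplies each P-value by the number of tests m and caps the result at 1. This section does not specify which family of tests m covers, so m is left as a parameter. All names and numbers below are hypothetical.

```python
def find_peaks(windows, estimates, pvalues, alpha=0.05):
    """Windows whose estimate exceeds both neighbours and whose p < alpha."""
    peaks = []
    last = len(windows) - 1
    for i, n in enumerate(windows):
        left = estimates[i - 1] if i > 0 else float("-inf")
        right = estimates[i + 1] if i < last else float("-inf")
        if estimates[i] > left and estimates[i] > right and pvalues[i] < alpha:
            peaks.append(n)
    return peaks


def bonferroni(pvalues, m=None):
    """Bonferroni-adjusted P-values, min(1, m * p); m defaults to len(pvalues)."""
    m = len(pvalues) if m is None else m
    return [min(1.0, m * p) for p in pvalues]


# Made-up log-odds-ratio estimates and P-values for five windows (days).
windows = [1, 2, 3, 4, 5]
est = [0.10, 0.30, 0.20, 0.25, 0.10]
pval = [0.50, 0.01, 0.20, 0.04, 0.60]
peaks = find_peaks(windows, est, pval)
print(peaks)  # [2, 4]
print(bonferroni([pval[windows.index(n)] for n in peaks], m=len(windows)))
```

With m = 5 in this toy example, only the window-2 peak would remain at or below .05 after adjustment, which shows how strongly the choice of m affects conclusions.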
Standard ordinary least squares multiple regression (with the original PSS score variable,
rather than a dichotomy) was considered as an alternative way to obtain a single parameter
estimate for each pollutant and exposure window. However, even after considering
transformations, the linearity assumption was not satisfied in early exploration with partial
residual plots and small subsets of the pollutants and windows. For some exposures, extremely
low P-values (lower than those that will be reported in this thesis) were obtained, but it was
inappropriate to report the P-values or the parameter values when linearity was violated to the
extent that was seen. Piecewise regression was considered but was viewed as less useful in a
stage of clarifying the relationship between exposure window length n and the size of an
association of the n-day mean pollutant level with perceived stress.
2.6. Confounding
Possible confounders were available in the MADRES dataset and were identified consistently
with the Mehta et al.⁶ air pollution and perceived stress study in Boston, Massachusetts, namely:
maternal age, education, physical activity (in our case both before and during pregnancy), and
race, and meteorology and seasonality. Covariates were collected at or near the time of
recruitment or during a third-trimester study visit, with the exception of meteorological variables
(described in a subsection below and linked to residential history). We additionally included
other indicators of individual- and neighborhood-level socioeconomic status (described below),
Hispanic/non-Hispanic ethnicity separately from race, English/Spanish language preference, and
a categorical variable for years having lived in the United States. An available marital status field
was also included, but re-coded to merge the categories Divorced or Separated, Widowed, and
Decline to Answer (which altogether had fewer than twenty individuals). Though the experiences
of the MADRES participants in these categories were likely diverse, the variable was re-coded to
avoid separation in logistic regression and under the assumption that divorce, separation, and the
death of one’s spouse were uniquely stressful events. Similarly, a seven-category (including
Don’t Know) race variable in the MADRES data was re-coded to create an indicator variable for
whether or not the participant was in the Black or African American category. Though the people
in each of the resulting two categories were diverse, the counts for American Indian or Alaska
Native (5), Asian (9), Native Hawaiʻian or Other Pacific Islander (1), More Than One Race (10),
and Don’t Know (9), might have been too small for the categories to be considered separately in
some analyses. Notably, African or Black Americans might be uniquely situated socially even
among low-income persons in Los Angeles.
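The two re-codings just described might look like the following sketch. The source category labels are copied from the text. The merged marital-status label and the function names are hypothetical. Missing race responses, which the Imputation subsection treats as a separate Missing category, are assumed to be handled before this step.

```python
# Marital-status levels merged because together they had fewer than 20 people.
MERGED_MARITAL = {"Divorced or Separated", "Widowed", "Decline to Answer"}
MERGED_LABEL = "Divorced/Separated/Widowed/Declined"  # hypothetical label


def recode_marital(status):
    """Collapse the sparse marital-status levels into one level."""
    return MERGED_LABEL if status in MERGED_MARITAL else status


def black_indicator(race):
    """1 if the response is Black or African American, else 0 (all other
    recorded categories). Missing responses are assumed handled beforehand."""
    return 1 if race == "Black or African American" else 0


print(recode_marital("Widowed"), black_indicator("Asian"))
```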
Other variables were available in the MADRES dataset but excluded as possible
confounders by flexibly applying the definition that a possible confounder is plausibly associated
with the exposure, plausibly is a cause or a proxy of a cause of the outcome, and is not likely to
be caused by the exposure or the outcome. This definition of a possible confounder reflects a
traditional definition of a confounder preceding the use of the backdoor path criterion in
epidemiology, but simple directed acyclic graphs were used to note variables that might be
colliders, mediators, or variables controlling for which might result in M-bias. For example, it is
plausible that smoking during pregnancy does not lie on the pathway between air pollution and
perceived stress, and causes perceived stress or not, and that a fourth variable (economic or
social) causes both air pollution exposure and smoking during pregnancy even if the smoking
does not cause ambient pollution exposure (by determining geographic location). Smoking could
then be a proxy confounder (in the sense of a proxy for a common cause of the exposure and the
outcome) as described abstractly by VanderWeele (2019),³⁴ or (in the event smoking causes
perceived stress while the fourth variable does not) controlling for smoking might close a
backdoor path. However, it was found, with 21 females participating in a University of
Minnesota smoking cessation study, that PSS-10 score was associated with a measure of nicotine
dependence, though the association was of questionable significance (P=.098), and there was not a
significant association with number of cigarettes smoked daily (P=.26).³⁵ The authors
speculated about whether perceived stress could cause smoking-related behavior, and the
direction of causation does seem unclear enough to justify not keeping smoking during
pregnancy in a list of possible confounders.
By similar and related reasoning, other variables were excluded. Mehta et al.⁶ adjusted for
anti-depressant medication use, which was not available in the MADRES dataset. However, an
available measure of depression (CES-D) was excluded as it was potentially a collider caused by
both air pollution and perceived stress, or a mediator. BMI was excluded as literature had shown
that perceived stress could cause severe obesity. In addition, a study showed positive significant
(P<.05) associations of NO2, O3, PM2.5, and PM10 with BMI.³⁶ Weight or body fat could have
Weight or body fat could have
been a collider. Available measures of prenatal distress, sleep disturbance and stressful life events
were excluded because of likely being either a mediator or caused by perceived stress.
Gestational age and day of the week of PSS administration, which would be natural to
consider in predictive modeling, were excluded. Plausibility was thought to be lacking for an
association of day of the week, for example, with the n-day mean pollutant levels considered in
general, half of which were for longer than two weeks. A weak association is possible.
Despite an issue of temporal sequence—an air pollution exposure’s possibly preceding
physical activity that might be supposed to cause that exposure, or physical activity’s occurring
after the start of high perceived stress—physical activity during pregnancy as measured by
Pregnancy Physical Activity Questionnaire total-activity score (representing activity in multiple
categories including occupational and sports activities) was considered a possible confounder.
The total-activity score could be associated with an unmeasured factor that does precede
the outcome, or both exposure and outcome. A related factor, pre-pregnancy activity, was
assessed with a question about the number of days, in a typical week in the half year before
pregnancy began, on which there was physical activity for a total of at least thirty minutes. Pre-
pregnancy activity was considered a possible confounder, as was the possible proxy confounder
time outdoors (“Thinking back to a typical weekday in this past week, approximately how many
hours (out of 24 hours in total) did you spend Outdoors”).
2.6.1. Financial and socioeconomic status, and principal component analysis
For late-entry participants and the third trimester of pregnancy, available measures of financial
status and socioeconomic status were limited. So, proxy or alternative measures were sought.
Individual-level socioeconomic status. MADRES participants were administered the
Financial Stress Scale in their second trimester of pregnancy, but not in their first or third
trimester. Late-entry participants enrolled as early as 20 weeks’ gestation, but their first study
visit was not before 28 weeks’ gestation. Thus, none of the late-entry participants included in the
present study’s sample was administered the Financial Stress Scale. In addition, the demographic
income variables that were collected for both regular-entry and late-entry participants had high
missingness and a particular potential for measurement error. For example, Earliest Ascertained
Income (total household family income in the last year) was about 29% missing in the present
study’s third-trimester (late- and regular-entry) sample, and it presumably might have been that
participants with higher perceived stress were more likely to avoid overestimating income if
overestimating income might result in a denial of needed services. Because of the high
missingness and this possibility of differential measurement error, the variable was not a good
candidate for either imputation or exclusion of observations with missing values.
In the absence of a reliable measure of financial status for both late- and regular-entry
participants, financial status was indirectly measured by using variables on home conditions. In
every trimester, MADRES participants were asked no/yes questions about whether any of the
following specific things were present in their home during their pregnancy, since their last study
visit: mice, rats, roaches, and mold. From these four binary variables (0=No, 1=Yes), a new,
ordinal variable was derived that was their sum, HOUSING_ISSUES.
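The derived score is a simple sum, which can be computed as below. The argument names are hypothetical stand-ins for the four no/yes variables, and the input check is an addition for illustration rather than something described in the thesis.

```python
def housing_issues(mice, rats, roaches, mold):
    """HOUSING_ISSUES: number of the four reported home conditions (0-4).

    Each argument is a 0/1 answer (0=No, 1=Yes) to whether the condition
    was present in the home since the last study visit.
    """
    answers = (mice, rats, roaches, mold)
    if any(a not in (0, 1) for a in answers):
        raise ValueError("each item must be coded 0 (No) or 1 (Yes)")
    return sum(answers)


print(housing_issues(mice=1, rats=0, roaches=1, mold=0))  # 2
```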
Additional MADRES study variables relevant to socioeconomic status included
OCC_EMPLOYMENT, a trimester-specific multiple-category variable (with the broad
occupational categories “Homemaker,” “Student,” “Employed,” “Temporary Medical Leave,”
“Unemployed,” and “Other”) from which was derived the binary re-coded variable
UNEMPLOYED; the time-invariant (collected once, at recruitment time) polytomous variable
MAT_INSURANCE, from which was derived HIGH_INSURANCE (0 if the participant had no
insurance, Medi-Cal or Medicaid, or Veterans Affairs coverage; 1 if the participant was well-
insured in the sense of having health maintenance organization, preferred provider organization,
point-of-service or other insurance); and the mother’s time having lived in the United States (as
of the first study visit after recruitment), MAT_YEARS (1=“1 - 10 Years,” 2=“11 - 20 Years,” 3=“> 20 Years,” 4=“Lifetime US Resident,” -77=“Not Sure / Unknown,” -88=“Inconsistent Response,” -99=“Less than 1 Year”), which was used as an ordinal variable. Less than 1 Year was
reassigned to 0, and Not Sure / Unknown and Inconsistent Response were treated as though the
value had been missing. Another variable relevant to socioeconomic status was the mother’s last
grade in school completed, MAT_EDU, an ordinal five-category variable that was collected at
screening or the first study visit and ranged from “Less than 12th grade (did not finish high
school)” to “Some graduate training after college.”
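The recodings described above can be sketched as small lookup functions. This Python illustration is hypothetical in several respects. The MAT_YEARS codes follow the text, but the insurance labels are assumed spellings of the MAT_INSURANCE categories. Mapping every OCC_EMPLOYMENT category other than “Unemployed” (including “Other”) to 0 is also an assumption.

```python
# MAT_YEARS: codes 1-4 are kept as ordinal levels; -99 ("Less than 1 Year") becomes 0;
# -77 ("Not Sure / Unknown") and -88 ("Inconsistent Response") are treated as missing.
def recode_mat_years(code):
    if code == -99:
        return 0
    if code in (-77, -88):
        return None
    if code in (1, 2, 3, 4):
        return code
    raise ValueError(f"unexpected MAT_YEARS code: {code}")

# Hypothetical labels for MAT_INSURANCE; only the grouping follows the text.
LOW_QUALITY = {"none", "medi-cal or medicaid", "veterans affairs"}
HIGH_QUALITY = {"hmo", "ppo", "point-of-service", "other"}

def high_insurance(label):
    """HIGH_INSURANCE: 1 for HMO/PPO/point-of-service/other coverage, else 0."""
    key = label.strip().lower()
    if key in LOW_QUALITY:
        return 0
    if key in HIGH_QUALITY:
        return 1
    raise ValueError(f"unrecognized insurance label: {label!r}")

# UNEMPLOYED is 1 only for the "Unemployed" category; homemakers, students, employed
# participants, those on temporary medical leave, and "Other" are all coded 0.
def unemployed(occ_employment):
    return 1 if occ_employment == "Unemployed" else 0
```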
For the described individual-level indicators of socioeconomic status, Table 2 shows the
polychoric correlation matrix (generated using polycor::hetcor in R), which is based on assumed
underlying normally distributed variables and was used here in an exploratory fashion. (The
assumption of underlying normality might be inappropriate for dichotomous unemployed status
despite the possibility of underemployment and viewing employment on a continuum.) The
correlations are for the data following imputation.
Table 2: Polychoric correlation matrix of individual-level binary or ordinal indicators of socioeconomic
status
Variable 1 2 3 4 5
1 HIGH_INSURANCE —
2 HOUSING_ISSUES -0.24 —
3 MAT_EDU 0.76 -0.20 —
4 MAT_YEARS 0.03 -0.17 0.24 —
5 UNEMPLOYED -0.39 0.07 0.10 0.17 —
The correlation coefficient, 0.76, between educational attainment and quality of prenatal
health insurance was strong (magnitude >0.7). All of the other correlations were weak
(magnitude <0.3) or moderate. All variables had a non-negligible (magnitude >0.1) correlation
with half or more of the other variables. As measures of association between indicators of
socioeconomic status, all correlation coefficients had the expected sign with the exception of the
weak MAT_YEARS–UNEMPLOYED coefficient (0.17). Though the sign of this correlation was
unexpected, a feature-extraction method was considered potentially helpful for avoiding the issue
of picking inadequate indicators of socioeconomic status and potentially under-adjusting for
confounding by socioeconomic status. Rather than multicollinearity being a concern at this point,
the signs and magnitudes of the correlation coefficients—generally non-negligible and of the
expected sign, but less than strong—indicated that the variables were measures of socioeconomic
status, but imprecise ones whose use individually as covariates in regression analysis might have
resulted in residual confounding. In addition, dimensionality reduction might have revealed that
social and economic position was not unidimensional; there might have been an aspect of
socioeconomic status positively associated with unemployment and negatively associated with
years in the United States, while another aspect of socioeconomic status was negatively
associated with both unemployment and years in the United States.
The categorical principal component analysis algorithm PRINCALS³⁷⁻³⁹ (Gifi::princals in R) was used with the individual-level indicators of socioeconomic status to generate a new
variable, ISES1, that was the score on the first component. PRINCALS is an iterative algorithm
that changes quantifications of categories while minimizing a loss function. The interpretation of
the results of PRINCALS is similar to the interpretation of classical principal component analysis
(which also produces a linear transformation to orthogonal components that, when ordered, each
account for the largest amount of the variance remaining to be explained). Table 3 shows the
approximate loadings of only the first two components and the percentages of variance accounted
for (VAF) by these components. (Loadings in this context are coefficients scaled by the standard
deviation of the component.)
Table 3: Loadings of the first two PRINCALS components of individual-level socioeconomic status
indicators, and percentages of variance explained
Variable Comp. 1 loading (VAF 32%) Comp. 2 loading (VAF 22%)
HIGH_INSURANCE -0.82 0.40
HOUSING_ISSUES -0.19 -0.65
MAT_EDU -0.73 0.42
MAT_YEARS -0.20 -0.43
UNEMPLOYED 0.34 0.37
The signs of the Component 1 loadings suggested an interpretation that Component 1 was
a measure of status arising from a confluence of occupational status, educational attainment, and
social support reflected in health insurance or accruing from time spent in the United States. An
increase in Component 1 score, ISES1, was correlated with lower-quality insurance and lower educational attainment (both variables had loadings with magnitude much larger than 0.3, a commonly used lower cutoff), and with fewer years in the United States and a higher chance of being unemployed. The signs of these correlations are what one would expect for a measure in which an increase means a lower socioeconomic status. However, an increase in ISES1 also predicted fewer housing (mold/pest) issues. This could paradoxically but conceivably have been explained by having a smaller house or apartment; in that case, HOUSING_ISSUES might have been positively associated with income. Alternatively, HOUSING_ISSUES may have been negatively associated with income but unimportant in Component 1, as its loading was the smallest in magnitude of the Component 1 loadings and smaller than 0.3. The Pearson and Spearman correlation coefficients of ISES1 and maternal age were -0.29 and -0.20, respectively, which suggested a weak association between youthfulness and ISES1.
By contrast, with Component 2, HOUSING_ISSUES (negatively correlated with
Component 2) joined HIGH_INSURANCE and MAT_EDU (both positively correlated with
Component 2) in a way one might expect if the variables and Component 2 all reflected
socioeconomic status. An increase in Component 2 score predicted higher-quality insurance,
higher educational attainment, and fewer housing issues presumably associated with income.
However, an increase in Component 2 also predicted both fewer years in the United States
(unexpectedly if higher MAT_YEARS and higher Component 2 represented higher
socioeconomic status) and a higher chance of being unemployed (also unexpectedly, if not being
unemployed and higher Component 2 represented higher SES). We tentatively interpreted
Component 2, which was necessarily linearly uncorrelated with Component 1, to explore the
unclear meaning of HOUSING_ISSUES.
All loadings on Component 2 were larger than 0.3. A participant could have been non-
unemployed (not identical to Employed in the original employment-status
OCC_EMPLOYMENT variable) by consequence of being a homemaker or student, rather than
being “employed,” and so we initially considered that a participant with low Component 2 score
might have been more likely to be a young homemaker or student born in the United States.
(Alternatively, one could be non-unemployed as a person who has low socioeconomic status but
is unable to work in available work environments for health reasons.) The Pearson and Spearman
correlations between Component 2 score and maternal age, though, were only -0.01 and 0.04:
very weak. Component 2 score thus seemed to be a variable that was associated with time in the
United States, but not with age.
Time in the United States, MAT_YEARS, warranted examination not only because of its
seeming inconsistent relationship with possible facets of socioeconomic status, but also because
of the coding of its categories, described earlier. The treatment with PRINCALS of
MAT_YEARS as an ordinal variable (instead of creating several indicator variables) presumed
that, for example, a person who was eighteen years old but a Lifetime US Resident
(MAT_YEARS = 4) was higher in some sense than both an eighteen-year-old who had been in
the United States slightly less than eighteen years (MAT_YEARS = 2) and a thirty-year-old who
had been in the United States for slightly less than thirty years (MAT_YEARS = 3). Used as an
ordinal variable in the present study, it was an imperfect attempt to capture migration and travel
history in a single measure, but its loading on Component 2 was twice as large as its loading in
Component 1, and the loading of HOUSING_ISSUES in Component 2 was also relatively large
and negative. The loading of UNEMPLOYED in Component 2 was similar in magnitude to its loading in Component 1 (0.37 versus 0.34), and its sign was the same. So, before using a formal stopping rule to determine which components were meaningful, we tentatively interpreted Component 2 as a measure of belonging to situations of having been born in the United States, having low income, and being employed (with lower Component 2 scores corresponding to these situations). These are conditions that might be associated with a particular social class, stratum, or personal or family migration history.
A scree plot showed a drop from the Component 1 eigenvalue to an inclined plateau
(virtually a straight line) beginning with Component 2 and ending with Component 4 before a
drop to the Component 5 eigenvalue. Because of the appearance of two breaks at the ends of the
scree plot with so few components, the broken-stick method⁴⁰ was used to choose the number of components to retain. The percentage of variance accounted for by Component 1 was 32%. This was less than the broken-stick model expected proportion for the longest of five pieces formed by breaking the interval (0, 1) at four uniformly random points, (1/1 + 1/2 + 1/3 + 1/4 + 1/5)/5 ≈ 0.46.
However, the eigenvalue corresponding to Component 1 was 1.6, which satisfied the Kaiser rule
(λ > 1). So, only ISES1, score on the first component, was further considered as a possible
covariate in regression models. As a precaution due to the ambiguous importance of ISES1, the
original variables, such as HIGH_INSURANCE, were also considered as possible model
covariates individually. In the case of unemployment, the original, six-category variable
OCC_EMPLOYMENT was used instead of the binary re-coded variable.
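The two stopping rules reduce to a few lines of arithmetic. The Python sketch below uses illustrative function names. It computes the broken-stick expected proportions for p = 5 variables, reproducing the ≈0.46 threshold above and the ≈0.26 second-piece threshold used for the geographic indicators. It also applies the Kaiser rule to the Component 1 eigenvalue of 1.6. Exact fractions are used only to avoid rounding in the harmonic sums.

```python
from fractions import Fraction

def broken_stick(p):
    """Expected share of total length of the k-th longest piece (k = 1..p) when a
    unit stick is broken at p - 1 uniformly random points: (1/p) * sum_{i=k}^{p} 1/i."""
    return [float(sum(Fraction(1, i) for i in range(k, p + 1)) / p) for k in range(1, p + 1)]

def n_retained_broken_stick(vaf, p):
    """Number of leading components whose proportion of variance accounted for
    exceeds the broken-stick expectation, stopping at the first that does not."""
    count = 0
    for observed, expected in zip(vaf, broken_stick(p)):
        if observed <= expected:
            break
        count += 1
    return count

def kaiser_keeps(eigenvalue):
    """Kaiser rule for standardized variables: keep a component if lambda > 1."""
    return eigenvalue > 1

print([round(t, 2) for t in broken_stick(5)])      # [0.46, 0.26, 0.16, 0.09, 0.04]
print(n_retained_broken_stick([0.32], p=5))        # 0: Component 1 (VAF 32%) falls short
print(kaiser_keeps(1.6))                           # True: Component 1 eigenvalue 1.6
print(n_retained_broken_stick([0.61, 0.21], p=5))  # 1: only PC1 of the CES indicators
```

For standardized inputs the two rules are linked: an eigenvalue equals VAF times p. A component can therefore pass the Kaiser rule (λ > 1, i.e. VAF > 1/p) while failing the stricter broken-stick threshold for the first piece, as Component 1 does here.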
Geographic socioeconomic status. ISES1 and, for example, HIGH_INSURANCE were
individual-level indicators of socioeconomic status, measured on individual MADRES
participants themselves from questionnaire data. Several neighborhood-level or geographic
indicators of socioeconomic status were available. As described in the Bastain et al. study
protocol¹ for the MADRES cohort study, home locations of participants (collected using a
residential history questionnaire in the third trimester, or at recruitment or at the first study visit)
had been matched with five indicators of socioeconomic status obtained from the California
Communities Environmental Health Screening Tool (CalEnviroScreen) version 3.0. The
CalEnviroScreen 3.0 indicators were census-tract-level indicators based on American
Community Survey (U.S. Census Bureau) sample data: 1) low educational attainment as
percentage of persons over 25 years old who have not graduated high school, 2) percentage of
households that are housing-burdened (spending more than 50% of their income on housing) and
low-income (receiving less than 80% of the Housing and Urban Development county median
family income), 3) percentage of households with linguistic isolation (all adults and children
older than 13 in the household speak limited English), 4) percentage of persons belonging to
families who are below twice the U.S. federal poverty level, and 5) unemployment as percentage
of the population older than 16 who are not employed, but seeking work and considered to be
noninstitutional and part of the labor force. All of the indicators were five-year estimates for
2011-2015, with the exception of Housing-burdened Low-income Households
(CES_HOUSING_BURDEN), which was an estimate for 2009-2013.
For the present study, the values of these socioeconomic status indicators were those for the participant’s residential location at or near the time of taking the Perceived Stress Scale. These were the indicator values for the latest available date on or before the PSS date, if present in a 2020 March 17 dataset linking CalEnviroScreen 3.0 data to residential history, or else the values for the earliest available date after the PSS date.
Table 4 shows a combined Pearson and Spearman correlation matrix for the socioeconomic
indicators, for the late-entry or regular-entry third-trimester observations.
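The date-matching rule above can be sketched as follows. This Python illustration is hypothetical: it assumes each participant’s linked CalEnviroScreen values are available as (date, values) pairs. That is not necessarily how the 2020 March 17 dataset was organized.

```python
from datetime import date

def pick_ces_values(records, pss_date):
    """records: (record_date, values) pairs for one participant (assumed layout).

    Returns the values for the latest record dated on or before pss_date; if there
    is none, the values for the earliest record dated after pss_date; else None."""
    on_or_before = [r for r in records if r[0] <= pss_date]
    if on_or_before:
        return max(on_or_before, key=lambda r: r[0])[1]
    after = [r for r in records if r[0] > pss_date]
    if after:
        return min(after, key=lambda r: r[0])[1]
    return None

# Hypothetical residential-history records for one participant.
records = [
    (date(2018, 3, 1), "values at address A"),
    (date(2018, 9, 15), "values at address B"),
]
print(pick_ces_values(records, date(2018, 10, 2)))  # values at address B
print(pick_ces_values(records, date(2018, 1, 20)))  # values at address A (fallback)
```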
Table 4: Pearson (and Spearman) correlation matrix of geographic (CalEnviroScreen 3.0) indicators of
socioeconomic status
Variable 1 2 3 4 5
1 CES_EDUCATION —
2 CES_HOUSING_BURDEN 0.64 (0.53) —
3 CES_LINGUISTIC_ISOLATION 0.71 (0.54) 0.41 (0.27) —
4 CES_POVERTY 0.87 (0.76) 0.75 (0.61) 0.67 (0.52) —
5 CES_UNEMPL 0.16 (-0.03) 0.16 (0.11) -0.04 (-0.05) 0.17 (0.08) —
Most pairs of variables had correlation coefficients indicating a moderate or strong linear
or monotonic association. All of the exceptions involved the CalEnviroScreen unemployment
variable, for which most Spearman coefficients were negligible (|ρ| < 0.1). However, all but one of
CES_UNEMPL’s Pearson correlations were non-negligible, and it was considered that
unemployment might still have contributed to census tract socioeconomic status, albeit with
noise. At the same time, including unemployment while applying a feature-extraction method
might be useful for developing a measure of geographic socioeconomic status that is less
sensitive to other CalEnviroScreen variables and more accurate and less noisy as a composite
measure. Principal component analysis (singular value decomposition; stats::prcomp in R) was used with these indicators of geographic socioeconomic status to generate new variables, GSES1 and GSES2, that were the scores on the first two principal components of the CES variables. The CES measures were centered and scaled in the analysis. In this case, several of the CES variables were correlated with several other CES variables, and so principal component analysis offered the additional benefit of helping to avoid multicollinearity.⁴¹ Table 5 shows the loadings
(eigenvector elements times the square root of the eigenvalue) on the first two principal
components of the variables and the percentages of variance accounted for by the components.
Some oblique and orthogonal rotations were calculated, but did not meaningfully change the
loadings.
Table 5: Loadings on the first two principal components of the geographic (CalEnviroScreen 3.0)
indicators of socioeconomic status, and percentages of variance explained
Variable PC1 loading (VAF 61%) PC2 loading (VAF 21%)
1 CES_EDUCATION 0.93 -0.04
2 CES_HOUSING_BURDEN 0.80 0.12
3 CES_LINGUISTIC_ISOLATION 0.78 -0.33
4 CES_POVERTY 0.95 0.03
5 CES_UNEMPL 0.20 0.95
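A small, dependency-free Python sketch can illustrate how eigenvalues, eigenvectors, and the loadings in Table 5 relate. PCA of centered and scaled data is equivalent to an eigendecomposition of the sample correlation matrix. As an approximation, the sketch therefore eigendecomposes the rounded Pearson matrix from Table 4 with cyclic Jacobi rotations. It then multiplies each eigenvector by the square root of its eigenvalue. The inputs are rounded, so the results should be close to, but need not exactly match, the stats::prcomp output. Flipping each eigenvector so its largest-magnitude element is positive is an assumed sign convention.

```python
import math

CES_NAMES = ["CES_EDUCATION", "CES_HOUSING_BURDEN", "CES_LINGUISTIC_ISOLATION",
             "CES_POVERTY", "CES_UNEMPL"]
# Rounded Pearson correlations from Table 4 (an approximation of the PCA input).
CORR = [
    [1.00, 0.64, 0.71, 0.87, 0.16],
    [0.64, 1.00, 0.41, 0.75, 0.16],
    [0.71, 0.41, 1.00, 0.67, -0.04],
    [0.87, 0.75, 0.67, 1.00, 0.17],
    [0.16, 0.16, -0.04, 0.17, 1.00],
]

def jacobi_eigen(matrix, tol=1e-14, max_sweeps=100):
    """Eigenvalues and eigenvectors (as columns) of a symmetric matrix, by cyclic
    Jacobi rotations that drive the off-diagonal elements to zero."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # columns p, q: A <- A P
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                for k in range(n):  # rows p, q: A <- P^T A
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
                for k in range(n):  # accumulate eigenvectors: V <- V P
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
    return [a[i][i] for i in range(n)], v

def pca_loadings(corr):
    """(eigenvalue, loadings) pairs by decreasing eigenvalue, where
    loadings = eigenvector elements * sqrt(eigenvalue)."""
    values, vectors = jacobi_eigen(corr)
    n = len(values)
    result = []
    for j in sorted(range(n), key=lambda j: values[j], reverse=True):
        column = [vectors[i][j] for i in range(n)]
        if max(column, key=abs) < 0:  # eigenvector signs are arbitrary
            column = [-x for x in column]
        scale = math.sqrt(max(values[j], 0.0))
        result.append((values[j], [x * scale for x in column]))
    return result

components = pca_loadings(CORR)
for k, (eigenvalue, loadings) in enumerate(components[:2], start=1):
    # For a correlation matrix the eigenvalues sum to the number of variables.
    print(f"PC{k}: eigenvalue {eigenvalue:.2f}, VAF {eigenvalue / len(CORR):.0%}")
    for name, loading in zip(CES_NAMES, loadings):
        print(f"  {name:<26}{loading:6.2f}")
```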
The scree plot showed a big drop from the VAF of Principal Component 1 (PC1), 61%, to an inclined plateau ending with the last component, without any other obvious break. The VAF of PC1 was higher than the broken-stick model expected proportion for the longest of five pieces, ≈0.46 (46%). The VAF of PC2, 21%, was lower than the broken-stick model expected proportion for the second-longest of five pieces, (1/2 + 1/3 + 1/4 + 1/5)/5 ≈ 0.26. So, only PC1 was interpreted initially. The signs of the PC1 loadings were all positive, and all loadings but the CES_UNEMPL loading (0.20) were larger than 0.3. Every CES indicator is oriented toward disadvantage; for example, CES_EDUCATION is the percentage of persons over 25 years old who have not graduated high school. Given this orientation, the consistently positive loadings suggested a measure that was positively associated with each of the census-tract indicators of disadvantage, though the CES_UNEMPL loading was arguably negligible. The variables other than CES_UNEMPL had loadings well above 0.7 on PC1. PC1 score, called “GSES1,” was thus interpreted as a measure of geographic socioeconomic status in which higher values mean lower status. The small CES_UNEMPL loading on PC1 reflected the lesser importance of unemployment in PC1. It also indicated that much of the variability of neighborhood unemployment among MADRES participants in the third trimester was not explained by GSES1.
Despite the PC2 VAF’s not meeting the broken-stick method threshold, the PC2
eigenvalue was slightly above 1.0. The very large loading (0.95) of CES_UNEMPL in PC2,
together with the relatively small (mostly negligible or borderline) loadings of all the other CES
variables in PC2 and the small loading of CES_UNEMPL in PC1, suggested that
CES_UNEMPL could be considered as a possible confounder separately from the other CES
socioeconomic indicators. No other variable had a loading in PC2 higher than its loading in PC1.
Rather than considering CES unemployment by itself, though, it was noticed that the large
loading of CES_UNEMPL did coincide with the non-negligible loading -0.33 of
CES_LINGUISTIC_ISOLATION in the same component, and that there was some evidence in
the literature⁴² that recent migrants in the United States in the early 1990s and undocumented
persons in particular had been less likely to report being unemployed. This suggested a
correlation of employment and linguistic isolation with each other and not just with a principal
component. In addition, there was evidence that, for example, some Spanish-speaking
undocumented migrants in migration-trust networks, who might have belonged to linguistically
homogeneous neighborhoods, were less likely to report being proficient in English than
undocumented migrants not in such networks.⁴³
Though not all linguistically isolated households in Los Angeles had members who were
undocumented or first-generation immigrants, an association of recent-immigrant origin with
both linguistic isolation and not being unemployed might thus have explained the two largest
non-negligible loadings, with opposite signs, of PC2 without contradicting the interpretation of
PC1 (in which census-tract unemployment and linguistic isolation had the same direction of
association with the component), or the interpretation of the PRINCALS Component 1 (in which
individual unemployment and time in the United States had opposite directions of association)
for the categorical indicators of individual socioeconomic status. Since the eigenvalue for PC2 was also marginally above 1, PC2 score (GSES2) was therefore included as a possible
confounder. Together, PC1 and PC2 explained 82% of total variance.
2.6.2. Meteorology and principal component analysis
For meteorology, n-day mean precipitation, relative humidity, specific humidity, shortwave
radiation, temperature and wind speed were estimated for n in N = {1, 2, …, 9, 10, 15, 20, 25, 30,
45, 60, 75, 90, 105, 120}, based on daily measurements linked by MADRES study team
members to MADRES participants’ residential history. For example, 10-day mean precipitation
(MEAN10_PR) was the mean of daily precipitation estimates for the ten days before
administration of the Perceived Stress Scale, not including the day on which the PSS was taken.
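The window definition can be made concrete with a short Python sketch. It assumes daily estimates are stored in a mapping from date to value, which is a hypothetical layout. Treating a window with any missing day as missing is also an assumption.

```python
from datetime import date, timedelta

# Window lengths n considered in the text.
N = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 45, 60, 75, 90, 105, 120]

def n_day_mean(daily, pss_date, n):
    """Mean of the daily values for the n days before pss_date.

    The PSS day itself is excluded. Returns None if any of the n days is absent
    from `daily` (an assumed missing-data convention)."""
    window = [pss_date - timedelta(days=k) for k in range(1, n + 1)]
    if any(day not in daily for day in window):
        return None
    return sum(daily[day] for day in window) / n

# Toy series (not MADRES data): each day's value equals its offset from 2020-01-01.
start = date(2020, 1, 1)
daily = {start + timedelta(days=i): float(i) for i in range(200)}
pss_date = date(2020, 6, 1)  # offset 152
# With precipitation as `daily`, means[10] would correspond to MEAN10_PR.
means = {n: n_day_mean(daily, pss_date, n) for n in N}
print(means[1], means[10])  # 151.0 146.5
```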
As a way of addressing all exposure windows simultaneously, Table 6 below shows only,
for each pair of meteorological observables, the smallest of the Pearson (and Spearman)
correlation coefficients between the n-day means of the variables: the coefficients with the least
absolute values. For example, using set-builder notation and R code loosely, the Pearson correlation coefficient -0.03 in the top-left corner of Table 6 is the element r of the set R = {cor(MEANn_RHAVG, MEANn_PR, method='pearson') | n ∈ {1, 2, …, 9, 10, 15, 20, 25, 30, 45, 60, 75, 90, 105, 120}} such that |r| = min(abs(R)). Similarly, the Spearman coefficient 0.00 is the smallest in absolute value of the n-day correlation coefficients for daily average relative humidity and precipitation, {cor(MEANn_RHAVG, MEANn_PR, method='spearman') | n ∈ N}. MEANn_RHAVG is the n-day mean estimated daily average relative humidity for the MADRES participant at their residential location at the time they took the PSS. (The coefficients shown in this summary table can be for different values of n, though each coefficient is for two n-day mean meteorological variables with the same n.)
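The selection rule behind Table 6 can be sketched in Python. For each pair of variables, the sketch computes a correlation for every window n and keeps the one closest to zero. The data layout (a mapping from n to paired lists of n-day means) and all function names are hypothetical. Spearman’s coefficient is computed as the Pearson correlation of average ranks.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (nonzero variance assumed)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(x):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def smallest_abs_correlation(series_by_n, corr):
    """series_by_n maps window length n to (x, y): paired n-day means of two
    variables across observations. Returns (n, r) for the r closest to zero."""
    return min(((n, corr(x, y)) for n, (x, y) in series_by_n.items()),
               key=lambda pair: abs(pair[1]))

# Toy data for three windows (hypothetical values, not MADRES data).
toy = {
    1: ([1, 2, 3, 4], [2, 4, 6, 8]),
    2: ([1, 2, 3, 4], [1, 3, 2, 4]),
    3: ([1, 2, 3, 4], [4, 3, 2, 1]),
}
print(smallest_abs_correlation(toy, pearson))   # (2, 0.8)
print(smallest_abs_correlation(toy, spearman))  # (2, 0.8)
```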
Table 6: Smallest Pearson (and Spearman) correlation coefficients for n-day mean daily meteorological
estimates at residential locations in the third trimester of pregnancy
Meteorological Variable 1 2 3 4 5
1 PR (precipitation) —
2 RHAVG (avg. relative humidity) -0.03 (0.00) —
3 SPH (specific humidity) -0.14 (-0.22) 0.47 (0.39) —
4 SRAD (shortwave radiation) -0.40 (-0.33) 0.16 (0.16) 0.61 (0.64) —
5 TAVG_C (avg. temperature) -0.33 (-0.42) 0.01 (0.05) 0.73 (0.76) 0.56 (0.57) —
6 VS (wind speed) 0.54 (0.29) 0.05 (0.08) -0.12 (0.01) 0.00 (0.00) -0.27 (-0.10)
Most meteorological variables had both a non-negligible (magnitude >0.1) smallest
Pearson correlation and a non-negligible smallest Spearman correlation with most other
meteorological variables. The exceptions were wind speed, which nonetheless had a smallest
Pearson correlation of 0.54 with precipitation, and relative humidity, which had a smallest
Pearson correlation of 0.47 with specific humidity and a smallest Pearson correlation of 0.16
with shortwave radiation. Because wind speed had tiny smallest Spearman correlations with
specific humidity (0.01) and shortwave radiation (0.00) and had both a negligible smallest
Pearson correlation and a negligible smallest Spearman correlation with relative humidity, we
considered treating wind speed separately from the other meteorological variables. Crude and
seasonally stratified negative associations between wind speed and PM2.5 concentration in Hong Kong had been reported in the literature,⁴⁴ and wind speed might have contributed to lowering air
pollution in Los Angeles while being weakly or negligibly correlated with almost all of the other
meteorological variables considered here. Wind speed’s correlation with precipitation did not
entail association with any of the other meteorological variables.
Though principal component analysis including wind speed might have revealed a second
component in which wind speed was the most important variable, we omitted wind speed in a
principal component analysis of the meteorological variables, centered and scaled, and
considered n-day mean wind speed as a separate possible confounder. We divided (using
Hmisc::cut2 in R) n-day mean wind speed (MEANn_VS) values into eight quantile groups and
used these sample octile groups (MEANn_VS_OCTILE) of n-day mean wind speed, because of
the possibility of a non-monotonic or nonlinear relationship between wind speed and perceived
stress. (We later used quartiles, when probing specific interesting models in sensitivity analysis
and partly to decrease the number of observations per parameter.) Figure 1 shows, for each n in
N, the proportions of variance accounted for by the principal components of the selected n-day
mean meteorological variables. The dashed, red curve shows the broken-stick model proportions.
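The octile grouping can be approximated with the Python standard library. This is a sketch, not a reimplementation: Hmisc::cut2(x, g = 8) has its own cut-point rules, and the version below instead uses the seven sample octiles from `statistics.quantiles` (default “exclusive” method). Values equal to a cut point go to the upper group. Passing n=4 instead of 8 would give the quartile groups used later in sensitivity analysis.

```python
import bisect
import statistics

def octile_groups(values):
    """Assign each value to one of eight sample-quantile groups (1 = lowest).

    Cut points are the seven sample octiles from statistics.quantiles; a value
    equal to a cut point goes to the upper group. This approximates, but does
    not reproduce, Hmisc::cut2(x, g = 8)."""
    cuts = statistics.quantiles(values, n=8)
    return [bisect.bisect_right(cuts, v) + 1 for v in values]

# Toy example (not MADRES data): 80 evenly spaced values, 10 per group.
groups = octile_groups([float(x) for x in range(1, 81)])
print([groups.count(g) for g in range(1, 9)])  # [10, 10, 10, 10, 10, 10, 10, 10]
```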
Figure 1: Proportions of variance accounted for by principal components of selected n-day mean
meteorological variables, by selected n-day window
For every n in N, the VAF of the first component was higher than the broken-stick model expected proportion. For most values of n considered (eleven of the twenty), the VAF of the second component was higher than the broken-stick proportion. However, the values of n (ranging from 1 to 120) were not equally spaced, and the first ten values of n were ≤10. It is conceivable that n-day PC2 is meaningful for n ≤ 15 and not meaningful for n ≥ 20. However, the profile of the coefficients in the second principal component of the n-day meteorological variables was roughly the same between the different values considered for n, and for
consistency only PC1 was retained for all n in N. Figure 2 shows, for each n in N, the loadings on
PC1 of the n-day meteorological variables. Loadings in this context are eigenvector elements not
scaled by component standard deviation.
Figure 2: Loadings on the first principal component (eigenvector elements) of the n-day mean selected
meteorological variables, by n
No coefficient was smaller in magnitude than 0.1 (dashed, red lines) for any n in N. Coefficients scaled by the component standard deviation (the square root of the eigenvalue, which exceeds 1 for PC1) would be larger still. Specific humidity, shortwave radiation and temperature all had negative coefficients in PC1. For those variables, the coefficient smallest in magnitude (in this case, the greatest) over the windows considered was -0.47. Precipitation coefficients ranged from 0.30 to 0.43. Relative-humidity coefficients ranged from -0.11 to -0.33, with several of the
coefficients near -0.1. For every window, the signs of the PC1 loadings were consistent with
what would have been expected if PC1 score had been a measure of a weather characteristic of a
cold/cool (or hot/warm) season. The Mediterranean climate of Los Angeles has a hot, sunny
summer and a mild, less humid winter that is wet particularly in February.⁴⁵ To facilitate interpretation of PC1, Figure 3 is a line chart of the mean score (MEANn_MET1) on the first principal component of the n-day meteorological variables, by month of third-trimester Perceived Stress Scale administration, with one line for each number of days n.
Figure 3: Mean score on PC1 of n-day meteorological variables, versus PSS month, by n
The connected line segments between adjacent months form roughly sinusoidal curves.
There was no other smoothing, and so the chart shows a clear relationship between score on the
first component of the n-day meteorological variables, and the month of the PSS administration
day following the n days. The curve for each n-day window has only one peak, which is in
January, February, or March. In Los Angeles, January and February are the coldest and wettest
months. Each curve has also only one trough, which is in one of the three hottest months (July,
August, and September). In the context of PC1’s being the only meaningful principal component,
these elements suggest that MEANn_MET1 is a measure of weather distinguishing two broad seasons, even though PC1 is a principal component of data for a particular sample. This is consistent with the treatment in some studies⁶ (albeit in a different region of the United States) of seasonality as a binary variable. However, the use of a binary seasonality variable, or of the month of PSS administration when meteorological conditions in a given month can vary from year to year, might result in residual confounding that can be avoided with the present study’s dataset. Also, a typical categorical seasonality or time-of-year variable does not account for differences in weather (both contributing to air-pollutant levels and experienced by the participant) that might exist between the different exposure windows preceding PSS administration.
Rather than using MEANn_MET1 itself, MEANn_MET1 octile group
(MEANn_MET1_OCTILE) was considered as a possible confounder and used in regression
models. Though infrequent in Los Angeles, extreme cold-season weather and extreme warm-
season weather could both contribute to perceived stress, and so there might be a non-monotonic
relationship between MEANn_MET1 and perceived stress. The eight quantile groups were used
to assess confounding despite the potential for having an excessive number of model degrees of
freedom. The quartiles and median were later used in sensitivity analysis and in the assessment of effect modification.
Though the peak values of MEANn_MET1 vary between n more than the trough values,
the curves in Figure 3 appear in some respects to be roughly the same curve shifted left or right.
This might be explained by variation in both 5-day and 120-day average weather, for example, being characterized primarily by position in the transition between the cool season and the warm season. Also, for higher n, MEANn_MET1 represents both recent weather
and weather further in the past, and a peak in that component score indicates a peak earlier in the
year of a type of weather represented by the component loadings.
Secondary principal components of the n-day meteorological variables were inspected
and might represent variation in short- or long-term average weather due to phenomena such as
Santa Ana winds, storms, and changes involving temperature inversion or marine layer. However,
none of these other components was retained, due to low proportions of variance.
Despite not retaining the other components, we also included the sample octile group
(MEANn_RHAVG_OCTILE) of n-day mean average relative humidity (MEANn_RHAVG),
besides MEANn_MET1_OCTILE, as a possible model covariate. We used the quantile groups
because of the possible non-monotonic or nonlinear relationship between relative humidity and
perceived stress. (Alternatively, polynomial or spline terms could have been used after examining
plots.) Relative humidity had been included in the principal component analysis of meteorological variables, but did not load as strongly on the meaningful components (the first principal components). The fourteen relative-humidity coefficients smallest in magnitude, belonging to the n-day PC1s for the fourteen lowest values of n considered, were smaller than 0.3 in magnitude. Half of the coefficients were smaller than 0.2 in magnitude, and the largest in magnitude was only -0.33. There was a
possibly ambiguous loading of relative humidity on PC1, the sole meaningful principal
component of the n-day dataset for most equally spaced values n ≤ 120. However, a known
association between relative humidity and each of NOx, O3, PM2.5, and PM10 had been reported in the literature, as had an association between relative humidity and stress.
2.6.3. Variable-selection procedure
Table 7 below shows the final list of variables that were considered for inclusion in models as
confounders.
Table 7: List of covariates before automated confounder selection
Type of Variable: Variables
behavioral: hours outdoors in a typical weekday in the past week (HOME_OUTDOOR), pre-pregnancy physical activity (>30 min in a day) in days per week (PRE_ACTIVITY), third-trimester Pregnancy Physical Activity Questionnaire total-activity score (PPAQ_TOTAL)
demographic, economic or social: employment (multiple-category) status (OCC_EMPLOYMENT)*, first and second principal components of indicators of census-tract socioeconomic status (GSES1 and GSES2), first PRINCALS component of indicators of individual socioeconomic status (ISES1), high-quality insurance or not (HIGH_INSURANCE)*, Hispanic/Latinx ethnicity or not (MAT_HISPANIC)*, housing (mold/pest) issues count (HOUSING_ISSUES)*, maternal age in days (MAT_AGE), maternal education level (MAT_EDU)*, Black or African American, or not (AFRICAN_AMERICAN)*, marital (multiple-category) status (MAT_MARITAL_RECODED)*, prefers to speak Spanish (MAT_LANGUAGE)*, years (categorical) living in USA (MAT_YEARS)*
meteorological: octile group of first principal component of n-day meteorological variables excluding wind speed (MEANn_MET1_OCTILE)*, octile group of n-day mean average relative humidity (MEANn_RHAVG_OCTILE)*, octile group of n-day mean wind speed (MEANn_VS_OCTILE)*
temporal: PSS month of administration (MONTH)*
* used as an unordered categorical variable (indicator variables)
The variables listed in Table 7 were chosen by applying a definition of a possible confounder. This definition is less restrictive than, for example, the common-cause criterion for confounders. Adjusting for all of the variables might unnecessarily increase multicollinearity, sparseness, and model degrees of freedom, and might yield biased or less precise parameter estimates. Statistical confounder selection has often been done to address these and other issues. Recent literature⁴⁶ has continued to stress that the selection of variables as confounders should not rely on certain data-driven methods. These include methods that pick variables based on the same study’s sample alone and on changes in parameter estimates for variables of interest, or on magnitudes or significance of association of a covariate with the exposure or outcome. Ideally, a directed acyclic graph is completed and the relevant causal structure is known. In circumstances where the complete causal structure is not known, limited a priori knowledge could be applied with chosen reasonable definitions and rules to create a working set of variables that are plausibly confounders. That working set can then be used to initialize some reproducible confounder-selection method.³⁴
One such method is the augmented backward elimination (ABE) algorithm,
47
which uses
a standardized change-in-estimate rule and has been reported to be an improvement over a
Hosmer–Lemeshow purposeful selection algorithm. ABE was applied with the above
prescreened variables. The result of ABE is a subset of the initial working set and specifies a
model that is still preliminary. After creating an initial working list of variables, benefits of using
a selection algorithm include avoiding overadjustment (such as for a collider or a mediator) and
unnecessary adjustment.
48
ABE differs from other methods in that, by default in R (abe::abe), it uses a lenient criterion for the exclusion of so-called active variables, based on the likelihood ratio test P-value, and a permissive criterion for the re-inclusion of a temporarily excluded active variable, based on a particular type of change in the estimates of so-called passive variables. In ABE for a logit model, the change-in-estimate is standardized as

$$\exp\bigl(|\delta_p|\,\mathrm{SD}(X_p)\bigr) - 1,$$

where $\delta_p$ is the difference between the coefficient on the passive variable $X_p$ in the fitted model that includes the active variable $X_a$ and the corresponding coefficient in the model that excludes $X_a$, $|\delta_p|$ is the absolute value of this difference, and $\mathrm{SD}(X_p)$ is the standard deviation of $X_p$ in the sample. The least significant active variable $X_a$ whose P-value exceeds the chosen cutoff $\alpha$ (default .2 in abe::abe) is considered for retention by the change-in-estimate criterion

$$\exp\bigl(|\delta_p|\,\mathrm{SD}(X_p)\bigr) - 1 \ge \tau \quad (\text{default } \tau = 0.05),$$

which must hold for at least one passive variable. Failure to pass the change-in-estimate criterion results in re-fitting the model without the active variable, that is, starting over with a revised working list from which $X_a$ is permanently excluded. If the criterion is passed, the next least significant variable on the temporary exclusion list is evaluated in the same way, and so on until the list is exhausted. When the temporary exclusion list is exhausted, every variable on it is re-included, because the algorithm terminates with an unchanged working list. Unless bootstrapping is used (it was not used here), the method involves no random number generation and is easily repeatable. In this thesis, every member of the initial working set other than the main variable of interest (the n-day pollutant variable) was used as an active variable and never as a passive variable, and the pollutant variable was the sole passive variable.
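The elimination loop described above can be sketched in Python as follows. This is a simplified illustration, not the abe::abe implementation: model fitting is hidden behind a hypothetical callable `fit(variables)` that returns the coefficient on the single passive (pollutant) variable and a dictionary of P-values for the active variables, and the toy numbers are made up.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical interface (not abe::abe): fit(variables) returns the coefficient on
# the single passive variable and a dict of P-values for the active variables.
FitFn = Callable[[List[str]], Tuple[float, Dict[str, float]]]


def std_change(delta: float, sd_passive: float) -> float:
    """Standardized change-in-estimate for a logit model: exp(|delta| * SD(X_p)) - 1."""
    return math.exp(abs(delta) * sd_passive) - 1.0


def abe_single_passive(fit: FitFn, active: List[str], sd_passive: float,
                       alpha: float = 0.2, tau: float = 0.05) -> List[str]:
    """Augmented backward elimination with one passive variable (simplified sketch)."""
    working = list(active)
    while True:
        beta_full, pvals = fit(working)
        # Temporary exclusion list: active variables with P > alpha, least significant first
        candidates = sorted((v for v in working if pvals[v] > alpha),
                            key=lambda v: pvals[v], reverse=True)
        for var in candidates:
            reduced = [v for v in working if v != var]
            beta_reduced, _ = fit(reduced)
            if std_change(beta_full - beta_reduced, sd_passive) < tau:
                working = reduced  # excluded permanently; refit and start over
                break
            # Otherwise var passes the change-in-estimate criterion and the
            # next candidate on the temporary exclusion list is evaluated.
        else:
            return working  # list exhausted: terminate with an unchanged working set


# Toy usage with made-up numbers: A shifts the passive coefficient, B barely does.
effects = {"A": 0.2, "B": 0.001, "C": 0.0}
p_values = {"A": 0.5, "B": 0.6, "C": 0.01}


def toy_fit(variables: List[str]) -> Tuple[float, Dict[str, float]]:
    return 0.5 + sum(effects[v] for v in variables), {v: p_values[v] for v in variables}


selected = abe_single_passive(toy_fit, ["A", "B", "C"], sd_passive=1.0)
# selected == ["A", "C"]
```

In the toy run, B is insignificant and barely moves the pollutant coefficient, so it is dropped; A is also insignificant but shifts the coefficient enough to pass the criterion, so it is kept.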
In principle, the preliminary final list of confounders could differ between exposure windows (n) for the same air pollutant, and the initial working set need not be the same for all air pollutants. For example, the best subset for 2-day mean PM2.5 might differ from the best subset for 120-day mean PM2.5, and NO2 might need to be included as a confounder in a model of the PM2.5–perceived stress association. Table 7 does not include any air pollutant, although one air pollutant could be a confounder or effect modifier of the association of a different air pollutant with perceived stress. In this study, however, there are 8 air-pollution exposures and 20 exposure windows, for 160 models to be fitted. In the exploratory stage of this study, the same initial working set of variables (Table 7) was used with each pollutant and window, with the qualification that the n-day means of covariates (for which means of daily measurements had been calculated) accompanied the n-day mean pollutant level in the set. For example, the octile groups of 2-day mean average relative humidity, of 2-day mean wind speed, and of the score on the first principal component of 2-day mean meteorological variables were used with 2-day mean NO2, 2-day mean O3, 2-day mean PM2.5, or another 2-day mean pollutant exposure.
For each pollutant, ABE was run for each of the following eight equally spaced values of n: 15, 30, 45, 60, 75, 90, 105, and 120.
For each pollutant, the resulting confounder sets were visualized using the UpSet method as implemented in UpSetR::upset⁴⁹ in R. The UpSet technique facilitated examining the intersections of the confounder sets, counting the number of sets in which a variable was included, and discovering patterns such as a variable's inclusion in a cluster of sets (for example, the sets for the first, middle or last values of n considered).
For the purpose of exploring how the association of an air pollutant with perceived stress
varies over exposure window lengths, we made the simplifying assumption that the confounders
for an air pollutant are the same regardless of exposure window length. For a given air pollutant,
if a variable was in an ABE confounder set for at least two exposure windows (25% of the
equally spaced exposure windows considered), it was assumed to be a confounder for all
exposure windows for that pollutant. As a result, there were only eight confounder sets in this exploratory stage, one for each pollutant.
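The pooling rule above (keep a variable if ABE selected it for at least two of the eight windows) amounts to counting set memberships, the same counts an UpSet plot displays. A minimal Python sketch, in which the per-window sets are hypothetical and not results from this study:

```python
from collections import Counter
from typing import Dict, Set


def pooled_confounders(sets_by_window: Dict[int, Set[str]], min_windows: int = 2) -> Set[str]:
    """Keep a variable if ABE selected it for at least `min_windows` exposure windows."""
    counts = Counter(v for s in sets_by_window.values() for v in s)
    return {v for v, c in counts.items() if c >= min_windows}


# Hypothetical ABE results for one pollutant (illustrative only)
abe_sets = {
    15: {"GSES1", "MAT_EDU", "MAT_AGE"},
    30: {"GSES1", "MAT_EDU"},
    45: {"GSES1", "MAT_YEARS"},
    60: {"GSES1", "MAT_EDU"},
    75: {"GSES1"},
    90: {"GSES1", "MAT_AGE"},
    105: {"GSES1"},
    120: {"GSES1"},
}
pooled = pooled_confounders(abe_sets)  # MAT_YEARS (1 window) is dropped
```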
2.7. Logistic Regression Assumptions
For each air pollutant, we fitted (stats::glm in R), for each exposure window, a binary logit model of perceived stress with the selected covariates. To assess multicollinearity, for each model we obtained (car::vif in R) the generalized variance-inflation factor (GVIF) for each regressor (the main variable or a confounder). Some of the categorical variables had more than two levels and so had more than one degree of freedom, $\nu$. For each variable, $\mathrm{GVIF}^{1/(2\nu)}$ was compared with the cutoff $\theta^{1/2}$ for $\theta = 10$. If any variable other than the main variable of interest (the air-pollutant level) had a $\mathrm{GVIF}^{1/(2\nu)}$ greater than $\theta^{1/2}$ in more than one-fourth of the n-day models (n in the complete list N), the variable with the greatest mean GVIF was removed from the confounder set for the air pollutant, and all of the n-day models were re-fit. The procedure was repeated until no confounder exceeded the threshold. We then considered excluding confounders whose $\mathrm{GVIF}^{1/(2\nu)}$ was greater than $\theta^{1/(2\nu)}$ (equivalently, whose GVIF exceeded $\theta$). If only one confounder exceeded this variable-specific threshold (which is stricter than $\theta^{1/2}$ for categorical variables with more than one degree of freedom) in more than one-fourth of the n-day models, the confounder was kept in all models for the pollutant. If two or more confounders exceeded the threshold in more than one-fourth of the n-day models for the pollutant, the confounders were ranked by $\mathrm{GVIF}^{1/(2\nu)}$ within each such model. If the same two variables had the highest $\mathrm{GVIF}^{1/(2\nu)}$ in all of those models (their order relative to each other was disregarded), and one of them seemed related to the other but less interpretable or plausible, the less interpretable or plausible variable (which could have been an n-day mean) was removed from the confounder set. All n-day models for the pollutant were then re-fit. This procedure for reducing multicollinearity, which involved a manual step, was iterated until the confounder set stopped changing.
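For reference, the GVIF of a regressor occupying several columns is the Fox–Monette determinant ratio $\det(R_{11})\det(R_{22})/\det(R)$, where $R_{11}$ is the block of a correlation matrix for that regressor's columns and $R_{22}$ the block for all other columns. The Python sketch below applies the ratio to a generic correlation matrix and implements both cutoffs described above; car::vif applies the same ratio to the correlation matrix of the estimated coefficients, so this is an illustration of the arithmetic rather than a reimplementation of car::vif.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def det(m: Matrix) -> float:
    """Determinant by Gaussian elimination with partial pivoting (1.0 for an empty matrix)."""
    a = [row[:] for row in m]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-15:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def submatrix(m: Matrix, idx: Sequence[int]) -> Matrix:
    return [[m[i][j] for j in idx] for i in idx]


def gvif(corr: Matrix, cols: Sequence[int]) -> float:
    """GVIF of the regressor whose indicator/numeric columns are `cols`."""
    others = [i for i in range(len(corr)) if i not in cols]
    return det(submatrix(corr, cols)) * det(submatrix(corr, others)) / det(corr)


def exceeds(gvif_value: float, nu: int, theta: float = 10.0,
            variable_specific: bool = False) -> bool:
    """Compare GVIF^(1/(2nu)) with theta^(1/2) or, if variable_specific, theta^(1/(2nu))."""
    cutoff = theta ** (1 / (2 * nu)) if variable_specific else math.sqrt(theta)
    return gvif_value ** (1 / (2 * nu)) > cutoff


corr = [[1.0, 0.6], [0.6, 1.0]]
g = gvif(corr, [0])  # two single-column regressors: 1 / (1 - 0.6**2) = 1.5625
```

For a categorical variable with $\nu = 2$ and GVIF = 12, $12^{1/4} \approx 1.86$ stays below $10^{1/2} \approx 3.16$ but exceeds $10^{1/4} \approx 1.78$, which is why the variable-specific cutoff is the stricter one.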
To address the linearity assumption of logistic regression, the Box-Tidwell method was applied for each pollutant and exposure window, with a P-value threshold of .1 for the $x \log(x)$ term. Observations with the sample minimum or maximum of the pollutant variable were omitted so that the most extreme values would not later dominate the appearance of plots. If the P-value of the $x \log(x)$ term was less than .1, the assumption of linearity was rejected. If the assumption of linearity was rejected for any of the n-day models, scatter plots of the estimated log-odds of high perceived stress versus the n-day mean level of the pollutant, with a LOESS (locally estimated scatterplot smoothing) curve (ggplot2::geom_smooth, span = 0.75), were generated for selected n-day models: n = 1, 3, 5, 7, 10, 15, 30, 60, 90, and 120. Additional plots were generated for Tukey-ladder transformations of the pollutant variable, with powers -3, -2, -1, -1/2, 0 (logarithm), 1/2, 2, and 3. The plots were used to pick a transformation that, for most n-day models for the pollutant, gave a roughly straight curve over a wider interval (relative to the sample range of the pollutant variable) without too few observations at the tails of the curve. The transformation was also required to decrease the number of n-day models for which the Box-Tidwell procedure rejected the linearity assumption. If a transformation $f$ was chosen, the Box-Tidwell method was used as before to evaluate the linearity of the relationship between the transformed variable and the log-odds of high perceived stress. If the P-value of the $f(x)\log(f(x))$ term was less than .1, scatter plots and LOESS curves were again used to see whether a second transformation gave a discernible improvement.
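The Box-Tidwell check adds the term $z \log(z)$ for the (possibly transformed) pollutant variable $z$ to the model and examines that term's P-value. The Python sketch below only constructs the Tukey-ladder transforms listed above and the corresponding Box-Tidwell regressors; fitting the augmented logit model is not shown, and skipping transforms with non-positive values is an assumption of this sketch (the term is undefined there).

```python
import math
from typing import Dict, List

# Tukey-ladder powers tried above (power 1, the identity, is the untransformed variable)
LADDER = [-3, -2, -1, -0.5, 0, 0.5, 2, 3]


def tukey_power(x: float, p: float) -> float:
    """x**p, with p == 0 standing for the natural logarithm (positive x assumed)."""
    if x <= 0:
        raise ValueError("this sketch assumes strictly positive pollutant values")
    return math.log(x) if p == 0 else x ** p


def box_tidwell_term(values: List[float]) -> List[float]:
    """Extra regressor z*log(z); its coefficient's P-value is compared with .1."""
    if any(z <= 0 for z in values):
        raise ValueError("z*log(z) requires strictly positive values")
    return [z * math.log(z) for z in values]


def ladder_terms(pollutant: List[float]) -> Dict[float, List[float]]:
    """Box-Tidwell term for each ladder transform whose values are all positive."""
    out = {}
    for p in LADDER:
        transformed = [tukey_power(x, p) for x in pollutant]
        if all(z > 0 for z in transformed):
            out[p] = box_tidwell_term(transformed)
    return out


# Made-up exposures; log(0.5) is negative, so the p = 0 transform is skipped here.
terms = ladder_terms([0.5, 2.0, 8.0])
```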
If a transformation was chosen for a pollutant, the automated confounder selection
method described previously was done again. If the confounder set changed, both
multicollinearity and linearity were checked again. If the confounder set did not change,
multicollinearity was checked again because of the transformation.
After the other assessments and following any pollution-variable transformation or model
revision, all models were evaluated for influential observations. Influence diagnostics more
appropriate for explanatory or descriptive binary logistic models were considered. Because the
goal was the estimation, for each pollutant and exposure window, of a single parameter—the
coefficient on the n-day mean pollution variable—we focused on the scaled DFBETA
(DFBETAS) deletion diagnostic for the pollutant variable. Observations with an absolute DFBETAS greater than 1 (meaning that deleting them changed the pollutant variable's coefficient by more than one standard error) were flagged for examination.
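A minimal Python sketch of the DFBETAS flagging, using the textbook deletion definition $(b - b_{(i)})/\mathrm{SE}_{(i)}$ for a single coefficient. It assumes the leave-one-out estimates and standard errors are already available (for example, from refitting without each observation); the numbers and sample size are made up. The same function covers the cutoff of 1 used here and the size-adjusted cutoffs $2/\sqrt{n}$ and $3/\sqrt{n}$ used in the sensitivity analysis.

```python
import math
from typing import List, Sequence


def dfbetas(beta_full: float, beta_loo: Sequence[float], se_loo: Sequence[float]) -> List[float]:
    """Scaled change in one coefficient when each observation is deleted."""
    return [(beta_full - b) / s for b, s in zip(beta_loo, se_loo)]


def flag(values: Sequence[float], cutoff: float) -> List[int]:
    """Indices of observations whose |DFBETAS| exceeds the cutoff."""
    return [i for i, d in enumerate(values) if abs(d) > cutoff]


# Made-up leave-one-out results for four observations
d = dfbetas(0.30, [0.295, 0.275, 0.25, 0.15], [0.10, 0.10, 0.10, 0.10])
n = 100  # hypothetical sample size
flagged_1 = flag(d, 1.0)                # [3]
flagged_2 = flag(d, 2 / math.sqrt(n))   # [1, 2, 3]
flagged_3 = flag(d, 3 / math.sqrt(n))   # [2, 3]
```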
2.8. Selecting Candidates for Final Models
For each pollution exposure, a scatter plot of n versus the n-day mean pollutant effect estimate was generated, with point size representing the Wald test P-value. Ninety-five percent confidence intervals (profile-likelihood intervals generated using stats::confint and MASS::confint.glm in R) were displayed in parentheses for exploratory purposes, but neither these confidence intervals nor the P-values shown were corrected for multiple testing.
If the estimated association/P-value scatter plot for a pollutant showed that the unadjusted P-value of the pollutant variable was <.05 for any n-day model, the n-day exposure window with the lowest P-value was explored further for that pollutant. If the next longer or next shorter exposure window considered also had a P-value <.05 but a larger association, suggesting a peak in association size, that window was explored instead.
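The window-selection rule can be written as a small function. In this Python sketch, "larger association" is taken to mean a larger absolute coefficient, and the tie-break when both neighbouring windows qualify (take the larger association) is an assumption not stated in the text; the results dictionary is made up.

```python
from typing import Dict, Optional, Tuple

# window n -> (estimated coefficient, unadjusted P-value)
Results = Dict[int, Tuple[float, float]]


def pick_window(results: Results, alpha: float = 0.05) -> Optional[int]:
    """Lowest-P window, unless an adjacent window is also < alpha with a larger association."""
    windows = sorted(results)
    significant = [n for n in windows if results[n][1] < alpha]
    if not significant:
        return None
    best = min(significant, key=lambda n: results[n][1])
    i = windows.index(best)
    neighbours = [windows[j] for j in (i - 1, i + 1) if 0 <= j < len(windows)]
    larger = [n for n in neighbours
              if results[n][1] < alpha and abs(results[n][0]) > abs(results[best][0])]
    # Assumption: if both neighbours qualify, take the one with the larger association.
    return max(larger, key=lambda n: abs(results[n][0])) if larger else best


example = {1: (0.10, 0.20), 2: (0.15, 0.01), 3: (0.20, 0.03), 4: (0.05, 0.40)}
chosen = pick_window(example)  # 3: adjacent to the lowest-P window, P < .05, larger estimate
```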
For each such pollutant and exposure window, augmented backward elimination of the possible confounders listed above was repeated, and all and only the retained confounders specific to that window (except the month of PSS administration, because of multicollinearity) were included in a new model for that pollutant and window. (Previously, the same confounder set had been used in all n-day models for the same pollutant.) We considered this revised model a candidate for the final model for that pollutant–window combination. Significance was assessed using a Bonferroni correction for multiple testing, to control the family-wise error rate (FWER).
The family of tests was taken to be the tests that could support rejecting the broad null hypothesis of no association of the specific air pollutant (for any n-day exposure window) with perceived stress. Unadjusted Wald test and likelihood ratio test P-values were multiplied by 20 + 1 = 21, for the 20 exposure windows and the single revised model. The adjusted P-values were compared with α = .05. P-values for the pollutant–perceived stress association were expected to be positively correlated between close window lengths, whereas the relationship between P-values for distant window lengths was less clear. FWER- and false discovery rate-controlling procedures compatible with dependence among P-values could have been used. However, the n-day models for the same pollutant were considered interchangeable in the sense that they all tested a null hypothesis of no association of that pollutant with perceived stress. Also, the same Bonferroni adjustment (21 tests) was convenient for adjusting the P-values of sensitivity-analysis models, which were explored to scrutinize the revised model (not to search for another significant association). Bonferroni-adjusted 95% confidence intervals were also obtained as profile-likelihood CIs at confidence level 1 – α/21 ≈ .9976.
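The Bonferroni arithmetic above, as a short Python sketch. Capping adjusted P-values at 1 is a common convention assumed here; the text only states multiplication by 21.

```python
from typing import List

N_TESTS = 20 + 1  # 20 exposure windows plus the single revised model
ALPHA = 0.05


def bonferroni(pvalues: List[float], m: int = N_TESTS) -> List[float]:
    """Bonferroni-adjusted P-values (multiplied by m, capped at 1 by assumption)."""
    return [min(1.0, m * p) for p in pvalues]


ci_level = 1 - ALPHA / N_TESTS  # confidence level of the adjusted CIs, about 0.99762
adjusted = bonferroni([0.001, 0.2])
```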
The revised models with a pollutant variable adjusted P-value <.05 were subjected to
sensitivity analysis, and modified to include interaction terms or fitted to stratified data while
examining for effect modification.
2.9. Sensitivity Analysis
In sensitivity analysis, we considered two datasets without influential observations, and several alternative models. For every final-model candidate, the same model was fitted without observations whose pollutant-variable DFBETAS exceeded $2/\sqrt{n}$ in absolute value, where $n$ is the sample size. This was done for the purpose of sensitivity analysis alone: to see what the estimated odds ratio for the pollutant variable was when observations that particularly affected the pollutant-variable coefficient were removed from the data, regardless of whether any participant had an erroneous measurement and whether they should be excluded. The same model was also fitted without observations whose absolute DFBETAS exceeded $3/\sqrt{n}$, a less strict size-adjusted threshold for considering an observation "influential."
Several alternatives to the octile group of MEANn_MET1 were considered: the binary
variable MEANn_MET1_MEDIAN, formed from a median split of MEANn_MET1; the quartile
group of MEANn_MET1 (MEANn_MET1_QUARTILE); the month of PSS administration
(MONTH); the Northern Hemisphere meteorological season FOURSEASON (winter, December-
February; spring, March-May; summer, June-August; fall, September-November), and cool
(November-April) or warm (May-October) season TWOSEASON. (These six-month seasons
were based on various temperature statistics for Los Angeles in recent years.)
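The two season codings defined above map the month of PSS administration as follows. This Python sketch uses the labels from the text, but the function names are hypothetical.

```python
def four_season(month: int) -> str:
    """Northern Hemisphere meteorological season (FOURSEASON)."""
    if month in (12, 1, 2):
        return "winter"
    if month in (3, 4, 5):
        return "spring"
    if month in (6, 7, 8):
        return "summer"
    if month in (9, 10, 11):
        return "fall"
    raise ValueError(f"invalid month: {month}")


def two_season(month: int) -> str:
    """Cool (November-April) or warm (May-October) season (TWOSEASON)."""
    if not 1 <= month <= 12:
        raise ValueError(f"invalid month: {month}")
    return "warm" if 5 <= month <= 10 else "cool"
```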
AFRICAN_AMERICAN was considered as an additional confounder. Separately, the third-trimester Center for Epidemiologic Studies Depression (CES-D) scale score, CESD, was also considered. Other singly forced-in variables included CESD_16PLUS, a binary variable indicating CESD ≥ 16 (a threshold for the presence of depressive symptoms), and the ISES1 measure of individual socioeconomic status.
Other ambient pollution variables from the same eight pollutants were also included one at a time where appropriate. Any pollution variable so included was treated as a possible confounder. In addition, more than one other pollution variable was included in the same sensitivity-analysis model where appropriate. For example, n-day mean 24-hour NO2 level and n-day mean O3 level would be included in a PM2.5 model separately and together.
Final-candidate models were further assessed for linearity by comparing their Akaike information criterion (AIC) values with those of models using a natural cubic spline function of the pollution variable (stats::glm and splines::ns in R). The spline had k = 2 to 10 (inclusive) degrees of freedom.
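The AIC comparison reduces to $\mathrm{AIC} = 2k - 2\log L$ per fitted model, with lower values preferred. In this Python sketch the log-likelihoods and parameter counts are made up; a natural spline with df basis columns replaces the single linear pollutant term, so it adds df − 1 parameters.

```python
from typing import Dict, Tuple


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion, 2k - 2 log L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def best_by_aic(fits: Dict[str, Tuple[float, int]]) -> str:
    """Label of the fit with the smallest AIC; fits maps label -> (log L, k)."""
    return min(fits, key=lambda label: aic(*fits[label]))


# Made-up fits: linear pollutant term versus natural splines with 2 and 3 df
fits = {
    "linear": (-300.0, 10),
    "ns_df2": (-299.8, 11),
    "ns_df3": (-297.0, 12),
}
best = best_by_aic(fits)  # "ns_df3": AIC 618 versus 620 (linear) and 621.6 (ns_df2)
```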
Measured confounders, and model specifications with alternative functional forms of the pollutant variable, have been considered thus far. Despite the variety of confounders considered, estimates of association might be biased by the lack of adjustment for an unmeasured confounder. The sensitivity analysis literature includes work on simulating, or algebraically or formulaically adjusting for, an unmeasured confounder in the context of a binary exposure, a continuous exposure with probit models, or a continuous exposure with logit models.⁵⁰⁻⁵² Because a software implementation of a method for continuous exposures and logit models was not available for R, as a provisional and tentative method we loosely followed Groenwold et al.,⁵¹ considered a binary unmeasured confounder that was independent (by assumption, and given exposure status) of the other confounders, and simulated such a confounder with both a crude association with the exposure and an association with the binary outcome given exposure status.
In our case, the exposure was continuous (an air pollutant concentration) rather than binary. The prevalence of the unmeasured confounder was not specified in advance. Instead, we approached the simulation by using the logit model

$$\log\!\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \beta_1 e_i + \beta_2 o_i,$$

which gave observation $i$'s probability $p_i$ of having the unmeasured confounder as a function of the continuous exposure level $e_i$ and binary outcome status $o_i$. $\beta_0$ was set to $\log(.01/(1 - .01))$, which gave a .01 probability of having the confounder given zero exposure and negative outcome status (low perceived stress). The probability $p_i$ was used to randomly assign the binary unmeasured confounder to observations in simulated datasets (otherwise identical to the real dataset). As a result, the group of persons simulated to have the unmeasured confounder tended to have a mean exposure level different from that of the group without it. This difference in means served as a crude association of the confounder with the exposure in the sample. We set $\beta_1 = \log(s)$ for $s = 1.5, 2.5, \ldots, 7.5, 8.5$ and $\beta_2 = \log(t)$ for $t = 1.5, 2.0, 2.5, 3.0, 3.5, 4.0$. For each $(s, t)$ combination, we generated confounder status 100 times (yielding 100 datasets), ran a Firth regression (with the same model variables, now including the simulated confounder) on each of the 100 simulated datasets, and calculated the mean point estimate of the coefficient on the pollutant variable, the mean lower and upper 95% confidence limits, and the mean proportion of observations with the confounder (that is, with positive confounder status), an estimate of the confounder's prevalence. Firth regression was used because the confounder prevalence could be very high or very low.
The mean point estimate and mean confidence limits of the coefficient were exponentiated to obtain odds ratios under the assumption of a binary unmeasured confounder with the selected and resulting associations and prevalence. This sensitivity analysis was considered coarse and incomplete, but a step toward using better methods in the future.
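The data-generating step of this simulation can be sketched in Python as below. The exposures, outcomes and random seed are made up, and the Firth regression fitted to each simulated dataset is omitted; only the assignment of the binary unmeasured confounder from $p_i$ and the $(s, t)$ grids are shown.

```python
import math
import random
from typing import List, Sequence

BETA0 = math.log(0.01 / (1 - 0.01))  # P(U = 1) = .01 at zero exposure and low stress


def expit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def confounder_probs(exposure: Sequence[float], outcome: Sequence[int],
                     s: float, t: float) -> List[float]:
    """p_i from logit(p_i) = beta0 + log(s) * e_i + log(t) * o_i."""
    b1, b2 = math.log(s), math.log(t)
    return [expit(BETA0 + b1 * e + b2 * o) for e, o in zip(exposure, outcome)]


def simulate_confounder(probs: Sequence[float], rng: random.Random) -> List[int]:
    """Draw one binary unmeasured confounder value per observation."""
    return [1 if rng.random() < p else 0 for p in probs]


S_GRID = [1.5 + k for k in range(8)]        # 1.5, 2.5, ..., 8.5
T_GRID = [1.5 + 0.5 * k for k in range(6)]  # 1.5, 2.0, ..., 4.0

# Made-up data; each draw would be added to the outcome model and fitted (not shown).
rng = random.Random(2021)
e = [0.0, 1.0, 2.0, 3.0]
o = [0, 1, 0, 1]
probs = confounder_probs(e, o, s=S_GRID[0], t=T_GRID[0])
draws = [simulate_confounder(probs, rng) for _ in range(100)]
prevalence = sum(map(sum, draws)) / (100 * len(e))  # mean proportion with U = 1
```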
2.10. Effect Modification Assessment
For each final-model candidate, main-effect and interaction pairs of terms for depression,
meteorological and season variables were added to the model one pair at a time. Specifically,
CESD_16PLUS, CESD_MEDIAN (indicating whether the participant had a CES-D score greater
than the sample median, 7.5), TWOSEASON, FOURSEASON, and MEAN7_SRAD_MEDIAN
(indicating whether the participant had a 7-day mean residential shortwave radiation estimate
greater than the sample median, 242.4), were each interacted as a factor variable (more than one
indicator variable in the case of FOURSEASON) with the pollution variable in a separate model.
Wald test P-values for interaction terms were obtained.
Regardless of interaction significance, the MADRES participants were stratified by each
categorical variable, and an estimated odds ratio, a confidence interval and a P-value were
obtained from fitting the same model. (CIs and P-values were adjusted for 21 tests, for
consistency with earlier calculations, despite the unclear interpretation of stratified CIs and P-
values in the context of this additional testing done to reveal weaknesses of a non-stratified
model.) Generalized linear regression was used as before, but in some strata it revealed separation because of small counts. As a post-hoc decision, Firth's bias-reduction method (logistf::logistf in R) was used, and profile penalized log-likelihood CIs and P-values were obtained.
2.11. Spatial Analysis
Final-model candidates were evaluated using generalized geographically weighted regression (GWmodel::ggwr.basic in R), which produced a local coefficient (and a local adjusted odds ratio) at each observation location or grid point. To avoid the separation that might arise from using, for example, a bisquare kernel, a Gaussian kernel was used. The kernel function was used with an adaptive bandwidth: the great-circle distance from the local individual to their kth nearest neighbor in the sample, or from a grid point to the kth nearest person. The parameter k was determined automatically (GWmodel::bw.ggwr) by minimizing AIC. To explore the models further, we also specified one-half (rounded) and one-fourth of the optimum as the adaptive bandwidth k. Such arbitrarily chosen bandwidths might not be optimal for prediction, but might suggest alternatives to the global, original model.
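An adaptive Gaussian kernel weights each sample point by $\exp(-\tfrac{1}{2}(d/b)^2)$, where $d$ is its distance to the fitting location and $b$ is the distance to the kth nearest sample point. This Python sketch uses the haversine great-circle distance with an assumed mean Earth radius of 6371 km, and it counts an observation at the fitting location itself (distance 0) among the nearest neighbours; both are assumptions, not GWmodel's internal conventions.

```python
import math
from typing import List, Sequence, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed value)
LonLat = Tuple[float, float]


def great_circle_km(a: LonLat, b: LonLat) -> float:
    """Haversine distance between (longitude, latitude) points given in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def adaptive_gaussian_weights(centre: LonLat, sample: Sequence[LonLat], k: int) -> List[float]:
    """Weights exp(-0.5 (d/b)^2), with b the distance to the k-th nearest sample point."""
    d = [great_circle_km(centre, p) for p in sample]
    b = sorted(d)[k - 1]
    if b <= 0:
        raise ValueError("bandwidth is zero; choose a larger k")
    return [math.exp(-0.5 * (di / b) ** 2) for di in d]


w = adaptive_gaussian_weights((0.0, 0.0), [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)], k=2)
```

The Gaussian kernel never assigns a weight of exactly zero, which is why it is less prone than a bisquare kernel to leaving too few effective observations in a local logit fit.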
The resulting locally fitted multivariable models, one for each observation, were
evaluated for significant variance of the pollution-variable coefficient. The variance of the
coefficient over the observations (at whose locations the model was locally fitted) was compared
with an empirical distribution of the variance generated under the null hypothesis of no
association between the coefficient and the observation location (other than a possible
meaningless one resulting from observations’ locations but not their attributes). As suggested by
Brunsdon, Fotheringham and Charlton
53
originally in the context of linear regression, the
empirical null distribution can be generated by randomly permuting observation locations. As a
function to do this was not available for generalized geographically weighted regression in the
GWmodel package, the boot::boot function was used with the permutation type of simulation to
generate R = 999 replicate datasets and, for each one, obtain the variance of the local-model
parameter estimate. The P-value of the coefficient variance for the original dataset was obtained
as r/(1 + R) where r is the rank of the variance among the variances for the Monte Carlo
replicates. The rank is the number of these variances greater than or equal to the coefficient
variance for the original dataset. P≤.05 was considered significant in this context. The
significance of spatial coefficient variation was considered for observation locations only, not
grid points.
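The permutation test described above can be sketched in Python as follows, applying the $r/(1 + R)$ formula exactly as stated in the text. The callable `local_coefs`, which would refit the geographically weighted model with permuted locations and return one local coefficient per observation, is hypothetical.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple

Location = Tuple[float, float]


def permutation_p_value(observed: float, null_values: Sequence[float]) -> float:
    """P = r / (1 + R), r = number of replicate variances >= the observed variance."""
    r = sum(1 for v in null_values if v >= observed)
    return r / (1 + len(null_values))


def null_variances(locations: List[Location],
                   local_coefs: Callable[[List[Location]], List[float]],
                   n_rep: int = 999, seed: int = 1) -> List[float]:
    """Permute locations among observations and recompute the local-coefficient variance."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_rep):
        shuffled = locations[:]
        rng.shuffle(shuffled)
        out.append(statistics.variance(local_coefs(shuffled)))
    return out


# Toy check: a "model" whose local coefficients ignore attributes, so every
# permutation reproduces the observed variance.
locs = [(1.0, 0.0), (2.0, 0.0), (4.0, 0.0)]
toy = lambda L: [x for x, _ in L]
observed = statistics.variance(toy(locs))
p = permutation_p_value(observed, null_variances(locs, toy, n_rep=9))  # 9 / 10
```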
We visualized the geographically weighted regression results by plotting the point estimate of the local odds ratio at each of more than ten thousand grid points. The grid points were spaced by 0.005 degrees in both latitude and longitude and spanned the smallest rectangle containing all observation locations. In addition to assessing the significance of spatial coefficient variation, we used the ordinary bootstrap (case resampling with replacement) with 400 samples to generate an empirical-bootstrap 95% confidence interval for the OR at each grid point. Contour lines were used to visualize the lower and upper confidence limits. For reference (because the arrangement of unevenly spatially distributed observations, or its omission, can influence the interpretation of grid GWR plots), the locations of the individuals in the sample were also plotted. However, the points shown in this thesis are jittered as a geographic-masking method to protect confidentiality.
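The prediction grid can be built from the bounding box of the observation locations. In this Python sketch the coordinates are made up, and extending each axis by one extra step when the extent is not a multiple of 0.005 degrees is an assumption about how the rectangle was covered.

```python
from typing import List, Sequence, Tuple

STEP = 0.005  # grid spacing in degrees


def grid_axis(lo: float, hi: float, step: float = STEP) -> List[float]:
    """Points lo, lo + step, ... covering [lo, hi] (the last point may pass hi)."""
    n = int(round((hi - lo) / step))
    if lo + n * step < hi - 1e-12:
        n += 1
    return [lo + i * step for i in range(n + 1)]


def bounding_grid(points: Sequence[Tuple[float, float]],
                  step: float = STEP) -> List[Tuple[float, float]]:
    """(longitude, latitude) grid over the smallest rectangle containing all points."""
    lons = [p[0] for p in points]
    lats = [p[1] for p in points]
    return [(x, y) for x in grid_axis(min(lons), max(lons), step)
            for y in grid_axis(min(lats), max(lats), step)]


grid = bounding_grid([(-118.30, 34.00), (-118.29, 34.01)])  # 3 x 3 points
```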
3. Results
3.1. Variable Selection
Table 8 shows the confounder set obtained (following preselection with a priori information) for each air pollutant using the augmented backward elimination-based method described in Methods. The first row shows the eight variables that were present in all sets. The remaining rows show, for each pollutant, the additional variables that not all sets had in common.
Table 8: Confounders selected for each air pollutant using the augmented backward elimination-based procedure
Pollutant | Selected variables
(all pollutants) | GSES1, HOUSING_ISSUES*, MAT_EDU*, MAT_MARITAL_RECODED*, MEANn_MET1_OCTILE*, MEANn_VS_OCTILE*, MONTH*†, PPAQ_TOTAL
NO2_24h | AFRICAN_AMERICAN*, MAT_AGE, MAT_YEARS*, MEANn_RHAVG_OCTILE*
O3_24h | MAT_AGE, MAT_YEARS*, MEANn_RHAVG_OCTILE*
PM25_25h | MAT_AGE, MAT_YEARS*
PM10_24h | MAT_YEARS*, MEANn_RHAVG_OCTILE*
FwyHwy_NOx | MAT_AGE
nFwyMjr_NOx | MAT_AGE
nFwyMnr_NOx | MAT_AGE, MAT_YEARS*
Total_NOx | MAT_AGE
* used as an unordered categorical variable (indicator variables)
† excluded later because of multicollinearity
3.2. Multicollinearity
For each air pollutant, there was no n-day model in which any confounder (with $\nu$ degrees of freedom) had a $\mathrm{GVIF}^{1/(2\nu)}$ exceeding the threshold $\theta^{1/2} = 10^{1/2}$. For some air pollutants (NO2 and O3), the $\mathrm{GVIF}^{1/(2\nu)}$ of the pollutant (independent) variable did exceed $10^{1/2}$ for some n-day models, but the pollutant variable was not deleted.
For each pollutant, for every n-day model, there were at least two confounders whose $\mathrm{GVIF}^{1/(2\nu)}$ exceeded the different threshold $\theta^{1/(2\nu)} = 10^{1/(2\nu)}$. In all cases, the top two of these variables by $\mathrm{GVIF}^{1/(2\nu)}$ were MEANn_MET1_OCTILE and MONTH (month of PSS administration). (In addition, for every model, MEANn_MET1_OCTILE and MONTH were the top two of all confounders by $\mathrm{GVIF}^{1/(2\nu)}$, whether or not they exceeded the $\theta^{1/(2\nu)}$ threshold.)
Typically, MEANn_MET1_OCTILE had a higher $\mathrm{GVIF}^{1/(2\nu)}$ than MONTH did for a given pollutant and exposure window (n). However, we viewed MONTH not as a true confounder itself but as a stand-in for other variables. Some of these variables were possibly already included in the confounder sets, such as the meteorological variables, and others might be related but different in kind, such as cooling/heating and vehicular traffic related to the time of year. MONTH might be useful as a stand-in for a variety of confounders, but because its interpretation was less clear and it so consistently accompanied the arguably related MEANn_MET1_OCTILE at the top of the lists of confounders ordered by $\mathrm{GVIF}^{1/(2\nu)}$, we considered MEANn_MET1_OCTILE a possible alternative to the less interpretable MONTH and chose to remove MONTH as a regressor.
After removing MONTH from the confounder set for each pollutant, all n-day models were re-fit for each pollutant. After re-fitting without MONTH, no variable (either the pollutant variable or a confounder) had a $\mathrm{GVIF}^{1/(2\nu)}$ higher than $10^{1/2}$. In addition, for all but two pollutants (NO2 and O3), no n-day model had a confounder with a $\mathrm{GVIF}^{1/(2\nu)}$ exceeding $10^{1/(2\nu)}$. (This remained true after the transformations, described below, of the n-day mean CALINE4 non-freeway major road NOx, 24-hour O3, and PM2.5 variables.) For NO2, MEANn_MET1_OCTILE was the only confounder with a $\mathrm{GVIF}^{1/(2\nu)}$ over $10^{1/(2\nu)}$, and only for n-day exposure windows of 20 days or longer. For O3, MEANn_MET1_OCTILE was again the only confounder with a $\mathrm{GVIF}^{1/(2\nu)}$ over $10^{1/(2\nu)}$, and only for n ≥ 8. (This remained true even after the n-day mean O3 variables were later squared.) Because no variable had a $\mathrm{GVIF}^{1/(2\nu)}$ exceeding $10^{1/2}$ and at most one confounder ever had a $\mathrm{GVIF}^{1/(2\nu)}$ higher than $10^{1/(2\nu)}$, we considered multicollinearity to have been addressed by removing MONTH.
Table 9 shows, for each air pollutant, the highest $\mathrm{GVIF}^{1/(2\nu)}$ obtained over all n-day exposure window models and variables, and the average (again over all n-day windows considered) of the highest $\mathrm{GVIF}^{1/(2\nu)}$ of any variable in an n-day model. The maxima near $10^{1/2} \approx 3.16$, which belonged to NO2 and O3, were concerning despite not exceeding the threshold. As Table 10 and Table 11 show, however, the variable with the highest $\mathrm{GVIF}^{1/(2\nu)}$ was the pollution variable in every n-day model for those two pollutants, and it was not considered for removal from models in which it was the independent variable of interest. Cells with a value greater than $5^{1/2} \approx 2.24$ are highlighted and would be of interest under a stricter rule of thumb for the variance-inflation factor. The apparent trend, with NO2 and O3, of the highest $\mathrm{GVIF}^{1/(2\nu)}$ increasing with n did not seem to be accompanied by a strong trend of the second-highest $\mathrm{GVIF}^{1/(2\nu)}$ increasing with n, and none of the second-highest $\mathrm{GVIF}^{1/(2\nu)}$ values were greater than 1.4 (whose square is ≈2). Without further scrutiny, it is unclear which variable, if any, should be removed.
Table 9: Generalized variance-inflation factor: for each pollutant, maximum $\mathrm{GVIF}^{1/(2\nu)}$ over all exposure windows and variables, and mean of the window-highest $\mathrm{GVIF}^{1/(2\nu)}$ (figures for any transformation of a pollution variable are in parentheses)
Pollutant | Maximum $\mathrm{GVIF}^{1/(2\nu)}$ | Mean of window-highest $\mathrm{GVIF}^{1/(2\nu)}$
NO2_24h | 2.95 | 2.23
O3_24h (squared) | 3.07 (3.02) | 2.24 (2.14)
PM25_25h (log-transformed) | 1.62 (1.63) | 1.37 (1.39)
PM10_24h | 1.83 | 1.50
FwyHwy_NOx | 1.20 | 1.18
nFwyMjr_NOx (log-transformed) | 1.20 (1.20) | 1.18 (1.18)
nFwyMnr_NOx | 1.22 | 1.20
Total_NOx | 1.20 | 1.18
Table 10: Generalized variance-inflation factor for NO2: highest and second-highest $\mathrm{GVIF}^{1/(2\nu)}$ values and variables by n-day exposure window
n | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 15 | 20 | 25 | 30 | 45 | 60 | 75 | 90 | 105 | 120
Highest $\mathrm{GVIF}^{1/(2\nu)}$ | 1.7 | 1.8 | 1.7 | 1.8 | 1.8 | 1.9 | 1.9 | 1.9 | 2.0 | 1.9 | 2.1 | 2.2 | 2.3 | 2.5 | 2.7 | 2.8 | 2.9 | 2.9 | 2.8 | 3.0
Highest variable | For every n-day exposure window, the variable with the highest $\mathrm{GVIF}^{1/(2\nu)}$ was MEANn_NO2_24h, for which there was ν = 1 degree of freedom.
Second-highest $\mathrm{GVIF}^{1/(2\nu)}$ | 1.2 | 1.2 | 1.2 | 1.2 | 1.2 | 1.2 | 1.3 | 1.2 | 1.2 | 1.2 | 1.2 | 1.2 | 1.2 | 1.2 | 1.2 | 1.2 | 1.2 | 1.2 | 1.3 | 1.3
Second-highest variable | For every n-day exposure window, the variable with the second-highest $\mathrm{GVIF}^{1/(2\nu)}$ was GSES1, for which there was ν = 1 degree of freedom.
Table 11: Generalized variance-inflation factor for squared O3 level: highest and second-highest $\mathrm{GVIF}^{1/(2\nu)}$ values and variables by n-day exposure window
n | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 15 | 20 | 25 | 30 | 45 | 60 | 75 | 90 | 105 | 120
Highest $\mathrm{GVIF}^{1/(2\nu)}$ | 1.4 | 1.5 | 1.5 | 1.6 | 1.6 | 1.8 | 1.7 | 1.8 | 1.9 | 1.9 | 2.0 | 2.1 | 2.3 | 2.5 | 2.9 | 2.9 | 3.0 | 3.0 | 2.9 | 2.8
Highest variable | For every n-day exposure window, the variable with the highest $\mathrm{GVIF}^{1/(2\nu)}$ was MEANn_O3_24h, for which there was ν = 1 degree of freedom.
Second-highest $\mathrm{GVIF}^{1/(2\nu)}$ | 1.2 | 1.2 | 1.2 | 1.2 | 1.2 | 1.2 | 1.2 | 1.2 | 1.2 | 1.2 | 1.2 | 1.2 | 1.2 | 1.2 | 1.3 | 1.3 | 1.3 | 1.3 | 1.2 | 1.2
Second-highest variable | GSES1 (ν = 1), … MEANn_MET1_OCTILE (ν = 7), … * * GSES1 (ν = 1)
3.3. Linearity
After MONTH was removed, the re-fitted models were evaluated for linearity of the relationship between the pollution variable and the log-odds of high perceived stress. For each of n-day mean freeway NOₓ, minor-road NOₓ and total traffic NOₓ, the Box-Tidwell procedure described in Methods did not flag any n-day model on the basis of the pollution variable, so the linearity assumption was not rejected for any of those three pollutants. For NO₂ (n-day mean NO₂), only the 75-day model was flagged, and Tukey-ladder scatter plots with LOESS curves did not support a non-identity transformation of n-day NO₂ level, either for the n-day models in general or for the 75-day model in particular. (A log transformation was considered, but it caused the 90-day model to be flagged by the Box-Tidwell test instead.) For O₃ (n-day mean 24-hour O₃), several n-day exposure window models were flagged by the Box-Tidwell procedure: n = 6, 7, 8, 9, 10, 15 and 75. The LOESS curves were roughly straight (and horizontal) to begin with, so in that sense the Tukey-ladder plots did not support a transformation of the n-day mean O₃ variables. However, a power-2 transformation was considered in order to spread densely packed observations over a longer interval, and it cleared the Box-Tidwell flags on the 7-, 8-, 15- and 75-day models without causing new models to be flagged. The 6-, 9- and 10-day models remained flagged for (now squared) n-day mean O₃, but Tukey-ladder plots did not support further transformation. For PM₂.₅, several n-day models were flagged by the Box-Tidwell procedure: n = 1, 2, 3, 5, 90 and 120. Visually, Tukey-ladder plots weakly supported a logarithmic transformation. The base-2 logarithm was used, in the transformation f(X) = log₂(X + 1/1000 × SD(X)). (The base of the logarithm did not influence the multicollinearity and linearity assessments, but it later facilitated interpretation in terms of an odds ratio for a doubling.) After the transformation, the Box-Tidwell flags on the 1-, 2-, 3- and 5-day models were cleared. Flags remained on the 90- and 120-day models, and the 105-day model was newly flagged, but Tukey-ladder plots did not support further transformation. For PM₁₀, the 5- and 6-day models were flagged, but Tukey-ladder plots did not support transformation, either for the n-day models in general or for the 5- and 6-day models in particular. For n-day mean CALINE4 non-freeway major-road-source NOₓ, the 1-, 60-, 75-, 90-, 105- and 120-day models were flagged, and Tukey-ladder plots did support a logarithmic transformation. The base-2 logarithm was used. After the transformation, all Box-Tidwell flags were cleared, and subsequent Tukey-ladder plots did not support further transformation. For the purpose of exploring linearity, 1/1000 times the standard deviation of a pollution variable vector X was temporarily added to X before taking the logarithm, the reciprocal or a negative power (in case X contained zeros). If the transformation f(X) = log(X + 1/1000 × SD(X)) produced negative values, a similar quantity was added so that the logarithm could be taken when the Box-Tidwell procedure was applied again. Following the transformations, the confounder sets based on augmented backward elimination (ABE) were produced again and did not change. Multicollinearity assessments were redone, with results essentially unchanged from those described above: they continued to support removal of MONTH from the variable sets, and after its removal they improved and did not support removal of any other variable according to the method described.
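As a reminder of the idea behind the Box-Tidwell check (the exact procedure is the one described in Methods), one can add the term X·ln(X) to the logistic model and test its coefficient with a Wald test; a small P-value suggests that some power transformation of X would fit better than X itself. The numpy sketch below is a minimal illustration under stated assumptions: the Newton-Raphson fitter `fit_logit`, the helper `box_tidwell_p` and the simulated data are hypothetical, X is assumed non-negative, covariates are passed as a plain numeric matrix, and the SD(X)/1000 shift mirrors the one described above.

```python
import math
import numpy as np

def fit_logit(X, y, max_iter=100, tol=1e-10):
    """Hypothetical helper: logistic regression by Newton-Raphson.
    X must include an intercept column. Returns coefficients and model-based SEs."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hessian, X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    hessian = X.T @ (X * (p * (1.0 - p))[:, None])
    return beta, np.sqrt(np.diag(np.linalg.inv(hessian)))

def box_tidwell_p(x, y, covariates=None):
    """Hypothetical helper: two-sided Wald P-value for the x*ln(x) term added to
    logit(y) ~ x (+ covariates). Assumes x >= 0; x is shifted by SD(x)/1000 so that
    ln(x) exists when x contains zeros."""
    x = np.asarray(x, dtype=float)
    x_pos = x + np.std(x, ddof=1) / 1000.0
    cols = [np.ones_like(x_pos), x_pos, x_pos * np.log(x_pos)]
    if covariates is not None:
        cols.append(np.asarray(covariates, dtype=float).reshape(len(x_pos), -1))
    beta, se = fit_logit(np.column_stack(cols), np.asarray(y, dtype=float))
    z = beta[2] / se[2]
    return math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))

# Simulated example in which the true logit is linear in ln(x), not in x.
rng = np.random.default_rng(0)
x = rng.uniform(0.05, 10.0, 5000)
y = (rng.random(5000) < 1.0 / (1.0 + np.exp(-(-2.0 + 3.0 * np.log(x))))).astype(float)
print(box_tidwell_p(x, y))  # a very small P-value flags non-linearity in x
```

A Wald test is used here for brevity; a likelihood ratio test of the added term would be an equally reasonable choice.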
3.4. Influential Observations
For each final preliminary n-day model for each pollutant, DFBETAS statistics were obtained for the n-day mean pollution variable. For all pollutants, no observation had a pollution-variable DFBETAS greater than 1 in absolute value for any n-day model. So, at this exploratory stage involving twenty n-day models per pollutant, we did not visualize the DFBETAS values and generally did not examine individual influential observations. However, Table 12 below summarizes the DFBETAS diagnostics by pollutant. Shown for each pollutant are the range, over all of the exposure windows considered, of the greatest absolute pollutant-variable DFBETAS in an n-day model; the range (again over the n-day models) of the proportion of observations with a pollutant-variable DFBETAS larger than 2/sqrt(SS), where SS is the sample size; and, similarly, the range of the proportion of observations with a DFBETAS larger than 3/sqrt(SS).
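The DFBETAS screen described above can be sketched by brute force: refit the logistic model without observation i, scale the change in the pollutant coefficient by the post-deletion standard error, and then count observations beyond 2/sqrt(SS) and 3/sqrt(SS). The exact leave-one-out refit is an assumption made here for clarity; common software (R's `dfbetas()` applied to a `glm`, for example) works from the final weighted least-squares fit instead, so values can differ somewhat. The helpers `fit_logit`, `dfbetas_loo` and `influence_summary` and the simulated data are hypothetical.

```python
import numpy as np

def fit_logit(X, y, max_iter=100, tol=1e-10):
    """Hypothetical helper: Newton-Raphson logistic regression (X includes an intercept)."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hessian, X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    hessian = X.T @ (X * (p * (1.0 - p))[:, None])
    return beta, np.sqrt(np.diag(np.linalg.inv(hessian)))

def dfbetas_loo(X, y, j):
    """Exact leave-one-out DFBETAS for coefficient j: (b_j - b_j(-i)) / se_j(-i)."""
    beta_full, _ = fit_logit(X, y)
    out = np.empty(len(y))
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        beta_i, se_i = fit_logit(X[keep], y[keep])
        out[i] = (beta_full[j] - beta_i[j]) / se_i[j]
    return out

def influence_summary(d):
    """Largest |DFBETAS| and % of observations above 2/sqrt(n) and 3/sqrt(n)."""
    n = len(d)
    a = np.abs(d)
    return a.max(), 100 * np.mean(a > 2 / np.sqrt(n)), 100 * np.mean(a > 3 / np.sqrt(n))

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = (rng.random(200) < 1.0 / (1.0 + np.exp(-1.5 * x))).astype(float)
x, y = np.append(x, 6.0), np.append(y, 0.0)  # one deliberately surprising observation
X = np.column_stack([np.ones_like(x), x])
d = dfbetas_loo(X, y, j=1)
print(influence_summary(d))
```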
Table 12: DFBETAS summary statistics by pollutant: range of the largest absolute DFBETAS in an n-day exposure window model; range of the n-day model percentage of observations with a DFBETAS > 2/sqrt(sample size); range of the percentage of observations with a DFBETAS > 3/sqrt(sample size)
Pollutant | Range of n-day model greatest DFBETAS size | Range of model % of obs. > 2/sqrt(SS) | Range of model % of obs. > 3/sqrt(SS)
NO2_24h | (0.18, 0.36) | (5, 8) | (1, 2)
O3_24h squared | (0.20, 0.41) | (5, 10) | (1, 3)
PM25_24h log-transformed | (0.17, 0.30) | (5, 8) | (1, 2)
PM10_24h | (0.18, 0.27) | (5, 7) | (1, 3)
FwyHwy_NOx | (0.18, 0.42) | (4, 5) | (1, 2)
nFwyMjr_NOx log-transformed | (0.28, 0.46) | (4, 7) | (1, 2)
nFwyMnr_NOx | (0.23, 0.31) | (5, 8) | (1, 3)
Total_NOx | (0.21, 0.38) | (4, 6) | (0, 2)
As Table 12 shows, no observation had a pollutant-variable DFBETAS greater than 0.47 in absolute value for any pollutant and exposure window. This means that deleting a single observation never changed the coefficient by more than one half of the post-deletion standard error of the coefficient. However, for each pollutant and n-day model, at least about 4% of observations (at least ceiling(0.035 × SS) = 15 observations for SS = 426) had a pollutant-variable DFBETAS greater than 2/sqrt(SS), though at most about 3% (at most floor(0.035 × SS) = 14 observations) had a pollutant-variable DFBETAS greater than 3/sqrt(SS).
The percentage of observations with a pollutant-variable DFBETAS greater than 2/sqrt(SS) reached its maximum, 10%, with the 9-day model for O₃. That 10% is cause for some concern and might reflect model misspecification, such as an incorrect transformation of the O₃ level. However, none of the 1-day pollution variables exceeded anticipated or physically plausible levels, and the same 1-day measurements, among others, underlie the other n-day variables.
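The cutoffs and counts quoted above follow from simple arithmetic with SS = 426. Because the percentages in Table 12 are rounded to whole numbers, a reported minimum of 4% means at least 3.5% of observations, and a reported maximum of 3% means less than 3.5%; the illustrative snippet below reproduces the resulting 15- and 14-observation bounds (variable names are our own).

```python
import math

ss = 426                          # sample size of the preliminary models
cut2 = 2 / math.sqrt(ss)          # 2/sqrt(SS), about 0.097
cut3 = 3 / math.sqrt(ss)          # 3/sqrt(SS), about 0.145
at_least = math.ceil(0.035 * ss)  # "4%" after rounding means >= 3.5%: at least 15 observations
at_most = math.floor(0.035 * ss)  # "3%" after rounding means < 3.5%: at most 14 observations
print(round(cut2, 3), round(cut3, 3), at_least, at_most)
```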
Only one person in this sample had a 1-day PM₁₀ level exceeding the highest of the annual extreme values⁵⁴ for 2016-2019 provided by the Environmental Protection Agency (EPA) for the Los Angeles–Long Beach–Anaheim, CA, core-based statistical area (CBSA). (For PM₁₀, the extreme values were averages of the CBSA trend sites' annual second maxima of daily 24-hour PM₁₀ level.) All 1-day PM₂.₅ observations were less than the EPA's former (1997) 24-hour PM₂.₅ standard of 65 μg/m³, though seven observations were greater than 27 μg/m³, the highest yearly value (over 2016-2019) of the average across CBSA sites of the 98th-percentile daily 24-hour PM₂.₅ level. All 1-day NO₂ observations were less than 53 ppb, the current EPA annual NO₂ standard. All 1-day O₃ observations were less than the EPA annual O₃ standard, a comparison that is relevant (because of the higher variability of 1-day measurements) though not direct. Most of the other variables (including quantile groups of relative humidity, etc.) were either categorical variables or GSES1, a score on a principal component of percentages, and were not the types of variables of most interest when examining influential observations. The exceptions were the Pregnancy Physical Activity Questionnaire total score, a transformation of which passed various normality tests and had outliers consistent with normality, and maternal age, which did not exceed plausible ages for the third trimester of pregnancy.
3.5. Final Preliminary Fixed-effects Binary Logit Models
With the influence assessment satisfactory for exploratory purposes, we considered the preliminary models built so far to be final preliminary models. For each pollutant, we plotted the parameter estimate from each fitted n-day model against the exposure window length n. The confidence intervals and P-values shown in the plots are not corrected for multiple testing.
3.5.1. NO₂ models
As shown in Figure 4 below, the estimated odds ratio (for high perceived stress), adjusted for several covariates, for a unit increase in the n-day mean daily 24-hour NO₂ level is less than 1 for the exposure windows considered of length n ≤ 30. Parameter estimates for exposure window lengths close to each other are expected to be correlated, and none of the P-values is less than .05, even before adjustment for multiple testing. It is notable nonetheless that there might be a negative association between ambient NO₂ and perceived stress for exposure window lengths ≤ 30, while the relationship for the longer exposure windows considered seems less clear. Even among long exposure windows close to each other (three estimates at windows spaced fifteen days apart or less), the point estimate of the coefficient changed sign twice. At the same time, the P-values of the estimates for window lengths n > 30 were well over 0.2, and higher than 0.6 in all cases. A similar plot for the ten shortest window lengths considered, n ≤ 10, was examined but is omitted here because it is consistent with the previous statements and reveals nothing else worth noting. The 25-day model had the largest estimated association, an estimated OR of 0.94, but its 95% confidence interval contained 1 even before adjustment for multiple testing.
The 25-day model's estimated OR implies an OR of exp(10 × log(0.94)) ≈ 0.54 for a 10 ppb increase in ambient NO₂. A 10 ppb difference is approximately the interquartile range (≈10.3 ppb) of 25-day mean NO₂ level in the sample. The sample consists of persons distributed unevenly in space and time, so the IQR is particular to the sample, but it represents a difference that is possible within the study period and region. The 0.54 OR estimated for a 10 ppb increase might therefore represent an important association. Because no unadjusted P-value was under .05, however, NO₂ models were not explored further in the present study.
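The rescaling used above, from an odds ratio per 1 unit to an odds ratio per Δ units of a pollutant that enters the logit linearly, is OR_Δ = exp(Δ × log(OR₁)) = OR₁^Δ, because the log-odds change by Δ times the coefficient. A minimal check with rounded values reported in this chapter (the function name `or_for_increment` is hypothetical):

```python
import math

def or_for_increment(or_per_unit, delta):
    """Odds ratio for a `delta`-unit increase when the pollutant enters the logit linearly."""
    return math.exp(delta * math.log(or_per_unit))

print(round(or_for_increment(0.94, 10), 2))   # NO2, 25-day window, 10 ppb: about 0.54
print(round(or_for_increment(1.035, 10), 2))  # PM10, 6-day window, 10 units: about 1.41
```

The same rule does not apply to the squared O₃ term or to the log-transformed exposures, whose odds ratios depend on the levels compared rather than on a simple difference.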
Figure 4: NO₂ binary logit model of T3 perceived stress: estimate (and 95% confidence interval) and P-value (not corrected for multiple testing) of adjusted association of mean 24-hour residential ambient nitrogen dioxide level with high perceived stress, versus length of n-day exposure window preceding day of third-trimester Perceived Stress Scale administration; n = 1, 5, 10, 15, 20, 25, 30, 45, 60, 75, 90, 105, 120
3.5.2. O₃ models
Figures 5 and 6 below show that, among the n-day windows with an O₃ P-value less than .05, the estimated association of squared n-day mean 24-hour O₃ concentration with high perceived stress peaks in size at n = 9 days. To facilitate alternative interpretations, the estimate shown for this pollutant is the coefficient; the estimated OR for a 100-unit increase in squared O₃ level is exp(100 × 0.00116) ≈ 1.1, a positive association between 9-day O₃ and binary perceived stress. Because the pollutant level was squared, the interpretation of the estimate is complicated. However, the difference in this sample between the third quartile of squared 9-day O₃ level (32.26² ≈ 1041) and the first quartile (21.69² ≈ 470.5) is about 570, and the corresponding estimated odds ratio is exp(570 × 0.00116) ≈ 1.9. The estimated slope on the pollution variable seems to increase with exposure window length until n = 8 or 9, after which the estimate seems to hover around 0.0010. We chose to explore the 9-day exposure window for O₃ further, rather than the 8-day window, because the 9-day model's estimate was larger, despite the lower P-value with the 8-day window. The largest estimate over all exposure windows was reached at 90 days and was also positive, but its P-value was about .18. The plots suggest a generally positive association with perceived stress for the exposure windows considered. None of the O₃ coefficient P-values at this point survived Bonferroni adjustment.
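Because the model is linear in the squared concentration, the odds ratio comparing O₃ levels c₁ and c₂ is exp(β × (c₂² − c₁²)), which depends on both levels and not only on their difference. The sketch below reproduces the quartile comparison above from the reported coefficient (0.00116) and quartiles (21.69 and 32.26); the function name is hypothetical, and the last line (the same 10-unit difference at two baselines) is illustrative only.

```python
import math

BETA_SQ = 0.00116  # reported coefficient on squared 9-day mean O3

def or_between_levels(c_low, c_high, beta=BETA_SQ):
    """Odds ratio comparing O3 level c_high with c_low when the logit is linear in O3 squared."""
    return math.exp(beta * (c_high ** 2 - c_low ** 2))

print(round(21.69 ** 2, 1), round(32.26 ** 2, 1))  # first and third quartiles, squared
print(round(or_between_levels(21.69, 32.26), 2))   # about 1.94, reported above as ≈ 1.9
# The same 10-unit increase implies a larger OR at a higher baseline level:
print(round(or_between_levels(20, 30), 2), round(or_between_levels(30, 40), 2))
```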
Figure 5: O₃ logit model of T3 perceived stress: estimate (and 95% CI) and P-value of adjusted association of squared n-day mean 24-hour residential ambient ozone level with third-trimester dichotomous perceived stress, for selected n-day exposure windows
Figure 6: Short-term O₃ logit model of T3 perceived stress: estimate (and 95% CI) and P-value of adjusted association of squared n-day mean 24-hour residential ambient ozone level with third-trimester dichotomous perceived stress, for short-term n-day exposure windows (n = 1, 2, …, 9, 10)
3.5.3. PM₂.₅ models
Figures 7 and 8 below show estimates of association with dichotomous perceived stress that are generally positive. The estimate peaks in both size and (unadjusted) significance at the 7-day exposure window, which we picked for further exploration. Because the base-2 logarithm had been taken as a transformation, the estimated odds ratio of 2.2 applies to a doubling of 7-day mean ambient PM₂.₅ level. For context, there is a 2.5-fold difference between the lowest and highest deciles of 7-day PM₂.₅ level in this sample, and a 1.6-fold difference between the lower and upper quartiles. The .004 P-value for the PM₂.₅ variable in the 7-day model supported further exploration under the .05 cutoff chosen for unadjusted P-values of pollutant coefficients in these exploratory models, although even such a low P-value does not withstand Bonferroni correction.
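With a log₂-transformed exposure, the coefficient is the change in log-odds per doubling, so the odds ratio for a k-fold difference in concentration is OR_doubling^(log₂ k), ignoring the small SD/1000 shift added before taking the logarithm. The illustrative snippet below applies this to the rounded values reported above (OR 2.2 per doubling; 1.6-fold and 2.5-fold sample contrasts); the resulting numbers are back-of-envelope translations of the rounded estimate, not additional results, and the function name is hypothetical.

```python
import math

def or_for_fold_change(or_per_doubling, fold):
    """OR for a `fold`-times higher concentration when the logit is linear in log2(concentration)."""
    return or_per_doubling ** math.log2(fold)

print(round(or_for_fold_change(2.2, 2.0), 2))  # 2.2 by construction
print(round(or_for_fold_change(2.2, 1.6), 2))  # lower vs. upper quartile contrast
print(round(or_for_fold_change(2.2, 2.5), 2))  # lowest vs. highest decile contrast
```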
Figure 7: PM₂.₅ logit model of T3 perceived stress: estimate (and 95% CI) and P-value of adjusted association of base-2 logarithm of n-day mean 24-hour residential ambient PM₂.₅ level with third-trimester dichotomous perceived stress, for selected n
Figure 8: Short-term PM₂.₅ logit model of T3 perceived stress: estimate (and 95% CI) and P-value of adjusted association of binary logarithm of n-day mean 24-hour residential ambient PM₂.₅ level with third-trimester dichotomous perceived stress, for short-term n
3.5.4. PM₁₀ models
Figures 9 and 10 below show estimates of association with binary perceived stress that are positive for all exposure window lengths considered of 15 days or shorter. The estimated association peaks in both size and significance at the 6-day exposure window. That window was flagged for further exploration, though its unadjusted P-value, .040, did not remain < .05 after Bonferroni adjustment. (Recall that the 7-day window was flagged for PM₂.₅.) At n = 6, the estimated OR for a unit (1 μg/m³) increase in n-day mean ambient PM₁₀ level is 1.03, as shown in the short-windows plot (1.035 more precisely). That corresponds to an estimated OR of exp(10 × log(1.035)) ≈ 1.4 for a 10 μg/m³ increase in 6-day PM₁₀. For context, the sample interquartile range of 6-day PM₁₀ level is 13.1 μg/m³. For window lengths of 20 days or longer, the estimated OR fluctuated around 1 with P-values > .2, though the estimate reached as high as 1.03 and as low as 0.98.
Figure 9: PM₁₀ binary logit model of T3 perceived stress: estimate (and 95% CI) and P-value of adjusted association of n-day mean 24-hour residential ambient PM₁₀ level with third-trimester dichotomous perceived stress, for selected n
Figure 10: Short-term PM₁₀ logit model of T3 perceived stress: estimate (and 95% CI) and P-value of adjusted association of n-day mean 24-hour residential ambient PM₁₀ level with third-trimester dichotomous perceived stress, for short-term n
3.5.5. CALINE4 Total NOₓ models
Figures 11 and 12 show generally negative associations between n-day mean total traffic-source NOₓ and binary perceived stress. Recall that associations of ambient NO₂ were negative for window lengths of 30 days or shorter. For CALINE4 Total NOₓ (an exposure that, like NO₂, was estimated at the residence level but is more granular than the regional ambient NO₂ estimate), several n-day models have a pollutant parameter P-value < .2, and the estimate size peaks at n = 5. Odds ratios generally seem to increase toward 1 starting with n = 25. At n = 5, the estimated OR for a unit increase in n-day mean Total NOₓ is 0.87, with an unadjusted P-value of .027. For context, the sample IQR of 5-day mean Total NOₓ is about 1.9 ppb.
None of the CALINE4 Total NOₓ coefficient P-values at this stage remained < .05 after Bonferroni adjustment. Because of its larger estimate, the 5-day exposure window was flagged for exploration, although the parameter in the 4-day model had the slightly lower P-value of .022.
Figure 11: CALINE4 Total NOₓ logit model of T3 perceived stress: estimate (and 95% CI) and P-value of adjusted association of n-day mean total traffic nitrogen oxides level with third-trimester dichotomous perceived stress, for selected n
Figure 12: Short-term CALINE4 Total NOₓ logit model of T3 perceived stress: estimate (and 95% CI) and P-value of adjusted association of n-day mean total traffic nitrogen oxides level with third-trimester dichotomous perceived stress, for short-term n
3.5.6. CALINE4 Freeway NOₓ models
Figures 13 and 14 below show negative associations of CALINE4 freeway-source NOₓ with binary perceived stress for all exposure windows considered. All unadjusted P-values are < .05, though none remained < .05 after adjustment. Estimated odds ratios fluctuate around 0.81 for a unit increase in n-day mean freeway NOₓ. The lowest estimated OR is 0.79 (unadjusted P = .006), at n = 25. There is a slightly lower P-value (.004) at n = 15, but the 25-day model was flagged for examination instead because of its slightly larger association.
Figure 13: CALINE4 Freeway NOₓ logit model of T3 perceived stress: estimate (and 95% CI) and P-value of adjusted association of n-day mean estimated freeway-source nitrogen oxides level with third-trimester dichotomous perceived stress, for selected n
Figure 14: Short-term CALINE4 Freeway NOₓ logit model of T3 perceived stress: estimate (and 95% CI) and P-value of adjusted association of n-day mean freeway-source nitrogen oxides level with third-trimester dichotomous perceived stress, for short-term n
3.5.7. CALINE4 Major-road NOₓ models
Figure 15 below shows a positive estimated association of logarithmically transformed non-freeway major-road-source NOₓ with perceived stress for all exposure windows considered. The size and significance of the association generally increase with window length n and peak at n = 120. The point estimate of the OR for a 2-fold increase in 120-day mean CALINE4 major-road NOₓ is 1.22. For context, there is a 2.4-fold difference between the sample lower and upper quartiles of 120-day mean major-road NOₓ. The P-value of the parameter estimate is .046, which does not remain < .05 after multiple-testing adjustment, but the 120-day window was flagged for exploration. A plot for exposure windows of 10 days or shorter is omitted here because it is consistent with the previous statements and seems to reveal only minor fluctuations of the estimate for window lengths n ≤ 10.
Figure 15: CALINE4 Major-road NOₓ logit model of T3 perceived stress: estimate (and 95% CI) and P-value of adjusted association of n-day mean non-freeway major-road-source nitrogen oxides level with third-trimester dichotomous perceived stress, for selected n
3.5.8. CALINE4 Minor-road NOₓ models
For all exposure windows considered, Figure 16 below shows a positive estimated association of minor-road-source NOₓ with dichotomous perceived stress. The size and significance of the association generally increase with n-day exposure window length and peak at n = 105. The OR estimate for a 1 ppb increase in 105-day mean CALINE4 minor-road NOₓ is 1.48. For context, the sample interquartile range of 105-day minor-road NOₓ is 0.8 ppb. The P-value of the parameter estimate, .017, does not withstand multiple-testing correction, but we flagged the 105-day window for exploration. A plot for window lengths of ≤ 10 days is omitted here because it is consistent with the previous statements and reveals only minor fluctuations of the estimate for those short windows.
Figure 16: CALINE4 Minor-road NOₓ logit model of T3 perceived stress: estimate (and 95% CI) and P-value of adjusted association of n-day mean minor-road-source nitrogen oxides level with third-trimester dichotomous perceived stress, for selected n
3.6. Revised Fixed-effects Binary Logit Models
For each pair of pollutant and exposure window flagged for exploration above, Table 13 below shows the results of augmented backward elimination (ABE) used to select confounders specific to that pollutant and exposure window from the initial list created using a priori knowledge. (Month of PSS administration was excluded from any ABE set containing it because of multicollinearity, as described in Methods. Also, if the n-day mean pollutant concentration was transformed in the exploratory stage, the same transformation was used.) Table 13 also shows the results of fitting the logistic regression model of binary perceived stress with that confounder set, for that pollutant and exposure window. In general, unadjusted Wald and likelihood ratio test P-values were less than .05; the exceptions were the .094 and .091 unadjusted P-values for squared 9-day mean O₃ concentration. However, only the base-2 logarithm of 7-day mean PM₂.₅ level had a Bonferroni-adjusted P-value less than or equal to .05. The estimated odds ratio for a doubling of 7-day mean ambient PM₂.₅ level was 1.99 (99.76% adjusted CI: 1.02, 4.07; adjusted likelihood ratio test P = .040). (The adjustment was for 21 tests: the tests for the 20 preliminary n-day models for PM₂.₅ and the 1 test for this revised model.)
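The adjusted quantities reported from here on follow the usual Bonferroni rule for m = 21 tests: an adjusted P-value is min(1, 21 × P), and the adjusted interval is an unadjusted interval at confidence level 1 − 0.05/21 ≈ 99.76%. The sketch below shows this arithmetic and, for illustration only, a Wald-type interval on the odds-ratio scale. Note that the thesis reports profile-likelihood intervals, which generally differ from Wald intervals, and that the coefficient and standard error in the last line are hypothetical.

```python
import math
from statistics import NormalDist

M_TESTS = 21  # 20 preliminary n-day models + 1 revised model

def bonferroni_p(p, m=M_TESTS):
    """Bonferroni-adjusted P-value, capped at 1."""
    return min(1.0, m * p)

def adjusted_level(alpha=0.05, m=M_TESTS):
    """Confidence level of an interval adjusted for m tests."""
    return 1.0 - alpha / m

def wald_or_ci(beta, se, level):
    """Wald interval for exp(beta); only an approximation to profile-likelihood limits."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return math.exp(beta - z * se), math.exp(beta + z * se)

level = adjusted_level()
print(f"{100 * level:.2f}%")  # 99.76%
print(bonferroni_p(0.002))    # an unadjusted P of .002 becomes about .042
# Hypothetical coefficient and standard error, for illustration only:
print(wald_or_ci(math.log(2.0), 0.22, level))
```

Adjusted P-values computed from rounded unadjusted values (such as .002) will not match the tables exactly, because the tables were computed from unrounded P-values.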
Table 13: Revised main-effects models: pollutant- and window-specific confounder set, estimate of association, unadjusted profile-likelihood 95% confidence interval, adjusted CI (equivalently, unadjusted 99.76% CI) reflecting Bonferroni correction for 20 + 1 = 21 tests, unadjusted Wald and likelihood ratio test P-values, and Bonferroni-adjusted P-values
Pollutant (Transformation), Exposure Window | Confounder Set (All models have HOUSING_ISSUES*, MAT_EDU*, MAT_MARITAL_RECODED*, PPAQ_TOTAL.) | Estimated OR (or Coefficient) | Unadjusted 95% CI | Adjusted 99.76% CI | Unadjusted Wald and likelihood ratio test P-values | Adjusted Wald and likelihood ratio test P-values
O₃ (square), 9-day | …, MAT_AGE, _MET1_OCTILE | (0.00077) | -0.00012, 0.00168 | -0.00061, 0.00219 | .094, .091 | >.999, >.999
PM₂.₅ (log₂), 7-day | …, GSES1, _MET1_OCTILE | 1.99 | 1.29, 3.14 | 1.02, 4.07 | .002, .002 | .050, .040
PM₁₀, 7-day | …, GSES1, _RHAVG_OCTILE | 1.03 | 1.00, 1.06 | 0.99, 1.07 | .028, .026 | .578, .541
CALINE4 Freeway NOₓ, 25-day | …, GSES1, MAT_YEARS*, _MET1_OCTILE*, _VS_OCTILE* | 0.79 | 0.66, 0.94 | 0.60, 1.02 | .008, .005 | .172, .115
CALINE4 Major-road NOₓ (log₂), 120-day | …, GSES1, HOME_OUTDOOR, MAT_AGE, _MET1_OCTILE, _VS_OCTILE | 1.22 | 1.01, 1.49 | 0.91, 1.66 | .045, .043 | .941, .903
CALINE4 Minor-road NOₓ, 105-day | …, GSES1, _MET1_OCTILE | 1.48 | 1.08, 2.04 | 0.91, 2.45 | .015, .014 | .324, .292
CALINE4 Total NOₓ, 5-day | …, GSES1, _MET1_OCTILE | 0.89 | 0.78, 1.00 | 0.73, 1.06 | .046, .042 | .970, .873
* used as an unordered categorical variable (indicator variables)
3.7. Sensitivity Analysis
Table 14 below shows the results of sensitivity analysis, fitting the same revised model for log₂ 7-day mean PM₂.₅ either without influential observations or with other measured variables.
Highlights: without the 18 influential observations with a pollutant-variable DFBETAS > 2/sqrt(sample size), the estimated OR was even higher: 3.57 (99.76% adjusted CI: 1.55, 8.81; adjusted likelihood ratio test P < .001). Without the 5 influential observations with a DFBETAS > 3/sqrt(sample size), the estimated OR was 2.70 (99.76% adjusted CI: 1.30, 5.93; adjusted LR test P = .001). (The adjustment was still for 21 tests, because these new tests were intended to cast doubt on the revised model, not to support the hypothesis of an association of PM₂.₅ with perceived stress.) Using month of PSS administration (MONTH) instead of MEAN7_MET1_OCTILE changed the estimated OR very little (to 2.06; 99.76% CI: 0.96, 4.63), but the adjusted P-value was .090. Using TWOSEASON (cool or warm season) instead of MEAN7_MET1_OCTILE did not change the rounded estimated OR, but the 99.76% CI and adjusted LR test P-value became (1.06, 3.86) and .017, respectively. The point estimate, adjusted CI and adjusted P-value remained virtually unchanged after including AFRICAN_AMERICAN, and the same was true of ISES1, the measure of individual socioeconomic status. Including the CES-D score, a depression measure, increased the adjusted LR test P-value for the pollutant variable to .309, though the estimated OR decreased only slightly, to 1.93 (99.76% CI: 0.85, 4.55). In all of the multiple-pollutant models considered, the estimated OR for log₂ 7-day PM₂.₅ increased while the P-value decreased. In terms of change in the point estimate of the OR for the PM₂.₅ variable, the least dramatic example is the model adding O₃ only: 2.46 (99.76% CI: 1.16, 5.51; adjusted LR test P = .006).
With the same independent variables but different PSS-10 score dichotomization cut points (≥ 14, ≥ 15, ≥ 19) close to the original, sample-median-based cut point (≥ 17), the associations with the newly dichotomized score were not significant after P-value adjustment.
Table 14: Variations on revised log₂ 7-day PM₂.₅ model: AIC, estimated odds ratio for pollutant variable, and unadjusted and adjusted (Bonferroni, 21 tests) confidence intervals and P-values
Model | AIC (if comparable with original model's AIC) | Estimated OR | Unadjusted 95% CI | Adjusted 99.76% CI | Unadjusted Wald and likelihood ratio test P-values | Adjusted Wald and likelihood ratio test P-values
original revised model | 577.9 | 1.99 | 1.29, 3.14 | 1.02, 4.07 | .002, .002 | .050, .040
without 18 influential observations (pollutant-variable DFBETAS > 2/sqrt(sample size)) | ― | 3.57 | 2.07, 6.33 | 1.55, 8.81 | <.001, <.001 | <.001, <.001
without 5 influential observations (pollutant-variable DFBETAS > 3/sqrt(sample size)) | ― | 2.70 | 1.68, 4.46 | 1.30, 5.93 | <.001, <.001 | .001, .001
with _MET1_MEDIAN* instead of _MET1_OCTILE* | 573.6 | 2.01 | 1.34, 3.06 | 1.07, 3.90 | .001, .001 | .021, .015
with _MET1_QUARTILE* instead of _MET1_OCTILE* | 575.0 | 1.91 | 1.26, 2.92 | 1.01, 3.72 | .002, .002 | .052, .040
with MONTH* instead of _MET1_OCTILE* | 582.2 | 2.06 | 1.25, 3.45 | 0.96, 4.63 | .005, .004 | .109, .090
with FOURSEASON* instead of _MET1_OCTILE* | 577.5 | 2.03 | 1.33, 3.17 | 1.06, 4.08 | .001, .001 | .028, .021
with TWOSEASON* instead of _MET1_OCTILE* | 573.8 | 1.99 | 1.32, 3.04 | 1.06, 3.86 | .001, .001 | .024, .017
with AFRICAN_AMERICAN* | 579.7 | 1.99 | 1.28, 3.13 | 1.01, 4.06 | .002, .002 | .052, .041
with CESD | 442.0 | 1.93 | 1.14, 3.33 | 0.85, 4.55 | .017, .015 | .349, .309
with CESD_16PLUS* (depressed) | 519.5 | 2.02 | 1.26, 3.29 | 0.98, 4.36 | .004, .003 | .086, .069
with ISES1 | 579.9 | 1.99 | 1.29, 3.14 | 1.02, 4.07 | .002, .002 | .050, .039
with MEAN7_NO2_24h | 576.6 | 2.83 | 1.58, 5.16 | 1.16, 7.25 | .001, <.001 | .011, .008
with MEAN7_O3_24h | 576.3 | 2.46 | 1.51, 4.11 | 1.16, 5.51 | <.001, <.001 | .009, .006
with MEAN7_NO2_24h and MEAN7_O3_24h | 577.3 | 2.89 | 1.61, 5.29 | 1.18, 7.45 | <.001, <.001 | .009, .007
dependent variable: PSS_14PLUS instead of PSS_17PLUS | ― | 1.30 | 0.84, 2.04 | 0.66, 2.63 | .248, .246 | >.999, >.999
dependent variable: PSS_15PLUS instead of PSS_17PLUS | ― | 1.37 | 0.90, 2.12 | 0.71, 2.70 | .148, .146 | >.999, >.999
dependent variable: PSS_19PLUS instead of PSS_17PLUS | ― | 1.66 | 1.05, 2.66 | 0.82, 3.48 | .032, .030 | .671, .628
* used as an unordered categorical variable (indicator variable(s))
As an indication of the linearity of the revised logit model for log₂ 7-day mean PM₂.₅, Figure 17 below shows the relationship between the pollutant variable and the log-odds of high perceived stress predicted for each participant from the participant's values of all independent variables in the model. Each point represents a participant. The relationship appears roughly linear, but we considered spline models in order to assess linearity more objectively.
Figure 17: Estimated log-odds of high perceived stress vs. log₂ 7-day PM₂.₅
The same PM₂.₅ logit model was fitted with a natural cubic spline function of the log₂ 7-day PM₂.₅ level, with k = 2 to 10 degrees of freedom and k − 1 interior knots at the k-quantiles of log₂ 7-day PM₂.₅. None of these nine spline models had an AIC less than the AIC of the linear model, 577.9; the minimum of the spline-model AICs was 578.1.
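The spline comparison can be sketched with a natural cubic spline basis built from truncated cubic functions, with boundary knots at the minimum and maximum and k − 1 interior knots at the k-quantiles. This gives k non-intercept columns, the same count as R's `splines::ns(x, df = k)`. This is a hedged sketch rather than the thesis code: the basis spans the same function space as `ns()` for the same knots but is not numerically identical to it, the covariates of the revised model are omitted, and the names `ns_basis` and `fit_logit_aic` and the simulated data are hypothetical. AIC is computed as 2p − 2 log L.

```python
import numpy as np

def ns_basis(x, df):
    """Hypothetical helper: natural cubic spline basis (no intercept column) with df - 1
    interior knots at the df-quantiles of x and boundary knots at min(x), max(x)."""
    x = np.asarray(x, dtype=float)
    interior = np.quantile(x, np.arange(1, df) / df)
    knots = np.concatenate(([x.min()], interior, [x.max()]))
    last = knots[-1]

    def d(k):  # truncated-power term that becomes linear beyond the last knot
        return (np.maximum(x - knots[k], 0.0) ** 3
                - np.maximum(x - last, 0.0) ** 3) / (last - knots[k])

    d_ref = d(len(knots) - 2)
    cols = [x] + [d(k) - d_ref for k in range(len(knots) - 2)]
    return np.column_stack(cols)  # df columns

def fit_logit_aic(X, y, max_iter=100, tol=1e-10):
    """Newton-Raphson logistic fit with an added intercept; returns AIC = 2p - 2 log L."""
    X = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hessian, X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    eta = X @ beta
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))
    return 2 * X.shape[1] - 2 * loglik

rng = np.random.default_rng(2)
x = rng.normal(3.5, 0.5, 426)  # hypothetical log2 exposure values
y = (rng.random(426) < 1 / (1 + np.exp(-(-2.5 + 0.7 * x)))).astype(float)
aic_linear = fit_logit_aic(x[:, None], y)
aic_splines = {k: fit_logit_aic(ns_basis(x, k), y) for k in range(2, 11)}
print(round(aic_linear, 1), round(min(aic_splines.values()), 1))
```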
Figure 18 below shows the results of the simulation of an unmeasured binary confounder. The shaded area fills the convex hull of the combinations of associations (of the unmeasured confounder with exposure level, and of the unmeasured confounder with the outcome) for which the estimated confidence interval of the OR for the pollutant variable contained 1. Such combinations involved an OR (association of the unmeasured confounder with the outcome) greater than 2.5 and an estimated difference of means (association of the unmeasured confounder with exposure level) of between approximately 0.25 and 0.40 in log₂ of the PM₂.₅ level, that is, a difference of between about 20% and 30% in PM₂.₅ level. The unmeasured-confounder sample prevalences involved ranged from 32% to 83%.
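The generating mechanism of the simulated confounder is not restated here, so the following is only one possible design, sketched under explicit assumptions. A binary U is drawn for each participant from logit P(U = 1) = a_x·(x − mean x) + a_y·(y − mean y). The model is refitted with U added, and the confounder prevalence, the realized associations (difference of exposure means by U, and the U-outcome odds ratio) and the pollutant OR with its 95% Wald limits are averaged over 100 draws, matching the quantities summarized in Figure 18. All names, parameters and the simulated data are hypothetical, and covariates are omitted.

```python
import numpy as np

def fit_logit(X, y, max_iter=100, tol=1e-10):
    """Hypothetical helper: Newton-Raphson logistic regression (X includes an intercept)."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hessian, X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    hessian = X.T @ (X * (p * (1.0 - p))[:, None])
    return beta, np.sqrt(np.diag(np.linalg.inv(hessian)))

def simulate_confounder(x, y, a_x, a_y, n_sims=100, seed=0):
    """Average results over n_sims draws of a hypothetical binary confounder U."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    results = []
    for _ in range(n_sims):
        p_u = 1.0 / (1.0 + np.exp(-(a_x * (x - x.mean()) + a_y * (y - y.mean()))))
        u = (rng.random(len(x)) < p_u).astype(float)
        diff_means = x[u == 1].mean() - x[u == 0].mean()
        # 2 x 2 table of U by Y with 0.5 added to each cell (Haldane correction).
        cells = [[np.sum((u == a) & (y == b)) + 0.5 for b in (0, 1)] for a in (0, 1)]
        or_u_y = cells[1][1] * cells[0][0] / (cells[1][0] * cells[0][1])
        beta, se = fit_logit(np.column_stack([np.ones_like(x), x, u]), y)
        results.append([u.mean(), diff_means, or_u_y, np.exp(beta[1]),
                        np.exp(beta[1] - 1.96 * se[1]), np.exp(beta[1] + 1.96 * se[1])])
    names = ["prevalence", "diff_means_x", "or_u_y", "or_x", "or_x_lo", "or_x_hi"]
    return dict(zip(names, np.mean(results, axis=0)))

rng = np.random.default_rng(3)
x = rng.normal(3.5, 0.5, 426)  # hypothetical log2 exposure values
y = (rng.random(426) < 1.0 / (1.0 + np.exp(-(-3.5 + 1.0 * x)))).astype(float)
res = simulate_confounder(x, y, a_x=1.0, a_y=2.0)
print({k: round(float(v), 3) for k, v in res.items()})
```

Repeating the call over a grid of (a_x, a_y) values and recording where the averaged interval contains 1 would give the kind of region shaded in Figure 18.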
Figure 18: Sensitivity analysis: means (across 100 simulations) of confounder sample prevalence, and of
pollutant odds ratio estimate and 95% confidence limits, for associations of an unmeasured binary
confounder with exposure level and outcome
3.8. Effect Modification
None of the interaction terms considered was significant when included in the revised log₂ 7-day mean PM₂.₅ model (one at a time, each together with the corresponding main-effect term). In the separate variations of the revised model, the unadjusted Wald P-values for the interaction terms with clinical depression (CES-D score ≥ 16) and with a CES-D score greater than the sample median were .627 and .634, respectively. The TWOSEASON (cool/warm season by Los Angeles temperature statistics) interaction term was not significant, with a P-value of .100. The lowest P-value among the three FOURSEASON (Northern Hemisphere meteorological season) interaction terms was .376. The MEAN7_SRAD_MEDIAN (7-day mean shortwave radiation > sample median) interaction term had a P-value of .140.
Disregarding interaction-term P-values, Table 15 below shows the results of stratifying the dataset by each possible effect modifier considered, one at a time. Pollutant-variable P-values and confidence intervals corrected (Bonferroni) for the same number of tests as before, 21, are again provided for consistency, with the understanding that fitting the revised model to these stratified datasets was intended to critique the same model, not to discover the hypothesized association between PM₂.₅ and perceived stress.
In this context, it is notable that no OR point estimate was less than 1. At the same time, for no stratum was the pollutant-variable adjusted P-value less than or equal to .05, and the unadjusted P-value was ≤ .05 only for the not-depressed, warm-season and above-median shortwave radiation strata. In all cases where the unadjusted P-value was greater than .05, the stratum size was half of the original sample size or less.
For the not-depressed stratum, the pollutant estimated OR was 1.96, which was close to both the non-stratified estimated OR (1.99) and the depressed-stratum estimated OR (2.23). For the two levels of CESD_MEDIAN, though, the estimated pollutant odds ratios were 1.79 and 1.47, which might reflect some confounding by depression. In contrast with CESD_16PLUS, the assessment of CESD_MEDIAN suggests that the OR is lower among more-depressed persons, but the CESD_MEDIAN stratum OR estimates were imprecise. In general, the assessment of depression (as measured by CESD_16PLUS or CESD_MEDIAN) as a possible effect modifier was ambiguous. Among the various depression strata, the most significant association of log₂ 7-day mean PM₂.₅ with dichotomous perceived stress was within the not-depressed stratum (unadjusted P = .005; adjusted P = .104).
The assessments of TWOSEASON, FOURSEASON and MEAN7_SRAD_MEDIAN as possible effect modifiers agreed in the sense that, for each of those meteorological or seasonal variables, the pollutant estimated OR was by far the highest for the variable's warmest level. This contrasts with the finding of Mehta et al.⁶ of a larger estimated association in the cold season in Boston (October–March). Mehta et al. used linear mixed-effects regression with raw PSS-14 score (whereas a fixed-effects logistic model with a dichotomization of PSS-10 is used here), but found that the associations for moving averages of PM₂.₅ in the cold season were twice as large as the corresponding associations in the warm season. Here, by contrast, the ratio of the estimated OR for the warmest level to the estimated OR for any cooler level of the same variable is higher than 2.
Table 15: Effect modification: stratified estimates of association of log₂ of 7-day mean PM₂.₅ level with binary perceived stress, by depression, meteorological or seasonal variable; unadjusted and Bonferroni-adjusted (21 tests) profile penalized log-likelihood confidence intervals and P-values

Variable / Stratum (n)                      Est. OR   Unadj. 95% CI   Adj. 99.76% CI   Unadj. P   Adj. P
CESD_16PLUS
  Not Depressed (355)                       1.96      1.22, 3.21      0.95, 4.27       .005       .104
  Depressed (71)                            2.23      0.28, 22.98     *                .457       >.999
CESD_MEDIAN
  CESD ≤ sample median, 7.5 (213)           1.79      0.85, 3.94      0.56, 6.30       .128       >.999
  CESD > sample median, 7.5 (213)           1.47      0.76, 2.90      0.53, 4.31       .255       >.999
TWOSEASON
  Cold Season (187)                         1.51      0.92, 2.56      0.70, 3.46       .106       >.999
  Warm Season (239)                         3.77      1.57, 9.60      0.99, 16.55      .003       .056
FOURSEASON
  Winter (98)                               1.57      0.79, 3.29      0.54, 5.13       .202       >.999
  Spring (93)                               2.15      0.59, 8.93      0.29, 21.56      .253       >.999
  Summer (110)                              4.91      0.99, 28.73     0.41, 84.81      .052       >.999
  Fall (125)                                2.29      0.85, 6.56      0.49, 12.32      .103       >.999
MEAN7_SRAD_MEDIAN
  MEAN7_SRAD ≤ sample median, 242.4 (213)   1.53      0.96, 2.48      0.74, 3.29       .076       >.999
  MEAN7_SRAD > sample median, 242.4 (213)   3.73      1.37, 10.87     0.80, 20.37      .010       .201

n, number of observations; Est. OR, estimated odds ratio; CI, confidence interval.
* The profile-likelihood confidence limits did not converge.
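The claim that the warmest-level OR is more than twice the OR of any cooler level of the same variable can be checked directly from the point estimates in Table 15. The Python sketch below only divides the tabulated estimated ORs; the dictionary layout and the name `warm_to_cool_ratios` are hypothetical, and the ratios ignore the wide, overlapping confidence intervals, so they describe point estimates only.

```python
# Estimated pollutant ORs (per doubling of 7-day mean PM2.5) copied from Table 15.
# For each variable: (OR for the warmest level, {cooler level: OR}).
TABLE_15_ORS = {
    "TWOSEASON": (3.77, {"Cold Season": 1.51}),
    "FOURSEASON": (4.91, {"Winter": 1.57, "Spring": 2.15, "Fall": 2.29}),
    "MEAN7_SRAD_MEDIAN": (3.73, {"MEAN7_SRAD <= median": 1.53}),
}


def warm_to_cool_ratios(table):
    """Return {variable: {cooler level: warmest-level OR / cooler-level OR}}."""
    return {
        variable: {level: warm / cool for level, cool in cooler.items()}
        for variable, (warm, cooler) in table.items()
    }


ratios = warm_to_cool_ratios(TABLE_15_ORS)
for variable, by_level in ratios.items():
    for level, ratio in by_level.items():
        print(f"{variable}: warmest level vs {level}: {ratio:.2f}")
```

The smallest of these ratios is about 2.14 (summer versus fall), so every comparison exceeds 2, as the text states.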
3.9. Spatial Analysis
For geographically weighted logistic regression with the revised PM₂.₅ multivariable model, the optimal adaptive bandwidth was 424 nearest neighbors. This was very close to the number of observations, 426. Such a high adaptive bandwidth relative to the total number of observations does not traditionally support further exploration for spatially varying coefficients using geographically weighted regression. The Monte Carlo tests for spatial coefficient variation were nevertheless done for the optimal adaptive bandwidth (k_optimum), one-half of the optimum rounded to the nearest whole number (k_optimum÷2), and one-fourth of the optimum, likewise rounded (k_optimum÷4). With none of these adaptive bandwidths was the variance of the 426 local coefficients on the pollutant variable high in comparison with the empirical distribution of the variance under the null hypothesis of no meaningful association between the coefficient and observation location.
For adaptive bandwidths k_optimum, k_optimum÷2, and k_optimum÷4, the Monte Carlo test P-values were .974, .960, and .650, respectively. Consequently, the apparent spatial variation (of the adjusted odds ratio) in the visualizations that were generated was considered uninterpretable. (The minimum and maximum of the local point estimates are mapped to the extremes of a color palette, so the variation is exaggerated.) To avoid giving a misleading impression, we hesitate to show those maps. However, Figure 19 below does show that the local point estimate of the OR was close to (within 0.25 of) 1.99, the estimated OR from the fitted global model, throughout the area considered, even when half of the optimal adaptive kernel bandwidth was used. No local estimate was lower than 1.9 or higher than 2.25. Together, this uniformity and closeness to the global-model point estimate indicate the absence of unaddressed major confounding by factors associated with geographic location. The lower empirical-bootstrap 95% confidence limit also varies little, though it goes below 1 in an area of South Los Angeles south of Koreatown. That area is small even relative to the area in which MADRES participants are densely located. Despite the lack of statistical significance of spatial variation, if the local estimates had generally been more than 10% below the global-model estimate, this would have been a cause for concern.
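The Monte Carlo test for spatial coefficient variation can be sketched as a permutation test: the observed variance of the local coefficients is compared with the variances obtained after randomly reassigning the observation locations. The Python sketch below is a simplified illustration under stated assumptions, not the implementation used in this thesis. It swaps in geographically weighted linear (not logistic) regression with a single predictor, so each local fit has a closed form; it uses Euclidean distances on projected coordinates, a Gaussian kernel exp(-0.5 (d/b)²) whose bandwidth b at each location is the distance to its k-th nearest location (counting the location itself), and the P-value (exceedances + 1)/(simulations + 1). All function names and the simulated data are hypothetical.

```python
import math
import random


def adaptive_bandwidths(dist, k):
    # Bandwidth at each location = distance to its k-th nearest location (self included).
    return [sorted(row)[k - 1] for row in dist]


def local_slopes(x, y, dist, bw, loc):
    # Gaussian-kernel weighted least-squares slope of y on x at every observation.
    # loc[i] is the index of the location currently assigned to observation i.
    n = len(x)
    slopes = []
    for i in range(n):
        li = loc[i]
        w = [math.exp(-0.5 * (dist[li][loc[j]] / bw[li]) ** 2) for j in range(n)]
        sw = sum(w)
        mx = sum(wj * xj for wj, xj in zip(w, x)) / sw
        my = sum(wj * yj for wj, yj in zip(w, y)) / sw
        sxy = sum(wj * (xj - mx) * (yj - my) for wj, xj, yj in zip(w, x, y))
        sxx = sum(wj * (xj - mx) ** 2 for wj, xj in zip(w, x))
        slopes.append(sxy / sxx)
    return slopes


def sample_variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def monte_carlo_spatial_test(coords, x, y, k, n_sim=99, seed=1):
    # Permutation P-value for "local slopes vary more than expected by chance".
    n = len(coords)
    dist = [[math.dist(a, b) for b in coords] for a in coords]
    bw = adaptive_bandwidths(dist, k)
    identity = list(range(n))
    observed = sample_variance(local_slopes(x, y, dist, bw, identity))
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_sim):
        perm = identity[:]
        rng.shuffle(perm)  # randomly reassign locations to observations
        if sample_variance(local_slopes(x, y, dist, bw, perm)) >= observed:
            exceed += 1
    return (exceed + 1) / (n_sim + 1)


# Illustrative simulated data on an 8 x 8 grid: the slope increases from west to east.
sim_rng = random.Random(0)
grid = [(i % 8, i // 8) for i in range(64)]
x_sim = [sim_rng.gauss(0, 1) for _ in grid]
y_sim = [(1 + u) * xi + sim_rng.gauss(0, 0.1) for (u, _), xi in zip(grid, x_sim)]
print(monte_carlo_spatial_test(grid, x_sim, y_sim, k=10, n_sim=49))
```

Because the permutation keeps the set of locations fixed, the adaptive bandwidths and the distance matrix are computed once and only re-indexed for each simulation; with genuinely varying slopes, as in the simulated example, the P-value comes out small, whereas in this thesis the observed variances were unremarkable.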
Figure 19: Geographically weighted logistic regression: local point estimate (color) and lower empirical-bootstrap 95% confidence limit (contour line) of the log₂ 7-day mean PM₂.₅ adjusted odds ratio for high perceived stress; Gaussian kernel, one-half of optimal adaptive bandwidth (the area where the lower confidence limit is less than 1 is darkened).
Figure 20 below shows results with an adaptive kernel bandwidth one-fourth of the optimum. Although the Monte Carlo randomization test for spatial coefficient variation was not significant, the map appears to show more spatial variation of the OR estimate (a function of the coefficient). The map might also show more spatial variation of the lower 95% confidence limit, enough to bring the limit below 1 over larger areas. However, the local point estimates of the OR are uniformly either close to (within 0.1 of) or well above 1.99. This indicates the absence of a geographic Simpson’s paradox.
Figure 20: Geographically weighted logistic regression: local point estimate (color) and lower empirical-bootstrap 95% confidence limit (contour line) of the log₂ 7-day mean PM₂.₅ adjusted odds ratio for high perceived stress; Gaussian kernel, one-quarter of optimal adaptive bandwidth (the area where the lower confidence limit is less than 1 is darkened).
To address possible effects of spatially outlying or isolated observations, the mean Haversine distance from each observation to its 21 closest observations was calculated, and the 90th percentile of these mean distances was obtained. (The 21 included the observation itself and was obtained by dividing the sample size, arbitrarily, by 20 and taking the nearest whole number.) Observations whose mean distance was greater than the 90th percentile were excluded before running the Monte Carlo permutation tests again. There were 43 observations omitted, leaving 383. The optimal adaptive bandwidth was 383. For the new k_optimum, k_optimum÷2, and k_optimum÷4, the Monte Carlo test P-values were .828, .973, and .678, respectively. Apparent patterns of variation in the visualizations were again considered uninterpretable.
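The isolation screen described above (mean Haversine distance to the k nearest observations, self included, with k = round(n/20), and exclusion above the 90th percentile) can be sketched in Python as follows. The Earth-radius constant, the linear-interpolation percentile from `statistics.quantiles(..., method="inclusive")`, and the function names are assumptions for illustration; the thesis does not state which percentile definition was used.

```python
import math
import statistics

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumed constant


def haversine_km(p, q):
    """Great-circle distance in km between (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def drop_isolated(points, divisor=20, percentile=90):
    """Return (indices kept, cutoff): points whose mean distance to their k nearest
    points (self included, k = round(n / divisor)) exceeds the percentile are dropped."""
    k = max(1, round(len(points) / divisor))  # e.g. 426 / 20 = 21.3 -> 21
    mean_dist = []
    for p in points:
        nearest = sorted(haversine_km(p, q) for q in points)[:k]  # first is self (0 km)
        mean_dist.append(sum(nearest) / k)
    # Linear interpolation between order statistics (similar to R's default quantile).
    cutoff = statistics.quantiles(mean_dist, n=100, method="inclusive")[percentile - 1]
    kept = [i for i, m in enumerate(mean_dist) if m <= cutoff]
    return kept, cutoff
```

For example, with a dense cluster of residences in central Los Angeles plus a few far-flung points, the far-flung points receive the largest mean distances and fall above the cutoff, which is the behavior the screen is meant to have.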
In terms of meaningful spatial variation, the spatial analysis was not productive due to a lack of significance. However, the absence of significant spatial coefficient variation was some evidence that there was no effect modification by factors associated with geographic location substantial enough to invalidate the global model.
4. Discussion
The results, particularly the estimated adjusted OR of 1.99 (99.76% adjusted CI: 1.02, 4.07) for a doubling of 7-day mean ambient PM₂.₅ concentration, suggest an association between PM₂.₅ exposure and perceived stress. For the exposure windows considered and the logit models specified, other pollutants did not have a significant association with a binary measure of perceived stress in this MADRES subsample. The findings are limited in that binary logistic regression was used, rather than methods treating the PSS-10 score as a continuous variable or as an ordinal variable with more than two categories. Power to detect a significant association might have been limited. Also, the negative findings for other windows and pollutants depended on the approach used with the initial models, in which the variables included as confounders were common to all n-day window models for a pollutant. Although it is common in similar studies to adjust for the same covariates across all exposure windows, some of these exploratory-stage n-day models for a pollutant might have been severely misspecified.
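Because the pollutant enters the model as log₂ of the 7-day mean PM₂.₅ concentration, the OR of 1.99 refers to a doubling; for any other multiplicative change c in the concentration, the implied OR is 1.99 raised to the power log₂ c. The short Python sketch below (function name hypothetical) shows this conversion. The same transformation can be applied to each confidence limit, and it preserves their order when c > 1.

```python
import math

OR_PER_DOUBLING = 1.99  # adjusted OR for a doubling of 7-day mean PM2.5 (from the text)


def or_for_ratio(or_per_doubling, ratio):
    """OR for multiplying the exposure by `ratio`, when the model term is
    beta * log2(exposure) and or_per_doubling = exp(beta)."""
    # Multiplying the exposure by `ratio` shifts log2(exposure) by log2(ratio).
    return or_per_doubling ** math.log2(ratio)


beta = math.log(OR_PER_DOUBLING)  # logistic coefficient on the log2 scale
print(f"beta = {beta:.3f}")
print(f"OR per 10% increase: {or_for_ratio(OR_PER_DOUBLING, 1.10):.3f}")
# The Bonferroni-adjusted limits (1.02, 4.07) transform the same way.
print([round(or_for_ratio(limit, 1.10), 3) for limit in (1.02, 4.07)])
```

For a 10% increase in the 7-day mean, the implied OR is about 1.10.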
The finding of a significant association of log₂ 7-day PM₂.₅ with the dichotomized PSS-10 score was not robust to changes in the dichotomization cut point, though it is conceivable that the variables selected as confounders should have been changed as well. Logistic regression in general posed a problem: alternative kernel functions, such as the bisquare and tricube kernels, could not be used with these data in geographically weighted regression, with the bandwidth selection and Monte Carlo procedures described, without encountering separation. It is possible that geographically weighted linear, rather than logistic, regression with smaller bandwidths would reveal confounding or effect modification not addressed in a global linear model. However, global (or local) relationships between a pollutant and perceived stress might be nonlinear.
For future research, a generalized linear mixed-effects model could be used with a random intercept for the study participant. This would allow including first- and second-trimester measurements for most of the same participants, as well as for participants not in the current sample, for whom third-trimester measurements were not available for various reasons, including ending study participation or having joined the study recently.
As there might be problems with both standard linear regression and binary logistic
regression as noted in this thesis, an alternative might be to fit a polynomial or spline regression
model with specific pollutants and exposure windows. If the proportional odds assumption can
be satisfied, ordinal logistic regression might also be appropriate in specific cases.
The reasons for a possibly greater association of PM₂.₅ with perceived stress in the summer, in contrast to the opposite finding of Mehta et al.⁶ (a larger association in the winter) in a different population in Boston, are worth investigating. In the summer, average high and low temperatures are lower in Boston than in Los Angeles. The difference is much larger in the winter, when the average high and low temperatures in Boston are more than 20 °F lower than those in Los Angeles.
Because of the Bonferroni correction of P-values, several associations (for models with window-specific confounders) were not considered significant and were not explored further, despite interesting point estimates in some cases, such as OR estimates less than 1 for CALINE4 freeway NOₓ and CALINE4 total NOₓ. Among family-wise error rate (FWER) controlling procedures, a less conservative but effective alternative is Holm’s sequential Bonferroni procedure.⁵⁵ A false discovery rate (FDR) controlling procedure could also be used. However, the results with the predetermined Bonferroni method have already been seen. It might be appropriate to use an FDR or a different FWER controlling procedure with the same hypotheses and otherwise the same methods, but with a different cohort or with future or recently recruited MADRES participants who are not in the present study’s subsample.
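To compare the predetermined Bonferroni approach with the alternatives mentioned above, the Python sketch below computes Bonferroni, Holm (sequential Bonferroni), and Benjamini-Hochberg adjusted P-values; Benjamini-Hochberg is chosen here only as one common FDR procedure. It also shows where the 99.76% level of the Bonferroni-adjusted confidence intervals comes from. The function names and the example P-values are hypothetical and are not results from this thesis.

```python
def bonferroni(pvals):
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def holm(pvals):
    # Step-down: multiply the r-th smallest P-value (r = 0, 1, ...) by (m - r),
    # then enforce monotonicity with a running maximum.
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running_max
    return adjusted


def benjamini_hochberg(pvals):
    # Step-up: p * m / rank, with a running minimum taken from the largest P-value down.
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for pos, i in enumerate(order):
        rank = m - pos  # the largest P-value has rank m
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Bonferroni-adjusted confidence level for 21 tests at a family-wise alpha of .05.
ci_level = 1 - 0.05 / 21
print(f"{100 * ci_level:.2f}% CI")

example = [0.01, 0.04, 0.03, 0.005]  # illustrative P-values only
print(bonferroni(example))
print(holm(example))
print(benjamini_hochberg(example))
```

Holm’s procedure never gives a larger adjusted P-value than Bonferroni for the same tests, which is why it is described as less conservative while still controlling the FWER.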
Another area for exploration is exposure windows other than the periods immediately preceding the day of PSS administration. Single- and multi-day lags not adjacent to the PSS administration date could be considered, as well as weighted averages (e.g., exponentially weighted) in addition to simple averages.
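The exposure-window variants mentioned above can be made concrete with a short Python sketch that computes, from a daily concentration series, the simple n-day mean, a lagged n-day mean that skips the days closest to PSS administration, and an exponentially weighted n-day mean. It assumes that "the n days before the PSS date" excludes the PSS date itself and that the daily series is a dict keyed by date with no missing days (a missing day raises `KeyError`); the function names and the decay parameter `alpha` are hypothetical.

```python
import datetime as dt


def window_days(pss_date, n, lag=0):
    """The n consecutive dates ending lag + 1 days before pss_date (most recent first)."""
    return [pss_date - dt.timedelta(days=lag + i) for i in range(1, n + 1)]


def window_mean(daily, pss_date, n, lag=0):
    """Simple mean of the daily values over window_days(pss_date, n, lag)."""
    values = [daily[d] for d in window_days(pss_date, n, lag)]
    return sum(values) / n


def exp_weighted_mean(daily, pss_date, n, alpha, lag=0):
    """Weighted mean over the same window; the j-th most recent day (j = 0, 1, ...)
    gets weight (1 - alpha) ** j, so alpha = 0 reproduces the simple mean."""
    days = window_days(pss_date, n, lag)
    weights = [(1 - alpha) ** j for j in range(n)]
    total = sum(w * daily[d] for w, d in zip(weights, days))
    return total / sum(weights)


# Illustrative series: each day's value equals the number of days before the PSS date.
pss = dt.date(2018, 7, 1)
daily = {pss - dt.timedelta(days=i): float(i) for i in range(0, 60)}
print(window_mean(daily, pss, 7))         # days 1..7 before the PSS date
print(window_mean(daily, pss, 3, lag=2))  # days 3..5 before the PSS date
print(exp_weighted_mean(daily, pss, 7, alpha=0.3))
```

Setting `alpha` between 0 and 1 shifts weight toward the days nearest the PSS date, while `lag` moves the whole window away from it; both reduce to the simple n-day mean used in this thesis when set to 0.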
With regard to confounders, some authors³⁴,⁵⁶ have pointed out that data-based confounder selection, even after prescreening variables, can lead to incorrect point estimates and confidence intervals, particularly when models are fitted to the same data. If appropriate for the purpose of association estimation, remedies could include coefficient-shrinkage or bootstrap methods, which were not used in this thesis.
Despite the various limitations of this study, a significant association of 7-day mean PM₂.₅ with perceived stress was found. Among the n-day models for PM₂.₅ in the exploratory stage, there was a peak at n = 7. This thesis did not establish causation. As mentioned in the Introduction, co-exposure to PM₂.₅, NO₂ and sulfur dioxide during the same several-week period was associated with mitochondrial damage in controlled experiments with mice. This thesis did not focus on multiple-pollutant models. Although the most significant association of PM₂.₅ with perceived stress might be for a seven-day mean, repeated seven-day exposures to high PM₂.₅ concentrations over years might cause repeated elevations in perceived stress and might be associated indirectly with health consequences that persist after perceived stress subsides.
References
1. Bastain TM, Chavez T, Habre R, et al. Study design, protocol and profile of the Maternal
And Developmental Risks from Environmental and Social Stressors (MADRES) pregnancy
cohort: a prospective cohort study in predominantly low-income Hispanic women in urban
Los Angeles. BMC Pregnancy Childbirth. 2019;19(1):189. doi:10.1186/s12884-019-2330-7
2. Lee E-H. Review of the psychometric evidence of the Perceived Stress Scale. Asian Nurs
Res. 2012;6(4):121-127. doi:10.1016/j.anr.2012.08.004
3. Taylor JM. Psychometric analysis of the Ten-Item Perceived Stress Scale. Psychol Assess.
2015;27(1):90-101. doi:10.1037/a0038100
4. Richardson AS, Arsenault JE, Cates SC, Muth MK. Perceived stress, unhealthy eating
behaviors, and severe obesity in low-income women. Nutr J. 2015;14. doi:10.1186/s12937-015-0110-4
5. Baik SH, Fox RS, Mills SD, et al. Reliability and validity of the Perceived Stress Scale-10
in Hispanic Americans with English or Spanish language preference. J Health Psychol.
2019;24(5):628-639. doi:10.1177/1359105316684938
6. Mehta AJ, Kubzansky LD, Coull BA, et al. Associations between air pollution and
perceived stress: the Veterans Administration Normative Aging Study. Environ Health.
2015;14(1):10. doi:10.1186/1476-069X-14-10
7. Hahad O, Lelieveld J, Birklein F, Lieb K, Daiber A, Münzel T. Ambient air pollution
increases the risk of cerebrovascular and neuropsychiatric disorders through induction of
inflammation and oxidative stress. Int J Mol Sci. 2020;21(12):4306.
doi:10.3390/ijms21124306
8. Pun VC, Manjourides J, Suh H. Association of ambient air pollution with depressive and
anxiety symptoms in older adults: results from the NSHAP study. Environ Health Perspect.
2017;125(3):342-348. doi:10.1289/EHP494
9. Power MC, Kioumourtzoglou M-A, Hart JE, Okereke OI, Laden F, Weisskopf MG. The
relation between past exposure to fine particulate air pollution and prevalent anxiety:
observational cohort study. BMJ. 2015;350:h1111. doi:10.1136/bmj.h1111
10. Newbury JB, Arseneault L, Beevers S, et al. Association of air pollution exposure with
psychotic experiences during adolescence. JAMA Psychiatry. 2019;76(6):614.
doi:10.1001/jamapsychiatry.2019.0056
11. Bai L, Zhang X, Zhang Y, et al. Ambient concentrations of NO2 and hospital admissions for
schizophrenia. Occup Environ Med. 2019;76(2):125-131. doi:10.1136/oemed-2018-105162
12. Choi K-H, Bae S, Kim S, Kwon H-J. Indoor and outdoor PM2.5 exposure, and anxiety
among schoolchildren in Korea: a panel study. Environ Sci Pollut Res. 2020;27(22):27984-
27994. doi:10.1007/s11356-020-08900-3
13. Smit R, Kingston P. Measuring on-road vehicle emissions with multiple instruments
including remote sensing. Atmosphere. 2019;10(9):516. doi:10.3390/atmos10090516
14. Robinson DL. Composition and oxidative potential of PM2.5 pollution and health. J
Thorac Dis. 2017;9(3). doi:10.21037/jtd.2017.03.92
15. Krall JR, Anderson GB, Dominici F, Bell ML, Peng RD. Short-term exposure to particulate
matter constituents and mortality in a national study of U.S. urban communities. Environ
Health Perspect. 2013;121(10):1148-1153. doi:10.1289/ehp.1206185
16. Weichenthal SA, Lavigne E, Evans GJ, Godri Pollitt KJ, Burnett RT. Fine particulate matter
and emergency room visits for respiratory illness: effect modification by oxidative
potential. Am J Respir Crit Care Med. 2016;194(5):577-586. doi:10.1164/rccm.201512-2434OC
17. Marchetti S, Hassan SK, Shetaya WH, et al. Seasonal variation in the biological effects of
PM2.5 from Greater Cairo. Int J Mol Sci. 2019;20(20). doi:10.3390/ijms20204970
18. Nuyts V, Nawrot TS, Scheers H, Nemery B, Casas L. Air pollution and self-perceived stress
and mood: a one-year panel study of healthy elderly persons. Environ Res.
2019;177:108644. doi:10.1016/j.envres.2019.108644
19. Franklin M, Yin X, McConnell R, Fruin S. Association of the built environment with
childhood psychosocial stress. JAMA Netw Open. 2020;3(10).
doi:10.1001/jamanetworkopen.2020.17634
20. Roberts S, Arseneault L, Barratt B, et al. Exploration of NO2 and PM2.5 air pollution and
mental health problems using high-resolution data in London-based children from a UK
longitudinal cohort study. Psychiatry Res. 2019;272:8-17.
doi:10.1016/j.psychres.2018.12.050
21. Liu L, Yan Y, Nazhalati N, Kuerban A, Li J, Huang L. The effect of PM2.5 exposure and
risk perception on the mental stress of Nanjing citizens in China. Chemosphere.
2020;254:126797. doi:10.1016/j.chemosphere.2020.126797
22. Kawachi I, Colditz GA, Ascherio A, et al. Coronary heart disease/myocardial infarction:
prospective study of phobic anxiety and risk of coronary heart disease in men. Circulation.
1994;89(5):1992-1997.
23. Lin Y, Zhou L, Xu J, et al. The impacts of air pollution on maternal stress during pregnancy.
Sci Rep. 2017;7(1):40956. doi:10.1038/srep40956
24. Ku T, Ji X, Zhang Y, Li G, Sang N. PM2.5, SO2 and NO2 co-exposure impairs
neurobehavior and induces mitochondrial injuries in the mouse brain. Chemosphere.
2016;163:27-34. doi:10.1016/j.chemosphere.2016.08.009
25. Chu C, Zhang H, Cui S, et al. Ambient PM2.5 caused depressive-like responses through
Nrf2/NLRP3 signaling pathway modulating inflammation. J Hazard Mater. 2019;369:180-
190. doi:10.1016/j.jhazmat.2019.02.026
26. Salvi A, Liu H, Salim S. Involvement of oxidative stress and mitochondrial mechanisms in
air pollution-related neurobiological impairments. Neurobiol Stress. 2020;12:100205.
doi:10.1016/j.ynstr.2019.100205
27. Mokoena ML, Harvey BH, Viljoen F, Ellis SM, Brink CB. Ozone exposure of Flinders
Sensitive Line rats is a rodent translational model of neurobiological oxidative stress with
relevance for depression and antidepressant response. Psychopharmacology (Berl).
2015;232(16):2921-2938. doi:10.1007/s00213-015-3928-8
28. Pan W-L, Chang C-W, Chen S-M, Gau M-L. Assessing the effectiveness of mindfulness-
based programs on mental health during pregnancy and early motherhood - a randomized
control trial. BMC Pregnancy Childbirth. 2019;19. doi:10.1186/s12884-019-2503-4
29. Tanpradit K, Kaewkiattikun K. The effect of perceived stress during pregnancy on preterm
birth. Int J Womens Health. 2020;12:287-293. doi:10.2147/IJWH.S239138
30. Ragland DR. Dichotomizing continuous outcome variables: dependence of the magnitude
of association and statistical power on the cutpoint. Epidemiology. 1992;3(5):434-440.
31. Iacobucci D, Posavac SS, Kardes FR, Schneider MJ, Popovich DL. The median split:
robust, refined, and revived. J Consum Psychol. 2015;25(4):690-704.
doi:10.1016/j.jcps.2015.06.014
32. Iacobucci D, Posavac SS, Kardes FR, Schneider MJ, Popovich DL. Toward a more nuanced
understanding of the statistical properties of a median split. J Consum Psychol.
2015;25(4):652-665. doi:10.1016/j.jcps.2014.12.002
33. McClelland GH, Lynch JG, Irwin JR, Spiller SA, Fitzsimons GJ. Median splits, Type II
errors, and false–positive consumer psychology: don’t fight the power. J Consum Psychol.
2015;25(4):679-689. doi:10.1016/j.jcps.2015.05.006
34. VanderWeele TJ. Principles of confounder selection. Eur J Epidemiol. 2019;34(3):211-219.
doi:10.1007/s10654-019-00494-6
35. Lawless MH, Harrison KA, Grandits GA, Eberly LE, Allen SS. Perceived stress and
smoking-related behaviors and symptomatology in male and female smokers. Addict Behav.
2015;51:80-83. doi:10.1016/j.addbeh.2015.07.011
36. Yang Z, Song Q, Li J, Zhang Y. Air pollution as a cause of obesity: micro-level evidence
from Chinese cities. Int J Environ Res Public Health. 2019;16(21):4296.
doi:10.3390/ijerph16214296
37. Gifi A. Algorithm descriptions for ANACOR, HOMALS, PRINCALS, and OVERALS.
Research Report RR 89-01. February 1989. http://www.datatheory.nl/pdfs/89/89_01.pdf
38. Mori Y, Kuroda M, Makino N. Nonlinear Principal Component Analysis and Its
Applications. Springer Nature; 2016. doi:10.1007/978-981-10-0159-8
39. Michailidis G, de Leeuw J. The Gifi system of descriptive multivariate analysis. Stat Sci.
1998;13(4):307-336. doi:10.1214/ss/1028905828
40. Jackson DA. Stopping rules in principal components analysis: A comparison of heuristical
and statistical approaches. Ecology. 1993;74(8):2204.
41. Dunteman G. In: Principal Components Analysis. SAGE Publications, Inc.; 2021:66-75.
doi:10.4135/9781412985475
42. Powers MG, Seltzer W, Shi J. Gender differences in the occupational status of
undocumented immigrants in the United States: experience before and after legalization. Int
Migr Rev. 1998;32(4):1015-1046. doi:10.2307/2547670
43. Flores-Yeffal NY. English proficiency and trust networks among undocumented Mexican
migrants. Ann Am Acad Pol Soc Sci. 2019;684(1):105-119. doi:10.1177/0002716219855024
44. Li X, Feng YJ, Liang HY. The impact of meteorological factors on PM2.5 variations in
Hong Kong. IOP Conf Ser Earth Environ Sci. 2017;78:012003. doi:10.1088/1755-1315/78/1/012003
45. Bruno D, Ryan G, Kaplan C, Slemmer J. Climate of Los Angeles, California. NOAA
Technical Memorandum NWS WR-261. Published January 2000. Accessed December 22,
2020. https://repository.library.noaa.gov/view/noaa/14744
46. Lederer DJ, Bell SC, Branson RD, et al. Control of confounding and reporting of results in
causal inference studies: guidance for authors from editors of respiratory, sleep, and critical
care journals. Ann Am Thorac Soc. 2019;16(1):22-28. doi:10.1513/AnnalsATS.201808-564PS
47. Dunkler D, Plischke M, Leffondré K, Heinze G. Augmented backward elimination: a
pragmatic and purposeful way to develop statistical models. PLoS ONE. 2014;9(11).
doi:10.1371/journal.pone.0113677
48. Schisterman EF, Cole SR, Platt RW. Overadjustment bias and unnecessary adjustment in
epidemiologic studies. Epidemiol Camb Mass. 2009;20(4):488-495.
doi:10.1097/EDE.0b013e3181a819a1
49. Conway JR, Lex A, Gehlenborg N. UpSetR: an R package for the visualization of
intersecting sets and their properties. Bioinformatics. 2017;33(18):2938-2940. doi:10.1093/bioinformatics/btx364
50. Dorie V, Harada M, Carnegie NB, Hill J. A flexible, interpretable framework for assessing
sensitivity to unmeasured confounding. Stat Med. 2016;35(20):3453-3470.
doi:10.1002/sim.6973
51. Groenwold RHH, Nelson DB, Nichol KL, Hoes AW, Hak E. Sensitivity analyses to
estimate the potential impact of unmeasured confounding in causal research. Int J
Epidemiol. 2010;39(1):107-117. doi:10.1093/ije/dyp332
52. VanderWeele TJ. Sensitivity analysis: distributional assumptions and confounding
assumptions. Biometrics. 2008;64(2):645-649. doi:10.1111/j.1541-0420.2008.01024.x
53. Brunsdon C, Fotheringham AS, Charlton ME. Geographically weighted regression: a
method for exploring spatial nonstationarity. Geogr Anal. 1996;28(4):281-298.
doi:10.1111/j.1538-4632.1996.tb00936.x
54. Air quality trends by city 2000-2019. Published online June 8, 2020. Accessed January 3,
2021. https://www.epa.gov/sites/production/files/2020-06/airqualitytrendsbycity2000-2019.xlsx
55. Eichstaedt KE, Kovatch K, Maroof DA. A less conservative method to adjust for
familywise error rate in neuropsychological research: the Holm’s sequential Bonferroni
procedure. NeuroRehabilitation. 2013;32(3):693-696. doi:10.3233/NRE-130893
56. Greenland S. Invited commentary: variable selection versus shrinkage in the control of
multiple confounders. Am J Epidemiol. 2008;167(5):523-529. doi:10.1093/aje/kwm355
Abstract
BACKGROUND: Observational research has shown associations of ambient particulate matter smaller than 2.5 µm in aerodynamic diameter (PM₂.₅), personal nitrogen dioxide (NO₂), and near-roadway nitrogen oxides (NOₓ) with perceived stress, as measured by different versions of the Perceived Stress Scale (PSS), in older adults in the northeastern United States and Western Europe, and in children in Southern California, respectively. Observational evidence has also connected perceived stress to obesity in mothers of young children who had lower socioeconomic status. Murine experiments have shown that air pollution could change brain, cognitive, or mental health for exposure durations longer than five days at predetermined exposure intensities. However, for particular pollutants, the exposure time window over which variation in ambient concentration has the largest association with perceived stress in humans has not been determined. Also, the relationship between air pollution and perceived stress among pregnant persons with lower income in Los Angeles County might be particularly relevant to addressing the causes of obesity-related health outcomes in known environmental health disparities populations such as the Hispanic and African or Black American communities. As such, we investigated the relationship between multiple pollutants and perceived stress in the Los Angeles Maternal and Developmental Risks from Environmental and Social Stressors (MADRES) pregnancy cohort, for several short and long periods of exposure estimated for participants’ locations of residence. ❧ METHODS: A total of 426 MADRES cohort participants who took the 10-item Perceived Stress Scale in their third trimester of pregnancy in 2016-2019 were selected to develop and fit binary logistic regression models of perceived stress (median-split PSS score).
The mean air-pollutant level of the n days before the PSS administration date was the regressor of interest; n = 1, 2, ..., 9, 10, 15, 20, 25, 30, 45, 60, 75, 90, 105, 120. The pollutants examined were residential ambient NO₂, ozone (O₃), PM₂.₅, and particulate matter less than 10 µm in aerodynamic diameter (PM₁₀), and road-class-specific NOₓ that had been estimated using the CALINE4 line-source dispersion model. Principal component analysis was used to create approximate latent measures of meteorology and socioeconomic status. Confounders were selected by using a priori information, followed by augmented backward elimination. Maternal age, time having lived in the United States, ethnicity, language, race, individual-level and geographic indicators of socioeconomic status, marital status, meteorological variables, month of PSS and physical activity were identified as possible confounders prior to automated selection. Clinical depression risk, meteorology and seasonality were considered as effect modifiers. After adjusting for multiple testing, the logistic models in which the pollutant parameter estimate was significant were evaluated by simulating a binary unmeasured confounder. The same models were evaluated for evidence of spatial non-stationarity and geographic confounding by using geographically weighted logistic regression. ❧ RESULTS: There was a significant (adjusted P=.04) association between the logarithm (to base 2) of 7-day mean PM₂.₅ concentration and dichotomous perceived stress. In initial exploratory models, the association of PM₂.₅ with perceived stress was not significant for the very shortest exposure windows (one, two and three days) or windows longer than seven days (unadjusted P>.05). The estimated odds ratio for a doubling of the 7-day PM₂.₅ level was 1.99 (95% CI 1.02, 4.07), where the Bonferroni-adjusted 95% confidence interval was calculated for 21 tests (a 99.76% CI).
There was no significant spatial variation of the parameter estimate with either an optimal kernel bandwidth or selected non-optimal bandwidths. Also, no evidence was obtained of extreme geographic confounding. The estimated OR remained greater than 1 with simulations of an unmeasured confounder with various prevalences and associations with exposure/outcome, although there were exceptions involving an OR (association of the unmeasured confounder with the outcome) greater than 2.5 and an estimated difference of means (association of the unmeasured confounder with exposure level) of between approximately 20% and 30%, or between about 0.25 and 0.40 in log₂ of the PM₂.₅ level. Stratification suggested that the positive association between 7-day PM₂.₅ and perceived stress might be larger in warmer months or at higher temperatures, but all of the interaction terms were nonsignificant. ❧ DISCUSSION: Overall, we found evidence of a positive association of 7-day mean ambient PM₂.₅ with perceived stress. Results were nonsignificant for shorter or longer windows and for other pollutants, or inconclusive due to the method of ceasing exploration of an exposure window when an unadjusted P-value less than .05 was obtained with a preliminary confounder set common to all windows.
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
Assessment of the mortality burden associated with ambient air pollution in rural and urban areas of India
Spatial analysis of PM₂.₅ air pollution in association with hospital admissions in California
Spatial modeling of non-tailpipe emissions and its association with children's lung function
Psychometric study of an English version of Perceived Stress Scale in minority adolescents
A cohort study of air-pollution and childhood obesity incidence
Examining exposure to extreme heat and air pollution and its effects on all-cause, cardiovascular, and respiratory mortality in California: effect modification by the social deprivation index
Association of traffic-related air pollution and age-related macular degeneration in the Los Angeles Latino Eye Study
Prediction modeling with meta data and comparison with lasso regression
The impact of perceived parental and self-reported stress on BMI and body composition in young adults
The association between sun exposure and multiple sclerosis
Personal exposure to particulate matter PM2.5 sources during pregnancy and birthweight
Associations between ambient air pollution and hypertensive disorders of pregnancy
Assessing the impact of air pollution on adverse birth outcomes in a low resource setting
Linking air pollution to integrative gene and metabolites networks in young adult with asthma
Air pollution and breast cancer survival in California teachers: using address histories and individual-level data
Prenatal environmental exposures and fetal growth in the MADRES cohort
Native American ancestry among Hispanic Whites is associated with higher risk of childhood obesity: a longitudinal analysis of Children’s Health Study data
Comparison of models for predicting PM2.5 concentration in Wuhan, China
Uncertainty quantification in extreme gradient boosting with application to environmental epidemiology
The effects of heat and air pollution on mental-health related mortality
Asset Metadata
Creator: Yu, Jeremy K. (author)
Core Title: Associations of ambient air pollution exposures with perceived stress in the MADRES cohort
School: Keck School of Medicine
Degree: Master of Science
Degree Program: Biostatistics
Degree Conferral Date: 2021-05
Publication Date: 05/07/2021
Defense Date: 05/07/2021
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: Air pollution, OAI-PMH Harvest, perceived stress, PM2.5
Format: application/pdf (imt)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Habre, Rima (committee chair), Franklin, Meredith (committee member), Siegmund, Kimberly (committee member)
Creator Email: jkyu@usc.edu, refactored1@zoho.com
Permanent Link (DOI): https://doi.org/10.25549/usctheses-oUC112720121
Unique identifier: UC112720121
Identifier: etd-YuJeremyK-9610.pdf (filename)
Legacy Identifier: etd-YuJeremyK-9610
Document Type: Thesis
Rights: Yu, Jeremy K.
Type: texts
Source: 20210510-wayne-usctheses-batch-836-shoaf (batch), University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright. The original signature page accompanying the original submission of the work to the USC Libraries is retained by the USC Libraries and a copy of it may be obtained by authorized requesters contacting the repository e-mail address given.
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email: cisadmin@lib.usc.edu