Urban Air Pollution and Environmental Justice: Three Essays Lu (2022)
Urban Air Pollution and Environmental Justice:
Three Essays
By
Yougeng Lu
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(URBAN PLANNING AND DEVELOPMENT)
August 2022
Copyright 2022 Yougeng Lu
Dedication
I dedicate this dissertation to my wife, Jing, and my daughter, Luna.
Acknowledgements
I would like to express my sincere appreciation and gratitude to Professor Genevieve Giuliano,
my advisor and dissertation committee chair, for her guidance and support through the Ph.D.
program. Professor Giuliano is such a great mentor, who has taught me how to be a qualified and
competent researcher and tutor. I admire her passion for research and teaching. I hope I can have
the opportunity to pass on what I have learned from her to my students if I become a professor in
the future. I would also like to thank my dissertation committee members, Professor Marlon
Boarnet and Professor Rima Habre for their valuable suggestions on my dissertation and their
generous help through the Ph.D. program. I am indebted to my colleagues Jack Quan Yuan, Sue
Dexter, Jiawen Fang, Clemens Pilgram, Robert Binder, and my peers in the Ph.D. program of
Urban Planning and Development for their generous help both professionally and personally
during my years of study at the University of Southern California. My wife, Jing, has always
been supportive when I hit a research bottleneck. Without her encouragement, I do not think I
could have got the Ph.D. degree. My daughter, Luna, who was born a month before my
graduation, has brought us so much joy. I am particularly thankful to my father, Rong Lu, my
mother, Jing Lin, my father-in-law, Liwen Liu, and my mother-in-law, Yan Qin, for their support
and encouragement.
Yougeng Lu
May 2022
Table of Contents
Dedication
Acknowledgements
List of Figures
List of Tables
Abstract
CHAPTER 1: Introduction
1. Dissertation Overview
2. Contribution of this Research
3. Organization of the Dissertation
CHAPTER 2: Estimating Hourly PM2.5 Concentrations at the Neighborhood Scale Using a
Low-Cost Air Sensor Network: A Los Angeles Case Study
Abstract
1. Introduction
2. Material and methods
2.1 Study Area
2.2 Dependent variable
2.2.1 PM2.5 concentrations from the PurpleAir network
2.2.2 Quality control for PurpleAir PM2.5 measurements
2.3 Independent variables
2.3.1 Meteorology and ambient, central site PM2.5 data
2.3.2 Land use variables
2.3.3 Traffic variables
2.3.4 Temporal and spatial trending variables
2.4 Data integration
2.5 Model development
2.5.1 Spatial and temporal convolutional layers
2.5.2 PM2.5 prediction model
2.5.3 Cross-validation of PM2.5 prediction model considering the unbalanced
nature of hourly PM2.5 PurpleAir data
2.6 Prediction of hourly PM2.5 in Los Angeles County
3. Results
3.1 Descriptive characteristics
3.2 Model performance results
3.3 Predicted PM2.5 surfaces
3.3.1 Predicting PM2.5 on weekday and weekend days
3.3.2 Predicting PM2.5 on a day impacted by wildfire smoke
3.3.3 Predicting PM2.5 on a holiday day
4. Discussion
5. Conclusion
CHAPTER 3: Beyond Air Pollution at Home: Assessment of Personal Exposure to PM2.5
Using Activity-Based Travel Demand Model and Low-Cost Air Sensor Network Data
Abstract
1. Introduction
2. Data and Method
2.1 Study area
2.2 Data
2.2.1 Activity-based travel demand modeling
2.2.2 Air quality modeling
2.3 Method
2.3.1 Static and Dynamic exposure
2.3.2 Exposure measurement error
3. Result
3.1 Spatiotemporal distribution of PM2.5 concentrations and population activity
3.2 Individual exposure assessment
3.3 The impact of activity location and mobility on exposure estimation
4. Discussion
5. Conclusion
CHAPTER 4: Whose Exposure Levels Were Incorrectly Estimated: Assessing Classification
Errors in Individual’s PM2.5 Exposure Using a Machine Learning Approach
Abstract
1. Introduction
2. Method
2.1 Study area
2.2 Data
2.2.1 PM2.5 concentration data
2.2.2 Human movement data
2.3 Individual exposure assessment
2.3.1 Static and dynamic exposure assessment
2.3.2 Exposure classification error
2.4 Random forest classification model
3. Results
3.1 Exposure classification error analysis
3.2 Descriptive analysis of variables
3.3 Random forest results
3.3.1 Model performance and variable importance
3.3.2 Partial dependence analysis
3.4 Case studies in exposure classification error assessment
4. Discussion
5. Conclusion
CHAPTER 5: Conclusions
Bibliography
List of Figures
Figure 1 The study area of Los Angeles County, showing locations of low-cost PurpleAir
PM2.5 sensors.
Figure 2 The framework of data quality control for PurpleAir PM2.5 readings.
Figure 3 Scatter plots of PurpleAir dual-channel hourly PM2.5 measurements before (left)
and after (right) quality control.
Figure 4 The correlation of the PM2.5 measurements between PurpleAir sensors by varying
distance.
Figure 5 Hourly random forest model performance results: a. base model training; b. random
CV results.
Figure 6 Hourly random forest model performance results based on relative humidity
quartiles A. humidity Q1 (<47.09%); B. humidity Q2 (47.08-66.14%); C. humidity Q3
(66.14-80.27%); D. humidity Q4 (>80.27%).
Figure 7 Variable importance of the top 20 predictors in the Random Forest model.
Figure 8 Hourly mean PM2.5 concentrations in Los Angeles County and corresponding wind
rose maps at four hours: 9 AM, 2 PM, 6 PM, and 11 PM on a typical weekday (Wednesday,
September 18, 2019) (A-D) and weekend (Sunday, September 22, 2019) (E-H).
Figure 9 Hourly mean PM2.5 concentrations in Los Angeles County and corresponding wind
rose maps at four hours: 9 AM, 2 PM, 6 PM, and 11 PM on a typical day with wildfire
(Sunday, November 11, 2018) (A-D) and holiday day (Thursday, July 4, 2019) (E-H).
Figure 10 Average traffic volumes by hour on different day type.
Figure 11 The study area of Los Angeles County, showing daily PM2.5 concentration
distribution on a typical weekday in 2019.
Figure 12 Distribution of average hourly PM2.5 concentration at different times (a-d) and
population density (e-h).
Figure 13 Distribution of the percentage of the relative difference between static and dynamic
exposure at individual TAZs for (a) all individuals, (b) nonworkers, and (c) workers.
Figure 14 Boxplots of static and dynamic exposures estimated (μg/m3) for four types of
workers based on their residence and work locations and commuting distance: (a) workers
live in low-pollution TAZs and work in low-pollution TAZs; (b) workers live in
low-pollution TAZs and work in high-pollution TAZs; (c) workers live in high-pollution
TAZs and work in low-pollution TAZs; (d) workers live in high-pollution TAZs and work
in high-pollution TAZs.
Figure 15 The impact of mobility on exposure measurement error factors for four worker
groups.
Figure 16 Summary statistics for four worker groups.
Figure 17 Distribution of static and dynamic PM2.5 exposure for exposure classification
groups.
Figure 18 Distribution of selected mobility and sociodemographic variables for exposure
classification groups.
Figure 20 Random Forest model performance: (a) confusion matrix of predicted exposure
classification groups; (b) variable importance rank.
Figure 21 Partial Dependence (PD) plots for the most important variables in the random
forest classification model for (a) the Accurate group, (b) the Overestimated group, and (c)
the Underestimated group.
Figure 23 Daily exposure profile for three selected individuals from different exposure
classification groups: (a) Person 1 from the Accurate group, (b) Person 2 from the
Overestimated group, and (c) Person 3 from the Underestimated group; and (d) spatial
distribution of selected individuals’ daily activity places.
List of Tables
Table 1 Description of dependent and predictor variables (N=2,502,999).
Table 2 Descriptive statistics of variables (N=2,502,999).
Table 3 Hourly random forest model results (N=2,502,999).
Table 4 Example of an individual’s trajectory data from SCAG ABM.
Table 5 Descriptive statistics of complete SCAG ABM data and sample data.
Table 6 Comparison between static and dynamic PM2.5 exposure estimate (μg/m3) for the
overall population, workers, and nonworkers.
Table 7 The average fraction of a day spent in each microenvironment and contribution of
each microenvironment to total daily PM2.5 exposure.
Table 8 Comparison between static and dynamic exposure estimate of PM2.5 for workers
based on race/ethnicity and income level.
Table 9 Descriptive statistics of the mobility and sociodemographic variables used in analysis
of exposure classification errors.
Abstract
This dissertation examines a rising environmental justice problem in air pollution exposure
assessments. Through three independent yet interrelated empirical studies, this dissertation
develops a machine-learning-based model to capture PM2.5 variations at high spatiotemporal
resolution, identifies the effect of overlooking individual mobility on misclassification errors in
exposure estimations, and evaluates the factors leading to such errors at the individual level. The
use of low-cost PM2.5 sensors and machine learning methods allows for exposure predictions at
fine spatial and temporal resolution. Results suggest that ignoring individual mobility when
measuring exposure to outdoor PM2.5 can lead to erroneous classification results, which may
in turn lead to ineffective environmental and public health policies. Individual
mobility patterns and sociodemographic characteristics are the two major factors contributing to
exposure measurement errors. Exposure measurement errors increase for people who exhibit
high mobility levels, especially for workers who travel longer distances and spend more time
away from their place of residence. Disparate exposure measurement errors can also be detected across
sociodemographic groups. Low income and high residential pollution levels are correlated with
exposure overestimation while high income and low residential pollution levels are correlated
with exposure underestimation. The findings also suggest that the exposure discrepancies
between the socially disadvantaged and the privileged documented in previous studies may be
diminished by human mobility.
CHAPTER 1: Introduction
1. Dissertation Overview
Exposure to ambient fine particulate matter (PM2.5), defined as particulate matter with an
aerodynamic diameter of less than 2.5 µm, has been associated with a variety of adverse health
outcomes, including cardiovascular disease (Madrigano et al., 2013; Neophytou et al., 2014), acute
respiratory symptoms (Bose et al., 2015), diabetes (Yang et al., 2018), and stroke (Shah et al.,
2015). Therefore, accurate PM2.5 exposure estimates are important for assessing the
detrimental health effects of PM2.5 and for the subsequent development of environmental and
public health policy.
Personal exposure to outdoor PM 2.5 occurs through dynamic spatiotemporal interactions
between individuals and their environment, which are in turn determined by mobility and time-
activity patterns. Accordingly, exposure estimation requires characterization of PM 2.5
concentrations when and where a person or group spends time (Gurram et al., 2019; Kim &
Kwan, 2021a; Park, 2020; Park & Kwan, 2017). Traditional approaches to quantifying exposure
to outdoor PM2.5 assume that concentrations at the residential address are adequate surrogates of
personal exposures (Bae et al., 2007; Elliott & Smiley, 2019; Houston et al., 2004; Rowangould,
2013). The underlying assumption is that individuals spend most of their time indoors (Lu, 2021;
Park, 2020), as well as that outdoor PM 2.5 infiltrates into the indoor environment where personal
exposure occurs. Given that people are mobile, their exposure to PM 2.5 of outdoor origin can
occur in various locations. This static residential approach may introduce exposure measurement
error and bias in health and environmental impact assessments, leading to ineffective policy
interventions.
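To make the contrast between the static residential approach and a mobility-based approach concrete, the following minimal sketch computes both estimates for one hypothetical person. The function names, the location labels, the schedule, and all concentration values are illustrative assumptions rather than data or code from this dissertation. The mobility-based estimate here is simply a time-weighted average of the outdoor concentration at each location visited, and indoor infiltration is ignored.

```python
from typing import Dict, List, Tuple

def static_exposure(home: str, conc: Dict[str, List[float]]) -> float:
    """Residence-based estimate: mean of the 24 hourly values at the home location."""
    hourly = conc[home]
    return sum(hourly) / len(hourly)

def dynamic_exposure(schedule: List[Tuple[int, int, str]],
                     conc: Dict[str, List[float]]) -> float:
    """Mobility-based estimate: time-weighted mean over the locations visited.

    Each schedule entry is (start_hour, end_hour, location), end hour exclusive.
    """
    total = 0.0
    hours = 0
    for start, end, location in schedule:
        for hour in range(start, end):
            total += conc[location][hour]
            hours += 1
    return total / hours

# Hypothetical hourly PM2.5 concentrations (ug/m3), constant over the day for simplicity.
conc = {"home": [8.0] * 24, "work": [14.0] * 24}

# Hypothetical day: at home overnight, at work from 08:00 to 18:00.
schedule = [(0, 8, "home"), (8, 18, "work"), (18, 24, "home")]

static = static_exposure("home", conc)
dynamic = dynamic_exposure(schedule, conc)
measurement_error = static - dynamic  # negative: the static estimate is too low

print(static, dynamic, measurement_error)
```

In this made-up case the person lives in a cleaner area than where they work, so the residence-based estimate understates the time-weighted exposure.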
Furthermore, the varying spatial and temporal resolution of pollutant surfaces used in
prior exposure studies can also impede accurate exposure assessments (Korhonen et al., 2019; Y.
Li et al., 2016). The spatial resolution of assigned PM2.5 concentrations in previous studies varied
substantially, ranging from 0.25 to 10 km² (Dewulf et al., 2016; Gurram et al., 2019; Lu, 2021;
Park & Kwan, 2017; E. Setton et al., 2011; X. Yu et al., 2020), while the temporal resolution of
outdoor PM2.5 concentrations ranged from monthly or seasonal to annual (M. Nyhan et al.,
2016; Pennington et al., 2016; E. M. Setton et al., 2008). A coarse spatial resolution of PM2.5
concentrations may not reflect critical spatial gradients, and greater temporal aggregation does
not capture how concentrations vary over time. Accurate exposure estimations require air
pollution concentrations at a fine spatiotemporal scale.
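As a small illustration of why temporal aggregation matters, the sketch below compares a daily mean with the mean over the hours a person actually spends at a location. The hourly values and the visit window are invented for illustration only.

```python
# Hypothetical hourly PM2.5 (ug/m3) for one grid cell:
# low for most of the day, with an evening peak from 17:00 to 20:59.
hourly = [6.0] * 24
for hour in range(17, 21):
    hourly[hour] = 30.0

# A daily aggregate smooths the peak away.
daily_mean = sum(hourly) / len(hourly)

# Someone present only during the peak hours experiences a much higher level.
visit_hours = range(17, 21)
visit_mean = sum(hourly[h] for h in visit_hours) / len(visit_hours)

print(daily_mean, visit_mean)
```

Assigning the daily mean to this visitor would understate the concentration they encounter by a factor of three in this example.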
Although there is greater awareness of the importance of human mobility and air
pollution variations in exposure estimations, few studies have integrated both factors in personal
exposure estimations and examined their impacts on potential exposure measurement errors.
There is also very little information on whether these impacts differ across sociodemographic groups.
To fill these research gaps, I developed three independent yet interrelated
essays. In a stepwise sequence, they integrate human mobility and hourly PM2.5 concentrations at
the neighborhood scale in outdoor PM2.5 exposure measurement and examine whether individuals’
sociodemographic characteristics and mobility patterns affect PM2.5 exposure measurement
errors.
In Essay One, Estimating Hourly PM2.5 Concentrations at the Neighborhood Scale
Using a Low-Cost Air Sensor Network: A Los Angeles Case Study, an hourly, 500 × 500 m
gridded PM2.5 model that integrates PurpleAir low-cost air sensor network data is developed for
Los Angeles County. A stringent quality control scheme is designed to validate the reliability of
the PurpleAir data in PM 2.5 concentration prediction. A number of spatially and temporally
varying predictors are included in a random forest model to predict hourly PM 2.5 concentration in
fine spatial resolution. Results show that the PM 2.5 prediction model achieves high prediction
accuracy. By drawing on low-cost PM2.5 sensors, the model can predict spatial and diurnal
patterns in PM2.5 at a high spatiotemporal resolution on typical weekdays and weekends, as well
as on non-typical days such as holidays and wildfire days.
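The gridded structure described above can be sketched as follows: a point in a projected coordinate system (in meters) is assigned to the 500 m cell that contains it. The grid origin, the function name, and the sample coordinates are assumptions made for illustration; the dissertation's actual grid definition is not given in this section.

```python
import math
from typing import Tuple

CELL_SIZE_M = 500.0  # cell edge length stated in the text

def cell_index(x_m: float, y_m: float,
               x0_m: float = 0.0, y0_m: float = 0.0) -> Tuple[int, int]:
    """Return (column, row) of the grid cell containing a projected point.

    (x0_m, y0_m) is a hypothetical grid origin at the lower-left corner.
    """
    col = math.floor((x_m - x0_m) / CELL_SIZE_M)
    row = math.floor((y_m - y0_m) / CELL_SIZE_M)
    return col, row

# A 500 m x 500 m cell covers 0.25 square kilometers.
cell_area_km2 = (CELL_SIZE_M / 1000.0) ** 2

print(cell_index(1250.0, 499.9), cell_area_km2)
```

Using `math.floor` rather than integer truncation keeps points to the left of or below the origin in the correct cell.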
Essay Two, Beyond Air Pollution at Home: Assessment of Personal Exposure to PM 2.5
Using Activity-Based Travel Demand Model and Low-Cost Air Sensor Network Data,
investigates the effect of human mobility on personal exposure to outdoor PM 2.5 by integrating
travel data simulated by an activity-based travel demand model and the hourly PM2.5 surfaces
developed in the first essay. In this essay, I compare the mobility-based PM 2.5 exposures and the
residence-based PM 2.5 exposures for a sample of Los Angeles County residents at the individual
level to examine the effect of taking daily travel into account on exposure measurement. Results
suggest that exposure measurement errors increase for people with high mobility levels,
especially for workers with long commute distances. This essay highlights the importance of
taking travel patterns into account in exposure estimations but does not examine any statistical
relationship between mobility and exposure measurement errors.
To reveal the underlying factors that can lead to exposure measurement errors, I test
statistical relationships between a variety of mobility and sociodemographic factors and personal
exposure measurement errors in Essay Three, Whose Exposure Levels Were Incorrectly
Estimated: Assessing Classification Errors in Individual’s PM 2.5 Exposure Using a Machine
Learning Approach. As with Essay Two, individuals’ mobility-based and residence-based
exposures to outdoor PM 2.5 are compared to obtain exposure measurement errors. Based on their
exposure measurement errors, a sample of Los Angeles County residents is subdivided into
three exposure classification groups. Consistent with the second essay, the statistical results
indicate that the magnitude of exposure measurement errors is associated with human mobility
levels. Significant sociodemographic disparities are detected across different groups, suggesting
a robust statistical relationship between individuals’ sociodemographic characteristics and
exposure measurement errors. This essay further reveals that the exposure discrepancies between
the socially disadvantaged and the privileged documented in previous studies may be diminished
by human mobility.
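The grouping step can be sketched as below. The group labels (Accurate, Overestimated, Underestimated) follow the names used in the List of Figures, but the 10% tolerance, the use of the mobility-based estimate as the reference, and the sample values are hypothetical choices for illustration, not the cut-offs used in Essay Three.

```python
def classify_exposure_error(static: float, dynamic: float,
                            tolerance: float = 0.10) -> str:
    """Label the residence-based (static) estimate relative to the
    mobility-based (dynamic) estimate.

    tolerance is a hypothetical cut-off on the relative difference.
    """
    if dynamic <= 0:
        raise ValueError("dynamic exposure must be positive")
    relative_difference = (static - dynamic) / dynamic
    if relative_difference > tolerance:
        return "Overestimated"
    if relative_difference < -tolerance:
        return "Underestimated"
    return "Accurate"

# Hypothetical (static, dynamic) exposure pairs in ug/m3 for three people.
people = {"A": (12.0, 11.5), "B": (15.0, 12.0), "C": (8.0, 10.0)}
labels = {pid: classify_exposure_error(s, d) for pid, (s, d) in people.items()}

print(labels)
```

Under these assumptions, a person whose home is in a more polluted area than the places they visit during the day falls into the Overestimated group, and the reverse case falls into the Underestimated group.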
The three essays fall within a unified theme, but each focuses on a separate key
issue related to the research topic. Essay One develops an hourly PM2.5 concentration prediction
model at a resolution of 0.25 km², allowing more accurate exposure assessments in the second and
third essays. Essay Two highlights the importance of human mobility in personal exposure to
outdoor PM2.5 and identifies the errors that can arise when exposure estimations ignore human
mobility. Essay Three examines the effects of distinct mobility and
sociodemographic variables on exposure measurement errors.
2. Contribution of this Research
Concerns regarding social inequities in the distribution of air pollution have led to a considerable
amount of academic research during the last two decades (Bae et al., 2007; Bell & Ebisu, 2012;
Gilbert & Chakraborty, 2011; Houston et al., 2004). Although previous research has made
significant strides in examining environmental justice issues, two major limitations remain.
First, air pollution exposure assessment has conventionally relied on air quality monitoring
stations operated by regulatory agencies (Gilbert & Chakraborty, 2011; Lu et al., 2021; Park &
Kwan, 2017). Due to the costs of operation and maintenance, these monitors are placed in only a
few locations. They are designed to capture average ambient air quality, not daily temporal
variation or neighborhood-level air quality. Therefore, these sparse monitoring stations are unable
to capture variations in PM2.5 pollution at high spatiotemporal resolution. Second, as noted earlier,
exposure estimates in most previous studies have not considered the spatiotemporal difference in
daily activity patterns across individuals. Most existing research assumes that residential location
is appropriate for estimating exposure throughout the day. However, people also spend time in
other places such as workplaces and leisure locations (Park, 2020; Park & Kwan, 2017, 2018).
Omitting air pollution exposure at microenvironments other than the place of residence can lead
to incorrect exposure estimations.
Essay One is one of the first studies to use low-cost air quality sensors and machine
learning methods to predict hourly PM2.5 at the neighborhood scale. The recent development of
low-cost air quality sensor technology has created new opportunities to capture variation in
PM2.5 concentrations with high spatiotemporal coverage. Machine learning methods are helpful
for modeling complex nonlinear relationships between predictor variables and response variables
and can outperform ordinary linear models (e.g., land use regression models). Taking
advantage of the combination of low-cost air sensor data and machine learning, the PM2.5
concentration model developed in Essay One predicts hourly PM2.5 for regularly occurring,
typical air quality conditions as well as less frequent and perhaps more extreme conditions such as
holidays or wildfire days, the latter an increasingly frequent issue of public health relevance.
As stated above, the integration of high spatiotemporal air pollution concentrations and
detailed human mobility data in personal exposure assessments is limited in the existing
literature. In Essay Two, to fill the research gap, I assess personal exposure by coupling hourly
PM 2.5 concentrations at the neighborhood scale with activity travel data for a sample of Los
Angeles County residents. This essay contributes to the environmental health literature by
considering spatiotemporal variations of air pollution and human mobility patterns in personal
exposure estimations. The results highlight the importance of accounting for human mobility and
air pollution variability in accurate exposure assessments.
In the Essay Three, I examine whether the effects of ignoring individual mobility on
exposure measurement errors identified in the Essay Two differ across sociodemographic groups
and what sociodemographic and mobility characteristics can affect exposure measurement errors.
Despite the growing concern on human mobility's effect on exposure levels, factors that cause
exposure measurement errors have not been seriously studied in current literature. This essay
offers important insights into the literature by investigating the underlying factors contributing to
exposure measurement errors. For the first time, I verify principal factors (i.e., human mobility
level, income, and residential pollution) for probable exposure measurement errors based on
statistical models. These factors have previously been discussed only conceptually.
Additionally, in contrast to conclusions of previous studies, this essay finds that the exposure
disparities between the socially disadvantaged and the privileged documented in prior exposure
studies may be lessened by human mobility.
3. Organization of the Dissertation
The dissertation consists of three interrelated essays. Each is a standalone research paper. Essays
One and Two have been published; Essay Three is under review. Given their focus on the same
central research problem, the introductory material is somewhat overlapping across the three
essays. This dissertation concludes with a final chapter presenting overall conclusions and a
brief discussion of policy implications.
CHAPTER 2: Estimating Hourly PM 2.5 Concentrations at the Neighborhood Scale Using a Low-
Cost Air Sensor Network: A Los Angeles Case Study
*This essay is co-authored by Yougeng Lu, Dr. Genevieve Giuliano, and Dr. Rima Habre
1
Abstract
Predicting PM 2.5 concentrations at a fine spatial and temporal resolution (i.e., neighborhood,
hourly) is challenging. Recent growth in low-cost sensor networks is providing increased spatial
coverage of air quality data that can be used to supplement data provided by monitors of
regulatory agencies. We developed an hourly, 500 m × 500 m gridded PM 2.5 model that integrates
PurpleAir low-cost air sensor network data for Los Angeles County. We developed a quality
control scheme for PurpleAir data. We included spatially and temporally varying predictors in a
random forest model with random oversampling of high concentrations to predict PM 2.5. The
model achieved high prediction accuracy (10-fold cross-validation (CV) R² = 0.93, root mean
squared error (RMSE) = 3.23 μg/m³; spatial CV R² = 0.88, spatial RMSE = 4.33 μg/m³; temporal
CV R² = 0.90, temporal RMSE = 3.85 μg/m³). Our model was able to predict spatial and diurnal
patterns in PM 2.5 on typical weekdays and weekends, as well as non-typical days, such as
holidays and wildfire days. The model allows for far more precise estimates of PM 2.5 than
existing methods based on few sensors. Taking advantage of low-cost PM 2.5 sensors, our hourly
random forest model predictions can be combined with time-activity diaries in future studies,
enabling geographically and temporally fine exposure estimation for specific population groups
in studies of acute air pollution health effects and studies of environmental justice issues.
1
Lu, Y., Giuliano, G., & Habre, R. (2021). Estimating hourly PM2.5 concentrations at the neighborhood scale
using a low-cost air sensor network: A Los Angeles case study. Environmental Research, 195, 110653.
1. Introduction
Epidemiological studies have linked exposure to fine particulate matter (PM 2.5) to several
adverse health outcomes, including all-cause mortality (Di et al., 2017; Lepeule et al., 2012),
cardiovascular disease (Madrigano et al., 2013; Neophytou et al., 2014), respiratory morbidity
(Bose et al., 2015), and diabetes (Yang et al., 2018). According to World Health Organization
guidelines, long-term exposure to PM 2.5 concentrations exceeding 50 µg/m³ is associated with
increased short-term mortality, mainly attributed to lung cancer (World Health Organization,
2006). Many studies conclude that short-term exposure to PM 2.5 can lead to increases in
PM-related mortality, asthma exacerbations, heart attacks, strokes, and emergency department and
hospital admissions (Kloog et al., 2013; Shi et al., 2016; Zanobetti et al., 2014). Hence, better
temporal resolution is needed to understand the acute health effects of PM 2.5.
Estimating PM 2.5 exposures at a high spatiotemporal resolution is critical for studying its
impacts on health at the neighborhood scale, understanding environmental health disparities, and
addressing public health concerns, especially for vulnerable populations such as children and the
elderly.
The United States federal and state environmental protection agencies measure ambient
PM 2.5 concentrations with standard federal reference (FRM) or federal equivalent (FEM)
methods to track compliance with National Ambient Air Quality Standards to protect public
health (Hall et al., 2014). As regulatory monitoring is designed to support compliance with
ambient air quality standards mainly at the city or regional levels, it lacks spatial coverage to
reflect detailed PM 2.5 variations at the neighborhood level (Bi, Wildani, et al., 2020;
Rowangould, 2013). Also, FRM/FEM monitors are costly, have strict siting criteria, and are
intentionally placed farther away from local sources or interferences to measure background air
quality instead of impacts from specific sources (Gao et al., 2015; Kumar et al., 2015; X. R.
Wang & Oliver Gao, 2011). They are often deployed in secured areas to minimize human
interaction and tampering and further away from direct influences of emission sources (e.g.,
freeway, restaurant). Consequently, these monitors capture broad area or regional level patterns
in the distribution of air pollutants. They are not designed to capture more localized gradients at
high spatial resolution in the neighborhood or within a city (Gao et al., 2015; Kumar et al., 2015;
X. R. Wang & Oliver Gao, 2011).
The recent rise in low-cost sensor technology has created new opportunities to measure
air quality with higher spatiotemporal coverage than can be afforded by regulatory FRM/FEM
monitors (Kumar et al., 2015; Williams et al., 2019). Several studies have integrated data from a
variety of recently emerging low-cost PM 2.5 sensors to estimate ambient PM 2.5 concentrations at
fine spatiotemporal resolution (De Nazelle et al., 2012; M. Nyhan et al., 2016). Due to low cost,
dense deployment, and high market growth rate, low-cost air sensors are expected to 1)
supplement conventional FRM/FEM air pollution monitor networks (Clements et al., 2017), 2)
improve exposure assessment in epidemiological studies of air pollutants and human health, 3)
increase citizens’ awareness and engagement towards air quality issues (Commodore et al.,
2017), and 4) strengthen emergency response management (e.g., monitoring real-time smoke
plume spread during wildfires), hazardous leak detection, and source compliance monitoring
(Rai et al., 2017; Snyder et al., 2013).
Despite enabling real-time air pollution monitoring at finer geographic and temporal
scales, these ubiquitous low-cost air sensors have important limitations mainly related to their
lower accuracy compared to FRM/FEM instruments. Low-cost air sensors are usually owned and
operated by individuals and may not be appropriately sited away from interferences or
consistently maintained. Data anomalies or losses might result from wireless communications
loss, power outages, or other measurement interferences such as dust deposition. Relative
humidity interferences are another important source of error in low-cost optical sensors such as
the Shinyei, AirBeam, or PurpleAir (Gao et al., 2015; Kelly et al., 2017; Zusman et al., 2020).
All of these factors lead to a need for quality control (L. J. Chen et al., 2018).
Given that most optical particulate matter (PM) sensors operate by counting particles
based on optical scattering and converting those counts to mass concentrations using static pre-
established equations, data from these devices usually need to be calibrated against collocated
reference-grade instruments. In addition, most of these optical particle sensors are not capable of
detecting particles smaller than ~300 nm in aerodynamic diameter, which contribute
substantially to the particle number distributions in areas with important primary emissions
(Dinoi et al., 2017; J. Li & Biswas, 2017). However, growing efforts have been made to calibrate
or adjust readings from low-cost PM 2.5 sensors to obtain reasonable accuracy (Kelly et al., 2017;
Sayahi et al., 2019).
Several recent studies have utilized low-cost sensors to predict PM 2.5 for areas without
ground monitors at both fine spatial and temporal resolution (Bi, Stowell, et al., 2020; Bi,
Wildani, et al., 2020; Gupta et al., 2018; Huang et al., 2019; J. Li et al., 2020). However, fewer
have been able to predict PM 2.5 at daily to sub-daily temporal resolutions (e.g., hourly to daily),
which can be important for exposure and health studies (Brokamp et al., 2018; Di et al., 2016; X.
Hu et al., 2017). Epidemiological studies usually use time-activity diaries and air pollution
distribution to estimate exposure to outdoor air pollution. Exposure is calculated by the time
spent at each location visited and real-time pollutant concentrations at that microenvironment
(M. Nyhan et al., 2016; M. M. Nyhan et al., 2019). Exposure tends to be overestimated (at home)
or underestimated (at the workplace) if daily average pollutant concentration is used instead of
real-time concentration, as pollutant concentrations vary over a day (Dhondt et al., 2012; Park &
Kwan, 2017). Diurnal patterns in exposure can be influenced by operation schedules of sources
(e.g., traffic, industrial activities) but also how people move over a day. This is especially
relevant for air pollutants impacted by diverse sources and highly variable in time and space,
such as particulate matter (e.g., PM 2.5).
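The time-activity approach described above can be sketched as a time-weighted average of the concentrations encountered at each visited location. The following is a minimal illustration only: the diary format, the location labels, and all concentration values are hypothetical and are not taken from the study.

```python
from typing import Dict, List, Tuple

# Hypothetical hourly PM2.5 concentrations (ug/m3), keyed by (location, hour of day).
HourlyConc = Dict[Tuple[str, int], float]
# Hypothetical diary entries: (location, hour of day, hours spent there).
Diary = List[Tuple[str, int, float]]


def time_weighted_exposure(diary: Diary, conc: HourlyConc) -> float:
    """Return the time-weighted average concentration over all diary entries."""
    total_time = sum(duration for _, _, duration in diary)
    if total_time == 0:
        raise ValueError("diary covers no time")
    weighted = sum(conc[(loc, hour)] * duration for loc, hour, duration in diary)
    return weighted / total_time


# Illustrative values only (assumed, not measured).
diary = [("home", 7, 1.0), ("work", 12, 1.0), ("home", 20, 1.0)]
conc = {("home", 7): 10.0, ("work", 12): 20.0, ("home", 20): 15.0}
exposure = time_weighted_exposure(diary, conc)
```

Replacing the hourly values with a single daily average at the home location would hide the difference between the morning, midday, and evening concentrations, which is the source of the over- and underestimation discussed above.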
To date, few studies have used low-cost sensor measurements to improve PM 2.5
prediction at high spatial resolution and at the hourly level. We develop a multivariate
random forest model incorporating meteorological, traffic, topographical, and land-use variables
to estimate hourly averaged ground-level PM 2.5 concentrations at a resolution of 500m × 500m
using low-cost PM 2.5 sensor data from the PurpleAir network throughout Los Angeles County,
U.S. for 2018 and 2019.
2. Material and methods
2.1 Study Area
We define our study area as Los Angeles (LA) County, located in the southwest of California
(CA). LA County is the most populous county in the United States, with over 10 million
residents and a population density of 815 people/km² as of 2019. The American Lung
Association reports that the Los Angeles-Long Beach metropolitan area is one of the
metropolitan areas with the most severe PM pollution among the 200 metropolitan areas in the
U.S. (American Lung Association, 2020). Several studies concluded that air pollution is one of
the most serious public health hazards in Los Angeles (American Lung Association, 2020; Kim
& Kwan, 2020; Lurmann et al., 2015). Among all sources, vehicular emissions contribute
approximately 30% of the total PM 2.5 mass concentrations in Southern CA (Habre et al., 2021;
Hasheminassab et al., 2014). Shipping and freight activities at the Los Angeles and Long Beach
ports are also significant sources of PM 2.5 (Ault et al., 2009).
Our modeling domain is defined by the administrative boundary of LA County, which
spreads approximately from -118.945° to -117.647° in the West-East direction and from 33.704°
to 34.821° in the South-North direction (Figure 1). We defined our modeling grid as a 500-meter
hexagon grid (side length), resulting in a study domain of 19,297 hexagon grid cells, using the
North American Datum of 1983 geographic projection system
(NAD_1983_StatePlane_California_V_FIPS_0405_Feet). We used a hexagon grid network
because the centroid of a hexagon is equidistant from each of its vertices and from each of its
edges, which is a more physically realistic assumption in space than a square grid. The study period is from January 1,
2018, to December 31, 2019, and contains a total of 730 days and 17,520 hours. Figure 1
illustrates our modeling domain.
Figure 1 The study area of Los Angeles County, showing locations of low-cost Purple Air PM 2.5
sensors.
2.2 Dependent variable
2.2.1 PM 2.5 concentrations from the PurpleAir network
PM 2.5 data were collected from the citizen-science low-cost PurpleAir (PA) sensor network
providing real-time measurements of PM 1, PM 2.5, and PM 10 concentrations. Each PA sensor
consists of two Plantower PMS 5003 laser sensors. The PA uses a fan to draw air past a laser
beam and measures 90-degree light scattering with a photo-diode detector. The optical detector
then converts scattered light to a voltage pulse, and the particle count is calculated by counting
the number of pulses from the scattering signal (Sayahi et al., 2019). Counts are then converted
to mass concentrations (µg/m³) using manufacturer-provided equations for PM 1.0, PM 2.5, and
PM 10. The optical method is very prone to relative humidity interferences and does not detect
particles smaller than ~300 nm in size, introducing measurement error. In addition, PA sensors also contain
a BME280 sensor enclosed inside the sampling chamber to measure relative humidity (%) and
temperature (°F). According to the manufacturer, the effective measurement range of PM 2.5 for
the PA sensor is 0 to 500 μg/m³, and the working temperature and humidity ranges are -10
to 60 °C and 0-99%, respectively (Zhou & Zheng, 2016). PA sensor usage and deployment have
proliferated in the last several years, with more than 9,000 sensors in use worldwide and a
growth rate of 30 sensors per day (Morawska et al., 2018).
Los Angeles County has a relatively dense network of PA sensors (Figure 1). During our
study period, there were 361 active PA sensors in Los Angeles County and the surrounding area;
however, not every PA sensor worked properly and continuously throughout the study period. In
total, we obtained all instantaneous measurements and averaged them to 3,112,134 hourly PM 2.5
measurements from 361 outdoor PA sensors in Los Angeles County throughout our study period
from January 1, 2018, to December 31, 2019.
2.2.2 Quality control for PurpleAir PM 2.5 measurements
PA sensors are densely deployed because of their affordability and low maintenance needs.
However, this in turn introduces uncertainty about their data quality because malfunctions can go
undetected for long periods. Instead of attempting to calibrate PA sensor data to the nearest
reference monitor which was >500m away in our modeling domain (as compared to the desired
<4m collocation distance), we developed a rigorous quality control scheme using both channels’
readings to minimize outliers and eliminate malfunctioning sensors or readings. Figure 2 shows
the framework and process of quality control for PA PM 2.5 measurements.
Figure 2 The framework of data quality control for PurpleAir PM 2.5 readings.
We first identified malfunctioning sensors with a very low frequency of change in their
readings across time. It is common to expect sensor performance to drift over time; however, if it
responds to actual changes in air quality, we expect its readings to fluctuate or display some
variability over time. We calculated the centered 5-hour (previous 2 hours and following 2
hours) moving standard deviation of PA PM 2.5 readings by channel (Eq. 1). In Eq. 1, $X_i$ is the PA
PM 2.5 measurement at hour $i$ and $\mu$ is the population mean for either Channel A or Channel B.
We discarded all hourly records with 5-hour moving standard deviation of zero, i.e., the sensor’s
reading remains constant during the five hours. Out of the original 3,112,134 PM 2.5
measurements, 84,277 (2.71%) were discarded based on this criterion.
$$SD_{(t-n,\,t+n)} = \sqrt{\frac{1}{2n+1}\sum_{i=1}^{2n+1}\left(X_i - \mu\right)^2} \tag{1}$$
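The flat-reading check built on Eq. 1 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure actually used in the study: the function names are hypothetical, the mean is taken inside each 5-hour window (so that a constant reading gives a standard deviation of exactly zero), and hours without a complete window are left unflagged.

```python
import math
from typing import List, Optional


def centered_moving_sd(values: List[float], n: int = 2) -> List[Optional[float]]:
    """Standard deviation over the window [t - n, t + n]; None if the window is incomplete."""
    result: List[Optional[float]] = []
    for t in range(len(values)):
        if t - n < 0 or t + n >= len(values):
            result.append(None)  # assumption: edges of the series are not evaluated
            continue
        window = values[t - n : t + n + 1]
        mean = sum(window) / len(window)  # assumption: window mean used for mu
        variance = sum((x - mean) ** 2 for x in window) / len(window)
        result.append(math.sqrt(variance))
    return result


def flag_flatlined(values: List[float], n: int = 2) -> List[bool]:
    """True where the (2n + 1)-hour moving SD is exactly zero, i.e. the reading never changed."""
    return [sd is not None and sd == 0.0 for sd in centered_moving_sd(values, n)]


# Illustrative hourly readings (assumed values): the sensor is stuck at 5.0 for five hours.
readings = [5.0, 5.0, 5.0, 5.0, 5.0, 6.2, 7.1]
flags = flag_flatlined(readings)
```

Only the hour at the center of the five identical readings is flagged; as soon as one value in the window differs, the standard deviation becomes positive.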
Second, we discarded apparent PM 2.5 outliers with extreme hourly values greater than
500 μg/m³ that exceed the sensor's effective measurement range in both channels, in total 27,386
records (0.88%) (Zhou & Zheng, 2016). Then, we calculated the median absolute deviation
(MAD) by channel within a calendar month to filter out outliers as shown in Eq. 2 and Eq. 3.
Briefly, $X_i$ is the PM 2.5 reading of one of the channels (A or B) at hour $i$, $\tilde{X}$ is the median of $X_i$ in
a month, and b = 1.4826, a constant linked to the assumption of normality of the data,
disregarding the abnormality induced by outliers (Leys et al., 2013). We then defined the
threshold to discard outliers based on the gap between the PM 2.5 measurement and a multiple of
the MAD. This study set this multiplier to three, as in previous studies (Miller, 1991), and
removed 9,690 (0.31%) PM 2.5 measurements consistent with the decision rule in Eq. 3.
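$$MAD = b \cdot \operatorname{median}\left(\left|X_i - \tilde{X}\right|\right) \tag{2}$$
$$X_i < \tilde{X} - 3 \cdot MAD \quad \text{or} \quad X_i > \tilde{X} + 3 \cdot MAD \tag{3}$$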
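The decision rule in Eq. 2 and Eq. 3 can be illustrated with a short Python sketch. The function name and the sample values are hypothetical; in the study the rule is applied per channel within a calendar month, whereas this sketch simply treats the input list as one such group.

```python
import statistics
from typing import List

B = 1.4826  # constant linked to the normality assumption, as stated in the text


def mad_outlier_flags(values: List[float], multiplier: float = 3.0) -> List[bool]:
    """Flag readings farther than `multiplier` * MAD from the group median."""
    median = statistics.median(values)
    mad = B * statistics.median([abs(x - median) for x in values])
    lower = median - multiplier * mad
    upper = median + multiplier * mad
    return [x < lower or x > upper for x in values]


# Illustrative readings for one channel in one month (assumed values).
month_readings = [8.0, 9.0, 10.0, 11.0, 12.0, 60.0]
outliers = mad_outlier_flags(month_readings)
```

Because both the center and the spread are medians, the single extreme value of 60.0 barely shifts the bounds and is flagged, while a mean-and-standard-deviation rule would be pulled toward it.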
Third, we aimed to identify periods of prolonged interruption or data loss caused, for
example, by power outages or loss of data communications. In
order to avoid potential errors caused by measurement discontinuity, we discarded the records
where the monthly completeness is less than 75% per channel for each PA sensor (Zusman et al.,
2020). An additional 219,508 (7.05%) PM 2.5 readings were discarded in this step.
Fourth, we evaluated the degree of agreement between dual-channel readings for each sensor
within a given month and calculated three statistical anomaly detection indicators: the coefficient of
determination (R²), mean absolute error (MAE), and mean absolute percentage error (MAPE)
(Eq. 4 and Eq. 5). We set thresholds for R², MAE, and MAPE between channels A and B within
a month to be 0.8, 4, and 0.3, respectively, based on visual inspection of these relationships. R²
and MAE were used to test the correlation between channel A and channel B readings. However,
a few studies have reported that the low-cost sensor response begins to saturate at high particle
concentrations (>50 μg/m³), leading to higher measurement bias and uncertainty in this interval
(Kelly et al., 2017; Rai et al., 2017). Therefore, MAPE was adopted as an additional criterion to
handle the low-cost sensor's measurement limit at high particle concentrations. For PM 2.5
concentration >50 μg/m³, the condition to flag data from both channels A and B based on their
mismatch considers both MAE and MAPE simultaneously.
Based on these rules, we first discarded records with R² < 0.8 between the two channels. Then
MAE and MAPE were computed from dual-channel readings for the remaining records. In Eq. 4
and Eq. 5, $PM2.5_A$ denotes Channel A's reading, $PM2.5_B$ denotes Channel B's reading, and $n$
indicates the total number of observations for Channel A and Channel B. With MAE and MAPE
thresholds of 4 and 0.3, we then removed records for which MAE and MAPE both exceeded their
thresholds simultaneously. In total, 79,162 (2.54%) PM 2.5 records were discarded during this process.
$$MAE = \frac{\sum_{i=1}^{n}\left|PM2.5_A - PM2.5_B\right|}{n} \tag{4}$$
$$MAPE = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{PM2.5_A - PM2.5_B}{PM2.5_A}\right| \tag{5}$$
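The dual-channel agreement check can be sketched as follows. This is a simplified illustration: the function names are hypothetical, R² is computed here as the squared Pearson correlation (an assumption, since the text does not specify the estimator), and Channel A readings are assumed to be nonzero so that Eq. 5 is defined.

```python
from typing import Sequence


def r_squared(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Pearson correlation between the two channels (assumed estimator)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # assumption: a constant channel is treated as no agreement
    return cov * cov / (var_a * var_b)


def mae(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute error between channels (Eq. 4)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def mape(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute percentage error relative to Channel A (Eq. 5)."""
    return sum(abs((x - y) / x) for x, y in zip(a, b)) / len(a)


def channels_agree(a, b, r2_min=0.8, mae_max=4.0, mape_max=0.3) -> bool:
    """Keep a sensor-month unless R2 is too low or MAE and MAPE both exceed their thresholds."""
    if r_squared(a, b) < r2_min:
        return False
    return not (mae(a, b) > mae_max and mape(a, b) > mape_max)


# Illustrative sensor-month readings (assumed values).
channel_a = [10.0, 20.0, 30.0, 40.0]
channel_b = [11.0, 19.0, 31.0, 41.0]
agreement_ok = channels_agree(channel_a, channel_b)
```

A channel that tracks the other perfectly but reads twice as high would pass the R² test and still be rejected, because both MAE and MAPE exceed their thresholds.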
Finally, to quality control the remaining sensor readings with data available from
only one channel, we performed a linear regression of hourly readings for each sensor with its
neighboring PA sensors within 3 kilometers (L. J. Chen et al., 2018). We set the threshold of R²
to be 0.6, a more relaxed standard compared to sensors with dual channels (0.8), as PM 2.5
concentration can be affected by real spatial variations at this larger scale of 3 km. Even when
two PM 2.5 sensors are geographically closer than 3km, local sources and conditions in their
immediate vicinity can introduce variability in their readings. We then discarded 13,261 (0.43%)
single-channel readings that had an R² less than 0.6 or had no neighboring PA sensors within three
kilometers.
Compared to the original data, the quality-controlled data (N=2,701,571) had an excellent
dual-channel agreement with an R² of 0.98 and a slope of 1.00 (Figure 3). We used the average
of the dual-channel readings (or the single-channel reading when only one channel was available)
as the final PM 2.5 measurements in our model.
Figure 3 Scatter plots of Purple Air dual-channel hourly PM 2.5 measurements before (left) and
after (right) quality control.
2.3 Independent variables
2.3.1 Meteorology and ambient, central site PM 2.5 data
Meteorology data were obtained from meteorological monitoring sites operated by the California
Air Resources Board (CARB) (https://www.arb.ca.gov/aqmis2/metselect.php). The
meteorological fields used in this analysis include temperature (°F), relative humidity (%),
resultant wind speed (mph), wind direction (°), and precipitation (mm). All meteorological
parameters were obtained at the hourly (average) level, consistent with PA PM 2.5 measurements.
We used regulatory PM 2.5 measurements as primarily temporal parameters in our models
with some degree of spatial information from 13 monitors available in our study region. Hourly
average PM 2.5 concentrations between January 2018 and December 2019 were drawn from the
EPA Air Quality System (AQS). We also inspected the AQS data for outliers and conducted
basic cleaning before using it (https://www.epa.gov/outdoor-air-quality-data).
2.3.2 Land use variables
Land use terms are used to capture potential PM 2.5 emission sources and sinks at a fine spatial
scale. The land use data were obtained from various sources. The percentage of developed
impervious surface and tree canopy was obtained from the 2016 National Land Cover Database
(NLCD) at a 30 m spatial resolution (http://www.mrlc.gov). Parcel-based land use data were
provided by the Southern California Association of Governments (SCAG) General Land Use
Plan 2016 (http://gisdata-scag.opendata.arcgis.com/search). We obtained primary PM 2.5 emitting
facilities as source points from the 2017 EPA National Emissions Inventory (NEI)
(https://www.epa.gov/air-emissions-inventories/2017-national-emissions-inventory-nei-data).
We calculated the number of PM 2.5 emission source points (e.g., industrial factories) and the
number of warehouse and storage facilities, reflecting nearby freight activity density, within our 500m ×
500m grid cells. Greenspace was retrieved as the normalized difference vegetation index
(NDVI) at a 30m square resolution from Landsat 8 remote sensing satellite imagery
(https://earthexplorer.usgs.gov/). The Landsat satellite orbits the earth every sixteen days and the
quality of satellite imagery is greatly affected by clouds. We therefore used the monthly NDVI
value excluding days with massive cloud cover. We downloaded elevation data at a spatial
resolution of 10 m and distance to the nearest coastline data from the United States Geological
Survey (https://viewer.nationalmap.gov/basic/). Finally, we obtained population density data at
the census tract level from the U.S. Census Bureau 2014-2018 American Community Survey
(ACS) (https://www.census.gov/data.html).
2.3.3 Traffic variables
Traffic emissions are considered a major source of PM 2.5 (Habre et al., 2021; Hasheminassab et
al., 2014; Ito et al., 2016). Several traffic features were used as potential PM 2.5 predictors.
Distance to the nearest freeway and arterial road was computed from U.S. Census TIGER/Line
Shapefiles (https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-
file.html). We defined freeway to encompass both State highways (i.e., State Route) and federal
highways (i.e., Interstate highway and U.S. Route) in California since both types experience
large traffic volumes. Arterial road includes all roads except freeways and highways. Distance to
the nearest railroad, airport, and freight intermodal facility were extracted from the California
Department of Transportation (Caltrans) (https://gisdata-caltrans.opendata.arcgis.com/search).
The lengths of freeways, arterial roads, and railroads were separately summed over the 500m ×
500m grid cell. Freeway traffic volume was drawn from the Archived Data Management System
(ADMS) at the University of Southern California. ADMS gives directional volume for each
highway segment at 30-second intervals (https://adms.usc.edu/app). We aggregated traffic
volume data at the hourly level to match with the temporal resolution of PA PM 2.5
measurements. Finally, truck volume data was obtained from Caltrans based on the 2018 Annual
Average Daily Truck Traffic (AADTT) report (https://dot.ca.gov/programs/traffic-
operations/census).
2.3.4 Temporal and spatial trending variables
We included hour of the day and month variables to additionally account for temporal variation
at the diurnal and monthly to seasonal scales. We also included a weekday dummy variable to
account for the impact of different travel behaviors between weekdays and weekends. Previous
studies have demonstrated that PM 2.5 concentrations significantly increase during holidays (e.g.,
Independence Day, Thanksgiving, Christmas, and New Year) (Gorin et al., 2006; Zhao et al.,
2018). Smoke caused by wildfire is a crucial ambient PM 2.5 source that contributes to the health
burden in many regions, especially in California, where wildfire frequency and intensity have
significantly increased over recent decades (Gupta et al., 2018; Reid et al., 2015). In our study
period, extremely high PM 2.5 measurements (>100 µg/m³) were observed on July 4 and 5,
primarily due to fireworks, and from October to December, which is known as the wildfire season
in southern CA. Therefore, we included binary dummy variables for federal holidays and
wildfire days based on statistics from the California Department of Forestry and Fire Protection
to determine if any fires occurred in LA County or nearby counties
(https://www.fire.ca.gov/stats-events/). Finally, the latitude and longitude geographic coordinates
of the modeling grid centroids were included to account for spatial variation.
2.4 Data integration
A 500m × 500m hexagon grid network covering the study domain was used to integrate all
datasets mentioned above. PA PM 2.5 measurements were spatially matched to the grid and
averaged within each grid cell if multiple PA sensors were located in the same cell. The meteorological
parameters and regulatory PM 2.5 values were interpolated to the grid using the inverse distance
weighting (IDW) method. The IDW method predicts the value at an unmeasured location as a
weighted average of the surrounding measured values, with weights that decay with distance (Jerrett et al.,
2013). Areas of six land-use types (single-family residential, multifamily
residential, commercial, industrial, open space, and water body (rivers
and lakes)) and the percentages of impervious surface and tree canopy were calculated for each grid
cell and within 1km, 5km, and 10km radius buffers. The lengths of roads
(freeway, arterial road, and railroad), the distance from the grid centroid to the nearest transportation
facility (freeway, arterial road, railroad, airport, and freight intermodal facility), population
density, elevation, and monthly NDVI were separately calculated for each grid cell. The numbers of
PM 2.5 emission source points and warehouses were assigned to each grid cell. General traffic
volume and truck traffic volume were also matched to the 500-m modeling grid. In total, we
have 101 predictor variables. All data integration work was processed using ArcGIS (version
10.2, ESRI). We obtained 2,502,999 observations with complete data on all predictors after data
integration, covering 17,515 unique hours in 2018 and 2019. All days in our study period
(730 days) have at least one hourly measurement. In total, 304 unique hexagon grid
cells in the study region have PA PM 2.5 measurements. All variables are listed in Table 1.
Table 1 Description of dependent and predictor variables (N=2,502,999).
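Category | Variable | Unit | Temporal resolution | Spatial resolution | Buffer size | Time
Dependent variable | PurpleAir PM 2.5 measurement | µg/m³ | Hourly | - | - | 2018, 2019
Meteorology variable | Temperature | Fahrenheit (°F) | Hourly | - | - | 2018, 2019
Meteorology variable | Relative humidity | percentage (%) | Hourly | - | - | 2018, 2019
Meteorology variable | Wind speed | meter per second (m/s) | Hourly | - | - | 2018, 2019
Meteorology variable | Wind direction | degree (°) | Hourly | - | - | 2018, 2019
Meteorology variable | Precipitation | millimeter (mm) | Hourly | - | - | 2018, 2019
Background PM 2.5 | Regulatory PM 2.5 measurement | μg/m³ | Hourly | - | - | 2018, 2019
Land use variable | Single-family residential land area | square kilometer (km²) | - | - | Self, 1 km, 5 km, 10 km | 2016
Land use variable | Multi-family residential land area | km² | - | - | Self, 1 km, 5 km, 10 km | 2016
Land use variable | Industrial land area | km² | - | - | Self, 1 km, 5 km, 10 km | 2016
Land use variable | Commercial land area | km² | - | - | Self, 1 km, 5 km, 10 km | 2016
Land use variable | Open space area | km² | - | - | Self, 1 km, 5 km, 10 km | 2016
Land use variable | Water body area | km² | - | - | Self, 1 km, 5 km, 10 km | 2016
Land use variable | Impervious surface | % | - | - | Self, 1 km, 5 km, 10 km | 2016
Land use variable | Tree canopy | % | - | - | Self, 1 km, 5 km, 10 km | 2016
Land use variable | PM 2.5 emission source point | count | - | - | Self, 1 km, 5 km, 10 km | 2017
Land use variable | Warehouse number | count | - | - | - | 2018
Land use variable | Normalized difference vegetation index (NDVI) | - | Monthly | 30 m | - | 2018, 2019
Land use variable | Elevation | meter (m) | - | 10 m | - | 2019
Land use variable | Distance to coastline | kilometer (km) | - | - | - | 2019
Land use variable | Population density | people/km² | - | Census tract | - | 2014-2018
Traffic variable | Freeway traffic volume/distance to nearest freeway | vehicle/km | Hourly | - | - | 2018, 2019
Traffic variable | Freeway truck volume/distance to nearest freeway | vehicle/km | Daily | - | - | 2018
Traffic variable | Distance to nearest freeway | km | - | - | - | 2018
Traffic variable | Distance to nearest arterial road | km | - | - | - | 2018
Traffic variable | Distance to nearest railroad | km | - | - | - | 2018
Traffic variable | Freeway length | km | - | - | - | 2018
Traffic variable | Arterial road length | km | - | - | - | 2018
Traffic variable | Railroad length | km | - | - | - | 2018
Traffic variable | Distance to nearest airport | km | - | - | - | 2018
Traffic variable | Distance to nearest freight intermodal facility | km | - | - | - | 2018
Temporal dummy variable | Monthly dummy | 0/1 | - | - | - | -
Temporal dummy variable | Hourly dummy | 0/1 | - | - | - | -
Temporal dummy variable | Weekday dummy | 0/1 | - | - | - | -
Temporal dummy variable | Holiday dummy | 0/1 | - | - | - | -
Temporal dummy variable | Wildfire day dummy | 0/1 | - | - | - | -
Spatial variable | Latitude | decimal | - | - | - | -
Spatial variable | Longitude | decimal | - | - | - | -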
2.5 Model development
2.5.1 Spatial and temporal convolutional layers
Previous studies suggest a spatial and temporal autocorrelation exists among neighboring PM 2.5
measurements (Brokamp et al., 2018; Di et al., 2016; X. Hu et al., 2017). To control for spatial
autocorrelation, we included an additional convolutional layer for nearby PM 2.5 measurements.
We use the inverse-distance-weighted function following methods proposed by Di et al. (2016)
and Hu et al. (2017). Figure 4 shows how the correlation of PA PM 2.5 measurements varies with
distance. As shown in Figure 4, the correlation coefficient (R) shows a steady
downward trend as the distance between two PA sensors increases. When the distance between
two PA sensors reaches 50km, R has dropped to a low level (<0.4). Hence, we
adopted 50km as a distance threshold to account for spatial autocorrelation. For each modeling
grid cell, the inverse-distance-weighted average of nearby PM 2.5 measurements within 50km was
calculated. The inverse-distance-weighted function can be expressed as
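$$s_i = \frac{\sum_{k=1}^{n} w_{ik}\, s_k}{\sum_{k=1}^{n} w_{ik}} \qquad (6)$$
where $s_i$ is the value of the convolutional layer at grid cell $i$; $w_{ik}$ is the spatial
weight given to grid cell $k$ when computing the value at grid cell $i$, with $w_{ik} \propto 1/d_{ik}^{2}$;
$s_k$ is the PM 2.5 concentration at grid cell $k$; and $d_{ik}$ is the distance
between grid cells $i$ and $k$. The sums run over the $n$ grid cells $k$ within 50 km of grid cell $i$.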
The moving average of PM 2.5 measurements over the previous five hours of each grid and
its neighboring grids within 1km were included in the model to account for temporal
autocorrelation.
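The two autocorrelation terms described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: coordinates are assumed to be planar and expressed in km, the function names `idw_average` and `trailing_mean` are hypothetical, and a sensor located exactly at the grid centroid is assumed to supply its reading directly.

```python
import math


def idw_average(target_xy, sensors, radius_km=50.0, power=2.0):
    """Inverse-distance-weighted mean of readings within radius_km of a grid cell.

    target_xy: (x_km, y_km) of the grid cell centroid (planar coordinates assumed).
    sensors:   iterable of (x_km, y_km, pm25) tuples.
    Returns None when no sensor lies within the radius.
    """
    num = 0.0
    den = 0.0
    for x, y, pm25 in sensors:
        d = math.hypot(x - target_xy[0], y - target_xy[1])
        if d > radius_km:
            continue  # outside the 50 km threshold
        if d == 0.0:
            return pm25  # assumption: a sensor at the centroid is used as-is
        w = 1.0 / d ** power  # w_ik proportional to 1 / d_ik^2
        num += w * pm25
        den += w
    return num / den if den > 0 else None


def trailing_mean(series, hour, window=5):
    """Mean of the `window` readings before index `hour` (current hour excluded)."""
    past = series[max(0, hour - window):hour]
    return sum(past) / len(past) if past else None


# Hypothetical readings: two sensors in range, one beyond 50 km.
sensors = [(1.0, 0.0, 10.0), (0.0, 2.0, 20.0), (60.0, 0.0, 99.0)]
spatial_term = idw_average((0.0, 0.0), sensors)

hourly = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
temporal_term = trailing_mean(hourly, hour=6)
```

In the example, the sensor 60 km away is ignored, and the nearer of the two remaining sensors receives four times the weight of the farther one because the weight falls with the square of the distance.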
Figure 4 The correlation of the PM2.5 measurements between PurpleAir sensors by varying
distance.
2.5.2 PM 2.5 prediction model
We developed a Random Forest (RF) regression model to estimate hourly PM 2.5 concentration
over LA County using the PA hourly PM 2.5 measurements as the dependent variable. In
total, 101 predictor variables were included, covering meteorology, land use, traffic, topography, and other
related factors. As an ensemble learning method, an RF model grows a large number of decision trees,
each trained on a bootstrap sample of the data, and combines the predictions of all trees to
produce the final estimate (Breiman, 2001). In recent years, the RF approach has been
increasingly applied to estimate air pollution since it can model the complex and nonlinear
relationships between observations and predictor variables and offer flexible and automated
procedures for predicting target variables.
The two major hyperparameters of RF are the number of decision trees in the forest (k) and the
number of predictors randomly tried at each split (m). The final prediction is the average of the
predictions from the individual trees. Prediction strength is measured by applying the model to data
withheld from the given sample. In the implementation of the RF algorithm, each tree is trained on a
bootstrap sample containing about 2/3 of the training data. As the forest is built, each decision tree
can thus be tested on the samples not used to train that tree. The resulting error is referred to as the
out-of-bag (OOB) error, an internal error estimate of a random forest as it is being constructed. In this
study, the number of trees (k) was set to 200 and the number of candidate predictors at each split (m)
was set to 30 according to OOB error estimates.
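The "about 2/3" figure follows from bootstrap sampling: drawing n rows with replacement leaves each row out with probability (1 - 1/n)^n, which approaches 1/e (about 37%), so roughly 63% of distinct rows are in-bag for a given tree. The sketch below simulates this with the standard library only; the sample size, the number of simulated trees, and the helper name `bootstrap_split` are illustrative assumptions, not values from this study.

```python
import random


def bootstrap_split(n_samples, rng):
    """Draw one bootstrap sample; return (in-bag indices, out-of-bag indices)."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    in_bag_set = set(in_bag)
    oob = [i for i in range(n_samples) if i not in in_bag_set]
    return in_bag, oob


rng = random.Random(0)
n = 10_000  # illustrative number of training rows
oob_fractions = []
for _ in range(20):  # 20 illustrative trees
    in_bag, oob = bootstrap_split(n, rng)
    oob_fractions.append(len(oob) / n)

# Average share of rows that a tree never sees and that can therefore test it.
mean_oob_fraction = sum(oob_fractions) / len(oob_fractions)
```

Because every row is out-of-bag for roughly a third of the trees, the OOB error can be computed for the whole training set without a separate validation split, which is what makes it convenient for choosing k and m.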
2.5.3 Cross-validation of PM 2.5 prediction model considering the unbalanced nature of hourly PM 2.5 Purple Air data
We conducted random 10-fold cross-validations (CV) to test our model’s prediction
performance. The entire training dataset was randomly split into ten sets, with each subset
containing approximately 10% of the training data. In each round of cross-validation, we used
nine subsets for model training and predicted the held-out test subset. The process was repeated
ten times until every subset was tested. However, the dependent variable has a highly skewed
distribution, with most readings being relatively low: PM 2.5 concentrations under 50 μg/m³
account for 97.8% of total observations. To account for the dominance of low readings, we used
random oversampling to increase the number of high PM 2.5 observations in our training groups
(Batista et al., 2004; He & Garcia, 2009). Random oversampling replicates randomly selected
minority-class observations and adds them to the training group to adjust the class distribution
accordingly (He & Garcia, 2009). We performed the random oversampling on the training
dataset within each fold separately.
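A minimal sketch of random oversampling applied to a training fold only is given below. The 50 μg/m³ boundary between "low" and "high" readings and the target share of high readings are assumptions made for illustration; the text above does not state the values used, and `oversample_high` is a hypothetical helper name.

```python
import math
import random


def oversample_high(train_rows, threshold=50.0, target_share=0.2, seed=0):
    """Replicate high-PM2.5 rows until they form about target_share of the fold.

    train_rows: list of dicts with a "pm25" key (training portion of one fold).
    threshold and target_share are illustrative assumptions.
    """
    rng = random.Random(seed)
    high = [r for r in train_rows if r["pm25"] >= threshold]
    low = [r for r in train_rows if r["pm25"] < threshold]
    if not high:
        return list(train_rows)  # nothing to replicate
    # Number of high rows needed so that high / (high + low) >= target_share.
    needed = math.ceil(target_share * len(low) / (1.0 - target_share))
    extra = max(0, needed - len(high))
    # Replicated rows are exact copies drawn at random from the existing high rows.
    return low + high + [rng.choice(high) for _ in range(extra)]


# Hypothetical training fold: 98 low readings and 2 high readings.
train_fold = [{"pm25": 10.0} for _ in range(98)] + [{"pm25": 80.0}, {"pm25": 120.0}]
balanced_fold = oversample_high(train_fold)
n_high = sum(1 for r in balanced_fold if r["pm25"] >= 50.0)
n_low = sum(1 for r in balanced_fold if r["pm25"] < 50.0)
```

The held-out subset of each fold is left untouched, so the replicated rows never appear in both training and test data and the CV statistics are computed on the original distribution.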
We also conducted spatial and temporal 10-fold CV combined with random
oversampling to evaluate the prediction performance of our RF model at locations
and in time periods other than those used to train the model. This is a more stringent test of prediction
performance in space and time than random CV. We classified the modeling grids into
ten spatial clusters (based on grid cell’s geographic locations) using K-nearest neighbor (KNN)
algorithm with each group containing about 10% of the grid cells. We divided the hourly PM 2.5
measurements into ten groups for temporal cross-validation, each comprising approximately
10% of the data according to hour of the day. For each spatial or temporal CV fold, we trained
the model on nine groups and tested it on the remaining group, and iterated this process until
every group was predicted.
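The group-wise splitting used in spatial and temporal CV can be sketched as follows, assuming each observation already carries a group label (a spatial cluster or a temporal group). The labels in the example and the helper name `leave_one_group_out` are hypothetical; how the groups themselves are formed is outside this sketch.

```python
def leave_one_group_out(groups):
    """Yield (held_out_group, train_indices, test_indices) for each distinct label."""
    for g in sorted(set(groups)):
        test_idx = [i for i, gi in enumerate(groups) if gi == g]
        train_idx = [i for i, gi in enumerate(groups) if gi != g]
        yield g, train_idx, test_idx


# Hypothetical group label per observation (three groups for brevity, ten in the study).
group_of_obs = [0, 0, 1, 2, 1, 2, 0, 1]
folds = list(leave_one_group_out(group_of_obs))
```

Unlike random CV, no observation from the held-out group is available during training, so the test mimics prediction at a location or time period the model has not seen.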
We calculated the CV coefficient of determination (R²) and root mean
squared error (RMSE) between CV predictions and measurements to assess the performance of
the proposed model.
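For reference, the two indicators can be computed as in the sketch below. It assumes R² is defined as one minus the ratio of the residual to the total sum of squares; the text does not state whether this definition or the squared Pearson correlation was used, and the sample values are invented for illustration.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between paired observations and predictions."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Invented example values in μg/m³.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.5, 2.0, 3.0, 3.5]
example_rmse = rmse(obs, pred)
example_r2 = r_squared(obs, pred)
```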
2.6 Prediction of hourly PM 2.5 in Los Angeles County
Hourly PM 2.5 concentration surfaces were generated based on the model at each 500m × 500m
grid cell across the modeling domain (LA County, 2018-2019). To illustrate the fine spatial and
temporal variation in PM 2.5 captured by the model, we created prediction surfaces for four
different days chosen to represent a snapshot of scenarios known to affect PM 2.5 diurnal and/or
spatial patterns in this domain: weekday (Wednesday, September 18, 2019), weekend (Sunday,
September 22, 2019), day impacted by wildfires (Woolsey fire, November 11, 2018), and a
holiday (Independence Day, July 4, 2019). For each day, we plotted predicted PM 2.5 surfaces at 9
AM (morning traffic peak hour), 2 PM (mid-day), 6 PM (afternoon traffic peak hour), and 11 PM
(late night).
3. Results
3.1 Descriptive characteristics
Descriptive statistics for Purple Air PM 2.5 concentrations and all predictors in their final form
used in model training at the hour-grid level over our modeling period are presented in Table 2.
The hourly mean PA PM 2.5 concentration was 13.93 μg/m³, while regulatory PM 2.5 had a higher
hourly mean concentration of 63.69 μg/m³. PA PM 2.5 measurements also had a higher
maximum PM 2.5 concentration than regulatory monitor measurements (476.9 μg/m³ vs. 358.99
μg/m³, respectively).
Table 2 Descriptive statistics of variables (N=2,502,999).
Variable | Unit | Mean | Std. Dev. | Min | Max
PurpleAir PM 2.5 measurement | µg/m³ | 14.48 | 13.83 | 0.00 | 476.90
Regulatory PM 2.5 measurement | µg/m³ | 63.91 | 88.10 | -5.00 | 358.99
Spatial lag term | µg/m³ | 14.60 | 12.12 | 0.02 | 287.82
Temporal lag term | µg/m³ | 14.37 | 12.47 | 0.00 | 307.51
Temperature | °F | 64.02 | 10.33 | 10.14 | 114.55
Relative humidity | % | 62.38 | 22.04 | 0.00 | 100.08
Wind speed | m/s | 4.77 | 2.58 | 1.10 | 41.05
Wind direction | degree | 183.29 | 68.45 | 0.00 | 359.43
Precipitation | mm | 0.06 | 0.51 | 0.00 | 106.80
Freeway traffic volume/distance to nearest freeway | vehicle/km | 191.59 | 439.38 | 0.00 | 4660.68
Freeway truck volume/distance to nearest freeway | vehicle/km | 1786.28 | 4392.58 | 0.24 | 22655.79
Distance to nearest freeway | km | 1.10 | 0.94 | 0.00 | 6.01
Distance to nearest arterial road | km | 0.03 | 0.04 | 0.00 | 0.43
Distance to nearest railroad | km | 0.38 | 0.73 | 0.00 | 3.22
Freeway length | km | 6.66 | 2.27 | 0.03 | 14.24
Arterial road length | km | 4.38 | 4.74 | 0.00 | 29.06
Railroad length | km | 0.12 | 0.35 | 0.00 | 3.32
Distance to nearest airport | km | 8.55 | 5.71 | 0.12 | 31.82
Distance to nearest freight intermodal facility | km | 15.09 | 11.71 | 0.53 | 60.62
Single-family residential land area | km² | 0.19 | 0.15 | 0.00 | 0.51
Multi-family residential land area | km² | 0.06 | 0.08 | 0.00 | 0.40
Industrial land area | km² | 0.02 | 0.07 | 0.00 | 0.44
Commercial land area | km² | 0.02 | 0.04 | 0.00 | 0.36
Open space area | km² | 0.03 | 0.06 | 0.00 | 0.43
Water body area | km² | 0.01 | 0.04 | 0.00 | 0.48
Impervious surface | % | 51.99 | 21.00 | 0.17 | 92.33
Tree canopy | % | 4.48 | 5.23 | 0.00 | 26.79
Single-family residential land area-1 km | km² | 3.35 | 1.90 | 0.00 | 8.68
Multi-family residential land area-1 km | km² | 0.97 | 0.85 | 0.00 | 4.84
Industrial land area-1 km | km² | 0.55 | 0.93 | 0.00 | 6.26
Commercial land area-1 km | km² | 0.44 | 0.36 | 0.00 | 2.23
Open space area-1 km | km² | 0.89 | 1.17 | 0.00 | 7.49
Water body area-1 km | km² | 0.21 | 0.52 | 0.00 | 4.80
Impervious surface-1 km | % | 49.71 | 19.40 | 0.59 | 86.84
Tree canopy-1 km | % | 4.12 | 3.82 | 0.00 | 23.49
Single-family residential land area-5 km | km² | 25.70 | 10.58 | 0.20 | 57.27
Multi-family residential land area-5 km | km² | 8.28 | 5.65 | 0.00 | 29.23
Industrial land area-5 km | km² | 5.10 | 5.76 | 0.00 | 27.38
Commercial land area-5 km | km² | 3.69 | 1.99 | 0.04 | 10.83
Open space area-5 km | km² | 10.41 | 11.22 | 0.16 | 66.82
Water body area-5 km | km² | 2.04 | 3.47 | 0.00 | 21.96
Impervious surface-5 km | % | 47.45 | 18.52 | 0.70 | 78.41
Tree canopy-5 km | % | 4.31 | 3.36 | 0.00 | 15.08
Single-family residential land area-10 km | km² | 93.22 | 28.87 | 3.63 | 167.36
Multi-family residential land area-10 km | km² | 29.38 | 18.30 | 0.14 | 83.12
Industrial land area-10 km | km² | 23.20 | 17.63 | 0.00 | 64.50
Commercial land area-10 km | km² | 13.40 | 5.90 | 0.46 | 28.62
Open space area-10 km | km² | 47.60 | 44.49 | 5.23 | 229.54
Water body area-10 km | km² | 6.44 | 8.40 | 0.00 | 40.51
Impervious surface-10 km | % | 46.34 | 17.70 | 0.96 | 71.32
Tree canopy-10 km | % | 4.51 | 3.21 | 0.01 | 14.96
NDVI | | 0.13 | 0.06 | -0.03 | 0.32
Warehouse number | count | 0.26 | 1.15 | 0.00 | 17.00
PM 2.5 emission source point | count | 4.44 | 24.65 | 0.00 | 264.00
Population density | people/km² | 3646.40 | 2896.44 | 0.00 | 15024.97
Distance to coastline | km | 21.15 | 16.12 | 0.00 | 80.61
Elevation | m | 151.05 | 177.33 | 0.73 | 1133.75
Wildfire | 0/1 | 0.19 | 0.39 | 0.00 | 1.00
Holiday | 0/1 | 0.03 | 0.16 | 0.00 | 1.00
Weekday | 0/1 | 0.71 | 0.45 | 0.00 | 1.00
Latitude | decimal | 34.04 | 0.16 | 33.75 | 34.58
Longitude | decimal | -118.25 | 0.20 | -118.89 | -117.71
3.2 Model performance results
Figure 5 presents the performance of our hourly RF model: baseline training R² = 0.96
and RMSE = 2.71 µg/m³, and random CV R² = 0.93 and RMSE = 3.23 µg/m³, implying good
agreement between measured and predicted hourly PM 2.5 in LA County. The spatial CV results
show a slightly lower R² = 0.88 and higher RMSE = 4.33 µg/m³, suggesting the model maintains
relatively high accuracy in predicting hourly PM 2.5 concentrations at unmonitored locations
(Table 3). Our model also performs well when predicting time periods outside of the training
range (temporal CV R² = 0.90 and RMSE = 3.85 µg/m³). The CV RMSE of our hourly model
is comparable to values in the existing literature, where CV
RMSE ranged from 2.83 µg/m³ to 5.62 µg/m³ in daily models (Bi, Stowell, et al., 2020; Bi,
Wildani, et al., 2020; Di et al., 2016; X. Hu et al., 2017). Nevertheless, Figure 5 shows that the
hourly model tends to underestimate PM 2.5 at high concentrations.
Figure 5 Hourly random forest model performance results a. base model training; b. random CV
results.
Table 3 Hourly random forest model results (N=2,502,999).
Model | R² | RMSE (μg/m³) | Slope
Baseline training | 0.96 | 2.71 | 1.03
Random CV | 0.93 | 3.23 | 1.01
Spatial CV | 0.88 | 4.33 | 1.01
Temporal CV | 0.90 | 3.85 | 1.01
Previous studies have argued that PA sensors may overestimate PM mass under high
relative humidity (Tryner et al., 2020). Figure 6 shows the hourly model's performance under
different relative humidity conditions. The model performs well under moderate
to high humidity. Under lower humidity (<47%), the model
is somewhat less accurate at PM 2.5 prediction, but accuracy
remains high (R² = 0.91, RMSE = 3.47 µg/m³).
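A stratified evaluation of this kind can be sketched as below: observations are binned by relative humidity quartile and the error is computed within each bin. The quartile cut points come from `statistics.quantiles` with its default method, which is an assumption (the method used to derive the thresholds reported for Figure 6 is not stated), and all values are invented for illustration.

```python
import math
import statistics


def rmse_by_humidity_quartile(humidity, observed, predicted):
    """Return {"Q1".."Q4": RMSE} with bins defined by humidity quartiles."""
    q1, q2, q3 = statistics.quantiles(humidity, n=4)  # three cut points
    squared_errors = {"Q1": [], "Q2": [], "Q3": [], "Q4": []}
    for h, o, p in zip(humidity, observed, predicted):
        if h <= q1:
            key = "Q1"
        elif h <= q2:
            key = "Q2"
        elif h <= q3:
            key = "Q3"
        else:
            key = "Q4"
        squared_errors[key].append((o - p) ** 2)
    return {
        k: math.sqrt(sum(v) / len(v)) if v else None
        for k, v in squared_errors.items()
    }


# Invented example: eight hours with increasing humidity and growing error.
humidity = [1, 2, 3, 4, 5, 6, 7, 8]
observed = [10.0] * 8
predicted = [10.0, 10.0, 11.0, 11.0, 12.0, 12.0, 13.0, 13.0]
rmse_per_bin = rmse_by_humidity_quartile(humidity, observed, predicted)
```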
Figure 6 Hourly random forest model performance results based on relative humidity quartiles A.
humidity Q1 (<47.09%); B. humidity Q2 (47.08-66.14%); C. humidity Q3 (66.14-80.27%); D.
humidity Q4 (>80.27%).
Figure 7 presents the relative importance of the top 20 predictors in the RF model ranked
in order of importance based on out-of-bag (OOB) error. Results show that the 6-hour temporal
lag and the spatial convolutional layer are the two most important predictors in the model,
indicating a large portion of variability in hourly PM 2.5 concentrations can be explained by
spatial and temporal autocorrelation. Controlling for PM 2.5 spatial and temporal autocorrelation
effects helps us to track variations in other independent variables. The high rank of
spatiotemporally varying variables, such as meteorological conditions (temperature, relative
humidity, and wind speed) and traffic-related variables (traffic volume/distance to the nearest
road), reflects the importance of hourly-varying and spatially resolved predictors. Temporally
invariant (or spatially static) variables like elevation, land use, and traffic proximity indicators
(distance to the nearest arterial roads) were also among the top 20 predictors of hourly PM 2.5.
Figure 7 Variable importance of the top 20 predictors in the Random Forest model.
3.3 Predicted PM 2.5 surfaces
3.3.1 Predicting PM 2.5 on weekday and weekend days
Figures 8 and 9 show the spatial distributions of hourly mean predicted PM 2.5 concentrations
with accompanying wind rose maps for the hours of 9 AM, 2 PM, 6 PM, and 11 PM for each of
the selected snapshot days. As shown in Figure 8A-D for Wednesday, September 18, 2019,
predicted PM 2.5 shows heterogeneous spatial distribution patterns across the day. At 9 AM, the
hourly mean PM 2.5 concentration was 9.40 μg/m³. Grid cells in densely populated
areas (e.g., downtown LA and South LA) and along major freeways (e.g., I-10, U.S. 101, and I-5)
had higher PM 2.5 concentrations, likely related to increased traffic emissions at the morning peak
hour. The impact of wind speed and direction on predicted PM 2.5 concentrations is also
illustrated in Figure 8A, with the highest concentrations in northern and northeastern areas and
prevailing winds from the south and southwest. Figure 8C shows the distribution of PM 2.5 at 6
PM on a typical weekday. The average PM 2.5 concentration was 5.1 μg/m³, lower than the morning
peak hour (9 AM) and mid-day (2 PM) levels. The mean hourly PM 2.5 concentration at 11 PM
was 5.4 μg/m³. As seen in Figure 8D, the prevailing southeast wind may bring pollution from
the south and east to northwestern areas.
Figure 8E-H show maps of predicted PM 2.5 concentrations on a typical weekend day (Sunday,
September 22, 2019). The average hourly PM 2.5 concentrations at 9 AM, 2 PM, and 6 PM were
5.6, 3.4, and 4.1 μg/m³, respectively, which were 3.8 μg/m³ (40%), 3.6 μg/m³ (51%), and 1.0
μg/m³ (19%) lower than at the same hours on the corresponding weekday of the
same week (Figure 8A-D). The exception was 11 PM, when the average PM 2.5 concentration was 7.2
μg/m³, or 1.8 μg/m³ (34%) higher than at 11 PM on Wednesday, September 18, 2019, and more
concentrated in urban areas with high population density, presumably related to increased traffic
and recreational activities on weekends compared to weekdays.
Changes in traffic emissions (timing of peaks and overall levels) are thought to contribute largely
to the spatial and temporal differences between PM 2.5 concentrations on weekdays
and weekends (Figure 10 shows weekday and weekend traffic volumes). A more dispersed
PM 2.5 distribution (Figure 8A) is observed at the morning peak hour on the weekday, especially along
freeways, likely due to heavy commuting traffic from all directions during the
weekday peak hour. At the same hour, the weekend shows a more concentrated PM 2.5 distribution in
urban areas (Figure 8E). Traffic is assumed to be the factor causing the different PM 2.5
distribution patterns at peak hours between weekdays and weekends.
Figure 8 Hourly mean PM 2.5 concentrations in Los Angeles County and corresponding wind rose
maps at four hours: 9 AM, 2 PM, 6 PM, and 11 PM on a typical weekday (Wednesday,
September 18, 2019) (A-D) and weekend (Sunday, September 22, 2019) (E-H).
3.3.2 Predicting PM 2.5 on a day impacted by wildfire smoke
Figure 9A-D show the spatiotemporal variation in predicted PM 2.5 concentrations on a single day
(Sunday, November 11, 2018) during the Woolsey Fire which started on November 8, 2018 and
affected air quality in LA and Ventura counties. The fire was finally contained on November 21,
2018, and resulted in three fatalities, more than 1,600 destroyed structures and 96,949 acres
burned.
During the Woolsey fire, elevated PM 2.5 concentrations were predicted at all four hours
(67.1, 35.9, 25.6, and 20.1 μg/m³ at 9 AM, 2 PM, 6 PM, and 11 PM, respectively) throughout the
study area and declined in later hours of the day (Figure 9A-D). The highest PM 2.5 levels
occurred on the boundary of Los Angeles County and Ventura County, where the Woolsey Fire
started. Elevated PM 2.5 levels were also observed in the San Gabriel Mountains. As shown in the
wind rose plots, strong and arid Santa Ana winds from the northeast also prevailed and likely
fueled the wildfires.
3.3.3 Predicting PM 2.5 on a holiday day
Figure 9E-H show the spatial distribution of hourly PM 2.5 predictions at four hours on
Independence Day (4th of July) in 2019. The average hourly PM 2.5 concentrations at 9 AM, 2
PM, 6 PM, and 11 PM were 15.0, 15.6, 18.0, and 45.0 μg/m³, respectively. Traffic volumes were
lower than typical weekdays and weekends on July 4, 2019 (Figure 10), indicating traffic is not
likely to be the major driver behind the relatively high PM 2.5 concentrations. The highest
concentrations occurred at 11 PM in densely populated urban areas, hypothesized to be due to
recreational fires and fireworks for celebrating Independence Day at night. Winds from the west
and southwest likely also intensified the transport and buildup of PM 2.5 towards east inland areas.
Figure 9 Hourly mean PM 2.5 concentrations in Los Angeles County and corresponding wind rose
maps at four hours: 9 AM, 2 PM, 6 PM, and 11 PM on a typical day with wildfire (Sunday,
November 11, 2018) (A-D) and holiday day (Thursday, July 4, 2019) (E-H).
Figure 10 Average traffic volumes by hour on different day type.
4. Discussion
We built a random forest model to predict hourly PM 2.5 concentrations within 500m × 500m
grids using the low-cost PurpleAir PM 2.5 monitoring network in Los Angeles County for 2018
and 2019, including more than 2.5 million observations. Regulatory PM 2.5 monitoring networks are
essential for ensuring compliance with air quality standards that
protect public health, and have led to important reductions in air pollution over time, especially in
southern California (Lurmann et al., 2015). However, the sparse and uneven distribution of regulatory
monitors limits their ability to capture intra-urban air pollution contrasts that are closely linked
to environmental justice issues. The rise of low-cost air quality sensor networks has increased
our ability to capture PM 2.5 gradients at the neighborhood-scale and collect more spatially and
temporally resolved PM 2.5 data for epidemiological and environmental justice studies.
In contrast to previous studies, our PM 2.5 predictive model was developed at the hourly
level. The majority of earlier PM 2.5 models were annual, seasonal, monthly, or daily (Bi, Stowell,
et al., 2020; Brokamp et al., 2018; G. Chen et al., 2018; Cohen et al., 2017; Huang et al., 2018,
2019). Only a few studies constructed air pollutant models at the hourly level using land-use
regression (LUR) models, with divergent accuracy levels (R² ranging from 0.07 to 0.80) and a
focus on traffic-related pollutants and their variation (e.g., black carbon, NO 2, PM 2.5) (Dons et al.,
2013; Masiol et al., 2018; Weissert et al., 2019). Our hourly PM 2.5 predictive model achieved
high prediction accuracy in random, spatial, and temporal cross-validation (random CV R² = 0.93,
spatial CV R² = 0.88, and temporal CV R² = 0.90), which is on par with or exceeds the performance
of published random forest based models, especially for the southern California region (Bi,
Stowell, et al., 2020; X. Hu et al., 2017; Huang et al., 2018, 2019). The hourly-resolved
spatiotemporal PM 2.5 model results can be useful for epidemiological studies to predict PM 2.5
exposure at the neighborhood scale for individual hours, especially for exploring the association
between PM 2.5 exposure and acute health effects or disease, which has been limited by the lack of
fine resolution data.
In addition, our model was able to predict hourly PM 2.5 not only for regularly occurring, typical
air quality conditions but also for less frequent and perhaps more extreme conditions like holidays
and wildfire days, an increasingly frequent and public health relevant issue in southern California.
Future developments in hourly models can further improve air quality prediction for influential
events (e.g., wildfire) and provide citizens guidance on outdoor physical activities if applied in
real-time prediction mode.
Our model integrated a large number of variables shown in previous studies to be informative for
PM 2.5 modeling, with the exception of satellite remote sensing aerosol optical
depth (AOD) as its current temporal availability (2 snapshots per day) is still limited for sub-
daily applications. However, newer geostationary satellites are expected to increase the
availability of hourly AOD data for modeling applications in the near future. Since air pollution
emission, dispersion, and transformation in the atmosphere are a result of many complex
mechanisms and interrelated processes, machine learning methods are increasingly useful for
modeling these nonlinear relationships and perform better than ordinary linear models (e.g., LUR
model). Similar to recently published spatiotemporal PM 2.5 machine-learning models (Brokamp
et al., 2018; Di et al., 2016; X. Hu et al., 2017), the inclusion of both spatial and temporal
convolution layers of PM 2.5 to capture correlations between nearby PM 2.5 measurements
significantly improved cross-validated predictions. These autocorrelation terms ranked as the
highest in terms of feature importance for predicting PM 2.5 concentrations. In future studies, the
convolution layers can also be expanded to include other features, like meteorological
conditions.
Finally, field calibration of low-cost sensors based on collocation against reference-grade
monitors is highly desirable and has been reported in previous studies (Carvlin et al., 2017; Jiao
et al., 2016; Liu et al., 2019). However, there are no PA sensors collocated with reference-grade
monitors in our study area (distance <10m) and no established universal approach to calibrate
data from PA sensors (Rai et al., 2017; Williams et al., 2019). Therefore, we conducted a
stringent quality control process to filter erroneous readings and identify anomalies in low-cost
PM 2.5 measurements in lieu of calibration, and incorporated reference PM 2.5 data and
meteorological conditions as predictors instead. This rigorous quality control process enables us
to minimize erroneous records when developing our hourly PM 2.5 concentration model.
One potential limitation is that our prediction model shows a pattern of underestimation
in predicting relatively high PM 2.5 concentrations. This underestimation is likely due to skewness
of hourly PM 2.5 data compared to longer averaging times (daily or annually), where high
concentrations appear less frequently than less extreme concentrations. In this study, we utilized
random oversampling in the cross-validation to increase the representation of high PM 2.5
observations in the training set and provide a more balanced distribution. Studies have shown
that a balanced dataset offers improved overall performance compared to an imbalanced dataset
for several machine learning algorithms (Estabrooks et al., 2004; Laurikkala, 2001). However,
since random oversampling simply appends replicated data to the original data set, observations
in certain classes may be overfitted by the model (He & Garcia, 2009). Other advanced sampling
methods in imbalanced learning applications may be used in further studies such as Synthetic
Minority Over-sampling Technique (SMOTE) and Adaptive Synthetic Sampling Approach for
Imbalanced Learning (ADASYN). Both oversampling methods synthetically generate new data
points for the minority classes, therefore avoiding losing the data variance caused by artificial
duplication of existing data points (Chawla et al., 2002; He et al., 2008).
5. Conclusion
In this study, we developed a machine learning model integrating spatially dense measurements
from a low-cost air sensor network, highly resolved traffic data as well as a suite of
spatiotemporal variables to estimate hourly intra-urban PM 2.5 distribution patterns in Los
Angeles. Although issues related to model performance and prediction errors are continuously
evolving and need to be better understood, the growing prevalence of low-cost air sensors has the potential
to strongly empower citizen science, increase monitoring coverage in urban, suburban and rural
areas, fill knowledge gaps about hyperlocal air quality, and improve understanding of the impact of
extreme events and conditions on air quality. Ultimately, these sub-daily models will provide
valuable exposure estimation data to look at the impact of PM 2.5 on acute health and related
environmental justice issues.
CHAPTER 3: Beyond Air Pollution at Home: Assessment of Personal Exposure to PM 2.5 Using
Activity-Based Travel Demand Model and Low-Cost Air Sensor Network Data
2
Abstract
Assessing personal exposure to air pollution is challenging due to the limited availability of
human movement data and the complexity of modeling air pollution at high spatiotemporal
resolution. Most health studies instead rely on residential estimates of outdoor air pollution,
which introduces exposure measurement error. Personal exposure for 100,784 individuals in Los
Angeles County was estimated by integrating human movement data simulated from the
Southern California Association of Governments activity-based travel demand model with
hourly PM 2.5 predictions from a 500m gridded model incorporating low-cost sensor monitoring
data. Individual exposures were assigned considering PM 2.5 levels at homes, workplaces, and
other activity locations. These dynamic exposures were compared to the residence-based
exposures, which do not consider human movement, to examine the degree of exposure
measurement error. The results suggest that exposures were underestimated by 13% (range 5-22%)
on average when human movement was not considered, and much of the error was
eliminated by accounting for work location. Exposure measurement error increased for people
who exhibited higher mobility levels, especially for workers with long commute distances.
Overall, residence-based estimates underestimated the personal exposures of workers by 22% (5-61%).
For workers who commute more than 20 miles, exposure
can be underestimated by as much as 61%. Omitting mobility resulted in underestimating
exposures for people who reside in areas with cleaner air but work in more polluted areas.
2
Lu, Y. (2021). Beyond air pollution at home: Assessment of personal exposure to PM2.5 using activity-based
travel demand model and low-cost air sensor network data. Environmental Research, 201, 111549.
Similarly, exposures were overestimated for people living in areas with poorer air quality and
working in cleaner areas. These could lead to differential estimation biases across racial, ethnic
and socioeconomic lines that typically correlate with where people live and work and lead to
important exposure and health disparities. This study demonstrates that ignoring human
movement and spatiotemporal variability of air pollution could lead to differential exposure
misclassification potentially biasing health risk assessments. These improved dynamic
approaches can help planners and policymakers identify disadvantaged populations for which
exposures are typically misrepresented and might lead to targeted policy and planning
implications.
1. Introduction
Exposure to air pollution is known to cause many adverse health outcomes, such as
cardiovascular disease (Madrigano et al., 2013; Neophytou et al., 2014; Zhang et al., 2018),
respiratory morbidity (Bose et al., 2015), and diabetes (Yang et al., 2018). Epidemiological
studies have shown that exposure to air pollution is the second leading cause of non-
communicable disease worldwide (Neira et al., 2018). Among all air pollutants, particulate
matter with a diameter of less than 2.5 μm (i.e., PM 2.5) is known to be the most detrimental to
human health (Park & Kwan, 2020; Song et al., 2019). According to World Health Organization
(WHO) guidelines, long-term exposure to PM 2.5 concentrations exceeding 50 μg/m³ is associated
with increased mortality caused by lung cancer (World Health Organization, 2006). Many
studies conclude that short-term exposure to PM 2.5 can increase PM-related mortality, asthma
exacerbations, heart attacks, strokes, premature death, emergency department visits, and hospital
admissions (Shi et al., 2016; Song et al., 2019; Zanobetti et al., 2014). Therefore, an accurate
estimation of individual exposure to PM 2.5 is needed to better understand the adverse health
impacts of PM 2.5.
Individual exposure to air pollution occurs through dynamic spatiotemporal interactions
between individuals and air pollutant distribution (Beckx et al., 2009; Dewulf et al., 2016; Dias
& Tchepel, 2018; Park & Kwan, 2017; Steinle et al., 2013). However, this dynamic characteristic
presents two significant challenges for accurate assessments of exposure to air pollution and its
health risks: estimation of air pollution distribution in high spatiotemporal resolution and the
limited availability of human movement data. In many prior air pollution health studies, models
that use the station or monitor-based records are commonly utilized to estimate the air pollution
distribution (Dons et al., 2014; Habermann et al., 2015; Novotny et al., 2011; Su et al., 2008;
Weichenthal et al., 2016). Nevertheless, as regulatory monitoring is designed to support
compliance with ambient air quality standards mainly at the city or regional levels, it lacks
spatial coverage to capture the spatial variability of air pollutant concentrations (Bi, Wildani, et
al., 2020; Rowangould, 2013). The recent rise in low-cost sensor technology has created new
opportunities to measure air quality with higher spatiotemporal coverage (Fishbain et al., 2017;
Jerrett et al., 2017; Kumar et al., 2015; Williams et al., 2019). Several recent studies have
integrated data from a variety of recently emerging low-cost sensors to estimate ambient PM 2.5
concentrations at fine spatiotemporal resolution (Bi, Stowell, et al., 2020; Bi, Wildani, et al.,
2020; J. Li et al., 2020; Lu et al., 2021). Compared to conventional regulatory air monitors, the
dense low-cost air sensors improve the accuracy of ground-level PM 2.5 estimations.
Estimating population exposure to ambient PM 2.5 is challenging. In most existing
research, population exposure estimates have ignored spatially and temporally varying
populations present in the study area. The study subject, the person, is assumed to be static, and
the population data are based on place of residence from the US Census. But people move
throughout the day to conduct activities such as work, school or household maintenance (Bae et
al., 2007; Jerrett et al., 2014; Rowangould, 2013; Zhang et al., 2018). Each person’s exposure
level depends on air quality at all locations visited over the course of the day. When these
locations are not taken into account, exposure estimates will be biased to the extent that air
quality varies from one place to another (Cole-Hunter et al., 2018; Dias & Tchepel, 2018;
Kwan, 2013; M. Li et al., 2019; M. M. Nyhan et al., 2019; Park & Kwan, 2017, 2018, 2020; Yoo
et al., 2015). Many studies find that exposure is underestimated in suburban and rural areas,
while exposure within the urban core may be overestimated when population mobility is not
considered (Dewulf et al., 2016; Picornell et al., 2019; Tayarani & Rowangould, 2020; X. Yu et
al., 2020).
More recent studies have recognized the importance of human mobility and characterized
human movement patterns using travel diaries or questionnaires (e.g., Kim & Kwan, 2021b;
Park, 2020a; Park & Kwan, 2017a), GPS-based wearable devices (e.g., De Nazelle et al., 2013;
Do et al., 2021; J. Ma et al., 2020; Nieuwenhuijsen et al., 2015; Wang et al., 2018; X. Yu et al.,
2019), and mobile phone or social media data (e.g., Dewulf et al., 2016; Guo et al., 2020; M.
Nyhan et al., 2016; M. M. Nyhan, Britter, et al., 2019; Picornell et al., 2019; Song et al., 2019; H.
Yu et al., 2018; X. Yu, Ivey, Huang, Gurram, & Sivaraman, 2020). However, several limitations
remain for these approaches. With travel diaries and questionnaires, bias can be introduced by
confusing questions or by questions framed in a way that leads participants to answer in a
particular direction. Bias is also a problem when people need
assistance to complete the forms (Crosbie, 2006; Freeman et al., 1999; Freeman & De Tejada,
2002; Steinle et al., 2013). For GPS-based wearable devices, the cost of personal
monitoring and sampling limits the number of samples, thereby reducing the reliability
of results (Apte et al., 2017; De Nazelle et al., 2013). Regarding mobile phone and social media
data-based methods, representativeness is still problematic as the data collected only reflects the
individual movement characteristics of people who used the application at a specific time.
Possible bias in the mobility patterns can occur for people who are less likely to carry mobile
phones or use social media (e.g., children and elderly).
To date, few studies have integrated air pollution concentrations at high spatiotemporal
resolution with human movement data for personal exposure assessment.
To fill the gap, I assess personal exposure by coupling hourly PM 2.5 concentrations at 500 meter
resolution with human movement data in Los Angeles County. A random forest model was
developed to estimate hourly averaged ground-level PM 2.5 concentrations incorporating factors
affecting PM 2.5, including meteorological, traffic, topographical, and land-use variables (Lu et
al., 2021). Human movement data was obtained from an activity-based travel demand model
developed by the Southern California Association of Governments (SCAG). This exposure
assessment examines how ignoring individual mobility and PM 2.5 variations can bias
PM 2.5 exposure estimates and how this exposure measurement error differs
between individuals.
2. Data and Method
2.1 Study area
I evaluated exposure measurement error for the population residing in Los Angeles County in
California (Figure 11). Los Angeles County was selected for the case study because of its severe
PM pollution and diverse demography. Los Angeles County is a large and sprawling region
located on the west coast, covering 4,753 square miles. In 2019, more than 10 million people
resided in Los Angeles County, making it the second most populated metropolitan area in the
United States. Los Angeles is well known for its unsatisfactory air quality records, and this area
is one of the metropolitan areas with the most severe PM pollution in the United States
(American Lung Association, 2020). Several studies concluded that PM pollution is one of the
most concerning public health hazards in Los Angeles (Habre et al., 2021; Hasheminassab et al.,
2014; Lurmann et al., 2015).
The area is also attractive from an exposure inequality perspective because of its
sociodemographic diversity. Los Angeles County has a large population of ethnic minorities.
According to the U.S. Census, 26.1% of the county population in 2019 was non-Hispanic White,
48.6% Hispanic, 9.0% African American, and 15.4% Asian. The significant
demographic diversity and possible complexity of interactions between air pollution and human
mobility in Los Angeles County make it an appropriate case study area.
Figure 11 The study area of Los Angeles County, showing daily PM 2.5 concentration distribution
on a typical weekday in 2019.
2.2 Data
2.2.1 Activity-based travel demand modeling
The daily activity and travel patterns of individuals were modeled using the SCAG Activity-
Based Model (ABM). SimAGENT is the base framework of the SCAG ABM, which comprises
three core modules to simulate individual mobility and synthetic demographic features (Bhat et
al., 2013; Pendyala et al., 2012; Ziemke et al., 2015). The first is the PopGen module, a synthetic
population generator capable of synthesizing a population while simultaneously controlling for
known distributions of household and person-level attributes. The second is the CEMSELTS
module, which is capable of modeling the medium- and long-term socioeconomic choices of
individuals. The third is the CEMDAP module, a daily activity and travel scheduling module that
can simulate individual behaviors and produce the complete daily activity-travel pattern of each
individual in the synthetic population, outlining the sequence of activities that a person
undertakes during a day.
ABM is an agent-based model of human activity (including travel) that can provide
high-resolution information in both time and space and hence shows promise for predicting
disaggregate exposures under different urban transportation and land-use policy scenarios.
Several studies from the Netherlands (Beckx et al., 2009), the United States (Gurram et al., 2019;
Tayarani & Rowangould, 2020), and Canada (Hatzopoulou & Miller, 2010) provide the seminal
work using ABM to estimate exposure to air pollution. One advantage of ABM data over travel
diary data and GPS trajectory data is that researchers can generate a large dataset according to
their needs. In contrast to mobile phone and social media data, ABM data include the simulated
demographic and socioeconomic characteristics of each trip maker, which provides an
opportunity for exploratory analysis. Detailed analytical procedures can be developed and
refined for implementation when real data are collected and become available.
The SCAG ABM provides us with the estimated daily travel activity pattern of each
person residing in six counties of Southern California (Imperial, Los Angeles, Orange, Riverside,
San Bernardino, and Ventura) for a typical weekday in 2019. The model output included more
than 19.5 million trip records from about 5 million Los Angeles County residents. The data
contained information about each trip's origin and destination, purpose, departure time, arrival
time, and travel mode (Table 4), along with demographic and socioeconomic information about
the trip maker and their household. The origin and destination of trips generated from the model
are assigned to 5,769 traffic analysis zones (TAZs) in the study region, similar in size and
geography to the census block group. The TAZs are used to identify the location of households
and activities. The SCAG ABM data were validated against data from the American Community
Survey (ACS) 2003, Census 2000, and a combination of survey data and traffic observations
(Bhat et al., 2013; Pendyala et al., 2012). The validation results indicate that the model predicts
each individual’s “activity purpose-number” and replicates the characteristics of the population quite
well. A comparison of modeled trip flows for various purposes with survey data shows less than
a 5% difference for most trip flows (Bhat et al., 2013). Most of the synthetic population’s demographic
and socioeconomic characteristics deviated by less than 5% from the ACS and Census survey data.
Although SCAG ABM data provides detailed spatiotemporal information on the fixed-
location activities of individuals, it does not model their travel routes/paths between different
activity locations. Therefore, we used the OSMnx Python package to estimate the probable travel
paths between activity locations, assuming that each individual takes the shortest path
(Boeing, 2017). OSMnx is a free, open-source Python package that downloads
political/administrative boundary geometries, building footprints, and street networks from
OpenStreetMap. It allows researchers to perform network analysis and transportation routing
in Python. The OSMnx package can generate realistic travel routes because its routing
algorithm reflects different roadways from OpenStreetMap and the speed limits of varying road
segments. In addition, the OSMnx package can retrieve routable networks from OpenStreetMap
with different transport modes based on simulated travel modes for each trip (e.g., driving,
walking, cycling, and public transit) from SCAG ABM data. Travel time is also considered in the
route estimation.
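The shortest-path assumption can be illustrated with a minimal sketch. It is not the OSMnx workflow used in the study: the toy network, the node labels, and the travel times below are hypothetical, and a plain Dijkstra search over a dictionary stands in for routing on a real OpenStreetMap graph.

```python
import heapq

def shortest_path(graph, origin, destination):
    """Dijkstra search; graph maps node -> {neighbor: travel time in minutes}."""
    best = {origin: 0.0}          # lowest known cost to reach each node
    previous = {}                 # back-pointers used to rebuild the route
    queue = [(0.0, origin)]
    visited = set()
    while queue:
        cost, node = heapq.heappop(queue)
        if node in visited:
            continue
        visited.add(node)
        if node == destination:
            break
        for neighbor, minutes in graph.get(node, {}).items():
            new_cost = cost + minutes
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))
    if destination not in best:
        return None, float("inf")
    path = [destination]
    while path[-1] != origin:
        path.append(previous[path[-1]])
    path.reverse()
    return path, best[destination]

# Hypothetical toy network (not SCAG or OpenStreetMap data).
toy_network = {
    "5910": {"A": 6.0, "B": 4.0},
    "A": {"6226": 10.0},
    "B": {"A": 1.0, "6226": 14.0},
    "6226": {},
}
route, minutes = shortest_path(toy_network, "5910", "6226")
print(route, minutes)
```

Weighting edges by travel time rather than length is one possible choice; a separate toy network per travel mode would mirror the mode-specific networks described above.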
Table 4 Example of an individual’s trajectory data from SCAG ABM.
Person ID   Trip origin*   Trip destination*   Trip purpose   Depart time   Arrival time
1           5910           6226                Work           7:50          8:10
1           6226           6141                Eat out        12:15         12:21
1           6141           6226                Work           12:40         12:45
1           6226           6020                Shop           17:50         18:01
1           6020           5910                Home           18:30         18:40
* Four-digit Traffic Analysis Zone (TAZ) number
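To show how trip records of this kind translate into time spent at each location, the following sketch converts the Table 4 example into stay intervals. It assumes, for illustration only, that the person starts and ends the day at the home TAZ (5910) and that the trips are sorted by departure time; the function names are hypothetical.

```python
def to_minutes(clock):
    """Convert an 'H:MM' clock string to minutes after midnight."""
    hours, minutes = clock.split(":")
    return int(hours) * 60 + int(minutes)

def stay_intervals(home_taz, trips):
    """Return (taz, start_minute, end_minute) stays implied by ordered trips.

    Assumes the day starts and ends at home; time spent traveling is not
    part of any stay and is recovered afterwards as the remainder of the day.
    """
    stays = []
    location, since = home_taz, 0
    for origin, destination, depart, arrive in trips:
        stays.append((location, since, to_minutes(depart)))
        location, since = destination, to_minutes(arrive)
    stays.append((location, since, 24 * 60))
    return stays

# Trip records copied from Table 4.
trips = [
    ("5910", "6226", "7:50", "8:10"),
    ("6226", "6141", "12:15", "12:21"),
    ("6141", "6226", "12:40", "12:45"),
    ("6226", "6020", "17:50", "18:01"),
    ("6020", "5910", "18:30", "18:40"),
]
stays = stay_intervals("5910", trips)

minutes_by_taz = {}
for taz, start, end in stays:
    minutes_by_taz[taz] = minutes_by_taz.get(taz, 0) + (end - start)
transit_minutes = 24 * 60 - sum(minutes_by_taz.values())
print(minutes_by_taz, transit_minutes)
```

Under these assumptions the person spends 790 minutes at home, 550 minutes at the work TAZ, and 52 minutes in transit.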
A clustered random sampling of the population was used in this study. Using a sample
from the full model output decreases the computational burden of assigning and retaining
information from all 19.5 million trips in the study region. Using clustered sampling (i.e., at TAZ
level) ensures that the selected sample is representative at each TAZ. Finally, 387,398 daily
movement trajectories from 100,784 individuals were used in the exposure assessments. The
difference between the characteristics of individuals and their trip attributes in the sample and
complete synthetic population is not statistically significant (Table 5).
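A minimal sketch of sampling at the TAZ level is given below. It draws the same fraction of residents from every home TAZ, which is one way to keep each TAZ represented; the sampling fraction, the rule of keeping at least one person per TAZ, and the synthetic person list are assumptions for illustration and are not taken from the SCAG data.

```python
import random
from collections import defaultdict

def sample_by_taz(persons, fraction, seed=0):
    """Draw the same fraction of residents from every home TAZ.

    persons: list of (person_id, home_taz) pairs.
    At least one person is kept per TAZ (an assumption of this sketch).
    """
    rng = random.Random(seed)
    by_taz = defaultdict(list)
    for person_id, home_taz in persons:
        by_taz[home_taz].append(person_id)
    sample = []
    for taz in sorted(by_taz):
        residents = by_taz[taz]
        k = max(1, round(len(residents) * fraction))
        sample.extend(rng.sample(residents, k))
    return sample

# Hypothetical population: 200 people spread evenly over four TAZs.
persons = [(i, f"TAZ{i % 4}") for i in range(200)]
sample = sample_by_taz(persons, fraction=0.02)
print(sample)
```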
Table 5 Descriptive statistics of complete SCAG ABM data and sample data.
                                      Complete Data   Sample Data
Race/Ethnicity
  Non-Hispanic White                  25.31%          25.05%
  Hispanic                            49.94%          50.07%
  African American                    7.63%           7.71%
  Asian                               14.34%          14.37%
Age                                   34.07           33.99
Gender (% of Male)                    49.68%          49.65%
Household Size
  1-2                                 24.70%          24.70%
  3-4                                 37.75%          37.49%
  5 and more                          37.55%          37.81%
Household Income
  Low income (< $46,429)              34.74%          34.95%
  Middle income ($46,429-$92,358)     30.55%          30.58%
  High income (> $92,358)             34.72%          34.47%
Occupation
  Worker (full time)                  48.04%          47.93%
Education attainment
  With Bachelor or higher degree      42.18%          42.42%
Average Trip Distance (Miles)         7.74            7.72
Trip Purpose
  Home                                32.19%          32.19%
  Work                                20.47%          20.55%
  Educational                         3.13%           3.10%
  Other                               44.21%          44.16%
2.2.2 Air quality modeling
The ground-level PM 2.5 concentrations used in this study were estimated using my recently
developed hourly 500 × 500 m gridded PM 2.5 model, which integrates PurpleAir low-cost air sensor
network data and the Random Forest algorithm for Los Angeles County (Lu et al., 2021). Hourly
PM 2.5 data were collected from the low-cost PurpleAir sensor network, which provides real-time
measurements of PM 2.5 concentrations. Each PurpleAir sensor consists of two Plantower PMS
5003 laser sensors. Because of their low cost, PurpleAir sensors have a relatively dense
deployment and are capable of capturing more localized gradients at high spatial resolution
compared to the standard federal reference method (FRM) or federal equivalent method (FEM) monitors. Several
recent studies have utilized PurpleAir sensors to estimate ambient PM 2.5 concentrations at fine
spatiotemporal resolution (Bi, Wildani, et al., 2020; Lu et al., 2021; Mousavi & Wu, 2021). Prior
studies have shown that low-cost PM 2.5 sensors are able to predict PM 2.5 concentrations with
reasonable accuracy (Jiao et al., 2016; Kelly et al., 2017; Sayahi et al., 2019).
An hourly PM 2.5 prediction model is then developed using the Random Forest approach
and integrating measurements from the PurpleAir sensor network, high resolution traffic data,
and a suite of spatiotemporal variables (e.g., meteorological and land-use variables) to estimate
PM 2.5 distribution in Los Angeles County at 500 meter resolution. The Random Forest approach
has been increasingly applied to estimate air pollution since it can model the complex and
nonlinear relationships between observations and predictor variables and offer flexible and
automated procedures for predicting target variables (Bi, Stowell, et al., 2020; Bi, Wildani, et al.,
2020; Di et al., 2016; X. Hu et al., 2017; Huang et al., 2018, 2019). The PM 2.5 estimations were
evaluated against the ground-level PM 2.5 measurements using the 10-fold cross-validation
method. The results showed that the hourly PM 2.5 Random Forest model could capture more than
90% of the hourly PM 2.5 variations in Los Angeles County. A comprehensive description of the
hourly PM 2.5 prediction model used in this study can be found in Lu et al. (2021). The hourly
PM 2.5 estimates were spatially matched to each TAZ, in line with the human movement data,
and averaged within each TAZ when multiple grid cells intersected the same TAZ.
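The grid-to-TAZ matching step can be sketched as follows. The sketch assumes that the spatial intersection has already been computed elsewhere and is supplied as a lookup from grid cell to the TAZs it intersects; the identifiers and values are hypothetical, and an unweighted mean is used because the text does not specify area weights.

```python
from collections import defaultdict

def taz_hourly_mean(grid_values, grid_to_taz):
    """Average gridded hourly PM2.5 over the grid cells intersecting each TAZ.

    grid_values: {(grid_id, hour): pm25 in ug/m3}
    grid_to_taz: {grid_id: [taz, ...]}, precomputed spatial intersection
    Returns {(taz, hour): mean pm25}.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for (grid_id, hour), pm25 in grid_values.items():
        for taz in grid_to_taz.get(grid_id, []):
            totals[(taz, hour)] += pm25
            counts[(taz, hour)] += 1
    return {key: totals[key] / counts[key] for key in totals}

# Hypothetical example: cell g2 straddles two TAZs.
grid_to_taz = {"g1": ["T1"], "g2": ["T1", "T2"]}
grid_values = {("g1", 8): 10.0, ("g2", 8): 14.0}
taz_pm25 = taz_hourly_mean(grid_values, grid_to_taz)
print(taz_pm25)
```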
2.3 Method
Human exposure to air pollution has often been estimated for aggregated zones such as census tracts or
ZIP codes (Zhang et al., 2018). However, studies indicate that the difference in acute
exposures can be substantial even between two neighboring locations. A Southern California study
shows that acute PM 2.5 concentrations can range from approximately 50 μg/m³ (Redlands, CA) to
greater than 1,000 μg/m³ (Moreno Valley, CA) in peak hours, further justifying the need for
individual-level analysis of exposure risk (Do et al., 2021). SCAG ABM data provides us with
the simulated human movement trajectories on a typical weekday in 2019. Therefore, I applied
the prediction model to estimate hourly PM 2.5 concentrations in Los Angeles County on a
randomly selected weekday (Wednesday, September 18, 2019), consistent with the previous
study (Lu et al., 2021). Each individual’s exposure to PM 2.5 on that day was then measured by
integrating hourly PM 2.5 maps and human movement data.
2.3.1 Static and Dynamic exposure
This study estimates two types of individual exposures. Each individual’s static exposure is
assessed based on the daily average PM 2.5 concentration at their home. The static exposure does
not consider individual mobility patterns and assumes that individuals stay home all day. The SCAG
ABM data provides the TAZ where each individual's home is located. I then calculate the
weighted daily PM 2.5 concentration within each TAZ from the hourly PM 2.5 model.
Each individual's dynamic exposure is estimated by matching detailed movement
trajectory data with modeled PM 2.5 concentrations at the corresponding locations. For this
integration, it is assumed that PM 2.5 concentrations are constant over an hour and that the indoor PM 2.5
concentration equals the ambient concentration at the same location. The dynamic exposure is estimated as
in Eq. (7).
$$\mathrm{Exp}_{j} = \sum_{t=1}^{T} \sum_{n=1}^{N} \mathrm{PM}_{2.5,\,i,t} \times \mathrm{Time}_{i,n,t} \qquad (7)$$
where $\mathrm{Exp}_{j}$ denotes the dynamic exposure of individual $j$, $i$ and $t$ denote the
corresponding activity location and time during a day, $T$ denotes 24 hours of a day, $N$ represents
the total number of microenvironments experienced by individual $j$ within a specified temporal
window (e.g., an hour) and $n$ denotes the $n$th microenvironment, $\mathrm{PM}_{2.5,\,i,t}$ denotes the ambient
PM 2.5 concentration at location $i$ during time $t$, and $\mathrm{Time}_{i,n,t}$ denotes the percentage of time
during a day (24 hours) that the individual stays in the $n$th microenvironment at location $i$ during
time $t$.
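The two exposure measures can be sketched in code under simplifying assumptions. The static exposure is taken here as the unweighted mean of the 24 hourly concentrations at the home TAZ (the weighting used in the study is not reproduced), and the dynamic exposure follows the time-weighted sum in Eq. (7). The concentrations, TAZ names, and stay intervals are hypothetical, and the stays are assumed to cover the full day, with any travel segments supplied as additional stays.

```python
def static_exposure(pm25, home_taz):
    """Mean of the 24 hourly concentrations at the home TAZ."""
    hourly = pm25[home_taz]
    return sum(hourly) / len(hourly)

def dynamic_exposure(pm25, stays):
    """Time-weighted exposure in the spirit of Eq. (7).

    pm25: {taz: [24 hourly concentrations]}
    stays: list of (taz, start_minute, end_minute) covering the day.
    Each stay is split at hour boundaries so that every piece is multiplied
    by the concentration of its own hour and its share of the 1,440 minutes.
    """
    total = 0.0
    for taz, start, end in stays:
        minute = start
        while minute < end:
            hour = minute // 60
            next_edge = min(end, (hour + 1) * 60)
            total += pm25[taz][hour] * (next_edge - minute) / 1440
            minute = next_edge
    return total

# Hypothetical person: home is cleaner (8 ug/m3) than the workplace (12 ug/m3).
pm25 = {"home": [8.0] * 24, "work": [12.0] * 24}
stays = [("home", 0, 480), ("work", 480, 1020), ("home", 1020, 1440)]
static = static_exposure(pm25, "home")
dynamic = dynamic_exposure(pm25, stays)
print(static, dynamic)
```

In this example the static estimate is 8.0 μg/m³ while the dynamic estimate is about 9.5 μg/m³, because nine hours are spent at the more polluted workplace.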
2.3.2 Exposure measurement error
“Exposure measurement error” is defined as the disparity between an individual’s static
exposure and dynamic exposure. Three indicators were adopted in this study to explore the
exposure measurement error: (1) mean percentage error (MPE), (2) mean absolute percentage
error (MAPE), and (3) bias factor. MPE and MAPE were used to quantify the signed and absolute
relative differences between static exposure and dynamic exposure (Eq. (9) and Eq. (10)).
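$$\mathrm{MPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left( \frac{\mathrm{Static}_{i} - \mathrm{Dynamic}_{i}}{\mathrm{Static}_{i}} \right) \qquad (9)$$
$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{\mathrm{Static}_{i} - \mathrm{Dynamic}_{i}}{\mathrm{Static}_{i}} \right| \qquad (10)$$
where $\mathrm{Static}_{i}$ denotes the static exposure of individual $i$, $\mathrm{Dynamic}_{i}$ denotes the dynamic
exposure of individual $i$, and $n$ indicates the total number of studied individuals in this study.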
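As a small worked sketch of Eq. (9) and Eq. (10), the functions below compute both indicators for three hypothetical individuals. The values are illustrative only and are not drawn from the study data.

```python
def mpe(static, dynamic):
    """Mean percentage error, Eq. (9): the sign shows the direction of the error."""
    n = len(static)
    return 100.0 / n * sum((s - d) / s for s, d in zip(static, dynamic))

def mape(static, dynamic):
    """Mean absolute percentage error, Eq. (10): magnitude only."""
    n = len(static)
    return 100.0 / n * sum(abs((s - d) / s) for s, d in zip(static, dynamic))

# Hypothetical exposures in ug/m3 for three individuals.
static = [8.0, 10.0, 5.0]
dynamic = [10.0, 9.0, 5.0]
print(mpe(static, dynamic), mape(static, dynamic))
```

Here the MPE is about -5.0% and the MAPE about 11.7%: the under- and overestimates partly cancel in the MPE, which is why the two indicators are reported together.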
Furthermore, I also calculated the expected bias factors to quantify potential biases if
individual daily mobility is not considered (M. M. Nyhan et al., 2019; E. Setton et al., 2011; E.
M. Setton et al., 2008; X. Yu et al., 2020). General classic bias theory suggests that:
$$Z = X + E \qquad (11)$$
where $Z$ is the value of the surrogate (or observed) measure, $X$ is the true value, and $E$ is the bias
in measuring $X$. For the calculations, $Z$ is assumed to be analogous to the static exposure, the
dynamic exposure is equivalent to $X$, and $E$ is the bias associated with the corresponding static
exposure. For the classical bias model, $E$ is assumed to be independent of $X$, but in this data set,
$E$ shows varying degrees of correlation with $X$. Therefore, in the presence of the correlation
between $E$ and $X$, the bias factor can be calculated according to the following equation
(Wacholder, 1995):
$$B = \frac{\sigma^{2} + \varphi}{\sigma^{2} + 2\varphi + \omega^{2}} \qquad (12)$$
where $\sigma^{2}$ is the variance of the dynamic exposures ($X$) of all subjects, $\varphi$ is the
covariance between the dynamic exposures ($X$) and errors in exposure estimation ($E$), and $\omega^{2}$ is
the variance of errors in exposure estimation ($E$). The bias factor $B$ represents the expected bias
in relative risk estimates when the static exposure method is applied. For example, a bias factor
($B$) of 0.75 suggests that the bias caused by applying the static exposure method would lead to
the relative risk being underestimated by 25%.
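The bias factor in Eq. (12) can be computed directly from paired static and dynamic exposures, as in the sketch below. Population (not sample) variances and covariances are used here as an assumption, and the four exposure pairs are hypothetical. With Z = X + E, the numerator of Eq. (12) equals Cov(X, Z) and the denominator equals Var(Z), which gives a convenient cross-check.

```python
from statistics import pvariance

def covariance(xs, ys):
    """Population covariance of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)

def bias_factor(static, dynamic):
    """Bias factor of Eq. (12), with E = Z - X = static - dynamic."""
    errors = [z - x for z, x in zip(static, dynamic)]
    sigma2 = pvariance(dynamic)        # variance of dynamic exposures X
    omega2 = pvariance(errors)         # variance of errors E
    phi = covariance(dynamic, errors)  # covariance between X and E
    return (sigma2 + phi) / (sigma2 + 2 * phi + omega2)

# Hypothetical data: static exposures are more spread out than dynamic ones.
dynamic = [7.0, 8.0, 9.0, 10.0]
static = [6.0, 8.0, 10.0, 12.0]
b = bias_factor(static, dynamic)
print(b)
```

For these values B is 0.5, which under the interpretation given above would correspond to the relative risk being underestimated by 50%.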
Previous studies indicate that time spent at work was one of the strongest predictors of
personal exposure to air pollution. They contend that time spent at work contributes most to the
differences in exposure (Kousa et al., 2001; E. Setton et al., 2011; E. M. Setton et al., 2008;
Spengler et al., 1986). Accordingly, all samples are further subdivided into two groups based on
their employment status, workers and nonworkers, and the exposure measurement error is
calculated for both groups.
This study also investigates the impact of residence and work location on personal
exposure to understand how exposure misestimation varies between populations (Cole-Hunter et
al., 2018). Kim and Kwan (2021) refer to the impact of residence and work locations on
personal exposure as “neighborhood effect averaging”, which could lead to upward averaging
(amplification) or downward averaging (attenuation) of individual exposure if individual
mobility is not considered. The hypothesis is that the exposures of people who live in low-
pollution neighborhoods tend to be underestimated as they are highly likely to travel to higher-
pollution neighborhoods when undertaking their daily activities. Conversely, people who live in
high-pollution neighborhoods are more likely to travel to lower-pollution neighborhoods when
undertaking their daily activities. Thus, their exposure levels tend to be overestimated without
considering their daily activities (Kim & Kwan, 2021b, 2021a; Kwan, 2018).
To explore how people’s residence and work locations affect individual exposures, TAZs
in the study region were classified as low-pollution and high-pollution TAZs based on the daily
PM 2.5 concentration level. All TAZs’ daily PM 2.5 concentrations were lower than
EPA’s 24-hour standard for PM 2.5 (i.e., 35 μg/m³) on the study date. Hence, the median value of
daily PM 2.5 concentrations (i.e., 8.4 μg/m³) was used as the classification threshold. All TAZs with a daily PM 2.5
concentration greater than 8.4 μg/m³ were categorized as high-pollution TAZs, while TAZs with
a daily PM 2.5 concentration smaller than 8.4 μg/m³ were classified as low-pollution TAZs (Figure
11). Static and dynamic exposures were then calculated for all workers based on the categories of
where they live and work (low-pollution TAZ, high-pollution TAZ) and commuting distances (0-5,
5-10, 10-20, and above 20 miles) to evaluate the impact of residence and work location and
mobility on exposures. These population subgroups were chosen because previous work has
suggested potential disparities in their exposure (E. Setton et al., 2011; E. M. Setton et al., 2008;
X. Yu et al., 2020).
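The grouping rule can be sketched as follows. The text does not state how a TAZ exactly at the median or a commute exactly at a bin boundary is handled, so the tie rules below (median counts as low-pollution, boundaries fall into the upper bin) are assumptions, as are the four hypothetical TAZ concentrations.

```python
from statistics import median

def classify_tazs(daily_pm25):
    """Label each TAZ 'high' if its daily PM2.5 exceeds the median, else 'low'.

    A TAZ exactly at the median is labeled 'low' (an assumption of this sketch).
    """
    cutoff = median(daily_pm25.values())
    labels = {taz: ("high" if value > cutoff else "low")
              for taz, value in daily_pm25.items()}
    return labels, cutoff

def commute_bin(miles):
    """Assign a commuting distance to one of the four bins used in the text."""
    if miles < 5:
        return "0-5"
    if miles < 10:
        return "5-10"
    if miles < 20:
        return "10-20"
    return "above 20"

# Hypothetical daily concentrations (ug/m3) for four TAZs.
daily_pm25 = {"A": 6.1, "B": 7.9, "C": 8.9, "D": 10.4}
labels, cutoff = classify_tazs(daily_pm25)

# A worker living in TAZ B and working in TAZ C with a 12-mile commute.
group = (labels["B"], labels["C"], commute_bin(12.0))
print(cutoff, group)
```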
3. Result
3.1 Spatiotemporal distribution of PM 2.5 concentrations and population activity
Figure 12 shows the spatial distribution of PM 2.5 and population location of the chosen sample
throughout an average weekday (September 18, 2019) in Los Angeles County. As shown in
Figure 12a–d, PM 2.5 shows heterogeneous spatial distribution patterns across the day. The
concentration maps show relatively high concentrations of PM 2.5 along major roadways and in the
urban core during the morning peak hour. At the evening peak hour, less severe PM 2.5 pollution
is found across the study area. This may be due to differences in meteorology between morning
and evening (e.g., calmer winds and lower mixing heights in the morning). The population
density maps show how people in the region generally move from their residences, which are
distributed sparsely in space (Figure 12e), to employment clusters and activity centers along the
region's major highways (e.g., Wilshire corridor) and urban cores (e.g., Downtown Los Angeles)
during the day (Figure 12f-g) and then back home again in the evening (Figure 12h). The
patterns shown in Figure 12 reveal how a large portion of the population moves into the highest
concentration areas during the day for work, school, and other activities.
Figure 12 Distribution of average hourly PM 2.5 concentration at different times (a-d) and
population density (e-h).
3.2 Individual exposure assessment
The static and dynamic PM 2.5 exposures were computed for all individuals, workers, and
nonworkers. The results are shown in Table 6. The mean static exposure of all
individuals was 8.13 μg/m³, and the mean dynamic exposure was 8.18 μg/m³. In terms of the
exposure measurement error between the static and dynamic exposure, the MPE and MAPE were
-1.2% and 4.4%, respectively. The bias factor associated with using the static exposure surrogate
instead of the dynamic exposure was 0.87, indicating that the estimated relative risk for
PM 2.5 would be underestimated by 13% when mobility is ignored during exposure estimation.
As described in Table 6, the mean static and dynamic exposures for nonworkers
were 8.12 μg/m³ and 8.13 μg/m³, while the MPE and MAPE were -0.3% and 2.5%, respectively.
The bias value of 0.95 suggests the exposure will be underestimated by only 5% for nonworkers
if other activity locations are not considered. As such, the static and dynamic exposures of
nonworkers were quite similar. Conversely, workers were found to have a larger exposure
measurement error when other activity locations are not considered. The mean static and dynamic
exposures of workers were 8.14 μg/m³ and 8.24 μg/m³. The MPE and MAPE were much larger in magnitude
for workers (-2.1% and 6.4%) than for nonworkers, with a smaller bias value of 0.78,
suggesting that exposure levels may be underestimated by 22% when place of work is not taken
into account.
Table 6 Comparison between static and dynamic PM 2.5 exposure estimate (μg/m³) for the overall
population, workers, and nonworkers.
Group                       Exposure   Mean   Std. Dev.   Min    Max     MPE     MAPE   Bias
Overall (N = 100,784)       Static     8.13   1.44        2.65   11.97   -1.2%   4.4%   0.87
                            Dynamic    8.18   1.36        2.71   13.86
Worker (N = 48,309)         Static     8.14   1.44        2.65   11.97   -2.1%   6.4%   0.78
                            Dynamic    8.24   1.30        3.09   13.77
Non-worker (N = 52,475)     Static     8.12   1.44        2.65   11.97   -0.3%   2.5%   0.95
                            Dynamic    8.13   1.41        2.71   13.86
The exposure measurement error is caused by people spending time out of their homes,
particularly those with higher mobility. Table 7 summarizes the average fraction of a day spent
in each microenvironment. The average individual in Los Angeles County spends 66.4% of the
day at home, followed by 9.6% at work and 8.0% at school. The time spent in each
microenvironment differs by employment status. Table 7 indicates a significant difference in the
time spent by workers and nonworkers at different activity sites. As shown in Table 7, workers
spent more time at work and traveling than nonworkers, while nonworkers spent more time
at home and school than workers.
Table 7 also summarizes the contribution of each microenvironment to total PM 2.5
exposure for all populations, workers, and nonworkers. Broadly, the contribution of each
microenvironment to the exposure tends to be proportional to the amount of time spent in the
microenvironment. However, I found concentrations at home are on average lower than
elsewhere, while concentrations at work and school are higher. Home is the place that accounts
for the greatest amount of daily exposure to PM 2.5 because of the relatively large amount of time
spent at home. On average, exposure at home accounts for 60.7% of daily exposure, which is
smaller than the percentage of time spent at home during a day (66.4%). Work and school
contribute most of the remaining daily exposure, and they do so disproportionately: they
account for 11.9% and 10.2% of daily exposure but only 9.6% and 8.0% of
time spent. For workers, home locations were responsible for
54.1% of the total exposure, while they account for 66.9% for nonworkers. This difference is
mainly because nonworkers generally spend more time at home than workers. For workers,
exposure at work contributed significantly to the total exposure (24.6%), which may have led to
a larger total exposure for workers than for nonworkers. The results indicate that individual
mobility cannot be neglected in the estimation of exposure, and greater bias may occur in
exposure assessments if only exposure at residence is considered (Cole-Hunter et al., 2018;
Elliott & Smiley, 2019; Park, 2020; E. Setton et al., 2011; E. M. Setton et al., 2008; Tayarani &
Rowangould, 2020).
Table 7 The average fraction of a day spent in each microenvironment and contribution of each
microenvironment to total daily PM 2.5 exposure.
                           Overall (N = 100,784)     Worker (N = 48,309)       Non-worker (N = 52,475)
Microenvironment           % time    % exposure      % time    % exposure      % time    % exposure
Home                       66.4%     60.7%           60.3%     54.1%           72.0%     66.9%
Work                       9.6%      11.9%           20.1%     24.6%           0.0%      0.0%
School                     8.0%      10.2%           0.7%      0.8%            14.7%     19.0%
Family/personal business   4.3%      4.7%            5.0%      5.5%            3.7%      4.0%
Shopping                   2.2%      2.2%            2.7%      2.7%            1.7%      1.8%
Social/recreational        2.5%      2.5%            2.6%      2.6%            2.5%      2.4%
Other                      1.8%      2.1%            2.3%      2.7%            1.4%      1.5%
In transit                 5.2%      5.7%            6.3%      7.0%            4.1%      4.4%
% time = share of the day spent; % exposure = contribution to total daily PM 2.5 exposure.
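One way to read Table 7 is to divide each microenvironment's exposure share by its time share. For a single person this ratio equals the mean concentration in that microenvironment relative to the person's daily time-weighted exposure; applied to the population averages in the Overall columns it is only an approximate indicator. The sketch below uses the Home, Work, and School values from Table 7.

```python
# Overall columns of Table 7 (percent of day, percent of daily exposure).
time_share = {"Home": 66.4, "Work": 9.6, "School": 8.0}
exposure_share = {"Home": 60.7, "Work": 11.9, "School": 10.2}

# Ratio above 1: the place contributes more exposure than its time share,
# suggesting above-average concentrations while people are there.
ratio = {place: exposure_share[place] / time_share[place] for place in time_share}
for place, value in ratio.items():
    print(f"{place}: {value:.2f}")
```

The ratios are roughly 0.91 for home, 1.24 for work, and 1.28 for school, consistent with the observation that concentrations experienced at home are on average lower than those at work and school.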
Figure 13 portrays the exposure measurement error computed in each TAZ for all
individuals (Figure 13a), nonworkers (Figure 13b), and workers (Figure 13c), respectively. As
observed in Figure 13, the size and direction of exposure measurement error vary extensively
across the study area. Figure 13a shows that the exposure of individuals residing in low-pollution
TAZs (e.g., suburban and rural areas) was underestimated by up to 30% or more. In comparison,
the exposure of those living in high-pollution TAZs (e.g., urban cores) was overestimated by 10% or more when
individual mobility is not taken into account. The trend is more evident among workers (Figure
13c). The exposures of workers who live in low-pollution areas were more likely to be
underestimated, whereas those of workers who live in high-pollution areas were likely to be overestimated.
However, little exposure measurement error exists between static and dynamic exposures among
nonworkers, which could be attributed to their relatively low mobility.
Figure 13 Distribution of the percentage of the relative difference between static and dynamic
exposure at individual TAZs for (a) all individuals, (b) nonworkers, and (c) workers.
3.3 The impact of activity location and mobility on exposure estimation
The boxplots of static and dynamic exposures estimated for four worker groups based on their
residence and work locations are displayed in Figure 14. Workers account for nearly half of the
sample (48%). Of these, 35% of the workers live in low-pollution TAZs and work in
low-pollution TAZs, 13% of workers live in low-pollution TAZs
and work in high-pollution TAZs, 12% of workers live in high-pollution TAZs and work in low-
pollution TAZs, and 40% of workers both live and work in high-pollution TAZs. For each
worker group, their static and dynamic exposures are also calculated by different levels of
commuting distance.
Individual exposure differs by residence and work locations as well as mobility. Among
four worker groups, those living and working in high-pollution TAZs have the highest static and
dynamic exposures. In contrast, those living and working in low-pollution TAZs have the lowest
exposure levels. In general, the median values of static and dynamic exposures of workers living
and working in TAZs with the same air pollution levels (i.e., low-pollution and low-pollution,
high-pollution and high-pollution) were close to each other at all commuting distance levels,
whereas the lengths of boxes (i.e., the range between the first and third quartiles) varied across
different commuting distances (Figure 14a and 14d). For workers living and working in TAZs
with different air pollution levels (i.e., low-pollution and high-pollution, high-pollution and low-
pollution), we can observe a larger exposure measurement error, where the exposure
measurement error expands with increasing mobility (Figure 14b and 14c). However, the
direction of exposure measurement error varies with different combinations of residence and
work locations. As shown in Figure 14b and 14c, workers who live in low-pollution TAZs and
work in high-pollution TAZs have higher dynamic exposures than static exposures. In contrast,
the dynamic exposure of workers who live in high-pollution TAZs and work in low-pollution
TAZs is generally lower than their corresponding static exposure. These findings are consistent
with previous studies in the same area and elsewhere (Kim & Kwan, 2021a; Kwan, 2018; X. Ma
et al., 2020).
Figure 14 Boxplots of static and dynamic exposures estimated (μg/m³) for four types of workers
based on their residence and work locations and commuting distance: (a) workers live in low-
pollution TAZs and work in low-pollution TAZs; (b) workers live in low-pollution TAZs and
work in high-pollution TAZs; (c) workers live in high-pollution TAZs and work in low-pollution
TAZs; (d) workers live in high-pollution TAZs and work in high-pollution TAZs.
The exposure measurement error was also found to increase with mobility. The
estimated bias factors for the four worker groups at different commuting distance levels are
presented in Figure 15. With increased commuting distance, the estimated bias factors of
workers generally decrease regardless of their residence and work locations, the only
exception being workers living and working in high-pollution TAZs with commuting distances over
20 miles. The smallest bias factor, a value of 0.39, is observed for workers who live in high-
pollution TAZs but work in low-pollution TAZs with commuting distances over 20 miles. This
value suggests that the estimated relative risk for PM 2.5 exposure will be underestimated by 61%
when individual mobility is ignored during exposure estimation.
Figure 15 The impact of mobility on exposure measurement error factors for four worker groups.
The demographic and socioeconomic characteristics of four worker groups are
summarized in Figure 16. As shown in Figure 16, a larger portion of whites and high-income
workers live in low-pollution TAZs, while more ethnic minority and low-income workers live in
high-pollution TAZs. Table 8 shows that whites and high-income workers have lower exposure
levels than ethnic minorities and low-income people regardless of whether the individual
mobility is considered in exposure estimation. The results suggest that ethnic minority and low-
income groups are more heavily burdened with PM 2.5 pollution at residence, work, and
other activity locations than whites and high-income populations.
[Figure 15 axes: bias factor (0.30 to 1.00) by commuting distance (0-5, 5-10, 10-20, and >20 miles); series: low-pollution to low-pollution, low-pollution to high-pollution, high-pollution to low-pollution, and high-pollution to high-pollution.]
When human mobility is not considered, exposure will be underestimated for all
race/ethnic population groups, especially for whites and high-income populations. As shown in
Table 8, white and high-income workers have larger MPE (in magnitude) and MAPE values
but relatively low bias factor values compared with ethnic minority and low-income workers, indicating that the
exposures of white and high-income workers are more likely to be underestimated. At the same
time, ethnic minority and low-income people may not experience such underestimation in
exposure estimation. Conversely, their exposures are prone to being overestimated if mobility is
not taken into account. Thus, the difference in average PM 2.5 exposure between low- and
high-income people, and between whites and ethnic minorities, may become smaller when human mobility is
considered.
Figure 16 Summary statistics for four worker groups.
Table 8 Comparison between static and dynamic exposure estimates of PM 2.5 for workers based
on race/ethnicity and income level.
Group (N)                     Exposure   Mean   Std. Dev.   Min    Max     MPE     MAPE   Bias
White (N = 13,963)            Static     7.92   1.60        2.87   11.97   -2.9%   7.1%   0.79
                              Dynamic    8.06   1.44        3.94   13.43
Hispanic (N = 22,037)         Static     8.30   1.36        2.65   12.17   -1.5%   6.1%   0.77
                              Dynamic    8.35   1.22        3.09   12.93
Black (N = 3,557)             Static     7.89   1.36        3.14   11.97   -2.5%   6.4%   0.81
                              Dynamic    8.02   1.27        3.96   12.96
Asian (N = 7,605)             Static     8.24   1.33        2.65   12.75   -2.1%   6.0%   0.77
                              Dynamic    8.34   1.20        3.64   13.77
High-income (N = 20,252)      Static     8.00   1.44        2.65   12.75   -2.9%   6.9%   0.78
                              Dynamic    8.15   1.30        3.45   13.43
Middle-income (N = 15,429)    Static     8.14   1.47        2.65   12.17   -2.0%   6.5%   0.78
                              Dynamic    8.22   1.32        3.09   13.77
Low-income (N = 12,628)       Static     8.38   1.39        2.65   11.97   -1.0%   5.6%   0.80
                              Dynamic    8.40   1.26        3.10   13.36
4. Discussion
This study contributes to the environmental health literature by demonstrating that considering
both the spatiotemporal variability of air pollution and human movement data is vitally important
for accurate exposure assessments. It also presents geospatial methods that enable researchers to
take both of these variables into account. Using human mobility data from the SCAG’s activity-
based travel demand model and hourly PM 2.5 maps developed from a random forest approach, I
demonstrate that accounting for individual mobility and spatiotemporal variations of PM 2.5
generally results in higher exposure estimates for the study subjects in Los Angeles County. By
comparing static and dynamic exposures, I found that a static analysis of exposure to PM 2.5, in which all
exposure is assumed to occur at home, is likely to result in significant exposure measurement
error. Exposures at home, work, school, or while traveling can account for most people's daily
exposure depending on where they live and work. The results show that the exposures of people
who live in low-pollution areas and work in high-pollution areas are likely to be underestimated,
while the exposures of people who live in high-pollution areas and work in low-pollution areas
are often overestimated.
Consistent with previous studies, time spent at work was one of the strongest predictors
of personal exposure to air pollution (Kim & Kwan, 2021a, 2021b; E. Setton et al., 2011; E. M.
Setton et al., 2008). The results revealed that, when individual mobility is not considered, the
exposures of workers have significantly higher odds of being misestimated than those of
nonworkers. This study also explored the impact of mobility on PM 2.5 exposures. The results
showed that the impact of mobility on exposure could be substantial at the individual level,
particularly for individuals who are highly mobile. Ignoring mobility in exposure assessment
could lead to underestimating the relative risk of PM 2.5 exposure by up to 61% for workers living
in high-pollution TAZs and working in low-pollution TAZs with commuting distances over 20 miles.
The magnitude of underestimation is related to mobility level: populations with longer commuting
distances generally have larger exposure bias between static and dynamic exposure.
This study also implies sociodemographic disparities in personal exposure to PM 2.5.
Ethnic minority and low-income groups bear more risks of air pollution than white and high-
income populations. The results indicate that relative to Whites and high-income groups, ethnic
minority and low-income groups experience higher levels of toxic air pollution at home, work,
and other activity venues (Elliott & Smiley, 2019). Previous studies have concluded that the
residence-based PM 2.5 exposure of ethnic minority and low-income people is significantly higher
than that of white and high-income people (Chi et al., 2016; Gilbert & Chakraborty, 2011; Hajat
et al., 2015). This study suggests that exposures of white and high-income people are more likely
to be underestimated compared to ethnic minority and low-income people due to their relatively
high mobility. Thus, the difference in the average PM 2.5 exposure between White and ethnic
minority, low- and high-income people may become smaller when human mobility is considered.
Inaccurate assessments of human exposure to air pollution can lead to erroneous environmental
justice studies and ineffective public policies. Researchers and policymakers should consider
human mobility when estimating human exposure and developing subsequent policies.
These findings have important implications for all studies on mobility-dependent
exposures such as noise pollution and traffic emissions. First, studies on environmental
exposures need to consider human mobility and spatiotemporal variations of pollution;
otherwise, assessments of individual exposure could be erroneous. Second, ignoring human
mobility when examining social inequalities in environmental exposures might misinform
policymakers (Elliott & Smiley, 2019; Kim & Kwan, 2021a; X. Ma et al., 2020). Policymakers
should be aware of the effects of human mobility on individual exposures when developing
policy interventions to address the specific needs of disadvantaged social groups. High mobility
might attenuate the exposures of ethnic minority and low-income people in their residential
neighborhoods while amplifying the exposures of white and high-income people as they
generally live in areas with cleaner air. A more accurate estimate of exposure helps planners and
policymakers identify the populations disadvantaged in air pollution exposure and supports
targeted policy and planning. Furthermore, accurate assessments of human exposure
can be useful for epidemiological studies to explore the association between environmental
pollution exposure and health effects or disease, which has been limited by the lack of human
mobility data.
This study demonstrates several strengths in estimating PM 2.5 concentration and
individual mobility. First, I combine the machine learning approach (i.e., random forest) and
low-cost PurpleAir PM sensor data to estimate PM 2.5 concentrations at fine spatial and temporal
scales (hourly at 500m spatial resolution). Low-cost PM sensors, which have relatively dense
deployment and high spatiotemporal coverage and resolution, could supplement conventional
regulatory FRM/FEM monitor networks and have been increasingly used in air pollution
modeling (Bi, Stowell, et al., 2020; Bi, Wildani, et al., 2020; Kumar et al., 2015; Lu et al., 2021).
Additionally, the random forest approach used in this study has been widely utilized in air
pollution modeling as it enables the modeling of complex and nonlinear associations between
predictor variables (e.g., land use, meteorological conditions, and traffic flow) and the
outcome variable (i.e., PM 2.5 concentrations) (Di et al., 2016; X. Hu et al., 2017; Huang et al.,
2018, 2019). Such a modeling approach is superior not only because of its high practicality but
also because of its effective characterization of air pollution variations at finer spatiotemporal
scales, which to a large extent overcomes the limitations of the insufficient characterization of air
pollution variations in traditional modeling methods used in previous studies.
Second, this study uses an activity-based travel demand model to simulate people’s
trajectories and their synthetic demographics. The data can reveal the effect of ignoring
individual mobility on exposure measurement error and how that error differs by
sociodemographic characteristics. The model opens up new
opportunities for capturing individual mobility and can be applied to other populous regions
where travel activity survey data exists (Gurram et al., 2019; Park & Kwan, 2020; Tayarani &
Rowangould, 2020). Such detailed data offers a new large-scale population-representative
dataset of individual mobility and enables researchers to analyze subgroup differences to
investigate environmental injustice and health disparities.
However, there are several limitations to this study that should be addressed in future
research. First, since this research utilized data from one typical weekday (i.e., Wednesday,
September 18, 2019), it remains unclear whether the pattern of individual exposures and
exposure measurement error across populations would be similar on a different type of day,
such as a weekend, a holiday, or a day with an extreme event (e.g., a day with wildfire or during
the COVID pandemic). This uncertainty arises because ground-level PM 2.5
concentrations and people’s daily mobility patterns vary across seasons and over time (Kim & Kwan,
2021a; Lu et al., 2021). Therefore, future exposure studies should use data obtained at
various times. Second, the travel routes estimated in this study may not reflect the study subjects'
actual travel routes due to limitations of the OSMnx package. The exposures during travel have been
found to be higher than other activities although travel only accounts for a small fraction of daily
time (Gurram et al., 2019; Park, 2020). Future research should consider more factors that can
affect people’s decisions in choosing travel routes when simulating travel routes between two
locations, such as safety, personal preference, or road quality. Third, this study assumes that
indoor concentration is equivalent to outdoor concentration. However, being indoors may have a
protective effect from ambient air pollution sources (e.g., traffic emission). Indoor PM 2.5
concentration data, which are significantly affected by a variety of factors, including ventilation
capability and humidity conditions (Park, 2020), were not available for this study. Fourth, this study
is only applied to Los Angeles County. Los Angeles has its unique characteristics such as urban
structure, demographic composition, and land use layout. These characteristics shape the
mobility patterns of residents in Los Angeles. It is expected that patterns of population mobility,
the spatiotemporal variability of PM 2.5 concentrations, pollutant emission sources, and
meteorological conditions can vary substantially across different regions. Therefore, further studies
are needed to better understand how the findings from this study may apply to other areas.
5. Conclusion
Ignoring human mobility, air pollution variations, or both can lead to bias in exposure estimation.
In this study, simulated human mobility data for 100,784 individuals and hourly PM 2.5 data
collected from the low-cost PurpleAir sensor network were integrated to explore the impact of
individual mobility on exposure to PM 2.5 in Los Angeles. The results suggest that the traditional
residence-based exposure estimation method remains informative for people with low mobility
(e.g., nonworkers). However, the exposure levels of workers were
extensively underestimated, especially for those with long commuting distances. The magnitude
and direction of exposure measurement error were strongly associated with time spent and air
pollution levels at various activity locations. Additionally, although the difference between the
exposures of white and high-income populations and those of ethnic minority and low-income
populations decreased when individual mobility was considered, this study demonstrated that
ethnic minority and low-income people were generally exposed to greater PM 2.5 pollution. To
summarize, considering individual mobility in estimating exposure will help planners and
policymakers identify the populations disadvantaged in air pollution exposure and support targeted
policy and planning.
CHAPTER 4: Whose Exposure Levels Were Incorrectly Estimated: Assessing Classification
Errors in Individual’s PM 2.5 Exposure Using a Machine Learning Approach
Abstract
Due to a paucity of human movement data, the traditional method for estimating pollution
exposure is static: exposure is based on place of residence. However, local air quality varies over
both time and space. Therefore, a static estimate may be quite different from actual exposure.
This paper examines classification errors in static residence-based exposure estimation. Using a
random forest classification model, this study examines the impacts of a variety of factors on
potential classification errors in PM 2.5 exposure. Human movement data was combined with
hourly PM 2.5 surfaces to estimate and compare residence-based and mobility-based exposures for
100,784 Los Angeles County residents. All study subjects were subdivided into three groups
according to their exposure classification errors. The results show that exposure classification
errors increase for individuals with high mobility levels. Significant sociodemographic
disparities are observed across different exposure classification groups. As mobility levels
increase, the exposures of low-income people living in polluted neighborhoods are more likely to
be overestimated. In contrast, the exposures of high-income people living in neighborhoods with
cleaner air are likely to be underestimated. While low-income group exposure is overestimated
and high-income group exposure is underestimated, it is still the case that low-income groups are
exposed more on average.
1. Introduction
Human mobility has been demonstrated to be vital for accurate assessments of air pollution
exposure. It is also argued to be a major limitation of many epidemiological studies on social
inequalities in air pollution exposure (Elliott & Smiley, 2019; Guo et al., 2020; Kim & Kwan,
2021a; Lu, 2021; X. Ma et al., 2020; Park & Kwan, 2017; Shafran-Nathan et al., 2017). Due to a
paucity of data on diurnal human movement at the individual level, most exposure assessment
research concentrates on residential places, neglecting the reality that people spend time at other
venues throughout the day (Elliott & Smiley, 2019; Gilbert & Chakraborty, 2011; Park & Kwan,
2017; Shafran-Nathan et al., 2017). This residence-based method could introduce substantial
exposure measurement errors, subsequently misleading policymakers in developing air pollution
regulations and public health interventions.
Prior research has documented significant social inequalities and sociodemographic
disparities in exposure to air pollution at residence. A number of studies argue that ethnic
minorities and low-income residents bear disproportionate air pollution at places of residence
compared to their white and economically affluent counterparts (Bae et al., 2007; Elliott &
Smiley, 2019; Houston et al., 2004; Mohai & Saha, 2015; Rowangould, 2013). Yet, the
sociodemographic disparities in residence-based exposure across various population groups may
be erroneous when incorporating people’s mobility patterns in exposure assessment. People's
daily mobility can attenuate high residence-based exposure while amplifying low residence-
based exposure (Kim & Kwan, 2021a; Lu, 2021). When people travel outside of their residential
communities, they may encounter levels of exposure comparable to or different from those in their
residential neighborhoods. As a result of spatiotemporal variations in air pollution, a person’s
exposure can be higher or lower than their exposure at residence, depending on their mobility
levels (Kim & Kwan, 2021b). Therefore, overlooking variation of mobility effects on exposure
measurement can lead to exposure classification error: some people exposed to high air pollution
may be misclassified into groups with low exposure levels, and vice versa.
A growing number of empirical studies have documented mobility effects on individual
exposure across different study areas, including the U.S. (Kim & Kwan, 2021a; Lu, 2021; M. M.
Nyhan et al., 2019; Tayarani & Rowangould, 2020), Europe (Dewulf et al., 2016; Picornell et al.,
2019; Shafran-Nathan et al., 2017), and China (J. Ma et al., 2020; Xu et al., 2019; X. Yu et al.,
2020). However, factors that lead to exposure classification errors have not been well
understood. Yu et al. (2020) found that people who exhibit higher levels of mobility have larger
exposure classification errors. Shareck et al. (2014) argued that unequally
distributed features and resources across spaces might induce exposure disparities by
constraining the sites where people perform their everyday activities due to personal and societal
factors. For example, due to accessibility limitations, low-income groups
usually travel shorter distances from their homes than their high-income counterparts (Morency
et al., 2011; Vallée et al., 2010). Compared to whites, blacks and Latinos usually have lower
mobility levels because of financial constraints (L. Hu et al., 2020). Full-time employees tend to
travel longer daily distances (Järv et al., 2015; Morency et al., 2011; Páez et al., 2010), whereas
part-time employees and unemployed people are more place-bound (Lu, 2021; Vallée et al.,
2010). Educated people show more mobility than less-educated groups (L. Hu et al., 2020;
Vallée et al., 2010). Females spend more time at home and on non-work activities (e.g., grocery
shopping) than males (Park, 2020).
Place-based theory has long dominated exposure assessment in current epidemiology and
environmental justice research, shaping how policymakers construct environmental and public
health policies. Nonetheless, failing to account for variations in mobility effects between various
demographic groups may result in less effective policy solutions. This study examines (1) what
factors may cause classification errors in air pollution exposure measurement; (2) what
sociodemographic groups are more likely to be misclassified in exposure assessment; and (3) how
exposure mismeasurement will affect estimated health outcomes and potential policy solutions.
In this study, residence-based and mobility-based PM 2.5 exposures for a sample of Los
Angeles County residents were estimated by coupling hourly PM 2.5 surfaces and human
movement data. Random forest classification models were used to examine the impacts of a
series of mobility and sociodemographic variables on exposure classification error. The results
show that the magnitude of exposure classification errors is positively associated with human
mobility levels, and sociodemographic characteristics influence the direction of exposure
classification errors. On average, as mobility levels increase, the exposure of low-income
residents living in highly polluted areas tends to be overestimated, whereas exposure of high-
income residents living in areas with cleaner air is more likely to be underestimated. However,
even though exposure of the low-income is often overestimated, low-income groups are exposed
to more air pollution on average and their health risks of air pollution are likely to be
underestimated.
The remainder of the paper is organized as follows. Section 2 presents data and methods
used in this study. Section 3 summarizes results. Section 4 discusses the findings.
Conclusions are presented in Section 5.
2. Method
2.1 Study area
Los Angeles is well recognized for its notoriously severe air pollution problem as one of the U.S.
metropolitan regions with the highest level of particulate matter pollution (American Lung
Association, 2020). Los Angeles has the most developed highway system and the busiest traffic
in the U.S. PM 2.5 pollution, a primary air pollutant created by vehicles, has been a serious public
health problem in Los Angeles for decades. In Los Angeles, PM 2.5 concentrations vary spatially
and temporally, with the highest pollution observed during peak hours and within core urban
areas (Lu et al., 2021). Los Angeles is therefore a good case study to examine variation in
exposure levels across time and space taking daily travel patterns into account, and test whether
exposure patterns are related to sociodemographic population characteristics.
2.2 Data
2.2.1 PM 2.5 concentration data
Ground-level PM 2.5 data were obtained from the PM 2.5 model developed by Lu et al. (2021).
They have created an hourly, 0.25-kilometer gridded PM 2.5 model for Los Angeles County that
incorporates low-cost air sensor data (i.e., PurpleAir) and machine learning techniques. The
dense deployment of PurpleAir sensors allows them to capture spatiotemporal variations of
localized PM 2.5 concentrations at fine resolution (Bi, Wildani, et al., 2020; Lu et al., 2021;
Mousavi & Wu, 2021). Twenty-four hourly PM 2.5 concentration surfaces over the course of a
typical weekday on Wednesday, September 18, 2019, were generated at a 0.25-kilometer grid
level for Los Angeles County to assess individual’s PM 2.5 exposure.
2.2.2 Human movement data
Daily travel trajectories for 100,784 Los Angeles County residents were simulated using an
activity-based travel demand model developed by the Southern California Association of
Governments (SCAG) for an average weekday in 2019 (Pendyala et al., 2012; Ziemke et al.,
2015). American Community Survey (ACS) 2003 and Census 2000 have been used to validate
this SCAG simulated travel trajectory data. Validation results show that the SCAG activity-based
travel demand model has a good performance in predicting "activity purpose-number" and
mimicking corresponding population features at the individual level. According to the validation
results, the majority of the synthetic population deviated less than 5% from the reference group
in terms of demographic and socioeconomic characteristics (Pendyala et al., 2012).
The SCAG travel trajectory dataset contains 387,398 trip records for 100,784 Los
Angeles County residents (10% of total Los Angeles County population). Each trip record
includes a personal ID, origin-destination pair of the trip, trip purpose, trip departure and arrival
timestamps, trip duration, and travel mode. The personal ID is unique for each individual and is
used to connect with synthetic demographic features provided by the SCAG (Bhat et al., 2013;
Lu, 2021). The origin and destination of each trip is allocated to the geographic unit of the traffic
analysis zone (TAZ), whose size is similar to the census tract. The centroid of TAZ is assumed to
be each trip's origin or destination point.
However, the travel trajectory dataset lacks information on travel paths between the
activity sites. This study used the OSMnx python package to estimate probable travel paths
between two activity TAZs (Boeing, 2017; Lu, 2021). It is assumed that people would always
choose the shortest path distance between two activity locations when developing potential travel
paths. Table 9 lists all mobility and sociodemographic variables and their summary statistics
collected from the SCAG travel trajectory dataset.
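The shortest-path assumption can be illustrated with a minimal sketch. The study routed trips on the real road network with OSMnx; the toy network, node names, and the `shortest_path` helper below are hypothetical stand-ins, and Dijkstra's algorithm is used here only as one standard way to find a minimum-distance route.

```python
import heapq

def shortest_path(graph, origin, destination):
    """Return (total_length, node_list) for the minimum-length path (Dijkstra).

    graph: dict mapping node -> {neighbor: edge_length}
    """
    queue = [(0.0, origin, [origin])]
    visited = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == destination:
            return dist, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, length in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (dist + length, neighbor, path + [neighbor]))
    return float("inf"), []  # destination not reachable

# Hypothetical network: nodes stand for TAZ centroids or intersections,
# edge weights are road lengths in miles.
toy_network = {
    "home_taz": {"a": 2.0, "b": 5.0},
    "a": {"home_taz": 2.0, "b": 1.5, "work_taz": 6.0},
    "b": {"home_taz": 5.0, "a": 1.5, "work_taz": 2.0},
    "work_taz": {"a": 6.0, "b": 2.0},
}
length, route = shortest_path(toy_network, "home_taz", "work_taz")
```

In this toy case the route through `a` and `b` (5.5 miles) is chosen over the two more direct but longer alternatives, which is the behavior the shortest-path assumption imposes on every simulated trip.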
2.3 Individual exposure assessment
2.3.1 Static and dynamic exposure assessment
The study region was subdivided into 0.25 square kilometer hexagon grids. As noted earlier,
hourly PM 2.5 concentrations of each grid (twenty-four hours in total) were generated by utilizing
the PM 2.5 model developed by Lu et al. (2021) for Wednesday, September 18, 2019. Due to
limitations in computing resources, PM 2.5 concentrations are assumed to be constant during an
hour within each hexagon grid. The hourly PM 2.5 concentrations were spatially matched to each
TAZ in compliance with the travel trajectory data and averaged within each TAZ if multiple
PM 2.5 hexagon grids were located in the same TAZ. Two types of individual PM 2.5 exposures were
then assessed: (1) static PM 2.5 exposure at residence; and (2) dynamic PM 2.5 exposure that
considers individuals’ daily mobility patterns.
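The grid-to-TAZ averaging step can be sketched as follows. This is an illustration under assumptions: the grid-to-TAZ lookup is taken as given (the spatial matching rule itself is not specified here), and the function name, identifiers, and two-hour series are hypothetical.

```python
from collections import defaultdict

def average_grids_by_taz(grid_pm, grid_to_taz):
    """Average hourly PM2.5 over all hexagon grids assigned to the same TAZ.

    grid_pm: dict grid_id -> list of hourly concentrations (same length for all grids)
    grid_to_taz: dict grid_id -> taz_id (assumed to come from a prior spatial match)
    Returns dict taz_id -> list of hourly mean concentrations.
    """
    grouped = defaultdict(list)
    for grid_id, series in grid_pm.items():
        grouped[grid_to_taz[grid_id]].append(series)
    return {
        taz: [sum(hour_vals) / len(hour_vals) for hour_vals in zip(*series_list)]
        for taz, series_list in grouped.items()
    }

# Hypothetical example with two hours instead of twenty-four.
grid_pm = {"g1": [8.0, 10.0], "g2": [10.0, 12.0], "g3": [6.0, 6.0]}
grid_to_taz = {"g1": "T1", "g2": "T1", "g3": "T2"}
taz_pm = average_grids_by_taz(grid_pm, grid_to_taz)
```

Here TAZ `T1` contains two grids, so each of its hourly values is the mean of the two grid values, while `T2` keeps the values of its single grid.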
The individual static and dynamic exposures are estimated as in Eq. (13) and (14):
$$Static_i = \frac{\sum_{t=1}^{T} PM_{h,t}}{T} \tag{13}$$
$$Dynamic_i = \frac{\sum_{t=1}^{T} \sum_{n=1}^{N} PM_{n,t} \cdot P_n}{T} \tag{14}$$
where $PM_{h,t}$ is the PM 2.5 concentration in hour $t$ at TAZ $h$ where individual $i$’s home is
located. $T$ denotes the 24 hours of a day. $PM_{n,t}$ is the PM 2.5 concentration in hour $t$ at TAZ $n$ where
individual $i$ is located within hour $t$. $N$ represents the total number of TAZs (microenvironments)
in which individual $i$ has stayed during hour $t$ ($N \ge 1$). $P_n$ denotes the percentage of time during hour $t$
that individual $i$ stays in TAZ $n$.
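Eq. (13) and (14) can be sketched in a few lines of code. The function names, data layout, and the three-hour example are hypothetical and chosen only for illustration (the study uses twenty-four hours); the time shares within each hour are assumed to sum to 1.

```python
def static_exposure(pm_by_taz, home_taz):
    """Eq. (13): mean of the hourly PM2.5 concentrations at the home TAZ."""
    hourly = pm_by_taz[home_taz]
    return sum(hourly) / len(hourly)

def dynamic_exposure(pm_by_taz, hourly_locations):
    """Eq. (14): time-weighted mean of PM2.5 over the TAZs visited in each hour.

    hourly_locations: one entry per hour; each entry is a list of
    (taz_id, share_of_hour) pairs whose shares are assumed to sum to 1.
    """
    total = 0.0
    for t, visits in enumerate(hourly_locations):
        total += sum(pm_by_taz[taz][t] * share for taz, share in visits)
    return total / len(hourly_locations)

# Hypothetical three-hour day: at home, half an hour commuting into the work TAZ, at work.
pm_by_taz = {"home": [8.0, 8.0, 8.0], "work": [12.0, 12.0, 12.0]}
hourly_locations = [
    [("home", 1.0)],
    [("home", 0.5), ("work", 0.5)],
    [("work", 1.0)],
]
static_value = static_exposure(pm_by_taz, "home")
dynamic_value = dynamic_exposure(pm_by_taz, hourly_locations)
```

For this person the static estimate is 8.0 while the dynamic estimate is 10.0, the pattern described for people who live in a cleaner TAZ and work in a more polluted one.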
2.3.2 Exposure classification error
Prior research has documented the occurrence of exposure misclassification when human
mobility is not taken into account in exposure assessment (Guo et al., 2020; X. Yu et al., 2020).
Exposure of individuals who have high residence-based exposures is likely to be reduced by their
mobility, while exposure of individuals who have relatively low residence-based exposures is
likely to be increased (Kim & Kwan, 2021a; Kwan, 2018). All study subjects were subdivided
into three groups according to their exposure classification error levels: (1) individuals with
similar dynamic and static exposure levels, which is referred to as the “Accurate” group; (2)
individuals with higher static exposure levels than their dynamic exposure levels, which is
referred to as the “Overestimated” group; and (3) individuals with higher dynamic exposure
levels than their static exposure levels, which is referred to as the “Underestimated” group.
Two statistical indicators were employed to categorize exposure classification groups: (1)
exposure measurement error and (2) mean absolute percentage error (MAPE). The exposure
measurement error was calculated by subtracting an individual’s static exposure from their
dynamic exposure (i.e., $Dynamic_i - Static_i$). A positive exposure measurement error indicates
an individual’s exposure is underestimated, while a negative measurement error indicates
overestimated exposure. MAPE was adopted as an additional criterion to evaluate the degree of
agreement between an individual’s static and dynamic exposure levels:
$$MAPE_i = \left| \frac{Dynamic_i - Static_i}{Static_i} \right| \times 100\%$$
Higher MAPE values indicate larger differences between static and dynamic exposure levels as
a result of overestimated or underestimated exposures. The thresholds for exposure measurement
error and MAPE were set to ± 0.5 μg/m³ and 10%, respectively, to determine an individual’s
exposure classification group. A comprehensive description of the classification method is shown
in Eq. (15).
$$E_i = \begin{cases}
\text{Overestimated} & \text{if } Error_i < -0.5 \text{ and } MAPE_i > 10\% \\
\text{Accurate} & \text{if } -0.5 \le Error_i \le 0.5 \text{ and } MAPE_i \le 10\% \\
\text{Underestimated} & \text{if } Error_i > 0.5 \text{ and } MAPE_i > 10\%
\end{cases} \tag{15}$$
where $E_i$ denotes the exposure classification group that individual $i$ belongs to; $Error_i$ is
the exposure measurement error for individual $i$.
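The classification rule in Eq. (15) can be sketched as a small function. The function name and example values are hypothetical. Note that Eq. (15) as written leaves some combinations unassigned (for example, an error above 0.5 with a MAPE of 10% or less); returning `None` for those cases is a choice made for this sketch, not a rule stated in the study.

```python
def classify_exposure(static, dynamic, error_threshold=0.5, mape_threshold=10.0):
    """Assign an exposure classification group following Eq. (15).

    static, dynamic: exposures in ug/m3; thresholds default to 0.5 ug/m3 and 10%.
    """
    error = dynamic - static               # exposure measurement error
    ape = abs(error / static) * 100.0      # individual-level MAPE, in percent
    if error < -error_threshold and ape > mape_threshold:
        return "Overestimated"
    if error > error_threshold and ape > mape_threshold:
        return "Underestimated"
    if abs(error) <= error_threshold and ape <= mape_threshold:
        return "Accurate"
    return None  # combination not covered by Eq. (15); handling here is an assumption

# Hypothetical individuals (static exposure, dynamic exposure).
groups = [
    classify_exposure(8.0, 8.2),    # small difference
    classify_exposure(10.0, 8.0),   # static exceeds dynamic
    classify_exposure(6.0, 8.0),    # dynamic exceeds static
]
```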
2.4 Random forest classification model
This study utilized the random forest classification model to examine associations of a variety of
mobility and sociodemographic variables with exposure classification results. In contrast to
traditional linear regression, the random forest model can capture nonlinear relationships
between response variables and predictors and provide a flexible and automated process for
predicting target variables (Breiman, 2001). The random forest model generates a number of
decision trees and trains each decision tree independently using a random sample of the data.
This randomness contributes to the model being more robust than a single decision tree and less
prone to overfitting the training data. Furthermore, the random forest model is less sensitive
to the probable multicollinearity among sociodemographic variables, which violates the
assumption of independent predictors underlying many regression models.
In this study, 90% of the samples were randomly selected as the training set and the
remaining 10% were held out as the testing set to evaluate model performance. Since the classes were
unbalanced (79% Accurate group, 9% Overestimated group, and 12% Underestimated group), a
combination of the Synthetic Minority Over-sampling Technique (SMOTE) and random under-
sampling methods was utilized to resample the dataset until balanced training classes were
achieved (Chawla et al., 2002; He & Garcia, 2009). The class-wise sensitivity and specificity, as
well as the mean classification accuracy, were calculated to evaluate the random forest model
performance. Confusion matrices were utilized to calculate the specificity and sensitivity of
candidate models.
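The class-wise metrics described above can be derived from a confusion matrix as sketched below. This is an illustrative sketch only: the function name `class_metrics` is hypothetical, and the counts are made-up numbers chosen for easy arithmetic, not the study's confusion matrix.

```python
def class_metrics(confusion, labels):
    """Return {label: (sensitivity, specificity)} from a square confusion matrix.

    confusion[i][j] counts observations whose true class is labels[i] and
    whose predicted class is labels[j].
    """
    total = sum(sum(row) for row in confusion)
    metrics = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]                          # true positives
        fn = sum(confusion[k]) - tp                   # missed members of class k
        fp = sum(row[k] for row in confusion) - tp    # others predicted as class k
        tn = total - tp - fn - fp                     # everything else
        metrics[label] = (tp / (tp + fn), tn / (tn + fp))
    return metrics


labels = ["Accurate", "Overestimated", "Underestimated"]
# Hypothetical counts for illustration only; not the study's confusion matrix.
confusion = [
    [70, 15, 15],
    [10, 80, 10],
    [5, 15, 80],
]
metrics = class_metrics(confusion, labels)
```

With these toy counts, the Accurate class has a sensitivity of 0.70 (70 of 100 true members recovered) and a specificity of 0.925 (185 of 200 non-members correctly excluded).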
The optimal number of randomly sampled features at each node (m) and the number of decision
trees (k) were determined by minimizing the out-of-bag (OOB) error rate through iterative
cross-validation (Lu et al., 2021). The relative importance of each predictor variable was
determined using the mean decrease in accuracy based on OOB error. Partial dependence plots were
produced to depict the relationships between predictor variables and the probability of being
classified into a given class. A partial dependence plot demonstrates the marginal effect of a
predictor variable on the predicted response while averaging over the values of all other
variables in the model (Friedman, 2001).
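The idea behind a one-dimensional partial dependence curve can be sketched as follows: the variable of interest is forced to each grid value for every observation, and the model's predictions are averaged. The sketch is illustrative only; `partial_dependence` and `toy_predict` are hypothetical names, and `toy_predict` is a hand-written stand-in for a fitted classifier, with coefficients that are not estimated from any data.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction over `rows` with `feature` forced to each grid value.

    predict: callable taking a dict of feature values and returning a number
             (for a classifier, the predicted probability of one class).
    rows:    list of dicts, one per observation.
    """
    curve = []
    for value in grid:
        preds = [predict({**row, feature: value}) for row in rows]
        curve.append(sum(preds) / len(preds))
    return curve


# Hypothetical stand-in for a fitted model: the probability of the Accurate
# group falls with trip distance and hours out of home (toy coefficients).
def toy_predict(row):
    p = 0.9 - 0.005 * row["trip_distance"] - 0.02 * row["hours_out"]
    return max(0.0, min(1.0, p))


rows = [
    {"trip_distance": 5.0, "hours_out": 2.0},
    {"trip_distance": 20.0, "hours_out": 8.0},
    {"trip_distance": 60.0, "hours_out": 12.0},
]
curve = partial_dependence(toy_predict, rows, "trip_distance", [0.0, 50.0, 100.0])
```

Because the other variables keep their observed values while the target variable is overwritten, the resulting curve reflects the average prediction at each grid value rather than the prediction for any single individual.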
3. Results
3.1 Exposure classification error analysis
Three exposure classification groups were identified by examining the difference between
individuals’ static and dynamic exposures. Table 9 gives summary statistics. Figure 17 shows the
distributions of static and dynamic exposures for each group.
Table 9 shows that the Accurate group is the largest. For almost 80% of the observations
there is essentially no difference between static and dynamic exposures (the mean difference of
0.01 µg/m³ is statistically significant but not meaningful). Figure 17a shows how close the two
distributions are.
The Overestimated group is the smallest (9% of the study sample). For individuals in the
Overestimated group, mean static exposure was 0.96 µg/m³ higher than their dynamic exposure.
This difference is large (about 10% of static exposure) and statistically significant. This group
has the highest static exposure level of all groups. The Underestimated group accounts for the
remaining 12% of observations. The mean difference between static and dynamic estimates is
1.15 µg/m³, or about 17% of static exposure, even larger than the difference for the
Overestimated group. Figures 17b and 17c show the distributions for these groups.
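The group shares and relative differences quoted above can be reproduced from the group sizes and mean exposures reported in Table 9. The short check below is a sketch: expressing each difference as a percentage of the group's mean static exposure follows the MAPE definition, but applying it to group means rather than to individuals is an assumption made for illustration.

```python
# Group sizes and mean exposures (ug/m^3) as reported in Table 9.
groups = {
    "Accurate": {"n": 80411, "dynamic": 8.17, "static": 8.16},
    "Overestimated": {"n": 8655, "dynamic": 8.03, "static": 8.99},
    "Underestimated": {"n": 11718, "dynamic": 7.84, "static": 6.69},
}
total = sum(g["n"] for g in groups.values())

summary = {}
for name, g in groups.items():
    diff = g["dynamic"] - g["static"]  # dynamic minus static, as in the text
    summary[name] = {
        "share_pct": round(100 * g["n"] / total, 1),
        "difference": round(diff, 2),
        "relative_pct": round(100 * abs(diff) / g["static"], 1),
    }
```

The three group sizes sum to 100,784 individuals, and the relative differences come out at roughly 0.1%, 10.7%, and 17.2% for the Accurate, Overestimated, and Underestimated groups, respectively.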
Table 9 Summary statistics of static and dynamic PM 2.5 exposures (µg/m³) across exposure
classification groups (*** p<0.001).
Group | Measure | Mean | Std.Dev | Median | Minimum | Maximum
Accurate (N = 80,411) | Dynamic exposure | 8.17 | 1.38 | 8.51 | 2.71 | 12.27
 | Static exposure | 8.16 | 1.39 | 8.48 | 2.65 | 11.97
 | Difference | 0.01*** | | | |
Overestimated (N = 8,655) | Dynamic exposure | 8.03 | 1.06 | 8.16 | 3.35 | 11.46
 | Static exposure | 8.99 | 1.01 | 9.15 | 4.20 | 11.97
 | Difference | -0.96*** | | | |
Underestimated (N = 11,718) | Dynamic exposure | 7.84 | 1.16 | 8.06 | 3.06 | 13.86
 | Static exposure | 6.69 | 1.33 | 6.93 | 2.65 | 11.43
 | Difference | 1.15*** | | | |
Figure 17 Distribution of static and dynamic PM 2.5 exposure for exposure classification groups.
3.2 Descriptive analysis of variables
As shown in Figure 18, the three exposure classification groups exhibit distinct mobility patterns.
The Accurate and Underestimated groups make a similar number of daily trips, more than the
Overestimated group. The Overestimated and Underestimated groups share similar patterns in
travel distance and time: individuals in both groups travel longer distances and spend more time
away from home, whereas the Accurate group travels the shortest distances and spends the least
time away from home. Overall, the Overestimated and Underestimated groups have the highest
mobility levels in both the spatial and temporal dimensions, while the Accurate group has the
lowest.
As for the residential pollution level variable, the Overestimated group has the highest
pollution level at residence, with a mean index value of 6.36. The Underestimated group has the
lowest pollution level at residence, with a mean index value of 5.73.
Furthermore, the groups differ in sociodemographic characteristics. The average household
incomes of individuals in the Accurate, Overestimated, and Underestimated groups are $71,682,
$65,049, and $85,109, respectively. The shares of white individuals in the three groups are 24%,
25%, and 34%, and the shares of Hispanic individuals are 51%, 52%, and 40%, respectively. In
sum, the Underestimated group has the highest household income, the highest white percentage,
and the lowest Hispanic percentage among all groups, suggesting a high probability of exposure
being underestimated for white and high-income individuals. The two groups that have
substantial exposure measurement errors (i.e., the Overestimated and Underestimated groups)
have relatively higher percentages of workers. In terms of education, the Underestimated group
has the highest percentage of college-educated individuals (32%), followed by the Overestimated
group (24%) and the Accurate group (18%).
Figure 18 Distribution of selected mobility and sociodemographic variables for exposure
classification groups.
3.3 Random forest results
3.3.1 Model performance and variable importance
The descriptive analysis has revealed mobility and sociodemographic differences across
exposure classification groups. Random forest models were further trained using the same set of
mobility and sociodemographic variables to examine their correlation with exposure
classification errors. As noted earlier, three exposure classification groups were defined and the
random forest classification algorithm was used to develop a predictive classification model
based on individual’s mobility patterns, residential pollution level, and sociodemographic
characteristics. The hyperparameters for the random forest model were set to 1,500 decision trees
with a minimum of 50 samples per leaf.
The random forest model yielded a mean classification accuracy (adjusted across all
classes) of 71%. Figure 19a presents the confusion matrix for evaluating the random forest
model’s performance. The sensitivity values for the Accurate, Overestimated, and
Underestimated groups are 73%, 71%, and 70%, respectively, implying good agreement between
actual and predicted classifications.
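One way to read the class-adjusted accuracy is as the unweighted mean of the class-wise sensitivities, often called balanced accuracy. The text does not state the exact formula, so this reading is an assumption; the sketch below only shows that the three reported sensitivities are consistent with it.

```python
# Class-wise sensitivities as reported in the text.
sensitivity = {"Accurate": 0.73, "Overestimated": 0.71, "Underestimated": 0.70}

# Assumed definition: unweighted mean of class-wise sensitivities.
balanced_accuracy = sum(sensitivity.values()) / len(sensitivity)

# Reference point: a model that always predicts "Accurate" has sensitivities
# of 1, 0, and 0, so its balanced accuracy is one third.
majority_baseline = (1.0 + 0.0 + 0.0) / 3
```

Under this assumption the mean is about 71.3%, which rounds to the reported 71%. A trivial model that always predicts the majority class would score close to 80% on raw accuracy for data with these class shares but only about 33% on this measure, which is why a class-adjusted metric is informative for unbalanced classes.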
The relative contribution of each predictor variable to the random forest classification
results is shown in Figure 19b, sorted in order of importance. The variable importance ranking
shows that daily trip distance, hours spent out of home, household income, and residential
pollution level are among the most important features. By contrast, ethnicity, employment status,
and education play weaker roles in exposure classification errors. These findings
suggest that an individual’s exposure classification error is mainly associated with their mobility
level, income, and pollution level at residence.
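The mean decrease in accuracy behind such a ranking can be illustrated with a permutation sketch: one variable's values are shuffled across observations and the resulting drop in accuracy is averaged over repeated shuffles. In a random forest this is computed on each tree's out-of-bag samples; the simplified version below applies the same idea to a single evaluation set. The names `permutation_importance` and `toy_classifier` are hypothetical, and the classifier is a hand-written rule, not a fitted forest.

```python
import random


def accuracy(predict, rows, labels):
    hits = sum(predict(row) == label for row, label in zip(rows, labels))
    return hits / len(rows)


def permutation_importance(predict, rows, labels, feature, repeats=20, seed=0):
    """Mean drop in accuracy after shuffling one feature's column."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    drops = []
    for _ in range(repeats):
        column = [row[feature] for row in rows]
        rng.shuffle(column)  # break the link between this feature and the labels
        shuffled = [{**row, feature: v} for row, v in zip(rows, column)]
        drops.append(baseline - accuracy(predict, shuffled, labels))
    return sum(drops) / repeats


# Hypothetical rule-based stand-in for a fitted forest: it uses trip distance only.
def toy_classifier(row):
    return "Accurate" if row["trip_distance"] < 30 else "Not accurate"


pairs = [(5, 20), (10, 35), (15, 50), (25, 65), (40, 30), (55, 45), (70, 60), (90, 25)]
rows = [{"trip_distance": d, "age": a} for d, a in pairs]
labels = [toy_classifier(row) for row in rows]

importance_distance = permutation_importance(toy_classifier, rows, labels, "trip_distance")
importance_age = permutation_importance(toy_classifier, rows, labels, "age")
```

Shuffling a variable the classifier ignores (age in this toy example) leaves accuracy unchanged, so its importance is zero, whereas shuffling trip distance degrades accuracy and yields a positive importance.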
Table 9 Descriptive statistics of the mobility and sociodemographic variables used in analysis of
exposure classification errors.
Category | Variable | Description | Mean | Std.Dev. | Minimum | Maximum
Travel behavior | Trip number | The number of daily trips made by an individual | 3.84 | 2.05 | 2 | 18
 | Trip distance | The daily trip distance in miles an individual travels | 24.48 | 21.44 | 0.19 | 234.28
 | Hours stay out-of-home | The hours an individual spends out-of-home per day | 8.07 | 3.49 | 0 | 22.43
 | %Driving trip | Proportion of daily trips of an individual made by driving | 84.3% | 32.2% | 0% | 100%
 | %Public transit trip | Proportion of daily trips of an individual made by public transit | 2.5% | 14.1% | 0% | 100%
 | %Walk/Bike trip | Proportion of daily trips of an individual made by walking or bicycling | 10.0% | 25.8% | 0% | 100%
Residential pollution | Residence pollution level | The standard index showing overall air pollution level at places of residence (between 1 and 10): higher value means higher pollution | 6.10 | 1.16 | 2.32 | 9.62
Sociodemographic | Age | Age | 33.99 | 21.43 | 0 | 94
 | Income | Household income ($1,000) | 72.22 | 30.12 | 12.27 | 230.90
Categoric variables | Male | Dummy variable: 1 = male; 0 = female | | | 0 | 1
 | non-Hispanic White | Dummy variable: 1 = non-Hispanic White; 0 = otherwise | | | 0 | 1
 | Black | Dummy variable: 1 = Black; 0 = otherwise | | | 0 | 1
 | Hispanic | Dummy variable: 1 = Hispanic/Latino; 0 = otherwise | | | 0 | 1
 | Asian | Dummy variable: 1 = Asian; 0 = otherwise | | | 0 | 1
 | Worker | Dummy variable: 1 = employed; 0 = otherwise (unemployed, homemaker, student, or retired) | | | 0 | 1
 | College | Dummy variable: 1 = College or higher degree; 0 = otherwise | | | 0 | 1
Figure 19 Random Forest model performance: (a) confusion matrix of predicted exposure
classification groups; (b) variable importance rank.
3.3.2 Partial dependence analysis
The partial dependence plots illustrate the marginal effect of a single variable on the predicted
classification outcome. Based on the variable importance results, the partial dependence of the
classification probabilities on daily trip distance, hours spent out of home, household income,
and residential pollution level was examined. Figure 20 plots the partial dependence of the
abovementioned variables for all groups.
As shown in Figure 20a, increasing probabilities of an individual belonging to the
Accurate group were associated with shorter daily trip distance, fewer hours spent out of home,
and higher residential pollution levels. Household income displays a nonlinear relationship with
the probability of belonging to the Accurate group, with the largest marginal effect at around
$80,000. The middle column of Figure 20a shows a two-dimensional partial dependence plot of
daily trip distance and hours spent out of home, which explores the combined effect of the two
mobility variables on the probability of belonging to the Accurate group. The color scheme
represents different probability levels: yellow tones indicate lower probability, and purple tones
denote higher probability. The two-dimensional plot shows that individuals who travel longer
distances and spend more time away from home are least likely to be categorized into the
Accurate group.
A similar effect of mobility variables on classification probability can be observed in
Figures 20b and 20c. The probabilities of belonging to the Overestimated group and the
Underestimated group both grow with daily trip distance and hours spent out of home. The
two-dimensional plots indicate that exposure is more likely to be overestimated or
underestimated for people with high mobility levels. Although the effects of mobility variables
on the magnitude of exposure classification error were similar for the Overestimated group and
the Underestimated group, household income and residential pollution levels were associated
with opposite directions of exposure classification error for the two groups. Increasing
probabilities of the Overestimated group were associated with lower household income and higher
residential pollution levels (Figure 20b). By contrast, lower household income and higher
residential pollution levels were associated with reduced probabilities of the Underestimated
group (Figure 20c). These opposite associations suggest that, as mobility levels increased,
exposures were more likely to be overestimated for low-income residents living in highly
polluted areas, while exposures were more likely to be underestimated for high-income residents
living in areas with cleaner air.
Figure 20 Partial Dependence (PD) plots for the most important variables in the random forest
classification model for (a) the Accurate group, (b) the Overestimated group, and (c) the
Underestimated group.
3.4 Case studies in exposure classification error assessment
This study randomly selects one individual from each of the three exposure classification
groups and visualizes how their dynamic and static exposures change over time within a typical
weekday. Figures 21a-c present the daily exposure profiles of the selected individuals and the
change in dynamic and static exposure levels throughout a weekday. Figure 21d displays the
spatial distribution of the selected individuals’ daily activity places and probable travel paths.
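The contrast between the two kinds of estimate can be illustrated with a time-weighted average. The sketch below is a simplification under stated assumptions: concentrations are held constant at each location, the day is split into three hypothetical stays, and the function name `time_weighted_exposure` and all numbers are invented for illustration rather than taken from the study, which uses hourly PM 2.5 surfaces.

```python
def time_weighted_exposure(visits):
    """visits: list of (hours, concentration in ug/m^3) tuples covering the day."""
    total_hours = sum(hours for hours, _ in visits)
    return sum(hours * conc for hours, conc in visits) / total_hours


# Hypothetical weekday: 14 h at home, 2 h in transit, 8 h at work.
home_conc = 9.0
visits = [(14, home_conc), (2, 8.0), (8, 6.0)]

dynamic = time_weighted_exposure(visits)  # accounts for time away from home
static = home_conc                        # residence-based estimate
error = dynamic - static                  # negative: static overestimates
```

In this toy day the person lives in a more polluted area and works in a cleaner one, so the dynamic estimate (about 7.92 μg/m³) is lower than the static estimate (9.0 μg/m³) and the measurement error is negative.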
As shown in Figure 21, individuals from different groups exhibit a wide range of
variability in the differences between dynamic and static exposures. For example, for Person 1
from the Accurate group, the differences between his static and dynamic exposures are
indiscernible (Figure 21a), likely due to his low mobility level (Figure 21d). Person 2 from the
Overestimated group, on the other hand, has static exposures that are much greater than his
dynamic exposures for most of the day. Figure 21d shows that Person 2 travels from the highly
polluted area where he lives to regions with relatively low air pollution where he conducts
daily activities. The daily exposure profile of Person 3 shows the opposite pattern: Person 3’s
dynamic exposure levels are higher than his static exposure levels for most of the day
(Figure 21c). Figure 21d shows that Person 3 resides in an area with cleaner air while
conducting daily activities in highly polluted areas. Although Person 2’s exposure is
overestimated and Person 3’s exposure is underestimated, Person 2, who lives in a more polluted
area, is exposed to greater air pollution than Person 3, who lives in a neighborhood with
cleaner air (8.78 μg/m³ vs. 7.07 μg/m³).
It is worth noting that the difference between the dynamic exposures of Person 2 and
Person 3 (1.17 μg/m³) is considerably smaller than the difference between their static exposures
(3.28 μg/m³). This finding suggests that the exposure discrepancies between certain groups of
people may be diminished when human mobility is considered in exposure estimates.
Figure 21 Daily exposure profile for three selected individuals from different exposure
classification groups: (a) Person 1 from the Accurate group, (b) Person 2 from the Overestimated
group, and (c) Person 3 from the Underestimated group; and (d) spatial distribution of selected
individuals’ daily activity places.
4. Discussion
Recent research has shown that overlooking human mobility can lead to incorrect exposure
assessment, resulting in misleading research conclusions and inefficient policy solutions (Kim &
Kwan, 2021a; Lu, 2021; Park & Kwan, 2017). A growing amount of research has highlighted the
importance of human mobility in exposure assessment (Dewulf et al., 2016; X. Ma et al., 2020;
M. M. Nyhan et al., 2019; Park, 2020), but little is known about the factors contributing to
classification errors in exposure measurement and how these errors affect the relationship
between exposure and health outcomes. This study contributes to the literature by
investigating the underlying factors contributing to exposure classification errors. This study
indicates that mobility level is the most critical factor in determining exposure classification
errors. The exposure classification error increases with mobility levels and is substantial for
highly mobile individuals, especially those who travel long distances and spend more time away
from home.
The results also reveal a significant correlation between sociodemographic characteristics
and exposure classification errors. Among various sociodemographic variables, household
income has the greatest effect on exposure classification errors, likely due to the key role of
wealth in determining where people live, their occupations, and places people often visit
(Sampson, 2019). According to the results, household income mainly drives the
direction of exposure classification errors. Residential pollution level is another important factor
affecting exposure classification errors. On average, as mobility levels increase, exposure is
likely to be overestimated for low-income residents living in polluted neighborhoods, while
exposure is typically underestimated for high-income residents living in neighborhoods with
cleaner air. This finding is consistent with prior empirical research showing that the exposures
of people who have relatively low exposure at residence are likely amplified by their mobility,
while the exposure of people with high residential air pollution exposure is usually attenuated
(Dewulf et al., 2016;
Picornell et al., 2019; Tayarani & Rowangould, 2020; X. Yu et al., 2020). One probable
explanation is that people who live in neighborhoods with cleaner air are more likely to carry out
their daily activities in neighborhoods with poorer air quality. By contrast, people residing in
neighborhoods with high pollution are more likely to conduct their daily activities in
neighborhoods with less air pollution than in their places of residence (Kim & Kwan, 2021a; Lu,
2021).
Even though the relative exposure measurement error is larger for wealthier people, this is
because they live in areas with cleaner air and their residence-based exposures start out much
lower than those of the socially disadvantaged. The overall exposure to and burden of air
pollution are much higher among the more disadvantaged groups, as most of them live in more
polluted neighborhoods. People spend most of their time at home, even those with high mobility
levels (Lu, 2021; Park, 2020). If socially disadvantaged people stay within or near their
residential neighborhoods most of the day and spend much more time in transit to travel shorter
distances, either or both of these factors can lead to much worse exposures but smaller exposure
measurement errors. Given the significant contribution of residential air pollution to a
person’s overall exposure, although the exposure of people who live in polluted areas may be
overestimated, they are still likely to have higher exposure levels than those living in less
polluted areas.
Moreover, mismeasurement of air pollution exposure can bias the estimated association
between air pollution exposure and health outcomes and, in turn, bias estimates of public health
impact. The direction and magnitude of exposure measurement error can lead to incorrect
estimates of the exposure-disease relationship. Consider an example in which we want to estimate
the health risk of respiratory diseases after exposure to PM 2.5. When a person’s exposure is
overestimated, the effect of exposure to PM 2.5 on his risk of respiratory disease should be greater
than estimated because this health risk is due to lower levels of PM 2.5 exposure than those used in
the estimation. In other words, if residence-based exposure is used to estimate the impact of
exposure to PM 2.5 on health risks, this impact may be underestimated for those whose exposure
is overestimated. Conversely, for those whose exposure is underestimated, their risks of
respiratory diseases after exposure to PM 2.5 are likely to be overstated. Prior studies have
documented the disproportionate air pollution burdens of socially disadvantaged populations
(e.g., ethnic minorities, low-income) (Bae et al., 2007; Gilbert & Chakraborty, 2011; Houston et
al., 2004). This study suggests that the health risks of the disadvantaged after exposure to air
pollution may be further underestimated when exposure is mismeasured by ignoring human
mobility.
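The direction of this bias can be illustrated with a deliberately simple calculation in which a fixed health outcome is attributed to each unit of exposure. The proportional model, the function name `per_unit_effect`, and all numbers are hypothetical and serve only to show the direction of the bias, not its size in the study.

```python
def per_unit_effect(outcome, exposure):
    """Outcome attributed to each ug/m^3 of exposure (toy proportional model)."""
    return outcome / exposure


observed_outcome = 4.0  # hypothetical health outcome in arbitrary units
true_exposure = 8.0     # hypothetical mobility-based exposure, ug/m^3

true_effect = per_unit_effect(observed_outcome, true_exposure)

# Residence-based exposure overestimates the true exposure.
effect_if_overestimated = per_unit_effect(observed_outcome, 9.0)

# Residence-based exposure underestimates the true exposure.
effect_if_underestimated = per_unit_effect(observed_outcome, 7.0)
```

Dividing the same outcome by an inflated exposure yields a smaller per-unit effect than the true one, while dividing by a deflated exposure yields a larger one, which matches the reasoning in the paragraph above.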
Exposure mismeasurement can lead to an incorrect exposure-disease relationship and
therefore ineffective public health and environmental interventions. There is no gold standard
for addressing the impact of exposure measurement error in policy solutions. Policymakers should
account for human exposure at not just places of residence but also other locations of activity in
exposure measurement to ensure that their policies reflect the interests of all people when
designing applicable public health and environmental policies. Policies to change levels of
environmental exposures will often affect multiple pollutants, because these exposures share a
common source or arise due to similar activities and are often correlated (Dominici et al., 2012).
Therefore, it is important to account for measurement error in estimates of potential health
outcomes due to air pollution exposure.
Several limitations in this study need to be addressed in further research. First, the human
movement data used in this study were simulated from an activity-based travel demand model.
Because this model simulated people’s daily travel trajectories for a typical weekday in 2019, it
was assumed that individuals have constant activity patterns throughout the year. However,
people’s daily mobility patterns are not consistent over time and may vary across weekdays and
weekends or seasons (Susilo & Kitamura, 2005; Xianyu et al., 2017). It is unclear whether
travel behavior on other days (e.g., weekends, holidays) would generate exposure classification
error patterns similar to those identified in this study. Thus, more effort should be devoted
to studying how variation in travel behavior over time affects exposure classification error by
collecting human movement data covering multiple time periods.
Second, in this study, only ambient PM 2.5 exposure was estimated as data for indoor
PM 2.5 concentrations were not available. Nevertheless, according to previous studies, people
spend most of their time indoors (e.g., home, workplace, and school), especially those who are
less mobile (Lu, 2021; Park, 2020). Staying indoors may provide some protection from sources
of ambient air pollution (e.g., traffic emission), leading to different results in air pollution
exposure assessments. Future exposure research should consider both indoor and outdoor PM 2.5
concentrations to measure individual exposure accurately.
Third, Los Angeles has a distinctive demographic composition and land use layout. As a
result, the spatiotemporal mobility patterns and ground-level PM 2.5 concentration distribution
depicted in this study represent only the study area’s distinct features. Population mobility
patterns, spatiotemporal variability in air pollution concentrations, sociodemographic mix, and
land use layout are expected to vary across regions. Further research is needed to examine
whether findings from this study can be applied to other areas.
5. Conclusion
Ignoring human mobility in exposure estimates can lead to erroneous exposure assessments
and, in turn, ineffective policies. Prior research has emphasized the
importance of human mobility in assessing air pollution exposure, but little is known about
factors that might lead to exposure classification errors. To fill the literature gap, this study
estimates residence-based and mobility-based PM 2.5 exposures for 100,784 Los Angeles County
residents and examines the impact of mobility and sociodemographic variables on potential
classification errors in exposure measurement. Detailed travel trajectory data was integrated with
hourly PM 2.5 surfaces at fine spatial resolution. The findings suggest that the magnitude of
exposure classification errors is linked to people’s mobility levels, while the direction of
exposure classification errors is driven by sociodemographic variables. Individuals with high mobility
levels are likely to have high exposure classification errors. High income and low residential
pollution levels are associated with exposure underestimation, while low income and high
residential pollution levels are associated with exposure overestimation. This study also reveals
that while the low-income group’s exposure is overestimated and high-income group’s exposure
is underestimated, it is still the case that the low-income group is exposed more on average. The
exposure measurement error introduced by the residence-based method can further lead to
erroneous conclusions about the relationship between exposure and health risks. The findings
suggest that
policymakers should take people’s mobility and sociodemographic characteristics into account in
exposure assessment to ensure their policies reflect not only the preferences of the socially
advantaged but also the interests of the disadvantaged.
CHAPTER 5: Conclusions
This dissertation has focused on one of the fundamental urban planning problems:
environmental pollution and environmental justice. It addressed three questions on PM 2.5
exposure: (1) How does ambient PM 2.5 concentration vary in spatial and temporal dimensions in
a major metropolitan area? (2) What is the impact of taking individual travel and activity
location into account in exposure estimation? (3) Do these impacts differ across
sociodemographic groups?
Essay One developed a method for very fine spatiotemporal estimation of PM 2.5
concentration based on the citizen-science, low-cost PurpleAir sensor network in Los Angeles
County, which is much more densely deployed than the PM 2.5 monitors used for ambient air
quality measurement. I then developed a machine learning model integrating PM 2.5 data,
high-resolution traffic data, and a suite of spatiotemporal variables to estimate hourly
intra-urban PM 2.5 distribution patterns in Los Angeles. Results show that the PM 2.5 model
performs well in predicting PM 2.5 concentrations at a fine spatiotemporal scale, not only for
regularly occurring, typical air quality conditions but also for less frequent and perhaps more
extreme conditions such as holidays and wildfire days, which have become increasingly frequent
and relevant to public health in southern California. The growing prevalence of low-cost air
sensors has the potential to empower citizen science, increase monitoring coverage in urban,
suburban, and rural areas, fill knowledge gaps about hyperlocal air quality, and improve
understanding of the impact of extreme events and conditions on air quality. More importantly,
this spatiotemporal model will provide valuable exposure estimation data for examining the
effects of PM 2.5 on acute health and related environmental justice issues.
Recent studies have suggested that ignoring individual activity patterns or air pollution
variations can lead to errors and bias in exposure estimations (Guo et al., 2020; Kim & Kwan,
2021a; M. Nyhan et al., 2016; Park, 2020; Park & Kwan, 2017; X. Yu et al., 2020), but few have
examined both elements’ impacts on exposure measurement errors. In Essay Two, I took a step
further, using the Essay One model results and combining them with simulated individual daily
mobility patterns to examine the effects of considering individual daily activity patterns in
estimating exposure. I compared these estimates with conventional static estimates based on
residence location; the difference between the two is the exposure measurement error, which I
estimated for a sample of Los Angeles County residents. Results show that the exposure
measurement error increases with an individual’s mobility level, while the residence-based
exposure estimation method remains informative for people with low mobility. The contribution of various
microenvironments to total daily PM 2.5 exposure is also examined. It is found that the magnitude
and direction of exposure measurement errors are strongly associated with time spent out of
home and PM 2.5 concentrations at various activity locations.
In Essay Two I explored the impact of individual mobility on exposure measurement
errors, but little is known about whether such impact differs across sociodemographic groups.
Essay Three further examined how sociodemographic and mobility variables can affect exposure
measurement errors. I subdivided 100,784 Los Angeles County residents into three exposure
classification groups based on their PM 2.5 exposure measurement errors. A random forest
classification model is used to test the correlations between the probability of exposure
classification group and a variety of sociodemographic and mobility variables. Model results
show that the magnitude of exposure measurement errors is linked to individuals’ mobility
levels, while the direction of exposure measurement errors is driven by individuals’
sociodemographic characteristics, especially their income levels. Consistent with the conclusions
of Essay Two, individuals with high mobility levels are likely to have larger exposure
measurement errors. Disparate exposure measurement errors are identified across
sociodemographic groups: high-income individuals living in areas with low pollution are more
likely to have their exposure underestimated. In contrast, low-income individuals living in
areas with high pollution are more likely to have their exposure overestimated. Results also
suggest that the relatively large
exposure disparities between the socially disadvantaged and the privileged documented in prior
studies can be lessened by human mobility. These findings suggest that policymakers should take
individuals’ mobility and sociodemographic characteristics into account in exposure assessment
to ensure their policies reflect not only the preferences of the privileged but also the interests of
the disadvantaged.
The three essays have provided a set of important policy implications based on empirical
evidence. First, although issues related to model performance and prediction errors are
still evolving and need to be better understood, increasingly available low-cost air quality
sensors have the potential to be good supplements to conventional regulatory air pollution
monitoring networks due to their relatively low purchase and maintenance costs, dense
deployment, and increasing measurement accuracy. By providing real-time air pollution
monitoring at finer geographic scales, low-cost air quality sensors can support research on the
acute health effects of air pollution, increase citizens’ awareness of and engagement with air
quality issues, and strengthen emergency response management (e.g., monitoring real-time
smoke plume spread during wildfires) and source compliance monitoring.
Second, this dissertation suggests that individual mobility and air pollution variation are
the two principal determinants of an individual’s exposure level. Personal exposure to air
pollution occurs through dynamic spatiotemporal interactions between individuals and air
pollution at their activity places. Therefore, exposure estimation requires both an individual’s
time-activity pattern and a characterization of air pollutant concentrations when and where they
spend time. Ignoring either or both can lead to erroneous exposure estimates and
ineffective policies.
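As a minimal sketch of this idea (not the model used in the essays), an individual's exposure can be written as the duration-weighted average of the pollutant concentrations at each place visited, and compared with a residence-based estimate that ignores mobility. The function names, the activity records, and the PM2.5 values below are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Each activity record: (hours spent at a place, PM2.5 concentration there in ug/m3).
# All values are hypothetical, for illustration only.
Activity = Tuple[float, float]


def dynamic_exposure(activities: List[Activity]) -> float:
    """Time-weighted average concentration over all places visited."""
    total_hours = sum(hours for hours, _ in activities)
    if total_hours <= 0:
        raise ValueError("total activity duration must be positive")
    return sum(hours * conc for hours, conc in activities) / total_hours


def static_exposure(home_concentration: float) -> float:
    """Residence-based estimate: assumes the person stays at home all day."""
    return home_concentration


# A hypothetical day: 14 h at home, 2 h commuting, 8 h at work.
day = [(14.0, 10.0), (2.0, 25.0), (8.0, 16.0)]
dyn = dynamic_exposure(day)
sta = static_exposure(10.0)
error = sta - dyn  # a negative value means the static estimate is too low
print(f"dynamic={dyn:.2f}, static={sta:.2f}, error={error:.2f}")
```

In this made-up day the residence-based estimate is lower than the time-weighted one because the person spends ten hours in places more polluted than home; reversing the home and workplace concentrations would reverse the sign of the error.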
Third, exposure measurement errors are highly correlated with an individual’s mobility
level and sociodemographic characteristics. On the one hand, the exposure measurement error
increases with mobility. On the other hand, the individual’s income level and place of residence
affect the direction of exposure measurement error. I found that the exposure disparities between
whites and ethnic minorities, and between high-income and low-income groups, documented in previous studies
may be diminished when human mobility is considered in exposure estimates. However, the
narrowing of air pollution exposure disparities between the socially disadvantaged and the
privileged does not change the fact that socially disadvantaged people who live in more polluted
neighborhoods are still exposed to higher air pollution than other groups.
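A hypothetical two-person example may help illustrate this point. The concentrations, the share of the day spent away from home, and the assumption that both individuals spend their out-of-home hours in the same moderately polluted area are illustrative only and are not taken from the dissertation's data.

```python
def time_weighted(home_conc: float, away_conc: float, away_share: float) -> float:
    """Exposure when a given share of the day is spent away from home."""
    if not 0.0 <= away_share <= 1.0:
        raise ValueError("away_share must be between 0 and 1")
    return (1.0 - away_share) * home_conc + away_share * away_conc


# Hypothetical PM2.5 concentrations (ug/m3).
low_income_home = 18.0   # residence in a more polluted neighborhood
high_income_home = 8.0   # residence in a less polluted neighborhood
shared_away = 13.0       # area where both spend their out-of-home time
away_share = 0.4         # 40% of the day spent away from home

# Residence-based (static) estimates ignore mobility entirely.
static_gap = low_income_home - high_income_home

# Mobility-based (dynamic) estimates mix home and away concentrations.
low_dynamic = time_weighted(low_income_home, shared_away, away_share)
high_dynamic = time_weighted(high_income_home, shared_away, away_share)
dynamic_gap = low_dynamic - high_dynamic

print(f"static gap:  {static_gap:.1f}")
print(f"dynamic gap: {dynamic_gap:.1f}")
```

Under these assumptions the static estimate overstates exposure for the low-income individual and understates it for the high-income individual, so the gap shrinks once mobility is considered, yet it remains positive: the person living in the more polluted neighborhood is still more exposed.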
Bibliography
American Lung Association. (2020). Most Polluted Cities.
https://www.stateoftheair.org/city-rankings/most-polluted-cities.html
Apte, J. S., Messier, K. P., Gani, S., Brauer, M., Kirchstetter, T. W., Lunden, M. M., Marshall, J.
D., Portier, C. J., Vermeulen, R. C. H., & Hamburg, S. P. (2017). High-Resolution Air Pollution
Mapping with Google Street View Cars: Exploiting Big Data. Environmental Science and
Technology, 51(12), 6999–7008. https://doi.org/10.1021/acs.est.7b00891
Ault, A. P., Moore, M. J., Furutani, H., & Prather, K. A. (2009). Impact of emissions from the
Los Angeles Port region on San Diego air quality during regional transport events.
Environmental Science and Technology, 43(10), 3500–3506. https://doi.org/10.1021/es8018918
Bae, C. H. C., Sandlin, G., Bassok, A., & Kim, S. (2007). The exposure of disadvantaged
populations in freeway air-pollution sheds: A case study of the Seattle and Portland regions.
Environment and Planning B: Planning and Design, 34(1), 154–170.
https://doi.org/10.1068/b32124
Batista, G. E., Prati, R. C., & Monard, M. C. (2004). A Study of the Behavior of Several
Methods for Balancing Machine Learning Training Data. ACM SIGKDD Explorations
Newsletter, 6(1), 20–29. https://doi.org/10.1145/1007730.1007735
Beckx, C., Int Panis, L., Arentze, T., Janssens, D., Torfs, R., Broekx, S., & Wets, G. (2009). A
dynamic activity-based population modelling approach to evaluate exposure to air pollution:
Methods and application to a Dutch urban area. Environmental Impact Assessment Review,
29(3), 179–185. https://doi.org/10.1016/j.eiar.2008.10.001
Bell, M. L., & Ebisu, K. (2012). Environmental inequality in exposures to airborne particulate
matter components in the United States. Environmental Health Perspectives, 120(12), 1699–
1704. https://doi.org/10.1289/ehp.1205201
Bhat, C. R., Goulias, K. G., Pendyala, R. M., Paleti, R., Sidharthan, R., Schmitt, L., & Hu, H. H.
(2013). A household-level activity pattern generation model with an application for Southern
California. Transportation, 40(5), 1063–1086. https://doi.org/10.1007/s11116-013-9452-y
Bi, J., Stowell, J., Seto, E. Y. W., English, P. B., Al-Hamdan, M. Z., Kinney, P. L., Freedman, F.
R., & Liu, Y. (2020). Contribution of low-cost sensor measurements to the prediction of PM2.5
levels: A case study in Imperial County, California, USA. Environmental Research, 180(June
2019), 108810. https://doi.org/10.1016/j.envres.2019.108810
Bi, J., Wildani, A., Chang, H. H., & Liu, Y. (2020). Incorporating Low-Cost Sensor
Measurements into High-Resolution PM2.5 Modeling at a Large Spatial Scale. Environmental
Science and Technology, 54(4), 2152–2162. https://doi.org/10.1021/acs.est.9b06046
Boeing, G. (2017). OSMnx: New methods for acquiring, constructing, analyzing, and visualizing
complex street networks. Computers, Environment and Urban Systems, 65, 126–139.
https://doi.org/10.1016/j.compenvurbsys.2017.05.004
Bose, S., Hansel, N. N., Tonorezos, E. S., Williams, D. L., Bilderback, A., Breysse, P. N., Diette,
G. B., & Mccormack, M. C. (2015). Indoor Particulate Matter Associated with Systemic
Inflammation in COPD. May, 566–572.
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.
Brokamp, C., Jandarov, R., Hossain, M., & Ryan, P. (2018). Predicting Daily Urban Fine
Particulate Matter Concentrations Using a Random Forest Model. Environmental Science and
Technology, 52(7), 4173–4179. https://doi.org/10.1021/acs.est.7b05381
Carvlin, G. N., Lugo, H., Olmedo, L., Bejarano, E., Wilkie, A., Meltzer, D., Wong, M., King, G.,
Northcross, A., Jerrett, M., English, P. B., Hammond, D., & Seto, E. (2017). Development and
field validation of a community-engaged particulate matter air quality monitoring network in
Imperial, California, USA. Journal of the Air and Waste Management Association, 67(12),
1342–1352. https://doi.org/10.1080/10962247.2017.1369471
Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: synthetic
minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321–357.
https://www.jair.org/index.php/jair/article/view/10302
Chen, G., Li, S., Knibbs, L. D., Hamm, N. A. S., Cao, W., Li, T., Guo, J., Ren, H., Abramson,
M. J., & Guo, Y. (2018). A machine learning method to estimate PM2.5 concentrations across
China with remote sensing, meteorological and land use information. Science of the Total
Environment, 636, 52–60. https://doi.org/10.1016/j.scitotenv.2018.04.251
Chen, L. J., Ho, Y. H., Hsieh, H. H., Huang, S. T., Lee, H. C., & Mahajan, S. (2018). ADF: An
Anomaly Detection Framework for Large-Scale PM2.5 Sensing Systems. IEEE Internet of
Things Journal, 5(2), 559–570. https://doi.org/10.1109/JIOT.2017.2766085
Chi, G. C., Hajat, A., Bird, C. E., Cullen, M. R., Griffin, B. A., Miller, K. A., Shih, R. A.,
Stefanick, M. L., Vedal, S., Whitsel, E. A., & Kaufman, J. D. (2016). Individual and
Neighborhood socioeconomic status and the association between air pollution and cardiovascular
disease. Environmental Health Perspectives, 124(12), 1840–1847.
https://doi.org/10.1289/EHP852
Clements, A. L., Griswold, W. G., Rs, A., Johnston, J. E., Herting, M. M., Thorson, J.,
Collier-Oxandale, A., & Hannigan, M. (2017). Low-Cost Air Quality Monitoring Tools: From Research
to Practice (A Workshop Summary). Sensors, 17(11), 1–20. https://doi.org/10.3390/s17112478
Cohen, A. J., Brauer, M., Burnett, R., Anderson, H. R., Frostad, J., Estep, K., Balakrishnan, K.,
Brunekreef, B., Dandona, L., Dandona, R., Feigin, V., Freedman, G., Hubbell, B., Jobling, A.,
Kan, H., Knibbs, L., Liu, Y., Martin, R., Morawska, L., … Forouzanfar, M. H. (2017). Estimates
and 25-year trends of the global burden of disease attributable to ambient air pollution: an
analysis of data from the Global Burden of Diseases Study 2015. The Lancet, 389(10082), 1907–
1918. https://doi.org/10.1016/S0140-6736(17)30505-6
Cole-Hunter, T., de Nazelle, A., Donaire-Gonzalez, D., Kubesch, N., Carrasco-Turigas, G., Matt,
F., Foraster, M., Martínez, T., Ambros, A., Cirach, M., Martinez, D., Belmonte, J., &
Nieuwenhuijsen, M. (2018). Estimated effects of air pollution and space-time-activity on
cardiopulmonary outcomes in healthy adults: A repeated measures study. Environment
International, 111(November 2017), 247–259. https://doi.org/10.1016/j.envint.2017.11.024
Commodore, A., Wilson, S., Muhammad, O., Svendsen, E., & Pearce, J. (2017). Community-
based participatory research for the study of air pollution: a review of motivations,
approaches, and outcomes. https://doi.org/10.1007/s10661-017-6063-7
Crosbie, T. (2006). Using activity diaries: Some methodological lessons. Journal of Research
Practice, 2(1), 5.
De Nazelle, A., Fruin, S., Westerdahl, D., Martinez, D., Ripoll, A., Kubesch, N., &
Nieuwenhuijsen, M. (2012). A travel mode comparison of commuters’ exposures to air
pollutants in Barcelona. Atmospheric Environment, 59, 151–159.
https://doi.org/10.1016/j.atmosenv.2012.05.013
De Nazelle, A., Seto, E., Donaire-Gonzalez, D., Mendez, M., Matamala, J., Nieuwenhuijsen, M.
J., & Jerrett, M. (2013). Improving estimates of air pollution exposure through ubiquitous
sensing technologies. Environmental Pollution, 176, 92–99.
https://doi.org/10.1016/j.envpol.2012.12.032
Dewulf, B., Neutens, T., Lefebvre, W., Seynaeve, G., Vanpoucke, C., Beckx, C., & Van de
Weghe, N. (2016). Dynamic assessment of exposure to air pollution using mobile phone data.
International Journal of Health Geographics, 15(1), 1–14.
https://doi.org/10.1186/s12942-016-0042-z
Dhondt, S., Beckx, C., Degraeuwe, B., Lefebvre, W., Kochan, B., Bellemans, T., Int Panis, L.,
Macharis, C., & Putman, K. (2012). Health impact assessment of air pollution using a dynamic
exposure profile: Implications for exposure and health impact estimates. Environmental Impact
Assessment Review, 36, 42–51. https://doi.org/10.1016/j.eiar.2012.03.004
Di, Q., Kloog, I., Koutrakis, P., Lyapustin, A., Wang, Y., & Schwartz, J. (2016). Assessing
PM2.5 Exposures with High Spatiotemporal Resolution across the Continental United States.
Environmental Science and Technology, 50(9), 4712–4721.
https://doi.org/10.1021/acs.est.5b06121
Di, Q., Wang, Y., Zanobetti, A., Wang, Y., Koutrakis, P., Choirat, C., Dominici, F., & Schwartz,
J. D. (2017). Air Pollution and Mortality in the Medicare Population. New England Journal of
Medicine, 376(26), 2513–2522. https://doi.org/10.1056/nejmoa1702747
Dias, D., & Tchepel, O. (2018). Spatial and temporal dynamics in air pollution exposure
assessment. International Journal of Environmental Research and Public Health, 15(3).
https://doi.org/10.3390/ijerph15030558
Dinoi, A., Donateo, A., Conte, M., Conte, M., & Belosi, F. (2017). Comparison of atmospheric
particle concentration measurements using different optical detectors: Potentiality and limits for
air quality applications. Measurement: Journal of the International Measurement Confederation,
106, 274–282. https://doi.org/10.1016/j.measurement.2016.02.019
Do, K., Yu, H., Velasquez, J., Grell-Brisk, M., Smith, H., & Ivey, C. E. (2021). A data-driven
approach for characterizing community scale air pollution exposure disparities in inland
Southern California. Journal of Aerosol Science, 152(August 2020), 105704.
https://doi.org/10.1016/j.jaerosci.2020.105704
Dominici, F., Peng, R. D., Barr, C. D., & Bell, M. L. (2012). Protecting human health from air
pollution: shifting from a single-pollutant to a multi-pollutant approach. Epidemiology
(Cambridge, Mass.), 21(2), 187–194. https://doi.org/10.1097/EDE.0b013e3181cc86e8
Dons, E., Van Poppel, M., Int Panis, L., De Prins, S., Berghmans, P., Koppen, G., &
Matheeussen, C. (2014). Land use regression models as a tool for short, medium and long term
exposure to traffic related air pollution. Science of the Total Environment, 476–477, 378–386.
https://doi.org/10.1016/j.scitotenv.2014.01.025
Dons, E., Van Poppel, M., Kochan, B., Wets, G., & Int Panis, L. (2013). Modeling temporal and
spatial variability of traffic-related air pollution: Hourly land use regression models for black
carbon. Atmospheric Environment, 74, 237–246. https://doi.org/10.1016/j.atmosenv.2013.03.050
Elliott, J. R., & Smiley, K. T. (2019). Place, Space, and Racially Unequal Exposures to Pollution
at Home and Work. Social Currents, 6(1), 32–50. https://doi.org/10.1177/2329496517704873
Estabrooks, A., Jo, T., & Japkowicz, N. (2004). A Multiple Resampling Method for Learning
from Imbalanced Data Sets. Computational Intelligence, 20(1), 18–36.
Fishbain, B., Lerner, U., Castell, N., Cole-Hunter, T., Popoola, O., Broday, D. M., Martinez
Iñiguez, T., Nieuwenhuijsen, M., Jovasevic-Stojanovic, M., Topalovic, D., Jones, R. L., Galea,
K. S., Golumbic, Y. N., Baram-Tsabari, A., Yacobi, T., Drahler, D., Robinson,
J. A., Kocman, D., … Bartonova, A. (2017). An evaluation tool kit of air quality micro-sensing
units. Science of the Total Environment, 575(September 2016), 639–648.
https://doi.org/10.1016/j.scitotenv.2016.09.061
Freeman, N. C. G., & De Tejada, S. S. (2002). Methods for collecting time/activity pattern
information related to exposure to combustion products. Chemosphere, 49(9), 979–992.
https://doi.org/10.1016/S0045-6535(02)00271-0
Freeman, N. C. G., Lioy, P. J., Pellizzari, E., Zelon, H., Thomas, K., Clayton, A., &
Quackenboss, J. (1999). Responses to the Region 5 NHEXAS time/activity diary. Journal of
Exposure Analysis and Environmental Epidemiology, 9(5), 414–426.
https://doi.org/10.1038/sj.jea.7500052
Friedman, J. (2001). Greedy Function Approximation: A Gradient Boosting Machine. The
Annals of Statistics, 29(5), 1189–1232. https://www.jstor.org/stable/2699986
Gao, M., Cao, J., & Seto, E. (2015). A distributed network of low-cost continuous reading
sensors to measure spatiotemporal variations of PM2.5 in Xi’an, China. Environmental
Pollution, 199, 56–65. https://doi.org/10.1016/j.envpol.2015.01.013
Gilbert, A., & Chakraborty, J. (2011). Using geographically weighted regression for
environmental justice analysis: Cumulative cancer risks from air toxics in Florida. Social Science
Research, 40(1), 273–286. https://doi.org/10.1016/j.ssresearch.2010.08.006
Gorin, C. A., Collett, J. L., & Herckes, P. (2006). Wood smoke contribution to winter aerosol in
Fresno, CA. Journal of the Air and Waste Management Association, 56(11), 1584–1590.
https://doi.org/10.1080/10473289.2006.10464558
Guo, H., Zhan, Q., Chak, H., Yao, F., Zhou, X., Wu, J., & Li, W. (2020). Coupling mobile phone
data with machine learning: How misclassification errors in ambient PM2.5 exposure estimates
are produced? Science of the Total Environment, 745,
141034. https://doi.org/10.1016/j.scitotenv.2020.141034
Gupta, P., Doraiswamy, P., Levy, R., Pikelnaya, O., Maibach, J., Feenstra, B., Polidori, A.,
Kiros, F., & Mills, K. C. (2018). Impact of California Fires on Local and Regional Air Quality:
The Role of a Low‐Cost Sensor Network and Satellite Observations. GeoHealth, 2(6), 172–181.
https://doi.org/10.1029/2018gh000136
Gurram, S., Stuart, A. L., & Pinjari, A. R. (2019). Agent-based modeling to estimate exposures
to urban air pollution from transportation: Exposure disparities and impacts of high-resolution
data. Computers, Environment and Urban Systems, 75, 22–34.
https://doi.org/10.1016/j.compenvurbsys.2019.01.002
Habermann, M., Billger, M., & Haeger-Eugensson, M. (2015). Land use regression as method to
model air pollution. Previous results for Gothenburg/Sweden. Procedia Engineering, 115(0), 21–
28. https://doi.org/10.1016/j.proeng.2015.07.350
Habre, R., Girguis, M., Urman, R., Fruin, S., Lurmann, F., Shafer, M., Gorski, P., Franklin, M.,
McConnell, R., Avol, E., & Gilliland, F. (2021). Contribution of tailpipe and non-tailpipe traffic
sources to quasi-ultrafine, fine and coarse particulate matter in southern California. Journal of
the Air and Waste Management Association, 71(2), 209–230.
https://doi.org/10.1080/10962247.2020.1826366
Hajat, A., Hsia, C., & O’Neill, M. S. (2015). Socioeconomic Disparities and Air Pollution
Exposure: a Global Review. Current Environmental Health Reports, 2(4), 440–450.
https://doi.org/10.1007/s40572-015-0069-5
Hall, E. S., Kaushik, S. M., Vanderpool, R. W., Duvall, R. M., Beaver, M. R., Long, R. W., &
Solomon, P. A. (2014). Integrating Sensor Monitoring Technology into the Current Air Pollution
Regulatory Support Paradigm: Practical Considerations. American Journal of Environmental
Engineering, 2014(6), 147–154. https://doi.org/10.5923/j.ajee.20140406.02
Hasheminassab, S., Daher, N., Ostro, B. D., & Sioutas, C. (2014). Long-term source
apportionment of ambient fine particulate matter (PM2.5) in the Los Angeles Basin: A focus on
emissions reduction from vehicular sources. Environmental Pollution, 193, 54–64.
https://doi.org/10.1016/j.envpol.2014.06.012
Hatzopoulou, M., & Miller, E. J. (2010). Linking an activity-based travel demand model with
traffic emission and dispersion models: Transport’s contribution to air pollution in Toronto.
Transportation Research Part D, 15(6), 315–325. https://doi.org/10.1016/j.trd.2010.03.007
He, H., Bai, Y., Garcia, E. A., & Li, S. (2008). ADASYN: Adaptive synthetic sampling approach
for imbalanced learning. Proceedings of the International Joint Conference on Neural Networks,
3, 1322–1328. https://doi.org/10.1109/IJCNN.2008.4633969
He, H., & Garcia, E. A. (2009). Learning from imbalanced data. IEEE Transactions on
Knowledge and Data Engineering, 21(9), 1263–1284.
Houston, D., Wu, J., Ong, P., & Winer, A. (2004). Structural disparities of urban traffic in
Southern California: Implications for vehicle-related air pollution exposure in minority and high-
poverty neighborhoods. Journal of Urban Affairs, 26(5), 565–592.
https://doi.org/10.1111/j.0735-2166.2004.00215.x
Hu, L., Li, Z., & Ye, X. (2020). Delineating and modeling activity space using geotagged social
media data. Cartography and Geographic Information Science, 47(3), 277–288.
https://doi.org/10.1080/15230406.2019.1705187
Hu, X., Belle, J. H., Meng, X., Wildani, A., Waller, L. A., Strickland, M. J., & Liu, Y. (2017).
Estimating PM2.5 Concentrations in the Conterminous United States Using the Random Forest
Approach. Environmental Science and Technology, 51(12), 6936–6944.
https://doi.org/10.1021/acs.est.7b01210
Huang, K., Bi, J., Meng, X., Geng, G., Lyapustin, A., Lane, K. J., Gu, D., Kinney, P. L., & Liu,
Y. (2019). Estimating daily PM2.5 concentrations in New York City at the neighborhood-scale:
Implications for integrating non-regulatory measurements. Science of the Total Environment,
697, 134094. https://doi.org/10.1016/j.scitotenv.2019.134094
Huang, K., Xiao, Q., Meng, X., Geng, G., Wang, Y., Lyapustin, A., Gu, D., & Liu, Y. (2018).
Predicting monthly high-resolution PM2.5 concentrations with random forest model in the North
China Plain. Environmental Pollution, 242, 675–683.
https://doi.org/10.1016/j.envpol.2018.07.016
Ito, K., Johnson, S., Kheirbek, I., Clougherty, J., Pezeshki, G., Ross, Z., Eisl, H., & Matte, T. D.
(2016). Intraurban Variation of Fine Particle Elemental Concentrations in New York City.
Environmental Science and Technology, 50(14), 7517–7526.
https://doi.org/10.1021/acs.est.6b00599
Järv, O., Müürisepp, K., Ahas, R., Derudder, B., & Witlox, F. (2015). Ethnic differences in
activity spaces as a characteristic of segregation: A study based on mobile phone usage in
Tallinn, Estonia. Urban Studies, 52(14), 2680–2698. https://doi.org/10.1177/0042098014550459
Jerrett, M., Burnett, R. T., Beckerman, B. S., Turner, M. C., Krewski, D., Thurston, G., Martin,
R. V., Van Donkelaar, A., Hughes, E., Shi, Y., Gapstur, S. M., Thun, M. J., & Pope, C. A.
(2013). Spatial analysis of air pollution and mortality in California. American Journal of
Respiratory and Critical Care Medicine, 188(5), 593–599.
https://doi.org/10.1164/rccm.201303-0609OC
Jerrett, M., Donaire-Gonzalez, D., Popoola, O., Jones, R., Cohen, R. C., Almanza, E., de Nazelle,
A., Mead, I., Carrasco-Turigas, G., Cole-Hunter, T., Triguero-Mas, M., Seto, E., &
Nieuwenhuijsen, M. (2017). Validating novel air pollution sensors to improve exposure
estimates for epidemiological analyses and citizen science. Environmental Research, 158(April),
286–294. https://doi.org/10.1016/j.envres.2017.04.023
Jerrett, M., McConnell, R., Wolch, J., Chang, R., Lam, C., Dunton, G., Gilliland, F., Lurmann,
F., Islam, T., & Berhane, K. (2014). Traffic-related air pollution and obesity formation in
children: A longitudinal, multilevel analysis. Environmental Health: A Global Access Science
Source, 13(1), 1–9. https://doi.org/10.1186/1476-069X-13-49
Jiao, W., Hagler, G., Williams, R., Sharpe, R., Brown, R., Garver, D., Judge, R., Caudill, M.,
Rickard, J., Davis, M., Weinstock, L., Zimmer-Dauphinee, S., & Buckley, K. (2016). Community
Air Sensor Network (CAIRSENSE) project: evaluation of low-cost sensor performance in a
suburban environment in the southeastern United States. Atmospheric Measurement Techniques,
9, 5281–5292. https://doi.org/10.5194/amt-9-5281-2016
Kelly, K. E., Whitaker, J., Petty, A., Widmer, C., Dybwad, A., Sleeth, D., Martin, R., &
Butterfield, A. (2017). Ambient and laboratory evaluation of a low-cost particulate matter sensor.
Environmental Pollution, 221, 491–500. https://doi.org/10.1016/j.envpol.2016.12.039
Kim, J., & Kwan, M. P. (2020). How Neighborhood Effect Averaging Might Affect Assessment
of Individual Exposures to Air Pollution: A Study of Ozone Exposures in Los Angeles. Annals of
the American Association of Geographers, 4452.
https://doi.org/10.1080/24694452.2020.1756208
Kim, J., & Kwan, M. P. (2021a). Assessment of sociodemographic disparities in environmental
exposure might be erroneous due to neighborhood effect averaging: Implications for
environmental inequality research. Environmental Research, 195(September 2020), 110519.
https://doi.org/10.1016/j.envres.2020.110519
Kim, J., & Kwan, M. P. (2021b). How Neighborhood Effect Averaging Might Affect
Assessment of Individual Exposures to Air Pollution: A Study of Ozone Exposures in Los
Angeles. Annals of the American Association of Geographers, 111(1), 121–140.
https://doi.org/10.1080/24694452.2020.1756208
Kloog, I., Ridgway, B., Koutrakis, P., Coull, B. A., & Schwartz, J. D. (2013). Long- and
short-term exposure to PM2.5 and mortality: using novel exposure models. Epidemiology, 24(4), 555–
561. https://doi.org/10.1097/EDE.0b013e318294beaa
Korhonen, A., Lehtomäki, H., Rumrich, I., Karvosenoja, N., Paunu, V. V., Kupiainen, K.,
Sofiev, M., Palamarchuk, Y., Kukkonen, J., Kangas, L., Karppinen, A., & Hänninen, O. (2019).
Influence of spatial resolution on population PM2.5 exposure and health impacts. Air Quality,
Atmosphere and Health, 12(6), 705–718. https://doi.org/10.1007/s11869-019-00690-z
Kousa, A., Monn, C., Rotko, T., Alm, S., Oglesby, L., & Jantunen, M. J. (2001). Personal
exposures to NO2 in the EXPOLIS-study: Relation to residential indoor, outdoor and workplace
concentrations in Basel, Helsinki and Prague. Atmospheric Environment, 35(20), 3405–3412.
https://doi.org/10.1016/S1352-2310(01)00131-5
Kumar, P., Morawska, L., Martani, C., Biskos, G., Neophytou, M., Di Sabatino, S., Bell, M.,
Norford, L., & Britter, R. (2015). The rise of low-cost sensing for managing air pollution in
cities. Environment International, 75, 199–205. https://doi.org/10.1016/j.envint.2014.11.019
Kwan, M. P. (2013). Beyond Space (As We Knew It): Toward Temporally Integrated
Geographies of Segregation, Health, and Accessibility: Space-Time Integration in Geography
and GIScience. Annals of the Association of American Geographers, 103(5), 1078–1086.
https://doi.org/10.1080/00045608.2013.792177
Kwan, M. P. (2018). The neighborhood effect averaging problem (NEAP): An elusive
confounder of the neighborhood effect. International Journal of Environmental Research and
Public Health, 15(9). https://doi.org/10.3390/ijerph15091841
Laurikkala, J. (2001). Improving Identification of Difficult Small Classes by Balancing Class
Distribution. Conference on Artificial Intelligence in Medicine in Europe, 63–66.
Lepeule, J., Laden, F., Dockery, D., & Schwartz, J. (2012). Chronic exposure to fine particles
and mortality: An extended follow-up of the Harvard six cities study from 1974 to 2009.
Environmental Health Perspectives, 120(7), 965–970. https://doi.org/10.1289/ehp.1104660
Leys, C., Ley, C., Klein, O., Bernard, P., & Licata, L. (2013). Detecting outliers: Do not use
standard deviation around the mean, use absolute deviation around the median. Journal of
Experimental Social Psychology, 49(4), 764–766. https://doi.org/10.1016/j.jesp.2013.03.013
Li, J., & Biswas, P. (2017). Optical Characterization Studies of a Low-Cost Particle Sensor.
Aerosol and Air Quality Research, 17(7), 1691–1704. https://doi.org/10.4209/aaqr.2017.02.0085
Li, J., Zhang, H., Chao, C. Y., Chien, C. H., Wu, C. Y., Luo, C. H., Chen, L. J., & Biswas, P.
(2020). Integrating low-cost air quality sensor networks with fixed and satellite monitoring
systems to study ground-level PM2.5. Atmospheric Environment, 223(January), 117293.
https://doi.org/10.1016/j.atmosenv.2020.117293
Li, M., Gao, S., Lu, F., Tong, H., & Zhang, H. (2019). Dynamic Estimation of Individual
Exposure Levels to Air Pollution Using Trajectories Reconstructed from Mobile Phone Data.
International Journal of Environmental Research and Public Health, 16(22), 4522.
https://doi.org/10.3390/ijerph16224522
Li, Y., Henze, D. K., Jack, D., & Kinney, P. L. (2016). The influence of air quality model
resolution on health impact assessment for fine particulate matter and its components. Air Quality,
Atmosphere and Health, 9, 51–68. https://doi.org/10.1007/s11869-015-0321-z
Liu, H., Schneider, P., Haugen, R., & Vogt, M. (2019). Performance Assessment of a Low-Cost
PM2.5 Sensor for a near Four-Month Period in Oslo, Norway. Atmosphere, 10(2), 41.
https://doi.org/10.3390/atmos10020041
Lu, Y. (2021). Beyond air pollution at home: Assessment of personal exposure to PM2.5 using
activity-based travel demand model and low-cost air sensor network data. Environmental
Research, 201, 111549. https://doi.org/10.1016/j.envres.2021.111549
Lu, Y., Giuliano, G., & Habre, R. (2021). Estimating hourly PM2.5 concentrations at the
neighborhood scale using a low-cost air sensor network: A Los Angeles case study.
Environmental Research, 195, 110653. https://doi.org/10.1016/j.envres.2020.110653
Lurmann, F., Avol, E., & Gilliland, F. (2015). Emissions reduction policies and recent trends in
Southern California’s ambient air quality. Journal of the Air and Waste Management
Association, 65(3), 324–335. https://doi.org/10.1080/10962247.2014.991856
Ma, J., Tao, Y., Kwan, M. P., & Chai, Y. (2020). Assessing Mobility-Based Real-Time Air
Pollution Exposure in Space and Time Using Smart Sensors and GPS Trajectories in Beijing.
Annals of the American Association of Geographers, 110(2), 434–448.
https://doi.org/10.1080/24694452.2019.1653752
Ma, X., Li, X., Kwan, M. P., & Chai, Y. (2020). Who could not avoid exposure to high levels of
residence‐based pollution by daily mobility? Evidence of air pollution exposure from the
perspective of the neighborhood effect averaging problem (NEAP). International Journal of
Environmental Research and Public Health, 17(4), 1–19. https://doi.org/10.3390/ijerph17041223
Madrigano, J., Kloog, I., Goldberg, R., Coull, B. A., Mittleman, M. A., & Schwartz, J. (2013).
Long-term exposure to PM2.5 and incidence of acute myocardial infarction. Environmental
Health Perspectives, 121(2), 192–196. https://doi.org/10.1289/ehp.1205284
Masiol, M., Zíková, N., Chalupa, D. C., Rich, D. Q., Ferro, A. R., & Hopke, P. K. (2018).
Hourly land-use regression models based on low-cost PM monitor data. Environmental
Research, 167(April), 7–14. https://doi.org/10.1016/j.envres.2018.06.052
Miller, J. (1991). Short Report: Reaction Time Analysis with Outlier Exclusion: Bias Varies with
Sample Size. The Quarterly Journal of Experimental Psychology Section A, 43(4), 907–912.
https://doi.org/10.1080/14640749108400962
Mohai, P., & Saha, R. (2015). Which came first, people or pollution? Assessing the disparate
siting and post-siting demographic change hypotheses of environmental injustice. Environmental
Research Letters, 10(11). https://doi.org/10.1088/1748-9326/10/11/115008
Morawska, L., Thai, P. K., Liu, X., Asumadu-Sakyi, A., Ayoko, G., Bartonova, A., Bedini, A.,
Chai, F., Christensen, B., Dunbabin, M., Gao, J., Hagler, G. S. W., Jayaratne, R., Kumar, P., Lau,
A. K. H., Louie, P. K. K., Mazaheri, M., Ning, Z., Motta, N., … Williams, R. (2018).
Applications of low-cost sensing technologies for air quality monitoring and exposure
assessment: How far have they gone? Environment International, 116(April), 286–299.
https://doi.org/10.1016/j.envint.2018.04.018
Morency, C., Paez, A., Roorda, M. J., Mercado, R., & Farber, S. (2011). Distance traveled in
three Canadian cities: Spatial analysis from the perspective of vulnerable population segments.
Journal of Transport Geography, 19(1), 39–50. https://doi.org/10.1016/j.jtrangeo.2009.09.013
Mousavi, A., & Wu, J. (2021). Indoor-Generated PM2.5 during COVID-19 Shutdowns across
California: Application of the PurpleAir Indoor-Outdoor Low-Cost Sensor Network.
Environmental Science and Technology, 55(9), 5648–5656.
https://doi.org/10.1021/acs.est.0c06937
Neira, M., Prüss-Ustün, A., & Mudu, P. (2018). Reduce air pollution to beat NCDs: from
recognition to action. The Lancet, 392(10154), 1178–1179.
https://doi.org/10.1016/S0140-6736(18)32391-2
Neophytou, A. M., Costello, S., Brown, D. M., Picciotto, S., Noth, E. M., Hammond, S. K.,
Cullen, M. R., & Eisen, E. A. (2014). Marginal structural models in occupational epidemiology:
Application in a study of ischemic heart disease incidence and PM2.5 in the US aluminum
industry. American Journal of Epidemiology, 180(6), 608–615.
https://doi.org/10.1093/aje/kwu175
Nieuwenhuijsen, M. J., Donaire-Gonzalez, D., Rivas, I., De Castro, M., Cirach, M., Hoek, G.,
Seto, E., Jerrett, M., & Sunyer, J. (2015). Variability in and agreement between modeled and
personal continuously measured black carbon levels using novel smartphone and sensor
technologies. Environmental Science and Technology, 49(5), 2977–2982.
https://doi.org/10.1021/es505362x
Novotny, E. V., Bechle, M. J., Millet, D. B., & Marshall, J. D. (2011). Erratum: National
satellite-based land-use regression: NO2 in the United States (Environmental Science &
Technology (2011) 45 (4407-4414) DOI:10.1021/es103578x). Environmental Science and
Technology, 45(19), 8596. https://doi.org/10.1021/es202856z
Nyhan, M., Grauwin, S., Britter, R., Misstear, B., McNabola, A., Laden, F., Barrett, S. R. H., &
Ratti, C. (2016). “exposure track” - The impact of mobile-device-based mobility patterns on
quantifying population exposure to air pollution. Environmental Science and Technology,
50(17), 9671–9681. https://doi.org/10.1021/acs.est.6b02385
Nyhan, M. M., Kloog, I., Britter, R., Ratti, C., & Koutrakis, P. (2019). Quantifying population exposure
to air pollution using individual mobility patterns inferred from mobile phone data. Journal of
Exposure Science & Environmental Epidemiology, 29(2), 238–247.
https://doi.org/10.1038/s41370-018-0038-9
Páez, A., Mercado, R. G., Farber, S., Morency, C., & Roorda, M. (2010). Relative accessibility
deprivation indicators for urban settings: Definitions and application to food deserts in Montreal.
Urban Studies, 47(7), 1415–1438. https://doi.org/10.1177/0042098009353626
Park, Y. M. (2020). Assessing personal exposure to traffic-related air pollution using individual
travel-activity diary data and an on-road source air dispersion model. Health and Place, 63,
102351. https://doi.org/10.1016/j.healthplace.2020.102351
Park, Y. M., & Kwan, M. P. (2017). Individual exposure estimates may be erroneous when
spatiotemporal variability of air pollution and human mobility are ignored. Health and Place,
43, 85–94. https://doi.org/10.1016/j.healthplace.2016.10.002
Park, Y. M., & Kwan, M. P. (2018). Beyond residential segregation: A spatiotemporal approach
to examining multi-contextual segregation. Computers, Environment and Urban Systems,
71, 98–108. https://doi.org/10.1016/j.compenvurbsys.2018.05.001
Park, Y. M., & Kwan, M. P. (2020). Understanding racial disparities in exposure to traffic-
related air pollution: Considering the spatiotemporal dynamics of population distribution.
International Journal of Environmental Research and Public Health, 17(3).
https://doi.org/10.3390/ijerph17030908
Pendyala, R., Bhat, C., Goulias, K., Paleti, R., Konduri, K., Sidharthan, R., Hu, H. H., Huang,
G., & Christian, K. (2012). Application of socioeconomic model system for activity-based
modeling. Transportation Research Record, 2303, 71–80. https://doi.org/10.3141/2303-08
Pennington, A. F., Strickland, M. J., Klein, M., Zhai, X., Russell, A. G., Hansen, C., & Darrow,
L. A. (2016). Measurement error in mobile source air pollution exposure estimates due to
residential mobility during pregnancy. Journal of Exposure Science and Environmental
Epidemiology, 27(5), 513–520. https://doi.org/10.1038/jes.2016.66
Picornell, M., Ruiz, T., Borge, R., García-Albertos, P., de la Paz, D., & Lumbreras, J. (2019).
Population dynamics based on mobile phone data to improve air pollution exposure assessments.
Journal of Exposure Science and Environmental Epidemiology, 29(2), 278–291.
https://doi.org/10.1038/s41370-018-0058-5
Rai, A. C., Kumar, P., Pilla, F., Skouloudis, A. N., Di Sabatino, S., Ratti, C., Yasar, A., &
Rickerby, D. (2017). End-user perspective of low-cost sensors for outdoor air pollution
monitoring. Science of the Total Environment, 607–608, 691–705.
https://doi.org/10.1016/j.scitotenv.2017.06.266
Reid, C. E., Jerrett, M., Petersen, M. L., Pfister, G. G., Morefield, P. E., Tager, I. B., Raffuse, S.
M., & Balmes, J. R. (2015). Spatiotemporal prediction of fine particulate matter during the 2008
Northern California wildfires using machine learning. Environmental Science and Technology,
49(6), 3887–3896. https://doi.org/10.1021/es505846r
Rowangould, G. M. (2013). A census of the US near-roadway population: Public health and
environmental justice considerations. Transportation Research Part D: Transport and
Environment, 25, 59–67. https://doi.org/10.1016/j.trd.2013.08.003
Sampson, R. J. (2019). Neighbourhood effects and beyond: Explaining the paradoxes of
inequality in the changing American metropolis. Urban Studies, 56(1), 3–32.
https://doi.org/10.1177/0042098018795363
Sayahi, T., Butterfield, A., & Kelly, K. E. (2019). Long-term field evaluation of the Plantower
PMS low-cost particulate matter sensors. Environmental Pollution, 245, 932–940.
https://doi.org/10.1016/j.envpol.2018.11.065
Setton, E. M., Keller, C. P., Cloutier-Fisher, D., & Hystad, P. W. (2008). Spatial variations in
estimated chronic exposure to traffic-related air pollution in working populations: A simulation.
International Journal of Health Geographics, 7(2), 1–17.
https://doi.org/10.1186/1476-072X-7-39
Setton, E., Marshall, J. D., Brauer, M., Lundquist, K. R., Hystad, P., Keller, P., & Cloutier-
Fisher, D. (2011). The impact of daily mobility on exposure to traffic-related air pollution and
health effect estimates. Journal of Exposure Science and Environmental Epidemiology, 21(1),
42–48. https://doi.org/10.1038/jes.2010.14
Shafran-Nathan, R., Yuval, Levy, I., & Broday, D. M. (2017). Exposure estimation errors to
nitrogen oxides on a population scale due to daytime activity away from home. Science of the
Total Environment, 580, 1401–1409. https://doi.org/10.1016/j.scitotenv.2016.12.105
Shah, A. S. V., Lee, K. K., McAllister, D. A., Hunter, A., Nair, H., Whiteley, W., Langrish, J. P.,
Newby, D. E., & Mills, N. L. (2015). Short term exposure to air pollution and stroke: Systematic
review and meta-analysis. BMJ (Online), 350, h1295. https://doi.org/10.1136/BMJ.h1295
Shareck, M., Frohlich, K. L., & Kestens, Y. (2014). Considering daily mobility for a more
comprehensive understanding of contextual effects on social inequalities in health: A conceptual
proposal. Health and Place, 29, 154–160. https://doi.org/10.1016/j.healthplace.2014.07.007
Shi, L., Zanobetti, A., Kloog, I., Coull, B. A., Koutrakis, P., Melly, S. J., & Schwartz, J. D.
(2016). Low-concentration PM2.5 and mortality: Estimating acute and chronic effects in a
population-based study. Environmental Health Perspectives, 124(1), 46–52.
https://doi.org/10.1289/ehp.1409111
Snyder, E. G., Watkins, T. H., Solomon, P. A., Thoma, E. D., Williams, R. W., Hagler, G. S. W.,
Shelow, D., Hindin, D. A., Kilaru, V. J., & Preuss, P. W. (2013). The changing paradigm of air
pollution monitoring. Environmental Science and Technology, 47(20), 11369–11377.
https://doi.org/10.1021/es4022602
Song, Y., Huang, B., He, Q., Chen, B., Wei, J., & Mahmood, R. (2019). Dynamic assessment of
PM2.5 exposure and health risk using remote sensing and geo-spatial big data.
Environmental Pollution, 253, 288–296. https://doi.org/10.1016/j.envpol.2019.06.057
Spengler, J. D., Letz, R., Quackenboss, J. J., Kanarek, M. S., & Duffy, C. P. (1986). Personal
Exposure to Nitrogen Dioxide: Relationship to Indoor/Outdoor Air Quality and Activity Patterns.
Environmental Science and Technology, 20(8), 775–783. https://doi.org/10.1021/es00150a003
Steinle, S., Reis, S., & Sabel, C. E. (2013). Quantifying human exposure to air pollution - Moving
from static monitoring to spatio-temporally resolved personal exposure assessment. Science of
the Total Environment, 443, 184–193. https://doi.org/10.1016/j.scitotenv.2012.10.098
Su, J. G., Brauer, M., Ainslie, B., Steyn, D., Larson, T., & Buzzelli, M. (2008). An innovative
land use regression model incorporating meteorology for exposure analysis. Science of the Total
Environment, 390(2–3), 520–529. https://doi.org/10.1016/j.scitotenv.2007.10.032
Susilo, Y. O., & Kitamura, R. (2005). Analysis of day-to-day variability in an individual’s action
space: Exploration of 6-week Mobidrive travel diary data. Transportation Research Record,
1902, 124–133. https://doi.org/10.3141/1902-15
Tayarani, M., & Rowangould, G. (2020). Estimating exposure to fine particulate matter
emissions from vehicle traffic: Exposure misclassification and daily activity patterns in a large,
sprawling region. Environmental Research, 182, 108999.
https://doi.org/10.1016/j.envres.2019.108999
Tryner, J., L’Orange, C., Mehaffy, J., Miller-Lionberg, D., Hofstetter, J. C., Wilson, A., &
Volckens, J. (2020). Laboratory evaluation of low-cost PurpleAir PM monitors and in-field
correction using co-located portable filter samplers. Atmospheric Environment, 220,
117067. https://doi.org/10.1016/j.atmosenv.2019.117067
Vallée, J., Cadot, E., Grillo, F., Parizot, I., & Chauvin, P. (2010). The combined effects of
activity space and neighbourhood of residence on participation in preventive health-care
activities: The case of cervical screening in the Paris metropolitan area (France). Health and
Place, 16(5), 838–852. https://doi.org/10.1016/j.healthplace.2010.04.009
Wacholder, S. (1995). When measurement errors correlate with truth: Surprising effects of
nondifferential misclassification. Epidemiology, 6(2), 157–161.
https://doi.org/10.1097/00001648-199503000-00012
Wang, J., Kwan, M. P., & Chai, Y. (2018). An innovative context-based crystal-growth activity
space method for environmental exposure assessment: A study using GIS and GPS trajectory
data collected in Chicago. International Journal of Environmental Research and Public Health,
15(4), 1–24. https://doi.org/10.3390/ijerph15040703
Wang, X. R., & Oliver Gao, H. (2011). Exposure to fine particle mass and number
concentrations in urban transportation environments of New York City. Transportation Research
Part D: Transport and Environment, 16(5), 384–391. https://doi.org/10.1016/j.trd.2011.03.001
Weichenthal, S., Van Ryswyk, K., Goldstein, A., Bagg, S., Shekkarizfard, M., & Hatzopoulou,
M. (2016). A land use regression model for ambient ultrafine particles in Montreal, Canada: A
comparison of linear regression and a machine learning approach. Environmental Research, 146,
65–72. https://doi.org/10.1016/j.envres.2015.12.016
Weissert, L. F., Alberti, K., Miskell, G., Pattinson, W., Salmond, J. A., Henshaw, G., &
Williams, D. E. (2019). Low-cost sensors and microscale land use regression: Data fusion to
resolve air quality variations with high spatial and temporal resolution. Atmospheric
Environment, 213, 285–295. https://doi.org/10.1016/j.atmosenv.2019.06.019
Williams, R., Duvall, R., Kilaru, V., Hagler, G., Hassinger, L., Benedict, K., Rice, J., Kaufman,
A., Judge, R., Pierce, G., Allen, G., Bergin, M., Cohen, R. C., Fransioli, P., Gerboles, M., Habre,
R., Hannigan, M., Jack, D., Louie, P., … Ning, Z. (2019). Deliberating performance targets
workshop: Potential paths for emerging PM2.5 and O3 air sensor progress. Atmospheric
Environment: X, 2, 100031. https://doi.org/10.1016/j.aeaoa.2019.100031
World Health Organization. (2006). Air quality guidelines for particulate matter, ozone,
nitrogen dioxide and sulphur dioxide: Global update 2005.
https://www.euro.who.int/__data/assets/pdf_file/0005/78638/E90038.pdf
Xianyu, J., Rasouli, S., & Timmermans, H. (2017). Analysis of variability in multi-day GPS
imputed activity-travel diaries using multi-dimensional sequence alignment and panel effects
regression models. Transportation, 44(3), 533–553. https://doi.org/10.1007/s11116-015-9666-2
Xu, Y., Jiang, S., Li, R., Zhang, J., Zhao, J., Abbar, S., & González, M. C. (2019). Unraveling
environmental justice in ambient PM2.5 exposure in Beijing: A big data approach. Computers,
Environment and Urban Systems, 75, 12–21.
https://doi.org/10.1016/j.compenvurbsys.2018.12.006
Yang, Y., Guo, Y., Qian, Z. (Min), Ruan, Z., Zheng, Y., Woodward, A., Ai, S., Howard, S. W.,
Vaughn, M. G., Ma, W., Wu, F., & Lin, H. (2018). Ambient fine particulate pollution associated
with diabetes mellitus among the elderly aged 50 years and older in China. Environmental
Pollution, 243, 815–823. https://doi.org/10.1016/j.envpol.2018.09.056
Yoo, E. H., Rudra, C., Glasgow, M., & Mu, L. (2015). Geospatial Estimation of Individual
Exposure to Air Pollutants: Moving from Static Monitoring to Activity-Based Dynamic
Exposure Assessment. Annals of the Association of American Geographers, 105(5), 915–926.
https://doi.org/10.1080/00045608.2015.1054253
Yu, H., Russell, A., Mulholland, J., & Huang, Z. (2018). Using cell phone location to assess
misclassification errors in air pollution exposure estimation. Environmental Pollution, 233,
261–266. https://doi.org/10.1016/j.envpol.2017.10.077
Yu, X., Ivey, C., Huang, Z., Gurram, S., & Sivaraman, V. (2020). Quantifying the impact of
daily mobility on errors in air pollution exposure estimation using mobile phone location data.
Environment International, 141, 105772. https://doi.org/10.1016/j.envint.2020.105772
Yu, X., Stuart, A. L., Liu, Y., Ivey, C. E., Russell, A. G., Kan, H., Henneman, L. R. F., Sarnat, S.
E., Hasan, S., Sadmani, A., Yang, X., & Yu, H. (2019). On the accuracy and potential of Google
Maps location history data to characterize individual mobility for air pollution health studies.
Environmental Pollution, 252, 924–930. https://doi.org/10.1016/j.envpol.2019.05.081
Zanobetti, A., Dominici, F., Wang, Y., & Schwartz, J. D. (2014). A national case-crossover
analysis of the short-term effect of PM2.5 on hospitalizations and mortality in subjects with
diabetes and neurological disorders. 1–11.
Zhang, S., Wolf, K., Breitner, S., Kronenberg, F., Stafoggia, M., Peters, A., & Schneider, A.
(2018). Long-term effects of air pollution on ankle-brachial index. Environment International,
118, 17–25. https://doi.org/10.1016/j.envint.2018.05.025
Zhao, N., Liu, Y., Vanos, J. K., & Cao, G. (2018). Day-of-week and seasonal patterns of PM2.5
concentrations over the United States: Time-series analyses using the Prophet procedure.
Atmospheric Environment, 192, 116–127. https://doi.org/10.1016/j.atmosenv.2018.08.050
Zhou, Y., & Zheng, H. (2016). Digital universal particle concentration sensor PMS5003 series
data manual.
https://www.aqmd.gov/docs/default-source/aq-spec/resources-page/plantower-pms5003-manual_v2-3.pdf
Ziemke, D., Nagel, K., & Bhat, C. (2015). Integrating CEMDAP and MATSim to increase the
transferability of transport demand models. Transportation Research Record, 2493, 117–125.
https://doi.org/10.3141/2493-13
Zusman, M., Schumacher, C. S., Gassett, A. J., Spalt, E. W., Austin, E., Larson, T. V., Carvlin,
G., Seto, E., Kaufman, J. D., & Sheppard, L. (2020). Calibration of low-cost particulate matter
sensors: Model development for a multi-city epidemiological study. Environment International,
134, 105329. https://doi.org/10.1016/j.envint.2019.105329
Abstract
This dissertation examines an emerging environmental justice problem in air pollution exposure assessment. Through three independent yet interrelated empirical studies, it develops a machine-learning-based model to capture PM2.5 variations at high spatiotemporal resolution, identifies how overlooking individual mobility produces misclassification errors in exposure estimates, and evaluates the factors that lead to such errors at the individual level. Low-cost PM2.5 sensors combined with machine learning methods allow exposure to be predicted at fine spatial and temporal resolution. Results suggest that ignoring individual mobility when measuring exposure to outdoor PM2.5 can lead to erroneous exposure classification, which may in turn result in ineffective environmental and public health policies. Individual mobility patterns and sociodemographic characteristics are the two major factors contributing to exposure measurement errors. Exposure measurement errors increase for people with high mobility levels, especially workers who travel longer distances and spend more time away from their place of residence. Exposure measurement errors also differ across sociodemographic groups: low income and high residential pollution levels are correlated with exposure overestimation, while high income and low residential pollution levels are correlated with exposure underestimation. The findings also suggest that the exposure disparities between socially disadvantaged and privileged groups documented in previous studies may be diminished by human mobility.
Asset Metadata
Creator: Lu, Yougeng (author)
Core Title: Urban air pollution and environmental justice: three essays
School: School of Policy, Planning and Development
Degree: Doctor of Philosophy
Degree Program: Urban Planning and Development
Degree Conferral Date: 2022-08
Publication Date: 07/09/2022
Defense Date: 04/07/2022
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: environmental justice, exposure measurement error, Los Angeles, OAI-PMH Harvest, PM2.5
Format: application/pdf (imt)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Giuliano, Genevieve (committee chair), Boarnet, Marlon (committee member), Habre, Rima (committee member)
Creator Email: 70228003lyg@gmail.com, yougengl@usc.edu
Permanent Link (DOI): https://doi.org/10.25549/usctheses-oUC111371070
Unique identifier: UC111371070
Legacy Identifier: etd-LuYougeng-10821
Document Type: Dissertation
Rights: Lu, Yougeng
Type: texts
Source: 20220713-usctheses-batch-952 (batch), University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright. The original signature page accompanying the original submission of the work to the USC Libraries is retained by the USC Libraries and a copy of it may be obtained by authorized requesters contacting the repository e-mail address given.
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email: cisadmin@lib.usc.edu